ARM ARM1176JZF-S Technical Reference Manual page 311

5.2.3 Static branch predictor
5.2.4 Branch folding
ARM DDI 0301H
ID012310
The Branch Target Address Cache (BTAC) provides dynamic prediction of branches, including BL
and BLX instructions, in ARM, Thumb, and Jazelle states. The BTAC is a 128-entry direct-mapped
cache structure used for allocation of Branch Target Addresses for resolved branches. The BTAC
uses a 2-bit saturating prediction history scheme to provide the dynamic branch prediction.
When a branch has been allocated into the BTAC, it is evicted only in the case of a capacity
clash, that is, by another branch at the same index.
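
The direct-mapped allocation and eviction behavior described above can be sketched as a small software model. This is an illustration only, not the hardware design: the text does not say which address bits form the index, so the `btac_index` function below (low bits of the word-aligned address) is an assumption, and the names `Btac`, `allocate`, and `lookup` are hypothetical.

```python
BTAC_ENTRIES = 128  # from the text: 128-entry direct-mapped structure


def btac_index(branch_addr: int) -> int:
    # Assumption: index with the low bits of the word-aligned address.
    # The manual text does not specify the real index function.
    return (branch_addr >> 2) % BTAC_ENTRIES


class Btac:
    def __init__(self) -> None:
        # One slot per index; each slot holds (branch_addr, target_addr) or None.
        self.slots = [None] * BTAC_ENTRIES

    def allocate(self, branch_addr: int, target_addr: int):
        """Store a resolved branch; return the evicted branch address, if any."""
        i = btac_index(branch_addr)
        old = self.slots[i]
        self.slots[i] = (branch_addr, target_addr)
        if old is not None and old[0] != branch_addr:
            return old[0]
        return None

    def lookup(self, fetch_addr: int):
        """Return the stored target address on a hit, else None."""
        entry = self.slots[btac_index(fetch_addr)]
        if entry is not None and entry[0] == fetch_addr:
            return entry[1]
        return None


btac = Btac()
btac.allocate(0x8000, 0x7FF0)
# 0x8200 is 128 words past 0x8000, so under the assumed index function it
# maps to the same slot and evicts the earlier branch.
evicted = btac.allocate(0x8200, 0x9000)
```

In this model a branch stays resident until a different branch with the same index is allocated, which is the only eviction case the text describes.
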
The prediction is based on the previous behavior of this branch. The four possible states of the
prediction bits are:
strongly predict branch taken
weakly predict branch taken
weakly predict branch not taken
strongly predict branch not taken.
The history is updated for each occurrence of the branch. This updating is scheduled by the
integer core when the branch has been resolved.
Branch entries are allocated into the BTAC after having been resolved at Execute. BTAC hits
enable branch prediction with zero-cycle delay. When a BTAC hit occurs, the Branch Target
Address stored in the BTAC is used as the Program Counter for the next Fetch. Branches
resolved as taken and branches resolved as not taken are both allocated into the BTAC. This
enables the BTAC to do the most useful amount of work and improves performance for tight
backward branching loops.
The second level of branch prediction in the processor uses static branch prediction that is based
solely on the characteristics of a branch instruction. It does not make use of any history
information. The scheme used in the ARM1176JZF-S processor predicts that all forward
conditional branches are not taken and all backward branches are taken. Around 65% of all
branches are preceded by enough non-branch cycles to be completely predicted.
Branch prediction is performed only when the Z bit in CP15 Register c1 is set to 1. See c1,
Control Register on page 3-44 for details of this register.
Dynamic prediction works by caching previously seen branches in the BTAC and, like all
caches, suffers from the compulsory miss that occurs the first time the predictor encounters
a branch. A second, static predictor is added to the design to counter these misses, and to
deal with any capacity and conflict misses in the BTAC. The static predictor amounts to an
early evaluation of branches in the pipeline, combined with a predictor based on the direction
of the branches to handle condition codes that are not yet known at the time these branches
are handled. Only branches that have not been predicted by the dynamic predictor are handled
by the static predictor.
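
The ordering of the two prediction levels, and the gating by the Z bit, can be summarized in a short sketch. It is a simplified model under stated assumptions: the BTAC is reduced to a hypothetical dictionary from branch address to predicted direction, `predict` is an illustrative name, and the Z bit position (bit 11) comes from the Control Register description rather than from this section.

```python
# Z bit of the CP15 c1 Control Register (bit 11). On hardware it is changed
# with a read-modify-write of c1 (MRC/MCR p15, 0, <Rd>, c1, c0, 0).
Z_BIT = 1 << 11


def predict(control_reg: int, btac: dict, branch_addr: int, offset: int):
    """Return (source, predicted_taken), or None when prediction is disabled."""
    if not control_reg & Z_BIT:
        return None                           # Z = 0: no branch prediction
    if branch_addr in btac:
        return ("dynamic", btac[branch_addr])  # BTAC hit wins
    # BTAC miss: fall back to the direction-based static rule.
    return ("static", offset < 0)


# Hypothetical BTAC contents: a backward branch learned to be not taken.
btac = {0x8000: False}
enabled = Z_BIT

hit = predict(enabled, btac, 0x8000, -16)
miss = predict(enabled, btac, 0x8100, -16)
disabled = predict(0, btac, 0x8100, -16)
```

Here `hit` is decided by the dynamic history even though the branch is backward, `miss` falls through to the static rule, and `disabled` shows that neither level is consulted while the Z bit is clear.
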
5.2.3 Static branch predictor
The static branch predictor (SBP) is hard-wired, with backward branches predicted as taken and
forward branches as not taken. The SBP looks at the MSB of the branch offset to determine the
branch direction. Statically predicted taken branches incur a one-cycle delay before the target
instructions start refilling the pipeline. The SBP works in both ARM and Thumb states. The SBP
does not function in Jazelle state.
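
As an illustration of the direction test, the sketch below decodes the offset field of an ARM-state conditional branch and applies the rule above. It assumes the standard ARM B/BL encoding, in which bits [23:0] hold a signed word offset, so bit 23 is the MSB the predictor would examine. The function names are hypothetical and Thumb encodings are not covered.

```python
def arm_branch_offset(instr: int) -> int:
    """Byte offset encoded in an ARM-state B/BL instruction (signed imm24 << 2)."""
    imm24 = instr & 0x00FFFFFF
    if imm24 & 0x00800000:      # MSB of the offset field set: negative offset
        imm24 -= 1 << 24
    return imm24 << 2


def static_predict_taken(instr: int) -> bool:
    """Backward branch (offset MSB set) is predicted taken, forward not taken."""
    return bool(instr & 0x00800000)


BNE_BACK = 0x1AFFFFFC   # BNE whose target is 8 bytes before the branch
BNE_FWD = 0x1A000002    # BNE whose target is 16 bytes after the branch

back_offset = arm_branch_offset(BNE_BACK)
fwd_offset = arm_branch_offset(BNE_FWD)
```

The encoded offset is relative to the branch address plus 8, which is why an offset of -16 lands 8 bytes before the branch. Only the sign bit is needed for the prediction itself, so no full offset decode is required to choose a direction.
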
5.2.4 Branch folding
Branch folding is a technique where, on the prediction of most branches, the branch instruction
is completely removed from the instruction stream presented to the execution pipeline. Branch
folding can significantly improve the performance of branches, bringing the cycles per
instruction (CPI) for branches significantly below 1.
Branch folding operates only in ARM and Thumb states.
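
The effect of folding on branch CPI can be shown with simple arithmetic. The model below is a deliberately crude assumption: a folded branch is taken to cost zero pipeline cycles and every other branch one cycle, with no misprediction penalty. The folded fraction used is illustrative and does not come from the manual.

```python
def branch_cpi(folded_fraction: float, unfolded_cycles: float = 1.0) -> float:
    """Average cycles per branch when folded branches cost 0 cycles."""
    if not 0.0 <= folded_fraction <= 1.0:
        raise ValueError("folded_fraction must be between 0 and 1")
    return (1.0 - folded_fraction) * unfolded_cycles


# Illustrative input only: assume three quarters of branches are folded.
cpi = branch_cpi(0.75)
no_folding = branch_cpi(0.0)
```

Under these assumptions the average cost per branch falls from 1 cycle to 0.25 cycles, which is the sense in which folding takes branch CPI below 1.
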
Copyright © 2004-2009 ARM Limited. All rights reserved.
Non-Confidential, Unrestricted Access
Program Flow Prediction
5-5
