About Program Flow Prediction - ARM ARM1176JZF-S Technical Reference Manual

5.1 About program flow prediction

ARM DDI 0301H
ID012310
Program flow prediction in the processor is carried out by:
The integer core
    Implements static branch prediction and the Return Stack.
The Prefetch Unit (PU)
    Implements dynamic branch prediction.
The processor is responsible for handling branches the first time they are executed, that is, when
no historical information is available for dynamic prediction by the PU.
The integer core makes static predictions about the likely outcome of a branch early in its
pipeline and then resolves those predictions when the outcome of conditional execution is
known. Condition codes are evaluated at three points in the integer core pipeline, and branches
are resolved as soon as the flags are guaranteed not to be modified by a preceding instruction.
When a branch is resolved, the integer core passes information to the PU so that it can make a
Branch Target Address Cache (BTAC) allocation or update an existing entry as appropriate. The
integer core is also responsible for identifying likely procedure calls and returns to predict the
returns. It can handle nested procedures up to three deep.
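The return prediction described above can be pictured as a small stack of return addresses. The following is a minimal software sketch, not a description of the hardware: the type and function names are hypothetical, and the overflow behavior (discarding the oldest entry when a fourth nested call arrives) is an assumption, because this section states only that nesting is handled up to three deep.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 3 /* the text states nesting up to three deep */

typedef struct {
    uint32_t addr[RS_DEPTH]; /* addr[0] is the oldest entry */
    int count;               /* number of valid entries, 0..RS_DEPTH */
} return_stack;

/* Record the return address of a likely procedure call.
 * Assumption: when the stack is full, the oldest entry is discarded. */
static void rs_push(return_stack *rs, uint32_t return_addr)
{
    if (rs->count == RS_DEPTH) {
        for (int i = 1; i < RS_DEPTH; i++)
            rs->addr[i - 1] = rs->addr[i];
        rs->count--;
    }
    rs->addr[rs->count++] = return_addr;
}

/* Predict the target of a likely procedure return.
 * Returns false when the stack is empty and no prediction can be made. */
static bool rs_pop(return_stack *rs, uint32_t *predicted)
{
    if (rs->count == 0)
        return false;
    *predicted = rs->addr[--rs->count];
    return true;
}
```

Under these assumptions, a call chain four deep loses the outermost return address, so only the three innermost returns can be predicted.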
The integer core includes:
a Static Branch Predictor (SBP)
a Return Stack (RS)
branch resolution logic
a BTAC update interface to the PU
a BTAC allocate interface to the PU.
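This section does not state the rule that the SBP in the list above applies. As an illustration only, the sketch below assumes a common static scheme: unconditional branches are predicted taken, conditional backward branches (typically loops) are predicted taken, and conditional forward branches are predicted not taken. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Static prediction made early in the pipeline, before the flags are known.
 * Assumed heuristic: unconditional -> taken, backward -> taken,
 * forward -> not taken. */
static bool sbp_predict_taken(bool conditional, int32_t branch_offset)
{
    if (!conditional)
        return true;
    return branch_offset < 0; /* negative offset means a backward branch */
}

/* Resolution step: once the condition codes are final, compare the
 * prediction with the real outcome. A mismatch means the fetched
 * instruction stream was wrong and must be corrected. */
static bool sbp_mispredicted(bool predicted_taken, bool actually_taken)
{
    return predicted_taken != actually_taken;
}
```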
The processor PU is responsible for fetching instructions from the memory system as required
by the integer core and coprocessors. The PU buffers up to seven instructions in its FIFO to:
detect branch instructions ahead of the integer core requirement
dynamically predict those that it considers are to be taken
provide branch folding of predicted branches if possible
identify unconditional procedure return instructions.
This reduces the cycle time of branch instructions, which increases processor performance.
The PU includes:
a BTAC
branch update and allocate logic
a Dynamic Branch Predictor (DBP), and associated update mechanism
branch folding logic.
It is responsible for providing the integer core with instructions, and for requesting cache
accesses. The pattern of cache accesses is based on the predicted instruction stream as
determined by the dynamic branch prediction mechanism or the integer core flush mechanism.
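The interaction between the BTAC, the dynamic predictor, and the allocate and update interfaces from the integer core can be modeled in software as follows. This is a rough sketch under stated assumptions, not the hardware design: the table size, the direct-mapped indexing, the two-bit saturating counter, the rule that only taken branches allocate, and all names are assumptions that are not taken from this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTAC_ENTRIES 128u /* assumed size */

typedef struct {
    bool valid;
    uint32_t tag;    /* full branch address, kept whole for simplicity */
    uint32_t target; /* predicted branch target */
    uint8_t counter; /* 0..3; values 2 and 3 predict taken */
} btac_entry;

typedef struct {
    btac_entry e[BTAC_ENTRIES];
    bool enabled;
} btac;

/* Assumed direct-mapped index on word-aligned instruction addresses. */
static unsigned btac_index(uint32_t pc)
{
    return (pc >> 2) % BTAC_ENTRIES;
}

/* Dynamic prediction: true on a hit whose counter predicts taken. */
static bool btac_predict(const btac *b, uint32_t pc, uint32_t *target)
{
    const btac_entry *e = &b->e[btac_index(pc)];
    if (!b->enabled || !e->valid || e->tag != pc || e->counter < 2)
        return false;
    *target = e->target;
    return true;
}

/* Called when a branch is resolved: allocate on a miss, update on a hit. */
static void btac_resolve(btac *b, uint32_t pc, uint32_t target, bool taken)
{
    btac_entry *e = &b->e[btac_index(pc)];
    if (!e->valid || e->tag != pc) {
        if (!taken)
            return; /* assumption: only taken branches allocate */
        e->valid = true;
        e->tag = pc;
        e->target = target;
        e->counter = 2; /* start as weakly taken */
        return;
    }
    if (taken) {
        e->target = target;
        if (e->counter < 3)
            e->counter++;
    } else if (e->counter > 0) {
        e->counter--;
    }
}

/* Global flush: invalidate every entry. */
static void btac_flush_all(btac *b)
{
    for (unsigned i = 0; i < BTAC_ENTRIES; i++)
        b->e[i].valid = false;
}
```

In this model a branch misses the first time it is seen, which corresponds to the case where no historical information is available and the static prediction is used instead.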
The BTAC can:
be globally flushed by a CP15 instruction
have individual entries flushed by a CP15 instruction
be enabled or disabled by a CP15 instruction.
For details of CP15 instructions, see c7, Cache operations on page 3-69 and Flush operations on
page 3-79.
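As a hedged illustration of the three BTAC controls listed above, the sketch below wraps the CP15 accesses in GCC-style inline assembly. The encodings used (c7, c5, 6 for a global flush, c7, c5, 7 for a flush by address, and the Z bit at bit 11 of the c1 Control Register for enable) are assumptions that should be verified against the pages referenced above. The function names are hypothetical, and the accesses are assumed to run in a privileged mode.

```c
#include <stdint.h>

/* Assumption: Z bit (program flow prediction enable) is bit 11 of the
 * CP15 c1 Control Register. */
#define CP15_CTRL_Z (1u << 11)

/* Pure helper: compute the Control Register value with prediction
 * enabled or disabled. Usable on any host. */
static inline uint32_t ctrl_set_prediction(uint32_t ctrl, int enable)
{
    return enable ? (ctrl | CP15_CTRL_Z) : (ctrl & ~CP15_CTRL_Z);
}

#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)

/* Flush the entire BTAC (assumed encoding: c7, c5, 6). */
static inline void btac_flush_all_cp15(void)
{
    uint32_t zero = 0;
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero) : "memory");
}

/* Flush the BTAC entry for one branch address (assumed encoding: c7, c5, 7). */
static inline void btac_flush_entry_cp15(uint32_t branch_addr)
{
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 7" : : "r"(branch_addr) : "memory");
}

/* Enable or disable prediction with a read-modify-write of the
 * Control Register. */
static inline void btac_set_enabled_cp15(int enable)
{
    uint32_t ctrl;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(ctrl));
    ctrl = ctrl_set_prediction(ctrl, enable);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(ctrl) : "memory");
}

#endif /* ARM-state GCC or Clang build */
```

The preprocessor guard keeps the privileged instructions out of host or Thumb-state builds, so only the bit-manipulation helper is compiled there.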
The BTAC is globally flushed for:
Main TLB FCSE PID changes
Copyright © 2004-2009 ARM Limited. All rights reserved.
Non-Confidential, Unrestricted Access