Branch Target Forwarding - ARM Cortex-M3 Technical Reference Manual

R2p0

1.5 Branch target forwarding

ARM DDI 0337G
Unrestricted Access
The processor forwards certain branch types, so that the memory transaction for the
branch is presented at least one cycle before the opcode reaches execute. Branch
forwarding increases the performance of the core, because branches make up a
significant part of embedded controller applications. The affected branches are
PC-relative branches with an immediate offset and branches that use LR as the target
register. For forwarded conditional branches, whether conditional by opcode definition
or by placement in an IT block, the address must be presented speculatively because
the condition evaluation is an internal critical path.
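
The selection rule above can be sketched as a small classifier. This is only an
illustrative model, not ARM's implementation: the names Target, Branch, and
forwarding_mode are hypothetical, and the three result strings are labels invented
for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    """How the branch target is formed (hypothetical labels for this sketch)."""
    PC_RELATIVE_IMMEDIATE = "pc_rel_imm"
    LINK_REGISTER = "lr"
    OTHER_REGISTER = "other_reg"


@dataclass(frozen=True)
class Branch:
    target: Target
    # True if conditional by opcode definition or because it sits in an IT block.
    conditional: bool


def forwarding_mode(branch: Branch) -> str:
    """Return 'none', 'forwarded', or 'speculative' for a branch.

    Only PC-relative-immediate and LR-target branches are forwarded. A forwarded
    conditional branch is presented speculatively, because the condition is not
    yet evaluated when the address is driven.
    """
    if branch.target not in (Target.PC_RELATIVE_IMMEDIATE, Target.LINK_REGISTER):
        return "none"
    return "speculative" if branch.conditional else "forwarded"
```

For instance, a subroutine return through LR would be classified as "forwarded",
while a conditional PC-relative branch would be "speculative".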

Branch forwarding loses a fetch opportunity when it speculates on a conditional opcode,
but this is mitigated by the three-entry fetch queue, the mix of 16-bit and 32-bit
opcodes, and the single-cycle ALU. The additional penalty is one cycle of pipeline
stalling. The worst case is three 32-bit single load/store opcodes, with the
instructions word-unaligned and no data wait states. The BRCHSTAT interface provides
information on forwarded branches: conditional execution, the direction if conditional,
and a trailing registered evaluation of the success of the preceding conditional
opcode. For more information on BRCHSTAT, see Branch status interface on page 15-6.

The performance of the core with ICODE registered with prefetch is effectively the
same as that of the core without the branch forwarding interface, that is, around 10%
slower than with branch forwarding. Branch forwarding can be thought of as exposing the
internal address generation logic before registration to the address interface. This
gives the memory controller more flexibility if you have the timing budget to use the
information a cycle sooner, for example in lower-MHz, power-sensitive targets in 0.13µm
down to 65nm. Otherwise, you still have the flexibility of using this early address in
your memory controller for lookups before registration to the system.

Branch speculation is more costly against wait-stated memory because of mispredictions.
To avoid this overhead, the controller can apply a rule that conditional branches are
not speculated but are instead registered. This gives subroutine calls and returns the
benefits of branch forwarding without the misprediction penalty. A refinement is to
predict only backward conditional branches, to accelerate loops. Alternatively, because
ARM compilers favour loops with an unconditional backward branch at the bottom and
conditional forward branch tests on the loop limit, the core fetch queue being ahead at
the start of the loop yields good behavior.
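
The controller rules described above can be written as a simple decision function. This
is a minimal sketch under stated assumptions: Policy and use_forwarded_address are
hypothetical names, and the inputs are plain booleans rather than real BRCHSTAT signal
values.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical controller policies for handling forwarded branch addresses."""
    SPECULATE_ALL = 1              # act on every forwarded address
    NO_CONDITIONAL = 2             # register conditional branches instead
    BACKWARD_CONDITIONAL_ONLY = 3  # refinement: speculate only on loop-style branches


def use_forwarded_address(conditional: bool, backward: bool, policy: Policy) -> bool:
    """Decide whether the controller acts on the early (forwarded) address.

    Unconditional forwarded branches, such as subroutine calls and returns,
    cannot mispredict, so they are always accepted in this model.
    """
    if not conditional:
        return True
    if policy is Policy.SPECULATE_ALL:
        return True
    if policy is Policy.BACKWARD_CONDITIONAL_ONLY:
        return backward
    return False  # Policy.NO_CONDITIONAL: wait for the registered address
```

Against wait-stated memory, NO_CONDITIONAL trades the early address on conditional
branches for the absence of misprediction cost, while BACKWARD_CONDITIONAL_ONLY keeps
the early address for backward branches that close loops.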

BRCHSTAT also includes other information about the next opcode to reach execute. Unlike
for forwarded branches, where BRCHSTAT is coincident with the transaction, BRCHSTAT for
execute opcodes is a hint that is unrelated to any transaction and can be asserted for
multiple cycles. The controller can use this information to suppress additional
prefetching because it knows a branch will be taken shortly. This helps to prevent any
trailing wait states of the controller prefetch from delaying the branch target fetch
when the target is generated in execute.
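
The prefetch suppression idea can be illustrated with a toy cycle trace. This sketch
assumes a controller that would otherwise issue one prefetch per cycle and reduces the
execute-time hint to a single boolean per cycle. Both are simplifications made for
illustration, and count_prefetches is a hypothetical helper, not part of any ARM
interface.

```python
from typing import Iterable


def count_prefetches(hint_per_cycle: Iterable[bool], suppress_on_hint: bool = True) -> int:
    """Count prefetches issued over a trace of per-cycle branch hints.

    Each element of hint_per_cycle is True when the (simplified) execute-time
    branch hint is asserted in that cycle. With suppression enabled, the
    controller skips prefetching in hinted cycles so that no trailing
    wait states are pending when the branch target address arrives.
    """
    issued = 0
    for hint in hint_per_cycle:
        if suppress_on_hint and hint:
            continue  # branch expected shortly: do not start another prefetch
        issued += 1
    return issued


# Hint asserted for two consecutive cycles, as the hint may span multiple cycles.
trace = [False, False, True, True, False]
with_suppression = count_prefetches(trace)
without_suppression = count_prefetches(trace, suppress_on_hint=False)
```

In this trace the two hinted cycles issue no prefetch when suppression is enabled, so
the model issues three prefetches instead of five.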
Copyright © 2005-2008 ARM Limited. All rights reserved.
Non-Confidential
Introduction
1-15
