Xilinx MicroBlaze Reference Manual page 52

Hide thumbs Also See for MicroBlaze:
Table of Contents

Advertisement

Branches
Normally the instructions in the fetch and decode stages (as well as prefetch buffer) are
flushed when executing a taken branch. The fetch pipeline stage is then reloaded with a new
instruction from the calculated branch address. A taken branch in MicroBlaze takes three
clock cycles to execute, two of which are required for refilling the pipeline. To reduce this
latency overhead, MicroBlaze supports branches with delay slots and the optional branch
target cache.
Delay Slots
When executing a taken branch with delay slot, only the fetch pipeline stage in MicroBlaze
is flushed. The instruction in the decode stage (branch delay slot) is allowed to complete.
This technique effectively reduces the branch penalty from two clock cycles to one. Branch
instructions with delay slots have a D appended to the instruction mnemonic. For example,
the BNE instruction does not execute the subsequent instruction (does not have a delay
slot), whereas BNED executes the next instruction before control is transferred to the
branch location.
A delay slot must not contain the following instructions: IMM, branch, or break. Interrupts
and external hardware breaks are deferred until after the delay slot branch has been
completed. Instructions that could cause recoverable exceptions (for example unaligned
word or halfword load and store) are allowed in the delay slot.
If an exception is caused in a delay slot the ESR[DS] bit is set, and the exception handler is
responsible for returning the execution to the branch target (stored in the special purpose
register BTR). If the ESR[DS] bit is set, register R17 is not valid (otherwise it contains the
address following the instruction causing the exception).
Branch Target Cache
To improve branch performance, MicroBlaze provides a branch target cache (BTC) coupled
with a branch prediction scheme. With the BTC enabled, a correctly predicted immediate
branch or return instruction incurs no overhead.
The BTC operates by saving the target address of each immediate branch and return
instruction the first time the instruction is encountered. The next time it is encountered, it
is usually found in the Branch Target Cache, and the Instruction Fetch Program Counter is
then simply changed to the saved target address, in case the branch should be taken.
Unconditional branches and return instructions are always taken, whereas conditional
branches use branch prediction, to avoid taking a branch that should not have been taken
and vice versa.
The BTC is cleared when a memory barrier (MBAR 0) or synchronizing branch (BRI 4) is
executed. This also occurs when the memory barrier or synchronizing branch follows
immediately after a branch instruction, even if that branch is taken. To avoid inadvertently
MicroBlaze Processor Reference Guide
UG984 (v2018.2) June 21, 2018
www.xilinx.com
Chapter 2: MicroBlaze Architecture
Send Feedback
52

Advertisement

Table of Contents
loading

Table of Contents