16.1
About cycle timings and interlock behavior
ARM DDI 0301H
ID012310
Complex instruction dependencies and memory system interactions make it impossible to describe briefly the exact cycle timing behavior for all instructions in all circumstances. The timings that this chapter describes are accurate in most cases. If you require precise timings, you must use a cycle-accurate model of the processor.
Unless otherwise stated, the cycle counts and result latencies that this chapter describes are best-case numbers. They assume that:
• there are no outstanding data dependencies between the current instruction and a previous instruction
• the instruction does not encounter any resource conflicts
• all data accesses hit in the MicroTLB and Data Cache, and do not cross protection region boundaries
• all instruction accesses hit in the Instruction Cache.
This section describes:
• Changes in instruction flow overview
• Instruction execution overview on page 16-3
• Conditional instructions on page 16-4
• Opposite condition code checks on page 16-4
• Definition of terms on page 16-5.
16.1.1
Changes in instruction flow overview
To minimize the number of cycles lost to changes in instruction flow, the processor includes a:
• dynamic branch predictor
• static branch predictor
• return stack.
The dynamic branch predictor is a 128-entry, direct-mapped branch predictor indexed by VA bits [9:3]. The prediction scheme uses a two-bit saturating counter that encodes four prediction states:
• Strongly Not Taken
• Weakly Not Taken
• Weakly Taken
• Strongly Taken.
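The following C sketch models only the direction-prediction part of the dynamic branch predictor described above: a 128-entry, direct-mapped table indexed by VA[9:3], with a two-bit saturating counter per entry. The names (`bp_table`, `bp_index`, `bp_predict_taken`, `bp_update`), the one-counter-per-entry layout, and the reset state (all counters Strongly Not Taken) are assumptions for illustration only; this section does not describe tags, stored branch targets, or the reset value.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model: 128 entries, so 7 index bits (VA[9:3]). */
#define BP_ENTRIES 128u

/* The four states of the two-bit saturating counter. */
typedef enum {
    STRONGLY_NOT_TAKEN = 0,
    WEAKLY_NOT_TAKEN   = 1,
    WEAKLY_TAKEN       = 2,
    STRONGLY_TAKEN     = 3
} bp_state;

/* Assumed layout: one counter per entry, no tag, no target. */
typedef struct {
    uint8_t counter[BP_ENTRIES];
} bp_table;

/* Direct-mapped index: extract VA bits [9:3]. */
static unsigned bp_index(uint32_t va) {
    return (va >> 3) & (BP_ENTRIES - 1u);
}

/* Predict taken when the counter is in either Taken state. */
static bool bp_predict_taken(const bp_table *t, uint32_t va) {
    return t->counter[bp_index(va)] >= WEAKLY_TAKEN;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
static void bp_update(bp_table *t, uint32_t va, bool taken) {
    uint8_t *c = &t->counter[bp_index(va)];
    if (taken && *c < STRONGLY_TAKEN) {
        (*c)++;
    } else if (!taken && *c > STRONGLY_NOT_TAKEN) {
        (*c)--;
    }
}
```

Because the table is direct-mapped and this sketch stores no tag, two branches whose addresses differ only above bit 9 (for example 0x000 and 0x400) share one counter. The two intermediate states give hysteresis: a single not-taken outcome moves a Strongly Taken counter to Weakly Taken, which still predicts taken.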
Only branches with a constant offset are predicted; branches with a register-based offset are not. A dynamically predicted branch can be folded out of the instruction stream if the following instruction arrives while the branch is still in the prefetch instruction buffer. A dynamically predicted branch takes one cycle, or zero cycles if it is folded out.
The static branch predictor operates on branches with a constant offset that the dynamic branch predictor does not predict. Static predictions are issued from the Iss stage of the main pipeline, so a statically predicted branch takes four cycles.
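The best-case branch costs given above can be summarized as a small lookup. This is a hypothetical C helper (`branch_path` and `best_case_branch_cycles` are illustrative names, not part of any ARM tool), and it returns -1 for cases that this part of the section gives no figure for, such as register-offset branches. It also ignores mispredictions, which are not quantified here.

```c
#include <stdbool.h>

/* Hypothetical classification of how a branch is handled. */
typedef enum {
    BR_DYNAMIC,         /* constant offset, predicted by the dynamic predictor */
    BR_STATIC,          /* constant offset, not predicted by the dynamic predictor */
    BR_REGISTER_OFFSET  /* not predicted; no cycle count given in this section */
} branch_path;

/* Best-case cycles for a correctly predicted branch, or -1 if not stated.
 * Folding is only described for dynamically predicted branches, so the
 * 'folded' flag is ignored on the static path. */
static int best_case_branch_cycles(branch_path path, bool folded) {
    switch (path) {
    case BR_DYNAMIC:
        return folded ? 0 : 1;
    case BR_STATIC:
        return 4;  /* prediction issued from the Iss stage */
    default:
        return -1;
    }
}
```

For example, under these best-case assumptions a folded, dynamically predicted branch costs 0 cycles, while a branch that falls through to the static predictor costs 4.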
The return stack has three entries and, as with static predictions, issues its prediction from the Iss stage of the main pipeline. The return stack mispredicts if the value taken from it is not the value that the instruction actually returns. Only unconditional returns are
Copyright © 2004-2009 ARM Limited. All rights reserved.
Non-Confidential, Unrestricted Access
Cycle Timings and Interlock Behavior
16-2