Table 16-1 Pipeline Stages - ARM ARM1176JZF-S Technical Reference Manual

16.1.2 Instruction execution overview
ARM DDI 0301H
ID012310
predicted. A conditional return pops an entry from the return stack but is not predicted. If the
return stack is empty a return is not predicted. Items are placed on the return stack from the
following instructions:
BL #<immed>
BLX #<immed>
BLX Rx
Items are popped from the return stack by the following types of instruction:
BX lr
MOV pc, lr
LDR pc, [sp], #cns
LDMIA sp!, {....,pc}
A correctly predicted return stack pop takes four cycles.
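
The return-stack rules above can be sketched as a small model. This is an illustrative toy, not a description of the hardware: the class and method names are hypothetical, the stack is unbounded (the real depth is not modeled), and it assumes that an unconditional return with a non-empty stack is predicted, which is what the truncated opening sentence appears to describe.

```python
class ReturnStack:
    """Toy model of the return-stack rules quoted above (not the hardware)."""

    def __init__(self):
        # Assumption: an unbounded list. The real stack has a fixed depth
        # that this sketch does not model.
        self.entries = []

    def on_call(self, return_address):
        """BL #<immed>, BLX #<immed>, BLX Rx: place an item on the stack."""
        self.entries.append(return_address)

    def on_return(self, conditional=False):
        """BX lr / MOV pc, lr / LDR pc, [sp], #cns / LDMIA sp!, {....,pc}.

        Returns the predicted target, or None if the return is not predicted.
        """
        if not self.entries:
            return None      # empty stack: the return is not predicted
        target = self.entries.pop()
        if conditional:
            return None      # the entry is popped, but no prediction is made
        return target


rs = ReturnStack()
rs.on_call(0x8004)
rs.on_call(0x9008)
first = rs.on_return()                   # unconditional, stack non-empty
second = rs.on_return(conditional=True)  # pops 0x8004 without predicting
third = rs.on_return()                   # stack is now empty
```

In this sketch only the first return yields a predicted target; the conditional return consumes an entry, so the final return finds the stack empty.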
The instruction execution pipeline is constructed from three parallel four-stage pipelines. See
Table 16-1. For a complete description of these pipeline stages see Pipeline stages on page 1-26.
The ALU and multiply pipelines operate in a lock-step manner, causing all instructions in these
pipelines to retire in order. The load/store pipeline is a decoupled pipeline enabling subsequent
instructions in the ALU and multiply pipeline to complete underneath outstanding loads.
Extensive forwarding to the Sh, MAC1, ADD, ALU, MAC2, and DC1 stages enables many
dependent instruction sequences to run without pipeline stalls. General forwarding occurs from
the ALU, Sat, WBex and WBls pipeline stages. In addition, the multiplier contains an internal
multiply accumulate forwarding path. Most instructions do not require a register until the ALU
stage. All result latencies are given as the number of cycles until the register is required by a
following instruction in the ALU stage.
The following sequence takes four cycles:
    LDR R1, [R2]            ;Result latency three
    ADD R3, R3, R1          ;Register R1 required by ALU
If a subsequent instruction requires the register at the start of the Sh, MAC1, or ADD stage, then
an extra cycle must be added to the result latency of the instruction producing the required
register. Instructions that require a register at the start of these stages are specified by describing
that register as an Early Reg. The following sequence, requiring an Early Reg, takes five cycles:
    LDR R1, [R2]            ;Result latency three plus one
    ADD R3, R3, R1, LSL #6  ;plus one because Register R1 is required by Sh
Copyright © 2004-2009 ARM Limited. All rights reserved.
Non-Confidential, Unrestricted Access
Cycle Timings and Interlock Behavior

Table 16-1 Pipeline stages

Pipeline     Stages
ALU          Sh     ALU    Sat    WBex
Multiply     MAC1   MAC2   MAC3   WBex
Load/Store   ADD    DC1    DC2    WBls
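
The four-cycle and five-cycle timings quoted for the two LDR/ADD sequences can be reproduced with a minimal scheduling sketch. It rests on simplifying assumptions that are not taken from the manual: one instruction issues per cycle in program order, the cycle count of a sequence is the issue cycle of its last instruction, and the ADD result latency of one is a placeholder value. The function name and the tuple layout are hypothetical.

```python
def issue_cycles(program):
    """Return the 1-based issue cycle of each instruction.

    program: list of (dest, sources, result_latency) tuples, where sources
    is a list of (register, early) pairs and early=True marks an Early Reg.
    """
    ready = {}    # register -> (issue cycle of producer, its result latency)
    cycles = []
    cycle = 0
    for dest, sources, latency in program:
        cycle += 1                        # assumption: one issue per cycle
        for reg, early in sources:
            if reg in ready:
                produced, lat = ready[reg]
                # An Early Reg adds one cycle to the producer's latency.
                cycle = max(cycle, produced + lat + (1 if early else 0))
        if dest is not None:
            ready[dest] = (cycle, latency)
        cycles.append(cycle)
    return cycles


LDR = ("R1", [("R2", False)], 3)          # LDR R1, [R2]: result latency three

plain = issue_cycles([LDR, ("R3", [("R3", False), ("R1", False)], 1)])
early = issue_cycles([LDR, ("R3", [("R3", False), ("R1", True)], 1)])

# Two independent ADDs placed between the load and its consumer.
filled = issue_cycles([
    LDR,
    ("R4", [("R5", False), ("R6", False)], 1),
    ("R7", [("R8", False), ("R9", False)], 1),
    ("R3", [("R3", False), ("R1", False)], 1),
])

print(plain[-1], early[-1], filled[-1])   # prints: 4 5 4
```

Under these assumptions the third case also finishes in four cycles: the two independent instructions occupy the cycles in which the dependent ADD would otherwise wait for the load, which is the effect the text describes as completing underneath outstanding loads.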