Load Instruction Dependency; String/Multiple Operations; Load-And-Reserve And Store-Conditional Instructions - IBM A2 User Manual

User's Manual
A2 Processor
• wrtee
• wrteei
• isync
• rfi
• rfci
• rfmci
Each of these instructions requires that the instruction stream be flushed and refetched immediately after the
instruction's execution, either at the next sequential address (for mtmsr, wrtee, wrteei, and isync), or at the
system call interrupt vector location (for sc), or at the interrupt return address (for rfi, rfci, and rfmci). Due to
the instruction refetching requirement and other instruction processing requirements, the minimum execution
time for a 2-instruction sequence involving one of these instructions as the first instruction is as follows:
• Fifteen cycles (for sc, mtmsr, wrtee, wrteei, isync, rfi, rfci, and rfmci)
D.4.13 Load Instruction Dependency
Load instructions that obtain their data from the data cache provide their result in the EX6 pipeline stage.
Therefore, instruction sequences consisting of a load instruction followed immediately by an instruction that
uses the target GPR of the load instruction as an input operand generally take six cycles to complete, which
corresponds to a 4-cycle penalty.
Note that there are many other factors that affect the performance of load and other storage access
instructions (such as whether or not their target location is in the data cache). These factors are described
in more detail in Loads, Stores, and Data Cache Organization on page 847.
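As an illustration of the load-use penalty described above, the following C sketch contrasts a loop in which every add consumes the result of the load immediately before it with a hand-scheduled variant that issues four independent loads before the first dependent add. The function names (sum_naive, sum_scheduled) and the unroll factor of four are assumptions chosen for illustration; they do not come from this manual. Whether the scheduled form actually avoids the 4-cycle penalty depends on the compiler preserving this ordering and on the loads hitting in the data cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive form: each add uses the value produced by the load
 * immediately before it, so the load-use penalty can apply
 * on every iteration. */
uint64_t sum_naive(const uint32_t *a, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-scheduled form (illustrative): four independent loads are
 * written before the first add, so in source order each loaded value
 * has other instructions between its load and its first use. */
uint64_t sum_scheduled(const uint32_t *a, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t v0 = a[i];
        uint32_t v1 = a[i + 1];
        uint32_t v2 = a[i + 2];
        uint32_t v3 = a[i + 3];
        sum += v0;
        sum += v1;
        sum += v2;
        sum += v3;
    }

    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

Both functions return the same result; only the distance between each load and the instruction that consumes its target register differs. An optimizing compiler may already perform this kind of scheduling, so the generated code should be inspected before concluding that manual unrolling helps.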
D.4.14 String/Multiple Operations
All load string and load multiple instructions are handled in microcode.
D.4.15 Load-and-Reserve and Store-Conditional Instructions
The store-conditional instructions (stwcx. and stdcx.) conditionally write memory based on the reservation.
Both the reservation and the write are performed outside of the A2 core, typically in the L2 cache. As a result,
after issuing a store-conditional instruction, all subsequent instructions for the same thread must wait for the
store-conditional instruction to complete before issuing from IU6. Therefore, the total execution time
for a stwcx. instruction followed by any other instruction is variable based on the stwcx. complete indication
from the L2 interface.
Similarly, a load-and-reserve instruction (lwarx or ldarx) must atomically perform a read and set the
reservation, again typically in the L2 cache. After issuing a load-and-reserve instruction, all subsequent
instructions for the same thread must wait for the load-and-reserve instruction to complete before dispatching
from IU5. Therefore, the total execution time for an lwarx instruction followed by any other instruction is
variable based on the lwarx data return from the L2 interface.
Because load-and-reserve and store-conditional both operate directly on the L2 outside of the core, these
instructions must flush the line from the L1 data cache, in the case of a data cache hit. Load-and-reserve
instructions reload the line, however.
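For context, the load-and-reserve and store-conditional pair is normally used as a retry loop. The following C11 sketch shows such a loop written with standard atomics; on Power targets, compilers commonly lower a weak compare-exchange to an lwarx/stwcx. sequence, although the exact code generated is compiler dependent. The function name fetch_add_retry is hypothetical, and relaxed memory ordering is used only to keep the example small.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add delta to *p and return the previous value.
 * Illustrative only: fetch_add_retry is not an API from this manual.
 * The weak compare-exchange may fail spuriously, which matches a
 * store-conditional that loses its reservation, so it is retried. */
uint32_t fetch_add_retry(_Atomic uint32_t *p, uint32_t delta)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + delta,
               memory_order_relaxed, memory_order_relaxed)) {
        /* On failure, old has been updated to the current value of *p;
         * the loop recomputes old + delta and tries again. */
    }
    return old;
}
```

Because each attempt waits on the L2 interface and flushes the line from the L1 data cache on a hit, as described above, it is generally worth keeping the work between the load-and-reserve and the store-conditional as short as possible. Real code may also need stronger memory ordering than the relaxed ordering assumed here.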
Instruction Execution Performance and Code Optimizations
Page 846 of 864
Version 1.3
October 23, 2012
