User's Manual
A2 Processor
D.5.2 Loads
Load instructions proceed through EX1 to EX3 as described. If the load hits in the data cache, the data array
is accessed in EX4. The load result is produced and bypassed in EX5. If the load misses the data cache, the
load is placed in the load miss queue in EX5. When the load reaches the head of the LMQ, it arbitrates for the
L2 command port, and is sent to the L2. This can occur as soon as the next cycle.
For each load miss, the L2 responds with a reload, the data for the requested cache line. The 64-byte cache line
reload is written into the data array in two 32-byte writes in two separate cycles. Because there is only one
shared port for the data array, other instructions that must access the data array in the same cycle as a reload
write are flushed when the L2 returns reload data quadwords back-to-back. For instance, a load hit in EX4
might collide with the write for a reload coming back from the L2. In this case, the reload wins access to the
data array, and the load is flushed. When the L2 returns reload data quadwords with gaps, a bubble is
requested in the pipe to avoid this structural hazard.
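
The arbitration for the single data array port can be sketched as a small decision function. This is an illustrative model only, not the hardware logic: the names Outcome and ex4_access_outcome are hypothetical, and the bubble case is simplified to a single delayed outcome.

```python
from enum import Enum


class Outcome(Enum):
    PROCEEDS = "proceeds"
    FLUSHED = "flushed"
    DELAYED_BY_BUBBLE = "delayed by bubble"


def ex4_access_outcome(reload_write_this_cycle: bool,
                       reload_back_to_back: bool) -> Outcome:
    """Outcome for an instruction that needs the data array in EX4.

    Hypothetical helper: the data array has one shared port and a
    reload write always wins it.
    """
    if not reload_write_this_cycle:
        # No reload write competes for the port this cycle.
        return Outcome.PROCEEDS
    if reload_back_to_back:
        # Back-to-back reload data: the colliding instruction is flushed.
        return Outcome.FLUSHED
    # Reload data arrives with gaps: a pipe bubble is requested instead,
    # so the instruction is delayed rather than flushed.
    return Outcome.DELAYED_BY_BUBBLE
```

For example, a load hit in EX4 that meets a back-to-back reload write maps to Outcome.FLUSHED in this model.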
If there is a prior load miss to the same real line address already present in the LMQ, the load is flushed in
EX5. If the LMQ is full, the load is also flushed and the flush is held.
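
The two LMQ flush conditions can be sketched as follows. This is a minimal model under stated assumptions: the class LoadMissQueue, its depth parameter, and the order in which the two conditions are checked are hypothetical, since the text gives neither the queue depth nor a priority between the checks. Only the 64-byte line size comes from the text.

```python
LINE_BYTES = 64  # cache line size stated in the text


def line_address(real_addr: int) -> int:
    """Real line address: the real address with the byte offset dropped."""
    return real_addr // LINE_BYTES


class LoadMissQueue:
    """Toy LMQ model; depth is a hypothetical parameter."""

    def __init__(self, depth: int):
        self.depth = depth
        self.lines = []  # real line addresses of outstanding misses

    def try_enqueue(self, real_addr: int) -> str:
        line = line_address(real_addr)
        if line in self.lines:
            # A prior miss to the same real line address is outstanding.
            return "flush: same line in LMQ"
        if len(self.lines) >= self.depth:
            # No free entry: the load is flushed and the flush is held.
            return "flush: LMQ full"
        self.lines.append(line)
        return "queued"
```

With a depth of 2, for instance, misses to 0x1000 and 0x1040 are queued, a miss to 0x1020 is flushed because it falls in the same 64-byte line as 0x1000, and a further miss to a new line is flushed because the queue is full.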
For loads, if there is a prior store (integer or AXU) to the same address in the pipeline ahead of the load, no
penalties occur. Store data is properly aligned and forwarded to the following load without penalty in all cases,
including partially overlapping cases.
Instructions that are dependent on a load are released from IU5 speculatively, assuming that the load hits in
the data cache. Hence, if the dependent instruction passes IU5 within seven cycles of a load miss, it is
flushed. After this window, IU5 is made aware of the load miss and stalls the dependent instruction until the miss
is complete. Hence, when such a flushed instruction returns to IU5, it will then stall and not flush repeatedly.
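
The flush-then-stall behavior can be illustrated with a small function. This is a sketch, not the issue logic itself: the function name iu5_outcome and its parameters are hypothetical, and treating "within seven cycles" as an inclusive bound is an assumption.

```python
SPECULATION_WINDOW = 7  # cycles after a load miss, from the text


def iu5_outcome(cycles_since_miss: int, miss_outstanding: bool) -> str:
    """What happens to an instruction dependent on a load at IU5.

    Hypothetical model: cycles_since_miss counts cycles between the
    load miss and the dependent instruction passing IU5.
    """
    if not miss_outstanding:
        # The load hit, or the miss has completed: no penalty.
        return "issue"
    if cycles_since_miss <= SPECULATION_WINDOW:
        # Released speculatively before IU5 learned of the miss.
        return "flush"
    # IU5 knows about the miss and holds the instruction instead.
    return "stall"
```

A dependent instruction that is flushed inside the window returns to IU5 after the window has elapsed, so its second pass maps to "stall" rather than a repeated "flush".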
A similar situation applies to write-after-write hazards on load misses. While a load miss is outstanding, the
thread cannot complete further writes to the same GPR. If an instruction that writes the same register as a
prior load that misses passes IU5 within seven cycles of a load miss, it is flushed. After this window, IU5 is made
aware of the load miss and stalls dependent instructions until the miss is complete. Hence, when such a
flushed instruction returns to IU5, it will then stall and not flush repeatedly.
D.5.3 Stores
Store instructions proceed through EX1 to EX3 as described. In the case of a store hit, the store updates the
data array in EX4. The data cache is a write-through, no-write-allocate design. All stores, even hits, are sent
to the L2. All data cache lines are effectively clean, and cache lines are never cast out to the L2. Store misses
do not read a line into the data cache from the L2. All stores are committed once presented to the L2, and no
acknowledgments are required.
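
A write-through, no-write-allocate policy can be sketched with a toy cache model. The class WriteThroughCache and its fields are hypothetical, capacity and replacement are not modeled, and data contents are omitted; the sketch only tracks which lines are present and which stores reach the L2.

```python
LINE_BYTES = 64  # cache line size stated in the text


class WriteThroughCache:
    """Toy write-through, no-write-allocate data cache (unbounded size)."""

    def __init__(self):
        self.lines = set()   # line addresses present in the data cache
        self.l2_stores = []  # every store forwarded to the L2

    def load(self, addr: int) -> bool:
        """Return True on a hit. A load miss allocates the line via reload."""
        line = addr // LINE_BYTES
        hit = line in self.lines
        if not hit:
            self.lines.add(line)
        return hit

    def store(self, addr: int, value: int) -> bool:
        """Return True on a hit. A store miss does not allocate a line."""
        hit = (addr // LINE_BYTES) in self.lines
        # Write-through: the store goes to the L2 whether it hits or misses,
        # so lines in the data cache never hold data the L2 lacks.
        self.l2_stores.append((addr, value))
        return hit
```

In this model a store to an uncached address leaves the cache unchanged, while a later load to the same line allocates it, after which stores to that line hit and are still forwarded to the L2.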
There are no store buffers in the A2 core. The L2 is expected to contain sufficient store buffering. A credit-
based flow-control mechanism is used to indicate when the L2 runs out of buffering. If the A2 does not have
credits available to present a store to the L2 interface, the store is flushed.
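
Credit-based flow control of this kind can be sketched as a simple counter. The class StoreCredits, its method names, and the initial credit count are hypothetical; the text does not state how many credits the L2 grants or how they are returned.

```python
class StoreCredits:
    """Toy credit counter for stores presented to the L2 interface."""

    def __init__(self, credits: int):
        self.credits = credits  # hypothetical number of L2 buffer slots

    def try_send_store(self) -> bool:
        """Return True if the store is presented, False if it is flushed."""
        if self.credits == 0:
            # No buffering available in the L2: the store is flushed.
            return False
        self.credits -= 1  # one L2 buffer slot is now in use
        return True

    def credit_returned(self) -> None:
        """Called when the L2 frees a buffer slot."""
        self.credits += 1
```

With two credits, two stores can be presented back-to-back, a third is flushed, and a retry succeeds once the L2 returns a credit.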
A store hit might collide in EX4 with a reload coming back from the L2. In this case, the reload wins access to
the data array, and the store is flushed.
For store misses, if there is a prior load miss to the same real line address present in the LMQ, then the store
is flushed in EX5.
Instruction Execution Performance and Code Optimizations
Version 1.3
Page 848 of 864
October 23, 2012
