Fetch Arbitration; Next Instruction Fetch Address Computation; Instruction Cache Access And Alignment; Instruction Cache Misses - IBM A2 User Manual

User's Manual
A2 Processor
D.2.1 Fetch Arbitration
Each cycle, any or all of the threads might or might not be available to perform a fetch. The IU0 stage selects
one thread for fetch in each cycle, from among those threads that are available. Common reasons why a
thread might not be available for fetch include instruction cache and I-ERAT misses and a full instruction
buffer.
The selection is done in a fair, round-robin manner with two priority levels. A thread has high priority if its
instruction buffers are completely empty and no fetches are in-flight for that thread. Otherwise, the thread has
low priority. A high-priority thread is chosen if one is available.
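The selection rule above can be sketched in Python. This is only an illustrative model, not the hardware logic: the function names, the single shared round-robin pointer (last_selected), and the list-based inputs are assumptions made for the example. The manual states only that selection is round-robin with two priority levels.

```python
NUM_THREADS = 4  # the A2 core fetches for four threads


def is_high_priority(buffer_entries, fetches_in_flight):
    """A thread is high priority when its instruction buffers are empty
    and it has no fetches in flight."""
    return buffer_entries == 0 and fetches_in_flight == 0


def select_thread(available, high_priority, last_selected):
    """Return the thread id chosen for fetch this cycle, or None.

    available, high_priority: one boolean per thread.
    last_selected: thread chosen most recently (assumed shared pointer).
    """
    # Round-robin order starts just after the most recently selected thread.
    order = [(last_selected + 1 + i) % NUM_THREADS for i in range(NUM_THREADS)]
    # First pass: an available high-priority thread wins if one exists.
    for t in order:
        if available[t] and high_priority[t]:
            return t
    # Second pass: otherwise take the next available low-priority thread.
    for t in order:
        if available[t]:
            return t
    return None  # no thread can fetch this cycle
```

For example, with threads 0, 1, and 3 available and only thread 3 high priority, thread 3 is selected even though thread 1 is next in round-robin order.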
D.2.2 Next Instruction Fetch Address Computation
Each cycle, the instruction fetch address for the next cycle is computed for all four threads. If the thread is
being flushed, the instruction fetch address is updated to the target address for the flush. This case includes
flushes due to branch mispredictions and taken branches. If the thread was selected for fetch, then the fetch
address is updated to the start of the next 16-byte-aligned fetch group; fetch follows the sequential not-taken
path until a taken branch is detected in IU5. If the thread was not selected, the fetch address is unchanged.
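As a rough illustration, the three cases can be written as one function per thread. The function and parameter names are hypothetical, and the sketch assumes that a flush takes precedence when a thread is both flushed and selected in the same cycle, which the text implies by listing it first but does not state outright. Address wraparound is ignored.

```python
FETCH_GROUP_BYTES = 16  # fetch groups are 16-byte aligned


def next_fetch_address(current, flushed, flush_target, selected):
    """Compute one thread's fetch address for the next cycle."""
    if flushed:
        # Branch mispredictions and taken branches redirect to the target.
        return flush_target
    if selected:
        # Advance to the start of the next 16-byte-aligned fetch group.
        group_base = current & ~(FETCH_GROUP_BYTES - 1)
        return group_base + FETCH_GROUP_BYTES
    # Not selected for fetch: the address is unchanged.
    return current
```

A fetch from 0x1008 that is not flushed therefore continues at 0x1010, not 0x1018, because the next address is always group-aligned.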
D.2.3 Instruction Cache Access and Alignment
The instruction cache is accessed in IU1. This includes the I-ERAT access to translate the effective instruction
fetch address to a real address, the directory access to determine if the fetch hit in the instruction cache,
and the data array access to read the requested fetch group from the instruction cache.
All four ways are read from the instruction cache data array. In IU2, the results of the four tag
comparisons are used to select one of the resulting ways. Furthermore, instructions before the fetch address
are discarded, and the fetch group is realigned so that the first fetched instruction is in the first slot.
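The way selection and realignment steps might be modeled as follows. This is a sketch under stated assumptions: instructions are taken to be 4 bytes each, so a 16-byte fetch group holds four slots, and the function names and list-based representation of the ways are invented for the example.

```python
INSTRUCTION_BYTES = 4   # assumed fixed 4-byte instructions
FETCH_GROUP_BYTES = 16  # four instruction slots per fetch group


def select_way(way_tags, way_valid, way_data, fetch_tag):
    """Pick the fetch group from the way whose tag matches, or None on a miss."""
    for tag, valid, data in zip(way_tags, way_valid, way_data):
        if valid and tag == fetch_tag:
            return data
    return None


def align_fetch_group(group, fetch_address):
    """Drop instructions that precede the fetch address so the first
    fetched instruction lands in slot 0."""
    first_slot = (fetch_address % FETCH_GROUP_BYTES) // INSTRUCTION_BYTES
    return list(group[first_slot:])
```

With a fetch address of 0x1008, the first two slots of the group are discarded and the remaining two instructions move to slots 0 and 1.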
D.2.4 Instruction Cache Misses
If the fetched line is not found in the instruction cache, the fetch address is reset back to the missing address,
and a request for the missing cache line is sent to the L2. That thread is not allowed to fetch until either the
line returns or a flush is detected. If a flush redirects the thread to begin fetching at a new address, the thread
is reenabled and can continue fetching while the prior instruction cache miss is outstanding, as long as the
thread hits in the instruction cache.
If a second instruction cache miss is detected, fetching is disabled until the first miss returns. Then, the
second miss is sent to the L2 cache. Only one instruction cache miss can be outstanding to the L2 per thread.
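The per-thread miss behavior described above can be approximated with a small state model. The class and method names are hypothetical, and the sketch covers only the transitions the text describes. In particular, the text does not say what a flush does while a second miss is waiting, so the model leaves the state unchanged in that case.

```python
class ThreadMissState:
    """Illustrative per-thread model of instruction cache miss handling."""

    def __init__(self):
        self.outstanding = None    # miss address currently sent to the L2
        self.pending = None        # second miss waiting for the first to return
        self.fetch_enabled = True

    def on_miss(self, address):
        """Record a miss; return the address sent to the L2, if any."""
        self.fetch_enabled = False
        if self.outstanding is None:
            self.outstanding = address
            return address         # request goes to the L2 immediately
        self.pending = address     # only one miss per thread may be outstanding
        return None

    def on_flush(self):
        """A flush redirect re-enables fetch while the first miss is outstanding."""
        if self.pending is None:
            self.fetch_enabled = True
        # Behavior with a pending second miss is not described; state is kept.

    def on_return(self):
        """The outstanding line returned; return the next address sent to the L2."""
        self.outstanding = None
        if self.pending is not None:
            self.outstanding, self.pending = self.pending, None
            return self.outstanding  # second miss is now sent; fetch stays off
        self.fetch_enabled = True
        return None
```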
When an instruction cache miss returns data from the L2 cache, the line can be discarded rather than
inserted into the instruction cache. This occurs if the current fetch address is in a different 2 KB region than
the returned line. This can happen when a flush redirects fetch to a different region.
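The discard check amounts to comparing region numbers. A minimal sketch, assuming the 2 KB regions are naturally aligned and that both addresses are in the same address space; the function name is made up for illustration.

```python
REGION_BYTES = 2 * 1024  # 2 KB region size from the text


def should_discard_line(current_fetch_address, returned_line_address):
    """Return True if the returned line falls outside the 2 KB region
    that contains the current fetch address."""
    current_region = current_fetch_address // REGION_BYTES
    returned_region = returned_line_address // REGION_BYTES
    return current_region != returned_region
```

For instance, a line at 0x0FC0 is kept while fetch is at 0x0800 (same region), but discarded once a flush has moved fetch to 0x1000.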
All threads are unable to fetch for four cycles when an instruction cache miss returns. This is because the
instruction cache data array has a single read-write port, and it is needed to write the new line into the instruction
cache. If the returned data is discarded, the cache data array is available for instruction fetching. When a
back-invalidation from the L2 is received, all threads are unable to fetch instructions for one cycle.
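For performance estimates, the fetch stalls above can be summarized in a small helper. The event names and the function itself are hypothetical. The zero-cycle result for a discarded line is an inference from the statement that the data array remains available for fetching in that case.

```python
def fetch_stall_cycles(event, line_discarded=False):
    """Cycles during which no thread can fetch, per the rules above."""
    if event == "miss_return":
        # Writing the line occupies the single read-write port for four cycles;
        # a discarded line is never written, so the port stays free.
        return 0 if line_discarded else 4
    if event == "back_invalidate":
        return 1
    raise ValueError(f"unknown event: {event}")
```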
Version 1.3
Instruction Execution Performance and Code Optimizations
October 23, 2012
Page 837 of 864
