Microcode Unit; Integer Unit; Registers - IBM A2 User Manual


User's Manual
A2 Processor

2.3.8.5 Microcode Unit

The microcode unit (uCode) is partially shared and partially duplicated logic. The ROM that contains the
actual stream of instructions to be issued is a shared unit; however, each thread contains its own microcode
engine so that all four threads can be within a uCode stream at the same time. One of the engines will read a
single command from the ROM each cycle based upon a fair round-robin scheme (not based upon the thread
priority level for the issue logic), and issue that command to the appropriate thread's instruction buffer. If a
thread's instruction buffer is more than half full, the uCode stops issuing new commands to that thread. In
addition, that thread is excluded from ROM reads until its instruction buffer has drained below this point.
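
The arbitration described above can be sketched as a small software model. This is an illustrative sketch only, not the hardware implementation: the class name UcodeArbiter, the buffer depth of 8 entries, and the "uop" placeholder are assumptions made for the example and do not come from this manual. The model also assumes that a thread becomes eligible again as soon as its buffer is no longer more than half full.

```python
from collections import deque

NUM_THREADS = 4   # the A2 core runs four threads
IBUF_DEPTH = 8    # hypothetical depth; not specified in this section


class UcodeArbiter:
    """Toy model of round-robin ROM reads among per-thread uCode engines."""

    def __init__(self, ibuf_depth=IBUF_DEPTH):
        self.ibuf_depth = ibuf_depth
        # One instruction buffer per thread.
        self.ibuf = [deque() for _ in range(NUM_THREADS)]
        # True while a thread is inside a uCode stream.
        self.active = [False] * NUM_THREADS
        # Thread granted most recently; start so that thread 0 goes first.
        self.last = NUM_THREADS - 1

    def eligible(self, tid):
        # A thread is skipped while its buffer is over halfway filled.
        return self.active[tid] and 2 * len(self.ibuf[tid]) <= self.ibuf_depth

    def step(self):
        """Model one cycle: grant at most one ROM read, round-robin.

        Returns the granted thread ID, or None if no thread is eligible.
        """
        for offset in range(1, NUM_THREADS + 1):
            tid = (self.last + offset) % NUM_THREADS
            if self.eligible(tid):
                self.ibuf[tid].append("uop")  # placeholder for the ROM command
                self.last = tid
                return tid
        return None
```

With only threads 0 and 2 active, the model alternates grants between them until both buffers are more than half full, then grants nothing until one of the buffers drains. Thread priority plays no part in the selection, which matches the fair round-robin scheme described above.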

2.3.8.6 Integer Unit

The integer execution unit is shared between threads because there is a unified execution, load/store, and
branch pipeline. Exceptions and flushes from one thread usually will not affect another thread.
However, a data cache invalidate (DCI) or instruction cache invalidate (ICI) that reaches completion on one
thread causes a flush that affects all threads. A DCI or ICI flushes all threads for one cycle to allow the L1
caches to be invalidated. Software is required to guarantee that the load miss queue is empty for all threads
before execution of a DCI.
Another flush condition caused by one thread that can affect another thread occurs when reload data
returning for an outstanding load collides with a load or store at the data cache array pins.
For a comprehensive list of flush conditions, see Interrupt Conditions on page 854.
Some multiply operations and all divide operations require recirculation within the multiply/divide unit,
therefore blocking all other threads from executing multiplies and divides. This does not prevent other threads
from executing any instructions other than multiplies and divides. If any multiply or divide instructions are
issued and collide with a recirculating multiply or divide, the younger instructions are flushed. In the case of
the multiplier, the size of the operands determines how many cycles are needed for recirculation. The width of
the multiplier is 32 bits by 32 bits, so any operations that require multiplying 64-bit operands will require
recirculation. If both operands are 32 bits, no recirculation is needed (in other words, the instruction is
pipelined as normal). The width of the divider is 64 bits. Divide instructions dealing with 64-bit operands
recirculate for 65 cycles, and operations with 32-bit operands recirculate for 32 cycles. No divide instructions
are pipelined; they all require some recirculation.
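
The recirculation rules above can be summarized in a short model. This is a sketch under stated assumptions, not a description of the actual pipeline: the names MulDivUnit, multiply_recirculates, and divide_recirc_cycles are invented for the example, and the recirculation count for 64-bit multiplies (mul64_recirc_cycles) is a hypothetical parameter because this section does not give that number. Only the divide cycle counts (65 and 32) and the 32-bit by 32-bit multiplier width are taken from the text.

```python
MULTIPLIER_WIDTH = 32                    # multiplier is 32 bits by 32 bits
DIVIDE_RECIRC_CYCLES = {64: 65, 32: 32}  # divide recirculation, from the text


def multiply_recirculates(operand_bits):
    """Return True if a multiply with operands of this width recirculates."""
    if operand_bits not in (32, 64):
        raise ValueError("operand width must be 32 or 64")
    # 32-bit by 32-bit multiplies are pipelined as normal.
    return operand_bits > MULTIPLIER_WIDTH


def divide_recirc_cycles(operand_bits):
    """Return the number of recirculation cycles for a divide."""
    return DIVIDE_RECIRC_CYCLES[operand_bits]


class MulDivUnit:
    """Toy model of the shared multiply/divide unit."""

    def __init__(self, mul64_recirc_cycles=4):
        # Hypothetical value: the manual section does not state this count.
        self.mul64_recirc_cycles = mul64_recirc_cycles
        self.busy_until = 0  # first cycle at which the unit is free again

    def issue(self, cycle, op, operand_bits):
        """Issue a multiply ("mul") or divide ("div") at the given cycle.

        Returns "flushed" if the instruction collides with a recirculating
        operation (the younger instruction loses), otherwise "accepted".
        """
        if op not in ("mul", "div"):
            raise ValueError("op must be 'mul' or 'div'")
        if cycle < self.busy_until:
            return "flushed"
        if op == "div":
            self.busy_until = cycle + divide_recirc_cycles(operand_bits)
        elif multiply_recirculates(operand_bits):
            self.busy_until = cycle + self.mul64_recirc_cycles
        return "accepted"
```

For example, a 64-bit divide accepted at cycle 0 keeps the modeled unit busy through cycle 64, so a multiply from another thread issued at cycle 10 is flushed, while one issued at cycle 65 is accepted. Instructions other than multiplies and divides never pass through this model, which reflects the statement that other instruction types are not blocked.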
A forward progress timer monitors that each thread is making forward progress. If the thread appears to be
hung, thread priorities are adjusted to break out of a potential live-lock condition.

2.4 Registers

This section provides an overview of the register categories and types provided by the A2 core. Detailed
descriptions of each of the registers are provided within the chapters covering the functions with which they
are associated (for example, the cache control and cache debug registers are described in Instruction and
Data Caches on page 169). An alphabetical summary of all registers, including bit definitions, is provided in
Register Summary on page 529.
All registers in the A2 core are architected as 64 bits wide, although certain bits in some registers are
reserved and thus not necessarily implemented. For all registers with fields marked as reserved, these
reserved fields should be written as 0 and read as undefined. The recommended coding practice is to
CPU Programming Model
Version 1.3
Page 82 of 864
October 23, 2012
