Completion Unit - Motorola MPC750 User Manual

Risc
Hide thumbs Also See for MPC750:
Table of Contents

Advertisement

When a prediction is made, instruction fetching, dispatching, and execution continue from
the predicted path, but instructions cannot complete and write back results to architected
registers until the prediction is determined to be correct (resolved). When a prediction is
incorrect, the instructions from the incorrect path are flushed from the processor and
processing begins from the correct path. The MPC750 allows a second branch instruction
to be predicted; instructions from the second predicted instruction stream can be fetched
but cannot be dispatched.
Dynamic prediction is implemented using a 512-entry branch history table (BHT), a cache
that provides two bits per entry that together indicate four levels of prediction for a branch
instruction-not-taken, strongly not-taken, taken, strongly taken. When dynamic branch
prediction is disabled, the BPU uses a bit in the instruction encoding to predict the direction
of the conditional branch. Therefore, when an unresolved conditional branch instruction is
encountered, the MPC750 executes instructions from the predicted target stream although
the results are not committed to architected registers until the conditional branch is
resolved. This execution can continue until a second unresolved branch instruction is
encountered.
When a branch is taken (or predicted as taken), the instructions from the untaken path must
be flushed and the target instruction stream must be fetched into the IQ. The BTIC is a 64-
entry cache that contains the most recently used branch target instructions, typically in
pairs. When an instruction fetch hits in the BTIC, the instructions arrive in the instruction
queue in the next clock cycle, a clock cycle sooner than they would arrive from the
instruction cache. Additional instructions arrive from the instruction cache in the next clock
cycle. The BTIC reduces the number of missed opportunities to dispatch instructions and
gives the processor a one-cycle head start on processing the target stream.
The BPU contains an adder to compute branch target addresses and three user-control
registers-the link register (LR), the count register (CTR), and the CR. The BPU calculates
the return pointer for subroutine calls and saves it into the LR for certain types of branch
instructions. The LR also contains the branch target address for the Branch Conditional to
Link Register (bclrx) instruction. The CTR contains the branch target address for the
Branch Conditional to Count Register (bcctrx) instruction. Because the LR and CTR are
SPRs, their contents can be copied to or from any GPR. Because the BPU uses dedicated
registers rather than GPRs or FPRs, execution of branch instructions is largely independent
from execution of integer and floating-point instructions.
1.2.2.3 Completion Unit
The completion unit operates closely with the instruction unit. Instructions are fetched and
dispatched in program order. At the point of dispatch, the program order is maintained by
assigning each dispatched instruction a successive entry in the six-entry completion queue.
The completion unit tracks instructions from dispatch through execution and retires them
in program order from the two bottom entries in the completion queue (CQO and CQl).
Chapter 1. Overview
1-9

Advertisement

Table of Contents
loading

Table of Contents