Intel Pentium II Developer's Manual page 29

During every clock cycle, up to three Intel Architecture macro-instructions can be decoded in
the ID1 pipestage. However, if the instructions are complex or longer than seven bytes, the
decoder is limited to decoding fewer instructions per clock cycle.
The decoders can decode:
1. Up to three macro-instructions per clock cycle.
2. Up to six µops per clock cycle.
3. Macro-instructions up to seven bytes in length.
Pentium II processors have three decoders in the ID1 pipestage. The first decoder is capable
of decoding one Intel Architecture macro-instruction of four or fewer µops in each clock
cycle. The other two decoders can each decode an Intel Architecture instruction of one µop in
each clock cycle. Instructions composed of more than four µops will take multiple cycles to
decode. When programming in assembly language, scheduling the instructions in a 4-1-1 µop
sequence increases the number of instructions that can be decoded each clock cycle. In
general:
• Simple instructions of the register-register form are only one µop.
• Load instructions are only one µop.
• Store instructions have two µops.
• Simple read-modify instructions are two µops.
• Simple instructions of the register-memory form have two to three µops.
• Simple read-modify-write instructions are four µops.
• Complex instructions generally have more than four µops; therefore, they take
  multiple cycles to decode.
For the purpose of counting µops, MMX technology instructions are simple instructions. See
Appendix D in AP-526, Optimizations for Intel's 32-bit Processors (Order Number 242816)
for a table that specifies the number of µops for each instruction in the Intel Architecture
instruction set.
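
The effect of 4-1-1 scheduling can be illustrated with a small simulation. The sketch below is a deliberately simplified model, not a description of the actual hardware: it assumes the first decoder accepts any instruction of up to four µops, the other two decoders accept only one-µop instructions, decoding is strictly in program order, and an instruction of more than four µops occupies the first decoder alone for ceil(n/4) cycles. It ignores the seven-byte length limit and instruction fetch effects. The function name `decode_cycles` is hypothetical.

```python
from math import ceil

def decode_cycles(uop_counts):
    """Estimate decode cycles for a list of per-instruction µop counts
    under a simplified 4-1-1 decoder model (see assumptions above)."""
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        first = uop_counts[i]
        i += 1
        if first > 4:
            # Assumption: a complex instruction uses decoder 0 alone
            # and needs several cycles to emit all of its µops.
            cycles += ceil(first / 4)
            continue
        # Decoder 0 took the instruction above; decoders 1 and 2 can
        # each take one following instruction only if it is one µop.
        for _ in range(2):
            if i < n and uop_counts[i] == 1:
                i += 1
            else:
                break
        cycles += 1
    return cycles

# The same six instructions in two different orders.
well_scheduled = [2, 1, 1, 2, 1, 1]    # multi-µop instruction leads each group
poorly_scheduled = [1, 2, 1, 2, 1, 1]  # two-µop instructions land on decoders 1 and 2
```

Under this model `decode_cycles(well_scheduled)` gives 2 while `decode_cycles(poorly_scheduled)` gives 3, because a two-µop instruction that follows a one-µop instruction has to wait for the first decoder in the next cycle.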
Once the µops are decoded, they will be issued from the In-Order Front-End into the
Reservation Station (RS), which is the beginning pipestage of the Out-of-Order core. In the
RS, the µops wait until their data operands are available. Once a µop has all data sources
available, it will be dispatched from the RS to an execution unit. If a µop enters the RS in a
data-ready state (that is, all data is available), then the µop will be immediately dispatched to
an appropriate execution unit, if one is available. In this case, the µop will spend very few
clock cycles in the RS. All of the execution units are clustered on ports coming out of the RS.
Once the µop has been executed it returns to the ROB, and waits for retirement.
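
The data-ready dispatch rule described above can be sketched as a toy scheduler. This is only an illustrative model under stated assumptions: the class `Uop`, the function `dispatch_schedule`, the `units` parameter, and the latencies are all hypothetical, execution units are treated as interchangeable, and a result becomes available a fixed number of cycles after dispatch. The real Reservation Station binds µops to specific ports, which this sketch does not model.

```python
from dataclasses import dataclass

@dataclass
class Uop:
    name: str
    sources: tuple      # names of the values this µop reads
    dest: str           # name of the value it produces
    latency: int = 1    # assumed cycles from dispatch to result

def dispatch_schedule(uops, ready_values, units=2):
    """Return [(cycle, name), ...] showing when each µop leaves the RS."""
    ready = set(ready_values)
    waiting = list(uops)   # RS contents, oldest first
    in_flight = []         # (finish_cycle, dest) of executing µops
    schedule = []
    cycle = 0
    while waiting or in_flight:
        # Results that have finished become available as data sources.
        for finish, dest in list(in_flight):
            if finish <= cycle:
                ready.add(dest)
                in_flight.remove((finish, dest))
        issued = 0
        for u in list(waiting):
            if issued == units:
                break
            # A µop is dispatched only when all of its sources are ready.
            if all(s in ready for s in u.sources):
                waiting.remove(u)
                in_flight.append((cycle + u.latency, u.dest))
                schedule.append((cycle, u.name))
                issued += 1
        if issued == 0 and not in_flight and waiting:
            raise ValueError("some µop sources can never become ready")
        cycle += 1
    return schedule

example = [
    Uop("load", ("addr",), "x", latency=3),
    Uop("add", ("x", "r1"), "y"),
    Uop("sub", ("r2", "r3"), "z"),
]
schedule = dispatch_schedule(example, {"addr", "r1", "r2", "r3"})
```

In this example the `sub` µop enters the RS data-ready and is dispatched in cycle 0 alongside the `load`, ahead of the older `add`, which waits until the load result `x` is available in cycle 3.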
In the retirement pipestage, all data values are written back to memory and all µops are retired
in order, three at a time. The figure below provides details about the Out-of-Order core and the
In-Order retirement pipestages.
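
In-order retirement of up to three µops per cycle can be sketched in a few lines. This is a minimal model under assumptions that do not come from the manual: the function name `retire_cycles` and its input format are hypothetical, and a µop is allowed to retire in the same cycle in which it finishes executing. The point it illustrates is that a µop that has not finished blocks every younger µop behind it, even if those have already executed.

```python
def retire_cycles(done_at, width=3):
    """done_at[i] is the cycle in which µop i (in program order) finishes
    executing. Returns the cycle in which each µop retires."""
    retired = []
    head = 0    # index of the oldest µop that has not retired yet
    cycle = 0
    while head < len(done_at):
        count = 0
        # Retire from the oldest µop onward, stopping at the first one
        # that has not finished or once `width` µops have retired.
        while head < len(done_at) and count < width and done_at[head] <= cycle:
            retired.append(cycle)
            head += 1
            count += 1
        cycle += 1
    return retired

# µop 1 finishes late (cycle 3) and holds back the younger µops 2 to 4.
result = retire_cycles([0, 3, 0, 0, 0])
```

Here `result` is `[0, 3, 3, 3, 4]`: the first µop retires immediately, the next three retire together once µop 1 finishes, and the last one waits a further cycle because of the three-per-cycle limit.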
MICRO-ARCHITECTURE OVERVIEW
