Early Decoding - AMD Athlon Processor x86 Optimization Manual


22007E/0—November 1999

Early Decoding

DirectPath Decoder
VectorPath Decoder
AMD Athlon™ Processor Microarchitecture
return stack. Subsequent RETs pop a predicted return address
off the top of the stack.
The DirectPath and VectorPath decoders perform
early-decoding of instructions into MacroOPs. A MacroOP is a
fixed length instruction which contains one or more OPs. The
outputs of the early decoders keep all (DirectPath or
VectorPath) instructions in program order. Early decoding
produces three MacroOPs per cycle from either path. The
outputs of both decoders are multiplexed together and passed
to the next stage in the pipeline, the instruction control unit.
When the target 16-byte instruction window is obtained from
the instruction cache, the predecode data is examined to
determine which type of basic decode should occur:
DirectPath or VectorPath.
DirectPath instructions can be decoded directly into a
MacroOP, and subsequently into one or two OPs in the final
issue stage. A DirectPath instruction is limited to those x86
instructions that can be further decoded into one or two OPs.
Instruction length does not determine whether an x86 instruction
is a DirectPath instruction. A maximum of three DirectPath x86
instructions can occupy a given aligned 8-byte block. Sixteen
bytes are fetched at a time. Therefore, up to six DirectPath x86
instructions can be passed into the DirectPath decode pipeline.
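
The six-instruction figure follows from simple arithmetic: a 16-byte fetch holds two aligned 8-byte blocks, and each block can hold at most three DirectPath instructions. The Python sketch below is a toy model of that limit, not a description of the hardware. The function names are hypothetical, and it assumes an instruction is counted in the block that contains its first byte, which the text does not state.

```python
WINDOW_BYTES = 16             # bytes fetched from the instruction cache at a time
BLOCK_BYTES = 8               # size of one aligned block within the window
MAX_DIRECTPATH_PER_BLOCK = 3  # limit stated in the text


def directpath_capacity():
    """Upper bound on DirectPath instructions in one fetch window."""
    return (WINDOW_BYTES // BLOCK_BYTES) * MAX_DIRECTPATH_PER_BLOCK


def count_per_block(start_offsets):
    """Count instructions per aligned 8-byte block.

    Assumption (not from the manual): an instruction belongs to the
    block that contains its first byte.
    """
    counts = [0] * (WINDOW_BYTES // BLOCK_BYTES)
    for offset in start_offsets:
        if not 0 <= offset < WINDOW_BYTES:
            raise ValueError(f"offset {offset} is outside the fetch window")
        counts[offset // BLOCK_BYTES] += 1
    return counts


def fits_directpath_limit(start_offsets):
    """True if no aligned block holds more than three instruction starts."""
    return all(c <= MAX_DIRECTPATH_PER_BLOCK
               for c in count_per_block(start_offsets))
```

For example, instruction starts at offsets 0, 2, 4, 8, 11, and 14 place three instructions in each block and stay within the limit, whereas four one-byte instructions at offsets 0 through 3 exceed it for the first block.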
Uncommon x86 instructions requiring two or more MacroOPs
proceed down the VectorPath pipeline. The sequence of
MacroOPs is produced by an on-chip ROM known as the MROM.
The VectorPath decoder can produce up to three MacroOPs per
cycle. Decoding a VectorPath instruction may prevent the
simultaneous decode of a DirectPath instruction.
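
As a rough illustration of the split between the two paths, the sketch below routes an instruction by the number of MacroOPs it needs and then groups a VectorPath sequence into per-cycle bundles of at most three. The table standing in for the MROM, its entry names, and the MacroOP labels are placeholders invented for this example; they do not correspond to real Athlon microcode.

```python
MACROOPS_PER_CYCLE = 3  # either decoder produces up to three MacroOPs per cycle

# Placeholder stand-in for the MROM: maps a made-up instruction name to a
# made-up MacroOP sequence. None of these entries come from the manual.
HYPOTHETICAL_MROM = {
    "complex_instr_a": ["mop0", "mop1"],
    "complex_instr_b": ["mop0", "mop1", "mop2", "mop3", "mop4", "mop5", "mop6"],
}


def decode_path(macroop_count):
    """One MacroOP means DirectPath; two or more means VectorPath."""
    if macroop_count < 1:
        raise ValueError("an instruction needs at least one MacroOP")
    return "DirectPath" if macroop_count == 1 else "VectorPath"


def vectorpath_cycles(name):
    """Split an MROM sequence into bundles of at most three MacroOPs,
    one bundle per decode cycle, preserving sequence order."""
    sequence = HYPOTHETICAL_MROM[name]
    return [sequence[i:i + MACROOPS_PER_CYCLE]
            for i in range(0, len(sequence), MACROOPS_PER_CYCLE)]
```

Under these assumptions, a seven-MacroOP sequence occupies the VectorPath decoder for three cycles (bundles of three, three, and one), which gives a feel for why VectorPath instructions cost more decode bandwidth than DirectPath ones.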
