Predecode; Branch Prediction - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization

Predecode

Branch Prediction

replacement is based on a least-recently used (LRU)
replacement algorithm.
The L1 instruction cache has an associated two-level translation
look-aside buffer (TLB) structure. The first-level TLB is fully
associative and contains 24 entries (16 that map 4-Kbyte pages
and eight that map 2-Mbyte or 4-Mbyte pages). The second-level
TLB is four-way set associative and contains 256 entries, which
can map 4-Kbyte pages.
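
As a rough illustration of the entry counts above, the following C sketch computes how much address space each TLB level can map at once (its "reach"). Only the entry counts and page sizes come from the text; the function name tlb_reach and the constant names are hypothetical, introduced for this example.

```c
#include <stdint.h>

/* Entry counts as stated in the text above. */
enum {
    L1_ITLB_SMALL_ENTRIES = 16,   /* entries that map 4-Kbyte pages */
    L1_ITLB_LARGE_ENTRIES = 8,    /* entries that map 2-Mbyte or 4-Mbyte pages */
    L2_ITLB_ENTRIES       = 256   /* entries that map 4-Kbyte pages */
};

/* Page sizes in bytes. */
#define PAGE_4K (4ull * 1024)
#define PAGE_2M (2ull * 1024 * 1024)
#define PAGE_4M (4ull * 1024 * 1024)

/* Bytes of address space covered by `entries` TLB entries,
 * each mapping one page of `page_bytes` bytes. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_bytes)
{
    return entries * page_bytes;
}
```

With these numbers, the 16 small-page entries of the first-level TLB cover 64 Kbytes, the eight large-page entries cover 16 Mbytes with 2-Mbyte pages or 32 Mbytes with 4-Mbyte pages, and the second-level TLB covers 1 Mbyte.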
Predecoding begins as the L1 instruction cache is filled.
Predecode information is generated and stored alongside the
instruction cache. This information is used to help efficiently
identify the boundaries between variable-length x86
instructions, to distinguish DirectPath from VectorPath
early-decode instructions, and to locate the opcode byte in each
instruction. In addition, the predecode logic detects code
branches such as CALLs, RETURNs and short unconditional
JMPs. When a branch is detected, predecoding begins at the
target of the branch.
The fetch logic accesses the branch prediction table in parallel
with the instruction cache and uses the information stored in
the branch prediction table to predict the direction of branch
instructions.
The AMD Athlon processor combines a branch target address
buffer (BTB), a global history bimodal counter (GHBC) table,
and return address stack (RAS) hardware to predict and
accelerate branches. Predicted-taken
branches incur only a single-cycle delay to redirect the
instruction fetcher to the target instruction. In the event of a
mispredict, the minimum penalty is ten cycles.
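
The two figures above can be combined into a simple lower-bound estimate of branch overhead. The C sketch below is a simplified model, not a description of the hardware. It assumes that correctly predicted not-taken branches add no redirect delay (the text does not say this) and that every mispredict costs exactly the ten-cycle minimum. The function name branch_overhead_cycles and its parameters are hypothetical.

```c
/* Cycle counts as stated in the text above. */
#define TAKEN_REDIRECT_CYCLES   1.0   /* predicted-taken branch */
#define MISPREDICT_MIN_CYCLES  10.0   /* minimum mispredict penalty */

/* Estimated lower bound on average extra cycles per branch.
 *
 * predicted_taken_fraction: fraction of all branches that are
 *     correctly predicted taken (0.0 to 1.0).
 * mispredict_fraction: fraction of all branches that are
 *     mispredicted (0.0 to 1.0).
 *
 * Assumption: correctly predicted not-taken branches cost nothing. */
static double branch_overhead_cycles(double predicted_taken_fraction,
                                     double mispredict_fraction)
{
    return predicted_taken_fraction * TAKEN_REDIRECT_CYCLES
         + mispredict_fraction * MISPREDICT_MIN_CYCLES;
}
```

For example, under this model a code stream in which 60% of branches are correctly predicted taken and 5% are mispredicted pays at least 0.6 + 0.5 = 1.1 extra cycles per branch, so the mispredicts account for nearly half of the overhead despite being far less frequent.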
The BTB is a 2048-entry table that caches in each entry the
predicted target address of a branch.
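
To make the idea of a target-address cache concrete, here is a minimal software sketch of a 2048-entry table in C. Only the entry count and the fact that each entry holds a predicted target come from the text. The direct-mapped organization, the indexing by low address bits, the full-address tag, and all names (btb, btb_lookup, btb_update) are assumptions made for illustration and do not describe how the processor actually organizes its BTB.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 2048u   /* entry count from the text */

typedef struct {
    bool     valid;    /* entry holds a prediction */
    uint32_t tag;      /* assumption: full branch address used as the tag */
    uint32_t target;   /* predicted target address of the branch */
} btb_entry;

typedef struct {
    btb_entry e[BTB_ENTRIES];
} btb;

/* Assumption: direct-mapped, indexed by the low bits of the branch address. */
static uint32_t btb_index(uint32_t branch_addr)
{
    return branch_addr % BTB_ENTRIES;
}

/* Record the target of a resolved branch, replacing whatever
 * entry previously occupied the same slot. */
static void btb_update(btb *t, uint32_t branch_addr, uint32_t target)
{
    btb_entry *ent = &t->e[btb_index(branch_addr)];
    ent->valid  = true;
    ent->tag    = branch_addr;
    ent->target = target;
}

/* Look up a predicted target. Returns true and writes *target on a hit;
 * returns false and leaves *target unchanged on a miss. */
static bool btb_lookup(const btb *t, uint32_t branch_addr, uint32_t *target)
{
    const btb_entry *ent = &t->e[btb_index(branch_addr)];
    if (!ent->valid || ent->tag != branch_addr)
        return false;
    *target = ent->target;
    return true;
}
```

In this sketch, two branches whose addresses differ by a multiple of 2048 compete for the same slot, so updating one evicts the other. Real hardware may reduce such conflicts in ways this model does not attempt to capture.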
In addition, the AMD Athlon processor implements a 12-entry
return address stack to predict return addresses from a near or
far call. As CALLs are fetched, the next EIP is pushed onto the
22007E/0—November 1999
AMD Athlon™ Processor Microarchitecture
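
The 12-entry return address stack described above can be modeled in C as a small circular stack: a CALL pushes the address of the next instruction, and a return pops the most recent entry as the predicted return address. Only the depth of 12 and the push-on-CALL behavior come from the text. The overflow policy shown here (the oldest entry is overwritten) and all names (ras, ras_push, ras_pop) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAS_DEPTH 12u   /* entry count from the text */

typedef struct {
    uint32_t slot[RAS_DEPTH];
    unsigned top;     /* index of the next free slot, modulo RAS_DEPTH */
    unsigned count;   /* number of valid entries, at most RAS_DEPTH */
} ras;

/* On a CALL: push the address of the instruction after the CALL.
 * Assumption: when the stack is full, the oldest entry is overwritten. */
static void ras_push(ras *s, uint32_t next_eip)
{
    s->slot[s->top] = next_eip;
    s->top = (s->top + 1u) % RAS_DEPTH;
    if (s->count < RAS_DEPTH)
        s->count++;
}

/* On a return: pop the predicted return address.
 * Returns false when the stack is empty, meaning no prediction
 * is available from this structure. */
static bool ras_pop(ras *s, uint32_t *predicted)
{
    if (s->count == 0u)
        return false;
    s->top = (s->top + RAS_DEPTH - 1u) % RAS_DEPTH;
    *predicted = s->slot[s->top];
    s->count--;
    return true;
}
```

Under this model, call chains nested more than 12 deep lose their outermost return addresses, so the returns that unwind past the twelfth level cannot be predicted from the stack.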
