Predecode; Branch Prediction - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization

Predecode

Branch Prediction


replacement is based on a least-recently used (LRU)

replacement algorithm.

The L1 instruction cache has an associated two-level translation

look-aside buffer (TLB) structure. The first-level TLB is fully

associative and contains 24 entries (16 that map 4-Kbyte pages

and eight that map 2-Mbyte or 4-Mbyte pages). The second-level

TLB is four-way set associative and contains 256 entries, which

can map 4-Kbyte pages.
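
As a rough illustration of what these entry counts mean, the arithmetic below computes how much address space each TLB level can map at once (often called TLB reach), assuming every entry maps a distinct page. The function and variable names are illustrative and do not come from the manual; the large-page figure is shown for both the 2-Mbyte and 4-Mbyte cases.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_size


# First-level TLB: 16 entries for 4-Kbyte pages, 8 entries for large pages.
l1_small = tlb_reach(16, 4 * KIB)
l1_large_2m = tlb_reach(8, 2 * MIB)
l1_large_4m = tlb_reach(8, 4 * MIB)

# Second-level TLB: 256 entries for 4-Kbyte pages.
l2_small = tlb_reach(256, 4 * KIB)

print("L1 TLB, 4-Kbyte pages:", l1_small // KIB, "Kbytes")
print("L1 TLB, 2-Mbyte pages:", l1_large_2m // MIB, "Mbytes")
print("L1 TLB, 4-Mbyte pages:", l1_large_4m // MIB, "Mbytes")
print("L2 TLB, 4-Kbyte pages:", l2_small // MIB, "Mbyte")
```

Code whose hot instruction footprint fits within the second-level figure is less likely to miss in the instruction TLBs, which is one reason to keep frequently executed routines close together.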

Predecoding begins as the L1 instruction cache is filled.

Predecode information is generated and stored alongside the

instruction cache. This information is used to help efficiently

identify the boundaries between variable-length x86

instructions, to distinguish DirectPath from VectorPath

early-decode instructions, and to locate the opcode byte in each

instruction. In addition, the predecode logic detects code

branches such as CALLs, RETURNs and short unconditional

JMPs. When a branch is detected, predecoding begins at the

target of the branch.
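
The idea of storing boundary information alongside the instruction bytes can be sketched with a toy model. The code below is only an illustration: it assumes the instruction lengths are already known (a real predecoder derives them from the bytes themselves), and the per-byte start/end flags are a hypothetical encoding, not the predecode bit format the AMD Athlon processor actually uses.

```python
def predecode(lengths):
    """Return one (is_start, is_end) flag pair per instruction byte.

    lengths: instruction lengths in bytes, assumed to be known in advance.
    """
    marks = []
    for length in lengths:
        for i in range(length):
            marks.append((i == 0, i == length - 1))
    return marks


def instruction_starts(marks):
    """Byte offsets at which an instruction begins, read from the flags alone."""
    return [offset for offset, (is_start, _) in enumerate(marks) if is_start]


# Three back-to-back instructions of 1, 3, and 2 bytes (hypothetical lengths).
marks = predecode([1, 3, 2])
print(instruction_starts(marks))
```

With the flags in place, a later stage can find every instruction start by scanning the flags, without re-parsing the variable-length encodings.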

The fetch logic accesses the branch prediction table in parallel

with the instruction cache and uses the information stored in

the branch prediction table to predict the direction of branch

instructions.

The AMD Athlon processor employs combinations of a branch

target address buffer (BTB), a global history bimodal counter

(GHBC) table, and return address stack (RAS) hardware in

order to predict and accelerate branches. Predicted-taken

branches incur only a single-cycle delay to redirect the

instruction fetcher to the target instruction. In the event of a

mispredict, the minimum penalty is ten cycles.
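
To make the direction-prediction idea concrete, the sketch below models a table of two-bit saturating counters indexed by the branch address XORed with a global history register. This gshare-style index, the table size, and all names are assumptions chosen for illustration; the text above does not specify how the GHBC table is indexed or sized. The helper `min_redirect_cycles` applies the single-cycle and ten-cycle figures quoted above as a lower bound.

```python
class GlobalHistoryPredictor:
    """Two-bit saturating counters indexed by branch address XOR global history."""

    def __init__(self, index_bits=10):  # table size is an arbitrary assumption
        self.mask = (1 << index_bits) - 1
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # most recent branch outcomes, one bit each

    def _index(self, address):
        return (address ^ self.history) & self.mask

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        step = 1 if taken else -1
        self.counters[i] = max(0, min(3, self.counters[i] + step))
        self.history = ((self.history << 1) | int(taken)) & self.mask


def min_redirect_cycles(predicted_taken, mispredicted):
    """Lower bound on redirect cycles: 1 per predicted-taken branch, 10 per mispredict."""
    return predicted_taken * 1 + mispredicted * 10


predictor = GlobalHistoryPredictor()
mispredicts = 0
for _ in range(100):  # a loop-closing branch that is always taken
    if not predictor.predict(0x40):
        mispredicts += 1
    predictor.update(0x40, True)

print(mispredicts, min_redirect_cycles(100 - mispredicts, mispredicts))
```

In this model the mispredictions are confined to the warm-up period while the history register fills; after that the always-taken branch is predicted correctly. Because ten cycles is the minimum mispredict penalty, the cycle figure is a floor rather than an estimate.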

The BTB is a 2048-entry table that caches in each entry the

predicted target address of a branch.
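
A minimal model of such a table is sketched below. Only the 2048-entry size comes from the text; the direct-mapped organization, the use of the full branch address as a tag, and the class and method names are assumptions made for illustration.

```python
BTB_ENTRIES = 2048  # entry count stated in the text


class BranchTargetBuffer:
    """Direct-mapped table from branch address to predicted target (assumed layout)."""

    def __init__(self, entries=BTB_ENTRIES):
        self.entries = entries
        self.table = [None] * entries  # each slot holds (branch_address, target)

    def lookup(self, branch_address):
        """Return the cached target, or None if this branch has no entry."""
        slot = self.table[branch_address % self.entries]
        if slot is not None and slot[0] == branch_address:
            return slot[1]
        return None

    def record(self, branch_address, target):
        """Cache the target, evicting whatever shared the slot."""
        self.table[branch_address % self.entries] = (branch_address, target)


btb = BranchTargetBuffer()
btb.record(0x1000, 0x2000)
first_lookup = btb.lookup(0x1000)      # hit: the target was recorded
btb.record(0x1000 + BTB_ENTRIES, 0x3000)  # maps to the same slot in this model
second_lookup = btb.lookup(0x1000)     # miss: the earlier entry was evicted
print(first_lookup, second_lookup)
```

The eviction in the last step shows why a finite table cannot hold targets for every branch in a large program: branches that collide in the table displace one another.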

In addition, the AMD Athlon processor implements a 12-entry

return address stack to predict return addresses from a near or

far call. As CALLs are fetched, the next EIP is pushed onto the

22007E/0—November 1999

AMD Athlon™ Processor Microarchitecture
