AMD-K5 Processor Technical Reference Manual
2.2.1 Fetch

The processor can fetch up to 16 bytes per clock out of the
instruction cache. Fetching begins with the calculation of the
linear address for the next instruction along a predicted
branch of the x86 instruction stream. The address accesses the
instruction cache or, during a miss, the prefetch cache.
Fetching can occur along a single execution stream with up to
three taken branches. Fetches that miss both the instruction
cache and prefetch cache are driven to the prefetcher.
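
The lookup order described above can be sketched as follows. This is only an illustrative model, not the processor's actual logic: the function name, the use of Python sets in place of real tag arrays, and the 16-byte line size are assumptions that do not come from this section.

```python
from enum import Enum


class FetchSource(Enum):
    INSTRUCTION_CACHE = "instruction cache"
    PREFETCH_CACHE = "prefetch cache"
    PREFETCHER = "prefetcher"


def fetch_source(linear_address, icache_lines, prefetch_lines, line_size=16):
    """Return which unit would service a fetch at linear_address.

    icache_lines and prefetch_lines are sets of line-aligned addresses,
    a hypothetical stand-in for real tag lookups. line_size=16 is an
    assumption made for this sketch.
    """
    # Align the linear address down to the start of its cache line.
    line = linear_address - (linear_address % line_size)
    if line in icache_lines:
        return FetchSource.INSTRUCTION_CACHE
    # On an instruction-cache miss, the prefetch cache is tried next.
    if line in prefetch_lines:
        return FetchSource.PREFETCH_CACHE
    # A miss in both caches is driven to the prefetcher.
    return FetchSource.PREFETCHER
```

For example, with `icache_lines={0x1000}` and `prefetch_lines={0x2000}`, an address of `0x2008` resolves to the prefetch cache, and `0x3000` falls through to the prefetcher.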

In addition to fetching instructions, the fetch logic handles
branch predictions and detects conditions requiring pipeline
invalidation and restarting, such as context switches or
branches into cache lines that do not contain the correct
predecode state. Branches are dynamically predicted on a
cache-line basis using a 1-bit algorithm. Each of the 1024
instruction-cache lines has a tag that predicts the last byte in
the cache line to be executed, whether or not the branch will be
taken, and the cache index of the branch target (called the
successor index). When the caches are invalidated, all branch
predictions are cleared.
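
One way to picture the per-line prediction tag is the small model below. It is a sketch under stated assumptions: the class and field names are hypothetical, and treating the "next sequential line" as `index + 1` is a simplification that ignores how the real cache is organized. Only the count of 1024 lines and the three predicted items come from the text.

```python
from dataclasses import dataclass
from typing import Optional

NUM_ICACHE_LINES = 1024  # from the text: 1024 instruction-cache lines


@dataclass
class LinePrediction:
    """Hypothetical model of the prediction tag kept with each cache line."""
    last_byte: int = 0                     # last byte in the line predicted to execute
    taken: bool = False                    # the single prediction bit
    successor_index: Optional[int] = None  # cache index of the predicted target


class PredictionTable:
    def __init__(self):
        self.lines = [LinePrediction() for _ in range(NUM_ICACHE_LINES)]

    def next_fetch_index(self, index):
        """Pick the next cache index to fetch from, given the line's tag."""
        tag = self.lines[index]
        if tag.taken and tag.successor_index is not None:
            return tag.successor_index
        # Assumption: not-taken simply falls through to the next index.
        return (index + 1) % NUM_ICACHE_LINES

    def invalidate(self):
        """Cache invalidation clears every branch prediction."""
        self.lines = [LinePrediction() for _ in range(NUM_ICACHE_LINES)]
```

Because there is one tag per line rather than one per branch, this model can hold only a single taken-branch prediction for each cache line at a time.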

During prefetch all branch instructions are predicted as
not-taken. Later, if the execution of a branch instruction
reveals a misprediction, the fetch unit backs out of the branch
by invalidating all speculative states in the prefetch cache,
reorder buffer, load/store reservation station, and store
buffer. Then, for cacheable instructions, the branch prediction
stored in the instruction cache is updated while the correct
branch target is fetched. Prediction updates are disabled when
the branch instruction is non-cacheable, because no prediction
information is saved for non-cacheable instructions.
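
The update rule in this paragraph can be illustrated with a generic 1-bit predictor. The sketch assumes that the stored bit is simply set to the branch's actual outcome after a misprediction, which is the usual meaning of a 1-bit scheme; the function names are hypothetical and the invalidation of speculative state is not modeled.

```python
def resolve_branch(predicted_taken, actual_taken, cacheable):
    """Return (mispredicted, new_prediction_bit) for one executed branch."""
    mispredicted = predicted_taken != actual_taken
    if mispredicted and cacheable:
        # 1-bit scheme: the stored prediction becomes the actual outcome.
        return True, actual_taken
    # Correct prediction, or non-cacheable code: the bit is left unchanged.
    return mispredicted, predicted_taken


def count_mispredictions(outcomes, cacheable=True):
    """Run a sequence of branch outcomes (True = taken) through one bit."""
    bit = False  # branches start out predicted as not-taken
    misses = 0
    for actual in outcomes:
        mispredicted, bit = resolve_branch(bit, actual, cacheable)
        misses += int(mispredicted)
    return misses
```

For a loop branch that is taken three times and then falls through, run twice (`[True, True, True, False] * 2`), this model mispredicts 4 times when the code is cacheable and 6 times when it is not, since in the non-cacheable case the not-taken prediction never changes.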

In typical x86 desktop programs, a branch occurs about once
every seven x86 instructions. Without branch prediction,
branch targets remain unresolved until the execution phase,
which creates pipeline delays. The processor's branch-prediction
mechanism correctly predicts 70% to 85% of branches (depending
on program behavior) and has a misprediction penalty of only
three processor clocks.
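
The figures in this paragraph can be combined into a rough estimate of the average misprediction cost per instruction. This is a back-of-the-envelope sketch, not a number from the manual: it assumes that every mispredicted branch pays the full three-clock penalty and that no other stalls are involved. The function name is hypothetical.

```python
def mispredict_clocks_per_instruction(accuracy,
                                      instructions_per_branch=7,
                                      penalty_clocks=3):
    """Average clocks lost to mispredictions per x86 instruction.

    Defaults come from the text: one branch about every seven
    instructions, and a three-clock misprediction penalty.
    """
    branches_per_instruction = 1 / instructions_per_branch
    mispredict_rate = 1 - accuracy
    return branches_per_instruction * mispredict_rate * penalty_clocks


# Bounds of the 70% to 85% accuracy range quoted in the text.
worst = mispredict_clocks_per_instruction(0.70)  # about 0.129 clocks
best = mispredict_clocks_per_instruction(0.85)   # about 0.064 clocks
```

Under these assumptions, mispredictions cost between roughly 0.06 and 0.13 clocks per instruction on average.
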
18524C/0—Nov1996
Internal Architecture
