Branching; Register Rotation; Floating-Point Architecture - Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual


2.8 Branching

In addition to removing branches through the use of predication, several mechanisms
are provided to decrease the branch misprediction rate and the cost of the remaining
mispredicted branches. These mechanisms provide ways for the compiler to
communicate information about branch conditions to the processor.
Branch predict instructions are provided which can be used to communicate an early
indication of the target address and the location of the branch. The compiler will try to
indicate whether a branch should be predicted dynamically or statically. The processor
can use this information to initialize branch prediction structures, enabling good
prediction even the first time a branch is encountered. This is beneficial for
unconditional branches or in situations where the compiler has information about likely
branch behavior.
For indirect branches, a branch register is used to hold the target address. Branch
predict instructions provide an indication of which register will be used in situations
when the target address can be computed early. A branch predict instruction can also
signal that an indirect branch is a procedure return, enabling the efficient use of
call/return stack prediction structures.
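As a rough illustration of what a call/return stack prediction structure does, the C sketch below pushes the return address at each call and pops it as the predicted target at each return. The type and function names, and the depth of 8, are hypothetical and chosen only for this example; they do not describe any particular Itanium implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_DEPTH 8   /* illustrative depth, not taken from the manual */

/* Hypothetical model of a return-address stack predictor. */
typedef struct {
    uint64_t entries[RAS_DEPTH];
    size_t   top;    /* index of the next free slot (wraps modulo RAS_DEPTH) */
    size_t   count;  /* number of valid entries, at most RAS_DEPTH */
} ReturnStack;

/* On a call: remember where the matching return should go.
 * When the stack is full, the oldest entry is overwritten. */
void ras_on_call(ReturnStack *s, uint64_t return_addr)
{
    s->entries[s->top % RAS_DEPTH] = return_addr;
    s->top++;
    if (s->count < RAS_DEPTH)
        s->count++;
}

/* On a branch marked as a procedure return: pop the predicted target.
 * Returns false when the stack is empty and no prediction is available. */
bool ras_predict_return(ReturnStack *s, uint64_t *predicted)
{
    assert(predicted != NULL);
    if (s->count == 0)
        return false;
    s->top--;
    s->count--;
    *predicted = s->entries[s->top % RAS_DEPTH];
    return true;
}
```

Knowing that an indirect branch is a return is what lets a predictor consult such a stack instead of a general indirect-target table.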
Special loop-closing branches are provided to accelerate counted loops and
modulo-scheduled loops. These branches and their associated branch predict
instructions provide information that allows for perfect prediction of loop termination,
thereby eliminating costly mispredict penalties and reducing loop overhead.
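The following C sketch models the decision a counted loop-closing branch makes. It assumes the behavior described in the full manual for the loop count register (LC): while LC is nonzero the branch decrements it and is taken, and when LC is zero the branch falls through. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the loop count register. */
typedef struct {
    uint64_t lc;
} LoopState;

/* Counted loop-closing branch: while LC is nonzero, decrement it and
 * take the branch; when LC is zero, fall through and leave the loop. */
bool counted_loop_branch(LoopState *s)
{
    if (s->lc != 0) {
        s->lc--;
        return true;   /* taken: another iteration follows */
    }
    return false;      /* not taken: the loop exits */
}

/* Sums n elements using the model; n must be at least 1. */
long sum_counted(const long *a, uint64_t n)
{
    assert(n >= 1);
    LoopState s = { .lc = n - 1 };  /* LC is set to the trip count minus one */
    long total = 0;
    uint64_t i = 0;
    do {
        total += a[i++];
    } while (counted_loop_branch(&s));
    return total;
}
```

Because the remaining trip count lives in a register rather than in a data-dependent comparison, the final, not-taken execution of the branch can be identified in advance instead of being guessed.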

2.9 Register Rotation

Modulo scheduling of a loop is analogous to hardware pipelining of a functional unit
since the next iteration of the loop starts before the previous iteration has finished. The
iteration is split into stages similar to the stages of an execution pipeline. Modulo
scheduling allows the compiler to execute loop iterations in parallel rather than
sequentially. The concurrent execution of multiple iterations traditionally requires
unrolling of the loop and software renaming of registers. The Itanium architecture
allows registers to be renamed so that every iteration has its own set of
registers, avoiding the need for unrolling. This kind of register renaming is called
register rotation. The result is that software pipelining can be applied to a much wider
variety of loops, both small and large, with significantly reduced overhead.
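A minimal C model of register rotation may help make this concrete. It assumes the scheme described in the full manual: logical register numbers are offset by a rename base that is decremented at each loop-closing branch, so a value written to logical register r in one iteration is visible as r + 1 in the next. The names, the region size of 4, and the two-stage loop are hypothetical, and the tests on `k` stand in for the stage predicates that real code would use.

```c
#include <assert.h>
#include <stddef.h>

#define ROT_SIZE 4   /* illustrative size of the rotating region */

typedef struct {
    long   phys[ROT_SIZE];  /* physical registers */
    size_t rrb;             /* rename base, always in [0, ROT_SIZE) */
} RotatingFile;

/* Map a logical register number to its current physical slot. */
size_t rot_slot(const RotatingFile *f, size_t logical)
{
    assert(logical < ROT_SIZE);
    return (logical + f->rrb) % ROT_SIZE;
}

/* Rotate: decrement the rename base modulo the region size, so the
 * value seen at logical r is afterwards seen at logical r + 1. */
void rot_rotate(RotatingFile *f)
{
    f->rrb = (f->rrb + ROT_SIZE - 1) % ROT_SIZE;
}

/* Two-stage software pipeline computing out[i] = 2 * in[i].
 * Stage 0 loads element k into logical register 0; stage 1 finishes
 * element k-1, which rotation has moved to logical register 1. */
void pipelined_double(const long *in, long *out, size_t n)
{
    RotatingFile f = { {0}, 0 };
    for (size_t k = 0; k <= n; k++) {
        if (k < n)
            f.phys[rot_slot(&f, 0)] = in[k];
        if (k >= 1)
            out[k - 1] = 2 * f.phys[rot_slot(&f, 1)];
        rot_rotate(&f);
    }
}
```

The loop body always names the same two logical registers, yet each in-flight iteration keeps its own physical register, which is why no unrolling or manual renaming is needed.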

2.10 Floating-point Architecture

The Itanium architecture defines a floating-point architecture with full IEEE support for
the single, double, and double-extended (80-bit) data types. Some extensions, such as
a fused multiply and add operation, minimum and maximum functions, and a register
file format with a larger range than the double-extended memory format, are also
included. 128 floating-point registers are defined. Of these, 96 registers are rotating
(not stacked) and can be used to modulo schedule loops compactly. Multiple
floating-point status registers are provided for speculation.
Volume 1, Part 1: Introduction to the Intel® Itanium® Architecture
