Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual page 921

br
group as br.ia are not allowed, since br.ia may implicitly read all ARs. If an
illegal RAW dependency is present between an AR write and br.ia, the first IA-32
instruction fetch and execution may or may not see the updated AR value.
IA-32 instruction set execution leaves the contents of the ALAT undefined. Software
cannot rely on ALAT values being preserved across an instruction set transition. All
registers left in the current register stack frame are undefined across an instruction
set transition. On entry to IA-32 code, existing entries in the ALAT are ignored. If
the register stack contains any dirty registers, an Illegal Operation fault is raised on
the br.ia instruction. The current register stack frame is forced to zero. To flush
the register file of dirty registers, the flushrs instruction must be issued in an
instruction group preceding the br.ia instruction. To enhance the performance of
the instruction set transition, software can start the register stack flush in parallel
with starting the IA-32 instruction set by 1) ensuring flushrs is exactly one
instruction group before the br.ia, and 2) ensuring br.ia is in the first B-slot. br.ia
should always be executed in the first B-slot with a hint of "static-taken" (default);
otherwise, processor performance will be degraded.
If a br.ia causes any Itanium traps (e.g., Single Step trap, Taken Branch trap, or
Unimplemented Instruction Address trap), IIP will contain the original 64-bit target
IP. (The value will not have been zero extended from 32 bits.)
Another branch type is provided for simple counted loops. This branch type uses the
Loop Count application register (LC) to determine the branch condition, and does not
use a qualifying predicate:
• cloop: If the LC register is not equal to zero, it is decremented and the branch is
taken.
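The cloop rule above can be modeled in a few lines. The following is a minimal sketch in Python rather than Itanium assembly; the function names br_cloop and run_counted_loop are hypothetical and chosen only for illustration, and the model covers only the LC update and the taken decision.

```python
def br_cloop(lc):
    """Model one execution of a cloop branch (illustrative helper, not a real API).

    Returns (taken, new_lc): if LC is nonzero it is decremented and the
    branch is taken; otherwise LC is left unchanged and the branch falls through.
    """
    if lc != 0:
        return True, lc - 1
    return False, lc


def run_counted_loop(lc):
    """Count how many times a loop body runs when the loop ends with cloop."""
    body_runs = 0
    while True:
        body_runs += 1              # the loop body executes before the branch
        taken, lc = br_cloop(lc)
        if not taken:
            return body_runs
```

Under this model, a loop that should run its body n times would start with LC set to n - 1, because the body executes once before the first branch decision.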
In addition to these simple branch types, there are four types which are used for
accelerating modulo-scheduled loops (see also Section 4.5.1, "Modulo-scheduled Loop
Support" on page 1:75). Two of these are for counted loops (which use the LC register),
and two for while loops (which use the qualifying predicate). These loop types use
register rotation to provide register renaming, and they use predication to turn off
instructions that correspond to empty pipeline stages.
The Epilog Count application register (EC) is used to count epilog stages and, for some
while loops, a portion of the prolog stages. In the epilog phase, EC is decremented each
time around and, for most loops, when EC is one, the pipeline has been drained, and
the loop is exited. For certain types of optimized, unrolled software-pipelined loops, the
target of a br.cexit or br.wexit is set to the next sequential bundle. In this case, the
pipeline may not be fully drained when EC is one, and continues to drain while EC is
zero.
For these modulo-scheduled loop types, the calculation of whether the branch is taken
or not depends on the kernel branch condition (LC for counted types, and the qualifying
predicate for while types) and on the epilog condition (whether EC is greater than one
or not).
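The two conditions can be written out as a small Python model. This is a sketch under stated assumptions: the function name loop_continues and its parameters are hypothetical, and combining the kernel and epilog conditions with a logical OR is an assumption here, not something the paragraph above states outright.

```python
def loop_continues(kind, lc=0, qp=False, ec=0):
    """Return True while a modulo-scheduled loop should keep iterating.

    kind is "counted" (kernel condition: LC != 0) or "while" (kernel
    condition: the qualifying predicate). The epilog condition is EC > 1.
    Combining the two with OR is an assumption of this sketch.
    """
    if kind == "counted":
        kernel_condition = lc != 0
    elif kind == "while":
        kernel_condition = bool(qp)
    else:
        raise ValueError("kind must be 'counted' or 'while'")
    epilog_condition = ec > 1
    return kernel_condition or epilog_condition
```

For example, loop_continues("counted", lc=0, ec=2) is true because one epilog stage still has to drain, while loop_continues("counted", lc=0, ec=1) is false.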
These branch types are of two categories: top and exit. The top types (ctop and wtop)
are used when the loop decision is located at the bottom of the loop body and therefore
a taken branch will continue the loop while a fall through branch will exit the loop. The
exit types (cexit and wexit) are used when the loop decision is located somewhere
other than the bottom of the loop and therefore a fall through branch will continue the
loop and a taken branch will exit the loop. The exit types are also used at intermediate
points in an unrolled pipelined loop. (For more details, see Section 4.5.1,
"Modulo-scheduled Loop Support" on page 1:75).
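To make the difference between the top and exit categories concrete, the following Python sketch models the counted types only. The names step_counted and count_ctop_passes are hypothetical. The LC and EC updates shown (LC decremented in the kernel phase, EC decremented in the epilog phase) are assumptions of this sketch, and register rotation and predicate writes are deliberately omitted.

```python
def step_counted(btype, lc, ec):
    """Model one execution of a counted modulo-scheduled loop branch.

    btype is "ctop" or "cexit". Returns (taken, new_lc, new_ec).
    Register rotation and predicate updates are not modeled.
    """
    if btype not in ("ctop", "cexit"):
        raise ValueError("btype must be 'ctop' or 'cexit'")
    continues = lc != 0 or ec > 1          # kernel condition OR epilog condition
    # Top types branch back to continue; exit types branch out to leave.
    taken = continues if btype == "ctop" else not continues
    if lc != 0:
        lc -= 1                            # kernel phase: count down LC
    elif ec != 0:
        ec -= 1                            # epilog phase: count down EC
    return taken, lc, ec


def count_ctop_passes(lc, ec):
    """Run a loop that ends in a ctop branch; return (passes, final_lc, final_ec)."""
    passes = 0
    while True:
        passes += 1                        # the loop body executes, then the branch
        taken, lc, ec = step_counted("ctop", lc, ec)
        if not taken:
            return passes, lc, ec
```

In this model, starting with LC = 3 and EC = 2 gives five passes through the loop body: four kernel iterations plus one extra pass to drain the second pipeline stage. With the same register values, a cexit branch makes the opposite taken decision, so it falls through while the loop continues and is taken to leave it.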
Volume 3: Instruction Reference
