Wtop Loop Trace - Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual

Notice that the load for the second source iteration is executed before the compare and
branch of the first source iteration. That is, the load (and the update of r5) is
speculative. The loop condition is not computed until cycle X+2, but in order to
maximize the use of resources, it is desirable to start the second source iteration at
cycle X+1. Without the support for control speculation in the Itanium architecture, the
second source iteration could not be started until cycle X+3.
The computation of the loop condition for while loops is very different from that of
counted loops. In counted loops, it is possible to compute the loop condition in one
cycle using a counted loop branch. This is what a br.ctop instruction does when it sets
p16. In while loops, a compare must compute the loop condition and set the stage
predicates. The stages prior to the one containing the compare are called the
speculative stages of the pipeline, because it is not possible for the compare to
completely control the execution of these stages. Therefore, the stage predicate set by
the compare is used (after rotation) to control the first non-speculative stage of the
pipeline.
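As a rough illustration of the counted-loop side of this contrast, the Python sketch below models how a counted loop branch can produce the stage predicate on its own, with no compare involved. It is a simplified model under stated assumptions: the rotating predicates p16 to p63 are held in a plain list and rotated directly rather than through a rename base, and the function name br_ctop and its return tuple are hypothetical helpers introduced only for this sketch.

```python
def br_ctop(preds, lc, ec):
    """Simplified model of one br.ctop execution.

    preds models p16..p63 as a 48-entry list (preds[0] is p16).
    Returns (preds, lc, ec, taken).
    """
    preds = list(preds)
    if lc > 0:
        # Loop count not exhausted: start another source iteration.
        lc -= 1
        new_p63, rotate, taken = 1, True, True
    elif ec > 1:
        # Epilog: drain the pipeline, no new iteration is started.
        ec -= 1
        new_p63, rotate, taken = 0, True, True
    elif ec == 1:
        # Last epilog stage: fall out of the loop.
        ec -= 1
        new_p63, rotate, taken = 0, True, False
    else:
        new_p63, rotate, taken = 0, False, False
    preds[-1] = new_p63
    if rotate:
        # After rotation the value written to p63 is visible as p16.
        preds = [preds[-1]] + preds[:-1]
    return preds, lc, ec, taken
```

In this model the branch itself decides whether p16 becomes 1 for the next iteration, using only LC and EC. A while loop has no such counter, which is why its compare has to supply the stage predicate instead.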
The pipelined version of the while loop on page 1:190 is shown below. A check for the
speculative load is included:

      mov    ec = 2
      mov    pr.rot = 1 << 16;;          // PR16 = 1, rest = 0
L1:
      ld4.s  r32 = [r5],4                // Cycle 0
(p18) chk.s  r34, recovery               // Cycle 0
(p18) cmp.ne p17,p0 = r34,r0             // Cycle 0
(p18) st4    [r6] = r34,4                // Cycle 0
(p17) br.wtop.sptk L1;;                  // Cycle 0
L2:

To explain why the kernel loop is programmed the way it is, it is helpful to examine a
trace of the execution of the loop (assume there are 200 source iterations) shown in
Table 5-2.
There is no stage predicate assigned to the load because it is speculative. The compare
sets p17. This is the branch predicate for the current iteration and, after rotation, the
stage predicate for the first non-speculative stage (stage three) of the next source
iteration. During the prolog, the compare cannot produce its first valid result until cycle
two. The initialization of the predicates provides a pipeline that disables the compare
until the first source iteration reaches stage two in cycle two. At that point the
compare starts generating stage predicates to control the non-speculative stages of the
pipeline. Notice that the compare is conditional. If it were unconditional, it would
always write a zero to p17 and the pipeline would not get started correctly.

Table 5-2. wtop Loop Trace

         Port/Instructions                        State before br.wtop
Cycle    M       I      I      M      B           p16   p17   p18   EC
0        ld4.s                        br.wtop     1     0     0     2
1        ld4.s                        br.wtop     0     1     0     1
2        ld4.s   cmp    chk    st4    br.wtop     0     1     1     1
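The first rows of the trace can be reproduced with a small simulation. The Python sketch below is a simplified model, not the architecture's definition: rotation is done by shifting lists instead of using rename bases, only three rotating registers (r32 to r34) are modeled, memory is a list of words ending in a zero, a load past the end yields None to stand for a deferred exception, and recovery code is not modeled. The function name simulate_wtop and the unconditional_cmp flag are hypothetical names introduced for this sketch.

```python
def simulate_wtop(mem, unconditional_cmp=False):
    """Model the pipelined while loop; return (trace, stored).

    trace holds (p16, p17, p18, EC) as seen just before each br.wtop.
    """
    preds = [1] + [0] * 47       # p16..p63 after mov pr.rot = 1 << 16
    regs = [None, None, None]    # r32, r33, r34 (None stands for a NaT)
    ec, r5, stored, trace = 2, 0, [], []
    while True:
        # ld4.s r32 = [r5],4 : no stage predicate, so it always issues
        regs[0] = mem[r5] if r5 < len(mem) else None
        r5 += 1
        if preds[2]:                          # instructions under (p18)
            assert regs[2] is not None        # chk.s, recovery not modeled
            preds[1] = int(regs[2] != 0)      # cmp.ne p17,p0 = r34,r0
            stored.append(regs[2])            # st4 [r6] = r34,4
        elif unconditional_cmp:
            preds[1] = 0     # a disabled unconditional compare clears p17
        trace.append((preds[0], preds[1], preds[2], ec))
        # (p17) br.wtop: continue while p17 is 1, otherwise drain using EC
        if preds[1]:
            taken, rotate = True, True
        elif ec > 1:
            ec -= 1
            taken, rotate = True, True
        elif ec == 1:
            ec -= 1
            taken, rotate = False, True
        else:
            taken, rotate = False, False
        preds[-1] = 0                         # br.wtop always writes 0 to p63
        if rotate:
            preds = [preds[-1]] + preds[:-1]  # old p16 becomes p17, and so on
            regs = [regs[-1]] + regs[:-1]     # old r32 becomes r33, and so on
        if not taken:
            return trace, stored
```

Calling simulate_wtop([5, 7, 0]) gives a trace whose first three entries match cycles 0 to 2 of Table 5-2. With unconditional_cmp=True, the compare clears p17 in cycle one while it is still disabled, so in this model the loop falls through before any store is performed, which is the startup failure the text describes.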
