Intel Itanium Architecture Software Developer's Manual, Volume 1, Revision 2.3

predicate for the odd iteration is in predicate register X, the stage predicate for the
even iteration is in predicate register X-1. The pseudo-code to implement this pipeline
assuming an unknown trip count is shown below:
        add   r15 = r5,4
        add   r18 = r8,4
        mov   lc = r2                // LC = loop count - 1
        mov   ec = 4                 // EC = epilog stages + 1
        mov   pr.rot=1<<16;;         // PR16 = 1, rest = 0
L1:
(p16)   ld4   r33 = [r5],8           // Cycle 0 odd iteration
(p18)   add   r39 = r35,r38          // Cycle 0 odd iteration
(p17)   add   r38 = r34,r37          // Cycle 0 even iteration
(p16)   ld4   r36 = [r8],8           // Cycle 0 odd iteration
        br.cexit.spnt L3;;           // Cycle 0
(p16)   ld4   r33 = [r15],8          // Cycle 1 even iteration
(p16)   ld4   r36 = [r18],8;;        // Cycle 1 even iteration
(p19)   st4   [r6] = r40,8           // Cycle 2 odd iteration
(p18)   st4   [r16] = r39,8          // Cycle 2 even iteration
L2:     br.ctop.sptk L1;;            // Cycle 2
L3:

Notice that the stages are not equal in length. Stages 1 and 3 are one cycle each, and
stages 2 and 4 are two cycles each. Also, the length of the epilog phase varies with the
trip count. If the trip count is odd, the number of epilog stages is three, starting after
the br.cexit and ending at the br.ctop. If the trip count is even, the number of epilog
stages is two, starting after the br.ctop and ending at the br.ctop. The EC must be set
to account for the maximum number of epilog stages. Thus for this example, EC is
initialized to four. When the trip count is even, one extra epilog stage is executed and
br.cexit L3 is taken. All of the stage predicates used during the extra epilog stage are
equal to 0, so nothing is executed.

The extra epilog stage for even trip counts can be eliminated by setting the target of
the br.cexit branch to the next sequential bundle and initializing EC to three as shown
below:

        add   r15 = r5,4
        add   r18 = r8,4
        mov   lc = r2                // LC = loop count - 1
        mov   ec = 3                 // EC = epilog stages + 1
        mov   pr.rot=1<<16;;         // PR16 = 1, rest = 0
L1:
(p16)   ld4   r33 = [r5],8           // Cycle 0 odd iteration
(p18)   add   r39 = r35,r38          // Cycle 0 odd iteration
(p17)   add   r38 = r34,r37          // Cycle 0 even iteration
(p16)   ld4   r36 = [r8],8           // Cycle 0 odd iteration
        br.cexit.spnt L4;;           // Cycle 0
L4:
(p16)   ld4   r33 = [r15],8          // Cycle 1 even iteration
(p16)   ld4   r36 = [r18],8;;        // Cycle 1 even iteration
(p19)   st4   [r6] = r40,8           // Cycle 2 odd iteration
(p18)   st4   [r16] = r39,8          // Cycle 2 even iteration
L2:     br.ctop.sptk L1;;            // Cycle 2
L3:

Volume 1, Part 2: Software Pipelining and Loop Support 1:199
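The effect of the EC setting and of the br.cexit target on the number of epilog stages can be checked with a small model. The Python sketch below is illustrative only and is not part of the manual. It assumes the usual counted-loop rules for br.ctop and br.cexit: while LC is nonzero, LC is decremented and a 1 rotates into p16; once LC is zero, EC is decremented and a 0 rotates in; the loop ends when EC reaches zero. The names simulate, stage and counted_branch are hypothetical. Each pair of loads is collapsed into a single "ld" event, and the stage predicates are tagged with the source iteration they belong to, so that the trace shows which iteration each operation serves.

```python
def simulate(trip_count, ec, cexit_targets_next_bundle):
    """Model the loop control of the two-way unrolled pipeline.

    Returns (ops, halves): ops lists (operation, source_iteration) pairs,
    halves counts the branch-delimited half-kernels that were issued.
    """
    lc = trip_count - 1          # mov lc = r2 (LC = loop count - 1)
    stage = [None] * 8           # stage[k] models rotating predicate p(16+k)
    stage[0] = 1                 # mov pr.rot = 1<<16: p16 = 1 for iteration 1
    next_iter = 2
    ops, halves = [], 0

    def counted_branch():
        """LC/EC update shared by br.ctop and br.cexit; True = keep looping."""
        nonlocal lc, ec, next_iter
        if lc != 0:                       # prolog/kernel: start a new iteration
            lc -= 1
            stage.insert(0, next_iter)
            stage.pop()
            next_iter += 1
            return True
        if ec >= 1:                       # epilog: rotate a 0 into p16
            ec -= 1
            stage.insert(0, None)
            stage.pop()
            return ec != 0
        return False                      # EC already 0: no rotation

    while True:
        halves += 1                       # first half: L1 up to br.cexit
        for op, k in (("ld", 0), ("add", 2), ("add", 1)):
            if stage[k] is not None:
                ops.append((op, stage[k]))
        if not counted_branch() and not cexit_targets_next_bundle:
            break                         # br.cexit taken to L3
        halves += 1                       # second half: up to br.ctop
        for op, k in (("ld", 0), ("st", 3), ("st", 2)):
            if stage[k] is not None:
                ops.append((op, stage[k]))
        if not counted_branch():
            break                         # br.ctop falls through to L3
    return ops, halves


# Even trip count, first listing (EC = 4, br.cexit targets L3)
print(simulate(2, 4, False)[1])   # 5 halves: the last one executes nothing
# Even trip count, second listing (EC = 3, br.cexit targets L4)
print(simulate(2, 3, True)[1])    # 4 halves: no empty stage
```

Under these assumptions every source iteration receives exactly one load, one add and one store in both variants. For an even trip count the first variant issues one extra, empty half-kernel, which is the extra epilog stage the text describes.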
