Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual page 206

5.5.3.2 Pipelining with Explicit Multiple Exits
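The second approach is to combine the last three instructions in the loop into a
br.cloop instruction and then pipeline the loop. The pipeline using this approach is
shown below:
stage 1:        ld4.s r4 = [r5],4;;             // II = 1
stage 4:        ld4.s r9 = [r4];;
stage 6:        cmp.eq.unc p1,p0 = r9,r7
        (p1)    br.cond exit
                br.cloop L1;;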
There are five speculative stages in this pipeline because a non-speculative decision to
initiate another loop iteration cannot be made until the br.cond and br.cloop are
executed in stage 6. The code to implement this pipeline is shown below assuming a
trip count of 200:
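        mov lc = 204
        mov ec = 1
        mov pr.rot = 1 << 16;;                  // PR16 = 1, rest = 0
L1:
        ld4.s r32 = [r5],4                      // Cycle 0
(p21)   chk.s r38, recovery                     // Cycle 0
(p21)   cmp.eq.unc p1,p0 = r38,r7               // Cycle 0
        ld4.s r36 = [r35]                       // Cycle 0
(p1)    br.cond exit                            // Cycle 0
L2:     br.ctop.sptk L1;;                       // Cycle 0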
When the kernel loop is exited at either the br.cond or the br.ctop, the last source
iteration is complete. Thus, EC is initialized to 1 and there is no explicit epilog block
generated for the early exit. The LC register is initialized to five more than 199
because there are five speculative stages. The purpose of the first five executions of
br.ctop is simply to keep the loop going until the first valid branch predicate is
generated for the br.cond. During each of these executions, LC is decremented, so five
must be added to the LC initialization amount to compensate.
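The LC arithmetic above can be checked with a small counter model. The sketch below is Python, not Itanium code, and it models only what this section relies on: the LC/EC updates performed by br.ctop and the rotation of the stage predicates. The name run_kernel and its parameters are illustrative, and the model assumes no early exit is taken.

```python
def run_kernel(lc, ec, stages=6):
    """Model the loop counters of a br.ctop kernel with no early exit.

    Returns (kernel_executions, executions_with_valid_last_stage).
    Assumptions: p16 starts at 1 (pr.rot = 1 << 16) and the last stage is
    guarded by p(16 + stages - 1), which is p21 for six stages.
    """
    preds = [1] + [0] * (stages - 1)    # preds[0] models p16, preds[-1] models p21
    executions = 0
    valid = 0
    while True:
        executions += 1
        valid += preds[-1]              # stage 6 does real work only when p21 is 1
        if lc != 0:                     # kernel phase: start another source iteration
            lc -= 1
            new_p16 = 1
            taken = True
        else:                           # epilog phase: drain the pipeline
            new_p16 = 0
            taken = ec > 1
            if ec > 0:
                ec -= 1
        if not taken:                   # br.ctop falls through, loop is done
            return executions, valid
        preds = [new_p16] + preds[:-1]  # rotation moves p16 to p17, p17 to p18, ...


print(run_kernel(204, 1))   # (205, 200)
```

With LC = 204 and EC = 1 the kernel runs 205 times. The first five executions have no valid stage 6 work, so exactly 200 source iterations reach the compare and branch.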
A smaller II is achieved with the second approach. This pipelined code will also work if
LC is initialized to 199 and EC is initialized to 6. However, if the early exit is taken, LC
will have been decremented too many times and will need to be adjusted if it is used at
the target of the early exit. If there is any epilog when the early exit is taken, that
epilog must be explicit.
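The difference between the two initializations at the early exit can be made concrete. The following Python sketch is a model, not Itanium code. It assumes the br.cond of source iteration i executes in kernel execution i + 5, before that execution's br.ctop, and that LC stops decrementing once it reaches zero. The function names are hypothetical.

```python
def lc_at_early_exit(lc_init, exit_iteration, stages=6):
    """LC seen at the early-exit target of the pipelined loop.

    exit_iteration is the 1-based source iteration whose br.cond is taken.
    """
    # br.ctop has already run this many times when the br.cond executes.
    ctop_runs = exit_iteration + stages - 2
    return max(lc_init - ctop_runs, 0)   # LC is not decremented below zero


def lc_reference(exit_iteration, trip_count=200):
    """LC an unpipelined br.cloop loop would hold at the same exit."""
    return (trip_count - 1) - (exit_iteration - 1)


i = 10
print(lc_reference(i))             # 190
print(lc_at_early_exit(204, i))    # 190
print(lc_at_early_exit(199, i))    # 185
```

In this model, initializing LC to 204 leaves LC at the value the original loop would have had, while initializing it to 199 leaves it five too low for an exit taken before LC reaches zero.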
5.5.4 Software Pipelining Considerations
In some instances it may not be desirable to pipeline a loop. Software
pipelining increases the throughput of iterations, but may increase the time required to
complete a single iteration. As a result, loops with very small trip counts may
experience decreased performance when pipelined. For example, consider the following
loop:
L1:     ld4 r4 = [r5],4                         // Cycle 0
        ld4 r7 = [r8],4;;                       // Cycle 0
        st4 [r6] = r4,4                         // Cycle 2
        st4 [r9] = r7,4                         // Cycle 2
        br.cloop L1;;                           // Cycle 2
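The trade-off between throughput and single-iteration time can be illustrated with a rough cycle count. The Python sketch below is a model under stated assumptions, not a result from this manual. The unpipelined iteration length of 3 cycles follows from the cycle annotations on the loop (cycles 0 through 2). The pipelined II of 2 and stage count of 2 are hypothetical values chosen only for illustration, and (trip_count + stages - 1) * II is the usual estimate for a kernel-only pipelined loop.

```python
def unpipelined_cycles(trip_count, iteration_length=3):
    """Cycles for the loop as written: every iteration runs start to finish."""
    return trip_count * iteration_length


def pipelined_cycles(trip_count, ii=2, stages=2):
    """Cycles for a pipelined version (ii and stages are assumed values).

    The kernel runs trip_count + stages - 1 times, each taking ii cycles.
    """
    return (trip_count + stages - 1) * ii


for n in (1, 2, 3, 100):
    print(n, unpipelined_cycles(n), pipelined_cycles(n))
# 1 3 4
# 2 6 6
# 3 9 8
# 100 300 202
```

Under these assumptions the pipelined loop is slower for a trip count of 1, breaks even at 2, and wins from 3 onward. The crossover point moves with the actual II and stage count.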
