Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual page 205

5.5.3.1 Converting Multiple Exit Loops to Single Exit Loops
The first approach is to transform the multiple exit loop into a single exit loop. In the source loop,
execution of the add, the second compare and the second branch is guarded by the first
branch. The loop can be transformed into a single exit loop by using predicates to guard
the execution of these instructions and moving the early exit branch out of the loop as
shown below:
    L1:   ld4           r4 = [r5],4;;
          ld4           r9 = [r4];;
          cmp.eq.unc    p1,p2 = r9,r7
          add           r8 = -1,r8;;
    (p2)  cmp.ge.unc    p3,p0 = r8,r0
    (p3)  br.cond       L1;;
    (p1)  br.cond       exit                    // early exit if p1 is 1

The computation of p3 determines if either exit of the source loop would have been
taken. If p3 is zero, the loop is exited and p1 is used to determine which exit was
actually taken. The add is executed speculatively (it is not guarded by p2) to keep the
dependency from the cmp.eq to the add from limiting the II. It is assumed that either
r8 is not live out at the early exit or that compensation code is added at the target of
the early exit. The pipeline for this loop is shown below with the stage predicate
assignments but no other rotating register allocation. The compare and the branch at
the end of stage 4 are not assigned stage predicates because they already have
qualifying predicates in the source loop:

    stage 1:        ld4.s         r4 = [r5],4;;     // II = 2
                    ---                             // empty cycle
    stage 2:        ---                             // empty cycle
                    ld4.s         r9 = [r4];;
    stage 3:        ---                             // empty stage
    stage 4: (p19)  add           r8 = -1,r8
             (p19)  cmp.eq.unc    p1,p2 = r9,r7;;
             (p2)   cmp.ge.unc    p3,p0 = r8,r0
             (p3)   br.cond       L1;;

The code to implement this pipeline is shown below complete with the chk instruction:

          mov           ec = 3
          mov           pr.rot = 1 << 16;;      // PR16 = 1, rest = 0
    L1:   ld4.s         r32 = [r5],4            // Cycle 0
    (p19) chk.s         r36, recovery           // Cycle 0
    (p19) add           r8 = -1,r8              // Cycle 0
    (p19) cmp.eq.unc    p31,p32 = r36,r7;;      // Cycle 0
          ld4.s         r34 = [r33]             // Cycle 1
    (p32) cmp.ge        p18,p0 = r8,r0          // Cycle 1
    L2:
    (p18) br.wtop.sptk  L1;;                    // Cycle 1
    (p32) br.cond       exit                    // early exit if p32 is 1

Note: When the loop is exited, one final rotation occurs, rotating the value in p31 to
p32. Thus, p32 is used as the branch predicate for the early exit branch.
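
The effect of the multiple-exit to single-exit transformation can be checked with a small software model. The sketch below is not Itanium code; it is a hedged Python illustration in which memory is modeled as a list of pointers (ptrs) and a dictionary (mem), and the function names multi_exit_loop and single_exit_loop are hypothetical. The instruction order of the source loop is an assumption inferred from the description above (the early exit branch guards the add, the second compare and the second branch).

```python
def multi_exit_loop(ptrs, mem, r7, r8):
    """Model of the assumed source loop: the early exit guards the add."""
    i = 0
    while True:
        r4 = ptrs[i]            # ld4 r4 = [r5],4  (post-increment modeled by i)
        i += 1
        r9 = mem[r4]            # ld4 r9 = [r4]
        if r9 == r7:            # first compare and early exit branch
            return ("early", i, r8)
        r8 -= 1                 # add r8 = -1,r8 (only reached if no early exit)
        if r8 < 0:              # second compare fails: leave through the loop bottom
            return ("fallthrough", i, r8)


def single_exit_loop(ptrs, mem, r7, r8):
    """Model of the transformed loop: one back branch, predicates pick the exit."""
    i = 0
    while True:
        r4 = ptrs[i]
        i += 1
        r9 = mem[r4]
        p1 = (r9 == r7)         # cmp.eq.unc p1,p2 = r9,r7
        p2 = not p1
        r8 -= 1                 # add executed speculatively (not guarded by p2)
        p3 = p2 and (r8 >= 0)   # (p2) cmp.ge.unc p3,p0: .unc clears p3 when p2 is 0
        if not p3:              # (p3) br.cond L1 is the only loop branch
            break
    # (p1) br.cond exit: p1 tells which exit the source loop would have taken
    return ("early" if p1 else "fallthrough", i, r8)


if __name__ == "__main__":
    ptrs = [10, 20, 30, 40]
    mem = {10: 5, 20: 6, 30: 7, 40: 8}
    print(multi_exit_loop(ptrs, mem, 7, 10))
    print(single_exit_loop(ptrs, mem, 7, 10))
```

In this model both versions take the same exit after the same number of iterations. At the early exit the single-exit version leaves r8 one lower than the source loop, because the add ran speculatively; that difference is what the compensation code at the early exit target would undo when r8 is live out.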
Volume 1, Part 2: Software Pipelining and Loop Support