Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual page 194

utilization can be increased by unrolling the loop more times, but at the cost of further
code expansion. The loop below is unrolled four times (assuming the trip count is a
multiple of four):
        add     r15 = 4,r5
        add     r25 = 8,r5
        add     r35 = 12,r5
        add     r16 = 4,r6
        add     r26 = 8,r6
        add     r36 = 12,r6;;
L1:     ld4     r4 = [r5],16          // Cycle 0
        ld4     r14 = [r15],16;;      // Cycle 0
        ld4     r24 = [r25],16        // Cycle 1
        ld4     r34 = [r35],16;;      // Cycle 1
        add     r7 = r4,r9            // Cycle 2
        add     r17 = r14,r9;;        // Cycle 2
        st4     [r6] = r7,16          // Cycle 3
        st4     [r16] = r17,16        // Cycle 3
        add     r27 = r24,r9          // Cycle 3
        add     r37 = r34,r9;;        // Cycle 3
        st4     [r26] = r27,16        // Cycle 4
        st4     [r36] = r37,16        // Cycle 4
        br.cloop L1;;                 // Cycle 4

The two memory ports are now utilized in every cycle except cycle 2. Four iterations are
now executed in five cycles versus the two iterations in four cycles for the previous
version of the loop.

5.3.2 Software Pipelining

Software pipelining is a technique that seeks to overlap loop iterations in a manner that
is analogous to hardware pipelining of a functional unit. Each iteration is partitioned into
stages with zero or more instructions in each stage. A conceptual view of a single
pipelined iteration of the loop from page 1:181 in which each stage is one cycle long is
shown below:

stage 1: ld4 r4 = [r5],4
stage 2: ---                    // empty stage
stage 3: add r7 = r4,r9
stage 4: st4 [r6] = r7,4

The following is a conceptual view of five pipelined iterations:

1     2     3     4     5     Cycle
----------------------------------------------------
ld4                           X
      ld4                     X+1
add         ld4               X+2
st4   add         ld4         X+3
      st4   add         ld4   X+4
            st4   add         X+5
                  st4   add   X+6
                        st4   X+7

The number of cycles between the start of successive iterations is called the initiation
interval (II). In the above example, the II is one. Each stage of a pipelined iteration is II
cycles long. Most of the examples in this chapter utilize modulo scheduling, which is a
particular form of software pipelining in which the II is a constant and every iteration of

Volume 1, Part 2: Software Pipelining and Loop Support                              1:183
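
The five-iteration view in Section 5.3.2 can be reproduced mechanically from the stage list and the II. The following Python sketch is an illustration only and is not part of the manual: the function names (pipeline_schedule, total_cycles, render) are hypothetical, and it assumes that each stage is II cycles long, that iteration k starts k * II cycles after the first, and that an instruction issues in the first cycle of its stage.

```python
def pipeline_schedule(stages, iterations, ii=1):
    """Return {cycle: {iteration: op}} for a loop pipelined with a constant II.

    stages: one entry per stage, a mnemonic string or None for an empty stage.
    Assumption: each stage is `ii` cycles long and an op issues in the first
    cycle of its stage; iteration k (0-based) starts at cycle k * ii.
    """
    schedule = {}
    for it in range(iterations):
        for s, op in enumerate(stages):
            if op is None:            # empty stage issues nothing
                continue
            cycle = (it + s) * ii     # start cycle of stage s in iteration it
            schedule.setdefault(cycle, {})[it + 1] = op
    return schedule


def total_cycles(stages, iterations, ii=1):
    # The last iteration starts at (iterations - 1) * ii and then needs
    # len(stages) stages of ii cycles each to drain.
    return (iterations - 1 + len(stages)) * ii


def render(schedule, iterations, cycles):
    """Format the schedule as one column per iteration plus a cycle label."""
    lines = ["".join(f"{i:<6}" for i in range(1, iterations + 1)) + "Cycle"]
    for c in range(cycles):
        ops = schedule.get(c, {})
        row = "".join(f"{ops.get(i, ''):<6}" for i in range(1, iterations + 1))
        lines.append(row + ("X" if c == 0 else f"X+{c}"))
    return "\n".join(lines)


STAGES = ["ld4", None, "add", "st4"]   # stage 2 is the empty stage
sched = pipeline_schedule(STAGES, iterations=5)
print(render(sched, 5, total_cycles(STAGES, 5)))
```

Under these assumptions the five iterations occupy cycles X through X+7, and in cycles X+3 and X+4 one ld4, one add, and one st4 from three different iterations are in flight together. Changing the ii argument shows how a larger initiation interval spaces the iteration starts further apart.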
