5.5.5.2 Conflicts in the ALAT
Using an advanced load to remove a likely invariant load from a loop while advancing
another load inside the loop results in poor performance if the latter load targets a
rotating register. The advanced load that targets the rotating register will eventually
invalidate the ALAT entry for the loop invariant load. Thereafter, every execution of the
check load for the loop invariant load will cause an ALAT miss.
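The eviction described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a description of any real processor: the ALAT is modeled as 32 entries keyed by physical register, the oldest entry is evicted when the table is full, a check load leaves its entry in place (the behavior of the no-clear form of the check load), and the rotating region is taken to be r32-r127. The register r3 chosen for the invariant load and the names `ToyAlat`, `physical`, and `invariant_check_hits` are hypothetical.

```python
from collections import OrderedDict

ALAT_ENTRIES = 32             # assumed capacity, matching the 32-entry figure in this section
ROT_BASE, ROT_SIZE = 32, 96   # assumed rotating region r32-r127


class ToyAlat:
    """Toy ALAT keyed by physical register; evicts the oldest entry when full (assumed policy)."""

    def __init__(self, capacity=ALAT_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()

    def advanced_load(self, phys_reg):
        # A new advanced load replaces any entry already held for the same register.
        self.entries.pop(phys_reg, None)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest entry
        self.entries[phys_reg] = True

    def check_load(self, phys_reg):
        # True means an ALAT hit; the entry is left in place.
        return phys_reg in self.entries


def physical(virtual_reg, rotations):
    """Physical register behind a virtual register name after `rotations` rotations."""
    if virtual_reg < ROT_BASE:
        return virtual_reg                     # static registers do not rotate
    return ROT_BASE + (virtual_reg - ROT_BASE - rotations) % ROT_SIZE


def invariant_check_hits(iterations):
    """Hit/miss history of the check load for a hoisted, loop-invariant advanced load."""
    alat = ToyAlat()
    alat.advanced_load(3)                      # hoisted ld.a into static r3 (hypothetical)
    hits = []
    for k in range(iterations):
        # In-loop advanced load targeting rotating r32: a new physical register each rotation.
        alat.advanced_load(physical(32, k))
        # In-loop check load for the invariant load.
        hits.append(alat.check_load(3))
    return hits
```

In this model the check load for r3 hits on the first 31 iterations; the 32nd in-loop advanced load evicts the invariant entry, and every later check misses. The exact iteration at which this happens on real hardware depends on the ALAT organization and replacement policy, which the model does not attempt to capture.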
When more than one advanced load in the loop targets a rotating register, the registers
must be assigned and the register lifetimes controlled so that the check load for a
particular advanced load X is executed before any of the other advanced loads can
invalidate the entry allocated by load X. For example, the following loop successfully
targets rotating registers with two advanced loads without any ALAT misses because
the two advanced load – check load pairs never create more than 32 simultaneously
live ALAT entries:
L1: (p16) ld4.a r32 = [r8]
    (p31) ld4.c r47 = [r8]
    (p16) ld4.a r48 = [r9]
    (p31) ld4.c r63 = [r9]
          br.ctop L1;;

When the code cannot be arranged to avoid ALAT misses, it may be best to assign static
registers to the destinations of the advanced loads and unroll the loop to explicitly
rename the destinations of the advanced loads where necessary. The following
example shows how to unroll the loop to avoid the use of rotating registers. The loop
has an II equal to 1 and the check load is executed one cycle (and one rotation) after
the advanced load:

L1: (p16) ld4.a r33 = [r8]
    (p17) ld4.c r34 = [r8]
          br.ctop L1;;

Static registers can be assigned to the destinations of the loads if the loop is unrolled
twice:

L1: (p16) ld4.a r3 = [r8]
    (p17) ld4.c r4 = [r8]
          br.cexit L2;;
    (p16) ld4.a r4 = [r8]
    (p17) ld4.c r3 = [r8]
          br.ctop L1;;
L2:       //

Rotating registers could still be used for the values that are not generated by advanced
loads. The effect of this unrolling on instruction cache performance must be considered
as part of the cost of advancing a load.
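The 32-entry budget behind the first loop in this section (the ld4.a r32 / ld4.c r47 and ld4.a r48 / ld4.c r63 pairs) can be checked with a small model. The sketch below is illustrative only and rests on assumptions that are not taken from the manual: a 32-entry table with oldest-first eviction, check loads that leave their entries in place, and entries tagged by (stream, iteration) as a stand-in for the distinct physical registers that rotation provides. The function name `count_check_misses` and its parameters are hypothetical.

```python
from collections import deque


def count_check_misses(check_distance, streams=2, iterations=200, capacity=32):
    """Count check-load misses for `streams` advanced load / check load pairs.

    On every iteration each stream allocates one entry (its ld.a) and then
    checks the entry it allocated `check_distance` rotations earlier (its ld.c).
    """
    alat = deque()                       # oldest entry on the left
    misses = 0
    for k in range(iterations):
        for s in range(streams):
            # Advanced load: allocate, evicting the oldest entry if the table is full.
            if len(alat) >= capacity:
                alat.popleft()
            alat.append((s, k))
            # Check load: look for the entry allocated check_distance rotations ago.
            # Checks that precede their advanced load (pipeline fill) are skipped.
            target = k - check_distance
            if target >= 0 and (s, target) not in alat:
                misses += 1
    return misses
```

Calling `count_check_misses(15)` models the loop as written, where p16 and p31 are 15 rotations apart, and reports no misses: each pair keeps 16 entries live, for 32 in total. Calling `count_check_misses(16)` lengthens each lifetime by one rotation, which exceeds the budget and makes the checks miss in this model.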