Lesson 2: Balancing Resources With Dual-Data Paths
The first iteration interval (ii) the compiler attempted was two cycles, because
the Partitioned Resource Bound is two. The reason is visible below in the usage
of the .D units and the .T address paths. Each iteration of this loop requires
two loads (from xptr and yptr) and one store (to w_sum).
Each memory access requires a .D unit for address calculation and a .T address
path to send the address out to memory. Because the 'C6000 has only two .D units
and two .T address paths available on any given cycle (one each on the A side
and the B side), the compiler must place at least two of the three memory
operations on one side (here, the A side). These operations are therefore the
resource bottleneck (highlighted with an *) and the limiting factor in the
Partitioned Resource Bound. The feedback in lesson1_c.asm shows an imbalance in
resources between the A and B sides caused, in this case, by an odd number of
operations being mapped to the two sides of the machine.
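The arithmetic behind that bound can be sketched in a few lines of C. This is a deliberately simplified model, not the compiler's actual algorithm: it assumes a single kind of resource with identical units on each side, and the function name busiest_side_load is hypothetical.

```c
#include <assert.h>

/* Simplified model (an assumption for illustration): 'ops' identical
   operations are split as evenly as possible across 'sides' identical
   resources. The busier side determines the minimum cycles needed. */
int busiest_side_load(int ops, int sides)
{
    assert(ops >= 0 && sides > 0);
    return (ops + sides - 1) / sides;   /* ceiling of ops / sides */
}
```

With three memory accesses and two sides, busiest_side_load(3, 2) is 2, which matches the Partitioned Resource Bound of two reported for this loop.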
Q Is it possible to improve the balance of resources?
A One way to balance an odd number of operations is to unroll the loop. Instead
of three memory accesses per loop iteration, you then have six, which is an even
number and can be split equally between the two sides. You can only do this if
you know that the loop count is a multiple of two; otherwise, the unrolled loop
executes too few or too many iterations. In tutor_d.c, LOOPCOUNT is defined to
be 40, which is a multiple of two, so the loop can be unrolled.
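To make the idea concrete, here is a minimal sketch of what unrolling by two looks like when written by hand. The pointer names xptr, yptr, and w_sum come from the text above; the function names, the weights w1 and w2, and the arithmetic in the loop body are placeholder assumptions, since the body of the original loop is not shown here.

```c
#include <assert.h>

/* Rolled form: two loads (xptr, yptr) and one store (w_sum) per iteration.
   The weighted-sum arithmetic is a placeholder assumption. */
void weighted_sum_rolled(const short *xptr, const short *yptr,
                         short w1, short w2, short *w_sum, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        w_sum[i] = (short)(xptr[i] * w1 + yptr[i] * w2);
    }
}

/* Unrolled by two: four loads and two stores per trip through the loop,
   six memory accesses in total, so they can be split three and three.
   Only valid when n is a multiple of two. */
void weighted_sum_unrolled2(const short *xptr, const short *yptr,
                            short w1, short w2, short *w_sum, int n)
{
    int i;
    assert(n % 2 == 0);   /* precondition: even trip count */
    for (i = 0; i < n; i += 2) {
        w_sum[i]     = (short)(xptr[i]     * w1 + yptr[i]     * w2);
        w_sum[i + 1] = (short)(xptr[i + 1] * w1 + yptr[i + 1] * w2);
    }
}
```

Both functions produce the same output for an even n such as 40. With an odd n, the unrolled form would read and write one element past the end, which is why the multiple-of-two condition matters.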
Q Why did the compiler not unroll the loop?
A In the limited scope of lesson1_c, the loop counter is passed as a parameter
to the function. From this limited view of the function, it might therefore be
any value. To widen that view, you must pass more information to the compiler.
One way to do this is to insert a MUST_ITERATE pragma, which passes iteration
information to the compiler. A MUST_ITERATE pragma generates no code; it is
simply read at compile time so that the compiler can take advantage of
conditions that it could not otherwise assume. In this case, we want to tell
the compiler that the loop will execute a multiple of 2 times; knowing this, the
compiler can unroll the loop automatically.
Unrolling a loop can incur some minor overhead in loop setup. The compiler
does not unroll loops with small loop counts because unrolling may not reduce
the overall cycle count. If the compiler does not know the minimum value of the
loop counter, it will not automatically unroll the loop. Again, this is
information the compiler needs but does not have in the local scope of
lesson1_c. You know that LOOPCOUNT is set to 40, so you can tell the compiler
that N is at least some minimum value. lesson2_c demonstrates how to pass these
two pieces of information: the multiple and the minimum.
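As a rough sketch of how both facts might be passed, the function below places a MUST_ITERATE pragma directly before the loop. The pragma's arguments are the minimum trip count, the maximum trip count (left empty here), and the multiple. The function name lesson2_sketch, the minimum of 20, the loop body, and the __TI_COMPILER_VERSION__ guard are assumptions made for illustration; this is not the lesson2_c source itself.

```c
#include <assert.h>

/* Hypothetical sketch: same placeholder weighted-sum body as an assumption;
   only the pragma placement is the point of the example. */
void lesson2_sketch(const short *xptr, const short *yptr,
                    short w1, short w2, short *w_sum, int N)
{
    int i;

    /* Debug-time check of the promise made to the compiler below.
       The minimum of 20 is an illustrative value. */
    assert(N >= 20 && N % 2 == 0);

#if defined(__TI_COMPILER_VERSION__)
    /* Guard is an assumption so other compilers skip the pragma.
       Arguments: minimum trip count, maximum (omitted), multiple. */
    #pragma MUST_ITERATE(20, , 2)
#endif
    for (i = 0; i < N; i++) {
        w_sum[i] = (short)(xptr[i] * w1 + yptr[i] * w2);
    }
}
```

The pragma is a promise, not a check: if a caller passes an N that is odd or below the stated minimum, the compiler may still generate code that assumes otherwise. The assert is one way to catch such a caller in a debug build.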