Lesson_C.asm - Texas Instruments TMS320C6000 Programmer's Manual

Example 2–4. lesson_c.asm
A schedule with ii = 10 implies that each iteration of the loop takes ten cycles.
Obviously, with eight resources available every cycle on such a small loop, we
would expect this loop to do better than this.
Q Where are the problems with this loop?
A A closer look at the feedback in lesson_c.asm gives us the answer.
Q Why did the loop start searching for a software pipeline at ii=10 (for a
10–cycle loop)?
A The first iteration interval attempted by the compiler is always the maximum
of the Loop Carried Dependency Bound and the Partitioned Resource Bound.
In this case, the compiler thinks there is a loop carry path of ten cycles:
;* Loop Carried Dependency Bound(^) : 10
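The starting-point rule described above can be sketched in C. This is only an illustration of the stated rule, not the compiler's actual implementation: the function name first_ii_attempt and its parameter names are hypothetical, and the only value taken from the feedback is the loop carried dependency bound of 10.

```c
#include <assert.h>

/* Hypothetical helper: returns the first iteration interval (ii) the
   scheduler would try, i.e. the larger of the two lower bounds. */
int first_ii_attempt(int loop_carried_bound, int partitioned_resource_bound)
{
    /* A loop with no carried dependency has a bound of 0; every loop
       needs at least one cycle of resources. */
    assert(loop_carried_bound >= 0 && partitioned_resource_bound >= 1);

    return loop_carried_bound > partitioned_resource_bound
               ? loop_carried_bound
               : partitioned_resource_bound;
}
```

For example, first_ii_attempt(10, 2) returns 10. The resource bound of 2 is an assumed value chosen for illustration, since the feedback line quoted here reports only the dependency bound. Whenever the dependency bound is the larger of the two, it alone decides where the search starts.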
The ^ symbol is interspersed in the assembly output in the comments of each
instruction in the loop carry path, and is visible in lesson_c.asm.
L2:        ; PIPED LOOP KERNEL
           LDH     .D1T1   *A4++,A0     ; ^ |32|
||         LDH     .D2T2   *B4++,B6     ; ^ |32|
           NOP             2
   [ B0]   SUB     .L2     B0,1,B0      ; |33|
   [ B0]   B       .S2     L2           ; |33|
           MPY     .M1     A0,A5,A0     ; ^ |32|
||         MPY     .M2     B6,B5,B6     ; ^ |32|
           NOP             1
           ADD     .L1X    B6,A0,A0     ; ^ |32|
           SHR     .S1     A0,15,A0     ; ^ |32|
           STH     .D1T1   A0,*A3++     ; ^ |32|
You can also use a dependency graph to analyze feedback, for example:
Q Why is there a dependency between STH and LDH? They do not use any
common registers, so how can there be a dependency?
A If we look at the original C code in lesson_c.c, we see that the LDHs
correspond to loading values from xptr and yptr, and the STH corresponds to
storing values into the w_sum array.
Q Is there any dependency between xptr, yptr, and w_sum?
Lesson 1: Loop Carry Path From Memory Pointers
Compiler Optimization Tutorial
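The question about xptr, yptr, and w_sum can be made concrete with a small C sketch. The loop below is an assumed shape, inferred from the two LDHs, two MPYs, ADD, SHR by 15, and STH in the kernel; it is not a copy of lesson_c.c. The function names and the parameters w1, w2, and n are hypothetical. The two helper functions call the same loop once with a separate output array and once with an output that overlaps the input.

```c
#include <assert.h>

/* Assumed loop shape (hypothetical names): two loads, two multiplies,
   an add, a shift right by 15, and a store per iteration. */
void weighted_sum(const short *xptr, const short *yptr, short *w_sum,
                  short w1, short w2, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int prod_x = xptr[i] * w1;                    /* LDH, MPY */
        int prod_y = yptr[i] * w2;                    /* LDH, MPY */
        w_sum[i] = (short)((prod_x + prod_y) >> 15);  /* ADD, SHR, STH */
    }
}

/* Output in its own array: no store can affect a later load. */
short last_output_separate(void)
{
    short x[4] = {16384, 0, 0, 0};
    short y[4] = {0, 0, 0, 0};
    short out[3] = {0, 0, 0};
    weighted_sum(x, y, out, 16384, 0, 3);
    return out[2];
}

/* Output overlaps the input, shifted by one element: iteration i stores
   the value that iteration i + 1 then loads. */
short last_output_overlapping(void)
{
    short buf[4] = {16384, 0, 0, 0};
    short y[4] = {0, 0, 0, 0};
    weighted_sum(buf, y, buf + 1, 16384, 0, 3);
    return buf[3];
}
```

With these inputs, last_output_separate() returns 0 and last_output_overlapping() returns 2048, because in the overlapping call each store feeds the next iteration's load (16384, 8192, 4096, 2048). Nothing in the function's signature rules out the second kind of call, which is why a dependency from the STH back to the LDHs can exist even though the instructions share no registers.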
