Intel i86W Manual page 176

PROGRAMMING EXAMPLES
// SINGLE-PRECISION VECTOR SUM
// input:  r16 - vector address, r17 - vector size (must be > 5)
// output: f16 - sum of vector elements

          fld.d       r0(r16),   f20          // Load first two elements
          mov         -2,        r21          // Loop decrement for bla
// Initiate entry into dual-instruction mode
          d.pfadd.ss  f0,   f0,   f0          // Clear adder pipe (1)
          adds        -6,   r17,  r17         // Decrement size by 6
// Enter into dual-instruction mode
          d.pfadd.ss  f0,   f0,   f0          // Clear adder pipe (2)
          bla         r21,  r17,  L1          // Initialize LCC
          d.pfadd.ss  f0,   f0,   f0          // Clear adder pipe (3)
          fld.d       8(r16)++,  f22          // Load 3rd and 4th elements
L1::
          d.pfadd.ss  f20,  f30,  f30         // Add f20 to pipeline
          bla         r21,  r17,  L2          // If more, go to L2 after
          d.pfadd.ss  f21,  f31,  f31         //   adding f21 to pipeline and
          fld.d       8(r16)++,  f20          //   loading next f20:f21
// If we reach this point, at least one element remains to be loaded.
// r17 is either -4 or -3.
// f20, f21, f22, and f23 still contain vector elements.
// Add f20 and f22 to the pipeline, too.
          d.pfadd.ss  f20,  f30,  f30
          br          S                       // Exit loop after adding
          d.pfadd.ss  f21,  f31,  f31         //   f21 to the pipeline
          nop
L2::
          d.pfadd.ss  f22,  f30,  f30         // Add f22 to pipeline
          bla         r21,  r17,  L1          // If more, go to L1 after
          d.pfadd.ss  f23,  f31,  f31         //   adding f23 to pipeline and
          fld.d       8(r16)++,  f22          //   loading next f22:f23
// If we reach this point, at least one element remains to be loaded.
// r17 is either -4 or -3.
// f20, f21, f22, and f23 still contain vector elements.
// Add f20 and f21 to the pipeline, too.
          d.pfadd.ss  f20,  f30,  f30
          nop
          d.pfadd.ss  f21,  f31,  f31
          nop
S::
          pfadd.ss    f22,  f30,  f30         // Initiate exit from dual mode
          mov         -4,   r21               // Still in dual mode
          pfadd.ss    f23,  f31,  f31         // Last dual-mode pair
          bte         r21,  r17,  DONE        // If there is one more
          fld.l       8(r16)++,  f20          //   element, load it and
          pfadd.ss    f20,  f30,  f30         //   add to pipeline
// Intermediate results are sitting in the adder pipeline.
// Let A1:A2:A3 represent the current pipeline contents
DONE::
          pfadd.ss    f0,   f0,   f30         // 0:A1:A2      f30=A3
          pfadd.ss    f30,  f31,  f31         // A2+A3:0:A1   f31=A2
          pfadd.ss    f0,   f0,   f30         // 0:A2+A3:0    f30=A1
          pfadd.ss    f0,   f0,   f0          // 0:0:A2+A3
          pfadd.ss    f0,   f0,   f31         // 0:0:0        f31=A2+A3
          fadd.ss     f30,  f31,  f16         // f16 = A1+A2+A3
Example 9-12. Dual-Instruction Mode
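
The data flow of the listing can be checked with a small Python model of the adder pipeline. This is only a sketch under stated assumptions, not a description of the hardware: it assumes a three-stage adder pipe, that f0 always reads as zero and discards writes, and that the destination register receives the outgoing last-stage result before the source operands are read (the behavior implied by the pipeline comments after DONE). The class `AdderPipe` and the function `vector_sum` are hypothetical names. The loop control (bla, bte, LCC) and the dual-instruction pairing are not modeled, and f20 is reused as the only staging register for simplicity.

```python
class AdderPipe:
    """Hypothetical model of a three-stage pipelined adder."""

    def __init__(self):
        # Stage order: [newest, middle, oldest]. Start with arbitrary junk
        # so that the three "clear adder pipe" steps are actually needed.
        self.stages = [7.0, 8.0, 9.0]
        # f30 and f31 start with arbitrary stale values on purpose.
        self.regs = {"f30": 123.0, "f31": 456.0}

    def read(self, name):
        # Assumption: f0 always reads as zero.
        return 0.0 if name == "f0" else self.regs[name]

    def pfadd(self, src1, src2, dest):
        # The oldest stage leaves the pipe and is stored in dest first
        # (writes to f0 are discarded) ...
        out = self.stages.pop()
        if dest != "f0":
            self.regs[dest] = out
        # ... then src1 + src2 enters the first stage. Assumption: when
        # dest is also a source, the value just stored is what gets read.
        self.stages.insert(0, self.read(src1) + self.read(src2))


def vector_sum(values):
    pipe = AdderPipe()

    # Clear adder pipe (1), (2), (3).
    for _ in range(3):
        pipe.pfadd("f0", "f0", "f0")

    # Feed elements, alternating the accumulator between f30 and f31
    # as the listing does. This keeps three partial sums in flight.
    for i, x in enumerate(values):
        acc = "f30" if i % 2 == 0 else "f31"
        pipe.regs["f20"] = x
        pipe.pfadd("f20", acc, acc)

    # Drain sequence from DONE; comments show the pipe as newest:middle:oldest.
    pipe.pfadd("f0", "f0", "f30")    # 0:A1:A2      f30=A3
    pipe.pfadd("f30", "f31", "f31")  # A2+A3:0:A1   f31=A2
    pipe.pfadd("f0", "f0", "f30")    # 0:A2+A3:0    f30=A1
    pipe.pfadd("f0", "f0", "f0")     # 0:0:A2+A3
    pipe.pfadd("f0", "f0", "f31")    # 0:0:0        f31=A2+A3

    # fadd.ss f30, f31, f16
    return pipe.regs["f30"] + pipe.regs["f31"]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(vector_sum(data), sum(data))
```

Under these assumptions the stale contents of f30 and f31 never reach the result, because each accumulator is overwritten with a zero from the cleared pipe before it is first read. The test values are small whole numbers so that the reordered additions give exactly the same floating-point result as a sequential sum.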
