Intel i860 Manual page 181

PROGRAMMING EXAMPLES
// MATRIX MULTIPLY, C = A * B, CACHED AND PIPELINED LOADS MIXED
// Registers loaded by calling routine
A=r16      // pointer into A, stored in memory by rows
B=r17      // pointer into B, stored in memory by columns
C=r18      // pointer into C, stored in memory by rows
L=r19      // the number of rows in A
M=r20      // the number of columns in A and rows in B
N=r21      // the number of columns in B
// Registers used locally
Ap=r29     // temporary pointer into A
RC=r28     // row/column counter decremented by bla for loop control
DEC=r27    // decrementor for row/column pointers
Ar=r26     // counter of rows in A
Bc=r25     // counter of columns in B
Bp=r24     // temporary pointer into B
SIZ=r23    // number of bytes in row of A or column of B
A1=f4;  A2=f5;  A3=f6;  A4=f7;  A5=f8;  A6=f9;  A7=f10; A8=f11    // matrix A row values
B1=f12; B2=f13; B3=f14; B4=f15; B5=f16; B6=f17; B7=f18; B8=f19    // matrix B column values
T1=f20; T2=f21; T3=f22                                            // temporary results
    mov          B, Bp                // Pointer to B
    shl          2, M, SIZ            // Number of bytes in M entries
    adds         -8, r0, DEC          // Set decrementor for bla
    adds         -8, M, RC            // Initialize row/column counter
    d.fiadd.dd   f0, f0, f0           // Initiate dual-instruction mode
    adds         -4, C, C             // Start C index one entry low
    d.fnop                            // First dual-mode pair
    adds         -1, L, Ar            // Make row counter zero relative
    d.fnop                            //
    bla          DEC, RC, start_row   // Initialize LCC
    d.fnop                            //
    mov          A, Ap                // Pointer to A
start_row::                           // Executed once per row of A
    d.pfmul.ss   f0, f0, f0           //
    pfld.d       0(Bp), f0            // Load 2 entries of B into load pipe
    d.pfmul.ss   f0, f0, f0           //
    pfld.d       8(Bp)++, f0          // Load 2 entries of B into load pipe
    d.pfmul.ss   f0, f0, f0           //
    pfld.d       8(Bp)++, f0          // Load 2 entries of B into load pipe
    d.pfadd.ss   f0, f0, f0           //
    fld.q        0(Ap), A1            // Load 4 entries of A
    d.pfadd.ss   f0, f0, f0           //
    pfld.d       8(Bp)++, B1          // Load 2 entries of B
    d.pfadd.ss   f0, f0, f0           //
    adds         -1, N, Bc            // Initialize column counter
    d.fnop                            //
    pfld.d       8(Bp)++, B3          // Load 2 entries of B
inner_loop::                          // Process eight entries from row of A with eight from col of B
    d.m12apm.ss  A1, B1, f0           //
    fld.q        16(Ap)++, A5         // Load 4 entries of A
    d.m12apm.ss  A2, B2, f0           //
    pfld.d       8(Bp)++, B5          // Load 2 entries of B
    d.m12apm.ss  A3, B3, f0           //
    pfld.d       8(Bp)++, B7          // Load 2 entries of B
Example 9-14. Matrix Multiply, Cached and Pipelined Loads (Sheet 1 of 2)
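
As a cross-check on what the listing computes, the following C function is a minimal sketch of the same product, C = A * B, using the storage layout named in the register comments: A is L x M stored by rows, B is M x N stored by columns, and C is L x N stored by rows, all single precision. The function name `matmul_rows_by_cols` and the scalar tail loop are illustrative assumptions, not part of the manual. The grouping of the inner loop into blocks of eight mirrors the comment on `inner_loop`, but the sketch does not model the load pipeline, dual-instruction mode, or the portion of the routine on the second sheet.

```c
#include <stddef.h>

/* Hypothetical reference routine: C = A * B in single precision.
 * A: L rows by M columns, stored by rows.
 * B: M rows by N columns, stored by columns.
 * C: L rows by N columns, stored by rows. */
void matmul_rows_by_cols(const float *A, const float *B, float *C,
                         size_t L, size_t M, size_t N)
{
    for (size_t i = 0; i < L; i++) {
        const float *a_row = A + i * M;      /* start of row i of A */
        for (size_t j = 0; j < N; j++) {
            const float *b_col = B + j * M;  /* start of column j of B */
            float sum = 0.0f;
            size_t k = 0;

            /* Eight entries of the row with eight of the column per pass. */
            for (; k + 8 <= M; k += 8) {
                for (size_t u = 0; u < 8; u++)
                    sum += a_row[k + u] * b_col[k + u];
            }
            /* Assumption: handle any leftover entries one at a time. */
            for (; k < M; k++)
                sum += a_row[k] * b_col[k];

            C[i * N + j] = sum;
        }
    }
}
```

Because both the row of A and the column of B are contiguous in memory, each dot product walks two linear arrays, which is what lets the assembly version advance `Ap` and `Bp` with simple autoincrement addressing.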