PEs
The PA has 64 processing elements (PEs) that work in parallel. Each PE has the
following structure:
[Figure: PE structure. Recoverable labels: Coefficients, 8 x 8 multiplier*, acc.]
* Can also be configured as two 16 x 8 multipliers or as one 16 x 16 multiplier.
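The relationship between the three multiplier configurations can be illustrated with ordinary arithmetic. The sketch below is not taken from the Oasis documentation: it assumes unsigned operands, and the helper names (mul8, mul16x8, mul16x16) are hypothetical. It only shows that two 8 x 8 products combine into a 16 x 8 product and four combine into a 16 x 16 product; the actual hardware may handle signs and partial-product routing differently.

```python
def mul8(a: int, b: int) -> int:
    """Model of one unsigned 8 x 8 multiplier (hypothetical helper)."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul16x8(a: int, b: int) -> int:
    """16 x 8 product assembled from two 8 x 8 partial products."""
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    # The high byte of a carries a weight of 2**8.
    return mul8(a_lo, b) + (mul8(a_hi, b) << 8)


def mul16x16(a: int, b: int) -> int:
    """16 x 16 product assembled from four 8 x 8 partial products."""
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF
    # Each 16 x 8 half uses two 8 x 8 multipliers, four in total.
    return mul16x8(a, b_lo) + (mul16x8(a, b_hi) << 8)
```

For example, mul16x16(0x1234, 0x5678) gives the same value as 0x1234 * 0x5678 computed directly.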
The PEs operate as a linear array, whereby each consecutive PE works on a
horizontally adjacent result pixel. When performing a neighborhood operation,
each PE can sometimes work on two 16-bit or four 8-bit result pixels at a time;
this is referred to as operating in dual and quad pixel mode, respectively. In
addition, each PE can process 32 packed binary pixels at a time.
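As a rough illustration of what these modes mean for throughput, the sketch below computes how many result pixels the 64 PEs cover in one step for each mode. The PE count and the per-PE pixel counts come from the text above. The mode name "single", the function names, and the assumption that each PE takes a contiguous group of horizontally adjacent pixels are hypothetical and not confirmed by the source.

```python
NUM_PES = 64  # stated in the text

# Result pixels handled by one PE at a time in each mode.
# "single" is a hypothetical name for the one-pixel-per-PE case.
PIXELS_PER_PE = {
    "single": 1,
    "dual": 2,     # two 16-bit result pixels
    "quad": 4,     # four 8-bit result pixels
    "binary": 32,  # 32 packed binary pixels
}


def pixels_per_step(mode: str) -> int:
    """Result pixels covered by the whole linear array in one step."""
    return NUM_PES * PIXELS_PER_PE[mode]


def pe_for_pixel(x: int, mode: str = "single") -> int:
    """PE index for result column x, assuming contiguous pixel groups per PE."""
    return (x // PIXELS_PER_PE[mode]) % NUM_PES
```

Under these assumptions the array covers 64 result pixels per step in the one-pixel case, 128 in dual pixel mode, 256 in quad pixel mode, and 2048 for packed binary data.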
[Figure: PE block diagram. Recoverable labels: inputs SRC0, SRC1, SRC2, SRC3; MAC unit with 8 x 8 multipliers* and 40-bit accumulators (acc); multiplexers (mux); 40-bit register file (R0-R6); ALU (multiply operations: 16-bit x 16-bit; other ALU operations: 40-bit); outputs DEST0, DEST1, DEST2, DEST3.]
Matrox Oasis
[Figure labels: SRC0; Nanocode; Constants; LUT data.]