General Remarks On Multi-Sample Technique; Data Alignment Techniques - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

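    WALIGNI wR3, wR0, wR1, #2
    WALIGNI wR4, wR0, wR1, #4
    WMAC    wR15, wR9, wR1        ; y(n) +=
    WALIGNI wR5, wR0, wR1, #6
    WMAC    wR14, wR9, wR3        ; y(n+1) +=
    WLDRD   wR1, [R1], #8         ; even groups of 4 inputs
    WMAC    wR13, wR9, wR4        ; y(n+2) +=
    WLDRD   wR8, [R2], #8         ; even groups of 4 coeff.
    WMAC    wR12, wR8, wR5        ; y(n+3) +=
    BNE     Inner_Loop

    ; ** Outer loop code calculates the last four taps for
    ; y(n), y(n+1), y(n+2), y(n+3)**
    ; ** Store results
    BNE     Outer_Loop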

4.4.2.1 General Remarks on Multi-Sample Technique

In the real block FIR filter example, four outputs are computed simultaneously in the same inner loop. The coefficients and sample data loaded into registers for the first output are therefore reused for the next three outputs. The interleave factor is set at k=2, which eliminates load-to-use stalls. The throughput for the sequence is 20 cycles for every 32 taps, or 0.625 cycles per tap. This represents near-ideal saturation of the execution resources.

The multi-sample technique may be applied whenever the same data is used for multiple calculations. The large register file of Intel® Wireless MMX™ Technology facilitates this approach, and a number of variations are possible.
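The reuse pattern described above can be sketched in scalar C. This is an illustrative model only, not the manual's code: the function names `fir_single` and `fir_block4`, the 32-bit accumulators, and the `x[n + k]` indexing convention are all assumptions made for the example. The point it shows is that each coefficient is loaded once and feeds four output accumulators, which is what the four WMAC accumulators in the inner loop do with packed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference form: one output per pass, so every tap reloads c[k] and x[n + k]. */
static int32_t fir_single(const int16_t *x, const int16_t *c,
                          size_t taps, size_t n)
{
    int32_t acc = 0;
    for (size_t k = 0; k < taps; k++)
        acc += (int32_t)c[k] * x[n + k];
    return acc;
}

/* Multi-sample form (hypothetical helper): outputs y[n..n+3] share each
 * coefficient load. x must hold at least n_out + taps - 1 samples and
 * n_out must be a multiple of 4. */
static void fir_block4(const int16_t *x, const int16_t *c, size_t taps,
                       int32_t *y, size_t n_out)
{
    assert(n_out % 4 == 0);
    for (size_t n = 0; n < n_out; n += 4) {
        int32_t y0 = 0, y1 = 0, y2 = 0, y3 = 0;
        for (size_t k = 0; k < taps; k++) {
            int32_t ck = c[k];          /* loaded once, used four times */
            y0 += ck * x[n + k];
            y1 += ck * x[n + k + 1];
            y2 += ck * x[n + k + 2];
            y3 += ck * x[n + k + 3];
        }
        y[n]     = y0;
        y[n + 1] = y1;
        y[n + 2] = y2;
        y[n + 3] = y3;
    }
}
```

In this sketch the four accumulators stay in registers for the whole inner loop, so the number of coefficient loads per output drops by a factor of four relative to `fir_single`. The same idea extends to other block sizes as long as enough registers are available.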

4.4.3 Data Alignment Techniques

The data parallelism present in multimedia algorithms is exploited by executing the same operation on different elements in parallel. This is done by packing several data elements into a single register and using the packed data instructions provided by the Intel® Wireless MMX™ Technology.
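As a rough illustration of packing, the following C sketch models a 64-bit register holding four 16-bit elements and a lane-wise wrapping add over it. The helper names (`pack4_u16`, `lane_u16`, `add4_u16`) are hypothetical and the lane ordering (lane 0 in the low bits) is an assumption; on the processor a single packed instruction would do the work of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 16-bit elements into one 64-bit word, lane 0 in the low bits. */
static uint64_t pack4_u16(uint16_t e0, uint16_t e1, uint16_t e2, uint16_t e3)
{
    return (uint64_t)e0 | ((uint64_t)e1 << 16) |
           ((uint64_t)e2 << 32) | ((uint64_t)e3 << 48);
}

/* Extract one 16-bit lane (0..3) from a packed word. */
static uint16_t lane_u16(uint64_t v, unsigned lane)
{
    assert(lane < 4);
    return (uint16_t)(v >> (16 * lane));
}

/* Lane-wise wrapping add: each lane is summed independently, so a carry
 * out of one lane never reaches its neighbour. */
static uint64_t add4_u16(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t s = (uint16_t)(lane_u16(a, lane) + lane_u16(b, lane));
        r |= (uint64_t)s << (16 * lane);
    }
    return r;
}
```

Adding the packed values (0xFFFF, 2, 3, 4) and (2, 20, 30, 40) with `add4_u16` gives lanes (1, 22, 33, 44): lane 0 wraps around without disturbing lane 1.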

An important guideline for achieving optimum performance is always to align memory references. This means that an N-byte memory read or write should always be on an N-byte boundary. In some cases it is easy to align data so that all of the reads and writes are aligned. In other cases it is more difficult because an algorithm naturally reads data in a misaligned fashion. Two examples are the single-sample FIR filter and video motion estimation.
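The alignment rule, and why a single-sample FIR breaks it, can be checked with a small C sketch. The helpers `is_aligned` and `aligned_window_reads` are hypothetical names, and the model assumes 16-bit samples read through 8-byte loads whose start address advances by one sample per output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when an access of `size` bytes at `addr` sits on a `size`-byte
 * boundary. `size` must be a power of two. */
static int is_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Count how many 8-byte window reads are aligned when the window starts
 * at sample n, for n = 0 .. count-1, with 16-bit samples. */
static size_t aligned_window_reads(uintptr_t base, size_t count)
{
    size_t hits = 0;
    for (size_t n = 0; n < count; n++)
        hits += (size_t)is_aligned(base + n * sizeof(int16_t), 8);
    return hits;
}
```

Under these assumptions, with an 8-byte-aligned base only every fourth window start is aligned, so three out of four reads would need some form of realignment.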

The Intel® Wireless MMX™ Technology provides a mechanism for reducing the overhead associated with the classes of algorithms which require data to be accessed on 32-bit, 16-bit, or 8-bit boundaries. The WALIGNI instruction is useful when the sequence of alignments is known beforehand, as with the single-sample FIR filter. The WALIGNR instruction is useful when the sequence of alignments is calculated as the algorithm executes, as with the fast motion search algorithms used in video compression. Both instructions operate on register pairs, which may be effectively ping-ponged with alternate loads, reducing the alignment overhead significantly.
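The register-pair mechanism can be modelled in C as follows. This sketch assumes that the alignment instruction extracts 64 bits at a given byte offset from the concatenation of its two source registers, with the first source holding the lower-addressed bytes; the instruction reference remains the authoritative definition. The names `align_extract`, `load64_le`, and `stream_windows` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of an alignment extract: treat hi:lo as one 128-bit value and
 * return the 64 bits that start byte_offset bytes above the bottom of lo. */
static uint64_t align_extract(uint64_t lo, uint64_t hi, unsigned byte_offset)
{
    assert(byte_offset < 8);
    if (byte_offset == 0)
        return lo;                  /* avoid an undefined 64-bit shift */
    return (lo >> (8 * byte_offset)) | (hi << (64 - 8 * byte_offset));
}

/* Build a 64-bit word from 8 bytes in little-endian order, independent of
 * the host byte order. Stands in for one aligned 64-bit load. */
static uint64_t load64_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Produce n_words windows, each starting `offset` bytes into successive
 * 8-byte words of buf. Every iteration issues one new load and reuses the
 * previous word. buf must hold at least 8 * (n_words + 1) bytes. */
static void stream_windows(const uint8_t *buf, size_t n_words,
                           unsigned offset, uint64_t *out)
{
    uint64_t lo = load64_le(buf);
    for (size_t i = 0; i < n_words; i++) {
        uint64_t hi = load64_le(buf + 8 * (i + 1));
        out[i] = align_extract(lo, hi, offset);
        lo = hi;    /* in assembly the two registers would swap roles */
    }
}
```

The `lo = hi` step is where the ping-pong shows up: in hand-written assembly the copy would typically be avoided by unrolling the loop twice and alternating which register of the pair receives the next load.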

Intel® PXA27x Processor Family Optimization Guide



