Scheduling The Wmul And Wmadd Instructions; Simd Optimization Techniques; Software Pipelining - Intel PXA270 Optimization Manual

Pxa27x processor family


Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

4.3.2.4 Scheduling the WMUL and WMADD Instructions

The issue latency of the WMUL and WMADD instructions is one cycle, and the result and resource latencies are both two cycles. The second WMUL instruction in the following example stalls for one cycle because of the two-cycle resource latency: the multiplier is still occupied by the first WMUL.

WMUL wR0, wR1, wR2

WMUL wR3, wR4, wR5

The WADD instruction in the following example stalls for one cycle because of the two-cycle result latency: it reads wR0, which the preceding WMUL has not yet produced.

WMUL wR0, wR1, wR2

WADD wR1, wR0, wR2
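The two stall cases above can be reproduced with a small in-order issue model. The following C sketch is illustrative only: the `insn` structure, the `count_stalls` function, the `WMUL`/`WADD` macros, and the split into a multiplier unit and an ALU unit are all hypothetical names introduced here, and the one-cycle result and resource latencies used for WADD are an assumption, not a figure taken from the text above.

```c
#include <assert.h>

enum { NUM_WREGS = 16 };                      /* wR0..wR15 */
enum unit { UNIT_MUL, UNIT_ALU, NUM_UNITS };  /* assumed execution resources */

struct insn {
    enum unit unit;        /* resource the instruction occupies */
    int dst, src1, src2;   /* wR register numbers */
    int result_latency;    /* cycles from issue until dst may be read */
    int resource_latency;  /* cycles from issue until the unit is free again */
};

/* WMUL timings follow the text; WADD timings are an assumption of this sketch. */
#define WMUL(d, a, b) { UNIT_MUL, (d), (a), (b), 2, 2 }
#define WADD(d, a, b) { UNIT_ALU, (d), (a), (b), 1, 1 }

static int max_int(int a, int b) { return a > b ? a : b; }

/* Count stall cycles for a single-issue, in-order instruction sequence. */
int count_stalls(const struct insn *seq, int n)
{
    int reg_ready[NUM_WREGS] = {0};  /* first cycle each register may be read */
    int unit_free[NUM_UNITS] = {0};  /* first cycle each unit may accept work */
    int next_slot = 0;               /* earliest cycle the next insn could issue */
    int stalls = 0;

    assert(seq != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        const struct insn *in = &seq[i];
        int issue = next_slot;
        issue = max_int(issue, reg_ready[in->src1]);  /* wait for operands */
        issue = max_int(issue, reg_ready[in->src2]);
        issue = max_int(issue, unit_free[in->unit]);  /* wait for the unit */
        stalls += issue - next_slot;
        reg_ready[in->dst] = issue + in->result_latency;
        unit_free[in->unit] = issue + in->resource_latency;
        next_slot = issue + 1;                        /* issue latency of one */
    }
    return stalls;
}
```

Under this model, each of the two-instruction sequences above reports one stall cycle, while a sequence that places an independent instruction between two WMULs (for example, a WADD on unrelated registers) reports none. That is the scheduling opportunity the latencies imply.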

4.4 SIMD Optimization Techniques

The Single Instruction, Multiple Data (SIMD) architecture provided by Intel® Wireless MMX™ Technology enables us to exploit the inherent parallelism found in a wide range of multimedia and communication applications. The most time-consuming code sequences in these applications have certain characteristics in common:

• Operations are performed on small native data types (8-bit pixels, 16-bit voice, 32-bit audio)

• Regular and recurring memory access patterns, usually data independent

• Localized, recurring computations performed on the data

• Compute-intensive processing
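To make the first characteristic concrete, the following portable C sketch models what a packed 16-bit add does: four independent 16-bit lanes held in one 64-bit word are added in a single operation, with wraparound inside each lane. It is a model of the idea only and does not use any Wireless MMX instruction or intrinsic; the function names `add_packed_u16` and `pack_u16` are hypothetical.

```c
#include <stdint.h>

/* Add four 16-bit lanes packed in a 64-bit word; each lane wraps on overflow
 * and no carry crosses from one lane into the next. */
uint64_t add_packed_u16(uint64_t a, uint64_t b)
{
    const uint64_t HIGH = 0x8000800080008000ULL;   /* top bit of every lane */
    uint64_t low_sum = (a & ~HIGH) + (b & ~HIGH);  /* carries stay inside a lane */
    return low_sum ^ ((a ^ b) & HIGH);             /* add top bits, drop carry-out */
}

/* Pack four 16-bit values, lane 0 in the least significant bits. */
uint64_t pack_u16(uint16_t l0, uint16_t l1, uint16_t l2, uint16_t l3)
{
    return (uint64_t)l0
         | ((uint64_t)l1 << 16)
         | ((uint64_t)l2 << 32)
         | ((uint64_t)l3 << 48);
}
```

A SIMD unit performs this kind of lane-wise operation directly in hardware, which is why workloads built from many small, independent data elements benefit the most.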

In the following sections, we illustrate how the rules for writing fast sequences of Intel® MMX™ Technology instructions on Intel® Wireless MMX™ Technology can be applied to optimizing short loops of Intel® MMX™ Technology code.

4.4.1 Software Pipelining

Software pipelining, or loop unrolling, is a well-known optimization technique in which multiple calculations are executed in each loop iteration. The disadvantages of applying this technique include increased code size for critical loops and restrictions on the minimum number of taps or samples and on the multiples in which they must be supplied.

The obvious advantage is reduced cycle consumption: overhead from loop-exit testing may be reduced; load-use stalls may be minimized and, in some cases, eliminated completely; and instruction scheduling opportunities may be created and exploited.
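The trade-off described above can be seen in a plain C sketch of a 16-bit multiply-accumulate loop, written once directly and once unrolled by four. This is an illustration under stated assumptions, not code from the manual: the function names are hypothetical, the unroll factor of four is arbitrary, and the inputs are assumed small enough that the 32-bit accumulators do not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct form: one multiply-accumulate and one loop-exit test per sample. */
int32_t dot_simple(const int16_t *c, const int16_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)c[i] * x[i];
    return acc;
}

/* Unrolled by four: one loop-exit test per four samples, and four independent
 * accumulators that a scheduler can interleave to hide latencies.
 * Restriction: n must be a multiple of 4. */
int32_t dot_unrolled4(const int16_t *c, const int16_t *x, size_t n)
{
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;

    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        acc0 += (int32_t)c[i]     * x[i];
        acc1 += (int32_t)c[i + 1] * x[i + 1];
        acc2 += (int32_t)c[i + 2] * x[i + 2];
        acc3 += (int32_t)c[i + 3] * x[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

Both functions return the same result for valid inputs. The unrolled version is larger and accepts only lengths that are multiples of four, which corresponds to the code-size and minimum/multiple restrictions noted above.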

To illustrate the need for software pipelining, consider a key kernel of Intel® MMX™ Technology code that is central to many signal-processing algorithms: the real block Finite-Impulse-Response (FIR) filter. A real block FIR filter operates on two real vectors, c(i) and x(i), and produces an output vector y(n). For Intel® MMX™ Technology programming, the vectors are represented as arrays of 16-bit integers of some length N. The real FIR filter is represented by the equation:
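As a reference point for the kernel just described, a direct C rendering of a real block FIR filter might look like the sketch below. It assumes the conventional convolution form, in which each output y(n) is the sum of c(i)·x(n−i) over the filter taps. The function name `fir_block`, the parameter names, the treatment of samples before x(0) as zero, and the unrounded 32-bit outputs are all assumptions of this sketch rather than details taken from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed form: y[n] = sum over i in [0, taps) of c[i] * x[n - i],
 * for n = 0 .. n_samples - 1. Samples before x[0] are treated as zero.
 * Inputs are assumed small enough that the 32-bit accumulator cannot overflow. */
void fir_block(const int16_t *c, size_t taps,
               const int16_t *x, size_t n_samples, int32_t *y)
{
    assert(c != NULL && x != NULL && y != NULL);
    for (size_t n = 0; n < n_samples; n++) {
        int32_t acc = 0;
        /* Stop at i == n so that x[n - i] never indexes before x[0]. */
        for (size_t i = 0; i < taps && i <= n; i++)
            acc += (int32_t)c[i] * x[n - i];
        y[n] = acc;
    }
}
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a convenient sanity check. The inner loop performs one load pair, one multiply-accumulate, and one exit test per tap, which is the overhead that unrolling is meant to reduce.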

Intel® PXA27x Processor Family Optimization Guide


This manual is also suitable for:

Pxa271 Pxa272 Pxa273
