Signed Unpack Example; Interleaved Pack With Saturation Example; Optimizing Libraries For System Performance; Case Study 1: Memory-To-Memory Copy - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
4.5.3 Signed Unpack Example

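The signed unpack replaces the Intel® MMX™ Technology sequence:

Intel® MMX™ Technology Instructions
Input: mm0 : Source Value
PUNPCKHWD mm1, mm0
PUNPCKLWD mm0, mm0
PSRAD     mm0, 16
PSRAD     mm1, 16

Intel® Wireless MMX™ Technology Instructions
Input: wR0 : Source Value
WUNPCKELS wR1, wR0
WUNPCKEHS wR2, wR0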
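The effect of a signed unpack can be modeled in portable C. The following is a minimal sketch, not code from the manual: it assumes a 64-bit register holds four signed 16-bit halfwords with halfword 0 in the least significant bits, and that the low and high unpack instructions sign-extend halfwords 0-1 and 2-3 respectively into two 32-bit words. The function names `unpack_low_signed` and `unpack_high_signed` are hypothetical.

```c
#include <stdint.h>

/* Extract halfword i (0 = least significant) as a signed 16-bit value.
   The uint16_t -> int16_t conversion relies on two's complement wrap,
   which is what mainstream compilers provide. */
static int16_t halfword(uint64_t reg, unsigned i)
{
    return (int16_t)(uint16_t)(reg >> (16u * i));
}

/* Build a 64-bit value from two signed 32-bit words, word 0 in the low bits. */
static uint64_t make_words(int32_t w0, int32_t w1)
{
    return (uint64_t)(uint32_t)w0 | ((uint64_t)(uint32_t)w1 << 32);
}

/* Model of the low signed unpack: sign-extend halfwords 0 and 1. */
uint64_t unpack_low_signed(uint64_t src)
{
    return make_words(halfword(src, 0), halfword(src, 1));
}

/* Model of the high signed unpack: sign-extend halfwords 2 and 3. */
uint64_t unpack_high_signed(uint64_t src)
{
    return make_words(halfword(src, 2), halfword(src, 3));
}
```

Under these assumptions the model produces the same values as the four-instruction sequence, which duplicates each halfword into the upper half of a word and then shifts right arithmetically by 16.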
4.5.4 Interleaved Pack with Saturation Example

This example uses signed words as source operands and the result is interleaved signed halfwords.
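Intel® MMX™ Technology Instructions
Input: mm0 : Source Value 1
       mm1 : Source Value 2
PACKSSDW  mm0, mm0
PACKSSDW  mm1, mm1
PUNPCKLWD mm0, mm1

Intel® Wireless MMX™ Technology Instructions
Input: wR0 : Source Value 1
       wR1 : Source Value 2
WPACKWSS wR2, wR0, wR1
WSHUFH   wR2, wR2, #216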
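The two steps, a saturating pack followed by a halfword shuffle, can be sketched in portable C. This is an illustrative model under stated assumptions rather than the manual's code: each source holds two signed 32-bit words, the pack places the saturated words of the first source in halfwords 0-1 and those of the second source in halfwords 2-3, and the shuffle immediate 216 (binary 11 01 10 00) selects a source halfword for each destination halfword, two bits per destination starting from the lowest bits. The names `saturate16` and `pack_interleave_sat` are hypothetical.

```c
#include <stdint.h>

/* Clamp a signed 32-bit word to the signed 16-bit range. */
static int16_t saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* a and b each hold two signed words; out receives a0, b0, a1, b1
   as saturated signed halfwords. */
void pack_interleave_sat(const int32_t a[2], const int32_t b[2],
                         int16_t out[4])
{
    /* Step 1 (pack with signed saturation): a0, a1, b0, b1. */
    const int16_t packed[4] = {
        saturate16(a[0]), saturate16(a[1]),
        saturate16(b[0]), saturate16(b[1])
    };

    /* Step 2 (shuffle): selector 216 picks packed[0], packed[2],
       packed[1], packed[3] for destination halfwords 0..3. */
    const unsigned selector = 216u;
    for (unsigned i = 0; i < 4; i++) {
        out[i] = packed[(selector >> (2u * i)) & 3u];
    }
}
```

The shuffle is what produces the interleaving; without it the result would hold both halfwords of the first source followed by both halfwords of the second.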
4.6 Optimizing Libraries for System Performance

Many standard C library routines benefit substantially from being tuned for the Intel
XScale® Microarchitecture. The following string and memory manipulation routines are good
candidates:
strcat, strchr, strcmp, strcoll, strcpy, strcspn, strlen, strncat, strncmp, strpbrk, strrchr, strspn,
strstr, strtok, strxfrm, memchr, memcmp, memcpy, memmove, memset
Many critical functions outside the C library can be optimized in the same way. For example,
graphics drivers and graphics applications frequently rely on a small set of key functions, and
these can be optimized for the PXA27x processor. The following sections provide a set of routines
as optimization case studies.
4.6.1 Case Study 1: Memory-to-Memory Copy

The performance of a memory copy (memcpy) is determined by memory-access latency and memory
throughput. If both the source and the destination are in the cache, performance is at its
highest, and simple load-instruction scheduling is enough to keep the copy efficient.
If either the source or the destination is not in the cache, a load-latency-hiding technique
must be applied.
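One common way to hide load latency is software prefetching: requesting data a fixed distance ahead of the bytes currently being copied so that the memory access overlaps with the copy work. The following is a minimal sketch of that idea in C, not the routine from this guide. It assumes a GCC-compatible compiler that provides `__builtin_prefetch`, a 32-byte cache line, and a prefetch distance of two lines; the names `copy_with_prefetch`, `LINE_BYTES`, and `PREFETCH_AHEAD` are hypothetical, and the distance would need tuning on real hardware.

```c
#include <stddef.h>

#define LINE_BYTES     32u   /* assumed cache line size */
#define PREFETCH_AHEAD 64u   /* assumed prefetch distance: two lines */

/* Copy n bytes from src to dst, prefetching the source ahead of use.
   The regions must not overlap, as with memcpy. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= LINE_BYTES) {
#if defined(__GNUC__)
        /* Only prefetch addresses that lie inside the source buffer. */
        if (n > PREFETCH_AHEAD) {
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);
        }
#endif
        /* Copy one line; the prefetch issued above runs in parallel. */
        for (size_t i = 0; i < LINE_BYTES; i++) {
            d[i] = s[i];
        }
        d += LINE_BYTES;
        s += LINE_BYTES;
        n -= LINE_BYTES;
    }

    /* Copy the remaining tail bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

When both buffers are already cached, the prefetch adds little, and a plain scheduled load/store loop is the simpler choice.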
Intel® PXA27x Processor Family Optimization Guide
This manual is also suitable for: Pxa271, Pxa272, Pxa273
