Case Study 5: 8X8 Block 1/2X Motion Compensation - Intel PXA270 Optimization Manual


Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

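ldr  r5, [r0], r4            ; r0 = pSrc
ldr  r11, [r0], r4
ldr  r8, [r0], r4
ldr  r12, [r0], r4
; These loads are scheduled to distinct destination registers
and  r6, r5, r9              ; r6->tmp = tmp0 & 0xffff;
orr  r6, r6, r11, lsl #16    ; r6->tmp |= tmp1 << 16;
and  r11, r11, r9, lsl #16   ; r11->tmp1 &= 0xffff0000;
and  r7, r8, r9              ; r7->tmp = tmp0 & 0xffff;
orr  r11, r11, r5, lsr #16   ; r11->tmp1 |= tmp0 >> 16;
orr  r7, r7, r12, lsl #16    ; r7->tmp |= tmp1 << 16;
str  r6, [r1], #4
str  r7, [r1], #4            ; Write Coalesce the two stores
and  r12, r12, r9, lsl #16   ; r12->tmp1 &= 0xffff0000;
orr  r12, r12, r8, lsr #16   ; r12->tmp1 |= tmp0 >> 16;
str  r11, [r10], #4
str  r12, [r10], #4          ; Write Coalesce the two stores
subs r14, r14, #1
bgt  LOOP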

In the code above, the instructions are scheduled to take advantage of write-coalescing of multiple store instructions to the same line. In the following pair, the two stores are combined in a single write-buffer entry and issued as a single write request.

str  r11, [r10], #4          ; Write Coalesce the two stores
str  r12, [r10], #4

This can be exploited either by unrolling the C loop or by explicitly inlining multiple stores that can be combined.

The register rotation technique also allows multiple loads to be outstanding.
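As a rough illustration of the unrolling idea, the loop body can be sketched in C as follows. This is only a sketch under stated assumptions: the function name split_halfwords_unrolled and its parameters are hypothetical, the mask held in r9 is assumed to be 0xffff (inferred from the comments in the listing), and the stride is expressed in 32-bit words rather than bytes. Whether the adjacent stores actually coalesce depends on the compiler output and on the write buffer.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not from the guide: processes two word pairs per
 * iteration so that each output stream receives two adjacent stores. */
void split_halfwords_unrolled(const uint32_t *src, size_t stride_words,
                              uint32_t *lo_out, uint32_t *hi_out,
                              size_t iterations)
{
    for (size_t i = 0; i < iterations; i++) {
        /* Four independent loads into distinct temporaries, so none has
         * to wait for an earlier result before it can be issued. */
        uint32_t a0 = src[0];
        uint32_t a1 = src[stride_words];
        uint32_t b0 = src[2 * stride_words];
        uint32_t b1 = src[3 * stride_words];
        src += 4 * stride_words;

        /* Low halfwords of each pair packed together (assumes mask 0xffff). */
        lo_out[0] = (a0 & 0xffffu) | (a1 << 16);
        lo_out[1] = (b0 & 0xffffu) | (b1 << 16);

        /* High halfwords of each pair packed together. */
        hi_out[0] = (a1 & 0xffff0000u) | (a0 >> 16);
        hi_out[1] = (b1 & 0xffff0000u) | (b0 >> 16);

        /* Each stream advances by two words: the two stores per stream
         * are to consecutive addresses and are candidates for coalescing. */
        lo_out += 2;
        hi_out += 2;
    }
}
```

Keeping the two stores to each stream next to each other in the source gives the compiler the best chance of emitting them back to back, which is what the hand-scheduled assembly does explicitly.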

4.6.5 Case Study 5: 8x8 Block 1/2X Motion Compensation

Bi-linear interpolation is a typical operation in image and video processing applications. For example, motion compensation in video decoding uses the 1/2X interpolation operation. Intel® Wireless MMX™ Technology features can help to accelerate these key applications, and the following code demonstrates how to attain this acceleration. These items are the key issues for optimizing the 1/2X motion compensation:

• Use the WALIGNR instruction for aligning the packed byte array.
• Use the WAVG2BR instruction for calculating the average of bytes.
• Schedule around the load-to-use latency.
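For reference, the arithmetic behind the list above can be sketched in scalar C. This is a minimal sketch, not the guide's implementation: the function names are hypothetical, only the horizontal half-pel case is shown (an assumption, since the text does not say which direction is interpolated), and the rounding average (a + b + 1) >> 1 is used as the scalar counterpart of what WAVG2BR computes for each byte lane.

```c
#include <stdint.h>
#include <stddef.h>

/* Rounding average of two unsigned bytes: (a + b + 1) >> 1.
 * Scalar counterpart of one byte lane of a rounding byte average. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical reference routine: horizontal half-pel interpolation of an
 * 8x8 block. Each output byte is the rounded average of a source byte and
 * its right-hand neighbour, so 9 source bytes are read per row. */
void halfpel_h_8x8(const uint8_t *src, size_t src_stride,
                   uint8_t *dst, size_t dst_stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = avg_round_u8(src[x], src[x + 1]);
        src += src_stride;
        dst += dst_stride;
    }
}

/* Low three bits of the source address: the byte offset of the pointer
 * within an 8-byte doubleword. A SIMD version would use such an offset to
 * decide how two aligned doublewords must be combined (the job WALIGNR
 * does in the guide's code). */
unsigned byte_offset_in_dword(const uint8_t *p)
{
    return (unsigned)((uintptr_t)p & 7u);
}
```

A SIMD version would produce all eight bytes of a row with one packed average; the scalar loop is only meant to pin down the expected result for each byte.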

The following example assembly code is for the 1/2X interpolation:

; Test for special case of aligned (LSBs = 110b and 000b)
; r0 -> pointer to misaligned array.
MOV r5,#7        ; r5 = 0x7
AND r7,r0,r5     ; r7 = 3 LSBs of pSrc (the address in r0)
MOV r12,#4       ; counter

Intel® PXA27x Processor Family Optimization Guide
