Case Study 3: Dot Product - Intel PXA270 Optimization Manual

PXA27x Processor Family

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

return;

}

The optimized assembly is shown below:

Start
        ; Set up code ...

        ; Get wColor for each pixel arranged into hi and lo half-words
        ; of register so that multiple pixels can be written out
        orr     r4,r1,r1,LSL #16

        ; code to check alignment of source and destination,
        ; code to handle end cases, and counter setup are not shown

        ; Optimized loop may look like ...
LOOP
        str     r4,[r0],#4      ; inner loop that fills destination scan line
        str     r4,[r0],#4      ; pointed by r0
        str     r4,[r0],#4      ; these stores take advantage of write coalescing
        str     r4,[r0],#4
        str     r4,[r0],#4      ; writing out as words
        str     r4,[r0],#4      ; instead of bytes or half-words
        str     r4,[r0],#4      ; achieves optimum performance
        str     r4,[r0],#4
        subs    r5,r5,#1        ; each pass stores 8 words = 32 bytes (16 16-bit pixels)
        bne     LOOP

If the destination is internal memory, the same code offers even greater throughput.
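For readers who prefer to see the same idea in C, the following is a minimal sketch of the word-fill technique used in the assembly listing, not code from the manual. The function name fill_scanline16 and its parameters are hypothetical. It assumes the destination is already word aligned and the pixel count is even, so the alignment checks and end cases the listing leaves out are also left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an illustration, not from the manual): fill
 * `pixel_count` 16-bit pixels starting at the word-aligned address `dst`
 * with `color`, writing whole 32-bit words instead of half-words. */
static void fill_scanline16(uint32_t *dst, uint16_t color, size_t pixel_count)
{
    /* Assumption of this sketch: an even pixel count, so no trailing
     * half-word store is needed. */
    assert(pixel_count % 2 == 0);

    /* Same packing as `orr r4,r1,r1,LSL #16`: the color is duplicated
     * into the low and high half-words of one 32-bit value. */
    uint32_t packed = (uint32_t)color | ((uint32_t)color << 16);
    size_t words = pixel_count / 2;

    /* Unrolled body: 8 word stores (32 bytes, 16 pixels) per pass,
     * mirroring the eight `str r4,[r0],#4` instructions. */
    while (words >= 8) {
        dst[0] = packed;
        dst[1] = packed;
        dst[2] = packed;
        dst[3] = packed;
        dst[4] = packed;
        dst[5] = packed;
        dst[6] = packed;
        dst[7] = packed;
        dst += 8;
        words -= 8;
    }

    /* Remaining words when the count is not a multiple of 8. */
    while (words > 0) {
        *dst++ = packed;
        words--;
    }
}
```

A call such as fill_scanline16(line, 0xF800, 240) would fill a 240-pixel scan line with one 16-bit color value. Whether a compiler turns this loop into back-to-back word stores like the hand-written listing depends on the toolchain and optimization settings, so the generated assembly is worth inspecting.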

4.6.3