Case Study 3: Dot Product - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
return;
}
The optimized assembly is shown below:
Start
; Set up code ...
; Get wColor for each pixel arranged into hi and lo half-words
; of register so that multiple pixels can be written out
orr r4,r1,r1,LSL #16
; code to check alignment of source and destination
; and code to handle end cases
; setup counters etc. is not shown
; Optimized loop may look like ...
LOOP
str r4,[r0],#4 ; inner loop that fills destination scan line
str r4,[r0],#4 ; pointed by r0
str r4,[r0],#4 ; these stores take advantage of write coalescing
str r4,[r0],#4
str r4,[r0],#4 ; writing out as words
str r4,[r0],#4 ; instead of bytes or half-words
str r4,[r0],#4 ; achieves optimum performance
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
str r4,[r0],#4
subs r5,r5,#1 ; each loop iteration fills 32 units (16-bit WORDs)
bne LOOP
If the destination is internal memory, the same code offers even greater throughput.
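The packing step and the word-wide stores in the loop above can be sketched in C as follows. This is a minimal illustration, not the manual's code: the names pack_color and fill_scanline are hypothetical, and the sketch assumes a 4-byte-aligned destination and an even pixel count, since the alignment checks and end cases are not shown in the listing either.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replicate one 16-bit pixel into both halves of a
 * 32-bit word, mirroring "orr r4,r1,r1,LSL #16". */
static uint32_t pack_color(uint16_t color)
{
    return (uint32_t)color | ((uint32_t)color << 16);
}

/* Hypothetical helper: fill n_pixels 16-bit pixels using 32-bit stores.
 * Assumption: dst is 4-byte aligned and n_pixels is even. Alignment
 * checks and end-case handling are left out, as in the listing. */
static void fill_scanline(uint32_t *dst, uint16_t color, size_t n_pixels)
{
    uint32_t packed = pack_color(color);
    size_t n_words = n_pixels / 2;   /* two pixels per 32-bit word */

    for (size_t i = 0; i < n_words; i++)
        dst[i] = packed;             /* one word store writes two pixels */
}
```

Each word store writes two pixels, so a 32-pixel run needs 16 stores, which matches the 16 str instructions per loop iteration in the assembly.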
4.6.3 Case Study 3: Dot Product

The dot product is a typical vector operation in signal processing and graphics applications. For
example, vertex transformation in graphics is built from dot products. Intel® Wireless MMX™
Technology features can help accelerate these applications. The following code demonstrates how
to attain this acceleration. The key issues in optimizing the dot-product code are:
Use LDRD if input is aligned
Use the 2-cycle WMAC instruction
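As a point of reference for the operation being optimized, here is a portable scalar sketch in C. It is not Wireless MMX code and not the manual's listing. The function name dot_product_s16 is hypothetical, the element type (signed 16-bit) and the 64-bit accumulator are assumptions, and the grouping into four products per step only illustrates the work a multiply-accumulate over four 16-bit lanes would cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine: dot product of two vectors of signed
 * 16-bit elements. Assumption: n is a multiple of 4. */
static int64_t dot_product_s16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;   /* 64-bit accumulator, wide enough for long vectors */

    for (size_t i = 0; i < n; i += 4) {
        /* Four 16x16-bit products per step; each fits in 32 bits. */
        acc += (int32_t)a[i]     * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
        acc += (int32_t)a[i + 2] * b[i + 2];
        acc += (int32_t)a[i + 3] * b[i + 3];
    }
    return acc;
}
```

A scalar version like this can be used to check the results of an optimized routine on the same inputs.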
Intel® PXA27x Processor Family Optimization Guide
This manual is also suitable for:

PXA271, PXA272, PXA273
