Increasing Load Throughput - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
add r8, r8, #4
orr r8, r8, #0xf
; The value in register r6 is not used after this
The Intel XScale® Microarchitecture has four fill buffers that fetch data from external memory
when a data cache miss occurs. The core stalls when all fill buffers are in use, which happens
when more than four loads are outstanding and being fetched from memory. Write code so that no
more than four loads are outstanding at the same time; for example, do not issue more than four
loads back to back. A preload instruction can also occupy a fill buffer, so count outstanding
preloads together with outstanding loads when working out the total. Keeping that total at four
or fewer avoids fill-buffer stalls and improves performance on the PXA27x processor.
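The effect of the four-buffer limit can be sketched with a small C model. This is a simplified
illustration, not a description of the actual hardware: it assumes every load misses the data
cache, that one load can issue per cycle, and that each miss holds a fill buffer for a fixed
number of cycles. The function name count_stall_cycles and the miss_latency parameter are
hypothetical; only the figure of four fill buffers comes from the text above.

```c
#define FILL_BUFFERS 4u  /* from the text: four fill buffers */

/*
 * Hypothetical model: each of n_loads back-to-back loads misses the cache
 * and holds one fill buffer for miss_latency cycles. One load may issue per
 * cycle when a buffer is free. Returns the cycles spent stalled waiting
 * for a free buffer.
 */
unsigned count_stall_cycles(unsigned n_loads, unsigned miss_latency)
{
    unsigned free_at[FILL_BUFFERS] = {0}; /* cycle at which each buffer frees */
    unsigned cycle = 0;
    unsigned stalls = 0;

    for (unsigned i = 0; i < n_loads; i++) {
        /* Find the buffer that becomes free earliest. */
        unsigned best = 0;
        for (unsigned b = 1; b < FILL_BUFFERS; b++) {
            if (free_at[b] < free_at[best]) {
                best = b;
            }
        }
        /* If even that buffer is still busy, the model stalls until it frees. */
        if (free_at[best] > cycle) {
            stalls += free_at[best] - cycle;
            cycle = free_at[best];
        }
        free_at[best] = cycle + miss_latency; /* buffer busy for the miss */
        cycle++;                              /* next load issues a cycle later */
    }
    return stalls;
}
```

Under these assumptions, four back-to-back loads with a 50-cycle miss latency incur no stall,
while a fifth load waits 46 cycles for the first buffer to become free.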
4.3.1.2 Increasing Load Throughput

Load throughput is important for data-intensive applications. On the PXA27x processor, keeping
multiple loads outstanding increases throughput, and register rotation is the way to allow
multiple outstanding loads. The following code allows only one outstanding load at a time because
of the data dependency between each load and the add that follows it. Throughput falls
drastically when a load misses the cache.
Loop:
    ldr r1, [r0], #32    ; r0 points to some initialized memory
    add r2, r2, r1       ; depends on the load into r1 just above
    ldr r1, [r0], #32
    add r2, r2, r1
    ldr r1, [r0], #32
    add r2, r2, r1
    .
    .
    .
    bne Loop
In contrast, the following example rotates through multiple registers as load targets, which
allows multiple loads to be outstanding.
    ldr r1, [r0], #32    ; r0 points to some initialized memory
    ldr r2, [r0], #32
    ldr r3, [r0], #32
    ldr r4, [r0], #32
Loop:
    add r5, r5, r1
    ldr r1, [r0], #32
    add r5, r5, r2
    ldr r2, [r0], #32
    add r5, r5, r3
    ldr r3, [r0], #32
    add r5, r5, r4
Intel® PXA27x Processor Family Optimization Guide
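The register-rotation idea can also be expressed at the C level. The sketch below is an
illustration under stated assumptions rather than code from this guide: the function name
sum_strided_rotated and its parameters are hypothetical, and a compiler is free to schedule the
loads differently. It reads four values into separate variables before adding any of them, so
the four loads do not depend on one another. A stride of 8 words corresponds to a 32-byte step
between loads.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: sums n_items 32-bit words spaced `stride` words apart.
 * Four independent loads are grouped ahead of the adds, so up to four loads
 * can be outstanding at once.
 */
uint32_t sum_strided_rotated(const uint32_t *p, size_t n_items, size_t stride)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Main loop: four loads into distinct variables, then four adds. */
    for (; i + 4 <= n_items; i += 4) {
        uint32_t a = p[(i + 0) * stride];
        uint32_t b = p[(i + 1) * stride];
        uint32_t c = p[(i + 2) * stride];
        uint32_t d = p[(i + 3) * stride];
        sum += a;
        sum += b;
        sum += c;
        sum += d;
    }

    /* Tail: handle the remaining zero to three items one at a time. */
    for (; i < n_items; i++) {
        sum += p[i * stride];
    }
    return sum;
}
```

Grouping by four matches the number of fill buffers, so the main loop never asks for more
outstanding loads than the hardware can track.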
