Increasing Store Throughput - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
ldr r4, [r0], #32
.
.
.
bne Loop
The modified code not only hides the load-to-use latencies for the cases of cache-hits, but also
increases the throughput by allowing several loads to be outstanding at a time.
Because of the complexity of the PXA27x processor, memory latency can be high, so hiding that
latency is critical. Two rules follow: issue loads as early as possible, and keep up to four loads
outstanding at a time. Another technique for hiding memory latency is to use preloads. The prefetch
technique is mentioned in Chapter 5, "High Level Language Optimization".
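
The advice to issue loads early and keep several outstanding can also be expressed at the C level.
The following is a minimal sketch, not code from this guide: the function name sum_words and its
parameters are hypothetical. It reads four independent words before consuming any of them, which
leaves the compiler free to schedule the four loads back to back. Whether the generated code
actually keeps four loads outstanding depends on the compiler and its options, so the assembly
output is worth inspecting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n 32-bit words starting at src. */
uint32_t sum_words(const uint32_t *src, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;

    /* Main loop: four independent loads are issued before any
     * loaded value is used, so none of them waits on another. */
    for (; i + 4 <= n; i += 4) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];
        total += a + b + c + d;
    }

    /* Tail: handle the remaining zero to three words one at a time. */
    for (; i < n; i++) {
        total += src[i];
    }
    return total;
}
```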

4.3.1.3 Increasing Store Throughput

Increasing store throughput is important in applications that process video while updating the
output to the display. Write coalescing in the PXA27x processor (set by the page table attributes)
combines multiple stores going to the same half of the cache line into a single memory transaction.
This approach increases the bus efficiency and throughput. The coalescing operation is transparent
to software. However, software can cause more frequent coalescing by placing store instructions
targeted to the same cache line next to each other and configuring the target page attributes as
bufferable. For example, this code does not take advantage of coalescing:
add r1, r1, r2
str r1, [r0], #4    ; A separate bus transaction
add r1, r1, r3
str r1, [r0], #4    ; A separate bus transaction
add r1, r1, r4
str r1, [r0], #4    ; A separate bus transaction
add r1, r1, r5
str r1, [r0], #4    ; A separate bus transaction
However, it can be modified to allow coalescing to occur as:
add r1, r1, r2
add r6, r1, r3
add r7, r6, r4
add r8, r7, r5
str r1, [r0], #4
str r6, [r0], #4
str r7, [r0], #4
str r8, [r0], #4    ; All four writes can now coalesce into one transaction
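
The difference between the two listings can be made concrete with a toy model. The sketch below is
not a description of the actual write buffer: it assumes a 32-byte cache line (so a half line is 16
bytes) and a deliberately simplified rule in which a store merges into the previous transaction only
when the instruction immediately before it was also a store to the same half line. The names
struct op and count_transactions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_LINE_BYTES 16u  /* assumption: half of a 32-byte cache line */

/* One entry per instruction. is_store == 0 stands for any non-store
 * instruction, such as the add instructions in the listings. */
struct op {
    int is_store;
    uint32_t addr;  /* only meaningful when is_store != 0 */
};

/* Count bus transactions under the simplified rule: a store starts a new
 * transaction unless the previous instruction was a store to the same
 * 16-byte half line. Real hardware behavior is more involved. */
size_t count_transactions(const struct op *ops, size_t n)
{
    size_t transactions = 0;
    int prev_was_store = 0;
    uint32_t prev_half = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ops[i].is_store) {
            prev_was_store = 0;  /* an intervening instruction ends the run */
            continue;
        }
        uint32_t half = ops[i].addr / HALF_LINE_BYTES;
        if (!(prev_was_store && half == prev_half)) {
            transactions++;
        }
        prev_was_store = 1;
        prev_half = half;
    }
    return transactions;
}
```

Under this model the interleaved add/str sequence counts as four transactions and the grouped
sequence counts as one, provided the four words fall within a single 16-byte half line. If the group
straddles a half-line boundary, the model counts two.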
Section 4.6 contains case studies showing typical functions such as memory fill and zero fill, and
their acceleration with coalescing.
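
As a rough illustration of how a fill routine might be arranged so that its stores have a chance to
coalesce, consider the following sketch. It is not one of the case studies from this guide. The name
fill_words is hypothetical, and the 16-byte grouping assumes a 32-byte cache line whose halves
coalesce separately. The routine first advances to a 16-byte boundary, then writes four adjacent
words per iteration with no other work in between.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes value into n_words 32-bit words at dst. */
void fill_words(uint32_t *dst, uint32_t value, size_t n_words)
{
    size_t i = 0;

    /* Head: single stores until dst + i sits on a 16-byte boundary
     * (assumed half-line size), so each later group stays in one half. */
    while (i < n_words && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i++] = value;
    }

    /* Body: four adjacent word stores back to back, covering one
     * 16-byte half line per iteration. */
    for (; i + 4 <= n_words; i += 4) {
        dst[i]     = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }

    /* Tail: the remaining zero to three words. */
    for (; i < n_words; i++) {
        dst[i] = value;
    }
}
```

The compiler decides the final instruction order, so the generated assembly should be checked. As
noted earlier in this section, the target page must also be configured as bufferable for coalescing
to take place.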
Intel® PXA27x Processor Family Optimization Guide
