Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
The add instruction stalls for one cycle. Prevent this stall by scheduling an independent instruction
before the add instruction.
4.2.6 Effective Use of Addressing Modes

The Intel XScale® Microarchitecture provides a variety of addressing modes that make indexing
an array of objects highly efficient. Refer to the ARM* Architecture Reference Manual for a
detailed description of ARM* addressing modes. The following code samples show how common
array operations can take advantage of these addressing modes:
;Set the contents of the word pointed to by r0 to the value
;contained in r1 and make r0 point to the next word
str     r1, [r0], #4

;Increment the contents of r0 to make it point to the next word
;and set the contents of the word pointed to by r0 to the value
;contained in r1
str     r1, [r0, #4]!

;Set the contents of the word pointed to by r0 to the value
;contained in r1 and make r0 point to the previous word
str     r1, [r0], #-4

;Decrement the contents of r0 to make it point to the previous
;word and set the contents of the word pointed to by r0 to the
;value contained in r1
str     r1, [r0, #-4]!

The first and third samples use post-indexed addressing: the store uses the current value of r0,
and r0 is then updated. The second and fourth samples use pre-indexed addressing with writeback
(the ! suffix): r0 is updated first, and the store uses the new address.

4.3 Instruction Scheduling for Intel XScale® Microarchitecture and Intel® Wireless MMX™ Technology

This section discusses instruction scheduling optimizations. Instruction scheduling is the
rearrangement of a sequence of instructions to minimize pipeline stalls; fewer stalls improve
application performance. When rearranging instructions, ensure that the new sequence has the
same effect as the original sequence.

4.3.1 Instruction Scheduling for Intel XScale® Microarchitecture

4.3.1.1 Scheduling Loads

On the Intel XScale® Microarchitecture, an LDR instruction has a result latency of 3 cycles,
assuming the data being loaded is in the data cache. If the instruction immediately after the LDR
uses the result of the load, it stalls for 2 cycles. Where possible, rearrange the instructions
surrounding the LDR instruction to avoid this stall.
Intel® PXA27x Processor Family Optimization Guide

Advertisement

Table of Contents
loading

This manual is also suitable for: PXA271, PXA272, PXA273