Effective Use Of Addressing Modes; Cache And Prefetch Optimizations - Intel PXA255 User Manual

Xscale microarchitecture

Optimization Guide
Multiplication by an integer constant that can be expressed as (2^n + 1)·(2^m) can similarly be optimized as:
;Multiplication of r0 by an integer constant that can be
;expressed as (2^n+1)*(2^m)
add     r0, r0, r0, LSL #n
mov     r0, r0, LSL #m
Please note that the above optimization should only be used in cases where the multiply operation
cannot be advanced far enough to prevent pipeline stalls.
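As a rough cross-check of the shift-and-add sequence, the same arithmetic can be written in C. This is only a sketch: `mul_shift_add` is a hypothetical helper name, not something from the manual, and it assumes both shift counts are in the range 0 to 31 (a C shift by 32 or more is undefined).

```c
#include <stdint.h>

/* Hypothetical helper (not from the manual): multiplies x by
 * (2^n + 1) * 2^m using one add and two shifts.
 * Assumes 0 <= n <= 31 and 0 <= m <= 31. The result wraps
 * modulo 2^32, as a 32-bit register would. */
static uint32_t mul_shift_add(uint32_t x, unsigned n, unsigned m)
{
    uint32_t t = x + (x << n);  /* add r0, r0, r0, LSL #n : x * (2^n + 1) */
    return t << m;              /* mov r0, r0, LSL #m     : t * 2^m       */
}
```

For example, with n = 2 and m = 1 the constant is (4 + 1) * 2 = 10, so `mul_shift_add(7, 2, 1)` should agree with `7 * 10`.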
Dividing an unsigned integer by an integer constant should be optimized to make use of the shift
operation whenever possible.
;Dividing r0 containing an unsigned value by an integer constant
;that can be represented as 2^n
mov     r0, r0, LSR #n
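The unsigned case can be modelled in C in one line. This is a sketch, not code from the manual: `udiv_pow2` is a hypothetical name, and n is assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Hypothetical helper (not from the manual): divides an unsigned
 * 32-bit value by 2^n with a logical right shift.
 * Assumes 0 <= n <= 31. */
static uint32_t udiv_pow2(uint32_t x, unsigned n)
{
    return x >> n;  /* mov r0, r0, LSR #n */
}
```

For unsigned operands the shift truncates exactly as integer division does, so `udiv_pow2(x, n)` should equal `x / (1u << n)` for every x.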
Dividing a signed integer by an integer constant should be optimized to make use of the shift
operation whenever possible.
;Dividing r0 containing a signed value by an integer constant
;that can be represented as 2^n
mov     r1, r0, ASR #31
add     r0, r0, r1, LSR #(32 - n)
mov     r0, r0, ASR #n
The add instruction would stall for 1 cycle. The stall can be prevented by scheduling another,
independent instruction before the add.
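The reason the signed sequence needs three instructions is that an arithmetic shift alone rounds toward negative infinity, whereas integer division rounds toward zero. Adding 2^n - 1 to negative dividends before the shift corrects this. The C sketch below models each step; `asr32` and `sdiv_pow2` are hypothetical names, n is assumed to be in the range 1 to 31, and the arithmetic shift is emulated by hand because C leaves right shifts of negative values implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: portable 32-bit arithmetic shift right.
 * Assumes 0 <= n <= 31. */
static int32_t asr32(int32_t v, unsigned n)
{
    if (v < 0) {
        /* ~v is non-negative, so the logical shift is safe;
         * complementing again restores the sign bits. */
        return ~(int32_t)(~(uint32_t)v >> n);
    }
    return (int32_t)((uint32_t)v >> n);
}

/* Hypothetical helper: signed division by 2^n, rounding toward zero.
 * Assumes 1 <= n <= 31. */
static int32_t sdiv_pow2(int32_t x, unsigned n)
{
    /* mov r1, r0, ASR #31 : 0 if x >= 0, all ones if x < 0 */
    uint32_t sign = (x < 0) ? 0xFFFFFFFFu : 0u;

    /* add r0, r0, r1, LSR #(32 - n) : add 2^n - 1 only when x < 0 */
    int32_t biased = x + (int32_t)(sign >> (32u - n));

    /* mov r0, r0, ASR #n */
    return asr32(biased, n);
}
```

For example, `asr32(-7, 1)` gives -4, while `sdiv_pow2(-7, 1)` gives -3, which matches `-7 / 2` in C.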
A.3.5 Effective Use of Addressing Modes

The Intel® XScale™ core provides a variety of addressing modes that make indexing an array of
objects highly efficient. For a detailed description of these addressing modes please refer to the
ARM* Architecture Reference Manual.
The following code samples illustrate how various kinds of array operations can be optimized to
make use of these addressing modes:
;Set the contents of the word pointed to by r0 to the value
;contained in r1 and make r0 point to the next word
str     r1, [r0], #4
;Increment the contents of r0 to make it point to the next word
;and set the contents of the word it points to to the value
;contained in r1
str     r1, [r0, #4]!
;Set the contents of the word pointed to by r0 to the value
;contained in r1 and make r0 point to the previous word
str     r1, [r0], #-4
;Decrement the contents of r0 to make it point to the previous
;word and set the contents of the word it points to to the value
;contained in r1
str     r1, [r0, #-4]!
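In C, the four store forms above correspond to the familiar pointer idioms with post- and pre-increment or decrement. The sketch below shows that correspondence; the function names are hypothetical, and whether a given compiler actually emits the single-instruction forms for these expressions is an assumption that depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Each hypothetical helper stores v through p and returns the updated
 * pointer, mirroring the write-back of r0 in the assembly forms. */

/* str r1, [r0], #4   : store, then advance to the next word */
static uint32_t *store_post_inc(uint32_t *p, uint32_t v) { *p++ = v; return p; }

/* str r1, [r0, #4]!  : advance to the next word, then store */
static uint32_t *store_pre_inc(uint32_t *p, uint32_t v)  { *++p = v; return p; }

/* str r1, [r0], #-4  : store, then step back to the previous word */
static uint32_t *store_post_dec(uint32_t *p, uint32_t v) { *p-- = v; return p; }

/* str r1, [r0, #-4]! : step back to the previous word, then store */
static uint32_t *store_pre_dec(uint32_t *p, uint32_t v)  { *--p = v; return p; }
```

A typical use is filling an array in a loop with `*p++ = value;`, which lets the store and the pointer update share one instruction instead of needing a separate add.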
A.4 Cache and Prefetch Optimizations

This chapter considers how to use the various cache memories in all their modes and then examines
when and how to use prefetch to improve execution efficiencies.
A-12 Intel® XScale™ Microarchitecture User's Manual
