Scheduling Data Processing Instructions - Intel IXP45X Developer's Manual

3.10.5.1.2 Scheduling Load and Store Multiple (LDM/STM)
LDM and STM instructions have an issue latency of 2-20 cycles, depending on the
number of registers being loaded or stored. The issue latency is typically two cycles
plus one additional cycle for each register loaded or stored, assuming a
data cache hit. The instruction following an LDM stalls whether or not it
depends on the results of the load. An LDRD or STRD instruction does not
suffer from this drawback (except when followed by a memory operation) and should
be used where possible. Consider the task of adding two 64-bit integer values. Assume
that the addresses of these values are aligned on an 8-byte boundary. This can be
achieved using LDM instructions as shown below:
    ; r0 contains the address of the first 64-bit value
    ; r1 contains the address of the second 64-bit value
    ldm     r0, {r2, r3}
    ldm     r1, {r4, r5}
    adds    r0, r2, r4
    adc     r1, r3, r5
If the code is written as shown above, and all the accesses hit the cache, the
code takes 11 cycles to complete. Rewriting the code with the LDRD instruction,
as shown below, reduces this to seven cycles. Performance improves further if
other instructions can be placed after the LDRD instructions to hide the stalls
caused by their result latencies.
    ; r0 contains the address of the first 64-bit value
    ; r1 contains the address of the second 64-bit value
    ldrd    r2, [r0]
    ldrd    r4, [r1]
    adds    r0, r2, r4
    adc     r1, r3, r5
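The ADDS/ADC pair used in both versions splits the 64-bit addition into two 32-bit steps linked by the carry flag. The following Python sketch models that arithmetic only; it is an illustration, not code from the manual. The function name `add64_via_adds_adc` is hypothetical, and the low word is assumed to sit at the lower address, as the register usage in the listings implies.

```python
MASK32 = 0xFFFFFFFF


def add64_via_adds_adc(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as (low, high) pairs of 32-bit words."""
    # ADDS: add the low words and keep the carry out of bit 31.
    low_sum = a_lo + b_lo
    carry = low_sum >> 32  # 0 or 1
    # ADC: add the high words plus the carry from the low-word add.
    high_sum = (a_hi + b_hi + carry) & MASK32
    return low_sum & MASK32, high_sum


# Example: 0x00000001_FFFFFFFF + 1 carries from the low word into the high word.
lo, hi = add64_via_adds_adc(0xFFFFFFFF, 0x00000001, 0x00000001, 0x00000000)
print(hex(hi), hex(lo))
```

In the listings, r2/r4 play the role of the low words and r3/r5 the high words, with the result left in r0 (low) and r1 (high).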
Similarly, the code sequence shown below takes five cycles to complete.
    stm     r0, {r2, r3}
    add     r1, r1, #1
The alternative version shown below takes only three cycles to complete.
    strd    r2, [r0]
    add     r1, r1, #1
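As a rough aid for estimating the cost of an LDM or STM, the "two cycles plus one per register" rule stated above can be written as a small Python helper. This is a sketch of the typical cache-hit case only: the function and dictionary names are hypothetical, and the sequence totals are simply the figures quoted in this section, not values derived by the model.

```python
def ldm_stm_issue_latency(num_registers):
    """Typical LDM/STM issue latency in cycles, assuming a data cache hit."""
    if not 1 <= num_registers <= 16:
        raise ValueError("an ARM register list holds 1 to 16 registers")
    # Two cycles, plus one additional cycle per register transferred.
    return 2 + num_registers


# Cycle totals quoted in this section: (LDM/STM version, LDRD/STRD version).
QUOTED_CYCLES = {
    "64-bit add": (11, 7),
    "store then add": (5, 3),
}

for name, (multiple, double) in QUOTED_CYCLES.items():
    print(f"{name}: {multiple} -> {double} cycles, saves {multiple - double}")

print("issue latency for a two-register LDM:", ldm_stm_issue_latency(2))
```

The helper covers only the typical issue latency; it does not model the stall on the instruction following an LDM or any cache miss.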

3.10.5.2 Scheduling Data Processing Instructions

Most data processing instructions for the IXP45X/IXP46X network processors have a
result latency of one cycle. This means that the current instruction can use the
result of the previous data processing instruction without stalling. However, the
result latency is two cycles if the current instruction uses the result of the
previous data processing instruction as the operand of a shift by immediate. As a
result, the following code segment incurs a one-cycle stall for the MOV instruction:
    sub     r6, r7, r8
    add     r1, r2, r3
    mov     r4, r1, LSL #2
The code above can be rearranged as follows to remove the one-cycle stall:
    add     r1, r2, r3
    sub     r6, r7, r8
    mov     r4, r1, LSL #2
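The effect of the reordering can be checked with a very small model of the rule described above: a one-cycle stall occurs when an instruction shifts, by an immediate, the register written by the immediately preceding data processing instruction. The Python sketch below is a simplification under that single rule. The `Instr` type and `count_shift_stalls` function are hypothetical names, and the model assumes every instruction in the list is a data processing instruction.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    mnemonic: str
    dest: str
    sources: Tuple[str, ...]
    shifted_source: Optional[str] = None  # register shifted by an immediate


def count_shift_stalls(program):
    """Count one-cycle stalls caused by shifting the previous result."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        # Result latency is two cycles when the previous result feeds a
        # shift by immediate, so the back-to-back pair costs one stall.
        if cur.shifted_source is not None and cur.shifted_source == prev.dest:
            stalls += 1
    return stalls


sub = Instr("sub", "r6", ("r7", "r8"))
add = Instr("add", "r1", ("r2", "r3"))
mov = Instr("mov", "r4", ("r1",), shifted_source="r1")  # mov r4, r1, LSL #2

original = [sub, add, mov]
rearranged = [add, sub, mov]

print("original stalls:", count_shift_stalls(original))
print("rearranged stalls:", count_shift_stalls(rearranged))
```

Moving the independent SUB between the ADD and the MOV gives the ADD result the extra cycle it needs before the shift, which is why the rearranged sequence reports no stall in this model.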
Intel® IXP45X and Intel® IXP46X Product Line of Network Processors Developer's Manual
August 2006, Order Number: 306262-004US
