Scheduling Data Processing Instructions - Intel XScale Core Developer's Manual
Intel XScale® Core Developer's Manual

Optimization Guide

A.5.2 Scheduling Data Processing Instructions

Most core data processing instructions have a result latency of 1 cycle, which means the current instruction can use the result of the previous data processing instruction without stalling. However, the result latency is 2 cycles if the current instruction uses the result of the previous data processing instruction as the operand of a shift by immediate. As a result, the following code segment incurs a 1-cycle stall on the mov instruction:

    sub   r6, r7, r8
    add   r1, r2, r3
    mov   r4, r1, LSL #2

The mov shifts r1 by an immediate in the cycle right after the add that writes r1, so it must wait one extra cycle for the result. Moving the independent sub between the two instructions covers that cycle and removes the stall:

    add   r1, r2, r3
    sub   r6, r7, r8
    mov   r4, r1, LSL #2

All data processing instructions have an issue latency of 2 cycles and a result latency of 2 cycles when the shifter operand is a shift or rotate by a register, or when the shifter operand is RRX. Because the next instruction must always wait for that extra issue cycle, reordering cannot hide the stall; the only remedy is to rewrite the instruction itself. Consider the following segment of code:

    mov   r3, #10
    mul   r4, r2, r3
    add   r5, r6, r2, LSL r3
    sub   r7, r8, r2

The sub instruction incurs a 1-cycle stall because of the issue latency of the add instruction, whose shifter operand is a shift by a register. Since r3 always holds 10 at this point, the register shift can be replaced by an equivalent shift by immediate, which avoids the extra issue latency:

    mov   r3, #10
    mul   r4, r2, r3
    add   r5, r6, r2, LSL #10
    sub   r7, r8, r2
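As a rough illustration, the latency rules in this section can be encoded in a small cost model. The Python sketch below is a hypothetical, simplified model, not a cycle-accurate XScale simulator. It assumes an issue latency and result latency of 1 cycle for ordinary data processing instructions, raises the result latency to 2 cycles when the consumer feeds that value through a shift by immediate, and gives shift-by-register and RRX forms an issue and result latency of 2 cycles. The names `Insn`, `Shift` and `count_stalls` are made up for this example, and the mul from the second example is left out because this section does not give multiply latencies.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Sequence


class Shift(Enum):
    NONE = auto()  # Rm used as-is (or no Rm at all)
    IMM = auto()   # Rm shifted by an immediate, e.g. "r1, LSL #2"
    REG = auto()   # Rm shifted by a register, e.g. "r2, LSL r3"
    RRX = auto()   # Rm rotated right with extend


@dataclass(frozen=True)
class Insn:
    text: str                 # assembler text, for readability only
    rd: Optional[int] = None  # destination register
    rn: Optional[int] = None  # first operand register
    rm: Optional[int] = None  # register fed through the shifter
    rs: Optional[int] = None  # shift-amount register (Shift.REG only)
    shift: Shift = Shift.NONE


def count_stalls(seq: Sequence[Insn]) -> int:
    """Total stall cycles for straight-line code under the simplified rules."""
    produced = {}    # reg -> (issue cycle of producer, base result latency)
    prev_issue = -1  # issue cycle of the previous instruction
    min_issue = 0    # earliest cycle allowed by the previous issue latency
    stalls = 0
    for insn in seq:
        t = min_issue
        for reg in (insn.rn, insn.rm, insn.rs):
            if reg is None or reg not in produced:
                continue
            cycle, latency = produced[reg]
            # Feeding a shift by immediate raises the result latency to 2.
            if reg == insn.rm and insn.shift is Shift.IMM:
                latency = max(latency, 2)
            t = max(t, cycle + latency)
        stalls += t - (prev_issue + 1)  # cycles lost versus back-to-back issue
        prev_issue = t
        # Shift by register or RRX: issue and result latency of 2 instead of 1.
        extra = 1 if insn.shift in (Shift.REG, Shift.RRX) else 0
        min_issue = t + 1 + extra
        if insn.rd is not None:
            produced[insn.rd] = (t, 1 + extra)
    return stalls


ex1_before = [
    Insn("sub r6, r7, r8", rd=6, rn=7, rm=8),
    Insn("add r1, r2, r3", rd=1, rn=2, rm=3),
    Insn("mov r4, r1, LSL #2", rd=4, rm=1, shift=Shift.IMM),
]
ex1_after = [ex1_before[1], ex1_before[0], ex1_before[2]]

ex2_before = [
    Insn("mov r3, #10", rd=3),
    Insn("add r5, r6, r2, LSL r3", rd=5, rn=6, rm=2, rs=3, shift=Shift.REG),
    Insn("sub r7, r8, r2", rd=7, rn=8, rm=2),
]
ex2_after = [
    Insn("mov r3, #10", rd=3),
    Insn("add r5, r6, r2, LSL #10", rd=5, rn=6, rm=2, shift=Shift.IMM),
    Insn("sub r7, r8, r2", rd=7, rn=8, rm=2),
]

if __name__ == "__main__":
    for name, seq in [("example 1, original", ex1_before),
                      ("example 1, reordered", ex1_after),
                      ("example 2, original", ex2_before),
                      ("example 2, rewritten", ex2_after)]:
        print(f"{name}: {count_stalls(seq)} stall cycle(s)")
```

Running it reports 1, 0, 1 and 0 stall cycles for the four sequences, matching the stalls described above. Tracking a separate "earliest issue" cycle and a per-register "ready" cycle keeps the two kinds of penalty apart: reordering helps with result latency, but only rewriting the instruction removes an issue-latency stall.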

January, 2004

Developer's Manual
