Scheduling Coprocessor 15 Instructions; Instruction Scheduling For Intel® Wireless Mmx™ Technology; Increasing Load Throughput On Intel® Wireless Mmx™ Technology - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

Move the ADD instruction to after the ORR instruction to prevent this stall.

4.3.1.11 Scheduling Coprocessor 15 Instructions

The MRC instruction has an issue latency of one cycle and a result latency of three cycles. The MCR instruction has an issue latency of one cycle. The MOV instruction in the following example incurs a 2-cycle stall because of the 3-cycle result latency of the MRC instruction.

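add   r1, r2, r3
mrc   p15, 0, r7, C1, C0, 0
mov   r0, r7                  ; 2-cycle stall: r7 is not ready yet
add   r1, r1, #1

Rearrange the code to avoid this stall:

mrc   p15, 0, r7, C1, C0, 0
add   r1, r2, r3
add   r1, r1, #1
mov   r0, r7                  ; r7 is ready, so no stall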

4.3.2 Instruction Scheduling for Intel® Wireless MMX™ Technology

The Intel® Wireless MMX™ Technology instruction set offers the same functionality as the Intel® MMX™ Technology and Streaming SIMD Extensions (SSE) integer instructions.

4.3.2.1 Increasing Load Throughput on Intel® Wireless MMX™ Technology

The constraints on issuing load transactions with the Intel XScale® Microarchitecture also hold for Intel® Wireless MMX™ Technology. The considerations reviewed for Intel XScale® Microarchitecture instructions are illustrated again in this section using the Intel® Wireless MMX™ Technology instruction set. The primary observations about load transactions are:

• The buffering in the memory pipeline allows two load-doubleword transactions to be outstanding without incurring a penalty (stall).
• Back-to-back WLDRD instructions incur a stall; back-to-back WLDR(BHW) instructions do not.
• A WLDRD requires 4 cycles to return the 64-bit doubleword, assuming a cache hit; back-to-back WLDR(BHW) instructions require 3 cycles to return the data.
• Use prefetching schemes together with the suggestions above.

The overhead of issuing load transactions can be minimized by instruction scheduling and load pipelining. In most cases, it is straightforward to interleave other operations to avoid the penalty of back-to-back WLDRD instructions. In the following code sequence, three WLDRD instructions are issued back-to-back, incurring a stall on the second and third instructions.

WLDRD wR3,R4,#8

WLDRD wR5,r4,#8 - STALL



Intel® PXA27x Processor Family Optimization Guide


This manual is also suitable for:

Pxa271 Pxa272 Pxa273
