Scheduling Coprocessor 15 Instructions; Instruction Scheduling For Intel® Wireless Mmx™ Technology; Increasing Load Throughput On Intel® Wireless Mmx™ Technology - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
Move the ADD instruction to after the ORR instruction to prevent this stall.
4.3.1.11 Scheduling Coprocessor 15 Instructions

The MRC instruction has an issue latency of one cycle and a result latency of three cycles. The
MCR instruction has an issue latency of one cycle. In the following example, the MOV instruction
stalls for two cycles because it reads r7 before the 3-cycle result latency of the MRC has elapsed.
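    add   r1, r2, r3
    mrc   p15, 0, r7, C1, C0, 0
    mov   r0, r7
    add   r1, r1, #1
Rearrange the code to avoid these stalls:
    mrc   p15, 0, r7, C1, C0, 0
    add   r1, r2, r3
    add   r1, r1, #1
    mov   r0, r7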
4.3.2 Instruction Scheduling for Intel® Wireless MMX™ Technology
The Intel® Wireless MMX™ Technology provides an instruction set that offers the same
functionality as the Intel® MMX™ Technology and Streaming SIMD Extensions (SSE)
integer instructions.
4.3.2.1 Increasing Load Throughput on Intel® Wireless MMX™ Technology
The constraints on issuing load transactions with Intel XScale® Microarchitecture also hold with
Intel® Wireless MMX™ Technology. The considerations reviewed using the Intel XScale®
Microarchitecture instructions are re-illustrated in this section using the Intel® Wireless MMX™
Technology instruction set. The primary observations with load transactions are:
• The buffering in the memory pipeline allows two load-double transactions to be outstanding
  without incurring a penalty (stall).
• Back-to-back WLDRD instructions incur a stall; back-to-back WLDR(BHW) instructions do
  not.
• A WLDRD requires 4 cycles to return the DWORD, assuming a cache hit; back-to-back
  WLDR(BHW) instructions require 3 cycles to return the data.
• Use prefetching schemes together with the above suggestions.
The overhead of issuing load transactions can be minimized by instruction scheduling and load
pipelining. In most cases it is straightforward to interleave other operations to avoid the penalty
of back-to-back WLDRD instructions. In the following code sequence, three WLDRD instructions are
issued back-to-back, incurring a stall on the second and third instructions.
    WLDRD wR3, [r4, #8]
    WLDRD wR5, [r4, #8]   ; STALL
Intel® PXA27x Processor Family Optimization Guide
