Scheduling the WMAC Instructions - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
WLDRD wR4,R4,#8 -STALL
WADDB wR0,wR1,wR2
WADDB wR0,wR0,wR6
WADDB wR0,wR0,wR7
The same code sequence is reorganized to avoid back-to-back issue of WLDRD instructions:
WLDRD wR3,R4,#8
WADDB wR0,wR1,wR2
WLDRD wR4,R4,#8
WADDB wR0,wR0,wR6
WLDRD wR5,R4,#8
WADDB wR0,wR0,wR7
Always try to separate three consecutive WLDRD instructions so that only two are outstanding at
any one time and the loads are always interleaved with other instructions:
WLDRD wR0, [r2] , #8
WZERO wR15
WLDRD wR1, [r4] , #8
SUBS r3, r3, #8
WLDRD wR3, [r4] , #8
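The interleaving rule above can be checked mechanically. The following is a minimal sketch in Python, assuming each instruction is given as one text line with the mnemonic first. The helper name `back_to_back_loads` and the `grouped` ordering are hypothetical, introduced only for illustration. The sketch flags adjacent loads and does not model the pipeline.

```python
def back_to_back_loads(lines, load_prefix="WLDR"):
    """Return the indices of load instructions that directly follow another load."""
    hits = []
    prev_is_load = False
    for i, line in enumerate(lines):
        parts = line.split()
        is_load = bool(parts) and parts[0].upper().startswith(load_prefix)
        if is_load and prev_is_load:
            hits.append(i)
        prev_is_load = is_load
    return hits


# The interleaved sequence from the text.
interleaved = [
    "WLDRD wR0, [r2], #8",
    "WZERO wR15",
    "WLDRD wR1, [r4], #8",
    "SUBS r3, r3, #8",
    "WLDRD wR3, [r4], #8",
]

# Hypothetical unscheduled ordering: the same instructions with the loads grouped.
grouped = [interleaved[0], interleaved[2], interleaved[4],
           interleaved[1], interleaved[3]]

print(back_to_back_loads(interleaved))  # → []
print(back_to_back_loads(grouped))      # → [1, 2]
```

An empty result means no WLDRD issues directly behind another one. Whether a flagged pair actually stalls depends on the load timing, which this check does not know.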
Always try to interleave additional operations between the load instruction and the instruction
that first uses the loaded data:
WLDRD wR0, [r2] , #8
WZERO wR15
WLDRD wR1, [r4] , #8
SUBS r3, r3, #8
WLDRD wR3, [r4] , #8
SUBS r4, r4, #1
WMACS wR15, wR1, wR0
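To see how much work separates each load from its first use in a sequence like the one above, the distance can be counted directly. This is a rough sketch in Python, not a timing model. It assumes the first operand of every instruction is the destination and the remaining register names are sources. The helper name `load_to_use_gaps` is hypothetical, and register names are lowercased for comparison.

```python
import re


def load_to_use_gaps(lines):
    """Map each loaded register to the number of instructions between the
    load and the first later instruction that names it as a source."""
    parsed = []
    for line in lines:
        # Identifiers only: immediates such as "#8" are ignored.
        tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line)
        if tokens:
            parsed.append((tokens[0].upper(), [t.lower() for t in tokens[1:]]))

    gaps = {}
    for i, (op, regs) in enumerate(parsed):
        if not op.startswith("WLDR") or not regs:
            continue
        dest = regs[0]
        for j in range(i + 1, len(parsed)):
            if dest in parsed[j][1][1:]:  # sources follow the destination
                gaps[dest] = j - i - 1
                break
    return gaps


sequence = [
    "WLDRD wR0, [r2], #8",
    "WZERO wR15",
    "WLDRD wR1, [r4], #8",
    "SUBS r3, r3, #8",
    "WLDRD wR3, [r4], #8",
    "SUBS r4, r4, #1",
    "WMACS wR15, wR1, wR0",
]

print(load_to_use_gaps(sequence))  # → {'wr0': 5, 'wr1': 3}
```

Here wR3 is absent from the result because nothing in the listed sequence consumes it. A small gap marks a load whose consumer is a candidate for moving later.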
4.3.2.2 Scheduling the WMAC Instructions

The issue latency of the WMAC instruction is one cycle, and its result and resource latencies are
both two cycles. The second WMAC instruction in the following example stalls for one cycle because
of the two-cycle resource latency:
WMACS wR0, wR2, wR3
WMACS wR1, wR4, wR5
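The one-cycle stall follows from simple arithmetic on the resource latency. The sketch below models it in Python under stated assumptions: one instruction issues per cycle, in order, and an instruction waits only for the resource it needs. The unit names `"mac"` and `"alu"` and the default latency of 1 for other units are illustrative assumptions, not values from the manual.

```python
def issue_cycles(units, resource_latency):
    """Return the cycle in which each instruction issues, in program order.

    units: the execution resource each instruction occupies.
    resource_latency: cycles a resource stays busy after accepting an instruction.
    """
    free_at = {}     # resource -> first cycle it can accept a new instruction
    cycles = []
    next_cycle = 0   # earliest cycle allowed by in-order, single issue
    for unit in units:
        cycle = max(next_cycle, free_at.get(unit, 0))
        cycles.append(cycle)
        free_at[unit] = cycle + resource_latency.get(unit, 1)
        next_cycle = cycle + 1
    return cycles


# WMAC resource latency of 2 comes from the text; other units are assumed to be 1.
LATENCY = {"mac": 2}

print(issue_cycles(["mac", "mac"], LATENCY))         # → [0, 2]
print(issue_cycles(["mac", "alu", "mac"], LATENCY))  # → [0, 1, 2]
```

In the first case the second WMAC issues in cycle 2 rather than cycle 1, which is the one-cycle stall. In the second case an independent instruction on another unit occupies cycle 1, so under this model the idle cycle is filled with useful work.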
The WADD instruction in the following example stalls for one cycle because of the two-cycle result
latency:
WMACS wR0, wR2, wR3
WADD wR1, wR0, wR2
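The result-latency stall can be worked through in the same spirit. This Python sketch tracks when each register value becomes available, assuming single in-order issue and a result latency of 1 for every mnemonic other than WMACS. That default, the function name, and the independent filler instruction in `rescheduled` are assumptions made for illustration.

```python
def issue_with_result_latency(program, result_latency):
    """program: list of (mnemonic, destination, sources).
    Returns the issue cycle of each instruction, in program order."""
    ready_at = {}    # register -> first cycle its value can be consumed
    cycles = []
    next_cycle = 0
    for op, dest, sources in program:
        # Wait for in-order issue and for every source operand to be ready.
        cycle = max([next_cycle] + [ready_at.get(s, 0) for s in sources])
        cycles.append(cycle)
        ready_at[dest] = cycle + result_latency.get(op, 1)
        next_cycle = cycle + 1
    return cycles


# WMACS result latency of 2 comes from the text; other mnemonics are assumed to be 1.
RESULT_LATENCY = {"WMACS": 2}

dependent = [
    ("WMACS", "wR0", ["wR2", "wR3"]),
    ("WADD", "wR1", ["wR0", "wR2"]),   # consumes wR0 immediately
]

# Hypothetical rescheduling: an independent instruction placed between the two.
rescheduled = [
    ("WMACS", "wR0", ["wR2", "wR3"]),
    ("WADD", "wR4", ["wR5", "wR6"]),
    ("WADD", "wR1", ["wR0", "wR2"]),
]

print(issue_with_result_latency(dependent, RESULT_LATENCY))    # → [0, 2]
print(issue_with_result_latency(rescheduled, RESULT_LATENCY))  # → [0, 1, 2]
```

The dependent WADD issues in cycle 2 in both cases. The difference is that the rescheduled sequence does useful work in cycle 1 instead of stalling.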
Intel® PXA27x Processor Family Optimization Guide