Repeated String Instruction Usage; Latency Of Repeated String Instructions; Guidelines For Repeated String Instructions; Table 1. Latency Of Repeated String Instructions - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization

Repeated String Instruction Usage

Latency of Repeated String Instructions

Guidelines for Repeated String Instructions

Use the Largest Possible Operand Size
In addition, using MMX instructions increases the available
parallelism. The AMD Athlon processor can issue three integer
OPs and two MMX OPs per cycle.
Table 1 shows the latency for repeated string instructions on the
AMD Athlon processor.
Table 1. Latency of Repeated String Instructions

Instruction    ECX=0 (cycles)    DF = 0 (cycles)    DF = 1 (cycles)
REP MOVS       11                15 + (4/3*c)       25 + (4/3*c)
REP STOS       11                14 + (1*c)         24 + (1*c)
REP LODS       11                15 + (2*c)         15 + (2*c)
REP SCAS       11                15 + (5/2*c)       15 + (5/2*c)
REP CMPS       11                16 + (10/3*c)      16 + (10/3*c)

Note:
c = value of ECX, (ECX > 0)

Table 1 lists the latencies with the direction flag (DF) = 0
(increment) and DF = 1 (decrement). In addition, these latencies
are assumed for aligned memory operands. Note that for
MOVS/STOS, when DF = 1 (DOWN), the overhead portion of the
latency increases significantly. However, these cases are less
commonly found. To determine the latency, evaluate the formula
and round the result up to the nearest integer value.

To help achieve good performance, this section contains
guidelines for the careful scheduling of VectorPath repeated
string instructions.

Always move data using the largest operand size possible. For
example, use REP MOVSD rather than REP MOVSW and REP
MOVSW rather than REP MOVSB. Use REP STOSD rather than
REP STOSW and REP STOSW rather than REP STOSB.

Repeated String Instruction Usage
22007E/0—November 1999
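The Table 1 formulas and the operand-size guideline can be tied together with a small calculation. The following C sketch is only an illustration, not part of the manual: the names rep_latency and copy_latency are hypothetical, and it assumes that the per-iteration cost listed for REP MOVS applies equally to the byte, word, and doubleword forms, so that a larger operand size helps only by reducing the count c in ECX. It also assumes aligned operands, as Table 1 does.

```c
#include <assert.h>

/* Repeated string instructions listed in Table 1. */
enum rep_insn { REP_MOVS, REP_STOS, REP_LODS, REP_SCAS, REP_CMPS };

/* Table 1 entry: latency = overhead + (num/den) * c, rounded up. */
struct rep_cost {
    int overhead_df0;   /* fixed overhead when DF = 0 */
    int overhead_df1;   /* fixed overhead when DF = 1 */
    int num, den;       /* per-iteration cost as a fraction */
};

static const struct rep_cost rep_costs[] = {
    [REP_MOVS] = { 15, 25,  4, 3 },
    [REP_STOS] = { 14, 24,  1, 1 },
    [REP_LODS] = { 15, 15,  2, 1 },
    [REP_SCAS] = { 15, 15,  5, 2 },
    [REP_CMPS] = { 16, 16, 10, 3 },
};

/* Estimated latency in cycles for a count of c in ECX (hypothetical helper). */
unsigned long rep_latency(enum rep_insn insn, int df, unsigned long c)
{
    assert(df == 0 || df == 1);
    if (c == 0)
        return 11;      /* ECX = 0 column of Table 1 */

    const struct rep_cost *k = &rep_costs[insn];
    unsigned long overhead = (unsigned long)(df ? k->overhead_df1
                                                : k->overhead_df0);
    /* Integer ceiling of (num * c) / den implements the round-up rule. */
    return overhead + ((unsigned long)k->num * c + k->den - 1) / k->den;
}

/*
 * Estimated latency of an aligned forward copy (DF = 0) of `bytes` bytes
 * using REP MOVSB (elem_size 1), MOVSW (2), or MOVSD (4).
 * Assumption: the Table 1 formula is independent of the operand size.
 */
unsigned long copy_latency(unsigned long bytes, unsigned long elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);
    assert(bytes % elem_size == 0);
    return rep_latency(REP_MOVS, 0, bytes / elem_size);
}
```

Under these assumptions, a 1024-byte copy is estimated at 1381 cycles with REP MOVSB (c = 1024), 698 cycles with REP MOVSW (c = 512), and 357 cycles with REP MOVSD (c = 256). The same REP MOVSD copy with DF = 1 is estimated at 367 cycles, which reflects the larger overhead term in the DF = 1 column.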
