Use 3DNow! Instructions for Fast Division; Optimized 14-Bit Precision Divide; Optimized Full 24-Bit Precision Divide - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization
Use 3DNow!™ Instructions for Fast Division

Optimized 14-Bit Precision Divide

Optimized Full 24-Bit Precision Divide

The FEMMS instruction is supported for backward compatibility
with AMD-K6 family processors, and is aliased to the EMMS
instruction.

3DNow! and MMX instructions are designed to be used
concurrently with no switching issues. Likewise, enhanced
3DNow! instructions can be used simultaneously with MMX
instructions. However, x87 and 3DNow! instructions share the
same architectural registers, so they cannot easily be used
concurrently: the register file must be cleaned up with
FEMMS/EMMS when switching between them.

3DNow! instructions can be used to compute a very fast, highly
accurate reciprocal or quotient.

Optimized 14-Bit Precision Divide

This divide operation executes with a total latency of seven
cycles, assuming that the program hides the latency of the first
MOVD/MOVQ instructions within preceding code.

Example:
    MOVD       MM0, [MEM]     ;   0  |  W
    PFRCP      MM0, MM0       ;  1/W | 1/W   (approximate)
    MOVQ       MM2, [MEM]     ;   Y  |  X
    PFMUL      MM2, MM0       ;  Y/W | X/W

Optimized Full 24-Bit Precision Divide

This divide operation executes with a total latency of 15 cycles,
assuming that the program hides the latency of the first
MOVD/MOVQ instructions within preceding code.

Example:
    MOVD       MM0, [W]       ;   0  |  W
    PFRCP      MM1, MM0       ;  1/W | 1/W   (approximate)
    PUNPCKLDQ  MM0, MM0       ;   W  |  W    (MMX instr.)
    PFRCPIT1   MM0, MM1       ;  1/W | 1/W   (refine)
    MOVQ       MM2, [X_Y]     ;   Y  |  X
    PFRCPIT2   MM0, MM1       ;  1/W | 1/W   (final)
    PFMUL      MM2, MM0       ;  Y/W | X/W

22007E/0—November 1999
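
The difference between the 14-bit and 24-bit sequences can be illustrated with a scalar C model. This is only a sketch under stated assumptions, not the hardware's bit-exact behavior: approx_recip14 is a hypothetical stand-in for PFRCP that truncates an exact reciprocal to 14 significant bits, and refine_recip applies one Newton-Raphson step, x1 = x0 * (2 - w * x0), which is assumed here as the mathematical effect of the PFRCPIT1/PFRCPIT2 pair. All function names are illustrative and do not come from the manual.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: the exact reciprocal with its
 * significand truncated to 14 bits (13 stored fraction bits plus the
 * implicit leading one). Real hardware uses a lookup table instead. */
float approx_recip14(float w)
{
    assert(w != 0.0f);              /* reciprocal of zero is out of scope */
    float r = 1.0f / w;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFC00u;            /* clear the low 10 of 23 fraction bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for 1/w: x1 = x0 * (2 - w * x0).
 * Assumed here as the combined effect of PFRCPIT1 and PFRCPIT2;
 * each step roughly doubles the number of correct bits. */
float refine_recip(float w, float x0)
{
    return x0 * (2.0f - w * x0);
}

/* Model of the 14-bit sequence: approximate reciprocal, then multiply. */
float divide14(float x, float w)
{
    return x * approx_recip14(w);
}

/* Model of the 24-bit sequence: approximate, refine, then multiply. */
float divide24(float x, float w)
{
    return x * refine_recip(w, approx_recip14(w));
}

/* Relative error of a single-precision result against a double reference. */
double rel_error(float got, double exact)
{
    return fabs(((double)got - exact) / exact);
}
```

Calling rel_error(divide14(x, w), (double)x / w) and rel_error(divide24(x, w), (double)x / w) for the same operands shows the trade-off the two listings make: the shorter sequence is accurate to roughly one part in 2^13, while the refined one approaches full single precision at the cost of extra latency.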
