Use 3DNow! Instructions for Fast Square Root and Reciprocal Square Root; Optimized 15-Bit Precision Square Root; Optimized 24-Bit Precision Square Root - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999

Use 3DNow!™ Instructions for Fast Square Root and
Reciprocal Square Root

3DNow! instructions can be used to compute a very fast, highly
accurate square root and reciprocal square root.

Optimized 15-Bit Precision Square Root

This square root operation can be executed in only 7 cycles,
assuming a program hides the latency of the first MOVD
instruction within previous code. The reciprocal square root
operation requires four fewer cycles than the square root
operation.

Example:
MOVD      MM0, [MEM]   ; 0         | a
PFRSQRT   MM1, MM0     ; 1/sqrt(a) | 1/sqrt(a)  (approximate)
PUNPCKLDQ MM0, MM0     ; a         | a          (MMX instr.)
PFMUL     MM0, MM1     ; sqrt(a)   | sqrt(a)

Optimized 24-Bit Precision Square Root

This square root operation can be executed in only 19 cycles,
assuming a program hides the latency of the first MOVD
instruction within previous code. The reciprocal square root
operation requires four fewer cycles than the square root
operation.

Example:
MOVD      MM0, [MEM]   ; 0         | a
PFRSQRT   MM1, MM0     ; 1/sqrt(a) | 1/sqrt(a)  (approx.)
MOVQ      MM2, MM1     ; X_0 = 1/(sqrt a)       (approx.)
PFMUL     MM1, MM1     ; X_0 * X_0 | X_0 * X_0  (step 1)
PUNPCKLDQ MM0, MM0     ; a         | a          (MMX instr)
PFRSQIT1  MM1, MM0     ; (intermediate)         (step 2)
PFRCPIT2  MM1, MM2     ; 1/sqrt(a) | 1/sqrt(a)  (step 3)
PFMUL     MM0, MM1     ; sqrt(a)   | sqrt(a)
