Sample Code Translated Into 3Dnow! Code - AMD Athlon Processor x86 Optimization Manual

22007E/0—November 1999
Replace Branches with Computation in 3DNow!™ Code

Example 2 (Preferred):
; r = (x < y) ? b : a
;
; in:  mm0  a
;      mm1  b
;      mm2  x
;      mm3  y
; out: mm1  r

PCMPGTD  MM3, MM2    ; y > x ? 0xffffffff : 0
PAND     MM1, MM3    ; y > x ? b : 0
PANDN    MM3, MM0    ; y > x ? 0 : a
POR      MM1, MM3    ; r = y > x ? b : a

Sample Code Translated into 3DNow!™ Code

The following examples use scalar code translated into 3DNow!
code. Using 3DNow! SIMD instructions for scalar code is not
recommended, because the advantage of 3DNow! instructions lies
in their "SIMDness". These examples are meant to demonstrate
general techniques for translating source code with branches
into branchless 3DNow! code. Scalar source code was chosen to
keep the examples simple. These techniques work in an identical
fashion for vector code.

Each example shows the C code and the resulting 3DNow! code.

Example 1:
C code:
float x,y,z;
if (x < y) {
    z += 1.0;
}
else {
    z -= 1.0;
}

3DNow! code:
;in:  MM0 = x
;     MM1 = y
;     MM2 = z
;out: MM0 = z

MOVQ     MM3, MM0    ;save x
MOVQ     MM4, one    ;1.0
PFCMPGE  MM0, MM1    ;x < y ? 0 : 0xffffffff
PSLLD    MM0, 31     ;x < y ? 0 : 0x80000000
PXOR     MM0, MM4    ;x < y ? 1.0 : -1.0
PFADD    MM0, MM2    ;x < y ? z+1.0 : z-1.0
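The two instruction sequences on this page can be modeled one 32-bit lane at a time in portable C, which may help when checking the mask arithmetic by hand. The following is only a sketch and is not taken from the manual: the function names mux_lane, step_lane, float_bits, and bits_float are hypothetical, and each function imitates a single lane of the MMX register rather than the full 64-bit operation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a float's bit pattern into a 32-bit integer. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: copy a 32-bit pattern back into a float. */
float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* One lane of PCMPGTD / PAND / PANDN / POR.
   Returns b when y > x (signed compare), otherwise a. */
uint32_t mux_lane(uint32_t a, uint32_t b, int32_t x, int32_t y) {
    uint32_t mask   = (uint32_t)0 - (uint32_t)(y > x); /* PCMPGTD: all ones or zero */
    uint32_t keep_b = b & mask;                        /* PAND:  y > x ? b : 0 */
    uint32_t keep_a = ~mask & a;                       /* PANDN: y > x ? 0 : a */
    return keep_b | keep_a;                            /* POR:   merge the two */
}

/* One lane of PFCMPGE / PSLLD / PXOR / PFADD.
   Returns z + 1.0f when x < y, otherwise z - 1.0f. */
float step_lane(float x, float y, float z) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(x >= y);  /* PFCMPGE: x < y ? 0 : all ones */
    uint32_t sign = mask << 31;                        /* PSLLD 31: keep only the sign bit */
    float delta = bits_float(float_bits(1.0f) ^ sign); /* PXOR: +1.0f or -1.0f */
    return delta + z;                                  /* PFADD */
}
```

Calling step_lane(1.0f, 2.0f, 5.0f) follows the x < y path and adds 1.0f to z, while swapping the first two arguments flips the sign bit of the constant and subtracts instead. In both functions the comparison result is turned into a mask and combined with bitwise operations, so no conditional jump is needed to pick the result.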
