Replace Branches With Computation In 3Dnow! Code; Muxing Constructs - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization
Replace Branches with Computation in 3DNow!™ Code

Muxing Constructs

Branches negatively impact the performance of 3DNow! code.
Branches can operate only on one data item at a time, i.e., they
are inherently scalar and inhibit the SIMD processing that
makes 3DNow! code superior. Also, branches based on 3DNow!
comparisons require data to be passed to the integer units,
which requires either transport through memory, or the use of
"MOVD reg, MMreg" instructions. If the body of the branch is
small, one can achieve higher performance by replacing the
branch with computation. The computation simulates
predicated execution or conditional moves. The principal tools
for this are the following instructions: PCMPGT, PFCMPGT,
PFCMPGE, PFMIN, PFMAX, PAND, PANDN, POR, PXOR.
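As a rough illustration of how computation can stand in for a small branch body, the following C sketch models predication on one 32-bit lane. The helper names gt_mask and sum_above are hypothetical and do not come from the manual; the code only assumes that a compare such as PCMPGTD yields an all-ones or all-zeros mask and that PAND then keeps or discards a value.

```c
#include <stdint.h>

/* Hypothetical helper: all-ones mask when v > t, else zero.
   This mirrors what a PCMPGTD-style compare leaves in a 32-bit lane. */
static uint32_t gt_mask(int32_t v, int32_t t)
{
    return (uint32_t)-(int32_t)(v > t);
}

/* Sum only the elements greater than t. The loop body contains no
   conditional branch on the data: the mask zeroes rejected elements. */
static int32_t sum_above(const int32_t *x, int n, int32_t t)
{
    uint32_t sum = 0;
    for (int i = 0; i < n; i++) {
        /* PAND-style predication: x[i] where the mask is set, else 0. */
        sum += (uint32_t)x[i] & gt_mask(x[i], t);
    }
    return (int32_t)sum;
}
```

The branching equivalent would be "if (x[i] > t) sum += x[i];". The masked form does the same work for every element, which is what allows several elements to be processed per instruction in packed code.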
The most important construct for avoiding branches in
3DNow!™ and MMX™ code is a 2-way muxing construct that is
equivalent to the ternary operator "?:" in C and C++. It is
implemented using the PCMP/PFCMP, PAND, PANDN, and
POR instructions. To maximize performance, it is important to
apply the PAND and PANDN instructions in the proper order.
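The muxing construct can be sketched in portable C as a model of two packed 32-bit lanes. The type vec2i and the function mux_gt are hypothetical names used only for this illustration; real code would use the packed instructions named above rather than a scalar loop.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit MMX register holding two 32-bit lanes. */
typedef struct { int32_t lane[2]; } vec2i;

/* Lane-wise r = (y > x) ? if_true : if_false, built only from a compare
   mask, AND, AND-NOT, and OR. */
static vec2i mux_gt(vec2i x, vec2i y, vec2i if_true, vec2i if_false)
{
    vec2i r;
    for (int i = 0; i < 2; i++) {
        /* PCMPGTD-style mask: all ones where y > x, else zero. */
        uint32_t mask = (y.lane[i] > x.lane[i]) ? 0xFFFFFFFFu : 0u;
        uint32_t kept_true  = (uint32_t)if_true.lane[i]  & mask;   /* PAND  */
        uint32_t kept_false = (uint32_t)if_false.lane[i] & ~mask;  /* PANDN */
        r.lane[i] = (int32_t)(kept_true | kept_false);             /* POR   */
    }
    return r;
}
```

Each lane is selected independently, so one lane can take the "true" operand while the other takes the "false" operand, something a single scalar branch cannot express.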
Example 1 (Avoid):
; r = (x < y) ? b : a
;
; in:  mm0  a
;      mm1  b
;      mm2  x
;      mm3  y
; out: mm1  r

PCMPGTD  MM3, MM2   ; y > x ? 0xffffffff : 0
MOVQ     MM4, MM3   ; duplicate mask
PANDN    MM3, MM0   ; y > x ? 0 : a
PAND     MM1, MM4   ; y > x ? b : 0
POR      MM1, MM3   ; r = y > x ? b : a

Because the use of PANDN destroys the mask created by PCMP,
the mask needs to be saved, which requires an additional
register. This adds an instruction, lengthens the dependency
chain, and increases register pressure. Therefore 2-way muxing
constructs should be written as follows.
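One ordering that avoids the saved mask can be sketched in C by modeling the destructive two-operand form "OP dst, src" on a single 32-bit lane. The functions below are hypothetical stand-ins, not real intrinsics, and the sequence is an illustration derived from the reasoning above rather than a listing taken from the manual. The point is that PAND reads the mask without overwriting it, so applying PAND first lets PANDN consume the mask last.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for the two-operand MMX forms.
   Each one overwrites its destination, as "OP dst, src" does. */
static void pcmpgtd(uint32_t *dst, uint32_t src)
{
    *dst = ((int32_t)*dst > (int32_t)src) ? 0xFFFFFFFFu : 0u;
}
static void pand(uint32_t *dst, uint32_t src)  { *dst &= src; }
static void pandn(uint32_t *dst, uint32_t src) { *dst = ~*dst & src; }
static void por(uint32_t *dst, uint32_t src)   { *dst |= src; }

/* Register assignment as in Example 1:
   a in mm0, b in mm1, x in mm2, y in mm3. Returns mm1. */
static uint32_t mux_mask_consumed_last(uint32_t mm0, uint32_t mm1,
                                       uint32_t mm2, uint32_t mm3)
{
    pcmpgtd(&mm3, mm2);  /* mm3 = y > x ? 0xffffffff : 0              */
    pand(&mm1, mm3);     /* mm1 = y > x ? b : 0 (mask in mm3 intact)  */
    pandn(&mm3, mm0);    /* mm3 = y > x ? 0 : a (mask consumed here)  */
    por(&mm1, mm3);      /* mm1 = y > x ? b : a                       */
    return mm1;
}
```

Compared with Example 1, this model uses four operations instead of five and never touches a fifth register, because the only instruction that destroys the mask is also the last one that needs it.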
22007E/0—November 1999
