3DNow! and MMX Intra-Operand Swapping - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization
Example:
PXOR      MM2, MM2      ; 0   | 0
MOVD      MM0, [ab]     ; 0 0 | b a
MOVD      MM1, [cd]     ; 0 0 | d c
PUNPCKLWD MM0, MM2      ; 0 b | 0 a
PUNPCKLWD MM1, MM2      ; 0 d | 0 c
PMADDWD   MM0, MM1      ; b*d | a*c

3DNow!™ and MMX™ Intra-Operand Swapping

AMD Athlon™ Specific Code
If the halves of an MMX register must be swapped, use the
PSWAPD instruction, which is a new AMD Athlon 3DNow! DSP
extension. Use this instruction only in AMD Athlon specific
code. "PSWAPD MMreg1, MMreg2" performs the following
operation:
mmreg1[63:32] = mmreg2[31:0]
mmreg1[31:0]  = mmreg2[63:32]
See the AMD Extensions to the 3DNow! and MMX Instruction Set
Manual, order #22466, for more usage information.

Blended Code
For blended code, which needs to run well on both AMD-K6 and
AMD Athlon family processors, the following code is
recommended:
Example 1 (Preferred, faster):
;MM1 = SWAP (MM0), MM0 destroyed
MOVQ      MM1, MM0      ;make a copy
PUNPCKLDQ MM0, MM0      ;duplicate lower half
PUNPCKHDQ MM1, MM0      ;combine lower halves
Example 2 (Preferred, fast):
;MM1 = SWAP (MM0), MM0 preserved
MOVQ      MM1, MM0      ;make a copy
PUNPCKHDQ MM1, MM1      ;duplicate upper half
PUNPCKLDQ MM1, MM0      ;combine upper halves
Both examples accomplish the swapping, but the first example
should be used if the original contents of the register do not
need to be preserved. The first example is faster because the
MOVQ and PUNPCKLDQ instructions can execute in parallel. The
instructions in the second example each depend on the result
of the previous one, so the sequence takes longer to execute.
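
The claim that both blended-code sequences produce the same result as PSWAPD can be checked with a small scalar model. The C sketch below is not part of the manual. It models a 64-bit MMX register as a uint64_t and assumes the usual unpack semantics (PUNPCKLDQ interleaves the low doublewords of destination and source, PUNPCKHDQ the high doublewords). All function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: one 64-bit MMX register is a uint64_t. */

/* PSWAPD dst, src: dst[63:32] = src[31:0], dst[31:0] = src[63:32] */
static uint64_t pswapd(uint64_t src) {
    return (src << 32) | (src >> 32);
}

/* PUNPCKLDQ dst, src: result[31:0] = dst[31:0], result[63:32] = src[31:0] */
static uint64_t punpckldq(uint64_t dst, uint64_t src) {
    return (dst & 0xFFFFFFFFu) | (src << 32);
}

/* PUNPCKHDQ dst, src: result[31:0] = dst[63:32], result[63:32] = src[63:32] */
static uint64_t punpckhdq(uint64_t dst, uint64_t src) {
    return (dst >> 32) | (src & 0xFFFFFFFF00000000ull);
}

/* Example 1 from the text: returns MM1 = SWAP(MM0); *mm0 is overwritten. */
static uint64_t swap_destroy(uint64_t *mm0) {
    uint64_t mm1 = *mm0;              /* MOVQ      MM1, MM0 */
    *mm0 = punpckldq(*mm0, *mm0);     /* PUNPCKLDQ MM0, MM0 */
    mm1 = punpckhdq(mm1, *mm0);       /* PUNPCKHDQ MM1, MM0 */
    return mm1;
}

/* Example 2 from the text: returns MM1 = SWAP(MM0); mm0 is left unchanged. */
static uint64_t swap_preserve(uint64_t mm0) {
    uint64_t mm1 = mm0;               /* MOVQ      MM1, MM0 */
    mm1 = punpckhdq(mm1, mm1);        /* PUNPCKHDQ MM1, MM1 */
    mm1 = punpckldq(mm1, mm0);        /* PUNPCKLDQ MM1, MM0 */
    return mm1;
}

/* Asserts that both blended sequences agree with the PSWAPD model. */
static void check_swap(uint64_t value) {
    uint64_t mm0 = value;
    assert(swap_destroy(&mm0) == pswapd(value));
    assert(swap_preserve(value) == pswapd(value));
}
```

Calling check_swap with any 64-bit value exercises both sequences. The model covers only the data movement. It says nothing about timing, so the performance difference between the two sequences still has to be judged from the instruction dependencies described in the text.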
22007E/0—November 1999