Use MMX PCMP Instead of 3DNow! PFCMP - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization
Use MMX™ PCMP Instead of 3DNow!™ PFCMP
cycle bypassing penalty, and another one-cycle penalty if the
result goes to a 3DNow! operation. The PFMUL execution
latency is four cycles; therefore, in the worst case, the PXOR
and PFMUL instructions have the same latency. On the
AMD-K6 processor, PXOR has a latency of only one cycle,
versus a two-cycle latency for the 3DNow! PFMUL instruction.
Use the following code to negate 3DNow! data:
msgn    DQ      8000000080000000h
PXOR    MM0, [msgn]     ;toggle sign bit
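The effect of the PXOR above on a single 32-bit lane can be modeled in scalar C. This is only an illustrative sketch, not code from the manual: the function name negate_by_sign_flip is hypothetical, and the model assumes IEEE-754 single-precision floats, which is the format 3DNow! operates on.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one 32-bit lane of PXOR MM0, [msgn]:
   flipping bit 31 of the raw encoding negates the value.
   (Hypothetical helper name; assumes IEEE-754 single precision.) */
static float negate_by_sign_flip(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret float as raw bits */
    bits ^= 0x80000000u;              /* toggle the sign bit only */
    memcpy(&x, &bits, sizeof x);      /* reinterpret back as float */
    return x;
}
```

The mask 8000000080000000h in the assembly is simply this 32-bit mask repeated for both packed lanes of the MMX register.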
Use the MMX PCMP instruction instead of the 3DNow! PFCMP
instruction. On the AMD Athlon processor, PCMP has a
latency of two cycles, while PFCMP has a latency of four
cycles. In addition to the shorter latency, PCMP can be issued to
either the FADD or the FMUL pipe, while PFCMP is restricted
to the FADD pipe.
Note: The PFCMP instruction has a 'GE' (greater or equal)
version (PFCMPGE) that is missing from PCMP.
If both arguments are positive, PCMP always works.
If one number is negative and the other is positive, PCMP still
works, except when one number is a positive zero and the other
is a negative zero.
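The positive and mixed-sign cases can be checked with a scalar C model of one 32-bit lane. This is a sketch under stated assumptions: the helper names float_bits, int_gt, and int_eq are hypothetical, the signed integer comparisons stand in for one lane of PCMPGTD and PCMPEQD, and floats are assumed to be IEEE-754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Raw 32-bit encoding of a float, viewed as a signed integer
   (hypothetical helper; assumes IEEE-754 single precision). */
static int32_t float_bits(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Model of one PCMPGTD lane: signed compare of the raw bits. */
static int int_gt(float a, float b)
{
    return float_bits(a) > float_bits(b);
}

/* Model of one PCMPEQD lane: equality of the raw bits. */
static int int_eq(float a, float b)
{
    return float_bits(a) == float_bits(b);
}
```

With this model, int_gt(4.0f, 2.0f) and int_gt(1.5f, -3.0f) agree with the floating-point comparison. The zero exception shows up as int_gt(0.0f, -0.0f) being true and int_eq(0.0f, -0.0f) being false, whereas a floating-point comparison treats the two zeros as equal.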
Be careful when performing integer comparison using PCMPGT
on two negative 3DNow! numbers. The result is the inverse of
the PFCMPGT floating-point comparison. For example:
–2 = C0000000h
–4 = C0800000h
PCMPGT gives C0800000h > C0000000h, but –4 < –2. To address
this issue, simply reverse the comparison by swapping the
source operands.
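The operand swap for two negative values can be sketched in scalar C as follows. The names float_bits and neg_float_gt are hypothetical, the signed integer comparison stands in for one lane of PCMPGTD, and IEEE-754 single precision is assumed. The model is valid only when both inputs are negative and neither is a NaN.

```c
#include <stdint.h>
#include <string.h>

/* Raw 32-bit encoding of a float, viewed as a signed integer
   (hypothetical helper; assumes IEEE-754 single precision). */
static int32_t float_bits(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* "a > b" for two NEGATIVE floats using only an integer compare.
   For negative values a larger magnitude has a larger raw encoding,
   so the operands are swapped: a > b exactly when bits(b) > bits(a). */
static int neg_float_gt(float a, float b)
{
    return float_bits(b) > float_bits(a);
}
```

For example, neg_float_gt(-2.0f, -4.0f) is true, matching -2 > -4, even though the raw encoding of -4.0 compares greater than that of -2.0 as an integer.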
22007E/0—November 1999