Take Advantage Of Write Combining; Use 3Dnow! Instructions; Avoid Branches Dependent On Random Data - AMD Athlon Processor x86 Optimization Manual


AMD Athlon™ Processor x86 Code Optimization
Avoid Load-Execute Floating-Point Instructions with Integer Operands

Take Advantage of Write Combining

Use 3DNow!™ Instructions

Avoid Branches Dependent on Random Data

Do not use load-execute floating-point instructions with integer
operands. The floating-point load-execute instructions with
integer operands are VectorPath and generate two OPs in a
cycle, while the discrete equivalent enables a third DirectPath
instruction to be decoded in the same cycle.
This guideline applies only to operating system, device driver,
and BIOS programmers. In order to improve system
performance, the AMD Athlon processor aggressively combines
multiple memory-write cycles of any data size that address
locations within a 64-byte cache line aligned write buffer.
See Appendix C, "Implementation of Write Combining" on
page 155 for more details.
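
As a rough illustration of what "within a 64-byte cache line aligned write buffer" means in terms of addresses, the following C sketch computes which 64-byte aligned line a write address belongs to. The names WC_LINE_BYTES, wc_line_base, and same_wc_line are hypothetical and introduced here only for illustration. The sketch shows the address arithmetic only; whether the processor actually combines two writes depends on the memory type and the conditions described in Appendix C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the write buffer covers one 64-byte aligned line, as the text states. */
#define WC_LINE_BYTES 64u

/* Hypothetical helper: base address of the 64-byte aligned line containing addr. */
static uintptr_t wc_line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(WC_LINE_BYTES - 1u);
}

/* Hypothetical helper: true when two write addresses fall within the same
   64-byte aligned line, which makes them candidates for combining. */
static bool same_wc_line(uintptr_t a, uintptr_t b)
{
    return wc_line_base(a) == wc_line_base(b);
}
```

For example, addresses 0x1000 and 0x103F share a line, while 0x103F and 0x1040 do not.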
Unless accuracy requirements dictate otherwise, perform
floating-point computations using the 3DNow! instructions
instead of x87 instructions. Because each 3DNow! instruction
operates on two packed single-precision values, 3DNow! code can
achieve twice the number of FLOPs that are achieved through x87
instructions. 3DNow! instructions also
provide for a flat register file instead of the stack-based
approach of x87 instructions.
See Table 23 on page 217 for a list of 3DNow! instructions. For
information about instruction usage, see the 3DNow!™
Technology Manual, order# 21928.
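
The source of the doubled throughput can be sketched in plain C. The model below is not 3DNow! code and uses no intrinsics; it only mimics the semantics of packed operations in the style of PFADD and PFMUL, where one instruction processes two single-precision values held in a flat (non-stack) register. The type packed2f and the functions packed_add and packed_mul are hypothetical names chosen for this illustration, and a compiler is not guaranteed to emit 3DNow! instructions for them.

```c
/* Hypothetical model of one 64-bit 3DNow! register:
   two packed single-precision values. */
typedef struct {
    float lo;
    float hi;
} packed2f;

/* Models a packed add (PFADD-style): both lanes are added in one operation. */
static packed2f packed_add(packed2f a, packed2f b)
{
    packed2f r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Models a packed multiply (PFMUL-style): both lanes are multiplied in one operation. */
static packed2f packed_mul(packed2f a, packed2f b)
{
    packed2f r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}
```

Each call stands in for a single packed instruction that performs two floating-point operations, whereas the equivalent x87 code needs one instruction per value.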
Avoid data-dependent branches around a single instruction.
Data-dependent branches acting upon essentially random data
can cause the branch prediction logic to mispredict the branch
about 50% of the time. Where possible, design branch-free
alternative code sequences, which result in shorter average
execution time.
See "Avoid Branches Dependent on Random Data" on page 57
for more details.
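
A minimal C sketch of the idea follows: two small operations written without a conditional jump in the source, using a mask that is either all zeros or all ones. The function names abs_branchless and min_branchless are hypothetical, and these are not necessarily the sequences the manual gives on page 57. The compiler still decides which instructions to emit, so inspect the generated code before relying on it.

```c
#include <stdint.h>

/* Branch-free absolute value. Returns an unsigned result so that
   INT32_MIN maps to 2147483648 without signed overflow. */
static uint32_t abs_branchless(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    /* mask is 0xFFFFFFFF when x is negative, 0 otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(x < 0);
    /* For negative x this computes (~ux) + 1, the two's complement negation */
    return (ux ^ mask) - mask;
}

/* Branch-free unsigned minimum. */
static uint32_t min_branchless(uint32_t a, uint32_t b)
{
    /* mask is 0xFFFFFFFF when a < b, 0 otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(a < b);
    /* Selects a when a < b (b + (a - b)), otherwise b (b + 0) */
    return b + ((a - b) & mask);
}
```

Both functions replace a branch around a single instruction (a negate or a move) with a short sequence of arithmetic whose cost does not depend on how predictable the data is.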
Group II Optimizations—Secondary Optimizations
22007E/0—November 1999
