Scheduling Optimizations; Schedule Instructions According To Their Latency; Unrolling Loops; Complete Loop Unrolling - AMD Athlon Processor x86 Optimization Manual


22007E/0—November 1999


Scheduling Optimizations

This chapter describes how to code instructions for efficient
scheduling. Guidelines are listed in order of importance.

Schedule Instructions According to their Latency

The AMD Athlon™ processor can execute up to three x86
instructions per cycle, with each x86 instruction possibly having
a different latency. The AMD Athlon processor has flexible
scheduling, but for absolute maximum performance, schedule
instructions, especially FPU and 3DNow!™ instructions,
according to their latency. Dependent instructions will then not
have to wait on instructions with longer latencies.

See Appendix F, "Instruction Dispatch and Execution
Resources" on page 187 for a list of latency numbers.
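
The idea of keeping dependent instructions from waiting on
long-latency predecessors can be sketched in C. This is only an
illustration, not code from this manual: the function names
sum_serial and sum_interleaved are hypothetical, and a compiler
is free to reschedule either version. The first function forms
one chain in which every floating-point add depends on the
previous one; the second interleaves two independent chains so
that an add from one chain can issue while the other is still in
flight.

```c
#include <stddef.h>

/* One dependency chain: each add must wait for the result of the
   previous add before it can start. (Hypothetical example name.) */
float sum_serial(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent chains, interleaved: the add into s1 does not
   depend on the add into s0, so the two can overlap.
   (Hypothetical example name.) */
float sum_interleaved(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i];      /* chain 0 */
        s1 += a[i + 1];  /* chain 1, independent of chain 0 */
    }
    if (i < n)           /* leftover element when n is odd */
        s0 += a[i];

    return s0 + s1;      /* combine the chains once, at the end */
}
```

Because floating-point addition is not associative, the two
functions can differ in rounding for general inputs; they agree
exactly only when every partial sum is exactly representable.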

Unrolling Loops

Complete Loop Unrolling

Make use of the AMD Athlon processor's large 64-Kbyte
instruction cache and unroll loops to get more parallelism and
reduce loop overhead, even with branch prediction.
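
As a minimal sketch of unrolling a loop whose trip count is small
and known at compile time, the following C fragment shows a
rolled loop and a completely unrolled equivalent. The function
names scale4_rolled and scale4_unrolled and the four-element
array are assumptions made for illustration; they do not come
from this manual.

```c
/* Rolled form: a counter update, a compare, and a branch are
   executed on every iteration. (Hypothetical example name.) */
void scale4_rolled(float v[4], float k)
{
    for (int i = 0; i < 4; i++)
        v[i] *= k;
}

/* Completely unrolled form: the loop control is gone and the body
   is replicated once per element. The four multiplies are
   independent of each other. (Hypothetical example name.) */
void scale4_unrolled(float v[4], float k)
{
    v[0] *= k;
    v[1] *= k;
    v[2] *= k;
    v[3] *= k;
}
```

The unrolled form trades code size for the removed loop overhead,
which is the trade the large instruction cache is meant to make
affordable. An optimizing compiler may perform this
transformation on its own.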
