Select DirectPath Over VectorPath Instructions; Load-Execute Instruction Usage; Use Load-Execute Integer Instructions - AMD Athlon Processor x86 Optimization Manual


AMD Athlon™ Processor x86 Code Optimization

Select DirectPath Over VectorPath Instructions

TOP

Load-Execute Instruction Usage

Use Load-Execute Integer Instructions

TOP
Use DirectPath instructions rather than VectorPath
instructions. DirectPath instructions are optimized for efficient
decode and execution by minimizing the number of operations
per x86 instruction. They include 'register ← register op
memory' as well as 'register ← register op register' forms of
instructions. Up to three DirectPath instructions can be
decoded per cycle. VectorPath instructions block the
decoding of DirectPath instructions.
The vast majority of instructions used by a compiler have
been implemented as DirectPath instructions in the
AMD Athlon processor. Assembly writers must still take into
consideration the usage of DirectPath versus VectorPath
instructions.
See Appendix F, "Instruction Dispatch and Execution
Resources" on page 187 and Appendix G, "DirectPath versus
VectorPath Instructions" on page 219 for tables of DirectPath
and VectorPath instructions.
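
The decode rule stated above (up to three DirectPath instructions per cycle, with VectorPath instructions blocking DirectPath decode) can be sketched as a toy cycle counter. This is a rough illustration only, not a description of the actual decoder: it assumes that a VectorPath instruction occupies a decode cycle by itself, and it ignores alignment, instruction length, and every other pipeline effect. The names `decode_kind` and `decode_cycles` are hypothetical and do not come from the manual.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for this sketch; the real classification of each
 * instruction comes from the manual's DirectPath/VectorPath tables. */
typedef enum { DIRECT_PATH, VECTOR_PATH } decode_kind;

/*
 * Toy model: estimate decode cycles for an instruction stream.
 * Assumptions (simplifications, not hardware-exact):
 *   - up to three consecutive DirectPath instructions share one cycle;
 *   - a VectorPath instruction takes a decode cycle by itself, so no
 *     DirectPath instruction is decoded in that cycle.
 */
static size_t decode_cycles(const decode_kind *stream, size_t n)
{
    assert(stream != NULL || n == 0);

    size_t cycles = 0;
    size_t i = 0;

    while (i < n) {
        if (stream[i] == VECTOR_PATH) {
            i++;                      /* VectorPath decodes alone */
        } else {
            size_t group = 0;         /* pack up to three DirectPath */
            while (i < n && stream[i] == DIRECT_PATH && group < 3) {
                i++;
                group++;
            }
        }
        cycles++;
    }
    return cycles;
}
```

Under these assumptions, six DirectPath instructions decode in two cycles, while the stream DirectPath, VectorPath, DirectPath, DirectPath, DirectPath, VectorPath needs four.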

Most load-execute integer instructions are DirectPath
decodable and can be decoded at the rate of three per cycle.
Splitting a load-execute integer instruction into two separate
instructions (a load instruction and a "reg, reg" instruction)
reduces decoding bandwidth and increases register pressure,
which results in lower performance. The split-instruction form
can still be useful to avoid scheduler stalls for longer-executing
instructions and to explicitly schedule the load and execute
operations.
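
To make the two forms concrete, the sketch below contrasts a load-execute `ADD` with its split equivalent. It assumes GCC or Clang extended inline assembly (AT&T syntax) on an x86 target and falls back to plain C elsewhere, where the compiler chooses the encoding. The function names `add_load_execute` and `add_split` and the macro `SKETCH_X86_ASM` are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Use inline assembly only where it is known to apply (assumption:
 * GCC/Clang extended asm on a 32-bit or 64-bit x86 target). */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#define SKETCH_X86_ASM 1
#else
#define SKETCH_X86_ASM 0
#endif

/* Load-execute form: a single instruction, ADD reg, mem. */
static int add_load_execute(int acc, const int *p)
{
    assert(p != NULL);
#if SKETCH_X86_ASM
    __asm__("addl %1, %0" : "+r"(acc) : "m"(*p) : "cc");
#else
    acc += *p;   /* portable fallback; encoding is up to the compiler */
#endif
    return acc;
}

/* Split form: MOV tmp, mem followed by ADD reg, tmp.
 * Two instructions to decode and one extra register (tmp). */
static int add_split(int acc, const int *p)
{
    assert(p != NULL);
#if SKETCH_X86_ASM
    int tmp;
    __asm__("movl %2, %1\n\t"
            "addl %1, %0"
            : "+r"(acc), "=&r"(tmp)
            : "m"(*p)
            : "cc");
#else
    acc += *p;   /* portable fallback; encoding is up to the compiler */
#endif
    return acc;
}
```

Both functions return the same value. The difference is in what the decoder sees: the split form costs an extra instruction and an extra register, which is why the text reserves it for cases where the load needs to be scheduled explicitly.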
22007E/0—November 1999