Optimize Instruction Scheduling; Enable Vectorization - Intel ARCHITECTURE IA-32 Reference Manual

Architecture optimization

Avoid longer latency instructions: integer multiplies and divides.
Replace them with alternate code sequences (e.g., use shifts instead
of multiplies).
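Use the lea instruction and the full range of addressing modes to do
address calculation.
Some types of stores use more µops than others; try to use simpler
store variants and/or reduce the number of stores.
Avoid use of complex instructions that require more than 4 µops.
Avoid instructions that unnecessarily introduce dependence-related
stalls: inc and dec instructions, partial register operations (8/16-bit
operands).
Avoid use of ah, bh, and other higher 8 bits of the 16-bit registers,
because accessing them requires a shift operation internally.
Use xor and pxor instructions to clear registers and break
dependencies for integer operations; also use xorps and xorpd to
clear XMM registers for floating-point operations.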
Use efficient approaches for performing comparisons.
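
The first two guidelines above can be illustrated with a small C sketch.
The function names (times8, times5, scaled_offset) are hypothetical and
chosen only for this example. Optimizing compilers typically perform
these rewrites on their own, so this shows the shape of the
transformation rather than something that must be done by hand.

```c
#include <stdint.h>

/* x * 8 written as one shift instead of an integer multiply. */
uint32_t times8(uint32_t x)
{
    return x << 3;
}

/* x * 5 written as x + x*4, the base + index*scale form that a
   single lea can compute. */
uint32_t times5(uint32_t x)
{
    return x + (x << 2);
}

/* base + index*4 + 16: base, scaled index, and displacement together
   match one addressing-mode calculation. */
uint32_t scaled_offset(uint32_t base, uint32_t index)
{
    return base + (index << 2) + 16u;
}
```

Unsigned types are used so that the shifted forms agree with
multiplication even when the result wraps around.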

Optimize Instruction Scheduling

Consider latencies and resource constraints.
Calculate store addresses as early as possible.
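
As a rough illustration of calculating store addresses early, the
sketch below forms the destination address before starting the
longer-latency value computation. The helper scatter_square and its
parameters are assumptions made for this example. A compiler and the
out-of-order core are both free to reorder these operations, so the
source order is only a hint, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: stores src[i]*src[i] + bias into dst[idx[i]]. */
void scatter_square(float *dst, const size_t *idx, const float *src,
                    size_t n, float bias)
{
    for (size_t i = 0; i < n; i++) {
        /* Store address first: it depends only on idx[i]. */
        float *slot = &dst[idx[i]];

        /* Value computation, which takes longer to complete. */
        float v = src[i] * src[i] + bias;

        /* The address is already known when the value is ready. */
        *slot = v;
    }
}
```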

Enable Vectorization

Use the smallest possible data type. This enables more parallelism
with the use of a longer vector.
Arrange the nesting of loops so the innermost nesting level is free of
inter-iteration dependencies. It is especially important to avoid the
case where the store of data in an earlier iteration happens lexically
after the load of that data in a future iteration (called
lexically-backward dependence).
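
Both points can be sketched in C. The array shape and the name
column_prefix_sum are assumptions made for this example. Each row
depends on the previous row, so the dependence is carried by the outer
loop over i, which leaves the inner loop over j free of inter-iteration
dependencies. The 16-bit element type means a 128-bit XMM register can
hold eight elements, compared with four for 32-bit elements.

```c
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 4, COLS = 8 };

/* Adds each row into the row below it, in place.
   Row i reads row i-1, so only the outer loop carries a dependence;
   the inner iterations over j are independent of one another. */
void column_prefix_sum(int16_t a[ROWS][COLS])
{
    for (size_t i = 1; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            a[i][j] = (int16_t)(a[i][j] + a[i - 1][j]);
        }
    }
}
```

Swapping the two loops would put the row-to-row dependence in the
innermost loop, which is the arrangement the guideline advises against.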