Intel ARCHITECTURE IA-32 Reference Manual page 84

Architecture optimization
IA-32 Intel® Architecture Optimization
Intel Core Solo and Intel Core Duo processors have an enhanced front
end that is less sensitive to the 4-1-1 template. The practice has no
real impact on processors based on the Intel NetBurst
microarchitecture.
Dependencies for partial register writes incur large penalties when
using the Pentium M processor (this applies to processors with
CPUID signature family 6, model 9). On Pentium 4 and Intel Xeon
processors, the Pentium M processor (with CPUID signature family 6,
model 13), and Intel Core Solo and Intel Core Duo processors, such
penalties are resolved by artificial dependencies between each
partial register write. To avoid false dependences from partial
register updates, use full register updates and extended moves.
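The advice on full register updates can only be approximated at the source level, because C does not expose registers. The sketch below is a rough analogue, not the manual's own example: each byte is widened to a full 32-bit value before it is used, which on IA-32 compilers commonly (but not necessarily) becomes a zero-extending move such as movzx rather than a write to an 8-bit partial register. The function name sum_bytes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum a byte array with full-width operations only.
 * Assumption: widening each byte to uint32_t lets the compiler use a
 * zero-extending load instead of a partial register write. The actual
 * instruction selection depends on the compiler and its options.
 */
uint32_t sum_bytes(const uint8_t *src, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t widened = src[i];   /* zero-extend the byte to 32 bits */
        total += widened;            /* full-width add, no 8-bit merge */
    }
    return total;
}
```

Inspecting the generated assembly is the only reliable way to confirm that no partial register writes remain.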
On Pentium 4 and Intel Xeon processors, some latencies have
increased: shifts, rotates, integer multiplies, and moves from
memory with sign extension are longer than before. Use care when
using the lea instruction. See the section "Use of the lea
Instruction" for recommendations.
The inc and dec instructions should always be avoided. Using add
and sub instructions instead avoids data dependence and improves
performance.
Dependence-breaking support is added for the pxor instruction.
Floating point register stack exchange instructions were free; now
they are slightly more expensive due to issue restrictions.
Writes and reads to the same location should now be spaced apart.
This is especially true for writes that depend on long-latency
instructions.
Hardware prefetching may shorten the effective memory latency for
data and instruction accesses.
Cacheability instructions are available to streamline stores and
manage cache utilization.
Cache lines are 64 bytes (see Table 1-1 and Table 1-3). Because of
this, software prefetching should be done less often. False sharing,
however, can be an issue.
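The false-sharing concern with 64-byte cache lines can be illustrated with a small C11 sketch. It assumes a 64-byte line as stated in the text; the names CACHE_LINE_SIZE, padded_counter, counters, and counter_stride are hypothetical and chosen only for this example. Aligning each per-thread slot to a cache line keeps writes by one thread from invalidating the line that holds another thread's slot.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as stated in the text. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical per-thread counter. Aligning the member to the cache
 * line size also rounds the struct size up to a multiple of 64 bytes,
 * so adjacent array elements never share a line.
 */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) unsigned long value;
};

/* One slot per thread; the count of 4 is arbitrary. */
static struct padded_counter counters[4];

/* Byte distance between two adjacent slots. */
size_t counter_stride(void)
{
    return (size_t)((char *)&counters[1] - (char *)&counters[0]);
}
```

The cost of this layout is memory: each counter occupies a full line, so padding is usually reserved for data that several threads write frequently.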
