IA-32 Intel® Architecture Optimization
Example 3-21 Emulation of Conditional Moves (continued)
top_of_loop:
    movq    mm0, [A + eax]
    pcmpgtw mm0, [B + eax]      ; Create compare mask
    movq    mm1, [D + eax]
    pand    mm1, mm0            ; Drop elements where A<B
    pandn   mm0, [E + eax]      ; Drop elements where A>B
    por     mm0, mm1            ; Create single word
    movq    [C + eax], mm0
    add     eax, 8
    cmp     eax, MAX_ELEMENT*2
    jle     top_of_loop

Note that this technique can be applied to both SIMD integer and SIMD
floating-point code.

If there are multiple consumers of an instance of a register, group the
consumers together as closely as possible. However, the consumers
should not be scheduled near the producer.

SIMD Optimizations and Microarchitectures

Pentium M, Intel Core Solo and Intel Core Duo processors have a
different microarchitecture than Intel NetBurst® microarchitecture. The
following sub-section discusses optimizing SIMD code targeting Intel
Core Solo and Intel Core Duo processors.

The register-register variant of the following instructions has improved
performance on Intel Core Solo and Intel Core Duo processors relative to
Pentium M processors. This is because the instructions consist of two
micro-ops instead of three. Relevant instructions are: unpcklps,
unpckhps, packsswb, packuswb, packssdw, pshufd, shuffps and shuffpd.
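As a companion to Example 3-21, the compare-mask selection can be modeled one element at a time in portable C. This is only an illustrative scalar sketch, not code from the manual: the function name select_gt_u16 and its parameters are hypothetical, and it assumes the loop computes C = (A > B) ? D : E on 16-bit words, with A and B compared as signed values as pcmpgtw does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: c[i] = (a[i] > b[i]) ? d[i] : e[i],
 * computed with a mask instead of a branch in the select step. */
static void select_gt_u16(const int16_t *a, const int16_t *b,
                          const uint16_t *d, const uint16_t *e,
                          uint16_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* 0xFFFF when a[i] > b[i], otherwise 0x0000
         * (the per-word result that pcmpgtw produces). */
        uint16_t mask = (uint16_t)-(a[i] > b[i]);

        uint16_t from_d = (uint16_t)(d[i] & mask);             /* pand  */
        uint16_t from_e = (uint16_t)(e[i] & (uint16_t)~mask);  /* pandn */

        c[i] = (uint16_t)(from_d | from_e);                    /* por   */
    }
}
```

The MMX loop applies the same three logical steps to four words at once. The scalar form is meant only to make the role of the mask visible, since a compiler may or may not turn it into SIMD code.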