IA-32 Intel® Architecture Optimization
Example 3-21 Emulation of Conditional Moves (continued)
top_of_loop:
        movq     mm0, [A + eax]
        pcmpgtw  mm0, [B + eax]        ; Create compare mask
        movq     mm1, [D + eax]
        pand     mm1, mm0              ; Drop elements where A<=B
        pandn    mm0, [E + eax]        ; Drop elements where A>B
        por      mm0, mm1              ; Merge into a single result
        movq     [C + eax], mm0
        add      eax, 8
        cmp      eax, MAX_ELEMENT*2
        jle      top_of_loop

This technique applies to both SIMD integer and SIMD floating-point code.

If there are multiple consumers of an instance of a register, group the
consumers together as closely as possible. However, do not schedule the
consumers near the producer.

SIMD Optimizations and Microarchitectures

Pentium M, Intel Core Solo, and Intel Core Duo processors use a different
microarchitecture from the Intel NetBurst® microarchitecture. The
following subsection discusses optimizing SIMD code for Intel Core Solo
and Intel Core Duo processors.

The register-register variants of the following instructions perform
better on Intel Core Solo and Intel Core Duo processors than on Pentium M
processors, because they decode into two micro-ops instead of three:
unpcklps, unpckhps, packsswb, packuswb, packssdw, pshufd, shufps, and
shufpd.
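The element-by-element select that Example 3-21 performs, C = (A > B) ? D : E, can be sketched in C with SSE2 intrinsics. Each intrinsic corresponds to one instruction in the listing: `_mm_cmpgt_epi16` to pcmpgtw, `_mm_and_si128` to pand, `_mm_andnot_si128` to pandn, and `_mm_or_si128` to por. This is a minimal sketch under stated assumptions, not code from this manual: the function name `select_gt_epi16` is hypothetical, it uses 128-bit XMM registers (eight words per iteration) instead of the 64-bit MMX registers of the example, and it assumes all five arrays hold at least `n` elements. A scalar loop using the same mask trick handles leftover elements and takes over entirely when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

/* Use SSE2 when the compiler targets it (always the case on x86-64);
 * otherwise only the scalar loop below runs. */
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SELECT_USE_SSE2 1
#endif

/* Hypothetical helper: c[i] = (a[i] > b[i]) ? d[i] : e[i] for n signed
 * 16-bit elements, without a data-dependent branch. */
void select_gt_epi16(const int16_t *a, const int16_t *b,
                     const int16_t *d, const int16_t *e,
                     int16_t *c, size_t n)
{
    size_t i = 0;
#ifdef SELECT_USE_SSE2
    for (; i + 8 <= n; i += 8) {  /* 8 words per 128-bit XMM register */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vd = _mm_loadu_si128((const __m128i *)(d + i));
        __m128i ve = _mm_loadu_si128((const __m128i *)(e + i));
        /* pcmpgtw: 0xFFFF in lanes where a > b (signed), 0 elsewhere */
        __m128i mask = _mm_cmpgt_epi16(va, vb);
        /* pand: keep d only where a > b */
        __m128i keep_d = _mm_and_si128(mask, vd);
        /* pandn: (~mask) & e keeps e only where a <= b */
        __m128i keep_e = _mm_andnot_si128(mask, ve);
        /* por: merge the two partial results and store */
        _mm_storeu_si128((__m128i *)(c + i), _mm_or_si128(keep_d, keep_e));
    }
#endif
    /* Scalar form of the same mask trick for the remaining elements */
    for (; i < n; i++) {
        int16_t mask = (int16_t)-(a[i] > b[i]);  /* -1 (all ones) or 0 */
        c[i] = (int16_t)((d[i] & mask) | (e[i] & ~mask));
    }
}
```

For the floating-point form of the same pattern, the corresponding SSE intrinsics are `_mm_cmpgt_ps`, `_mm_and_ps`, `_mm_andnot_ps`, and `_mm_or_ps` (cmpps, andps, andnps, orps).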