Chapter 5: Optimizing for SIMD Floating-Point Applications; General Rules for SIMD Floating-Point Code - Intel ARCHITECTURE IA-32 Reference Manual


Optimizing for SIMD Floating-point Applications

This chapter discusses general rules of optimizing for the
single-instruction, multiple-data (SIMD) floating-point instructions
available in Streaming SIMD Extensions (SSE), Streaming SIMD
Extensions 2 (SSE2), and Streaming SIMD Extensions 3 (SSE3). This
chapter also provides examples that illustrate the optimization
techniques for single-precision and double-precision SIMD
floating-point applications.

General Rules for SIMD Floating-point Code

The rules and suggestions listed in this section help optimize
floating-point code containing SIMD floating-point instructions.
Generally, it is important to understand and balance port utilization to
create efficient SIMD floating-point code. The basic rules and
suggestions include the following:
- Follow all guidelines in Chapter 2 and Chapter 3.
- Mask exceptions to achieve higher performance. Software runs
  slower when exceptions are unmasked.
- Use the flush-to-zero and denormals-are-zero modes to avoid the
  performance penalty of handling denormals and underflows.
- Incorporate the prefetch instruction where appropriate (for details,
  refer to Chapter 6, "Optimizing Cache Usage").
- Use MMX technology instructions and registers if the computations
  can be done in SIMD integer for shuffling data.
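
The exception-masking and flush-to-zero/denormals-are-zero suggestions above both come down to setting bits in the MXCSR control register. The following is a minimal sketch in C using the compiler intrinsic macros from `xmmintrin.h` and `pmmintrin.h`, not code taken from the manual. The helper names `simd_fp_fast_mode` and `double_a_denormal` are hypothetical, and the sketch assumes an x86 target whose processor supports the denormals-are-zero bit.

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_mul_ss, MXCSR exception and FTZ macros */
#include <pmmintrin.h>  /* defines _MM_SET_DENORMALS_ZERO_MODE and related macros */

/* Hypothetical helper: configure MXCSR for the calling thread.
 * MXCSR is per-thread state, so each worker thread must call this. */
static void simd_fp_fast_mode(void)
{
    /* Mask all six SIMD floating-point exceptions. */
    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);
    /* Flush-to-zero: results that would be denormal are returned as zero. */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    /* Denormals-are-zero: denormal source operands are treated as zero. */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

/* Hypothetical check: doubles a denormal value with a scalar SSE multiply.
 * The volatile load keeps the compiler from folding the product at
 * compile time, so the multiply runs under the current MXCSR settings. */
static float double_a_denormal(void)
{
    volatile float tiny = 1e-40f;  /* below the smallest normal float */
    __m128 product = _mm_mul_ss(_mm_set_ss(tiny), _mm_set_ss(2.0f));
    return _mm_cvtss_f32(product);
}
```

With these modes enabled, `double_a_denormal()` is expected to return zero, because the denormal input is read as zero. These modes trade strict IEEE 754 behavior for speed, so they suit code that can tolerate losing very small values.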
