Intel ARCHITECTURE IA-32 Reference Manual page 9

Architecture optimization
Table of Contents

Data Alignment .... 5-4
    Data Arrangement .... 5-4
        Vertical versus Horizontal Computation .... 5-5
        Data Swizzling .... 5-9
        Data Deswizzling .... 5-14
        Using MMX Technology Code for Copy or Shuffling Functions .... 5-17
        Horizontal ADD Using SSE .... 5-18
Use of cvttps2pi/cvttss2si Instructions .... 5-21
Flush-to-Zero and Denormals-are-Zero Modes .... 5-22
SIMD Floating-point Programming Using SSE3 .... 5-22
    SSE3 and Complex Arithmetics .... 5-23
    SSE3 and Horizontal Computation .... 5-26
    SIMD Optimizations and Microarchitectures .... 5-27
        Packed Floating-Point Performance .... 5-27
General Prefetch Coding Guidelines .... 6-2
Hardware Prefetching of Data .... 6-4
Prefetch and Cacheability Instructions .... 6-5
Prefetch .... 6-6
    Software Data Prefetch .... 6-6
    Prefetch and Load Instructions .... 6-8
Cacheability Control .... 6-9
    The Non-temporal Store Instructions .... 6-10
        Fencing .... 6-10
        Streaming Non-temporal Stores .... 6-10
        Memory Type and Non-temporal Stores .... 6-11
        Write-Combining .... 6-12
    Streaming Store Usage Models .... 6-13
        Coherent Requests .... 6-13
        Non-coherent requests .... 6-13
    Streaming Store Instruction Descriptions .... 6-14
    The fence Instructions .... 6-15
        The sfence Instruction .... 6-15
        The lfence Instruction .... 6-16
        The mfence Instruction .... 6-16
    The clflush Instruction .... 6-17
Memory Optimization Using Prefetch .... 6-18
    Software-controlled Prefetch .... 6-18