Intel ARCHITECTURE IA-32 Reference Manual page 10

Architecture optimization
Table of Contents

Hardware Prefetch ......................................................................................................... 6-19
Example of Effective Latency Reduction with H/W Prefetch .......................................... 6-20
Example of Latency Hiding with S/W Prefetch Instruction ............................................ 6-22
Software Prefetching Usage Checklist ........................................................................... 6-24
Software Prefetch Scheduling Distance ......................................................................... 6-25
Software Prefetch Concatenation ................................................................................... 6-26
Minimize Number of Software Prefetches ...................................................................... 6-29
Mix Software Prefetch with Computation Instructions .................................................... 6-32
Software Prefetch and Cache Blocking Techniques ....................................................... 6-34
Hardware Prefetching and Cache Blocking Techniques ................................................ 6-39
Single-pass versus Multi-pass Execution ....................................................................... 6-41
Memory Optimization using Non-Temporal Stores ................................................................ 6-43
Non-temporal Stores and Software Write-Combining ..................................................... 6-43
Cache Management ....................................................................................................... 6-44
Video Encoder .......................................................................................................... 6-45
Video Decoder .......................................................................................................... 6-45
Optimizing Memory Copy Routines .......................................................................... 6-46
TLB Priming .............................................................................................................. 6-47
Using the 8-byte Streaming Stores and Software Prefetch....................................... 6-48
Using 16-byte Streaming Stores and Hardware Prefetch ......................................... 6-50
Performance Comparisons of Memory Copy Routines ............................................ 6-52
Deterministic Cache Parameters .......................................................................................... 6-53
Cache Sharing Using Deterministic Cache Parameters ................................................. 6-55
Cache Sharing in Single-core or Multi-core .................................................................... 6-55
Chapter 7
Performance and Usage Models............................................................................................. 7-2
Multithreading ................................................................................................................... 7-2
Multitasking Environment ................................................................................................. 7-4
Programming Models and Multithreading ............................................................................... 7-6
Parallel Programming Models .......................................................................................... 7-7
Domain Decomposition ............................................................................................... 7-7
Functional Decomposition ................................................................................................ 7-8
Specialized Programming Models .................................................................................... 7-8
Producer-Consumer Threading Models .................................................................... 7-10
Tools for Creating Multithreaded Applications ................................................................ 7-14
Optimization Guidelines ........................................................................................................ 7-16
Key Practices of Thread Synchronization ...................................................................... 7-16