seldom reaches 50% of peak retirement bandwidth. Thus, improving
single-thread execution throughput should also benefit multi-threading
performance.
Tuning Suggestion 4. (H Impact, M Generality) Optimize multithreaded
applications to achieve optimal processor scaling with respect to the number of
physical processors or processor cores.
Following guidelines such as reducing thread synchronization costs,
enhancing locality, and conserving bus bandwidth allows the
multi-threading hardware to exploit the task-level parallelism in the
workload and improves MP scaling. In general, reducing the dependence
on resources shared between physical packages will benefit processor
scaling with respect to the number of physical processors. Similarly,
heavy reliance on resources shared between different cores is likely to
reduce processor scaling performance. On the other hand, using shared
resources effectively can deliver a positive benefit to processor scaling,
provided the workload does not saturate the critical resource in contention.
Tuning Suggestion 5. (M Impact, L Generality) Schedule threads that
compete for the same execution resource to separate processor cores.
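As an illustration of Tuning Suggestion 5, the sketch below pins two
compute-intensive threads to different processor cores so that they do not
compete for the same execution resources. It assumes a Linux system with
GNU pthreads; the logical CPU numbers 0 and 2 are placeholders only, since
the mapping of logical CPU numbers to physical cores is system-specific and
should be confirmed (for example, from /proc/cpuinfo) before choosing an
affinity mask.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Placeholder for a thread that is heavy on a particular execution
   resource (for example, the floating-point units). */
static void *worker(void *arg)
{
    volatile double x = 0.0;
    long i;
    (void)arg;
    for (i = 0; i < 100000000L; i++)
        x += (double)i * 0.5;
    return NULL;
}

/* Create a thread bound to the given logical CPU before it starts. */
static int spawn_on_cpu(pthread_t *t, int cpu)
{
    pthread_attr_t attr;
    cpu_set_t set;
    int rc;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    rc = pthread_create(t, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}

int main(void)
{
    pthread_t t0, t1;

    /* Assumed layout: logical CPUs 0 and 2 reside on different
       physical cores; adjust for the actual system topology. */
    if (spawn_on_cpu(&t0, 0) != 0 || spawn_on_cpu(&t1, 2) != 0) {
        fprintf(stderr, "failed to create or bind a thread\n");
        return 1;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}

The same effect can be obtained with the native affinity interfaces of
other operating systems (for example, SetThreadAffinityMask on Windows);
the point is only that the two resource-hungry threads end up on separate
cores.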
Tuning Suggestion 6. (M Impact, L Generality) Use on-chip execution
resources cooperatively if two logical processors are sharing the execution
resources in the same processor core.

Using Shared Execution Resources in a Processor Core

One way to measure the degree of overall resource utilization by a
single thread is to use performance-monitoring events to count the clock
cycles during which a logical processor is executing code and compare that
count to the number of instructions executed to completion. Such
performance metrics are described in Appendix B and can be accessed
using the Intel VTune Performance Analyzer.
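The metrics themselves are defined in terms of IA-32 performance-monitoring
events and are normally collected with the Intel VTune Performance Analyzer,
as noted above. Purely as an illustration of the idea of comparing executed
clock cycles against retired instructions, the sketch below uses the Linux
perf_event_open interface (an assumption outside this manual) with its
generic cycle and instruction events, which approximate, but do not
correspond exactly to, the non-halted cycle and instructions-retired events
described in Appendix B.

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

/* Open one hardware counter for the calling thread. */
static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;            /* cycles or retired instructions */
    attr.disabled = 1;
    attr.exclude_kernel = 1;         /* count user code only */
    /* pid = 0, cpu = -1: follow this thread on whichever CPU runs it. */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Some work whose cycle/instruction ratio is to be measured. */
static void workload(void)
{
    volatile double x = 1.0;
    long i;
    for (i = 0; i < 50000000L; i++)
        x = x * 1.0000001 + 0.5;
}

int main(void)
{
    uint64_t cycles = 0, instructions = 0;
    int cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS);

    if (cyc < 0 || ins < 0) {
        perror("perf_event_open");
        return 1;
    }
    ioctl(cyc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(ins, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(cyc, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(ins, PERF_EVENT_IOC_DISABLE, 0);

    read(cyc, &cycles, sizeof(cycles));
    read(ins, &instructions, sizeof(instructions));
    printf("cycles=%llu  instructions=%llu  cycles/instruction=%.2f\n",
           (unsigned long long)cycles, (unsigned long long)instructions,
           instructions ? (double)cycles / (double)instructions : 0.0);
    return 0;
}

A low ratio of cycles to retired instructions indicates that the thread is
already keeping the core's execution resources busy; a high ratio suggests
headroom that a second logical processor on the same core may be able to
use.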
Event ratios such as non-halted cycles per instruction retired (non-halted
CPI) and non-sleep CPI can be useful in directing code-tuning efforts.
The non-sleep CPI metric can be interpreted as the inverse of the overall