Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems
A.3 Why Is the No Crossfire Case Slower Than the Crossfire Case on a System under a Very High Background Load (Full Subscription)?
When the threads are firing at each other (crossfire) and all other free cores are running background
threads at very high load, the system sees the following traffic pattern, where each node receives
memory requests from the threads as described:
Node 0: 1 background thread and 1 foreground thread.
Node 1: 1 background thread and 1 foreground thread.
Node 3: 2 background threads.
Node 2: 2 background threads.
In the no crossfire case, the system sees the following traffic pattern:
Node 0: 1 background thread.
Node 1: 1 background thread and 1 foreground thread.
Node 3: 2 background threads and 1 foreground thread.
Node 2: 2 background threads.
The no crossfire case suffers from a greater load imbalance than the crossfire case, and node 3
bears the worst of this imbalance.
Remember that each background thread requests data at a rate of 4 GB/s and each
foreground thread requests data at a rate of 2.98 GB/s.
Measurements show a total memory access rate of 4.5 GB/s on node 3. Several buffer queues on
node 3 are saturated and cannot absorb the data provided by the memory controller any faster.
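The imbalance can be made concrete by summing the requested bandwidth per node from the two traffic patterns listed above. The following is a minimal sketch, not part of the original guide: the names `requested_load` and `imbalance` are hypothetical, and it assumes that every thread's full request rate lands on the node shown in the lists. It models demand only, not the throughput the memory system actually delivers.

```python
# Requested bandwidth per thread in GB/s, as stated in the text.
BACKGROUND_GBPS = 4.0
FOREGROUND_GBPS = 2.98

# node -> (background threads, foreground threads) sending requests to it.
# Both patterns are copied from the lists in this section.
CROSSFIRE = {0: (1, 1), 1: (1, 1), 2: (2, 0), 3: (2, 0)}
NO_CROSSFIRE = {0: (1, 0), 1: (1, 1), 2: (2, 0), 3: (2, 1)}


def requested_load(pattern):
    """Return the requested bandwidth (GB/s) per node for a traffic pattern."""
    return {
        node: round(bg * BACKGROUND_GBPS + fg * FOREGROUND_GBPS, 2)
        for node, (bg, fg) in pattern.items()
    }


def imbalance(pattern):
    """Return the gap (GB/s) between the most and least loaded nodes."""
    loads = requested_load(pattern)
    return round(max(loads.values()) - min(loads.values()), 2)


if __name__ == "__main__":
    for name, pattern in (("crossfire", CROSSFIRE), ("no crossfire", NO_CROSSFIRE)):
        print(name, requested_load(pattern), "imbalance:", imbalance(pattern))
```

Under these assumptions, the requested load on node 3 in the no crossfire case comes to 10.98 GB/s, well above the 4.5 GB/s reported for that node, which is consistent with its buffer queues being saturated. The gap between the most and least loaded nodes is also much wider in the no crossfire case than in the crossfire case.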
A.4 Why Is the 0 Hop-0 Hop Case Slower Than the 0 Hop-1 Hop Case on an Idle System for Write-Only Threads?
When both write-only threads running on different cores of node 0 access data locally
(0 hop-0 hop), significant demands are placed on the local memory on node 0.
Measurements show a total memory access rate of 4.5 GB/s on node 0. The memory on node 0
is running at full capacity and cannot service requests for data any faster. Several buffer queues on
node 0 are saturated, waiting for the memory requests to be serviced.
40555 Rev. 3.00 June 2006
Appendix A
