Figure 11. Both Write-Only Threads Running on Node 0 (Different Cores) on an Idle System - AMD Athlon 64 Manual

Performance guidelines for multiprocessor systems

40555 Rev. 3.00 June 2006
However, as shown in Figure 11 on page 31, when both threads are write-only, the 0 hop-1 hop and
0 hop-2 hop cases are faster than the 0 hop-0 hop case.
Figure 11. Both Write-Only Threads Running on Node 0 (Different Cores) on an Idle
System
When a single thread reads locally, it generates a memory bandwidth load of 1.64 GB/s. Assuming a
sustained memory bandwidth of 70% of the theoretical maximum of 6.4 GB/s (PC3200 DDR
memory), or about 4.48 GB/s, the cumulative bandwidth demanded by two read-only threads does not
exceed the sustained memory bandwidth on that node, and hence the local or 0 hop-0 hop case is the
fastest.
However, when a single thread writes locally, it generates a memory bandwidth load of 2.98 GB/s.
This is because each write in this test case results in a cache line eviction and thus generates twice the
memory traffic generated by a read. The cumulative memory bandwidth demanded by two write-only
threads now exceeds the sustained memory bandwidth on that node. The 0 hop-0 hop case now incurs
the penalty of saturating the memory bandwidth on that node. For detailed analysis, refer to Section
A.4 on page 42.
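The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the constant names and the helper `saturates_node` are illustrative and do not come from the manual.

```python
# Figures quoted in the text; the names are illustrative assumptions.
THEORETICAL_PEAK_GBS = 6.4   # PC3200 DDR theoretical maximum per node
SUSTAINED_FRACTION = 0.70    # assumed sustainable share of the peak
READ_THREAD_GBS = 1.64       # load of one thread reading local memory
WRITE_THREAD_GBS = 2.98      # load of one thread writing local memory


def sustained_bandwidth() -> float:
    """Sustained bandwidth of one node in GB/s under the 70% assumption."""
    return THEORETICAL_PEAK_GBS * SUSTAINED_FRACTION


def saturates_node(per_thread_gbs: float, threads: int = 2) -> bool:
    """Return True if the combined demand exceeds the sustained bandwidth."""
    return per_thread_gbs * threads > sustained_bandwidth()


if __name__ == "__main__":
    print(f"sustained bandwidth: {sustained_bandwidth():.2f} GB/s")
    for name, load in (("read", READ_THREAD_GBS), ("write", WRITE_THREAD_GBS)):
        total = 2 * load
        state = "saturates" if saturates_node(load) else "fits within"
        print(f"two {name}-only threads: {total:.2f} GB/s, {state} the node")
```

Under these numbers, two read-only threads stay below the sustained figure while two write-only threads exceed it, which matches the explanation given for the slower 0 hop-0 hop write case.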
It is useful to study whether this observation is also applicable under a variable background load.
One would expect that, if the memory bandwidth demanded of the remote node were increased, at
some point the 0 hop-1 hop case would become as slow as, and perhaps slower than, the
0 hop-0 hop case for the write-only threads.
The same two write-only threads as before are running on node 0, going through the following cases:
Both threads access local memory.
First thread accesses local memory and second thread accesses memory that is remote by one hop.
First thread accesses local memory and second thread accesses memory that is remote by two hops.
[Figure 11 chart: "Total Time for both threads (write-write)". Vertical axis from 0 to 1.8. Bars, labeled by thread pair and hop distance of each thread's memory: 0.0.w.0 0.1.w.0 (0 hops, 0 hops); 0.0.w.0 0.1.w.1 (0 hops, 1 hop); 0.0.w.0 0.1.w.2 (0 hops, 1 hop); 0.0.w.0 0.1.w.3 (0 hops, 2 hops). Bar value labels (assignment to bars not recoverable): 147%, 136%, 126%, 125%.]
Chapter 3: Analysis and Recommendations
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems
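The thread labels in Figure 11 (for example 0.1.w.3) can be decoded mechanically. The sketch below assumes, based only on how the labels pair with the hop counts in the figure, that the four fields are the node the thread runs on, the core, the access type, and the node holding the memory. It also assumes a four-node square topology in which node 0 links to nodes 1 and 2, and node 3 links to nodes 1 and 2. Both the field meanings and the `LINKS` table are assumptions made for illustration, not definitions taken from this page.

```python
from collections import deque

# Assumed 4-node square topology: each node links to two neighbours.
LINKS = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}


def hops(src: int, dst: int) -> int:
    """Shortest number of links between two nodes (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in LINKS[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"node {dst} is not reachable from node {src}")


def parse_label(label: str) -> tuple[int, int, str, int]:
    """Split an assumed 'node.core.access.memnode' label into its fields."""
    node, core, access, mem_node = label.split(".")
    return int(node), int(core), access, int(mem_node)


def label_hops(label: str) -> int:
    """Hop distance between a thread's node and the node holding its memory."""
    node, _core, _access, mem_node = parse_label(label)
    return hops(node, mem_node)


if __name__ == "__main__":
    for label in ("0.1.w.0", "0.1.w.1", "0.1.w.2", "0.1.w.3"):
        print(label, "->", label_hops(label), "hop(s)")
```

With this assumed topology, the second thread's memory on nodes 1 and 2 is one hop away and on node 3 is two hops away, which agrees with the hop counts printed next to the labels in the figure.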
