Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™
ccNUMA Multiprocessor Systems
resources approach saturation. The test has two modes: read-only and write-only. When the test
threads are read-only, the throughput does not stress the capacity of the system resources and, thus,
the test is more sensitive to latency. However, when the threads are write-only, there is a heavy
throughput load on the system. This is described in detail in later sections of this document.
Each thread is successively placed on all possible cores in the system. The data (array) accessed by
each thread is also successively placed on all possible nodes in the system. Several Linux application
programming interfaces (APIs) are used to explicitly pin a thread to a specified core and data to a
specified node, thus allowing full control over thread and memory placement. (For additional details
on the Linux APIs, refer to section A.1 on page 39.) Once a thread or data is pinned to a core or node, it
remains resident there for its entire lifetime. Thus the test runs through all permutations of thread and
memory placement possible for the two threads. Since the test does not rely on the OS for thread and
memory placement, the results obtained from the test are independent of the low-level decisions made
by the OS and are thus OS-agnostic.
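As an illustration of explicit thread pinning, the following is a minimal sketch that uses the Linux affinity calls exposed by the Python standard library. It is not the test program described here, and the helper name pin_current_thread is hypothetical. Memory placement is not shown, because the Python standard library has no NUMA memory-binding call; section A.1 remains the reference for the APIs the test actually uses.

```python
import os

def pin_current_thread(core: int) -> set:
    """Restrict the caller to a single core and return the resulting affinity set.

    Hypothetical helper for illustration only. Passing 0 as the first
    argument targets the caller. Available on Linux only.
    """
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)

if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)      # cores the caller may currently use
    target = min(allowed)                  # pick any permitted core
    print("pinned to:", pin_current_thread(target))
    os.sched_setaffinity(0, allowed)       # restore the original affinity mask
```

Once the affinity mask holds a single core, the scheduler can no longer migrate the thread, which is the property the test relies on.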
First, the two-thread experiments are run on an idle system, thereby generating a truth table of 4096
timing entries for the two threads. The results are then mined to evaluate interesting scenarios of
thread and memory placement. Several of these scenarios are presented in various graphs in this
document.
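The enumeration and mining steps can be sketched as follows. The topology here (4 cores on 2 nodes) is a placeholder chosen to keep the example small. It is an assumption, not the tested system, and it does not reproduce the 4096-entry table. The names NODE_OF_CORE, all_placements, and both_local are hypothetical.

```python
from itertools import product

# Hypothetical topology: 4 cores on 2 nodes (placeholder values, not the tested system).
NODE_OF_CORE = {0: 0, 1: 0, 2: 1, 3: 1}
CORES = sorted(NODE_OF_CORE)
NODES = sorted(set(NODE_OF_CORE.values()))

def all_placements():
    """Every (core_a, data_node_a, core_b, data_node_b) combination for two threads."""
    return list(product(CORES, NODES, CORES, NODES))

def both_local(placement):
    """True when each thread's data sits on the node that owns its core."""
    core_a, node_a, core_b, node_b = placement
    return NODE_OF_CORE[core_a] == node_a and NODE_OF_CORE[core_b] == node_b

table = all_placements()
local_only = [p for p in table if both_local(p)]
print(len(table), len(local_only))  # prints "64 16"
```

In a real run, each placement would key a measured time, and a predicate such as both_local would select the scenario of interest from the table.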
Next, the experiments are enhanced by adding a variable load of background threads. The behavior of
the two test (or foreground) threads is studied under the impact of these variable-load background
threads.
Each of the background threads reads a local 64-MB array. The rate at which each background thread
accesses memory can be adjusted from low to medium to high to very high to control the background
load. Table 1 defines these rate qualifiers.
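This section does not say how the access rate is adjusted. One common approach is pacing: give each pass over the array a time budget derived from the target bandwidth and sleep for whatever is left. The sketch below shows only that pacing logic, with hypothetical names (pass_period, paced_reader). Python itself may not reach the higher rates, and 64 MB is interpreted as 64 * 1024 * 1024 bytes as an assumption.

```python
import time

def pass_period(array_bytes: int, rate_bytes_per_s: float) -> float:
    """Seconds one full pass over the array may take at the target rate."""
    return array_bytes / rate_bytes_per_s

def paced_reader(buf: bytearray, rate_bytes_per_s: float, passes: int) -> int:
    """Scan buf `passes` times, sleeping so each pass fits its time budget.

    Returns the total number of zero bytes seen, so the scans have a result.
    """
    period = pass_period(len(buf), rate_bytes_per_s)
    deadline = time.perf_counter()
    zeros = 0
    for _ in range(passes):
        zeros += buf.count(0)              # read-only scan of the whole array
        deadline += period
        remaining = deadline - time.perf_counter()
        if remaining > 0:                  # ahead of schedule: wait out the budget
            time.sleep(remaining)
    return zeros

if __name__ == "__main__":
    array = bytearray(64 * 1024 * 1024)    # assumed meaning of "64-MB"
    print(paced_reader(array, 0.5e9, passes=4))
```

A lower target rate lengthens the period and therefore the idle time per pass; a rate the reader cannot sustain simply leaves no time to sleep.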
Table 1. Data Access Rate Qualifiers

Data Access Rate Qualifier    Memory Bandwidth Demanded by a Background Thread on an Idle System
Low                           0.5 GB/s
Medium                        1 GB/s
High                          2 GB/s
Very High                     4 GB/s

The number of background threads is also varied as needed to make an increasing number of cores
and nodes on the system busy (in other words, to increase the subscription). Full subscription means
that every core in the system is busy running a thread. High subscription means that while several
cores are busy, there are still some cores left free in the system.
The data mining suggested several basic recommendations for performance enhancement on these
systems. Also revealed were some interesting cases of asymmetry that allowed the low-level

Chapter 2 Experimental Setup
40555 Rev. 3.00 June 2006
