Data Placement Techniques To Alleviate Unnecessary Data Sharing Between Nodes Due To First Touch - AMD Athlon 64 Manual

Performance guidelines for multiprocessor systems
40555 Rev. 3.00 June 2006
A ccNUMA-aware OS keeps data local to the node where the first touch occurs, as long as enough physical memory is available on that node. If not, the OS falls back on various more advanced techniques, which vary from OS to OS, to decide where to place the data.
Once data has been placed on a node by first touch, it normally resides on that node for its lifetime. However, the OS scheduler can migrate the thread that first touched the data from one core to another, even to a core on a different node, for load-balancing purposes [3]. Such migration moves the thread farther from its data. Some schedulers try to bring the thread back to a core on the node where its data is in local memory, but this is never guaranteed. Furthermore, before being moved back, the thread could first-touch more data on the node it was moved to. This is a difficult problem for the OS to resolve, since it has no prior information about how long the thread will run and, hence, whether migrating it back is worthwhile.
If an application's threads are demonstrably being moved away from their associated memory by the scheduler, it is typically useful to set thread placement explicitly. By explicitly pinning a thread to a node, the application tells the OS to keep the thread on that node and, thus, keeps data accessed by the thread local to it by virtue of first touch.
The performance improvement obtained from explicit thread placement varies with several factors: whether the application is multithreaded, whether it needs more memory than is available on a node, whether its threads are actually being moved away from their data, and so on.
In some cases, where threads are scheduled from the outset on a core that is remote from their data, it can be useful to explicitly control the data placement instead. This is discussed in detail in Section 3.2.2.
The previously discussed tools and APIs for explicitly controlling thread placement can also be used to explicitly control data placement. For additional details on thread and memory placement tools and APIs in the various OSs, refer to Section A.7 on page 44.
3.2.2 Data Placement Techniques to Alleviate Unnecessary Data Sharing Between Nodes Due to First Touch
When data is shared between threads running on different nodes, the OS's default policy of local allocation by first touch can become suboptimal.
For example, a multithreaded application may have a startup thread that sets up the environment, allocates and initializes a data structure, and then forks off worker threads. Under the default local-allocation policy, the data structure is placed in the physical memory of the node where the startup thread performed the first touch. The scheduler then spreads the forked worker threads across all nodes and their cores for load balancing. Each worker thread accesses the data structure remotely, from memory on the node where the first touch occurred. This can generate significant memory and HyperTransport traffic in the system and make the node where the data resides a bottleneck.
This situation is especially bad for performance if the startup thread only does the initialization and the worker threads perform all subsequent accesses.
Chapter 3: Analysis and Recommendations. Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems, page 23.
