SPECjbb 2005 was run using the NUMA tools provided by Linux to measure the performance
improvement with node interleaving. The results were obtained on the same internal 4P Quartet
system used for the synthetic tests.
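
As an illustration of what node interleaving means at the allocation level, the following sketch uses the Linux libnuma call numa_alloc_interleaved() to spread the pages of a buffer round-robin across all nodes. This is only an assumption-laden illustration; the manual does not name the specific NUMA tools used for the SPECjbb measurement, and the buffer size here is arbitrary.

/* Sketch of node-interleaved allocation with Linux libnuma (link with
 * -lnuma).  Illustrative only; not the tool used for the SPECjbb 2005
 * measurement.  Pages of the buffer are placed round-robin across the
 * nodes, spreading memory traffic over all memory controllers instead
 * of concentrating it on one node. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    size_t bytes = 256UL * 1024 * 1024;          /* arbitrary buffer size */
    char *buf = numa_alloc_interleaved(bytes);   /* pages spread across nodes */
    if (buf == NULL)
        return 1;

    memset(buf, 0, bytes);    /* first touch commits the interleaved pages */

    numa_free(buf, bytes);
    return 0;
}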

3.3 Avoid Cache Line Sharing

In a ccNUMA multiprocessor system, data within a single cache line that is shared between cores,
even on the same node, can reduce performance. In certain cases, such as semaphores, this kind of
cache-line data sharing cannot be avoided, but it should be minimized where possible.
Data can often be restructured so that such cache-line sharing does not occur. Cache lines on
AMD Athlon™ 64 and AMD Opteron™ processors are currently 64 bytes, but a scheme that avoids
this problem, regardless of cache-line size, makes for more performance-portable code. For example,
a multithreaded application should avoid using statically defined shared arrays and variables that are
potentially located in a single cache line and shared between threads.
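
One common way to restructure such data is to pad and align each thread's hot variable to its own cache line. The following C sketch is illustrative only (the structure, constants, and thread count are not taken from this manual); it contrasts a plain shared array, whose adjacent elements can fall in one cache line, with a padded per-thread layout. The 64-byte constant matches current processors; raising it keeps the same scheme working on parts with larger lines.

/* Minimal sketch of avoiding cache-line sharing between threads.
 * Names (CACHE_LINE_BYTES, padded_counter_t, counters) are illustrative. */
#include <stdalign.h>
#include <stdint.h>
#include <pthread.h>

#define CACHE_LINE_BYTES 64
#define NUM_THREADS      4

/* BAD: adjacent elements of a plain array such as
 *     long hot_counters[NUM_THREADS];
 * can land in one 64-byte cache line, so writes from different threads
 * keep invalidating each other's cached copies. */

/* BETTER: give each per-thread counter its own cache line. */
typedef struct {
    alignas(CACHE_LINE_BYTES) long value;
    char pad[CACHE_LINE_BYTES - sizeof(long)];  /* keeps sizeof == one line */
} padded_counter_t;

static padded_counter_t counters[NUM_THREADS];

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    for (long i = 0; i < 100000000L; i++)
        counters[id].value++;           /* touches only this thread's line */
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)(intptr_t)t);
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    return 0;
}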

3.4 Common Hop Myths Debunked

This section addresses several commonly held beliefs concerning the effect of memory access hops
on system performance.

3.4.1 Myth: All Equal Hop Cases Take Equal Time

As a general rule, any n hop case is equivalent in performance to any other n hop case if the only
difference between the two cases is thread and memory placement. However, there are exceptions to
this rule.

The following example uses the synthetic test to demonstrate that a given 1 hop-1 hop case is not
equivalent in performance to another 1 hop-1 hop case. It shows how saturating the HyperTransport
link throughput and stressing the HyperTransport queue buffers can cause this exception to occur.
In the graphs that follow, we compare the following three cases:

• Threads access local data
  The first thread runs on node 0 and writes to memory on node 0 (0 hop). The second thread runs
  on node 1 and writes to memory on node 1 (0 hop).

• Threads not firing at each other (no crossfire)
  The first thread runs on node 0 and writes to memory on node 1 (1 hop). The second thread runs
  on node 1 and writes to memory on node 3 (1 hop).
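
The following sketch shows one way such thread and memory placements could be set up on Linux using the libnuma API, with numa_run_on_node() pinning a thread to a node and numa_alloc_onnode() placing its write buffer on a chosen node. libnuma is not the tool described in this manual; the node numbers simply mirror the "no crossfire" case above and assume a 4-node system, and the buffer size is arbitrary.

/* Sketch of reproducing the thread/memory placements above with the
 * Linux libnuma API (link with -lnuma).  Illustrative assumptions:
 * node numbers follow the "no crossfire" case, buffer size is arbitrary. */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BUF_BYTES (64UL * 1024 * 1024)

struct placement {
    int run_node;   /* node the thread executes on          */
    int mem_node;   /* node its write buffer is allocated on */
};

static void *writer(void *arg)
{
    const struct placement *p = arg;

    numa_run_on_node(p->run_node);                  /* pin thread to node */
    char *buf = numa_alloc_onnode(BUF_BYTES, p->mem_node);
    if (buf == NULL)
        return NULL;

    memset(buf, 0xA5, BUF_BYTES);                   /* stream of writes */

    numa_free(buf, BUF_BYTES);
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    /* "No crossfire" 1 hop-1 hop case: thread on node 0 writes to node 1,
     * thread on node 1 writes to node 3 (requires a 4-node system). */
    struct placement cases[2] = { { 0, 1 }, { 1, 3 } };

    pthread_t tid[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, writer, &cases[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}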