IBM 88743BU - System x3950 E User Manual page 95

Planning, installing, and managing

Linux Scalability in a NUMA World
http://oss.intel.com/pdf/linux_scalability_in_a_numa_world.pdf
What Every Programmer Should Know About Memory, by Ulrich Drepper
http://people.redhat.com/drepper/cpumemory.pdf
A NUMA API for Linux
http://www.novell.com/collateral/4621437/4621437.pdf
Anatomy of the Linux slab allocator
http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/
These documents describe features of the Linux 2.6 kernel and components, such
as the Linux task scheduler and memory allocator, that affect how the
Linux operating system scales on the IBM x3850 M2 and x3950 M2.
Factors affecting Linux performance on a multinode x3950 M2
The overall performance of a NUMA system depends on:
The local and remote CPU cores on which tasks are scheduled to execute
Ensure that threads from the same process or task are scheduled to execute on
CPU cores in the same node. This can benefit NUMA performance because it
creates opportunities to reuse data already held in a core's cache and makes
it less likely that a remote CPU core has to access data in the local node's
memory.
The ratio of local node to remote node memory accesses made by all CPU cores
Keep remote memory accesses to a minimum because they increase latency and
reduce the performance of the task that makes them. They can also reduce the
performance of other tasks because of contention on the scalability links
for remote memory resources.
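
The first factor, keeping a task's threads on the cores of one node, can be sketched in Python. This is an illustrative sketch only and not part of the original guide: it assumes a Linux system that exposes each node's CPUs in /sys/devices/system/node/nodeN/cpulist, and the helper names parse_cpulist, node_cpus, and pin_to_node are hypothetical.

```python
import os

def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (for example, a memory-only node)
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Return the CPUs that belong to a NUMA node, or [] if it is not exposed."""
    path = os.path.join(sysfs_root, f"node{node}", "cpulist")
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except OSError:
        return []

def pin_to_node(node, pid=0):
    """Restrict pid (0 = the caller) to the CPUs of one node.

    Returns the CPU list that was applied, or [] if nothing was changed.
    Threads created after the call inherit the affinity mask.
    """
    cpus = node_cpus(node)
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return []
    os.sched_setaffinity(pid, cpus)
    return cpus
```

Calling pin_to_node(0) early in a program, before worker threads are started, would keep those threads on node 0 under these assumptions. The numactl utility offers comparable control from the command line.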
The Linux operating system determines where processor cores and memory are
located in the multinode complex from the ACPI System Resource Affinity Table
(SRAT) and System Locality Information Table (SLIT) provided by firmware. The
SRAT associates each core and each contiguous memory block with the
node in which it is installed. The connections between the nodes and the number
of hops between them are described by the SLIT.
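
On a running system, the kernel publishes the node distances it derived from the SLIT in /sys/devices/system/node/nodeN/distance, one value per node, where 10 conventionally denotes the local node. The following sketch (not from the original guide) ranks the other nodes by distance. The function names are hypothetical, the sample values in the usage note are made up, and the code assumes node IDs are contiguous and start at 0 so that a position in the list equals a node ID.

```python
import os

def parse_distances(text):
    """Parse the contents of a node 'distance' file, e.g. '10 21 21 32'."""
    return [int(token) for token in text.split()]

def nodes_by_distance(distances, self_node):
    """Return the other node IDs ordered from nearest to farthest.

    Assumes the position in `distances` equals the node ID.
    Ties are broken by the lower node ID.
    """
    others = [(d, n) for n, d in enumerate(distances) if n != self_node]
    return [n for d, n in sorted(others)]

def read_node_distances(node, sysfs_root="/sys/devices/system/node"):
    """Read one node's distance vector, or [] if the file is not available."""
    path = os.path.join(sysfs_root, f"node{node}", "distance")
    try:
        with open(path) as f:
            return parse_distances(f.read())
    except OSError:
        return []
```

For example, with a hypothetical distance vector of [21, 10, 32, 21] for node 1, nodes_by_distance returns [0, 3, 2]: nodes 0 and 3 are equally close and node 2 is farthest.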
In general, memory is allocated from the memory pool closest to the core on
which the process is running. Some system-wide data structures are allocated
evenly from all nodes in the complex to spread the load across the entire
complex and to ensure that node 0 does not run out of resources, because most
boot-time code is run from that node.
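
One rough way to check how often allocations are actually satisfied locally is the per-node counter file /sys/devices/system/node/nodeN/numastat. The sketch below is an illustration under stated assumptions, not a method from the original guide: it relies on the local_node and other_node counters in that file, which count page allocations rather than individual memory accesses, so the result is only a proxy for the local-to-remote access ratio. The function names are hypothetical.

```python
def parse_numastat(text):
    """Parse 'name value' lines from a node's numastat file into a dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats

def local_fraction(stats):
    """Fraction of this node's page allocations requested by its own CPUs.

    local_node: allocations on this node for tasks running on this node.
    other_node: allocations on this node for tasks running on another node.
    Returns None when no allocations have been counted.
    """
    local = stats.get("local_node", 0)
    other = stats.get("other_node", 0)
    total = local + other
    if total == 0:
        return None
    return local / total

def read_local_fraction(node, sysfs_root="/sys/devices/system/node"):
    """Read and summarize one node's counters, or None if unavailable."""
    try:
        with open(f"{sysfs_root}/node{node}/numastat") as f:
            return local_fraction(parse_numastat(f.read()))
    except OSError:
        return None
```

A value close to 1.0 suggests that most pages on the node were allocated for tasks running there. A low value on a busy node may indicate that tasks are frequently scheduled away from their memory.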
Chapter 2. Product positioning

This manual is also suitable for:

System x3950 m2
