IBM Power7 Optimization And Tuning Manual page 30

The POWER7 processor and affinity performance effects
The IBM POWER7 is the latest processor chip in the IBM Power Systems family. The
POWER7 processor chip is available in configurations with four, six, or eight cores per chip,
as compared to the POWER5 and POWER6, each of which has two cores per chip. Along
with the increased number of cores, the POWER7 processor chip implements SMT4 mode,
supporting four hardware threads per core, as compared to the POWER5 and POWER6,
which support only two hardware threads per core. Each POWER7 processor core supports
running in single-threaded mode with one hardware thread, an SMT2 mode with two
hardware threads, or an SMT4 mode with four hardware threads.
Each SMT hardware thread is represented as a logical processor in AIX or Linux. When the
operating system runs in SMT4 mode, it has four logical processors for each dedicated
POWER7 processor core that is assigned to the partition. To gain full benefit from the
throughput improvement of SMT, applications must use all of the SMT threads of the
processor cores.
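
The relationship between dedicated cores, SMT mode, and logical processors described above can be sketched as simple arithmetic. This is a minimal illustration, not an AIX or Linux interface; the names `SMT_THREADS` and `logical_processors` are hypothetical and chosen only for this example.

```python
# Hardware threads per core for each POWER7 threading mode named in the text.
SMT_THREADS = {"ST": 1, "SMT2": 2, "SMT4": 4}


def logical_processors(dedicated_cores, mode="SMT4"):
    """Return the number of logical processors the operating system sees.

    Each SMT hardware thread is represented as one logical processor,
    so the count is cores multiplied by threads per core.
    """
    if dedicated_cores < 0:
        raise ValueError("dedicated_cores must not be negative")
    if mode not in SMT_THREADS:
        raise ValueError("mode must be one of: " + ", ".join(SMT_THREADS))
    return dedicated_cores * SMT_THREADS[mode]


if __name__ == "__main__":
    # An eight-core dedicated partition in each threading mode.
    for mode in ("ST", "SMT2", "SMT4"):
        print(mode, logical_processors(8, mode))
```

For an application to benefit from the SMT throughput improvement, it needs roughly as many runnable software threads as the value this function returns.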
Each POWER7 chip has memory controllers that allow direct access to a portion of the
memory DIMMs in the system. Any processor core on any chip in the system can access the
memory of the entire system, but it takes longer for an application thread to access the
memory that is attached to a remote chip than to access data in the local memory DIMMs.
For more information about the POWER7 hardware, see Chapter 2, "The POWER7
processor" on page 21. This short description provides some background to help understand
two important performance issues that are known as affinity effects.
Cache affinity
The hardware threads for each core of a POWER7 processor share a core-specific cache
space. For multi-threaded applications where different threads are accessing the same data,
it can be advantageous to arrange for those threads to run on the same core. By doing so, the
shared data remains resident in the core-specific cache space, as opposed to moving
between different private cache spaces in the system. This enhanced cache affinity can
provide more efficient utilization of the cache space in the system and reduce the latency of
data references.
Similarly, the multiple cores on a POWER7 processor share a chip-specific cache space.
Again, arranging the software threads that are sharing the data to run on the same POWER7
processor (when the partition spans multiple sockets) often allows more efficient utilization of
cache space and reduced data reference latencies.
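
One way to arrange for data-sharing threads to run on the same core is to restrict them to that core's logical processors. The following is a minimal sketch for Linux only, using Python's `os.sched_getaffinity` and `os.sched_setaffinity`. It assumes that the logical processors of a core are numbered contiguously (core 0 is CPUs 0 to 3 in SMT4 mode), which is an assumption of this example and should be verified on the target system. The helper names `cpus_of_core` and `pin_caller_to_core` are hypothetical.

```python
import os


def cpus_of_core(core_index, threads_per_core=4):
    """Return the logical CPU IDs of one core.

    ASSUMPTION: logical CPUs are numbered contiguously per core.
    Verify the real topology before relying on this mapping.
    """
    first = core_index * threads_per_core
    return set(range(first, first + threads_per_core))


def pin_caller_to_core(core_index, threads_per_core=4):
    """Restrict the caller to the logical CPUs of a single core (Linux only)."""
    wanted = cpus_of_core(core_index, threads_per_core)
    # Keep only the CPUs that the caller is currently allowed to use.
    allowed = wanted & os.sched_getaffinity(0)
    if not allowed:
        raise ValueError("none of the CPUs of that core are available")
    os.sched_setaffinity(0, allowed)
    return allowed


if __name__ == "__main__":
    print("CPUs assumed to belong to core 2:", sorted(cpus_of_core(2)))
```

Calling `pin_caller_to_core` with the same core index from each cooperating thread or process keeps them on hardware threads that share the core-specific cache space. The same idea applies at chip level by widening the CPU set to all cores of one chip.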
Memory affinity
By default, the POWER Hypervisor attempts to satisfy the memory requirements of a partition
using the local memory DIMMs for the processor cores that are allocated to the partition. For
larger partitions, however, the partition might contain a mixture of local and remote memory.
For an application that is running on a particular core or chip, the application should always
use only local memory. This enhanced memory affinity reduces the latency of
memory accesses.
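
The effect of a mixture of local and remote memory can be sketched as a weighted average of the two access latencies. This is a simplified model for illustration only. The function name `average_memory_latency` is hypothetical, and the latency values in the example are made-up relative units, not measured POWER7 figures.

```python
def average_memory_latency(local_fraction, local_latency, remote_latency):
    """Return the mean access latency for a mix of local and remote accesses.

    local_fraction is the share of accesses satisfied from local memory (0.0 to 1.0).
    The two latencies can be in any unit, as long as both use the same one.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_latency + remote_fraction * remote_latency


if __name__ == "__main__":
    # ILLUSTRATIVE numbers only: local access costs 1.0 unit, remote costs 2.0 units.
    for fraction in (1.0, 0.5, 0.0):
        print(fraction, average_memory_latency(fraction, 1.0, 2.0))
```

In this model the average latency falls steadily as the local fraction rises, which is why keeping an application's memory on the chip where it runs pays off.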
Partition sizes and affinity
In terms of partition sizes and affinity, this section describes Power dedicated LPARs, shared
resource environments, and memory requirements.
POWER7 and POWER7+ Optimization and Tuning Guide