Additionally, to achieve optimal performance, floating-point and VMX/VSX data have different alignment requirements. For example, the preferred VSX alignment is 16 bytes instead of the
element size of the data type being used. This situation means that VSX data that is smaller
than 16 bytes in length must be padded out to 16 bytes. The compilers introduce padding as
necessary to provide optimal alignment for vector data types.
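That padding is visible at the language level. The following C sketch is illustrative only: on POWER compilers the natural 16-byte type is the built-in vector float (enabled with -maltivec under GCC), but a portable 16-byte-aligned structure stands in for it here so the example compiles anywhere, and the type and field names are assumptions, not from this guide:

#include <stddef.h>
#include <stdio.h>

/* Stand-in for a 16-byte VSX vector type such as "vector float". */
typedef struct {
    float f[4];
} __attribute__((aligned(16))) vec4f;

struct record {
    float xyz[3]; /* 12 bytes of scalar payload                     */
    vec4f v;      /* the compiler inserts 4 bytes of padding so that
                     v starts on a 16-byte boundary                 */
};

int main(void)
{
    /* Expect offset 16 and total size 32: 12 (xyz) + 4 (pad) + 16 (v). */
    printf("offsetof(struct record, v) = %zu\n", offsetof(struct record, v));
    printf("sizeof(struct record)      = %zu\n", sizeof(struct record));
    return 0;
}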
Sensitivity of scaling to more cores
Different processor chip versions and system models provide different degrees of scaling of LPARs and workloads to cores. Different processor chips and systems might also have different bus widths and latencies. All of these factors mean that the sensitivity of an application's or workload's performance to the number of cores it runs on varies with the processor chip version and system model.
In general terms, an application that tends to not access memory much (that is, one that is core-centric) scales nearly perfectly across more cores. Performance loss when scaling across multiple cores tends to come from one or more of the following sources:
- Increased cache misses (often from invalidations of data by other processor cores, especially for locks)
- The increased cost of cache misses, which in turn drives overall memory and interconnect fabric traffic into the region of bandwidth limitations (saturating the memory busses and interconnect)
- The additional cores that are being added to the workload in other nodes, resulting in increased latency in reaching memory and caches in those nodes
Briefly, cache miss requests and returning data can end up being routed through busses that
connect multiple chips and memory, which have particular bandwidth and latency
characteristics. The goal for scaling across multiple cores, then, is to minimize the change in
the potential penalties that are associated with cache misses and data requests as the
workload size grows.
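The first of these sources, cross-core invalidations, can occur even without an explicit lock when two cores write different variables that happen to share a cache line (false sharing). The following C sketch shows the usual mitigation: padding each thread's counter out to a full 128-byte POWER7 cache line so that stores from one core do not invalidate a line another core is using. The thread count, sizes, and names are illustrative assumptions:

#include <pthread.h>
#include <stdio.h>

#define LINE  128              /* POWER7 cache-line size in bytes */
#define ITERS 100000000L

/* One counter per cache line. Removing the padding makes both
   counters share a line, and each store then invalidates the
   other core's copy of that line. */
struct padded {
    volatile long n;
    char pad[LINE - sizeof(long)];
} __attribute__((aligned(LINE)));

static struct padded counter[2];

static void *spin(void *arg)
{
    struct padded *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->n++;                /* the line stays in this core's cache */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, spin, &counter[0]);
    pthread_create(&b, NULL, spin, &counter[1]);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld %ld\n", counter[0].n, counter[1].n);
    return 0;
}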
It is difficult to assess what strategies are effective for scaling to more cores without
considering the complex aspects of a specific application. For example, if all of the cores that
the application is running across eventually access all of the data, then it might be wise to
interleave data across the processor sockets (each typically a grouping of processor chips) to optimize memory bus utilization. However, if the access
pattern to data is more localized so that, for most of the data, separate processor cores are
accessing it most of the time, the application might obtain better performance if the data is
close to the processor core that is accessing that data the most (maintaining affinity between
the application thread and the data it is accessing). For the latter case, where the data ought
to be close to the processor core that is accessing the data, the AIX MEMORY_AFFINITY=MCM
environment variable can be set to achieve this behavior.
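A minimal sketch of that pattern follows. It assumes the documented AIX first-touch behavior: with MEMORY_AFFINITY=MCM set, page faults are satisfied from memory local to the MCM where the faulting thread runs, so each worker allocates and touches its own buffer rather than having main() prepare it. Explicit thread-to-core binding (for example, with bindprocessor()) is omitted for brevity, and the worker count and buffer size are illustrative:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NWORKERS 4
#define CHUNK    (16 * 1024 * 1024)   /* 16 MB per worker */

static void *worker(void *arg)
{
    (void)arg;
    /* Allocate and touch pages from this thread, not from main(),
       so that under MEMORY_AFFINITY=MCM they are backed by memory
       near the core this thread runs on. */
    char *buf = malloc(CHUNK);
    if (buf == NULL)
        return NULL;
    memset(buf, 0, CHUNK);            /* first touch places the pages */
    /* ... compute on buf from this thread only ... */
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}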
When multiple processor cores are accessing the same data and that data is being held by a lock, resulting in the data line in the cache being invalidated, programs can suffer. This phenomenon is often referred to as hot locks, where a lock is holding data that has a high rate of contention. Hot locks result in intervention and can easily limit the ability to scale a workload because all updates to the lock are serialized. Tools such as splat (see "AIX trace-based analysis tools" on page 165) can be used to identify hot locks.
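The shape of a hot lock is easy to reproduce. In the following C sketch, every thread serializes on a single mutex to update one shared counter, so adding cores adds contention and coherence traffic rather than throughput; the thread and iteration counts are illustrative:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 8
#define ITERS    1000000

static pthread_mutex_t hot = PTHREAD_MUTEX_INITIALIZER;
static long shared_count = 0;

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&hot);   /* the line holding the lock and
                                       counter bounces between cores */
        shared_count++;
        pthread_mutex_unlock(&hot);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, bump, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("count = %ld\n", shared_count);
    return 0;
}

The usual remedies are finer-grained locks or per-thread counters that are combined at the end, so that most updates touch data that is private to one core.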
