IBM Power7 Optimization And Tuning Manual page 49

Prefetching to avoid cache miss penalties
Prefetching to avoid cache miss penalties is another technique that is used to improve application performance. The idea is to fetch blocks of data into the cache a number of cycles before the data is needed, which hides the penalty of waiting for the data to be read from main storage. Prefetching can be speculative: depending on the conditional path that is taken through the code, the prefetched data might end up not being required, so the benefit of prefetching depends on how often the prefetched data is actually used. Although prefetching is not strictly related to cache geometry, it is an important technique.
A caveat to prefetching is that, although the technique commonly improves performance in single-thread, single-core, and low-utilization environments, it can actually decrease performance in high-utilization environments with a high thread count per socket.
Most systems today virtualize processors and the memory that is used by the workload.
Because of this situation, the application designer must consider that, although an LPAR
might be assigned only a few cores, the overall system likely has a large number of cores.
Further, if the LPARs are sharing processor cores, the problem becomes compounded.
The dcbt and dcbtst instructions are commonly used to prefetch data [23, 24]. The white paper Power Architecture ISA 2.06 Stride N Prefetch Engines to boost Application's performance [25] provides an overview of how these instructions can be used to improve application performance. These instructions can be used directly in hand-tuned assembly language code, or they can be accessed through compiler built-ins or directives.
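As a minimal sketch of the built-in route, assuming GCC or a compatible compiler (where __builtin_prefetch typically lowers to a dcbt touch on Power, and where the prefetch distance is a tuning assumption, not a POWER7-mandated value):

```c
#include <stddef.h>

/* Prefetch distance in elements; an assumption to be tuned so that the
 * touch covers the memory latency divided by the per-iteration cost. */
#define PREFETCH_DISTANCE 64

/* Sum an array while touching a cache line a fixed distance ahead of
 * the element currently being processed. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (keep in cache); on Power this
             * maps to a data cache block touch (dcbt). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

For stores, __builtin_prefetch(p, 1, 3) is the analogue of dcbtst. Whether the prefetch helps depends on the access pattern and, as noted above, on how contended the memory subsystem is.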
Prefetching is also automatically done by the POWER7 hardware and is configurable, as
described in 2.3.7, "Data prefetching using d-cache instructions and the Data Streams
Control Register (DSCR)" on page 46.
Alignment of data
Processors are optimized for accessing data elements on their naturally aligned boundaries.
Unaligned data accesses might require extra processing time by the processor for individual load or store instructions, and in some cases a trap and emulation by the host operating system. Ensuring natural data alignment also ensures that individual accesses do not span cache line boundaries.
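As a small illustration of why this matters (the helper name is made up), an access spans a cache line exactly when its first and last bytes fall in different 128-byte blocks, and a naturally aligned access never does:

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 128  /* Power Systems cache line size */

/* Returns 1 if a load or store of `size` bytes starting at `addr`
 * crosses a 128-byte cache line boundary, 0 otherwise. A naturally
 * aligned access (addr % size == 0, size a power of two <= 128)
 * can never cross. */
static int spans_cache_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}
```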
Similar to the idea of splitting structures into hot and cold elements, the concept of data
alignment seeks to optimize cache performance by ensuring that data does not span across
multiple cache lines. The cache line size in Power Systems is 128 bytes.
The general technique for alignment is to keep operands (data) on natural boundaries, such as a word or doubleword boundary (that is, an int would be aligned to be on a word boundary in memory). This technique might involve padding and reordering data structures to avoid cases such as the interleaving of chars and doubles (char; double; char; double). High-level language compilers do automatic data alignment. However, padding must be carefully analyzed to ensure that it does not result in more cache misses or page misses (especially for rarely referenced groupings of data).
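A hypothetical pair of layouts sketches the effect (the struct and member names are invented; the exact sizes assume a typical ABI where a double is 8 bytes and 8-byte aligned):

```c
#include <stddef.h>

/* Interleaved layout: each double must start on an 8-byte boundary,
 * so the compiler inserts 7 bytes of padding after each char. */
struct interleaved {
    char   c1;  /* 1 byte, then 7 bytes of padding */
    double d1;  /* 8 bytes */
    char   c2;  /* 1 byte, then 7 bytes of padding */
    double d2;  /* 8 bytes */
};              /* typically 32 bytes */

/* Reordered layout: widest members first, chars packed at the end. */
struct reordered {
    double d1;  /* 8 bytes */
    double d2;  /* 8 bytes */
    char   c1;
    char   c2;  /* 2 bytes, then 6 bytes of tail padding */
};              /* typically 24 bytes */
```

The reordered form occupies fewer bytes, so more instances fit in a 128-byte cache line, and every double remains naturally aligned.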
23. dcbt (Data Cache Block Touch) Instruction, available at: http://pic.dhe.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.aixassem/doc/alangref/idalangref_dcbt_instrs.htm
24. dcbtst (Data Cache Block Touch for Store) Instruction, available at: http://pic.dhe.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.aixassem/doc/alangref/idalangref_dcbstst_instrs.htm
25. Power Architecture ISA 2.06 Stride N prefetch Engines to boost Application's performance, available at: https://www.power.org/documentation/whitepaper-on-stride-n-prefetch-feature-of-isa-2-06/ (registration required)
Chapter 2. The POWER7 processor
