IBM POWER7 Optimization and Tuning Manual, page 48

Cache                       POWER7                                POWER7+
L3 cache:
  Capacity/associativity    On-Chip, 4 MB/core, 8-way             On-Chip, 10 MB/core, 8-way
  Bandwidth                 16 B reads and 16 B writes per cycle  16 B reads and 16 B writes per cycle
Optimizing for cache geometry
There are several ways to optimize for cache geometry, as described in this section.

Splitting structures into hot and cold elements

A technique for optimizing applications to take advantage of cache is to lay out data
structures so that fields that have a high rate of reference (that is, hot) are grouped, and fields
that have a relatively low rate of reference (that is, cold) are grouped.^19 The concept is to
place the hot elements into the same byte region of memory, so that when they are pulled into
the cache, they are co-located in the same cache line or lines. Additionally, because hot
elements are referenced often, they are likely to stay in the cache. Likewise, the cold
elements are in the same area of memory and end up in the same cache lines, so that
being written out to main storage and discarded causes less of a performance degradation.
This situation occurs because they have a much lower rate of access. Power Systems use
128-byte cache lines. Compared to Intel processors (64-byte cache lines), these larger
cache lines have the advantage of increasing the reach possible with the same size cache
directory, and of improving the efficiency of the cache by covering up to 128 bytes of hot data
in a single line. However, they also have the implication of potentially bringing more data into
the cache than needed for fine-grained accesses (that is, accesses of less than 64 bytes).
As described in Eliminate False Sharing, Stop your CPU power from invisibly going down the
drain,^20 it is also important to carefully assess the impact of this strategy, especially when it is
applied to systems where there are a high number of CPU cores and a phenomenon referred
to as false sharing can occur.^21 False sharing occurs when multiple data elements that can
otherwise be accessed independently are in the same cache line. For example, if two different
hardware threads want to update (store) two different words in the same cache line, only
one of them at a time can gain exclusive access to the cache line to complete the store. This
situation results in:

- Cache line transfers between the processors where those threads are running
- Stalls in other threads that are waiting for the cache line
- All but the most recent thread to update the line being left without a copy in their cache

This effect is compounded as the number of application threads that share the cache line
(that is, threads that are using different data in the cache line under contention) is scaled
upwards. The discussion about cache sharing in^22 also presents techniques for
analyzing false sharing and suggestions for addressing the phenomenon.
^19 Splitting Data Objects to Increase Cache Utilization (Preliminary Version, 9 October 1998), available at:
http://www.ics.uci.edu/%7Efranz/Site/pubs-pdf/ICS-TR-98-34.pdf
^20 Eliminate False Sharing, Stop your CPU power from invisibly going down the drain, available at:
http://drdobbs.com/goparallel/article/showArticle.jhtml?articleID=217500206
^21 Ibid.
^22 Ibid.
POWER7 and POWER7+ Optimization and Tuning Guide
