IBM Power7 Optimization And Tuning Manual page 127

For applications with available parallelism, OpenMP can provide a simple solution for parallel

programming, without requiring low-level thread manipulation. The OpenMP implementation

on the XL compilers is available by using the -qsmp=omp option.
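
As an illustration of the kind of loop OpenMP handles without explicit thread code, here is a minimal sketch in C. The function name is hypothetical, and the build line in the comment assumes the thread-safe xlc_r invocation with a placeholder file name. Without -qsmp=omp the pragma is ignored and the loop runs serially with the same result.

```c
/*
 * Possible build line (file name is a placeholder):
 *   xlc_r -O2 -qsmp=omp -c sum.c
 */

/* Sums the squares 1..n. The iterations are independent, so the loop can be
 * divided among threads; the reduction clause gives each thread a private
 * partial sum and combines the partial sums when the loop ends. */
long long sum_of_squares(int n)
{
    long long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++) {
        total += (long long)i * i;
    }

    return total;
}
```

Calling sum_of_squares(10) returns 385 whether or not OpenMP is enabled, which makes it easy to check the parallel build against a serial one.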

Whole-program analysis

Traditional compiler optimizations operate independently on each application source file.

Inter-procedural optimizations operate at the whole-program scope, using the interaction

between parts of the application in different source files. They are often effective for large-scale

applications that are composed of hundreds or thousands of source files.

On the XL compilers, these capabilities are accessed by using the -qipa option. IPA is also

implied when you use optimization levels -O4 and -O5. With IPA enabled, the compiler saves a

high-level representation of the program in the object files during compilation, and reoptimizes

it at the whole-program scope during the link phase. For this reoptimization to occur, the compiler

driver must be used to link the resulting binary, instead of invoking the system linker directly.

Whole-program analysis (IPA) is effective on programs that use so many global variables

that they overflow the default AIX limit on global symbols. If the application requires the use of the

-bbigtoc option to link successfully on AIX, it is likely a good candidate for IPA optimization.

There are three levels of IPA optimization on the XL compilers (0, 1, and 2). By default, -qipa

implies -qipa=level=1, which performs basic program restructuring. For more aggressive

optimization, apply -qipa=level=2, which performs full program restructuring during the link

step. The time that it takes to complete the link step can increase significantly.
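
To make the compile and link flow concrete, the following is a small sketch in C of code that crosses a source-file boundary. In a real project the two halves would be separate files; the file names, function names, and the global variable are all hypothetical, and the commands in the comment only show where -qipa would appear under that assumption.

```c
/*
 * Possible build sequence (file names are placeholders):
 *   xlc -O2 -qipa -c util.c
 *   xlc -O2 -qipa -c main.c
 *   xlc -O2 -qipa util.o main.o -o app     link through the compiler driver
 * For the more aggressive level, -qipa=level=2 would replace -qipa.
 */

/* ---- would live in util.c ---- */
int g_scale = 3;                 /* global variable shared across files */

int scale_value(int x)
{
    return x * g_scale;
}

/* ---- would live in main.c ---- */
extern int g_scale;
int scale_value(int x);

/* Calls a small function defined in another file inside a loop. A per-file
 * optimizer cannot see the body of scale_value() from here; a whole-program
 * pass at link time can. */
long scaled_sum(int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += scale_value(i);
    }
    return total;
}
```

The point of the sketch is the link line: the compiler driver (xlc here) performs the final link, so that the whole-program pass can run.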

Optimization that is based on Profile Directed Feedback

Profile-based optimization allows the compiler to collect information about the program

behavior and use that information when making code generation decisions. It involves

compiling the program twice: first, to generate an instrumented version of the application that

collects program behavior data when run, and a second time to generate an optimized binary

file using information that is collected by running the instrumented binary through a set of

typical inputs for the application.

Profile-based optimization in the XL compiler is accessed through the -qpdf1 and -qpdf2

options, on top of -O or higher optimization levels. The instrumented binary file is generated

by using -qpdf1 on top of all other options, and the resulting binary file writes the profile

data to a file, named ._pdf by default.

The Profile Directed Feedback (PDF) framework on the XL compilers is built on top of the IPA

infrastructure, with -qpdf1 and -qpdf2 implying -qipa=level=0. For the PDF2 step, it is

possible to reuse the object files from the -qpdf1 compilation step, and only relink the

application with the -qpdf2 option.

For PDF optimizations to be successful, the instrumented binary must be run with

workloads that reflect common usage of the application. Use multiple workloads that can

exercise the program in different ways. The data for all instrumentation runs are aggregated

into a single PDF file and used during optimization.
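
The two-step build and the training runs described above can be sketched as follows. The C function is a hypothetical example of code with a data-dependent branch. The program name, source file name, and workload names in the comment are placeholders, and the function is assumed to be part of a complete program with its own main().

```c
#include <stddef.h>

/*
 * Possible build-and-train sequence (all names are placeholders):
 *   xlc -O2 -qpdf1 -o app app.c     step 1: build the instrumented binary
 *   ./app workload_a                training runs on typical inputs; the
 *   ./app workload_b                data from all runs is aggregated
 *   xlc -O2 -qpdf2 -o app app.c     step 2: build the optimized binary
 */

/* Counts the values above a threshold. How often the branch is taken depends
 * entirely on the input data, which is the kind of behavior a profile can
 * record and a static compile cannot know. */
size_t count_above(const int *values, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] > threshold) {
            count++;
        }
    }
    return count;
}
```

If the training inputs rarely exceed the threshold but production inputs usually do, the profile describes the wrong behavior, which is why the training workloads need to match real usage.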

For the PDF profile data to be written out at the end of execution, the program must either

implicitly or explicitly call the exit() library subroutine. Using exit() causes code that is

introduced as part of the PDF instrumentation to be run and write out the PDF profile data. In

contrast, calling the _exit() system call directly skips the writing of the PDF profile data file, which

results in inaccurate profile data being recorded.
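
The difference between exit() and _exit() can be demonstrated with a small POSIX sketch in C. It does not use the PDF instrumentation itself. Instead, it assumes that the profile write-out behaves like a handler registered with atexit(), which is an assumption for illustration and is not stated in the text above. The names flush_profile and handler_ran_in_child are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, used by the handler */

/* Stand-in for the profile write-out: runs only during exit() processing. */
static void flush_profile(void)
{
    if (report_fd >= 0) {
        ssize_t ignored = write(report_fd, "P", 1);
        (void)ignored;
    }
}

/* Runs a child process that ends through exit() or _exit().
 * Returns 1 if the exit handler ran in the child, 0 if it did not,
 * and -1 if the pipe or the child could not be created. */
int handler_ran_in_child(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    fflush(NULL);            /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {          /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(flush_profile);
        if (use_underscore_exit) {
            _exit(0);        /* skips atexit handlers */
        }
        exit(0);             /* runs atexit handlers */
    }

    close(fds[1]);           /* parent: wait for a byte or end-of-file */
    char byte;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

Under these assumptions, handler_ran_in_child(0) reports that the handler ran and handler_ran_in_child(1) reports that it did not, mirroring the case where a program that ends through _exit() leaves no fresh profile data behind.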

Chapter 6. Compilers and optimization tools for C, C++, and Fortran
