IBM Power7 Optimization And Tuning Manual page 127

For applications with available parallelism, OpenMP can provide a simple solution for parallel
programming, without requiring low-level thread manipulation. The OpenMP implementation
on the XL compilers is available by using the -qsmp=omp option.
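As an illustration of the kind of loop-level parallelism described here, the following is a minimal sketch in C. The function name dot_product, the file name dot.c, and the xlc_r invocation in the comment are assumptions made for this example and do not come from the guide. Without OpenMP enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/*
 * Minimal OpenMP sketch (illustrative only).
 * Assumed build line for the XL compiler:
 *     xlc_r -O3 -qsmp=omp dot.c -c
 * The loop iterations are divided among threads, and the reduction
 * clause combines the per-thread partial sums into one total.
 */
double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    long i;

    /* A signed loop index keeps the loop valid for older OpenMP versions. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

No explicit thread creation or joining appears in the source; the compiler and OpenMP runtime handle it.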
Whole-program analysis
Traditional compiler optimizations operate independently on each application source file.
Inter-procedural optimizations operate at the whole-program scope, using the interaction
between parts of the application in different source files. This approach is often effective for
large-scale applications that are composed of hundreds or thousands of source files.
On the XL compilers, these capabilities are accessed by using the -qipa option. It is also
implied when you use optimization levels -O4 and -O5. With this option, the compiler saves a
high-level representation of the program in the object files during compilation, and reoptimizes
it at the whole-program scope during the link phase. For this reoptimization to occur, the
compiler driver must be used to link the resulting binary, instead of invoking the system linker
directly.
Whole-program analysis (IPA) is effective on programs that use many global variables,
overflowing the default AIX limit on global symbols. If the application requires the use of the
-bbigtoc option to link successfully on AIX, it is likely a good candidate for IPA optimization.
There are three levels of IPA optimization on the XL compilers (0, 1, and 2). By default, -qipa
implies -qipa=level=1, which performs basic program restructuring. For more aggressive
optimization, apply -qipa=level=2, which performs full program restructuring during the link
step. At this level, the time that it takes to complete the link step can increase significantly.
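The following sketch shows the kind of cross-file call that a per-file optimizer cannot see through but a whole-program pass might inline. The function names, the file names util.c and main.c, and the exact build lines in the comment are hypothetical. Both functions are shown in one block only so that the example compiles as is.

```c
#include <stddef.h>

/*
 * Assumed build lines (hypothetical file names), linking through the
 * compiler driver as the text requires:
 *     xlc -O2 -qipa=level=2 -c util.c main.c
 *     xlc -O2 -qipa=level=2 util.o main.o -o app
 */

/* ---- would live in util.c ---- */
double scale(double x, double factor)
{
    return x * factor;
}

/* ---- would live in main.c ---- */
/* Compiled alone, this file sees scale() only as an external call.
 * With whole-program information, the call becomes a candidate for
 * inlining into the loop. */
double sum_scaled(const double *v, size_t n, double factor)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += scale(v[i], factor);
    }
    return total;
}
```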
Optimization that is based on Profile Directed Feedback
Profile-based optimization allows the compiler to collect information about the program
behavior and use that information when it makes code generation decisions. It involves
compiling the program twice: first, to generate an instrumented version of the application that
collects program behavior data when run, and a second time to generate an optimized binary
file using information that is collected by running the instrumented binary through a set of
typical inputs for the application.
Profile-based optimization in the XL compiler is accessed through the -qpdf1 and -qpdf2
options, on top of -O or higher optimization levels. The instrumented binary file is generated
by using -qpdf1 on top of all other options, and the resulting binary file generates the profile
data on a file, named ._pdf by default.
The Profile Directed Feedback (PDF) framework on the XL compilers is built on top of the IPA
infrastructure, with -qpdf1 and -qpdf2 implying -qipa=level=0. For the PDF2 step, it is
possible to reuse the object files from the -qpdf1 compilation step, and only relink the
application with the -qpdf2 option.
For PDF optimizations to be successful, the instrumented binary must be run with
workloads that reflect common usage of the application. Use multiple workloads that
exercise the program in different ways. The data for all instrumentation runs is aggregated
into a single PDF file and used during optimization.
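The two-step build can be sketched around a small routine whose branch behavior depends on the input data. The function sum_valid, the file name app.c, and the input file names in the comment are hypothetical; only the -qpdf1 and -qpdf2 options and the -O level requirement come from the text above.

```c
#include <stddef.h>

/*
 * Assumed build and run sequence (hypothetical file names):
 *     xlc -O2 -qpdf1 app.c -o app      instrumented build
 *     ./app typical_input_1            each run adds to the profile data
 *     ./app typical_input_2
 *     xlc -O2 -qpdf2 app.c -o app      optimized build that uses the profile
 */

/* The negative-value branch may be rare for typical inputs, but the
 * compiler has no way to know that without profile data. */
long sum_valid(const int *values, size_t n, size_t *rejected)
{
    long total = 0;
    size_t i;

    *rejected = 0;
    for (i = 0; i < n; i++) {
        if (values[i] < 0) {
            (*rejected)++;          /* error path */
        } else {
            total += values[i];     /* common path */
        }
    }
    return total;
}
```

If the training inputs contained mostly negative values while production inputs do not, the recorded branch bias would point the wrong way, which is why the workloads need to be representative.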
For the PDF profile data to be written out at the end of execution, the program must either
implicitly or explicitly call the exit() library subroutine. Using exit() causes code that is
introduced as part of the PDF instrumentation to be run and write out the PDF profile data. In
contrast, calling the _exit() system call bypasses that code and skips the writing of the PDF
profile data file, so the data from that run is not recorded and the resulting profile is
inaccurate.
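The difference between the two calls can be observed with an ordinary atexit() handler. The sketch below is an analogy only: it assumes a POSIX system, and the handler write_profile is a hypothetical stand-in, not the actual mechanism that the PDF instrumentation uses. A child process registers the handler and then terminates with either exit() or _exit(); the parent checks whether the handler ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* write end of the pipe, set in the child */

/* Hypothetical stand-in for a profile writer: runs during exit() only. */
static void write_profile(void)
{
    const char mark = 'P';
    ssize_t n = write(report_fd, &mark, 1);
    (void)n;
}

/* Returns 1 if the handler ran in the child, 0 if not, -1 on error. */
int profile_written(int use_underscore_exit)
{
    int fds[2];
    pid_t pid;
    char mark;
    ssize_t got;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child process */
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(write_profile) != 0)
            _exit(2);
        if (use_underscore_exit)
            _exit(0);               /* skips atexit handlers */
        exit(0);                    /* runs atexit handlers */
    }
    close(fds[1]);                  /* parent process */
    got = read(fds[0], &mark, 1);   /* returns 0 at EOF if nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return got == 1 ? 1 : 0;
}
```

Calling profile_written(0) exercises the exit() path, and profile_written(1) exercises the _exit() path.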
Chapter 6. Compilers and optimization tools for C, C++, and Fortran
This manual is also suitable for:

Power7+