IBM Power7 Optimization And Tuning Manual page 129

OpenMP
The OpenMP API is an industry specification for shared-memory parallel programming. The
current GCC compilers, starting with GCC-4.4 (Advance Toolchain 4.0+), provide a full
implementation of the OpenMP 3.0 specification in C, C++, and Fortran. Programming with
OpenMP allows you to benefit from the incremental introduction of parallelism in an existing
application by adding pragmas or directives to specify how the application can
be parallelized.
For applications with available parallelism, OpenMP can provide a simple solution for parallel
programming, without requiring low-level thread manipulation. The GNU OpenMP
implementation on the GCC compilers is available under the -fopenmp option. GCC also
provides auto-parallelization under the -ftree-parallelize-loops option.
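As a minimal sketch of this incremental style, the following C function parallelizes an existing serial loop with one pragma. The function name sum_of_squares is hypothetical and chosen only for illustration. Built with gcc -fopenmp, the loop iterations can be divided among threads. Built without that option, the pragma is ignored and the same code runs serially.

```c
/* Sum of the squares of the integers 0 .. n-1.
 * The pragma asks the compiler to split the loop across threads and to
 * combine the per-thread partial sums into "total". Without -fopenmp
 * the pragma is ignored and the loop runs serially. */
long sum_of_squares(long n)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++) {
        total += i * i;
    }
    return total;
}
```

The reduction clause matters here: without it, all threads would update total concurrently, which is a data race.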
Whole-program analysis
Traditional compiler optimizations operate independently on each application source file.
Inter-procedural optimizations operate at the whole-program scope, using the interaction
between parts of the application in different source files. This approach is often effective for large-scale
applications that are composed of hundreds or thousands of source files.
Starting with GCC-4.6 (Advance Toolchain 5.0), GCC provides the Link Time Optimization (LTO)
feature. LTO allows separate compilation of multiple source files but saves additional (abstract
program description) information in the resulting object files. Then, at application link time, the
linker can collect all the objects (with the additional information) and pass them back to the
compiler (GCC) for whole-program inter-procedural analysis (IPA) and final code generation.
The GCC LTO feature is enabled on the compile and link phases by the -flto option. A
simple example follows:
gcc -flto -O3 -c a.c
gcc -flto -O3 -c b.c
gcc -flto -o program a.o b.o
Additional options that can be used with -flto include:
-flto-partition={1to1|balanced|none}
-flto-compression-level=n
Detailed descriptions about -flto and its related options are in Options That Control
Optimization, available at:
http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Optimize-Options.html#Optimize-Options
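To illustrate the kind of code that can benefit from LTO, the following sketch shows the contents of two hypothetical source files, b.c and a.c, in one listing (the function names are assumptions made for illustration). With separate compilation, the call in a.c is an opaque call into b.o. When both files are compiled and linked with -flto, the compiler can see the body of the helper at link time and may choose to inline it into the loop.

```c
/* ---- b.c: a small helper defined in its own source file ---- */
int scale_by_three(int x)
{
    return 3 * x;
}

/* ---- a.c: the caller, which sees only a declaration of the helper ---- */
int scale_by_three(int x);   /* normally provided by a shared header */

int sum_scaled(const int *values, int count)
{
    int total = 0;

    for (int i = 0; i < count; i++) {
        /* Cross-file call: a candidate for inlining under -flto. */
        total += scale_by_three(values[i]);
    }
    return total;
}
```

Whether the call is inlined depends on the compiler's heuristics and the optimization level, so the effect is best confirmed by inspecting the generated code or by measurement.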
Profile-based optimization
Profile-based optimization allows the compiler to collect information about the program
behavior and use that information when making code generation decisions. It involves
compiling the program twice: first, to generate an instrumented version of the application that
collects program behavior data when run, and a second time to generate an optimized binary
using information that is collected by running the instrumented binary through a set of typical
inputs for the application.
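As a rough sketch of code where profile data can help, consider the hypothetical C function below (the name count_out_of_range and the assumption about typical inputs are illustrative only). If the training inputs are almost always within range, the collected profile shows that the branch is rarely taken, and the compiler can use that when it lays out and optimizes the loop.

```c
/* Counts the values that fall outside the closed range [low, high].
 * Assumption for this sketch: typical inputs are nearly always in
 * range, so the branch body is a cold path in the collected profile. */
int count_out_of_range(const int *values, int count, int low, int high)
{
    int outliers = 0;

    for (int i = 0; i < count; i++) {
        if (values[i] < low || values[i] > high) {
            outliers++;   /* rarely executed for typical inputs */
        }
    }
    return outliers;
}
```

The benefit depends on how representative the training inputs are. A profile gathered from atypical inputs can steer the compiler toward the wrong decisions.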
Profile-based optimization in the GCC compiler is accessed through the -fprofile-generate
and -fprofile-use options, used on top of an optimization level such as -O2. The instrumented binary is
generated by using -fprofile-generate on top of all other options, and running the resulting binary
writes the profile data to files that GCC names with a .gcda suffix by default. For example:
gcc -fprofile-generate -O3 -c a.c
gcc -fprofile-generate -O3 -c b.c
Chapter 6. Compilers and optimization tools for C, C++, and Fortran
