IBM Power7 Optimization And Tuning Manual page 137

Architecture-specific optimizations
Here are some architecture-specific optimizations:
--machine tgt (-m tgt): FDPR optimizations include general optimizations that are based
on a high-level representation of the program's control and data flow, in addition to peephole
optimizations that rely on specific architecture features. Those optimizations can perform
better when tuned for a specific platform. The -m flag lets the user specify the target
machine model when it is known and the program is not intended for use on
multiple target platforms. The default target is POWER7.
--align-code code (-A code): Optimizing the alignment and the placement of the code is
crucial to the performance of the program. Correct alignment can improve instruction
fetching and dispatching. The alignment algorithm in FDPR uses different techniques that
are based on the target platform. Some techniques are generic for the Power Architecture,
and others are considered dispatch rules of the specific machine model. If code is 1 (the
default), FDPR applies a standard alignment algorithm that is adapted for the selected
target machine (see -m in the previous bullet point). If code is 2, FDPR applies a more
advanced version, using dispatch rules and other heuristics to decide how the program
code chunks are placed relative to i-cache sectors, again based on the selected target. A
value of 0 disables the alignment algorithm.
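
The arithmetic behind code alignment can be illustrated with a minimal sketch. This is not the FDPR alignment algorithm: the 32-byte sector size, the chunk address, and the helper names padding_to_align and crosses_sector are assumptions chosen only to show why padding a chunk to a boundary can keep it inside a single i-cache sector.

```python
def padding_to_align(address: int, boundary: int) -> int:
    """Bytes of padding needed to make `address` a multiple of `boundary`."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return (-address) & (boundary - 1)


def crosses_sector(address: int, size: int, sector: int) -> bool:
    """True if the chunk [address, address + size) spans more than one sector."""
    return address // sector != (address + size - 1) // sector


SECTOR = 32  # assumed sector size in bytes, for illustration only

# Hypothetical hot code chunk: six 4-byte instructions starting at 0x1014.
start, size = 0x1014, 24
pad = padding_to_align(start, SECTOR)
aligned = start + pad

print(f"unaligned chunk crosses a sector: {crosses_sector(start, size, SECTOR)}")
print(f"padding inserted: {pad} bytes, new start: {aligned:#x}")
print(f"aligned chunk crosses a sector: {crosses_sector(aligned, size, SECTOR)}")
```

In this sketch the padding trades a few bytes of code size for a chunk that can be fetched from one sector. A real alignment pass would also weigh how hot the chunk is and the dispatch rules of the selected target.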
Function optimization
FDPR includes a number of function level optimizations that are based on detailed data flow
analysis (DFA). With DFA, optimizations can determine the data that is contained in each
register at each point in the function and whether this value is used later.
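
As an illustration of the kind of question DFA answers ("is this register's value used later?"), the following sketch runs a textbook backward liveness analysis over a tiny control flow graph. It is not FDPR's implementation. The block names, the register assignments, and the (defs, uses) instruction encoding are all hypothetical.

```python
def liveness(blocks, succs):
    """Backward data flow: compute the registers live on entry to and exit from each block.

    blocks: block name -> list of (defs, uses) tuples, one per instruction
    succs:  block name -> list of successor block names
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for b in blocks:
            out = set()
            for s in succs.get(b, []):
                out |= live_in[s]
            live = set(out)
            # Walk the block backwards: a definition ends liveness, a use starts it.
            for defs, uses in reversed(blocks[b]):
                live -= set(defs)
                live |= set(uses)
            if out != live_out[b] or live != live_in[b]:
                live_out[b], live_in[b] = out, live
                changed = True
    return live_in, live_out


# Hypothetical function: r14 is loaded in "entry" and read only in "then".
blocks = {
    "entry": [(["r14"], ["r3"]), (["cr0"], ["r3"])],  # r14 = load(r3); compare r3
    "then": [(["r3"], ["r14"])],                      # r3 = r14 + 1
    "exit": [([], ["r3"])],                           # return r3
}
succs = {"entry": ["then", "exit"], "then": ["exit"], "exit": []}

live_in, live_out = liveness(blocks, succs)
for name in blocks:
    print(name, "live-in:", sorted(live_in[name]), "live-out:", sorted(live_out[name]))
```

A register that is absent from the live set at a given point has no later use on any path from that point, which matches the sense in which a register is described as killed.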
The function optimizations are:
--killed-regs (-kr): A register is considered killed at a point (in the function) if its value is
not used in any ensuing path. FDPR uses the Power ABI convention that defines which
registers are non-volatile (NV) across function calls. NV registers that are used inside a
function are saved in its prolog and restored in its epilog. The -kr optimization analyzes
called functions, looking for save and restore instructions of killed NV registers. If
the register is killed at the calling site, then the save and restore instructions for this
register are removed. The optimization considers all calls to this function, because an NV
might be alive when the function is called. When needed, the optimization might also
reassign (rename) registers at the calling site to ensure that an NV is indeed killed and
can be optimized.
--hco-reschedule (-hr): The optimization analyzes the flow through hot basic blocks and
looks for instructions that can be moved to dominating colder basic blocks (basic block b1
dominates b2 if all paths to b2 first go through b1). For example, an instruction that loads a
constant to a register is a candidate for such motion.
--simplify-early-exit factor (-see factor): Sometimes a function starts with an early
exit condition so that if the condition is met, the whole body of the function is ignored. If the
condition is commonly taken, it makes sense to avoid saving the registers in the prolog
and restoring them in the epilog. The -see optimization detects such a condition and
provides a reduced epilog that restores only registers modified by computing the
condition. If factor is 1, a more aggressive optimization is performed where the prolog is
also optimized.
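
The dominance relation mentioned for -hr (b1 dominates b2 if all paths to b2 first go through b1) can be computed with the classic iterative algorithm. The sketch below is a generic illustration, not FDPR's code. The control flow graph and the block names (pre, loop, skip) are hypothetical.

```python
def dominators(succs, entry):
    """Return a dict mapping each block to the set of blocks that dominate it."""
    nodes = set(succs)
    for targets in succs.values():
        nodes.update(targets)
    preds = {n: set() for n in nodes}
    for n, targets in succs.items():
        for t in targets:
            preds[t].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable block; left untouched in this sketch
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical CFG: "pre" runs once before the hot "loop"; "skip" bypasses both.
succs = {
    "entry": ["pre", "skip"],
    "pre": ["loop"],
    "loop": ["loop", "exit"],
    "skip": ["exit"],
    "exit": [],
}
dom = dominators(succs, "entry")
print("dominators of loop:", sorted(dom["loop"]))
print("dominators of exit:", sorted(dom["exit"]))
```

Here pre dominates loop, so an instruction in loop that loads a constant could in principle be placed in pre, which executes less often. pre does not dominate exit, because exit can also be reached through skip.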
Chapter 6. Compilers and optimization tools for C, C++, and Fortran