IBM Power7 Optimization And Tuning Manual page 115

Applications can use 32-bit and 64-bit execution modes, depending on their specific
requirements, if their dependent libraries are available for the wanted mode.
The 32-bit mode is lighter with a simpler function call sequence and smaller footprint for stack
and C++ objects, which can be important for some dynamic language interpreters and
applications with many small functions.
The 64-bit mode has a larger footprint because of the larger pointer and general register size,
which can be an asset when you handle large data structures or text data, where larger
(64-bit) general registers are used for high bandwidth in the memory and string functions.
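The footprint difference can be observed directly. The following is a minimal sketch, not taken from the manual: `struct node`, `pointer_bits`, and `print_footprint` are hypothetical names chosen for illustration. It assumes a GCC toolchain that can build the same source with both `-m32` and `-m64`.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical pointer-heavy object, used only to illustrate footprint. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Width of a pointer, in bits, for the mode this file was compiled in. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Prints the sizes that drive stack and object footprint. */
void print_footprint(void)
{
    printf("%u-bit mode: pointer=%zu long=%zu struct node=%zu bytes\n",
           pointer_bits(), sizeof(void *), sizeof(long), sizeof(struct node));
}
```

Calling `print_footprint()` from a program built once with `-m32` and once with `-m64` typically shows `struct node` doubling in size (12 bytes versus 24 bytes), because both pointers grow and padding is added after `value`.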
The handling of floating point and vector data (register size, register format, and
instructions) is the same for 32-bit and 64-bit modes. Therefore, for these applications, the key decision
depends on the address space requirements. For 32-bit Power applications (32-bit mode
applications that are running on 64-bit Power hardware with a 64-bit kernel), the address
space is limited to 4 GB, which is the limit of a 32-bit address. 64-bit applications are currently
limited to 16 TB of application program or data per process. This limitation is not a hardware
one, but is a restriction of the shared Linux virtual memory manager implementation. For
applications with low-latency response requirements, using the larger 64-bit address space to
avoid I/O latencies, by way of memory-mapped files or large local caches, is a good trade-off.
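As an illustration of the memory-mapped file approach, the following sketch maps a whole file read-only with the POSIX `mmap` call. The helper names `map_file_readonly` and `unmap_file` are hypothetical and are not part of any library. The size check shows where a 32-bit process runs out of room: a file larger than the address space cannot be mapped in one piece.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps an entire file read-only. Returns NULL on failure, for an empty
 * file, or when the file does not fit in this process's address space. */
const void *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* The mapping stays valid after the descriptor is closed. */
    if (addr == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return addr;
}

/* Releases a mapping created by map_file_readonly. Returns 0 on success. */
int unmap_file(const void *addr, size_t len)
{
    return munmap((void *)addr, len);
}
```

After the mapping is established, the file contents are read through ordinary loads, and pages are brought in by the kernel on first touch rather than by explicit `read` calls.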
CPU-tuned libraries
If an application must support only one Power hardware platform (such as POWER7 and
newer), then compiling the entire application with the appropriate -mcpu= and -mtune=
compiler flags might be the best option.
For example, -mcpu=power7 allows the compiler to use all the new instructions, such as the
Vector Scalar Extended category. The -mcpu=power7 option also implies -mtune=power7 if it is
not explicitly set.
The -mcpu option determines the instruction mix that the compiler generates. The -mtune option
optimizes the order of those instructions.
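One way to confirm what a given set of flags selected is to inspect the macros that GCC predefines for Power targets. The sketch below assumes GCC (or a compatible compiler) and relies on `_ARCH_PWR6`, `_ARCH_PWR7`, `__VSX__`, and `__powerpc__`. The function names are hypothetical.

```c
/* Reports the minimum processor level this translation unit was built for,
 * based on GCC's predefined macros. _ARCH_PWR7 is checked first because
 * the _ARCH_PWRn macros are cumulative. */
const char *compiled_cpu_target(void)
{
#if defined(_ARCH_PWR7)
    return "power7 or newer";
#elif defined(_ARCH_PWR6)
    return "power6";
#elif defined(__powerpc__) || defined(__PPC__)
    return "older or generic Power";
#else
    return "not a Power target";
#endif
}

/* Returns 1 if the compiler was allowed to generate VSX instructions. */
int compiled_with_vsx(void)
{
#if defined(__VSX__)
    return 1;
#else
    return 0;
#endif
}
```

For example, building with `gcc -O2 -mcpu=power7` is expected to report a POWER7 target with VSX enabled, while `gcc -O2 -mcpu=power6 -mtune=power7` restricts the instruction mix to POWER6 but schedules the code for POWER7, so the same functions report a POWER6 target without VSX.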
Most applications, however, must run on more than one platform, for example, in POWER6 mode
and POWER7 mode. For applications composed of a main program and a set of shared
libraries or applications that spend significant execution time in other (from the Linux run time
or extra package) shared libraries, you can create packages that automatically select the best
optimization for each platform.
Linux also supports automatic CPU-tuned library selection, and library implementers have
several implementation options to choose from. For more
information, see Optimized Libraries, available at:
http://www.ibm.com/developerworks/wikis/display/LinuxP/Optimized%20Libraries
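Automatic selection depends on knowing which processor the process is running on. The sketch below reads the platform string that the Linux kernel passes to each process in the auxiliary vector (typically `power7` on POWER7 hardware, or `power6` when running in POWER6 mode), using the glibc `getauxval` call. The per-platform directory layout and the names `runtime_platform` and `tuned_library_dir` are assumptions made for illustration. The actual search rules are those of the dynamic linker and are described in the Optimized Libraries page.

```c
#include <stddef.h>
#include <stdio.h>
#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Returns the kernel-reported platform name, or "unknown" if unavailable. */
const char *runtime_platform(void)
{
#if defined(__linux__)
    const char *p = (const char *)getauxval(AT_PLATFORM);
    if (p != NULL && p[0] != '\0')
        return p;
#endif
    return "unknown";
}

/* Builds "<base>/<platform>" in buf (hypothetical layout).
 * Returns 0 on success, or -1 if buf is too small. */
int tuned_library_dir(const char *base, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s/%s", base, runtime_platform());
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

A package that ships one build of a library per processor could use a path like this to choose which copy to load with `dlopen`, falling back to a generic build when the platform-specific directory does not exist.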
The Linux Technology Center works with the SUSE and Red Hat Linux Distribution Partners
to provide some automatic CPU-tuned libraries for the C/POSIX runtime libraries. However,
these libraries might not be supported for all platforms or have the latest optimization.
One advantage of the Advance Toolchain is that the runtime RPMs for the current release
include CPU-tuned libraries for all the currently supported POWER processors and the latest
processor-specific optimization and capabilities, which are constantly updated. Additional
libraries are added as they are identified. The Advance Toolchain run time can be used with
either Advance Toolchain GCC or XL compilers and includes configuration files to simplify
linking XL compiled programs with the Advance Toolchain runtime libraries.
Chapter 5. Linux
