Using the IBM SDK for PowerLinux Trace Analyzer; High Library Usage - IBM POWER7 Optimization and Tuning Manual

High kernel usage
If the bulk of the CPU cycles are consumed in the kernel or runtime libraries that are not part
of your application, then a different type of analysis is required. If the kernel is consuming
significant cycles, then the application might be I/O or lock contention bound. This situation
can occur when an application moves to larger systems (higher core count) and fails to
scale up.
I/O bound applications can be constrained by small buffer sizes or a poor choice of an access
method. One issue to look for is applications that use local loopback sockets for interprocess
communications (IPC). This situation is common for applications that are migrating from early
scale-out designs to larger systems (and core-count). The first application change is to
choose a lighter weight form of IPC for in-system communications.
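
As one possible illustration of a lighter-weight IPC form, the sketch below uses a UNIX domain socket pair in place of a loopback TCP connection. The text does not name a specific replacement, so the choice of `socketpair` is an assumption, and the helper names `make_local_channel` and `send_all` are hypothetical. Pipes or shared memory might suit a given application better.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a connected pair of UNIX domain stream
 * sockets. Data sent over these sockets stays inside the kernel's local
 * socket layer and does not pass through the TCP/IP stack, as a
 * connection to 127.0.0.1 would. Returns 0 on success, -1 on error. */
int make_local_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Hypothetical helper: write len bytes to fd, retrying on short writes.
 * Returns len on success, -1 on error. */
ssize_t send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

In a typical design, the parent would call `make_local_channel` before `fork` and each process would keep one end of the pair. Whether this actually reduces kernel time should be confirmed by profiling the application.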
Excessive locking or poor lock granularity can also result in high kernel usage (in the kernel's
spin_lock, futex, and scheduler components) when applications move to larger system
configurations. This situation might require adjusting the application lock strategy and
possibly the type of lock mechanism that is used as well:
- POSIX pthread_mutex and pthread_rwlock locks are complex and heavy, and POSIX
  semaphores are simpler and lighter.
- Use trylock forms to spin in user mode for a limited time when appropriate. Use this
  technique when there is normally a finite lock hold time and limited contention for the
  resource. Doing so avoids context switches and scheduler overhead in the kernel.
- Reserve POSIX pthread_spinlock and sched_yield for applications that have exclusive
  use of the system and with carefully designed thread affinity (assigning specific threads to
  specific cores).
- The compiler provides inline functions (__sync_fetch_and_add, __sync_fetch_and_or, and
  so on) that are better suited for simple atomic updates than POSIX lock and unlock. Use
  thread local storage, where appropriate, to avoid locking for thread safe code.
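
The following is a minimal sketch of three of the techniques listed above: a bounded user-mode spin with a trylock form, an atomic update with `__sync_fetch_and_add`, and a thread-local counter that needs no lock. The function names and the `SPIN_LIMIT` value are hypothetical and chosen only for illustration. The `__thread` storage qualifier is a GCC extension.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical spin budget; a real value should come from measurement. */
#define SPIN_LIMIT 1000

/* Try to take the mutex without blocking, up to SPIN_LIMIT attempts.
 * If it is still busy after that, fall back to the blocking lock.
 * Returns 0 on success or a pthread error number. */
int lock_with_bounded_spin(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return 0;               /* acquired without blocking */
        if (rc != EBUSY)
            return rc;              /* real error, do not retry */
    }
    return pthread_mutex_lock(m);   /* contended: block in the kernel */
}

static long shared_hits;            /* shared by all threads */
static __thread long local_hits;    /* one private copy per thread */

/* Count an event in thread-local storage; no lock or atomic needed. */
void record_hit(void)
{
    local_hits++;
}

/* Fold this thread's private count into the shared total atomically. */
void publish_hits(void)
{
    __sync_fetch_and_add(&shared_hits, local_hits);
    local_hits = 0;
}

/* Atomically read the shared total (adding 0 returns the current value). */
long total_hits(void)
{
    return __sync_fetch_and_add(&shared_hits, 0);
}
```

Each thread would call `record_hit` on its hot path and `publish_hits` occasionally or before exiting, so the shared counter is touched rarely. The bounded spin only pays off under the conditions stated above: short hold times and limited contention.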

Using the IBM SDK for PowerLinux Trace Analyzer

The IBM SDK for PowerLinux provides tools, including SystemTap and the pthread monitor,
for tracking I/O and lock usage of a running application. The higher level Trace Analyzer tools
can target a specific application for combined SystemTap syscall trace and Lock Trace. The
resulting trace information is correlated for time strip display and analysis within the tool.

High library usage

If libraries are consuming significant cycles, then you must determine if:
- Those libraries are part of your application, provided by a third party, or provided by
  the Linux distribution
- There are alternative libraries that are better optimized
- You can recompile those libraries at a higher optimization level
Libraries that are part of your application require the same level of empirical analysis as the
rest of your application (by using source profiling and the Source Code Advisor). Libraries that
are used by, but are not part of, your application imply a number of options and strategies:
- Most open source packages in the Linux environment are compiled with optimization level
  -O2 and tend to avoid additional (higher level GCC) compiler options. This configuration
  might be sufficient for a CISC processor with limited register resources, but not sufficient
  for a RISC based register-rich processor, such as POWER7.
Appendix B. Performance tooling and empirical performance analysis