
The POWER Hypervisor uses a three-level affinity mechanism in its scheduler to enforce
affinity as much as possible. The reason why absolute affinity is not always possible is that
partitions can expand and use unused cycles of other LPARs. This process is done using
uncapped mode in Power, where the uncapped cycles might not always have affinity.
Therefore, binding logical processors that are seen at the operating system level to physical
threads seen at the hypervisor level works only in some cases in shared partitions. Achieving
a high level of affinity is difficult when multiple partitions share resources from a single pool,
especially at high utilization, and when partitions expand to use the cycles of other
partitions. As a result, large shared processor core pools that span multiple chips tend to
generate remote memory accesses. For this reason, larger partitions and large processor
core pools might be less desirable where a high level of affinity is expected for performance.
Virtualized deployments can use Micro-Partitioning, where a partition is allocated a fraction of
a core. Micro-Partitioning allows a core allocation as small as 0.1 cores in older firmware
levels, and as small as 0.05 cores starting at the 760 firmware level, when coupled with
supporting operating system levels. This powerful mechanism provides great flexibility in
deployments. However, very small core allocations are more appropriate for situations in
which many virtual machines are often idle, so that the active 0.05-core LPARs can use those
idle cycles. There is one negative performance effect in deployments with very small
partitions, particularly with 0.1 cores or less at high system utilization: Java warm-up times
can be greatly increased. During Java execution, the JIT compiler dynamically produces
binary code for Java methods, and steady-state optimal performance is reached only after a
portion of the Java methods is compiled to binary code. With very small partitions, there
might be a long warm-up period before steady-state performance is reached, because a
0.05-core LPAR cannot get additional cycles from other LPARs while those LPARs are
consuming their own cycles. However, if the workload that runs on such a small LPAR does
not need more than 5% of the capacity of a processor core, the performance impact is mitigated.
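The warm-up effect can be observed directly by timing successive batches of calls to the same Java method, as in the minimal sketch below (the class name, the batch sizes, and the dummy workload are illustrative assumptions, not taken from this manual). On a very small partition in a busy shared pool, the early batches stay slow for longer because both interpreted execution and JIT compilation must share the partition's limited entitlement.

public class WarmupSketch {
    // Simple compute-bound method that the JIT compiler can compile and optimize.
    static long work(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int batches = 10;
        final int callsPerBatch = 20_000;
        long sink = 0; // keep results live so the loop is not optimized away

        for (int b = 1; b <= batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(1_000);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Early batches run interpreted or lightly compiled code; later
            // batches should be faster once work() is compiled to binary code.
            System.out.println("batch " + b + ": " + elapsedMs + " ms");
        }
        System.out.println("(checksum) " + sink);
    }
}

The point at which the reported batch times level off approximates the end of the warm-up period; on a constrained 0.05-core LPAR, that point arrives much later in wall-clock time.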
Memory requirements
For good performance, there should be enough physical memory available so that application
data does not need to be frequently paged in and out between memory and disk.
The physical memory that is allocated to a partition must be enough to satisfy the
requirements of the operating system and the applications that are running on the partition.
Java is sensitive to having enough physical memory available to contain the Java heap
because Java applications often have frequent GC cycles where large portions of the Java
heap are accessed. If portions of the Java heap are paged out to disk by the operating system
because of a lack of physical memory, then GC cycles can cause a large amount of disk
activity, which is known as thrashing.
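The relationship between the Java heap and the partition's physical memory can be checked at run time. The following is a minimal sketch (the class name, the command-line memory budget, and the 80% headroom threshold are illustrative assumptions, not recommendations from this manual) that prints the heap limits the JVM is running with so that they can be compared with the memory allocated to the partition.

public class HeapSizeCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeapMb   = rt.maxMemory()   / (1024 * 1024); // -Xmx, or the default limit
        long totalHeapMb = rt.totalMemory() / (1024 * 1024); // currently committed heap
        long freeHeapMb  = rt.freeMemory()  / (1024 * 1024); // free space in the committed heap

        System.out.println("Maximum heap (MB):   " + maxHeapMb);
        System.out.println("Committed heap (MB): " + totalHeapMb);
        System.out.println("Free heap (MB):      " + freeHeapMb);

        // Optional check against a partition memory budget passed in MB, for example:
        //   java -Xmx6g HeapSizeCheck 8192
        if (args.length > 0) {
            long partitionMemMb = Long.parseLong(args[0]);
            if (maxHeapMb > partitionMemMb * 0.8) {
                System.out.println("Warning: the maximum heap leaves little room for the "
                        + "operating system and native allocations; if the heap is paged "
                        + "out, GC cycles can cause heavy disk activity (thrashing).");
            }
        }
    }
}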

1.5.3 Deep performance optimization guidelines

Performance tools for AIX and Linux are described in Appendix B, "Performance tooling and
empirical performance analysis" on page 155. A deep performance optimization effort
typically uses those tools and follows this general strategy:
1. Gather general information about the execution of an application when it is running on a
dedicated POWER7 performance system. Important statistics to consider are:
– The user and system CPU usage of the application: Ideally, a multi-threaded
application generates a high overall CPU usage with most of the CPU time in user
code. Too high a system CPU usage is generally a sign of a locking bottleneck in the
application. Too low an overall usage usually indicates some type of resource
bottleneck, such as network or disk. For low CPU usage, look at the number of
runnable threads reported by the operating system, and try to ensure that there are as
many runnable threads as there are logical processors in the partition (see the sketch
that follows this list).
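Operating system tools (see Appendix B) report the system-wide count of runnable threads. As a minimal in-JVM cross-check, the sketch below (the class name and the output format are illustrative assumptions) counts this JVM's threads that are in the RUNNABLE state and compares them with the number of logical processors that the partition exposes.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class RunnableThreadCheck {
    public static void main(String[] args) {
        int logicalProcessors = Runtime.getRuntime().availableProcessors();

        // Snapshot the state of every live thread in this JVM.
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = threadBean.getThreadInfo(threadBean.getAllThreadIds());

        int runnable = 0;
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.RUNNABLE) {
                runnable++;
            }
        }

        System.out.println("Logical processors: " + logicalProcessors);
        System.out.println("RUNNABLE threads:   " + runnable);

        // For a CPU-bound workload, fewer runnable threads than logical
        // processors usually means low overall CPU usage; look for a
        // resource or locking bottleneck rather than adding processors.
        if (runnable < logicalProcessors) {
            System.out.println("Fewer runnable threads than logical processors.");
        }
    }
}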