
The POWER Hypervisor uses a three-level affinity mechanism in its scheduler to enforce
affinity as much as possible. The reason why absolute affinity is not always possible is that
partitions can expand and use unused cycles of other LPARs. This process is done using
uncapped mode in Power, where the uncapped cycles might not always have affinity.
Therefore, binding logical processors that are seen at the operating system level to physical
threads seen at the hypervisor level works only in some cases in shared partitions. Achieving
a high level of affinity is difficult when multiple partitions share resources from a single pool,
especially at high utilization, and when partitions expand to use the cycles of other
partitions. As a result, large shared processor core pools that span multiple chips tend to
generate remote memory accesses. For this reason, larger partitions and large processor
core pools might be less desirable where a high level of affinity is expected for performance.
Virtualized deployments can use Micro-Partitioning, where a partition is allocated a fraction of
a core. Micro-Partitioning allows a core allocation as small as 0.1 cores in older firmware
levels, and as small as 0.05 cores starting at the 760 firmware level, when coupled with
supporting operating system levels. This powerful mechanism provides great flexibility in
deployments. However, very small core allocations are more appropriate for situations in
which many virtual machines are often idle, so that the active 0.05-core LPARs can use those
idle cycles. There is one negative performance effect in deployments with very small
partitions, particularly with 0.1 cores or less at high system utilization: Java warm-up times
can be greatly increased. During Java execution, the JIT compiler dynamically produces
binary code for Java methods, and steady-state optimal performance is reached only after a
portion of the Java methods is compiled to binary code. With very small partitions, there
might be a long warm-up period before steady-state performance is reached, because a
0.05-core LPAR cannot get additional cycles from other LPARs while those LPARs are
consuming their own cycles. However, if the workload that runs on such a small LPAR does
not need more than 5% of the capacity of a processor core, the performance impact is mitigated.
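The warm-up effect can be observed directly by timing successive batches of calls to the same Java method, as in the minimal sketch below (the class name, the batch sizes, and the dummy workload are illustrative assumptions, not taken from this manual). On a very small partition in a busy shared pool, the early batches stay slow for longer because both interpreted execution and JIT compilation must share the partition's limited entitlement.

public class WarmupSketch {
    // Simple compute-bound method that the JIT compiler can compile and optimize.
    static long work(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int batches = 10;
        final int callsPerBatch = 20_000;
        long sink = 0; // keep results live so the loop is not optimized away

        for (int b = 1; b <= batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(1_000);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Early batches run interpreted or lightly compiled code; later
            // batches should be faster once work() is compiled to binary code.
            System.out.println("batch " + b + ": " + elapsedMs + " ms");
        }
        System.out.println("(checksum) " + sink);
    }
}

The point at which the reported batch times level off approximates the end of the warm-up period; on a constrained 0.05-core LPAR, that point arrives much later in wall-clock time.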
Memory requirements
For good performance, there should be enough physical memory available so that application
data does not need to be frequently paged in and out between memory and disk.
The physical memory that is allocated to a partition must be enough to satisfy the
requirements of the operating system and the applications that are running on the partition.
Java is sensitive to having enough physical memory available to contain the Java heap
because Java applications often have frequent GC cycles where large portions of the Java
heap are accessed. If portions of the Java heap are paged out to disk by the operating system
because of a lack of physical memory, then GC cycles can cause a large amount of disk
activity, which is known as thrashing.
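The relationship between the Java heap and the partition's physical memory can be checked at run time. The following is a minimal sketch (the class name, the command-line memory budget, and the 80% headroom threshold are illustrative assumptions, not recommendations from this manual) that prints the heap limits the JVM is running with so that they can be compared with the memory allocated to the partition.

public class HeapSizeCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeapMb   = rt.maxMemory()   / (1024 * 1024); // -Xmx, or the default limit
        long totalHeapMb = rt.totalMemory() / (1024 * 1024); // currently committed heap
        long freeHeapMb  = rt.freeMemory()  / (1024 * 1024); // free space in the committed heap

        System.out.println("Maximum heap (MB):   " + maxHeapMb);
        System.out.println("Committed heap (MB): " + totalHeapMb);
        System.out.println("Free heap (MB):      " + freeHeapMb);

        // Optional check against a partition memory budget passed in MB, for example:
        //   java -Xmx6g HeapSizeCheck 8192
        if (args.length > 0) {
            long partitionMemMb = Long.parseLong(args[0]);
            if (maxHeapMb > partitionMemMb * 0.8) {
                System.out.println("Warning: the maximum heap leaves little room for the "
                        + "operating system and native allocations; if the heap is paged "
                        + "out, GC cycles can cause heavy disk activity (thrashing).");
            }
        }
    }
}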

1.5.3 Deep performance optimization guidelines

Performance tools for AIX and Linux are described in Appendix B, "Performance tooling and
empirical performance analysis" on page 155. A deep performance optimization effort
typically uses those tools and follows this general strategy:
1. Gather general information about the execution of an application when it is running on a
dedicated POWER7 performance system. Important statistics to consider are:
– The user and system CPU usage of the application: Ideally, a multi-threaded
application generates a high overall CPU usage with most of the CPU time in user
code. Too high a system CPU usage is generally a sign of a locking bottleneck in the
application. Too low an overall usage usually indicates some type of resource
bottleneck, such as network or disk. For low CPU usage, look at the number of
runnable threads reported by the operating system, and try to ensure that there are as
many runnable threads as there are logical processors in the partition (see the sketch
that follows this list).
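Operating system tools (see Appendix B) report the system-wide count of runnable threads. As a minimal in-JVM cross-check, the sketch below (the class name and the output format are illustrative assumptions) counts this JVM's threads that are in the RUNNABLE state and compares them with the number of logical processors that the partition exposes.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class RunnableThreadCheck {
    public static void main(String[] args) {
        int logicalProcessors = Runtime.getRuntime().availableProcessors();

        // Snapshot the state of every live thread in this JVM.
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = threadBean.getThreadInfo(threadBean.getAllThreadIds());

        int runnable = 0;
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.RUNNABLE) {
                runnable++;
            }
        }

        System.out.println("Logical processors: " + logicalProcessors);
        System.out.println("RUNNABLE threads:   " + runnable);

        // For a CPU-bound workload, fewer runnable threads than logical
        // processors usually means low overall CPU usage; look for a
        // resource or locking bottleneck rather than adding processors.
        if (runnable < logicalProcessors) {
            System.out.println("Fewer runnable threads than logical processors.");
        }
    }
}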