IBM Power7 Optimization And Tuning Manual page 185

Finding alignment issues
Improperly aligned code or data can cause performance degradation. By default, the IBM
compilers and linkers correctly align code and data, including stack and statically allocated
variables. Incorrect typecasting can result in references to storage that are not correctly
aligned. There are two types of alignment issues to be concerned with:
Alignment issues that are handled by microcode in the POWER7 processor.
Alignment issues that are handled through alignment interrupts.
Examples of alignment issues that are handled by microcode with a performance penalty in
the POWER7 processor are loads that cross a 128-byte boundary and stores that cross a
4 KB page boundary. To give an indication of the penalty for this type of misalignment, on a
4 GHz processor, a nine-instruction loop that contains an 8 byte load that crosses a 128-byte
boundary takes double the time of the same loop with the load correctly aligned.
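As a rough illustration of the two microcode-handled cases above, the following sketch checks whether an access of a given size at a given address spans a boundary. The function name and the sample addresses are hypothetical and are not part of any IBM tool; only the 128-byte and 4 KB boundaries come from the text.

```python
def crosses_boundary(address: int, size: int, boundary: int = 128) -> bool:
    """Return True if the access [address, address + size) spans a boundary.

    The access crosses a boundary when its first and last bytes fall
    into different boundary-sized blocks.
    """
    if size <= 0 or boundary <= 0:
        raise ValueError("size and boundary must be positive")
    first_block = address // boundary
    last_block = (address + size - 1) // boundary
    return first_block != last_block


# An 8-byte load at offset 120 ends at byte 127 and stays in one 128-byte block.
aligned_load = crosses_boundary(120, 8)

# An 8-byte load at offset 124 ends at byte 131 and crosses the 128-byte boundary.
unaligned_load = crosses_boundary(124, 8)

# An 8-byte store at offset 4092 crosses a 4 KB page boundary.
page_crossing_store = crosses_boundary(4092, 8, boundary=4096)

print(aligned_load, unaligned_load, page_crossing_store)
```

A check like this can be applied to addresses produced by pointer casts when reviewing code that is suspected of causing such accesses.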
Alignment issues that are handled by microcode can be detected by running hpmcount or
hpmstat. The hpmcount command is a command-line utility that runs a command and collects
statistics from the POWER7 PMU while the command runs. To detect alignment issues that
are handled by microcode, run hpmcount to collect data for group 38. An example is provided
in Example B-8.
Example B-8 Example of the results of the hpmcount command
# hpmcount -g 38 ./unaligned
Group: 38
Counting mode: user
Counting duration: 21.048874056 seconds
PM_LSU_FLUSH_ULD (LRQ unaligned load flushes)    :  4320840034
PM_LSU_FLUSH_UST (SRQ unaligned store flushes)   :           0
PM_LSU_FLUSH_LRQ (LRQ flushes)                   :   450842085
PM_LSU_FLUSH_SRQ (SRQ flushes)                   :         149
PM_RUN_INST_CMPL (Run instructions completed)    : 19327363517
PM_RUN_CYC (Run cycles)                          : 84219113069
Normalization base: time
Counting mode: user
Derived metric group: General
[ ] Run cycles per run instruction               :       4.358
The hpmstat command is similar to hpmcount, except that it collects performance data on a
system-wide basis, rather than just for the execution of a command.
Generally, scenarios in which the ratio of (LRQ unaligned load flushes + SRQ unaligned store
flushes) divided by Run instructions completed is greater than 0.5% must be further
investigated. The tprof command can be used to further pinpoint where in the code the
unaligned storage references are occurring. To pinpoint unaligned loads, the -E
PM_MRK_LSU_FLUSH_ULD flag is added to the tprof command line, and to pinpoint unaligned
stores, the -E PM_MRK_LSU_FLUSH_UST flag is added. When these flags are used, tprof
generates a profile that is based on sampling unaligned loads and stores rather than on
time-based sampling.
Appendix B. Performance tooling and empirical performance analysis
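As a worked check of the 0.5% guideline, the following sketch applies the ratio to the counts reported in Example B-8. The function and variable names are hypothetical; the counter values and the threshold are taken from the text.

```python
def unaligned_flush_ratio(load_flushes: int, store_flushes: int,
                          run_instructions: int) -> float:
    """(LRQ unaligned load flushes + SRQ unaligned store flushes)
    divided by run instructions completed."""
    if run_instructions <= 0:
        raise ValueError("run_instructions must be positive")
    return (load_flushes + store_flushes) / run_instructions


# Counts from Example B-8.
PM_LSU_FLUSH_ULD = 4320840034   # LRQ unaligned load flushes
PM_LSU_FLUSH_UST = 0            # SRQ unaligned store flushes
PM_RUN_INST_CMPL = 19327363517  # Run instructions completed

THRESHOLD = 0.005  # the 0.5% guideline from the text

ratio = unaligned_flush_ratio(PM_LSU_FLUSH_ULD, PM_LSU_FLUSH_UST,
                              PM_RUN_INST_CMPL)
needs_investigation = ratio > THRESHOLD

print(f"unaligned flush ratio: {ratio:.2%}")
print("investigate further" if needs_investigation else "within guideline")
```

For the run in Example B-8 the ratio comes out at roughly 22%, far above the 0.5% guideline, so this program would be a candidate for further investigation with tprof.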
