Freescale Semiconductor PowerQUICC III Application Note

Performance Monitors: Using the Core and System Performance Monitors

Freescale Semiconductor
Application Note
PowerQUICC III Performance Monitors
Using the Core and System Performance Monitors
This application note describes aspects of utilizing the core
and device-level performance monitors on PowerQUICC III
(PQ3). Included are example calculations to aid in
interpreting data collected.
1 Performance Monitors

PowerQUICC III processors are the first family of
PowerQUICC processors to include on-chip performance
monitors. These include both core performance monitors,
described in detail in the PowerPC® e500 Core Family
Reference Manual, and device-level performance monitors,
described in detail in the product-specific reference manual.
The e500 core-level performance monitors enable the
counting of e500-specific events, for example, cache misses,
mispredicted branches, or the number of cycles an execution
unit stalls. They are configured through a set of
special-purpose registers that can be written only through
supervisor-level accesses. The core-level event counters can
also be read through a read-only set of user-level registers.
The device-level performance monitors can be used to
monitor and record selected events on a device level. These

© 2008-2014 Freescale Semiconductor, Inc. All rights reserved.
Document Number: AN3636
Rev. 2, 03/2014
Contents
1. Performance Monitors, page 1
2. e500 Core Performance Monitors, page 2
3. Device Performance Monitors, page 2
4. Performance Metrics, page 3
5. Data Collection, page 5
6. Examples, page 9
7. Data Presentation, page 13
8. Revision History, page 15

Summary of Contents for Freescale Semiconductor PowerQUICC III

  • Page 1: Table Of Contents

    PowerQUICC III processors are the first family of PowerQUICC processors to include performance monitors on-chip.
  • Page 2: E500 Core Performance Monitors

    The device performance monitor consists of ten counters (PMC0-PMC9), capable of monitoring 576 events, as well as the associated local control registers (PMLCA0-PMLCA9) and the global control register (PMGC0). These registers are all memory-mapped and can be accessed in supervisor or user mode. PowerQUICC III Performance Monitors, Rev. 2 Freescale Semiconductor...
  • Page 3: Performance Metrics

    CCB (platform) clock cycles. Note that for counter-specific events, an offset of 64 must be used when programming the field, because counter-specific events occupy the bottom 64 values of the 7-bit event fields.
  • Page 4

    This is advantageous, since only a limited number of events can be captured simultaneously in the limited number of PMCs available. Example Configuration: As an example, consider the calculation of the L2 cache core miss rate. This metric requires the following performance monitor events:
  • Page 5: Data Collection

    CCB clock cycles. Collectively, these counters allow the capture of four core events, eight system events, and the CCB clock cycles simultaneously. Collecting data from various events simultaneously makes the captured events almost perfectly correlated, as they are collected under the ...
  • Page 6 Table 2. Events Necessary for Data Collection of Common Metrics

    Core Event    System Event
    CE:Ref:2      SE:C0
    CE:Com:12     SE:Ref:36
    CE:Com:17     SE:Ref:22
    CE:Com:68     SE:Ref:23
    CE:Com:9      SE:Ref:24
    CE:Com:10     SE:C1:54
    CE:Com:41     SE:C2:59
                  SE:C4:57
                  SE:C2
                  SE:C4
                  SE:C6
                  SE:C8
  • Page 7 Although this measurement is not as exact as CE:Ref:1, it is within plus/minus “Ratio” number of clocks. The deviation is minute in comparison to the number of clocks captured during a typical experiment.
  • Page 8 For example, SE:0 (CCB clock cycles) can be started by counter 1 (SE:1) and stopped by counter 2 (SE:2). It is possible to configure SE:1 to count an inbound packet accepted on SRIO and SE:2 to count ...
  • Page 9: Examples

    Utilizing the core and system performance monitors, it is possible to analyze cache performance and compare cache hit ratios with system architectural predictions. On the PowerQUICC III it is possible to tune the L2 cache to handle solely instructions or solely data, which has the potential to boost performance on certain applications as well.
  • Page 10 Example: DDR Performance. It may be desirable to determine the performance of the DDR controller and possibly optimize parameters. This example illustrates the impact of tweaking the BSTOPRE field.
  • Page 11 Figure 5. Pass1 - ECM Dispatch Source

    ECM dispatch from TSEC1    SE:C3:19
    ECM dispatch from TSEC2    SE:C4:21
    ECM dispatch from RIO      SE:C5:17
    ECM dispatch from PCI      SE:C6:17
    ECM dispatch from DMA      SE:C7:14
  • Page 12 LBC are counter-specific to C6. The code executed in these examples is previously known to read only from LBC SDRAM, so this step is not necessary for the purposes of this example.
  • Page 13: Data Presentation

    • Branch miss rate (lower bounds)
    • Overall DDR page hit rate (higher bounds)
    • Cycles reading LBC SDRAM (higher bounds)
    • Packets per second TSEC1 (lower bounds)
    • L2 non-core miss rate (higher bounds)
  • Page 14 Figure 8. Kiviat Graph for Unbalanced System (percentage scale; axes: Cycles Read/Wr DDR, Packets Per Second TSEC1, Cache Hit Ratio, Cycles Reading LBC SDRAM, Branch Miss Rate, Overall DDR Page Hit Rate)
  • Page 15: Revision History

    Table 3. Document Revision History

    Rev. Number    Date          Substantive Change(s)
                   03/2014       Added new Figure
                   08/06/2008    In Table 1, corrected formula of the L1 I-cache miss rate from CE:Com:68/CE:Ref:2 to CE:Com:60/CE:Ref:2. Fixed table formatting.
  • Page 16 The Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
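
The arithmetic referenced in the page summaries above can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch: the function names and the sample counts are hypothetical and do not come from the application note. Only three facts are taken from the text: the offset of 64 for counter-specific events in the 7-bit event field (page 3), the L1 I-cache miss rate formula CE:Com:60/CE:Ref:2 (page 15), and the use of CCB clock cycles counted between a start and a stop event (page 8). The 400 MHz CCB frequency in the usage lines is an assumed value, not one given in this document.

```python
EVENT_FIELD_BITS = 7          # event field width stated on page 3
COUNTER_SPECIFIC_OFFSET = 64  # offset for counter-specific events (page 3)


def event_field_value(event: int, counter_specific: bool) -> int:
    """Return the value to program into the 7-bit event field.

    Reference events (e.g. SE:Ref:36) are used as-is; counter-specific
    events (e.g. SE:C1:54) get the offset of 64 added.
    """
    if event < 0:
        raise ValueError("event number must be non-negative")
    value = event + COUNTER_SPECIFIC_OFFSET if counter_specific else event
    if value >= 1 << EVENT_FIELD_BITS:
        raise ValueError(f"value {value} does not fit in {EVENT_FIELD_BITS} bits")
    return value


def ratio_metric(numerator_count: int, denominator_count: int) -> float:
    """Divide one captured counter value by another, e.g. CE:Com:60 / CE:Ref:2."""
    if denominator_count <= 0:
        raise ValueError("denominator counter must be positive")
    return numerator_count / denominator_count


def cycles_to_seconds(ccb_cycles: int, ccb_hz: float) -> float:
    """Convert a CCB clock-cycle count into elapsed time in seconds."""
    if ccb_hz <= 0:
        raise ValueError("CCB frequency must be positive")
    return ccb_cycles / ccb_hz


if __name__ == "__main__":
    print(event_field_value(54, counter_specific=True))   # SE:C1:54
    print(event_field_value(36, counter_specific=False))  # SE:Ref:36
    # Hypothetical counter readings, for illustration only.
    print(ratio_metric(250, 1000))
    # Assumed 400 MHz CCB clock; substitute the real platform frequency.
    print(cycles_to_seconds(400, 400e6))
```

Keeping the offset in one helper avoids repeating the "add 64" step wherever a counter-specific event is configured, and the range check catches event numbers that would overflow the 7-bit field. The helpers only compute values; writing them into the control registers is left to the platform-specific code described in the reference manuals.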
