Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Guide March 2012 Reference Number: 327043-001...
On Parsing and Using Derived Events
1.6.1 On Common Terms found in Derived Events
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring
Uncore Per-Socket Performance Monitoring Control
2.1.1 Setting up a Monitoring Session
2.1.2...
PCU Performance Monitors
2.6.3.1 PCU Box Level PMON State
2.6.3.2 PCU PMON state - Counter/Control Pairs
2.6.3.3 Intel® PCU Extra Registers - Companions to PMON HW
2.6.4 PCU Performance Monitoring Events
2.6.4.1 An Overview
2.6.5 PCU Box Events Ordered By Code
2.6.6...
The uncore subsystem of the Intel® Xeon® processor E5-2600 product family is shown in Figure 1-1. The uncore subsystem also applies to the Intel® Xeon® processor E5-1600 product family in a single-socket platform. The uncore subsystem consists of a variety of components, ranging from the CBox caching agent to the power controller unit (PCU), integrated memory controller (iMC) and home agent (HA), to name a few.
• PMON registers within the Cbo units, PCU, and U-Box are accessed by MSR; see Table 1-2.
• PMON registers within the HA, iMC, Intel® QPI, R2PCIe and R3QPI units are accessed through PCI device configuration space; see Table 1-3.
Uncore PMON - Typical Control/Counter Logic
Following is a diagram of the standard perfmon counter block illustrating how event information is routed and stored within each counter and how its paired control register helps to select and filter the incoming information.
Additional control bits include:
Applying a Threshold to Incoming Events: .thresh - since most counters can increment by a value greater than 1, a threshold can be applied to generate an event based on the outcome of the comparison. If the .thresh is set to a non-zero value, that value is compared against the incoming count for that event in each cycle.
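The effect of .thresh can be illustrated with a small software model. This is only a sketch: the excerpt above stops before stating the outcome of the comparison, so the rule used here (count one for each cycle in which the incoming increment is greater than or equal to the threshold) is an assumption, and the function name `thresholded_count` is hypothetical.

```python
def thresholded_count(per_cycle_increments, thresh):
    """Model the value a counter accumulates over a run of cycles.

    per_cycle_increments: the raw event increment seen in each cycle.
    thresh: value of the .thresh field (0 means the threshold is disabled).
    """
    if thresh == 0:
        # Threshold disabled: accumulate the raw per-cycle increments.
        return sum(per_cycle_increments)
    # Assumed rule: count 1 for every cycle whose increment meets the threshold.
    return sum(1 for inc in per_cycle_increments if inc >= thresh)


cycles = [0, 3, 5, 1, 7, 2]
print(thresholded_count(cycles, 0))  # 18 (raw sum)
print(thresholded_count(cycles, 3))  # 3 (cycles with increment >= 3)
```

With a threshold set, the counter therefore reports a number of cycles rather than a number of events, which is what makes occupancy-style events usable for "cycles with at least N entries" measurements.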
• e.g., POWER_THROTTLE_CYCLES.RANKx / MC_Chy_PCI_PMON_CTR_FIXED
Requires more input to software to determine the specific event/subevent
• In some cases, there may be multiple events/subevents that cover the same information across multiple like hardware units. Rather than manufacturing a derived event for each combination, the derived event will use a lowercase variable in the event name.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring
Uncore Per-Socket Performance Monitoring Control
The uncore PMON does not support interrupt based sampling. To manage the large number of counter...
Reset counters in each box to ensure no stale values have been acquired from previous sessions.
• For each CBo, set Cn_MSR_PMON_BOX_CTL[1:0] to 0x2.
• For each Intel® QPI Port, set Q_Py_PCI_PMON_BOX_CTL[1:0] to 0x2.
• Set PCU_MSR_PMON_BOX_CTL[1:0] to 0x2.
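The reset step above amounts to a read-modify-write of bits [1:0] in each box control register. The sketch below shows only the bit manipulation; it does not perform any MSR or PCI configuration access. The helper name `with_counter_reset` is hypothetical, and preserving the other bits of the register is an assumption rather than something the text requires.

```python
RST_FIELD_MASK = 0x3  # bits [1:0] of a *_PMON_BOX_CTL register
RST_COUNTERS = 0x2    # value the text says to write into bits [1:0]


def with_counter_reset(box_ctl_value):
    """Return box_ctl_value with bits [1:0] replaced by 0x2.

    All other bits are left as they were (an assumption; the caller may
    prefer to write a fully specified value instead).
    """
    return (box_ctl_value & ~RST_FIELD_MASK) | RST_COUNTERS


print(hex(with_counter_reset(0x10101)))  # 0x10102
```

The returned value would then be written back through whatever register access path the platform provides for that box.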
• The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
• The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® QPI bus lock).
2.2.3.2 UBox PMON state - Counter/Control Pairs
The following table defines the layout of the UBox performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask).
Table 2-4. U_MSR_PMON_FIXED_CTL Register – Field Definitions
Field  Bits  Attr  Reset Val  Description
31:23  Reserved (?)
Enable counter when global enable is set.
21:20  Reserved. SW must write to 0 for proper operation.
2.3.2 CBo Performance Monitoring Overview
Each of the CBos in the uncore supports event monitoring through four 44-bit wide counters (Cn_MSR_PMON_CTR{3:0}). Event programming in the CBo is restricted such that each event can only be measured in certain counters within the CBo.
Note: Not all transactions can be associated with a specific thread. For example, when a snoop triggers a WB, it does not have an associated thread. Transactions that are associated with PCIe will come from “0x1E” (b11110).
Acronyms frequently used in CBo Events:
The Rings:
AD (Address) Ring - Core Read/Write Requests and Intel QPI Snoops. Carries Intel QPI requests and snoop responses from C to Intel® QPI.
BL (Block or Data) Ring - Data == 2 transfers for 1 cache line...
AK (Acknowledge) Ring - Acknowledges Intel® QPI to CBo and CBo to Core. Carries snoop responses from Core to CBo.
IV (Invalidate) Ring - CBo Snoop requests of core caches
Internal CBo Queues:
IRQ - Ingress Request Queue on AD Ring.
Table 2-14. Performance Monitor Events for CBO (Sheet 2 of 2)
Symbol Name  Event Code  Ctrs  Max Inc/Cyc  Description
RxR_ISMQ_RETRY  0x33  ISMQ Retries
LLC_LOOKUP  0x34  Cache Lookups
TOR_INSERTS  0x35  TOR Inserts
TOR_OCCUPANCY...
Table 2-15. Metrics Derived from CBO Events (Sheet 2 of 3)
Symbol Name: Equation  Definition
AVG_TOR_DRD_REM_MISS_LATENCY: (TOR_OCCUPANCY.MISS_OPCODE / TOR_INSERTS.MISS_OPCODE) with:Cn_MSR_PMON_BOX_FILTER.{opc,nid}={0x182,other_no...
Definition: Average Latency of Data Reads through the TOR that miss the LLC and were satisfied by a...
Table 2-15. Metrics Derived from CBO Events (Sheet 3 of 3)
Symbol Name: Equation  Definition
PCIE_DATA_BYTES: (TOR_INSERTS.OPCODE with:Cn_MSR_PMON_BOX_FILTER.opc=0x194 + TOR_INSERTS.OPCODE with:Cn_MSR_PMON_BOX_FILTER.opc=0x19C) * 64
Definition: Data from PCIe in Number of Bytes
RING_THRU_DNEVEN_BYTES: RING_BL_USED.DN_EVEN * 32...
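The two derived metrics in this part of Table 2-15 are simple scalings of raw counter values. The sketch below evaluates them from counts that are assumed to have been collected already with the opcode filter settings named in the table; the function and parameter names are hypothetical.

```python
CACHE_LINE_BYTES = 64         # multiplier used by PCIE_DATA_BYTES
BL_RING_BYTES_PER_COUNT = 32  # multiplier used by RING_THRU_DNEVEN_BYTES


def pcie_data_bytes(tor_inserts_opc_0x194, tor_inserts_opc_0x19c):
    """PCIE_DATA_BYTES: sum of the two filtered TOR_INSERTS.OPCODE counts, times 64."""
    return (tor_inserts_opc_0x194 + tor_inserts_opc_0x19c) * CACHE_LINE_BYTES


def ring_thru_dneven_bytes(ring_bl_used_dn_even):
    """RING_THRU_DNEVEN_BYTES: RING_BL_USED.DN_EVEN count times 32."""
    return ring_bl_used_dn_even * BL_RING_BYTES_PER_COUNT


print(pcie_data_bytes(1000, 500))   # 96000
print(ring_thru_dneven_bytes(10))   # 320
```

Because the two TOR_INSERTS.OPCODE counts need different filter values, they would typically come from separate counters or separate runs.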
Table 2-17. Unit Masks for LLC_VICTIMS (Sheet 2 of 2)
Extension  umask [15:8]  Filter Dep  Description
S_STATE  bxxxxx1xx  Lines in S State
MISS  bxxxx1xxx
bx1xxxxxx  CBoFilter[1  Victimized Lines that Match NID:...
Table 2-19. Unit Masks for RING_AD_USED
Extension  umask [15:8]  Description
UP_EVEN  bxxxxxxx1  Up and Even: Filters for the Up and Even ring polarity.
UP_ODD  bxxxxxx1x  Up and Odd: Filters for the Up and Odd ring polarity.
Table 2-21. Unit Masks for RING_BL_USED
Extension  umask [15:8]  Description
UP_EVEN  bxxxxxxx1  Up and Even: Filters for the Up and Even ring polarity.
UP_ODD  bxxxxxx1x  Up and Odd: Filters for the Up and Odd ring polarity.
RxR_EXT_STARVED
• Title: Ingress Arbiter Blocking Cycles
• Category: INGRESS Events
• Event Code: 0x12
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Counts cycles in external starvation. This occurs when one of the ingress queues is being starved by the other queues.
Table 2-26. Unit Masks for RxR_IPQ_RETRY
Extension  umask [15:8]  Description
bxxxxxxx1  Any Reject: Counts the number of times that a request from the IPQ was retried because of a TOR reject. TOR rejects from the IPQ can be caused by the Egress being full or Address Conflicts.
QPI_CREDITS  bxxx1xxxx  No Intel® QPI Credits: Number of requests rejected because of a lack of Intel® QPI Ingress credits. These credits are required in order to send transactions to the Intel® QPI agent. Please see the QPI_IGR_CREDITS events for more information.
RxR_OCCUPANCY
• Title: Ingress Occupancy
• Category: INGRESS Events
• Event Code: 0x11
• Max. Inc/Cyc: 20, Register Restrictions: 0
• Definition: Counts number of entries in the specified Ingress queue in each cycle.
Table 2-30. Unit Masks for TOR_INSERTS (Sheet 2 of 2)
Extension  umask [15:8]  Filter Dep  Description
MISS_OPCODE  b00000011  CBoFilter[31:23]  Miss Opcode Match: Miss transactions inserted into the TOR that match an opcode.
Table 2-31. Unit Masks for TOR_OCCUPANCY (Sheet 2 of 2)
Extension  umask [15:8]  Filter Dep  Description
b00001000  Any: All valid TOR entries. This includes requests that reside in the TOR for a short time, such as LLC Hits...
In other words, the view of data must be the same across all coherency agents regardless of who is reading or modifying the data. On Intel® QPI, the home agent is responsible for tracking all requests to a given address and ensuring that the results are consistent.
The Home Agent supports Intel® QPI’s home snoop protocol by initiating snoops on behalf of requests. Closely tied to the directory feature, the home agent has the ability to issue snoops to the peer caching agents for requests based on the directory information.
In the case of the HA, the HA_PCI_PMON_BOX_CTL register governs what happens when a freeze signal is received (.frz_en). It also provides the ability to manually freeze the counters in the box (.frz).
Table 2-35. HA_PCI_PMON_CTL{3-0} Register – Field Definitions (Sheet 2 of 2)
Field  Bits  Attr  Reset  Description
17:16  Reserved. SW must write to 0 else behavior is undefined.
umask  15:8  RW-V  Select subevents to be counted within the selected event.
Table 2-39. HA_PCI_PMON_BOX_ADDRMATCH0 Register – Field Definitions
Field  Bits  Attr  Reset  Description
lo_addr  31:6  Match to this System Address - Least Significant 26b of cache aligned address [31:6]
Reserved (?)
Note: The address comparison always ignores the lower 12 bits of the physical address, even if the system is interleaving between sockets at the cache-line level.
2.4.5 HA Box Events Ordered By Code
The following table summarizes the directly measured HA Box events.
Table 2-40. Performance Monitor Events for HA
Symbol Name  Event Code  Ctrs  Max Inc/Cyc  Description...
Ubox because of enable/freeze delays. The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the Intel® QPI Agent.
Table 2-43. Unit Masks for CONFLICT_CYCLES
Extension  umask [15:8]  Description
NO_CONFLICT  bxxxxxxx1  No Conflict: Counts the number of cycles that we are NOT handling conflicts.
CONFLICT  bxxxxxx1x  Conflict Detected: Counts the number of cycles that we are handling conflicts.
• Definition: Counts the number of cycles when the HA does not have credits to send messages to the Intel® QPI Agent. This can be filtered by the different credit pools and the different links.
Table 2-46. Unit Masks for IGR_NO_CREDIT_CYCLES...
Table 2-47. Unit Masks for IMC_WRITES
Extension  umask [15:8]  Description
FULL  bxxxxxxx1  Full Line Non-ISOCH
PARTIAL  bxxxxxx1x  Partial Non-ISOCH
FULL_ISOCH  bxxxxx1xx  ISOCH Full Line
PARTIAL_ISOCH  bxxxx1xxx  ISOCH Partial
b00001111  All Writes
REQUESTS
•...
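The unit masks in Table 2-47 are one-hot bits, so several subevents can be counted together by OR-ing their bits; the "All Writes" encoding b00001111 is simply all four combined. The sketch below builds the umask and places it in bits [15:8] of a control register value, as the HA_PCI_PMON_CTL field table describes. The helper names are hypothetical, and the event select field is deliberately left out because its value for IMC_WRITES is not given in this excerpt.

```python
# One-hot subevent bits for IMC_WRITES, taken from Table 2-47.
IMC_WRITES_UMASK = {
    "FULL": 0b0001,
    "PARTIAL": 0b0010,
    "FULL_ISOCH": 0b0100,
    "PARTIAL_ISOCH": 0b1000,
}
UMASK_SHIFT = 8  # the umask field occupies bits [15:8] of the control register


def combine_umasks(*names):
    """OR together the unit-mask bits for the named subevents."""
    umask = 0
    for name in names:
        umask |= IMC_WRITES_UMASK[name]
    return umask


def umask_field(umask):
    """Position an 8-bit umask in bits [15:8]; other fields are not set here."""
    return (umask & 0xFF) << UMASK_SHIFT


non_isoch = combine_umasks("FULL", "PARTIAL")
print(bin(non_isoch))               # 0b11
print(hex(umask_field(non_isoch)))  # 0x300
```

Combining all four names reproduces the b00001111 "All Writes" mask from the table.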
TAD_REQUESTS_G0
• Title: HA Requests to a TAD Region - Group 0
• Category: TAD Events
• Event Code: 0x1B
• Max. Inc/Cyc: 2, Register Restrictions: 0-3
• Definition: Counts the number of HA requests to a given TAD region. There are up to 11 TAD (target address decode) regions in each home agent.
Table 2-51. Unit Masks for TAD_REQUESTS_G1 (Sheet 2 of 2)
Extension  umask [15:8]  Description
REGION10  bxxxxx1xx  TAD Region 10: Filters request made to TAD Region 10
REGION11  bxxxx1xxx  TAD Region 11:...
Table 2-54. Unit Masks for TxR_AD_CYCLES_FULL
Extension  umask [15:8]  Description
SCHED0  bxxxxxxx1  Scheduler 0: Filter for cycles full from scheduler bank 0
SCHED1  bxxxxxx1x  Scheduler 1: Filter for cycles full from scheduler bank 1...
Filter for data being sent directly to the requesting core.
DRS_QPI  bxxxxx1xx  Data to Intel® QPI: Filter for data being sent to a remote socket over Intel® QPI.
TxR_BL_CYCLES_FULL
• Title: BL Egress Full
• Category: BL_EGRESS Events
• Event Code: 0x36
•...
The iMC supports event monitoring through four 48-bit wide counters (MC_CHy_PCI_PMON_CTR{3:0}) and one fixed counter (MC_CHy_PCI_PMON_FIXED_CTR) for each DRAM channel (of which there are 4 in Intel Xeon Processor E5-2600 family) the MC is attached to. Each of these counters can be programmed (MC_CHy_PCI_PMON_CTL{3:0}) to capture any MC event.
Table 2-60. MC_CHy_PCI_PMON_BOX_CTL Register – Field Definitions (Sheet 2 of 2)
Field  Bits  Attr  Reset  Description
15:9  Reserved (?)
Freeze. If set to 1 and the .frz_en is 1, the counters in this box will be frozen.
All MC performance monitor data registers are 48-bit wide. Should a counter overflow (a carry out from bit 47), the counter will wrap and continue to collect events. If accessible, software can continuously read the data registers without disabling event collection.
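Because the counters wrap rather than saturate, software that polls them can recover the number of events between two reads with modular arithmetic. The following is a minimal sketch, assuming the counter wraps at most once between consecutive reads; the function name `counter_delta` is hypothetical.

```python
COUNTER_BITS = 48  # width of the MC performance monitor data registers


def counter_delta(previous, current, bits=COUNTER_BITS):
    """Events counted between two reads of a free-running, wrapping counter.

    Correct as long as the counter wrapped at most once between the reads.
    """
    return (current - previous) % (1 << bits)


print(counter_delta(100, 250))            # 150 (no wrap)
print(counter_delta((1 << 48) - 10, 5))   # 15 (counter wrapped past bit 47)
```

The polling interval should be chosen so that a second wrap cannot occur between reads, since that case is indistinguishable from a single wrap.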
RPQ - Read Pending Queue. NOTE: HA also tracks some information related to the iMC’s RPQ.
WPQ - Write Pending Queue. NOTE: HA also tracks some information related to the iMC’s WPQ.
Table 2-65. Metrics Derived from iMC Events
Symbol Name: Equation  Definition
MEM_BW_READS: (CAS_COUNT.RD * 64)  Memory bandwidth consumed by reads. Expressed in bytes.
MEM_BW_TOTAL: MEM_BW_READS + MEM_BW_WRITES  Total memory bandwidth. Expressed in bytes.
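The metrics in Table 2-65 can be evaluated directly from raw CAS counts. In the sketch below, MEM_BW_WRITES is assumed to follow the same pattern as MEM_BW_READS (a write CAS count times 64); that row is not shown in this excerpt, so treat it as an assumption. The function names are hypothetical.

```python
BYTES_PER_CAS = 64  # multiplier used by MEM_BW_READS in Table 2-65


def mem_bw_reads(cas_count_rd):
    """MEM_BW_READS: bytes read, from the CAS_COUNT.RD counter value."""
    return cas_count_rd * BYTES_PER_CAS


def mem_bw_writes(cas_count_wr):
    """Assumed analogue of MEM_BW_READS for writes (not shown in the excerpt)."""
    return cas_count_wr * BYTES_PER_CAS


def mem_bw_total(cas_count_rd, cas_count_wr):
    """MEM_BW_TOTAL: MEM_BW_READS + MEM_BW_WRITES, in bytes."""
    return mem_bw_reads(cas_count_rd) + mem_bw_writes(cas_count_wr)


print(mem_bw_reads(1_000_000))           # 64000000
print(mem_bw_total(1_000_000, 250_000))  # 80000000
```

These values are per channel; a socket-level figure would sum the results over every channel that was monitored.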
• Definition: Counts the number of DRAM Activate commands sent on this channel. Activate commands are issued to open up a page on the DRAM devices so that it can be read or written to with a CAS.
POWER_CHANNEL_PPD
• Title: Channel PPD Cycles
• Category: POWER Events
• Event Code: 0x85
• Max. Inc/Cyc: 4, Register Restrictions: 0-3
• Definition: Number of cycles when all the ranks in the channel are in PPD mode. If IBT=off is enabled, then this can be used to count those cycles.
• Definition: Counts the number of cycles when the iMC is in self-refresh and the iMC still has a clock. This happens in some package C-states. For example, the PCU may ask the iMC to enter self-refresh even though some of the cores are still processing.
PRE_COUNT
• Title: DRAM Precharge commands.
• Category: PRE Events
• Event Code: 0x02
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of DRAM Precharge commands sent on this channel.
being sent from the HA to the iMC. They deallocate after the CAS command has been issued to memory. This includes both ISOCH and non-ISOCH requests.
RPQ_OCCUPANCY
• Title: Read Pending Queue Occupancy
•...
WPQ_OCCUPANCY
• Title: Write Pending Queue Occupancy
• Category: WPQ Events
• Event Code: 0x81
• Max. Inc/Cyc: 32, Register Restrictions: 0-3
• Definition: Accumulates the occupancies of the Write Pending Queue each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations).
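The two ratios named in the WPQ_OCCUPANCY definition can be written out as follows. This is a sketch with hypothetical function names; the three inputs are assumed to be counter values collected over the same measurement interval, and the zero-denominator handling is a convenience choice, not something the text specifies.

```python
def average_occupancy(occupancy_sum, cycles_not_empty):
    """Average entries in the queue over the cycles in which it held anything."""
    if cycles_not_empty == 0:
        return 0.0  # queue was never occupied during the interval
    return occupancy_sum / cycles_not_empty


def average_latency_cycles(occupancy_sum, allocations):
    """Average time an entry spends in the queue, in cycles of the counting clock."""
    if allocations == 0:
        return 0.0  # nothing was allocated during the interval
    return occupancy_sum / allocations


print(average_occupancy(12_000, 3_000))     # 4.0
print(average_latency_cycles(12_000, 600))  # 20.0
```

The same pattern applies to any occupancy accumulator paired with a "cycles not empty" or "inserts" event.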
Note: Many power saving features are tracked as events in their respective units. For example, Intel® QPI Link Power saving states and Memory CKE statistics are captured in the Intel® QPI Perfmon and iMC Perfmon respectively.
2.6.2 PCU Performance Monitoring Overview
The uncore PCU supports event monitoring through four 48-bit wide counters (PCU_MSR_PMON_CTR{3:0}).
2.6.3.2 PCU PMON state - Counter/Control Pairs
The following table defines the layout of the PCU performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask).
• For frequency/voltage band filters, the multiplier is at 100MHz granularity. So, a value of 32 (0x20) would represent a frequency of 3.2GHz.
• Support for limited Frequency/Voltage Band histogramming. Each of the four bands provided for...
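The 100MHz granularity of the band filter values translates into a one-line conversion in each direction. The sketch below uses hypothetical helper names; rounding down when converting a frequency to a filter value is an assumption made for illustration.

```python
BAND_GRANULARITY_MHZ = 100  # one filter step corresponds to 100MHz


def band_value_to_mhz(filter_value):
    """Frequency in MHz represented by a band filter value."""
    return filter_value * BAND_GRANULARITY_MHZ


def mhz_to_band_value(frequency_mhz):
    """Band filter value for a frequency in MHz (rounded down to a 100MHz step)."""
    return frequency_mhz // BAND_GRANULARITY_MHZ


print(band_value_to_mhz(0x20))   # 3200, i.e. 3.2GHz
print(mhz_to_band_value(2650))   # 26
```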
• Core State Transitions - there are a larger number of events provided to track when cores transition C-state, when they enter/exit specific C-states, when they receive a C-state demotion, etc.
2.6.5 PCU Box Events Ordered By Code
The following table summarizes the directly measured PCU Box events.
Table 2-81. Performance Monitor Events for PCU (Sheet 1 of 2)
Symbol Name  Event Code  Extra Select Bit...
Table 2-81. Performance Monitor Events for PCU (Sheet 2 of 2)
Symbol Name  Event Code  Extra Select Bit  Ctrs  Max Inc/Cyc  Description
CORE7_TRANSITION_CYCLES  0x0A  Core C State Transition Cycles
TOTAL_TRANSITION_CYCLES  0x0B  Total Core C State Transition Cycles
2.6.6...
CORE1_TRANSITION_CYCLES
• Title: Core C State Transition Cycles
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x04
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of cycles spent performing core C state transitions. There is one event per core.
CORE6_TRANSITION_CYCLES
• Title: Core C State Transition Cycles
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x09
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of cycles spent performing core C state transitions. There is one event per core.
DEMOTIONS_CORE4
• Title: Core C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x22
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[7:0]
• Definition: Counts the number of times when a configurable core had a C-state demotion
DEMOTIONS_CORE5
•...
FREQ_BAND1_CYCLES
• Title: Frequency Residency
• Category: FREQ_RESIDENCY Events
• Event Code: 0x0C
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[15:8]
• Definition: Counts the number of cycles that the uncore was running at a frequency greater than or equal to the frequency that is configured in the filter.
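Since each frequency band event counts cycles at or above its configured threshold, a residency histogram can be derived by subtracting adjacent bands. The sketch below assumes that several band counters were collected over the same interval with distinct thresholds; the function name `band_histogram` and the sample numbers are hypothetical.

```python
def band_histogram(thresholds_mhz, cycles_at_or_above):
    """Turn cumulative ">= threshold" cycle counts into per-bucket cycle counts.

    Returns a list of (lower_mhz, upper_mhz, cycles) tuples, where upper_mhz is
    None for the open-ended top bucket.
    """
    pairs = sorted(zip(thresholds_mhz, cycles_at_or_above))
    buckets = []
    for i, (lower, cycles) in enumerate(pairs):
        if i + 1 < len(pairs):
            upper, cycles_above_next = pairs[i + 1]
        else:
            upper, cycles_above_next = None, 0
        # Cycles in [lower, upper) = cycles >= lower minus cycles >= upper.
        buckets.append((lower, upper, cycles - cycles_above_next))
    return buckets


for bucket in band_histogram([1200, 2000, 2600, 3200], [1000, 700, 300, 50]):
    print(bucket)
```

With the sample values, 300 cycles fall between 1200MHz and 2000MHz, 400 between 2000MHz and 2600MHz, 250 between 2600MHz and 3200MHz, and 50 at or above 3200MHz.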
FREQ_MAX_CURRENT_CYCLES
• Title: Current Strongest Upper Limit Cycles
• Category: FREQ_MAX_LIMIT Events
• Event Code: 0x07
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles when current is the upper limit on frequency.
FREQ_MIN_PERF_P_CYCLES
• Title: Perf P Limit Strongest Lower Limit Cycles
• Category: FREQ_MIN_LIMIT Events
• Event Code: 0x02
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles when Perf P Limit is preventing us from dropping the frequency lower.
PROCHOT_EXTERNAL_CYCLES
• Title: External Prochot
• Category: PROCHOT Events
• Event Code: 0x0A
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles that we are in external PROCHOT mode. This mode is triggered when a sensor off the die determines that something off-die (like DRAM) is too hot and must throttle to avoid damaging the chip.
The Intel® QPI Link Layer is responsible for packetizing requests from the caching agent on the way out to the system interface. As such, it shares responsibility with the CBo(s) as the Intel QPI caching agent(s). It is responsible for converting CBo requests to Intel QPI messages (i.e. snoop generation and data response messages from the snoop response) as well as converting/forwarding ring messages to Intel QPI packets and vice versa.
QPI Rate Status
2.7.3.1 Intel® QPI Box Level PMON State
The following registers represent the state governing all box-level PMUs in each Port of the Intel® QPI Box. In the case of the Intel® QPI Ports, the Q_Py_PCI_PMON_BOX_CTL register governs what happens when a freeze signal is received (.frz_en).
2.7.3.2 Intel® QPI PMON state - Counter/Control Pairs
The following table defines the layout of the Intel® QPI performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask, .ev_sel_ext).
Intel® QPI Registers for Packet Mask/Match Facility
In addition to generic event counting, each port of the Intel® QPI Link Layer provides two pairs of MATCH/MASK registers that allow a user to filter packet traffic serviced (crossing from an input port to an output port) by the Intel®...
Table 2-88. Q_Py_PCI_PMON_PKT_MATCH1 Registers
Field  Bits  Reset  Description
31:20  Reserved; Must write to 0 else behavior is undefined.
19:16  Response Data State (valid when MC == DRS and Opcode == 0x0-2).
2.7.3.3.1 Events Derived from Packet Filters
Following is a selection of common events that may be derived by using the Intel® QPI packet matching facility. The Match/Mask columns correspond to the Match0/Mask0 registers. For the cases where additional fields need to be specified, they will be noted.
Table 2-92. Message Events Derived from the Match/Mask filters (Sheet 1 of 2)
Field  Match [12:0]  Mask [12:0]  Description
DRS.AnyDataC  0x1C00  0x1F80  Any Data Response message containing a cache line in response to a core request.
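A Match/Mask pair such as the DRS.AnyDataC row above can be read as "compare only the bits selected by the mask". The sketch below models that comparison in software. The comparison rule (a packet qualifies when its masked bits equal the masked match value) is the conventional meaning of a match/mask filter and is stated here as an assumption; the function name and the sample header values are hypothetical.

```python
FIELD_12_0 = 0x1FFF  # the table gives Match and Mask values for bits [12:0]

# Match and Mask values from the DRS.AnyDataC row of Table 2-92.
DRS_ANY_DATA_C_MATCH = 0x1C00
DRS_ANY_DATA_C_MASK = 0x1F80


def packet_matches(header_bits, match, mask):
    """True if header_bits equals match on every bit position set in mask."""
    selected = mask & FIELD_12_0
    return (header_bits & selected) == (match & selected)


print(packet_matches(0x1C00, DRS_ANY_DATA_C_MATCH, DRS_ANY_DATA_C_MASK))  # True
print(packet_matches(0x1C7F, DRS_ANY_DATA_C_MATCH, DRS_ANY_DATA_C_MASK))  # True
print(packet_matches(0x1800, DRS_ANY_DATA_C_MATCH, DRS_ANY_DATA_C_MASK))  # False
```

The second call shows that bits cleared in the mask (here the low seven bits) do not affect the outcome.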
11 flits in length.
2.7.3.4 Intel® QPI Extra Registers - Companions to PMON HW
The uncore’s Intel® QPI box includes an extra MSR that provides the current Intel® QPI transfer rate.
Table 2-93. QPI_RATE_STATUS Register – Field Definitions
Field...
TxL (aka EGR) - “Transmit to Link” referring to Egress (requests headed for the Ring) queues.
2.7.5 Intel® QPI LL Box Events Ordered By Code
The following table summarizes the directly measured Intel QPI LL Box events.
Table 2-94. Performance Monitor Events for Intel® QPI LL (Sheet 1 of 2)
Symbol Name  Event Code  Extra Select Bit...
Table 2-94. Performance Monitor Events for Intel® QPI LL (Sheet 2 of 2)
Symbol Name  Event Code  Extra Select Bit  Ctrs  Max Inc/Cyc  Description
VNA_CREDIT_RETURN_OCCUPANCY  0x1B  VNA Credits Pending Return - Occupancy
VNA_CREDIT_RETURNS...
Table 2-95. Metrics Derived from Intel QPI LL Events (Sheet 2 of 3)
Symbol Name: Equation  Definition
DRS_F_OR_E_FROM_QPI: ((CTO_COUNT with:{Q_Py_PCI_PMON_PKT_MATCH0[12:0]=0x1C00,...
Definition: DRS response in F or E states received from QPI in bytes. To calculate the total data...
• Definition: Counts the number of clocks in the Intel® QPI LL. This clock runs at 1/8th the "GT/s" speed of the Intel® QPI link. For example, an 8GT/s link will have a qfclk of 1GHz. JKT does not support dynamic link speeds, so this frequency is fixed.
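The 1/8 relationship between link speed and qfclk makes it straightforward to convert a qfclk cycle count into elapsed time. The sketch below uses hypothetical helper names and takes the link speed as an input supplied by the caller.

```python
QFCLK_DIVISOR = 8  # qfclk runs at 1/8th of the link's GT/s rate


def qfclk_ghz(link_gt_per_s):
    """qfclk frequency in GHz for a link running at link_gt_per_s GT/s."""
    return link_gt_per_s / QFCLK_DIVISOR


def qfclk_cycles_to_seconds(cycles, link_gt_per_s):
    """Elapsed time represented by a number of qfclk cycles."""
    return cycles / (qfclk_ghz(link_gt_per_s) * 1e9)


print(qfclk_ghz(8.0))                               # 1.0
print(qfclk_cycles_to_seconds(2_000_000_000, 8.0))  # 2.0
```

Because the link speed is fixed, the conversion factor can be computed once per measurement session.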
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of Intel® QPI qfclk cycles spent in L0 power mode in the Link Layer. L0 is the default mode which provides the highest performance with the most power. Use edge detect to count the number of instances that the link entered L0.
RxL_BYPASSED
• Title: Rx Flit Buffer Bypassed
• Category: RXQ Events
• Event Code: 0x09
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of times that an incoming flit was able to bypass the flit buffer and pass directly across the BGF and into the Egress.
Note that this is not the same as "data" bandwidth. For example, when we are transferring a 64B cacheline across Intel® QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual "data" and an additional 16 bits of other information.
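The flit breakdown above explains why flit-based bandwidth overstates "data" bandwidth. The sketch below works through the arithmetic for whole cache lines. It assumes the header flit is the same size as a data flit (64 + 16 = 80 bits); the text only gives the composition of the data flits, so that is labeled as an assumption, and the function names are hypothetical.

```python
FLITS_PER_CACHE_LINE = 9  # 1 header flit + 8 data flits
DATA_FLITS_PER_LINE = 8
DATA_BITS_PER_FLIT = 64   # actual "data" carried by each data flit
OTHER_BITS_PER_FLIT = 16  # additional non-data bits in each data flit
FLIT_BITS = DATA_BITS_PER_FLIT + OTHER_BITS_PER_FLIT  # assumed for the header flit too


def cache_lines_to_flits(cache_lines):
    """Flits needed to transfer the given number of 64B cache lines."""
    return cache_lines * FLITS_PER_CACHE_LINE


def data_fraction_per_line():
    """Fraction of transferred bits that are cache-line data."""
    data_bits = DATA_FLITS_PER_LINE * DATA_BITS_PER_FLIT  # 512 bits = 64 bytes
    total_bits = FLITS_PER_CACHE_LINE * FLIT_BITS         # 720 bits
    return data_bits / total_bits


print(cache_lines_to_flits(4))              # 36
print(round(data_fraction_per_line(), 3))   # 0.711
```

Under these assumptions roughly 71% of the bits moved for a cache-line transfer are payload, so a flit count should be scaled accordingly before being reported as data bandwidth.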
"data" bandwidth. For example, when we are transferring a 64B cacheline across Intel® QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual "data"...
• Definition: Number of allocations into the Intel® QPI Rx Flit Buffer. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface. If things back up getting transmitted onto the ring, however, it may need to allocate into this buffer, thus increasing the latency.
• Definition: Number of allocations into the Intel® QPI Rx Flit Buffer. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface. If things back up getting transmitted onto the ring, however, it may need to allocate into this buffer, thus increasing the latency.
• Definition: Number of allocations into the Intel® QPI Rx Flit Buffer. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface. If things back up getting transmitted onto the ring, however, it may need to allocate into this buffer, thus increasing the latency.
• Definition: Accumulates the number of elements in the Intel® QPI RxQ in each cycle. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface. If things back up getting transmitted onto the ring, however, it may need to allocate into this buffer, thus increasing the latency.
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of Intel® QPI qfclk cycles spent in L0 power mode in the Link Layer. L0 is the default mode which provides the highest performance with the most power. Use edge detect to count the number of instances that the link entered L0.
"data" bandwidth. For example, when we are transferring a 64B cacheline across Intel® QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual "data"...
"data" bandwidth. For example, when we are transferring a 64B cacheline across Intel® QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual "data"...
• Definition: Accumulates the number of flits in the TxQ. Generally, when data is transmitted across Intel® QPI, it will bypass the TxQ and pass directly to the link. However, the TxQ will be used with L0p and when LLR occurs, increasing latency to transfer out to the link. This can be used with the cycles not empty event to track average occupancy, or the allocations event to track average lifetime in the TxQ.
2.8.3.2 R2PCIe PMON state - Counter/Control Pairs
The following table defines the layout of the R2PCIe performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask).
2.8.4 R2PCIe Performance Monitoring Events
2.8.4.1 An Overview
R2PCIe provides events to track information related to all the traffic passing through its boundaries.
• IIO credit tracking - credits rejected, acquired and used all broken down by message Class.
Table 2-109. Metrics Derived from R2PCIe Events (Sheet 2 of 2)
Symbol Name: Equation  Definition
IIO_RDS_TO_RING_IN_BYTES: TxR_INSERTS.BL * 32  IIO Reads, data transmitted to Ring in Bytes
RING_THRU_DNEVEN_BYTES: RING_BL_USED.CCW_EVEN * 32...
RING_AK_USED
• Title: R2 AK Ring in Use
• Category: RING Events
• Event Code: 0x08
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles that the AK ring is being used at this ring stop. This includes when packets are passing by and when packets are being sunk, but does not include when packets are being sent from the ring stop.
Table 2-113. Unit Masks for RING_IV_USED
Extension  umask [15:8]  Description
b00001111  Any: Filters any polarity
RxR_AK_BOUNCES
• Title: AK Ingress Bounced
• Category: INGRESS Events
• Event Code: 0x12
• Max. Inc/Cyc: 1, Register Restrictions: 0
•...
TxR_CYCLES_NE
• Title: Egress Cycles Not Empty
• Category: EGRESS Events
• Event Code: 0x23
• Max. Inc/Cyc: 1, Register Restrictions: 0
• Definition: Counts the number of cycles when the R2PCIe Egress is not empty. This tracks one of the three rings that are used by the R2PCIe agent.
The R3QPI agent provides several functions: • Interface between Ring and Intel® QPI: One of the primary attributes of the ring is its ability to convey Intel® QPI semantics with no translation. For example, this architecture enables initiators to communicate with a local Home agent in exactly the same way as a remote Home agent on another socket.
2.9.3.2 R3QPI PMON state - Counter/Control Pairs
The following table defines the layout of the R3QPI performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask).
2.9.4 R3QPI Performance Monitoring Events
2.9.4.1 An Overview
R3QPI provides events to track information related to all the traffic passing through its boundaries.
• VN/IIO credit tracking - in addition to tracking the occupancy of the full VNA queue, R3QPI provides a great deal of additional information: credits rejected, acquired and used, often broken down by Message Class.
• Definition: Counts the number of times that a request attempted to acquire an NCS/NCB/DRS credit in the Intel® QPI for sending messages on BL to the IIO but was rejected because no credit was available. There is one credit for each of these three message classes (three credits total).
Page 124
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Counts the number of cycles when the NCS/NCB/DRS credit is in use in the Intel® QPI for sending messages on BL to the IIO. There is one credit for each of these three message classes (three credits total).
Page 125
RING_AK_USED
• Title: R3 AK Ring in Use
• Category: RING Events
• Event Code: 0x08
• Max. Inc/Cyc: 1, Register Restrictions: 0-2
• Definition: Counts the number of cycles that the AK ring is being used at this ring stop. This includes when packets are passing by and when packets are being sent, but does not include when packets are being sunk into the ring stop.
Page 126
• Definition: Counts the number of cycles when the Intel® QPI Ingress is not empty. This tracks one of the three rings that are used by the Intel® QPI agent. This can be used in conjunction with the Intel® QPI Ingress Occupancy Accumulator event in order to calculate average queue occupancy.
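One way to read the occupancy calculation described above is to divide the Ingress Occupancy Accumulator count by the cycles-not-empty count, giving the mean number of entries held while the queue was in use. The sketch below assumes that interpretation; the function name and the sample counter values are hypothetical.

```python
def average_occupancy_while_busy(occupancy_sum: int, cycles_not_empty: int) -> float:
    """Mean queue entries per cycle, averaged over cycles the queue was not empty."""
    if cycles_not_empty == 0:
        # The queue was never occupied during the sample window.
        return 0.0
    return occupancy_sum / cycles_not_empty


# Hypothetical readings: accumulator = 6000, cycles not empty = 1500.
print(average_occupancy_while_busy(6000, 1500))  # 4.0
```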
Page 127
• Definition: Counts the number of allocations into the Intel® QPI Ingress. This tracks one of the three rings that are used by the Intel® QPI agent. This can be used in conjunction with the Intel® QPI Ingress Occupancy Accumulator event in order to calculate average queue latency. Multiple ingress buffers can be tracked at a given time using multiple counters.
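The latency calculation mentioned above can be sketched as the accumulated occupancy divided by the number of allocations, which yields the average number of cycles an entry spent in the queue. This is an assumed reading of the event pairing; the function name and sample values are hypothetical.

```python
def average_queue_latency_cycles(occupancy_sum: int, inserts: int) -> float:
    """Average cycles each allocated entry spent in the Ingress queue."""
    if inserts == 0:
        # No allocations were observed, so no latency can be computed.
        return 0.0
    return occupancy_sum / inserts


# Hypothetical readings: accumulator = 6000, allocations = 300.
print(average_queue_latency_cycles(6000, 300))  # 20.0
```

The result is in units of the clock that drives the accumulator, so converting to time requires that clock's frequency.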
Page 128
• Definition: Counts the number of cycles when the Intel® QPI Egress is not empty. This tracks one of the three rings that are used by the Intel® QPI agent. This can be used in conjunction with the Intel® QPI Egress Occupancy Accumulator event in order to calculate average queue occupancy.
Page 129
• Definition: Number of times a VN0 credit was used on the DRS message channel. In order for a request to be transferred across Intel® QPI, it must be guaranteed to have a flit buffer on the remote socket to sink into. There are two credit pools, VNA and VN0. VNA is a shared pool used to achieve high performance.
Page 130
VNA credits were not fully used up. The VNA pool is generally used to provide the bulk of the Intel® QPI bandwidth (as opposed to the VN0 pool which is used to guarantee forward progress).
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Number of Intel® QPI uclk cycles with one or more VNA credits in use. This event can be used in conjunction with the VNA In-Use Accumulator to calculate the average number of used VNA credits.
Page 132
Table 2-143. Opcode Match by Message Class (Sheet 2 of 2)
HOM0 HOM1
1101 WbMtoE RspIWb
1110 WbMtoS RspSWb
1111 AckCnflt PrefetchHint
0000 Gnt_Cmp NcWr NcRd
0001 Gnt_FrcAckCnflt WcWr IntAck
0010...
Page 133
Table 2-144. Opcodes (Alphabetical Listing) (Sheet 2 of 4)
Name Desc
DataC_(FEIMS)_Cmp 0010 Data Response in (FEIMS) state, Complete
NOTE: Set RDS field to specify which state is to be measured.
Page 134
Table 2-144. Opcodes (Alphabetical Listing) (Sheet 3 of 4)
Name Desc
PrefetchHint 1111 Snoop Prefetch Hint
RdCode 0001 HOM0 Read cache line in F (or S, if the F state not supported)
Page 135
Table 2-144. Opcodes (Alphabetical Listing) (Sheet 4 of 4)
Name Desc
WbMtoI 1100 HOM0 Write a cache line in M state back to memory and transition its state to I.
WbMtoE...