Summary of Contents for Intel BX80571E7500 - Core 2 Duo 2.93 GHz Processor
Intel® Xeon® Processor 7500 Series Uncore Programming Guide Reference Number: 323535-001 March 2010...
Intel processor. Intel reserves these features or instructions for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from their unauthorized use.
Figure 1-1 provides an Intel Xeon Processor 7500 Series block diagram.
Figure 1-1. Intel Xeon Processor 7500 Series Block Diagram
Uncore PMU Overview
The processor uncore performance monitoring is supported by PMUs local to each of the C, S, B, M, R, U, and W-Boxes.
Intel® Xeon® Processor 7500 Series Uncore Programming Guide: Introduction
The general performance monitoring capabilities in each box are outlined in the following table. Table 1-1. Per-Box Performance Monitoring Capabilities (columns: # Boxes, # Counters, Generic Counters?, Packet Match/Mask Filters?, Bit Width)
S-Box and a final summary register in the U-Box. The Intel® Xeon® Processor 7500 Series uncore performance monitors may be configured to respond to this overflow with two basic actions: 2.1.1.1...
Frozen counters - If software set up the counters to freeze on overflow and send notification when it happens, the next question is: Who caused the freeze? Overflow bits are stored hierarchically within the Intel Xeon Processor 7500 Series uncore. First, software should read the U_MSR_PMON_GLOBAL_STATUS.ov_* bits to determine whether a U or W box counter caused the overflow or whether it was a counter in a box attached to the S0 or S1 Box.
Intel® Xeon® Processor 7500 Series Uncore Programming Guide: Uncore Performance Monitoring
a) Clear all uncore counters: Set U_MSR_PMON_GLOBAL_CTL.rst_all to 1. b) Clear all overflow bits. When an overflow bit is cleared, all bits that summarize that overflow (above in the hierarchy) will also be cleared.
Table 2-2. U_MSR_PMON_GLOBAL_CTL Register – Field Definitions Field Bits Reset Description frz_all Disable uncore counting (by clearing .en_all) if PMI is received from box with overflowing counter. Read zero;...
U-Box Performance Monitoring The U-Box serves as the system configuration controller for the Intel Xeon Processor 7500 Series. It contains one counter which can be configured to capture a small set of events. 2.2.1 U-Box PMON Summary Table 2-5.
The U-Box performance monitor data register is 48b wide. A counter overflow occurs when a carry out bit from bit 47 is detected. Software can force all uncore counting to freeze after N events by preloading a monitor with a count value of 2^48 - N and setting the control register to send a PMI to the U-Box.
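The preload arithmetic can be sketched as follows. This is only an illustration: the names `COUNTER_BITS` and `ubox_preload` are hypothetical and do not come from the guide. Because the data register is 48 bits wide and overflows on a carry out of bit 47, a preload of 2^48 - N leaves exactly N increments before the overflow.

```python
COUNTER_BITS = 48  # data register width stated in the text


def ubox_preload(n_events: int) -> int:
    """Return the preload value that makes a 48-bit counter overflow
    (carry out of bit 47) after exactly n_events increments."""
    limit = 1 << COUNTER_BITS
    if not 0 < n_events <= limit:
        raise ValueError("n_events must be between 1 and 2**48")
    # Mask so that n_events == 2**48 maps to a preload of 0.
    return (limit - n_events) & (limit - 1)


if __name__ == "__main__":
    print(hex(ubox_preload(1000)))
```

Writing the returned value to the data register is left out, since MSR access is platform specific.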
Intel QuickPath Interconnect messages that pass through the socket’s LLC remain coherent. The Intel Xeon Processor 7500 Series contains eight instances of the C-Box, each assigned to manage a distinct 3MB, 24-way set associative slice of the processor’s total LLC capacity. For processors with fewer than 8 3MB LLC slices, the C-Boxes for missing slices will still be active and track ring traffic caused by their co-located core even if they have no LLC related traffic to track (i.e.
If an overflow is detected from one of the C-Box PMON registers, the corresponding bit in the _GLOBAL_STATUS.ov field will be set. To reset the overflow bits set in the _GLOBAL_STATUS.ov field, a user must set the corresponding bits in the _GLOBAL_OVF_CTL.clr_ov field before beginning a new sample interval.
Acronyms frequently used in C-Box Events: The Rings: AD (Address) Ring - Core Read/Write Requests and Intel QPI Snoops. Carries Intel QPI requests and snoop responses from C to S-Box. BL (Block or Data) Ring - Data == 2 transfers for 1 cache line AK (Acknowledge) Ring - Acknowledges S-Box to C-Box and C-Box to Core.
Intel QPI-snoop path. There is a one-to-one correspondence between the Intel QPI snoops received by the socket, and the IPQ allocations in the C-Boxes. In both cases, if the message is in the IRQ/IPQ then the C-Box hasn’t acknowledged it yet and the request hasn’t yet entered the LLC’s “coherence domain”.
0x22 RSPF Occupancy
TRANS_RSPF 0x23 RSPF Transactions
SNPS 0x27 Snoops to LLC
SNP_HITS 0x28 Snoop Hits in LLC
2.3.6 C-Box Performance Monitor Event List
This section enumerates Intel Xeon Processor 7500 Series uncore performance monitoring events for the C-Box.
ARB_LOSSES • Title: Arbiter Losses. • Category: Ring - Egress • Event Code: 0x0A, Max. Inc/Cyc: 7, • Definition: Number of Ring arbitration losses. A loss occurs when a message injection onto the ring fails.
BOUNCES_C2P_AK • Title: C2P AK Bounces • Category: Ring - WIR • Event Code: 0x02, Max. Inc/Cyc: 1, • Definition: Number of LLC Ack responses to the core that bounced on the AK ring. umask Extension Description...
EGRESS_BYPASS_WINS • Title: Egress Bypass Wins • Category: Local - Egress • Event Code: 0x0C, Max. Inc/Cyc: 7, • Definition: Number of times a ring egress bypass was taken when a message was injected onto the ring.
umask Extension Description [15:8] b00001xxx Forward (S with right to Forward on snoop) b00001111 All hits (to any cacheline state) LLC_MISSES • Title: LLC Misses •...
A message associated with a transaction monitored by the MAF was delayed because the transaction had a snoop pending. AC_PENDING bxxxx1xxx An incoming remote Intel QPI snoop was delayed because it conflicted with an existing MAF transaction that had an Ack Conflict pending. IDX_BLOCK bxxx1xxxx An incoming local core RD that missed the LLC was delayed because a victim way could not be immediately chosen.
Home (for example, LLC RD Miss) was delayed because the S-Box Request Table was full. WB_PENDING bx1xxxxxx An incoming remote Intel QPI snoop request to the LLC was delayed because it conflicted with an existing transaction that had a WB to Home pending. NACK2_ELSE...
OCCUPANCY_RWRF • Title: RWRF Occupancy • Category: Queue Occupancy • Event Code: 0x20, Max. Inc/Cyc: 12, • Definition: Cumulative count of the occupancy in the Read/Write Request FIFO. OCCUPANCY_VIQ •...
• Event Code: 0x28, Max. Inc/Cyc: 1, • Definition: Number of Intel QPI snoops that hit in the LLC according to state of LLC when the hit occurred. GotoS: LLC Data or Code Read Snoop Hit ‘x’ state in remote cache. GotoI: LLC Data Read for Ownership Snoop Hit ‘x’...
Sum across all BL Egresses. b10000000 IV Egress TRANS_IPQ • Title: IPQ Transactions • Category: Queue Occupancy • Event Code: 0x1B, Max. Inc/Cyc: 1, • Definition: Number of Intel QPI snoop probes that entered the LLC’s Ingress Probe Queue.
TRANS_IRQ • Title: IRQ Transactions • Category: Queue Occupancy • Event Code: 0x19, Max. Inc/Cyc: 1, • Definition: Number of processor RD and/or WR requests to the LLC that entered the Ingress Response Queue.
2.4.2 B-Box Performance Monitoring Overview Each of the two B-Boxes in the Intel Xeon Processor 7500 Series supports event monitoring through four 48-bit wide counters (BBx_CR_B_MSR_PERF_CNT{3:0}). Each of these four counters is dedicated to observe a specific set of events as specified in its control register (BBx_CR_B_MSR_PERF_CTL{3:0}).
Table 2-17. B_MSR_PMON_GLOBAL_CTL Register – Field Definitions Field Bits Reset Description ctr_en Must be set to enable each B-Box counter (bit 0 to enable ctr0, etc) NOTE: U-Box enable and per counter enable must also be set to fully enable the counter.
OPCODE_ADDR_IN_MATCH for counter 0) to capture the filter match as an event. The fields are laid out as follows: Note: Refer to Table 2-103, “Intel® QuickPath Interconnect Packet Message Classes” Table 2-104, “Opcode Match by Message Class” to determine the encodings of the B-Box Match Register fields.
Table 2-22. B_MSR_MATCH_REG Register – Field Definitions Field Bits Reset Description opc_out 59:56 Match to this outgoing opcode opc_in 55:52 Match to this incoming opcode msg_out 51:48 Match to this outgoing message class...
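As a rough illustration of how the three fields visible in this excerpt sit inside the 64-bit match register, the sketch below packs values into their bit ranges. The helper name `pack_match_fields` and the `FIELDS` table are hypothetical. Only opc_out, opc_in and msg_out are covered, because the remaining fields are cut off in this excerpt, and the encodings themselves must come from Tables 2-103 and 2-104.

```python
# (high bit, low bit) for the fields listed in the Table 2-22 excerpt.
FIELDS = {
    "opc_out": (59, 56),
    "opc_in": (55, 52),
    "msg_out": (51, 48),
}


def pack_match_fields(**values: int) -> int:
    """Place each named field value at its bit range and OR them together."""
    reg = 0
    for name, value in values.items():
        high, low = FIELDS[name]  # KeyError for fields not in the excerpt
        width = high - low + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        reg |= value << low
    return reg


if __name__ == "__main__":
    print(hex(pack_match_fields(opc_in=0x3, msg_out=0x1)))
```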
M-Box backpressures the B-Box (e.g. a link error flow in M-Box). • MDRSOQ (Mirror DRS Output Queue) 32-entry - Request is pushed onto Mirror DRS output queue when a NonSnpWrData(Ptl) needs to be sent to the mirror slave and VN1 DRS channel or Intel QPI output resources are unavailable.
Table 2-24. Performance Monitor Events for B-Box Events Event Symbol Name Description Code Inc/Cyc Counter 0 Events MSG_ADDR_IN_MATCH 0x01 Message + Address In Match OPCODE_ADDR_IN_MATCH 0x02 Message + Opcode + Address In Match MSG_OPCODE_ADDR_IN_MATCH...
IMT_FULL • Title: IMT Full • Category: In-Flight Memory Table • Event Code: 0x16, Max. Inc/Cyc: 1, PERF_CTL: 3, • Definition: Number of times In-Flight Memory Table was full when entry was needed by incoming transaction.
IMT_INSERTS_IOH_WR • Title: IMT IOH Write Inserts • Category: In-Flight Memory Table • Event Code: 0x0D, Max. Inc/Cyc: 1, PERF_CTL: 1, • Definition: In-Flight Memory Table Write IOH Request Inserts (e.g. all IOH triggered memory write transactions targeting this B-Box as their home node and processed by this B-Box) •...
IMT_INSERTS_RD • Title: IMT Read Inserts • Category: In-Flight Memory Table • Event Code: 0x1D, Max. Inc/Cyc: 1, PERF_CTL: 1, • Definition: In-Flight Memory Table inserts of read requests (e.g. all memory read transactions targeting this B-Box as their home node and processed by this B-Box) •...
MSG_IN_MATCH • Title: Message In Match • Category: Mask/Match • Event Code: 0x01, Max. Inc/Cyc: 1, PERF_CTL: 1, • Definition: Message Class Match at B-Box Input. Use B_MSR_MATCH/MASK_REG MSGS_IN_NON_SNP •...
OPCODE_IN_MATCH • Title: Opcode In Match • Category: Mask/Match • Event Code: 0x03, Max. Inc/Cyc: 1, PERF_CTL: 1, • Definition: Opcode Match at B-Box Input. Use B_MSR_MATCH/MASK_REG OPCODE_OUT_MATCH •...
TF_INVITOE • Title: TF Occupancy - InvItoEs • Category: Tracker File • Event Code: 0x06, Max. Inc/Cyc: 1, PERF_CTL: 0, • Definition: Tracker File occupancy for InvItoE requests. Accumulates lifetimes of InvItoE memory transactions that have arrived in this B-Box (TF starts tracking transactions before they are sent to the M-Box).
TF_WR • Title: TF Occupancy - Writes • Category: Tracker File • Event Code: 0x05, Max. Inc/Cyc: 1, PERF_CTL: 0, • Definition: Tracker File occupancy for write requests. Accumulates lifetimes of write memory transactions that have arrived in this B-Box (TF starts tracking transactions before they are sent to the M-Box).
The S-Box represents the interface between the last level cache and the system interface. It manages flow control between the C and R & B-Boxes. The S-Box is broken into system bound (ring to Intel QPI) and ring bound (Intel QPI to ring) connections.
Table 2-26. S_MSR_PMON_SUMMARY Register Fields Field Bits Reset Description 63:20 Read zero; writes ignored. ov_r Overflow in R Box In S-Box0, indicates overflow from Left R-Box In S-Box1, indicates overflow from Right R-Box ov_s Overflow in S Box...
Table 2-29. S_MSR_PMON_OVF_CTRL Register Fields Field Bits Reset Description clr_ov Writing ‘1’ to bit in field causes corresponding bit in ‘Overflow PerfMon Counter’ field in S_CSR_PMON_GLOBAL_STATUS register to be cleared to 2.5.3.3 S-Box PMON state - Counter/Control Pairs + Filters The following table defines the layout of the S-Box performance monitor control registers.
The S-Box performance monitor data registers are 48b wide. A counter overflow occurs when a carry out bit from bit 47 is detected. Software can force all uncore counting to freeze after N events by preloading a monitor with a count value of (2^48 - 1) - N and setting the control register to send a PMI to the U-Box.
Table 2-33. S_MSR_MATCH Register – Field Definitions Field Bits Reset Description resp 63:59 Match if returning data is in b1xxxx - ‘F’ state. bx1xxx - ‘S’ state bxx1xx - ‘E’...
Table 2-35. S_MSR_MASK Register – Field Definitions Field Bits Reset Description 62:39 Read zero; writes ignored. addr 38:1 Mask PA address bits [43:6]. For each mask bit that is set, the corresponding bit in the address is already considered matched (e.g.
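The mapping between physical address bits [43:6] and register bits 38:1 of the addr field can be sketched as a shift-and-mask. The function name `mask_addr_field` and the constants are hypothetical, and the sketch covers only the addr field described in this excerpt, not the rest of S_MSR_MASK.

```python
PA_LOW_BIT = 6         # lowest physical address bit covered by the field
ADDR_FIELD_LSB = 1     # register bit that holds PA bit 6
ADDR_FIELD_WIDTH = 38  # register bits 38:1 <-> PA bits 43:6


def mask_addr_field(pa_mask: int) -> int:
    """Move PA bits [43:6] of pa_mask into register bits 38:1."""
    field = (pa_mask >> PA_LOW_BIT) & ((1 << ADDR_FIELD_WIDTH) - 1)
    return field << ADDR_FIELD_LSB


if __name__ == "__main__":
    # A mask covering only PA bit 6 lands in register bit 1.
    print(hex(mask_addr_field(1 << 6)))
```

Address bits below bit 6 (the offset within a 64-byte cache line) and above bit 43 fall outside the field and are dropped.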
Table 2-36. S-Box Data Structure Occupancy Events Insta Structure/Event Name Subev Description/Comment Entries nces System Bound HOM Message RBOX HOM Packet to System Queue B-Box TO_R_B_HOM_MSGQ_OCCUPANCY 1 buffer for R-Box and 1 for B-Box, 64 entries each.
Flits per Message Class Comment Ring Bound NCB The only ring bound NCB message types are: NcMsgB, IntLogical, IntPhysical. These are all 11 flit messages. NOTE: flits are variable in the Sys Bound direction.
Table 2-37. Performance Monitor Events for S-Box Events Event Symbol Name Description Code Inc/Cyc TO_RING_NCB_MSGQ_CYCLES_NE 0x24 Cycles Ring Bound NCB Message Queue Not Empty TO_RING_NCS_MSGQ_CYCLES_NE 0x25 Cycles Ring Bound NCS Message Queue Not Empty TO_RING_MSGQ_OCCUPANCY...
NO_CREDIT_IPQ 0x8A IPQ Credit Unavailable 2.5.6 S-Box Performance Monitor Event List This section enumerates Intel Xeon Processor 7500 Series uncore performance monitoring events for the S-Box. B2S_DRS_BYPASS • Title: B-Box to S-Box DRS Bypass • Category: Ring Bound Enhancement • Event Code: 0x53, Max. Inc/Cyc: 1, •...
EGRESS_ARB_LOSSES • Title: Egress ARB Losses • Category: Ring Bound Credits • Event Code: 0x42, Max. Inc/Cyc: 1, • Definition: Egress Arbitration Losses. • NOTE: Enabling multiple subevents in this category will result in the counter being increased by the number of selected subevents that occur in a given cycle.
EGRESS_BYPASS • Title: Egress Bypass • Category: Ring Bound Enhancement • Event Code: 0x40, Max. Inc/Cyc: 1, • Definition: Egress Bypass optimization utilized. • NOTE: Enabling multiple subevents in this category will result in the counter being increased by the number of selected subevents that occur in a given cycle.
FLITS_SENT_DRS • Title: DRS Flits Sent to System • Category: System Bound Transmission • Event Code: 0x65, Max. Inc/Cyc: 1, • Definition: Number of data response flits the S-Box has transmitted to the system. FLITS_SENT_NCB •...
NO_CREDIT_HOM • Title: HOM Credit Unavailable • Category: System Bound Credits • Event Code: 0x80, Max. Inc/Cyc: 1, • Definition: Number of times the S-Box has a pending home message to send and there is no HOM or VNA credit available.
NO_CREDIT_VNA • Title: VNA Credit Unavailable • Category: System Bound Transmission • Event Code: 0x86, Max. Inc/Cyc: 1, • Definition: Number of times the S-Box has exhausted its VNA credit pool. When more than one subevent is selected, the credit counter will be incremented by the number of selected subevents that occur in each cycle.
PKTS_RCVD_SNP • Title: SNP Packets Received from System • Category: Ring Bound Transmission • Event Code: 0x71, Max. Inc/Cyc: 1, • Definition: Number of snoop packets the S-Box has received from the system. PKTS_SENT_DRS •...
CBOX2_6 bx1xx C-Boxes 2 and 6 CBOX3_7 b1xxx C-Boxes 3 and 7 PKTS_SENT_NCS • Title: NCS Packets Sent to System • Category: System Bound Transmission •...
RBOX_SNP_BYPASS • Title: R-Box SNP Bypass • Category: System Bound Enhancement • Event Code: 0x51, Max. Inc/Cyc: 1, • Definition: R-Box SNP bypass optimization utilized. When both snoop and big snoop bypass are selected, the performance counter will increment on both subevents.
TO_RING_B2S_MSGQ_OCCUPANCY • Title: Ring Bound B2S Message Queue Occupancy • Category: Ring Bound Queue • Event Code: 0x2F, Max. Inc/Cyc: 8, • Definition: Number of entries in header buffer containing B to S-Box messages on their way to the Ring.
TO_RING_NDR_MSGQ_CYCLES_NE • Title: Cycles Ring Bound NDR Message Queue Not Empty • Category: Ring Bound Queue • Event Code: 0x28, Max. Inc/Cyc: 1, • Definition: Number of cycles in which the header buffer, containing NDR messages on their way to the Ring, has one or more entries allocated.
TO_R_DRS_MSGQ_CYCLES_FULL • Title: Cycles System Bound DRS Message Queue Full. • Category: System Bound Queue • Event Code: 0x0E, Max. Inc/Cyc: 1, • Definition: Number of cycles in which the header buffer for the selected C-Box, containing DRS messages heading to a System Agent (through the R-Box), is full.
CBOX3_7 b1xxx CBOX 3 and 7 b1111 All C-Boxes TO_R_B_HOM_MSGQ_CYCLES_FULL • Title: Cycles System Bound HOM Message Queue Full. • Category: System Bound Queue • Event Code: 0x03, Max. Inc/Cyc: 1, •...
TO_R_NCB_MSGQ_CYCLES_FULL • Title: Cycles System Bound NCB Message Queue Full. • Category: System Bound Queue • Event Code: 0x11, Max. Inc/Cyc: 1, • Definition: Number of cycles in which the header buffer for the selected C-Box, containing NCB messages heading to a System Agent (through the R-Box), is full.
CBOX3_7 b1xxx CBOX 3 and 7 b1111 All C-Boxes TO_R_NCS_MSGQ_CYCLES_FULL • Title: Cycles System Bound NCS Message Queue Full. • Category: System Bound Queue • Event Code: 0x14, Max. Inc/Cyc: 1, •...
CBOX1_5 bxx1x CBOX 1 and 5 CBOX2_6 bx1xx CBOX 2 and 6 CBOX3_7 b1xxx CBOX 3 and 7 b1111 All C-Boxes TO_R_NDR_MSGQ_CYCLES_FULL • Title: Cycles System Bound NDR Message Queue Full •...
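The C-Box selection encoding in these unit-mask tables pairs C-Box n with C-Box n+4 on one bit. A small sketch of that mapping follows; the function name `cbox_pair_umask` is hypothetical, and treating bit 0 as the C-Box 0/4 pair is an assumption extrapolated from the visible rows (bxx1x, bx1xx, b1xxx), since that row is cut off in this excerpt.

```python
def cbox_pair_umask(cboxes) -> int:
    """Build the 4-bit C-Box pair selector from C-Box indices 0..7.

    Bit k selects the pair (k, k + 4); bit 0 for C-Boxes 0 and 4 is an
    assumption based on the pattern of the rows shown in the excerpt.
    """
    umask = 0
    for cbox in cboxes:
        if not 0 <= cbox <= 7:
            raise ValueError(f"C-Box index {cbox} is out of range 0..7")
        umask |= 1 << (cbox % 4)
    return umask


if __name__ == "__main__":
    print(bin(cbox_pair_umask([1, 6])))
```

Because each bit covers a pair, selecting C-Box 1 alone also selects C-Box 5; the encoding cannot separate the two.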
TO_R_SNP_MSGQ_CYCLES_FULL • Title: Cycles System Bound SNP Message Queue Full • Category: System Bound Queue • Event Code: 0x08, Max. Inc/Cyc: 1, • Definition: Number of cycles in which the header buffer, containing SNP messages heading to a System Agent (through the R-Box), is full.
B-Box1 on Port7). The R-Box connects to these through full flit 80b links. Ports 0,1,4 and 5 are connected to external Intel QPI agents (through P-boxes also known as the physical layers), also through full flit 80b links.
2.6.1.4 R-Box Link Layer Resources Each R-Box port supports up to three virtual networks (VN0, VN1, and VNA) as defined by the Intel® QuickPath Interconnect Specification. The following table specifies the port resources. Table 2-38. Input Buffering Per Port...
3) Pick a generic counter (control+data) that can monitor an event on that port (e.g. R_MSR_PMON_CTL/CTR3). 4) Pick one of the two sub counters that allows a user to monitor the event (R_MSR_PORT1_IPERF1), program it to monitor the chosen event (R_MSR_PORT1_IPERF1[31] = 0x1) and set the generic control to point to it (R_MSR_PMON_CTL3.ev_sel == 0x7).
Size MSR Name Access Addres Description (bits) R_MSR_PORT5_XBR_SET1_MASK RW_NA 0x0E86 R-Box Port 5 Mask 1 R_MSR_PORT5_XBR_SET1_MATCH RW_NA 0x0E85 R-Box Port 5 Match 1 R_MSR_PORT5_XBR_SET1_MM_CFG RW_NA 0x0E84 R-Box Port 5 Mask/Match Config 1 R_MSR_PORT4_XBR_SET2_MASK...
Size MSR Name Access Addres Description (bits) R_MSR_PMON_CTR13 RW_RW 0x0E3B R-Box PMON Counter 13 R_MSR_PMON_CTL13 RW_NA 0x0E3A R-Box PMON Control 13 R_MSR_PMON_CTR12 RW_RW 0x0E39 R-Box PMON Counter 12 R_MSR_PMON_CTL12 RW_NA 0x0E38...
2.6.3.2 R-Box Box Level PMON state The following registers represent the state governing all box-level PMUs in the R-Box. The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the .ctr_en bit to 1 before the corresponding data register can collect events.
2.6.3.4 R-Box IPERF Performance Monitoring Control Registers The following table contains the events that can be monitored if one of the RIX (IPERF) registers was chosen to select the event.
Table 2-48. R_MSR_PORT{7-0}_IPERF_CFG{1-0} Registers (Sheet 2 of 2) Field Bits Reset Description IQA_READ_OK Bid wins arbitration. Read flit from IQA and drains to XBAR. NEW_PVN New Packet VN Select: Anded with result of New Packet Class Bit Mask.
To use the match/mask facility: a) Set the MM_CFG (see Table 2-50, “R_MSR_PORT{7-0}_XBR_SET{2-1}_MM_CFG Registers”) .dis field (bit 63) to 0 and .mm_trig_en (bit 21) to 1. NOTE: In order to monitor packet traffic, instead of the flit traffic associated with each packet, set .match_flt_cnt to 0x1.
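Step a) amounts to a read-modify-write that clears bit 63 and sets bit 21 of the MM_CFG value. A minimal sketch follows, with the hypothetical helper `enable_match_mask`. It leaves .match_flt_cnt untouched because that field's bit position is not given in this excerpt.

```python
DIS_BIT = 63         # .dis field, per step a)
MM_TRIG_EN_BIT = 21  # .mm_trig_en field, per step a)
U64_MASK = (1 << 64) - 1


def enable_match_mask(cfg: int) -> int:
    """Return cfg with .dis cleared and .mm_trig_en set; other bits unchanged."""
    cfg &= ~(1 << DIS_BIT)
    cfg |= 1 << MM_TRIG_EN_BIT
    return cfg & U64_MASK


if __name__ == "__main__":
    # Starting from a value with only .dis set, only .mm_trig_en remains.
    print(hex(enable_match_mask(1 << 63)))
```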
Table 2-53. Message Events Derived from the Match/Mask filters Match Mask Field Description [15:0] [15:0] DRS.AnyDataC 0x1C00 0x1F80 Any Data Response message containing a cache line in response to a core request.
Target Available STARVING Starvation Detective 2.6.6 R-Box Performance Monitor Event List This section enumerates Intel Xeon Processor 7500 Series uncore performance monitoring events for the R-Box. ALLOC_TO_ARB • Title: Transactions allocated to ARB • Category: RIX • [Bit(s)] Value: See Note, Max. Inc/Cyc: 1, •...
Table 2-55. Unit Masks for ALLOC_TO_ARB IPERF Bit Extension Values Description [15:9] b0000000 (*nothing will be counted*) bxxxxxx1 Non-Coherent Bypass Messages bxxxxx1x Non-Coherent Standard Messages DRS_VN01 bxxxx1xx Data Response (VN0 &...
EOT_OCCUPANCY • Title: EOT Occupancy • Category: RIX • [Bit(s)] Value: [21]0x0, Max. Inc/Cyc: 1, • Definition: Used with MC field. Report a rolling count whenever a 7b counter (count == 128) overflows for the selected MC’s allocation into EOT.
GLOBAL_ARB_BID_FAIL • Title: Failed Global ARB Bids • Category: QLX • [Bit(s)] Value: [3:0]0x5, Max. Inc/Cyc: 1, • Definition: Number of bids for output port that were rejected at the global ARB. Table 2-58.
NEW_PACKETS_RECV • Title: New Packets Received by Port • Category: RIX • [Bit(s)] Value: see table, Max. Inc/Cyc: 1, • Definition: Counts new packets received according to the Virtual Network and Message Class specified.
OUTPUTQ_NE • Title: Output Queue Not Empty • Category: RIX • [Bit(s)] Value: [26]0x1, Max. Inc/Cyc: 1, • Definition: Output Queue Not Empty in this Output Port. OUTPUTQ_OVFL •...
RETRYQ_NE • Title: Retry Queue Not Empty • Category: RIX • [Bit(s)] Value: [28]0x1, Max. Inc/Cyc: 1, • Definition: Retry Queue Not Empty in this Output Port. RETRYQ_OV •...
ECC support. There are two memory controllers per socket, each controlling two Intel SMI channels in lockstep. Because of the data path affinity to the B-Box data path, each B-Box is paired with a memory controller, that is, B-Boxes and memory controllers come in pairs.
— Support for integrating RDIMM thermal sensor information into Intel SMI Status Frame. • No support for daisy chaining (Intel 7500 Scalable Memory Buffer is the only Intel SMI device in the channel). • No support for FB-DIMM1 protocol and signaling.
For instance, to count (in counter 0) the number of RAS DRAM commands (PLD_DRAM_EV.DRAM_CMD.RAS) that have been issued, set up is as follows: M_MSR_PMU_CNT_CTL_0.en [0] = 1 M_MSR_PMU_CNT_CTL_0.count_mode [3:2] = 0x0 M_MSR_PMU_CNT_CTL_0.flag_mode [7] = 0 M_MSR_PMU_CNT_CTL_0.inc_sel [13:9] = 0xa...
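The four field assignments listed for M_MSR_PMU_CNT_CTL_0 can be combined into one register value as sketched below. The helper `cnt_ctl_value` is hypothetical and covers only the fields visible in this excerpt. The setup list is truncated, so the full recipe may need further register writes that are not shown here.

```python
def cnt_ctl_value(en: int, count_mode: int, flag_mode: int, inc_sel: int) -> int:
    """Pack the counter-control fields shown in the excerpt into one value."""
    # (value, low bit, width) for en [0], count_mode [3:2],
    # flag_mode [7] and inc_sel [13:9].
    fields = (
        (en, 0, 1),
        (count_mode, 2, 2),
        (flag_mode, 7, 1),
        (inc_sel, 9, 5),
    )
    reg = 0
    for value, low, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
        reg |= value << low
    return reg


if __name__ == "__main__":
    # Settings from the RAS DRAM command example in the text.
    print(hex(cnt_ctl_value(en=1, count_mode=0x0, flag_mode=0, inc_sel=0xA)))
```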
Table 2-64. M_MSR_PERF_GLOBAL_CTL Register Fields Field Bits Reset Description ctr_en Must be set to enable each MBOX 0 counter (bit 0 to enable ctr0, etc) NOTE: U-Box enable and per counter enable must also be set to fully enable the counter.
Table 2-69. M_MSR_PMU_TIMESTAMP_UNIT Register – Field Definitions Field Bits Reset Description timestamp 15:0 Timestamp is updated every timestamp_unit MClk’s 2.7.4.3 M-Box PMU Filter Registers The M-Box also provides a limited ability to perform address matching for PLD events. The following 3 tables contain the field definitions for the configuration registers governing the M-Box’s address match/mask facility.
Original B-Box transaction’s FVID sent from DSP during subcommand execution where the appropriate subcommand information is accessed to compose the Intel® SMI command frame. PGT - Page Table - Keeps track of open pages. Translates the read/write commands into DRAM command combinations (i.e.
FVID (Fill Victim Index) of transaction for which scheduler latency is to be counted. Only fully completed transactions are counted. The ISS subcontrol register contains bits to specify subevents for the ISS_EV (by Intel SMI frame), CYCLES_SCHED_MODE (cycles spent per ISS mode) and PLD_DRAM_EV (DRAM commands broken down by scheduling mode in the ISS) events.
The MAP subcontrol register contains bits to specify subevents for BCMD_SCHEDQ_OCCUPANCY (by B-Box command type). Table 2-75. M_MSR_PMU_MAP Register – Field Definitions Field Bits Reset Reset Type 31:12...
Table 2-77. TRP_PT_{DN,UP}_CND Encodings Name Description ABOVE_TEMPMID_RISE 0b11 Above the mid temperature trip point (rising) ABOVE_TEMPMID_FALL 0b10 Above the mid temperature trip point (falling) ABOVE_TEMPLO Above the low temperature trip point, but below the mid temperature 0b01 trip point.
Table 2-79. M_MSR_PMU_PLD Register – Field Definitions Field Bits Reset Reset Type 31:14 Reads 0; writes ignored. pld_trig_sel 15:14 When 0, corresponding PMU event records number of ZAD parity errors. When 1 or 2, respective trigger match event is selected.
Buffers OR PBOX init error (see pbox_init_err field). These bits are denoted NBDE in the Intel SMI spec status frame description. An OR of all the bits over all the Intel 7500 Scalable Memory Buffers is selected here as an event.
Table 2-82. M_MSR_PMU_ZDP_CTL_FVC.RESP Encodings Name Value Description spr_uncor_resp 0b111 Uncorrectable response for command to misbehaving DIMM during sparing. Reserved 0b110 spr_ack_resp 0b101 Positive acknowledgment for command to misbehaving DIMM during sparing.
- Auto page closes. - Open-page to closed-page policy transitions. As well as length of time spent in each policy. - Starvation event in scheduler, starvation state and back-pressure to B-Box. - Thermal throttling and many more.
BCMD_SCHEDQ_OCCUPANCY B-Box Command Scheduler Queue Occupancy 2.7.7 M-Box Performance Monitor Event List This section enumerates Intel Xeon Processor 7500 Series uncore performance monitoring events for the M-Box. BBOX_CMDS_ALL • Title: All B-Box Commands • Category: M-Box Commands Received • Event Code: 0x1a, Max. Inc/Cyc: 1, •...
CYCLES_PGT_STATE • Title: Time in Page Table State • Category: Cycle Events • Event Code: [21:19]0x05 && [7]0x1, Max. Inc/Cyc: 1, • Definition: Counts cycles page table stays in state as specified in PMU_PGT.opencls_time. Table 2-86.
PLD Dep ISS Dep Extension Description Bits Bits PREALL.TRDOFF [12:8]0x1 [9:7]0x0 Count Preall (no auto-precharge, open page mode) && [0]0x1 DRAM commands during ‘static trade off’ scheduling mode.
Page 126
PLD Dep ISS Dep Extension Description Bits Bits CAS_WR_CLS.WRPRIO [12:8]0x6 [9:7]0x2 Count CAS Write (precharge, closed page mode) DRAM && [0]0x1 commands during ‘static write priority’ scheduling mode.
Page 127
DIMM during sparing SMI_NB_TRIG Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for Debug purposes FVC_EV1 • Title: FVC Event 1 •...
Page 128
DIMM during sparing SMI_NB_TRIG Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for Debug purposes FVC_EV2 • Title: FVC Event 2 •...
Page 129
DIMM during sparing SMI_NB_TRIG Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for Debug purposes FVC_EV3 • Title: FVC Event 3 •...
Page 130
• Definition: Number of new memory controller (read and write) commands accepted
FRM_TYPE
• Title: Frame (Intel SMI) Types
• Category: DRAM Commands
• Event Code: 0x09, Max. Inc/Cyc: 1
• Definition: Count ISS Related Intel SMI Frame Type Events
ISS[3:0] Extension Description Bits...
Page 131
BCMD_SCHEDQ_OCCUPANCY
• Title: B-Box Command Scheduler Queue Occupancy
• Category: Cycle Events
• Event Code: [21:19]0x06 && [7]0x1, Max. Inc/Cyc: 1
• Definition: Counts the queue occupancy of the B-Box Command scheduler per FVID. The FVID (Fill Victim Index) for the command to be monitored must be programmed in MSR_PMU_MAP.fvid.
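The bracketed event-code notation used in these entries, such as [21:19]0x06 && [7]0x1, pairs a bit range with the value to place in it. The Python sketch below shows one way to compose such a value. The helper names `set_field` and `encode` are hypothetical, and this excerpt does not say which control register receives the result, so the sketch only builds the integer.

```python
def set_field(value: int, hi: int, lo: int, field: int) -> int:
    """Return `value` with bits [hi:lo] replaced by `field` (hypothetical helper)."""
    width = hi - lo + 1
    if field < 0 or field >= (1 << width):
        raise ValueError(f"field {field:#x} does not fit in bits [{hi}:{lo}]")
    mask = ((1 << width) - 1) << lo
    return (value & ~mask) | (field << lo)


def encode(fields) -> int:
    """Combine (hi, lo, field) triples into one integer, starting from zero."""
    value = 0
    for hi, lo, field in fields:
        value = set_field(value, hi, lo, field)
    return value


# Event codes as listed in this section: [21:19]0x06 && [7]0x1 and [21:19]0x05 && [7]0x1.
BCMD_SCHEDQ_OCCUPANCY = encode([(21, 19, 0x06), (7, 7, 0x1)])
CYCLES_PGT_STATE = encode([(21, 19, 0x05), (7, 7, 0x1)])

print(hex(BCMD_SCHEDQ_OCCUPANCY), hex(CYCLES_PGT_STATE))
```

All other bits are left at zero here; a real control register will likely carry additional fields (enable bits, for example) that this sketch does not model.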
Page 132
PAGE_MISS
• Title: Page Table Misses
• Category: Page Table Related
• Event Code: 0x13, Max. Inc/Cyc: 1
• Definition: Number of page misses detected. This is a command that requires a PRE-RAS-CAS to complete.
Page 133
RETRY_MFULL
• Title: Retry MFull
• Category: Retry Events
• Event Code: 0x02, Max. Inc/Cyc: 1
• Definition: Number of retries detected while in the "mfull" state. Also known as the "badly starved" state.
Page 134
THR Bits Extension Description [10:9],[3] DIMM{n}.GT_LO 0x1,0x0 Advance the counter when the above low temp, but below mid temp thermal trip point is crossed in the "down" direction for DIMM #? NOTE: THR Bits [6:4] must be programmed with the DIMM DIMM{n}.LT_LO...
Page 135
TT_CMD_CONFLICT
• Title: Thermal Throttling Command Conflicts
• Category: Thermal Throttle
• Event Code: 0x19, Max. Inc/Cyc: 1
• Definition: Count command conflicts due to thermal throttling.
W-Box Performance Monitoring
2.8.1 Overview of the W-Box
The W-Box is the primary Power Controller for the Intel Xeon Processor 7500 Series.
2.8.2 W-Box Performance Monitoring Overview
The W-Box supports event monitoring through four 48-bit wide counters (W_MSR_PERF_CNT{3:0}). Each of these four counters can be programmed to count any W-Box event. The W-Box counters will increment by a maximum of 1 per cycle.
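Because each W-Box counter is 48 bits wide, software that samples a counter periodically may want the difference between two reads computed modulo 2^48. The Python sketch below assumes the counter wraps at most once between samples and that the raw values come from some MSR-read mechanism not shown here. `counter_delta` is a hypothetical helper, not part of any Intel interface.

```python
COUNTER_BITS = 48
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(previous: int, current: int) -> int:
    """Events counted between two raw reads of a 48-bit counter.

    Assumes at most one wraparound occurred between the two reads.
    """
    return (current - previous) & COUNTER_MASK


# No wrap: the counter simply advanced.
print(counter_delta(1_000, 1_250))
# Wrap: the counter passed 2**48 - 1 and restarted from zero.
print(counter_delta(COUNTER_MASK - 2, 2))
```

If the counters are configured to freeze on overflow rather than wrap, the modulo step is unnecessary and the overflow status bits should be consulted instead.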
Table 2-96. W_MSR_PMON_GLOBAL_STATUS Register Fields Field Bits Reset Description ov_fixed If an overflow is detected from the WBOX PMON fixed counter, this bit will be set. 30:4 Read zero;...
48-bit performance event counter Note: Due to an erratum in the Intel Xeon Processor 7500 Series, SW must consider two special cases: • If SW reads a counter whose value ends in 0x000000 or 0x000001, SW should subtract 0x1000000 to get the correct value.
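The first special case in the note above can be sketched in Python as follows. The function name `correct_counter_read` is hypothetical. "Ends in" is interpreted here as the low 24 bits of the value, and wrapping the result to 48 bits is an assumption that keeps a small raw value from going negative. The second special case is not reproduced in this excerpt and is not handled.

```python
COUNTER_MASK = (1 << 48) - 1   # 48-bit performance event counter
LOW_24_MASK = 0xFFFFFF         # "ends in 0x000000 or 0x000001"
CORRECTION = 0x1000000


def correct_counter_read(raw: int) -> int:
    """Apply the first erratum workaround to a raw counter read.

    Only the case described in the note is covered: values whose low
    24 bits are 0x000000 or 0x000001 have 0x1000000 subtracted.
    """
    if (raw & LOW_24_MASK) in (0x000000, 0x000001):
        # Assumption: keep the result within the counter's 48-bit range.
        return (raw - CORRECTION) & COUNTER_MASK
    return raw


print(hex(correct_counter_read(0x2000000)))  # low 24 bits are zero: corrected
print(hex(correct_counter_read(0x2000002)))  # unaffected: returned unchanged
```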
Page 141
C_CYCLES_TURBO
• Title: Core in C0 at Turbo
• Category: W-Box Events
• Event Code: 0x04, Max. Inc/Cyc: 1
• Definition: Selected core is in C0 and operating at a ‘Turbo’ operating point.
C_C0_THROTTLE_DIE
•...
In several boxes (S, R and B), the performance monitoring infrastructure allows a user to filter packet traffic according to certain fields. Two common fields, Message Class and Opcode, are summarized in the following tables. Table 2-103. Intel® QuickPath Interconnect Packet Message Classes Code Name...
Page 143
Table 2-104. Opcode Match by Message Class HOM0 HOM1 0000 RdCur RspI SnpCur DataC_(FEIMS) 0001 RdCode RspS SnpCode DataC_(FEIMS)_FrcAck Cnflt 0010 RdData SnpData DataC_(FEIMS)_Cmp 0011 NonSnpRd DataNc 0100...
Page 144
0000 Data Response in (FEIMS) state NOTE: Set RDS field to specify which state is to be measured. - Intel Xeon Processor 7500 Series supports getting data in E, F, I or M state DataC_(FEIMS)_Cmp 0010 Data Response in (FEIMS) state, Complete NOTE: Set RDS field to specify which state is to be measured.
Page 145
Peer has sent a WbSData message to the home, has not sent any message to the requestor and is left with line in S-state SnpCode 0001 Snoop Code (get data in F or S state) - Intel Xeon Processor 7500 Series supports getting data in F state
Page 146
Snoop to get data in I state SnpData 0010 Snoop Data (get data in E, F or S state) - Intel Xeon Processor 7500 Series supports getting data in E or F state SnpInvItoE 1000 Snoop Invalidate to E state. To invalidate peer caching...
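The opcode descriptions above pair a packet name with a 4-bit opcode. The Python sketch below is a minimal lookup built only from the four pairs that appear complete in this excerpt. The names `OPCODES` and `names_for` are hypothetical. It also shows why an opcode alone does not identify a packet type: the same value is reused by more than one name, which is consistent with the table being organized by message class.

```python
# Name -> 4-bit opcode, restricted to pairs that appear in full in this excerpt.
OPCODES = {
    "DataC_(FEIMS)_Cmp": 0b0010,
    "SnpCode": 0b0001,
    "SnpData": 0b0010,
    "SnpInvItoE": 0b1000,
}


def names_for(opcode: int) -> list[str]:
    """Return every listed packet name that uses the given opcode, sorted."""
    return sorted(name for name, code in OPCODES.items() if code == opcode)


# 0b0010 is shared, so a filter needs the message class as well as the opcode.
print(names_for(0b0010))
print(format(OPCODES["SnpInvItoE"], "04b"))
```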