Data Cache Efficiency Mode; Instruction Fetch Latency Mode; Data/Bus Request Buffer Full Mode - Intel PXA255 User Manual

Performance Monitoring
Statistics derived from these two events:
• Instruction cache miss-rate. This is derived by dividing PMN1 by PMN0.
• The average number of cycles taken to execute an instruction, commonly referred to as
cycles-per-instruction (CPI). CPI can be derived by dividing CCNT by PMN0, where CCNT
was used to measure total execution time.
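
As an illustration of the two ratios above, the following Python sketch computes the instruction cache miss-rate and CPI from counter values that have already been read out of the performance monitoring unit. The function name and the sample values are hypothetical and are not taken from the manual. The sketch assumes that PMN0 holds the executed-instruction count in this mode, as the CPI formula implies.

```python
def instruction_cache_stats(pmn0, pmn1, ccnt):
    """Return (miss_rate, cpi) for instruction cache efficiency mode.

    pmn0: instructions executed (assumed meaning of PMN0 in this mode)
    pmn1: instruction fetch requests to external memory (cache misses)
    ccnt: clock cycles counted over the same measurement window
    """
    if pmn0 <= 0:
        raise ValueError("PMN0 must be positive to derive the statistics")
    miss_rate = pmn1 / pmn0  # instruction cache miss-rate = PMN1 / PMN0
    cpi = ccnt / pmn0        # cycles-per-instruction = CCNT / PMN0
    return miss_rate, cpi


# Hypothetical counter snapshot, for illustration only.
miss_rate, cpi = instruction_cache_stats(pmn0=2_000_000, pmn1=50_000, ccnt=3_000_000)
print(f"I-cache miss-rate: {miss_rate:.2%}, CPI: {cpi:.2f}")
```

The guard against a zero PMN0 matters in practice because a measurement window that is too short, or a counter that was never enabled, would otherwise cause a division by zero.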
8.5.2 Data Cache Efficiency Mode

PMN0 totals the number of data cache accesses, which includes cacheable and non-cacheable
accesses, mini-data cache accesses, and accesses made to locations configured as data RAM.
Note that STM and LDM will each count as several accesses to the data cache depending on the
number of registers specified in the register list. LDRD will register two accesses.
PMN1 counts the number of data cache and mini-data cache misses. Cache operations do not
contribute to this count. See Section 7.2.7 for a description of these operations.
The common statistic derived from these two events is:
• Data cache miss-rate. This is derived by dividing PMN1 by PMN0.
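
A minimal Python sketch of this calculation is shown below. The helper name and the sample counts are hypothetical. The hit-rate is not a statistic named by the manual. It is included only as the complement of the miss-rate.

```python
def data_cache_miss_rate(pmn0, pmn1):
    """Return the data cache miss-rate, PMN1 / PMN0.

    pmn0: data cache accesses (including non-cacheable, mini-data
          cache, and data RAM accesses)
    pmn1: data cache and mini-data cache misses
    """
    if pmn0 <= 0:
        raise ValueError("PMN0 must be positive to derive a miss-rate")
    return pmn1 / pmn0


# Hypothetical counter snapshot, for illustration only.
d_miss_rate = data_cache_miss_rate(pmn0=800_000, pmn1=40_000)
d_hit_rate = 1.0 - d_miss_rate  # complement, not a statistic from the manual
print(f"D-cache miss-rate: {d_miss_rate:.1%}, hit-rate: {d_hit_rate:.1%}")
```

Because PMN0 also counts non-cacheable and data RAM accesses, the ratio describes all data accesses seen by the cache unit and not only cacheable ones.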
8.5.3 Instruction Fetch Latency Mode

PMN0 accumulates the number of cycles when the instruction-cache is not able to deliver an
instruction to the Intel® XScale™ core due to an instruction-cache miss or instruction-TLB miss.
This event means that the processor core is stalled.
PMN1 counts the number of instruction fetch requests to external memory. Each of these requests
loads 32 bytes at a time. This is the same event as measured in instruction cache efficiency mode
and is included in this mode for convenience so that only one performance monitoring run is needed.
Statistics derived from these two events:
• The average number of cycles the processor stalled waiting for an instruction fetch from
external memory to return. This is calculated by dividing PMN0 by PMN1. If the average is
high then the Intel® XScale™ core may be starved of memory access due to other bus traffic.
• The percentage of total execution cycles the processor stalled waiting on an instruction fetch
from external memory to return. This is calculated by dividing PMN0 by CCNT, which was
used to measure total execution time.
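
The two statistics above can be sketched in Python as follows. The function name and sample values are hypothetical. The sketch assumes the three counters were captured over the same measurement window.

```python
def fetch_latency_stats(pmn0, pmn1, ccnt):
    """Return (avg_stall_cycles, stall_percent) for instruction fetch latency mode.

    pmn0: cycles stalled on an instruction cache or instruction TLB miss
    pmn1: instruction fetch requests to external memory (32 bytes each)
    ccnt: total clock cycles in the measurement window
    """
    if pmn1 <= 0 or ccnt <= 0:
        raise ValueError("PMN1 and CCNT must be positive")
    avg_stall_cycles = pmn0 / pmn1       # average stall per external fetch
    stall_percent = 100.0 * pmn0 / ccnt  # share of execution time spent stalled
    return avg_stall_cycles, stall_percent


# Hypothetical counter snapshot, for illustration only.
avg_stall, stall_pct = fetch_latency_stats(pmn0=600_000, pmn1=20_000, ccnt=3_000_000)
print(f"Average stall: {avg_stall:.1f} cycles per fetch, {stall_pct:.1f}% of cycles")
```

With these sample numbers each external fetch costs 30 stall cycles on average and stalls account for 20% of the run. Whether such an average counts as high depends on the memory system, so any threshold would have to come from a baseline measurement on the target board.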
8.5.4 Data/Bus Request Buffer Full Mode

The Data Cache has buffers available to service cache misses or uncacheable accesses. For every
memory request that the Data Cache receives from the processor core, a buffer is speculatively
allocated in case an external memory request is required or temporary storage is needed for an
unaligned access. If no buffers are available, the Data Cache will stall the processor core. How
often the Data Cache stalls depends on the performance of the bus external to the Intel® XScale™
core (the internal bus inside the application processor) and what the memory access latency is for
Data Cache miss requests to external memory. If the Intel® XScale™ core memory access latency
Intel® XScale™ Microarchitecture User's Manual