Internal Cache Performance Issues - Intel Embedded Intel486 Hardware Reference Manual

EMBEDDED Intel486™ PROCESSOR HARDWARE REFERENCE MANUAL
9.2.2 Application Programs Used in Analysis
For the bus utilization and cache statistics presented later, a series of five programs was used.
Each was traced to record the address access pattern. These patterns were then used in a cache
simulator to measure how many accesses could be handled in the on-chip cache of the Intel486
processor. The cache simulator is an accurate representation of the on-chip cache. External bus traffic
was also measured to give bus utilization statistics. An external DRAM controller and external
cache can also be simulated to measure their effect on program execution.
The programs represent different types of work. Each was run in the UNIX environment. Some
are 16-bit DOS applications run under a DOS emulator. Each had 16 million memory references
recorded.
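The trace-driven method described above can be illustrated with a minimal sketch. This is not the simulator used for the analysis. It assumes the 8-Kbyte, 4-way, 16-byte-line geometry given in Section 9.3.1, treats every reference as a read that allocates a line on a miss, and uses true LRU replacement as a simplifying assumption. The names `simulate`, `LINE_BYTES`, `LINES_PER_WAY`, and `NUM_WAYS` are hypothetical; "way" here means one of the four 2-Kbyte sets described in Section 9.3.1.

```python
from collections import OrderedDict

LINE_BYTES = 16      # cache line size in bytes
LINES_PER_WAY = 128  # 2 Kbytes per way / 16 bytes per line
NUM_WAYS = 4         # 4-way set-associative


def simulate(trace):
    """Return (hits, misses) for an iterable of byte addresses."""
    # One small LRU-ordered table per line index; keys are address tags.
    rows = [OrderedDict() for _ in range(LINES_PER_WAY)]
    hits = misses = 0
    for addr in trace:
        line = addr // LINE_BYTES
        index = line % LINES_PER_WAY
        tag = line // LINES_PER_WAY
        row = rows[index]
        if tag in row:
            hits += 1
            row.move_to_end(tag)          # mark as most recently used
        else:
            misses += 1
            if len(row) == NUM_WAYS:
                row.popitem(last=False)   # evict least recently used (assumed policy)
            row[tag] = True
    return hits, misses


print(simulate([0x1000, 0x1004, 0x1010, 0x1000]))  # (2, 2)
```

In the example trace, the second and fourth references fall in a line that is already resident, so two of the four accesses are handled in the cache.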
9.3 INTERNAL CACHE PERFORMANCE ISSUES

The Intel486 processor is capable of high-speed operation, executing many common
instructions in as little as one clock. Because external memory cannot supply data to the CPU
every clock, a fast on-chip cache is necessary to sustain overall performance. The
cache bridges the bandwidth difference between the external bus and the CPU. The size,
organization, write policy, miss replacement, and busing of the Intel486 processor on-chip
cache were chosen to support a broad range of applications.
9.3.1 On-Chip Cache Organization Issues
The Intel486 processor contains an 8-Kbyte (16-Kbyte on the IntelDX4 processor) on-chip cache.
The cache is unified (containing both code and data), and is organized as 4-way set-associative,
with four 2-Kbyte (4-Kbyte on the IntelDX4 processor) sets. Each set contains 128 lines (256
lines on the IntelDX4 processor). Cache lines are 16 bytes long. Lines in the cache are either valid
or not valid. There is no provision for partially valid lines.
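The organization above implies how an address selects a cache line. The sketch below splits an address into a tag, a line index, and a byte offset, using only the sizes stated in the text. The function name `split_address` and its parameters are hypothetical, and the field layout is the conventional one for a set-associative cache, assumed here rather than taken from this section.

```python
def split_address(addr, cache_bytes=8 * 1024, ways=4, line_bytes=16):
    """Split a physical address into (tag, index, offset)."""
    # Lines in each of the four 2-Kbyte (or 4-Kbyte) sets: 128, or 256 on the IntelDX4.
    lines_per_way = cache_bytes // (ways * line_bytes)
    offset = addr % line_bytes                       # byte within the 16-byte line
    index = (addr // line_bytes) % lines_per_way     # which line position to look in
    tag = addr // (line_bytes * lines_per_way)       # remaining upper address bits
    return tag, index, offset


# 8-Kbyte cache: 4 offset bits, 7 index bits, remaining bits form the tag.
print([hex(f) for f in split_address(0x12345678)])
# 16-Kbyte cache (IntelDX4 processor): 4 offset bits, 8 index bits.
print([hex(f) for f in split_address(0x12345678, cache_bytes=16 * 1024)])
```

With the default 8-Kbyte geometry, the four lines that share an index are the candidates that must be checked for a matching tag.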
Read requests are generated either by program flow (data request) or by an instruction prefetch (code
request). The great majority of these requests are satisfied by the on-chip cache.
If a cache miss occurs, however, an external bus request is generated. For reads to non-cacheable
areas of memory, the read proceeds as a normal bus read. If the read request is to a cacheable
portion of memory, the CPU initiates a cache line fill. Cache line fills require
additional bus cycles to read the remainder of the 16-byte line into the CPU.
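The three outcomes of a read request can be summarized in a short sketch that counts the external bus cycles each one needs. The function name `external_read_cycles` is hypothetical, cacheability is taken as a given input, and the single cycle for a non-cacheable read assumes one aligned access no wider than the bus.

```python
def external_read_cycles(hit, cacheable, line_bytes=16, bus_bytes=4):
    """Return the number of external bus cycles a read request generates."""
    if hit:
        return 0                      # satisfied by the on-chip cache
    if not cacheable:
        return 1                      # assumption: one aligned, bus-width read
    return line_bytes // bus_bytes    # line fill: the whole 16-byte line is read


print(external_read_cycles(hit=True, cacheable=True))    # 0
print(external_read_cycles(hit=False, cacheable=False))  # 1
print(external_read_cycles(hit=False, cacheable=True))   # 4
```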
Cache line size can affect system performance. If the line size is too large, the number of
lines that can fit in the cache is reduced. In addition, as the line length increases, the latency
for the external memory system to fill a cache line increases, reducing overall performance.
The Intel486 processor bus is optimized for a line size of 16 bytes. Because the Intel486
processor can access four bytes in each bus cycle and cache lines are 16 bytes long, four bus
cycles are necessary to fill a cache line. To reduce the latency of reading cache lines, the CPU uses
burst cycles. During burst cycles, four bytes of data can be read into the CPU every clock. With
burst cycles, a 16-byte cache line can be read into the CPU in as few as five clock
cycles. Static column DRAMs can be used to support burst cycles to the CPU.
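The five-clock figure follows from simple arithmetic, sketched below. The function name `line_fill_clocks` and its parameters are hypothetical. The two-clock cost of the first transfer is an assumption chosen to be consistent with the five-clock total stated above, and the non-burst comparison assumes each separate bus cycle also takes two clocks with no wait states.

```python
def line_fill_clocks(line_bytes=16, bus_bytes=4, first_clocks=2, next_clocks=1):
    """Return the clocks needed to read one cache line over the bus."""
    transfers = line_bytes // bus_bytes           # 16 / 4 = 4 transfers per line
    return first_clocks + (transfers - 1) * next_clocks


print(line_fill_clocks())                # 5: burst fill, 2 + 1 + 1 + 1
print(line_fill_clocks(next_clocks=2))   # 8: four separate two-clock cycles (assumed)
```

Under these assumptions, bursting saves three clocks on every line fill; slower memory would add wait states to both figures.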
During writes, the main memory update method utilized by the Intel486 processor (except for the
IntelDX4 processor) is the write-through policy. All writes from the Intel486 processor initiate