ARM Cortex-M3 Technical Reference Manual page 42

R2p0

Introduction
1.5.1 Zero waitstate
1.5.2 Zero waitstate, registered fetch interface (ICODE)
1.5.3 One wait state flash
1.5.4 One wait state flash, registered fetch interface (ICODE)
The following scenarios show how you can use branch forwarding and the BRCHSTAT
control to get the best performance from your memory system. The scenarios focus on
the ideal Harvard setup, where instructions are fetched from ICODE, literals are loaded
from DCODE (unified to ICODE), and stack, heap, and application data are accessed
over SYSTEM:
Zero waitstate
Zero waitstate, registered fetch interface (ICODE)
One wait state flash
One wait state flash, registered fetch interface (ICODE)
Two wait states flash on page 1-17.
1.5.1 Zero waitstate
Branch prediction provides a performance gain of approximately 10% over not having
the feature and, except in extreme cases, the processor has all the benefits of 100%
branch prediction with no penalty from branch speculation.
1.5.2 Zero waitstate, registered fetch interface (ICODE)
Branch forwarding results in more aggressive timing on the ICODE interface. If this bus
is a critical path in the system, the ICODE interface might be registered. To avoid the
approximately 25% penalty of adding a wait state, you can add a circuit that acts as a
single-entry prefetcher.
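As a rough illustration of why a single-entry prefetcher helps, the following Python sketch counts fetch cycles for a stream of word addresses in a toy model. The cycle costs, the function name fetch_cycles, and the address stream are assumptions made for illustration. They are not taken from this manual, and the model is not meant to reproduce the approximately 25% figure, which depends on a real workload.

```python
def fetch_cycles(addresses, registered=False, prefetcher=False):
    """Count cycles to fetch a stream of word addresses in a toy model.

    Assumptions (not from the manual): each fetch takes 1 cycle, a registered
    interface adds 1 cycle per fetch, and a single-entry prefetcher hides that
    extra cycle whenever the requested address equals the one it prefetched
    (the previous address + 4).
    """
    cycles = 0
    prefetched = None  # address currently held by the single-entry prefetcher
    for addr in addresses:
        cycles += 1  # base fetch cycle
        if registered:
            hit = prefetcher and addr == prefetched
            if not hit:
                cycles += 1  # pay for the register stage on a miss
        prefetched = addr + 4  # the prefetcher always requests the next word
    return cycles


# Eight sequential words, then a taken branch to 0x100 and eight more words.
stream = [4 * i for i in range(8)] + [0x100 + 4 * i for i in range(8)]

baseline = fetch_cycles(stream)
registered_only = fetch_cycles(stream, registered=True)
with_prefetcher = fetch_cycles(stream, registered=True, prefetcher=True)
print(baseline, registered_only, with_prefetcher)  # prints: 16 32 18
```

In this model the prefetcher misses only on the first fetch and on the branch target, so sequential code runs at nearly the unregistered rate.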
1.5.3 One wait state flash
Adding wait states to the flash impacts the performance of any core. You can use a cache
to lessen this penalty, but this has a dramatic effect on determinism and silicon area. A
line prefetcher with two line entries can provide performance comparable to a cache
using far fewer gates. 128 bits is a common prefetch width for ARM7 targets because of
the 32-bit instruction set. The processor benefits from a mixed 16/32-bit Thumb
instruction set. This means that a 64-bit prefetch width provides benefits comparable to
those of a 128-bit interface.
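The prefetch-width comparison above can be checked with simple arithmetic. The Python sketch below counts how many whole instructions fit in one prefetch line. The helper name instructions_per_line is hypothetical, and the all-16-bit and all-32-bit cases are only the two extremes of a mixed instruction stream.

```python
def instructions_per_line(line_bits, instr_bits):
    """Return how many whole instructions of instr_bits fit in one prefetch line."""
    return line_bits // instr_bits


# Fixed 32-bit instructions with a 128-bit prefetch line.
arm_128 = instructions_per_line(128, 32)

# Mixed 16/32-bit instructions with a 64-bit line: the two extremes.
thumb_64_min = instructions_per_line(64, 32)  # every instruction is 32-bit
thumb_64_max = instructions_per_line(64, 16)  # every instruction is 16-bit

print(arm_128, thumb_64_min, thumb_64_max)  # prints: 4 2 4
```

A 64-bit line therefore holds between two and four instructions of a mixed 16/32-bit stream, against four instructions for a 128-bit line of fixed 32-bit instructions. The closer the code is to all 16-bit encodings, the closer the two widths come.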
1.5.4 One wait state flash, registered fetch interface (ICODE)
If the ICODE interface must be registered, you can confine the cost of mispredictions to
the slave side of the prefetch controller. As in the zero wait state case, the core still
loses the opportunity to issue a fetch queue request on the ICODE interface. However,
the trailing registered BRCHSTAT[3] status of the conditional execution can mask the
external mispredict on the output of the controller's registered system interface, so
that it appears as an idle cycle.
Copyright © 2005-2008 ARM Limited. All rights reserved.
Non-Confidential
ARM DDI 0337G
Unrestricted Access
