Performance Optimization; Instruction Optimizations; Load / Store Execution Model; Compare Operations - Intel i960 Jx Developer's Manual

INSTRUCTION SET OVERVIEW

5.3 PERFORMANCE OPTIMIZATION

Performance optimizations are categorized into two sections: instruction optimizations and
miscellaneous optimizations.

5.3.1 Instruction Optimizations

The instruction optimizations are broken down by instruction classification.

5.3.1.1 Load / Store Execution Model

Because the i960 Jx processor has a 32-bit external data bus, multiple word accesses require
multiple cycles. The processor uses microcode to sequence the multi-word accesses. Because the
microcode can ensure that aligned multi-words are bursted together on the external bus, software
should not substitute multiple single-word instructions for one multi-word instruction for data that
is not likely to be in cache. For example, an ldq provides better bus performance than four ld
instructions.
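
The bus argument above can be made concrete with a small cost model. This is only a sketch under stated assumptions, not the processor's documented timing: it assumes every bus access pays one address cycle followed by one data cycle per 32-bit word, with zero wait states. The names burst_clocks and single_word_clocks and both constants are hypothetical.

```c
#include <assert.h>

/* Assumed, simplified bus timing (not taken from the manual):
 * one address cycle per access, one data cycle per 32-bit word,
 * zero wait states. */
enum { ADDRESS_CYCLES = 1, DATA_CYCLES_PER_WORD = 1 };

/* Clocks for one aligned burst access of `words` words (1 to 4). */
int burst_clocks(int words)
{
    assert(words >= 1 && words <= 4);
    return ADDRESS_CYCLES + words * DATA_CYCLES_PER_WORD;
}

/* Clocks for `count` separate single-word accesses; each access
 * pays its own address cycle. */
int single_word_clocks(int count)
{
    assert(count >= 0);
    return count * burst_clocks(1);
}
```

Under these assumptions, burst_clocks(4) models one quad-word load and single_word_clocks(4) models four word loads. The burst pays the address cycle once, while the four single accesses pay it four times, which is the effect the paragraph describes.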
Once a load is issued, the processor attempts to execute other instructions while the load is
outstanding. It is important to note that when the load misses the data cache, the processor does
not stall the issuing of subsequent instructions (other than stores) that do not depend on the load.
Software should avoid following a load with an instruction that depends on the result of the load.
For a load that hits the data cache, there is a one-cycle stall when the instruction immediately after
the load requires the data. When the load fails to hit the data cache, the instruction depending on
the load stalls until the outstanding load request is resolved.
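
The cache-hit rule above (a one-cycle stall when the instruction immediately after a load needs the loaded data) can be sketched as a small counting model. This is an illustration only: the struct insn record, its register numbering, and the function names are hypothetical, and the model covers only loads that hit the data cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction record for this sketch. Registers are
 * small integers; -1 means "operand not used". */
struct insn {
    bool is_load;  /* true for a load instruction */
    int dest;      /* register written */
    int src1;      /* first register read, or -1 */
    int src2;      /* second register read, or -1 */
};

/* True when `in` reads register `reg`. */
bool reads_reg(const struct insn *in, int reg)
{
    return reg >= 0 && (in->src1 == reg || in->src2 == reg);
}

/* Count one-cycle stalls in a sequence where every load hits the
 * data cache: a stall occurs only when the instruction directly
 * after a load reads that load's destination register. */
int count_hit_stalls(const struct insn *seq, int n)
{
    assert(n >= 0);
    int stalls = 0;
    for (int i = 1; i < n; i++) {
        if (seq[i - 1].is_load && reads_reg(&seq[i], seq[i - 1].dest))
            stalls++;
    }
    return stalls;
}
```

In this model, a load followed directly by its consumer counts one stall, while placing one independent instruction between the two counts none. That matches the advice to avoid following a load with an instruction that depends on its result.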
Multiple, back-to-back load instructions do not stall the processor until the bus queue becomes
full.
The processor delays issuing a store instruction until all previously-issued load instructions
complete. This happens regardless of whether the store is dependent on the load. This ordering
between loads and stores ensures that the return data from a previous cache-read miss does not
overwrite the cache line updated by a subsequent store.

5.3.1.2 Compare Operations

Byte and short word data is more efficiently compared using the new byte and short compare
instructions (cmpob, cmpib, cmpos, cmpis), rather than shifting the data and using a word
compare instruction.
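
To show what the shift-and-compare sequence has to accomplish, the sketch below expresses both forms in C for the unsigned (ordinal) byte case. It assumes a byte compare looks only at the low-order 8 bits of each operand; the function names are hypothetical, and the C code illustrates the semantics rather than the instructions a compiler would emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word-compare workaround: shift the low byte into the top of the
 * word so the upper 24 bits cannot affect an unsigned word compare. */
bool byte_less_via_shift(uint32_t a, uint32_t b)
{
    return (a << 24) < (b << 24);
}

/* What a byte compare expresses directly: only the low-order
 * 8 bits of each operand take part (assumed semantics). */
bool byte_less_direct(uint32_t a, uint32_t b)
{
    return (uint8_t)a < (uint8_t)b;
}

/* Exhaustively confirm both forms agree for every pair of low
 * bytes, with unrelated data in the upper bits. */
void check_byte_compare_equivalence(void)
{
    for (uint32_t x = 0; x < 256; x++) {
        for (uint32_t y = 0; y < 256; y++) {
            uint32_t a = 0xABCDEF00u | x;
            uint32_t b = 0x12345600u | y;
            assert(byte_less_via_shift(a, b) == byte_less_direct(a, b));
        }
    }
}
```

The shifted form needs two extra operations before the word compare, which is the overhead the byte and short compare instructions avoid.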
