Instruction Latency Summary - IBM PowerPC 604 User Manual

the count or link registers, branch instructions that depend on the condition register, and CR logical instructions can be dispatched to the MCIU. The bit is cleared when the mtctr, mtcrf, or mtlr instruction that set the bit is executed.
Because mtcrf instructions that update a single field do not require the synchronization that other mtcrf instructions do, and because two such single-field instructions can execute in parallel, it is typically more efficient to use multiple mtcrf instructions that update only one field apiece than to use one mtcrf instruction that updates multiple fields.
A rule of thumb follows:
- It is always more efficient to use two mtcrf instructions that update only one field apiece than to use one mtcrf instruction that updates two fields.
- It is almost always more efficient to use three or four mtcrf instructions that update only one field apiece than to use one mtcrf instruction that updates three fields.
- It is often more efficient to use more than four mtcrf instructions that update only one field than to use one mtcrf instruction that updates four fields.
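As an illustration of the rule of thumb above, the following minimal C sketch splits a multi-field mtcrf field mask (CRM) into single-field masks, one per instruction. The helper name `split_crm` is hypothetical and not part of the manual; the sketch assumes only the architected CRM encoding, in which the most significant mask bit (0x80) selects CR0 and the least significant (0x01) selects CR7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): splits an 8-bit mtcrf field
 * mask (CRM) into single-field masks. Bit 0x80 selects CR0 and bit 0x01
 * selects CR7. Returns the number of single-field masks written to out,
 * ordered from CR0 toward CR7. */
size_t split_crm(uint8_t crm, uint8_t out[8])
{
    size_t n = 0;

    assert(out != NULL);
    for (int field = 0; field < 8; field++) {
        uint8_t bit = (uint8_t)(0x80u >> field); /* mask for this CR field */
        if (crm & bit)
            out[n++] = bit;
    }
    return n;
}
```

For example, a mask of 0xC0 (CR0 and CR1) yields the two masks 0x80 and 0x40, which would correspond to issuing `mtcrf 0x80,rS` and `mtcrf 0x40,rS` in place of a single `mtcrf 0xC0,rS`.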
• Minimize branching.
The 604 supports dynamic branch prediction and other mechanisms that reduce the impact of branching; nevertheless, changing control flow in a program is relatively expensive, in that fullest advantage cannot be taken of resources that can improve throughput, such as superscalar instruction dispatch and execution. In some cases, branches can be minimized by simply rewriting an algorithm. In other cases, special PowerPC instructions, such as fsel, can be used to eliminate a conditional branch altogether.
• Note that the fsel instruction is optional to the PowerPC architecture and may not be implemented on all PowerPC implementations, so use of this instruction to improve performance in the 604 should be weighed against portability considerations.
6.7 Instruction Latency Summary
Table 6-2 summarizes the execution cycle time of each instruction. Note that the latencies
themselves provide limited insight as to the actual behavior of an instruction. The following
list summarizes some aspects of instruction behavior:
• For a store operation, availability means data is visible to the following loads from the same address. Misaligned load or store operations require one additional cycle, assuming cache hits.
- Floating-point stores that require denormalization take an additional cycle for each bit of shifting that is needed, up to a maximum of 23.
- Store multiple instructions are taken in pairs and take one additional cycle if an odd number of registers is stored.
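The adjustments in the list above can be sketched as small C helpers that return only the additional cycles, not the base latency (which comes from Table 6-2). The function names are hypothetical, and clamping the shift count at 23 is an interpretation of "up to a maximum of 23"; treat this as a rough model under those assumptions rather than a cycle-accurate one.

```c
#include <assert.h>

/* Misaligned load or store: one additional cycle, assuming cache hits. */
unsigned misaligned_extra_cycles(int is_misaligned)
{
    return is_misaligned ? 1u : 0u;
}

/* Floating-point store that requires denormalization: one additional
 * cycle per bit of shifting, clamped here at the stated maximum of 23. */
unsigned fp_store_denorm_extra_cycles(unsigned shift_bits)
{
    return shift_bits > 23u ? 23u : shift_bits;
}

/* Store multiple: registers are handled in pairs, so an odd register
 * count costs one additional cycle. A store multiple word can cover
 * between 1 and 32 registers. */
unsigned store_multiple_odd_extra_cycles(unsigned reg_count)
{
    assert(reg_count >= 1u && reg_count <= 32u);
    return reg_count % 2u;
}
```

For instance, a store multiple of 5 registers incurs the one-cycle odd-count penalty, while one of 6 registers does not.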
Chapter 6. Instruction Timing
6-45
