Performance Tips; Dual Execution; Vectorization - Nintendo Ultra64 Programmer's Manual

Advanced Information

Performance Tips

Assembly language optimizations and vector processing tricks are beyond the
scope of this document; however, it is worthwhile to mention a few issues
specific to the RSP architecture.

Dual Execution

The RSP executes up to one Scalar Unit (SU) instruction and one Vector Unit
(VU) instruction per clock cycle; the most efficient RSP code exploits this.
Spreading loads, loop counting, and other SU "bookkeeping" code among
VU computations can greatly accelerate sections of code.
Of course, this is not always possible; there is not always useful work that
can be done in both units.
Interleaving SU and VU code inhibits code readability somewhat; a
consistent coding style helps improve the chance of finding a bug that would
otherwise be hidden in an unreadable section of code.
This optimization technique is best left for last: as code is reorganized
during development and testing, the dual-issue pattern will change.
Hint: "Keeping both halves of the RSP busy" is going to be one of your
keys to maximum performance.

Vectorization

The computational power of the RSP lies in the Vector Unit (VU). Choice of
algorithm and data organization are the fundamental design decisions for
optimal RSP programs.
A vector architecture, like the VU of the RSP, is a SIMD (Single-Instruction,
Multiple-Data) machine, meaning that one instruction may operate on
several pieces of data.
A review of the literature in computer architecture or compiler design makes
it apparent that certain kinds of programming constructs are especially good
(or bad) on a vector architecture: