Loop Unrolling; Program Flow Of Control - Nintendo Ultra64 Programmer's Manual

Advanced Information
}
In this fictitious example, we have theoretically improved our program's
speed by (num_pts - 4)*(time to do the translation). A big
improvement! This technique is commonly used to help vectorizing compilers
"recognize" loops that can be vectorized: the compiler breaks the loop up
into multiple vector operations, each as wide as the number of vector
elements.
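A minimal C sketch of loop inversion, assuming a hypothetical set of `num_pts` points with four components each, stored component-major. The names `translate_original`, `translate_inverted`, `NUM_COMP` and `MAX_PTS` are illustrative, not the manual's own code. Both functions compute the same result; only the nesting order differs.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COMP 4   /* components per point (hypothetical) */
#define MAX_PTS  64  /* capacity of each component array (hypothetical) */

/* Original nesting: the short 4-trip loop is innermost, so each inner
 * loop offers only 4 operations to vectorize, and its start-up and
 * overhead are paid num_pts times. */
void translate_original(float pts[NUM_COMP][MAX_PTS], size_t num_pts,
                        const float offset[NUM_COMP])
{
    assert(num_pts <= MAX_PTS);
    for (size_t i = 0; i < num_pts; i++)
        for (size_t c = 0; c < NUM_COMP; c++)
            pts[c][i] += offset[c];
}

/* Inverted nesting: the long loop is innermost, giving num_pts
 * independent additions per component that can be split into
 * vector-length chunks; the inner loop is started only NUM_COMP times. */
void translate_inverted(float pts[NUM_COMP][MAX_PTS], size_t num_pts,
                        const float offset[NUM_COMP])
{
    assert(num_pts <= MAX_PTS);
    for (size_t c = 0; c < NUM_COMP; c++)
        for (size_t i = 0; i < num_pts; i++)
            pts[c][i] += offset[c];
}
```

In the inverted form, each inner loop runs `num_pts` times over contiguous memory, which is the shape a vectorizer can split into vector-length chunks.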
Loop inversion is not free. By changing which loop is vectorized, we change
the start-up costs associated with the loop. In microcode terms, the
organization of the data, the use of registers, and the "overhead" of this
code fragment will all change.
An additional consideration for our implementation is that we know the
size and characteristics of the vector unit. While the above code fragment
might be better code for a Cray machine with a vectorizing compiler and
unknown CPU resources, on the RSP we must vectorize the loop by hand,
breaking up the iterations into 8 elements at a time (the size of our vector
unit). Careful evaluation of each loop should include trying to make full use
of the vector elements (keeping them filled) while avoiding unnecessary loop
start-up and per-iteration loop overhead.
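To make the 8-at-a-time idea concrete, here is a hedged C sketch of strip-mining: an array operation is split into strips of 8 elements, each handled by one 8-lane "vector" operation, with the unused lanes of the final strip padded. The `vreg` type and the functions `vadd`, `add_arrays_strip_mined` and `strips_needed` are hypothetical stand-ins in portable C. They model the RSP's 8-element vector unit but are not RSP microcode, and the element type is simplified to `int` (real RSP vector elements are 16 bits wide).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VLEN 8  /* elements per RSP vector operation */

/* Hypothetical model of one vector register. Element type simplified
 * to int; real RSP vector elements are 16-bit. */
typedef struct { int e[VLEN]; } vreg;

/* One 8-wide lane-wise add. */
static vreg vadd(vreg a, vreg b)
{
    vreg r;
    for (int k = 0; k < VLEN; k++)
        r.e[k] = a.e[k] + b.e[k];
    return r;
}

/* Number of 8-element strips (vector operations) needed for n elements. */
size_t strips_needed(size_t n)
{
    return (n + VLEN - 1) / VLEN;
}

/* out[i] = a[i] + b[i], processed 8 elements at a time. Lanes past the
 * end of the data in the last strip are padded with 0 and their results
 * are not written back. */
void add_arrays_strip_mined(const int *a, const int *b, int *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t base = 0; base < n; base += VLEN) {
        size_t live = (n - base < VLEN) ? n - base : VLEN; /* lanes in use */
        vreg va = {{0}}, vb = {{0}};
        memcpy(va.e, a + base, live * sizeof(int));    /* "load"  */
        memcpy(vb.e, b + base, live * sizeof(int));
        vreg vr = vadd(va, vb);                        /* one vector op */
        memcpy(out + base, vr.e, live * sizeof(int));  /* "store" */
    }
}
```

Here `strips_needed(17)` is 3, so 17 elements cost as many vector operations as 24 would; sizing data to multiples of 8 where possible is one way to keep the elements filled.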

Loop Unrolling

Unrolling a loop or section of code consumes precious IMEM space and
registers, but it can potentially double the speed of a section of code that
has many data dependencies. Unrolling a loop is the simplest way to perform
useful work during pipeline delays: instructions from one copy of the loop
body can be scheduled into the stalls where another copy would otherwise
wait for its results.
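A minimal C sketch of unrolling by two, using a dot product as a hypothetical example of a loop whose iterations are chained through a data dependency (the accumulator). The unrolled version keeps two independent accumulators so the two chains can overlap. On the RSP the same idea would be applied to hand-scheduled microcode, which this portable C does not model; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: each iteration's add depends on the previous value of
 * acc, so a pipelined unit must wait for one result before the next. */
long long dot_rolled(const int *x, const int *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    long long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long long)x[i] * y[i];
    return acc;
}

/* Unrolled by 2 with two independent accumulators: the acc0 and acc1
 * chains do not depend on each other, so their operations can be
 * interleaved to fill each other's pipeline delays. The cost is a
 * larger loop body and an extra register. */
long long dot_unrolled2(const int *x, const int *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    long long acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += (long long)x[i]     * y[i];
        acc1 += (long long)x[i + 1] * y[i + 1];
    }
    if (i < n)                  /* leftover element when n is odd */
        acc0 += (long long)x[i] * y[i];
    return acc0 + acc1;         /* combine the two partial sums */
}
```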

Program Flow of Control

Since program flow constructs like conditional branches interfere with
vectorization, it is often more efficient to do some "extra" work (which
vectorizes) and decide later which result to use, rather than to write a more
complex program that uses conditional execution to minimize computation.
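The "compute both, then select" approach can be sketched in C as follows. This is a hypothetical per-lane absolute-difference example over 8 elements (matching the RSP's vector width), not code from the manual; `absdiff_branchy`, `absdiff_select` and `VLEN` are illustrative names. The branchy version decides per element, while the branch-free version computes both candidate results for every lane and merges them with a comparison mask.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 8  /* elements per RSP vector operation */

/* Branchy version: a conditional per element, which does not map onto
 * a single uniform 8-wide operation. */
void absdiff_branchy(const int x[VLEN], const int y[VLEN], int out[VLEN])
{
    assert(x != NULL && y != NULL && out != NULL);
    for (int k = 0; k < VLEN; k++) {
        if (x[k] >= y[k])
            out[k] = x[k] - y[k];
        else
            out[k] = y[k] - x[k];
    }
}

/* Branch-free version: compute BOTH candidates for every lane (the
 * "extra" work), build a per-lane mask from a comparison, then merge.
 * Every step is a uniform lane-wise operation. */
void absdiff_select(const int x[VLEN], const int y[VLEN], int out[VLEN])
{
    assert(x != NULL && y != NULL && out != NULL);
    int d0[VLEN], d1[VLEN], mask[VLEN];
    for (int k = 0; k < VLEN; k++) d0[k] = x[k] - y[k];      /* candidate A */
    for (int k = 0; k < VLEN; k++) d1[k] = y[k] - x[k];      /* candidate B */
    for (int k = 0; k < VLEN; k++) mask[k] = -(x[k] >= y[k]); /* -1 (all ones) or 0 */
    for (int k = 0; k < VLEN; k++)
        out[k] = (d0[k] & mask[k]) | (d1[k] & ~mask[k]);     /* merge */
}
```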
For example, in the triangle rasterization setup code, the vertex attributes
(r, g, b, a, s, t, w, z) fit nicely in vector registers. Rather than having complicated
