Multiple Register Sets; Instruction Cache; Register Scoreboarding - Intel 80960KB Manual

Embedded 32-bit microprocessor with integrated floating-point unit


perform the same function as the general-purpose

registers provided in other popular microprocessors.

The term global refers to the fact that these registers

retain their contents across procedure calls.

The local registers, on the other hand, are procedure

specific. For each procedure call, the 80960KB

allocates 16 local registers (R0 through R15). Each

local register is 32 bits wide. Any register can also be

used for single or double-precision floating-point

operations; the 80-bit floating-point registers are

provided for extended precision.

1.1.4 Multiple Register Sets

To further increase the efficiency of the register set,

multiple sets of local registers are stored on-chip

(see Figure 4). This cache holds up to four local

register frames, which means that up to three

procedure calls can be made without having to

access the procedure stack resident in memory.

Although programs may have procedure calls nested

many calls deep, a program typically oscillates back

and forth between only two to three levels. As a

result, with four stack frames in the cache, the

probability of having a free frame available on the

cache when a call is made is very high. In fact, runs

of representative C-language programs show that

80% of the calls are handled without needing to

access memory.

If four or more procedures are active and a new

procedure is called, the 80960KB moves the oldest

local register set in the stack-frame cache to a

procedure stack in memory to make room for a new

set of registers. Global register G15 is the frame

pointer (FP) to the procedure stack.
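The spill behavior described above can be illustrated with a small model. This is only a sketch: the class name `FrameCacheModel`, its counters, and the assumption that a spilled register set is reloaded when control returns to its procedure are illustrative choices, not details taken from the manual. Only the capacity of four register sets and the oldest-first spill come from the text.

```python
from collections import deque

CACHE_FRAMES = 4  # from the text: the on-chip cache holds up to four local register sets


class FrameCacheModel:
    """Toy model of the on-chip local-register (stack-frame) cache."""

    def __init__(self, capacity=CACHE_FRAMES):
        self.capacity = capacity
        self.cached = deque([0])  # call depths whose register sets are on-chip
        self.depth = 0            # current call depth (0 = outermost procedure)
        self.spills = 0           # register sets written to the procedure stack in memory
        self.fills = 0            # register sets read back from memory (assumed behavior)

    def call(self):
        """Enter a new procedure; return True if no memory access was needed."""
        free = len(self.cached) < self.capacity
        if not free:
            self.cached.popleft()  # oldest set moves to the procedure stack in memory
            self.spills += 1
        self.depth += 1
        self.cached.append(self.depth)
        return free

    def ret(self):
        """Return to the caller; return True if the caller's set was still on-chip."""
        if self.depth == 0:
            raise RuntimeError("return from outermost procedure")
        self.cached.pop()
        self.depth -= 1
        hit = bool(self.cached)
        if not hit:
            self.cached.append(self.depth)  # caller's set is reloaded from memory
            self.fills += 1
        return hit


if __name__ == "__main__":
    model = FrameCacheModel()
    # A program that oscillates between a few call levels never leaves the cache.
    for _ in range(3):
        model.call()
    for _ in range(10):
        model.ret()
        model.call()
    print("oscillating:", model.spills, "spills,", model.fills, "fills")
```

In this model the first three nested calls find a free frame, and the fourth forces the oldest register set out to memory, which matches the "four or more procedures are active" case in the text.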

Global and floating-point registers are not

exchanged on a procedure call, but retain their

contents, making them available to all procedures for

fast parameter passing.

1.1.5

Instruction Cache

To further reduce memory accesses, the 80960KB

includes a 512-byte on-chip instruction cache. The

instruction cache is based on the concept of locality

of reference; most programs are not usually

executed in a steady stream but consist of many

branches, loops and procedure calls that lead to

jumping back and forth in the same small section of

code. Thus, by maintaining a block of instructions in

cache, the number of memory references required to

read instructions into the processor is greatly

reduced.

To load the instruction cache, instructions are

fetched in 16-byte blocks; up to four instructions can

be fetched at one time. An efficient prefetch

algorithm increases the probability that an instruction

will already be in the cache when it is needed.

Code for small loops often fits entirely within the

cache, leading to a great increase in processing

speed since further memory references might not be

necessary until the program exits the loop. Similarly,

when calling short procedures, the code for the

calling procedure is likely to remain in the cache so it

will be there on the procedure's return.
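To make the small-loop case concrete, the sketch below counts how many 16-byte block fetches a loop would cause. The 512-byte size and 16-byte block come from the text; the direct-mapped organization, the 4-byte instruction size, and the function name `count_block_fetches` are assumptions made for illustration, and the prefetch algorithm is not modeled.

```python
CACHE_BYTES = 512   # from the text: 512-byte on-chip instruction cache
BLOCK_BYTES = 16    # from the text: instructions are fetched in 16-byte blocks
NUM_LINES = CACHE_BYTES // BLOCK_BYTES  # 32 lines; direct mapping is an assumption
INSTR_BYTES = 4     # assumption: one-word instructions, four per block


def count_block_fetches(start_addr, loop_bytes, iterations):
    """Return how many 16-byte blocks are fetched from memory while running a loop."""
    lines = [None] * NUM_LINES  # block number currently held in each cache line
    fetches = 0
    for _ in range(iterations):
        for addr in range(start_addr, start_addr + loop_bytes, INSTR_BYTES):
            block = addr // BLOCK_BYTES
            index = block % NUM_LINES
            if lines[index] != block:  # miss: bring the whole block in from memory
                lines[index] = block
                fetches += 1
    return fetches


if __name__ == "__main__":
    # A 256-byte loop fits in the cache: memory is touched on the first pass only.
    print(count_block_fetches(0x1000, 256, 100))
    # A 1024-byte loop does not fit, so blocks keep evicting each other.
    print(count_block_fetches(0x1000, 1024, 100))
```

Under these assumptions a loop that fits in the cache costs one fetch per block regardless of the iteration count, while a loop twice the cache size misses on every block in every iteration.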

1.1.6 Register Scoreboarding

The instruction decoder is optimized in several ways.

One optimization method is the ability to overlap

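instructions by using register scoreboarding.

Register scoreboarding occurs when a LOAD moves

a variable from memory into a register. When the

instruction initiates, a scoreboard bit on the target

register is set. When the register is loaded, the bit is

reset. In between, any reference to the register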

contents is accompanied by a test of the scoreboard

bit to ensure that the load has completed before

processing continues. Since the processor does not

need to wait for the LOAD to complete, it can

execute additional instructions placed between the

LOAD and the instruction that uses the register

contents, as shown in the following example:


In essence, the two unrelated instructions between

LOAD and ADD are executed "for free" (i.e., take no

apparent time to execute) because they are

executed while the register is being loaded. Up to

three load instructions can be pending at one time

with three corresponding scoreboard bits set. By

exploiting this feature, system programmers and

compiler writers have a useful tool for optimizing

execution speed.

Register scoreboarding example:

    ld data_2, r4
    ld data_2, r5
    Unrelated instruction
    Unrelated instruction
    add r4, r5, r6
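The overlap can also be illustrated with a simple cycle-count model. This is a sketch under stated assumptions: one instruction issues per cycle, a load takes `LOAD_LATENCY` cycles to write its register (the value 4 is arbitrary, not a figure from the manual), and the function `run` and its tuple format are hypothetical. Only the limit of three pending loads and the set/test/reset behavior of the scoreboard bit come from the text.

```python
LOAD_LATENCY = 4  # assumption: cycles between issuing a load and the register being written
MAX_PENDING = 3   # from the text: up to three loads can be pending at one time


def run(program, load_latency=LOAD_LATENCY):
    """Return the cycle count for a list of (opcode, source_regs, dest_reg) tuples."""
    scoreboard = {}  # register -> cycle at which its pending load completes
    cycle = 0        # cycle in which the next instruction can issue

    def retire():
        # Reset the scoreboard bit of every load that has completed by `cycle`.
        for reg in [r for r, done in scoreboard.items() if done <= cycle]:
            del scoreboard[reg]

    for opcode, sources, dest in program:
        retire()
        if opcode == "ld" and len(scoreboard) >= MAX_PENDING:
            cycle = min(scoreboard.values())  # wait for a pending load to finish
            retire()
        # Test the scoreboard bit of every register this instruction touches.
        busy = [scoreboard[r] for r in (*sources, dest) if r in scoreboard]
        if busy:
            cycle = max(busy)  # stall until the last needed load has landed
            retire()
        if opcode == "ld":
            scoreboard[dest] = cycle + load_latency  # set the bit on the target register
        cycle += 1
    return max([cycle, *scoreboard.values()])


loads = [("ld", (), "r4"), ("ld", (), "r5")]
unrelated = [("op", ("g0",), "g1"), ("op", ("g2",), "g3")]
add = [("add", ("r4", "r5"), "r6")]

if __name__ == "__main__":
    print("with unrelated work:", run(loads + unrelated + add))
    print("loads then add only:", run(loads + add))
```

With the assumed latency, both sequences finish in the same number of cycles in this model: the two unrelated instructions fill cycles the add would otherwise spend waiting on the scoreboard bits.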
