Prefetch Considerations; A.4.4 Prefetch Considerations; Prefetch Distances; Prefetch Loop Scheduling - Intel XScale Core Developer's Manual


A.4.4

Prefetch Considerations

The Intel XScale® core has a true prefetch load instruction (PLD). The purpose of this instruction
is to preload data into the data and mini-data caches. Data prefetching hides memory transfer
latency while the processor continues to execute instructions. Prefetch matters to both compiler
and assembly code because judicious use of the prefetch instruction can greatly improve the
throughput of the core. Data prefetch can be applied not only to loops but also to any data
references within a block of code. Prefetch also applies to data writing when the memory type is
enabled as write allocate.

The Intel XScale® core prefetch load instruction is a true prefetch instruction because the load
destination is the data or mini-data cache and not a register. Compilers for processors that have
data caches but do not support prefetch sometimes use a load instruction to preload the data cache.
This technique has two disadvantages: it uses a register to load the data, and it requires
additional registers for subsequent preloads, which increases register pressure. By contrast, the
prefetch can be used to reduce register pressure instead of increasing it.

The prefetch load is a hint instruction and does not guarantee that the data will be loaded.
Whenever the load would cause a fault or a table walk, then the processor will ignore the prefetch
instruction, the fault or table walk, and continue processing the next instruction. This is particularly
advantageous in the case where a linked list or recursive data structure is terminated by a NULL
pointer. Prefetching the NULL pointer will not fault program flow.
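The linked-list case can be sketched as follows. This is an illustrative example under stated assumptions: `node_t` and `sum_list` are hypothetical names, and `__builtin_prefetch` is the GCC/Clang builtin, assumed to map to PLD on a suitable ARM target. The prefetch of `p->next` is issued unconditionally, including on the last node where `p->next` is NULL, relying on the hint being ignored rather than faulting.

```c
#include <stddef.h>

/* Hypothetical singly linked list node, terminated by a NULL next pointer. */
typedef struct node {
    struct node *next;
    int value;
} node_t;

/* Sum the list while prefetching the following node on each step. */
long sum_list(const node_t *head)
{
    long sum = 0;
    for (const node_t *p = head; p != NULL; p = p->next) {
        /* p->next may be NULL on the last node; no NULL check is made
           because the prefetch is only a hint. */
        __builtin_prefetch(p->next);
        sum += p->value;
    }
    return sum;
}
```

Note that `p` itself is still checked against NULL by the loop condition: the address expression `p->next` must be valid to evaluate, even though the address it yields need not be.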

A.4.4.1.

Prefetch Distances

Scheduling the prefetch instruction requires an understanding of the system latency times and the
system resources that affect when to use the prefetch instruction. Refer to the Intel XScale® core
implementation option section of the ASSP architecture specification for more information.

A.4.4.2.

Prefetch Loop Scheduling

When adding prefetch to a loop that operates on arrays, it may be advantageous to prefetch ahead
one, two, or more iterations. The data for future iterations is located in memory at a fixed offset
from the data for the current iteration, which makes it easy to predict where to fetch the data. The
number of iterations to prefetch ahead is referred to as the prefetch scheduling distance. Refer to
the Intel XScale® core implementation option section of the ASSP architecture specification for
more information.
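A loop that prefetches a fixed number of iterations ahead might look like the sketch below. Everything named here is an assumption for illustration: `row_t`, `sum_rows`, and the constants are hypothetical, `PREFETCH_DISTANCE` is a placeholder rather than a recommended value, each row is sized to an assumed 32-byte cache line, and `__builtin_prefetch` is the GCC/Clang builtin assumed to map to PLD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROW_WORDS 8          /* assumed: one 32-byte cache line of 4-byte words */
#define PREFETCH_DISTANCE 2  /* placeholder scheduling distance, in iterations */

/* Hypothetical array element; one row is processed per loop iteration. */
typedef struct {
    uint32_t w[ROW_WORDS];
} row_t;

/* Sum all words, prefetching the row PREFETCH_DISTANCE iterations ahead. */
uint32_t sum_rows(const row_t *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The future row sits at a fixed offset from the current one.
           The bound check keeps the address expression inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&rows[i + PREFETCH_DISTANCE]);
        for (size_t j = 0; j < ROW_WORDS; j++)
            sum += rows[i].w[j];
    }
    return sum;
}
```

The first `PREFETCH_DISTANCE` rows are not covered by any prefetch in this sketch; a real loop might issue those prefetches before entering the loop.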

A.4.4.3.

Prefetch Loop Limitations

It is not always advantageous to add prefetch to a loop. Loop characteristics that limit the value
of prefetch are discussed below.

A.4.4.4.

Compute vs. Data Bus Bound

At one extreme, a loop that is data bus bound will not benefit from prefetch because all the
system resources to transfer data are quickly allocated and there are no instructions that can
profitably be executed. At the other extreme, compute bound loops allow complete hiding
of all data transfer latencies.


Intel XScale® Core Developer's Manual, Optimization Guide
January, 2004
