Prefetch Considerations - Intel IXP45X Developer's Manual

Intel XScale® Processor - Intel® IXP45X and Intel® IXP46X Product Line of Network Processors
bank access different pages. The memory page change adds three to four bus clock cycles to memory latency. This added delay extends the prefetch distance correspondingly, making it more difficult to hide memory access latencies. This type of thrashing can be resolved by placing the conflicting data structures into different memory banks or by paralleling the data structures so that the data resides within the same memory page. It is also extremely important to ensure that instruction and data sections are in different memory banks; otherwise, they will continually thrash the memory page selection.
3.10.4.4 Prefetch Considerations

The IXP45X/IXP46X network processors have a true prefetch load instruction (PLD). Its purpose is to preload data into the data and mini-data caches. Data prefetching hides memory transfer latency while the processor continues to execute instructions. Prefetch matters for both compiler-generated and hand-written assembly code, because judicious use of the prefetch instruction can greatly improve the throughput of the IXP45X/IXP46X network processors. Data prefetch can be applied not only to loops but also to any data references within a block of code. Prefetch also applies to data writes when the memory region is configured as write-allocate.
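As a rough illustration, the loop below issues one prefetch per cache line, a few lines ahead of the data currently being summed. It is a sketch, assuming a GCC- or Clang-compatible compiler: `__builtin_prefetch` is a compiler builtin, and when GCC targets an ARMv5TE core such as XScale it typically lowers it to PLD. The 32-byte line size matches the XScale data cache; `WORDS_PER_LINE`, `PREFETCH_LINES`, and `sum_with_prefetch` are illustrative names, and the distance of four lines is an untuned guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values (assumptions for this sketch):
 * 32-byte cache lines hold 8 x uint32_t; prefetch 4 lines ahead. */
#define WORDS_PER_LINE 8u
#define PREFETCH_LINES 4u

uint32_t sum_with_prefetch(const uint32_t *a, size_t n)
{
    uint32_t sum = 0;
    size_t ahead = (size_t)WORDS_PER_LINE * PREFETCH_LINES;

    for (size_t i = 0; i < n; i++) {
        /* One prefetch per cache line, issued ahead of use. The bounds
         * check keeps the computed address inside the array. */
        if ((i % WORDS_PER_LINE) == 0 && i + ahead < n)
            __builtin_prefetch(&a[i + ahead], 0 /* read */, 3 /* keep cached */);
        sum += a[i];
    }
    return sum;
}
```

Prefetching once per line rather than once per element avoids issuing redundant hints for data that is already on its way into the cache.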
The prefetch load instruction of the IXP45X/IXP46X network processors is a true prefetch instruction because the load destination is the data or mini-data cache, not a register. Compilers for processors that have data caches but do not support prefetch sometimes use a load instruction to preload the data cache. This technique has the disadvantage of consuming a destination register for each preload, so subsequent preloads require additional registers and register pressure increases. By contrast, the prefetch can be used to reduce register pressure instead of increasing it.
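The contrast can be sketched in C, assuming a GCC-compatible compiler. The helper names `touch_with_load` and `touch_with_prefetch` are hypothetical. The first warms the cache the older way, with an ordinary load whose result must occupy a register (and which would fault on a bad address); the second uses `__builtin_prefetch`, whose destination is the cache rather than a register.

```c
#include <stdint.h>

/* Load-based preload: the loaded value occupies a register. Storing it
 * to a volatile local stops the compiler from removing the dead load. */
static inline void touch_with_load(const uint32_t *p)
{
    volatile uint32_t sink = *p;
    (void)sink;
}

/* Prefetch-based preload: only a hint, no destination register is used. */
static inline void touch_with_prefetch(const uint32_t *p)
{
    __builtin_prefetch(p, 0 /* read */, 3 /* keep cached */);
}
```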
The prefetch load is a hint instruction and does not guarantee that the data will be loaded. Whenever the load would cause a fault or a table walk, the processor ignores the prefetch instruction, along with the fault or table walk, and continues with the next instruction. This is particularly advantageous when a linked list or recursive data structure is terminated by a NULL pointer: prefetching the NULL pointer does not cause a fault or disrupt program flow.
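A minimal sketch of the linked-list case, assuming a GCC-compatible compiler (the `struct node` layout and `list_sum` are hypothetical). Each iteration prefetches the next node while the current one is processed; on the last node `node->next` is NULL. GCC documents that `__builtin_prefetch` does not fault when the address it is given is invalid, which matches the PLD behavior described above.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* Sum a NULL-terminated list, prefetching one node ahead. */
int list_sum(const struct node *node)
{
    int sum = 0;
    while (node != NULL) {
        /* On the final node this prefetches NULL, which is ignored. */
        __builtin_prefetch(node->next, 0, 3);
        sum += node->value;
        node = node->next;
    }
    return sum;
}
```

Note that `node` itself must still be valid: only the prefetched address may be bad, not the pointer dereferenced to compute it.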
3.10.4.4.1 Prefetch Loop Limitations
It is not always advantageous to add prefetch to a loop. Loop characteristics that limit the value of prefetch are discussed below.
3.10.4.4.2 Compute versus Data Bus Bound
At one extreme, a loop that is data-bus bound will not benefit from prefetch, because all the system resources for transferring data are quickly allocated and there are no instructions that can profitably be executed while the transfers complete. At the other end of the scale, compute-bound loops allow complete hiding of all data transfer latencies.
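One common way to reason about this trade-off is to ask how many iterations ahead a prefetch must be issued for the loop's own work to cover the memory latency. The sketch below uses that heuristic; it is an assumption for illustration, not a formula taken from this manual, and `prefetch_distance` is a hypothetical helper. A compute-bound loop (many cycles per iteration) needs a distance of only one iteration, while a bus-bound loop (few cycles per iteration) needs a large distance, and the resulting number of outstanding requests eventually exceeds what the bus can service.

```c
/* Heuristic sketch: iterations of look-ahead needed so that the work in
 * the intervening iterations covers the memory latency (ceiling divide). */
unsigned prefetch_distance(unsigned latency_cycles, unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0; /* no work to overlap, so prefetch cannot hide latency */
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}
```

For example, with an assumed 60-cycle latency, a loop body of 20 cycles needs a distance of 3 iterations, while a 100-cycle body needs only 1.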
3.10.4.4.3 Low Number of Iterations
In loops with very low iteration counts, the advantages of prefetch may be completely negated. A loop with a small, fixed number of iterations may be faster if it is completely unrolled rather than scheduled with prefetch instructions.
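For instance, a reduction over a fixed four-element array can simply be written out in full. This is a minimal sketch with a hypothetical `sum4` helper: there is no loop overhead to amortize and too few iterations for a prefetch issued ahead of time to pay off, so no prefetch is used.

```c
#include <stdint.h>

/* Fully unrolled fixed-count reduction: no loop, no prefetch. */
static inline uint32_t sum4(const uint32_t v[4])
{
    return v[0] + v[1] + v[2] + v[3];
}
```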
3.10.4.4.4 Bandwidth Limitations
Overuse of prefetches can monopolize resources and degrade performance. This happens because once bus traffic requests exceed the capacity of the system's data transfer resources, the processor stalls. The data transfer resources for the IXP45X/IXP46X network processors are:
Intel® IXP45X and Intel® IXP46X Product Line of Network Processors Developer's Manual, August 2006, Order Number: 306262-004US
