Literal Pools; Cache Considerations; Cache Conflicts, Pollution And Pressure - Intel PXA255 User Manual

Xscale microarchitecture

If the structure is not sized to a multiple of the cache line size, then the prefetch address must be
advanced appropriately and will require extra prefetch instructions. Consider the following
example:
struct {
    long ia;
    long ib;
    long ic;
    long id;
    long ie;
} tdata[IMAX];
ADDRESS predata = tdata;    /* extra register that tracks the next prefetch address */
for (i = 0; i < IMAX; i++)
{
    PREFETCH(predata += 16);    /* advance by half a cache line */
    tdata[i].ia = tdata[i].ib + tdata[i].ic - tdata[i].id + tdata[i].ie;
    ....
    tdata[i].ie = 0;
}
In this case, the prefetch address is advanced by half the size of a cache line, so every other
prefetch instruction is ignored. Further, an additional register is required to track the next prefetch
address. In general, data that is not aligned and sized to the cache line adds computational overhead.
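For contrast, the following is a minimal sketch of the aligned and sized case, in which each element occupies exactly one 32-byte cache line and a single prefetch per iteration is enough. It is not taken from the manual: the names tdata_padded and process are hypothetical, int32_t stands in for the 4-byte long of the example above, and __builtin_prefetch and __attribute__((aligned)) are GCC/Clang extensions used here as an assumed stand-in for the PREFETCH macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32   /* cache line size stated in the text */

/* Hypothetical padded variant of the structure above:
 * five 32-bit fields (20 bytes) plus 12 bytes of padding = one cache line. */
struct tdata_padded {
    int32_t ia, ib, ic, id, ie;
    int32_t pad[3];
} __attribute__((aligned(CACHE_LINE)));

static_assert(sizeof(struct tdata_padded) == CACHE_LINE,
              "each element must fill exactly one cache line");

/* Each iteration touches exactly one line, so the prefetch address is
 * simply the next element and no separate address register is needed. */
int32_t process(struct tdata_padded *t, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(&t[i + 1], 1);   /* 1 = prefetch for write */
        t[i].ia = t[i].ib + t[i].ic - t[i].id + t[i].ie;
        sum += t[i].ia;
        t[i].ie = 0;
    }
    return sum;
}
```

The cost of this layout is 12 bytes of padding per element, so it trades memory footprint for simpler and fully effective prefetching.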
A.4.2.7. Literal Pools

The Intel® XScale™ core does not have a single instruction that can move every possible literal (a
constant or address) into a register. One technique for loading a register with a literal is to load it
from a memory location that has been initialized with the constant or address. See
Section A.3, "Basic Optimizations" for more information on how to do this. It is advantageous to
place all the literals together in a block of memory known as a literal pool. These data blocks are
located in the text or code address space so that they can be loaded using PC-relative addressing.
However, references to the literal pool area load the data into the data cache instead of the
instruction cache. Therefore a literal may be present in both the data and instruction caches, which
wastes cache space.
For maximum efficiency, the compiler should align all literal pools on cache line boundaries and size
each pool to a multiple of 32 bytes, the size of a cache line. One additional optimization is to group
frequently used literals into the same cache line. The advantage is that once one of the literals has
been loaded, the other seven (for 4-byte literals) are available immediately from the data cache.
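The arithmetic behind "the other seven" can be sketched in C. This is only an illustration of the layout, not how a literal pool is actually produced: real literal pools are emitted by the compiler or assembler in the code section, whereas hot_pool below is an ordinary constant array with made-up values, and same_cache_line is a hypothetical helper. The aligned attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u                                    /* bytes per line */
#define LITERALS_PER_LINE (CACHE_LINE / sizeof(uint32_t)) /* 32 / 4 = 8 */

/* Hypothetical group of frequently used 32-bit literals, aligned so that
 * all eight share a single cache line. The values are placeholders. */
static const uint32_t hot_pool[LITERALS_PER_LINE]
    __attribute__((aligned(32))) = {
    0x12345678u, 0x00FF00FFu, 0xDEADBEEFu, 0x80000000u,
    0x0000FFFFu, 0x7FFFFFFFu, 0x01010101u, 0xCAFEF00Du
};

static_assert(sizeof(hot_pool) == CACHE_LINE,
              "the pool must be a multiple of the cache line size");

/* Returns nonzero when both addresses fall within the same 32-byte line. */
int same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE) == ((uintptr_t)b / CACHE_LINE);
}
```

With this layout, same_cache_line(&hot_pool[0], &hot_pool[7]) holds, so the first load of any entry brings the other seven into the data cache with it.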
A.4.3 Cache Considerations

A.4.3.1. Cache Conflicts, Pollution and Pressure

Cache pollution occurs when unused data is loaded into the cache, and cache pressure occurs when
data that is not temporal to the current process is loaded into the cache. For an example, see
Section A.4.4.2., "Prefetch Loop Scheduling" below.

Intel® XScale™ Microarchitecture User's Manual
Optimization Guide
