High Level Language Optimization; C And C++ Level Optimization; Efficient Usage Of Preloading; Preload Considerations - Intel PXA270 Optimization Manual

High Level Language Optimization

5.1

C and C++ Level Optimization

In embedded systems, software programming techniques greatly affect system performance. Many
techniques can be applied during C/C++ code development to improve performance at the
application level. This chapter covers a set of programming optimization techniques which are
relevant to deeply embedded systems such as the Intel® PXA27x Processor Family (PXA27x
processor).
5.1.1

Efficient Usage of Preloading

The Intel XScale® Microarchitecture preload instruction is a true preload instruction because the
load destination is the data or mini-data cache and not a register. Compilers for processors which
have data caches, but do not support preload, sometimes use a load instruction to preload the data
cache. This technique has the disadvantage of tying up a register for the loaded data and requiring
additional registers for subsequent preloads, which increases register pressure. By contrast, the
Intel XScale® Microarchitecture preload can be used to reduce register pressure instead of
increasing it.
The Intel XScale® Microarchitecture preload is a hint instruction and does not guarantee that the
data is loaded. Whenever the load would cause a fault or a table walk, the processor ignores the
preload instruction and the fault or table walk, and continues processing the next instruction. This
is particularly advantageous in the case where a linked list or recursive data structure is terminated
by a NULL pointer. Preloading the NULL pointer does not cause a fault.
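The NULL-terminated list case can be illustrated with a short C sketch. This is only an illustration under stated assumptions: the node type, the preload() helper, and sum_list() are hypothetical names that do not come from this guide, and the helper relies on the GCC/Clang __builtin_prefetch builtin, which is expected (but not guaranteed) to emit PLD when compiling for a target that supports it.

```c
#include <stddef.h>

/* Hypothetical node type, used for illustration only. */
struct node {
    struct node *next;
    int value;
};

/* Hypothetical helper: issues a preload hint where the compiler offers one.
 * Assumption: __builtin_prefetch maps to PLD on a suitable ARM target. */
static inline void preload(const void *addr)
{
#if defined(__GNUC__)
    __builtin_prefetch(addr);
#else
    (void)addr; /* no hint available; do nothing */
#endif
}

/* Walks the list and hints the next node while the current one is processed. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        /* On the last node n->next is NULL. The hint is simply dropped,
         * so no NULL check is needed before the preload. */
        preload(n->next);
        total += n->value;
    }
    return total;
}
```

Because the preload is only a hint, the loop needs no extra branch to guard the final node, which keeps the loop body short.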
The preload instruction (PLD) can be inserted by the compiler during compilation. However, the
programmer can also insert preload operations in the code explicitly. A function can be defined in
the high-level language which results in a PLD instruction being inserted in-line. This function can
then be called at other suitable places in the code to insert PLD instructions.
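One possible shape for such an in-line function is sketched below. It is not taken from this guide: the names pld() and scale() are hypothetical, the inline-assembly form assumes a GCC-compatible compiler targeting an ARM architecture that implements PLD, and the fallback paths exist only so the same source builds on other hosts.

```c
#include <stddef.h>

/* Hypothetical in-line wrapper that results in a PLD instruction.
 * Assumption: GCC-style inline assembly and a target that implements PLD. */
static inline void pld(const void *addr)
{
#if defined(__GNUC__) && defined(__arm__)
    __asm__ volatile ("pld [%0]" : : "r" (addr));
#elif defined(__GNUC__)
    __builtin_prefetch(addr); /* generic compiler hint on non-ARM hosts */
#else
    (void)addr;               /* no preload support; compile to nothing */
#endif
}

/* Example call site: multiply each source element by k.
 * The look-ahead of 8 elements is an arbitrary illustrative value. */
void scale(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        /* Hint once per 8 elements, and only for addresses inside the
         * array so the pointer arithmetic stays well defined in C. */
        if ((i % 8) == 0 && i + 8 < n)
            pld(&src[i + 8]);
        dst[i] = src[i] * k;
    }
}
```

The wrapper keeps the target-specific instruction in one place, so call sites such as scale() stay ordinary C.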
5.1.1.1

Preload Considerations

The issues which require consideration when using preloading are explained below.
5.1.1.1.1
Preload Distances In the Intel XScale® Microarchitecture
Scheduling the preload instruction requires understanding the system latency times and system
resources which determine when to use the preload instruction.
The optimum advantage of using preload is obtained if the preload issue-to-use distance is equal to
the memory latency. The memory latency shown in Section 3.2.1, "Optimal Setting for Memory
Latency and Bandwidth" should be used to determine the proper insertion point for preloads.
Depending on whether the target is in the internal memory or in the external memory, the preload
distance may need to be varied. Also, for external memory in which the target address is not
aligned to a cache line, the memory latency can increase due to the critical word first (CWF) mode
of the memory accesses. CWF mode returns the requested data starting with the requested word
instead of starting with the word at the aligned address. When using preloads, align the target
address to a cache-line boundary in order to avoid the extra memory bus usage.
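The relationship between memory latency, preload distance, and cache-line alignment can be sketched in C as follows. All names here (preload_distance, line_base, sum_words, preload) are hypothetical, the 32-byte line size is an assumption about the data cache, and any cycle counts passed in are placeholders: real values would have to come from the latency figures referenced above and from measuring the loop in question.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u /* assumption: 32-byte data cache line */

/* Hypothetical hint wrapper; assumes the GCC/Clang prefetch builtin. */
static inline void preload(const void *addr)
{
#if defined(__GNUC__)
    __builtin_prefetch(addr);
#else
    (void)addr;
#endif
}

/* Iterations to look ahead so that issue-to-use distance covers the
 * latency: ceil(latency_cycles / cycles_per_iter). */
static inline size_t preload_distance(unsigned latency_cycles,
                                      unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0;
    return (latency_cycles + cycles_per_iter - 1u) / cycles_per_iter;
}

/* Rounds an address down to the start of its cache line, so the preload
 * target is line-aligned. */
static inline const void *line_base(const void *p)
{
    return (const void *)((uintptr_t)p & ~(uintptr_t)(CACHE_LINE_BYTES - 1u));
}

/* Sums a buffer, preloading dist_lines cache lines ahead of the line
 * currently being read. */
uint32_t sum_words(const uint32_t *buf, size_t n_words, size_t dist_lines)
{
    const size_t words_per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
    uint32_t total = 0;
    for (size_t i = 0; i < n_words; i++) {
        if (i % words_per_line == 0) {
            size_t ahead = i + dist_lines * words_per_line;
            if (ahead < n_words) /* stay inside the buffer */
                preload(line_base(&buf[ahead]));
        }
        total += buf[i];
    }
    return total;
}
```

A caller might compute dist_lines once, for example with preload_distance() using a larger latency value for external memory than for internal memory, and pass it to the loop so the distance can be varied per target.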
