Buffer For Context Switch; Scratch Ram; Os Acceleration; Increasing Preloads For Memory Performance - Intel PXA270 Optimization Manual

System Level Optimization
3.4.3 Buffer for Context Switch

During a context switch, the state of the process has to be saved. For the PXA27x processor, the
PCB (process control block) can be large because of the additional registers for Intel® Wireless
MMX™ Technology. To reduce context switch latency, the PCB can be kept in the internal memory.
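
As a rough illustration of why the PCB grows, the sketch below lays out a save area that includes the sixteen 64-bit Wireless MMX data registers (wR0 to wR15) and hands out slots from a small pool. The type and function names (`saved_context_t`, `fast_context_pool`, `context_slot`) are hypothetical, the coprocessor control registers are omitted for brevity, and the pool is an ordinary static array here; on a real target it is assumed that the linker or the OS would place it in internal memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical saved-context layout; field names are illustrative only. */
typedef struct {
    uint32_t r[16];   /* core registers r0-r15 */
    uint32_t cpsr;    /* saved program status register */
    uint64_t wr[16];  /* Wireless MMX data registers wR0-wR15, 64 bits each */
} saved_context_t;

/* The Wireless MMX data registers alone add 16 * 8 = 128 bytes per task. */
static_assert(sizeof(((saved_context_t *)0)->wr) == 128,
              "sixteen 64-bit registers occupy 128 bytes");

/* Assumption: a small pool of save areas that would live in internal
 * memory on hardware. Here it is a plain static array. */
#define MAX_FAST_CONTEXTS 8
static saved_context_t fast_context_pool[MAX_FAST_CONTEXTS];

/* Return the save area for a task slot, or NULL when the slot is out of
 * range (a real OS would then fall back to a PCB in external memory). */
saved_context_t *context_slot(unsigned slot)
{
    if (slot >= MAX_FAST_CONTEXTS)
        return NULL;
    return &fast_context_pool[slot];
}

/* Size of the Wireless MMX part of the save area, in bytes. */
size_t wmmx_bytes(void)
{
    return sizeof(((saved_context_t *)0)->wr);
}
```

Keeping only the most frequently switched tasks in such a pool is one possible policy; the internal memory is limited, so the pool size is a tuning decision.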

3.4.4 Scratch RAM

For many applications (such as graphics), the working set is often larger than the data
cache, and because the application accesses memory in a random pattern, effective preloading may be
difficult. Part of the internal RAM can therefore be used to store these critical data structures. The OS
can offer management of such critical data spaces through malloc() or virtual_alloc().
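
One simple way an OS or runtime might manage such a region is a bump allocator, sketched below. This is a minimal illustration, not the interface the manual has in mind: `scratch_alloc`, `scratch_reset`, and `SCRATCH_SIZE` are hypothetical names, and the region is a static array standing in for a block of internal RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a region of internal RAM. On hardware this would be mapped
 * at the internal memory address; the size is an arbitrary assumption. */
#define SCRATCH_SIZE 4096u
static unsigned char scratch_region[SCRATCH_SIZE];
static size_t scratch_used;

/* Hypothetical allocator: return 'size' bytes whose address is a multiple
 * of 'align' (a power of two), or NULL when the region is exhausted. */
void *scratch_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t base = (uintptr_t)scratch_region;
    uintptr_t aligned = (base + scratch_used + align - 1)
                        & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - base);

    if (start > SCRATCH_SIZE || size > SCRATCH_SIZE - start)
        return NULL;

    scratch_used = start + size;
    return &scratch_region[start];
}

/* Release every allocation at once, for example at the end of a frame. */
void scratch_reset(void)
{
    scratch_used = 0;
}
```

A bump allocator suits data that is built and discarded together; a general-purpose heap over the same region would be needed if individual objects have independent lifetimes.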

3.4.5 OS Acceleration

Much OS- and system-related code runs periodically (for example, device drivers and
OS daemon processes). The code for these routines can be stored in the internal memory, which
reduces the instruction cache miss penalties for the periodic routines.
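
With a GCC-style toolchain, one common way to steer a routine into a particular memory is the `section` attribute combined with a linker script that maps that section to the internal memory. The sketch below assumes such a setup: `ISRAM_TEXT_SECTION` is a hypothetical build-time macro naming the section, and the countdown-timer routine is only an example of periodic code. When the macro is not defined, the code builds as ordinary C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the build defines ISRAM_TEXT_SECTION (for example as
 * ".isram_text") and the linker script maps that section to internal
 * memory. Without it, ISRAM_CODE expands to nothing. */
#if defined(ISRAM_TEXT_SECTION)
#define ISRAM_CODE __attribute__((section(ISRAM_TEXT_SECTION)))
#else
#define ISRAM_CODE
#endif

#define NUM_TIMERS 4
static uint32_t countdown[NUM_TIMERS];

/* Arm a software timer; called rarely, so it stays in normal memory. */
void timer_start(unsigned id, uint32_t ticks)
{
    assert(id < NUM_TIMERS);
    countdown[id] = ticks;
}

/* Periodic routine, called on every tick: a candidate for internal memory.
 * Returns a bitmask of the timers that expired on this tick. */
ISRAM_CODE unsigned timer_tick(void)
{
    unsigned expired = 0;
    for (unsigned i = 0; i < NUM_TIMERS; i++) {
        if (countdown[i] != 0 && --countdown[i] == 0)
            expired |= 1u << i;
    }
    return expired;
}
```

Only the routine that runs on every tick carries the attribute, which keeps the internal memory footprint small.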

3.4.6 Increasing Preloads for Memory Performance

Apart from increasing cache efficiency, hiding memory latency is extremely important. A
suitable preload scheme can hide the memory latency of data accesses.
The Intel XScale® Microarchitecture has a preload instruction (PLD). The purpose of this
instruction is to preload data into the data and mini-data caches. Preloading hides
memory transfer latency because the processor continues to execute instructions while the data is
fetched. Preload matters to both compiler-generated and hand-written assembly code because
judicious use of the instruction can greatly improve the throughput of Intel XScale®
Microarchitecture-based processors. Data preload can be applied not only to loops but also to any
data reference within a block of code. Preload also applies to data writes when the memory type is
configured as write allocate.
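
In C, a preload can be requested with the GCC and Clang builtin `__builtin_prefetch`, which compilers typically lower to PLD on ARM targets that support it. The sketch below preloads one cache line ahead while summing an array. The function name `sum_with_preload` and the preload distance are assumptions for illustration; the distance relies on the 32-byte data cache line of the Intel XScale® Microarchitecture and would need tuning against the real memory latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eight 32-bit words fill one 32-byte cache line. Preloading exactly one
 * line ahead is an assumption; slower memory may need a larger distance. */
#define WORDS_PER_LINE 8
#define PRELOAD_AHEAD  WORDS_PER_LINE

/* Sum an array while asking the cache to fetch the next line early. */
uint32_t sum_with_preload(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Issue one preload per cache line, and only for addresses that
         * are still inside the array. Arguments: address, 0 = read,
         * 3 = keep in all cache levels. */
        if ((i % WORDS_PER_LINE) == 0 && i + PRELOAD_AHEAD < n)
            __builtin_prefetch(&data[i + PRELOAD_AHEAD], 0, 3);
        sum += data[i];
    }
    return sum;
}
```

The preload is only a hint, so the result is identical with or without it; the benefit, if any, shows up as fewer stalls on data cache misses.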
Note: The Intel XScale® Microarchitecture PLD instruction encoding translates to a never-executed
instruction in the ARM* V4 architecture. This allows compatibility between code using PLD on an Intel
XScale® Microarchitecture processor and older devices. Code that has to run on both architectures
can include the PLD instruction, gaining performance on the Intel XScale® Microarchitecture
while maintaining compatibility with ARM* V4 (for example, StrongARM). Efficient preloading of
data and possible use cases are discussed in detail in
Section 4, "Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization",
Section 5, "High Level Language Optimization", and Section 6, "Power Optimization".

3.5 Optimization of System Components

In the PXA27x processor, the LCD, DMA controller, Intel® Quick Capture Interface and Intel
XScale® core share the same resources such as system bus, memory controller, etc. Thus, there
may be potential resource conflicts and the sharing of resources may impact the performance of the
end application. For example, a larger LCD display consumes more memory and system bus
Intel® PXA27x Processor Family Optimization Guide
