Execute 2 (X2) Pipestage; Write-Back (Wb); Memory Pipeline; D1 And D2 Pipestage - Intel PXA270 Optimization Manual

2.2.3.5

Execute 2 (X2) Pipestage

The X2 pipestage contains the current program status register (CPSR). This pipestage selects the
data to be written to the RFU in the WB cycle, including program status registers.
2.2.3.6

Write-Back (WB)

When an instruction reaches the write-back stage, it is considered complete. Instruction results are
written to the RFU.
2.2.4

Memory Pipeline

The memory pipeline consists of two stages, D1 and D2. The data cache unit (DCU) consists of the
data cache array, mini-data cache, fill buffers, and write buffers. The memory pipeline handles load
and store instructions.
2.2.4.1

D1 and D2 Pipestage

Operation begins in D1 after the X1 pipestage calculates the effective address for loads and stores.
The data cache and mini-data cache return the destination data in the D2 pipestage. Before data is
returned in the D2 pipestage, sign extension and byte alignment occur for byte and half-word
loads.
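The sign extension and byte alignment step can be illustrated with a small software model. This is only a sketch of the arithmetic involved, not a description of the hardware: the function names are hypothetical, and a little-endian byte layout within the 32-bit word is assumed.

```c
#include <stdint.h>

/* Illustrative model only (names are hypothetical, little-endian assumed):
 * pick a byte or half-word out of an aligned 32-bit word and sign-extend
 * it to 32 bits, as a signed byte or half-word load must deliver it. */

/* Select the byte addressed by the low two address bits, then sign-extend. */
int32_t load_signed_byte(uint32_t word, uint32_t addr)
{
    uint32_t shift = (addr & 3u) * 8u;            /* byte alignment */
    uint32_t byte  = (word >> shift) & 0xFFu;
    return (int32_t)(byte ^ 0x80u) - 0x80;        /* sign extension */
}

/* Select the half-word addressed by address bit 1, then sign-extend. */
int32_t load_signed_half(uint32_t word, uint32_t addr)
{
    uint32_t shift = (addr & 2u) * 8u;            /* half-word alignment */
    uint32_t half  = (word >> shift) & 0xFFFFu;
    return (int32_t)(half ^ 0x8000u) - 0x8000;    /* sign extension */
}
```

For example, with the word 0x80FF7F01, the byte at offset 2 is 0xFF and is returned as -1, while the byte at offset 1 is 0x7F and is returned as 127.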
2.2.4.1.1
Write Buffer Behavior
The Intel XScale® Microarchitecture improves write performance through write coalescing.
Coalescing combines a new store operation with an existing store operation already resident in
the write buffer. The new store is placed in the same write buffer entry as an existing store when
the address of the new store falls within the same 4-word-aligned (16-byte) region as the existing
entry. The core can coalesce into any of the four entries in the write buffer. The Intel XScale®
Microarchitecture has a global coalesce disable bit located in the Auxiliary Control register (CP15,
register 1, opcode_2=1).
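The coalescing rule can be sketched as a simple software model. This is a hedged illustration rather than the hardware algorithm: the types and the function wb_store are hypothetical, the model tracks only which 16-byte region each entry covers, and it never drains entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES    4      /* four write buffer entries, per the text */
#define WB_LINE_BYTES 16u    /* 4 words x 4 bytes */

typedef struct {
    bool     valid;
    uint32_t base;           /* 16-byte-aligned base address of the entry */
} wb_entry;

typedef struct {
    wb_entry entry[WB_ENTRIES];
    bool     coalesce_disabled;   /* models the global coalesce disable bit */
} write_buffer;

/* Place a store in the buffer. Returns the entry index used, or -1 if the
 * buffer is full (this model does not drain). *coalesced reports whether
 * the store merged into an existing entry. */
int wb_store(write_buffer *wb, uint32_t addr, bool *coalesced)
{
    uint32_t base = addr & ~(WB_LINE_BYTES - 1u);
    *coalesced = false;

    if (!wb->coalesce_disabled) {
        /* Any of the four entries is a candidate for coalescing. */
        for (int i = 0; i < WB_ENTRIES; i++) {
            if (wb->entry[i].valid && wb->entry[i].base == base) {
                *coalesced = true;
                return i;
            }
        }
    }
    /* No match (or coalescing disabled): allocate a free entry. */
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (!wb->entry[i].valid) {
            wb->entry[i].valid = true;
            wb->entry[i].base  = base;
            return i;
        }
    }
    return -1;
}
```

In this model, stores to 0x1000 and 0x100C share one entry, while a store to 0x1010 falls in the next 16-byte region and takes a new entry.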
2.2.4.1.2
Read Buffer Behavior
The Intel XScale® Microarchitecture has four fill buffers that allow four outstanding loads to the
cache and external memory. Allowing four outstanding loads increases memory throughput and bus
efficiency, and can also be used to hide latency. Page table attributes affect load behavior: for a
section with C=0, B=0, only one load to memory can be outstanding at a time. Thus, load
performance for a memory page with C=0, B=1 is significantly better than for a memory page with
C=0, B=0.
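The effect of outstanding loads on throughput can be estimated with a deliberately crude model. The function below is a hypothetical sketch, not a cycle-accurate description: it assumes all loads are independent, all have the same latency, and loads are issued in groups limited by the number that may be outstanding. The latency value in the example is made up for illustration.

```c
#include <stdint.h>

/* Rough estimate of the cycles needed to complete n_loads independent
 * loads, each taking `latency` cycles, when at most `max_outstanding`
 * loads may be in flight at once. Hypothetical toy model. */
uint32_t load_batch_cycles(uint32_t n_loads, uint32_t latency,
                           uint32_t max_outstanding)
{
    if (max_outstanding == 0u) {
        max_outstanding = 1u;    /* treat 0 as "one at a time" */
    }
    /* Number of groups of overlapping loads, rounded up. */
    uint32_t groups = (n_loads + max_outstanding - 1u) / max_outstanding;
    return groups * latency;
}
```

With an assumed latency of 50 cycles, eight loads take about 400 cycles when only one can be outstanding (as for C=0, B=0) and about 100 cycles when four can overlap.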
2.2.5

Multiply/Multiply Accumulate (MAC) Pipeline

The multiply-accumulate (MAC) unit executes the multiply and multiply-accumulate instructions
supported by the Intel XScale® Microarchitecture. The MAC implements the 40-bit Intel XScale®
Microarchitecture accumulator register acc0 and handles the instructions that transfer its value
to and from general-purpose ARM* registers.
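The behavior of a 40-bit accumulator can be modeled in portable C. This is a sketch under stated assumptions, none of which come from this section: the helper names are hypothetical, the accumulate is assumed to wrap at 40 bits rather than saturate, and the transfer helpers assume the register pair holds bits 31:0 in the low register and bits 39:32 in the high register, in the manner of the MAR and MRA instructions.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull     /* low 40 bits */
#define ACC_SIGN 0x8000000000ull     /* bit 39 */

/* Sign-extend a 40-bit value held in the low bits of raw to 64 bits. */
int64_t acc_sext40(uint64_t raw)
{
    raw &= ACC_MASK;
    return (int64_t)(raw ^ ACC_SIGN) - (int64_t)ACC_SIGN;
}

/* Multiply-accumulate: acc + rm * rs, wrapped to 40 bits (assumption). */
int64_t acc_mia(int64_t acc, int32_t rm, int32_t rs)
{
    int64_t sum = acc + (int64_t)rm * (int64_t)rs;
    return acc_sext40((uint64_t)sum);
}

/* Registers -> accumulator: low word plus the low 8 bits of the high word. */
int64_t acc_mar(uint32_t rd_lo, uint32_t rd_hi)
{
    return acc_sext40(((uint64_t)(rd_hi & 0xFFu) << 32) | rd_lo);
}

/* Accumulator -> registers: bits 39:32 arrive sign-extended in *rd_hi. */
void acc_mra(int64_t acc, uint32_t *rd_lo, uint32_t *rd_hi)
{
    uint64_t bits = (uint64_t)acc;   /* acc is already sign-extended */
    *rd_lo = (uint32_t)bits;
    *rd_hi = (uint32_t)(bits >> 32);
}
```

Under these assumptions, adding 1 to the largest positive 40-bit value (0x7FFFFFFFFF) wraps to the most negative one, which is why code that accumulates many products should keep intermediate sums within the 40-bit range.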
Intel® PXA27x Processor Family Optimization Guide
Microarchitecture Overview