Execute 2 (X2) Pipestage; Write-Back (Wb); Memory Pipeline; D1 And D2 Pipestage - Intel PXA270 Optimization Manual


2.2.3.5 Execute 2 (X2) Pipestage

The X2 pipestage contains the current program status register (CPSR). This pipestage selects the data to be written to the RFU in the WB cycle, including the program status registers.

2.2.3.6 Write-Back (WB)

When an instruction reaches the write-back stage, it is considered complete. Instruction results are written to the RFU.

2.2.4 Memory Pipeline

The memory pipeline consists of two stages, D1 and D2. The data cache unit (DCU) consists of the data cache array, mini-data cache, fill buffers, and write buffers. The memory pipeline handles load and store instructions.

2.2.4.1 D1 and D2 Pipestage

Operation begins in D1 after the X1 pipestage calculates the effective address for loads and stores. The data cache and mini-data cache return the destination data in the D2 pipestage. Before data is returned in the D2 pipestage, sign extension and byte alignment occur for byte and half-word loads.
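
The sign extension and byte alignment described above can be illustrated with a small software model. This is only a sketch of the arithmetic, not a description of the hardware datapath: the function names `align_lane`, `load_unsigned`, and `load_signed` are hypothetical, and little-endian byte-lane numbering within the 32-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: pick a byte or half-word out of a 32-bit word and
   move it down to bit 0. Little-endian lane numbering is an assumption. */
uint32_t align_lane(uint32_t word, unsigned byte_offset, unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2);
    assert(byte_offset + size_bytes <= 4);
    uint32_t mask = (size_bytes == 1) ? 0xFFu : 0xFFFFu;
    return (word >> (8u * byte_offset)) & mask;
}

/* Zero-extended result, as for an unsigned byte or half-word load. */
uint32_t load_unsigned(uint32_t word, unsigned byte_offset, unsigned size_bytes)
{
    return align_lane(word, byte_offset, size_bytes);
}

/* Sign-extended result, as for a signed byte or half-word load. */
int32_t load_signed(uint32_t word, unsigned byte_offset, unsigned size_bytes)
{
    uint32_t value = align_lane(word, byte_offset, size_bytes);
    uint32_t sign_bit = (size_bytes == 1) ? 0x80u : 0x8000u;
    int32_t range = (size_bytes == 1) ? 0x100 : 0x10000;
    /* Subtract 2^n when the top bit of the n-bit field is set. */
    return (int32_t)value - ((value & sign_bit) ? range : 0);
}
```

For example, with the word `0x80FF7F01`, the byte at offset 2 is `0xFF`, which reads back as 255 unsigned and -1 signed.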

2.2.4.1.1 Write Buffer Behavior

The Intel XScale® Microarchitecture enhances write performance through write coalescing. Coalescing combines a new store operation with an existing store operation already resident in the write buffer. The new store is placed in the same write buffer entry as an existing store when the address of the new store falls within the same 4-word-aligned (16-byte) region as the existing entry. The core can coalesce into any of the four entries in the write buffer. The Intel XScale® Microarchitecture has a global coalesce disable bit located in the Control register (CP15, register 1, opcode_2=1).
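
The coalescing rule can be made concrete with a toy model of a four-entry write buffer. This is a sketch under stated assumptions, not a cycle-accurate description: the type and function names (`write_buffer`, `wb_init`, `wb_store`) are hypothetical, stores are assumed to be naturally aligned, and draining of entries to memory is not modeled. The `coalesce_enabled` flag stands in for the global coalesce disable bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES     4u   /* four write buffer entries, per the text */
#define WB_BLOCK_BYTES 16u  /* 4 words x 4 bytes */

typedef struct {
    bool     valid;
    uint32_t block_addr;  /* 16-byte-aligned base address of the entry */
    uint16_t byte_mask;   /* one bit per byte written within the block */
} wb_entry;

typedef struct {
    wb_entry entry[WB_ENTRIES];
    bool     coalesce_enabled;  /* models the global coalesce disable bit */
} write_buffer;

typedef enum { WB_COALESCED, WB_ALLOCATED, WB_FULL } wb_result;

void wb_init(write_buffer *wb, bool coalesce_enabled)
{
    for (unsigned i = 0; i < WB_ENTRIES; ++i) {
        wb->entry[i].valid = false;
        wb->entry[i].block_addr = 0;
        wb->entry[i].byte_mask = 0;
    }
    wb->coalesce_enabled = coalesce_enabled;
}

/* Record a store of 1, 2, or 4 bytes at a naturally aligned address. */
wb_result wb_store(write_buffer *wb, uint32_t addr, unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2 || size_bytes == 4);
    assert(addr % size_bytes == 0);

    uint32_t block = addr & ~(WB_BLOCK_BYTES - 1u);
    uint32_t offset = addr & (WB_BLOCK_BYTES - 1u);
    uint16_t mask = (uint16_t)(((1u << size_bytes) - 1u) << offset);

    if (wb->coalesce_enabled) {
        /* Any valid entry covering the same 16-byte block can absorb the store. */
        for (unsigned i = 0; i < WB_ENTRIES; ++i) {
            if (wb->entry[i].valid && wb->entry[i].block_addr == block) {
                wb->entry[i].byte_mask |= mask;
                return WB_COALESCED;
            }
        }
    }
    for (unsigned i = 0; i < WB_ENTRIES; ++i) {
        if (!wb->entry[i].valid) {
            wb->entry[i].valid = true;
            wb->entry[i].block_addr = block;
            wb->entry[i].byte_mask = mask;
            return WB_ALLOCATED;
        }
    }
    return WB_FULL;  /* a real core would stall until an entry drains */
}
```

In this model, word stores to `0x1000`, `0x1004`, and `0x100C` share one entry, while a store to `0x1010` needs a second entry. With coalescing disabled, the same four stores to one block consume all four entries.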

2.2.4.1.2 Read Buffer Behavior

The Intel XScale® Microarchitecture has four fill buffers that allow four outstanding loads to the cache and external memory. Allowing four outstanding loads increases memory throughput and bus efficiency. This feature can also be used to hide latency. Page table attributes affect the load behavior; for a section with C=0, B=0, only one load from memory can be outstanding at a time. Thus, the load performance for a memory page with C=0, B=1 is significantly better than for a memory page with C=0, B=0.
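
One way to give the core a chance to keep several loads in flight is to issue independent loads from different regions before consuming any of the results. The following is a minimal sketch of that idea; the function name `sum_four_streams` is hypothetical, and whether the loads actually overlap depends on how the compiler schedules them and on the memory attributes of the page being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum an array by reading four widely separated streams in each iteration,
   so that up to four independent loads can be outstanding at once. */
uint32_t sum_four_streams(const uint32_t *src, size_t count)
{
    assert(src != NULL || count == 0);

    size_t quarter = count / 4;
    const uint32_t *p0 = src;
    const uint32_t *p1 = src + quarter;
    const uint32_t *p2 = src + 2 * quarter;
    const uint32_t *p3 = src + 3 * quarter;
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    for (size_t i = 0; i < quarter; ++i) {
        /* Four loads with no dependency between them... */
        uint32_t a = p0[i];
        uint32_t b = p1[i];
        uint32_t c = p2[i];
        uint32_t d = p3[i];
        /* ...followed by the uses, into separate accumulators. */
        s0 += a;
        s1 += b;
        s2 += c;
        s3 += d;
    }
    /* Handle the elements left over when count is not a multiple of 4. */
    for (size_t i = 4 * quarter; i < count; ++i) {
        s0 += src[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Using separate accumulators avoids a dependency chain through a single sum, which would otherwise force each load's result to be consumed before the next addition.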

2.2.5 Multiply/Multiply Accumulate (MAC) Pipeline

The multiply-accumulate (MAC) unit executes the multiply and multiply-accumulate instructions supported by the Intel XScale® Microarchitecture. The MAC implements the 40-bit Intel XScale® Microarchitecture accumulator register acc0 and handles the instructions that transfer its value to and from general-purpose ARM* registers.
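
A 40-bit accumulator does not fit in one 32-bit register, so moving its value to and from general-purpose registers involves a register pair. The sketch below models one plausible arrangement in software: the low 32 bits in one register, and the top 8 bits in a second register, sign-extended when read back. That layout, the wraparound behavior of `acc_mac`, and all names (`acc40`, `acc_from_regs`, `acc_to_regs`, `acc_mac`) are assumptions made for illustration; consult the core developer's manual for the exact instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFull  /* low 40 bits */

typedef struct {
    uint64_t bits;  /* only bits 39:0 are meaningful */
} acc40;

/* Build the accumulator from a register pair: rd_lo supplies bits 31:0,
   the low byte of rd_hi supplies bits 39:32 (assumed layout). */
acc40 acc_from_regs(uint32_t rd_lo, uint32_t rd_hi)
{
    acc40 a;
    a.bits = ((uint64_t)(rd_hi & 0xFFu) << 32) | rd_lo;
    return a;
}

/* Split the accumulator into a register pair, sign-extending bits 39:32
   into the high register (assumed behavior). */
void acc_to_regs(acc40 a, uint32_t *rd_lo, uint32_t *rd_hi)
{
    assert(rd_lo != 0 && rd_hi != 0);
    uint32_t top = (uint32_t)((a.bits >> 32) & 0xFFu);
    *rd_lo = (uint32_t)(a.bits & 0xFFFFFFFFu);
    *rd_hi = (top & 0x80u) ? (top | 0xFFFFFF00u) : top;
}

/* Signed 32x32 multiply-accumulate into 40 bits. Wraparound on overflow
   is an assumption of this model, not a statement about the hardware. */
acc40 acc_mac(acc40 a, int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    a.bits = (a.bits + (uint64_t)product) & ACC_MASK;
    return a;
}
```

In this model, starting from an accumulator value of 5 and accumulating -3 × 4 yields -7, which reads back as `0xFFFFFFF9` in the low register and `0xFFFFFFFF` in the high register.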

Intel® PXA27x Processor Family Optimization Guide

Microarchitecture Overview


This manual is also suitable for:

Pxa271 Pxa272 Pxa273
