X2 (Execute 2) Pipestage; Xwb (Write-Back; Memory Pipeline; D1 And D2 Pipestage - Intel PXA255 User Manual

Xscale microarchitecture
Optimization Guide
cancelled, and will not cause any architectural state changes, including modifications of
registers, memory, and PSR.
• Branch target determination - If a branch was mispredicted by the BTB, the X1 pipestage
flushes all of the instructions in the previous pipestages and sends the branch target address to
the BTB, which will restart the pipeline.

A.2.3.5 X2 (Execute 2) Pipestage

The X2 pipestage contains the program status registers (PSRs). This pipestage selects what is
going to be written to the RFU in the XWB cycle: PSRs (MRS instruction), ALU output, or other
items.

A.2.3.6 XWB (Write-Back)

When an instruction has reached the write-back stage, it is considered complete. Changes are
written to the RFU.

A.2.4 Memory Pipeline

The memory pipeline consists of two stages, D1 and D2. The data cache unit (DCU) consists of
the data-cache array, mini-data cache, fill buffers, and write buffers. The memory pipeline handles
only load and store instructions.

A.2.4.1 D1 and D2 Pipestage

Operation begins in D1 after the X1 pipestage has calculated the effective address for loads and
stores. The data cache and mini-data cache return the destination data in the D2 pipestage. Before
data is returned in the D2 pipestage, sign extension and byte alignment occur for byte and half-word
loads.
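
The alignment and sign-extension step can be illustrated with a small host-side C model. This is only a sketch of the arithmetic, not of the hardware: the function names are hypothetical, and it assumes little-endian byte lanes within a 32-bit word that has already been read from the cache.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a signed byte load: pick the addressed byte out of
 * the 32-bit word (byte alignment), then sign-extend it to 32 bits.
 * Assumes little-endian byte lanes. */
int32_t load_signed_byte(uint32_t word, uint32_t addr)
{
    uint32_t lane = addr & 3u;                     /* byte position in the word */
    uint32_t byte = (word >> (8u * lane)) & 0xFFu; /* align to bits 7..0 */
    return (int32_t)(byte ^ 0x80u) - 0x80;         /* replicate bit 7 upward */
}

/* Hypothetical model of a signed half-word load from an aligned address. */
int32_t load_signed_halfword(uint32_t word, uint32_t addr)
{
    assert((addr & 1u) == 0u);                     /* half-word aligned only */
    uint32_t lane = (addr >> 1) & 1u;              /* lower or upper half */
    uint32_t half = (word >> (16u * lane)) & 0xFFFFu;
    return (int32_t)(half ^ 0x8000u) - 0x8000;     /* replicate bit 15 upward */
}
```

For example, with the word 0x80FF7F01, a signed byte load from byte lane 2 selects 0xFF and yields -1, while lane 1 selects 0x7F and yields 127.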

A.2.5 Multiply/Multiply Accumulate (MAC) Pipeline

The Multiply-Accumulate (MAC) unit executes all multiply and multiply-accumulate instructions
supported by the Intel® XScale™ core. The MAC implements the 40-bit Intel® XScale™ core
accumulator register acc0 and handles the instructions that transfer its value to and from
general-purpose ARM* registers.
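
To make the 40-bit width concrete, the following C sketch models an accumulator held in the low 40 bits of a 64-bit integer, together with transfers to and from a pair of 32-bit registers. The type and function names are hypothetical. Two details are assumptions of this sketch rather than statements from this section: the accumulator wraps modulo 2^40 on overflow, and the upper register holds the sign-extended top eight bits when the value is read out.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_BITS 40
#define ACC_MASK ((UINT64_C(1) << ACC_BITS) - 1)

/* Hypothetical model of a 40-bit accumulator; only the low 40 bits are used. */
typedef struct { uint64_t bits; } acc40_t;

/* Build the accumulator from two 32-bit registers: lo supplies bits 31..0,
 * the low byte of hi supplies bits 39..32. */
acc40_t acc40_from_regs(uint32_t lo, uint32_t hi)
{
    acc40_t a = { ((uint64_t)(hi & 0xFFu) << 32) | lo };
    return a;
}

/* Interpret the 40-bit pattern as a signed value (sign bit is bit 39). */
int64_t acc40_value(acc40_t a)
{
    uint64_t sign = UINT64_C(1) << (ACC_BITS - 1);
    return (int64_t)((a.bits & ACC_MASK) ^ sign) - (int64_t)sign;
}

/* Read the accumulator back into two registers, sign-extending into hi. */
void acc40_to_regs(acc40_t a, uint32_t *lo, uint32_t *hi)
{
    assert(lo != 0 && hi != 0);
    uint64_t v = (uint64_t)acc40_value(a);
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
}

/* Multiply-accumulate: acc += x * y, wrapping modulo 2^40 (assumption). */
acc40_t acc40_mac(acc40_t a, int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    a.bits = (a.bits + (uint64_t)product) & ACC_MASK;
    return a;
}
```

The extra eight bits above a 32-bit result give headroom for long sums of products before any overflow can occur, which is the usual motivation for a wide accumulator.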
The following are important characteristics of the MAC:
• The MAC is not truly pipelined: processing a single instruction may require the same
datapath resources for several cycles before a new instruction can be accepted. The type
of instruction and its source arguments determine the number of cycles required.
• No more than two instructions can occupy the MAC pipeline concurrently.
• When the MAC is processing an instruction, another instruction may not enter M1 unless the
original instruction completes in the next cycle.
• The MAC unit can operate on 16-bit packed signed data. This reduces register pressure and
memory traffic. Two 16-bit data items can be loaded into a register with one LDR.
• The MAC can achieve a throughput of one multiply per cycle when performing a 16-by-32-bit
multiply.
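
The packed 16-bit data layout mentioned above can be sketched in portable C. The code below is a functional model only: it says nothing about MAC timing or about which instructions a compiler would emit. The helper names are hypothetical, and placing the first item in the low half of the word is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit items into one 32-bit word (lo in bits 15..0). */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Extract the low or high 16-bit item and sign-extend it to 32 bits. */
int32_t packed_lo(uint32_t w)
{
    return (int32_t)((w & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

int32_t packed_hi(uint32_t w)
{
    return (int32_t)((w >> 16) ^ 0x8000u) - 0x8000;
}

/* Dot product over arrays of packed words: each 32-bit load supplies two
 * 16-bit operands, so n words carry 2*n multiply-accumulate steps. */
int64_t dot_packed16(const uint32_t *a, const uint32_t *b, size_t nwords)
{
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < nwords; ++i) {
        acc += (int64_t)packed_lo(a[i]) * packed_lo(b[i]);
        acc += (int64_t)packed_hi(a[i]) * packed_hi(b[i]);
    }
    return acc;
}
```

Because each word carries two operands, a loop over packed data issues half as many loads as the same loop over one 16-bit item per register, which is where the reduction in memory traffic and register pressure comes from.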
Intel® XScale™ Microarchitecture User's Manual
