21.10 Parallel Execution - ARM ARM1176JZF-S Technical Reference Manual

21.10 Parallel execution

ARM DDI 0301H
ID012310
The VFP11 coprocessor is capable of execution in each of the three pipelines independently of
the others and without blocking issue or writeback from any pipeline. Separate LS, FMAC, and
DS pipelines enable parallel operation of CDP and data transfer instructions. Scheduling
instructions to take advantage of the parallelism that occurs when multiple instructions execute
in the VFP11 pipelines can result in a significant improvement in program execution time.
A data transfer operation can begin execution if:
• no data hazards exist with any currently executing operations
• the LS pipeline is not currently stalled by the ARM11 processor or busy with a data transfer multiple.
A CDP can be issued to the FMAC pipeline if:
• no data hazards exist with any currently executing operations
• the FMAC pipeline is available, that is, no short vector CDP is executing and no double-precision multiply is in the first cycle of the multiply operation
• no short vector operation with unissued iterations is currently executing in either the FMAC or DS pipeline.
A divide or square root instruction can be issued to the DS pipeline if:
• no data hazards exist with any currently executing operations
• the DS pipeline is available, that is, no current divide or square root is executing in the DS pipeline E1 stage
• no short vector operation with unissued iterations is executing in the FMAC pipeline.
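The three sets of issue conditions above can be restated as boolean predicates. The following Python sketch is only an illustrative model of the rules as listed, not a description of the VFP11 hardware: the `PipelineState` class and all of its field names are hypothetical, and each flag is assumed to be known at the moment an instruction is considered for issue.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Hypothetical snapshot of the coprocessor at the issue point."""
    data_hazard: bool = False                   # hazard with any executing operation
    ls_stalled_by_arm: bool = False             # LS pipeline stalled by the ARM11 processor
    ls_busy_transfer_multiple: bool = False     # LS pipeline busy with a data transfer multiple
    fmac_short_vector_executing: bool = False   # short vector CDP executing in FMAC
    fmac_dp_multiply_first_cycle: bool = False  # double-precision multiply in its first cycle
    ds_e1_occupied: bool = False                # divide or square root in the DS E1 stage
    fmac_unissued_iterations: bool = False      # short vector op in FMAC with unissued iterations
    ds_unissued_iterations: bool = False        # short vector op in DS with unissued iterations


def can_issue_data_transfer(s: PipelineState) -> bool:
    """Data transfer: no hazard, and LS neither stalled nor busy."""
    return not (s.data_hazard or s.ls_stalled_by_arm or s.ls_busy_transfer_multiple)


def can_issue_fmac_cdp(s: PipelineState) -> bool:
    """CDP to FMAC: no hazard, FMAC available, no pending short vector in FMAC or DS."""
    fmac_available = not (s.fmac_short_vector_executing or s.fmac_dp_multiply_first_cycle)
    no_pending_vector = not (s.fmac_unissued_iterations or s.ds_unissued_iterations)
    return (not s.data_hazard) and fmac_available and no_pending_vector


def can_issue_divide_or_sqrt(s: PipelineState) -> bool:
    """Divide or square root to DS: no hazard, E1 free, no pending short vector in FMAC."""
    return (not s.data_hazard) and (not s.ds_e1_occupied) and (not s.fmac_unissued_iterations)


# A short vector CDP occupies FMAC but all of its iterations have already issued.
state = PipelineState(fmac_short_vector_executing=True)
transfer_ok = can_issue_data_transfer(state)
divide_ok = can_issue_divide_or_sqrt(state)
cdp_ok = can_issue_fmac_cdp(state)
```

In this model a busy FMAC pipeline blocks only a further CDP. The LS and DS predicates remain true, which corresponds to the three-way overlap this section describes.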
Example 21-13 on page 21-21 shows a case of the VFP11 coprocessor executing instructions in parallel in each of the three pipelines:
• a load multiple in the LS pipeline
• a divide in the DS pipeline
• a short vector add in the FMAC pipeline.
In this example, the LEN field contains b011, selecting a vector length of four iterations, and the
STRIDE field contains b00, for a vector stride of one.
Example 21-13 Parallel execution in all three pipelines

    FLDM    [R4], {S4-S13}
    FDIVS   S0, S1, S2
    FADDS   S16, S20, S24

Copyright © 2004-2009 ARM Limited. All rights reserved.
Non-Confidential, Unrestricted Access
VFP Instruction Execution
21-20
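The LEN and STRIDE encodings quoted for Example 21-13 can be checked with a short decoder. The sketch below rests on details of the VFP architecture that this section does not state: LEN occupies FPSCR bits [18:16] and encodes the vector length minus one, STRIDE occupies FPSCR bits [21:20] with b00 meaning a stride of one and b11 a stride of two, and single-precision vector operands wrap within a bank of eight registers. The function names are hypothetical.

```python
LEN_SHIFT, LEN_MASK = 16, 0b111        # FPSCR[18:16], assumed field position
STRIDE_SHIFT, STRIDE_MASK = 20, 0b11   # FPSCR[21:20], assumed field position
BANK_SIZE = 8                          # single-precision registers per bank


def vector_length(fpscr: int) -> int:
    """Return the number of iterations selected by the LEN field."""
    return ((fpscr >> LEN_SHIFT) & LEN_MASK) + 1


def vector_stride(fpscr: int) -> int:
    """Return the register stride selected by the STRIDE field."""
    field = (fpscr >> STRIDE_SHIFT) & STRIDE_MASK
    if field == 0b00:
        return 1
    if field == 0b11:
        return 2
    raise ValueError("reserved STRIDE encoding")


def vector_registers(first: int, length: int, stride: int) -> list:
    """List the single-precision register numbers touched by a vector operand,
    wrapping inside the bank that contains the first register."""
    bank_base = first - (first % BANK_SIZE)
    offset = first % BANK_SIZE
    return [bank_base + (offset + i * stride) % BANK_SIZE for i in range(length)]


# Settings from the example: LEN = b011, STRIDE = b00.
fpscr = (0b011 << LEN_SHIFT) | (0b00 << STRIDE_SHIFT)
length = vector_length(fpscr)
stride = vector_stride(fpscr)

# Operands of FADDS S16, S20, S24 under those settings.
dest = vector_registers(16, length, stride)
src1 = vector_registers(20, length, stride)
src2 = vector_registers(24, length, stride)
```

Under these assumptions the add writes S16 to S19 and reads S20 to S23 and S24 to S27, none of which overlap the S4-S13 load or the S0, S1, S2 divide operands, so no data hazard prevents the three instructions from overlapping.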
