21.11 Execution Timing; Table 21-16 Throughput And Latency Cycle Counts For VFP11 Instructions - ARM ARM1176JZF-S Technical Reference Manual

21.11 Execution timing

Complex instruction dependencies and memory system interactions make it impossible to
describe briefly the exact cycle timing of all instructions in all circumstances. The timing that
Table 21-16 lists is accurate in most cases. For precise timing, you must use a cycle-accurate
model of your ARM11 processor.

In Table 21-16, throughput is defined as the cycle after issue in which another instruction can
begin execution. Instruction latency is the number of cycles after which the data is available for
another operation. Forwarding reduces the latency by one cycle for operations that depend on
floating-point data. Table 21-16 lists the throughput and latency for all VFP11 instructions.

Table 21-16 Throughput and latency cycle counts for VFP11 instructions

                                            Single-precision                Double-precision
Instructions                                Throughput  Latency             Throughput  Latency
FABS, FNEG, FCVT, FCPY                      1           4                   1           4
FCMP, FCMPE, FCMPZ, FCMPEZ                  1           4                   1           4
FSITO, FUITO, FTOSI, FTOUI, FTOUIZ, FTOSIZ  1           8                   1           8
FADD, FSUB                                  1           8                   1           8
FMUL, FNMUL                                 1           8                   2           9
FMAC, FNMAC, FMSC, FNMSC                    1           8                   2           9
FDIV, FSQRT                                 15          19                  29          33
FLD [a]                                     1           4                   1           4
FST [a]                                     1           System-dependent    1           System-dependent
FLDM [a]                                    X [b]       X + 3 [b]           X [b]       X + 3 [b]
FSTM [a]                                    X [b]       System-dependent    X [b]       System-dependent
FMSTAT                                      1           2                   -           -
FMSR/FMSRR [c]                              1           4                   -           -
FMDHR/FMDHC/FMDRR [c]                       -           -                   1           4
FMRS/FMRRS [c]                              1           2                   -           -
FMRDH/FMRDL/FMRRD [c]                       -           -                   1           2
FMXR [d]                                    1           4                   -           -
FMRX [d]                                    1           2                   -           -

a. The cycle count for a load instruction is based on load data that is cached and available to the ARM11 processor from the
cache. The cycle count for a store instruction is based on store data that is written to the cache and/or write buffer immediately.
When the data is not cached or the write buffer is unavailable, the number of cycles also depends on the memory subsystem.
b. The number of cycles represented by X is (N/2) if N is even or (N/2 + 1) if N is odd.
c. FMDRR and FMRRD transfer one double-precision data per transfer. FMSRR and FMRRS transfer two single-precision data
per transfer.
d. FMXR and FMRX are serializing instructions. The latency depends on the register transferred and the current activity in the
VFP11 coprocessor when the instruction is issued.

ARM DDI 0301H
ID012310
Copyright © 2004-2009 ARM Limited. All rights reserved.
Non-Confidential, Unrestricted Access
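
The throughput and latency definitions and the X formula in footnote b can be turned into a small calculation. The Python sketch below is only an illustration and not part of the manual. The function names (transfer_cycles, fldm_timing, earliest_next_issue) are hypothetical. It assumes that N is the transfer count that footnote b refers to (this section does not define N), that N/2 means integer division, and that the latencies in Table 21-16 do not yet include the one-cycle forwarding reduction. It ignores hazards, serializing instructions, and memory system effects, so it is no substitute for a cycle-accurate model.

```python
def transfer_cycles(n):
    """X from footnote b: N/2 if N is even, N/2 + 1 if N is odd."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: N/2 is integer division, so an odd N rounds up.
    return n // 2 if n % 2 == 0 else n // 2 + 1


def fldm_timing(n):
    """(throughput, latency) for FLDM as listed in Table 21-16: X and X + 3."""
    x = transfer_cycles(n)
    return x, x + 3


def earliest_next_issue(issue_cycle, throughput, latency,
                        dependent=False, forwarded=False):
    """Earliest cycle in which a following instruction can begin execution.

    Simplified reading of the definitions in this section:
    - an independent instruction waits only for the throughput;
    - a dependent instruction also waits until the result is available,
      one cycle sooner when the result is forwarded (assumption: the
      table latency is the value before that reduction).
    """
    if not dependent:
        return issue_cycle + throughput
    effective_latency = latency - 1 if forwarded else latency
    return issue_cycle + max(throughput, effective_latency)


if __name__ == "__main__":
    # X for an even and an odd N.
    print(transfer_cycles(4), transfer_cycles(5))
    # FLDM throughput and latency for N = 5.
    print(fldm_timing(5))
    # Single-precision FADD (throughput 1, latency 8) followed by a
    # dependent instruction, without and with forwarding.
    print(earliest_next_issue(0, 1, 8, dependent=True))
    print(earliest_next_issue(0, 1, 8, dependent=True, forwarded=True))
    # Single-precision FDIV (throughput 15, latency 19) followed by an
    # independent instruction.
    print(earliest_next_issue(0, 15, 19))
```

Passing the throughput and latency as arguments, instead of hard-coding the table, keeps the sketch usable for any row of Table 21-16 and for either precision.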
