Texas Instruments TMS320C6670 Data Manual page 19

Multicore fixed and floating-point system-on-chip

The C66x CPU improves on C674x double-precision multiply performance by adding an instruction that allows
one double-precision multiply per cycle and by reducing the number of delay slots from ten to four. Each C66x .M
unit can also perform one of the following floating-point operations each clock cycle: one, two, or four single-precision
multiplies, or a complex single-precision multiply.
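As a rough illustration of what a complex single-precision multiply involves, the portable C sketch below writes it out as four real multiplies, one subtraction, and one addition. The type `cplx_t` and the function `cplx_mul` are hypothetical names chosen for this example; they are not C66x intrinsics, and the sketch says nothing about how the .M unit packs operands into registers.

```c
/* Hypothetical single-precision complex type used only for illustration. */
typedef struct {
    float re; /* real part */
    float im; /* imaginary part */
} cplx_t;

/* Reference complex multiply: (a.re + j*a.im) * (b.re + j*b.im).
   Written out, it needs four real multiplies, one subtract and one add. */
cplx_t cplx_mul(cplx_t a, cplx_t b)
{
    cplx_t r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}
```

For example, multiplying 1 + 2j by 3 + 4j with this function gives -5 + 10j.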
The .L and .S units now support operands of up to 64 bits. This allows new versions of many of the arithmetic,
logical, and data-packing instructions to perform more parallel operations per cycle. Additional instructions
improve the performance of floating-point addition and subtraction, including the
ability to perform one double-precision addition or subtraction per cycle. Conversions between integer and
single-precision values can now be done on both the .L and .S units of the C66x. By taking advantage of the larger
operands, further instructions double the number of these conversions that can be done. The .L unit also
has additional logical AND and OR instructions, as well as instructions for 90-degree or 270-degree rotation of
complex numbers (up to two per cycle). Instructions have also been added for computing the conjugate
of a complex number.
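The rotation and conjugate operations mentioned above need no multiplies at all: each one is a swap and/or a sign change of the real and imaginary parts. The sketch below shows the arithmetic in portable C, taking rotation as counterclockwise (multiplication by j for 90 degrees, by -j for 270 degrees). The names `cplx_t`, `rot90`, `rot270`, and `conjugate` are hypothetical; the rotation direction, register packing, and any saturation behavior of the actual C66x instructions are assumptions to check against the instruction set reference.

```c
/* Hypothetical complex type used only for illustration. */
typedef struct {
    float re; /* real part */
    float im; /* imaginary part */
} cplx_t;

/* Rotate by 90 degrees counterclockwise: multiply by j. */
cplx_t rot90(cplx_t z)
{
    cplx_t r = { -z.im, z.re };
    return r;
}

/* Rotate by 270 degrees counterclockwise: multiply by -j. */
cplx_t rot270(cplx_t z)
{
    cplx_t r = { z.im, -z.re };
    return r;
}

/* Complex conjugate: negate the imaginary part. */
cplx_t conjugate(cplx_t z)
{
    cplx_t r = { z.re, -z.im };
    return r;
}
```

Applied to 1 + 2j, these give -2 + 1j, 2 - 1j, and 1 - 2j respectively.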
MFENCE is a new instruction introduced with the C66x DSP. It stalls the CPU
until all CPU-triggered memory transactions have completed, including:
Cache line fills
Writes from L1D to L2 or from the CorePac to MSMC and/or other system endpoints
Victim write backs
Block or global coherence operations
Cache mode changes
Outstanding XMC prefetch requests
This is useful as a simple mechanism for programs to wait for these requests to reach their endpoint. It also provides
ordering guarantees for writes arriving at a single endpoint via multiple paths, multiprocessor algorithms that
depend on ordering, and manual coherence operations.
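One way such a fence might be used in a multiprocessor handshake is sketched below: the producer writes a payload, waits for outstanding transactions, and only then raises a ready flag. Everything here is an assumption made for illustration: the `mailbox_t` type, the `mailbox_post` function, and the `WAIT_FOR_MEMORY` macro are hypothetical, and the mapping to an `_mfence()` intrinsic under a `_TMS320C6600` predefined macro should be checked against the compiler documentation. On other hosts the sketch substitutes a C11 fence, which orders accesses but does not wait for cache write-backs or coherence operations the way MFENCE is described to.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: a C66x toolchain exposes MFENCE as the _mfence() intrinsic.
   On any other host a C11 sequentially consistent fence stands in so the
   sketch compiles; it is NOT equivalent to MFENCE. */
#if defined(_TMS320C6600)
#include <c6x.h>
#define WAIT_FOR_MEMORY() _mfence()
#else
#define WAIT_FOR_MEMORY() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical mailbox placed in memory shared between cores. */
typedef struct {
    volatile uint32_t payload; /* data handed to the consumer */
    volatile uint32_t ready;   /* set to 1 once payload is valid */
} mailbox_t;

/* Publish a value: the payload write is fenced before the flag is raised,
   and the flag write is fenced before the function returns. */
void mailbox_post(mailbox_t *box, uint32_t value)
{
    box->payload = value;
    WAIT_FOR_MEMORY();
    box->ready = 1u;
    WAIT_FOR_MEMORY();
}
```

A real design would also have to account for cache coherence between cores (for example, an explicit write-back of the mailbox before the fence), which this sketch leaves out.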
For more details on the C66x CPU and its enhancements over the C64x+ and C674x architectures, see the following
documents (2.9 "Related Documentation from Texas Instruments" on page 66):
C66x CPU and Instruction Set Reference Guide
C66x DSP Cache User Guide
C66x CorePac User Guide

Copyright 2012 Texas Instruments Incorporated
SPRS689D, March 2012