Chapter 3: DSP48E1 Design Considerations
Connecting DSP48E1 Slices across Columns
Using the cascade paths to implement adders significantly reduces power consumption
and improves speed. The maximum number of cascaded slices in a path is limited only by
the total number of DSP48E1 slices in one DSP column of the device.
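The following Python sketch is a cycle-level behavioral model of an adder cascade, not Xilinx code. Each hypothetical slice adds one operand to the partial sum arriving on its cascade input (PCIN) and stores the result in its P register. It assumes exactly one P register per slice and no other pipeline stages, so operand i of each sum must reach slice i with a delay of i cycles to line up with the partial sum. The function name `adder_cascade` is illustrative only.

```python
def adder_cascade(vectors):
    """Model a chain of K cascaded adders.

    vectors[n] is the tuple of K operands that must be summed together.
    Returns one sum per input vector, in order, as produced by the last slice.
    """
    k = len(vectors[0])
    p = [0] * k  # P register of each slice, index 0 is the bottom of the chain
    outputs = []
    # The last sum leaves the chain k - 1 cycles after the last vector enters.
    for cycle in range(len(vectors) + k - 1):
        new_p = [0] * k
        for i in range(k):
            # Input alignment: slice i sees operand i of vector n at cycle n + i.
            n = cycle - i
            x = vectors[n][i] if 0 <= n < len(vectors) else 0
            pcin = p[i - 1] if i > 0 else 0  # partial sum from the slice below
            new_p[i] = pcin + x
        p = new_p  # all P registers update on the same clock edge
        if cycle >= k - 1:
            outputs.append(p[k - 1])
    return outputs


print(adder_cascade([(1, 2, 3), (4, 5, 6)]))
```

With three slices, each sum appears at the top of the chain two cycles after its first operand enters, which is why the operands feeding higher slices must be delayed.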
The height of the DSP column can differ among Virtex-5, Virtex-6, and 7 series devices
and should be considered when porting designs between these device families. A cascade
can span columns: take the P bus output from the top DSP48E1 slice of one DSP column and
route it through fabric pipeline registers to the C port of the bottom DSP48E1 slice of the
adjacent DSP column. The input operands must also be aligned (delayed to match the added
fabric pipeline registers) to span multiple DSP columns.
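A short Python sketch of the bookkeeping involved in spanning columns. It assumes a chain that starts at the bottom of a column, one P register (one cycle) per slice, and a hypothetical `fabric_regs` count that includes every extra register on the P-to-C route at each column crossing (including any C input register, if used). The column height is left as a parameter because it varies by device; the values below are illustrative, not datasheet figures, and the function names are hypothetical.

```python
def column_crossings(chain_length, column_height):
    """Number of jumps to an adjacent DSP column for a chain of cascaded slices."""
    if chain_length < 1 or column_height < 1:
        raise ValueError("chain_length and column_height must be positive")
    return (chain_length - 1) // column_height


def operand_delays(chain_length, column_height, fabric_regs):
    """Cycles of input delay each slice's operand needs to meet its partial sum.

    Slice i is i stages up the chain (one cycle per P register) and sits
    above i // column_height column crossings, each adding fabric_regs cycles.
    """
    return [i + fabric_regs * (i // column_height) for i in range(chain_length)]


# Hypothetical example: a 10-slice chain in columns 4 slices tall,
# with 2 fabric pipeline registers at each crossing.
print(column_crossings(10, 4))
print(operand_delays(10, 4, 2))
```

The jumps in the delay list (from 3 to 6, and from 9 to 12) are the extra alignment delays the operands need once the chain leaves a column.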
Time Multiplexing the DSP48E1 Slice
The high-speed math elements in the DSP48E1 slice enable designers to use time
multiplexing in their DSP designs. Time multiplexing is the process of implementing more
than one function within a single DSP48E1 slice, with each function using the slice at a
different point in time. Time multiplexing is suited to designs whose sample rates are low
compared to the maximum clock frequency of the DSP48E1 slice. The number of functions (N)
that can be implemented in a single DSP48E1 slice is given by Equation 3-4.
These time-multiplexed DSP designs have optional pipelining that permits aggregate
multichannel sample rates of up to 500 million samples per second. Implementing a
time-multiplexed design using the DSP48E1 slice results in reduced resource utilization
and reduced power.
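Rearranging Equation 3-4 gives N ≤ maximum frequency / channel frequency, so the largest usable N is the integer floor of that ratio. The Python sketch below assumes each function needs the slice for one clock cycle per sample; the function names and the 400 MHz clock are hypothetical, illustrative values, not device specifications. Rates are kept as integers in hertz to avoid floating-point rounding.

```python
def max_functions(max_clock_hz, channel_rate_hz):
    """Largest N satisfying N * channel_rate_hz <= max_clock_hz (Equation 3-4)."""
    if channel_rate_hz <= 0 or max_clock_hz <= 0:
        raise ValueError("rates must be positive")
    return max_clock_hz // channel_rate_hz


def aggregate_rate(n, channel_rate_hz):
    """Total samples per second processed by one time-multiplexed slice."""
    return n * channel_rate_hz


# Hypothetical example: a 400 MHz slice clock and 48 MSPS channels.
n = max_functions(400_000_000, 48_000_000)
print(n, aggregate_rate(n, 48_000_000))
```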
The DSP48E1 slice contains the basic elements of classic FIR filters: a multiplier followed
by an adder, delay or pipeline registers, and the ability to cascade an input stream (B bus)
and an output stream (P bus) without exiting to the general slice fabric.
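The Python sketch below shows how a multiplier, an adder, and a cascaded P register per slice combine into an FIR filter. It models the transposed form, in which every slice sees the same input sample and partial sums travel along the P cascade; the systolic form, which also cascades the B input, is not modeled here. The function name `transposed_fir` is hypothetical, and Python's unbounded integers ignore the finite accumulator width and overflow behavior of real hardware.

```python
def transposed_fir(samples, coeffs):
    """Behavioral transposed-form FIR: y[n] = sum_k coeffs[k] * samples[n - k]."""
    taps = len(coeffs)
    p = [0] * taps  # P register of each hypothetical slice; p[0] is the output slice
    out = []
    for x in samples:
        # Each slice multiplies the shared input by its coefficient and adds the
        # partial sum cascaded from the next slice. The right-hand side is fully
        # evaluated from the old register values before p is rebound, modeling
        # all P registers updating on the same clock edge.
        p = [x * coeffs[k] + (p[k + 1] if k + 1 < taps else 0) for k in range(taps)]
        out.append(p[0])
    return out


print(transposed_fir([1, 2, 3, 4], [1, 1, 1]))
```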
A multichannel filter can be viewed as a time-multiplexed, single-channel filter. In a
typical multichannel filtering scenario, multiple input channels are filtered using a
separate digital filter for each channel. Due to the high performance of the DSP48E1 slice
within the 7 series devices, a single digital filter can instead filter all the channels. For
example, with eight input channels, the single filter is clocked at 8x the channel sample
rate. This implementation uses 1/8th of the FPGA resources required to implement each
channel separately.
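A behavioral Python sketch of one filter engine shared across channels. It assumes the channel samples arrive interleaved in time (channel 0, channel 1, ..., channel C-1, then channel 0 again), which matches clocking one filter at C times the channel rate. The engine keeps separate delay-line state per channel; in hardware that state would live in registers or memory. The function name `multichannel_fir` is hypothetical.

```python
from collections import deque


def multichannel_fir(interleaved, coeffs, num_channels):
    """Filter time-interleaved channels with one shared multiply-accumulate engine."""
    taps = len(coeffs)
    # One delay line per channel; appendleft on a full deque drops the oldest sample.
    history = [deque([0] * taps, maxlen=taps) for _ in range(num_channels)]
    out = []
    for i, x in enumerate(interleaved):
        ch = i % num_channels  # which channel owns this time slot
        line = history[ch]
        line.appendleft(x)
        out.append(sum(c * v for c, v in zip(coeffs, line)))
    return out


# Two channels, interleaved: channel 0 is 1, 2, 3 and channel 1 is 10, 20, 30.
print(multichannel_fir([1, 10, 2, 20, 3, 30], [1, 1], 2))
```

The output stays interleaved in the same channel order, so each output slot belongs to the channel whose sample occupied that slot.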
Equation 3-4:
$$N \times \text{channel frequency} \le \text{maximum frequency of the DSP48E1 slice}$$
Miscellaneous Notes and Suggestions
• Small multiplies (for example, 4 x 4 multiplies) and small bit width adders and
counters should be implemented using the interconnect logic LUTs and carry chain. If
you have a large number of small add operations and/or counters, take advantage of
the SIMD mode and implement the operations in the DSP48E1 slice. When the input
registers are also folded into the DSP48E1 slice for SIMD mode functions, area and power
are reduced by a factor of 2 compared to using interconnect logic.
• Always sign-extend the input operands when implementing functions with smaller bit
widths. For lower fabric power, push the operands into the MSBs and tie the LSBs to
ground (GND).
• When cascading DSP48E1 slices, match the number of pipeline stages (pipestages) on
the different signal paths.
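The Python sketch below models two of the notes above at the bit level. First, SIMD addition: the 48-bit adder is split into independent lanes (the DSP48E1 supports four 12-bit or two 24-bit lanes), so a carry out of one lane does not reach the next. Second, placing a narrow operand on a wider port, either sign-extended or pushed into the MSBs with the LSBs grounded. All function names are hypothetical, and the code models arithmetic only, not the slice's attributes or ports.

```python
ACC_BITS = 48  # width of the DSP48E1 adder/P output


def pack(lanes, lane_bits):
    """Pack small unsigned values into one word, lane 0 in the LSBs."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return word


def unpack(word, lane_bits, num_lanes):
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(num_lanes)]


def simd_add(a, b, lane_bits=12):
    """Add lane by lane; the carry out of each lane is discarded, not propagated."""
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(ACC_BITS // lane_bits):
        shift = i * lane_bits
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result


def sign_extend_to_port(value, from_bits, port_bits):
    """Two's-complement bit pattern of a from_bits operand on a port_bits-wide port."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign  # reinterpret the top bit as the sign
    return signed & ((1 << port_bits) - 1)


def msb_justify(value, from_bits, port_bits):
    """Place the operand in the MSBs with zero (GND) LSBs.

    The port then carries value * 2**(port_bits - from_bits), so the result
    must be scaled back by the same power of two.
    """
    return (value & ((1 << from_bits) - 1)) << (port_bits - from_bits)


a = pack([0xFFF, 1, 2, 3], 12)
b = pack([1, 1, 1, 1], 12)
print(unpack(simd_add(a, b), 12, 4))                   # lanes stay independent
print(unpack((a + b) & ((1 << ACC_BITS) - 1), 12, 4))  # plain add: carry leaks into lane 1
```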
www.xilinx.com
7 Series DSP48E1 User Guide
UG479 (v1.10) March 27, 2018