Xilinx DPU IP Product Manual

DPU for Convolutional Neural Network v1.2
DPU IP Product Guide
PG338 (v1.2) March 26, 2019

Summary of Contents for Xilinx DPU IP

  • Page 1 DPU for Convolutional Neural Network v1.2 DPU IP Product Guide PG338 (v1.2) March 26, 2019...
  • Page 2: Revision History

    Build the PetaLinux Project: Updated code. Build the Demo: Updated description. 03/05/2019, Version 1.1, Chapter 6: Example Design: Added chapter regarding the DPU targeted reference design. 02/28/2019, Version 1.0: Initial release. DPU IP Product Guide www.xilinx.com Send Feedback PG338 (v1.2) March 26, 2019...
  • Page 3: Table Of Contents

    Reference Clock Generation (25); Reset (27); Chapter 5: Development Flow (28); Customizing and Generating the Core in MPSoC (28); Chapter 6: Example Design (33)
  • Page 4 Introduction (33); Hardware Design Flow (36); Software Design Flow (39); Appendix A: Legal Notices (43); References (43); Please Read: Important Legal Notices (43)
  • Page 5: IP Facts

    Introduction: The Xilinx® Deep Learning Processor Unit (DPU) is a configurable engine dedicated to convolutional neural networks. The computing parallelism can be configured according to the selected device and application. DPU IP Facts Table, Core Specifics: Supported Device Family: Zynq®-7000 SoC and UltraScale+™ MPSoC Family; Supported User...
  • Page 6: Chapter 1: Overview

    VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, FPN, etc. The DPU IP can be integrated as a block in the programmable logic (PL) of the selected Zynq®-7000 SoC and Zynq UltraScale™+ MPSoC devices with direct connections to the processing system (PS). To use DPU, you should prepare the instructions and input image data in the specific memory address that DPU can access.
  • Page 7: Development Tools

    Use the Xilinx Vivado Design Suite to integrate the DPU into your own project. Vivado Design Suite 2018.2 or later is recommended. Earlier versions of Vivado can also be supported; for requests, contact your sales representative.
  • Page 8: Example System with DPU

    The figure below shows an example system block diagram with the Xilinx UltraScale+ MPSoC using a camera input. The DPU is integrated into the system through an AXI interconnect to perform deep learning inference tasks such as image classification, object detection, and semantic segmentation.
  • Page 9: Licensing And Ordering Information

    License. Information about this and other IP modules is available at the Xilinx Intellectual Property page. For information on pricing and availability of other Xilinx IP modules and tools, contact your local Xilinx sales representative.
  • Page 10: Chapter 2: Product Specification

    DNNDK compiler where substantial optimizations have been performed. To improve the efficiency, abundant on-chip memory in Xilinx® devices is used to buffer the intermediate data, input, and output data. The data is reused as much as possible to reduce the memory bandwidth.
  • Page 11: DSP with Enhanced Utilization (DPU_EU)

    Figure 7: Difference between DPU and DPU_EU. Port Descriptions: The DPU top-level interfaces are shown in the following figure. Figure 8: DPU_EU IP Port
  • Page 12 The data width is determined by the number of DPU cores. Notes: 1. If only input ports are needed, you can edit the ports in the block diagram and declare them at the port interface level.
  • Page 13: Register Space

    CPU through the S_AXI interface. Reg_dpu_reset The reg_dpu_reset register controls the resets of all DPU cores integrated in the DPU IP. The lower three bits of this register control the reset of up to three DPU cores respectively. All the reset signals are active-High.
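
The reset bit layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name `reset_mask` is hypothetical, and the mapping of bit n to DPU core n is an assumption based on the statement that the lower three bits control up to three cores respectively.

```python
MAX_DPU_CORES = 3  # the guide states up to three DPU cores per DPU IP


def reset_mask(cores):
    """Build a reg_dpu_reset value that asserts reset for the given cores.

    Assumption (not confirmed by the guide): bit n maps to DPU core n.
    Resets are active-High, so a set bit means "hold this core in reset".
    """
    mask = 0
    for core in cores:
        if not 0 <= core < MAX_DPU_CORES:
            raise ValueError(f"core index {core} is out of range")
        mask |= 1 << core
    return mask


if __name__ == "__main__":
    # Hold core 0 and core 2 in reset, leave core 1 running.
    print(bin(reset_mask([0, 2])))
```

Writing the value to the register would go through the S_AXI interface; that access path is platform specific and is not shown here.
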
  • Page 14 There are eight groups of DPU base addresses for each DPU core, for a total of 24 groups of DPU base addresses for up to three DPU cores. The details of reg_dpu_base_addr are shown in Table 6.
  • Page 15 The lower 8 bits in the register represent the upper 8 bits of the base address2 of DPU core1. Reg_dpu1_base_addr3_l (0x33C): The lower 32 bits of the base address3 of DPU core1.
  • Page 16 The lower 32 bits of the base address7 of DPU core2. Reg_dpu2_base_addr7_h (0x460): The lower 8 bits in the register represent the upper 8 bits of the base address7 of DPU core2.
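
The register descriptions above imply a 40-bit base address split across a 32-bit `_l` register and an `_h` register whose lower 8 bits hold the upper address bits. A minimal Python sketch of that split follows; the function names are hypothetical and the example address is made up for illustration.

```python
LOW_BITS = 32   # width of the *_l register value
HIGH_BITS = 8   # only the lower 8 bits of the *_h register are used


def split_base_addr(addr):
    """Split a 40-bit base address into (low, high) register values."""
    if not 0 <= addr < (1 << (LOW_BITS + HIGH_BITS)):
        raise ValueError("base address does not fit in 40 bits")
    low = addr & ((1 << LOW_BITS) - 1)
    high = (addr >> LOW_BITS) & ((1 << HIGH_BITS) - 1)
    return low, high


def join_base_addr(low, high):
    """Rebuild the 40-bit base address from the two register values."""
    return ((high & ((1 << HIGH_BITS) - 1)) << LOW_BITS) | (low & ((1 << LOW_BITS) - 1))


if __name__ == "__main__":
    low, high = split_base_addr(0x12_3456_7890)  # illustrative address only
    print(hex(low), hex(high))  # 0x34567890 0x12
```

With eight address groups per core and up to three cores, a driver would repeat this split for each of the 24 groups.
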
  • Page 17: Interrupts

    The data width of dpu_interrupt is determined by the number of DPU cores. When the DPU_NUM parameter is set to 2, the DPU IP is integrated with two DPU cores, and the dpu_interrupt signal is two bits wide. The lower bit represents the DPU core0 interrupt and the higher bit represents the DPU core1 interrupt.
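
As an illustration of the interrupt layout, the following Python sketch lists which cores have their interrupt bit set. The function name `pending_cores` is hypothetical, and extending the "bit n is core n" pattern to a third core is an assumption; the text above only spells out the two-core case.

```python
def pending_cores(dpu_interrupt, dpu_num):
    """Return the indices of DPU cores whose interrupt bit is set.

    dpu_interrupt is treated as a dpu_num-bit vector; bit 0 is core0 and
    bit 1 is core1, as described for DPU_NUM = 2. Bit 2 mapping to core2
    is assumed by analogy.
    """
    if not 1 <= dpu_num <= 3:
        raise ValueError("DPU_NUM must be between 1 and 3")
    if dpu_interrupt >> dpu_num:
        raise ValueError("interrupt value is wider than DPU_NUM bits")
    return [core for core in range(dpu_num) if (dpu_interrupt >> core) & 1]


if __name__ == "__main__":
    print(pending_cores(0b10, dpu_num=2))  # [1]
```
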
  • Page 18: Chapter 3: DPU Configuration

    Introduction: The DPU IP provides user-configurable parameters to optimize resource usage or to enable different features. You can select a configuration that balances DSP slice, LUT, block RAM, and UltraRAM utilization according to the programmable logic resources available. There is also an option to set the number of DPU cores that will be used.
  • Page 19: Configuration Options

    DSP cascade, DSP usage, and UltraRAM usage. These options make the DPU IP configurable in terms of DSP slice, LUT, block RAM, and UltraRAM utilization. Figure 10 shows the DPU configuration. Figure 10: DPU Configuration
  • Page 20 Up to three DPU cores can be included in one IP. Multiple DPU cores can be used to achieve higher performance, at the cost of more programmable logic resources. If the requirement is to integrate more than three cores, send the request to a Xilinx® sales representative.
  • Page 21 You can select whether DSP48E slices are used for the accumulation in the DPU convolution module. If the low DSP usage is selected, the DPU IP will use DSP slices for multiplication only in the convolution. In the high DSP usage mode, the DSP slice will be used for both multiplication and accumulation.
  • Page 22: DPU Performance on Different Devices

    In this section, the performance of several models is given for reference. The results, shown in Table 11, were measured on the Xilinx ZCU102 board with 3x B4096_EU cores at 333 MHz (DSP slices ran at 666 MHz) and DNNDK v2.08.
  • Page 23: I/O Bandwidth Requirements

    Pruned YOLO-V3- 512*256. Notes: 1. The pruned models were generated by the Xilinx pruning tool. I/O Bandwidth Requirements: The I/O bandwidth requirement differs between neural networks running in the DPU, and even between layers within one neural network. The I/O bandwidth requirements for some neural networks, averaged by layer, have been tested with one DPU core running at full speed.
  • Page 24: Chapter 4: Clocking And Resets

    Introduction: There are three clock domains in the DPU IP: the register, the data controller, and the computation unit. The three input clocks can be configured depending on the requirements. Therefore, the corresponding reset for each of the three input clocks must be configured correctly.
  • Page 25: Reference Clock Generation

    Data Controller Clock: The primary function of the data controller module is to schedule the data flow in the DPU IP. The data controller module works with m_axi_dpu_aclk. The data transfer between DPU and external memory happens in the data controller clock domain, so m_axi_dpu_aclk is also the AXI clock for the AXI_MM master interface in the DPU IP.
  • Page 26 Clock Wizard IP. When Matched Routing is enabled for two clocks that are both generated through a BUFGCE_DIV, the skew between the two clocks is significantly reduced. The related configuration is shown in Figure 15. Figure 15: Matched Routing in Clock Wizard
  • Page 27: Reset

    There are three input clocks for the DPU IP, each of which has a corresponding reset. You must guarantee that each clock and its reset are generated in the same synchronous clock domain. If the related clocks and resets are not matched, the DPU might not work properly.
  • Page 28: Chapter 5: Development Flow

    In the Vivado GUI, click Project Manager > IP Catalog. In the IP Catalog tab, right-click and select Add Repository (Figure 17), then select the location of the DPU IP. The DPU IP then appears in the IP Catalog page (Figure 18).
  • Page 29 Figure 18: DPU IP in Repository. Add DPU IP into Block Design: Search for the DPU IP in the block design interface and add it to the block design. The procedure is shown in Figure 19 and Figure 20. Figure 19: Search DPU IP...
  • Page 30 Configure DPU Parameters: You can configure the DPU IP as shown in Figure 21. The details about these parameters can be found in Chapter 3: DPU Configuration. Figure 21: Configure DPU. Connect DPU with a Processing System in the Xilinx SoC: No matter how many DPU cores are configured, there is only one slave interface in the DPU IP.
  • Page 31 Vivado, you can connect the DPU slave interface to any master interface in the PS and allocate any address for the DPU. The reference address assignments of the DPU with the DNNDK package are shown in Figure 23.
  • Page 32 You can use the Vivado SDK or PetaLinux to generate the BOOT.BIN file. For boot image creation using the Vivado SDK, refer to the Zynq UltraScale+ MPSoC Embedded Design Tutorial (UG1209). For PetaLinux, refer to the PetaLinux Tools Documentation Reference Guide (UG1144).
  • Page 33: Chapter 6: Example Design

    Introduction: The Xilinx® DPU targeted reference design (TRD) provides instructions on how to use DPU with a Xilinx SoC platform to build and run deep neural network applications. The TRD uses the Vivado® IP integrator flow for building the hardware design and Xilinx Yocto PetaLinux flow for software design.
  • Page 34 Xilinx tools: Vivado Design Suite 2018.2, PetaLinux 2018.2. Hardware peripherals: Ethernet, UART. Linux or Windows host system: serial terminal, network terminal.
  • Page 35 Design Files: Design files are in the following directory structure. Figure 26: Directory Structure. Note: DPU_IP is in the pl/srcs/dpu_ip/ directory.
  • Page 36: Hardware Design Flow

    This section describes how to create the DPU reference design project in the Xilinx Vivado Design Suite and generate the bit file. The parameters of DPU IP in the reference design are configured accordingly. Neither the DPU interrupt connections nor the address assignments for the DPU in the reference design should be modified.
  • Page 37 % vivado -source scripts/trd_prj.tcl

    Building the Hardware Design on Windows: 1. Select Start > All Programs > Xilinx Design Tools > Vivado 2018.2 > Vivado 2018.2. 2. On the Quick Start screen, click Tcl Console. 3. Type the following commands in the Tcl console:
    cd $TRD_HOME/pl
    source scripts/trd_prj.tcl...
  • Page 38 Figure 29: TRD Block Design. 4. In the GUI, click Generate Bitstream to generate the bit file, as shown in the following figure. Figure 30: Generate Bitstream
  • Page 39: Software Design Flow

    DPU Configuration: The version of the DPU IP integrated in the TRD is DPU_v1.3.0. The default parameters of the DPU in the reference design project are shown in the following figure. Figure 31: DPU Configuration Page. These parameters can be changed to suit different resource requirements. For more information about DPU and its parameters, refer to Chapter 3: DPU Configuration.
  • Page 40 % export TRD_HOME=<path/to/downloaded/zipfile>/zcu102-dpu-trd-2018-2

    Build the PetaLinux Project: Use the following commands to create the PetaLinux project:
    % cd $TRD_HOME/apu/dpu_petalinux_bsp
    % petalinux-create -t project -s xilinx-dpu-trd-zcu102-v2018.2.bsp
    % cd zcu102-dpu-trd-2018-2
    % petalinux-config --get-hw-description=$TRD_HOME/pl/pre-built --oldconfig
    % petalinux-build
    If the pre-built design is needed, use the --get-hw-description path below.
  • Page 41 The DPU prediction results are shown below, and the top-5 prediction probabilities of the image classification are printed. If the Top-0 prediction result is the same as the expected result, then the DPU is working properly.
  • Page 42 Figure 32: Running Results
  • Page 43: Appendix A: Legal Notices

    Xilinx’s limited warranty, please refer to Xilinx’s Terms of Sale which can be viewed at https://www.xilinx.com/legal.htm#tos; IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring fail-safe performance;...

This manual is also suitable for:

B512, B1024, B1152, B1600, B800, B3136 ...
