FPGA Video AI deployment – From platform creation to AI deployment

In this blog post, we are going to demonstrate how we can leverage a Xilinx FPGA to quickly deploy a simple video application using AI.

The tutorial is split into two parts. First, we will create a custom hardware platform and a Linux environment that can support a Deep Learning Processing Unit (DPU) instance. The idea here is to deploy a pre-trained model and run it on the hardware (HW). Note: We are not going to cover the data science side, that is, training and optimizing our neural network (NN).

The second part will cover how to create an application in Vitis using a DPU HW accelerator, the deployment to the target board, and the testing phase.

The system and HW requirements to follow this tutorial are as follows:

Host requirements

OS: Ubuntu 20.04.2 LTS

Vivado/Vitis: 2021.1

Petalinux: 2021.1

Target requirements

Board: ZCU104

USB camera

Ethernet Connection

DisplayPort-compatible display

Our goal in the first part is to build the base project and Linux infrastructure so we can use it in the next phase to prepare and deploy the application. Note: Most of these steps come from a Xilinx Tutorial called Vitis-Platform-Creation.

Step 1 - Create a Vivado platform compatible with the DPU for later integration.

First, we need to create a Vivado project. We will source the environment and then open Vivado. We are going to use the GUI flow so we can see what goes on under the hood instead of just using a pre-built script for the ZCU104. We are also going to create a directory to hold our entire project:

source <Vitis_Install_Directory>/settings64.sh
cd ~
mkdir zcu104_custom_platform_2021
cd zcu104_custom_platform_2021
vivado &

Create a new project

Once Vivado is started we need to create a new project:

    • Select File->Project->New, Click Next.
    • In Project Name dialog set Project name to zcu104_custom_platform. 
    • In location go to your new directory (zcu104_custom_platform_2021) Click Next.
    • Enable Project is an extensible Vitis platform. Click Next.
    • Select Boards tab and then select Zynq UltraScale+ ZCU104 Evaluation Board. Click Next.
    • Click Finish.

We now have a project created for the zcu104 board.

Create a Block Design

In this step, we are going to create a platform that will be able to support the integration of a DPU block later on, in Vitis. We will also be able to connect other accelerators to it if needed.

    • In Project Manager, under IP INTEGRATOR, select Create Block Design.
    • (Optional) Change the design name to system.
    • Click OK.

Now add an MPSoC.

    • Add MPSoC IP and run block automation to configure it.
    • Run it with Apply Board Preset selected.

The MPSoC has now been configured to use the board preset for the ZCU104, which includes DDR configuration, input clock, reset GPIO, etc. Note that in case you are using a custom board, you would have to provide the configurations specific to that board.

Design customization

Now we need to create a design that will be compatible with the Vitis compiler later on. Historically, I have mostly been doing RTL designs, but the flow now is to create a generic platform in Vivado and then add custom-built accelerators (reusable across projects!) written in RTL, HLS, or OpenCL later on in the software flow as needed. This generates a binary file that can be loaded into the FPGA. The process uses the V++ linker, which combines the accelerators with a target platform, in our case the generic ZCU104.

Configure the MPSOC interface

Enable AXI HPM0 LPD to control the AXI Interrupt Controller

In the block diagram, double-click the Zynq UltraScale+ MPSoC block.
Select PS-PL Configuration > PS-PL interfaces > Master interface.
Enable the AXI HPM0 LPD option.
Expand the arrow before AXI HPM0 LPD. Check the AXI HPM0 LPD Data width settings and keep it as default 32.
Disable AXI HPM0 FPD and AXI HPM1 FPD
Click OK to finish the configuration.

Note:

We use AXI HPM0 LPD mainly for control purposes, so it only reads and writes 32-bit control registers. If the interface were wider than 32 bits, the AXI Interconnect or SmartConnect would have to do AXI bus width conversion using PL logic. This would cost logic resources and introduce unnecessary latency.

We reserve AXI HPM0 FPD and AXI HPM1 FPD for kernel usage. Disabling them in the block diagram prevents accidental auto-connection. We can still export the unused AXI interfaces in Platform Setup, whether or not they are visible in the block diagram.

Reset and clock creation

Add the clocking wizard block to generate three clocks:

Right-click Diagram view and select Add IP.
Search for and add a Clocking Wizard from the IP Search dialog.
Double-click the clk_wiz_0 IP block to open the Re-Customize IP dialog box.
Click the Output Clocks tab.
Enable clk_out1 through clk_out3 in the Output Clock column. Set the Requested Output Freq as follows:

clk_out1 to 100 MHz.

clk_out2 to 200 MHz.

clk_out3 to 400 MHz.

At the bottom of the dialog box set the Reset Type to Active Low.

Processor System Reset:

Right-click Diagram view and select Add IP.
Search for and add a Processor System Reset from the IP Search dialog
Rename the reset block to proc_sys_reset_1 so that it's easy to understand the relationship between reset modules and the clock signals.
Select the proc_sys_reset_1 block, type Ctrl-C and Ctrl-V to replicate two modules. They are named as proc_sys_reset_2 and proc_sys_reset_3 by default.

Connect Clocks and Resets:

Click Run Connection Automation, which will open a dialog that will help connect the proc_sys_reset blocks to the clocking wizard clock outputs.
Enable All Automation on the left side of the Run Connection Automation dialog box.
Select clk_in1 on clk_wiz_0, and set the Clock Source to /zynq_ultra_ps_e_0/pl_clk0.
For each proc_sys_reset instance, select the slowest_sync_clk, and set the Clock Source as follows:
- proc_sys_reset_1 with /clk_wiz_0/clk_out1
- proc_sys_reset_2 with /clk_wiz_0/clk_out2
- proc_sys_reset_3 with /clk_wiz_0/clk_out3
On each proc_sys_reset instance, select the ext_reset_in, set Board Part Interface to Custom, and set the Select Manual Source to /zynq_ultra_ps_e_0/pl_resetn0.
Make sure all checkboxes are enabled and click OK to close the dialog and create the connections.
Connect all the dcm_locked signals on each proc_sys_reset instance to the locked signal on clk_wiz_0.

Add the AXI Interrupt Controller

As with the clocks, the interrupt configuration of the platform will be exported in the PFM.IRQ property. The interrupts will then be available to the V++ linker, which can connect the kernel modules to the platform if needed.

Right-click Diagram view and select Add IP, search and add AXI Interrupt Controller IP. It's instantiated as axi_intc_0.
Double click the AXI Interrupt Controller block, change Interrupt Output Connection to Single so that it can be connected to the PS IRQ interface.
Click OK.

Connect AXI Interfaces of axi_intc_0 to AXI HPM0 LPD of PS

Click Run Connection Automation
Review the settings (axi_intc_0 is enabled, s_axi is to be connected to /zynq_ultra_ps_e_0/M_AXI_HPM0_LPD)
Set Clock Source for Slave Interface and Clock Source for Master Interface to /clk_wiz_0/clk_out2(200 MHz)
Click OK

If you regenerate the layout it will look like this

Step 2 - Platform Configuration

As an RTL designer, I have had to deal with Vivado IPI for the last couple of years, but rarely to define parameters for a downstream tool. Platform creation is mostly that: defining interfaces so that the Vitis linker can understand the design and connect the hardware accelerators accordingly.

Clock

Go to the Platform Setup tab.
If it's not opened yet, use menu Window -> Platform Setup to open it.
Click Clock tab.
Enable all clocks under clk_wiz_0: clk_out1, clk_out2, clk_out3
Change their ID to 0, 1 and 2
Set a default clock: click Is Default for clk_out2
After everything is set up, it should report Info: No problem with Clock interface.

Interrupt

Go to the Platform Setup tab
Go to the Interrupt tab
Enable intr under axi_intc_0

AXI Interfaces

Go to the AXI Port tab in Platform Setup
Under zynq_ultra_ps_e_0, enable M_AXI_HPM0_FPD and M_AXI_HPM1_FPD. Keep Memport at its default, M_AXI_GP, and leave sptag empty.

Note: The V++ linker will instantiate the AXI Interconnect automatically to connect between PS AXI Master interfaces and slave interfaces of acceleration kernels. One AXI Master interface will connect up to 16 kernels.

Under zynq_ultra_ps_e_0, multi-select all AXI slave interfaces: press Ctrl and click S_AXI_HPC0_FPD, S_AXI_HPC1_FPD, S_AXI_HP0_FPD, S_AXI_HP1_FPD, S_AXI_HP2_FPD, S_AXI_HP3_FPD.
Right-click the selections and select enable.

Change Memport of S_AXI_HPC0_FPD and S_AXI_HPC1_FPD to S_AXI_HP because we won't use any coherent features for these interfaces.
Type in simple sptag names for these interfaces so that they can be selected in the V++ configuration during the linking phase: HPC0, HPC1, HP0, HP1, HP2, HP3.

Under ps8_0_axi_periph, click M01_AXI, press Shift and click M07_AXI to multi-select master interfaces from M01_AXI to M07_AXI.
Right-click the selection and click on Enable.
Keep Memport at its default, M_AXI_GP, and leave sptag empty.

Enable EMULATION

Select the PS instance zynq_ultra_ps_e_0 in the block diagram
Check the Block Properties window.
Scroll down to SELECTED_SIM_MODEL property. Change it from RTL to TLM in order to use the TLM model.

Export Hardware XSA

Validate the block design (ignore the critical warning about /axi_intc_0/intr)

Create a top module wrapper for the block design

In the Source tab, right-click system.bd in Design Sources group
Select Create HDL Wrapper.
Select Let Vivado manage wrapper and auto-update.
Click OK to generate a wrapper for block design.

Generate pre-synth design

Select Generate Block Design from Flow Navigator
Set Synthesis Options to Global. This skips IP synthesis during generation.
Click Generate.

Export the platform

Click menu File -> Export -> Export Platform to launch the Export Hardware Platform wizard. This wizard can also be launched by the Export Platform button in Flow Navigator or Platform Setup window.
Click Next on the first information page.
Select Platform Type: Hardware and Hardware Emulation, click Next. If you skipped the emulation setup previously, select Hardware here.
Select Platform State: Pre-synthesis, click Next
Input Platform Properties and click Next. For example,
Name: zcu104_custom_platform
Vendor: Xilinx
Board: zcu104
Version: 0.0
Description: This platform provides high PS DDR bandwidth and three clocks: 100MHz, 200MHz, and 400MHz.
Fill in XSA file name: zcu104_custom_platform and keep the export directory as default.
Click Finish.
zcu104_custom_platform.xsa will be generated. The export path is reported in the Tcl console.

You now have a platform description ready for your project. As you might have noticed, we haven't synthesized or implemented anything yet. This will be done in the other steps.

If you want to speed up the process, here is a Tcl file that performs all the previous steps for you.

Note: You will need to convert it back to a Tcl file first.

Step 3 - Create a Petalinux project for our Hardware

Now that we have a hardware description of our project, we can create the Petalinux project which targets our HW configuration.

First, set up the PetaLinux environment

source <petaLinux_tool_install_dir>/settings.sh

Create a PetaLinux project named zcu104_custom_plnx and configure the HW with the XSA file we created before:

petalinux-create --type project --template zynqMP --name zcu104_custom_plnx
cd zcu104_custom_plnx
petalinux-config --get-hw-description=<vivado_design_dir>

The PetaLinux configuration menu will be launched. In this configuration window, set the project to use the ZCU104 device tree:

Select DTG Settings->MACHINE_NAME
Modify it to zcu104-revc.
Select OK -> Exit -> Exit -> Yes to close this window.

After this step, your directory hierarchy should look like this:

- zcu104_custom_platform # Vivado Project Directory

- zcu104_custom_plnx # PetaLinux Project Directory

Customize Root File System, Kernel, Device Tree and U-boot

Using your preferred editor, modify the user rootfs config file as follows

<your_petalinux_project_dir>/project-spec/meta-user/conf/user-rootfsconfig

Replace the content with the following packages

CONFIG_gpio-demo
CONFIG_peekpoke
CONFIG_xrt
CONFIG_dnf
CONFIG_e2fsprogs-resize2fs
CONFIG_parted
CONFIG_resize-part
CONFIG_packagegroup-petalinux-vitisai
CONFIG_packagegroup-petalinux-self-hosted
CONFIG_cmake
CONFIG_packagegroup-petalinux-vitisai-dev
CONFIG_xrt-dev
CONFIG_opencl-clhpp-dev
CONFIG_opencl-headers-dev
CONFIG_packagegroup-petalinux-opencv
CONFIG_packagegroup-petalinux-opencv-dev
CONFIG_mesa-megadriver
CONFIG_packagegroup-petalinux-x11
CONFIG_packagegroup-petalinux-v4lutils
CONFIG_packagegroup-petalinux-matchbox
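
If you would rather add the entries from the command line than edit the file by hand, a small shell helper can do it. This is only a minimal sketch: add_rootfs_packages is a hypothetical function name, not a PetaLinux command, and it appends missing entries rather than replacing the file, so whatever is already listed is kept.

```shell
# add_rootfs_packages <config file> <package>...
# Hypothetical helper (not part of PetaLinux): appends CONFIG_<package>
# to the given user-rootfsconfig file unless that exact line already exists.
add_rootfs_packages() {
    local cfg="$1"
    local pkg
    shift
    touch "$cfg"
    # Make sure the file ends with a newline before appending to it
    if [ -n "$(tail -c 1 "$cfg")" ]; then
        echo >> "$cfg"
    fi
    for pkg in "$@"; do
        # -x matches the whole line, -F treats the entry as a fixed string
        grep -qxF "CONFIG_${pkg}" "$cfg" || echo "CONFIG_${pkg}" >> "$cfg"
    done
}

# Example, run from the PetaLinux project directory (names without CONFIG_):
#   add_rootfs_packages project-spec/meta-user/conf/user-rootfsconfig xrt dnf parted
```

Running it twice with the same names leaves the file unchanged, which makes it safe to re-run after adding more packages to the list.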

Now we need to go into the config tool and enable the selected rootfs packages

petalinux-config -c rootfs

Select User Packages
Select all the rootfs packages listed above.

Enable OpenSSH and package management, and disable dropbear (optional)

In the RootFS configuration window, go to the root directory by selecting Exit once.
Go to Image Features.
Disable ssh-server-dropbear and enable ssh-server-openssh.
Enable the package-management and debug_tweaks options.
Click Exit.
Go to Filesystem Packages-> misc->packagegroup-core-ssh-dropbear and disable packagegroup-core-ssh-dropbear.
Go to the Filesystem Packages level by selecting Exit twice.
Go to console -> network -> openssh and enable openssh, openssh-sftp-server, openssh-sshd, openssh-scp.
Go to root level by selecting Exit four times.

(Optional) Add Git and Vim

Go to Filesystem Packages-> console-> utils-> git
Enable git and git-bash-completion
Go to Filesystem Packages-> console-> utils-> vim
Enable vim
Exit and save the changes.

Disable CPU IDLE in kernel config (Recommended during debugging).

petalinux-config -c kernel

Ensure the following items are TURNED OFF by entering 'n' in the [ ] menu selection:

CPU Power Management > CPU Idle > CPU idle PM support
CPU Power Management > CPU Frequency scaling > CPU Frequency scaling
Exit and Save.
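
To double-check that both options really ended up disabled, you can inspect the kernel .config file that the build produces. The sketch below assumes that the two menu entries correspond to the CONFIG_CPU_IDLE and CONFIG_CPU_FREQ symbols and that you pass in the path to the .config yourself, since its location depends on your build directory. check_kernel_opts_off is a hypothetical helper name.

```shell
# check_kernel_opts_off <kernel .config> <option>...
# Hypothetical helper: reports each option and returns non-zero if any of
# them is still set to y or m in the given kernel configuration file.
check_kernel_opts_off() {
    local cfg="$1"
    local opt
    local status=0
    shift
    for opt in "$@"; do
        # A disabled option is either absent or written as "# <opt> is not set"
        if grep -qE "^${opt}=(y|m)" "$cfg"; then
            echo "${opt} is still enabled"
            status=1
        else
            echo "${opt} is off"
        fi
    done
    return "$status"
}

# Example (the .config path is a placeholder you need to fill in):
#   check_kernel_opts_off <path_to_kernel_build>/.config CONFIG_CPU_IDLE CONFIG_CPU_FREQ
```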

Update the Device tree

Edit the following device tree file with some basic changes:

project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi

/include/ "system-conf.dtsi"
/ {
    chosen {
    	bootargs = "earlycon console=ttyPS0,115200 clk_ignore_unused root=/dev/mmcblk0p2 rw rootwait cma=512M";
    };
};

&sdhci1 {
      no-1-8-v;
      disable-wp;
};

Build PetaLinux Images

Now that the configuration is complete we need to build the project. This step can take a lot of time so you might want to grab a cup of coffee.

petalinux-build

Once the build process is complete you will also need to create the sysroot self-installer.

petalinux-build --sdk

Step 4 - Create the Vitis Platform

Now that we have a PetaLinux project ready and a HW description, it is time to create the Vitis platform project in which we will create the accelerated application and build the supporting hardware files.

First, let's create a directory that will hold the Vitis platform project. It should be located in your base project directory.

mkdir zcu104_custom_pkg
cd zcu104_custom_pkg
mkdir pfm

Your directory structure should look like this

- zcu104_custom_platform # Vivado Project Directory
- zcu104_custom_plnx     # PetaLinux Project Directory
- zcu104_custom_pkg      # Platform Packaging Directory
  - pfm                  # Platform Packaging Source

Now we need to install the sysroot inside our platform

cd <base dir>/zcu104_custom_plnx/images/linux
./sdk.sh -d ~/zcu104_custom_platform_2021/zcu104_custom_pkg/

Create Boot and sd_card directory

cd ~/zcu104_custom_platform_2021/zcu104_custom_pkg/pfm
mkdir boot
mkdir sd_dir

Prepare the boot components. Copy the following files to zcu104_custom_pkg/pfm/boot

zynqmp_fsbl.elf
pmufw.elf
bl31.elf
u-boot-dtb.elf: rename to u-boot.elf
system.dtb

Prepare the sd_dir directory. Copy the following files to zcu104_custom_pkg/pfm/sd_dir

boot.scr
system.dtb
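
The two copy steps above can be scripted. The following is a minimal sketch that assumes all of the listed files are taken from the images/linux directory of the PetaLinux project; stage_boot_files is a hypothetical helper name, and both directories are passed as arguments.

```shell
# stage_boot_files <petalinux images/linux dir> <pfm dir>
# Hypothetical helper: copies the boot components into <pfm>/boot
# (renaming u-boot-dtb.elf to u-boot.elf) and the SD card files into
# <pfm>/sd_dir. Stops at the first file that cannot be copied.
stage_boot_files() {
    local img="$1"
    local pfm="$2"
    local f
    mkdir -p "$pfm/boot" "$pfm/sd_dir" || return 1
    for f in zynqmp_fsbl.elf pmufw.elf bl31.elf system.dtb; do
        cp "$img/$f" "$pfm/boot/" || return 1
    done
    # The platform expects this file under the name u-boot.elf
    cp "$img/u-boot-dtb.elf" "$pfm/boot/u-boot.elf" || return 1
    for f in boot.scr system.dtb; do
        cp "$img/$f" "$pfm/sd_dir/" || return 1
    done
}

# Example, using the directory names from this tutorial:
#   stage_boot_files ~/zcu104_custom_platform_2021/zcu104_custom_plnx/images/linux \
#                    ~/zcu104_custom_platform_2021/zcu104_custom_pkg/pfm
```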

Create a Vitis Platform

Now that the project directory structure is complete it's time to create the project.

cd ~/zcu104_custom_platform_2021/zcu104_custom_pkg
source ~/Xilinx/Vitis/2021.1/settings64.sh

Important - At this point, the process differs from the example provided by Xilinx, since the new platform project wizard doesn’t work on my desktop. We are going to use a console flow to create the platform and then add it to the project. You should find a source example here.

xsct xsct_create_pfm.tcl
vitis &

Note: You will need to convert it back to a Tcl file first.

We will review the configuration in the next step.

Create a Vitis Accelerated Application

In Vitis:

File -> New -> Application Project

Add the newly created platform to the list

Select the platform and click next.

Now, let’s create a vector addition to test the workflow.

Name the application: vadd

Click Next

Select the xrt domain
Set sysroot path to : <full_pathname_to_zcu104_custom_pkg>/sysroots/cortexa72-cortexa53-xilinx-linux
Set Root Fs to : zcu104_custom_plnx/images/linux/rootfs.ext4
Set Kernel Image to : zcu104_custom_plnx/images/linux/Image
Click Next

Now Select the Vector addition example and click next

We will add the Platform project for review

Click File -> Import -> Eclipse workspace

Let's review the project

Double click on the platform.spr
Click on xrt
On the BIF file line: Click browse and select Generate BIF

On the left pane select the zcu104_custom platform and build the project.

In the Explorer window, double-click the vadd.prj file to open it, then change the Active Build configuration from Emulation-SW to Hardware.

Now, select the vadd_system and build the project. This could take a while since we recompile the hardware to add a vector accelerator. Once it is done, we have our first accelerated application! You can review the code to understand what the application will do.

The important part is located inside :

vadd_system/vadd/src/vadd.cpp

You will find the code required to load the Binary file with the accelerator in Linux and then pass data to it.

In vadd_system/vadd_kernels/src/krnl_vadd.cpp you will find the HLS source code of the accelerator being generated and integrated into our project.

SD card preparation

Now we need to flash an SD card to test our application on the target board.

Write the sd_card.img image file located in :

~/zcu104_custom_pkg/vadd_system/Hardware/package

using your favorite image writing tool (e.g. balenaEtcher on Windows)
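
On the Ubuntu host, dd is another option. The sketch below is deliberately cautious: write_sd_image is a hypothetical helper that only prints the dd command unless you pass run as the third argument, and /dev/sdX is a placeholder for your SD card device. Double-check the device name (for example with lsblk) before writing, because dd overwrites the target without asking.

```shell
# write_sd_image <image file> <block device> [run]
# Hypothetical helper: prints the dd command by default (dry run) and
# only executes it when the third argument is "run".
write_sd_image() {
    local img="$1"
    local dev="$2"
    local mode="${3:-dry}"
    if [ ! -f "$img" ]; then
        echo "image not found: $img" >&2
        return 1
    fi
    if [ "$mode" = "run" ]; then
        # conv=fsync flushes the data to the card before dd returns
        sudo dd if="$img" of="$dev" bs=4M conv=fsync status=progress
    else
        echo "sudo dd if=$img of=$dev bs=4M conv=fsync status=progress"
    fi
}

# Example (dry run first, then the real write once the device is confirmed):
#   write_sd_image sd_card.img /dev/sdX
#   write_sd_image sd_card.img /dev/sdX run
```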

Booting the board

Once the board is booted, connect to it using the serial port or an ssh connection as you prefer.

Go to the FAT32 partition

cd /mnt/sd-mmcblk0p1/

Execute the application by providing the new binary with the accelerator

./vadd binary_container_1.xclbin

If everything went as planned you should see:

INFO: Reading binary_container_1.xclbin

Loading: 'binary_container_1.xclbin'

TEST PASSED
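
If you want to script this check, for example to re-test after each rebuild, a small wrapper can look for the TEST PASSED line. This is only a sketch: check_vadd_output is a hypothetical helper, and it assumes the application prints TEST PASSED on a line of its own, as in the output above.

```shell
# check_vadd_output <command> [args...]
# Hypothetical helper: runs the given command, echoes its output, and
# succeeds only if one output line reads exactly "TEST PASSED".
check_vadd_output() {
    local out
    out="$("$@" 2>&1)"
    printf '%s\n' "$out"
    printf '%s\n' "$out" | grep -qx 'TEST PASSED'
}

# Example on the board, from /mnt/sd-mmcblk0p1:
#   check_vadd_output ./vadd binary_container_1.xclbin && echo "vadd OK"
```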

Congratulations, you have just completed the first part of the tutorial. You now have a working platform running PetaLinux with an example vector accelerator. In the next part, we are going to build on that platform to include a DPU (Deep Learning Processing Unit) and start running accelerated AI applications live on the board.
