ZW01a-21 Hardware Acceleration for AI Processing

ECE FYP
January 19, 2022 4:10 pm

Mr. Zheng,

  1. What is the bandwidth of the input buffer in your design?
  2. This is a dataflow design; how is handshaking done between each sub-processing block?
  3. What does your design do when the ReLU-to-RAM bus is busy?
CHENG, Yih
January 19, 2022 4:44 pm
Reply to  ECE FYP
  1. I am not sure, but it should depend on the block RAMs of the FPGA. In addition, the main bottleneck should be on the DRAM side rather than the BRAM side.
  2. To synchronize data between the sub-processing blocks in the current design, the downstream block waits for the previous block to finish all calculations and write to BRAM, and then it reads the results. There are obvious optimizations that can be made here, and they are still being worked on.
  3. As of now, the design waits until all data have finished writing to DRAM, then continues to the next set of data/images, as pipelining isn't implemented yet. The future plan is to pipeline these sub-blocks so that all resources are working on every clock cycle; however, this may trade off additional BRAM.
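The write-then-read synchronization described in point 2 can be illustrated with a minimal software model. This is only a sketch, not the actual RTL: the stage functions and buffer are illustrative stand-ins (a real BRAM handshake would use done/valid signals), and the toy `conv_stage`/`relu_stage` operations are assumptions, not the project's kernels.

```python
def conv_stage(pixels):
    """Toy stand-in for a convolution sub-block (doubles each value)."""
    return [p * 2 for p in pixels]

def relu_stage(pixels):
    """Toy stand-in for the ReLU sub-block (clamps negatives to zero)."""
    return [max(0, p) for p in pixels]

def run_sequential(image, stages):
    """Run the stages back-to-back: each stage reads the shared buffer
    (standing in for BRAM) only after the previous stage has completely
    written it, mirroring the non-pipelined scheme described above."""
    buffer = image                  # shared "BRAM" between stages
    for stage in stages:
        buffer = stage(buffer)      # full write completes before next read
    return buffer

result = run_sequential([-3, 1, 4, -1], [conv_stage, relu_stage])
# → [0, 2, 8, 0]
```

The pipelined version mentioned in point 3 would instead overlap the stages across consecutive images (e.g. with double-buffered BRAM), which is where the extra BRAM cost comes from.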
YIP, Kam Wai
January 19, 2022 4:07 pm

How many images per second can it infer? Can it perform real-time video processing?

CHENG, Yih
January 19, 2022 4:17 pm
Reply to  YIP, Kam Wai

Hello, I haven't tested how many images per second it can infer, as I am still working on some optimizations within the hardware design. As for real-time video processing, I am not sure whether it is possible, but I may try it out. Currently, the resource allocation is based on 20 images from the CIFAR-10 dataset. Thank you!!