I am not sure, but it should depend on the FPGA's block RAMs. In addition, the main bottleneck will likely be on the DRAM side rather than the BRAM side.
For synchronizing data between the sub-processing blocks, in the current design each block waits for the previous block to finish all of its calculations and write the results to BRAM before it reads them. There are obvious optimizations to be made here, and they are still being worked on.
As of now, the design waits until all data have finished writing to DRAM before continuing to the next set of data/images, as pipelining isn't implemented yet. The future plan is to pipeline these sub-blocks so that all resources are working on every clock cycle; however, this may come at the cost of additional BRAM.
YIP, Kam Wai
January 19, 2022 4:07 pm
How many images per second can it infer? Can it perform real-time video processing?
Hello, I haven’t tested how many images per second it can infer, as I am still working on some optimizations within the hardware design. As for real-time video processing, I am not sure whether it is possible, but I may try it out. Currently, the resource allocation is based on 20 images from the CIFAR-10 dataset. Thank you!