Design Issues in VLSI Implementation of Image Processing Hardware Accelerators - Methodology and Implementation

University dissertation from Department of Electroscience, Lund University

Abstract: With the increasing capacity in today's hardware system design enabled by technology scaling, image processing algorithms of substantially greater complexity can be implemented on a single chip with real-time performance. Combined with the demand for low power consumption or higher resolution seen in many applications such as mobile devices and HDTV, new design methodologies and hardware architectures are constantly called for to bridge the gap between designer productivity and what the technology can offer. This thesis addresses several issues commonly encountered in the implementation of real-time image processing systems. Two implementations are presented, each focusing on different design issues in hardware design for image processing systems. In the first part, a real-time video surveillance system is presented by combining five papers. The segmentation unit is part of a real-time automated video surveillance system developed at the department, aimed at tracking people in an indoor environment. Alternative segmentation algorithms are elaborated, and various modifications are made to the selected segmentation algorithm to improve potential hardware efficiency. To address the memory bandwidth issue, which is identified as the bottleneck of the segmentation unit, memory bandwidth reduction schemes based on pixel locality and wordlength reduction are combined, resulting in a memory bandwidth reduction of over 70%. Together with the morphology, labeling and tracking units developed by two other Ph.D. students, the whole surveillance system is prototyped on a Xilinx Virtex-II Pro VP30 FPGA, achieving real-time performance at 25 fps with a resolution of 320 × 240. In the second part, two papers are extended to discuss issues in controller design for the implementation of a control-intensive algorithm.
To avoid the tedious and error-prone procedure of hand-coding FSMs in VHDL, a controller synthesis tool is modified to automate the controller design flow from a C-like control algorithm specification to a controller implementation in VHDL. To address memory bandwidth as well as power consumption, a three-level memory hierarchy is implemented, reducing off-chip memory bandwidth from N² per clock cycle to only 1 per pixel operation. Furthermore, a potential power consumption reduction of over 2.5 times can be obtained with the architecture. Together with a controller synthesized by the developed tool, a real-time image convolution system is implemented on a Xilinx Virtex-E FPGA platform.
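
The bandwidth figure quoted for the convolution system can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the architecture from the thesis: it assumes N is the width of a square kernel, and it assumes the three levels are off-chip memory, on-chip line buffers, and a register window. The function names convolve_naive and convolve_buffered are hypothetical. The kernel is applied without flipping (a sliding sum of products), which does not affect the read counts.

```python
def convolve_naive(image, kernel):
    """Fetch every window pixel from 'off-chip' memory: N*N reads per output."""
    n = len(kernel)
    h, w = len(image), len(image[0])
    reads = 0
    out = []
    for r in range(h - n + 1):
        row = []
        for c in range(w - n + 1):
            acc = 0
            for i in range(n):
                for j in range(n):
                    acc += image[r + i][c + j] * kernel[i][j]
                    reads += 1  # one off-chip access per window pixel
            row.append(acc)
        out.append(row)
    return out, reads


def convolve_buffered(image, kernel):
    """Stream the image once; reuse pixels through line buffers and a window."""
    n = len(kernel)
    h, w = len(image), len(image[0])
    # Assumed level 2: N-1 on-chip line buffers holding the previous rows.
    line_buffers = [[0] * w for _ in range(n - 1)]
    # Assumed level 1: N x N register window feeding the multipliers.
    window = [[0] * n for _ in range(n)]
    reads = 0
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            pixel = image[r][c]  # assumed level 3: the only off-chip access
            reads += 1
            # Column of N pixels at position c: N-1 older rows plus the new one.
            column = [line_buffers[k][c] for k in range(n - 1)] + [pixel]
            # Age the line buffers at this column and store the new pixel.
            for k in range(n - 2):
                line_buffers[k][c] = line_buffers[k + 1][c]
            if n > 1:
                line_buffers[n - 2][c] = pixel
            # Shift the window one column to the left and append the new column.
            for i in range(n):
                window[i] = window[i][1:] + [column[i]]
            # The window is valid once N rows and N columns have been seen.
            if r >= n - 1 and c >= n - 1:
                row.append(sum(window[i][j] * kernel[i][j]
                               for i in range(n) for j in range(n)))
        if r >= n - 1:
            out.append(row)
    return out, reads


if __name__ == "__main__":
    image = [[r * 5 + c for c in range(5)] for r in range(4)]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    _, naive_reads = convolve_naive(image, kernel)
    _, buffered_reads = convolve_buffered(image, kernel)
    print(naive_reads, buffered_reads)
```

In this model the buffered version reads each input pixel from the off-chip array exactly once, while the naive version performs N² reads for every output pixel; both produce the same output values.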