Embedded high-resolution stereo-vision of high frame-rate and low latency through FPGA-acceleration

University dissertation from Västerås : Mälardalen University

Abstract: Autonomous agents act on information from their surrounding environment. In the array of sensors available, the image sensor is perhaps the most versatile, allowing for detection of colour, size, shape, and depth. For the latter, in a dynamic environment and assuming no a priori knowledge, stereo vision is a commonly adopted technique. How to interpret images and extract relevant information is referred to as computer vision. Computer vision algorithms, and stereo-vision algorithms specifically, are complex and computationally expensive even for a single stereo pair, with results that are, in terms of accuracy, qualitatively difficult to compare. Adding to the challenge are a continuous stream of images at a high frame rate and the race towards ever-increasing image resolutions. In the context of autonomous agents, considerations regarding real-time requirements, embedded/resource-limited processing platforms, power consumption, and physical size further add up to an unarguably challenging problem.
This thesis aims to achieve embedded high-resolution stereo vision of high frame rate and low latency by approaching the problem from two different angles, hardware and algorithmic development, in a symbiotic relationship. The first contributions of the thesis are the GIMME and GIMME2 embedded vision platforms, which offer hardware-accelerated processing through FPGAs and specifically target stereo vision, unlike the COTS systems available at the time. The second contribution, toward stereo-vision algorithms, is twofold. Firstly, the problem of scalability and the associated disparity range is addressed by proposing a segment-based stereo algorithm. In segment space, matching is independent of image scale; similarly, disparity range is measured in terms of segments, so relatively few hypotheses are needed to cover the entire range of the scene.
Secondly, more in line with conventional stereo correspondence for FPGAs, the Census Transform (CT) has been identified as a recurring cost metric. This thesis proposes an optimisation of the CT through a Genetic Algorithm (GA): the Genetic Algorithm Census Transform (GACT). The GACT shows promising results on benchmark datasets compared to established CT methods, while being resource efficient.
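
As an illustration of the census transform as a matching cost, the following is a minimal sketch of the plain, dense 3x3 CT with a Hamming-distance cost. It is not the GACT and not the thesis implementation. The function names, the "neighbour darker than centre" bit convention, and the winner-takes-all disparity search are assumptions made for this example only.

```python
def census_transform(img, radius=1):
    """Plain census transform on a 2D list of intensities.

    Each pixel gets a bit signature with one bit per neighbour in the
    (2*radius+1)^2 window, set when the neighbour is darker than the centre.
    Border pixels are left at 0 (an assumption made for brevity).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            centre = img[y][x]
            sig = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre is not compared with itself
                    bit = 1 if img[y + dy][x + dx] < centre else 0
                    sig = (sig << 1) | bit
            out[y][x] = sig
    return out


def hamming(a, b):
    """Matching cost: number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def best_disparity(left_ct, right_ct, y, x, max_disp):
    """Winner-takes-all search: the pixel at x in the left image is compared
    with x - d in the right image for d = 0..max_disp; lowest cost wins."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d < 0:
            break  # candidate falls outside the right image
        cost = hamming(left_ct[y][x], right_ct[y][x - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Because the signature depends only on intensity ordering, and the cost reduces to an XOR followed by a bit count, the metric maps naturally onto FPGA logic, which is consistent with its recurring use noted above.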