Computer Vision for Automated Traffic Safety Assessment: A Machine Learning Approach

Abstract: Traffic safety is a complex and important research area with the potential to save many lives. Two key problems are considered: gathering reliable and detailed road user statistics that can be used to estimate the safety of a traffic environment, and taking advantage of surveillance infrastructure to guide and assist vehicles, primarily autonomous ones, in real time. A conceptual Traffic Surveillance Vision Pipeline (TSVP) is introduced that has the potential to solve both problems. It consists of five steps: calibration, object detection, pixel-to-world coordinate conversion, tracking and track analysis. Research is then performed to experiment with and improve upon computer vision models and methods in the different parts of the TSVP.

Paper I introduces a fast and efficient method for identifying empty and occupied parking spaces using machine learning. This could be used to guide autonomous vehicles to the nearest empty parking space.

Paper II introduces an object detector design that takes advantage of the properties of surveillance videos by combining a compact representation of movement with images, primarily improving the detection of small objects.

Paper III introduces a software framework that implements most of the TSVP using established computer vision methods, including a CNN object detector. The work is designed to facilitate collaboration between computer vision and traffic researchers. The software is available as open source.

Paper IV introduces a camera calibration method for a Trinocular Linear Camera Array (TLCA), a device that can capture more geometric data than a traditional monocular or stereo camera. The calibration method takes advantage of the unique properties of the TLCA while also being practical in traffic surveillance.

Paper V introduces a tracking method in pixel coordinates that uses instance segmentations rather than axis-aligned bounding boxes. It is designed for high execution speed and contains a novel strategy for computing appearance features only sparsely in time.

Paper VI introduces a novel system for converting pixel coordinate detections into world coordinates through a CNN trained on that particular scene. Multi-class and multi-view setups are supported. It is demonstrated how the system can be combined with the method from Paper V to produce world coordinate tracks.

Paper VII introduces a complete system for converting a traffic video into world coordinate tracks, including a novel CNN for converting pixel coordinate positions into world coordinates without needing to be trained for that particular scene. Tracking is performed in world coordinates through a spatially bound Kalman filter.
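
As a rough illustration of the pixel-to-world conversion and track analysis steps of the TSVP, the sketch below maps a pixel-coordinate track onto a flat ground plane and computes its mean speed. It is not the method of Papers VI or VII, which use CNNs for the conversion; a fixed 3x3 planar homography is swapped in here as a simpler, well-known stand-in. The function names, the example matrix `H` (0.05 world units per pixel) and the frame rate are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def pixel_to_world(H: Matrix3, pixel: Point) -> Point:
    """Map a pixel (u, v) to ground-plane coordinates with a 3x3 homography H.

    Assumption: the road surface is planar, so a single homography suffices.
    """
    u, v = pixel
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to get Euclidean world coordinates.
    return (x / w, y / w)


def track_to_world(H: Matrix3, pixel_track: Sequence[Point]) -> List[Point]:
    """Convert every position of a pixel-coordinate track to world coordinates."""
    return [pixel_to_world(H, p) for p in pixel_track]


def mean_speed(world_track: Sequence[Point], fps: float) -> float:
    """Mean speed in world units per second for a track with one point per frame."""
    if len(world_track) < 2:
        return 0.0
    path_length = sum(math.dist(a, b) for a, b in zip(world_track, world_track[1:]))
    return path_length * fps / (len(world_track) - 1)


# Hypothetical calibration: a pure scaling of 0.05 world units per pixel.
H = [[0.05, 0.0, 0.0],
     [0.0, 0.05, 0.0],
     [0.0, 0.0, 1.0]]

pixel_track = [(100.0, 200.0), (120.0, 200.0), (140.0, 200.0)]
world_track = track_to_world(H, pixel_track)
speed = mean_speed(world_track, fps=25.0)
```

In practice the homography would come from the calibration step rather than being written by hand, and speed is only one of many statistics that track analysis could derive from world-coordinate tracks.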
