Online Learning for Robot Vision

University dissertation from Linköping : Linköping University Electronic Press

Abstract: In tele-operated robotics applications, the primary information channel from the robot to its human operator is a video stream. For autonomous robotic systems however, a much larger selection of sensors is employed, although the most relevant information for the operation of the robot is still available in a single video stream. The issue lies in autonomously interpreting the visual data and extracting the relevant information, something humans and animals perform strikingly well. On the other hand, humans have great diculty expressing what they are actually looking for on a low level, suitable for direct implementation on a machine. For instance objects tend to be already detected when the visual information reaches the conscious mind, with almost no clues remaining regarding how the object was identied in the rst place. This became apparent already when Seymour Papert gathered a group of summer workers to solve the computer vision problem 48 years ago [35].Articial learning systems can overcome this gap between the level of human visual reasoning and low-level machine vision processing. If a human teacher can provide examples of what to be extracted and if the learning system is able to extract the gist of these examples, the gap is bridged. There are however some special demands on a learning system for it to perform successfully in a visual context. First, low level visual input is often of high dimensionality such that the learning system needs to handle large inputs. Second, visual information is often ambiguous such that the learning system needs to be able to handle multi modal outputs, i.e. multiple hypotheses. Typically, the relations to be learned  are non-linear and there is an advantage if data can be processed at video rate, even after presenting many examples to the learning system. In general, there seems to be a lack of such methods.This thesis presents systems for learning perception-action mappings for robotic systems with visual input. A range of problems are discussed, such as vision based autonomous driving, inverse kinematics of a robotic manipulator and controlling a dynamical system. Operational systems demonstrating solutions to these problems are presented. Two dierent approaches for providing training data are explored, learning from demonstration (supervised learning) and explorative learning (self-supervised learning). A novel learning method fullling the stated demands is presented. The method, qHebb, is based on associative Hebbian learning on data in channel representation. Properties of the method are demonstrated on a vision-based autonomously driving vehicle, where the system learns to directly map low-level image features to control signals. After an initial training period, the system seamlessly continues autonomously. In a quantitative evaluation, the proposed online learning method performed comparably with state of the art batch learning methods.

  CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)