Optimization Methods for Large Scale Combinatorial Problems and Bijectivity Constrained Image Deformations

Abstract: This thesis treats two separate but connected themes. The connection lies in optimization, which is the method of choice for most of the problems that arise. The first theme of the thesis is image segmentation. This is usually defined as the task of distinguishing objects from background in unseen images. This visual grouping process is typically based on low-level cues such as intensity, homogeneity or image contours. Popular approaches include thresholding techniques, edge-based methods and region-based methods. Regardless of the method, the difficulty lies in formulating and describing the perception of what constitutes foreground and background in an arbitrary image. Furthermore, such a grouping is highly context-dependent. Certain image regions may be labeled differently depending on the task at hand: are we looking for people, buildings or trees? If one also allows for more labels than only foreground and background, the problem becomes harder still and requires a much higher level of scene understanding. Once a formulation of the problem has been established and properly stated, the question of how to solve it efficiently still remains. The complexity of this task and the size of most natural images typically lead to very large and difficult optimization problems. These are the issues we attempt to address in this thesis. We are interested in how to efficiently find visually relevant image partitions as well as how prior information can be incorporated into the segmentation process. The second theme of this thesis concerns non-linear deformations of images and their applications. Functions that map $\mathbb{R}^2$ onto itself are widely used in computer vision, medical imaging and computer graphics. What is common to all three fields is that mappings are used to model deformations occurring in natural images. Because such deformations are highly complex, they are nearly impossible to characterize.
A reasonable and widely accepted assumption, or approximation, is that the overall structure of the depicted objects remains intact after deformation, so folding or tearing of the images should never occur. Under these premises there must exist a dense mapping that is both one-to-one and onto; that is, the deformations must be bijective. This assumption is not entirely correct since, for instance, self-occlusion cannot be described by bijective mappings. There exists an abundance of methods for parameterizing non-linear deformations. This part of the thesis concerns conditions for bijectivity of the thin-plate spline mapping, perhaps the most commonly used method of describing non-linear deformations, and its applications in computer vision.