Feature-Based Image Processing for Rendering, Compression, and Visual Search

University dissertation from Stockholm : KTH Royal Institute of Technology

Abstract: Visual communication, vivid, meaningful, and creative, offers a way to express information visually. Communication media, through images, graphs, and videos, convey informative color and shape to human perception. But when we look closely, we wonder: are we merely passive receivers? Or can we actively select what we would like? Can our eyes only sense visual images? Or can we enjoy a comprehensive, immersive experience of the real world? To discover these wonders, we have to explore the essentials of visual communication and what lies under its wraps.

The work described in this dissertation develops techniques of visual communication, including rendering, compression, and visual search. We leave conventional pixel-by-pixel image processing behind to explore the opportunities of sparse feature-based image processing. Thus, in this dissertation, a new objective is proposed: to seek a methodology that improves the performance of visual communication by using the geometric information carried by image features. To motivate it, we investigate two systems of visual communication, namely free-viewpoint coding and rendering, and mobile visual search. The first system is based on the delivery and presentation of multi-view videos. We demonstrate how to use image features for efficient video coding and high-quality virtual view rendering. To further underline the importance of image features, we discuss the second system, the mobile visual search system, which is based only on the transmission of image features. We illustrate how to achieve reliable identification by using sparse image features.

The system of free-viewpoint coding and rendering encodes and delivers the video content to the end user and allows a virtual viewpoint to be chosen interactively and rendered in real time. We propose a content-adaptive coding and rendering method that separates the dynamic and static video content items and applies content-adaptive coding and rendering to each of them.
The content-adaptive scheme comprises the extraction of static and dynamic content, the video coding engines, and a synthesis unit for virtual view rendering. We address the problem of using image features for rate-distortion optimal video coding and high-quality geometry model-based rendering. For the video coding engine, we study a feature-based motion compensation scheme and an optimal rate allocation model. For the free-viewpoint rendering component, we study a hypothesis-driven rendering approach based on 3D model hypotheses.

For the second system, mobile visual search, we propose a geometry-based search, namely mobile 3D visual search. The end-to-end scheme uses a client-server model for visual communication. The client extracts and encodes the features of the query. The server holds the feature database derived from the multi-view imagery, as well as the feature matching engine. We address the problem of rate-constrained identification by using multi-view image features. On the client side, we propose a rate-constrained feature coding method to efficiently encode the query features. On the server side, we propose a double hierarchy that structures the database for indexing its features. Moreover, we develop an algorithm that accomplishes 3D geometry-based matching and ranking by jointly utilizing 3D geometric information and 2D texture information.
