A study in integrating multiple biological data sources
Abstract: Life scientists often have to retrieve data from multiple biological data sources to solve their research problems. Although many data sources are available, they vary in content, data format, and access methods, which often vastly complicates the data retrieval process. The user must decide which data sources to access and in which order, how to retrieve the data and how to combine the results - in short, the task of retrieving data requires a great deal of effort and expertise on the part of the user.

Information integration systems aim to alleviate these problems by providing a uniform (or even integrated) interface to biological data sources. The information integration systems currently available for biological data sources use traditional integration approaches. However, biological data and data sources have unique properties which introduce new challenges, requiring development of new solutions and approaches.

This thesis is part of the BioTrifu project, which explores approaches to integrating multiple biological data sources. First, the thesis describes properties of biological data sources and existing systems that enable integrated access to them. Based on the study, requirements for systems integrating biological data sources are formulated and the challenges involved in developing such systems are discussed. Then, the thesis presents a query language and a high-level architecture for the BioTrifu system that meet these requirements. An approach to generating a query plan in the presence of alternative data sources and ways to integrate the data is then developed. Finally, the design and implementation of a prototype for the BioTrifu system are presented.