Deciphering sequence data: A multivariate approach

Abstract: This thesis focuses on the quantitative description of nucleic acids, proteins and peptides. The strategy was to use multivariate chemometric methods to improve the understanding of the complex structural codes of these biological molecules. Tools have been developed that enable quantitative modelling of biological molecules, i.e. models based on data that quantitatively describe their properties. The advantage of such models is that they allow complex features such as similarity, dissimilarity and potency to be interpreted in terms of chemical characteristics.

Through a multivariate physicochemical characterization of the building blocks of nucleic acids and proteins, i.e. nucleosides and amino acids, descriptive scales, so-called principal properties, have been developed. The scales describe the intrinsic properties of these building blocks. The multivariate characterization results in a multi-property matrix. A principal component analysis of this matrix gives a small number of latent variables, which are taken as the principal properties of the characterized molecules.

The principal property scales may be used for a wide range of purposes, such as detecting trends and groupings in large sequence data sets and analyzing quantitative relationships between structure and function. In statistical experimental design, the descriptors are well suited as design variables for selecting combinations of amino acids that span a wide range of properties.

The use of these principal property descriptors is demonstrated in the quantitative modelling of structure-activity relationships for various peptide series and DNA promoters, and in the quantitative modelling of transfer ribonucleic acid (tRNA) sequence data.
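As a rough illustration of the workflow the abstract describes (a multi-property matrix reduced by principal component analysis to a few latent variables, which then serve as sequence descriptors), a minimal Python sketch might look as follows. Everything specific here is an assumption made for illustration: the function names `principal_properties` and `encode_sequence` are hypothetical, the choice of three components and of autoscaling is arbitrary, and the descriptor matrix is filled with random placeholder numbers rather than measured physicochemical data. It is not the thesis's actual procedure or scale values.

```python
import numpy as np

def principal_properties(X, n_components=3):
    """PCA of a multi-property matrix (rows = building blocks, columns = properties).

    Returns the score matrix and the fraction of variance explained per component.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale: centre each property and scale it to unit variance (an assumption;
    # other scalings are possible).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA via singular value decomposition of the scaled matrix.
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def encode_sequence(seq, scale):
    """Describe a sequence by concatenating the scores of its residues."""
    return np.concatenate([scale[residue] for residue in seq])

# Placeholder data: 20 amino acids x 8 hypothetical properties.
# These are random numbers, NOT real measurements.
rng = np.random.default_rng(0)
residues = list("ACDEFGHIKLMNPQRSTVWY")
X = rng.normal(size=(len(residues), 8))

scores, explained = principal_properties(X, n_components=3)
scale = dict(zip(residues, scores))   # residue -> 3 latent-variable scores
x = encode_sequence("GAV", scale)     # tripeptide -> 9 descriptor values
```

With a real multi-property matrix in place of the placeholder, each row of `scores` would play the role of a residue's principal properties, and vectors such as `x` could be used as predictors in a structure-activity model or as design variables when selecting peptides.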
