Improving interpretation by orthogonal variation: Multivariate analysis of spectroscopic data

Abstract: The desire to use the tools and concepts of chemometrics when studying problems in the life sciences, especially biology and medicine, has prompted chemometricians to shift their focus away from their field's traditional emphasis on model predictivity and towards the more contemporary objective of optimizing information exchange via model interpretation. The complex data structures that are captured by modern advanced analytical instruments open up new possibilities for extracting information from complex data sets. This in turn imposes higher demands on the quality of the data and of the modeling techniques used. The introduction of the concept of orthogonal variation in the late 1990s led to a shift of focus within chemometrics; the information gained from analysis of orthogonal structures complements that obtained from the predictive structures that were the discipline's previous focus. Orthogonal projections to latent structures (OPLS), which was introduced in the early 2000s, refined this view by formalizing the model structure and the separation of orthogonal variation. Orthogonal variation stems from experimental/analytical issues such as time trends, process drift, storage, sample handling, and instrumental differences, or from inherent properties of the sample such as age, gender, genetics, and environmental influence. The usefulness and versatility of OPLS have been demonstrated in over 500 citations, mainly in the fields of metabolomics and transcriptomics but also in NIR, UV and FTIR spectroscopy. In all cases, the predictive precision of OPLS is identical to that of PLS, but OPLS is superior when it comes to the interpretation of both predictive and orthogonal variation. Thus, OPLS models the same data structures but provides increased scope for interpretation, making it more suitable for contemporary applications in the life sciences. This thesis discusses four different research projects, including analyses of NIR, FTIR and NMR spectroscopic data.
The discussion includes comparisons of OPLS and PLS models of complex datasets in which experimental variation conceals and confounds relevant information. The PLS and OPLS methods are discussed in detail. In addition, the thesis describes new OPLS-based methods developed to accommodate hyperspectral images for supervised modeling. Proper handling of orthogonal structures revealed the weaknesses in the analytical chains examined. In all of the studies described, the orthogonal structures were used both to validate the quality of the generated models and to gain new knowledge. These aspects are crucial for enhancing the information exchange from both past and future studies.
