Algorithms and Methods for Robust Processing and Analysis of Mass Spectrometry Data

Abstract: Liquid chromatography-mass spectrometry (LC-MS) and mass spectrometry imaging (MSI) are two techniques that are routinely used to study proteins, peptides, and metabolites at a large scale. Thousands of biological compounds can be identified and quantified in a single experiment with LC-MS, but many studies fail to translate these data into a better understanding of disease biology. One of the primary reasons for this is low reproducibility, which in turn is partially due to inaccurate and/or inconsistent data processing. Protein biomarkers and signatures for various types of cancer are frequently discovered with LC-MS, but their behavior in independent cohorts is often inconsistent with that in the discovery cohort. Biomarker candidates must be thoroughly validated in independent cohorts, which makes the ability to share data across different laboratories crucial to the future success of the MS-based research fields. The emergence and growth of public repositories for MSI data is a step in the right direction. Still, many of those data sets remain incompatible with one another due to inaccurate or incompatible preprocessing strategies. Ensuring compatibility between data generated in different labs is therefore necessary to gain access to the full potential of MS-based research. In two of the studies that I present in this thesis, we used LC-MS to characterize lymph node metastases from individuals with melanoma. Furthermore, my thesis work has resulted in two novel preprocessing methods for MSI data sets. The first one is a peak detection method that achieves considerably higher sensitivity for faintly expressed compounds than existing methods, and the second one is an accurate, robust, and general approach to mass alignment. Both algorithms deliberately rely on centroid spectra, which makes them compatible with most shared data sets.
I believe that the improvements demonstrated by these methods can lead to higher reproducibility in the MS-based research fields and, ultimately, to a better understanding of disease processes.
