Advancing bioinformatics methods for in-depth proteome analysis based on high-resolution mass spectrometry
Abstract: Mass spectrometry-based shotgun proteomics has become an essential technique for comprehensive studies of living systems. Because proteomes and the data they produce are inherently complex, bioinformatics plays a critical role in translating mass spectra into biological information and knowledge. As high-resolution mass analyzers become more widely available, computational strategies for processing shotgun proteomics data need to be adapted to exploit the advantages of modern instruments. This thesis presents five constituent papers that illustrate methodological advances in analyzing shotgun proteomics data generated by high-resolution mass spectrometry. Paper-I describes the DeMix workflow for protein identification, in which we break with an old paradigm of tandem mass spectrometry by turning unwanted co-fragmentation events into an advantage, namely natural multiplexing. DeMix simplifies the data processing procedure and significantly improves protein identification outcomes. Paper-III describes a label-free extension of the DeMix workflow, termed DeMix-Q, which uses the quantitative features of extracted ion chromatograms (XICs) to reliably propagate peptide identifications across LC-MS/MS experiments. DeMix-Q improves the reproducibility of peptide quantification by addressing the missing value problem caused by data-dependent acquisition of MS/MS spectra. Based on these results, the concept of quantification-centered proteomics is proposed. Within quantification-centered proteomics, Paper-V describes a flexible proteome summarization approach termed Diffacto, which uses the covariation of peptide abundances to improve the relative quantification of proteins. Diffacto offers automatic quality control that removes inconsistent and unreliable peptide-level quantitative data.
The combination of a weighted summarization method and efficient FDR estimation significantly enhances data utility in large-scale comparative proteomics. In Paper-II, an improved pI estimation method is introduced for a novel sample fractionation device based on isoelectric focusing. In Paper-IV and Paper-V, applications of de novo peptide sequencing are demonstrated for analyzing complex proteomes in the absence of reference databases.