Protein Mixture Inference as Hitting Set Variants and Linear Algebra Problems

University dissertation from Chalmers University of Technology

Abstract: This work is dedicated to the problems of protein inference and quantification in bottom-up proteomics, in particular in shotgun proteomics. We adopt the classical approach of representing the inference problem as a set cover, where each protein is understood as the set of its observations: peptide masses or sequences. However, we seek a concise enumeration of all possible mixtures rather than a single optimal mixture. Such an enumeration gives insight into how likely each protein is to be in the correct mixture. In general, the corresponding Set Cover instances are not very hard unless one admits that there were experimental errors. We therefore argue that the hardest part is first removing all possible errors. We formulate the corresponding computational problem and study its complexity and its performance in practice. The protein quantification problem is modeled in terms of linear systems. We advocate the use of shared peptides, although such data are known to make the analysis more difficult and error-prone. We study how severe error propagation can be when shared peptides are used. We conclude with a method for adjusting incorrect observations, provided that there are relatively few of them.
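The set-cover view of inference, with proteins as sets of observed peptides, can be illustrated with a small sketch. It assumes that "all possible mixtures" means all inclusion-minimal sets of proteins covering the observed peptides, and that the fraction of such covers containing a protein is a rough proxy for how likely that protein is to be present. The function names, the toy data, and this frequency heuristic are hypothetical illustrations, not the dissertation's algorithm, and the enumeration is brute force, so it only suits tiny instances.

```python
from itertools import combinations

# Hypothetical sketch: proteins are sets of observed peptides, and a "mixture"
# is assumed to be an inclusion-minimal set of proteins covering all
# observed peptides. Brute force, so only suitable for tiny instances.


def minimal_covers(proteins: dict[str, set[str]], observed: set[str]) -> list[frozenset[str]]:
    """Return every inclusion-minimal set of proteins whose peptides cover `observed`."""
    names = sorted(proteins)
    covers: list[frozenset[str]] = []
    # Try candidate mixtures in order of increasing size, so any cover that is
    # a subset of the current candidate has already been recorded.
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            chosen = frozenset(combo)
            if any(c <= chosen for c in covers):
                continue  # contains a smaller cover, hence not minimal
            covered = set().union(*(proteins[p] for p in combo))
            if observed <= covered:
                covers.append(chosen)
    return covers


def presence_frequency(covers: list[frozenset[str]]) -> dict[str, float]:
    """Fraction of minimal covers containing each protein (a hypothetical likelihood proxy)."""
    counts: dict[str, int] = {}
    for cover in covers:
        for protein in cover:
            counts[protein] = counts.get(protein, 0) + 1
    return {p: n / len(covers) for p, n in counts.items()}


if __name__ == "__main__":
    # Toy data: protein -> peptides it can produce (made up for illustration).
    proteins = {
        "A": {"p1", "p2"},
        "B": {"p2", "p3"},
        "C": {"p3"},
        "D": {"p1"},
    }
    observed = {"p1", "p2", "p3"}
    covers = minimal_covers(proteins, observed)
    print(sorted(sorted(c) for c in covers))  # [['A', 'B'], ['A', 'C'], ['B', 'D']]
    print(presence_frequency(covers))
```

In this toy example proteins A and B appear in two of the three minimal covers and C and D in one each. In this sketch, an observed peptide that no protein explains (an experimental error, in the abstract's terms) leaves the list of covers empty, which mirrors why errors have to be removed before the cover instances can be enumerated.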
