Development and Application of Computational Models for Peptide-Protein Complexes

Abstract: Protein-protein interactions between a protein and a smaller protein fragment or a disordered segment of a protein are called peptide-protein interactions. Such interactions are commonplace in nature and vital for normal cell function in humans. For example, the oncoprotein Myc contains a large disordered region with several segments involved in peptide-protein interactions as part of transcription regulation, and it is mis-regulated in the vast majority of all human cancers. As such, understanding the structural details of peptide-protein interactions at the atomic level is necessary for understanding disease pathways and for facilitating targeted drug design. While experimental methods for structure determination such as X-ray crystallography and NMR can determine the structure of many peptide-protein complexes, these methods are time-consuming and costly. Additionally, the disordered nature of peptides, together with a binding affinity that is sometimes lower than in protein-protein binding, can lead to transient or weak (but still highly specific) interactions that are impossible to fully capture with experimental methods. This creates a need for computational methods as support and complement. Such methods have classically used statistical potentials or simple template search approaches, but as the number of structures deposited in the Protein Data Bank (PDB) grows, so does the potential for supervised machine learning. The papers in this thesis present the contributions of the author to the field of peptide-protein interaction complex prediction, mainly through the use of machine learning models. The first papers apply a Random Forest classifier to detect similarities between binding interfaces deposited in the PDB and a peptide-protein pair being investigated, in order to find the optimal templates for structure prediction.
In addition to producing predictions with good self-evaluation of performance, the development of the method also confirmed theories on the similarity of protein-protein, domain-domain, and peptide-protein interfaces. Two more methods for peptide-protein docking are presented in later papers. One utilizes graph convolutional neural networks to improve model selection from rigid-body docking methods by including MSA profile information as a feature. This work also led to the discovery that, while profile information such as position conservation does improve predictive performance (something also seen in the first papers), the most important features are the ones describing the structural details of the complex and the bonds between residues. The other uses a graph neural network as an additional scoring term to improve upon the already state-of-the-art local refinement method FlexPepDock, and is capable of refining even models generated by AlphaFold-Multimer. Finally, two manuscripts focus on the application of computational approaches for research into the interactions of human cMyc with TBP and PPP1R10, respectively. In the first of these papers, the template-based peptide-protein complex prediction methods developed in the earlier papers of the thesis are employed together with prior knowledge of the interaction to model the complex to a high degree of certainty not achievable by NMR alone. In the second of these papers, experimental data is used as a basis for computational modeling of the complex, and the modeled complex could act as a basis for further experiments characterizing the interaction.
