Prediction, Design and Determination of Protein Structures

University dissertation from the Department of Biochemistry and Structural Biology

Abstract: The three-dimensional structure of a protein is encoded in its amino acid sequence. Modern structure prediction algorithms make it possible to predict the structure of small proteins from sequence information alone. We used the Fold-and-Dock algorithm, which is part of the Rosetta macromolecular modeling suite, for de novo structure prediction of coiled-coil proteins. Members of this protein family consist of alpha-helices that wind around each other to assemble into symmetric complexes. Although the sequences of different coiled-coils follow a similar general pattern, the number of helices in a complex ranges from two to five. Remarkably, minor modifications of the sequence can change the oligomeric state of a coiled-coil. We tested different approaches to predict the oligomeric state of homomeric coiled-coils by comparing the energies of computational models of several alternative complexes. Comparing the free energies of structural models of different sizes is highly challenging. Our results show that an accurate comparison of different oligomers must take into account the free energy of helix formation. We were able to predict the lowest-free-energy oligomer for up to 23 of the 33 coiled-coils tested. Additionally, we found that parallel dimeric coiled-coils frequently show significant backbone asymmetries. To accurately predict the structures of this sub-class, we introduced a new version of Fold-and-Dock that allows the prediction of asymmetric complexes. Subsequently, we tested whether models of coiled-coils generated by de novo structure prediction are accurate enough to solve X-ray structures by molecular replacement. To this end, we implemented the program CCsolve, which combines existing crystallographic software for fully automated phasing from de novo models, model building and structure refinement, optimized for homomeric coiled-coils.
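The oligomeric-state comparison described in this paragraph can be sketched as a selection over candidate models. This is a minimal illustration under stated assumptions, not the Rosetta energy function or the thesis's actual protocol. Each candidate oligomer (dimer to pentamer) is represented by a hypothetical `OligomerModel` holding a total model energy and a count of helical residues per chain. Energies are normalized per chain so that complexes of different sizes can be compared, and a hypothetical per-residue parameter `helix_cost_per_residue` stands in for the free energy of forming a helix.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OligomerModel:
    """Hypothetical summary of one computational model of a candidate complex."""
    n_chains: int          # oligomeric state, e.g. 2 (dimer) to 5 (pentamer)
    model_energy: float    # total energy of the assembled model (arbitrary units)
    helical_residues: int  # number of helical residues per chain in the model


def corrected_energy_per_chain(m: OligomerModel, helix_cost_per_residue: float) -> float:
    """Per-chain energy plus an assumed linear helix-formation free energy term."""
    return m.model_energy / m.n_chains + helix_cost_per_residue * m.helical_residues


def predict_oligomeric_state(models: Sequence[OligomerModel],
                             helix_cost_per_residue: float) -> int:
    """Return the oligomeric state whose corrected per-chain energy is lowest."""
    if not models:
        raise ValueError("need at least one candidate model")
    best = min(models, key=lambda m: corrected_energy_per_chain(m, helix_cost_per_residue))
    return best.n_chains
```

For example, with `models = [OligomerModel(2, -200.0, 30), OligomerModel(3, -330.0, 40)]`, the trimer wins when the helix term is zero, whereas `predict_oligomeric_state(models, 2.0)` selects the dimer because the trimer model pays for more helical residues per chain.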
In our benchmark set of 24 coiled-coil structures, only two structures failed; the average difference between the previously reported Rfree values and those obtained by de novo phasing with CCsolve was 0.01. The successfully solved structures had data resolutions of up to 2.5 Å and a C-alpha r.m.s.d. of up to 3.3 Å between the initial model and the crystal structure. Improved force fields for structure prediction have made it possible to find sequences that fold into new protein structures. We developed a general method to design new repeat proteins, which can serve as binding scaffolds for developing new biosensors or inhibitors. Methods currently exist to engineer binding proteins for biochemical or medicinal applications, such as antibody design or sequence-based design of repeat proteins. However, these methods do not allow the shape of a binder to be adjusted to the target structure. Using state-of-the-art protein design methods implemented in Rosetta, we designed leucine-rich repeat (LRR) proteins with a geometry tailored to the specific application in question. The method exploits the variety of LRRs of known structure: a single self-compatible repeat is identified that can be redesigned to form a structure with a predefined geometry. LRR proteins form curved, elongated structures with a significant helical twist. As a proof of principle, we designed an LRR protein that displays high curvature and no helical twist. The resulting proteins can be expressed as monomers with terminal capping repeats. When expressed without caps, two monomers can self-assemble into a planar ring structure that has not been observed in nature. To be used in ‘real-world’ applications, designed proteins must exhibit good biophysical properties. Experimental techniques such as directed evolution make it possible to optimize certain features of a protein by screening many different protein variants.
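The C-alpha r.m.s.d. quoted for the CCsolve benchmark measures how far a de novo model deviates from the crystal structure after optimal superposition. A common way to compute it is the Kabsch algorithm, sketched below with NumPy. The sketch assumes that model and reference C-alpha coordinates have already been matched residue by residue; the function name `ca_rmsd_after_superposition` is illustrative and not part of CCsolve or Rosetta.

```python
import numpy as np


def ca_rmsd_after_superposition(model: np.ndarray, reference: np.ndarray) -> float:
    """C-alpha RMSD after optimal rigid-body superposition (Kabsch algorithm).

    model, reference: (N, 3) arrays of matched C-alpha coordinates in Å.
    """
    if model.shape != reference.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with matched atoms")
    # Remove translation by centering both coordinate sets on their centroids.
    p = model - model.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered model onto the centered reference and compute the RMSD.
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The determinant check keeps the SVD from returning a reflection, which would otherwise give an artificially low RMSD for mirror-image coordinate sets.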
Currently, there is no established method for screening large numbers of proteins for their biophysical properties in a high-throughput manner. We developed a fluorescence-based assay that allows a fast comparison of proteins with different biophysical properties. The method monitors the endogenous stress response in E. coli. By comparing the signal from cell cultures expressing point mutants of a test protein with different thermostabilities, we found a good correlation between Tm and the expression of the stress-induced chaperone DnaK. We used a plasmid system that can harbor a gene for overexpression together with multiple reporter genes. This setup should enable the parallel detection of multiple stress-induced proteins upon overexpression of the protein to be optimized.
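The correlation between Tm and the DnaK reporter signal can be quantified with a Pearson correlation coefficient, as in the minimal sketch below. Normalizing reporter fluorescence by culture optical density (OD600) in the hypothetical helper `normalized_signal` is an assumption for illustration, not a step stated in the abstract, and no measured values are included here.

```python
import math
from typing import Sequence


def normalized_signal(fluorescence: float, od600: float) -> float:
    """Reporter fluorescence per unit cell density (assumed normalization)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return fluorescence / od600


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/n factor, which cancels in r.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

In use, `xs` would hold the Tm of each point mutant and `ys` the corresponding normalized reporter signals. A value of |r| close to 1 indicates that the reporter tracks thermostability closely; with only a handful of mutants, the estimate is sensitive to individual outliers.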
