Prediction, Design and Determination of Protein Structures

University dissertation from Department of Biochemistry and Structural Biology, Lund University

Abstract: Popular Abstract in English. Life on Earth has evolved countless forms, beautiful shapes and colors, as well as astonishingly complex systems. Remarkably, all living creatures consist of cells, which themselves are built from the same basic set of molecules, most importantly nucleic acids (RNA and DNA), lipids and proteins. The latter are large macromolecules that fulfill a vast number of tasks; they are best described as the workhorses of the cell. The best known are enzymes, which catalyze chemical reactions, and hormones, which send signals between distant parts of the body. Muscle contraction and reading the genetic code from DNA are also processes performed by specific proteins. The word protein describes a class of molecules that share a common basic chemical structure, yet each protein has a unique three-dimensional structure. This three-dimensional structure has been optimized over the course of evolution to suit the protein's specific task. Each protein can be described as a chain of small units called amino acids. All living cells share the same set of 20 amino acids, each of which has distinct properties. The chain folds into a defined, compact structure, which is encoded by the sequence of amino acids. How the amino acid sequence dictates the structure of a protein is a question that occupies many scientists. Sophisticated computer simulations have made it possible to demonstrate how the amino acids pack against each other, much like the pieces of a three-dimensional jigsaw puzzle. Today, simulations can predict the structure of small proteins when the amino acid sequence is known. In this work, we both used and extended a state-of-the-art simulation program, called Rosetta, to predict how certain proteins come together to form a protein complex. The number of individual proteins needed to form such a complex is also encoded in the amino acid sequence.
We succeeded in predicting this number in 70% of the cases we tested. In another project, we developed a way to use the predicted complexes to guide the determination of the three-dimensional structure of those proteins. Solving a protein structure is often a hugely laborious and complicated enterprise, which frequently fails. Having correctly predicted structures of protein complexes available makes this task significantly easier and faster. The method we developed might help other researchers to solve such structures faster in the future. We further used the Rosetta program to design proteins that do not exist in nature. Designing proteins in the computer is possible because the task is the inverse of predicting a structure: one defines a structure and asks which amino acid sequence would fold into it. Using computer simulations, we designed proteins from a specific class that is known to be well suited for binding other proteins. In this way, we obtained a generic scaffold that may be used in the future to develop medications or new diagnostic tools. The key feature of the method we developed is that new scaffolds can be designed with a shape tailored to the protein they are supposed to bind. In the fourth project, we focused on the stability of proteins. Novel proteins designed for applications need to be very stable. Stability can often be increased by changing only a few amino acids in the sequence. We successfully tested a way to easily compare the stabilities of different variants of a protein. The method allows stabilities to be compared while the proteins are being produced by bacteria. Because the method is noninvasive, the assessment is fast, so that it might be possible to compare huge numbers of variants in a very short time. In this way, proteins could be stabilized to make them available as medication or to create new biomaterials.