Library Preparation for High Throughput DNA Sequencing

Abstract: Arrange 3 billion base pairs of DNA in the correct order and you get the blueprint of a human: the genome. Before the introduction of massively parallel sequencing a little more than a decade ago, it cost around $10 million to obtain this blueprint. Since then, sequencing throughput has soared while costs have plummeted; today that figure is around $1,000, and large sequencing centres such as the National Genomics Infrastructure in Stockholm sequence the equivalent of 25 human genomes per hour. The papers that form the basis of this thesis cover different aspects of the rapidly expanding DNA sequencing field. Paper I describes a model system that employs massively parallel sequencing to characterise the behaviour of type IIS restriction enzymes. Enzymes are biological macromolecules that catalyse chemical reactions in the cell. All commercially available sequencing systems use enzymes to prepare the nucleic acids before they are loaded on the machine. Thus, intimate knowledge of enzymes is vital not only when designing new sequencing protocols, but also for understanding the limitations of current ones. Paper II covers the automation of a library preparation protocol for spatially resolved transcriptome sequencing. Automation increases sample throughput and minimises the risk of human error, which can introduce technical noise into the data. In Paper III, the power of massively parallel sequencing is employed to describe the RNA content of the endometrium at two different time points during the menstrual cycle. Finally, Paper IV covers the sequencing of highly degraded nucleic acids from formalin-fixed, paraffin-embedded samples. These samples often have a rich clinical background, making them extremely valuable to researchers. However, sequencing these samples is challenging, and this study examines the impact that different preparation kits have on the quality of the sequencing data.