Methods to Prepare DNA for Efficient Massive Sequencing

University dissertation from Stockholm : KTH Royal Institute of Technology

Abstract: Massive sequencing has transformed the field of genome biology through the continuous introduction and evolution of new methods. In recent years, the technologies available to read through genomes have developed at an unprecedented rate in terms of cost reduction. Generating sequence data has essentially ceased to be a bottleneck for analyzing genomes; it has instead been replaced by limitations in sample preparation and data analysis. In this work, new strategies are presented to increase both the throughput of library generation prior to sequencing and the informational content of libraries to aid post-sequencing data processing. The protocols developed aim to enable new possibilities for genome research concerning project scale and sequence complexity. The first two papers that underpin this thesis deal with scaling library production by means of automation. Automated library preparation is first described for the 454 sequencing system, based on a generic solid-phase polyethylene-glycol precipitation protocol for automated DNA handling. This was one of the first descriptions of automated sample handling for producing next-generation sequencing libraries, and it substantially improved sample throughput. Building on these results, the use of a double precipitation strategy to replace the manual agarose gel excision step for Illumina sequencing is presented. This protocol considerably improved the scalability of library construction for Illumina sequencing. The third and fourth papers present advanced strategies for library tagging in order to multiplex the information available in each library. First, a dual tagging strategy for massive sequencing is described in which two sets of tags are added to a library to trace back the origins of up to 4992 amplicons using 122 tags. The tagging strategy takes advantage of the previously automated pipeline and was used for the simultaneous sequencing of 3700 amplicons.
Following that, an enzymatic protocol was developed to degrade long-range PCR amplicons and form triple-tagged libraries containing information on sample origin, clonal origin and local positioning for the short-read sequences. Through tagging, this protocol makes it possible to analyze a longer continuous sequence region than would be possible based on the read length of the sequencing system alone. The fifth study investigates commonly used enzymes for constructing libraries for massive sequencing. We analyze restriction enzymes capable of digesting unknown sequences located some distance from their recognition sequence. Some of these enzymes have previously been used extensively for massive nucleic acid analysis. In this first high-throughput study of such enzymes, we investigated their restriction specificity in terms of the distance from the recognition site and their sequence dependence. The phenomenon of slippage is characterized and shown to vary significantly between enzymes. The results obtained should benefit future protocol development and the understanding of these enzymes. Through these papers, this work aspires to aid the development of methods for massive sequencing in terms of scale, quality and knowledge, thereby contributing to the general applicability of the new paradigm of sequencing instruments.