Tagging systems for sequencing large cohorts

University dissertation from Stockholm

Abstract: Advances in sequencing technologies constantly improves the throughput andaccuracy of sequencing instruments. Together with this development comes newdemands and opportunities to fully take advantage of the massive amounts of dataproduced within a sequence run. One way of doing this is by analyzing a large set ofsamples in parallel by pooling them together prior to sequencing and associating thereads to the corresponding samples using DNA sequence tags. Amplicon sequencingis a common application for this technique, enabling ultra deep sequencing andidentification of rare allelic variants. However, a common problem for ampliconsequencing projects is formation of unspecific PCR products and primer dimersoccupying large portions of the data sets.This thesis is based on two papers exploring these new kinds of possibilities andissues. In the first paper, a method for including thousands of samples in the samesequencing run without dramatically increasing the cost or sample handlingcomplexity is presented. The second paper presents how the amount of high qualitydata from an amplicon sequencing run can be maximized.The findings from the first paper shows that a two-tagging system, where the first tagis introduced by PCR and the second tag is introduced by ligation, can be used foreffectively sequence a cohort of 3500 samples using the 454 GS FLX Titaniumchemistry. The tagging procedure allows for simple and easy scalable samplehandling during sequence library preparation. The first PCR introduced tags, that arepresent in both ends of the fragments, enables detection of chimeric formation andhence, avoiding false typing in the data set.In the second paper, a FACS-machine is used to sort and enrich target DNA covered emPCR beads. This is facilitated by tagging quality beads using hybridization of afluorescently labeled target specific DNA probe prior to sorting. The system wasevaluated by sequencing two amplicon libraries, one FACS sorted and one standardenriched, on the 454 showing a three-fold increase of quality data obtained.