Targeted Long-read Sequencing: Development and Applications in Medical Genetics

Abstract: Targeted sequencing provides DNA information from precisely selected regions while reducing cost and data-analysis effort. When combined with single-molecule long-read sequencing, it can become a powerful tool for investigating genomic regions that are traditionally difficult to analyze with the predominant short-read sequencing platforms, including repetitive regions and large structural variants.

The aim of this thesis has been to develop and apply novel targeted long-read sequencing protocols to answer research questions of biomedical and clinical interest. In Paper I, we used a new amplification-free targeted long-read sequencing method to study trinucleotide repeats in the huntingtin (HTT) gene, which is associated with Huntington’s disease. The method generated reads spanning the entire repeats, allowing us to accurately determine repeat sizes in patient samples. Moreover, because single, unamplified DNA molecules were sequenced, we could detect somatic variation in the HTT repeat elements. In Paper II, we present the Xdrop technology, a microfluidic-based system for targeted enrichment of large DNA molecules in droplets from low-input samples. We applied Xdrop to detect human papillomavirus 18 (HPV18) integration sites in the human genome of a cervical cancer cell line by targeting the virus genome. We also demonstrated its utility for detecting and phasing SNVs in the tumor suppressor gene TP53 in leukemia cells. In Paper III, we employed targeted long-read sequencing to identify CRISPR-Cas9 off-target mutations in vitro with our two novel methods, Nano-OTS and SMRT-OTS. Importantly, we were able to identify Cas9 cleavage sites in regions of the human genome that are difficult or impossible to assess using short-read sequencing. The aim of Paper IV was to investigate large structural variants (SVs) induced by CRISPR-Cas9 at on-target and off-target sites in genome-edited zebrafish and their offspring. Nano-OTS was used to identify Cas9 off-target sites for four guide RNAs, and the same guide RNAs were used for genome editing of fertilized zebrafish eggs. Aided by long-read re-sequencing, we showed that Cas9 can induce large SVs at both on-target and off-target sites in vivo, and that these adverse variants can be passed on to the next generation.

This thesis has highlighted a range of targeted long-read sequencing methods and some of their applications in medical genetics. We believe these methods could have an important place in future research and clinical diagnostics, and that their utility will extend far beyond the applications demonstrated in this work.