Functional analysis of genetic risk markers

Abstract: Regulatory variants are the main factors responsible for genetic predisposition, for example in how humans react differently to the environment. It is therefore important to locate them and measure their effects, which can lead to pre-disease intervention, new drugs, or personalized medicine, where the selection and dose of a drug are based on a person’s genetic profile. In this thesis we have investigated the potential to link genetic markers to transcription using allele-specific expression (ASE), which can avoid the influence of both population stratification bias and trans-factors, increasing statistical power compared with linkage methods based on total RNA. To quantify expression levels we used RNA sequencing, which makes it possible to measure ASE whenever there is a heterozygous variant within the transcribed fragment, because such a variant allows expression from the two alleles to be distinguished. RNA sequencing data tend to be complex and need to be summarized into count measures before they can be analyzed for ASE. To facilitate this process and provide additional analytical support, we developed the software AllelicImbalance, which is now freely accessible through Bioconductor, a bioinformatics repository for code and data. Using this software we investigated ASE behavior at the individual level for a single transcribed variant, within a gene, and for connections between an ASE event and known risk markers previously established by genome-wide association studies (GWAS). In a dataset of 10 individuals we showed that ASE is consistent over consecutive exons within the same gene, which indicates that an ASE signature is robust against dissimilarities in sequence. Further, because ASE was stable across several SNPs, we established that short-read sequencing is not a fundamental obstacle to the implementation of this technique.
However, more individuals were needed to better assess a link to genetic variants. We continued our analysis in a larger dataset, in which one of the sequenced tissues was represented by 680 individuals. This was enough to measure ASE as a regression of allelic fraction on genotype (aeQTL), conceptually similar to the regression of expression on genotype commonly used in eQTL studies. In these data we were able to explain novel risk SNPs using the aeQTL method, and we showed that any bias toward the reference allele had no significant effect on the regression. We then tested whether aeQTL could pick up unique signals in 205 individuals, in a tissue previously investigated for eQTL in a large cohort of more than 5000 individuals. Indeed, we detected 15 novel aeQTLs, which were probably masked by trans-regulation in the previous investigation. In addition, we describe the software ClusterSignificance, which tests for separation of groups in data with reduced dimensionality. The algorithm brings statistical rigor to a task previously done by visual inspection. This thesis gives an overview of the progress made by us and others in ASE investigations, which are becoming more than just a complement to eQTL. The future points to a more dominant role for ASE as more sequencing data become readily available, giving access to the closest active link to cis-regulation.
