Regulation of the vertebrate transcriptome in development and disease
Abstract: In the last decade we have seen a tremendous development in the omics area (genomics, transcriptomics, proteomics etc.), making high-throughput methods increasingly cost-effective and available. Advances in RNA-sequencing technology now enable us to sequence whole transcriptomes of hundreds or even thousands of samples or single cells simultaneously in only a few days. With the ability to quickly create millions of reads for thousands of genes in thousands of samples comes the computational challenge of making sense of the data. Due to the use of short sequencing reads, duplicate genes, biased base composition and repetitive regions in the genome, reads might not be uniquely assignable to a single gene. This problem can be solved either by computationally assigning multi-mapping reads to the most likely position, or by excluding these reads and normalizing gene expression for the uniquely mappable positions in a gene. In paper I, we describe a software application for efficiently finding and storing the mappability data for every position in the genome, for subsequent use in normalization of RNA-seq data. When the first drafts of the human genome were published in 2001, it became clear that the majority of our DNA does not consist of protein-coding genes. Since then, a multitude of new functional non-coding RNA species have been discovered, along with transcription of seemingly non-functional RNA from open chromatin regions, such as promoter upstream transcripts (PROMPTs). In paper II, we decipher the physical interactions between the exosome complex, the NEXT complex and the cap-binding complex, and the role each complex has in targeting PROMPTs for degradation. In early embryonic development, multicellular organisms need a mechanism for starting different developmental programs in different sets of cells.
In the African clawed frog, Xenopus laevis, this mechanism involves sorting maternal RNA to different hemispheres of the oocyte, so that it is later inherited asymmetrically by the cells of the developing embryo. Zygotic expression starts only after 12 cell divisions, and at the early stages the maternal RNA controls development. In paper III, we use de novo transcriptome assembly to obtain a good annotation of X. laevis in the absence of a fully assembled genome. We then use single-cell RNA-sequencing to study the RNA sorting and search for sorting motifs in the 2- to 16-cell stage embryo. An advantage of full-length RNA-sequencing is the possibility to study alternative splicing alongside expression estimates. Spinal muscular atrophy (SMA) is a genetic disease characterized by progressive loss of somatic motor neurons. The disease is caused by the loss of the SMN1 gene, which is only partly compensated for by the paralogous SMN2 gene, since SMN2 is less efficient in producing full-length SMN protein. SMN is involved in spliceosome assembly, and even though it is ubiquitously expressed, its loss specifically affects a subgroup of somatic motor neurons. In paper IV, we try to elucidate why some motor neurons are resistant and others vulnerable in the disease, by looking at both gene expression and splicing differences in a mouse model of SMA.