The art of transcriptome reconstruction: with applications in Picea abies (L.) H. Karst.

Abstract: Transcriptome reconstruction is an important component of the bioinformatics part of transcriptome studies. When a reference genome is missing, highly fragmented or incomplete, de novo transcriptome assembly is the reconstruction approach of choice, since in such situations a simple alignment (or mapping) would not necessarily give all the information concerning splice junctions, isoforms or even the full extent of the gene. Several methods for de novo transcriptome assembly have been suggested, but many of them lack sufficient ability to recover isoforms or are memory intensive, which requires the methods to be executed on computing clusters.

One species whose published reference genome is highly fragmented is the Norway spruce (Picea abies (L.) H. Karst.), a conifer that is very important for Swedish forestry and economy. Because of its long juvenile phase and irregular cone setting, however, the demand for cultivated seeds is larger than the supply. Thus, there is a desire to understand the molecular biology behind cone setting in P. abies, not least regarding gene expression and its regulation. This doctoral dissertation addresses these problems by describing the biological background in general, followed by an introduction to the theoretical computational problems related to the methods applied for transcriptome reconstruction. These methods are then described in depth, as is P. abies.

Paper I uses a novel de novo assembler to detect connections between scaffolds in the P. abies genome. It also studies P. abies var. acrocona, a mutant with a shorter juvenile phase and more regular cone setting than the wild type, in order to detect how cone setting is initiated. By means of allele-specific expression analysis, this study detects a SNP in a miRNA binding site on a novel gene, a mutation which is consistent with the acrocona phenotype.

Paper II and paper III each introduce one novel de novo transcriptome assembler. Paper II describes the assembly method applied in paper I, which focuses on recovering a comprehensive list of isoforms. It accomplishes this, providing higher recall than the other tested assemblers, but at the cost of increased use of computational resources. In turn, paper III introduces a lightweight assembly method, which is the first assembly method employing the ant colony system (ACS) meta-heuristic. It is faster than the other tested assemblers and requires less memory (never more than half of that used by the second most memory-efficient method), but provides low recall.

Paper IV applies reference-based transcriptome assembly to improve the gene annotation of a new, chromosome-scale P. abies reference genome, which is currently being prepared. This study pinpoints the locations in the assembled genome of six previously described genes that the annotation had missed until now. Furthermore, for each of two annotated genes, this study found and verified one novel transcript isoform.

Paper V studies the natural variation in cone setting in P. abies, as well as gene expression before and after treatment with gibberellic acid (GA), which is known to stimulate flowering in plants. P. abies genotypes with lower cone setting ability turn out to have more genes activated after GA treatment than genotypes with higher cone setting ability.
