A Study in RNA Bioinformatics: Identification, Prediction and Analysis

Abstract: Research in the last few decades has revealed the great capacity of the RNA molecule. RNA, previously assumed to serve mainly as an intermediate in the translation of genes to proteins, is now known to play many other important roles in the cell beyond those of messenger RNA and transfer RNA, including catalyzing reactions and regulating genes at various levels. This thesis investigates several computational aspects of RNA. We discuss the identification of novel RNAs and of RNAs known to exist in related species, RNA secondary structure prediction, and more general tools for analyzing, visualizing and classifying RNA sequences. We present two benchmark studies concerning RNA identification, covering both de novo identification/characterization of single RNA sequences and homology search methods. We develop a novel algorithm for analysis of the RNA folding landscape that is based on the nearest neighbor energy model adopted in many secondary structure prediction programs. We implement this algorithm, which computes structural neighbors of a given RNA secondary structure, in the program RNAbor, which is accessible on a web server. Furthermore, we combine a mutual information based structure prediction algorithm with a sequence logo visualization to create a novel visualization tool for analyzing an RNA alignment and identifying covarying sites. Finally, we present extensions to sequence logos for the purpose of tRNA identity analysis. We introduce function logos, which display features that distinguish functional subclasses within a large set of structurally related sequences, as well as inverse logos, which display underrepresented features. For the purpose of comparing tRNA identity elements between different taxa, we introduce two contrasting logos, the information difference and the Kullback-Leibler divergence difference logos.
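
As a rough illustration of the idea behind identifying covarying sites, the sketch below computes the mutual information (in bits) between two columns of an alignment from observed frequencies. It is not the algorithm developed in the thesis: the function name `mutual_information` and the toy alignment are hypothetical, and the sketch assumes equal-length, gap-free sequences with no small-sample correction.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information in bits between alignment columns i and j.

    Illustrative only: frequencies are raw counts over the sequences,
    with no gap handling and no small-sample correction.
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), n_ab in counts_ij.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        ratio = n_ab * n / (counts_i[a] * counts_j[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Hypothetical toy alignment: columns 0 and 3 always form a
# Watson-Crick pair, while columns 1 and 2 are fully conserved.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]

covarying = mutual_information(toy_alignment, 0, 3)
conserved = mutual_information(toy_alignment, 0, 1)
```

In this toy case the perfectly covarying pair of columns carries 2 bits of mutual information, while pairing a variable column with a fully conserved one carries none, which is why conserved base pairs are invisible to a purely mutual information based measure.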