Genomic signatures in viruses

Abstract: In an age of global pandemics, studying how viruses and their genomes evolve is of great importance. It has previously been found that the genomes of many eukaryotes and prokaryotes have specific preferences for nucleotides, dinucleotides, and codons. Such preferences are shaped by the selective pressures acting on the genomes and are referred to as specific genomic signatures. To our knowledge, the presence of such signatures has not been studied in viruses; the aim of this thesis is therefore to investigate genomic signatures in viruses thoroughly. In the first two papers of this thesis, we developed new algorithms for studying genomic signatures, here based on the variable-length Markov chains of a genome. Our new algorithms are a thousand times faster than pre-existing methods and up to 600 times faster than the state of the art, while also requiring less memory. These methods make the analysis of genomic signatures efficient enough to run even on laptops. In the subsequent two papers, we thoroughly analyzed the genomic signatures of viruses and compared them with those of their hosts. The results show that the majority of viruses have specific genomic signatures. Moreover, in most cases, viral signatures resemble those of their hosts only in GC content. This dissimilarity indicates that viral signatures are independent of host signatures, even though viruses depend on their hosts' genetic and protein-expression machinery. In the final paper, we demonstrated an application of genomic signatures by using them to identify recombination events between Human alphaherpesvirus 1 and Human alphaherpesvirus 2. Thus, we show that variable-length genomic signatures are an important property of virus genomes.
They point to the importance of the evolution of specific nucleotide-sequence patterns in viruses. These patterns can likely identify even remotely related viruses in collections of unknown sequences, thereby helping to detect and classify novel viruses. In addition, it might be possible to use and modify genomic signatures, e.g., to attenuate viruses and create vaccine candidates.
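To make the idea of a variable-length Markov chain (VLMC) signature concrete, here is a minimal Python sketch. It is not the algorithm developed in the thesis, which the abstract reports to be far faster. It illustrates the general technique with a count-weighted Kullback-Leibler pruning rule of the kind used in the classic context algorithm. Every context of up to `max_depth` preceding nucleotides is counted. A context is kept only when its next-nucleotide distribution differs enough from that of its shorter parent context. The function names and the parameters `max_depth`, `min_count` and `threshold` are hypothetical choices for illustration. The input is assumed to be an uppercase A/C/G/T string. A small GC-content helper is included because GC content is the one property the abstract reports viruses share with their hosts.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only uppercase A/C/G/T


def context_counts(seq, max_depth):
    """Count which nucleotide follows each context of length 0..max_depth."""
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for d in range(max_depth + 1):
            if i - d < 0:
                break
            counts[seq[i - d:i]][seq[i]] += 1  # context = the d symbols before position i
    return counts


def next_probs(counter):
    """Turn next-symbol counts into a probability distribution over ALPHABET."""
    total = sum(counter.values())
    return {a: counter[a] / total for a in ALPHABET}


def build_vlmc(seq, max_depth=3, min_count=5, threshold=3.0):
    """Hypothetical VLMC estimator: keep contexts whose prediction differs from their parent's."""
    counts = context_counts(seq, max_depth)
    kept = {""}  # the empty context (plain nucleotide frequencies) is always kept
    for context, follow in list(counts.items()):
        n = sum(follow.values())
        if not context or n < min_count:
            continue
        p = next_probs(follow)
        q = next_probs(counts[context[1:]])  # parent drops the most distant symbol
        # Count-weighted KL divergence; q[a] > 0 whenever p[a] > 0 by construction.
        kl = sum(p[a] * math.log(p[a] / q[a]) for a in ALPHABET if p[a] > 0)
        if n * kl >= threshold:
            for k in range(len(context)):  # keep all suffixes so the tree stays suffix-closed
                kept.add(context[k:])
    return {c: next_probs(counts[c]) for c in kept}


def predict(model, history, max_depth):
    """Next-nucleotide distribution from the longest matching context in the model."""
    for d in range(min(max_depth, len(history)), -1, -1):
        ctx = history[len(history) - d:]
        if ctx in model:
            return model[ctx]


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T characters of seq."""
    acgt = [c for c in seq.upper() if c in ALPHABET]
    return sum(c in "GC" for c in acgt) / len(acgt)


model = build_vlmc("ACGT" * 50, max_depth=3)
print(sorted(model))                       # contexts that survived pruning
print(predict(model, "GGA", 3))            # falls back to the context "A"
print(gc_content("ACGT" * 50))
```

In the periodic toy sequence, only the single-nucleotide contexts survive, because longer contexts predict nothing beyond what their last symbol already does. Comparing two such models, for example those of a virus and its host, would additionally require a distance between VLMCs, which this sketch does not attempt.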
