Algorithms for building and evaluating multiple sequence alignments

University dissertation from Stockholm : Karolinska Institutet, Department of Cell and Molecular Biology

Abstract: The alignment of biological sequences is crucial for the transfer of annotation from model organisms to humans. Pairwise alignment of sequences can reveal homology, while multiple alignments are used to characterize protein families and elucidate their evolutionary history. We developed several software packages to create, evaluate and visualize multiple alignments. Our alignment program Kalign combines excellent accuracy with unparalleled computational efficiency. The initial publication outlines the algorithm and the innovations it introduced to the field, while the second introduces several key improvements and additions to the original algorithm. The accuracy of Kalign is high for both protein and nucleotide alignments, and Kalign can thus be used for a wide range of applications in genomics, including homology detection, protein and RNA structure prediction, phylogenetic analysis and promoter prediction. The assessment of alignment quality is a hard problem in the field. While alignment programs can be tested on benchmark sets to reveal their overall performance, determining the accuracy of individual alignments is next to impossible. We approached this problem by analyzing several alignments of the same sequences and applying a consensus principle: if different methods arrive at the same conclusion, that conclusion is more likely to be correct than one on which the methods disagree. Our program MUMSA can thus diagnose faulty alignments, which is critical in high-throughput genomics applications. Both Kalign and MUMSA can be freely accessed at our website, which also features Kalignvu, a lightweight alignment viewer.
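
The consensus principle can be illustrated with a small sketch. The Python code below is not MUMSA's implementation; it assumes one simple way to measure agreement, namely the fraction of aligned residue pairs that two alignments of the same sequences share. The function names (`aligned_pairs`, `overlap_score`, `average_agreement`) and the toy alignments are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' is a gap).
    A residue is identified by (sequence index, position in the ungapped
    sequence), so the same residue has the same identity in every alignment.
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_index, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(residues, 2))
    return pairs


def overlap_score(aln_a, aln_b):
    """Share of aligned pairs common to both alignments (0.0 to 1.0)."""
    pairs_a, pairs_b = aligned_pairs(aln_a), aligned_pairs(aln_b)
    if not pairs_a and not pairs_b:
        return 1.0  # two alignments without any aligned pair agree trivially
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


def average_agreement(alignments):
    """Mean pairwise overlap over all alignments of the same sequences."""
    scores = [overlap_score(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


# Toy example: two alternative alignments of ACGT and ACAGT.
aln_1 = ["AC-GT", "ACAGT"]
aln_2 = ["ACG-T", "ACAGT"]
print(overlap_score(aln_1, aln_2))
```

Under this assumed measure, a low average agreement among alignments produced by different programs would flag a case where at least some of them are likely wrong, while high agreement lends some support to the shared solution.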
