Karolinska Institutet

Algorithms for building and evaluating multiple sequence alignments

thesis
posted on 2024-09-02, 16:05 authored by Timo Lassmann

The alignment of biological sequences is crucial for the transfer of annotation from model organisms to humans. Pairwise alignment of sequences can reveal homology while multiple alignments are used to characterize protein families and elucidate their evolutionary history.

We developed several software packages to create, evaluate and visualize multiple alignments. Our alignment program Kalign combines excellent accuracy with exceptional computational efficiency. The initial publication outlines the algorithm and the innovations it introduced to the field, while the second introduces several key improvements and additions to the original algorithm. Kalign is highly accurate for both protein and nucleotide alignments and can thus be used for a wide range of applications in genomics, including homology detection, protein and RNA structure prediction, phylogenetic analysis and promoter prediction.

The assessment of alignment quality is a tough problem in the field. While alignment programs can be tested on benchmark sets to reveal their overall performance, determining the accuracy of individual alignments is next to impossible. We approached this problem by analyzing several alignments of the same sequences and applying a consensus principle: if different methods arrive at the same conclusion, it is more likely to be correct than when the methods disagree. Our program MUMSA can thus diagnose faulty alignments, which is critical in high-throughput genomics applications.
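The consensus principle can be illustrated with a minimal sketch. It assumes that agreement between two alignments is measured as the fraction of aligned residue pairs they share; this is a simplification for illustration, not necessarily the scoring that MUMSA implements, and the function names `aligned_pairs`, `overlap` and `consensus_scores` are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    `alignment` is a list of equal-length gapped strings ('-' for gaps),
    one per sequence, always in the same sequence order. A residue is
    identified by (sequence index, residue index within that sequence).
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # residues seen so far per sequence
    pairs = set()
    for col in range(n_cols):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def overlap(a, b):
    """Share of aligned residue pairs common to alignments a and b (0..1)."""
    pa, pb = aligned_pairs(a), aligned_pairs(b)
    if not pa and not pb:
        return 1.0
    return 2 * len(pa & pb) / (len(pa) + len(pb))


def consensus_scores(alignments):
    """Average overlap of each alignment with all other alignments.

    `alignments` maps a method name to its alignment of the same sequences.
    A low score marks an alignment that disagrees with the others.
    """
    if len(alignments) < 2:
        raise ValueError("need at least two alignments to compare")
    scores = {}
    for name, aln in alignments.items():
        others = [overlap(aln, other)
                  for other_name, other in alignments.items()
                  if other_name != name]
        scores[name] = sum(others) / len(others)
    return scores
```

Given alignments of the same sequences from three methods, `consensus_scores` returns one score per method; under the assumption above, the alignment with the lowest score is the one most likely to deserve a closer look.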

Both Kalign and MUMSA can be freely accessed at our website, http://msa.cgb.ki.se, which also features Kalignvu, a lightweight alignment viewer.

List of scientific papers

I. Lassmann T, Sonnhammer EL (2002). Quality assessment of multiple alignment programs. FEBS Lett. 529(1): 126-30.
https://pubmed.ncbi.nlm.nih.gov/12354624

II. Lassmann T, Sonnhammer EL (2005). Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 6: 298.
https://pubmed.ncbi.nlm.nih.gov/16343337

III. Lassmann T (2006). Kalign2 - a scalable approach for the alignment of protein and nucleotide sequences. [Submitted]

IV. Lassmann T, Sonnhammer EL (2005). Automatic assessment of alignment quality. Nucleic Acids Res. 33(22): 7120-8.
https://pubmed.ncbi.nlm.nih.gov/16361270

V. Lassmann T, Sonnhammer EL (2006). Kalign, Kalignvu and Mumsa: web servers for multiple sequence alignment. Nucleic Acids Res. 34 (Web Server issue): W596-9.
https://pubmed.ncbi.nlm.nih.gov/16845078

VI. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006). Pfam: clans, web tools and services. Nucleic Acids Res. 34 (Database issue): D247-51.
https://pubmed.ncbi.nlm.nih.gov/16381856

History

Defence date

2006-09-26

Department

  • Department of Cell and Molecular Biology

Publisher/Institution

Karolinska Institutet

Publication year

2006

Thesis type

  • Doctoral thesis

ISBN-10

91-7140-887-8

Number of supporting papers

6

Language

  • eng

Original publication date

2006-09-05

Author name in thesis

Lassmann, Timo

Original department name

Department of Cell and Molecular Biology

Place of publication

Stockholm
