Algorithms for building and evaluating multiple sequence alignments
The alignment of biological sequences is crucial for the transfer of annotation from model organisms to humans. Pairwise alignment of sequences can reveal homology while multiple alignments are used to characterize protein families and elucidate their evolutionary history.
We developed several software packages to create, evaluate and visualize multiple alignments. Our alignment program Kalign combines high accuracy with fast running times. The first publication outlines the algorithm and the innovations it introduced to the field, while the second describes several key improvements and additions to the original algorithm. Kalign is accurate for both protein and nucleotide alignments and can thus be used for a wide range of applications in genomics, including homology detection, protein and RNA structure prediction, phylogenetic analysis and promoter prediction.
The assessment of alignment quality is a difficult problem in the field. While alignment programs can be tested on benchmark sets to reveal their overall performance, determining the accuracy of an individual alignment is next to impossible. We approached this problem by analyzing several alignments of the same sequences and applying a consensus principle: if different methods arrive at the same conclusion, it is more likely to be correct than when the methods disagree. Our program Mumsa can thus diagnose faulty alignments, which is critical in high-throughput genomics applications.
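The consensus principle can be illustrated with a minimal sketch. It is not the Mumsa implementation: the function names (`aligned_pairs`, `overlap_score`, `consensus_scores`) and the Dice-style overlap measure are assumptions chosen for illustration. Each alignment is reduced to the set of residue pairs it places in the same column, and an alignment that shares few pairs with the alternatives is flagged as less trustworthy.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings. A residue is
    identified by (sequence index, position in the ungapped sequence).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in this column counts as "aligned".
        pairs.update(combinations(residues, 2))
    return pairs


def overlap_score(a, b):
    """Shared aligned pairs between two alignments, scaled to [0, 1].

    Illustrative Dice-style measure (an assumption, not Mumsa's formula).
    """
    pairs_a, pairs_b = aligned_pairs(a), aligned_pairs(b)
    if not pairs_a and not pairs_b:
        return 1.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


def consensus_scores(alignments):
    """Average overlap of each alignment with all the other alignments."""
    if len(alignments) < 2:
        raise ValueError("need at least two alignments to compare")
    scores = []
    for i, current in enumerate(alignments):
        others = [overlap_score(current, other)
                  for j, other in enumerate(alignments) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Three hypothetical alignments of the same two sequences, ACGT and AGT.
method_1 = ["ACGT", "A-GT"]
method_2 = ["ACGT", "A-GT"]
method_3 = ["ACGT", "AG-T"]

for name, score in zip(("method_1", "method_2", "method_3"),
                       consensus_scores([method_1, method_2, method_3])):
    print(name, round(score, 3))
```

In this toy input the third alignment disagrees with the other two about where the gap belongs, so it receives the lowest average overlap. The score reflects only agreement between methods, so a low value marks an alignment for inspection rather than proving it wrong.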
Both Kalign and Mumsa are freely available at our website, http://msa.cgb.ki.se, which also features Kalignvu, a lightweight alignment viewer.
List of scientific papers
I. Lassmann T, Sonnhammer EL (2002). Quality assessment of multiple alignment programs. FEBS Lett. 529(1): 126-30.
https://pubmed.ncbi.nlm.nih.gov/12354624
II. Lassmann T, Sonnhammer EL (2005). Kalign--an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 6: 298.
https://pubmed.ncbi.nlm.nih.gov/16343337
III. Lassmann T (2006). Kalign2 - a scalable approach for the alignment of protein and nucleotide sequences. [Submitted]
IV. Lassmann T, Sonnhammer EL (2005). Automatic assessment of alignment quality. Nucleic Acids Res. 33(22): 7120-8.
https://pubmed.ncbi.nlm.nih.gov/16361270
V. Lassmann T, Sonnhammer EL (2006). Kalign, Kalignvu and Mumsa: web servers for multiple sequence alignment. Nucleic Acids Res. 34 (Web Server issue): W596-9.
https://pubmed.ncbi.nlm.nih.gov/16845078
VI. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006). Pfam: clans, web tools and services. Nucleic Acids Res. 34 (Database issue): D247-51.
https://pubmed.ncbi.nlm.nih.gov/16381856
History
Defence date
2006-09-26
Department
- Department of Cell and Molecular Biology
Publisher/Institution
Karolinska Institutet
Publication year
2006
Thesis type
- Doctoral thesis
ISBN-10
91-7140-887-8
Number of supporting papers
6
Language
- eng