Assignment and assessment of orthology and gene function
Author: Storm, Christian
Date: 2004-02-20
Location: Nobel Forum, Solnavägen 1
Time: 14.00
Department: Centrum för Genomik och Bioinformatik (CGB) / Center for Genomics Research
Abstract
Several genomes from different species have been sequenced in recent years, most notably the human genome. An important task of computational biology is to classify and functionally annotate the large amount of sequence data produced by these genome sequencing projects.
The concept of orthology and paralogy, developed over 30 years ago by Fitch, plays an important role in this task: Orthologous genes are genes in different species that evolved from a single gene in the last common ancestor of these species. Paralogous genes are genes that evolved due to a duplication event. Orthologs can be seen as different versions of the same gene in different species. Therefore they are likely to have the same functional properties and play a similar biochemical role in the cell. Once an orthologous gene for a newly sequenced gene is known, the annotation of the ortholog can give reliable information about the function and the role of the new gene.
The main focus of the work was to improve existing approaches and develop new ones for the inference of orthology. We developed a novel method, called ortholog bootstrapping, to analyze a gene tree for orthologs. Instead of assigning orthology from a single gene tree only, ortholog bootstrapping analyzes multiple trees calculated for the same gene family. The trees are reconstructed using the bootstrap technique, enabling us to calculate bootstrap support values for orthologous sequence pairs. Ortholog bootstrapping was then used to find orthologs between species with completely sequenced genomes. Here we employed a scheme for the hierarchical clustering of species based on their evolutionary history.
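The idea of a bootstrap support value for an orthologous pair can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the nested-tuple tree format, the "species|gene" leaf labels, and the function names are all hypothetical, and duplication nodes are detected with a simple species-overlap rule chosen here only for brevity. A pair of sequences counts as orthologous in one tree if their last common ancestor is a speciation node, and the support is the fraction of bootstrap trees in which that holds.

```python
from collections import Counter
from itertools import product


def species_of(leaf):
    """Leaf labels are assumed to look like 'species|gene' (illustrative format)."""
    return leaf.split("|")[0]


def ortholog_pairs(tree):
    """Return (leaves, ortholog pairs) for a binary tree given as nested tuples.

    Assumption for this sketch: a node is treated as a speciation if the
    species sets of its two subtrees do not overlap, otherwise as a duplication.
    """
    if isinstance(tree, str):
        return [tree], set()
    left, right = tree
    left_leaves, left_pairs = ortholog_pairs(left)
    right_leaves, right_pairs = ortholog_pairs(right)
    pairs = left_pairs | right_pairs
    left_species = {species_of(x) for x in left_leaves}
    right_species = {species_of(x) for x in right_leaves}
    if not (left_species & right_species):
        # Speciation node: every cross-subtree pair is orthologous.
        for a, b in product(left_leaves, right_leaves):
            pairs.add(frozenset((a, b)))
    return left_leaves + right_leaves, pairs


def ortholog_support(trees):
    """Fraction of trees in which each sequence pair is inferred as orthologous."""
    counts = Counter()
    for tree in trees:
        _, pairs = ortholog_pairs(tree)
        counts.update(pairs)
    return {pair: n / len(trees) for pair, n in counts.items()}


# Four hypothetical bootstrap trees for one gene family with two paralogs (A, B).
t_main = (("human|A", "mouse|A"), ("human|B", "mouse|B"))
t_alt = (("human|A", "mouse|B"), ("human|B", "mouse|A"))
support = ortholog_support([t_main, t_main, t_main, t_alt])
```

In this toy input, the pair human|A and mouse|A is orthologous in three of the four trees, so a threshold on the support value would separate it from the weakly supported pair human|A and mouse|B.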
The orthology inference was performed on the domain level, using the Pfam domain definitions. The results of the analysis were compared to a tree reconciliation method using a complete species tree for orthology inference. The comparison was based on a test set of putative orthologous proteins with experimentally characterized functional properties. The outcome of the comparison showed that our approach increases the sensitivity for assigning orthologs from a gene tree.
Orthologous relations found using our approach were stored in a database. The database is available over the Internet and is accessible through a previously developed Java applet for visualizing phylogenetic relations between domains. In addition to inferring orthology by phylogenetic means, we developed a method for assigning orthology based on pairwise sequence similarity. It focuses on the correct separation of paralogs and the calculation of an orthology confidence value.
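As a rough illustration of orthology assignment from pairwise similarity, the sketch below finds reciprocal best hits between two species. This is a common starting point for similarity-based methods and is an assumption here, not a description of the method developed in the thesis; it does not separate paralogs or compute a confidence value. The score dictionaries, sequence names, and function name are hypothetical.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a.

    scores_ab maps each sequence of species A to {sequence of B: similarity score};
    scores_ba is the same in the other direction (illustrative input format).
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical similarity scores between sequences of two species.
scores_ab = {
    "a1": {"b1": 200, "b2": 90},
    "a2": {"b1": 80, "b2": 150},
    "a3": {"b1": 120},
}
scores_ba = {
    "b1": {"a1": 195, "a2": 80, "a3": 120},
    "b2": {"a1": 90, "a2": 150},
}
rbh = reciprocal_best_hits(scores_ab, scores_ba)
```

Here a3 finds b1 as its best hit, but b1 scores higher against a1, so a3 is left unpaired; deciding whether such a sequence is a paralog of a1 is the kind of question a fuller method has to address.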
List of papers:
I. Storm CE, Sonnhammer EL (2001). NIFAS: visual analysis of domain evolution in proteins. Bioinformatics. 17(4): 343-8.
II. Remm M, Storm CE, Sonnhammer EL (2001). Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 314(5): 1041-52.
III. Storm CE, Sonnhammer EL (2002). Automated ortholog inference from phylogenetic trees and calculation of orthology reliability. Bioinformatics. 18(1): 92-9.
IV. Hollich V, Storm CE, Sonnhammer EL (2002). OrthoGUI: graphical presentation of Orthostrapper results. Bioinformatics. 18(9): 1272-3.
V. Storm CE, Sonnhammer EL (2003). Comprehensive analysis of orthologous protein domains using the HOPS database. Genome Res. 13(10): 2353-62.
Issue date: 2004-01-30
Publication year: 2004
ISBN: 91-7349-810-6