Methods and applications in DNA sequence alignments
Author: Sherwood, Ellen
Date: 2007-03-23
Location: Lecture hall at the Department of Cell and Molecular Biology (CMB), Berzelius väg 21, Karolinska Institutet, Solna
Time: 09.30
Department: Institutionen för cell- och molekylärbiologi (CMB) / Department of Cell and Molecular Biology
Abstract
DNA sequence alignment is one of the most common bioinformatics tasks.
Alignment analysis for eukaryotic genomes is challenging because the
datasets are large. Repeat sequences also make the analysis difficult.
This thesis describes new methods that we have developed for DNA
sequence alignment to address these problems. We have applied these
methods in chicken and Trypanosoma cruzi genome analysis projects, and
the thesis also describes the results of these projects.
Most alignment programs use a seed-and-extend method, where subsequences
(seeds) are used to locate potential alignments, which are then verified. There
is a tradeoff between sensitivity and specificity in the seeding process,
as short seeds are inefficient in eliminating spurious matches and long
seeds are more likely to omit true alignments in the presence of
sequencing errors and polymorphisms. We developed an approximate seed
matching algorithm which reduces the impact of this tradeoff by allowing
mismatches within the seeds. Approximate seed matching allows the use of
long seeds, which results in high specificity in the seeding and a faster
alignment program. At the same time, sequencing errors and polymorphisms
between the sequences do not reduce sensitivity.
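The idea of tolerating mismatches inside a seed can be illustrated with a minimal Python sketch. This is not the algorithm developed in the thesis; it is a generic pigeonhole-style filter, and the function names (`hamming`, `build_index`, `approx_seed_hits`) and the parameter `max_mismatches` are assumptions made for illustration. The seed is split into `max_mismatches + 1` parts; if the whole seed matches with at most that many substitutions, at least one part must match exactly, so only those candidate positions are verified.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(ref: str, part_len: int) -> dict:
    """Map every substring of length part_len in ref to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - part_len + 1):
        index[ref[i:i + part_len]].append(i)
    return index


def approx_seed_hits(ref: str, seed: str, max_mismatches: int) -> list:
    """Return reference positions where seed matches with at most
    max_mismatches substitutions (no insertions or deletions)."""
    parts = max_mismatches + 1
    part_len = len(seed) // parts
    if part_len == 0:
        raise ValueError("seed is too short for this many mismatches")
    index = build_index(ref, part_len)
    hits = set()
    for p in range(parts):
        offset = p * part_len
        # Exact lookup of one seed part gives candidate positions.
        for pos in index.get(seed[offset:offset + part_len], []):
            start = pos - offset
            if start < 0 or start + len(seed) > len(ref):
                continue
            # Verify the full seed against the reference at this candidate.
            if hamming(ref[start:start + len(seed)], seed) <= max_mismatches:
                hits.add(start)
    return sorted(hits)
```

With the toy reference `"TTGACCGTAGGCATCG"`, the seed `"CCGTTGGC"` (one substitution relative to the reference) yields no hit when `max_mismatches` is 0 and a hit at position 4 when it is 1, which shows how a long seed can remain sensitive when mismatches are allowed.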
The chicken is both an important agricultural source of protein and a model
organism in biological research. The genome sequencing of the wild
ancestor of domestic chickens has offered an opportunity to study
genetic factors involved in domestication. Sequences from three domestic
chicken breeds were available for comparison to the genome sequence. We
used these data to find signs of selective sweeps between wild and
domestic chickens by searching for regions with low diversity within
domestic breeds. The results showed no evidence of large,
domestic-specific sweeps. These findings indicate substantial sequence
variation within chicken breeds.
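A search for low-diversity regions can be sketched as a window scan over SNP positions. The sketch below is only an illustration under stated assumptions: the abstract does not give the window size, threshold, or diversity statistic used, so the function name `low_diversity_windows` and the parameters `window` and `max_snps` are hypothetical, and raw SNP counts stand in for a proper diversity measure.

```python
def low_diversity_windows(snp_positions, region_length, window, max_snps):
    """Return (start, end, snp_count) for non-overlapping windows that
    contain at most max_snps SNPs. Positions are 0-based."""
    n_windows = (region_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snp_positions:
        if 0 <= pos < region_length:
            counts[pos // window] += 1
    return [
        (i * window, min((i + 1) * window, region_length), count)
        for i, count in enumerate(counts)
        if count <= max_snps
    ]
```

For SNPs at positions 5, 12, 18, 47 and 95 in a region of length 100, a window of 25 and `max_snps=0` flag only the window from 50 to 75. A candidate sweep would appear as a long run of such windows shared by the domestic breeds.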
Copy number variation is emerging as an important source of genotypic and
phenotypic variation in humans. We investigated the presence of such
structural variation in the chicken genome through array comparative
genome hybridizations of different chicken breeds. The results show
extensive copy number variation, in some cases unique to domestic
chickens.
Trypanosoma cruzi is a protozoan parasite which causes Chagas disease. It
has interesting biological features, including a genome structure with
many repeated genes. Genes are often repeated in tandem arrays, including
surface antigen genes and housekeeping genes. The genome assembly shows
numerous gaps and collapsed gene copies. We investigated the copy number
of the annotated genes and found the gene content of T. cruzi to be even
more repetitive than previously thought.
The genome analysis studies described in this thesis validated the DNA
sequence alignment methods we have developed, and have provided important
information for the chicken and T. cruzi research communities.
List of papers:
I. Tammi MT, Arner E, Kindlund E, Andersson B (2003). "Correcting errors in shotgun sequences." Nucleic Acids Res 31(15): 4663-72
II. Kindlund E, Tammi MT, Arner E, Nilsson D, Andersson B (2007). "GRAT-genome-scale rapid alignment tool." Comput Methods Programs Biomed Feb 8: Epub ahead of print
III. Wong GK, Liu B, Wang J, Zhang Y, Yang X, Zhang Z, Meng Q, Zhou J, Li D, Zhang J, Ni P, Li S, Ran L, Li H, Zhang J, Li R, Li S, Zheng H, Lin W, Li G, Wang X, Zhao W, Li J, Ye C, Kindlund E, International Chicken Polymorphism Map Consortium et al. (2004). "A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms." Nature 432(7018): 717-22
IV. Kindlund E, Rubin CJ, Stromstedt L, Andersson B, Andersson L (2007). "Detection of copy number variation in the domestic chicken and its wild ancestor." (Manuscript)
V. Arner E, Kindlund E, Nilsson D, Farzana F, Ferella M, Tammi MT, Andersson B (2007). "Database of Trypanosoma cruzi repeated genes: 20 000 novel coding sequences." (Submitted)
Issue date: 2007-03-02
Rights:
Publication year: 2007
ISBN: 978-91-7357-143-2