Gene complexes and regulatory domains in metazoan genomes
Despite the recent massive increases in genome and transcript sequence data, including whole-genome sequences for humans and many other metazoans, our understanding of the content of these sequences is far from complete. This thesis is about making use of metazoan sequence data to detect functional genetic elements on a genome-wide scale and examine the distribution of those elements on chromosomes. Specifically, the thesis focuses on the occurrence of gene complexes, such as pairs of overlapping genes, and on chromosomal regulatory domains of importance in development and disease.
Mammalian genomes contain a larger than expected number of complex loci, in which genes on opposite strands share transcribed regions, exons and/or core promoters. We find that, in both human and mouse genomes, 25% of transcriptional units (TUs) share exon sequence with a TU on the opposite strand. The true proportion is likely to be significantly higher because transcriptomes are not fully sequenced. Intriguingly, most pairs of overlapping TUs consist of one coding and one noncoding TU. We have included a large dataset of transcript sequences from such noncoding TUs in a database of noncoding RNA (http://research.imb.uq.edu.au/RNAdb). While nearly a thousand cases of overlapping TU arrangements are conserved between human and mouse, these constitute only 17% of all detected TU overlaps, suggesting that many species-specific arrangements exist. Taking advantage of newly available CAGE tag data on transcription start site locations, we analyze bidirectional promoters and show that their divergent transcription initiation regions are broad and often separated only by a small region (<60 bp) at which overall sequence composition changes strand.
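As an illustration of the overlap criterion described above (TUs on opposite strands that share exon sequence), the following is a minimal sketch and not the pipeline used in the thesis. The `TU` record, the coordinate convention (0-based, half-open intervals on a single chromosome) and the toy data are assumptions made for this example only.

```python
from collections import namedtuple

# Hypothetical record for a transcriptional unit (TU); not taken from the thesis.
# Exons are (start, end) intervals, 0-based and half-open, on one chromosome.
TU = namedtuple("TU", ["name", "strand", "exons"])


def exons_overlap(exons_a, exons_b):
    """Return True if any exon in exons_a shares at least one base with an exon in exons_b."""
    return any(
        a_start < b_end and b_start < a_end
        for a_start, a_end in exons_a
        for b_start, b_end in exons_b
    )


def antisense_overlap_fraction(tus):
    """Fraction of TUs that share exon sequence with a TU on the opposite strand."""
    if not tus:
        return 0.0
    involved = set()
    # Naive all-pairs comparison; adequate for a sketch, too slow for a whole genome.
    for i, a in enumerate(tus):
        for b in tus[i + 1:]:
            if a.strand != b.strand and exons_overlap(a.exons, b.exons):
                involved.add(a.name)
                involved.add(b.name)
    return len(involved) / len(tus)


# Toy data (invented for illustration).
tus = [
    TU("tu1", "+", [(100, 200), (400, 500)]),
    TU("tu2", "-", [(450, 600)]),  # shares exon bases 450-500 with tu1, opposite strand
    TU("tu3", "+", [(250, 350)]),  # inside the tu1 intron, same strand
    TU("tu4", "-", [(210, 240)]),  # opposite strand, but overlaps no exon
]

print(antisense_overlap_fraction(tus))
```

In this toy set only `tu1` and `tu2` qualify, so half of the TUs are counted. A genome-scale analysis would replace the all-pairs loop with an interval index and would process each chromosome separately.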
Vertebrate, insect and nematode genomes contain an abundance of highly conserved noncoding elements (HCNEs) that appear to function as enhancers for developmental regulatory genes around which they cluster. We show evidence that large blocks of conserved synteny (genomic regulatory blocks, GRBs) have been maintained, across vertebrates and across insects, to keep arrays of HCNEs intact. GRBs often contain 'bystander' genes whose functions and expression patterns are unrelated to those of the presumptive target genes of HCNE enhancer activity. By analyzing the fate of duplicated genes and HCNEs after whole-genome duplication in teleosts, we show that bystander genes are indeed independent of the regulatory input of HCNE arrays. In addition, we describe differences in core promoters between target genes and bystander genes that might explain the differences in their responsiveness to long-range enhancers. We present a web resource (http://ancora.genereg.net) for exploring the distribution of HCNEs on metazoan chromosomes.
Together with other recent studies, this work challenges the canonical colinear model of how genes and their regulatory elements are arranged in metazoan genomes. Vertebrate and insect genomes appear to contain an abundance of nested and overlapping gene structures, giving rise to both coding and noncoding transcripts. In addition, regulatory elements controlling the expression of a gene are frequently distributed within or beyond other genes. These findings should be taken into account in future studies of regulation of gene expression and effects of genetic variation by considering the genomic neighborhood of genes and polymorphisms of interest, up to distances on the order of a million base pairs in the human genome.
List of scientific papers
I. Pang KC, Stephen S, Engström PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS (2005). RNAdb-a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 33(Database issue): D125-30.
https://pubmed.ncbi.nlm.nih.gov/15608161
II. Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Engström PG et al. (2005). Antisense transcription in the mammalian transcriptome. Science. 309(5740): 1564-6.
https://pubmed.ncbi.nlm.nih.gov/16141073
III. Engström PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, Lavorgna G, Brozzi A, Luzi L, Tan SL, Yang L, Kunarso G, Ng EL, Batalov S, Wahlestedt C, Kai C, Kawai J, Carninci P, Hayashizaki Y, Wells C, Bajic VB, Orlando V, Reid JF, Lenhard B, Lipovich L (2006). Complex Loci in human and mouse genomes. PLoS Genet. 2(4): e47.
https://pubmed.ncbi.nlm.nih.gov/16683030
IV. Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engström PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, Ghislain J, Pezeron G, Mourrain P, Ellingsen S, Oates AC, Thisse C, Thisse B, Foucher I, Adolf B, Geling A, Lenhard B, Becker TS (2007). Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 17(5): 545-55.
https://pubmed.ncbi.nlm.nih.gov/17387144
V. Engström PG, Ho Sui SJ, Drivenes Ö, Becker TS, Lenhard B (2007). Genomic regulatory blocks underlie extensive microsynteny conservation in insects. Genome Res. [Accepted]
https://pubmed.ncbi.nlm.nih.gov/17989259
VI. Engström PG, Fredman D, Lenhard B (2007). Ancora: a web resource for exploring highly conserved noncoding elements and their association with developmental regulatory genes. [Submitted]
History
Defence date
2007-11-23
Department
- Department of Cell and Molecular Biology
Publication year
2007
Thesis type
- Doctoral thesis
ISBN
978-91-7357-361-0
Number of supporting papers
6
Language
- eng