Karolinska Institutet

Gene complexes and regulatory domains in metazoan genomes

thesis
posted on 2024-09-02, 17:29 authored by Pär Engström

Despite the recent massive increases in genome and transcript sequence data, including whole-genome sequences for humans and many other metazoans, our understanding of the content of these sequences is far from complete. This thesis is about making use of metazoan sequence data to detect functional genetic elements on a genome-wide scale and examine the distribution of those elements on chromosomes. Specifically, the thesis focuses on the occurrence of gene complexes, such as pairs of overlapping genes, and on chromosomal regulatory domains of importance in development and disease.

Mammalian genomes contain a larger than expected number of complex loci, in which genes on opposite strands share transcribed regions, exons and/or core promoters. We find that, in both human and mouse genomes, 25% of transcriptional units (TUs) share exon sequence with a TU on the opposite strand. The true proportion is likely to be significantly higher because transcriptomes are not fully sequenced. Intriguingly, most pairs of overlapping TUs consist of one coding and one noncoding TU. We have included a large dataset of transcript sequences from such noncoding TUs in a database of noncoding RNA (http://research.imb.uq.edu.au/RNAdb). While nearly a thousand cases of overlapping TU arrangements are conserved between human and mouse, these constitute only 17% of all detected TU overlaps, suggesting that many species-specific arrangements exist. Taking advantage of newly available CAGE tag data on transcription start site locations, we analyze bidirectional promoters and show that their divergent transcription initiation regions are broad and often separated only by a small region (<60 bp) at which overall sequence composition changes strand.
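As an illustration of what "sharing exon sequence with a TU on the opposite strand" means in practice, the following minimal Python sketch finds pairs of TUs whose exons overlap on opposite strands and reports the fraction of TUs involved. It is not the method used in the thesis: the `Exon` record, the half-open coordinate convention, the function names and the toy data are all assumptions made for this example, and the all-against-all comparison would be far too slow for a real transcriptome.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    """Hypothetical exon record (illustrative only)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    tu_id: str   # identifier of the transcriptional unit


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if two exons share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def antisense_exon_overlap_pairs(exons):
    """Return the set of (plus_tu, minus_tu) pairs that share exon sequence."""
    plus = [e for e in exons if e.strand == "+"]
    minus = [e for e in exons if e.strand == "-"]
    pairs = set()
    # Naive all-against-all comparison; fine for a toy example only.
    for p in plus:
        for m in minus:
            if exons_overlap(p, m):
                pairs.add((p.tu_id, m.tu_id))
    return pairs


def fraction_of_tus_in_pairs(exons) -> float:
    """Fraction of all TUs that overlap an opposite-strand TU in exon sequence."""
    all_tus = {e.tu_id for e in exons}
    involved = {tu for pair in antisense_exon_overlap_pairs(exons) for tu in pair}
    return len(involved) / len(all_tus) if all_tus else 0.0


# Toy data (invented): TU_A and TU_B overlap on opposite strands of chr1.
toy_exons = [
    Exon("chr1", 100, 200, "+", "TU_A"),
    Exon("chr1", 150, 250, "-", "TU_B"),
    Exon("chr1", 300, 400, "+", "TU_C"),
    Exon("chr2", 100, 200, "-", "TU_D"),
]
toy_pairs = antisense_exon_overlap_pairs(toy_exons)
toy_fraction = fraction_of_tus_in_pairs(toy_exons)
```

On a genome-scale dataset, the nested loop would normally be replaced by a sorted sweep or an interval tree per chromosome.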

Vertebrate, insect and nematode genomes contain an abundance of highly conserved noncoding elements (HCNEs) that appear to function as enhancers for developmental regulatory genes around which they cluster. We show evidence that large blocks of conserved synteny (genomic regulatory blocks, GRBs) have been maintained, across vertebrates and across insects, to keep arrays of HCNEs intact. GRBs often contain 'bystander' genes whose functions and expression patterns are unrelated to those of the presumptive target genes of HCNE enhancer activity. By analyzing the fate of duplicated genes and HCNEs after whole-genome duplication in teleosts, we show that bystander genes are indeed independent of the regulatory input of HCNE arrays. In addition, we describe differences in core promoters between target genes and bystander genes that might explain the differences in their responsiveness to long-range enhancers. We present a web resource (http://ancora.genereg.net) for exploring the distribution of HCNEs on metazoan chromosomes.

Together with other recent studies, this work challenges the canonical colinear model of how genes and their regulatory elements are arranged in metazoan genomes. Vertebrate and insect genomes appear to contain an abundance of nested and overlapping gene structures, giving rise to both coding and noncoding transcripts. In addition, regulatory elements controlling the expression of a gene are frequently distributed within or beyond other genes. These findings should be taken into account in future studies of regulation of gene expression and effects of genetic variation by considering the genomic neighborhood of genes and polymorphisms of interest, up to distances on the order of a million base pairs in the human genome.
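The recommendation to consider the genomic neighborhood of a polymorphism can be sketched as a simple window query. The Python example below is a hedged illustration only: the `Gene` record, the function name `genes_in_neighborhood`, the 1-based inclusive coordinates and the toy gene list are assumptions for this example, and the default window of one million base pairs simply mirrors the order of magnitude mentioned above.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene annotation record (illustrative only)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_neighborhood(genes, chrom, pos, window=1_000_000):
    """Return names of genes overlapping [pos - window, pos + window] on chrom."""
    lo, hi = pos - window, pos + window
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


# Toy annotation (invented names and coordinates).
toy_genes = [
    Gene("nearby_gene", "chr1", 1_500_000, 1_520_000),
    Gene("distant_gene", "chr1", 5_000_000, 5_010_000),
    Gene("other_chrom_gene", "chr2", 1_500_000, 1_520_000),
]
neighbors = genes_in_neighborhood(toy_genes, "chr1", 1_000_000)
```

A query of this kind returns every gene in the window, not only the closest one, which matches the point that regulatory elements may lie within or beyond other genes.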

List of scientific papers

I. Pang KC, Stephen S, Engström PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS (2005). RNAdb-a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 33(Database issue): D125-30.
https://pubmed.ncbi.nlm.nih.gov/15608161

II. Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Engström PG et al. (2005). Antisense transcription in the mammalian transcriptome. Science. 309(5740): 1564-6.
https://pubmed.ncbi.nlm.nih.gov/16141073

III. Engström PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, Lavorgna G, Brozzi A, Luzi L, Tan SL, Yang L, Kunarso G, Ng EL, Batalov S, Wahlestedt C, Kai C, Kawai J, Carninci P, Hayashizaki Y, Wells C, Bajic VB, Orlando V, Reid JF, Lenhard B, Lipovich L (2006). Complex Loci in human and mouse genomes. PLoS Genet. 2(4): e47.
https://pubmed.ncbi.nlm.nih.gov/16683030

IV. Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engström PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, Ghislain J, Pezeron G, Mourrain P, Ellingsen S, Oates AC, Thisse C, Thisse B, Foucher I, Adolf B, Geling A, Lenhard B, Becker TS (2007). Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 17(5): 545-55.
https://pubmed.ncbi.nlm.nih.gov/17387144

V. Engström PG, Ho Sui SJ, Drivenes Ö, Becker TS, Lenhard B (2007). Genomic regulatory blocks underlie extensive microsynteny conservation in insects. Genome Res. [Accepted]
https://pubmed.ncbi.nlm.nih.gov/17989259

VI. Engström PG, Fredman D, Lenhard B (2007). Ancora: a web resource for exploring highly conserved noncoding elements and their association with developmental regulatory genes. [Submitted]

History

Defence date

2007-11-23

Department

  • Department of Cell and Molecular Biology

Publication year

2007

Thesis type

  • Doctoral thesis

ISBN

978-91-7357-361-0

Number of supporting papers

6

Language

  • eng

Original publication date

2007-11-02

Author name in thesis

Engström, Pär

Original department name

Department of Cell and Molecular Biology

Place of publication

Stockholm
