Karolinska Institutet

Methods development for the investigation of the mammalian genome radial architecture : the quantitative side

thesis
posted on 2024-09-02, 20:06 authored by Gabriele Girelli

The nucleus of mammalian cells cradles the genome, an ensemble of nucleic acid macromolecular polymers that store information in physical form. To carry out life-sustaining processes, a cell must read and use the information encoded in the sequence of the genome’s monomers. These processes have received considerable attention since their discovery, leading to remarkable breakthroughs in our understanding of basic cell biology and to the birth of the field of genetics. In the past two decades, the focus has shifted from this one-dimensional view to a spatio-temporal perspective. It is now clear that the genome has a complex architecture, with multiple levels of organization at different scales. Moreover, genome architecture interplays with gene expression, and alterations to its spatial organization are associated with various pathologies, such as cancer, premature-aging diseases, and male infertility. In this thesis, we present the development of two methods for investigating genome architecture.

In Paper I, we established iFISH, a full-stack workflow for the easy setup and application of DNA fluorescence in situ hybridization (FISH). Specifically, iFISH includes a novel, carefully crafted database of 40-nt oligonucleotide sequences for labeling specific human genomic loci. iFISH 40-mers provide strikingly higher genomic coverage and shorter inter-oligo distances than other state-of-the-art databases. Moreover, the iFISH database of homologous sequences allows the design of a 96-oligo probe in more than half of all 10 kb-wide genomic regions and in more than 85% of 15 kb-wide genomic regions (compared with 15-30% for other databases). iFISH also includes a computational tool, easily accessible through a web-based graphical user interface, for the automatic selection of optimal sets of oligos (i.e., probe design) for single-probe or homogeneous multi-probe (i.e., spotting) labeling. We applied our computational pipeline to design a total of 330 DNA FISH probes covering all human chromosomes homogeneously, with an inter-probe distance of 10 Mb for chromosomes 1 to 16 and X, and of 5 Mb for chromosomes 17 to 22. We then systematically and individually tested most of these probes, whose sequences are readily available for the community to download and use. Furthermore, building on cutting-edge sequence amplification methods, we provide an inexpensive and straightforward protocol for the large-scale amplification of DNA FISH probes starting from oligopools of relatively low concentration. To this end, we designed a set of novel 20-mer sequences that are orthogonal to the human genome and compatible with the probe-specific PCR steps of the amplification protocol.
Finally, we showcased the broad applicability and flexibility of the iFISH workflow in human IMR90 fibroblasts, revealing the importance of dense label sampling for correct estimation of chromatin volume, and in human embryonic stem cells, uncovering less distinct chromosome territories overall and a remarkable lack of chromosome territoriality in a subset of cells. Altogether, these results support iFISH as an empowering set of tools and resources for the research community, freely accessible online at https://www.ifish4u.org.
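The probe-design figures above (a 96-oligo probe in a 10 kb or 15 kb window) depend on how many non-overlapping 40-mers can be packed into a region. As a minimal sketch, assuming candidate oligos are known only by their start coordinates and that the only constraint is non-overlap plus a small spacer (the hypothetical `min_gap`), a greedy left-to-right pass can check whether a window can host a 96-oligo probe. This is not the iFISH probe-design algorithm, which also weighs oligo quality and spacing homogeneity; `greedy_probe` and its parameters are illustrative names only.

```python
from typing import List, Optional

OLIGO_LEN = 40   # iFISH oligos are 40 nt long
PROBE_SIZE = 96  # number of oligos per probe


def greedy_probe(starts: List[int], window_start: int, window_size: int,
                 oligo_len: int = OLIGO_LEN, n_oligos: int = PROBE_SIZE,
                 min_gap: int = 2) -> Optional[List[int]]:
    """Hypothetical feasibility check: pick non-overlapping oligos in a window.

    Returns the start positions of `n_oligos` chosen oligos, scanning candidates
    left to right, or None if that many cannot fit without overlapping.
    `min_gap` is an assumed minimum spacer (in nt) between consecutive oligos.
    """
    window_end = window_start + window_size
    chosen: List[int] = []
    next_free = window_start  # first coordinate where a new oligo may start
    for s in sorted(starts):
        if s < next_free:
            continue  # outside the window, or too close to the last chosen oligo
        if s + oligo_len > window_end:
            break  # this and all later candidates would spill past the window
        chosen.append(s)
        next_free = s + oligo_len + min_gap
        if len(chosen) == n_oligos:
            return chosen
    return None
```

Because all oligos share the same length, taking the leftmost compatible candidate each time maximizes how many fit, so `None` means no non-overlapping 96-oligo set exists under these assumptions. A real designer would additionally prefer evenly spread oligos, which this sketch ignores.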

In Paper II, we presented Genomic loci Positioning by sequencing (GPSeq), a method for the genome-wide measurement of the position of genomic loci along the nuclear radius. GPSeq follows a straightforward protocol based on a simple and elegant concept: diffusion into the nucleus proceeds from the nuclear periphery towards its interior. We demonstrated this concept for restriction enzyme diffusion, using a FISH-based method (YFISH) to visualize concentric waves of genomic restriction signal generated by different digestion times. Specifically, GPSeq combines the sequencing of genomic loci cut during digestions of different durations into a so-called "GPSeq score," a reliable and accurate estimate of the centrality of genomic loci. We validated the GPSeq score against 68 DNA FISH probes spanning 11 chromosomes, against Lamin B1 DamID-seq data, and against Hi-C chromatin contacts. We then used the radial maps generated by GPSeq to reveal novel radial arrangements of different chromatin states and to identify predictors of centrality at different resolutions. Next, we applied a novel 3D genome reconstruction algorithm to demonstrate how an additional centrality constraint can improve the quality of reconstructed structures. Specifically, 3D genome structures generated by a GPSeq-informed algorithm showed a higher correlation with FISH-based radial measurements, and an arrangement of chromosome territories and genomic compartments that better reflects the underlying biology. Additionally, structures generated by combining GPSeq with Hi-C intra-chromosomal contacts allowed the recovery of inter-chromosomal contacts, further underscoring the need for additional constraints from methods orthogonal to Hi-C to achieve more reliable 3D genome reconstruction.
Finally, we applied GPSeq to gain insight into the so-called "bodyguard hypothesis," which posits that heterochromatin might act as a shield protecting the more internally located active chromatin from exogenous mutagens. In this regard, we showed that cancer-related single-nucleotide variants (SNVs) have a strikingly different radial arrangement from germline single-nucleotide polymorphisms (SNPs), with the former located more peripherally than the latter. We then showed that genomic regions involved in gene fusions in cancer tend to be located more internally and to contact other chromosomes more frequently than other regions. Combining these observations with the fact that double-strand breaks (DSBs) tend to be located more internally, as we further confirmed by immunofluorescence experiments, we speculate that cancer-related SNVs and germline SNPs might arise through different underlying mechanisms. Altogether, these results highlight the importance of genome-wide, high-resolution radial maps in the study of genome architecture, both as a standalone resource and as a complement to chromatin contact data.
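The GPSeq score condenses restriction signal from several digestion times into one centrality value per locus. The abstract does not give the formula, so the following is only a minimal sketch, assuming binned read counts for each digestion time and a simple log-ratio against the longest digestion. It is not the published GPSeq score; `centrality_score` and `pseudocount` are hypothetical names. The idea it illustrates: because the enzyme diffuses inward from the periphery, peripheral bins are over-represented after short digestions, so a bin's early-versus-late enrichment carries information about its radial position.

```python
import numpy as np


def centrality_score(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Hypothetical GPSeq-like centrality from a digestion time course.

    counts: array of shape (n_times, n_bins), n_times >= 2; row i holds read
    counts per genomic bin after the i-th digestion, sorted by increasing time.
    Returns one value per bin; higher means the bin is reached later, which
    under the periphery-to-interior diffusion assumption means more central.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount  # avoid log(0)
    # Normalize each time point by its library size.
    frac = counts / counts.sum(axis=1, keepdims=True)
    # Compare every shorter digestion with the longest one: bins cut early
    # (peripheral) are enriched at short times, giving positive log-ratios.
    log_ratio = np.log2(frac[:-1] / frac[-1])
    # Flip the sign so that bins depleted at early times score as central.
    return -log_ratio.mean(axis=0)
```

For example, with read counts `[[90, 10], [70, 30], [50, 50]]` for two bins over three digestion times, the second bin, which is depleted after the short digestions, receives the higher centrality value.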

Disclaimer: The biomolecular (i.e., wet lab) protocols of the methods presented here are mainly the work of other students and researchers (chiefly Dr. Joaquin Custodio and Dr. Tomasz Kallas for GPSeq, and Eleni Gelali for iFISH). This thesis instead focuses on the analytical and deeply quantitative (i.e., dry lab) side of method development. At the same time, we would like to stress that it is impossible to fully separate these two sides, as novel methods arise only through the interplay between experimentalists and analysts (when they are not the same person).

List of scientific papers

I. Gelali, Eleni*, Gabriele Girelli*, Masahiro Matsumoto, Erik Wernersson, Joaquin Custodio, Ana Mota, Maud Schweitzer et al. iFISH is a publically available resource enabling versatile DNA FISH to study genome architecture. Nature Communications. 10, no. 1 (2019):1-15. *These authors contributed equally.
https://doi.org/10.1038/s41467-019-09616-w

II. Girelli, Gabriele*, Joaquin Custodio*, Tomasz Kallas*, Federico Agostini, Erik Wernersson, Bastiaan Spanjaard, Ana Mota et al. GPSeq reveals the radial organization of chromatin in the cell nucleus. Nature Biotechnology. 38, no. 10 (2020):1184-1193. *These authors contributed equally.
https://doi.org/10.1038/s41587-020-0519-y

History

Defence date

2021-05-12

Department

  • Department of Medical Biochemistry and Biophysics

Publisher/Institution

Karolinska Institutet

Main supervisor

Bienko, Magda

Co-supervisors

Farnebo, Marianne

Publication year

2021

Thesis type

  • Doctoral thesis

ISBN

978-91-8016-197-8

Number of supporting papers

2

Language

  • eng

Original publication date

2021-04-16

Author name in thesis

Girelli, Gabriele

Original department name

Department of Medical Biochemistry and Biophysics

Place of publication

Stockholm
