Karolinska Institutet

Bioinformatic characterization of structural variation in rare disease

thesis
posted on 2025-05-14, 11:21 authored by Kristine Bilgrav Saether

The genome contains the blueprint for human development, health, and disease. Despite significant advances, the full implications of many regions of the genome remain poorly understood, leaving approximately 60% of patients referred for genetic analysis undiagnosed.

Only about 2% of the genome consists of protein-coding exons, and the remaining 98% consists of non-coding regions, including regulatory elements, introns, and various types of repeat elements, such as transposable elements (TEs), low-copy repeats, and satellite sequences. These non-coding regions have often been neglected and remain understudied, despite their known association with structural variation. Only recently was the first complete human reference genome released, the telomere-to-telomere assembly (T2T-CHM13). This added over 200 megabase-pairs of previously missing or computed sequences, mainly composed of repetitive elements.

Structural variant detection has historically been limited by short reads, which are difficult to align and resolve across repetitive regions. However, with long-read sequencing and the T2T-CHM13 assembly, it is now possible to increase coverage and resolution in these challenging areas. In this thesis, we developed workflows and applied new techniques to identify and characterize both simple and complex structural variants.

In study I, we developed a pipeline to detect TE insertions in short-read genome sequencing data. The pipeline was applied to population genomic datasets to build databases for frequency annotation, and to patient genomes, where it identified two cases of disease-causing TE insertions. In study II, we developed sTELLeR, a fast, sensitive, and precise tool for TE detection in long-read sequencing data that can easily be integrated into long-read workflows.

In study III, we used long-read genome sequencing and multiple reference genomes to resolve and characterize large chromosomal inversions. Four of twelve cases could only be identified using the T2T-CHM13 assembly. We further explored regions present in T2T-CHM13 but absent from other human and primate reference genomes. In study IV, we applied similar strategies to investigate supernumerary marker chromosomes. We characterized and proposed formation mechanisms for nine out of ten cases, four of which could not have been resolved without T2T-CHM13. Methylation analysis also revealed the parental origin in one case and skewed X-inactivation in another.

This thesis presents new tools and methodologies for detecting structural variants, including TEs, inversions, and supernumerary marker chromosomes. The work enhances our understanding of repetitive genomic regions and their role in structural variant formation and detection. Furthermore, it highlights the need for longer reads and complete reference genomes for accurate and comprehensive genome analysis.

List of scientific papers

I. Transposable element insertions in 1000 Swedish individuals. Bilgrav Saether K, Nilsson D, Thonberg H, Tham E, Ameur A, Eisfeldt J, Lindstrand A. PLoS One. 2023 Jul 28;18(7):e0289346. PMID: 37506127. https://doi.org/10.1371/journal.pone.0289346

II. Detecting transposable elements in long read genomes using sTELLeR. Bilgrav Saether K, Eisfeldt J. Bioinformatics. 2024 Nov 18;btae686. PMID: 39558574. https://doi.org/10.1093/bioinformatics/btae686

III. Leveraging the T2T-CHM13 assembly to resolve rare and pathogenic inversions in reference genome gaps. Bilgrav Saether K, Eisfeldt J, Bengtsson JD, Lun MY, Grochowski CM, Mahmoud M, Chao HT, Rosenfeld JA, Liu P, Ek M, Schuy J, Ameur A, Dai H; Undiagnosed Diseases Network; Hwang JP, Sedlazeck FJ, Bi W, Marom R, Wincent J, Nordgren A, Carvalho CMB, Lindstrand A. Genome Res. 2024 Nov 1;34(11):1785-1797. PMID: 39486878. https://doi.org/10.1101/gr.279346.124

IV. Detailed resolution and methylation patterns of supernumerary marker chromosomes using long read genome sequencing. Bilgrav Saether K, Ek M, Pettersson M, Syk Lundberg E, Grochowski CM, Carvalho CMB, Eisfeldt J, Lindstrand A. [Manuscript]

History

Defence date

2025-06-13

Department

  • Department of Molecular Medicine and Surgery

Publisher/Institution

Karolinska Institutet

Main supervisor

Jesper Eisfeldt

Co-supervisors

Anna Lindstrand; Daniel Nilsson; Valtteri Wirta; Magnus Nordenskjöld

Publication year

2025

Thesis type

  • Doctoral thesis

ISBN

978-91-8017-583-8

Number of pages

64

Number of supporting papers

4

Language

  • eng

Author name in thesis

Sæther, Kristine Bilgrav

Original department name

Department of Molecular Medicine and Surgery

Place of publication

Stockholm
