Bioinformatic characterization of structural variation in rare disease
The genome contains the blueprint for human development, health, and disease. Despite significant advances, the full implications of many regions of the genome remain poorly understood, leaving approximately 60% of patients referred for genetic analysis undiagnosed.
Only about 2% of the genome consists of protein-coding exons. The remaining 98% consists of non-coding regions, including regulatory elements, introns, and various types of repeat elements, such as transposable elements (TEs), low-copy repeats, and satellite sequences. These non-coding regions have often been neglected and remain understudied, despite their known association with structural variation. Only recently was the first complete human reference genome released, the telomere-to-telomere assembly (T2T-CHM13). This added over 200 megabase pairs of previously missing or computed sequences, mainly composed of repetitive elements.
Structural variant detection has historically been limited by short read lengths, which make alignment and resolution across repetitive regions difficult. However, with long-read sequencing and the T2T-CHM13 reference assembly, it is now possible to increase coverage and resolution in these challenging areas. In this thesis, we developed workflows and applied new techniques to identify and characterize both simple and complex structural variants.
In study I, we developed a pipeline to detect TE insertions in short-read genome sequencing data. The pipeline was applied to population genomic datasets to build databases for frequency annotation, and to patient genomes, where it identified two cases of disease-causing TE insertions. In study II, we developed sTELLeR, a fast, sensitive, and precise tool for TE detection in long-read sequencing data that can easily be integrated into long-read workflows.
In study III, we used long-read genome sequencing and multiple reference genomes to resolve and characterize large chromosomal inversions. Four of twelve cases could only be identified using the T2T-CHM13 assembly. We further explored regions present in T2T-CHM13 but absent from other human and primate reference genomes. In study IV, we applied similar strategies to investigate supernumerary marker chromosomes. We characterized and proposed formation mechanisms for nine out of ten cases, four of which could not have been resolved without T2T-CHM13. Methylation analysis also revealed the parental origin in one case and skewed X-inactivation in another.
This thesis presents new tools and methodologies for detecting structural variants, including TEs, inversions, and supernumerary marker chromosomes. The work enhances our understanding of repetitive genomic regions and their implications in structural variant formation and detection. Furthermore, it highlights the need for longer reads and complete reference genomes for an accurate and comprehensive genome analysis.
List of scientific papers
I. Transposable element insertions in 1000 Swedish individuals. Bilgrav Saether K, Nilsson D, Thonberg H, Tham E, Ameur A, Eisfeldt J, Lindstrand A. PLoS One. 2023 Jul 28;18(7):e0289346. PMID: 37506127. https://doi.org/10.1371/journal.pone.0289346
II. Detecting transposable elements in long read genomes using sTELLeR. Bilgrav Saether K, Eisfeldt J. Bioinformatics. 2024 Nov 18;btae686. PMID: 39558574. https://doi.org/10.1093/bioinformatics/btae686
III. Leveraging the T2T-CHM13 assembly to resolve rare and pathogenic inversions in reference genome gaps. Bilgrav Saether K, Eisfeldt J, Bengtsson JD, Lun MY, Grochowski CM, Mahmoud M, Chao HT, Rosenfeld JA, Liu P, Ek M, Schuy J, Ameur A, Dai H; Undiagnosed Diseases Network; Hwang JP, Sedlazeck FJ, Bi W, Marom R, Wincent J, Nordgren A, Carvalho CMB, Lindstrand A. Genome Res. 2024 Nov 1;34(11):1785-1797. PMID: 39486878. https://doi.org/10.1101/gr.279346.124
IV. Detailed resolution and methylation patterns of supernumerary marker chromosomes using long read genome sequencing. Bilgrav Saether K, Ek M, Pettersson M, Syk Lundberg E, Grochowski CM, Carvalho CMB, Eisfeldt J, Lindstrand A. [Manuscript]
History
Defence date
2025-06-13
Department
- Department of Molecular Medicine and Surgery
Publisher/Institution
Karolinska Institutet
Main supervisor
Jesper Eisfeldt
Co-supervisors
Anna Lindstrand; Daniel Nilsson; Valtteri Wirta; Magnus Nordenskjöld
Publication year
2025
Thesis type
- Doctoral thesis
ISBN
978-91-8017-583-8
Number of pages
64
Number of supporting papers
4
Language
- eng