Karolinska Institutet

Statistical and computational methodologies for omics data analyses and drug response prediction

thesis
posted on 2024-09-18, 11:08 authored by Quang Thinh Trac

With the availability of valuable omics data from recent high-throughput sequencing technologies, and a deeper understanding of the pathophysiology of multiple diseases, researchers can now focus on precision medicine to improve the effectiveness of current diagnosis and treatment methods. Unlike traditional treatment, which was largely subjective and based on clinicians' experience, modern treatment for complex diseases can be guided by the precision medicine approach, for example through molecular classification of patients' diseases. However, despite the early promising outcomes of precision medicine, analyzing omics data to tailor effective treatments for patients and explore the biological mechanisms of diseases remains highly challenging. This is due not only to the heterogeneity of the diseases but also to the complexity of the omics data.

In this thesis, we aim to develop statistical and computational methodologies for multi-omics data analyses and drug response prediction. The methodologies are applied to both simulated and real datasets from different diseases, with a particular focus on acute myeloid leukemia (AML) and amyotrophic lateral sclerosis (ALS). Through critical evaluation and validation analyses, we demonstrate that our methods perform well against competing methods.

In study I, we propose a pathway activation score (PAS) and apply it to identify and validate druggable cancer-specific pathways (DCSPs) from pan-cancer datasets. Our hypothesis is that cancers with activated DCSPs are more likely to respond to the corresponding drug. In our analysis, we identified and validated 4,794 DCSPs across 23 cancers. Further focusing on AML, we show that tumor samples with higher PAS exhibit stronger drug responses, supporting our hypothesis.

In study II, we develop MDREAM, a prediction model for drug response in AML patients. We first train MDREAM on the BeatAML cohort using gene expression, mutation profiles, and drug response data. We further validate MDREAM in the test set of the BeatAML dataset and externally validate it in a Swedish AML dataset and a relapsed leukemia dataset. Our results demonstrate the robust and consistent performance of MDREAM across datasets. We also propose a confidence score metric to compute prediction uncertainty and illustrate its application within the MDREAM framework.

In study III, we implement DIPx, a machine learning model for personalized drug synergy prediction based on PAS. DIPx is trained and validated using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset. Our validation results show that DIPx achieves higher accuracy than the top-performing method from the challenge. Additionally, we demonstrate how PAS can suggest potential biological mechanisms by identifying activated pathways that mediate drug synergy interactions.

In study IV, we introduce MegaFun, a computational method for quantifying the functional aspects of the microbiome from metagenomics data. MegaFun utilizes gene clusters based on sequence similarities at both the pangenome and isolate levels. To quantify functional abundance, it employs an alternating EM algorithm applied to a bilinear model that captures the complexity of the microbiome at the isolate level. In a simulated dataset, MegaFun outperforms HUMAnN, a state-of-the-art method for functional quantification. We also apply MegaFun to analyze a real metagenomics dataset from ALS patients.

In summary, we have developed novel statistical and computational methods to analyze omics data and drug responses. The results demonstrate that these methods perform well against existing methodologies. We hope that our work will advance omics data analysis and drug response prediction, and aid researchers in uncovering biological insights.

List of scientific papers

I. Quang Thinh Trac, Tingyou Zhou, Yudi Pawitan, and Trung Nghia Vu. Discovery of druggable cancer-specific pathways with application in acute myeloid leukemia. GigaScience. 11:giac091 (2022).
https://doi.org/10.1093/gigascience/giac091


II. Quang Thinh Trac, Yudi Pawitan, Tian Mou, Tom Erkers, Päivi Östling, Anna Bohlin, Albin Österroos, Mattias Vesterlund, Rozbeh Jafari, Ioannis Siavelis, Helena Bäckvall, Santeri Kiviluoto, Lukas M. Orre, Mattias Rantalainen, Janne Lehtio, Sören Lehmann, Olli Kallioniemi, and Trung Nghia Vu. Prediction model for drug response of acute myeloid leukemia patients. npj Precis. Onc. 7, 32 (2023).
https://doi.org/10.1038/s41698-023-00374-z


III. Quang Thinh Trac*, Yue Huang*, Tom Erkers, Päivi Östling, Anna Bohlin, Albin Österroos, Mattias Vesterlund, Rozbeh Jafari, Ioannis Siavelis, Helena Bäckvall, Santeri Kiviluoto, Lukas M. Orre, Mattias Rantalainen, Janne Lehtio, Sören Lehmann, Olli Kallioniemi, Yudi Pawitan, and Trung Nghia Vu. Pathway activation model for personalized prediction of drug synergy. eLife 13:RP100071 (2024). (* Contributed equally)
https://doi.org/10.7554/eLife.100071.1


IV. Quang Thinh Trac, Emily Joyce, Fredrik Boulund, Fang Fang, Yudi Pawitan and Trung Nghia Vu. Functional quantification of microbiome from metagenomics. [Manuscript]

History

Defence date

2024-10-25

Department

  • Department of Medical Epidemiology and Biostatistics

Publisher/Institution

Karolinska Institutet

Main supervisor

Trung Nghia Vu

Co-supervisors

Yudi Pawitan; Fang Fang; Mattias Rantalainen

Publication year

2024

Thesis type

  • Doctoral thesis

ISBN

978-91-8017-738-2

Number of pages

53

Number of supporting papers

4

Language

  • eng

Author name in thesis

Trac, Quang Thinh

Original department name

Department of Medical Epidemiology and Biostatistics

Place of publication

Stockholm
