Machine learning methods for precision medicine
In precision medicine, predicting the risk of an event during a specific period may help, for example, to identify patients who need early preventive treatment. Modern machine learning (ML) techniques are well suited to building such predictions. However, medical datasets often suffer from right-censoring of the outcome of interest, which is an obstacle to the direct application of ML algorithms.
The aim of this thesis is to develop and advance methods for prediction under right-censoring and, in some settings, competing risks. Specifically, in Project I, we developed an approach that combines inverse probability of censoring weighting (IPCW) with bagging as a pre-processing step, which enables any existing ML method for classification to be applied in settings of right-censoring and competing risks, and we proposed a procedure to optimally combine a set of single IPCW bagged methods.
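As an illustration of the IPCW pre-processing idea, the following is a minimal sketch and not the implementation used in Project I. It assumes a single event type (no competing risks), a fixed prediction horizon `tau`, and censoring that is independent of the covariates, so that the censoring distribution can be estimated with a Kaplan-Meier estimator. The function names (`km_censoring`, `ipcw_weights`, `ipcw_bootstrap_indices`) and the toy data are hypothetical. Subjects censored before `tau` have an unknown binary outcome and receive weight zero; bootstrap samples are then drawn with probabilities proportional to the weights, so that an unweighted classifier could be trained on each sample.

```python
import numpy as np

def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Returns the distinct censoring times and an array `surv` where surv[k] is
    the estimate after the k-th distinct censoring time (surv[0] = 1).
    """
    censored = events == 0
    uniq = np.unique(times[censored])
    at_risk = np.array([(times >= u).sum() for u in uniq])
    n_cens = np.array([((times == u) & censored).sum() for u in uniq])
    surv = np.concatenate([[1.0], np.cumprod(1.0 - n_cens / at_risk)])
    return uniq, surv

def ipcw_weights(times, events, tau):
    """IPC weights and binary labels for the outcome 1{event time <= tau}."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    uniq, surv = km_censoring(times, events)

    had_event = (events == 1) & (times <= tau)   # event observed by tau
    survived = times > tau                       # still under follow-up after tau

    weights = np.zeros_like(times)               # censored before tau: weight 0
    # Event by tau: weight 1 / G(T_i-), the left limit at the event time.
    weights[had_event] = 1.0 / surv[np.searchsorted(uniq, times[had_event], side="left")]
    # Event-free past tau: weight 1 / G(tau).
    weights[survived] = 1.0 / surv[np.searchsorted(uniq, tau, side="right")]
    return weights, had_event.astype(int)

def ipcw_bootstrap_indices(weights, n_bags, rng):
    """Draw bootstrap index sets with probability proportional to the IPC weights."""
    p = weights / weights.sum()
    n = len(weights)
    return [rng.choice(n, size=n, replace=True, p=p) for _ in range(n_bags)]

# Toy data (hypothetical): observed times and event indicators (1 = event, 0 = censored).
times = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
events = np.array([1, 0, 1, 0, 1, 0])
tau = 4.5

w, labels = ipcw_weights(times, events, tau)
bags = ipcw_bootstrap_indices(w, n_bags=10, rng=np.random.default_rng(0))
```

In this toy example the subject censored at time 3 never appears in any bootstrap sample, while the remaining subjects are up-weighted to compensate for it.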
In Project II, we extended the approach of Project I to optimally combine not only ML procedures for the same type of outcome but also procedures for different outcome types, such as Cox regression models for survival outcomes and pseudo-observation-based regression for continuous outcomes.
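To make the notion of pseudo-observations concrete, the sketch below computes jackknife pseudo-observations for the cumulative incidence at a horizon `tau`. It is an illustration under simplifying assumptions, not the code used in the thesis: there is a single event type, so the cumulative incidence is estimated as one minus the Kaplan-Meier estimate (with competing risks, an estimator such as Aalen-Johansen would take its place). The function names and toy data are hypothetical. For subject i, the pseudo-observation is n times the full-sample estimate minus (n - 1) times the estimate with subject i left out; these values can then serve as a continuous outcome in a regression model.

```python
import numpy as np

def km_survival_at(times, events, tau):
    """Kaplan-Meier estimate of the survival probability S(tau)."""
    event_times = np.unique(times[(events == 1) & (times <= tau)])
    surv = 1.0
    for u in event_times:
        at_risk = (times >= u).sum()
        n_events = ((times == u) & (events == 1)).sum()
        surv *= 1.0 - n_events / at_risk
    return surv

def pseudo_observations(times, events, tau):
    """Jackknife pseudo-observations for the cumulative incidence 1 - S(tau)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n = len(times)
    full = 1.0 - km_survival_at(times, events, tau)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i   # leave subject i out
        loo = 1.0 - km_survival_at(times[keep], events[keep], tau)
        pseudo[i] = n * full - (n - 1) * loo
    return pseudo

# Toy data (hypothetical). Without censoring, each pseudo-observation reduces
# to the event indicator 1{T_i <= tau}.
times_uncensored = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
events_uncensored = np.array([1, 1, 1, 1, 1])
po_uncensored = pseudo_observations(times_uncensored, events_uncensored, tau=3.5)

# With censoring, pseudo-observations are defined for every subject,
# including those whose outcome at tau is not observed.
times_censored = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
events_censored = np.array([1, 0, 1, 0, 1])
po_censored = pseudo_observations(times_censored, events_censored, tau=3.5)
```

Unlike the IPCW approach, which discards subjects censored before the horizon and reweights the rest, this approach assigns an outcome value to every subject.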
In Project III, we integrated pseudo-observations into a convolutional neural network to predict the cumulative incidence using images and structured clinical data. In Project IV, we applied the methods developed in Projects I and II to build a flexible model for predicting the risk of any cancer diagnosis among sarcoidosis patients, using a Swedish population-based register.
In the last project, Project V, we explored the utility of a dynamic prediction model in a setting of complete data as a decision support tool for public health authorities managing future pandemics. Specifically, we applied two state-of-the-art batch reinforcement learning algorithms to learn the best face covering policy response at the national level, with the goal of reducing the spread of COVID-19.
List of scientific papers
I. Pablo Gonzalez Ginestet, Ales Kotalik, David Vock, Julian Wolfson and Erin Gabriel. Stacked inverse probability of censoring weighted bagging: a case study in the InfCareHIV Register. Journal of the Royal Statistical Society Series C. 2021,70:51-65.
https://doi.org/10.1111/rssc.12448
II. Pablo Gonzalez Ginestet, Erin Gabriel and Michael Sachs. Survival stacking with multiple data types using pseudo-observation-based-AUC loss. Journal of Biopharmaceutical Statistics. 2022.
https://doi.org/10.1080/10543406.2022.2041655
III. Pablo Gonzalez Ginestet, Philippe Weitz, Mattias Rantalainen and Erin Gabriel. A deep convolutional neural network approach for predicting cumulative incidence based on pseudo-observations. [Manuscript]
IV. Elizabeth Arkema, Pablo Gonzalez Ginestet, Erin Gabriel and Michael Sachs. Predicting risk of cancer among sarcoidosis patients: a nationwide, register-based, cohort-study. [Manuscript]
V. Pablo Gonzalez Ginestet, Erin Gabriel, Ziad El-Khatib and Ujjwal Neogi. Batch deep reinforcement learning for policy responses to the COVID pandemic. [Manuscript]
History
Defence date
2022-08-19
Department
- Department of Medical Epidemiology and Biostatistics
Publisher/Institution
Karolinska Institutet
Main supervisor
Gabriel, Erin
Co-supervisors
Rantalainen, Mattias; Neogi, Ujjwal; Sjölander, Arvid
Publication year
2022
Thesis type
- Doctoral thesis
ISBN
978-91-8016-633-1
Number of supporting papers
5
Language
- eng