Towards clinically reliable artificial intelligence for prostate cancer pathology
Histopathological assessment of prostate biopsies plays a central role in determining patient prognosis and guiding treatment decisions. This evaluation is typically performed by pathologists using the Gleason scoring system on hematoxylin and eosin (H&E) stained tissue sections. However, the process is subjective and prone to significant intra- and inter-pathologist variability, which can result in under- and over-treatment of patients. In cases where a definitive diagnosis cannot be made from H&E slides alone, additional immunohistochemical (IHC) staining is often used to confirm malignancy, though this introduces added cost, time, and workload. The digitisation of prostate biopsies into Whole Slide Images (WSIs) and the application of Artificial Intelligence (AI) have shown great potential to support pathologists by improving diagnostic consistency and reducing variability. Deep learning models, in particular, have demonstrated diagnostic performance comparable to that of expert pathologists. Yet, real-world deployment remains limited due to a lack of protocolised studies assessing model generalisability across different laboratories, whole slide scanners, patient populations and clinically challenging cases. More recently, foundation models have shown promising pan-cancer detection capabilities, including prostate cancer, but evidence on their performance in disease-specific diagnostics is limited. This thesis addresses these gaps by developing and validating AI models for prostate cancer diagnosis with a focus on generalisability, clinical integration, and potential applications such as reducing reliance on IHC.
In Study I, we developed software for accessing the .isyntax WSI format, which is the proprietary format of the Philips UFS scanner, the first scanner to obtain U.S. Food and Drug Administration clearance. Accessing these digital images allowed us to use them in subsequent studies to train AI models. In Study II, we evaluated the use of physical colour calibration for reducing scanner-induced colour variability in WSIs, with the goal of assessing AI model performance across cohorts digitised by different scanner vendors. Colour calibration showed superior performance over colour normalisation techniques, improving cancer detection and Gleason grading performance. In Study III, we pre-registered a study protocol for the development and retrospective validation of an improved, weakly supervised AI model for diagnostic assessment of digitised prostate biopsies, and in Study IV, we developed the model and rigorously validated it following the predefined protocol. To our knowledge, this represents the largest retrospective validation of an AI model for such a purpose, including approximately 100,000 digitised core needle biopsies from 7,342 patients spanning 15 laboratories in 11 countries. The model was trained end-to-end and showed excellent generalisation capabilities, achieving pathologist-level diagnosis and grading across international cohorts (i.e., external or internal validation with respect to the scanner used for digitisation and clinical laboratory). We compared the developed model, which we refer to as task-specific (TS), with the foundation models (FM), UNI and Virchow2. Given the potential of FMs as few-shot learners, we investigated whether their performance improves with increased prostate pathology training data. Results showed a consistent improvement in Gleason scoring across all validation cohorts for all models when more prostate pathology training data were available.
We further assessed performance across different scanners and clinically challenging cases, observing improvements with increased task-specific training data. Notably, foundation models used up to 35 times more energy than the TS model during model prediction, raising concerns about their sustainability. In Study V, we used our in-house TS model for an important clinical challenge: reducing IHC staining in prostate cancer. Using a sensitivity-prioritised scenario, we retrospectively evaluated the model's performance on H&E-stained biopsies from international cohorts, selecting cases where pathologists had originally assessed the H&E-stained biopsy alongside the IHC. By lowering the malignancy classification threshold to prioritise sensitivity, the model identified all malignant slides without a single false negative, while reducing IHC usage by up to 44.4% across international cohorts.
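The sensitivity-prioritised thresholding described for Study V can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy scores, and the definition of IHC reduction (the share of slides scoring below the threshold, for which IHC might be skipped) are assumptions made for illustration only.

```python
from typing import Sequence


def sensitivity_first_threshold(scores: Sequence[float],
                                is_malignant: Sequence[bool]) -> float:
    """Return the highest threshold that still flags every malignant slide.

    A slide is flagged as malignant when its score is >= the threshold,
    so the lowest malignant score is the highest threshold with zero
    false negatives on this data.
    """
    malignant_scores = [s for s, m in zip(scores, is_malignant) if m]
    if not malignant_scores:
        raise ValueError("at least one malignant slide is required")
    return min(malignant_scores)


def false_negatives(scores: Sequence[float],
                    is_malignant: Sequence[bool],
                    threshold: float) -> int:
    """Count malignant slides that fall below the threshold."""
    return sum(1 for s, m in zip(scores, is_malignant) if m and s < threshold)


def ihc_reduction(scores: Sequence[float], threshold: float) -> float:
    """Fraction of slides scoring below the threshold.

    Assumption: IHC could be skipped for exactly these slides.
    """
    return sum(1 for s in scores if s < threshold) / len(scores)


# Toy data, invented for illustration (not results from the thesis).
scores = [0.02, 0.10, 0.35, 0.40, 0.80, 0.95, 0.05, 0.60]
is_malignant = [False, False, False, True, True, True, False, True]

threshold = sensitivity_first_threshold(scores, is_malignant)
missed = false_negatives(scores, is_malignant, threshold)
reduction = ihc_reduction(scores, threshold)
print(f"threshold={threshold}, missed={missed}, IHC reduction={reduction:.1%}")
```

In this sketch the threshold is chosen on the same slides it is evaluated on, which is optimistic; in practice the threshold would be fixed on separate tuning data before being applied to validation cohorts.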
To conclude, the constituent papers of this thesis address important and timely challenges in digital pathology, paving the way for future prospective clinical trials of AI models in real-world settings. These efforts support the broader integration of AI-assisted diagnostics into routine clinical practice, with the ultimate goal of reducing variability in histopathological assessment and enhancing patient care.
List of scientific papers
I. Mulliqi N*, Kartasalo K*, Olsson H, Ji X, Egevad L, Eklund M, Ruusuvuori P. OpenPhi: an interface to access Philips iSyntax whole slide images for computational pathology. Bioinformatics 2021 Aug 6;37(21):3995-3997.
https://doi.org/10.1093/bioinformatics/btab578
II. Ji X, Salmon R, Mulliqi N, Khan U, Wang Y, Blilie A, Olsson H, Pedersen BG, Sørensen KD, Ulhøi BP, Kjosavik SR, Janssen EAM, Rantalainen M, Egevad L, Ruusuvuori P, Eklund M, Kartasalo K. Physical Color Calibration of Digital Pathology Scanners for Robust Artificial Intelligence-Assisted Cancer Diagnosis. Modern Pathology 2025 Jan 16;38(5):100715.
https://doi.org/10.1016/j.modpat.2025.100715
III. Mulliqi N, Blilie A, Ji X, Szolnoky K, Olsson H, Titus M, Gonzalez GM, Boman SE, Valkonen M, Gudlaugsson E, Kjosavik SR, Asenjo J, Gambacorta M, Libretti P, Braun M, Kordek R, Łowicki R, Hotakainen K, Väre P, Pedersen BG, Sørensen KD, Ulhøi BP, Rantalainen M, Ruusuvuori P, Delahunt B, Samaratunga H, Tsuzuki T, Janssen EAM, Egevad L, Kartasalo K, Eklund M. Study Protocol: Development and Retrospective Validation of an Artificial Intelligence System for Diagnostic Assessment of Prostate Biopsies. medRxiv 2024.07.04.24309948. [Manuscript Preprint]
https://doi.org/10.1101/2024.07.04.24309948
IV. Mulliqi N, Blilie A, Ji X, Szolnoky K, Olsson H, Boman SE, Titus M, Gonzalez GM, Mielcarz JA, Valkonen M, Gudlaugsson E, Kjosavik SR, Asenjo J, Gambacorta M, Libretti P, Braun M, Kordek R, Łowicki R, Hotakainen K, Väre P, Pedersen BG, Sørensen KD, Ulhøi BP, Ruusuvuori P, Delahunt B, Samaratunga H, Tsuzuki T, Janssen EAM, Egevad L, Eklund M, Kartasalo K. Foundation Models - A Panacea for Artificial Intelligence in Pathology? arXiv:2502.21264v2 [cs.CV]. [Manuscript Preprint]
https://doi.org/10.48550/arXiv.2502.21264
V. Blilie A*, Mulliqi N*, Ji X, Szolnoky K, Boman SE, Titus M, Gonzalez GM, Asenjo J, Gambacorta M, Libretti P, Gudlaugsson E, Kjosavik SR, Egevad L, Janssen EAM, Eklund M, Kartasalo K. Artificial Intelligence-Assisted Prostate Cancer Diagnosis for Reduced Use of Immunohistochemistry. arXiv:2504.00979v1 [cs.CV]. [Manuscript Preprint]
https://doi.org/10.48550/arXiv.2504.00979
* Authors contributed equally
History
Defence date
2025-06-13
Department
- Department of Medical Epidemiology and Biostatistics
Publisher/Institution
Karolinska Institutet
Main supervisor
Martin Eklund
Co-supervisors
Lars Egevad; Alessio Crippa
Publication year
2025
Thesis type
- Doctoral thesis
ISBN
978-91-8017-592-0
Number of pages
66
Number of supporting papers
5
Language
- eng