Fracture evaluation and prediction of outcome after a fracture using artificial neural networks
Background: The radiograph is the predominant tool in orthopedic emergency decision-making, so better interpretation of orthopedic trauma radiographs could improve patient outcomes. Machine learning-guided radiographic interpretation is one way to achieve this.
Aims: 1) Explore convolutional neural networks (CNN) for orthopedic trauma imaging and for fracture detection and classification in medical imaging. 2) Study CNNs on combined imaging and registry data to predict patient outcomes after trauma. 3) Evaluate the generalizability of this approach through external validation.
Methods: Study I used CNNs and transfer learning to detect fractures in auto-labeled wrist, hand, ankle, and foot radiographs. Study II and Study III focused on ankle fractures using the AO Foundation/Orthopedic Trauma Association (AO) 2018 standard. We manually labeled thousands of ankle exams and trained a CNN to classify fractures. In Study III, we externally validated a CNN model against data from a different site and implemented active learning to improve the model. Study IV linked fractures in the Swedish Fracture Registry (SFR) to the trauma radiographs and developed models that, based on the initial radiograph, predicted patient-reported outcome measures (PROM) or death after one year.
Results
Study I: Deeper CNN architectures outperformed shallower ones, with the best correctly classifying 83% of cases, compared to 82% for the human reviewers. For secondary outcomes, the CNN performed near-perfectly in identifying body parts and excellently in identifying exam view. A manual review of 400 random training cases found that the auto-generated labels were the main source of error.
Study II: The CNN performed well on the primary task. However, several outcomes were too rare to be included in the training, testing, or error bounding. For example, type A fractures were challenging to train, and there were many AO subgroups.
Study III: The external validation data differed from the training site in important ways. It included weight-bearing studies, mostly type A fractures, with fewer views per study. The CNN external validation performance improved with active learning on type A fractures but decreased somewhat for other types.
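As an illustration of the active learning idea mentioned for Study III, the following is a minimal sketch of uncertainty sampling, one common way to choose which unlabeled exams to label next. The abstract does not state which selection strategy was used, so the entropy criterion, the function names, and the example probabilities are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` exams the model is least certain about.

    `predictions` is a list of (exam_id, class_probabilities) pairs.
    This is a hypothetical helper, not the selection rule used in the thesis.
    """
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [exam_id for exam_id, _ in ranked[:budget]]


# Made-up model outputs for three unlabeled exams (illustrative values only).
pool = [
    ("exam_a", [0.98, 0.01, 0.01]),  # confident prediction
    ("exam_b", [0.40, 0.35, 0.25]),  # very uncertain prediction
    ("exam_c", [0.70, 0.20, 0.10]),  # moderately uncertain prediction
]

chosen = select_for_labeling(pool, budget=2)
print(chosen)
```

The selected exams would then be labeled manually and added to the training set before retraining, which concentrates labeling effort on the cases where the model is weakest.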
Study IV: We tried a range of network configurations and found that the CNN's ability to predict PROM after one year (PROM1) or death was variable. At best, the root mean squared errors (RMSE) and mean absolute errors (MAE) were on par with the standard deviation.
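To see why an RMSE on par with the standard deviation indicates limited predictive value, note that a model that always predicts the mean outcome has an RMSE exactly equal to the (population) standard deviation of the outcomes. The sketch below shows this with made-up PROM scores; the values and function names are hypothetical and not taken from the study.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical one-year PROM scores (illustrative values only).
scores = [55.0, 70.0, 82.5, 90.0, 100.0, 100.0, 65.0, 77.5]

# Baseline "model": predict the mean score for every patient.
mean_score = statistics.fmean(scores)
baseline = [mean_score] * len(scores)

baseline_rmse = rmse(scores, baseline)
baseline_mae = mae(scores, baseline)
population_sd = statistics.pstdev(scores)

print(f"baseline RMSE: {baseline_rmse:.2f}")
print(f"baseline MAE:  {baseline_mae:.2f}")
print(f"population SD: {population_sd:.2f}")
```

A model whose errors are no lower than this baseline has not extracted information from the radiographs beyond what the mean outcome already provides.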
Conclusions
Study I: We succeeded in predicting fractures in radiographs at the level of human reviewers. The CNN performance for individual radiographs was better than indicated by the automatic fracture labels generated for the study.
Study II: We successfully implemented a CNN for ankle fracture classification using the AO 2018 standard, looking at the complete exam rather than individual images.
Study III: The initial external validation dataset performance was acceptable but not good enough. We successfully improved external validity using internal training data and active learning. External validation is essential when reporting CNN model performance.
Study IV: We performed a series of experiments to train a CNN to predict PROM after one year, but our models learned to predict the most common value or the mean of the PROMs, i.e., they overfit. We explore different ways to improve performance.
List of scientific papers
I. Olczak J, Fahlberg N, Maki A, Razavian A S, Jilert A, Stark A, Sköldenberg O, Gordon M. Artificial intelligence for analyzing orthopedic trauma radiographs. Acta Orthopaedica, 2017, 88:6, 581-586. https://doi.org/10.1080/17453674.2017.1344459
II. Olczak J, Emilson F, Razavian A, Antonsson T, Stark A, Gordon M. Ankle fracture classification using deep learning: automating detailed AO Foundation/Orthopedic Trauma Association (AO/OTA) 2018 malleolar fracture identification reaches a high degree of correct classification. Acta Orthopaedica. 2021 Jan 2;92(1):102-8. https://doi.org/10.1080/17453674.2020.1837420
III. Olczak J, Prijs J, IJpma F, et al. External validation of an artificial intelligence multi-label deep learning model capable of ankle fracture classification. BMC Musculoskelet Disord 25, 788 (2024). https://doi.org/10.1186/s12891-024-07884-2
IV. Olczak J and Gordon M. Artificial intelligence for predicting patient-reported outcome measures (PROM) from the Swedish Fracture Registry, based on trauma radiographs [Manuscript].
History
Defence date
2024-12-04
Department
- Department of Clinical Sciences, Danderyd Hospital
Publisher/Institution
Karolinska Institutet
Main supervisor
Max Gordon
Co-supervisors
Olof Sköldenberg; Ali Sharif Razavian
Publication year
2024
Thesis type
- Doctoral thesis
ISBN
978-91-8017-824-2
Number of pages
108
Number of supporting papers
4
Language
- eng