Accuracy of cancer-specific cause of death assigned through machine learning compared to an independent cause of death committee in a cluster randomised trial of PSA testing for prostate cancer (CAP).




Emma Turner1, Chris McWilliams, Eleanor Walsh, Raul Santos-Rodriguez, Avon Huxor, Richard Martin
1University of Bristol



Use of big data and AI in medicine is increasing, with an expectation that these will improve patient outcomes and save lives. They also have the potential to increase efficiencies in clinical trials. Underlying cause of death (CoD) is a key health outcome in research, but assignment of cancer-specific CoD from death certification is prone to misclassification. The development of an interpretable machine learning (ML) classifier to predict cancer-specific death could reduce the need for complex medical summaries to be reviewed.


We applied ML classifiers to over 2,606 free-text summaries used by independent experts to assign CoD in CAP. The label of prostate cancer death assigned by the reviewers was used to train the ML techniques in order to: 1) identify the key elements (words and phrases) that are good predictors of prostate cancer death; and 2) add user confidence and transparency by explaining how the ML techniques work, rather than relying on the prediction probability output by the classifier.


Using a random forest (RF) classifier with a bag-of-words feature set, we found that we could predict prostate cancer death with >90% accuracy. We then investigated how the RF was classifying the free-text summaries by looking at which elements in the summaries were used to assign prostate cancer death. Word clouds provide a visual representation of the words (or groups of words) that are most predictive of prostate cancer deaths across the dataset. The word clouds show that clinically important signs of progressing prostate cancer are key to identifying prostate cancer deaths.
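As an illustration only, the general approach of pairing a bag-of-words feature set with an RF classifier and then ranking terms by importance might be sketched as follows. This is a minimal sketch, not the CAP pipeline: it assumes scikit-learn, the toy summaries and labels are invented for the example, and the function name `fit_and_rank` and all parameter values are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy summaries (NOT CAP data); label 1 = prostate cancer death.
summaries = [
    "rising psa with bone metastases and spinal cord compression",
    "widespread bone metastases despite hormone therapy, rising psa",
    "castrate resistant disease with bone metastases, palliative care",
    "rising psa, new metastases, palliative radiotherapy",
    "acute myocardial infarction, psa stable on surveillance",
    "pneumonia following stroke, localised disease, psa stable",
    "lung cancer primary, prostate disease stable",
    "heart failure, low psa, no progression",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def fit_and_rank(texts, targets, top_k=5):
    """Fit an RF on bag-of-words counts and return the top_k terms by importance."""
    # Bag-of-words over single words and two-word phrases (assumed n-gram range).
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    features = vectorizer.fit_transform(texts)

    # Fixed random_state so the sketch is repeatable.
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(features, targets)

    # Map column indices back to terms and sort by impurity-based importance.
    index_to_term = {idx: term for term, idx in vectorizer.vocabulary_.items()}
    ranked = sorted(
        ((index_to_term[i], weight) for i, weight in enumerate(forest.feature_importances_)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return vectorizer, forest, ranked[:top_k]


vectorizer, forest, top_terms = fit_and_rank(summaries, labels)
for term, weight in top_terms:
    print(f"{term}: {weight:.3f}")
```

The ranked term weights are the kind of input a word cloud could be drawn from. Note that impurity-based importances indicate how much a term contributes to the splits, not whether it points towards or away from prostate cancer death, so clinical judgement is still needed when reading them.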


Algorithmic classification of clinical data could reduce the need for complex medical summaries to be reviewed by an independent committee. We demonstrate the use of visual methods to explain classifier predictions, allowing users to apply clinical judgement when assessing the appropriateness of predictions. Knowledge of predictive features could also be used to target data extraction, reducing the workload in creating the free-text summaries.

Impact statement

An interpretable machine learning classifier that identifies cancer-specific death could increase efficiencies in clinical trials, reducing the need for complex medical summaries to be reviewed by independent experts.