Evaluation of a multi-stage AI analysis system to support prostate cancer diagnostic imaging


Antony Rix, Jakub Suchanek, Aman Mehan, Chris Doran, Anwar Padhani, Christof Kastner, Tristan Barrett, Evis Sala

Abstract

Background

Artificial intelligence (AI) has the potential to support clinical interpretation and improve the accuracy of pre-biopsy MRI for prostate cancer, helping to address concerns about its sensitivity, specificity, repeatability, and need for contrast. We compare a new AI-based system for detecting Gleason ≥3+4 clinically significant prostate cancer (csPCa) on MRI with human readers and the existing computer-aided diagnosis (CAD) literature.

Method

An AI diagnostic aid for prostate cancer detection was developed using a multi-stage architecture. Data were obtained from open, anonymised prostate MRI datasets and divided into training, development validation, and held-out test sets: PROMISE12 and the NCI-ISBI 2013 Challenge, both T2-weighted MRI segmentation datasets; and PROSTATEx, an MRI and MR-guided biopsy dataset acquired at a single centre on two 3T scanners (Siemens MAGNETOM Trio and Skyra). Performance was evaluated after model development was completed. For prostate gland segmentation, the Dice coefficient was calculated. For cancer identification, sensitivity, specificity, and negative predictive value (NPV), all at the optimum-NPV operating point, and the area under the receiver operating characteristic curve (AUC), were estimated with bootstrapped 95% confidence intervals.
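The evaluation metrics described above are standard and can be illustrated in a few lines. The following is a minimal sketch, not the system's actual evaluation code: it computes a Dice overlap score between two binary masks, a ROC AUC via the rank (Mann-Whitney) formulation, and a percentile-bootstrap 95% confidence interval. All function names and parameter choices (e.g. 2,000 resamples) are our own illustrative assumptions.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice overlap between two binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

def auc(y_true, y_score):
    """ROC AUC via the rank (Mann-Whitney) formulation (assumes untied scores)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(metric_fn, y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a patient-level metric."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample patients with replacement
        if len(set(y_true[idx])) < 2:        # skip degenerate resamples
            continue
        stats.append(metric_fn(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

The percentile bootstrap resamples patients (not lesions) with replacement, so the resulting intervals reflect case-level variability, which matters for the small test sets reported here.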

Results

The system achieved an average Dice score of 92% for prostate gland segmentation when compared with the benchmark examples from the PROMISE12 test set (n=10). Using bpMRI data, the system identified patients with csPCa with sensitivity 93% (95% CI 82-100%), specificity 76% (64-87%), NPV 95% (88-100%), and AUC 0.92 (0.84-0.98). These results are derived from the combined PROSTATEx development validation and test sets (n=80); performance on the held-out PROSTATEx test set (n=40) was higher. Similar performance was found with mpMRI data. Comparable AI/CAD publications report equivalent sensitivity at lower specificities of 56% (Bleker 2019, PZ only), 37% (Cao 2019) and 6% (Thon 2017).

Conclusion

The AI system’s performance is in line with central reporting by expert radiologists. Its accuracy exceeds published results for similar prostate CAD/AI systems, although methodological differences and the small test set size limit comparisons. Training and validation on larger, more diverse datasets are needed to investigate this and related detection tasks further.

Impact statement

Artificial intelligence-based software could support exclusion of clinically significant prostate cancer with high NPV and AUC, and may facilitate avoidance of gadolinium-based contrast agents.