Can we screen for pancreatic cancer? Identifying a sub-population of patients at high risk of subsequent diagnosis using machine learning techniques applied to primary care data


Ananya Malhotra, Bernard Rachet, Audrey Bonaventure, Stephen Pereira, Laura Woods

Abstract

Background

Pancreatic cancer patients are predominantly diagnosed too late to be treated. Population-wide screening is not appropriate because so few people develop the disease. A simple blood or urine test for the disease may soon be available and has the potential to rapidly improve outcomes if used as part of a targeted screening programme aimed at high-risk patients. We examined whether such patients are identifiable from routinely collected data.

Method

We conducted a retrospective case-control study using electronic health records from primary care, individually linked to cancer registrations. We examined 1,139 pancreatic cancer patients, aged 15-99 years and diagnosed between January 2005 and June 2008, who were individually matched on age, sex and time of diagnosis to four controls with non-pancreatic cancers. Clinical symptom and prescription codes recorded in the 24 months preceding diagnosis were used to identify the reporting of 57 individual symptoms. Using a machine learning approach, we trained a logistic regression model on 75% of the data to recognise a combination of atypical symptoms experienced by patients who later developed pancreatic cancer.
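The training step described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the symptom prevalences, learning rate and number of epochs are arbitrary assumptions, and the helper names (`make_synthetic`, `train_test_split`, `fit_logistic`, `predict_proba`) are hypothetical. For simplicity the split also ignores the matched sets, which a real analysis of matched case-control data would need to account for.

```python
import math
import random

N_SYMPTOMS = 57  # number of binary symptom indicators, as in the abstract


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n_cases=100, controls_per_case=4, seed=1):
    # Synthetic stand-in for the linked records: each row is
    # (list of 0/1 symptom flags, label), with label 1 for cases.
    # Assumption: the first five symptoms are more common among cases.
    rng = random.Random(seed)
    rows = []
    for label, count in ((1, n_cases), (0, n_cases * controls_per_case)):
        for _ in range(count):
            x = []
            for j in range(N_SYMPTOMS):
                prevalence = 0.15 if (label == 1 and j < 5) else 0.05
                x.append(1 if rng.random() < prevalence else 0)
            rows.append((x, label))
    return rows


def train_test_split(rows, train_fraction=0.75, seed=0):
    # Shuffle, then hold out the final 25% for evaluation.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def predict_proba(w, b, x):
    # Predicted probability that a record belongs to a case.
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(rows, lr=0.5, epochs=100):
    # Plain batch gradient descent on the mean logistic loss.
    w = [0.0] * N_SYMPTOMS
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * N_SYMPTOMS
        grad_b = 0.0
        for x, y in rows:
            err = predict_proba(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


data = make_synthetic()
train, test = train_test_split(data)
w, b = fit_logistic(train)
test_scores = [predict_proba(w, b, x) for x, _ in test]
```

The held-out scores in `test_scores` are what a sensitivity, specificity or AUC calculation would then be run on.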

Results

Using patients’ medical history recorded between 20 and 24 months before diagnosis, we were able to identify 41.3% of the population aged up to 60 years as being at high risk of developing pancreatic cancer, with 72.5% sensitivity, 59% specificity and an AUC of 66%. Among patients above age 60, 43.2% were similarly identified up to 17 months before diagnosis, with 66% sensitivity, 57% specificity and an AUC of 61%.
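For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity and AUC can be computed from a set of predicted risk scores. The labels, scores and the 0.5 threshold are made-up illustrative values, not data or settings from the study, and the function names are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    # Sensitivity: share of cases flagged as high risk.
    # Specificity: share of controls not flagged.
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if y == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    # AUC as the probability that a randomly chosen case scores higher
    # than a randomly chosen control; ties count as half.
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example: 3 cases (label 1) and 5 controls (label 0).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
```

In this toy example two of the three cases and one of the five controls are flagged, giving a sensitivity of 2/3, a specificity of 0.8 and an AUC of 0.8. Lowering the threshold flags a larger share of the population, which raises sensitivity at the cost of specificity.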

Conclusion

A sub-population of patients at higher risk was detectable 17-20 months prior to diagnosis. The use of cancer patient controls would have led to an increased number of false positive tests, so further work is required using population-based controls. Nevertheless, the model has the potential to be used alongside an accurate and acceptable pre-screening (biomarker) test to increase early diagnosis. This would result in a greater number of patients surviving this devastating disease.

Impact statement

By pairing our model with an accurate pre-screening biomarker test for pancreatic cancer, we can identify patients in primary care with early-stage pancreatic tumours.