Using text mining to automatically extract radiation esophagitis from free-text clinical narratives for clinical decision support



Azad Dehghan1, Tom Liptrot1, Kardo Ala’aldeen1, Daniel Tibble1, Matthew Barker-Hewitt1, Linda Ashcroft1, Fiona Blackhall1, Corinne Faivre-Finn1
1The Christie NHS Foundation Trust



Electronic patient records (EPR) contain vast amounts of patient-centric data, and free-text clinical narratives remain a main format in which clinicians record day-to-day patient care. In response to the deluge of textual data, text mining (TM) methods have shown great potential for automatically extracting the clinical information held in such unstructured formats [1,2,3,4].

Automated extraction of key information from the EPR is useful in many practical applications. For example, radiation esophagitis has been identified as an important variable for a proposed clinical decision support system at The Christie NHS Foundation Trust.


A knowledge-driven TM method was engineered to identify and grade esophagitis (using CTCAE version 3.0) following radiotherapy. The problem was modelled using expert knowledge from clinical oncologists, supplemented by pattern analysis (collocation extraction) of unlabelled data from the narrative records of 61,397 randomly selected patients. These strategies allowed us to characterise the incidence and severity of esophagitis through commonly occurring lexical patterns, which were then formalised into specific extraction rules. We measured the performance of the method with a standard information extraction metric, the F1-score. A labelled dataset (n=35 patient records) derived from a well-annotated clinical phase II trial [5] was used for validation.
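As an illustration of what rule-based extraction over lexical patterns can look like, the following is a minimal Python sketch. It is not the authors' rule set, which the abstract does not specify. The regular expressions, the function names `extract_grades` and `patient_grade`, and the choice to label a patient with the highest grade mentioned are all assumptions made for this example.

```python
import re

# Hypothetical lexical patterns for illustration only; the actual rules
# used in the study are not described in the abstract.
_TERM = r"o?esophagitis"                          # US and UK spellings
_GRADE = r"\b(?:grade|gr\.?)\s*([1-5])\b"         # "grade 2", "gr. 3"

_PATTERNS = [
    # Grade before the term, e.g. "grade 2 radiation oesophagitis"
    re.compile(rf"{_GRADE}\s+(?:radiation\s+)?{_TERM}", re.IGNORECASE),
    # Term before the grade, e.g. "esophagitis (grade 3)"
    re.compile(rf"{_TERM}\s*[,(:-]?\s*{_GRADE}", re.IGNORECASE),
]


def extract_grades(text):
    """Return the sorted, de-duplicated esophagitis grades mentioned in one narrative."""
    grades = set()
    for pattern in _PATTERNS:
        for match in pattern.finditer(text):
            grades.add(int(match.group(1)))
    return sorted(grades)


def patient_grade(narratives):
    """Assumed patient-level rule: the highest grade found in any note, or None."""
    found = [g for note in narratives for g in extract_grades(note)]
    return max(found) if found else None


if __name__ == "__main__":
    notes = ["Reviewed in clinic. Gr. 2 oesophagitis, managed with analgesia.",
             "Swallowing improved since last visit."]
    print(extract_grades(notes[0]), patient_grade(notes))
```

This sketch only matches explicit grade mentions. It does not handle negation, temporality, or grades implied by symptom descriptions, all of which a practical rule set would need to address.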


Our preliminary results show that the TM method achieves F1-scores of 88% and 67% for identifying grade 2 and grade 3 esophagitis, respectively (80% and 86% accuracy for classification at the patient level). According to the trial case report form, the dataset contained 31 patients with grade 2 and 9 patients with grade 3 esophagitis, for which the TM method achieved sensitivities of 84% and 55%, respectively.
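For readers less familiar with these metrics, the sketch below shows how sensitivity (recall), precision, F1-score, and accuracy are conventionally computed from binary gold-standard and predicted labels for a single grade. The function name `binary_metrics` and the toy labels are hypothetical. They are not the study data and do not reproduce the figures reported above.

```python
def binary_metrics(gold, predicted):
    """Compute precision, sensitivity (recall), F1 and accuracy for binary labels.

    gold and predicted are equal-length sequences of booleans, one per patient,
    where True means the grade of interest is present.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy labels for illustration only (not taken from the trial dataset).
    gold = [True, True, True, False, False]
    predicted = [True, True, False, True, False]
    print(binary_metrics(gold, predicted))
```

Because F1 combines precision and sensitivity while accuracy also counts true negatives, the two can differ considerably when one class is rare.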


Knowledge-driven TM methods appear to perform well at identifying and grading radiation esophagitis in free-text clinical narratives. We are currently exploring data-driven TM methods (i.e., machine learning) using a larger dataset.