Assessment of tissue composition with digital pathology in colorectal cancer




Enric Domingo1, Aikaterini Chatzipli2, Susan Richman3, Andrew Blake1, Claire Hardy2, Celina Whalley4, Keara Redmond5, Ian Tomlinson4, Philip Dunne5, Steven Walker6, Andrew Beggs4, Ultan McDermott2, Graeme Murray7, Leslie Samuel8, Matt Seymour3, Philip Quirke3, Tim Maughan1, Viktor Koelzer9
1 University of Oxford, Oxford, UK; 2 Wellcome Trust Sanger Institute, Cambridge, UK; 3 Leeds Institute of Cancer and Pathology, Leeds, UK; 4 University of Birmingham, Birmingham, UK; 5 Queen's University Belfast, Belfast, UK; 6 Almac Diagnostics, Craigavon, UK; 7 University of Aberdeen, Aberdeen, UK; 8 Aberdeen Royal Infirmary, Aberdeen, UK; 9 University of Zurich, Zurich, Switzerland



The tumour microenvironment is key to understanding cancer biology. Tissue composition is usually quantified by visual pathological review (VPR) or by deconvolution of genome-wide molecular data. The former is a direct measurement with modest reproducibility, while the latter is an indirect and expensive measurement of unclear accuracy. Here we test digital pathology coupled with machine learning as a new tool to assess tissue composition.


As part of the Stratification in COloRecTal cancer (S:CORT) programme, over 500 colorectal cancer (CRC) paraffin blocks from resections and biopsies were sequentially sectioned for RNA/DNA extraction and for two haematoxylin and eosin (H&E) stained sections. RNA expression microarrays, targeted DNA sequencing and DNA methylation arrays were applied. Tissue composition was obtained with a deep neural network (DNN) algorithm after supervised training on >1,500 tissue areas. Tumour purity estimates (TPE) were obtained from VPR and from RNA/methylation arrays. Copy number alterations were adjusted using each TPE and the results were compared. Similar analyses were performed on TCGA CRCs.
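To make the idea of a tissue composition estimate concrete, the following is a minimal sketch of how per-class area fractions could be derived once each image tile has a predicted tissue class. It is not the S:CORT pipeline: the function name `area_fractions`, the tile-level representation, and the choice to exclude white space from the denominator are all assumptions made for illustration. The class names follow the categories reported in the results.

```python
from collections import Counter

# Tissue categories as named in the abstract; the tuple itself is illustrative.
TISSUE_CLASSES = (
    "tumour",
    "desmoplastic stroma",
    "inflamed stroma",
    "mucin/hypocellular stroma",
    "muscle",
    "necrosis",
    "white space",
)


def area_fractions(tile_labels, exclude=("white space",)):
    """Return the fraction of tissue area per class from per-tile labels.

    Assumes every tile covers the same area, so tile counts stand in for
    area. Excluding white space from the denominator is an assumption,
    not a detail taken from the abstract.
    """
    counts = Counter(label for label in tile_labels if label not in exclude)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tissue tiles left after exclusion")
    return {
        name: counts.get(name, 0) / total
        for name in TISSUE_CLASSES
        if name not in exclude
    }


# Hypothetical slide: 6 tumour, 3 stroma, 1 necrosis and 5 background tiles.
example_tiles = (
    ["tumour"] * 6
    + ["desmoplastic stroma"] * 3
    + ["necrosis"]
    + ["white space"] * 5
)
fractions = area_fractions(example_tiles)
```

Under these assumptions the tumour area fraction is one simple candidate for a tumour purity estimate; a cell-count-based estimate would use nuclei per class in place of tiles.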


DNN estimates, including area and cell counts, were obtained for tumour, desmoplastic stroma, inflamed stroma, mucin/hypocellular stroma, muscle, necrosis and white space. Repeated DNN analysis of the same H&E sections gave matching results (r=1.0). Comparison of paired H&E sections showed very high correlations (r~0.85). TPE by VPR consistently underestimated purity, which resulted in ~10% overestimation of copy number calls. Conversely, TPE from either RNA or methylation deconvolution consistently overestimated purity, resulting in ~10% underestimation of copy number calls.
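The direction of these biases follows from how purity enters a copy number correction. The sketch below uses a common textbook mixture model, assuming a diploid normal compartment and a log2 ratio already normalised to a diploid baseline; it is not necessarily the adjustment used in this study, and the function names and example values are hypothetical.

```python
import math


def observed_log2_ratio(tumour_cn, purity, normal_cn=2.0):
    """Log2 ratio expected from a mixture of tumour and normal cells."""
    mixed = purity * tumour_cn + (1.0 - purity) * normal_cn
    return math.log2(mixed / normal_cn)


def purity_adjusted_cn(log2_ratio, purity, normal_cn=2.0):
    """Invert the mixture model to recover tumour copy number.

    Assumes a diploid normal compartment (normal_cn=2); purity must lie
    in (0, 1].
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mixed = normal_cn * 2.0 ** log2_ratio
    return (mixed - (1.0 - purity) * normal_cn) / purity


# Hypothetical segment: true copy number 4 in a sample that is 60% tumour.
ratio = observed_log2_ratio(tumour_cn=4.0, purity=0.6)

cn_correct = purity_adjusted_cn(ratio, purity=0.6)  # purity as it truly is
cn_under = purity_adjusted_cn(ratio, purity=0.5)    # purity underestimated
cn_over = purity_adjusted_cn(ratio, purity=0.7)     # purity overestimated
```

With the true purity the model recovers a copy number of 4. Underestimating purity pushes the adjusted value further from the diploid baseline (about 4.4 here), and overestimating it pulls the value back towards 2 (about 3.7), which matches the overcalling and undercalling pattern described above.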


Tissue composition analysis with DNN offers analytical robustness, automation and standardization, and provides very high reproducibility at single-cell resolution. DNN-based TPE are more accurate than those from VPR or from deconvolution of genome-wide omic platforms, which tend to under- and overestimate tumour purity, respectively. DNN could be used to better plan and assess downstream molecular analyses and to investigate tissue-based metrics as potential biomarkers in clinical trials.