Development and validation of a risk prediction model for lung cancer with common health examination indexes



Zhangyan Lyu1, Ni Li1, Fengwei Tan1, Jiang Li1, Chunqing Lin1, Hongda Chen1, Jiansong Ren1, Jufang Shi1, Kai Su1, Fang Li1, Xiaoshuang Feng1, Luopei Wei1, Xin Li1, Yan Wen1, Gang Wang2, Shuohua Chen2, Shouling Wu2, Min Dai1, Jie He1
1National Cancer Center, Beijing, China, National Clinical Research Center for Cancer, Beijing, China, Cancer Hospital, Beijing, China, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; 2Kailuan General Hospital, Tangshan, China



Lung cancer has been the most common cancer and the leading cause of cancer-related death worldwide for several decades, especially in China, the most populous country. Low-dose computed tomography (LDCT) screening has been shown to reduce lung cancer mortality. A user-friendly lung cancer risk prediction model could help standardize the selection of high-risk individuals for LDCT screening and encourage people to modify lifestyle factors to lower their risk. We therefore sought to develop and internally validate a simple risk prediction model for lung cancer based on a prospective cohort study in China.


A total of 138,150 people were prospectively followed from 2006 to 2015 for lung cancer incidence. Stepwise multivariable-adjusted logistic regression, with significance thresholds of 0.15 for entry and 0.20 for retention, was used to select the candidate variables for the prediction model. Concordance statistics (C-statistics) and Hosmer–Lemeshow tests were used to evaluate discrimination and calibration, respectively. Ten-fold cross-validation was used for internal validation.
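To make the two evaluation metrics concrete, the following is a minimal Python sketch of how a C-statistic and a Hosmer–Lemeshow statistic can be computed from observed outcomes and predicted risks. It is an illustration only, not the authors' code: the function names `c_statistic` and `hosmer_lemeshow`, the equal-size grouping by sorted predicted risk, and the toy inputs are all assumptions made here.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; ties count as half. For a binary outcome this equals
    the area under the ROC curve. Pairwise loop: fine for small examples,
    too slow for a cohort-sized dataset."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score, other_score in product(cases, non_cases):
        if case_score > other_score:
            concordant += 1.0
        elif case_score == other_score:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def hosmer_lemeshow(y_true, y_score, groups=10):
    """Hosmer-Lemeshow chi-square statistic: sort by predicted risk, cut
    into `groups` roughly equal groups (deciles by default), and compare
    observed with expected event counts in each group."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0])
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        # Groups with mean predicted risk of exactly 0 or 1 are skipped
        # to avoid division by zero (a simplification in this sketch).
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat


if __name__ == "__main__":
    outcomes = [0, 0, 1, 1]          # toy data, not study data
    risks = [0.1, 0.4, 0.35, 0.8]
    print(c_statistic(outcomes, risks))
    print(hosmer_lemeshow(outcomes, risks, groups=2))
```

The Hosmer–Lemeshow statistic is conventionally compared against a chi-square distribution with (number of groups − 2) degrees of freedom to obtain a P value; a large P value indicates no detected lack of fit. Under cross-validation, the same `c_statistic` call would be applied to each held-out fold and the results averaged.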


During a median follow-up of 9 years, a total of 1,088 (0.79%) lung cancer cases were identified. The simple model, which included age and smoking, yielded a C-statistic of 0.71. The full model, which additionally included sex, alcohol consumption, body mass index (BMI), low-density lipoprotein cholesterol (LDL-C), and C-reactive protein (CRP), showed significantly better discrimination (C-statistic=0.73, P<0.01). In 10-fold cross-validation, the average C-statistic across the 10 test sets was similar (0.73). The model was well calibrated across deciles of predicted risk (PHL=0.48). The predicted risk of lung cancer was 2.36% in the top decile vs. 0.04% in the bottom decile (odds ratio [OR]=98.16).


We developed and internally validated an easy-to-use risk prediction model for lung cancer among the Chinese population that could provide guidance for LDCT screening and early detection of lung cancer.