A machine learning model equipped with only data on people’s age, smoking duration and the number of cigarettes smoked per day can predict lung cancer risk and identify who needs lung cancer screening, according to a new study published October 3rd in the open-access journal PLOS Medicine by Thomas Callender of University College London, UK, and colleagues.
Lung cancer is the most common cause of cancer death worldwide, with poor survival in the absence of early detection. Screening for lung cancer among those at highest risk could reduce lung cancer deaths by nearly a quarter, but the ideal way to determine the high-risk population has been unclear. The current standard-of-care model of lung cancer risk requires 17 variables, few of which are routinely available in electronic health records.
In the new study, researchers used data on 216,714 ever-smokers from the UK Biobank cohort and 26,616 ever-smokers participating in the US National Lung Screening Trial to develop new models of lung cancer risk.
The machine learning model used three predictors to estimate each person’s chance of developing lung cancer, and of dying from it, over the next five years: age, smoking duration and pack-years (a measure combining how many cigarettes a person smokes per day with how many years they have smoked). The researchers tested the new model on a third dataset, from the US Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial. The model predicted lung cancer incidence with 83.9% sensitivity and lung cancer deaths with 85.5% sensitivity; in other words, it flagged as high risk 83.9% of the people who went on to develop lung cancer. At an equivalent specificity, every version of the model was more sensitive than the risk prediction formulas currently in use.
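For readers less familiar with these terms, the short Python sketch that follows shows how pack-years are conventionally derived from cigarettes per day and years smoked (assuming the standard 20 cigarettes per pack), and how sensitivity can be read off at a fixed specificity from a set of risk scores. It is an illustration only, not the study’s model: the function names and the toy scores and outcomes are hypothetical.

```python
import math
from typing import Sequence

CIGARETTES_PER_PACK = 20  # conventional pack size used to define pack-years


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs smoked per day x number of years smoked."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], target_specificity: float
) -> float:
    """Sensitivity when the risk threshold is set so that at least
    `target_specificity` of people without the outcome (label 0) score at or
    below it, i.e. are correctly classed as low risk."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # How many negatives must fall at or below the threshold (rounded to
    # sidestep floating-point noise before taking the ceiling).
    k = math.ceil(round(target_specificity * len(negatives), 9))
    threshold = negatives[k - 1] if k > 0 else float("-inf")
    # People scoring strictly above the threshold are flagged as high risk.
    flagged = sum(1 for s in positives if s > threshold)
    return flagged / len(positives)


# Hypothetical example: 20 cigarettes a day for 30 years is 30 pack-years.
example_pack_years = pack_years(20, 30)

# Hypothetical risk scores; label 1 = went on to develop lung cancer.
toy_scores = [0.10, 0.20, 0.30, 0.40, 0.90, 0.35, 0.60, 0.80, 0.95]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_sensitivity = sensitivity_at_specificity(toy_scores, toy_labels, 0.8)
print(example_pack_years, toy_sensitivity)
```

In the toy data, holding specificity at 80% puts the threshold at 0.40, so three of the four people who developed cancer score above it, giving 75% sensitivity. Comparing two models “at an equivalent specificity” means fixing that threshold rule for both and asking which one flags more of the true cases.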
Callender adds, “We know that screening for those who have a high chance of developing lung cancer can save lives. With machine learning, we’ve been able to substantially simplify how we work out who is at high risk, presenting an approach that could be an exciting step in the direction of widespread implementation of personalised screening to detect many diseases early.”