TY - JOUR
T1 - Hypothesis-free discovery of novel cancer predictors using machine learning
AU - Madakkatel, Iqbal
AU - Lumsden, Amanda L.
AU - Mulugeta, Anwar
AU - Olver, Ian
AU - Hyppönen, Elina
N1 - Publisher Copyright: © 2023 The Authors. European Journal of Clinical Investigation published by John Wiley & Sons Ltd on behalf of Stichting European Society for Clinical Investigation Journal Foundation.
PY - 2023/10
Y1 - 2023/10
AB - Background: Cancer is a leading cause of morbidity and mortality worldwide, and better understanding of the risk factors could enhance prevention. Methods: We conducted a hypothesis-free analysis combining machine learning and statistical approaches to identify cancer risk factors from 2828 potential predictors captured at baseline. There were 459,169 UK Biobank participants free from cancer at baseline and 48,671 new cancer cases during the 10-year follow-up. Logistic regression models adjusted for age, sex, ethnicity, education, material deprivation, smoking, alcohol intake, body mass index and skin colour (as a proxy for sun sensitivity) were used for obtaining adjusted odds ratios, with continuous predictors presented using quintiles (Q). Results: In addition to smoking, older age and male sex, positively associating features included several anthropometric characteristics, whole body water mass, pulse, hypertension and biomarkers such as urinary microalbumin (Q5 vs. Q1 OR 1.16, 95% CI = 1.13–1.19), C-reactive protein (Q5 vs. Q1 OR 1.20, 95% CI = 1.16–1.24) and red blood cell distribution width (Q5 vs. Q1 OR 1.18, 95% CI = 1.14–1.21), among others. High-density lipoprotein cholesterol (Q5 vs. Q1 OR 0.84, 95% CI = 0.81–0.87) and albumin (Q5 vs. Q1 OR 0.84, 95% CI = 0.81–0.87) were inversely associated with cancer. In sex-stratified analyses, higher testosterone increased the risk in females but not in males (Q5 vs. Q1 ORfemales 1.23, 95% CI = 1.17–1.30). Phosphate was associated with a lower risk in females but a higher risk in males (Q5 vs. Q1 ORfemales 0.94, 95% CI = 0.90–0.99 vs. ORmales 1.09, 95% CI = 1.04–1.15). Conclusions: This hypothesis-free analysis suggests personal characteristics, metabolic biomarkers, physical measures and smoking as important predictors of cancer risk, with further studies needed to confirm causality and clinical relevance.
KW - artificial intelligence
KW - biomarkers
KW - cancer incidence
KW - machine learning
KW - risk factors
UR - http://www.scopus.com/inward/record.url?scp=85161702863&partnerID=8YFLogxK
U2 - 10.1111/eci.14037
DO - 10.1111/eci.14037
M3 - Article
C2 - 37303098
AN - SCOPUS:85161702863
SN - 0014-2972
VL - 53
JO - European Journal of Clinical Investigation
JF - European Journal of Clinical Investigation
IS - 10
M1 - e14037
ER -