Aims: To investigate the physical activity levels of lung cancer survivors, analyse the factors that influence them, and construct a machine learning-based model to predict these levels.
Design: This was a cross-sectional study.
Methods: Convenience sampling was used to survey lung cancer survivors across 14 hospitals in eastern, central, and western China. Data on demographic, disease-related, health-related, physical, and psychosocial factors were collected. Descriptive analyses were performed using SPSS 25.0, and predictors were identified through multiple logistic regression analyses. Four machine learning models (random forest, gradient boosting tree, support vector machine, and logistic regression) were developed and evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, recall, and F1 score. The best-performing model was used to build an online computational tool with Python 3.11 and Flask 3.0.3. This study was conducted and reported in accordance with the TRIPOD guidelines and checklist.
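The evaluation metrics named in the Methods can be illustrated with a minimal sketch. It is not the study's code: it assumes a simplified binary framing (1 = low physical activity, 0 = otherwise), and the function names `binary_metrics` and `auc_roc`, the toy labels, the toy scores, and the 0.5 threshold are all hypothetical. AUC-ROC is computed here through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and model scores for eight survivors.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2, 0.7, 0.1]
# Assumed decision threshold of 0.5 for turning scores into class predictions.
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]

if __name__ == "__main__":
    print(binary_metrics(toy_true, toy_pred))
    print(auc_roc(toy_true, toy_scores))
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together. For a three-level outcome such as low, moderate, and high activity, these quantities would typically be computed per class and then averaged; the abstract does not state which averaging scheme was used.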
Results: Among the 2231 participants, 670 (30%), 1185 (53.1%), and 376 (16.9%) exhibited low, moderate, and high physical activity levels, respectively. Multiple logistic regression identified 15 independent influencing factors: residential location, geographical region, religious beliefs, histological type, treatment modality, regional lymph node stage, grip strength, 6-min walking distance, globulin, white blood cells, aspartate aminotransferase, blood urea, MDASI score, depression score, and SRAHP score. The random forest model performed best among the four algorithms, with reported AUC-ROC values of 0.86, 0.70, 0.72, and 0.67, and was used to develop an online predictive tool (URL: http://10.60.32.178:5000).
Conclusion: This study developed a machine learning model to predict physical activity levels in lung cancer survivors, with the random forest model demonstrating the highest accuracy and clinical utility. This tool enables the early identification of low-activity survivors, facilitating timely, personalised rehabilitation and health management.
Implications for the profession and/or patient care: A predictive model for physical activity levels in lung cancer survivors can help clinical staff identify survivors with low physical activity levels as early as possible, so that personalised rehabilitation plans can be formulated to optimise their quality of life during survivorship.
Impact: Physical activity has been used as a nonpharmacological intervention in cancer rehabilitation plans. However, previous studies show that lung cancer survivors generally have low physical activity levels. In this study, we identified the key factors influencing physical activity among lung cancer survivors through a literature review and constructed a prediction model for their physical activity levels using machine learning algorithms. Clinical staff can use this model to identify patients with low physical activity levels early and to develop personalised intervention plans that improve their quality of life during survivorship.
Reporting method: The study adhered to the relevant EQUATOR reporting guideline, namely the TRIPOD Checklist for Prediction Model Development and Validation.
Patient or public contribution: During the data collection phase, participants were recruited to complete the questionnaires.
Keywords: lung cancer; machine learning; physical activity; predictive modelling; survivors.
© 2025 John Wiley & Sons Ltd.