Enhancing energy consumption prediction and interpretability in wastewater treatment plants: A novel temporal difference-weighted resampling framework with cross validation for imbalanced regression

Kangrong Tang; Anlei Wei; Hanxiao Shi; Zixuan Wang; Jirui Zou; Yaqi Zhu

doi:10.1016/j.jenvman.2025.126386

J Environ Manage. 2025 Jun 27:390:126386. doi: 10.1016/j.jenvman.2025.126386. Online ahead of print.

Authors

Kangrong Tang¹, Anlei Wei², Hanxiao Shi³, Zixuan Wang⁴, Jirui Zou⁵, Yaqi Zhu⁶

Affiliations

¹ Xi'an Key Laboratory of Environmental Simulation and Ecological Health in the Yellow River Basin, College of Urban and Environmental Sciences, Northwest University, Xi'an, 710127, China; Institute of Environmental Sciences, Northwest University, Xi'an, Shaanxi, 710127, China. Electronic address: 202332449@stumail.nwu.edu.cn.
² Xi'an Key Laboratory of Environmental Simulation and Ecological Health in the Yellow River Basin, College of Urban and Environmental Sciences, Northwest University, Xi'an, 710127, China; Institute of Environmental Sciences, Northwest University, Xi'an, Shaanxi, 710127, China. Electronic address: alwei@nwu.edu.cn.
³ Xi'an Key Laboratory of Environmental Simulation and Ecological Health in the Yellow River Basin, College of Urban and Environmental Sciences, Northwest University, Xi'an, 710127, China; Institute of Environmental Sciences, Northwest University, Xi'an, Shaanxi, 710127, China. Electronic address: 202232991@stumail.nwu.edu.cn.
⁴ Xi'an Key Laboratory of Environmental Simulation and Ecological Health in the Yellow River Basin, College of Urban and Environmental Sciences, Northwest University, Xi'an, 710127, China; Institute of Environmental Sciences, Northwest University, Xi'an, Shaanxi, 710127, China. Electronic address: 202332462@stumail.nwu.edu.cn.
⁵ Xi'an Key Laboratory of Environmental Simulation and Ecological Health in the Yellow River Basin, College of Urban and Environmental Sciences, Northwest University, Xi'an, 710127, China; Institute of Environmental Sciences, Northwest University, Xi'an, Shaanxi, 710127, China. Electronic address: 202332473@stumail.nwu.edu.cn.
⁶ Xi'an Key Laboratory of Environmental Simulation and Ecological Health in the Yellow River Basin, College of Urban and Environmental Sciences, Northwest University, Xi'an, 710127, China; Institute of Environmental Sciences, Northwest University, Xi'an, Shaanxi, 710127, China. Electronic address: zhuyaqi@stumail.nwu.edu.cn.

PMID: 40580866
DOI: 10.1016/j.jenvman.2025.126386

Abstract

Accurate prediction of energy consumption is crucial for optimizing wastewater treatment plant (WWTP) operations. However, imbalanced data caused by variable influent conditions often compromises machine learning (ML) model accuracy. This study proposes a novel ML framework to address the imbalanced regression problem using three temporal difference-weighted resampling (TDWR) methods: Threshold under-sampling (TUS), Stochastic under-sampling (SUS), and Inverse histogram under-sampling (IHS). Internal validation used an 80/20 training/testing split within each dataset, and external validation involved cross-testing among different resampled and original datasets to ensure robust assessment. Among the methods, SUS with a sampling factor of 6 (SUS-6) achieved the best performance. When combined with XGBoost, it attained an R² of 0.9998, an RMSE of 0.0833, and a MAPE of 0.14 %. Compared to the original data, R² was improved by up to 27.6 %, RMSE was reduced by nearly 87 %, and MAPE was reduced by 96.07 %. The 95 % confidence interval of residuals narrowed to (-1.24, 1.25), shrinking by approximately 70 %. Similar improvements were observed across support vector regression (84 % narrower), artificial neural network (45 %), and random forest (63 %) models. SHAP (SHapley Additive exPlanations)-based interpretability analysis revealed that aeration-related features such as BOD, COD, and NH₃-N were the main contributors to energy consumption, providing practical guidance for process optimization. Overall, the proposed TDWR framework enhances both prediction accuracy and interpretability, offering an effective tool for intelligent, low-carbon energy management in WWTPs.

Keywords: Data imbalance; Energy consumption; Machine learning; Temporal difference-weighted resampling; Wastewater treatment plants.
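
For readers who want to reproduce the kind of evaluation the abstract reports, the following is a minimal sketch of the three accuracy metrics (R², RMSE, MAPE), an 80/20 index split, and a residual interval. It is not the authors' code. The function names `regression_metrics` and `split_indices` are hypothetical, and the use of the 2.5th and 97.5th percentiles of the residuals as the "95 % interval" is an assumption, since the abstract does not say how that interval was computed. The TDWR resampling step itself is not shown because its details are not given here.

```python
import numpy as np


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices and split them into train/test parts (80/20 by default)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAPE (in percent) and an empirical residual interval."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    rmse = np.sqrt(np.mean(residuals ** 2))

    # MAPE is undefined when any true value is zero; this sketch assumes none are.
    mape = 100.0 * np.mean(np.abs(residuals / y_true))

    # Assumption: "95 % interval of residuals" taken as empirical percentiles.
    low, high = np.percentile(residuals, [2.5, 97.5])

    return {"r2": r2, "rmse": rmse, "mape": mape, "interval": (low, high)}


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    y_true = [10.0, 20.0, 30.0, 40.0]
    y_pred = [11.0, 19.0, 30.0, 42.0]
    print(regression_metrics(y_true, y_pred))

    train_idx, test_idx = split_indices(100)
    print(len(train_idx), len(test_idx))
```

In the cross-testing setup described in the abstract, the same metric function would be applied to a model trained on one dataset (resampled or original) and evaluated on another, which makes the results directly comparable across resampling methods.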