Predicting runoff water quality is crucial for mitigating non-point source pollution in urban watersheds. However, the complexity of the physical runoff transport process makes effective prediction difficult. This study proposed a flexible framework that integrates hydrological-hydraulic datasets from a physically driven model with machine learning models to improve prediction accuracy and efficiency. High-resolution measurement data were provided by online monitoring equipment installed in a highly urbanized watershed in the Pearl River Delta. An interpretability analysis based on the Shapley Additive Explanations (SHAP) approach was further used to identify the driving factors behind the runoff water quality predictions. Results show that the average concentrations of chemical oxygen demand (COD), ammonia nitrogen (NH3-N), and suspended solids (SS) in the monitored water were 15.28 ± 2.84 mg/L, 2.63 ± 1.48 mg/L, and 12.02 ± 0.55 mg/L, respectively. A comparison of models shows that the random forest model performed best among the machine learning models tested. Its R2 values for COD, NH3-N, and SS predictions were 0.78, 0.77, and 0.81, respectively, and the corresponding RMSE values were 0.58, 0.31, and 0.17. SHAP analysis revealed that precipitation, slope, and impervious area ratio strongly affected runoff water quality. These results show that the proposed modeling framework can capture the dynamic characteristics of pollutants in surface water.
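The abstract reports R2 and RMSE without stating how they were computed. As a minimal sketch, assuming the standard definitions (R2 as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared residual), the two metrics could be calculated as follows. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative concentrations in mg/L (made up, not data from the study).
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print("RMSE:", rmse(observed, predicted))
    print("R2:", r_squared(observed, predicted))
```

Under these definitions RMSE carries the units of the predicted quantity, so the three reported RMSE values are comparable across pollutants only if they share a unit or were computed on normalized data, which the abstract does not specify.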
Keywords: Hybrid model; Machine learning model; Runoff water quality prediction; Shapley additive explanations.
Copyright © 2025 Elsevier B.V. All rights reserved.