Predicting heavy metal concentration in crop grain using automated machine learning models

Ying Yong Sheng Tai Xue Bao. 2025 Jun;36(6):1889-1897. doi: 10.13287/j.1001-9332.202506.018.

Abstract

With the acceleration of industrialization and the intensification of agricultural activities, heavy metal (HM) pollution in crops has become an issue that cannot be ignored in current agricultural production. Based on 791 datasets compiled from 54 publications, we predicted HM concentrations in crop grains using automated machine learning (AutoML) models. Ten factors were used as input variables: organic fertilizer application rate, HM concentration in the organic fertilizer, soil HM concentration, soil organic matter, pH, cation exchange capacity, clay content, silt content, sand content, and plant type. The concentrations of chromium (Cr), cadmium (Cd), lead (Pb), arsenic (As), and mercury (Hg) in crop grains were the output variables. We evaluated the simulation and prediction performance of six models, namely deep learning (DL), distributed random forest (DRF), extremely randomized trees (XRT), stacked ensemble (SE), gradient boosting machine (GBM), and generalized linear model (GLM), and used them to analyze the key factors driving HM accumulation in crop grains. The results showed that the optimal prediction model differed among HMs: the DL model gave the best predictions for Cr, Pb, As, and Hg, whereas the GBM model achieved the highest prediction accuracy for Cd. Feature importance and SHAP (SHapley Additive exPlanations) analyses revealed that organic fertilizer application rate and plant type were the key factors influencing HM accumulation in crop grains. Organic fertilizer application rate, soil HM concentration, organic fertilizer HM concentration, and sand content were positively correlated with grain HM concentrations, whereas cation exchange capacity, pH, organic matter, and clay content were negatively correlated with them. In summary, the DL and GBM models performed best in predicting HM concentrations in crop grains, and the risk of HM input from organic fertilizer application must be strictly controlled.
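The abstract reports that the best-performing model differed between metals. The sketch below shows one way such a per-metal comparison could be scored. It assumes hold-out predictions from each candidate model are already available and that models are ranked by the coefficient of determination (R²), with RMSE reported alongside. The ranking criterion, the helper names (`r2_score`, `rmse`, `best_model_per_metal`), and all numbers are hypothetical illustrations, not the study's data or code; the model labels only reuse the abstract's abbreviations.

```python
from math import sqrt
from statistics import fmean

# Hypothetical helpers for a per-target (per-metal) model comparison on held-out data.
# Ranking by R^2 is an assumption; the study's exact selection criterion is not given here.


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the target concentration."""
    return sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


def best_model_per_metal(observed, predictions):
    """Return {metal: (model, R^2, RMSE)} for the highest-R^2 model of each metal."""
    best = {}
    for metal, obs in observed.items():
        scores = [
            (model, r2_score(obs, pred), rmse(obs, pred))
            for model, pred in predictions[metal].items()
        ]
        best[metal] = max(scores, key=lambda s: s[1])
    return best


# Toy hold-out values, invented for illustration only (not from the study).
observed = {
    "Cd": [0.10, 0.25, 0.40, 0.55],
    "Pb": [1.0, 2.0, 3.0, 4.0],
}
predictions = {
    "Cd": {"GBM": [0.12, 0.24, 0.38, 0.56], "GLM": [0.20, 0.25, 0.30, 0.45]},
    "Pb": {"DL": [1.1, 1.9, 3.1, 3.9], "GBM": [1.5, 2.0, 2.5, 4.0]},
}

best = best_model_per_metal(observed, predictions)
for metal, (model, r2, err) in best.items():
    print(f"{metal}: {model} (R^2 = {r2:.3f}, RMSE = {err:.3f})")
```

Ranking on a single hold-out split is the simplest choice; cross-validated scores would be less sensitive to which samples happen to land in the test set.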

Keywords: grain; heavy metal; machine learning; organic fertilizer; prediction.

MeSH terms

  • Cadmium / analysis
  • Crops, Agricultural* / chemistry
  • Edible Grain* / chemistry
  • Environmental Monitoring* / methods
  • Fertilizers
  • Machine Learning*
  • Metals, Heavy* / analysis
  • Soil / chemistry
  • Soil Pollutants* / analysis

Substances

  • Metals, Heavy
  • Soil Pollutants
  • Fertilizers
  • Soil
  • Cadmium