Spectral data-driven and machine learning-based modeling of soil total nitrogen content

Spectrochim Acta A Mol Biomol Spectrosc. 2025 Jun 19:343:126583. doi: 10.1016/j.saa.2025.126583. Online ahead of print.

Abstract

Soil total nitrogen (TN) serves as both a fundamental indicator of agricultural fertility and a critical marker for ecological balance and environmental security. Rapid monitoring of soil TN is essential for evaluating soil fertility and guiding precision agriculture in ecologically fragile agro-pastoral transitional zones such as those in Northwest China. A total of 116 farmland soil samples were collected from Jingbian County, China. Six spectral transformations, combined with correlation analysis (CA) and competitive adaptive reweighted sampling (CARS), were applied to extract TN-sensitive spectral bands and elucidate spectral mechanisms. Partial least squares regression (PLSR), random forest (RF) and gradient boosting decision tree (GBDT) were used to establish prediction models. The study showed that: 1) The TN content of farmland surface soil in Jingbian County ranged from 0.003 to 0.781 g kg⁻¹, with an average of 0.266 g kg⁻¹; 2) Soil spectral reflectance decreased gradually as soil TN content increased, i.e. reflectance was negatively correlated with TN content, while the reflectance curves shared a consistent shape, rising overall with wavelength; 3) Derivative transformations effectively improved spectral sensitivity to the target information, and the CA and CARS band-screening methods selected characteristic bands and reduced data redundancy; 4) For the soil TN prediction model built on Log(1/R)-CARS-GBDT, the R² of the calibration and validation sets was 0.92 and 0.89, the RMSE was 1.09 and 1.33 g kg⁻¹, and the RPD and RPIQ were 2.01 and 2.62, respectively, indicating that the model can estimate TN well. Integration of preprocessing, feature extraction, and modeling significantly improved prediction accuracy, enabling rapid hyperspectral quantification of TN in arid agro-pastoral soils.
This framework provides a scientific basis for hyperspectral-based TN monitoring and precision agriculture in ecologically fragile regions, supporting data-driven land management and policy formulation.
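
The accuracy statistics reported above (R², RMSE, RPD, RPIQ) can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in soil spectroscopy: RPD as the standard deviation of the observed values divided by the RMSE, and RPIQ as their interquartile range divided by the RMSE. The abstract does not state which SD or quartile convention the authors used, so the sample SD and the "inclusive" quartile method below are assumptions, and the function name `regression_metrics` is hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, RMSE, RPD and RPIQ for paired observed/predicted values.

    Assumed conventions (not confirmed by the abstract):
    - RMSE uses the mean of squared residuals (divides by n).
    - RPD uses the sample standard deviation (n - 1) of the observed values.
    - RPIQ uses Q3 - Q1 from the 'inclusive' quartile method.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = statistics.fmean(observed)

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_res == 0 or ss_tot == 0:
        raise ValueError("metrics are undefined for perfect fits or constant data")

    rmse = math.sqrt(ss_res / n)
    sd = statistics.stdev(observed)
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")

    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "rpd": sd / rmse,          # ratio of performance to deviation
        "rpiq": (q3 - q1) / rmse,  # ratio of performance to interquartile range
    }


if __name__ == "__main__":
    # Made-up illustrative values in g/kg, not data from the study.
    obs = [0.10, 0.20, 0.30, 0.40, 0.50]
    pred = [0.12, 0.18, 0.33, 0.37, 0.50]
    for name, value in regression_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Because RPD and RPIQ are both ratios against the RMSE, they depend on the spread of the reference TN values as well as on model error, which is why they are usually reported alongside R² and RMSE rather than in place of them.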

Keywords: Agro-pastoral transitional zone; Feature wavelength selection; Machine learning; Soil TN; Vis-NIR spectroscopy.