Fusarium head blight (FHB) is a destructive disease which adversely affects the yield of wheat. The occurrence and epidemic of wheat FHB are closely related to meteorological information. Firstly, by analyzing eight meteorological factors-rainfall (RAIN), average sunshine hours (ASH), average wind speed (AWS), average temperature (AT), highest temperature (HT), lowest temperature (LT), average relative humidity (ARH), and maximum temperature difference (MTD)-specific periods closely related to wheat FHB severity are identified. Based on this, a dataset for wheat FHB severity is constructed. After that, the wheat FHB severity levels are divided into four levels, and actual field data shows that the proportion of data for the high prevalence severity level is relatively small. To address data imbalance, the K-means-synthetic minority over-sampling technique (K-means-SMOTE) method is introduced to increase samples of underrepresented severity levels. Subsequently, a wheat FHB severity prediction model based on K-means-SMOTE and extreme gradient boosting (XGBoost) is constructed. Lastly, by combining the rankings of meteorological factors provided by the model and the biological characteristics of wheat FHB, the number of meteorological factors is reduced from eight to four (AWS 4.24-4.28, RAIN 4.5-4.19, ARH 4.12-4.16, LT 4.19-4.23), the accuracy and recall of the model remained unchanged at 0.8936, the F1 score increased from 0.8851 to 0.8898, and the precision decreased from 0.9249 to 0.9058. Although the precision has slightly decreased, most of the other evaluation indicators of the model remain unchanged or have improved, therefore the model is considered effective. Finally, comparative experiments with eight other models demonstrate the superiority of this approach.
Keywords: Data imbalance; FHB severity prediction; Feature selection; K-means-SMOTE; XGBoost.
© 2025 Sun et al.