Data anomaly repair method based on fuzzy voting and multi-segment interpolation

Yanling Lv; Qingdong Han; Shulei Xue

doi:10.1038/s41598-025-05951-9


Sci Rep. 2025 Jul 1;15(1):20505. doi: 10.1038/s41598-025-05951-9.

Authors

Yanling Lv¹, Qingdong Han², Shulei Xue²

Affiliations

¹ School of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, 150080, China. yanling0828@hrbust.edu.cn.
² School of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, 150080, China.

Abstract

Wind turbines are often installed in remote areas with harsh environmental conditions, where external noise and electromagnetic interference can corrupt the collected data and degrade downstream tasks such as early warning and diagnostics. This paper therefore proposes a complete data processing workflow, covering both anomaly detection and data interpolation, for preprocessing wind farm data. First, an outlier detection method based on fuzzy voting theory is proposed; it combines multiple anomaly detectors to detect outliers accurately in large datasets. Second, a multi-segment data interpolation method based on segment recognition is introduced. This method extracts statistical features of the dataset to set dynamic thresholds that define the upper length limit of each class of missing segment. Medium-length gaps are filled with forward-backward LOESS, and large gaps are filled with hot-deck imputation based on similar-trend recognition. This approach improves interpolation quality while keeping training time cost in balance. Finally, the proposed method was validated on real-world wind farm data. Compared with LSTM and other interpolation methods, the multi-segment interpolation approach reduced MAE, MSRE, and RSE by 24%, 7.1%, and 8.2%, respectively, indicating a notable improvement in data quality. After the full data processing workflow was applied, the test-set F1 score of the DLinear model increased by 3.8–19.1% and its accuracy by 2.3–13.3% relative to the unprocessed data. These results indicate that the early warning model became more precise and stable and converged faster.
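The abstract does not say which detectors or membership functions the fuzzy voting step uses. As a minimal sketch under that caveat, the Python below (NumPy only) assumes three hypothetical detectors (a global z-score, IQR fences, and a rolling-median Hampel filter), maps each detector's raw score to a membership degree in [0, 1] with a linear ramp, and takes the weighted mean of the memberships as the fuzzy vote. The ramp limits, equal weights, and 0.5 decision threshold are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def ramp(score, lo, hi):
    """Linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    return np.clip((score - lo) / (hi - lo), 0.0, 1.0)

def zscore_membership(x, lo=2.0, hi=4.0):
    # Hypothetical detector 1: distance from the global mean in standard deviations.
    return ramp(np.abs(x - x.mean()) / x.std(), lo, hi)

def iqr_membership(x, lo=1.5, hi=3.0):
    # Hypothetical detector 2: distance outside the interquartile box, in IQR units.
    q1, q3 = np.percentile(x, [25, 75])
    return ramp(np.maximum(q1 - x, x - q3) / (q3 - q1), lo, hi)

def hampel_membership(x, window=5, lo=3.0, hi=6.0):
    # Hypothetical detector 3: deviation from a rolling median, scaled by the rolling MAD.
    scores = np.zeros(len(x))
    for i in range(len(x)):
        seg = x[max(0, i - window): i + window + 1]
        med = np.median(seg)
        mad = 1.4826 * np.median(np.abs(seg - med))
        dev = abs(x[i] - med)
        # A flat window (MAD == 0) makes any deviation maximally suspicious.
        scores[i] = dev / mad if mad > 0 else (np.inf if dev > 0 else 0.0)
    return ramp(scores, lo, hi)

def fuzzy_vote(x, weights=(1.0, 1.0, 1.0), threshold=0.5):
    """Weighted mean of detector memberships; flag points whose aggregate reaches threshold."""
    x = np.asarray(x, dtype=float)
    memberships = np.vstack([zscore_membership(x), iqr_membership(x), hampel_membership(x)])
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    score = w @ memberships  # one aggregate membership per sample
    return score >= threshold, score
```

On a clean sine wave with two injected spikes, `fuzzy_vote` flags only the spike positions. Compared with a hard majority vote, the soft memberships let a point that one detector scores strongly and another only weakly still accumulate evidence.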
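The abstract states only that gap-length thresholds come from statistical features of the data and that medium gaps are filled with forward-backward LOESS. The sketch below is one plausible reading, not the authors' method. It assumes, hypothetically, that the small/medium and medium/large limits are the lags at which the series' autocorrelation first falls below 0.9 and 0.5. It implements forward-backward LOESS as two tricube-weighted local linear fits, one on the observations before the gap and one on those after it, blended linearly across the gap. The abstract does not describe how small gaps are handled, so the sketch only labels them, and large gaps are left for the hot-deck step.

```python
import numpy as np

def find_gaps(x):
    """Return (start, end) pairs, end exclusive, for each run of consecutive NaNs."""
    padded = np.concatenate(([False], np.isnan(x), [False])).astype(int)
    edges = np.diff(padded)  # +1 where a run starts, -1 just after it ends
    return list(zip(np.flatnonzero(edges == 1).tolist(), np.flatnonzero(edges == -1).tolist()))

def acf_threshold(x, level, max_lag=200):
    """Hypothetical dynamic threshold: first lag where the NaN-aware autocorrelation drops below level."""
    for k in range(1, max_lag + 1):
        a, b = x[:-k], x[k:]
        ok = ~np.isnan(a) & ~np.isnan(b)
        if ok.sum() < 3:
            break
        if np.corrcoef(a[ok], b[ok])[0, 1] < level:
            return k
    return max_lag

def classify_gaps(x, small_level=0.9, medium_level=0.5):
    """Label each gap small / medium / large using data-driven length limits (assumed rule)."""
    t_small = acf_threshold(x, small_level)
    t_medium = acf_threshold(x, medium_level)
    labels = []
    for s, e in find_gaps(x):
        n = e - s
        kind = "small" if n <= t_small else ("medium" if n <= t_medium else "large")
        labels.append((s, e, kind))
    return labels

def local_line(t, y, anchor):
    """Tricube-weighted least-squares line y = a + b*t, with weights peaking at anchor."""
    keep = ~np.isnan(y)
    t, y = t[keep], y[keep]
    d = np.abs(t - anchor)
    w = (1.0 - (d / (d.max() + 1.0)) ** 3) ** 3
    sw = np.sqrt(w)
    design = np.column_stack([np.ones_like(t), t])
    (a, b), *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return a, b

def fb_loess_fill(x, start, end, context=10):
    """Forward fit on the left context, backward fit on the right context, blended across x[start:end]."""
    t = np.arange(len(x), dtype=float)
    lo, hi = max(0, start - context), min(len(x), end + context)
    a_f, b_f = local_line(t[lo:start], x[lo:start], anchor=start - 1)
    a_b, b_b = local_line(t[end:hi], x[end:hi], anchor=end)
    tg = t[start:end]
    alpha = (tg - (start - 1)) / (end - start + 1)  # rises from ~0 at the left edge to ~1 at the right
    return (1 - alpha) * (a_f + b_f * tg) + alpha * (a_b + b_b * tg)
```

Because both local fits are linear, `fb_loess_fill` reproduces a straight line exactly. The blend weight `alpha` trusts the forward fit near the left edge of the gap and the backward fit near the right edge, which is the intuition behind running LOESS in both directions.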
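For large gaps, the abstract names hot-deck imputation based on similar-trend recognition but gives no details. The sketch below assumes a simple version. The `context` values just before the gap form a query, and every fully observed window elsewhere in the series is compared with it after mean-centring, so that shape rather than level is matched. The values that follow the best-matching window are then copied into the gap, shifted to continue from the query's last value. The window length, Euclidean distance, and level shift are illustrative assumptions.

```python
import numpy as np

def hot_deck_fill(x, start, end, context=24):
    """Fill x[start:end] from the continuation of the most similar fully observed segment.

    Assumptions (illustrative, not from the paper): the query is the `context` values just
    before the gap, similarity is Euclidean distance between mean-centred windows, and the
    donor values are shifted so they continue from the query's last value.
    """
    x = np.asarray(x, dtype=float)
    gap_len = end - start
    if start < context:
        raise ValueError("need at least `context` observed values before the gap")
    query = x[start - context:start]
    query_shape = query - query.mean()
    best_s, best_d = None, np.inf
    for s in range(len(x) - context - gap_len + 1):
        cand = x[s:s + context + gap_len]
        if np.isnan(cand).any():
            continue  # skips windows that overlap this or any other gap
        head = cand[:context]
        d = np.linalg.norm((head - head.mean()) - query_shape)
        if d < best_d:
            best_s, best_d = s, d
    if best_s is None:
        raise ValueError("no fully observed donor segment of the required length")
    donor = x[best_s + context:best_s + context + gap_len]
    offset = query[-1] - x[best_s + context - 1]  # level alignment (assumption)
    return donor + offset
```

On a strictly periodic signal, the best donor is the same phase one or more periods away, so the filled values match the removed ones. On real wind farm data the match is only approximate, and the exhaustive scan is where most of the cost lies.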