Anomaly Detection and Correction in Dense Functional Data Within Electronic Medical Records

Stat Med. 2024 Oct 30;43(24):4768-4777. doi: 10.1002/sim.10209. Epub 2024 Sep 3.

Abstract

In medical research, the accuracy of data from electronic medical records (EMRs) is critical, particularly when analyzing dense functional data, where anomalies can severely compromise research integrity. Anomalies in EMRs often arise from human errors in measurement and data entry, and their frequency increases with data volume. Although anomaly detection methods are well established in computer science, anomaly detection in medical applications remains underdeveloped. We address this gap by introducing a novel tool for identifying and correcting anomalies specifically in dense functional EMR data. Our approach uses studentized residuals from a mean-shift model and therefore assumes that the data follow a smooth functional trajectory. The method is deliberately conservative: it targets anomalies that reflect genuine errors in the data collection process while controlling the false discovery rate and type II errors. To support widespread use, we provide a comprehensive R package so that the methods can be applied in diverse settings. Simulation studies and real-world applications confirm that the method accurately identifies and corrects errors, improving the reliability and quality of medical data analysis.
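The following is a minimal Python sketch of the general idea (smooth fit, studentized mean-shift residuals, false discovery rate control, correction), not the authors' method or their R package. Several assumptions are swapped in and should be read as such: a Whittaker smoother (second-order difference penalty on the raw observations) stands in for the penalized spline; the noise scale is estimated with the median absolute deviation (MAD) rather than a leave-one-out variance; p-values use a normal approximation; and the Benjamini-Hochberg procedure is used for FDR control. The function names `smoother_matrix` and `detect_and_correct` and the parameters `lam`, `q`, and `max_iter` are hypothetical.

```python
import math
import random
import statistics


def smoother_matrix(w, lam):
    """Hat matrix H = (W + lam * D'D)^(-1) W of a weighted Whittaker smoother.

    Assumption: this difference penalty stands in for a penalized spline.
    Setting w[i] = 0 removes point i from the fit.
    """
    n = len(w)
    # A = W + lam * D'D, with D the (n-2) x n second-difference operator.
    a = [[float(w[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        coef = {k: 1.0, k + 1: -2.0, k + 2: 1.0}
        for i, ci in coef.items():
            for j, cj in coef.items():
                a[i][j] += lam * ci * cj
    # Gauss-Jordan inversion; A is symmetric positive definite when at least
    # two points have positive weight, so no pivoting is needed.
    inv = [[float(i == j) for j in range(n)] for i in range(n)]
    for c in range(n):
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        inv[c] = [v / piv for v in inv[c]]
        for r in range(n):
            f = a[r][c]
            if r != c and f != 0.0:
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
                inv[r] = [x - f * y for x, y in zip(inv[r], inv[c])]
    return [[inv[i][j] * w[j] for j in range(n)] for i in range(n)]


def detect_and_correct(y, lam=10.0, q=0.05, max_iter=10):
    """Return (flagged indices, corrected series) for a densely sampled curve y."""
    n = len(y)
    flagged = []
    for _ in range(max_iter):
        w = [0.0 if i in flagged else 1.0 for i in range(n)]
        h = smoother_matrix(w, lam)
        fit = [sum(h[i][j] * y[j] for j in range(n)) for i in range(n)]
        d, c = [], []
        for i in range(n):
            hii = h[i][i]  # 0 for points already excluded from the fit
            # Deletion residual y_i - yhat_(-i)(x_i): the mean-shift estimate at i.
            d.append((y[i] - fit[i]) / (1.0 - hii))
            # Its standard deviation in units of sigma (independent errors).
            off = sum(h[i][j] ** 2 for j in range(n) if j != i)
            c.append(math.sqrt(1.0 + off / (1.0 - hii) ** 2))
        z = [di / ci for di, ci in zip(d, c)]
        # Robust MAD scale so anomalies do not inflate the noise estimate.
        med = statistics.median(z)
        sigma = 1.4826 * statistics.median(abs(v - med) for v in z)
        t = [v / sigma for v in z]  # studentized mean-shift statistics
        # Two-sided p-values from a normal approximation.
        p = [math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t]
        # Benjamini-Hochberg step-up procedure at level q.
        order = sorted(range(n), key=lambda i: p[i])
        k = 0
        for rank, i in enumerate(order, start=1):
            if p[i] <= rank * q / n:
                k = rank
        new_flagged = sorted(order[:k])
        if new_flagged == flagged:
            break
        flagged = new_flagged
    # Replace each flagged value with its leave-one-out prediction y_i - d_i.
    corrected = list(y)
    for i in flagged:
        corrected[i] = y[i] - d[i]
    return flagged, corrected
```

Flagged points are refitted with zero weight and the test is repeated until the flagged set stops changing. This matters because a large spike leaks into its neighbours' residuals through the smoother and can make them look anomalous on the first pass (swamping). Once the spike is excluded, those neighbours are re-evaluated against a clean fit and drop out.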

Keywords: dense functional data; electronic medical record; false discovery rate; human mistake; penalized spline; studentized residual.

MeSH terms

  • Computer Simulation*
  • Data Accuracy
  • Data Interpretation, Statistical
  • Electronic Health Records*
  • Humans
  • Models, Statistical
  • Reproducibility of Results