Enhancing Healthcare Data Integration: A Machine Learning Approach to Harmonizing Laboratory Labels

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:65-73. eCollection 2025.

Abstract

Variations in laboratory test names across healthcare systems, stemming from inconsistent terminologies, abbreviations, misspellings, and assay vendors, pose significant challenges to the integration and analysis of clinical data. These discrepancies hinder interoperability and complicate efforts to extract meaningful insights for both clinical research and patient care. In this study, we propose a machine learning-driven solution, enhanced by natural language processing techniques, to standardize lab test names. By employing feature extraction methods that analyze both string similarity and the distributional properties of test results, we improve the harmonization of test names, resulting in a more robust dataset. Our model achieves 99% accuracy in matching lab names, showcasing the potential of AI-driven approaches in resolving long-standing standardization challenges. Importantly, this method enhances the reliability and consistency of clinical data, which is crucial for ensuring accurate results in large-scale clinical studies and improving the overall efficiency of informatics-based research and diagnostics.
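To make the idea of pairing string similarity with the distributional properties of test results concrete, the following is a minimal illustrative sketch, not the model described in this study. The function names, the choice of `difflib.SequenceMatcher` as the similarity measure, the scaled mean difference as the distributional feature, and the sample values are all assumptions introduced for illustration. The abstract does not specify which features or classifier were used.

```python
from difflib import SequenceMatcher
from statistics import mean, stdev


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two lab names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def distribution_gap(x: list[float], y: list[float]) -> float:
    """Absolute difference of means, scaled by the average standard deviation.

    Assumes each list holds at least two numeric results in the same unit.
    """
    scale = (stdev(x) + stdev(y)) / 2
    diff = abs(mean(x) - mean(y))
    if scale == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / scale


def pair_features(
    name_a: str, values_a: list[float], name_b: str, values_b: list[float]
) -> dict[str, float]:
    """Hypothetical feature vector for one candidate pair of lab labels."""
    return {
        "name_similarity": string_similarity(name_a, name_b),
        "distribution_gap": distribution_gap(values_a, values_b),
    }


# Made-up example values, for illustration only.
hba1c_site_a = [5.4, 5.9, 6.3, 7.1]
hba1c_site_b = [5.5, 6.0, 6.2, 7.0]
hemoglobin = [12.1, 13.5, 14.2, 15.0]

same_test = pair_features("Hemoglobin A1c", hba1c_site_a, "HbA1c", hba1c_site_b)
different_test = pair_features("Hemoglobin A1c", hba1c_site_a, "Hemoglobin", hemoglobin)
```

In a sketch like this, the name similarity alone could mislead: "Hemoglobin A1c" shares more characters with "Hemoglobin" than with "HbA1c". The distributional feature is what separates the two candidate pairs. Feature vectors of this kind could then be passed to any supervised classifier trained on labeled matching and non-matching pairs.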