Enhancing Healthcare Data Integration: A Machine Learning Approach to Harmonizing Laboratory Labels

Mehmet F Bagci; Samantha R Spierling; Anna L Ritko; Truong Nguyen; Brian D Modena; Yusuf Ozturk

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10:2025:65-73. eCollection 2025.

Authors

Mehmet F Bagci¹,², Samantha R Spierling³, Anna L Ritko⁴, Truong Nguyen¹, Brian D Modena¹, Yusuf Ozturk²

Affiliations

¹ University of California San Diego, ECE Dept., La Jolla, CA 92093.
² San Diego State University, ECE Dept., San Diego, CA 92182.
³ Dept. of Research Development, Scripps Health, CA.
⁴ Dept. of Knowledge Management, Scripps Health, CA.

PMID: 40502251
PMCID: PMC12150698

Abstract

Variations in laboratory test names across healthcare systems, stemming from inconsistent terminologies, abbreviations, misspellings, and differences between assay vendors, pose significant challenges to the integration and analysis of clinical data. These discrepancies hinder interoperability and complicate efforts to extract meaningful insights for both clinical research and patient care. In this study, we propose a machine learning-driven solution, enhanced by natural language processing techniques, to standardize lab test names. By extracting features that capture both the string similarity of test names and the distributional properties of their test results, we improve the harmonization of test names and obtain a more robust dataset. Our model achieves 99% accuracy in matching lab names, showcasing the potential of AI-driven approaches to resolve long-standing standardization challenges. Importantly, this method enhances the reliability and consistency of clinical data, which is crucial for accurate results in large-scale clinical studies and for the overall efficiency of informatics-based research and diagnostics.
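The abstract combines two kinds of evidence: how similar two test names look, and how similar the distributions of their reported results are. It does not say which specific measures or classifier the authors use. The sketch below is therefore only an illustration under stated assumptions. It substitutes a `difflib` character ratio and a token Jaccard score for name similarity, and a two-sample Kolmogorov-Smirnov statistic plus a relative median difference for distributional similarity. The helper names (`normalize`, `ks_statistic`, `pair_features`) are hypothetical. In a full pipeline, a trained classifier would consume vectors like these to decide whether two labels refer to the same test.

```python
import re
import statistics
from difflib import SequenceMatcher


def normalize(name):
    """Hypothetical helper: lowercase a lab name and keep only alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def token_jaccard(a, b):
    """Jaccard overlap of the token sets of two normalized names."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x so ties are handled correctly.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def pair_features(name_a, values_a, name_b, values_b):
    """Assumed feature vector for one candidate pair of lab labels."""
    na, nb = normalize(name_a), normalize(name_b)
    ma, mb = statistics.median(values_a), statistics.median(values_b)
    return {
        # String-similarity features on the test names.
        "name_ratio": SequenceMatcher(None, na, nb).ratio(),
        "token_jaccard": token_jaccard(na, nb),
        # Distributional features on the reported results.
        "ks_stat": ks_statistic(values_a, values_b),
        "median_rel_diff": abs(ma - mb) / max(abs(ma), abs(mb), 1e-9),
    }


if __name__ == "__main__":
    print(pair_features("Hemoglobin A1c", [5.4, 5.8, 6.1, 7.0],
                        "HGB-A1C", [5.5, 5.9, 6.0, 7.2]))
```

In this toy pair the names overlap only partly (one shared token out of three), but the result distributions are nearly identical. That is the kind of case where distributional features can help a classifier recognize an abbreviation that string similarity alone would miss.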