A BERT base model for the analysis of Electronic Health Records from diabetic patients

Annu Int Conf IEEE Eng Med Biol Soc. 2024 Jul:2024:1-4. doi: 10.1109/EMBC53108.2024.10782488.

Abstract

The increasing availability of Electronic Health Records (EHRs) and the continuous improvement of deep learning (DL) predictive models are shifting health care from a paradigm centered on the specific knowledge of clinicians and specialists to one centered on large databases of patient data. However, using DL models with clinical data is anything but simple: it is constrained by the limited availability of labeled data and by the inherent characteristics of such data. In the field of Natural Language Processing, the BERT model achieves remarkable results thanks to pretraining on large unlabelled corpora and its capacity to analyse long sequences of data. Here we propose a BERT base model for the analysis of EHR sequences. The original BERT model has been adapted to handle different EHR modalities, and a state vector representing the patient has been introduced at the beginning of the sequence. The model has been trained on 5 years of data from more than 200,000 diabetic patients in Catalonia (Spain), using diagnosis codes, drug prescriptions, clinical variables and laboratory results. The proposed embedding model achieves a higher AUROC than the baseline models on different clinical tasks.
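As an illustration only, the input layout described above (a patient state slot at the start of the sequence, followed by events from several EHR modalities) could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the names `Event`, `MODALITIES`, `build_sequence` and the `[STATE]` placeholder are hypothetical, and the example codes are illustrative rather than taken from the study data.

```python
from dataclasses import dataclass

# Hypothetical modality labels: "state" plus the four data sources named in the abstract.
MODALITIES = ("state", "diagnosis", "drug", "clinical", "lab")


@dataclass
class Event:
    day: int       # days since the start of the observation window (assumed time unit)
    modality: str  # one of MODALITIES other than "state"
    code: str      # a diagnosis or drug code, or a binned measurement (illustrative)


def build_sequence(events, max_len=8):
    """Return parallel lists (tokens, modality_ids, days) for one patient.

    Position 0 holds a placeholder standing in for the patient state vector;
    the remaining positions hold events in chronological order, truncated so
    that the whole sequence has at most max_len entries.
    """
    ordered = sorted(events, key=lambda e: e.day)[: max_len - 1]
    tokens = ["[STATE]"] + [e.code for e in ordered]
    modality_ids = [MODALITIES.index("state")] + [
        MODALITIES.index(e.modality) for e in ordered
    ]
    days = [0] + [e.day for e in ordered]
    return tokens, modality_ids, days


# Illustrative patient with three events given out of chronological order.
example = [
    Event(day=30, modality="lab", code="HbA1c:high"),
    Event(day=0, modality="diagnosis", code="E11"),
    Event(day=12, modality="drug", code="A10BA02"),
]
tokens, modality_ids, days = build_sequence(example)
```

In a model of this kind, each of the three lists would typically be mapped to its own embedding table and the results summed per position, in the same way BERT sums token, segment and position embeddings; whether the paper combines modalities in exactly this way is not stated in the abstract.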

MeSH terms

  • Deep Learning
  • Diabetes Mellitus* / diagnosis
  • Electronic Health Records*
  • Humans
  • Natural Language Processing