Objectives: Family data are a valuable resource in bioinformatic research because family members often share genetic and environmental exposures. Collecting family data has traditionally been labor intensive, but advances in electronic health record (EHR) data mining have proven useful for identifying pedigrees linked to longitudinal health histories, known as e-pedigrees. Unfortunately, e-pedigrees tend to miss the oldest patients, who inherently have the longest and richest health histories. Obituaries are a good source of family data on older generations: their formulaic structure makes them well suited to natural language processing (NLP) that extracts relationships to the decedent. Although several studies have obtained such data from obituaries, we demonstrate for the first time approaches that tie that information to an EHR.
Methods: Natural language processing abstracted 8 166 534 family members from 567 279 obituaries published in the state of Wisconsin. After matching decedents and family members to patients in the EHR, we identified 200 033 unique patients, who were placed in 53 640 pedigrees.
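As an illustration of the final grouping step only, the sketch below places matched patients into pedigrees by treating each decedent-relative link as an edge and taking connected components with a union-find structure. This is an assumption about how such grouping could be done, not a description of the published pipeline; the function name `build_pedigrees`, the patient identifiers, and the relation labels are all hypothetical.

```python
from collections import defaultdict


def build_pedigrees(links):
    """Group patient IDs into pedigrees (connected components).

    `links` is an iterable of (decedent_id, relative_id, relation) tuples.
    The relation label is carried along but not used for grouping here.
    All identifiers are hypothetical, for illustration only.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for decedent_id, relative_id, _relation in links:
        union(decedent_id, relative_id)

    # Collect members under their root, then sort largest pedigree first.
    groups = defaultdict(set)
    for patient_id in parent:
        groups[find(patient_id)].add(patient_id)
    return sorted(
        (sorted(members) for members in groups.values()),
        key=lambda members: (-len(members), members),
    )


# Hypothetical EHR-matched links extracted from three obituaries.
links = [
    ("P001", "P002", "son"),
    ("P001", "P003", "daughter"),
    ("P004", "P003", "spouse"),
    ("P010", "P011", "brother"),
]
pedigrees = build_pedigrees(links)
print(pedigrees)
```

In this toy input, the obituaries for P001 and P004 both mention P003, so their relatives merge into one pedigree of four, while P010 and P011 form a separate pedigree of two.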
Results: The largest pedigree consisted of 21 individuals. Heritability of adult height was quantified (H2=0.51±0.04, P<1.00e-07), demonstrating the utility of these data for genetic research. The heritability data, coupled with overlapping data in a biobank, suggested that 80%-90% of familial relationships were accurately defined.
Conclusion: Taken together, these findings demonstrate that obituaries, which capture the oldest people in society, can be highly informative for bioinformatic research.
Availability and implementation: Code is available on GitHub at https://github.com/jgmayer672/ObituaryNLP.
Keywords: e-pedigree; electronic health records; genetics; natural language processing; pedigree.
© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.