Training a model for estimating leukocyte composition using whole-blood DNA methylation and cell counts as reference

Epigenomics. 2017 Jan;9(1):13-20. doi: 10.2217/epi-2016-0091. Epub 2016 Nov 25.

Abstract

Aim: Whole-blood DNA methylation depends on the underlying leukocyte composition, and the resulting confounding is a major concern in epigenome-wide association studies. Cell counts are often missing, and measuring them may not be feasible. Computational approaches estimate leukocyte composition from DNA methylation using reference datasets of purified leukocytes. We explored whether such a model can be trained on whole-blood DNA methylation and cell counts, without the need for purification.

Materials & methods: We used whole-blood DNA methylation and corresponding five-part cell counts from 2445 participants in the London Life Sciences Prospective Population Study. A model was trained on a subset of 175 subjects and evaluated on the remaining participants.
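The train-and-evaluate design described above can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not state which model was fitted, so ridge regression is used here as a stand-in, and the data are synthetic. The function names (`simulate_cohort`, `fit_ridge`, `split_and_evaluate`), the number of CpGs, the noise level and the penalty `lam` are all assumptions made for the example; only the cohort size (2445), the training subset size (175) and the five cell types come from the abstract.

```python
import numpy as np

def simulate_cohort(n_subjects, n_cpgs, n_cell_types, seed=0):
    """Synthetic stand-in for real data: whole-blood methylation is modelled
    as a proportion-weighted mixture of hypothetical cell-type profiles."""
    rng = np.random.default_rng(seed)
    props = rng.dirichlet(np.full(n_cell_types, 2.0), size=n_subjects)
    profiles = rng.uniform(0.05, 0.95, size=(n_cell_types, n_cpgs))
    beta = props @ profiles + rng.normal(0.0, 0.02, size=(n_subjects, n_cpgs))
    return np.clip(beta, 0.0, 1.0), props

def fit_ridge(X, Y, lam=0.1):
    """Ridge regression of cell proportions (Y) on methylation values (X).
    Centering both sides gives an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def split_and_evaluate(beta, counts, n_train, seed=0):
    """Train on a random subset of n_train subjects, then report the Pearson
    correlation between measured and estimated proportions on the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(beta.shape[0])
    train, test = idx[:n_train], idx[n_train:]
    W, b = fit_ridge(beta[train], counts[train])
    estimated = beta[test] @ W + b
    corr = np.array([
        np.corrcoef(counts[test][:, k], estimated[:, k])[0, 1]
        for k in range(counts.shape[1])
    ])
    return corr, len(test)

# Cohort and training-set sizes follow the abstract; everything else is assumed.
beta, props = simulate_cohort(n_subjects=2445, n_cpgs=50, n_cell_types=5)
correlations, n_test = split_and_evaluate(beta, props, n_train=175)
print(n_test, np.round(correlations, 2))
```

Because the synthetic data are generated from a clean linear mixture, the correlations printed here say nothing about real-world performance; the sketch only shows the shape of the workflow (small training subset, held-out evaluation, one correlation per cell type).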

Results: Correlations between cell counts and estimated cell proportions were high for neutrophils (0.85), eosinophils (0.88) and lymphocytes (0.84), moderate for monocytes (0.55) and near zero for basophils (0.02). Estimated proportions explained more variance in whole-blood DNA methylation levels than cell counts did.
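One common way to quantify "variance explained" is the R² of a least-squares fit of a CpG's methylation level on a set of covariates. The abstract does not say how the comparison was computed, so the sketch below is an assumption rather than the study's procedure. The helper `variance_explained`, the fixed cell-type weights and the simulated measurement error on the counts are all hypothetical and chosen only to make the example self-contained.

```python
import numpy as np

def variance_explained(y, covariates):
    """R^2 of an ordinary least-squares fit of y on covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # tolerates collinear columns
    residual = y - X @ coef
    centered = y - y.mean()
    return 1.0 - (residual @ residual) / (centered @ centered)

rng = np.random.default_rng(1)

# Hypothetical cell proportions for 500 subjects and five cell types.
proportions = rng.dirichlet(np.full(5, 2.0), size=500)

# Hypothetical CpG whose methylation level differs by cell type.
cell_type_levels = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
cpg = proportions @ cell_type_levels + rng.normal(0.0, 0.02, size=500)

# Assumed for illustration: a second set of covariates carrying extra
# measurement error relative to the first.
noisy_covariates = proportions + rng.normal(0.0, 0.05, size=proportions.shape)

r2_clean = variance_explained(cpg, proportions)
r2_noisy = variance_explained(cpg, noisy_covariates)
print(round(r2_clean, 3), round(r2_noisy, 3))
```

In a real analysis the two covariate sets would be the measured counts and the methylation-derived estimates, and R² would be computed per CpG across the array and then compared.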

Conclusion: Our model provided precise estimates for the common cell types.

Keywords: DNA methylation; Infinium 450K; KAROLA; LOLIPOP; estimation of cell proportions; leukocyte composition; white-blood cell distribution.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Biomarkers / blood
  • Coronary Disease / blood
  • DNA Methylation*
  • Female
  • Humans
  • Leukocyte Count / methods
  • Leukocyte Count / standards
  • Leukocytes / classification*
  • Leukocytes / metabolism
  • Male
  • Middle Aged
  • Reference Standards

Substances

  • Biomarkers