Statistical Methods for Chemical Mixtures: A Roadmap for Practitioners Using Simulation Studies and a Sample Data Analysis in the PROTECT Cohort

Environ Health Perspect. 2025 Jun;133(6):67019. doi: 10.1289/EHP15305. Epub 2025 Jun 19.

Abstract

Background: Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in current environmental and epidemiological studies. With many existing statistical methods and emerging approaches, it is important for practitioners to understand which method is best suited for their inferential goals.

Objective: The goal of this paper is to provide empirical, simulation-based evidence on the performance of mixture methods, to help researchers select the best available methods for three scientific questions in mixtures analysis: identifying important components of a mixture, identifying interactions among mixture components, and creating a summary score for risk stratification and prediction.

Methods: We conducted a review and comparison of 11 analytical methods available for use in mixtures research through extensive simulation studies for continuous and binary outcomes. In addition, we carried out an illustrative data analysis using the PROTECT birth cohort from Puerto Rico to examine the associations between exposure to chemical mixtures (metals, polycyclic aromatic hydrocarbons [PAHs], phthalates, and phenols) and birth outcomes.

Results: Our simulation results suggest that the choice of method depends on the goal of the analysis and that no single method performs best across the board. For selecting important toxicants in a mixture and for identifying interactions, Elastic net (Enet) by Zou et al., Lasso for Hierarchical Interactions (HierNet) by Bien et al., and selection of nonlinear interactions by a forward stepwise algorithm (SNIF) by Narisetty et al. have the most stable performance across simulation settings. For an overall summary or cumulative measure, we find that using the Super Learner to combine multiple environmental risk scores can lead to improved risk stratification and prediction properties.
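
To make the idea of selecting mixture components with an elastic net concrete, the following is a minimal pure-Python sketch using coordinate descent on simulated data. It is not the CompMix package or the authors' simulation design. The data-generating model, the number of exposures, the correlation structure, and the penalty settings (`lam`, `alpha`) are all assumptions chosen for illustration only. In practice the penalty would usually be tuned by cross-validation.

```python
import math
import random


def standardize(columns):
    """Center each column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / sd for v in col])
    return out


def soft_threshold(z, t):
    """Soft-thresholding operator that produces exact zeros."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(columns, y, lam, alpha, n_iter=200, tol=1e-8):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    `columns` is a list of predictor columns; y is assumed centered."""
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb for the starting value b = 0
    col_ss = [sum(v * v for v in col) / n for col in columns]
    for _ in range(n_iter):
        max_change = 0.0
        for j, col in enumerate(columns):
            # Correlation of x_j with the partial residual (x_j's own fit added back)
            rho = sum(x * r for x, r in zip(col, resid)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical simulated mixture: 6 correlated exposures, 2 truly associated.
rng = random.Random(0)
n, p = 200, 6
true_beta = [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]  # assumed effects, illustration only
rows = []
for _ in range(n):
    shared = rng.gauss(0, 1)  # shared source induces correlation of about 0.3
    rows.append([math.sqrt(0.3) * shared + math.sqrt(0.7) * rng.gauss(0, 1)
                 for _ in range(p)])
columns = standardize([[row[j] for row in rows] for j in range(p)])
y_raw = [sum(b * columns[j][i] for j, b in enumerate(true_beta)) + rng.gauss(0, 0.5)
         for i in range(n)]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

coefs = elastic_net(columns, y, lam=0.2, alpha=0.5)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("coefficients:", [round(b, 3) for b in coefs])
print("selected exposure indices:", selected)
```

Exposures with a nonzero coefficient are treated as selected. The estimates are shrunk toward zero by the penalty, so they are useful for screening components rather than as unbiased effect sizes.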

Conclusions: We develop an integrated R package, "CompMix," that provides a platform where practitioners can implement a pipeline combining several approaches to mixtures analysis. Our study offers guidelines for selecting appropriate statistical methods to address specific scientific questions in mixtures research. We also identify critical gaps where new and better methods are needed. https://doi.org/10.1289/EHP15305.

MeSH terms

  • Cohort Studies
  • Complex Mixtures*
  • Computer Simulation
  • Environmental Exposure* / statistics & numerical data
  • Environmental Pollutants* / toxicity
  • Female
  • Humans
  • Polycyclic Aromatic Hydrocarbons
  • Puerto Rico / epidemiology

Substances

  • Environmental Pollutants
  • Polycyclic Aromatic Hydrocarbons
  • Complex Mixtures