Two-stage multiple imputation with a longitudinal composite variable

BMC Med Res Methodol. 2025 May 6;25(1):124. doi: 10.1186/s12874-025-02555-9.

Abstract

Background: Missing data are common in longitudinal studies, and multiple imputation (MI) is widely used to handle them. However, most MI methods treat all types of missing data as missing at random (MAR) during imputation. Two-stage MI is a flexible method that handles two types of missing data in two sequential steps, allowing researchers to make different assumptions about the mechanism underlying each type. Despite its considerable potential, the method has seen limited application and extension in the field.
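To make the two-step process concrete, here is a minimal sketch of nested (two-stage) MI for a single outcome `y` with one covariate `x`. It rests on several assumptions that are not taken from the study. The function names `draw_regression_params` and `two_stage_mi`, the masks `mnar_mask` and `mar_mask`, and the shift parameter `delta` are all hypothetical. Stage 1 fills the values flagged as non-ignorable M times from a Bayesian linear regression and adds `delta` to each draw, a simple delta-adjustment way to encode an MNAR assumption (`delta = 0` reduces to MAR). Stage 2 then fills the MAR-flagged values N times within each of those M nests, conditioning on the stage-1 completed data. The result is M × N completed datasets.

```python
import numpy as np


def draw_regression_params(X, y, rng):
    """Draw (beta, sigma) from the posterior of y = X @ beta + e under the
    noninformative prior p(beta, sigma^2) proportional to 1 / sigma^2."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = (resid @ resid) / rng.chisquare(n - p)  # sigma^2 = RSS / chi2_{n-p}
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def two_stage_mi(y, x, mnar_mask, mar_mask, M=5, N=3, delta=0.0, seed=0):
    """Hypothetical two-stage MI: returns M lists, each of N completed copies of y."""
    rng = np.random.default_rng(seed)
    design = np.column_stack([np.ones_like(x), x])   # intercept + covariate
    observed = ~(mnar_mask | mar_mask)
    k1, k2 = int(mnar_mask.sum()), int(mar_mask.sum())
    nests = []
    for _ in range(M):
        # Stage 1: fit on fully observed rows, impute MNAR-flagged values, shift by delta
        beta, sigma = draw_regression_params(design[observed], y[observed], rng)
        y1 = y.copy()
        y1[mnar_mask] = design[mnar_mask] @ beta + rng.normal(0.0, sigma, k1) + delta
        # Stage 2: refit on every row not flagged MAR (observed + stage-1 values)
        fit = ~mar_mask
        inner = []
        for _ in range(N):
            beta_b, sigma_b = draw_regression_params(design[fit], y1[fit], rng)
            y2 = y1.copy()
            y2[mar_mask] = design[mar_mask] @ beta_b + rng.normal(0.0, sigma_b, k2)
            inner.append(y2)
        nests.append(inner)
    return nests


# Usage on simulated data (all values hypothetical)
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
mnar_mask = np.zeros(n, dtype=bool)
mnar_mask[:20] = True     # e.g. missingness assumed non-ignorable
mar_mask = np.zeros(n, dtype=bool)
mar_mask[20:50] = True    # e.g. missingness assumed MAR
y_obs = np.where(mnar_mask | mar_mask, np.nan, y)
nests = two_stage_mi(y_obs, x, mnar_mask, mar_mask, M=5, N=3, delta=-0.5)
means = np.array([[d.mean() for d in inner] for inner in nests])  # 5 x 3 grid
```

Rerunning with several values of `delta` is one way to carry out a sensitivity analysis over ignorability assumptions. The real setting involves many variables imputed by FCS rather than a single regression.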

Methods: We evaluated the performance of two-stage MI in a novel context: imputing a longitudinal composite variable constructed from several continuous and binary components, with missing data arising under both MAR and missing not at random (MNAR) mechanisms. We also compared three fully conditional specification (FCS) methods within the two-stage MI framework. Simulation studies were conducted using a longitudinal dataset that mimicked a cohort study, and sensitivity analyses were performed under various ignorability assumptions.

Results: In the simulation studies, imputation models within two-stage MI that made appropriate ignorability assumptions exhibited the smallest bias and achieved optimal coverage probabilities for the means, for the slopes across time points, and for the hazard ratios for mortality associated with the composite variable. FCS methods that incorporated longitudinal information performed best in most scenarios.
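Quantities such as means, slopes, or log hazard ratios are estimated in each of the M × N completed datasets and then pooled. The following is a minimal sketch assuming the nested-MI combining rules usually cited for two-stage MI (Shen, 2000). Let Q̄ be the grand mean of the estimates and Ū the mean of their variances. Let B be the between-nest variance and W the within-nest variance. The total variance is then T = Ū + (1 + 1/M)B + (1 − 1/N)W. The function name `pool_nested` is hypothetical.

```python
import numpy as np


def pool_nested(estimates, variances):
    """Pool an M x N grid of estimates (rows = stage-1 nests) and their variances."""
    Q = np.asarray(estimates, dtype=float)
    U = np.asarray(variances, dtype=float)
    M, N = Q.shape
    q_bar = Q.mean()                   # grand mean over all M*N datasets
    u_bar = U.mean()                   # average within-imputation variance
    nest_means = Q.mean(axis=1)        # mean of each stage-1 nest
    B = ((nest_means - q_bar) ** 2).sum() / (M - 1)                # between nests
    W = ((Q - nest_means[:, None]) ** 2).sum() / (M * (N - 1))     # within nests
    T = u_bar + (1 + 1 / M) * B + (1 - 1 / N) * W
    # Degrees of freedom for a t reference distribution
    b_term = (1 + 1 / M) * B / T
    w_term = (1 - 1 / N) * W / T
    df = 1.0 / (b_term ** 2 / (M - 1) + w_term ** 2 / (M * (N - 1)))
    return q_bar, T, df
```

For hazard ratios, the usual practice is to pool the log hazard ratio and its variance and to exponentiate the pooled estimate and interval limits afterwards.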

Conclusions: In the context of a longitudinal composite variable with missing values due to various missing mechanisms, the selection of imputation methods and ignorability assumptions plays an important role within the two-stage MI framework.

Keywords: Composite variable; Missing data; Missing not at random; Multiple imputation.

MeSH terms

  • Algorithms
  • Bias
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Longitudinal Studies
  • Models, Statistical*