Speakers

Peng Ding, University of California, Berkeley


Biography: Peng Ding (Chinese: 丁鹏) is an associate professor in the Department of Statistics, UC Berkeley. He obtained his Ph.D. from the Department of Statistics, Harvard University, in May 2015, and worked as a postdoctoral researcher in the Department of Epidemiology, Harvard T. H. Chan School of Public Health, until December 2015. Previously, he received his B.S. (Mathematics), B.A. (Economics), and M.S. (Statistics) from Peking University.

His main research interest is causal inference, including:

  1. Design and analysis of randomized experiments: randomization tests, covariate adjustment, rerandomization, clustered experiments
  2. Observational studies: sensitivity analysis for unmeasured confounding, overlap of covariates, integrating multiple data sources
  3. Natural experiments: instrumental variables, difference-in-differences, regression discontinuity
  4. Causal mechanisms: treatment effect variation, principal stratification, mediation, interference


Title: To adjust or not to adjust? Estimating the average treatment effect in randomized experiments with missing covariates

Abstract: Complete randomization allows for consistent estimation of the average treatment effect based on the difference in means of the outcomes without strong modeling assumptions on the outcome-generating process. Appropriate use of the pretreatment covariates can further improve the estimation efficiency. However, missingness in covariates is common in experiments and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates and improves the efficiency of the difference in means if at least one completely observed covariate is predictive of the outcome. What, then, is the additional gain of adjusting for covariates subject to missingness? A key insight is that the missingness indicators act as fully observed pretreatment covariates as long as missingness is not affected by the treatment, and can thus be used in covariate adjustment to bring additional estimation efficiency. This motivates adding the missingness indicators to the regression adjustment, which yields the missingness-indicator method, a well-known but not-so-popular strategy in the missing-data literature. We recommend it for its many advantages. First, it removes the dependence of the regression-adjusted estimators on the imputed values for the missing covariates. Second, it improves on the estimation efficiency of both the complete-covariate analysis and the regression analysis based only on the imputed covariates. Third, it does not require modeling the missingness mechanism and yields a consistent and efficient estimator even if the missing-data mechanism is related to the missing covariates and unobservable potential outcomes. Lastly, it is easy to implement via standard software packages for least squares.
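The missingness-indicator method can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a completely randomized experiment with one covariate whose missing entries are coded as `NaN`, and it uses a fully interacted OLS fit with centered covariates (treatment, covariates, and treatment-by-covariate interactions) as the regression adjustment, a specification the abstract does not pin down. The helper names `interacted_ols_effect` and `missingness_indicator_effect` and all simulated numbers are hypothetical. The sketch also checks the first advantage listed: once the indicators are included, the estimate does not depend on the imputed constant, whereas imputation alone does.

```python
import numpy as np


def interacted_ols_effect(y, z, covariates):
    """Coefficient on z from OLS of y on 1, z, centered covariates, and z x covariates.

    Assumed adjustment: with centered covariates, the z coefficient is the ATE estimate.
    """
    xc = covariates - covariates.mean(axis=0)  # center each covariate column
    design = np.column_stack([np.ones(len(y)), z, xc, z[:, None] * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def missingness_indicator_effect(y, z, x, fill_value=0.0):
    """Hypothetical helper: fill NaNs in x with a constant and add missingness indicators."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    miss = np.isnan(x)
    x_imp = np.where(miss, fill_value, x)
    # Keep an indicator only for covariates that actually have missing values.
    indicators = miss[:, miss.any(axis=0)].astype(float)
    return interacted_ols_effect(y, z, np.column_stack([x_imp, indicators]))


# Hypothetical completely randomized experiment; all numbers are illustrative.
rng = np.random.default_rng(0)
n = 500
x_full = rng.normal(size=n)
y0 = 2.0 * x_full + rng.normal(size=n)
y1 = y0 + 1.0
z = rng.permutation(np.repeat([1, 0], n // 2))  # n/2 treated, n/2 control
y = np.where(z == 1, y1, y0)

# Missingness depends on the covariate itself but not on treatment assignment.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(x_full - 1.0)))
x_obs = np.where(missing, np.nan, x_full)

diff_in_means = y[z == 1].mean() - y[z == 0].mean()
mim_fill0 = missingness_indicator_effect(y, z, x_obs, fill_value=0.0)
mim_fill5 = missingness_indicator_effect(y, z, x_obs, fill_value=5.0)
# Imputation alone (no indicators): the estimate changes with the fill value.
imp_only0 = interacted_ols_effect(y, z, np.where(np.isnan(x_obs), 0.0, x_obs)[:, None])
imp_only5 = interacted_ols_effect(y, z, np.where(np.isnan(x_obs), 5.0, x_obs)[:, None])

print(f"difference in means:          {diff_in_means:.3f}")
print(f"indicator method, fill 0 / 5: {mim_fill0:.3f} / {mim_fill5:.3f}")
print(f"imputation only, fill 0 / 5:  {imp_only0:.3f} / {imp_only5:.3f}")
```

The invariance follows from linear algebra: changing the fill value from 0 to a constant c adds c times the indicator column to the imputed column, so the regression spans the same columns and the coefficient on the treatment is unchanged.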
We also propose modifications to the missingness-indicator method based on asymptotic and finite-sample considerations. To reconcile the conflicting recommendations in the missing-data literature, we analyze and compare various strategies for analyzing randomized experiments with missing covariates under the design-based framework. This framework treats randomization as the basis for inference and does not impose any modeling assumptions on the outcome-generating process and missing-data mechanism.
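In the design-based framework, covariates, potential outcomes, and missingness are held fixed and only the treatment assignment is random. A small simulation in that spirit can compare the strategies. Everything here is an illustrative assumption: the finite population is simulated, `w` is a hypothetical fully observed covariate, `x` is a hypothetical covariate whose missingness depends on its own value but not on treatment, and the adjustment is an assumed fully interacted OLS fit with centered covariates. The sketch compares the difference in means, the complete-covariate analysis, and the missingness-indicator method across re-randomizations; it does not implement the modifications mentioned in the abstract.

```python
import numpy as np


def adjusted_effect(y, z, covariates):
    """Coefficient on z from OLS of y on 1, z, centered covariates, and z x covariates."""
    xc = covariates - covariates.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), z, xc, z[:, None] * xc])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]


# Hypothetical finite population: covariates, potential outcomes, and missingness are FIXED.
rng = np.random.default_rng(1)
n = 200
w = rng.normal(size=n)  # fully observed covariate
x = rng.normal(size=n)  # covariate subject to missingness
y0 = 2.0 * w + 3.0 * x + rng.normal(scale=0.5, size=n)
y1 = y0 + 1.0 + 0.5 * x  # heterogeneous treatment effect
tau = np.mean(y1 - y0)  # finite-population average treatment effect

# Missingness depends on the (partly unobserved) covariate, never on treatment.
miss = (rng.random(n) < 1.0 / (1.0 + np.exp(-(2.0 * x - 1.0)))).astype(float)
x_imp = np.where(miss == 1.0, 0.0, x)  # any constant works once indicators are included

covariate_sets = {
    "difference in means": None,
    "complete covariates": w[:, None],
    "missingness indicators": np.column_stack([w, x_imp, miss]),
}

# Design-based evaluation: only the treatment assignment is re-randomized.
z_template = np.repeat([1, 0], n // 2)
reps = 1000
estimates = {name: np.empty(reps) for name in covariate_sets}
for r in range(reps):
    z = rng.permutation(z_template)
    y = np.where(z == 1, y1, y0)
    for name, cov in covariate_sets.items():
        if cov is None:
            estimates[name][r] = y[z == 1].mean() - y[z == 0].mean()
        else:
            estimates[name][r] = adjusted_effect(y, z, cov)

for name, est in estimates.items():
    print(f"{name:24s} bias {est.mean() - tau:+.4f}  variance {est.var():.4f}")
```

With these hypothetical settings, all three estimators should show negligible bias across re-randomizations, and the variance should fall from the difference in means to the complete-covariate analysis to the missingness-indicator method; the exact numbers depend on the seed and the simulated population.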


Keywords: efficiency; imputation; missingness pattern; randomized controlled trial