Review Articles

Nearest neighbour imputation under single index models

Jun Shao

School of Statistics, East China Normal University, Shanghai, People's Republic of China; Department of Statistics, University of Wisconsin, Madison, WI, USA

Lei Wang

School of Statistics and Data Science & LPMC, Nankai University, Tianjin, People's Republic of China

lwangstat@nankai.edu.cn

Pages 208-212 | Received 22 Jul. 2021, Accepted 22 Jul. 2021, Published online: 22 Jul. 2021

ABSTRACT

A popular imputation method used to compensate for item nonresponse in sample surveys is the nearest neighbour imputation (NNI) method, which uses a covariate to define neighbours. When the covariate is multivariate, however, NNI suffers from the well-known curse of dimensionality and gives unstable results. As a remedy, we propose a single-index NNI for the case where the conditional mean of the response given the covariates follows a single index model. For estimating the population mean or quantiles, we establish the consistency and asymptotic normality of the single-index NNI estimators. Some limited simulation results are presented to examine the finite-sample performance of the proposed estimator of the population mean.
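
The idea of single-index NNI can be illustrated with a minimal sketch. It is not the authors' estimator: the abstract does not say how the index direction is estimated, so the sketch assumes a least-squares fit on the respondents, and the function name `single_index_nni_mean` and its arguments are hypothetical. Each nonrespondent is matched to the respondent with the closest value of the scalar index, so neighbours are defined in one dimension rather than in the full covariate space.

```python
import numpy as np


def single_index_nni_mean(x, y, observed):
    """Impute missing y by nearest neighbour on a scalar index, then average.

    x        : (n, p) covariates, fully observed
    y        : (n,) responses; entries where `observed` is False are ignored
    observed : (n,) boolean mask, True for respondents
    Returns (estimated mean, completed response vector).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # ASSUMPTION: estimate the index direction by least squares on respondents.
    # The source abstract does not specify the estimator of the index.
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    beta = coef[1:]  # drop the intercept; only the direction matters

    # Reduce the multivariate covariate to a single index value per unit.
    index = x @ beta
    donor_index = index[observed]
    donor_y = y[observed]

    # For each nonrespondent, borrow y from the respondent nearest in index.
    y_completed = y.copy()
    for i in np.flatnonzero(~observed):
        nearest = np.argmin(np.abs(donor_index - index[i]))
        y_completed[i] = donor_y[nearest]

    return y_completed.mean(), y_completed
```

The sketch covers only the population mean. A quantile estimate could be read off the same completed vector, but the asymptotic results stated in the abstract are not reproduced here.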
