Review Articles

Domain estimation under informative linkage

Ray Chambers,

School of Mathematics and Applied Statistics, National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, Australia

ray@uow.edu.au

Nicola Salvati,

Dipartimento di Economia e Management, Universita di Pisa, Pisa, Italy

Enrico Fabrizi,

Dipartimento di Scienze Economiche e Sociali, Universita Cattolica del S. Cuore, Milan, Italy

Andrea Diniz da Silva

Instituto Brasileiro de Geografia e Estatística & Escola Nacional de Ciências Estatísticas – ENCE, Rio de Janeiro, Brazil

Pages 90-102 | Received 01 Dec. 2018, Accepted 05 Aug. 2019, Published online: 15 Aug. 2019

ABSTRACT

A standard assumption when modelling linked sample data is that the stochastic properties of the linking process and of the process underpinning the population values of the response variable are independent of one another. This is often referred to as non-informative linkage. But what if linkage errors are informative? In this paper, we report results from two simulation experiments, each exploring a potential informative linking scenario. In the first, the choice of sample record to link depends on the response; in the second, the probability of correct linkage depends on the response. We focus on the important and widely applicable problem of estimating domain means from linked data. We provide empirical evidence that standard domain estimation methods can be substantially biased in the presence of informative linkage errors, whereas an alternative estimation method, based on a Gaussian approximation to a maximum likelihood estimator that allows for non-informative linkage error, performs well.
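To make the second scenario concrete, the following is a minimal toy sketch, not the simulation design used in the paper. It assumes three domains with normally distributed responses, a logistic form for the probability of correct linkage that decreases with the response, and incorrect links that pick up the response of a randomly chosen record. The function name `simulate_naive_domain_means` and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def simulate_naive_domain_means(domain_means=(0.0, 2.0, 4.0),
                                n_per_domain=2000, seed=1):
    """Toy simulation: return (true_means, naive_means), one entry per domain."""
    rng = random.Random(seed)

    # Population: response = domain mean + standard normal noise (assumed).
    domain = []
    y = []
    for d, mu in enumerate(domain_means):
        for _ in range(n_per_domain):
            domain.append(d)
            y.append(rng.gauss(mu, 1.0))
    n = len(y)
    centre = sum(y) / n

    # Informative linkage (assumed logistic form): the larger the response,
    # the lower the probability that the record is linked correctly.
    y_linked = []
    for i in range(n):
        p_correct = 1.0 / (1.0 + math.exp(y[i] - centre))
        if rng.random() < p_correct:
            y_linked.append(y[i])
        else:
            # Incorrect link: take the response of a randomly chosen record,
            # which may belong to a different domain.
            y_linked.append(y[rng.randrange(n)])

    # Naive domain estimate: average the linked responses as if all links
    # were correct, and compare with the average of the true responses.
    true_means = []
    naive_means = []
    for d in range(len(domain_means)):
        idx = [i for i in range(n) if domain[i] == d]
        true_means.append(sum(y[i] for i in idx) / len(idx))
        naive_means.append(sum(y_linked[i] for i in idx) / len(idx))
    return true_means, naive_means


if __name__ == "__main__":
    true_means, naive_means = simulate_naive_domain_means()
    for d, (t, m) in enumerate(zip(true_means, naive_means)):
        print(f"domain {d}: true mean {t:.2f}, naive linked mean {m:.2f}")
```

In this toy setup the naive mean for the highest-mean domain falls well below its true mean, because large responses are the ones most often replaced by values from elsewhere in the population. The sketch only illustrates the mechanism; it says nothing about the size of the bias in the paper's experiments or about the alternative estimator.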
