Review Articles

Combining multiple imperfect data sources for small area estimation: a Bayesian model of provincial fertility rates in Cambodia

Junni L. Zhang

Guanghua School of Management, Center for Statistical Science and Center for Data Science, Peking University, Beijing, People's Republic of China

John Bryant

Bayesian Demography Limited, Christchurch, New Zealand

Pages 178–185 | Received 24 Dec. 2018, Accepted 17 Aug. 2019, Published online: 25 Aug. 2019

Demographic estimation becomes a problem of small area estimation when detailed disaggregation leads to small cell counts. The usual difficulties of small area estimation are compounded when the available data sources contain measurement errors. We present a Bayesian approach to the problem of small area estimation with imperfect data sources. The overall model contains separate submodels for underlying demographic processes and for measurement processes. All unknown quantities in the model, including coverage ratios and demographic rates, are estimated jointly via Markov chain Monte Carlo methods. The approach is illustrated using the example of provincial fertility rates in Cambodia.
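The abstract describes the model as a demographic submodel for the true rates plus a measurement submodel that links each data source to those rates through a coverage ratio, with all unknowns estimated jointly by MCMC. The following is a deliberately simplified, hypothetical sketch of that structure. It is not the authors' model and uses only assumptions chosen for illustration. There are two sources reporting Poisson birth counts $y_{k,i} \sim \text{Poisson}(\alpha_k \gamma_i E_i)$, where $\gamma_i$ is the fertility rate and $E_i$ the exposure in province $i$. The log rates are pooled as $\log \gamma_i \sim N(\mu, 0.5^2)$. Source 0 is assumed to have full coverage ($\alpha_0 = 1$) and $\log \alpha_1 \sim N(0, 0.2^2)$. Sampling uses a plain random-walk Metropolis algorithm. The function names, priors, step size, and data values are all hypothetical.

```python
import math
import random


def log_poisson(y, lam):
    """Poisson log-likelihood of count y with mean lam, dropping the constant log(y!)."""
    return y * math.log(lam) - lam


def log_posterior(theta, data, exposure, tau=0.5):
    """Unnormalised log posterior for a hypothetical two-source fertility model.

    theta = [log_gamma_1, ..., log_gamma_n, log_alpha_1, mu]
    data[k][i] = births reported by source k (exactly two sources) in province i.
    Source 0 is assumed to have complete coverage (alpha_0 = 1).
    """
    n = len(exposure)
    log_gamma, log_alpha1, mu = theta[:n], theta[n], theta[n + 1]
    # Demographic submodel: provincial log rates are pooled toward a common mean mu.
    lp = sum(-0.5 * ((lg - mu) / tau) ** 2 for lg in log_gamma)
    lp += -0.5 * (mu / 10.0) ** 2          # vague prior on mu
    lp += -0.5 * (log_alpha1 / 0.2) ** 2   # assumed prior: coverage of source 1 is near 1
    # Measurement submodel: source k reports alpha_k * gamma_i * exposure_i births on average.
    alphas = (1.0, math.exp(log_alpha1))
    for k, counts in enumerate(data):
        for i, y in enumerate(counts):
            lp += log_poisson(y, alphas[k] * math.exp(log_gamma[i]) * exposure[i])
    return lp


def sample(data, exposure, n_iter=3000, step=0.1, seed=1):
    """Metropolis-within-Gibbs: random-walk update of one log-scale parameter at a time."""
    rng = random.Random(seed)
    n = len(exposure)
    # Start rates at the reference source's crude rates, coverage at 1, mu at their mean.
    theta = [math.log(max(data[0][i], 1) / exposure[i]) for i in range(n)]
    mu0 = sum(theta) / n
    theta += [0.0, mu0]
    current = log_posterior(theta, data, exposure)
    draws = []
    for _ in range(n_iter):
        for j in range(len(theta)):
            old = theta[j]
            theta[j] = old + rng.gauss(0.0, step)
            proposed = log_posterior(theta, data, exposure)
            # Accept with probability min(1, exp(proposed - current)); 1 - random() avoids log(0).
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed
            else:
                theta[j] = old
        draws.append({"rates": [math.exp(g) for g in theta[:n]],
                      "coverage": math.exp(theta[n])})
    return draws


# Hypothetical data: five provinces, a reference source, and a source with roughly 80% coverage.
exposure = [1000, 2000, 1500, 800, 1200]
reference = [100, 210, 140, 85, 118]
survey = [80, 168, 112, 68, 94]
draws = sample([reference, survey], exposure)[1000:]  # drop the first 1000 draws as burn-in
mean_coverage = sum(d["coverage"] for d in draws) / len(draws)
print(f"posterior mean coverage of source 1: {mean_coverage:.2f}")
```

Fixing the coverage of one source, or giving it a tight prior, is what lets the sketch separate coverage from fertility. The likelihood only sees the product $\alpha_k \gamma_i$, so without such an anchor the two would be confounded. The pooling prior on $\log \gamma_i$ is what makes this a small area estimator: provinces with few births borrow strength from the common mean $\mu$.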


  1. Asian Development Bank. (2014). Cambodia country poverty analysis 2014. Manila: Asian Development Bank. [Google Scholar]
  2. Bijak, J., & Bryant, J. (2016). Bayesian demography 250 years after Bayes. Population Studies70(1), 1–19. doi: 10.1080/00324728.2015.1122826 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  3. Brick, J. M., Cervantes, I. F., Lee, S., & Norman, G. (2011). Nonsampling errors in dual frame telephone surveys. Survey Methodology37, 1–12. [Web of Science ®], [Google Scholar]
  4. Bryant, J., & Zhang, J. L. (2018). Bayesian demographic estimation and forecasting. Boca Raton: CRC Press. [Crossref], [Google Scholar]
  5. Christen, P. (2012). Data matching: Concepts and techniques for record linkage, entity resolution, and duplicate detection. New York, NY: Springer Science and Business Media. [Crossref], [Google Scholar]
  6. Dwyer-Lindgren, L., Squires, E. R., Teeple, S., Ikilezi, G., Roberts, D. A., Colombara, D. V., …Lim, S. S. (2018). Small area estimation of under-5 mortality in Bangladesh, Cameroon, Chad, Mozambique, Uganda, and Zambia using spatially misaligned data. Population Health Metrics16(13), 1–15. [Google Scholar]
  7. Fay, R. E. I., & Herriot, R. A. (1979). Estimates of income for small places: An application of James–Stein procedures to census data. Journal of the American Statistical Association74, 269–277. doi: 10.1080/01621459.1979.10482505 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  8. Gelman, A., Carlin, J., Stern, H., Dunson, D. B., Vehtari, A., & Rubin, D. (2014). Bayesian data analysis (3rd ed.). New York, NY: Chapman and Hall. [Google Scholar]
  9. Gelman, A., King, G., & Liu, C. (1998). Not asked and not answered: Multiple imputation for multiple surveys. Journal of the American Statistical Association93, 846–857. doi: 10.1080/01621459.1998.10473737 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  10. Harron, K., Goldstein, H., & Dibben, C. (2016). Methodological developments in data linkage. Hoboken, NJ: Wiley. [Google Scholar]
  11. He, Y., Landrum, M. B., & Zaslavsky, A. M. (2014). Combining information from two data sources with misreporting and incompleteness to assess hospice-use among cancer patients: A multiple imputation approach. Statistics in Medicine33, 3710–3724. doi: 10.1002/sim.6173 [Crossref], [Web of Science ®], [Google Scholar]
  12. Herzog, T. N., Scheuren, F. J., & Winkler, W. E. (2017). Data quality and record linkage techniques. New York, NY: Springer Science and Business Media. [Google Scholar]
  13. Kim, J. K., Park, S., & Kim, S. Y. (2015). Small area estimation combining information from several sources. Survey Methodology41(1), 21–36. [Web of Science ®], [Google Scholar]
  14. Kim, J. K., & Rao, J. N. K. (2012). Combining data from two independent surveys: A model-assisted approach. Biometrika99, 85–100. doi: 10.1093/biomet/asr063 [Crossref][Web of Science ®], [Google Scholar]
  15. Knutson, K., Zhang, W., & Tabnak, F. (2008). Applying the small-area estimation method to estimate a population eligible for breast cancer detection. Preventing Chronic Disease5(1), A10. [Google Scholar]
  16. Lesser, V. M., Newton, L., & Yang, D (2008). Evaluating frames and modes of contact in a study of individuals with disabilities. Paper presented at the Joint Statistical Meetings, Denver, CO. [Google Scholar]
  17. Lohr, S. L., & Raghunathan, T. E. (2017). Combining survey data with other data sources. Statistical Science32(2), 293–312. doi: 10.1214/16-STS584 [Crossref][Web of Science ®], [Google Scholar]
  18. Lohr, S. L., & Rao, J. N. K. (2006). Estimation in multiple-frame surveys. Journal of the American Statistical Association101, 1019–1030. doi: 10.1198/016214506000000195 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  19. Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software9(1), 1–19. [Google Scholar]
  20. Lynch, S. M., & Brown, J. S. (2010). Obtaining multistate life table distributions for highly refined subpopulations from cross-sectional data: A Bayesian extension of Sullivan's method. Demography47(4), 1053–1077. doi: 10.1007/BF03213739 [Crossref][Web of Science ®], [Google Scholar]
  21. Manzi, G., Spiegelhalter, D. J., Turner, R. M., Flowers, J., & Thompson, S. G. (2011). Modelling bias in combining small area prevalence estimates from multiple surveys. Journal of the Royal Statistical Society: Series A174(1), 31–50. doi: 10.1111/j.1467-985X.2010.00648.x [Crossref], [Google Scholar]
  22. Metcalf, P., & Scott, A. (2009). Using multiple frames in health surveys. Statistics in Medicine28, 1512–1523. doi: 10.1002/sim.3566 [Crossref][Web of Science ®], [Google Scholar]
  23. Moultrie, T. A., Dorrington, R., Hill, A., Hill, K., Timæus, I., & Zaba, B. (2013). Tools for demographic estimation. Paris: International Union for the Scientific Study of Population. [Google Scholar]
  24. National Institute of Statistics, Directorate General for Health, & ICF Macro. (2011a). Cambodia demographic and health survey 2010. Author. [Google Scholar]
  25. National Institute of Statistics, Directorate General for Health, & ICF Macro (2011b). Cambodia demographic and health survey 2010 [Dataset] (khir61fl.dta). Author. [Google Scholar]
  26. Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science28(1), 40–68. doi: 10.1214/12-STS395 [Crossref][Web of Science ®], [Google Scholar]
  27. Prado, R., & West, M. (2010). Time series: Modeling, computation, and inference. Boca Raton, FL: CRC Press. [Crossref], [Google Scholar]
  28. Preston, S., Heuveline, P., & Guillot, M. (2001). Demography: Modelling and measuring population processes. Oxford: Blackwell. [Google Scholar]
  29. Raghunathan, T. E. (2006). Combining information from multiple surveys for assessing health disparities. Allgemeines Statistisches Archiv90, 515–526. doi: 10.1007/s10182-006-0003-0 [Crossref], [Google Scholar]
  30. Raghunathan, T., Xie, D., Schenker, N., Parsons, V., Davis, W., Dodd, K., & Feuer, E. (2007). Combining information from two surveys to estimate county-level prevalence rates of cancer risk factors and screening. Journal of the American Statistical Association102, 474–486. doi: 10.1198/016214506000001293 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  31. Ranalli, M. G., Arcos, A., Rueda, M. D. M., & Teodoro, A. (2016). Calibration estimation in dual-frame surveys. Statistical Methods and Applications25, 321–349. doi: 10.1007/s10260-015-0336-5 [Crossref][Web of Science ®], [Google Scholar]
  32. Rao, J. N. K., & Molina, I. (2015). Small area estimation (2nd ed.). Hoboken: John Wiley & Sons. [Crossref], [Google Scholar]
  33. Schenker, N., Raghunathan, T. E., & Bondarenko, I. (2010). Improving on analyses of self-reported data in a large-scale health survey by using information from an examination-based survey. Statistics in Medicine29, 533–545. [Web of Science ®], [Google Scholar]
  34. Schmertmann, C. P., Cavenaghi, S. M., Assunção, R. M., & Potter, J. E. (2013). Bayes plus Brass: Estimating total fertility for many small areas from sparse census data. Population Studies67(3), 255–273. doi: 10.1080/00324728.2013.795602 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  35. Steorts, R. C., Hall, R., & Fienberg, S. E. (2016). A Bayesian approach to graphical record linkage and de-duplication. Journal of the American Statistical Association111, 1660–1672. doi: 10.1080/01621459.2015.1105807 [Taylor & Francis Online][Web of Science ®], [Google Scholar]
  36. Zhang, J. L., Bryant, J., & Nissen, K. (2019). Bayesian small area demography. Survey Methodology45(1), 13–29. [Web of Science ®], [Google Scholar]