Review Articles

Communication-efficient distributed statistical inference on zero-inflated Poisson models

Ran Wan

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China

Yang Bai

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China

Pages 81-106 | Received 12 Feb. 2023, Accepted 21 Sep. 2023, Published online: 30 Oct. 2023

Zero-inflated count outcomes are common in many studies, such as claim frequency in the insurance industry, where identifying and understanding excess zeros is of interest. Moreover, with advances in data collection and storage techniques, datasets are often too massive to be stored or processed by a single node or branch. Hence, distributed data analysis is developing rapidly. In this paper, several communication-efficient distributed zero-inflated Poisson regression algorithms are developed to analyse such large-scale zero-inflated data. Both the asymptotic properties of the proposed estimators and the complexities of the algorithms are studied. Various simulation studies demonstrate that the proposed methods and algorithms work well and efficiently. Finally, in a case study, we apply the proposed algorithms to a car insurance dataset from Kaggle.
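To make the setting concrete, the following is a minimal sketch of distributed estimation for a zero-inflated Poisson (ZIP) model. It is not the authors' algorithm: it assumes an intercept-only model with no covariates, fits each data shard by a standard EM iteration, and combines the local fits by simple one-shot averaging, so that each worker communicates only two numbers. All function names (sample_zip, fit_zip_em, distributed_zip) and the simulation settings are hypothetical and chosen for illustration.

```python
import math
import random


def sample_zip(n, pi, lam, rng):
    """Draw n counts: a structural zero with probability pi, else Poisson(lam)."""
    def poisson():
        # Knuth's multiplication method; adequate for small lam.
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= threshold:
                return k
            k += 1
    return [0 if rng.random() < pi else poisson() for _ in range(n)]


def fit_zip_em(ys, iters=200):
    """EM for an intercept-only ZIP model; returns (pi_hat, lam_hat).

    Assumes the sample contains at least one positive count.
    Only three summaries of the data are needed: n, the number of
    zeros, and the sum of the counts.
    """
    n = len(ys)
    n_zero = sum(1 for y in ys if y == 0)
    total = sum(ys)
    pi, lam = 0.5, total / n  # crude starting values
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is structural.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: closed-form updates given the expected memberships.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam


def distributed_zip(ys, n_workers):
    """One-shot averaging: fit each shard locally, then average the estimates."""
    shards = [ys[k::n_workers] for k in range(n_workers)]
    local = [fit_zip_em(shard) for shard in shards]  # one (pi, lam) pair per worker
    pi_bar = sum(p for p, _ in local) / n_workers
    lam_bar = sum(l for _, l in local) / n_workers
    return pi_bar, lam_bar


# Illustrative simulation with assumed true values pi = 0.3, lam = 2.0.
rng = random.Random(0)
data = sample_zip(20000, pi=0.3, lam=2.0, rng=rng)
pi_hat, lam_hat = distributed_zip(data, n_workers=4)
print(f"pi_hat = {pi_hat:.3f}, lam_hat = {lam_hat:.3f}")
```

In this sketch the communication cost is two floating-point numbers per worker, independent of shard size. A regression version would replace the closed-form M-step with weighted logistic and Poisson fits on each shard.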


To cite this article: Ran Wan & Yang Bai (2024) Communication-efficient distributed statistical inference on zero-inflated Poisson models, Statistical Theory and Related Fields, 8:2, 81-106, DOI: 10.1080/24754269.2023.2263721

To link to this article: https://doi.org/10.1080/24754269.2023.2263721