Review Articles

A review of distributed statistical inference

Yuan Gao

School of Statistics and Key Laboratory of Advanced Theory and Application in Statistics and Data Science – MOE, East China Normal University, Shanghai, People’s Republic of China

Weidong Liu

School of Mathematical Sciences and Key Lab of Artificial Intelligence – MOE, Shanghai Jiao Tong University, Shanghai, People’s Republic of China

Hansheng Wang

Guanghua School of Management, Peking University, Beijing, People’s Republic of China

Xiaozhou Wang

School of Statistics and Key Laboratory of Advanced Theory and Application in Statistics and Data Science – MOE, East China Normal University, Shanghai, People’s Republic of China

Yibo Yan

School of Statistics and Key Laboratory of Advanced Theory and Application in Statistics and Data Science – MOE, East China Normal University, Shanghai, People’s Republic of China

Riquan Zhang

School of Statistics and Key Laboratory of Advanced Theory and Application in Statistics and Data Science – MOE, East China Normal University, Shanghai, People’s Republic of China

Pages 89–99 | Received 02 Sep. 2020, Accepted 01 Aug. 2021, Published online: 13 Sep. 2021

The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods, while at the same time creating opportunities for researchers to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed to handle large-scale statistical optimization problems. This paper provides a comprehensive review of the related literature, covering parametric models, nonparametric models, and other frequently used models; their key ideas and theoretical properties are summarized. The trade-off between communication cost and estimation precision, together with other practical concerns, is also discussed.
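To fix ideas, the following minimal Python sketch (our own illustration, not code taken from the paper) shows the simplest divide-and-conquer scheme surveyed in this review: one-shot averaging for distributed linear regression, in which each machine fits ordinary least squares on its own data block and only the local estimates are communicated to a central node, which averages them.

    import numpy as np

    def local_ols(X, y):
        # Ordinary least squares fit on one machine's data block.
        return np.linalg.solve(X.T @ X, X.T @ y)

    def one_shot_average(blocks):
        # Average the K local estimates; only K parameter vectors are communicated.
        return np.mean([local_ols(X, y) for X, y in blocks], axis=0)

    # Simulated example: N = 10,000 observations split across K = 10 machines.
    rng = np.random.default_rng(0)
    theta = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(10_000, 3))
    y = X @ theta + rng.normal(size=10_000)
    blocks = list(zip(np.array_split(X, 10), np.array_split(y, 10)))
    print(one_shot_average(blocks))  # close to the full-sample OLS estimate

Under standard conditions, such an averaged estimator can match the full-sample convergence rate provided the number of machines does not grow too quickly relative to the sample size, which is one instance of the trade-off between communication cost and estimation precision discussed in the review.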

To cite this article: Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan & Riquan Zhang (2021): A review of distributed statistical inference, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2021.1974158
To link to this article: https://doi.org/10.1080/24754269.2021.1974158