References
- Agarwal, N., Suresh, A. T., Yu, F., Kumar, S., & McMahan, H. B. (2018). cpSGD: Communication-efficient and differentially-private distributed SGD. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 7575–7586). Curran Associates Inc.
- Battey, H., Fan, J., Liu, H., Lu, J., & Zhu, Z. (2018). Distributed testing and estimation under sparse high dimensional models. The Annals of Statistics, 46(3), 1352–1382. https://doi.org/10.1214/17-AOS1587
- Berlinet, A., & Thomas-Agnan, C. (2011). Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media.
- Bickel, P. J., Götze, F., & van Zwet, W. R. (2012). Resampling fewer than n observations: Gains, losses, and remedies for losses. In Selected works of Willem van Zwet (pp. 267–297). Springer.
- Chang, X., Lin, S.-B., & Wang, Y. (2017). Divide and conquer local average regression. Electronic Journal of Statistics, 11(1), 1326–1350. https://doi.org/10.1214/17-EJS1265
- Chang, X., Lin, S.-B., & Zhou, D.-X. (2017). Distributed semi-supervised learning with kernel ridge regression. The Journal of Machine Learning Research, 18(1), 1493–1514. https://jmlr.org/papers/volume18/16-601/16-601.pdf
- Chen, L., & Zhou, Y. (2020). Quantile regression in big data: A divide and conquer based strategy. Computational Statistics & Data Analysis, 144, 106892. https://doi.org/10.1016/j.csda.2019.106892
- Chen, X., Lee, J. D., Li, H., & Yang, Y. (2021). Distributed estimation for principal component analysis: An enlarged eigenspace analysis. Journal of the American Statistical Association, 1–31. https://doi.org/10.1080/01621459.2021.1886937
- Chen, X., Liu, W., Mao, X., & Yang, Z. (2020). Distributed high-dimensional regression under a quantile loss function. Journal of Machine Learning Research, 21(182), 1–43. https://jmlr.org/papers/volume21/20-297/20-297.pdf
- Chen, X., Liu, W., & Zhang, Y. (2019). Quantile regression under memory constraint. The Annals of Statistics, 47(6), 3244–3273. https://doi.org/10.1214/18-AOS1777
- Chen, X., Liu, W., & Zhang, Y. (2021b). First-order Newton-type estimator for distributed estimation and inference. Journal of the American Statistical Association, 1–40. https://doi.org/10.1080/01621459.2021.1891925
- Chen, X., & Xie, M.-G. (2014). A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica, 24(4), 1655–1684. https://doi.org/10.5705/ss.2013.088
- Duchi, J. C., Jordan, M. I., Wainwright, M. J., & Zhang, Y. (2014). Optimality guarantees for distributed statistical estimation. arXiv preprint arXiv:1405.0782.
- Dwork, C. (2008). Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation (pp. 1–19). Springer.
- Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications (Monographs on Statistics and Applied Probability, Vol. 66). CRC Press.
- Fan, J., Guo, Y., & Wang, K. (2019a). Communication-efficient accurate statistical estimation. arXiv preprint arXiv:1906.04870.
- Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273
- Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849–911. https://doi.org/10.1111/j.1467-9868.2008.00674.x
- Fan, J., Wang, D., Wang, K., & Zhu, Z. (2019b). Distributed estimation of principal eigenspaces. The Annals of Statistics, 47(6), 3009–3031. https://doi.org/10.1214/18-AOS1713
- Guo, Z.-C., Lin, S.-B., & Shi, L. (2019). Distributed learning with multi-penalty regularization. Applied and Computational Harmonic Analysis, 46(3), 478–499. https://doi.org/10.1016/j.acha.2017.06.001
- Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. CRC Press.
- Huang, C., & Huo, X. (2015). A distributed one-step estimator. arXiv preprint arXiv:1511.01443.
- Javanmard, A., & Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15(1), 2869–2909. https://jmlr.csail.mit.edu/papers/volume15/javanmard14a/javanmard14a.pdf
- Jordan, M. I., Lee, J. D., & Yang, Y. (2019). Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114(526), 668–681. https://doi.org/10.1080/01621459.2018.1429274
- Kaplan, D. M. (2019). Optimal smoothing in divide-and-conquer for big data. Working paper, available at https://faculty.missouri.edu/kaplandm.
- Kleiner, A., Talwalkar, A., Sarkar, P., & Jordan, M. I. (2014). A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 795–816. https://doi.org/10.1111/rssb.12050
- Koenker, R. (2005). Quantile regression (Econometric Society Monographs No. 38). Cambridge University Press.
- Lee, J. D., Liu, Q., Sun, Y., & Taylor, J. E. (2017). Communication-efficient sparse regression. The Journal of Machine Learning Research, 18(1), 115–144. https://jmlr.csail.mit.edu/papers/volume18/16-002/16-002.pdf
- Lehmann, E. L., & Casella, G. (2006). Theory of point estimation. Springer Science & Business Media.
- Li, R., Lin, D. K., & Li, B. (2013). Statistical inference in massive data sets. Applied Stochastic Models in Business and Industry, 29(5), 399–409. https://doi.org/10.1002/asmb.1927
- Li, X., Li, R., Xia, Z., & Xu, C. (2020). Distributed feature screening via componentwise debiasing. Journal of Machine Learning Research, 21(24), 1–32. https://jmlr.csail.mit.edu/papers/volume21/19-537/19-537.pdf
- Lian, H., & Fan, Z. (2018). Divide-and-conquer for debiased l1-norm support vector machine in ultra-high dimensions. Journal of Machine Learning Research, 18(182), 1–26. https://jmlr.org/papers/volume18/17-343/17-343.pdf
- Lian, H., Zhao, K., & Lv, S. (2019). Projected spline estimation of the nonparametric function in high-dimensional partially linear models for massive data. The Annals of Statistics, 47(5), 2922–2949. https://doi.org/10.1214/18-AOS1769
- Lin, S.-B., Guo, X., & Zhou, D.-X. (2017). Distributed learning with regularized least squares. The Journal of Machine Learning Research, 18(1), 3202–3232. https://www.jmlr.org/papers/volume18/15-586/15-586.pdf
- Lin, S.-B., Wang, D., & Zhou, D.-X. (2020). Distributed kernel ridge regression with communications. arXiv preprint arXiv:2003.12210.
- Lin, S.-B., & Zhou, D.-X. (2018). Distributed kernel-based gradient descent algorithms. Constructive Approximation, 47(2), 249–276. https://doi.org/10.1007/s00365-017-9379-1
- Liu, D., Liu, R. Y., & Xie, M. (2015). Multivariate meta-analysis of heterogeneous studies using only summary statistics: Efficiency and robustness. Journal of the American Statistical Association, 110(509), 326–340. https://doi.org/10.1080/01621459.2014.899235
- Liu, Q., & Ihler, A. T. (2014). Distributed estimation, information loss and exponential families. In Advances in Neural Information Processing Systems (pp. 1098–1106). MIT Press.
- Lv, S., & Lian, H. (2017). Debiased distributed learning for sparse partial linear models in high dimensions. arXiv preprint arXiv:1708.05487.
- Minsker, S. (2019). Distributed statistical estimation and rates of convergence in normal approximation. Electronic Journal of Statistics, 13(2), 5213–5252. https://doi.org/10.1214/19-EJS1647
- Mücke, N., & Blanchard, G. (2018). Parallelizing spectrally regularized kernel algorithms. The Journal of Machine Learning Research, 19(1), 1069–1097. https://www.jmlr.org/papers/volume19/16-569/16-569.pdf
- Politis, D. N., Romano, J. P., & Wolf, M. (1999). Subsampling. Springer Science & Business Media.
- Qiao, X., Duan, J., & Cheng, G. (2019). Rates of convergence for large-scale nearest neighbor classification. In Advances in Neural Information Processing Systems (pp. 10768–10779). Curran Associates Inc.
- Rosenblatt, J. D., & Nadler, B. (2016). On the optimality of averaging in distributed statistical learning. Information and Inference: A Journal of the IMA, 5(4), 379–404. https://doi.org/10.1093/imaiai/iaw013
- Saunders, C., Gammerman, A., & Vovk, V. (1998). Ridge regression learning algorithm in dual variables. In Proceedings of the 15th International Conference on Machine Learning (pp. 515–521). Morgan Kaufmann.
- Shamir, O., Srebro, N., & Zhang, T. (2014). Communication-efficient distributed optimization using an approximate Newton-type method. In International Conference on Machine Learning (pp. 1000–1008). JMLR.org.
- Shang, Z., & Cheng, G. (2017). Computational limits of a distributed algorithm for smoothing spline. Journal of Machine Learning Research, 18(108), 1–37. https://jmlr.org/papers/volume18/16-289/16-289.pdf
- Song, Q., & Liang, F. (2015). A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(5), 947–972. https://doi.org/10.1111/rssb.12095
- Steinwart, I., Hush, D. R., & Scovel, C. (2009). Optimal rates for regularized least squares regression. In COLT (pp. 79–93). https://www.cs.mcgill.ca/~colt2009/papers/038.pdf
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
- Truex, S., Baracaldo, N., Anwar, A., Steinke, T., Ludwig, H., Zhang, R., & Zhou, Y. (2019). A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security (pp. 1–11). Association for Computing Machinery.
- Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.
- Volgushev, S., Chao, S.-K., & Cheng, G. (2019). Distributed inference for quantile regression processes. The Annals of Statistics, 47(3), 1634–1662. https://doi.org/10.1214/18-AOS1730
- Wahba, G. (1990). Spline models for observational data (Vol. 59). SIAM.
- Wang, F., Huang, D., Zhu, Y., & Wang, H. (2020). Efficient estimation for generalized linear models on a distributed system with nonrandomly distributed data. arXiv preprint arXiv:2004.02414.
- Wang, J., Kolar, M., Srebro, N., & Zhang, T. (2017). Efficient distributed learning with sparsity. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (pp. 3636–3645). JMLR.org.
- Wang, S. (2019). A sharper generalization bound for divide-and-conquer ridge regression. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 5305–5312). AAAI Press.
- Wang, X., Ishii, H., Du, L., Cheng, P., & Chen, J. (2019). Differential privacy-preserving distributed machine learning. In 2019 IEEE 58th Conference on Decision and Control (CDC) (pp. 7339–7344). IEEE.
- Wang, X., Yang, Z., Chen, X., & Liu, W. (2019). Distributed inference for linear support vector machine. Journal of Machine Learning Research, 20(113), 1–41. https://www.jmlr.org/papers/volume20/18-801/18-801.pdf
- Xu, C., Zhang, Y., Li, R., & Wu, X. (2016). On the feasibility of distributed kernel regression for big data. IEEE Transactions on Knowledge and Data Engineering, 28(11), 3041–3052. https://doi.org/10.1109/TKDE.2016.2594060
- Xu, G., Shang, Z., & Cheng, G. (2018). Optimal tuning for divide-and-conquer kernel ridge regression with massive data. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80). PMLR.
- Xu, M., & Shao, J. (2020). Meta-analysis of independent datasets using constrained generalised method of moments. Statistical Theory and Related Fields, 4(1), 109–116. https://doi.org/10.1080/24754269.2019.1630545
- Yang, J., Mahoney, M. W., Saunders, M. A., & Sun, Y. (2016). Feature-distributed sparse regression: A screen-and-clean approach. In Advances in Neural Information Processing Systems (pp. 2712–2720). Curran Associates Inc.
- Zhang, C.-H., & Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4), 576–593. https://doi.org/10.1214/12-STS399
- Zhang, T. (2005). Learning bounds for kernel regression using effective data dimensionality. Neural Computation, 17(9), 2077–2098. https://doi.org/10.1162/0899766054323008
- Zhang, Y., Duchi, J., & Wainwright, M. (2015). Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates. The Journal of Machine Learning Research, 16(1), 3299–3340. https://jmlr.org/papers/volume16/zhang15d/zhang15d.pdf
- Zhang, Y., Duchi, J. C., & Wainwright, M. J. (2013). Communication-efficient algorithms for statistical optimization. The Journal of Machine Learning Research, 14(1), 3321–3363. https://www.jmlr.org/papers/volume14/zhang13b/zhang13b.pdf
- Zhao, T., Cheng, G., & Liu, H. (2016). A partially linear framework for massive heterogeneous data. The Annals of Statistics, 44(4), 1400–1437. https://doi.org/10.1214/15-AOS1410
- Zhao, T., Kolar, M., & Liu, H. (2014). A general framework for robust testing and confidence regions in high-dimensional quantile regression. arXiv preprint arXiv:1412.8724.
- Zhao, W., Zhang, F., & Lian, H. (2019). Debiasing and distributed estimation for high-dimensional quantile regression. IEEE Transactions on Neural Networks and Learning Systems, 31(7), 2569–2577. https://doi.org/10.1109/TNNLS.2019.2933467
- Zhou, L., & Song, P. X.-K. (2017). Scalable and efficient statistical inference with estimating functions in the mapreduce paradigm for big data. arXiv preprint arXiv:1709.04389.
- Zhu, L.-P., Li, L., Li, R., & Zhu, L.-X. (2011). Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106(496), 1464–1475. https://doi.org/10.1198/jasa.2011.tm10563
- Zhu, X., Li, F., & Wang, H. (2019). Least squares approximation for a distributed system. arXiv preprint arXiv:1908.04904.