A survey on differential privacy methods for big data privacy protection

ISSN 2475-4269

CN 31-2182/O1

Miaomiao Yu

mmyu@fem.ecnu.edu.cn

Yong Zhou

Received 26 Aug. 2025, Accepted 20 May 2026, Published online: 27 May 2026

Abstract

In the era of big data, protecting data privacy has become a significant challenge for large-scale data applications. Differential privacy is currently one of the most promising privacy-preserving frameworks because it provides an explicit, quantifiable measure of the degree of privacy protection. Although differential privacy is still at an early stage of development within statistics, it is expected to play an integral role in future research. Motivated by this, this paper first reviews the development of privacy models and gives a detailed introduction to, and interpretation of, the differential privacy framework. We then present applications of several commonly used noise mechanisms and elaborate on the parallel and sequential composition theorems of differential privacy. Finally, we discuss potential future research on differential privacy for online data analysis and statistical inference.

To cite this article: Chengliang Liu, Miaomiao Yu & Yong Zhou (27 May 2026): A survey on differential privacy methods for big data privacy protection, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2026.2679084
To link to this article: https://doi.org/10.1080/24754269.2026.2679084
