Review Articles

A survey on differential privacy methods for big data privacy protection

Chengliang Liu,

Key Laboratory of Advanced Theory and Application in Statistics and Data Science (MOE), School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, People's Republic of China

Miaomiao Yu,

Key Laboratory of Advanced Theory and Application in Statistics and Data Science (MOE), School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, People's Republic of China

mmyu@fem.ecnu.edu.cn

Yong Zhou

Key Laboratory of Advanced Theory and Application in Statistics and Data Science (MOE), School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, People's Republic of China

Received 26 Aug. 2025, Accepted 20 May 2026, Published online: 27 May 2026

In the era of big data, ensuring data privacy has emerged as a significant challenge in large-scale data applications. Differential privacy is currently one of the most promising privacy-preserving frameworks, as it provides an explicit measure of the degree of privacy protection. Although the development of differential privacy within the field of statistics is still in its early stages, it is expected to play an integral role in future research. Motivated by this, this paper first reviews the development of privacy models, including a detailed introduction to and interpretation of the differential privacy framework. We then present applications of several commonly used noise mechanisms and elaborate on the parallel and sequential composition theorems in differential privacy. Finally, we discuss potential future research on differential privacy for online data analysis and statistical inference.

To cite this article: Chengliang Liu, Miaomiao Yu & Yong Zhou (27 May 2026): A survey on differential privacy methods for big data privacy protection, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2026.2679084
To link to this article: https://doi.org/10.1080/24754269.2026.2679084