Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (5): 77-89. doi: 10.3969/j.issn.1000-5641.2023.05.007

• Data Learning Systems •


Privacy-preserving cloud-end collaborative training

Xiangyun GAO1, Dan MENG2, Mingkai LUO2,3, Jun WANG2, Liping ZHANG1, Chao KONG1,4,*

  1. School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China
    2. OPPO Research Institute, Shenzhen, Guangdong 518000, China
    3. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
    4. Reconfigurable and Intelligent Computing Laboratory, Anhui Polytechnic University, Wuhu, Anhui 241000, China
  • Received: 2023-06-30  Online: 2023-09-25  Published: 2023-09-15
  • Contact: Chao KONG  E-mail: kongchao@ahpu.edu.cn
  • Supported by: National Natural Science Foundation of China (61902001); Undergraduate Teaching Quality Improvement Program of Anhui Polytechnic University (2022lzyybj02)


Abstract:

China has the advantages of scale and diversity in data resources and a latecomer advantage in mobile internet data applications, which generate massive amounts of data across diverse application scenarios. Recommendation systems can extract valuable information from these large-scale data, thereby mitigating the problem of information overload. Most existing work focuses on centralized recommender systems, where user data are gathered and trained on the cloud. However, with data security and privacy protection issues becoming increasingly prominent, collecting user data from end-side devices has become increasingly difficult, making centralized recommendation infeasible. This study addresses privacy-preserving cloud-end collaborative training for personalized recommender systems in a decentralized manner. To fully utilize the advantages of end-side devices and cloud servers while taking data security and privacy into account, a cloud-end collaborative training method named FedMNN (federated machine learning and mobile neural network), based on federated machine learning (FedML) and mobile neural network (MNN), is proposed for recommender systems. The method consists of three parts. First, cloud-side models implemented in various deep learning frameworks are converted into general MNN models for end-device training, using ONNX (open neural network exchange) as the intermediate format together with the MNN model conversion tool. Second, the cloud server sends the model to the end-side devices, which initialize it, train on local data, compute the loss, and perform gradient back-propagation. Finally, the trained end-side models are fed back to the cloud server, which aggregates and updates them through the federated learning framework; the cloud-side model is then deployed back onto end-side devices as required, achieving cloud-end collaboration. Experiments comparing the power consumption of FedMNN and the FLTFlite (Flower and TensorFlow Lite) framework on benchmark tasks show that FedMNN consumes 32% to 51% less power than FLTFlite. Using the DSSM (deep structured semantic model) and Deep and Wide recommendation models as examples, the experiments further verify the effectiveness of the proposed cloud-end collaborative training method.
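The conversion step described above can be illustrated with a short sketch: a cloud-side model is exported to ONNX and then converted into a general MNN model with the MNNConvert tool from the MNN toolchain. This is a minimal example under stated assumptions, not the FedMNN pipeline itself; the toy model, file names, and the exact converter flags (which may vary across MNN releases) are assumptions.

```python
import subprocess
import torch

# Hypothetical cloud-side model; any framework that can export ONNX works the same way.
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
model.eval()

# Step 1: export the cloud-side model to ONNX as the intermediate representation.
dummy_input = torch.randn(1, 16)
torch.onnx.export(model, dummy_input, "cloud_model.onnx",
                  input_names=["features"], output_names=["score"],
                  opset_version=11)

# Step 2: convert ONNX into a general MNN model with MNN's converter tool.
# Flag names follow the public MNN documentation and may differ between versions;
# producing a training-capable model may need an additional converter option.
subprocess.run(["MNNConvert", "-f", "ONNX",
                "--modelFile", "cloud_model.onnx",
                "--MNNModel", "cloud_model.mnn",
                "--bizCode", "demo"],
               check=True)
```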
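The collaborative round itself (the cloud dispatches the model, each end-side device trains on its local data with loss computation and gradient back-propagation, and the cloud aggregates the returned weights before redeploying) can be sketched as follows. This is a framework-agnostic, FedAvg-style illustration in PyTorch, not the FedML/MNN implementation used in the paper; all function names and the toy model are hypothetical.

```python
import copy
import torch

def local_update(global_model, loader, epochs=1, lr=0.01):
    # End-side step: copy the dispatched model, train on local data,
    # compute the loss, and back-propagate; raw data never leaves the device.
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimizer.step()
    return model.state_dict()

def aggregate(device_states):
    # Cloud-side step: FedAvg-style averaging of the returned weights
    # (assumes floating-point parameters of identical shape on every device).
    return {key: torch.stack([s[key].float() for s in device_states]).mean(dim=0)
            for key in device_states[0]}

def one_round(global_model, device_loaders):
    # One communication round: dispatch, local training, aggregation, update.
    device_states = [local_update(global_model, loader) for loader in device_loaders]
    global_model.load_state_dict(aggregate(device_states))
    return global_model

# Toy usage: three simulated devices, each holding its own random local data.
global_model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
device_loaders = [
    torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(torch.randn(64, 16), torch.rand(64, 1)),
        batch_size=16)
    for _ in range(3)]
global_model = one_round(global_model, device_loaders)
```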

Key words: privacy protection, federated learning, machine learning, cloud-end collaborative training

CLC number: