Received date: 2023-07-08
Accepted date: 2023-07-08
Online published: 2023-09-20
A multimode data management method based on Data Fabric
ZHENG Xinjun, TIAN Guoliang, HUANG Feihu. A multimode data management method based on Data Fabric [J]. Journal of East China Normal University (Natural Science), 2023, 2023(5): 164-181. DOI: 10.3969/j.issn.1000-5641.2023.05.014
As governments and enterprises move from informatization toward digitization, the data generated by their application systems are becoming increasingly multimode, multisource, and massive, which poses new challenges for data management. Many new technologies and concepts have emerged in the data management field to address these challenges. Among them, Data Fabric is an emerging data management technology and method that integrates distributed data storage, processing, and applications into a whole and provides a set of visual interfaces for managing them. This paper first analyzes the technical architecture, characteristics, and value of Data Fabric, together with the complete process by which it manages and applies multimode data. It then proposes two anomaly monitoring methods for multimode, multisource data, one based on time-series indicators and one based on log data. With Data Fabric technology, their processing speed improved by 33.3% and 42.2%, and their F1-scores improved by 12.2 and 14.8 percentage points, respectively. These results further demonstrate the efficiency and application value of both Data Fabric technology and the proposed methods.
Key words: Data Fabric; multimode data management; data virtualization