Journal of East China Normal University (Natural Science)
Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment
Received date: 2021-08-07
Online published: 2021-09-28
Knowledge graphs are an effective way to represent and organize unstructured knowledge in a structured form, and they support many intelligent applications. Product knowledge, however, is typically massive in scale, heterogeneous, and hierarchical, which challenges traditional knowledge query processing methods based on relational and graph models. In this paper, we address these challenges by designing and implementing a product knowledge query processing method based on collaborative CPU-GPU computing. First, to exploit the parallel computing capability of the GPU, we propose a product knowledge storage strategy based on sparse matrices and optimize it for the scale of the task. Second, building on this sparse matrix storage structure, we design a query conversion method that transforms a SPARQL query into the corresponding matrix computation, and we extend the join query algorithm to the GPU for acceleration. To verify the effectiveness of the proposed method, we conducted a series of experiments on an LUBM dataset and a semisynthetic product dataset. The experimental results show that, compared with existing RDF query engines, the proposed method improves retrieval efficiency on large-scale product knowledge datasets and also achieves better retrieval performance on a general RDF standard dataset.
Key words: product knowledge; heterogeneous environment; RDF data; query processing
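The abstract's core idea, storing triples as sparse matrices and evaluating a SPARQL join as a matrix computation, can be illustrated with a minimal CPU-only sketch. This is not the authors' implementation: the function names (`encode`, `build_csr`, `join`), the choice of a CSR layout, and the toy product triples are all assumptions made for illustration. The sketch evaluates the two-pattern query `?x hasBrand ?y . ?y locatedIn ?z` as a boolean product of two per-predicate matrices.

```python
from collections import defaultdict

def encode(triples):
    """Dictionary-encode every subject/object term as a dense integer ID."""
    ids = {}
    for s, _, o in triples:
        for term in (s, o):
            ids.setdefault(term, len(ids))
    return ids

def build_csr(triples, ids):
    """Build one boolean CSR matrix (row_ptr, col_idx) per predicate."""
    adjacency = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        adjacency[p][ids[s]].add(ids[o])
    matrices = {}
    for p, adj in adjacency.items():
        row_ptr, col_idx = [0], []
        for r in range(len(ids)):
            col_idx.extend(sorted(adj.get(r, ())))
            row_ptr.append(len(col_idx))
        matrices[p] = (row_ptr, col_idx)
    return matrices

def row(csr, r):
    """Return the column indices of the non-zero entries in row r."""
    row_ptr, col_idx = csr
    return col_idx[row_ptr[r]:row_ptr[r + 1]]

def join(a, b, n):
    """Evaluate `?x a ?y . ?y b ?z` as a boolean matrix product A x B.

    Each output row x depends only on row x of A, so rows could be
    processed in parallel (for example, one GPU thread block per row).
    """
    bindings = []
    for x in range(n):
        for y in row(a, x):
            for z in row(b, y):
                bindings.append((x, y, z))
    return bindings

# Hypothetical toy product triples, invented for this example.
TRIPLES = [
    ("phoneA", "hasBrand", "brandX"),
    ("phoneB", "hasBrand", "brandY"),
    ("brandX", "locatedIn", "China"),
    ("brandY", "locatedIn", "Korea"),
    ("phoneA", "hasCategory", "smartphone"),
]

ids = encode(TRIPLES)
terms = {i: t for t, i in ids.items()}
matrices = build_csr(TRIPLES, ids)
result = [
    tuple(terms[i] for i in binding)
    for binding in join(matrices["hasBrand"], matrices["locatedIn"], len(ids))
]
print(result)
```

The row-wise independence of the product is what makes this formulation a natural fit for GPU acceleration; a real engine would also need to handle constants in triple patterns, joins on other variable positions, and matrices too large for device memory.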
Chuangxin FANG, Hao SONG, Yuming LIN, Ya ZHOU. Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment[J]. Journal of East China Normal University (Natural Science), 2021, 2021(5): 157-168. DOI: 10.3969/j.issn.1000-5641.2021.05.014