Data Analysis and Applications

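Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment

  • Chuangxin FANG,
  • Hao SONG,
  • Yuming LIN,
  • Ya ZHOU
  • Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Received date: 2021-08-07

  Online published: 2021-09-28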

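Funding

National Natural Science Foundation of China (62062027, U1811264); Guangxi Natural Science Foundation (2018GXNSFDA281049, 2020GXNSFAA159012); Guangxi Innovation-Driven Development Special Fund (Guike AA19046004); Guilin Key Research and Development Program (2020010304); Innovation Project of Guilin University of Electronic Technology Graduate Education (2021YCXS075); Research Project of the Guangxi Key Laboratory of Trusted Software (kx202021)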

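Abstract

Knowledge graphs are an effective way to represent and organize unstructured knowledge in structured form, and they have become infrastructure for many intelligent applications. Product-related knowledge, however, is typically massive, heterogeneous, and hierarchical, which challenges existing knowledge query processing methods based on relational and graph models. To address these characteristics, this paper designs and implements a product knowledge query processing method in which the CPU and GPU compute cooperatively. First, to exploit the parallel computing capability of the GPU, a sparse-matrix-based storage strategy for product knowledge is proposed, with storage optimizations tailored to product knowledge. Second, based on the sparse-matrix storage structure, a query transformation is designed that converts a SPARQL query into the corresponding matrix computations, and the join query algorithm is extended to run on the GPU for acceleration. To verify the effectiveness of the proposed method, a series of experiments was conducted on the LUBM dataset and a semi-synthetic product dataset. The results show that the proposed method not only substantially improves retrieval efficiency over existing RDF query engines on massive product knowledge, but also achieves good retrieval performance on a general standard RDF dataset; the experiments also confirm the effectiveness of GPU-accelerated query processing.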
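To make the idea of turning a SPARQL query into matrix computations more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each predicate is stored as a boolean sparse adjacency matrix in CSR form, and it answers a two-pattern join by selecting a column and then testing each row against the selected set. The toy triples, the predicate names, and the helpers `build_csr`, `row`, and `join_two_hop` are all hypothetical and are introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not from the paper's datasets.
TRIPLES = [
    ("phone_x", "belongsTo", "smartphones"),
    ("phone_y", "belongsTo", "smartphones"),
    ("laptop_z", "belongsTo", "laptops"),
    ("smartphones", "subCategoryOf", "electronics"),
    ("laptops", "subCategoryOf", "electronics"),
    ("novel_a", "belongsTo", "fiction"),
    ("fiction", "subCategoryOf", "books"),
]


def build_csr(triples):
    """Map terms to integer ids and build one CSR matrix per predicate.

    Each matrix is a pair (indptr, indices): the column ids of row r are
    indices[indptr[r]:indptr[r + 1]]. Values are implicit (boolean 1).
    """
    ids = {}
    for s, _, o in triples:
        for term in (s, o):
            ids.setdefault(term, len(ids))
    by_pred = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        by_pred[p][ids[s]].add(ids[o])
    n = len(ids)
    matrices = {}
    for p, by_row in by_pred.items():
        indptr, indices = [0], []
        for r in range(n):
            indices.extend(sorted(by_row.get(r, ())))
            indptr.append(len(indices))
        matrices[p] = (indptr, indices)
    return ids, matrices


def row(csr, r):
    """Return the column ids stored in row r of a CSR matrix."""
    indptr, indices = csr
    return indices[indptr[r]:indptr[r + 1]]


def join_two_hop(matrices, p1, p2, target, n):
    """Evaluate SELECT ?x WHERE { ?x p1 ?y . ?y p2 target }.

    Step 1: find every row of matrix p2 that has `target` as a column,
            which yields the bindings of ?y.
    Step 2: keep every row of matrix p1 that shares a column with those
            bindings (a boolean sparse matrix-vector product).
    """
    ys = {r for r in range(n) if target in row(matrices[p2], r)}
    return [r for r in range(n) if any(c in ys for c in row(matrices[p1], r))]


if __name__ == "__main__":
    ids, mats = build_csr(TRIPLES)
    names = {i: term for term, i in ids.items()}
    hits = join_two_hop(mats, "belongsTo", "subCategoryOf",
                        ids["electronics"], len(ids))
    print(sorted(names[i] for i in hits))  # ['laptop_z', 'phone_x', 'phone_y']
```

In both steps every row is examined independently of the others, which is the kind of data parallelism a GPU kernel could exploit; this sketch runs sequentially on the CPU and makes no claim about the kernel design used in the paper.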

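Cite this article

Chuangxin FANG, Hao SONG, Yuming LIN, Ya ZHOU. Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment[J]. Journal of East China Normal University (Natural Science), 2021, 2021(5): 157-168. DOI: 10.3969/j.issn.1000-5641.2021.05.014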
