Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment

doi:10.3969/j.issn.1000-5641.2021.05.014

Journal of East China Normal University (Natural Science), 2021, Vol. 2021, Issue (5): 157-168.

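Funding:
National Natural Science Foundation of China (62062027, U1811264); Guangxi Natural Science Foundation (2018GXNSFDA281049, 2020GXNSFAA159012); Guangxi Innovation-Driven Development Special Fund (Guike AA19046004); Guilin Key Research and Development Program (2020010304); Innovation Project of Guilin University of Electronic Technology Graduate Education (2021YCXS075); Research Project of the Guangxi Key Laboratory of Trusted Software (kx202021)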

Chuangxin FANG, Hao SONG, Yuming LIN*, Ya ZHOU

Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Received: 2021-08-07 Online: 2021-09-25 Published: 2021-09-28
Contact: Yuming LIN E-mail: innofang@163.com; ymlin@guet.edu.cn

Abstract:

Knowledge graphs are an effective way to represent and organize unstructured knowledge in structured form, and they have become infrastructure that supports many intelligent applications. However, product-related knowledge is typically massive in scale, heterogeneous, and hierarchical; these characteristics challenge existing knowledge query processing methods based on relational and graph models. To address these characteristics, we design and implement a product knowledge query processing method that uses CPU and GPU collaborative computing. First, to fully exploit the parallel computing capability of the GPU, we propose a product knowledge storage strategy based on sparse matrices and optimize the storage for product knowledge. Second, based on the sparse-matrix storage structure, we design a query conversion method that transforms a SPARQL query into the corresponding matrix computation, and we extend the join query algorithm to the GPU for acceleration. To verify the effectiveness of the proposed method, we conducted a series of experiments on the LUBM dataset and a semi-synthetic product dataset. The results show that the proposed method not only substantially improves retrieval efficiency on massive product knowledge compared with existing RDF query engines, but also achieves good retrieval performance on a general RDF benchmark dataset; they also confirm the effectiveness of GPU-accelerated query processing.

Key words: product knowledge, heterogeneous environment, RDF data, query processing

CLC number: TP392

Chuangxin FANG, Hao SONG, Yuming LIN, Ya ZHOU. Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment[J]. Journal of East China Normal University (Natural Science), 2021, 2021(5): 157-168.

Dataset	Number of triples	Dataset size (MB)	Number of subjects	Number of predicates	Number of objects
LUBM1	11 429	247	1 248	437	318
LUBM2	25 491	568	2 746	1 017	872
LUBM4	56 317	1 126	6 182	1 973	1 517
LUBM8	139 748	2 457	14 057	3 551	2 749
LUBM16	547 103	5 117	26 149	6 424	5 287
LUBM32	1 218 526	10 854	57 429	10 573	10 566

SPARQL query	Dataset	CPU (ms)	CPU-GPU (ms)
Q1	LUBM1	14	19
	LUBM2	27	36
	LUBM4	79	97
	LUBM8	134	104
	LUBM16	181	144
	LUBM32	247	183
Q2	LUBM1	39	46
	LUBM2	49	51
	LUBM4	89	81
	LUBM8	167	122
	LUBM16	196	158
	LUBM32	269	211
Q3	LUBM1	86	69
	LUBM2	92	74
	LUBM4	117	92
	LUBM8	192	143
	LUBM16	257	184
	LUBM32	302	238
Q4	LUBM1	176	137
	LUBM2	198	156
	LUBM4	244	194
	LUBM8	292	214
	LUBM16	417	286
	LUBM32	497	322
Q5	LUBM1	185	138
	LUBM2	211	146
	LUBM4	268	207
	LUBM8	342	291
	LUBM16	442	327
	LUBM32	523	352
Q6	LUBM1	225	151
	LUBM2	236	170
	LUBM4	325	219
	LUBM8	422	298
	LUBM16	631	327
	LUBM32	983	433

Dataset	Number of triples	Dataset size (MB)	Number of entities	Number of attributes	Number of opinion words
AmazonDataset1	68 311 401	2749	6 831 040	5 094	4 843
AmazonDataset2	127 615 329	5174	12 526 227	11 489	8 441
AmazonDataset3	204 544 709	9830	21 447 852	19 847	15 716

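No.	SPARQL query	Meaning
Q7	SELECT ?p WHERE {C1 <hasProduct> ?p.}	Find all products under product category C1
Q8	SELECT ?a ?o WHERE {C1 <hasProduct> ?p1. C1 <hasProduct> ?p2. ?p1 <hasAttribute> ?a. ?p2 <hasAttribute> ?a. ?a <hasOpinion> ?o.}	Find the attributes, and their opinions, that are shared by different products under product category C1
Q9	SELECT DISTINCT ?p ?a1 ?a2 ?o WHERE {C1 <hasProduct> ?p. ?p <hasAttribute> ?a1. ?p <hasAttribute> ?a2. ?a1 <hasOpinion> ?o. ?a2 <hasOpinion> ?o.}	Find the products under product category C1 that have different attributes sharing the same opinion, together with those attributes and opinions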
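
The abstract describes storing product knowledge as sparse matrices and turning SPARQL patterns into matrix computations. The following is only a minimal CPU-side sketch of that general idea, not the authors' implementation: it assumes one CSR-style adjacency structure per predicate, and the toy triples and the helper names (`build_csr`, `row`, `expand`, `q7`, `attributes_of_category`) are all hypothetical. It evaluates Q7 as a single row lookup and a simplified two-hop path pattern (category to products to attributes) as two successive Boolean vector-matrix products.

```python
from collections import defaultdict

# Toy triples, invented for illustration only.
triples = [
    ("C1", "hasProduct", "p1"),
    ("C1", "hasProduct", "p2"),
    ("C2", "hasProduct", "p3"),
    ("p1", "hasAttribute", "battery"),
    ("p1", "hasAttribute", "screen"),
    ("p2", "hasAttribute", "battery"),
    ("p3", "hasAttribute", "screen"),
    ("battery", "hasOpinion", "durable"),
    ("screen", "hasOpinion", "bright"),
]

# Dictionary-encode subjects and objects as integer ids.
ids = {}
def encode(term):
    return ids.setdefault(term, len(ids))

by_pred = defaultdict(list)
for s, p, o in triples:
    by_pred[p].append((encode(s), encode(o)))
names = {i: t for t, i in ids.items()}

def build_csr(pairs, n_rows):
    """Build (row_ptr, col_idx) arrays from (row, col) pairs."""
    pairs = sorted(set(pairs))
    row_ptr = [0] * (n_rows + 1)
    for r, _ in pairs:
        row_ptr[r + 1] += 1          # count entries per row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives row offsets
    col_idx = [c for _, c in pairs]
    return row_ptr, col_idx

# One sparse adjacency matrix per predicate (an assumed layout).
matrices = {p: build_csr(pairs, len(ids)) for p, pairs in by_pred.items()}

def row(csr, r):
    """Column ids of the non-zero entries in row r."""
    row_ptr, col_idx = csr
    return col_idx[row_ptr[r]:row_ptr[r + 1]]

def expand(frontier, csr):
    """Boolean vector-matrix product: all objects reachable from the frontier."""
    out = set()
    for r in frontier:
        out.update(row(csr, r))
    return out

def q7(category):
    """SELECT ?p WHERE {category <hasProduct> ?p.}"""
    return sorted(names[p] for p in row(matrices["hasProduct"], ids[category]))

def attributes_of_category(category):
    """SELECT ?a WHERE {category <hasProduct> ?p. ?p <hasAttribute> ?a.}"""
    products = expand({ids[category]}, matrices["hasProduct"])
    return sorted(names[a] for a in expand(products, matrices["hasAttribute"]))

print(q7("C1"))
print(attributes_of_category("C1"))
```

Each row lookup in `expand` is independent of the others, which is the kind of work that could in principle be distributed across GPU threads; how the paper actually partitions the join on the GPU is not stated on this page.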

SPARQL query	Dataset	RDF-3X (ms)	gStore (ms)	Proposed method (ms)
Q7	AmazonDataset1	212	223	52
	AmazonDataset2	374	411	103
	AmazonDataset3	789	843	477
Q8	AmazonDataset1	646	749	242
	AmazonDataset2	1147	1241	727
	AmazonDataset3	1976	2664	1616
Q9	AmazonDataset1	2693	1848	1098
	AmazonDataset2	4394	3129	2434
	AmazonDataset3	8487	6213	4247
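
To make the comparison in the table above easier to read, the short script below computes the speedup of the proposed method over RDF-3X and gStore as a plain ratio of query times. The timings are copied from the table; the variable and function names (`times_ms`, `speedups`) are illustrative assumptions and not part of the paper.

```python
# Query times in ms, copied from the table above:
# (RDF-3X, gStore, proposed method)
times_ms = {
    ("Q7", "AmazonDataset1"): (212, 223, 52),
    ("Q7", "AmazonDataset2"): (374, 411, 103),
    ("Q7", "AmazonDataset3"): (789, 843, 477),
    ("Q8", "AmazonDataset1"): (646, 749, 242),
    ("Q8", "AmazonDataset2"): (1147, 1241, 727),
    ("Q8", "AmazonDataset3"): (1976, 2664, 1616),
    ("Q9", "AmazonDataset1"): (2693, 1848, 1098),
    ("Q9", "AmazonDataset2"): (4394, 3129, 2434),
    ("Q9", "AmazonDataset3"): (8487, 6213, 4247),
}

def speedups(rdf3x, gstore, proposed):
    """Return (speedup over RDF-3X, speedup over gStore) as time ratios."""
    return rdf3x / proposed, gstore / proposed

for (query, dataset), t in times_ms.items():
    s_rdf3x, s_gstore = speedups(*t)
    print(f"{query} {dataset}: {s_rdf3x:.2f}x vs RDF-3X, {s_gstore:.2f}x vs gStore")
```

In every row the ratio is greater than 1, so the proposed method is faster than both engines on all three queries. The ratios are largest for Q7 on the smallest dataset (about 4x) and shrink as the dataset grows.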
