Journal of East China Normal University (Natural Science) ›› 2021, Vol. 2021 ›› Issue (5): 157-168. doi: 10.3969/j.issn.1000-5641.2021.05.014

• Data Analysis and Applications •

CPU-GPU异构环境下的大规模商品知识查询处理

方创新, 宋浩, 林煜明*, 周娅

  1. 桂林电子科技大学 广西可信软件重点实验室, 广西 桂林 541004
  • 收稿日期:2021-08-07 出版日期:2021-09-25 发布日期:2021-09-28
  • 通讯作者: 林煜明 E-mail:innofang@163.com;ymlin@guet.edu.cn
  • 基金资助:
    国家自然科学基金(62062027, U1811264); 广西自然科学基金(2018GXNSFDA281049, 2020GXNSFAA159012); 广西创新驱动发展专项资金(桂科AA19046004); 桂林市重点研发计划(2020010304); 桂林电子科技大学研究生教育创新计划资助项目(2021YCXS075); 广西可信软件重点实验室研究课题(kx202021)

Query processing of large-scale product knowledge in a CPU-GPU heterogeneous environment

Chuangxin FANG, Hao SONG, Yuming LIN*, Ya ZHOU

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received:2021-08-07 Online:2021-09-25 Published:2021-09-28
  • Contact: Yuming LIN E-mail:innofang@163.com;ymlin@guet.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62062027, U1811264); Guangxi Natural Science Foundation (2018GXNSFDA281049, 2020GXNSFAA159012); Guangxi Innovation-Driven Development Special Fund (桂科AA19046004); Guilin Key Research and Development Program (2020010304); Innovation Project of Guilin University of Electronic Technology Graduate Education (2021YCXS075); Research Project of Guangxi Key Laboratory of Trusted Software (kx202021)

摘要:

知识图谱是将无结构的知识进行结构化表示和组织的有效途径, 已经成为支持众多智能应用的基础设施. 然而, 与商品相关的知识通常呈现出海量性、异质性和层次性的特点, 这对现有基于关系模型和图模型的知识查询处理方法提出了挑战. 针对商品知识的这些特点, 本文设计与实现了一种利用CPU和GPU协同计算的商品知识查询处理方法. 首先, 为了充分发挥GPU的并行计算能力, 提出了一种基于稀疏矩阵的商品知识存储策略, 并针对商品知识进行存储优化; 其次, 根据稀疏矩阵的存储结构设计了一种查询转换方式, 将SPARQL查询转化为对应的矩阵计算, 并将连接查询算法扩展到GPU上进行加速. 为了验证所提出方法的有效性, 我们在LUBM数据集和一个半合成的商品数据集上进行了一系列的实验. 结果表明, 本文提出的方法, 不仅在海量商品知识下相对于现有RDF查询引擎在检索效率上有较大提升, 而且在通用的RDF标准数据集上也能取得较好的检索性能, 并验证了GPU加速查询处理的有效性.

关键词: 商品知识, 异构环境, RDF数据, 查询处理

Abstract:

Knowledge graphs are an effective way to represent and organize unstructured knowledge in structured form, and they have become infrastructure supporting many intelligent applications. However, product-related knowledge is typically massive in scale, heterogeneous, and hierarchical; these characteristics challenge existing knowledge query processing methods based on relational and graph models. To address these characteristics, we design and implement a product knowledge query processing method that uses CPU and GPU collaborative computing. First, to exploit the parallel computing capability of the GPU, we propose a sparse-matrix-based storage strategy for product knowledge and optimize the storage for product knowledge. Second, based on the sparse matrix storage structure, we design a query conversion method that transforms a SPARQL query into the corresponding matrix computation, and we extend the join query algorithm to the GPU for acceleration. To verify the effectiveness of the proposed method, we conducted a series of experiments on the LUBM dataset and a semi-synthetic product dataset. The results show that the proposed method not only substantially improves retrieval efficiency on large-scale product knowledge compared with existing RDF query engines, but also achieves good retrieval performance on a general-purpose standard RDF benchmark; the experiments also confirm the effectiveness of GPU-accelerated query processing.

Key words: product knowledge, heterogeneous environment, RDF data, query processing
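
The abstract's core idea, storing RDF triples as sparse matrices and turning a SPARQL join into a matrix computation, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one CSR-style boolean matrix per predicate, runs on the CPU in plain Python, and uses hypothetical names throughout (`build_predicate_matrices`, `join_chain`, `two_hop`, and the toy predicates `hasBrand` and `locatedIn`). It answers a two-pattern query of the form `SELECT ?x ?z WHERE { ?x p1 ?y . ?y p2 ?z }` as the nonzero entries of the boolean product of the two predicate matrices.

```python
from collections import defaultdict


def build_predicate_matrices(triples):
    """Encode terms as integer IDs and build one CSR-style matrix per predicate.

    Each matrix is a pair (row_ptr, col_idx): the objects of subject r are
    col_idx[row_ptr[r]:row_ptr[r + 1]]. This layout is an assumption made
    for illustration, not the storage format described in the paper.
    """
    ids = {}
    for s, _, o in triples:
        for term in (s, o):
            ids.setdefault(term, len(ids))

    rows = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        rows[p][ids[s]].add(ids[o])

    n = len(ids)
    matrices = {}
    for p, adj in rows.items():
        row_ptr, col_idx = [0], []
        for r in range(n):
            col_idx.extend(sorted(adj.get(r, ())))
            row_ptr.append(len(col_idx))
        matrices[p] = (row_ptr, col_idx)
    return ids, matrices


def join_chain(m1, m2):
    """Boolean sparse product: all (x, z) with m1[x, y] and m2[y, z] for some y."""
    row_ptr1, col1 = m1
    row_ptr2, col2 = m2
    result = set()
    # Each outer iteration touches only row x, so rows are independent units
    # of work; that independence is what makes this kind of join parallelizable.
    for x in range(len(row_ptr1) - 1):
        for y in col1[row_ptr1[x]:row_ptr1[x + 1]]:
            for z in col2[row_ptr2[y]:row_ptr2[y + 1]]:
                result.add((x, z))
    return result


def two_hop(triples, p1, p2):
    """Answer { ?x p1 ?y . ?y p2 ?z } and decode IDs back to term names."""
    ids, matrices = build_predicate_matrices(triples)
    names = {i: term for term, i in ids.items()}
    pairs = join_chain(matrices[p1], matrices[p2])
    return sorted((names[x], names[z]) for x, z in pairs)


# Toy product knowledge; every name here is made up for the example.
TRIPLES = [
    ("phone_a", "hasBrand", "brand_x"),
    ("phone_b", "hasBrand", "brand_y"),
    ("brand_x", "locatedIn", "city_1"),
    ("brand_y", "locatedIn", "city_2"),
]

print(two_hop(TRIPLES, "hasBrand", "locatedIn"))
```

In this sketch the join variable `?y` never appears in the output; it is consumed as the shared dimension of the product. A GPU version would presumably assign rows or nonzeros to threads, but how the paper partitions the work between CPU and GPU is not stated in the abstract.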

CLC Number: