Journal of East China Normal University (Natural Science) ›› 2022, Vol. 2022 ›› Issue (5): 195-207. doi: 10.3969/j.issn.1000-5641.2022.05.016

• Logistics Spatio-temporal Data Analysis and Intelligent Optimization Theory

基于钢铁物流数据的索引与查询技术研究

邹韬, 钱荣涛, 毛嘉莉*

  1. 华东师范大学 数据科学与工程学院, 上海 200062
  • 收稿日期:2022-07-23 出版日期:2022-09-25 发布日期:2022-09-26
  • 通讯作者: 毛嘉莉 E-mail:jlmao@dase.ecnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62072180)

Indexing and query technology for steel logistic data

Tao ZOU, Rongtao QIAN, Jiali MAO*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2022-07-23 Online: 2022-09-25 Published: 2022-09-26
  • Contact: Jiali MAO E-mail: jlmao@dase.ecnu.edu.cn

摘要:

随着钢铁物流的数字化转型发展, 钢铁物流数据的规模也迎来快速增长, 传统的关系型数据库已无法满足海量钢铁物流数据的存储与查询需求. 考虑分布式NoSQL (Not Only Structured Query Language) 数据库具有扩展简单、读写速度快且成本低的特点, 本文利用分布式云存储与NoSQL技术, 对海量钢铁物流数据进行存储并构建索引, 以提高对物流数据的存储能力与查询性能. 首先, 利用Spark对不同来源的数据进行关联与融合, 再将货运平台产生的历史数据与实时数据分级存储管理; 然后, 针对钢铁运输中主要涉及的3类查询构建时空索引和属性索引, 实现对多源物流数据的高效查询; 最后, 基于钢铁物流真实数据的实验结果表明, 本文所提出的方案在数据写入、存储和查询等方面优于传统关系型数据库的索引查询方法, 能够有效支撑海量物流数据的存储和查询.

关键词: 钢铁物流数据, 分布式数据库, 时空索引, 查询

Abstract:

With the digital transformation of steel logistics, the volume of steel logistic data has grown rapidly, and traditional relational databases can no longer meet the storage and query requirements of such massive data. Because distributed NoSQL (Not Only Structured Query Language) databases are simple to scale, fast to read and write, and low in cost, this study uses distributed cloud storage and NoSQL technologies to store massive steel logistic data and build indexes over it, improving both storage capacity and query performance. First, Spark is used to associate and fuse data from different sources, and the historical and real-time data generated by the freight platform are then stored and managed in tiers. Next, spatio-temporal and attribute indexes are built for the three types of queries mainly involved in steel transportation, enabling efficient queries over multi-source logistic data. Finally, experiments on real steel logistic data show that the proposed scheme outperforms traditional relational-database indexing and query methods in data writing, storage, and querying, and can effectively support the storage and querying of massive logistic data.

Key words: steel logistic data, HBase, spatio-temporal index, query

CLC Number: