Journal of East China Normal University (Natural Science), 2020, Vol. 2020, Issue (5): 179-188. doi: 10.3969/j.issn.1000-5641.202091003

• Data Middle Platform Applications •

Discovering traveling companions using autoencoders

LI Xiaochang, CHEN Bei, DONG Qiwen, LU Xuesong

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2020-08-02 Published: 2020-09-24
  • Corresponding author: LU Xuesong, male, associate research fellow; research interests: computational education, financial technology, and natural language processing. E-mail: xslu@dase.ecnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61672234, U1711262)

Abstract: With the widespread adoption of mobile devices, location tracking systems continuously produce large volumes of trajectory data. Many applications, including intelligent transportation systems and intelligent advertising, need to discover moving objects that travel together (i.e., traveling companions) from their trajectories. Existing algorithms are based either on pattern mining, which matches traveling companions against a predefined pattern, or on representation learning, which learns similar representations for similar trajectories. The former suffers from the pairwise point-matching problem, and the latter often ignores the temporal proximity between trajectories. To address these issues, we propose Mean-Attn (Mean-Attention), a deep representation learning model based on autoencoders, for discovering traveling companions. Mean-Attn injects spatial and temporal information into the input embeddings of a trajectory, using low-dimensional dense vector representations learned with skip-gram for the former and positional encoding for the latter. In addition, the model encourages trajectories to learn from their neighbors by leveraging the sort-tile-recursive (STR) algorithm, the mean operation, and a global attention mechanism. After obtaining the trajectory representations from the encoder, we cluster them with DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to find traveling companions. Experimental results suggest that Mean-Attn outperforms both traditional data mining algorithms and recent deep learning algorithms at finding traveling companions.

Key words: traveling companion, autoencoders, spatiotemporal information, sort-tile-recursive algorithm, attention mechanism
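
The final step described in the abstract, clustering trajectory representations with DBSCAN so that each cluster is a group of traveling companions, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the toy 3-dimensional `embeddings`, the `eps` and `min_pts` values, and the helper names `region_query`, `dbscan`, and `companion_groups` are all assumptions made for this example. In the paper, the vectors would come from the Mean-Attn encoder.

```python
import math

NOISE = -1  # label for trajectories that belong to no companion group


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def companion_groups(labels):
    """Group trajectory indices by cluster label, skipping noise."""
    groups = {}
    for idx, label in enumerate(labels):
        if label != NOISE:
            groups.setdefault(label, []).append(idx)
    return list(groups.values())


# Hypothetical trajectory representations (stand-ins for encoder outputs).
embeddings = [
    (0.10, 0.20, 0.05), (0.12, 0.18, 0.07), (0.11, 0.22, 0.04),  # close together
    (0.90, 0.80, 0.95), (0.88, 0.83, 0.93),                      # close together
    (0.50, 0.10, 0.60),                                          # isolated
]

labels = dbscan(embeddings, eps=0.1, min_pts=2)
print(companion_groups(labels))  # [[0, 1, 2], [3, 4]]
```

Because DBSCAN does not need the number of clusters in advance and labels sparse points as noise, trajectories with no nearby representation (index 5 above) are left out of every companion group instead of being forced into one.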

CLC number: