Journal of East China Normal University(Natural Sc ›› 2018, Vol. 2018 ›› Issue (5): 120-134,153. doi: 10.3969/j.issn.1000-5641.2018.05.010


Distributed spatio-textual analytics based on the Spark platform

XU Yang1, WANG Zhi-jie2, QIAN Shi-you1   

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
  • Received: 2018-07-09 Online: 2018-09-25 Published: 2018-09-26

Abstract: With the rapid development of location-based services, spatio-textual data analytics is becoming increasingly important; for instance, it is widely used in social recommendation applications. However, analyzing large spatio-textual datasets efficiently in a centralized environment remains a major challenge. This paper explores distributed algorithms for spatio-textual analytics on the Spark platform. Specifically, we propose a scalable two-level index framework that processes spatio-textual queries in two steps. The global index is highly scalable and retrieves candidate partitions with only a few false positives. The local index, which exploits the pruning ability of infrequent keywords, is then applied within each candidate partition. We implemented the proposed distributed algorithms in Spark, and extensive experiments demonstrate promising performance for the proposed solution.
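The two-step query processing described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual index structures, so everything below is an assumption made for illustration: a uniform grid stands in for the partitioning, the global index stores a bounding box and a keyword vocabulary per partition, and the local index is a plain inverted list probed by the rarest query keyword. Plain Python is used in place of Spark, and all names (`build_indexes`, `query`, `CELL`, the sample data) are hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout: (object id, x, y, set of keywords).
OBJECTS = [
    (1, 1.0, 1.0, {"coffee", "wifi"}),
    (2, 2.0, 3.0, {"coffee", "bakery"}),
    (3, 8.0, 8.0, {"coffee", "wifi"}),
    (4, 9.0, 7.0, {"sushi"}),
]

CELL = 5.0  # assumed grid cell width used to form partitions


def partition_key(x, y):
    """Map a point to the grid cell (partition) that contains it."""
    return (int(x // CELL), int(y // CELL))


def build_indexes(objects):
    """Build a per-partition global summary and a local inverted index."""
    parts = defaultdict(list)
    for obj in objects:
        parts[partition_key(obj[1], obj[2])].append(obj)

    global_index, local_indexes = {}, {}
    for key, objs in parts.items():
        xs = [o[1] for o in objs]
        ys = [o[2] for o in objs]
        vocab = set().union(*(o[3] for o in objs))
        # Global entry: bounding box of the partition plus its vocabulary.
        global_index[key] = ((min(xs), min(ys), max(xs), max(ys)), vocab)
        # Local entry: keyword -> objects in this partition containing it.
        inverted = defaultdict(list)
        for o in objs:
            for kw in o[3]:
                inverted[kw].append(o)
        local_indexes[key] = inverted
    return global_index, local_indexes


def intersects(a, b):
    """Axis-aligned rectangle overlap test on (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(global_index, local_indexes, rect, keywords):
    """Return ids of objects inside rect that contain every query keyword."""
    results = []
    for key, (mbr, vocab) in global_index.items():
        # Step 1 (global): keep partitions that may contain answers.
        # A kept partition can still be a false positive.
        if not intersects(mbr, rect) or not keywords <= vocab:
            continue
        inverted = local_indexes[key]
        # Step 2 (local): scan only the posting list of the rarest keyword.
        rarest = min(keywords, key=lambda kw: len(inverted[kw]))
        for oid, x, y, kws in inverted[rarest]:
            inside = rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]
            if inside and keywords <= kws:
                results.append(oid)
    return sorted(results)


G, L = build_indexes(OBJECTS)
```

In this sketch, `query(G, L, (0, 0, 10, 10), {"sushi", "coffee"})` passes the global filter for one partition because both keywords occur somewhere in it, yet no single object contains both, so the local step returns nothing. That is the kind of false positive a global index can only reduce, not eliminate.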

Key words: distributed processing, spatio-textual analytics, similarity join
