Journal of East China Normal University(Natural Sc

Previous Articles     Next Articles

Distributed and scalable stream join algorithm

WANG Xiao-tong, FANG Jun-hua, ZHANG Rong   

  1. Institute for Data Science and Engineering, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China
  • Received:2016-06-24 Online:2016-09-25 Published:2016-11-29

Abstract:

Join-Matrix is a high-performance model for stream join processing in a parallel shared-nothing environment, which supports arbitrary join operations and is
resilient to data skew for taking random tuple distribution as its routing policy. To evenly distribute workload and minimize network communication cost, designing an efficient partitioning policy on the matrix is particularly essential. In this paper, we propose a novel stream join operator that continuously adjust its partitioning scheme to real-time data dynamics. Specifically, based on the sample statistics of streams and rated load of each physical machine, a lightweight scheme generator produces a partitioning scheme; then the corresponding solutions for state relocation are generated by a migration plan generator to minimize migration cost while ensuring result correctness.Our experiments on different kinds of data sets demonstrate that our operator outperforms the static-of-the-art strategies in resource utilization, throughput and system latency.

Key words: stream join processing, Join-Matrix, partitioning scheme, distributed computing