Journal of East China Normal University (Natural Sc...), 2017, Vol. 2017, Issue (5): 11-19. doi: 10.3969/j.issn.1000-5641.2017.05.002

Data Management

Distributed stream processing system for join operations

CHEN Ming-zhu, WANG Xiao-tong, FANG Jun-hua, ZHANG Rong   

  1. School of Computer Science and Software Engineering, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China
  Received: 2017-06-28; Online: 2017-09-25; Published: 2017-09-25

Abstract: Real-time stream processing systems play an increasingly important role in practical applications. Stream join is one of the most important and most expensive operations in big data analysis. However, skewed data distributions in real-world applications, together with inherent features of streaming data such as unboundedness and unpredictability, put great pressure on join processing in distributed stream systems. Mainstream industrial stream systems offer little general support for join processing and provide no programming interface for it; although several academic prototype stream systems address this problem to some extent, they either support only equi-joins or incur high resource consumption and severe load imbalance. In this paper, after analyzing three typical distributed stream systems, we integrate Join-Matrix-based techniques into Storm and design and implement a general stream processing system that supports arbitrary theta joins. Experiments demonstrate that the proposed system outperforms state-of-the-art strategies.
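Join-Matrix partitioning, as it is commonly described for parallel theta joins, arranges the joiner tasks as a grid: each tuple of stream R is replicated to every cell of one row, and each tuple of stream S to every cell of one column, so any R/S pair meets in exactly one cell no matter what the join predicate is. The sketch below is a minimal single-process illustration of that idea under stated assumptions, not the system built in this paper and not Storm's API. The class name `JoinMatrix`, the grid size, random row/column routing, and unbounded per-cell state (no window eviction) are all hypothetical choices made for illustration.

```python
import random
from itertools import product


class JoinMatrix:
    """Hypothetical streaming theta join over a rows x cols grid of joiner cells."""

    def __init__(self, rows, cols, theta, seed=0):
        self.rows, self.cols, self.theta = rows, cols, theta
        self.rng = random.Random(seed)
        # Each cell stores every R and S tuple routed to it (assumption: no window eviction).
        self.cells = {cell: {"R": [], "S": []}
                      for cell in product(range(rows), range(cols))}

    def _targets(self, side):
        # An R tuple goes to all cells of one random row;
        # an S tuple goes to all cells of one random column.
        if side == "R":
            row = self.rng.randrange(self.rows)
            return [(row, c) for c in range(self.cols)]
        col = self.rng.randrange(self.cols)
        return [(r, col) for r in range(self.rows)]

    def process(self, side, value):
        """Insert one tuple and return the (r, s) results it produces."""
        other = "S" if side == "R" else "R"
        out = []
        for cell in self._targets(side):
            state = self.cells[cell]
            # Probe the opposite side's stored tuples with the arbitrary predicate.
            for stored in state[other]:
                r, s = (value, stored) if side == "R" else (stored, value)
                if self.theta(r, s):
                    out.append((r, s))
            # Store the new tuple so later arrivals from the other stream can probe it.
            state[side].append(value)
        return out


# Usage: a non-equi (theta) join with predicate r > s.
jm = JoinMatrix(rows=2, cols=3, theta=lambda r, s: r > s)
results = []
for side, v in [("R", 5), ("S", 3), ("S", 7), ("R", 2), ("S", 1)]:
    results.extend(jm.process(side, v))
print(sorted(results))  # prints [(2, 1), (5, 1), (5, 3)]
```

Because an R tuple routed to row i and an S tuple routed to column j share only cell (i, j), and whichever arrives later probes the one stored earlier, each pair is evaluated exactly once; this is what lets the matrix handle arbitrary theta predicates, at the price of replicating every tuple across a full row or column.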

Key words: stream processing system, join processing, distributed computing
