Article

LCDJ: Locality conscious join algorithm for in-memory cluster computing

  • ZHANG Lei ,
  • ZHOU Min-Qi ,
  • WANG Li
Expand
  • Data Science and Engineering Institute, Software Engineering Institute, East China Normal University, Shanghai 200062, China

Online published: 2014-11-27

Abstract

Equal join is one of the most important operators in database systems. Hash join is an efficient algorithm for equal join. In distributed inmemory database system, data tables are partitioned across multiple nodes. Hash join needs two input tables to be repartitioned on the joined attributes under the same hash function before local join, to make sure that tuples from the two tables with the same join values are transferred to the same node. Since the speed of data processing in memory is much faster than the speed of network, data repartition occupies a large amount of time and becomes the bottleneck of equal join in distributed in-memory database. This paper proposes a novel equal join algorithm, which takes full advantages of in-memory computing and reduces the volume of data to be transferred. The algorithm first accumulates accurate statistics on the joined attributes of two tables, and then builds a cost model to measure the cost of different schedule strategies, and generates the optimized schedule strategy. Furthermore, the degree of parallelism and computing load balance are taken into consideration in our cost model. The proposed algorithm is implemented on our in-memory distributed prototype system Claims. Extensive experiment confirms that the algorithm effectively reduces the cost of network communication, improves the query response time by a huge margin, and gets a higher performance than Hive and Shark.

Cite this article

ZHANG Lei , ZHOU Min-Qi , WANG Li . LCDJ: Locality conscious join algorithm for in-memory cluster computing[J]. Journal of East China Normal University(Natural Science), 2014 , 2014(5) : 228 -239 . DOI: 10.3969/j.issn.10005641.2014.05.020

Outlines

/