Materialization is an essential operation in query execution, and both the materialization strategy and its implementation strongly affect query performance. It is therefore necessary to design a materialization strategy suited to column-store databases. To address the shortcomings of early materialization and late materialization, we propose a different strategy named Smart materialization. For this purpose we define a concept in the logical query plan called a projection: a structure that selects the required attributes by splitting the physical table by columns. Because projections are fixed at the beginning of a query, they reduce the amount of data loaded directly into memory and avoid additional overhead. In the logical query plan, projections are formed by partitioning columns. Based on the relevance among the queries in a query set, the columns that will be needed next are predicted, and the required columns are placed together in the most appropriate projection. We verify the effectiveness of the strategy with the TPC-H data set on CLAIMS, a distributed in-memory database.
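The abstract does not spell out how columns are grouped into projections. As a rough illustration of the idea of building projections from query relevance, the Python sketch below places columns that are frequently read by the same queries into the same projection (a simple union-find clustering over co-access counts, in the style of attribute-affinity vertical partitioning) and then loads only the projections a query touches. The schema, the workload, the `min_support` threshold, and all function names are hypothetical; this is not the algorithm implemented in CLAIMS.

```python
from collections import Counter
from itertools import combinations

# Hypothetical schema and workload (column names borrowed from TPC-H lineitem).
# Each query is reduced to the set of columns it reads.
COLUMNS = [
    "l_orderkey", "l_partkey", "l_quantity", "l_extendedprice", "l_discount",
    "l_tax", "l_returnflag", "l_linestatus", "l_shipdate", "l_comment",
]
WORKLOAD = [
    {"l_returnflag", "l_linestatus", "l_quantity", "l_extendedprice",
     "l_discount", "l_tax", "l_shipdate"},
    {"l_extendedprice", "l_discount", "l_shipdate", "l_quantity"},
    {"l_orderkey", "l_extendedprice", "l_discount"},
    {"l_partkey", "l_extendedprice", "l_discount", "l_shipdate"},
    {"l_returnflag", "l_linestatus", "l_orderkey"},
]


def co_access_counts(workload):
    """Count how often each unordered pair of columns is read by the same query."""
    counts = Counter()
    for cols in workload:
        for a, b in combinations(sorted(cols), 2):
            counts[(a, b)] += 1
    return counts


def build_projections(columns, workload, min_support):
    """Partition `columns` into disjoint projections (hypothetical heuristic).

    Two columns share a projection if they are connected by a chain of pairs
    that were co-accessed in at least `min_support` queries (union-find).
    """
    parent = {c: c for c in columns}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), n in co_access_counts(workload).items():
        if n >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for c in columns:
        groups.setdefault(find(c), set()).add(c)
    # Sort so the output order is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def projections_for(query_cols, projections):
    """Projections that must be loaded into memory to answer the query."""
    return [p for p in projections if p & query_cols]


def loaded_columns(query_cols, projections):
    """All columns brought into memory: the requested ones plus their
    co-located neighbours, which act as the 'predicted' next columns."""
    return set().union(*projections_for(query_cols, projections))


if __name__ == "__main__":
    projs = build_projections(COLUMNS, WORKLOAD, min_support=2)
    for p in projs:
        print(sorted(p))
    query = {"l_extendedprice", "l_discount"}
    print("load:", sorted(loaded_columns(query, projs)))
```

With `min_support=2`, a query that reads only `l_extendedprice` and `l_discount` loads a single four-column projection, so `l_quantity` and `l_shipdate`, which this workload tends to read alongside them, are already in memory for the next query, while never-used columns such as `l_comment` stay in their own projection and are not loaded.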
ZHANG Han, ZHOU Min-qi. Design and implementation of Smart materialization for column-store in CLAIMS[J]. Journal of East China Normal University (Natural Science), 2017, 2017(5): 30-39.
DOI: 10.3969/j.issn.1000-5641.2017.05.004