窗口函数作为一种分析型的OLAP函数加入SQL (Structured Query Language)标准已有十多年,而且随着分析型应用需求的增长窗口函数有着越来越广泛的应用前景.窗口函数的语法非常简单,却可以表达诸如rank、moving average、cumulative sum等复杂的查询.尽管目前主流的商业数据库几乎都支持窗口函数,但是现有的执行策略效率低下,不能满足大批量数据的处理需求.本文主要针对窗口函数中MIN和MAX聚集函数,提出了一种改进的IM2优化策略,可以有效地提升窗口函数的执行效率.本文不仅从时空复杂性理论分析层面进行了证明,而且与已有算法进行了对比实验,证明了本文方法的高效性;另外在目前主流的开源数据库PostgreSQL中实现本文算法,与SQL Server对比有着显著的优化效果.
Window functions, also known as analytic OLAP functions, is a part of the SQL standard, and has been extensively studied during the past decade. And the window function has more and more extensive application prospects withthegrowthofthedemandstheanalyticalapplications. Despite its simple syntax, window functions can express many complex queries, such as ranking, moving average, cumulative sum and so on. Although almost all the current mainstream commercial database support window function, the existing implementation strategy is inefficient, and is not suitable for processing big data. In this paper, we propose the IM2 algorithm, an improved algorithm for the MAX/MIN window functions, which effectively improves the efficiency. And we prove the effectiveness of the IM2 algorithm the theoretical complexity analysis. Additionally, we implement the algorithm in PostgreSQL and conduct extensive experiments on real world data to demonstrate the efficiency of the IM2 algorithm.
[1] BELLAMKONDA S, AHMED R, WITKOWSKI A, et al. Enhanced subquery optimizations in oracle[J]. Proceedings of the VLDB Endowment, 2009, 2(2):1366-1377.
[2] EISENBERG A, MELTON J. Formerly known as SQL3[J]. ACM SIGMOD Record, 1999, 28(12):131-138.
[3] EISENBERG A, MELTON J, KULKARNI K, et al. SQL:2003 has been published[J]. ACM SIGMOD Record, 2004, 33(1):119-126.
[4] JIN C Q, YI K, CHEN L, et al. Sliding-window top-k, queries on uncertain streams[J]. The VLDB Journal, 2010, 19(3):411-435.
[5] LI J, MAIER D, TUFTE K, et al. Semantics and evaluation techniques for window aggregates in data streams[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 2005:311-322.
[6] LI J, MAIER D, TUFTE K, et al. No pane, no gain:Efficient evaluation of sliding-window aggregates over data streams[J]. ACM SIGMOD Record, 2005, 34(1):39-44.
[7] GUIRGUIS S, SHARAF M A, CHRYSANTHIS P K, et al. Optimized processing of multiple aggregate continuous queries[C]//Proceedings of the 20th ACM international conference on Information and Knowledge Management. ACM, 2012:1515-1524.
[8] ARASU A, WIDOM J. Resource sharing in continuous sliding-window aggregates[C]//Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004:336-347.
[9] LAW Y N, WANG H X, ZANIOLO C. Relational languages and data models for continuous queries on sequences and data streams[J]. ACM Transactions on Database Systems, 2011, 36(2):8:1-8:32
[10] CHATZIANTONIOU D, ROSS K A. Querying multiple features of groups in relational databases[C]//Proceedings of the 22nd International Conference on Very Large Data Bases. 1997:295-306.
[11] BELLAMKONDA S, BORZKAYA T, GHOSH B, et al. Analytic functions in oracle 8i[R/OL].[2016-09-01]. http://www-db.stanford.edu/dbseminar/Archive/SpringY2000/speakers/agupta/paper.pdf.
[12] CAO Y, CHAN C Y, LI J, et al. Optimization of analytic window functions[J]. Proceedings of the VLDB Endowment, 2012, 5(11):1244-1255.
[13] NEUMANN T, MOERKOTTE G. A combined framework for grouping and order optimization[C]//Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004:960-971.
[14] SIMMEN D, SHEKITA E, MALKEMUS T. Fundamental techniques for order optimization[C]//Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. ACM, 1996:57-67.
[15] WANG X Y, CHERNIACK M. Avoiding sorting and grouping in processing queries[C]//Proceedings of the International Conference on Very Large Data Bases. VLDB Endowment, 2003:826-837.
[16] LEIS V, KUNDHIKANJANA K, KEMPER A, et al. Efficient processing of window functions in analytical SQL queries[J]. Proceedings of the VLDB Endowment, 2015, 8(10):1058-1069.
[17] WESLEY R, XU F. Incremental computation of common windowed holistic aggregates[J]. Proceedings of the VLDB Endowment, 2016, 9(12):1221-1232.
[18] 马建松, 王科强, 宋光旋,等. 面向MAX/MIN优化的SQL Window函数处理[J]. 计算机学报, 2016, 39(10):2149-2160.
[19] LIN X, YUAN Y, WANG W, et al. Stabbing the sky:Efficient skyline computation over sliding windows[C]//Proceedings of the 21st International Conference on Data Engineering. IEEE, 2005:502-513.