计算机科学

IM2:一种改进的MIN/MAX窗口函数优化技术

  • 宋光旋 ,
  • 赵大鹏 ,
  • 王晓玲
展开
  • 华东师范大学 计算机科学与软件工程学院 上海市高可信计算重点实验室, 上海 200062
宋光旋,男,硕士研究生,研究方向为数据库.E-mail:guangxuan_song@163.com.

收稿日期: 2016-10-21

  网络出版日期: 2018-01-11

基金资助

国家自然科学基金(61170085,61472141);上海市重点学科建设项目(B412);上海市可信物联网软件协同创新中心项目(ZF1213)

IM2: Improved MIN/MAX window functions optimizer in relational database

  • SONG Guang-xuan ,
  • ZHAO Da-peng ,
  • WANG Xiao-ling
Expand
  • Shanghai Key Laboratory of Trustworthy Computing, School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China

Received date: 2016-10-21

  Online published: 2018-01-11

摘要

窗口函数作为一种分析型的OLAP函数加入SQL (Structured Query Language)标准已有十多年,而且随着分析型应用需求的增长窗口函数有着越来越广泛的应用前景.窗口函数的语法非常简单,却可以表达诸如rank、moving average、cumulative sum等复杂的查询.尽管目前主流的商业数据库几乎都支持窗口函数,但是现有的执行策略效率低下,不能满足大批量数据的处理需求.本文主要针对窗口函数中MIN和MAX聚集函数,提出了一种改进的IM2优化策略,可以有效地提升窗口函数的执行效率.本文不仅从时空复杂性理论分析层面进行了证明,而且与已有算法进行了对比实验,证明了本文方法的高效性;另外在目前主流的开源数据库PostgreSQL中实现本文算法,与SQL Server对比有着显著的优化效果.

本文引用格式

宋光旋 , 赵大鹏 , 王晓玲 . IM2:一种改进的MIN/MAX窗口函数优化技术[J]. 华东师范大学学报(自然科学版), 2018 , 2018(1) : 103 -116 . DOI: 10.3969/j.issn.1000-5641.2018.01.010

Abstract

Window functions, also known as analytic OLAP functions, is a part of the SQL standard, and has been extensively studied during the past decade. And the window function has more and more extensive application prospects withthegrowthofthedemandstheanalyticalapplications. Despite its simple syntax, window functions can express many complex queries, such as ranking, moving average, cumulative sum and so on. Although almost all the current mainstream commercial database support window function, the existing implementation strategy is inefficient, and is not suitable for processing big data. In this paper, we propose the IM2 algorithm, an improved algorithm for the MAX/MIN window functions, which effectively improves the efficiency. And we prove the effectiveness of the IM2 algorithm the theoretical complexity analysis. Additionally, we implement the algorithm in PostgreSQL and conduct extensive experiments on real world data to demonstrate the efficiency of the IM2 algorithm.

参考文献

[1] BELLAMKONDA S, AHMED R, WITKOWSKI A, et al. Enhanced subquery optimizations in oracle[J]. Proceedings of the VLDB Endowment, 2009, 2(2):1366-1377.
[2] EISENBERG A, MELTON J. Formerly known as SQL3[J]. ACM SIGMOD Record, 1999, 28(12):131-138.
[3] EISENBERG A, MELTON J, KULKARNI K, et al. SQL:2003 has been published[J]. ACM SIGMOD Record, 2004, 33(1):119-126.
[4] JIN C Q, YI K, CHEN L, et al. Sliding-window top-k, queries on uncertain streams[J]. The VLDB Journal, 2010, 19(3):411-435.
[5] LI J, MAIER D, TUFTE K, et al. Semantics and evaluation techniques for window aggregates in data streams[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 2005:311-322.
[6] LI J, MAIER D, TUFTE K, et al. No pane, no gain:Efficient evaluation of sliding-window aggregates over data streams[J]. ACM SIGMOD Record, 2005, 34(1):39-44.
[7] GUIRGUIS S, SHARAF M A, CHRYSANTHIS P K, et al. Optimized processing of multiple aggregate continuous queries[C]//Proceedings of the 20th ACM international conference on Information and Knowledge Management. ACM, 2012:1515-1524.
[8] ARASU A, WIDOM J. Resource sharing in continuous sliding-window aggregates[C]//Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004:336-347.
[9] LAW Y N, WANG H X, ZANIOLO C. Relational languages and data models for continuous queries on sequences and data streams[J]. ACM Transactions on Database Systems, 2011, 36(2):8:1-8:32
[10] CHATZIANTONIOU D, ROSS K A. Querying multiple features of groups in relational databases[C]//Proceedings of the 22nd International Conference on Very Large Data Bases. 1997:295-306.
[11] BELLAMKONDA S, BORZKAYA T, GHOSH B, et al. Analytic functions in oracle 8i[R/OL].[2016-09-01]. http://www-db.stanford.edu/dbseminar/Archive/SpringY2000/speakers/agupta/paper.pdf.
[12] CAO Y, CHAN C Y, LI J, et al. Optimization of analytic window functions[J]. Proceedings of the VLDB Endowment, 2012, 5(11):1244-1255.
[13] NEUMANN T, MOERKOTTE G. A combined framework for grouping and order optimization[C]//Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004:960-971.
[14] SIMMEN D, SHEKITA E, MALKEMUS T. Fundamental techniques for order optimization[C]//Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. ACM, 1996:57-67.
[15] WANG X Y, CHERNIACK M. Avoiding sorting and grouping in processing queries[C]//Proceedings of the International Conference on Very Large Data Bases. VLDB Endowment, 2003:826-837.
[16] LEIS V, KUNDHIKANJANA K, KEMPER A, et al. Efficient processing of window functions in analytical SQL queries[J]. Proceedings of the VLDB Endowment, 2015, 8(10):1058-1069.
[17] WESLEY R, XU F. Incremental computation of common windowed holistic aggregates[J]. Proceedings of the VLDB Endowment, 2016, 9(12):1221-1232.
[18] 马建松, 王科强, 宋光旋,等. 面向MAX/MIN优化的SQL Window函数处理[J]. 计算机学报, 2016, 39(10):2149-2160.
[19] LIN X, YUAN Y, WANG W, et al. Stabbing the sky:Efficient skyline computation over sliding windows[C]//Proceedings of the 21st International Conference on Data Engineering. IEEE, 2005:502-513.
文章导航

/