索引在提高查询性能方面可以起到至关重要的作用, 数据库管理员的一项重要工作是为数据库选择合适的索引. 但随着数据库工作负载的不断复杂、数据量的持续增加、库中关系表的逐渐增多, 人为地分析工作负载成本、选择合适的索引并估算数据库空间变化情况变得越发困难. 本文设计了一款面向负载的自动化索引推荐工具——CedarAdvisor. 它从日志中自动化收集负载, 统计查询频率, 在为单条查询生成候选索引的基础上, 评估索引收益与代价, 通过动态规划算法生成整个负载的索引推荐. 最后我们在分布式数据库上验证了工具的有效性.
Indexes play a pivotal role in improving the performance of queries. One of the important tasks of database administrators is to select appropriate indexes for databases. However, with the increasing complexity of database workloads, the rising amount of data volumes, and the gradually increasing number of tables, it has become more and more difficult to artificially analyze workload cost, select appropriate indexes, and estimate database spatial changes. This paper designs a load-oriented automatic indexing recommendation tool: CedarAdvisor. It collects workload automatically from logs and counts query frequency. First, it generates a candidate index for each query and then evaluates the respective revenue and cost of these candidates. Finally, it generates an indexing recommendation for the complete workload through a dynamic programming algorithm. We verify the effectiveness of the tool through experiments on the distributed database.
[1] GANTZ J, REINSEL D. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east [EB/OL]. [2020-05-22]. https://www.emc.com/leadership/digital-universe/2012iview/index.htm.
[2] KAO K F, LIAO I E. An index selection method without repeated optimizer estimations [J]. Information Sciences, 2009, 179(13): 2263-2272
[3] CHAUDHURI S, NARASAYYA V. AutoAdmin “what-if” index analysis utility [C]//Acm Sigmod International Conference on Management of Data. ACM, 1998.
[4] CHAUDHURI S, NARASAYYA V R. Microsoft index tuning wizard for SQL server 7.0 [C]//Proceedings ACM SIGMOD International Conference on Management of Data, 1998.
[5] AGRAWAL S, CHAUDHURI S, KOLLAR L, et al. Database tuning advisor for microsoft SQL server 2005: Demo [C]// Acm Sigmod International Conference on Management of Data. ELSEVIER, 2005.
[6] CHAUDHURI S, NARASAYYA V. An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server [C]// International Conference on Very Large Data Bases. [S.l.]: Morgan Kaufmann Publishers Inc, 1997.
[7] LOHMAN G M, LIGHTSTONE S. SMART: Making DB2 (more) autonomic [C]// Proc of the 28th International Conference on Very Large Data Bases. [S.l.]: VLDB Endowment, 2002: 877-879.
[8] VALENTIN G, ZULIANI M, ZILIO D C, et al. DB2 advisor: An optimizer smart enough to recommend its own indexes [C]// Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073). IEEE, 2002.
[9] ZILIO D C, RAO J, LIGHTSTONE S, et al. DB2 design advisor: Integrated automatic physical database design [C]//Proceedings of the 30th Annual International Conference on Very Large Data Bases, 2004: 1087-1097.
[10] XIAOMI. SOAR [EB/OL]. [2020-05-22]. https://github.com/XiaoMi/soar.
[11] Data Science & Engineering, East China Normal University. CEDAR. [EB/OL]. [2020-05-22]. https://github.com/daseECNU/Cedar/.
[12] LAHDENMÄKI T, LEACH M. Relational Database Index Design and the Optimizer [M]. [S.l.]: John Wiley & Sons, Inc, 2005: 47.
[13] The PostgreSQL Global Development Group. PostgreSQL [EB/OL]. [2020-05-22]. https://www.postgresql.org/.
[14] TPC. TPC-DS [EB/OL]. [2020-05-22]. http://www.tpc.org/tpcds/.