Journal of East China Normal University (Natural Science), 2014, Vol. 2014, Issue (5): 290-300. doi: 10.3969/j.issn.10005641.2014.05.026

• Computer Science and Technology •

Practice of database schema design in a distributed environment

PANG Tianze, ZHANG Chendong, GAO Ming, GONG Xueqing

  1. Software Engineering Institute, East China Normal University, Shanghai 200062
  • Publication date: 2014-09-25 Release date: 2014-11-27
  • Corresponding author: GONG Xueqing, male, professor, doctoral supervisor; research interest: databases. E-mail: xqgong@sei.ecnu.edu.cn
  • About the author: PANG Tianze, male, master's student; research interest: distributed databases. E-mail: pangtz@ecnu.edu.com.
  • Funding: National Basic Research Program of China (973 Program) project (2010CB731402)

Implementation of database schema design in distributed environment

PANG Tian-Ze, ZHANG Chen-Dong, GAO Ming, GONG Xue-Qing

  1. Software Engineering Institute, East China Normal University, Shanghai 200062, China
  • Online: 2014-09-25 Published: 2014-11-27

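Abstract: In recent years, the explosive growth of data volume has made it difficult for traditional centralized databases to meet business requirements. A distributed database stores data across multiple nodes and therefore scales better, so it can support continuously growing business. Many enterprises have already built successful distributed database products, such as Google Spanner and Taobao's OceanBase. In traditional database schema design, the three normal forms (1NF, 2NF, and 3NF) and their extensions reduce data redundancy and update anomalies and help guarantee data integrity. In a distributed architecture, however, a schema that strictly follows the normal forms can lead to problems such as low query efficiency, whereas a denormalized schema design usually improves query efficiency effectively. OceanBase, a distributed database developed in-house by Taobao, supports cross-row and cross-table transactions and performs well for OLTP, but its performance on OLAP workloads is comparatively poor. Taking OceanBase as an example, this paper describes how to use denormalization to design the schema of a distributed database so as to improve OLAP query performance, and verifies the effectiveness and efficiency of the denormalized schema design by deploying the TPC-H benchmark on OceanBase.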

Keywords: denormalization, distributed database, OceanBase, TPC-H
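The trade-off that the normal forms guard against can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the table contents, the `max_writes` parameter used to simulate an update interrupted partway through, and all function names are hypothetical and are not taken from the paper or from OceanBase. Renaming a nation touches one row in the normalized table but every redundant copy in the denormalized one, and a partially applied update leaves the copies disagreeing, which is an update anomaly.

```python
from typing import Optional

# Hypothetical sample data.
nation = {0: "ALGERIA", 1: "BRAZIL"}  # normalized: n_nationkey -> n_name, stored once
orders_wide = [                       # denormalized: n_name copied into each order row
    {"o_orderkey": 100, "n_nationkey": 0, "n_name": "ALGERIA"},
    {"o_orderkey": 101, "n_nationkey": 1, "n_name": "BRAZIL"},
    {"o_orderkey": 103, "n_nationkey": 0, "n_name": "ALGERIA"},
]


def rename_normalized(table: dict, key: int, new_name: str) -> int:
    """The name lives in exactly one place, so a rename is a single write."""
    table[key] = new_name
    return 1


def rename_denormalized(rows: list, key: int, new_name: str,
                        max_writes: Optional[int] = None) -> int:
    """Rewrite every redundant copy; max_writes (hypothetical) simulates an update that stops midway."""
    written = 0
    for row in rows:
        if row["n_nationkey"] != key:
            continue
        if max_writes is not None and written >= max_writes:
            break
        row["n_name"] = new_name
        written += 1
    return written


def is_consistent(rows: list) -> bool:
    """True if every copy agrees on the name for each n_nationkey."""
    seen = {}
    for row in rows:
        if seen.setdefault(row["n_nationkey"], row["n_name"]) != row["n_name"]:
            return False
    return True
```

For example, `rename_denormalized(orders_wide, 0, "X", max_writes=1)` rewrites only one of the two ALGERIA rows, after which `is_consistent(orders_wide)` returns False; rerunning it without a limit rewrites both copies and restores consistency. Extra writes and the risk of inconsistent copies are the price a denormalized schema pays for cheaper reads.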

Abstract: Recently, we have witnessed an exponential increase in the amount of data, which makes it hard for a centralized database to scale up to massive business requirements. A distributed database (DDB) is an alternative that scales to large applications by distributing data across multiple server nodes. Many enterprises have successfully built distributed databases, such as Google Spanner and Taobao's OceanBase. In traditional database design theory, the normal forms reduce update anomalies and data redundancy and help ensure data integrity. However, a schema that strictly follows the normal forms can make a distributed database system inefficient, because queries then require a large number of distributed relational operations such as joins. Fortunately, denormalization can significantly improve query efficiency by reducing the number of relations and the amount of distributed relational operations. OceanBase, a distributed database developed by Taobao, performs well for OLTP but not for OLAP. In this paper, we show how to use denormalization to design the schema for OceanBase so as to improve its OLAP performance. Finally, an empirical study using the TPC-H benchmark demonstrates the efficiency and effectiveness of the denormalized design for OceanBase.

Key words: denormalization, distributed database, OceanBase, TPC-H
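To make the mechanism concrete, the following Python sketch contrasts a normalized three-table join with a pre-joined (denormalized) table for a TPC-H-style "revenue per nation" query. It is not OceanBase's storage or execution model: the tiny sample rows, the three-node cluster, the hash-by-primary-key placement rule `node_of`, and all function names are hypothetical assumptions chosen for illustration. A lookup counts as remote when the referenced customer or nation row is placed on a different node from the order row being processed.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size

# Hypothetical TPC-H-style sample rows in a normalized schema.
nation = {0: "ALGERIA", 1: "BRAZIL"}          # n_nationkey -> n_name
customer = {10: 0, 11: 1, 12: 1}               # c_custkey -> c_nationkey
orders = [(100, 10, 50.0), (101, 11, 20.0),    # (o_orderkey, o_custkey, o_totalprice)
          (102, 12, 30.0), (103, 10, 5.0)]


def node_of(key: int) -> int:
    """Hypothetical placement rule: hash-partition each row by its primary key."""
    return key % NUM_NODES


def revenue_by_nation_normalized():
    """Join orders -> customer -> nation, counting lookups that must reach another node."""
    totals = defaultdict(float)
    remote_lookups = 0
    for o_key, c_key, price in orders:
        local = node_of(o_key)
        if node_of(c_key) != local:   # customer row lives on another node
            remote_lookups += 1
        n_key = customer[c_key]
        if node_of(n_key) != local:   # nation row lives on another node
            remote_lookups += 1
        totals[nation[n_key]] += price
    return dict(totals), remote_lookups


# Denormalized "wide" table: the join is done once at load time and the
# nation name is copied (redundantly) into every order row.
orders_wide = [(o_key, price, nation[customer[c_key]]) for o_key, c_key, price in orders]


def revenue_by_nation_denormalized():
    """Each node aggregates only its own rows; the small partial results are then merged."""
    partials = [defaultdict(float) for _ in range(NUM_NODES)]
    for o_key, price, n_name in orders_wide:
        partials[node_of(o_key)][n_name] += price
    totals = defaultdict(float)
    for part in partials:
        for n_name, subtotal in part.items():
            totals[n_name] += subtotal
    return dict(totals)


if __name__ == "__main__":
    print(revenue_by_nation_normalized())
    print(revenue_by_nation_denormalized())
```

Both functions produce the same per-nation totals. The normalized version additionally reports how many per-row lookups had to leave the local node (four with this sample data), while the denormalized version aggregates each node's rows locally and merges only small partial results. The cost is the redundancy the normal forms are meant to prevent: the nation name is now stored in every order row.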
