Recent years have seen exponential growth in data volume, and a centralized database is hard to scale up to meet such massive business requirements. A distributed database (DDB) is an alternative that scales to large applications by distributing the data across multiple server nodes. Many enterprises have already deployed distributed databases successfully, such as Google Spanner and TaoBao OceanBase. In traditional database design theory, the normal forms reduce update anomalies and data redundancy and help ensure data integrity. However, a schema that strictly follows the normal forms makes a distributed database system inefficient, because queries then require a large number of distributed relational operations. Denormalization can significantly improve query efficiency by reducing both the number of relations and the number of distributed relational operations. OceanBase, a distributed database implemented by TaoBao, delivers high performance for OLTP workloads but not for OLAP workloads. In this paper, we describe how to use denormalization to design the schema for OceanBase and improve its OLAP performance. Finally, we demonstrate the efficiency and effectiveness of the denormalized design for OceanBase in an empirical study using the TPC-H benchmark.
PANG Tian-Ze, ZHANG Chen-Dong, GAO Ming, GONG Xue-Qing. Implementation of database schema design in distributed environment[J]. Journal of East China Normal University (Natural Science), 2014, 2014(5): 290-300.
DOI: 10.3969/j.issn.10005641.2014.05.026