The storage and query performance of large-scale comment data strongly affects the responsiveness of the applications built on it. In a heterogeneous computing environment, nodes differ in compute and storage capability, so a key challenge is to exploit each node's capability fully while optimizing storage and query performance for large-scale comment data. Building on the data-management strengths of Redis Cluster, we first propose a storage model for large-scale comment data in a homogeneous cluster that balances storage across Redis slots. We then analyze the relationship between the number of slots on a node and that node's query efficiency and, following the principle of balancing load against access performance, design a method for computing node load and allocating storage in a heterogeneous cluster, which makes full use of the capability of each node in a heterogeneous Redis cluster. Experimental results show that the proposed storage model balances storage well and improves the overall query efficiency of the cluster.
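To make the idea of capacity-aware slot allocation concrete, the following is a minimal sketch, not the method proposed in the paper. It relies on two facts about Redis Cluster: the key space is divided into 16384 hash slots, and a key's slot is CRC16 (XMODEM variant) of the key modulo 16384. Everything else is an assumption for illustration: the per-node `weights` stand in for whatever load and performance measure the paper computes, the node names are hypothetical, and slots are split proportionally to those weights with a largest-remainder rule. Hash tags (`{...}`) are ignored for brevity.

```python
import binascii
from typing import Dict

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Slot of a key: CRC16 (XMODEM) modulo 16384. Hash tags are ignored here."""
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def allocate_slots(weights: Dict[str, float]) -> Dict[str, range]:
    """Split all slots into contiguous ranges proportional to node weights.

    `weights` is a hypothetical capacity score per node (higher means the
    node should hold more slots); it is an assumption of this sketch.
    """
    total = sum(weights.values())
    quotas = {node: TOTAL_SLOTS * w / total for node, w in weights.items()}
    counts = {node: int(q) for node, q in quotas.items()}

    # Largest-remainder rule: hand the slots lost to rounding down
    # to the nodes with the largest fractional parts.
    leftover = TOTAL_SLOTS - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1

    # Turn the counts into contiguous, non-overlapping slot ranges.
    ranges, start = {}, 0
    for node, count in counts.items():
        ranges[node] = range(start, start + count)
        start += count
    return ranges


def node_for_key(key: bytes, ranges: Dict[str, range]) -> str:
    """Return the node whose slot range contains the key's slot."""
    slot = key_slot(key)
    return next(node for node, r in ranges.items() if slot in r)


if __name__ == "__main__":
    # Hypothetical heterogeneous cluster: node "c" is twice as capable.
    plan = allocate_slots({"a": 1.0, "b": 1.0, "c": 2.0})
    for node, r in plan.items():
        print(node, r.start, r.stop - 1, len(r))
    print(node_for_key(b"comment:1001", plan))
```

In a homogeneous cluster all weights are equal and the split degenerates to an even partition of the slots; in a heterogeneous cluster the weights would come from measured node load and access performance, which this sketch leaves open.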