Journal of East China Normal University (Natural Science) ›› 2014, Vol. 2014 ›› Issue (5): 216-227. doi: 10.3969/j.issn.10005641.2014.05.019

• Computer Science and Technology •

In-memory cluster computing: Interactive data analysis

HUANG Lan¹, SUN Ke¹, CHEN Xiao-Zhu¹, ZHOU Min-Qi²

  1. No. 32 Institute, China Electronics Technology Group Corporation, Shanghai 200233, China;
  2. Data Science & Engineering Institute, East China Normal University, Shanghai 200062, China
  • Online: 2014-09-25 Published: 2014-11-27
  • Corresponding author: ZHOU Min-Qi, male, Ph.D., research interest: in-memory database systems. E-mail: mqzhou@sei.ecnu.edu.cn
  • About the author: HUANG Lan, female, M.S., research interest: blade server design. E-mail: huang.moomoo@gmail.com
  • Supported by: Key Program of the National Natural Science Foundation of China (61332006)

Abstract: This paper discusses the management and analysis of data for decision support, one of the three categories of big data. It analyzes the new application requirements placed on business intelligence (BI) technology in the big data era, in which BI creates tremendous market value, and discusses the challenges and opportunities that advances in computer hardware and architecture bring to decision data management and analysis, including the emergence of new applications that require interactive data analysis. Based on a detailed analysis of typical emerging applications and a summary of the characteristics of related technologies and systems, we find that high-performance data management and analysis based on in-memory cluster computing is a pressing problem with broad application prospects, and will be the future trend for interactive data analysis. In a fully in-memory data management environment, network communication, rather than memory access or disk I/O, becomes the main bottleneck of the whole system. Key research topics therefore include: the topology of distributed shared-nothing in-memory systems for high-performance servers, designed around the characteristics of memory (e.g., volatility, the memory wall) and the communication bottleneck; distributed data placement and indexing strategies for heterogeneous, multilevel cache and memory hierarchies; multi-granularity parallel processing frameworks across cores, processors, and servers; cache-aware and memory-aware maintenance of distributed data consistency; and lightweight compression for column-wise storage together with compression-aware data processing. Together, these are expected to ultimately enable real-time interactive analytical processing.

Key words: in-memory data management, cache-sensitive, iterative data processing

CLC number: