Journal of East China Normal University (Natural Science)

• Computer Science •

GraphHP: A hybrid platform for iterative graph processing

SU Jing, SUO Bo, CHEN Qun, PAN Wei, LI Zhan-huai

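  1. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2016-06-27 Online: 2016-09-25 Published: 2016-11-29
  • Corresponding author: SU Jing, female, Ph.D. candidate; research interest: big data processing technology. E-mail: jinjin-su@163.com.
  • Funding:

    National 973 Program of China (2012CB316203); National 863 Program of China (2015AA015307); National Natural Science Foundation of China (61332006, 61472321, 61502390)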

GraphHP: A hybrid platform for iterative graph processing

SU Jing, SUO Bo, CHEN Qun, PAN Wei, LI Zhan-huai   

  1. School of Computer, Northwestern Polytechnical University, Xi’an, 710072, China
  • Received: 2016-06-27 Online: 2016-09-25 Published: 2016-11-29

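Abstract:

The Bulk Synchronous Parallel (BSP) computing model is an important foundation for building distributed systems for large-scale iterative graph processing. Existing platforms (e.g., Pregel, Giraph, and Hama) achieve high scalability, but frequent synchronization and the communication load between hosts seriously degrade the efficiency of parallel computation. To address this key problem, this paper proposes GraphHP (Graph Hybrid Processing), an execution platform based on a hybrid model. It not only inherits the vertex-centric BSP programming interface but also significantly reduces synchronization and communication overhead. By establishing a hybrid execution model within and between graph partitions, GraphHP implements pseudo-superstep iteration, which decouples the computation inside a partition from distributed synchronization and communication. This hybrid execution model effectively reduces synchronization and communication overhead without requiring heavyweight scheduling algorithms or graph-centric sequential algorithms. Finally, this paper evaluates implementations of classic BSP applications on GraphHP. Experiments show that it is more efficient than existing BSP platforms. Although GraphHP as presented here is implemented on top of Hama, it can easily be migrated to other BSP platforms.

Keywords: iterative graph processing, distributed computing, BSP, GraphHP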

Abstract:

The BSP (Bulk Synchronous Parallel) computing model is an important foundation for building distributed systems for large-scale iterative graph processing. Existing platforms (e.g., Pregel, Giraph, and Hama) achieve high scalability, but frequent synchronization and the communication load between hosts seriously reduce the efficiency of parallel computing. To solve this key problem, this paper proposes GraphHP (Graph Hybrid Processing), an execution platform based on a hybrid model. It not only inherits the vertex-centric BSP programming interface, but also significantly reduces the synchronization and communication load. By establishing a hybrid execution model within and between graph partitions, GraphHP realizes pseudo-superstep iterative computation and separates the computation inside a partition from distributed synchronization and communication. This hybrid execution model effectively reduces the synchronization and communication load without needing heavy scheduling algorithms or graph-centric serial algorithms. Finally, this paper evaluates the implementation of classic BSP applications on the GraphHP platform, and the experiments show that it is more efficient than existing BSP platforms. Although the GraphHP platform proposed in this paper is implemented on Hama, it is easy to migrate to other BSP platforms.

Key words: iterative graph processing, distributed computing, BSP, GraphHP
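
The pseudo-superstep idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the GraphHP or Hama implementation: the function `run_bsp`, its parameters, and the choice of min-label propagation (connected components) as the vertex program are assumptions made only for illustration. With `hybrid=False`, every message waits for the next global barrier, as in plain BSP. With `hybrid=True`, messages between vertices of the same partition are consumed in local pseudo-supersteps, and only cross-partition messages wait for the barrier.

```python
from collections import defaultdict


def run_bsp(graph, part, hybrid):
    """Min-label propagation over an undirected graph.

    graph:  dict vertex -> list of neighbours (hypothetical input format)
    part:   dict vertex -> partition id
    hybrid: if True, iterate inside each partition before the next barrier
    Returns (labels, number_of_global_supersteps).
    """
    label = {v: v for v in graph}

    # Global superstep 1: every vertex sends its own label to all neighbours.
    inbox = defaultdict(list)
    for v in graph:
        for u in graph[v]:
            inbox[u].append(label[v])
    supersteps = 1

    while inbox:
        outbox = defaultdict(list)  # delivered only after the next barrier

        # Group the pending messages by the partition of the receiving vertex.
        by_part = defaultdict(dict)
        for v, msgs in inbox.items():
            by_part[part[v]][v] = msgs

        for p, local in by_part.items():
            # Each pass of this loop is one pseudo-superstep inside partition p.
            while local:
                next_local = defaultdict(list)
                for v, msgs in local.items():
                    smallest = min(msgs)
                    if smallest < label[v]:
                        label[v] = smallest
                        for u in graph[v]:
                            if hybrid and part[u] == p:
                                next_local[u].append(smallest)  # no barrier
                            else:
                                outbox[u].append(smallest)      # needs barrier
                # Without hybrid mode next_local stays empty: exactly one pass.
                local = next_local

        inbox = outbox
        supersteps += 1  # one global synchronization barrier

    return label, supersteps


if __name__ == "__main__":
    # A path of 8 vertices split into two partitions of 4 vertices each.
    graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
    part = {i: 0 if i < 4 else 1 for i in range(8)}
    print("plain BSP supersteps:", run_bsp(graph, part, hybrid=False)[1])
    print("hybrid supersteps:   ", run_bsp(graph, part, hybrid=True)[1])
```

In this toy setting both modes compute the same labels, but the hybrid mode needs fewer global barriers because label changes spread through a whole partition before the next synchronization. The sketch ignores details a real platform must handle, such as message combiners, vertex halting votes, and the order in which boundary and interior vertices are processed.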