收稿日期: 2016-06-27
网络出版日期: 2016-11-29
基金资助
国家 973 计划项目(2012CB316203); 国家 863 计划项目(2015AA015307); 国家自然科学基金(61332006, 61472321, 61502390)
GraphHP: A hybrid platform for iterative graph processing
Received date: 2016-06-27
Online published: 2016-11-29
BSP(Bulk Synchronous Parallel, BSP)计算模型是建立大规模迭代式图处理分布式系统的重要基础. 现有平台(如 Pregel、Giraph、Hama)虽然已经实现了较高的可扩展性, 但主机之间高频同步和通信负荷严重影响了并行计算的效率. 为了解决这个关键性问题, 本文提出了一种基于混合式模型的执行平台 GraphHP(Graph Hybrid Processing). 它不仅继承了以顶点为中心的 BSP 编程接口, 而且能够显著减少同步和通信负荷. 通过在图分区内部和分区之间建立混合执行模型, GraphHP 实现了伪超步迭代计算, 把分区内部计算从分布式同步和通信中分离出来. 这种混合执行模型不需要繁重的调度算法或者以图为中心的串行算法, 就能有效减少同步和通信负荷. 最后, 本文评估了经典的 BSP 应用在 GraphHP 平台的实现方式. 实验表明它比现有的 BSP 实现平台效率更高. 本文提出的 GraphHP 平台虽然是基于Hama 实现的, 但它很容易迁移到其他的 BSP 平台.
苏 静 , 索 博 , 陈 群 , 潘 魏 , 李战怀 . GraphHP:一个图迭代处理的混合平台[J]. 华东师范大学学报(自然科学版), 2016 , 2016(5) : 112 -120 . DOI: 10.3969/j.issn.1000-5641.2016.05.013
BSP (Bulk Synchronous Parallel) computing model is an important foundation for the establishment of a large-scale iterative graph processing distributed system. Existing platforms (e.g., Pregel, Giraph, and Hama) have achieved a high scalability, but the high frequency synchronization and communication load between the hosts have seriously affected the efficiency of parallel computing. In order to solve this key problem, this paper proposes a hybrid model based on GraphHP (Graph Hybrid Processing). It not only inherits the BSP programming interface with the vertex as the center, but also can significantly reduce the synchronization and communication load. By establishing the hybrid execution model between the interior and the interval partition of the graph, the GraphHP realizes the pseudo super step iteration calculation, and separates the internal computation from the distributed synchronization and communication. This hybrid execution model does not need heavy scheduling algorithm or the serial algorithm can effectively reduce the synchronization and communication load. Finally, this paper evaluates the implementation of the classic BSP application in the GraphHP platform, and the experiment shows that it is more efficient than the existing BSP platform. Although the GraphHP platform proposed in this paper is based on Hama, it is easy to migrate to other BSP platforms.
Key words: graph iterative; distributed computation; BSP; GraphHP
[ 1 ] SALIHOGLU S, WIDOM J. Optimizing graph algorithms on pregel-like systems [J]. Proceedings of the VLDB Endowment, 2014, 7(7): 577-588.
[ 2 ] SALIHOGLU S, WIDOM J. GPS: A graph processing system [C]//Proceedings of the 25th International Conference on Scientific and Statistical Database Management. ACM, 2013, Article No 22, doi: 10.1145/2484838.2484843.
[ 3 ] BAO N T, SUZUMURA T. Towards highly scalable pregel-based graph processing platform with x10 [C]//Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013: 501-508.
[ 4 ] CHEN R S, YANG M, WENG X T, et al. Improving large graph processing on partitioned graphs in the cloud [C]//Proceedings of the 3rd ACM Symposium on Cloud Computing. ACM, 2012, Article No 3, doi: 10.1145/2391229.2391232.
[ 5 ] GOUDREAU M W, LANG K, RAO S B, et al. Portable and efficient parallel computing using the bsp model [J]. Computers IEEE Transactions on, 1999, 48(7): 670-689.
[ 6 ] HILL J M D, MCCOL B, STEFANESCU D C, et al. BSPlib: The BSP programming library [J]. Parallel Computing, 1998, 24(14): 1947-1980.
[ 7 ] GREGOR D, LUMSDAINE A. The Parallel BGL: A generic library for distributed graph computations [C]//Proceedings of the Parallel Object-Oriented Scientific Computing (POOSC). 2005: 1-18.
[ 8 ] CHAN A, DEHNE F. CGMGRAPH/CGMLIB: Implementing and testing CGM graph algorithms on PC clusters and shared memory machines [J]. Lecture Notes in Computer Science, 2003, 2840: 117-125.
[ 9 ] WANG G Z, XIE W L, DEMERS A, et al. Asynchronous large-scale graph processing made easy [C]//Proceedings of the 6th Biennial Conference on Innovative Data Systems Research (CIDR). 2013: 58-70.
[10] SHAO B, WANG H, LI Y. Trinity: A distributed graph engine on a memory cloud [C]//Proceedings of the ACM-SIGMOD International Conference on Management of Data. ACM, 2013: 505-516.
[11] CHENG R, HONG J, KYROLA A, et al. Kineograph: Taking the pulse of a fast-changing and connected world [C]//Proceedings of the 7th ACM European Conference on Computer Systems. ACM, 2012: 85-98.
[12] DEMETRESCU C. USA road network [EB/OL]. (2005-10-12)[2016-04-01]. http://www.dis.uniroma1.it/challenge9/download.shtml.
[13] DAVIS T. The University of Florida sparse matrix collection [EB/OL]. (2011-10-13)[2016-04-01]. http://www.cise.ufl.edu/research/sparse/matrices/.
[14] KARYPIS G, KUMAR V. A fast and high quality multilevel scheme for partitioning irregular graphs [J]. SIAM J Sci Comput, 1998, 20(1): 359-392.
[15] KARYPIS G, KUMAR V. A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm [C]//Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing. 1997: 1-12.
[16] TIAN Y Y, BALMIN A, CORSTEN S A, et al. From “think like a vertex” to “think like a graph” [J]. Proceedings of the VLDB Endowment, 2013, 7(3): 193-204.
[17] CHERKASSKY B V, GOLDBERG A V, RADZIK T. Shortest paths algorithms: Theory and experimental evaluation [J]. Mathematical Programming, 1996, 73(2): 129-174.
[18] ANDERSON T, OWICKI S, SAXE J, et al. High-speed switch scheduling for local-area networks [J]. ACM Sigplan Notices, 1993, 11(4): 319-352.
[19] KHAYYAT Z, AWARA K, ALONAZI A, et al. Mizan: A system for dynamic load balancing in large-scale graph processing [C]//Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013: 169-182.
/
〈 |
|
〉 |