华东师范大学学报(自然科学版) ›› 2019, Vol. 2019 ›› Issue (6): 73-87.doi: 10.3969/j.issn.1000-5641.2019.06.008

• 计算机科学 • 上一篇    下一篇

单图中的近似频繁子图挖掘算法

窦建凯1, 林欣1, 胡文心2   

  1. 1. 华东师范大学 计算机科学与技术系, 上海 200062;
    2. 华东师范大学 计算中心, 上海 200062
  • 收稿日期:2018-12-21 出版日期:2019-11-25 发布日期:2019-11-26
  • 通讯作者: 胡文心,女,高级工程师,硕士生导师,研究方向为计算机科学.E-mail:wxhu@cc.ecnu.edu.cn. E-mail:wxhu@cc.ecnu.edu.cn
  • 作者简介:窦建凯,男,硕士研究生,研究方向为频繁图挖掘.E-mail:yirandjk@163.com.

Algorithm for mining approximate frequent subgraphs in a single graph

DOU Jian-kai1, LIN Xin1, HU Wen-xin2   

  1. 1. Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China;
    2. Computer Center, East China Normal University, Shanghai 200062, China
  • Received:2018-12-21 Online:2019-11-25 Published:2019-11-26

摘要: 图数据的挖掘工作是数据挖掘工作中的重要组成部分,已经有许多人在这个领域进行了深入的研究.由于数据获取不可避免噪音数据,故在挖掘频繁图时考虑近似十分重要.然而许多此前的工作只考虑了子图间编辑距离(Graph Edit Distance,GED)的绝对值,而没有考虑子图间编辑距离与子图大小的相对关系.提出了一种在单图中进行近似频繁子图挖掘的新算法,并在计算近似程度时考虑当前子图的大小.该算法通过对近似频繁子图的大小上限进行预测,并通过局部反单调性进行剪枝,提高了算法的效率.实验表明,该算法能够挖掘出传统算法无法发现的近似频繁子图,且相比对比算法具有更好的时间性能.

关键词: 近似, 图, 频繁子图挖掘, 剪枝

Abstract: Graph mining is an important part of data mining, and significant research has been dedicated to the field. Due to the inevitability of noise during data acquisition, it is crucial to address the issue of approximation in mining frequent subgraphs. Previous work has only considered the absolute value of the graph edit distance (GED); however, the relative value between the GED and the size of the subgraph should also be considered. Hence, in this paper, we propose a novel algorithm to mine approximate frequent subgraphs in a single graph; this algortihm takes into account the size of current subgraphs for the calculation of approximations. We increase the efficiency of this algorithm by estimating the upper bound of frequent subgraphs, and pruning according to local anti-monotonicity. Experimental results show that this algorithm can find subgraphs that are missed by traditional mining algorithms, and our proposed approach is relatively efficent compared to other algorithms.

Key words: approximate, graph, frequent subgraph mining, pruning

中图分类号: