Computer Science

Algorithm for mining approximate frequent subgraphs in a single graph

  • DOU Jian-kai ,
  • LIN Xin ,
  • HU Wen-xin
Expand
  • 1. Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China;
    2. Computer Center, East China Normal University, Shanghai 200062, China

Received date: 2018-12-21

  Online published: 2019-11-26

Abstract

Graph mining is an important part of data mining, and significant research has been dedicated to the field. Due to the inevitability of noise during data acquisition, it is crucial to address the issue of approximation in mining frequent subgraphs. Previous work has only considered the absolute value of the graph edit distance (GED); however, the relative value between the GED and the size of the subgraph should also be considered. Hence, in this paper, we propose a novel algorithm to mine approximate frequent subgraphs in a single graph; this algortihm takes into account the size of current subgraphs for the calculation of approximations. We increase the efficiency of this algorithm by estimating the upper bound of frequent subgraphs, and pruning according to local anti-monotonicity. Experimental results show that this algorithm can find subgraphs that are missed by traditional mining algorithms, and our proposed approach is relatively efficent compared to other algorithms.

Cite this article

DOU Jian-kai , LIN Xin , HU Wen-xin . Algorithm for mining approximate frequent subgraphs in a single graph[J]. Journal of East China Normal University(Natural Science), 2019 , 2019(6) : 73 -87 . DOI: 10.3969/j.issn.1000-5641.2019.06.008

References

[1] YAN X F, HAN J W. gSpan:Graph-based substructure pattern mining[C]//Proceedings of the 2002 IEEE International Conference on Data Mining. 2002:721-724.
[2] ELSEIDY M, ABDELHAMID E, SKIADOPOULOS S, et al. GraMi:Frequent subgraph and pattern mining in a single large graph[J]. Proceedings of the VLDB Endowment, 2014, 7:517-528.
[3] CHEN C, YAN X F, ZHU F D, et al. gApprox:Mining frequent approximate patterns from a massive network[C]//7th IEEE International Conference on Data Mining. 2007:445-450.
[4] FLORES-GARRIDO M, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F. AGraP:An algorithm for mining frequent patterns in a single graph using inexact matching[J]. Knowledge and Information Systems, 2015,2(44):385-406.
[5] KURAMOCHI M, KARYPIS G. Finding frequent patterns in a large sparse graph[J]. Data Mining and Knowledge Discovery, 2005, 11(3):243-271.
[6] NIJSSEN S, KOK J N. A quickstart in frequent structure mining can make a difference[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004:647-652.
[7] ALONSO A G, PAGOLA J E M, CARRASCO-OCHOA J A, et al. Mining frequent connected subgraphs reducing the number of candidates[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2008:365-376.
[8] RANU S, SINGH A K. Graphsig:A scalable approach to mining significant subgraphs in large graph databases[C]//2009 IEEE 25th International Conference on Data Engineering. 2009:844-855.
[9] CHENG H, YAN X F, HAN J W. Mining graph patterns[C]//Frequent Pattern Mining, Berlin:Springer, 2014:307-338.
[10] KRISHNA V, SURI N R, ATHITHAN G. A comparative survey of algorithms for frequent subgraph discovery[J]. Current Science, 2011,:190-198.
[11] CHOUDHURY S, PUROHIT S, LIN P, et al. Percolator:Scalable pattern discovery in dynamic graphs[C]//Proceedings of the 11th ACM International Conference on Web Search and Data Mining. 2018:759-762.
[12] INGALALLI V, IENCO D, PONCELET P. Mining frequent subgraphs in multigraphs[J]. Information Sciences, 2018, 451/452:50-66.
[13] ALGULIEV R M, ALIGULIYEV R M, GANJALIYEV F S. Extracting a heterogeneous social network of academic researchers on the Web based on information retrieved from multiple sources[J]. American Journal of Operations Research, 2011, 1(2):33.
[14] LIMA JR D P, GIACOMINI H C, TAKEMOTO R M, et al. Patterns of interactions of a large fish-parasite network in a tropical floodplain[J]. Journal of Animal Ecology, 2012, 81(4):905-913..
[15] GAO X B, XIAO B, TAO D C, et al. A survey of graph edit distance[J]. Pattern Analysis and Applications, 2010, 13(1):113-129.
[16] HIDOVIĆ D, PELILLO M. Metrics for attributed graphs based on the maximal similarity common subgraph[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18(3):299-313.
[17] DEHMER M, EMMERT-STREIB F. Comparing large graphs efficiently by margins of feature vectors[J]. Applied Mathematics and Computation, 2007, 188(2):1699-1710.
[18] HOLDER L B, COOK D J, DJOKO S. Substucture discovery in the SUBDUE system[C]//KDD Workshop, 1994:169-180.
[19] JIA Y, ZHANG J T, HUAN J. An efficient graph-mining method for complicated and noisy data with real-world applications[J]. Knowledge and Information Systems, 2011, 28(2):423-447.
[20] FLORES-GARRIDO M, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F. Extensions to AGraP algorithm for finding a reduced set of inexact graph patterns[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2018, 32(1):1860012.
[21] ACOSTA-MENDOZA N, GAGO-ALONSO A, CARRASCO-OCHOA J A, et al. Extension of canonical adjacency matrices for frequent approximate subgraph mining on multi-graph collections[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(8):1750025.
[22] ACOSTA-MENDOZA N, MORALES-GONZáLEZ A, GAGO-ALONSO A, et al. Image classification using frequent approximate subgraphs[C]//Iberoamerican Congress on Pattern Recognition. 2012:292-299.
[23] ACOSTA-MENDOZA N, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F, et al. Image clustering based on frequent approximate subgraph mining[C]//Mexican Conference on Pattern Recognition. 2018:189-198.
Outlines

/