Pattern Anal Applic (2010) 13:113–129
DOI 10.1007/s10044-008-0141-y

THEORETICAL ADVANCES

A survey of graph edit distance

Xinbo Gao · Bing Xiao · Dacheng Tao · Xuelong Li

Received: 15 November 2007 / Accepted: 16 October 2008 / Published online: 13 January 2009
© Springer-Verlag London Limited 2009

Abstract  Inexact graph matching has been one of the significant research foci in the area of pattern analysis. As an important way to measure the similarity between pairwise graphs error-tolerantly, graph edit distance (GED) is the base of inexact graph matching. The research advances in GED are surveyed in order to provide a review of the existing literature and to offer some insights into the study of GED. Since graphs may be attributed or non-attributed and the costs of edit operations can be defined in various ways, the existing GED algorithms are categorized according to these two factors and described in detail. After these algorithms are analyzed and their limitations are identified, several promising directions for further research are proposed.

Keywords  Inexact graph matching · Graph edit distance · Attributed graph · Non-attributed graph

X. Gao, B. Xiao
School of Electronic Engineering, Xidian University, 710071 Xi'an, People's Republic of China
X. Gao, e-mail: xbgao@mail.xidian.edu.cn
B. Xiao, e-mail: bingxue_8125@163.com

D. Tao
School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Blk N4, Singapore 639798, Singapore
e-mail: dacheng.tao@gmail.com

X. Li (corresponding author)
School of Computer Science and Information Systems, Birkbeck College, University of London, London WC1E 7HX, UK
e-mail: xuelong@dcs.bbk.ac.uk

1 Originality and contribution

Graph edit distance is an important way to measure the similarity between pairwise graphs error-tolerantly in inexact graph matching and has been widely applied to pattern analysis and recognition. However, there has scarcely been any survey of GED algorithms up to now, so this paper is novel to a certain extent. Its contribution focuses on the following aspects. First, on the basis of the authors' study of almost all GED algorithms, the idea of GED is expounded with illustrations in order to make the explanation visual and explicit. Second, the development of GED algorithms and the significant research findings at each stage, which should be understood by researchers in this area, are summarized and analyzed. Third, and most importantly, the paper provides a review of the existing literature and offers some insights into the study of GED.
Since graphs may be attributed or non-attributed, and since the costs of edit operations can be defined in various ways, the existing GED algorithms are categorized according to these two factors and described in detail; their advantages and disadvantages are analyzed and indicated by comparing them experimentally and theoretically. Finally, several promising directions for further research are proposed in view of the limitations of the existing GED algorithms.

2 Introduction


In structural pattern recognition, graph representations are invariant to rotation and translation of an image and, in addition, to the transformation of an image into its 'mirror' image; thus graphs are widely used as a potent representation of objects. With a graph representation, pattern recognition becomes a problem of graph matching. The presence of noise means that the graph representations of identical real-world objects may not match exactly. One common approach to this problem is inexact graph matching [1–5], in which error correction is made part of the matching process. Inexact graph matching has been successfully applied to the recognition of characters [6, 7], shape analysis [8, 9], image and video indexing [10–14] and image registration [15]. Since photos and their corresponding sketches are geometrically similar and differ mainly in texture information, photos and sketches can be represented with graphs, so that sketch–photo recognition [16] could be realized through inexact graph matching in the future. Central to this approach is the measurement of the similarity of pairwise graphs. This similarity can be measured in many ways, but one approach that has lately garnered particular interest, because it is tolerant to noise and distortion, is the graph edit distance (GED), defined as the cost of the least expensive sequence of edit operations needed to transform one graph into another.

Fig. 1 The development of GED

In the development of GED, which is illustrated in Fig. 1, Sanfeliu and Fu [17] play an important role: they first introduced edit distance into graph matching. Their distance is computed by counting node and edge relabelings together with the number of node and edge deletions and insertions necessary to transform one graph into another. Extending this idea, Messmer and Bunke [18, 19] defined the subgraph edit distance as the minimum cost over all error-correcting subgraph isomorphisms, in which common subgraphs of different model graphs are represented only once, so that the limitation of inexact graph matching algorithms that work on only two graphs at a time can be avoided. However, the direct GED lacks some of the formal underpinning of string edit distance, so there is considerable current effort aimed at putting the underlying methodology on a rigorous footing. There have been some developments toward overcoming this drawback: for instance, the relationship between GED and the size of the maximum common subgraph (MCS) has been demonstrated [20], the uniqueness of the cost function has been commented on [21], a probability distribution for local GED has been constructed, and string edit distance has been extended to trees and graphs.
GED has been computed for both attributed relational graphs and non-attributed graphs so far. Attributed graphs carry attributes on nodes, edges, or both, according to which the GED is computed directly. Non-attributed graphs only include information about the connectivity structure; therefore they are usually converted into strings, and an edit distance is used to compare the strings, the coded patterns of the graphs. The edit distance between strings can be evaluated by dynamic programming [5], which has been extended to compare trees and graphs on a global level [22, 23]. Hancock et al. used the Levenshtein distance, an important kind of edit distance, to evaluate the similarity of pairwise strings derived from graphs [24]. Whereas the Levenshtein edit distance does not fully exploit the coherence or statistical dependencies existing in the local context, Wei made use of Markov random fields to develop the Markov edit distance [25] in 2004. Recently, Marzal and Vidal [26] normalized the edit distance so that it can be applied consistently across objects of different sizes, and this idea has been used to model the probability distribution of the edit path between pairwise graphs [27]. The Hamming distance between two strings is a special case of the edit distance; Hancock measures the GED with the Hamming distance between the structural units of the graphs together with the size difference between the graphs [28]. So, in the development of GED, the role of edit distance [5, 29] cannot be neglected, and its advancement promotes the birth of new GED algorithms.
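Since several of the methods above reduce graph comparison to string comparison, it is worth recalling how the string edit distance itself is computed by dynamic programming. The following is a minimal sketch of the Levenshtein distance with unit insertion, deletion and substitution costs; the function name and the unit costs are choices made for this illustration, not details taken from the cited algorithms.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimal dynamic-programming Levenshtein distance with unit costs."""
    m, n = len(s), len(t)
    # dist[i][j] = edit distance between s[:i] and t[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # delete all of s[:i]
    for j in range(n + 1):
        dist[0][j] = j          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution / match
    return dist[m][n]

print(levenshtein("graph", "grapes"))  # -> 2 (substitute h->e, insert s)
```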


Although the research on GED has developed flourishingly, GED algorithms are influenced considerably by the cost functions associated with the edit operations. The GED between pairwise graphs changes as the cost functions change, and its validity depends on how rationally the cost functions are defined. Researchers have so far defined cost functions in various ways; in particular, the definition of cost functions has lately been cast into a probability framework, mainly by Hancock and Bunke [30, 31]. The problem is not solved radically, however. Bunke [21] connects cost functions with multifold graph isomorphism in theory, so that the necessity of defining a cost function can be removed; but graph isomorphism is an NP-complete problem and has to be combined with constraints and heuristics in practice. So, efforts are still needed to develop new GED algorithms that have reasonable cost functions or are independent of defining cost functions, such as the algorithms in [32, 33].

The aim of this paper is to provide a survey of the current development of GED, which concerns computing the dissimilarity of graphs in error-correcting graph matching. The paper is organized as follows: after some concepts and basic algorithms are given in Sect. 3, the existing GED algorithms are categorized and described in detail in Sect. 4, which also compares the existing algorithms to show their advantages and disadvantages. In Sect. 5, a summary is presented and some important problems of GED deserving further research are proposed.

3 Basic concepts and algorithms

Many concepts of graph theory and basic algorithms for search and learning strategies form the foundation of the existing GED algorithms. In order to describe and analyze the GED algorithms thoroughly, these concepts are expounded first.

3.1 Concepts related to graph theory

The subject investigated by GED is the graph representation of objects, and, judging from current research results, graph theory is the basis of GED research; it is therefore necessary to introduce some concepts related to graph theory, such as the definitions of the graph with or without attributes, the directed acyclic graph, the common subgraph, the common supergraph, the maximum weight clique, the isomorphism of graphs, the transitive closure, the Fiedler vector and the super clique.

3.1.1 Definitions of the graph and the attributed graph

A graph is denoted by G = (V, E), where V = {1, 2, …, M} is the set of vertices (nodes) and E ⊆ V × V is the edge set. If the nodes of a graph have attributes, the graph is an attributed graph denoted by G = (V, E, l), where l : V → L_N is a labeling function. If both nodes and edges have attributes, the graph is an attributed graph denoted by G = (V, E, α, β), where α : V → L_N and β : E → L_E are node and edge labeling functions. L_N and L_E are restricted to labels consisting of fixed-size tuples, that is, L_N = ℝ^p and L_E = ℝ^q with p, q ∈ ℕ ∪ {0}.
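For concreteness, the definitions above can be realized with a small container type. The sketch below is an illustrative Python representation (not taken from the surveyed papers) of an attributed graph G = (V, E, α, β) with tuple-valued node and edge labels; a non-attributed graph is the same structure with empty labels.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class AttributedGraph:
    """G = (V, E, alpha, beta): node set, directed edge set, and label maps."""
    nodes: set = field(default_factory=set)                    # V
    edges: set = field(default_factory=set)                    # E, a subset of V x V
    alpha: Dict[int, Tuple[float, ...]] = field(default_factory=dict)             # node labels in R^p
    beta: Dict[Tuple[int, int], Tuple[float, ...]] = field(default_factory=dict)  # edge labels in R^q

    def add_node(self, v: int, label: Tuple[float, ...] = ()) -> None:
        self.nodes.add(v)
        self.alpha[v] = label

    def add_edge(self, u: int, v: int, label: Tuple[float, ...] = ()) -> None:
        assert u in self.nodes and v in self.nodes
        self.edges.add((u, v))
        self.beta[(u, v)] = label

# A non-attributed graph is obtained by leaving every label tuple empty.
g = AttributedGraph()
g.add_node(1, (0.0, 1.0))
g.add_node(2, (1.0, 0.5))
g.add_edge(1, 2, (0.3,))
```
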
3.1.2 Definitions of the directed graph and the directed acyclic graph

Given a graph G = (V, E), if E is a set of ordered pairs of vertices, G is a directed graph and the edges in E are called directed edges. If, for every vertex v in V, there is no non-empty directed path that starts and ends on v, then G is a directed acyclic graph.

3.1.3 Definition of the subgraph and supergraph

Let G = (V, E, α, β) and G' = (V', E', α', β') be two graphs; G' is a subgraph of G, and G is a supergraph of G', written G' ⊆ G, if
• V' ⊆ V,
• E' ⊆ E,
• α'(x) = α(x) for all x ∈ V',
• β'((x, y)) = β((x, y)) for all (x, y) ∈ E'.
For non-attributed graphs, only the first two conditions are needed.

3.1.4 Definition of the graph isomorphism

Let G1 = (V1, E1, α1, β1) and G2 = (V2, E2, α2, β2) be two graphs. A graph isomorphism between G1 and G2 is a bijective mapping f : V1 → V2 such that
• α1(x) = α2(f(x)) for all x ∈ V1,
• β1((x, y)) = β2((f(x), f(y))) for all (x, y) ∈ E1.
For non-attributed graphs G1' = (V1', E1') and G2' = (V2', E2'), a bijective mapping f : V1' → V2' such that (u, v) ∈ E1' ⇔ (f(u), f(v)) ∈ E2' for all u, v ∈ V1' is a graph isomorphism between the two graphs. If V1 = V2 = ∅, then f is called the empty graph isomorphism.

3.1.5 Definitions of the common subgraph and the maximum common subgraph

Let G1 and G2 be two graphs with G1' ⊆ G1 and G2' ⊆ G2. If there exists a graph isomorphism between G1' and G2', then both G1' and G2' are called a common subgraph of G1 and G2. If there exists no other common subgraph of G1 and G2 that has more nodes than G1' and G2', then G1' and G2' are called a maximum common subgraph (MCS) of G1 and G2.

3.1.6 Definitions of the common supergraph and the minimum common supergraph

A graph Ĝ is a common supergraph of two graphs G1 and G2 if there exist graphs Ĝ1 ⊆ Ĝ and Ĝ2 ⊆ Ĝ such that there exists a graph isomorphism between Ĝ1 and G1 and a graph isomorphism between Ĝ2 and G2.


Ĝ is a minimum common supergraph if there is no other common supergraph of G1 and G2 smaller than Ĝ.

3.1.7 Definition of the Fiedler vector

Let the degree matrix of a graph be the diagonal matrix D = diag(deg(1), deg(2), …, deg(M)), where M is the last node of the graph, and let its adjacency matrix be the symmetric matrix A. The Laplacian matrix is then L = D − A. The eigenvector corresponding to the second smallest eigenvalue of the graph Laplacian is referred to as the Fiedler vector.

3.1.8 Definitions of the clique and the maximum weight clique

A clique in a graph is a set of nodes that are adjacent to each other; for example, in Fig. 2, node 3, node 4 and node 5 form a clique. The weight clique is the extension of the clique to weighted graphs, and the maximum weight clique is the clique with the largest weight.

Fig. 2 An example of a clique in a graph

3.1.9 Definition of the super clique

Given a graph G = (V, E), the super clique (or neighborhood) of a node i ∈ V consists of the center node i together with its immediate neighbors connected to it by edges of the graph.

3.1.10 Transitive closure

The transitive closure of a directed graph G = (V, E) is a graph G+ = (V, E+) such that for all v and w in V, (v, w) ∈ E+ if and only if there is a non-null path from v to w in G.

3.2 Basic algorithms used in the existing GED algorithms

The definition of cost functions is a key issue in GED algorithms, and the self-organizing map (SOM) can be used to learn cost functions automatically to some extent [34]. GED is defined as the cost of the least expensive edit sequence, so search strategies for shortest paths are closely related to GED algorithms; Dijkstra's algorithm is the most popular shortest-path algorithm and is applied by Robles-Kelly [35]. In addition, the expectation–maximization (EM) algorithm is applied to parameter optimization [30]. The EM algorithm, Dijkstra's algorithm and the SOM are therefore presented here before the GED algorithms are discussed.

3.2.1 EM algorithm

The expectation–maximization algorithm [36] is one of the main approaches for estimating the parameters of a Gaussian mixture model (GMM). Suppose there exist two sample spaces X and Y and a many-to-one mapping from X to Y. Data y in the space Y is observed directly, while the corresponding data x in X is observed only indirectly through y. The data x and y are referred to as the complete data and the incomplete data, respectively. Given a set of N random vectors Z = {z1, z2, …, zN} in which each random vector is drawn from an independent and identically distributed mixture model, the likelihood of the observed samples (the conditional probability) is defined as the joint density

$$P(Z \mid \theta) = \prod_{i=1}^{N} p(z_i \mid \theta).$$

Z is the complete data and the z_i are the incomplete data, and the aim of the EM algorithm is to determine the parameter θ that maximizes P(Z | θ) for the observed Z. The EM algorithm is an iterative maximum likelihood (ML) estimation algorithm.
Each iteration of the EM algorithm involves two steps: the expectation step (E-step) and the maximization step (M-step). In the E-step, the updated posterior probability is computed from the prior probability; in the M-step, according to the posterior probability transferred from the E-step, the conditional probability is maximized to obtain the updated prior probability, and the parameters corresponding to the updated prior probability are transferred back to the E-step.
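To make the E-step/M-step alternation concrete, the sketch below fits a one-dimensional Gaussian mixture by EM. The number of components, the initialization and the fixed iteration count are assumptions made for this example and do not reproduce the estimation procedure of [30] or [36].

```python
import numpy as np

def em_gmm_1d(z, n_components=2, n_iter=50, seed=0):
    """Minimal EM for a 1-D Gaussian mixture: returns weights, means, variances."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    w = np.full(n_components, 1.0 / n_components)      # prior (mixing) weights
    mu = rng.choice(z, n_components, replace=False)    # initial means drawn from the data
    var = np.full(n_components, z.var() + 1e-6)        # initial variances
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample
        dens = np.exp(-0.5 * (z[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(z)
        mu = (resp * z[:, None]).sum(axis=0) / nk
        var = (resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var
```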


3.2.2 Dijkstra's algorithm

Dijkstra's algorithm [37] is a greedy algorithm that solves the single-source shortest path problem for a directed graph with non-negative edge weights, and it can be extended to undirected graphs. Given a weighted directed graph G = (V, E), each edge in E is given a weight, the cost of moving directly along that edge; the cost of a path between two vertices is the sum of the costs of the edges on that path. Given a pair of vertices s and t in V, the algorithm finds the shortest path from s to t. Let S be the set of nodes visited along the shortest path from s to t, let the adjacency matrix of G be the weight matrix C, and let d(i) be the cost of the path from s to v_i ∈ V, with d(s) = 0. The algorithm can be described as below (a heap-based sketch in code follows the list):
• Initialization: S = ∅, and d(i) is the weight of the edge (s, v_i);
• If d(j) = min{d(i) | v_i ∈ V − S}, then S = S ∪ {v_j};
• For every node v_k ∈ V − S, if d(j) + C(j, k) < d(k), then d(k) is updated, that is, d(k) = d(j) + C(j, k);
• The last two steps are repeated until vertex t is visited and d(t) no longer changes, so that the shortest path from s to t is obtained.
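The steps above can be implemented with a priority queue. The sketch below is a standard heap-based variant of Dijkstra's algorithm over an adjacency dictionary; the graph encoding and the example graph are assumptions of this illustration.

```python
import heapq

def dijkstra(adj, s, t):
    """adj: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns the cost of the shortest s-t path, or inf if t is unreachable."""
    dist = {s: 0.0}
    visited = set()                      # the set S of settled nodes
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == t:
            return d                     # d(t) can no longer change
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd             # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return float("inf")

adj = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("t", 6.0)], "b": [("t", 1.0)]}
print(dijkstra(adj, "s", "t"))  # -> 4.0 (s -> a -> b -> t)
```
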
3.2.3 SOM

The self-organizing map [38–40] is an unsupervised artificial neural network that maps training samples into a low-dimensional space while preserving the topological properties of the input space. In a SOM, neighboring neurons compete in their activities by means of mutual lateral interactions and develop adaptively into specific detectors of different signal patterns, so the map is unsupervised, self-organizing and competitive.

The SOM network consists of two layers: an input layer and a competitive layer. As shown in Fig. 3, hollow nodes denote neurons in the input layer and solid nodes are competitive neurons. Each input is connected to all neurons in the competitive layer, and every neuron in the competitive layer is connected to the neurons in its neighborhood. For each neuron j, its position is described by its neural weight W_j, and for neurons in the competitive layer, the grid connections are regarded as their neighborhood relation.

Fig. 3 The structure of SOM

The training process is as below (a minimal sketch of the update follows the list):
• The neurons of the input layer selectively feed input elements into the competitive layer;
• When an input element D is mapped onto the competitive layer, the neurons in the competitive layer compete for the input element's position in order to represent the input element well. The closest neuron c, the winner neuron, is chosen in terms of the distance metric d_v, the distance between neural weights in the vector space;
• The neighborhood N_c = {c, n^c_1, n^c_2, …, n^c_M} of the winner neuron c, where M is the number of neighbors, is determined with the distance d_n between neurons, which is defined by means of the neighborhood relations. The neuron c and its neighbors in N_c are drawn closer to the input element, and the weights of the whole neighborhood are moved in the same direction, so similar items tend to excite adjacent neurons. The strength of the adaptation for a competitive neuron is decided by a non-increasing activation function a(t), and the weight of neuron j in the competitive layer is adapted according to the following formula:

$$W_j(t+1) = \begin{cases} W_j(t) + a(t)\,\bigl(D(t) - W_j(t)\bigr), & j \in N_c \\ W_j(t), & j \notin N_c; \end{cases} \qquad (1)$$

• The last two steps are repeated until a terminal condition is achieved.
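A minimal sketch of the competitive step and of the weight update in formula (1) is given below; the grid size, the neighborhood radius and the learning-rate schedule are assumptions made for the example rather than the settings used in [34].

```python
import numpy as np

def train_som(data, grid_shape=(8, 8), n_iter=1000, a0=0.5, radius=2, seed=0):
    """Toy SOM: a 2-D grid of neurons, each holding a weight vector of the input dimension."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    W = rng.random((rows, cols, data.shape[1]))            # neural weights W_j
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        a = a0 * (1.0 - t / n_iter)                        # non-increasing activation a(t)
        D = data[rng.integers(len(data))]                  # input element D(t)
        # winner neuron c: closest weight vector in the label space (metric d_v)
        c = np.unravel_index(np.argmin(((W - D) ** 2).sum(axis=-1)), (rows, cols))
        # neighborhood N_c: neurons within `radius` grid steps of c (metric d_n)
        in_nc = np.abs(coords - np.array(c)).max(axis=-1) <= radius
        W[in_nc] += a * (D - W[in_nc])                     # formula (1); other neurons unchanged
    return W
```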


4 Graph edit distance

A graph can be transformed into another one by a finite sequence of graph edit operations, which may be defined differently in different algorithms, and GED is defined by the least-cost edit operation sequence. In the following, an example is used to illustrate the definition of GED.

Fig. 4 The model graph
Fig. 5 The data graph

For the model graph shown in Fig. 4 and the data graph shown in Fig. 5, the task is to transform the data graph into the model graph. All edit operations are performed on the data graph. One possible edit operation sequence includes node insertion and edge insertion (node 6 and its relative edge), node deletion and edge deletion (node a and its relative edges), node substitution (node 1) and edge substitution (the edge between node 5 and node 3). A cost function is defined for each operation, and the cost of the edit operation sequence is the sum of the costs of all operations in the sequence. The sequence of edit operations needed to transform a data graph into a model graph, and its cost, are not unique, but the least cost is. The edit operation sequence with the least cost is sought, and its cost is the GED between the two graphs (a minimal sketch of scoring such a sequence is given below). How to determine the similarity of the components of graphs and how to define the costs of edit operations are evidently the key issues.

A graph may be an attributed relational graph with attributes on nodes, edges, or both, according to which the GED is computed directly. On the other hand, for a structural graph that only carries connectivity information, graphs are usually converted into strings according to the nodes, the edges or the connectivity, and the GED is computed with edit distance methods for strings. The GED algorithms, whose ideas are given in brief, are classified from these two aspects. Algorithms for different kinds of graphs are not comparable. The distances obtained with algorithms of the same kind are compared in their ability to cluster and classify images, and accordingly their strengths and flaws can be identified, which may benefit further research.
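To make the definition concrete, the sketch below scores one candidate edit operation sequence by summing per-operation costs; the cost values and the node/edge identifiers are invented for the illustration (the figures are not reproduced here), and the GED would be the minimum of such a score over all sequences that transform the data graph into the model graph.

```python
# Illustrative per-operation costs (not taken from the surveyed algorithms).
COSTS = {
    "node_ins": 1.0, "node_del": 1.0, "node_sub": 0.5,
    "edge_ins": 0.5, "edge_del": 0.5, "edge_sub": 0.25,
}

def edit_sequence_cost(operations):
    """Cost of one edit operation sequence; GED = min over all valid sequences."""
    return sum(COSTS[op] for op, _ in operations)

# A sequence in the spirit of the Figs. 4/5 example; the concrete edge
# endpoints are hypothetical placeholders.
sequence = [
    ("node_ins", 6), ("edge_ins", (6, 5)),
    ("node_del", "a"), ("edge_del", ("a", 1)), ("edge_del", ("a", 4)),
    ("node_sub", 1), ("edge_sub", (5, 3)),
]
print(edit_sequence_cost(sequence))  # -> 4.25
```
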
4.1 GED for attributed graphs

Graph edit distance for attributed graphs is computed directly from the attributes, which differ across algorithms. In the SOM based method [34], the probability based approach [30], the convolution graph kernel based method [41] and the subgraph and supergraph based method [42], attributes are on both nodes and edges, whereas the attribute is on nodes only in the binary linear programming (BLP) based method [43].

4.1.1 SOM based algorithm

In the existing algorithms for GED, the automatic inference of the costs of edit operations remains an open problem. To this end, the SOM based algorithm [34] was developed, in which attributed graphs G = (V, E, α, β) are the objects to be processed. Node and edge labels are m-dimensional and n-dimensional vectors, respectively. The space of the node and edge labels in a population of graphs is mapped onto a regular grid, an untrained SOM network, and the grid is deformed by training. One type of edit operation is described by one SOM. The actual edit costs are derived from a distance measure for labels that is defined with the distribution encoded in the SOM. The encoding of node substitution is described as below (a sketch of the resulting substitution cost follows the list):
• The m-dimensional node label space is reduced to a regular grid by sampling it at equidistant positions. Each vertex of the grid is connected to its nearest neighbor along each dimension so as to obtain a representation of the full space. The regular grid is the SOM neural network;
• Grid vertices and connections correspond to competitive neurons and neighborhood relations in the SOM;
• When the SOM is trained, it corresponds to a deformed grid. A label vector at a vertex position of the regular grid is mapped directly onto the same vertex in the deformed map. Any vector of the original space that is not at a vertex can be mapped into the deformed space by simple linear interpolation of its adjacent grid points;
• The cost of substituting node v1 by node v2 is defined with d_v, that is,

$$c(v_1 \to v_2) = \beta^{n}_{\mathrm{sub}}\; d_v(v'_1, v'_2),$$

where β^n_sub is a weighting factor and the vector v'_i is v_i mapped into the deformed space. The weighting factor compensates for the dependency on the initial distance between vertices.

For the other edit operations, the SOM networks are constructed in an analogous way. The vertex distribution of each SOM is changed iteratively during the learning procedure, which results in different costs. The objective is to derive cost functions that yield small intraclass and large interclass distances; the activation function a(t) is therefore defined such that its value decreases when the distance between neurons increases.
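A hedged sketch of the resulting substitution cost is shown below: node labels are mapped into the deformed space (here by a nearest-vertex lookup, whereas [34] interpolates linearly between adjacent grid vertices) and the cost is the weighted distance between the mapped labels. The grid arrays and the value of the weighting factor are placeholders.

```python
import numpy as np

def deform_map(label, regular_grid, deformed_grid):
    """Map a label vector into the deformed space via its nearest regular-grid vertex
    (a simplification of the linear interpolation used in [34])."""
    flat_reg = regular_grid.reshape(-1, regular_grid.shape[-1])
    flat_def = deformed_grid.reshape(-1, deformed_grid.shape[-1])
    idx = np.argmin(((flat_reg - label) ** 2).sum(axis=1))
    return flat_def[idx]

def node_substitution_cost(label1, label2, regular_grid, deformed_grid, beta_sub=1.0):
    """c(v1 -> v2) = beta_sub * d_v(v1', v2'), with v_i' the label mapped into the deformed space."""
    v1p = deform_map(np.asarray(label1, float), regular_grid, deformed_grid)
    v2p = deform_map(np.asarray(label2, float), regular_grid, deformed_grid)
    return beta_sub * float(np.linalg.norm(v1p - v2p))
```
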


The experiments demonstrating the performance of the SOM based method are performed on graph samples consisting of ten distorted copies of each of the three letter classes A, E, and X. Instances of the letter A being sheared are illustrated in Fig. 6; the shearing factor α indicates the degree of letter distortion.

Fig. 6 Sheared letter A with shearing factor α ∈ {0, 0.25, 0.5, 0.75, 1} [34]

For every shearing factor, the best average index is computed and shown in Fig. 7. The average index, normalized to the unit interval [0, 1], is defined as the average value of eight validation indices that evaluate the clustering performance quantitatively; the smaller the value, the better the clustering. The eight validation indices are the Davies–Bouldin index [44], the Dunn index [45], the C-index [46], the Goodman–Kruskal index [47], the Calinski–Harabasz index [48], Rand statistics [49], the Jaccard coefficient [50], and the Fowlkes–Mallows index [51].

Fig. 7 Comparison of the average index on the sheared line-drawing samples [34]

In Fig. 7, the average indices corresponding to SOM learning are smaller than those of the Euclidean model for every shearing factor, and the superiority of the SOM over the Euclidean model becomes increasingly obvious as the shearing factor grows. Edit costs derived through SOM learning therefore make the difference between intraclass and interclass distances greater than costs derived through the Euclidean model, which illustrates that the SOM performs better than the Euclidean model for clustering. In this method, GED is computed based on the metric d_v; therefore, the obtained GED is a metric. For a given application, some areas of the label space are highly relevant while other areas are irrelevant. Other existing cost functions treat every part of the label space equally, a limitation the SOM based method overcomes by learning the relevant areas of the label space from the graph sample set.

4.1.2 Probability based algorithm

Similar to the cost learning system based on the frequency estimation of edit operations for string matching [52], Neuhaus proposed a probability based algorithm [30] to compute GED. In this algorithm, if the GED of graphs G1 and G2 is to be computed by transforming G1 into G2, two independent empty graphs EG1 and EG2 are constructed for G1 and G2, respectively, by a stochastic generation process. A sequence of node and edge insertions is applied to either both or only one of the two constructed graphs EG1 and EG2, which can be interpreted as an edit operation sequence transforming G1 into G2 and whose effects on G1 are presented in Table 1. Edit costs are derived from an estimate of the distribution of edit operations. Each type of edit operation, regarded as a random event, is modeled with a GMM, and the mixtures are weighted to form the probability distribution of edit events. Initialization starts with the empirical mean and covariance matrix of a single component, and new components are added sequentially if the mixture density appears to converge during training. Pairs of training graphs required to be similar are extracted, and the EM algorithm is employed to find a locally optimized parameter set in terms of the likelihood of the edit events occurring between the pairwise training graphs.
If a probability distribution for an edit event sequence (e1, e2, …, el) is given, the probability of two graphs p(G1, G2) is defined as

$$p(G_1, G_2) = \int_{(e_1, e_2, \ldots, e_l)\,\in\,\psi(G_1, G_2)} \mathrm{d}p(e_1, e_2, \ldots, e_l), \qquad (2)$$

where ψ(G1, G2) denotes the set of all edit operation sequences transforming G1 into G2. Finally, the distance between the two graphs is obtained by setting

$$d(G_1, G_2) = -\log\bigl(p(G_1, G_2)\bigr). \qquad (3)$$

This algorithm is compared with the SOM based algorithm [34]. Three letter classes Z, A, and V are chosen, and 90 graphs (30 samples per class) are constructed to produce five sample sets with different values of the distortion parameter: 0.1, 0.5, 0.8, 1.0, and 1.2. Examples of these graphs are shown in Fig. 8. The average index, consisting of the Calinski–Harabasz index [48], the Davies–Bouldin index [44], the Goodman–Kruskal index [47], and the C-index [46], is computed for every sample set; the result is shown in Fig. 9.

Table 1 Effects of edit operations on the original graph
  Edit operation on EG1   Edit operation on EG2   Effect on G1
  Node/edge insertion     —                       Node/edge insertion
  —                       Node/edge insertion     Node/edge deletion
  Node/edge insertion     Node/edge insertion     Node/edge substitution
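A hedged sketch of how Eqs. (2) and (3) can be turned into a score: the integral over all edit sequences is here approximated by a single given sequence, and each edit event is scored under a fitted Gaussian mixture (scikit-learn's GaussianMixture stands in for the mixture estimation). Both simplifications are assumptions of this example, not the procedure of [30].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_event_model(event_vectors, n_components=2, seed=0):
    """Fit a GMM to feature vectors (shape: n_events x n_features) of one edit-event type."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(np.asarray(event_vectors, dtype=float))
    return gmm

def graph_distance(edit_sequence, event_models):
    """Approximate d(G1, G2) = -log p(G1, G2) by the log-likelihood of one edit
    sequence, given as a list of (event_type, feature_vector) pairs."""
    log_p = 0.0
    for event_type, features in edit_sequence:
        gmm = event_models[event_type]
        log_p += gmm.score_samples(np.asarray([features], dtype=float))[0]
    return -log_p
```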


Fig. 8 Line drawing example: (a) original drawing of the letter A; (b) a distorted instance of the same letter with distortion parameter 0.5; (c) distortion parameter 1.0 [30]
Fig. 9 Comparison of performance with increasing strength of distortion in [30]

As mentioned above, smaller average indices correspond to better clustering, and this method yields a smaller average index for every sample set and every distortion level; it is thus confirmed that this method clearly leads to better clustering results than the SOM based algorithm, and the best average index value is obtained for the second-strongest distortion, even though the matching task becomes harder and harder with increasing distortion strength.

Although the SOM neural network can derive edit costs automatically and distinguish the relevant areas of the label space, edit costs derived from the probability distribution of edit operations are more effective for clustering distorted letters. The advantage of this method is that it can cope with large samples of graphs and strong distortions between samples of the same class. The key of this algorithm is evidently the probability distribution of edit events.

4.1.3 Method based on convolution graph kernel

Kernel methods are a new class of algorithms for pattern analysis based on statistical learning. When kernel functions are used to evaluate graph similarity, the graph matching problem can be formulated in an implicitly existing vector space, and statistical methods for pattern analysis can then be applied. In the algorithm based on the convolution graph kernel [41], a novel graph kernel function is proposed to compute the GED so as to avoid the lack of mathematical structure in the space of graphs.

For graphs G = (V, E, l, ν) and G' = (V', E', l', ν'), the cost of the node substitution u → u' replacing node u ∈ V by node u' ∈ V' is given by the radial basis function

$$K_{\mathrm{sim}}(u, u') = \exp\!\left(-\frac{\lVert l(u) - l'(u')\rVert^2}{2\sigma^2}\right). \qquad (4)$$

The same function with a different parameter σ is also used to evaluate the similarity of edge labels. These radial basis functions favor edit paths containing more substitutions and fewer insertions and deletions; hence, substitutions are modeled explicitly, while insertions and deletions are modeled implicitly.
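Equation (4) in code form; the label vectors and the bandwidth σ are placeholders chosen for the illustration.

```python
import numpy as np

def k_sim(label_u, label_v, sigma=1.0):
    """Radial basis similarity of two node (or edge) labels, as in Eq. (4)."""
    diff = np.asarray(label_u, dtype=float) - np.asarray(label_v, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# Identical labels give similarity 1; dissimilar labels decay towards 0,
# which favors substitutions over insertions and deletions in an edit path.
print(k_sim((0.0, 1.0), (0.0, 1.0)))   # -> 1.0
print(k_sim((0.0, 1.0), (3.0, -1.0)))  # -> ~0.0015
```
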
The set of edit decompositions (sequences consisting of all nodes and edges of a graph) of G is denoted by R⁻¹(G), and a function evaluating whether two edit decompositions are equivalent to a valid edit path is denoted by K_val. For x ∈ R⁻¹(G) and x' ∈ R⁻¹(G'), the function K_val is defined as follows:

$$K_{\mathrm{val}}(x, x') = \begin{cases} 1, & \text{if the edit path } (x, x') \text{ is valid} \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$

With these notations, the proposed edit kernel function on graphs can finally be written as

$$k(G, G') = \sum_{\substack{x \in R^{-1}(G) \\ x' \in R^{-1}(G')}} K_{\mathrm{val}}(x, x') \prod_i K_{\mathrm{sim}}\bigl((x)_i, (x')_i\bigr), \qquad (6)$$

where the index i runs over all nodes and edges present in the edit decomposition. In the computation of the kernel value k(G, G'), only valid edit paths are considered, with the help of the function K_val.

On the one hand, the convolution edit kernel based GED and support vector machines (EK-SVM) are brought together, and their classification performance is compared with that of the traditional edit distance combined with the k-nearest neighbor classifier [53] (ED-kNN). This experiment is conducted on the 15 letters that can be drawn with straight lines only, such as A, E and F. The distorted letter graphs are split into a training set of 150 graphs, a validation set of 150 graphs, and a test set of 750 graphs. The experimental results are shown in Fig. 10: the accuracy of both methods increases gradually with running time, and the convolution edit kernel based GED achieves a higher classification rate than the traditional edit distance for the same running time.

Fig. 10 Running time and accuracy of the proposed kernel function and edit distance in [41]


On the other hand, this method is compared with kernel functions derived directly from edit distance [54] (ED-SVM) and from random walks in graphs [55] (RW-SVM), respectively. The Letters dataset used in the previous experiment and an image dataset, split into a training set, a validation set, and a test set, each of size 54, are used in this experiment. The images are assigned to one of the classes snowy, countryside, city, people, and streets; they are described in detail in [56]. The classification accuracy of the four methods mentioned above is shown in Table 2. The EK-SVM method outperforms all other methods on the second dataset and achieves significantly higher classification accuracy than the traditional edit distance method. RW-SVM performs as well as EK-SVM on the first dataset, but significantly worse than all other methods on the second dataset. Overall, the convolution edit kernel performs best.

Table 2 Accuracy of two edit distance methods (ED), a random walk kernel (RW), and the proposed edit kernel (EK) in [41] (%)
  Method    Letter dataset   Image dataset
  ED-kNN    69.33            48.15
  ED-SVM    73.2             59.26
  RW-SVM    75.2             33.33
  EK-SVM    75.2             68.52

In a word, the convolution edit kernel based GED together with an SVM outperforms not only the combination of the traditional edit distance and kNN, but also the other kernel functions combined with an SVM, in classification. Unlike the traditional edit distance, this kernel function exploits statistical learning theory in the inner product space rather than in the graph space directly. The convolution edit kernel is defined by decomposing pairs of graphs into edit paths, so it is more closely related to GED than other kernel functions.
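For the classification experiments, a graph kernel such as k(G, G') can be supplied to an SVM as a precomputed Gram matrix. The sketch below shows this generic pattern with scikit-learn; the kernel_fn and the graph lists are placeholders, not the actual experimental setup of [41].

```python
import numpy as np
from sklearn.svm import SVC

def gram_matrix(graphs_a, graphs_b, kernel_fn):
    """Pairwise kernel values k(G, G') between two lists of graphs."""
    return np.array([[kernel_fn(ga, gb) for gb in graphs_b] for ga in graphs_a])

def train_and_predict(train_graphs, train_labels, test_graphs, kernel_fn):
    clf = SVC(kernel="precomputed")
    clf.fit(gram_matrix(train_graphs, train_graphs, kernel_fn), train_labels)
    # Test rows must contain kernel values against the training graphs.
    return clf.predict(gram_matrix(test_graphs, train_graphs, kernel_fn))
```
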
4.1.4 Method based on binary linear programming

The BLP based algorithm [43] is for graphs with vertex attributes only, and it introduces a framework for computing GED on a set of graphs. Every attributed graph in the set is treated as a subgraph of a larger graph referred to as the edit grid, and the edit operations converting one graph into another are equivalent to changing the state of the edit grid, from which the GED can be derived. With the help of the graph adjacency matrix, the problem can be treated as a BLP.

If the GED between graph G0 = (V0, E0, l0) and graph G1 = (V1, E1, l1) is to be computed, the graph G0 is first embedded in a labeled complete graph GX = (X, X × X, lX) such that
• graph G0 is a subgraph of graph GX,
• lX(xi) = ∅ for all nodes xi ∈ X − V0,
• lX((xi, xj)) = 0 for all edges (xi, xj) ∈ (X × X) − E0.
GX = (X, X × X, lX) is the edit grid, and its state vector is denoted by g ∈ (Σ ∪ ∅)^N × {0, 1}^{(N²−N)/2}, where Σ is the label alphabet of the nodes of G0 and N is the number of nodes of the edit grid.

Then, a sequence of edits used to convert graph G0 into graph G1 can be specified by a sequence of edit grid state vectors {g_k}, k = 0, …, M. The GED between G0 and G1 is the minimum cost of the state transitions of the edit grid, that is,

$$d_c(G_0, G_1) = \min_{\{g_k\}_{k=1}^{M},\; g_M \in C_1} \sum_{k=1}^{M} c(g_{k-1}, g_k) = \min_{\{g_k\}_{k=1}^{M},\; g_M \in C_1} \sum_{k=1}^{M}\sum_{i=1}^{I} c\bigl(g^{i}_{k-1}, g^{i}_{k}\bigr) = \min_{p \in P} \sum_{i=1}^{I} c\bigl(g^{i}_{0}, g^{p_i}_{1}\bigr), \qquad (7)$$

where I = N + (N² − N)/2, C1 is the set of state vectors corresponding to all isomorphisms of G1 on the edit grid, and P is the set of all permutation mappings for isomorphisms of the edit grid. A permutation maps element i of a set to another element p_i of the same set. By introducing the Kronecker delta function δ : ℝ² → {0, 1}, formula (7) is equivalent to formula (8):

$$d_c(G_0, G_1) = \min_{p \in P}\left[\,\sum_{i=1}^{N} c\bigl(g^{i}_{0}, g^{p_i}_{1}\bigr) + c(0, 1) \sum_{i=N+1}^{I} \Bigl(1 - \delta\bigl(g^{i}_{0}, g^{p_i}_{1}\bigr)\Bigr)\right]. \qquad (8)$$


Finally, the edit grid state vector g_k is represented with the adjacency matrix A_k, whose off-diagonal elements correspond to the edge labels in the state vector and whose rows (columns) are indexed by the node labels, that is,

$$A_k^{ij} = g_k^{\,iN + j - (i^2 + i)/2} \quad\text{and}\quad A_k^{ii} = g_k^{i}, \qquad 1 \le i, j \le N.$$

Formula (8) is then converted into formula (9):

$$\begin{aligned} d_c(G_0, G_1) = \min_{P, S, T \in \{0,1\}^{N \times N}} \;& \sum_{i=1}^{N}\sum_{j=1}^{N} \Bigl[ c\bigl(l(A_0^{i}),\, l(A_1^{j})\bigr)\, P_{ij} + \tfrac{1}{2}\, c(0, 1)\, (S + T)_{ij} \Bigr] \\ \text{s.t.}\;\;& (A_0 P - P A_1 + S - T)_{ij} = 0 \;\;\forall\, i, j, \qquad \sum_{i} P_{ik} = \sum_{j} P_{kj} = 1 \;\;\forall\, k, \end{aligned} \qquad (9)$$

where P_{ij} = δ(p_i, j), i, j ∈ [1, N], is a permutation matrix, and S and T are matrices introduced for the conversion. Formula (9) is a BLP, and the optimal permutation matrix P* obtained by solving it can be used to determine the optimal edit operations (a brute-force stand-in for this objective on small graphs is sketched below).

This method is tested on 135 similar molecules with 18 or fewer atoms from the Klotho Biochemical Compounds Declarative Database [57]. Ideally, the pairwise distances of all these molecules are the same. Two MCS-based distance metrics are used as references. The GED computed with this method is more concentrated than the MCS-based distances. Furthermore, classification performance is examined with the "classifier ratio", the ratio of the GED between a sample graph and the correct prototype to the distance between the sample and the nearest incorrect prototype. This method leads to the lowest classifier ratio, which indicates the least ambiguous classification.

As demonstrated above, this method tends to reduce the level of ambiguity in graph recognition, but the complexity of the BLP makes the computation of GED for large graphs difficult.
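For very small graphs, the objective behind Eqs. (8) and (9) can be evaluated by brute force instead of solving a BLP. The sketch below enumerates permutations of the edit-grid node slots, summing node relabeling costs plus a constant c(0, 1) for every edge slot occupied in exactly one of the two graphs after relabeling; it assumes unlabeled edges and is only a stand-in to make the objective concrete, not the method of [43].

```python
from itertools import permutations

def ged_bruteforce(node_labels_0, node_labels_1, edges_0, edges_1, c_node, c_edge):
    """Minimize over permutations p: sum_i c_node(l0[i], l1[p[i]]) plus c_edge for
    every edge slot present in exactly one graph after relabeling. Both graphs are
    assumed to be padded to the same slot count N (the edit grid)."""
    n = len(node_labels_0)
    edges_1_set = set(map(tuple, map(sorted, edges_1)))
    best = float("inf")
    for p in permutations(range(n)):
        cost = sum(c_node(node_labels_0[i], node_labels_1[p[i]]) for i in range(n))
        mapped_edges_0 = {tuple(sorted((p[i], p[j]))) for i, j in edges_0}
        cost += c_edge * len(mapped_edges_0 ^ edges_1_set)   # mismatched edge slots
        best = min(best, cost)
    return best

# Example: pad to N = 3 slots; None marks an empty (epsilon-labeled) slot.
c_node = lambda a, b: 0.0 if a == b else 1.0
print(ged_bruteforce(["C", "O", None], ["C", "N", None],
                     [(0, 1)], [(0, 1), (1, 2)], c_node, 0.5))  # -> 1.5
```
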
4.1.5 Method based on subgraph and supergraph

Concrete edit costs for GED are strongly application-dependent and cannot be obtained in a general way, so the subgraph and supergraph based method [42] was proposed. It uses a special kind of graph distance to approximate the edit distance and is totally independent of edit costs. The method is based on the conclusion that GED coincides with the MCS of two graphs under a certain cost function [20]. Let Ǧ and Ĝ be a MCS and a minimum common supergraph of G1 = (V1, E1, α1) and G2 = (V2, E2, α2). The distance between G1 and G2 is defined by

$$d(G_1, G_2) = |\hat{G}| - |\check{G}|,$$

where |Ĝ| is the number of nodes of the graph Ĝ, and |Ǧ| is defined analogously. A cost function C is defined as a vector of non-negative real functions

$$C = \bigl(c_{nd}(v),\, c_{ni}(v),\, c_{ns}(v_1, v_2),\, c_{ed}(e),\, c_{ei}(e),\, c_{es}(e_1, e_2)\bigr),$$

where v, v1 ∈ V1, e, e1 ∈ E1, v2 ∈ V2, e2 ∈ E2, and the components represent, in order, the costs of node deletion, node insertion, node substitution, edge deletion, edge insertion and edge substitution. If the cost function C is specified as C = (c, c, c_ns, c, c, c_es), where c is a constant function such that c_ns(v1, v2) > 2c and c_es(e1, e2) > 2c for all v1 ∈ V1 and v2 ∈ V2 with α1(v1) ≠ α2(v2), then the GED between G1 and G2 equals c · d(G1, G2); a worked numerical example is given below.

The construction of this method is simple, and the method does not rely on the fundamental graph edit operations, that is to say, it is independent of cost functions.

The first four algorithms take different approaches to defining cost functions, and they have proved potent for classifying or clustering specific kinds of images; they are therefore limited to specific data. The last method has fewer limitations and can be used for general attributed graphs; however, it relies on the search for a MCS and a minimum common supergraph, which is also difficult to implement in practice.
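A short worked example of this relation, using the standard identity |Ĝ| = |G1| + |G2| − |Ǧ| for a minimum common supergraph (the concrete sizes below are assumed for illustration): if |G1| = 5, |G2| = 6 and a maximum common subgraph has |Ǧ| = 4 nodes, then |Ĝ| = 7 and

$$d(G_1, G_2) = |\hat{G}| - |\check{G}| = 7 - 4 = 3, \qquad \mathrm{GED}(G_1, G_2) = c \cdot d(G_1, G_2) = 3c.$$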


4.2 GED for non-attributed graphs

For non-attributed graphs, which only carry connectivity information, GED algorithms [31, 35] usually include two parts: conversion of the graphs to strings and computation of an edit distance between strings [58–60]. In particular, a structural graph may be a tree. Although trees can be viewed as a special kind of graph, the specific characteristics of trees suggest that posing the tree matching problem as a variant of graph matching is not the best approach. In particular, both the tree isomorphism and the subtree isomorphism problems can be solved in polynomial time, which is more efficient than for general graphs. The similarity of labeled trees is compared in [61] by various methods, in which the definitions of the cost functions are given in advance. In this paper, specific methods for the non-attributed tree matching problem are summarized. The tree edit distance (TED) can be obtained by searching for maximum weight cliques [62, 63] or by embedding trees into a pattern space by constructing a super-tree [64]; these methods are presented separately from those for general graphs.

4.2.1 Tree edit distance

4.2.1.1 Maximum weight cliques based method  The edit distance of unordered trees still presents a computational bottleneck; therefore, the computation of the unordered TED should be approximated efficiently. Bunke's idea of the equivalence of MCS and edit distance computation has been applied to the GED [42], and it can also be extended to the TED [62, 63, 65]. In these algorithms, there is a strong connection between the computation of the maximum common subtree and the TED, and the search for the maximum common subtree is transformed into finding a maximum weight clique; the computation of the TED is thus converted into a series of maximum weight clique problems, as illustrated in Fig. 11.

As with graphs, the data tree needs to be converted into the model tree. Under the constraint [66] that the cost of deleting and reinserting the same element with a different label is not greater than the cost of relabeling it, node substitution can be replaced by node removal and insertion on the data tree. The cost of node insertion on the data tree is dual to that of node removal on the model tree, so the operations to be performed are further reduced to node removal on both trees, which makes the optimal matching completely determined by the subset of nodes left after the minimum edit sequence. The edit distance problem is thus equal to a particular substructure isomorphism problem.

Given two directed acyclic graphs (DAGs) t and t' to be matched, the transitive closures ℓ(t) and ℓ(t') are calculated. A tree ť is an obtainable subtree of ℓ(t) if and only if ť is generated from the tree t by a sequence of node removal operations. The minimum cost edited tree isomorphism between t and t' is a maximum common obtainable subtree of ℓ(t) and ℓ(t').

Then a maximum common obtainable subtree of the two trees ℓ(t) and ℓ(t') is searched for to induce the optimal matches, which can be transformed into computing a maximum weight clique. It is a quadratic programming problem:
• The objective function is

$$\min_{x}\; x^{T} C x, \quad \text{s.t. } x \in \Delta, \quad \text{where } C = (c_{ij})_{i, j \in V}; \qquad (10)$$


The least cost between pairwise strings is determined with Dijkstra's algorithm.

Based on the observation that the adjacency matrix is associated with a Markov chain, the transition matrix of the Markov chain is the normalized adjacency matrix of a graph G = (V, E), where V = {1, 2, …, N}. Its leading eigenvector gives the node sequence of the steady-state random walk on the graph, so that the graph is converted into a string and the global structural properties of the graph are characterized. The procedure is as below (a sketch of this conversion follows the list):
1. The adjacency matrix A of the graph is defined;
2. A transition probability matrix P is defined as

$$P(i, j) = A(i, j) \Big/ \sum_{j \in V} A(i, j); \qquad (12)$$

3. The matrix P is converted into a symmetric form for an eigenvector expansion. The diagonal degree matrix D is computed, with elements

$$D(i, j) = \begin{cases} 1/d(i) = 1\Big/\sum_{j=1}^{|V|} A(i, j), & i = j \\ 0, & \text{otherwise}; \end{cases} \qquad (13)$$

the symmetric version of the matrix P is then W = D^{1/2} A D^{1/2};
4. The spectral analysis of the symmetric transition matrix W is

$$W = \sum_{i=1}^{|V|} \lambda_i\, \phi_i \phi_i^{T}, \qquad (14)$$

where λ_i is an eigenvalue of W and φ_i is the corresponding eigenvector of unit length;
5. The leading eigenvector φ* gives the sequence of nodes in an iterative procedure; at each iteration k, a list L_k denotes the nodes visited:
• In the first step, let L_1 = {j_1}, where j_1 = arg max_j φ*(j), and the set of neighbors of j_1 is N_{j_1} = {m | (j_1, m) ∈ E};
• In the second step, the node j_2 satisfying j_2 = arg max_{j ∈ N_{j_1}} φ*(j) is found to form L_2 = {j_1, j_2}, and the set of neighbors N_{j_2} = {m | (j_2, m) ∈ E ∧ m ≠ j_1} of j_2 is searched;
• In the kth step, the node visited is j_k and the list of nodes visited is L_k. The set N_{j_k} = {m | (j_k, m) ∈ E} consists of the neighbors of j_k, and in the (k+1)th step the node j_{k+1} satisfying j_{k+1} = arg max_{j ∈ C_k} φ*(j) is chosen, where C_k = {j | j ∈ N_{j_k} ∧ j ∉ L_k};
• The step counter is incremented, k = k + 1;
• The last two steps are repeated until every node of the graph has been traversed.
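A compact sketch of steps 1–5 with numpy: build the symmetrized transition matrix, take its leading eigenvector, and walk the graph greedily in decreasing eigenvector order, restricted to unvisited neighbors. The fallback to the best remaining node when the walk has no unvisited neighbor is an assumption of this sketch, not specified above.

```python
import numpy as np

def graph_to_string(A):
    """A: symmetric adjacency matrix with no isolated nodes. Returns a node ordering."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    deg = A.sum(axis=1)                                   # d(i)
    D_half = np.diag(1.0 / np.sqrt(deg))                  # D^{1/2} with D = diag(1/d(i))
    W = D_half @ A @ D_half                               # symmetric transition matrix
    vals, vecs = np.linalg.eigh(W)                        # Eq. (14): spectral decomposition
    phi = np.abs(vecs[:, np.argmax(vals)])                # leading eigenvector phi*
    order = [int(np.argmax(phi))]                         # L_1 = {j_1}
    while len(order) < n:
        last = order[-1]
        cand = [j for j in range(n) if A[last, j] > 0 and j not in order]
        if not cand:                                      # stuck: fall back to best unvisited node
            cand = [j for j in range(n) if j not in order]
        order.append(max(cand, key=lambda j: phi[j]))     # argmax of phi* over C_k
    return order

# A 4-node path graph 0-1-2-3:
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(graph_to_string(A))
```
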
Given the model graph $G_M = (V_M, E_M)$ and the data graph $G_D = (V_D, E_D)$ whose GED is to be computed, the strings of these two graphs are determined by the procedure above. The model-graph string is denoted by $X = \{x_1, x_2, \ldots, x_{|V_M|}\}$ and the data-graph string by $Y = \{y_1, y_2, \ldots, y_{|V_D|}\}$. A lattice is constructed whose rows are indexed by the data-graph string and whose columns are indexed by the model-graph string. An edit path can be found that transforms the data-graph string into the model-graph string; it is denoted by $C = \langle c_1, c_2, \ldots, c_k, \ldots, c_L \rangle$ and its elements are Cartesian pairs $c_k \in (V_D \cup \varepsilon) \times (V_M \cup \varepsilon)$, where $\varepsilon$ denotes the null symbol. The path is constrained to be connected on the edit lattice. A diagonal transition corresponds to the match of an edge of the data graph to an edge of the model graph. A horizontal transition corresponds to the case where the traversed nodes of the model graph have no matched nodes in the data graph; similarly, when a vertical transition is made, the traversed nodes of the data graph have no matched nodes in the model graph. The cost of the edit path is the sum of the costs of the elementary edit operations:

$$C(C) = \sum_{c_k \in C} g(c_k \rightarrow c_{k+1}), \tag{15}$$

where $g(c_k \rightarrow c_{k+1}) = -\ln P(c_k \rightarrow c_{k+1})$ is the cost of the transition from state $c_k = (a, b)$ to $c_{k+1} = (c, d)$. The probability $P(c_k \rightarrow c_{k+1})$ is defined as

$$P(c_k \rightarrow c_{k+1}) = \beta_{a,b} \, \beta_{c,d} \, R_D(a, c) \, R_M(b, d),$$

where $\beta_{a,b}$ and $\beta_{c,d}$ are the morphological affinities,

$$R_D(a,c) = \begin{cases} P_D(a,c) & \text{if } (a,c) \in E_D\\[2pt] \dfrac{2\,\big|\,|V_M| - |V_D|\,\big|}{|V_M| + |V_D|} & \text{if } a = \varepsilon \text{ or } c = \varepsilon\\[2pt] 0 & \text{otherwise,} \end{cases}$$

$P_D$ is the transition probability matrix of the data graph $G_D$, and $R_M(b, d)$ is defined analogously to $R_D(a, c)$. The optimal edit path is the one with minimum cost, that is, $C^* = \arg\min_{C} C(C)$. The problem of computing the GED is thus posed as finding the shortest path through the lattice with Dijkstra's algorithm, and the GED between the two graphs is $C(C^*)$. This is relatively preliminary work on applying the eigenstructure of the graph adjacency matrix to graph matching, and it was later improved into a method set in a probabilistic framework.
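The shortest-path view of the alignment can be sketched as follows. The state (i, j) records how many symbols of the two strings have been consumed, and `trans_cost` is only a placeholder for the -ln P(c_k → c_{k+1}) costs defined above (replaced here by unit costs purely for illustration).

```python
import heapq

def lattice_edit_distance(X, Y, trans_cost):
    """Minimum-cost path from (0, 0) to (len(X), len(Y)) on the edit lattice.
    X is the data-graph string (rows), Y is the model-graph string (columns);
    state (i, j) means i symbols of X and j symbols of Y have been consumed."""
    start, goal = (0, 0), (len(X), len(Y))
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            return d
        if d > dist.get((i, j), float("inf")):
            continue                                                 # stale heap entry
        moves = []
        if i < len(X) and j < len(Y):
            moves.append(((i + 1, j + 1), trans_cost(X[i], Y[j])))   # diagonal: match x_i with y_j
        if i < len(X):
            moves.append(((i + 1, j), trans_cost(X[i], None)))       # vertical: data node unmatched
        if j < len(Y):
            moves.append(((i, j + 1), trans_cost(None, Y[j])))       # horizontal: model node unmatched
        for state, c in moves:
            nd = d + c
            if nd < dist.get(state, float("inf")):
                dist[state] = nd
                heapq.heappush(heap, (nd, state))
    return float("inf")

# toy usage with a unit-cost placeholder (classical Levenshtein behaviour)
cost = lambda a, b: 0.0 if a == b else 1.0
print(lattice_edit_distance("abcd", "abd", cost))                    # -> 1.0
```

Because the lattice is acyclic with non-negative costs, a dynamic-programming sweep would work equally well; Dijkstra's algorithm is used here simply to mirror the formulation above.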


4.2.2.2 MAP based method

The idea of the MAP estimation based algorithm [31, 59] is developed from the Dijkstra's algorithm based method [35]. The two methods differ in how the strings are established and matched, and their edit costs are related to different features; these differences are summarized in Table 3.

In the MAP based algorithm, graphs are converted into strings with a graph spectral method according to the leading eigenvectors of their adjacency matrices. As in the Dijkstra's algorithm based method, the GED is the cost of the least expensive edit path $C^*$, but the path $C^*$ is found based on the idea of the Levenshtein distance in a probabilistic framework. The cost of the elementary edit operations is defined as

$$g(c_k \rightarrow c_{k+1}) = -\ln P\big(c_k \mid \phi_X(x_j), \phi_Y(y_i)\big) - \ln P\big(c_{k+1} \mid \phi_X(x_{j+1}), \phi_Y(y_{i+1})\big) - \ln R_{k,k+1}, \tag{16}$$

where

$$P\big(c_k \mid \phi_X(x_j), \phi_Y(y_i)\big) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\Big\{-\dfrac{1}{2\sigma^2}\big(\phi_X(x_j) - \phi_Y(y_i)\big)^2\Big\} & \text{if } x_j \neq \varepsilon \text{ and } y_i \neq \varepsilon\\[4pt] \alpha & \text{if } x_j = \varepsilon \text{ or } y_i = \varepsilon, \end{cases}$$

and the edge compatibility coefficient $R_{k,k+1}$ is

$$R_{k,k+1} = \frac{P(c_k, c_{k+1})}{P(c_k)\,P(c_{k+1})} = \begin{cases} \rho_M\,\rho_D & \text{if } c_k \rightarrow c_{k+1} \text{ is a diagonal transition on the edit lattice}\\ \rho_M & \text{if } y_i = \varepsilon \text{ or } y_{i+1} = \varepsilon, \text{ and } (x_j, x_{j+1}) \in E_M\\ \rho_D & \text{if } x_j = \varepsilon \text{ or } x_{j+1} = \varepsilon, \text{ and } (y_i, y_{i+1}) \in E_D\\ 1 & \text{if } (y_i = \varepsilon \text{ or } y_{i+1} = \varepsilon) \text{ and } (x_j = \varepsilon \text{ or } x_{j+1} = \varepsilon), \end{cases}$$

where $\rho_M$ and $\rho_D$ are related to the edge densities of the model and data graphs.
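A minimal sketch of the transition cost of Eq. (16), assuming scalar eigenvector components and illustrative values for σ, α and the edge-density terms; the edge-compatibility coefficient R is passed in directly rather than derived from the full case analysis above.

```python
import math

SIGMA = 0.1     # assumed width of the Gaussian matching density
ALPHA = 0.05    # assumed probability assigned to null (insertion/deletion) states

def state_prob(phi_x, phi_y):
    """P(c_k | phi_X(x_j), phi_Y(y_i)): Gaussian in the difference of the
    eigenvector components, or a constant alpha when either symbol is null."""
    if phi_x is None or phi_y is None:          # null symbol epsilon
        return ALPHA
    diff = phi_x - phi_y
    return (1.0 / (math.sqrt(2.0 * math.pi) * SIGMA)) * math.exp(-diff * diff / (2.0 * SIGMA ** 2))

def transition_cost(phi_xj, phi_yi, phi_xj1, phi_yi1, R):
    """g(c_k -> c_{k+1}) = -ln P(c_k | .) - ln P(c_{k+1} | .) - ln R_{k,k+1}  (Eq. 16).
    R is the edge-compatibility coefficient, e.g. rho_M * rho_D for a diagonal move."""
    return (-math.log(state_prob(phi_xj, phi_yi))
            - math.log(state_prob(phi_xj1, phi_yi1))
            - math.log(R))

# toy usage: a diagonal transition between two well-matched node pairs
rho_M, rho_D = 0.3, 0.25                         # assumed edge densities of the two graphs
print(transition_cost(0.42, 0.40, 0.31, 0.33, rho_M * rho_D))
```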
Given images from three sequences, the CMU-VASC sequence [67], the INRIA MOVI sequence [68], and a sequence of views of a model Swiss chalet, their GED matrix is computed with this method; the result is shown in Fig. 12. Each element of the matrix specifies the color of a rectilinear patch in Fig. 12, and a deeper color corresponds to a smaller distance. The patches constitute nine blocks: coordinates 1–10, 11–20 and 21–30 correspond to the CMU-VASC, INRIA MOVI and Swiss chalet sequences, respectively, so the blocks along the diagonal present within-class distances and the other blocks present between-class distances. In each block, the row and column indexes increase monotonically with the viewing angle of the sequence. The color of the diagonal blocks is deeper than that of the other areas, so it is clear that the GED within a class is on the whole lower than that between classes.

Fig. 12 GED matrix in [31]

Table 3 Comparison of the MAP based algorithm and the Dijkstra's algorithm based method
Establishment of the serial ordering: the MAP based algorithm uses the leading eigenvector of the graph adjacency matrix; the Dijkstra's algorithm based method uses the leading eigenvector of the normalized graph adjacency matrix.
String matching: the MAP based algorithm uses a MAP alignment of the strings of pairwise graphs; the Dijkstra's algorithm based method searches for the optimal edit sequence using Dijkstra's algorithm.
Edit costs: in the MAP based algorithm they are related to the edge density of the two graphs; in the Dijkstra's algorithm based method they are related to the degree and adjacency of nodes.

Compared with the Dijkstra's algorithm based method, this method has two advantages: when graphs are converted into strings, the adjacency matrix need not be normalized, which decreases the computational complexity; and when strings are matched, the computation of the minimal edit distance is cast in a probabilistic setting, so statistical models can be used for the cost definition.

4.2.2.3 String kernel based method

String kernels can be used to measure the similarity of seriated graphs, which makes the computation of GED more efficient. In the string kernel based algorithm [58], graphs are seriated into strings with semidefinite programming (SDP), whose steps are given below.


• Let B be $X^{1/2} A X^{-1/2}$ and y be $X^{1/2} x^*$, where

$$X = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0\\ 0 & 2 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 2 & 0\\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix},$$

A is the adjacency matrix of the graph to be converted into a string, and $x^*$ denotes the value to be solved for. With $Y^* = y y^{T}$, the SDP is

$$\arg\min_{Y^*} \; \operatorname{trace}(B Y^*) \quad \text{such that } \operatorname{trace}(E Y^*) = 1,$$

where E is the unit matrix. The matrix $Y^*$ can be solved with the method in [69] so as to obtain $x^*$.
• Similar to the idea of converting a graph into a string with the leading eigenvector in [35], the graph is converted into a string according to the vector $x^*$.

With the strings obtained, a kernel feature is used to represent the number of times a substring occurs in a string, weighted by the length of the substring. The elements of the kernel feature vector of a string correspond to substrings. The inner product of the kernel feature vectors of two strings is called the string kernel function; it gives the sum of the frequencies of all common substrings, weighted by their lengths. The string kernel works on the idea that two strings are more similar if they share more common substrings.

The COIL image database [70] is used to evaluate this method. Six objects are selected from the database, and each object has 20 different views. The distance between images belonging to the same class is much smaller than that between images of different classes, so images corresponding to different objects can be clustered well.

The SDP overcomes the local optimality of the graph spectral method used in the Dijkstra's algorithm based and MAP based methods, and the string kernel function is more efficient than aligning strings with Dijkstra's algorithm.

4.2.2.4 Subgraph based method

Because of the potentially exponential complexity of the general inexact graph matching problem, it is decomposed into a series of simpler subgraph matching problems [60]. A graph G = (V, E) is partitioned into non-overlapping super cliques according to the Fiedler vector (a sketch of this partitioning step is given after the matching description below):

• The list $C = \{j_1, j_2, \ldots, j_{|V|}\}$ is the node rank-order determined by the conditions that the permutation satisfies $\pi(j_1) < \pi(j_2) < \cdots < \pi(j_{|V|})$ and the components of the Fiedler vector satisfy $x_{j_1} > x_{j_2} > \cdots > x_{j_{|V|}}$. The weight assigned to node $i \in V$ is $w_i = \operatorname{rank}(\pi(i))$, and a significance score $S_i$ for node i being a center node is computed from the degree and weight of node i;
• The list C is traversed until a node k is found that is not in the perimeter and whose score $S_k$ exceeds those of its neighbors. Node k and its neighborhood $N_k$ constitute a super clique and are deleted from the list C, that is, $C = C - (\{k\} \cup N_k)$. This procedure is repeated until $C = \emptyset$, at which point the non-overlapping neighborhoods of the graph G have been located.

With the super cliques in hand, a graph G′ containing the super cliques of the original graph G is constructed, in which the nodes denote the super cliques and the edges indicate whether these super cliques are connected in the original graph. Such graphs are matched based on the matching of the super clique sets, that is, super clique-to-super clique matching, which is computed by converting the super cliques into strings based on the cyclic permutations of the peripheral nodes about the center nodes and taking the Levenshtein distance between the strings.
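A minimal sketch of the Fiedler-vector-driven partition into super cliques, under simplifying assumptions: the significance score is taken here as a degree-weighted rank (a stand-in for the score S_i described above), and the perimeter test is omitted.

```python
import numpy as np

def super_clique_partition(A):
    """Partitions a graph (adjacency matrix A) into non-overlapping
    'super cliques': a centre node plus its still-unassigned neighbours."""
    n = A.shape[0]
    d = A.sum(axis=1)
    L = np.diag(d) - A                              # graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                            # eigenvector of the second-smallest eigenvalue

    order = list(np.argsort(-fiedler))              # rank-order of nodes by Fiedler component
    rank = {node: r for r, node in enumerate(order)}
    score = {i: d[i] * (n - rank[i]) for i in range(n)}   # assumed significance score

    remaining = list(order)
    cliques = []
    while remaining:
        # centre = remaining node whose score is no smaller than its remaining neighbours'
        centre = next((k for k in remaining
                       if all(score[k] >= score[m]
                              for m in remaining if A[k, m] > 0)),
                      remaining[0])
        members = [centre] + [m for m in remaining if A[centre, m] > 0]
        cliques.append(members)
        remaining = [v for v in remaining if v not in members]
    return cliques

# toy example: two triangles joined by one edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(super_clique_partition(A))
```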
This method partitions a graph into subgraphs, and therefore the process may be cast into a hierarchical framework and is suitable for parallel computation.

Trees, as a special kind of graph, have some attributes superior to general graphs, and TED can be computed without defining cost functions; this has been applied to shape classification [71], with the shock tree as a structural representation of 2D shape. The key issue of GED algorithms for general graphs, however, is still the definition of cost functions. Each method defines cost functions in a task-specific way, from heuristics of the problem domain, in a trial-and-error fashion, and further research is still needed to derive cost functions in a general way.


5 Conclusion

Graph edit distance is a flexible, error-tolerant mechanism for measuring the distance between two graphs and has been widely applied to pattern recognition and image retrieval. The research on GED is studied and surveyed in this paper: existing GED algorithms are categorized and presented in detail, and their advantages and disadvantages are uncovered by comparing them experimentally and theoretically. Although the research has continued for several decades and yielded substantial results, there are few robust algorithms suitable for all kinds of graphs, and several problems deserve future research.

1. In the computation of GED, how to compare the similarity of corresponding nodes and edges in two graphs is still not well solved. For attributed graphs, the attributes of nodes and edges can be used to compare similarity, but which attributes should be adopted and are available for computing the distance remains an open problem. For non-attributed graphs, the connectivity of the graph can be used to compare similarity, but how to characterize the connectivity so as to achieve a better evaluation of similarity remains unsolved.

2. The definition of costs for edit operations is also important for GED, since it directly affects the rationality of GED. Existing research on GED mainly focuses on this problem, but each definition is suitable only for limited applications or under certain constraints, so cost definitions that can be applied widely and easily are needed.

3. Many ways of searching for the least expensive edit sequence have been used previously. The search strategy should be consistent with the method of similarity comparison and the definition of the edit costs, rather than simply being the theoretically best one. An appropriate search strategy for the minimum-cost edit sequence should therefore be studied to improve both the efficiency and the accuracy of GED algorithms.

Acknowledgments We thank the anonymous reviewers for their helpful comments and suggestions. This research was partially supported by the National Science Foundation of China (60771068, 60702061, 60832005), the Open-End Fund of the National Laboratory of Pattern Recognition in China and the National Laboratory of Automatic Target Recognition, Shenzhen University, China, and the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT0645).

References

1. Umeyama S (1988) An eigendecomposition approach to weighted graph matching problems. IEEE Trans Pattern Anal Mach Intell 10(5):695–703
2. Bunke H (2000) Recent developments in graph matching. In: Proceedings of IEEE international conference on pattern recognition, Barcelona, pp 117–124
3. Caelli T, Kosinov S (2004) An eigenspace projection clustering method for inexact graph matching. IEEE Trans Pattern Anal Mach Intell 26(4):515–519
4. Cross ADJ, Wilson RC, Hancock ER (1997) Inexact graph matching using genetic search. Pattern Recognit 30(7):953–970
5. Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173
6. Pavlidis JRT (1994) A shape analysis model with applications to a character recognition system. IEEE Trans Pattern Anal Mach Intell 16(4):393–404
7. Wang Y-K, Fan K-C, Horng J-T (1997) Genetic-based search for error-correcting graph isomorphism. IEEE Trans Syst Man Cybern B Cybern 27(4):588–597
8. Sebastian TB, Klien P, Kimia BB (2004) Recognition of shapes by editing their shock graphs. IEEE Trans Pattern Anal Mach Intell 26(5):550–571
9. He L, Han CY, Wee WG (2006) Object recognition and recovery by skeleton graph matching. In: Proceedings of IEEE international conference on multimedia and expo, Toronto, pp 993–996
10. Shearer K, Bunke H, Venkatesh S (2001) Video indexing and similarity retrieval by largest common subgraph detection using decision trees. Pattern Recognit 34(5):1075–1091
11. Lee J (2006) A graph-based approach for modeling and indexing video data. In: Proceedings of IEEE international symposium on multimedia, San Diego, pp 348–355
12. Tao D, Tang X (2004) Nonparametric discriminant analysis in relevance feedback for content-based image retrieval. In: Proceedings of IEEE international conference on pattern recognition, Cambridge, pp 1013–1016
13. Tao D, Tang X, Li X et al (2006) Kernel direct biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm. IEEE Trans Multimedia 8(4):716–727
14. Tao D, Tang X, Li X (2008) Which components are important for interactive image searching? IEEE Trans Circuits Syst Video Technol 18(1):1–11
15. Christmas WJ, Kittler J, Petrou M (1995) Structural matching in computer vision using probabilistic relaxation. IEEE Trans Pattern Anal Mach Intell 17(8):749–764
16. Gao X, Zhong J, Tao D et al (2008) Local face sketch synthesis learning. Neurocomputing 71(10–12):1921–1930
17. Sanfeliu A, Fu KS (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern 13(3):353–362
18. Messmer BT, Bunke H (1994) Efficient error-tolerant subgraph isomorphism detection. Shape Struct Pattern Recognit, pp 231–240
19. Messmer BT, Bunke H (1998) A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans Pattern Anal Mach Intell 20(5):493–504
20. Bunke H (1997) On a relation between graph edit distance and maximum common subgraph. Pattern Recognit Lett 18(8):689–694
21. Bunke H (1999) Error correcting graph matching: on the influence of the underlying cost function. IEEE Trans Pattern Anal Mach Intell 21(9):917–922


22. Shasha D, Zhang K (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262
23. Zhang K (1996) A constrained edit distance between unordered labeled trees. Algorithmica 15(3):205–222
24. Myers R, Wilson RC, Hancock ER (2000) Bayesian graph edit distance. IEEE Trans Pattern Anal Mach Intell 22(6):628–635
25. Wei J (2004) Markov edit distance. IEEE Trans Pattern Anal Mach Intell 26(3):311–321
26. Marzal A, Vidal E (1993) Computation of normalized edit distance and applications. IEEE Trans Pattern Anal Mach Intell 15(9):926–932
27. Myers R, Wilson RC, Hancock ER (1998) Efficient relational matching with local edit distance. In: Proceedings of IEEE international conference on pattern recognition, Brisbane, pp 1711–1714
28. Wilson RC, Hancock ER (1997) Structural matching by discrete relaxation. IEEE Trans Pattern Anal Mach Intell 19(6):634–648
29. Levenshtein V (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10(8):707–710
30. Neuhaus M, Bunke H (2004) A probabilistic approach to learning costs for graph edit distance. In: Proceedings of IEEE international conference on pattern recognition, Cambridge, pp 389–393
31. Robles-Kelly A, Hancock ER (2005) Graph edit distance from spectral seriation. IEEE Trans Pattern Anal Mach Intell 27(3):365–378
32. Xiao B, Gao X, Tao D et al (2008) HMM-based graph edit distance for image indexing. Int J Imag Syst Tech 18(2–3):209–218
33. Gao X, Xiao B, Tao D et al (2008) Image categorization: graph edit distance + edge direction histogram. Pattern Recognit 47(10):3179–3191
34. Neuhaus M, Bunke H (2005) Self-organizing maps for learning the edit costs in graph matching. IEEE Trans Syst Man Cybern B Cybern 35(3):503–514
35. Robles-Kelly A, Hancock ER (2004) String edit distance, random walks and graph matching. Int J Pattern Recogn Artif Intell 18(3):315–327
36. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38
37. Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271
38. Azcarraga AP, Hsieh M-H, Pan SL et al (2005) Extracting salient dimensions for automatic SOM labeling. IEEE Trans Syst Man Cybern C Appl Rev 35(4):595–600
39. Kohonen T (1995) Self organizing maps. Springer, New York
40. Bhattacharyya S, Dutta P, Maulik U (2007) Binary object extraction using bi-directional self-organizing neural network (BDSONN) architecture with fuzzy context sensitive thresholding. Pattern Anal Appl 10(4):345–360
41. Neuhaus M, Bunke H (2006) A convolution edit kernel for error-tolerant graph matching. In: Proceedings of IEEE international conference on pattern recognition, Hong Kong, pp 220–223
42. Fernández M-L, Valiente G (2001) A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognit Lett 22(6–7):753–758
43. Justice D, Hero A (2006) A binary linear programming formulation of the graph edit distance. IEEE Trans Pattern Anal Mach Intell 28(8):1200–1214
44. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1(2):224–227
45. Dunn JC (1974) Well-separated clusters and optimal fuzzy partitions. J Cybern 4:95–104
46. Hubert LJ, Schultz JV (1976) Quadratic assignment as a general data analysis strategy. Br J Math Stat Psychol 29:190–241
47. Goodman LA, Kruskal WH (1954) Measures of association for cross classification. J Am Stat Assoc 49:732–764
48. Calinski T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat 3(1):1–27
49. Rand W (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
50. Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs
51. Fowlkes EB, Mallows CL (1983) A method for comparing two hierarchical clusterings. J Am Stat Assoc 78:553–584
52. Ristad E, Yianilos P (1998) Learning string edit distance. IEEE Trans Pattern Anal Mach Intell 20(5):522–532
53. García V, Mollineda RA, Sánchez JS (2008) On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Anal Appl 11(3–4):269–280
54. Neuhaus M, Bunke H (2005) Edit distance based kernel functions for attributed graph matching. In: Proceedings of 5th international workshop on graph-based representations in pattern recognition, Poitiers, pp 352–361
55. Artner TG, Flach P, Wrobel S (2003) On graph kernels: hardness results and efficient alternatives. In: Proceedings of 16th annual conference on learning theory, Washington, pp 129–143
56. Saux BL, Bunke H (2005) Feature selection for graph-based image classifiers. In: Proceedings of 2nd Iberian conference on pattern recognition and image analysis, Estoril, pp 147–154
57. Dunford-Shore B, Sulaman W, Feng B et al (2002) Klotho: biochemical compounds declarative database. http://www.biocheminfo.org/klotho/
58. Yu H, Hancock ER (2006) String kernels for matching seriated graphs. In: Proceedings of IEEE international conference on pattern recognition, Hong Kong, pp 224–228
59. Robles-Kelly A, Hancock ER (2003) Edit distance from graph spectra. In: Proceedings of IEEE international conference on computer vision, Nice, pp 234–241
60. Qiu HJ, Hancock ER (2006) Graph matching and clustering using spectral partitions. Pattern Recognit 39(1):22–34
61. Bille P (2005) A survey on tree edit distance and related problems. Theor Comput Sci 337(1–3):217–239
62. Torsello A, Robles-Kelly A, Hancock ER (2007) Discovering shape classes using tree edit-distance and pairwise clustering. Int J Comput Vis 72(3):259–285
63. Torsello A, Hancock ER (2003) Computing approximate tree edit distance using relaxation labeling. Pattern Recognit Lett 24(8):1089–1097
64. Torsello A, Hancock ER (2007) Graph embedding using tree edit-union. Pattern Recognit 40(5):1393–1405
65. Torsello A, Hancock ER (2001) Efficiently computing weighted tree edit distance using relaxation labeling. In: Proceedings of energy minimization methods in computer vision and pattern recognition. Springer, Sophia Antipolis, pp 438–453
66. Bunke H, Kandel A (2000) Mean and maximum common subgraph of two graphs. Pattern Recognit Lett 21(2):163–168
67. Vision and Autonomous Systems Center's Image Database. http://vasc.ri.cmu.edu//idb/html/motion/house/index.html
68. INRIA-MOVI houses. http://www.inria.fr/recherche/equipes/movi.en.html
69. Fujisawa K, Futakata Y, Kojima M et al (2008) SDPA-M user's manual. http://sdpa.is.titech.ac.jp/SDPA-M
70. Columbia Object Image Library. http://www1.cs.columbia.edu/CAVE/software/softlib/coil-20.php
71. Torsello A, Robles-Kelly A, Hancock ER (2007) Discovering shape classes using tree edit-distance and pairwise clustering. Int J Comput Vis 72(3):259–285


Author Biographies

Xinbo Gao received his Ph.D. degree in signal and information processing in 1999 at Xidian University, Xi'an, China. He joined Xidian University in 2001, where he is currently a Full Professor at the School of Electronic Engineering and Director of the VIPS Lab. His research interests include computational intelligence, machine learning, visual information processing and analysis, and pattern recognition. In these areas, he has published four books and over 100 technical articles in refereed journals and proceedings. He is on the editorial boards of several journals, including the EURASIP Signal Processing Journal. He has served as general chair/co-chair, program committee chair/co-chair, or PC member for around 30 major international conferences. He is a senior member of IEEE and IET.

Bing Xiao received the B.S. degree in Computer Science and Technology and the M.Eng. degree in Computer Software and Theory from Shaanxi Normal University, Xi'an, China, in 2003 and 2006, respectively. Since August 2006, she has been working toward the Ph.D. degree in Intelligent Information Processing at Xidian University, Xi'an, China. Her research interests include pattern recognition and computer vision.

Dacheng Tao received the B.Eng. degree from USTC, the MPhil degree from CUHK, and the PhD degree from the University of London. Currently, he is a Nanyang Assistant Professor at Nanyang Technological University, a Visiting Professor at Xidian University, a Guest Professor at Wuhan University, and a Visiting Research Fellow at the University of London. His research mainly involves applying statistics and mathematics to data analysis problems in computer vision, data mining, and machine learning. He has published more than 90 scientific papers in IEEE T-PAMI, T-KDE, and T-IP, with best paper awards. He is an associate editor of IEEE T-KDE, Neurocomputing (Elsevier), and CSDA (Elsevier).

Xuelong Li holds a permanent academic post at Birkbeck College, University of London. He is also a visiting/guest professor at Tianjin University and USTC. His research focuses on cognitive computing, image/video processing, pattern recognition, and multimedia. He has 140 publications, with several Best Paper Awards and finalists. He is an author/editor of four books and an associate editor of Pattern Analysis and Applications (Springer), four IEEE transactions, and ten other journals.
