Graph Indexing Algorithms

Jaume Baixeries
UQàM
2008

Preliminaries
We assume that the graphs we are dealing with are UNDIRECTED and VERTEX LABELED.

Subgraph Isomorphism
(aka: graph containment problem)
There is a SUBGRAPH isomorphism between graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ if there is a bijection $f: V_1 \to V_2'$, where $V_2' \subseteq V_2$, such that:
$\forall x, y \in V_1: (x, y) \in E_1 \iff (f(x), f(y)) \in E_2$, and
$\forall x \in V_1: l(x) = l(f(x))$.
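To make the definition concrete, here is a minimal brute-force sketch in Python. The graph representation and the names `make_graph` and `is_subgraph_isomorphic` are hypothetical, chosen only for illustration. It tries every injective mapping of $V_1$ into $V_2$ and checks the two conditions literally; because the edge condition is an "if and only if", this tests the induced variant. The number of mappings grows factorially, which is why such a test cannot simply be run against a large database.

```python
from itertools import permutations

# Hypothetical representation: a graph is (labels, edges), where labels maps
# each vertex to its label and edges is a set of frozenset({x, y}) pairs.
def make_graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}

def is_subgraph_isomorphic(g1, g2):
    """Is there a bijection f: V1 -> V2' (V2' a subset of V2) as in the definition?"""
    labels1, edges1 = g1
    labels2, edges2 = g2
    v1 = list(labels1)
    # Every ordered choice of |V1| distinct vertices of V2 defines one
    # candidate bijection f onto those vertices (the subset V2').
    for image in permutations(labels2, len(v1)):
        f = dict(zip(v1, image))
        # Label condition: l(x) = l(f(x)) for every x in V1.
        if any(labels1[x] != labels2[f[x]] for x in v1):
            continue
        # Edge condition: {x, y} in E1 if and only if {f(x), f(y)} in E2.
        if all((frozenset((x, y)) in edges1) == (frozenset((f[x], f[y])) in edges2)
               for i, x in enumerate(v1) for y in v1[i + 1:]):
            return True
    return False

edge_ab = make_graph({1: "A", 2: "B"}, [(1, 2)])
path_abc = make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
triangle = make_graph({10: "A", 11: "B", 12: "C"}, [(10, 11), (11, 12), (10, 12)])

print(is_subgraph_isomorphic(edge_ab, triangle))   # True
print(is_subgraph_isomorphic(triangle, edge_ab))   # False: too many vertices
print(is_subgraph_isomorphic(path_abc, triangle))  # False: A-C is an edge only in the triangle
```

The last call shows the effect of the "if and only if": non-edges must also be preserved, so the path A-B-C is not found inside the triangle. Keeping only the "if" direction would give the non-induced variant, under which it would be found.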

Complexity
● Graph isomorphism (ISO) is not known to be NP-complete, nor is it known to be in P.
● SUBGRAPH ISO is NP-complete.
● If it were solvable in polynomial time, we could test in polynomial time whether a graph contains $K_n$ (the complete graph on $n$ vertices), and therefore CLIQUE would be in P.

Why Graph Indexing?
● Graphs are used in many fields.
● In those fields, we often need to know whether a graph is contained in another graph.
● Examples: proteins, images, grammars.

Why Graph Indexing?
● In general, we assume that we will have to process the following operation many times:
Given a graph database GDB (a set of graphs) and a query graph Q, find the graphs in GDB that contain Q.

Why Graph Indexing?
● We could test graph containment (subgraph ISO) for each graph in GDB.
● But the number of graphs in GDB can be extremely large, and the graphs themselves may be large.
● Since each test is an instance of an NP-complete problem, this approach is infeasible.

Why Graph Indexing?
Graph indexing aims to reduce the set of graphs that must be tested by pruning unpromising graphs.
A graph indexing technique must satisfy the following conditions:
● Not too expensive (in terms of space and time).
● High pruning capability.
● No false negatives.

Graph Indexing?
All graph indexing techniques consist of two steps:
1. Given a query graph Q, look up the index and retrieve the candidate set of graphs.
2. Verify (via subgraph ISO) that those graphs contain the query graph, and output the answer set.
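The two steps can be sketched as follows, assuming a toy in-memory database and, as a hypothetical index feature, the set of labeled edge types of each graph (real techniques use richer substructures). All names are illustrative. Calling `contains` on every graph in GDB would be the naive approach; the index only decides which graphs reach that expensive call.

```python
from itertools import permutations

# Toy representation (an assumption): a graph is (labels, edges), where
# labels maps vertex -> label and edges is a set of frozenset({x, y}).
def graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}

def contains(g, q):
    """Verification: brute-force check that q is (induced-)subgraph-isomorphic to g."""
    (lq, eq), (lg, eg) = q, g
    vq = list(lq)
    for image in permutations(lg, len(vq)):
        f = dict(zip(vq, image))
        if all(lq[x] == lg[f[x]] for x in vq) and all(
            (frozenset((x, y)) in eq) == (frozenset((f[x], f[y])) in eg)
            for i, x in enumerate(vq)
            for y in vq[i + 1:]
        ):
            return True
    return False

def features(g):
    """Hypothetical index feature: the set of labeled edge types, e.g. ('C', 'O')."""
    labels, edges = g
    return {tuple(sorted(labels[v] for v in e)) for e in edges}

def build_index(gdb):
    # Built once, offline: graph id -> feature set.
    return {gid: features(g) for gid, g in gdb.items()}

def query(gdb, index, q):
    fq = features(q)
    # Step 1: index lookup. A graph lacking any feature of Q cannot contain Q.
    candidates = {gid for gid, fg in index.items() if fq <= fg}
    # Step 2: run the expensive subgraph isomorphism test on candidates only.
    answers = {gid for gid in candidates if contains(gdb[gid], q)}
    return candidates, answers

gdb = {
    "g1": graph({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)]),          # path C-O-C
    "g2": graph({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3), (1, 3)]),  # triangle
    "g3": graph({1: "C", 2: "N"}, [(1, 2)]),                          # edge C-N
}
q = graph({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)])
candidates, answers = query(gdb, build_index(gdb), q)
print(sorted(candidates), sorted(answers))  # ['g1', 'g2'] ['g1']
```

Here g2 is a false positive: it has the same labeled edge types as Q, but its extra C-C edge violates the "if and only if" condition, so verification removes it. The candidate set is larger than the answer set, but no answer is lost.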

Graph Indexing?
● Graph indexing techniques compute a CANDIDATE SET that must be verified.
● Ideally, the candidate set should be the ANSWER SET.
● However, obtaining exactly the answer set is as hard as the subgraph ISO problem itself.

Graph Indexing?
● Graph indexing techniques will always have to deal with the exponential worst-case cost of subgraph ISO.
● But smart techniques may be designed for specific sets of graphs, depending on:
– The size of the graphs in GDB.
– The number of labels.
– The density of the graphs.
– Etc.

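Graph Indexing?
In general, graph indexing techniques use INVARIANTS as index features.
$I$ is an invariant if and only if:
$G_1 \subseteq G_2 \wedge I \subseteq G_1 \implies I \subseteq G_2$
where $\subseteq$ denotes graph containment (subgraph isomorphism).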

Graph Indexing
That is:
If $I \subseteq G_1$ and $I \subseteq G_2$, then, maybe, $G_1 \subseteq G_2$.
But if $I \subseteq G_1$ and $I \not\subseteq G_2$, then we know that $G_1 \not\subseteq G_2$.
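A minimal sketch of this pruning rule, taking $G_1$ to be the query and $G_2$ a database graph, and using as the invariant $I$ the multiset of vertex labels. This is a deliberately simple, hypothetical choice (not a substructure feature), so only vertex labels are needed; the function names are illustrative.

```python
from collections import Counter

def label_invariant(labels):
    # Hypothetical invariant I: the multiset of vertex labels of a graph.
    return Counter(labels.values())

def may_contain(g2_labels, g1_labels):
    """False: G1 is certainly not contained in G2. True: only 'maybe'."""
    i_g1, i_g2 = label_invariant(g1_labels), label_invariant(g2_labels)
    # If G1 were contained in G2, each label would occur in G2 at least as
    # often as in G1; any shortfall proves G1 is not contained in G2.
    return all(i_g2[label] >= n for label, n in i_g1.items())

q = {1: "C", 2: "C", 3: "O"}              # G1: the query's vertex labels
g_a = {1: "C", 2: "O", 3: "C", 4: "N"}    # a database graph
g_b = {1: "C", 2: "O", 3: "N"}            # another database graph

print(may_contain(g_a, q))  # True: maybe, must still be verified
print(may_contain(g_b, q))  # False: safely pruned (only one "C")
```

A False answer is safe to act on; a True answer only places the graph in the candidate set.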

Graph Indexing
Invariants help to prune unpromising graphs.
However, we must make sure that no valid graph is pruned!

Graph Indexing
An invariant can be the vertex list, the degree list, etc.
But the invariants used in graph indexing are SUBSTRUCTURES, that is: subgraphs, subtrees, sequences, canonical representations of substructures, etc.
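As an illustration of the degree list as an invariant, here is a sketch (ignoring labels; function names hypothetical) of one necessary condition: if Q is contained in G, then the i-th largest degree of G is at least the i-th largest degree of Q, because each vertex of Q is mapped to a distinct vertex of G that keeps all of its edges.

```python
def degree_sequence(adj):
    # adj maps each vertex to the set of its neighbours (undirected graph).
    return sorted((len(nbrs) for nbrs in adj.values()), reverse=True)

def degrees_allow_containment(g_adj, q_adj):
    """Necessary (not sufficient) condition for Q being contained in G."""
    dq, dg = degree_sequence(q_adj), degree_sequence(g_adj)
    if len(dq) > len(dg):
        return False  # G has fewer vertices than Q
    # Compare the i-th largest degrees pairwise.
    return all(d_g >= d_q for d_g, d_q in zip(dg, dq))

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}   # centre of degree 3
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # degrees 2, 2, 1, 1
path3 = {0: {1}, 1: {0, 2}, 2: {1}}             # degrees 2, 1, 1

print(degrees_allow_containment(path4, star))  # False: the path has no degree-3 vertex
print(degrees_allow_containment(star, path4))  # False: the star has only one vertex of degree >= 2
print(degrees_allow_containment(star, path3))  # True: maybe
```

Such global invariants are cheap but weak, which is one reason the techniques discussed here rely on substructures instead.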

Graph Indexing
In general terms, we need invariants (features) that:
● Are representative (selective). For instance, a vertex label alone is not very selective, but its neighborhood is more selective.
● Are easy to compare (trees, sequences), that is, cheap in terms of computation.
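The difference in selectivity can be seen on a tiny, made-up database. This sketch compares two hypothetical feature sets: the set of vertex labels, and the set of labeled edges (a vertex label paired with one neighbor label, i.e. a minimal piece of the neighborhood). Both are valid invariants, but only the second prunes g2 and g3.

```python
# Hypothetical toy database: each graph is (labels, edges).
gdb = {
    "g1": ({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)]),          # C-O-N
    "g2": ({1: "C", 2: "N", 3: "O"}, [(1, 2), (2, 3)]),          # C-N-O
    "g3": ({1: "C", 2: "C", 3: "O", 4: "O"}, [(1, 2), (3, 4)]),  # C-C and O-O
}
q = ({1: "C", 2: "O"}, [(1, 2)])  # query: a single C-O edge

def label_features(g):
    labels, _ = g
    return set(labels.values())

def edge_features(g):
    # A vertex label together with one neighbour label, order-independent.
    labels, edges = g
    return {tuple(sorted((labels[x], labels[y]))) for x, y in edges}

def candidates(gdb, q, feat):
    # Keep a graph only if it has every feature of the query.
    fq = feat(q)
    return sorted(gid for gid, g in gdb.items() if fq <= feat(g))

print(candidates(gdb, q, label_features))  # ['g1', 'g2', 'g3']
print(candidates(gdb, q, edge_features))   # ['g1']
```

Richer features (paths, trees, subgraphs) prune even more, at a higher cost to extract and compare.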

Hierarchy
[Figure: hierarchy diagram of GCode, FGIndex, Tree + Delta, gIndex, CTree and GraphGrep]

Major Concerns
● Index size.
● Index construction time.
● False positive ratio.
● Scalability with respect to:
– Query size.
– Size of the graphs in the database.
– Density of the graphs in the database.

Classification
There are different ways to classify these algorithms:
● Mining vs. non-mining.
● Kind of information stored.
● Way the information is stored.

Mining vs. Non-mining
Mining:
● gIndex
● Tree+Delta
● FGIndex
Non-mining:
● CTree
● GCode
● GraphGrep

Information Stored (features)
Graphs:
● gIndex
● CTree
● FGIndex
Trees:
● Tree+Delta
● GCode
● GraphGrep

Way to Store Features
(Inverted) index of features:
● gIndex
● Tree+Delta
● FGIndex
In a tree:
● CTree
● GCode
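For the techniques that keep an inverted index of features, a lookup can be sketched as intersecting posting lists. The feature strings below (labeled edges written as "C-O") and the function names are hypothetical; the indexed features of the mining-based techniques are mined substructures rather than single edges.

```python
from collections import defaultdict

def build_inverted_index(db_features):
    """db_features: {graph_id: set of features}. Returns feature -> set of graph ids."""
    index = defaultdict(set)
    for gid, feats in db_features.items():
        for f in feats:
            index[f].add(gid)
    return index

def lookup(index, all_ids, query_features):
    # Candidate set = intersection of the posting lists of Q's features.
    candidates = set(all_ids)
    # Intersect the shortest posting lists first so the set shrinks quickly.
    for f in sorted(query_features, key=lambda f: len(index.get(f, ()))):
        candidates &= index.get(f, set())
        if not candidates:
            break
    return candidates

db_features = {
    "g1": {"C-O", "N-O"},
    "g2": {"C-N", "N-O"},
    "g3": {"C-C", "O-O", "C-O"},
}
index = build_inverted_index(db_features)
print(sorted(lookup(index, db_features, {"C-O"})))         # ['g1', 'g3']
print(sorted(lookup(index, db_features, {"C-O", "N-O"})))  # ['g1']
print(sorted(lookup(index, db_features, {"C-S"})))         # []
```

A feature that appears in no graph empties the candidate set immediately, so the query is answered without a single subgraph isomorphism test.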
