12.07.2015 Views

Graph Indexing Algorithms


Graph Indexing Algorithms
Jaume Baixeries
UQàM
2008


Preliminaries
We assume that the graphs we are dealing with are UNDIRECTED and VERTEX LABELED.


Subgraph Isomorphism
(aka: graph containment problem)
There is a SUBgraph isomorphism between graphs G1 = (V1,E1) and G2 = (V2,E2) if there is a bijection f: V1 → V2', where V2' is a subset of V2, such that:
(x,y) ∈ E1 if and only if (f(x),f(y)) ∈ E2
and
∀x ∈ V1: l(x) = l(f(x)).
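The definition can be checked directly by trying every injective mapping. The sketch below is a minimal brute-force illustration, not an algorithm from the slides; the graph representation (a label dictionary plus a set of undirected edges) and the names `make_graph`, `is_subgraph_isomorphic` and `induced` are assumptions made for the example. With `induced=True` it enforces the "if and only if" edge condition as stated above; with `induced=False` it only requires that every edge of G1 is preserved, which is another common reading of graph containment.

```python
from itertools import permutations

def make_graph(labels, edges):
    """labels: dict vertex -> label; edges: iterable of undirected (u, v) pairs."""
    return dict(labels), {frozenset(e) for e in edges}

def is_subgraph_isomorphic(g1, g2, induced=True):
    """Return True if g1 maps into g2 under the definition above.

    induced=True enforces the "if and only if" edge condition;
    induced=False only requires every edge of g1 to be preserved.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    v1 = list(labels1)
    # Try every injective assignment of V1 into V2 (exponential on purpose).
    for image in permutations(labels2, len(v1)):
        f = dict(zip(v1, image))
        if any(labels1[x] != labels2[f[x]] for x in v1):
            continue  # label condition l(x) = l(f(x)) fails
        mapped = {frozenset(f[x] for x in e) for e in edges1}
        if not mapped <= edges2:
            continue  # some edge of g1 has no counterpart in g2
        if induced:
            chosen = set(image)
            inside = {e for e in edges2 if e <= chosen}
            if inside != mapped:
                continue  # g2 has an extra edge among the image vertices
        return True
    return False

path = make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
triangle = make_graph({"x": "A", "y": "B", "z": "C"},
                      [("x", "y"), ("y", "z"), ("x", "z")])
print(is_subgraph_isomorphic(path, triangle, induced=False))  # True
print(is_subgraph_isomorphic(path, triangle, induced=True))   # False
```

The path A-B-C fits inside the triangle when only its own edges must be preserved, but not under the strict "if and only if" condition, because the triangle has an extra A-C edge among the image vertices.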


Complexity
● Graph isomorphism is not known to be NP-complete, nor is it known to be in P.
● Subgraph isomorphism is NP-complete.
● If it were solvable in polynomial time, we could test in polynomial time whether a complete graph Kn is contained in a given graph, and therefore CLIQUE would be in P.


Why Graph Indexing?
● Graphs are used in many fields.
● In those fields, there is a need to know whether one graph is contained in another graph.
● Examples: proteins, images, grammars, ...


Why Graph Indexing?
● In general, we assume that the following operation has to be performed many times:
Given a graph database GDB (a set of graphs) and a query graph Q, find the graphs in GDB that contain Q.


Why Graph Indexing?
● We could test graph containment (subgraph isomorphism) against each graph in GDB.
● But the number of graphs in GDB can be extremely large, and the graphs themselves may also be large.
● Therefore, this approach is infeasible.


Why Graph Indexing?
Graph indexing aims at reducing the set of graphs that must be tested, by pruning unpromising graphs.
The conditions for a graph indexing technique are:
● Not too expensive (in terms of space and time).
● High pruning capability.
● Absence of false negatives.


Graph Indexing?
Two steps in all graph indexing techniques:
1. Given a query graph Q, we look up the index and retrieve the candidate set of graphs.
2. We verify (via subgraph isomorphism) that those graphs contain the query graph and output the answer set.
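The two steps can be sketched as a small filter-then-verify pipeline. This is only an illustration under stated assumptions: the index here stores the label counts of each graph (a deliberately weak feature chosen for brevity, not one of the indexes surveyed in these slides), the verifier is a brute-force check that only requires query edges to be preserved, and the names `contains`, `build_index` and `query` are hypothetical.

```python
from collections import Counter
from itertools import permutations

def contains(g, q):
    """Brute-force verification: does q occur in g (query edges preserved)?"""
    g_labels, g_edges = g
    q_labels, q_edges = q
    g_edge_set = {frozenset(e) for e in g_edges}
    q_vertices = list(q_labels)
    for image in permutations(g_labels, len(q_vertices)):
        f = dict(zip(q_vertices, image))
        labels_ok = all(q_labels[x] == g_labels[f[x]] for x in q_vertices)
        edges_ok = all(frozenset((f[x], f[y])) in g_edge_set for x, y in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def build_index(gdb):
    """Toy index: for each graph id, how many vertices carry each label."""
    return {gid: Counter(labels.values()) for gid, (labels, _) in gdb.items()}

def query(gdb, index, q):
    need = Counter(q[0].values())
    # Step 1: look up the index and keep graphs with enough vertices per label.
    candidates = [gid for gid, have in index.items()
                  if all(have[label] >= n for label, n in need.items())]
    # Step 2: verify each candidate with the expensive containment test.
    answers = [gid for gid in candidates if contains(gdb[gid], q)]
    return candidates, answers

gdb = {
    "g1": ({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]),  # C-C-O
    "g2": ({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)]),  # C-O-C
    "g3": ({1: "C", 2: "N"}, [(1, 2)]),                  # C-N
}
q = ({"a": "C", "b": "C"}, [("a", "b")])                 # C-C

candidates, answers = query(gdb, build_index(gdb), q)
print(candidates)  # ['g1', 'g2']
print(answers)     # ['g1']
```

Here g3 is pruned without any containment test, while g2 survives the filter as a false positive and is only rejected in the verification step.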


Graph Indexing?
● Graph indexing techniques compute a CANDIDATE SET that must be verified.
● Ideally, the candidate set should be the ANSWER SET.
● However, computing the answer set exactly is equivalent to solving the subgraph isomorphism problem.


Graph Indexing?
● Graph indexing techniques will always have to deal with the exponential cost of subgraph isomorphism.
● But smart techniques may be designed for specific sets of graphs, depending on:
– The size of the graphs in GDB.
– The number of labels.
– The density of the graphs.
– Etc.


Graph Indexing?
In general, graph indexing techniques use INVARIANTS as index features.
I is an invariant if and only if:
G1 ⊆ G2 and I ⊆ G1 ⇒ I ⊆ G2


Graph Indexing
That is:
If I ⊆ G1 and I ⊆ G2, then, maybe, G1 ⊆ G2.
But if I ⊆ G1 and I ⊈ G2, then we know that G1 ⊈ G2.
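As a small illustration of this pruning rule, the sketch below uses one very simple invariant: the set of unordered label pairs found on the edges of a graph. The choice of invariant and the names `edge_label_features` and `may_contain` are assumptions for the example. G1 plays the role of the query and G2 the role of a database graph.

```python
def edge_label_features(labels, edges):
    """Invariant: the set of unordered label pairs that appear on edges."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def may_contain(graph_features, query_features):
    """False means the graph certainly does not contain the query.
    True only means "maybe": the graph still has to be verified."""
    return query_features <= graph_features

g = edge_label_features({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])  # C-C-O
q_edge = edge_label_features({1: "C", 2: "O"}, [(1, 2)])             # C-O
q_other = edge_label_features({1: "C", 2: "N"}, [(1, 2)])            # C-N
q_triangle = edge_label_features({1: "C", 2: "C", 3: "C"},
                                 [(1, 2), (2, 3), (1, 3)])           # C3 ring

print(may_contain(g, q_edge))      # True, and C-O really is contained
print(may_contain(g, q_other))     # False, so the graph can be pruned safely
print(may_contain(g, q_triangle))  # True, although the ring is not contained
```

The last case is the "maybe" of the slide: passing the invariant test does not prove containment, it only fails to rule it out.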


Graph Indexing
Invariants help to prune unpromising graphs.
However, we must make sure that no valid graph is pruned!


Graph Indexing
An invariant can be the vertex list, the degree list, ...
But in graph indexing the invariants used are SUBSTRUCTURES, that is: subgraphs, subtrees, sequences, canonical representations of substructures, etc.
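To make the "sequences" case concrete, here is a minimal sketch that extracts label sequences along simple paths of bounded length and stores each one in a canonical orientation, so that a path and its reverse count as the same feature. It is not the feature extraction of any specific system in these slides; the function name `path_features` and the `max_edges` bound are assumptions for the example.

```python
def path_features(labels, edges, max_edges=2):
    """Label sequences of all simple paths with at most max_edges edges."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    features = set()

    def extend(path):
        seq = tuple(labels[v] for v in path)
        # Canonical form: an undirected path reads the same in both directions.
        features.add(min(seq, seq[::-1]))
        if len(path) - 1 < max_edges:
            for w in adj[path[-1]]:
                if w not in path:  # keep the path simple
                    extend(path + [w])

    for v in labels:
        extend([v])
    return features

g_feats = path_features({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])  # C-C-O
q_feats = path_features({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)])  # C-O-C

print(sorted(g_feats))
print(q_feats <= g_feats)  # False: the path C-O-C does not occur in C-C-O
```

Every simple path of a contained graph is also a simple path of the containing graph, so a query feature that is missing from a database graph is enough to prune that graph.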


Graph Indexing
In general terms, we need invariants (features) that are:
● Representative (selective). For instance, the label of a vertex is not very selective, but its neighborhood is more selective.
● Easy to compare (trees, sequences). That is: cheap in terms of computation.


Hierarchy
[Diagram: GCode, FGIndex, Tree + Delta, gIndex, CTree, GraphGrep]


Major Concerns
● Index size.
● Index construction time.
● False positive ratio.
● Scalability with respect to:
– Query size
– Size of graphs in database
– Density of graphs in database


Classification
There are different ways to classify those algorithms:
● Mining vs non-mining.
● Kind of information stored.
● Way to store the information.


Mining vs non-mining
● Mining:
– gIndex
– Tree+Delta
– FGIndex
● Non-mining:
– CTree
– GCode
– GraphGrep


Information Stored (features)
● Graphs:
– gIndex
– CTree
– FGIndex
● Tree:
– Tree+Delta
– GCode
– GraphGrep


Way to store features
● (Inverted) index of features:
– gIndex
– Tree+Delta
– FGIndex
● In a tree:
– CTree
– GCode
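An inverted index of features can be sketched as a map from each feature to the set of graphs that contain it; the candidate set for a query is then the intersection of the posting sets of its features. The code below is a generic illustration, not the data structure of gIndex, Tree+Delta or FGIndex. It assumes that features are already extracted as hashable values and that the index holds every feature of every graph; an index that keeps only a selected subset of features would intersect over the indexed features only. The names `build_inverted_index` and `candidate_set` are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(features_by_graph):
    """Map each feature to the set of graph ids in which it occurs."""
    index = defaultdict(set)
    for gid, feats in features_by_graph.items():
        for feat in feats:
            index[feat].add(gid)
    return index

def candidate_set(index, all_ids, query_features):
    """Graphs that contain every feature of the query."""
    result = set(all_ids)
    for feat in query_features:
        result &= index.get(feat, set())  # unknown feature: no graph has it
    return result

features_by_graph = {
    "g1": {"C-C", "C-O"},
    "g2": {"C-O"},
    "g3": {"C-C", "C-N"},
}
index = build_inverted_index(features_by_graph)

print(sorted(candidate_set(index, features_by_graph, {"C-C"})))         # ['g1', 'g3']
print(sorted(candidate_set(index, features_by_graph, {"C-C", "C-O"})))  # ['g1']
print(sorted(candidate_set(index, features_by_graph, {"C-S"})))         # []
```

The candidates returned this way still have to be verified with a subgraph isomorphism test.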
