13.07.2015 Views

Top-K Correlation Sub-graph Search in Graph Databases

Top-K Correlation Sub-graph Search in Graph Databases

Top-K Correlation Sub-graph Search in Graph Databases

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Top</strong>-K <strong>Correlation</strong> <strong>Sub</strong>-<strong>graph</strong> <strong>Search</strong> <strong>in</strong> <strong>Graph</strong> <strong>Databases</strong> 7Def<strong>in</strong>ition 8. (Total Order) Given two sub-<strong>graph</strong>s S 1 and S 2 <strong>in</strong> <strong>graph</strong> database D, where S 1 isnot isomorphism to S 2, we can say that S 1 < S 2 if either of the follow<strong>in</strong>g conditions holds:1) sup(S 1) > sup(S 2);2) sup(S 1) == sup(S 2) AND size(S 1) < size(S 2), where size(S 1) denotes the number ofedges <strong>in</strong> S 1.3) sup(S 1) == sup(S 2) AND size(S 1) == size(S 2) AND label(S 1) < label(S 2), wherelabel(S 1) and label(S 2) are the canonical label<strong>in</strong>g of sub-<strong>graph</strong>s S 1 and S 2 respectively.S1S2 S3 S4 S5aaQuery Qa a(a) Projected Database D Qxx x x xMax Heap: H a ba c b b b cSupp(Si) 0.8 0.6 0.6 0.2 0.2Occurrences G1 G1 G1 G2 G1 G2 G1 G3 G2 bG2 G2 1G5 2 1 2 1 2a b a bG3 G2 a b3aG4 G4 43 4 3G4 c a b aG1 G2 G4<strong>Sub</strong>-<strong>graph</strong>s Si Φ( QS , i )Result SetProjected Database DQ(RS) a b 0.61xS1(b) The First Step <strong>in</strong> M<strong>in</strong><strong>in</strong>g ProcessFig. 3. Projected Database and The First StepFurthermore, as mentioned <strong>in</strong> Section 2, we only focus on search<strong>in</strong>g positively correlated<strong>graph</strong>s <strong>in</strong> this paper. Thus, accord<strong>in</strong>g to Def<strong>in</strong>ition 4, if φ(Q, S i) > 0, then sup(Q, S i) > 0,which <strong>in</strong>dicates that Q and S i must appear together <strong>in</strong> at least one <strong>graph</strong> <strong>in</strong> D. Therefore, we onlyneed to conduct the correlation search over all data <strong>graph</strong>s that conta<strong>in</strong> query <strong>graph</strong> Q, which iscalled projected <strong>graph</strong> database and denoted as D Q. A projected database of the runn<strong>in</strong>g exampleis given <strong>in</strong> Fig. 3(a). Similar with [8], we focus the m<strong>in</strong><strong>in</strong>g task on the projected database D Q. Itmeans that we always generate candidate sub-<strong>graph</strong>s from the projected <strong>graph</strong> database throughpattern growth. S<strong>in</strong>ce |D Q| < |D|, it is efficient to perform m<strong>in</strong><strong>in</strong>g task on D Q.At the beg<strong>in</strong>n<strong>in</strong>g of PG-<strong>Search</strong> algorithm, we enumerate all size-1 sub-<strong>graph</strong> S i <strong>in</strong> the projected<strong>graph</strong> database and <strong>in</strong>sert them <strong>in</strong>to the heap H <strong>in</strong> <strong>in</strong>creas<strong>in</strong>g order of total order. Wealso record their occurrences <strong>in</strong> each data <strong>graph</strong>. Fig. 3(b) shows all size-1 sub-<strong>graph</strong> S i <strong>in</strong> theprojected <strong>graph</strong> database <strong>in</strong> the runn<strong>in</strong>g example. “G 1 < 1, 2 >” <strong>in</strong> Fig. 3(b) shows that there isone occurrence of S 1 <strong>in</strong> data <strong>graph</strong> G 1 and the correspond<strong>in</strong>g vertex IDs are 1 and 2 <strong>in</strong> <strong>graph</strong> G 1(see Fig. 1). In Fig. 3(b), the head of the heap H is sub-<strong>graph</strong> S 1. Therefore, we first grow S 1,s<strong>in</strong>ce it has the highest support. For each occurrence of S 1 <strong>in</strong> the projected database D Q, we f<strong>in</strong>dall Growth Elements (GE for short) around the occurrence.Def<strong>in</strong>ition 9. Growth Element. Given a connected <strong>graph</strong> C hav<strong>in</strong>g n edges and another connected<strong>graph</strong> P hav<strong>in</strong>g n − 1 edges, P is a sub-<strong>graph</strong> of C (namely, C is a child of P ). For someoccurrence O i of P <strong>in</strong> C, we use (C \ O i) to represent the edge <strong>in</strong> C that is not <strong>in</strong> O i. (C \ O i)is called Growth Element (GE for short) w.r.t O i. Usually, GE is represented as a five-tuple(< (vid1, vlable1), (elabel), (vid2, vlabel2) >), where vid1 and vid2 are canonical vertexIDs (def<strong>in</strong>ed <strong>in</strong> Def<strong>in</strong>ition 6) of P (vid1 < vid2). vlable1 and vlable2 are correspond<strong>in</strong>g vertexlabels, and elabel is the edge label of GE. Notice that vid2 can also be “*”, and “*” means theadditional vertex for P .To facilitate understand<strong>in</strong>g GE, we illustrate it with an example. Given a sub-<strong>graph</strong> S 1 <strong>in</strong>Fig. 4(a), there is one occurrence of S 1 <strong>in</strong> data <strong>graph</strong> G 1, which is denoted as the shaded area<strong>in</strong> Fig. 4(a). For the occurrence, there are three GEs that are also shown <strong>in</strong> Fig. 4(a). Grow<strong>in</strong>gS 1 (parent) through each GE, we will obta<strong>in</strong> another size-2 sub-<strong>graph</strong> (child) <strong>in</strong> Fig. 4(b). The

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!