Top-K Correlation Sub-graph Search in Graph Databases



Fig. 4. Finding GE and Grow Sub-graph by GE: (a) How to Find GE; (b) Grow Sub-graph by GE.

The numbers beside vertexes in each sub-graph S_i are canonical vertex IDs in S_i. The numbers beside vertexes in data graph G are just vertex IDs, which are assigned arbitrarily. Notice that canonical vertex IDs in a parent may not be preserved in its child. For example, in Fig. 4(b), the canonical vertex IDs in S_1 are different from those in S_6. A size-(n+1) sub-graph S_i (child) can be obtained by growing different size-n sub-graphs (parents). For example, in Fig. 5(a), we can obtain sub-graph S_6 by growing S_1 or S_5. Duplicate patterns decrease the performance of the mining algorithm. Therefore, we propose the following definition of correct growth to avoid duplicate generation.

Definition 10. Correct Growth. Given a size-(n+1) sub-graph C (child), it can be obtained by growing some size-n sub-graphs P_i (parents), i = 1...m. Among all growths, the growth from P_j to C is called the correct growth, where P_j is the largest among all parents according to Total Order in Definition 8. Any other growth is an incorrect one.

For example, in Fig. 5(a), S_6 has two parents, S_1 and S_5. Since S_5 is larger than S_1 (see Definition 8), the growth from S_5 to S_6 is a correct growth, and the growth from S_1 to S_6 is an incorrect growth. According to Definition 8, we can determine whether the growth from P to C is correct or incorrect. We will propose another efficient method to determine the correct growth in Section 3.2.

Now, we illustrate the mining process with the running example. In the running example, we always maintain β as the K-th largest φ(Q, S_i) found so far (K = 2 in the running example). Initially, we set β = −∞. First, we conduct a sub-graph query to obtain the projected graph database D_Q, which is shown in Fig. 3(a). Then, we find all size-1 sub-graphs (having one edge) in the projected database D_Q. In Fig. 3(b), we insert them into heap H in increasing order according to Total Order in Definition 8. We also record all their occurrences in the graph database. Since S_1 is the head of heap H, we compute α = upperbound(φ(Q, S_1)) = 1.63 and φ(Q, S_1) = 0.61. We insert S_1 into result set RS. Now β = −∞ < α, so the algorithm continues.

We find all growth elements (GEs for short) around S_1 in D_Q, which are shown in Fig. 5(b). Since G_3 is not in the projected database, we do not consider the occurrences in G_3 when we find GEs. For each GE, we check whether it leads to a correct growth. In Fig. 5(b), only the last one is a correct growth. We insert the last one (that is, S_10) into the max-heap H, as shown in Fig. 6. Now, the sub-graph S_2 is the heap head. We re-compute α = upperbound(φ(Q, S_2)) = 1.0 and φ(Q, S_2) = 1.0, and insert S_2 into RS. We update β to be the Top-2 answer, that is, β = 0.61. Since β < α, the algorithm still continues. The above process is iterated until β ≥ α. At last, we report the top-2 answers in RS.

3.2 PG-search Algorithm

As discussed in Section 3.1, the heap H is ranked according to Total Order in Definition 8 (see Fig. 3(b) and 6). In Definition 8, we need sup(S_i). Therefore, we maintain all occurrences of S_i
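
As an illustration of Definition 10 in Section 3.1, the correct-growth test can be read as a selection rule: among all parents of a child, only the largest one under the Total Order is allowed to generate it. The sketch below is a minimal reading of that rule, not the authors' implementation. Definition 8 is not reproduced in this excerpt, so the ordering is passed in as a hypothetical `order_key` function, and the numeric ranks in the usage lines are placeholders that encode only the stated fact that S_5 is larger than S_1.

```python
from typing import Callable, Hashable, Iterable


def is_correct_growth(
    parent: Hashable,
    all_parents: Iterable[Hashable],
    order_key: Callable[[Hashable], float],
) -> bool:
    """Return True if `parent` is the largest parent under `order_key`.

    `order_key` is a hypothetical stand-in for the Total Order of
    Definition 8; it is assumed to give distinct values to distinct parents.
    """
    return max(all_parents, key=order_key) == parent


# Placeholder ranks: only the relation S5 > S1 comes from the text.
placeholder_rank = {"S1": 1, "S5": 2}
parents_of_s6 = ["S1", "S5"]

grow_from_s5 = is_correct_growth("S5", parents_of_s6, placeholder_rank.get)
grow_from_s1 = is_correct_growth("S1", parents_of_s6, placeholder_rank.get)
print(grow_from_s5, grow_from_s1)
```

With these placeholder ranks, only the growth from S_5 passes the test, so S_6 would be generated once rather than twice.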

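The running example in Section 3.1 can also be traced with a small best-first loop. This is a sketch under stated assumptions rather than the algorithm as published: `rank`, `phi`, `upper_bound` and `correct_children` are hypothetical callables standing in for the Total Order of Definition 8, φ(Q, S), upperbound(φ(Q, S)) and the correct-growth step. The head is assumed to be removed from the heap once it has been processed. The values for S_1 and S_2 are taken from the text; the ranks and the numbers for S_10 are invented placeholders chosen only so that the loop terminates.

```python
import heapq
import itertools
import math


def top_k_search(seeds, k, rank, phi, upper_bound, correct_children):
    """Best-first top-K loop; every callable argument is a hypothetical stand-in."""
    tie = itertools.count()  # tie-breaker so the heap never compares sub-graphs
    # heapq is a min-heap, so ranks are negated to pop the largest first.
    heap = [(-rank(s), next(tie), s) for s in seeds]
    heapq.heapify(heap)
    results = []  # result set RS as (phi value, sub-graph) pairs
    beta = -math.inf  # K-th largest phi found so far
    while heap:
        _, _, s = heapq.heappop(heap)
        alpha = upper_bound(s)
        results.append((phi(s), s))
        if len(results) >= k:
            beta = heapq.nlargest(k, (value for value, _ in results))[-1]
        if beta >= alpha:  # stopping condition from the running example
            break
        for child in correct_children(s):
            heapq.heappush(heap, (-rank(child), next(tie), child))
    return sorted(results, key=lambda r: r[0], reverse=True)[:k]


# S1 and S2 values come from the text; ranks and S10 values are placeholders.
RANK = {"S1": 3, "S2": 2, "S10": 1}
PHI = {"S1": 0.61, "S2": 1.0, "S10": 0.30}
BOUND = {"S1": 1.63, "S2": 1.0, "S10": 0.50}
CHILDREN = {"S1": ["S10"]}

top2 = top_k_search(
    ["S1", "S2"], 2, RANK.get, PHI.get, BOUND.get,
    lambda s: CHILDREN.get(s, []),
)
print(top2)
```

With K = 2, β stays at −∞ after S_1, becomes 0.61 after S_2, and the loop stops once the placeholder bound of S_10 falls below β, leaving S_2 and S_1 as the reported answers.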