13.07.2015 Views

Top-K Correlation Sub-graph Search in Graph Databases

Top-K Correlation Sub-graph Search in Graph Databases

Top-K Correlation Sub-graph Search in Graph Databases

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

2 Lei Zou et al.(or S i ) appears <strong>in</strong> data <strong>graph</strong> g. In CG<strong>Search</strong> problem, each sub-<strong>graph</strong> S i of data <strong>graph</strong>s<strong>in</strong> database D is a candidate, which is quite different from (similar) sub-<strong>graph</strong> search <strong>in</strong>previous work. In (similar) sub-<strong>graph</strong> search problem [13, 18], it reports all data <strong>graph</strong>sthat (closely) conta<strong>in</strong> query <strong>graph</strong> Q. Therefore, only data <strong>graph</strong>s are candidates. Thus,the search space of CG<strong>Search</strong> problem is much larger than that of (similar) sub-<strong>graph</strong>search, which leads CG<strong>Search</strong> to be a more challeng<strong>in</strong>g task.1ax 2 1b ax 2 1b ax 2 1cx 2 1a bx 2 xa c a bx x4x x x34 xxa34 x3c a b d3x xb a3caG 1 G 2 G 3 G 4 G 5Query QaaxbS 1xaS 2∅ ( QS , ) = 0.611∅ ( QS , ) = 1.02Fig. 1. A <strong>Graph</strong> DatabaseFurthermore, CG<strong>Search</strong> is also different from frequent sub-<strong>graph</strong> m<strong>in</strong><strong>in</strong>g. As weall know, all frequent sub-<strong>graph</strong> m<strong>in</strong><strong>in</strong>g algorithms are based on “down-closure property”(thatis Apriori property)[1] no matter they adopt bread-first search [9] or depthfirstsearch [17]. However, Apriori property does not hold for φ(Q, S i ). It means thatwe cannot guarantee that φ(Q, S i ) ≥ φ(Q, S j ), if S i is a sub-<strong>graph</strong> of S j . Therefore,for CG<strong>Search</strong> problem, we cannot employ the same prun<strong>in</strong>g strategies used <strong>in</strong> frequentsub-<strong>graph</strong> m<strong>in</strong><strong>in</strong>g algorithms.In order to address CG<strong>Search</strong>, Ke et al. propose CGS Algorithm [8], which adopts“filter-and-verification” framework to perform the search. First, accord<strong>in</strong>g to a givenm<strong>in</strong>imum correlation coefficient threshold θ and a query <strong>graph</strong> Q, they derive them<strong>in</strong>imum support lowerbound(sup(g)). Any sub-<strong>graph</strong> g i whose support is less thanlowerbound(sup(g)) cannot be a correlation sub-<strong>graph</strong> w.r.t 3 query Q. Then, <strong>in</strong> theprojected database (all data <strong>graph</strong>s conta<strong>in</strong><strong>in</strong>g query <strong>graph</strong> Q are collected to form theprojected database, denoted as D q ), frequent sub-<strong>graph</strong> m<strong>in</strong><strong>in</strong>g algorithms are used tof<strong>in</strong>d candidates g i by sett<strong>in</strong>g m<strong>in</strong>imum support lowerbound(sup(g))sup(Q)(that is filter process).After that, for each candidate g i , the correlation coefficient is computed to f<strong>in</strong>d all trueanswers (that is verification process).However, CG<strong>Search</strong> <strong>in</strong> [8] requires users or applications to specify a m<strong>in</strong>imum correlationthreshold θ. In practice, it may not be trivial for users to provide an appropriatethreshold θ, s<strong>in</strong>ce different <strong>graph</strong> databases typically have different characteristics.Therefore, <strong>in</strong> this paper, we propose an alternative m<strong>in</strong><strong>in</strong>g task:top-K correlation sub<strong>graph</strong>search, TOP-CG<strong>Search</strong>, which is def<strong>in</strong>ed:Given a query <strong>graph</strong> Q, we report top-K sub-<strong>graph</strong>s S i accord<strong>in</strong>g to the Pearson’scorrelation coefficient between Q and S i (that is φ(Q, S i )).Obviously, the above methods <strong>in</strong> CGSearh [8] cannot be directly applied to theTOP-CG<strong>Search</strong> problem, s<strong>in</strong>ce we do not have the threshold θ. Simply sett<strong>in</strong>g a largethreshold and then decreas<strong>in</strong>g it step by step based on the number of answers obta<strong>in</strong>edfrom the previous step is out of practice, s<strong>in</strong>ce we need to perform frequent sub-<strong>graph</strong>m<strong>in</strong><strong>in</strong>g algorithms to obta<strong>in</strong> candidates <strong>in</strong> each iterative step. Furthermore, we also needto perform verification of CG<strong>Search</strong> [8] <strong>in</strong> each iterative step.3 with regard to

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!