11.07.2015 Views

Upgrade Report - Department of Informatics - King's College London



6 GRAPH SAMPLING AND CRAWLING

In addition to the above problem there is also the more general problem of when to stop the sampling. When do we know that we have sampled enough of the network? This question is critical because it has many side effects. What if our method is appropriate but we stop it too soon? What if we sample too much and distort an otherwise accurate picture? The answer is still largely an open problem. However, some convergence tactics are discussed and analyzed in [27]. These give the crawler a tool for determining when it is best to halt the crawl.

In general we can consider numerous other sampling goals, some of which are case specific. For example, one could consider the goal of measuring the cover time of a graph, or of determining where a SRW of $s$ steps starting from a vertex $u$ is most likely to terminate. These are examples of interesting questions, and which of them matter always depends on the problem at hand. Sampling with SRWs is used extensively in the context of property testing (Section 4.8); for example, Czumaj et al. [18] used SRWs to determine whether or not a graph is an $\alpha$-expander (defined in Section 4.3).

6.1 Simple Random Walk

The most popular crawling method used in practice is the SRW. The reason is its simplicity and surprising efficiency, which have made it a very attractive method to study and analyze.

Let $G = (V, E)$ be a connected, undirected graph. A SRW $W_u$, $u \in V$, on $G$ is a Markov chain $X_0 = u, X_1, \ldots, X_t, \ldots$ on the vertices $V$, associated with a particle that moves from vertex to vertex according to a transition rule.
The probability of a transition from vertex $i$ to vertex $j$ is $p(i, j)$ if $\{i, j\} \in E$, and 0 otherwise.

Let $d(v) = d(v, t)$ be the degree of vertex $v \in G(t)$, and let $N(v)$ denote the neighbours of $v$ in this graph. A vertex $u$ is a neighbour of $v$ if there exists an edge $e = (v, u)$.

In this general setting the stopping criterion can be arbitrary. The most common choice is to stop when a sample of sufficient size and quality has been obtained; in other cases the walk stops when all the vertices have been visited (covered).

In general, the SRW on a graph proceeds as in Algorithm 1.

Algorithm 1 SRW
    v ← start vertex
    while not done do
        Visit(v)
        v ← random vertex from N(v)
    end while

This simple method has certain properties:

Transition matrix. Given a graph $G(N, M)$ with $N$ vertices and $M$ edges, the transition matrix $P$ holds the probability of each possible transition in a random walk. Let $P_{uv}$ denote the element at $(u, v)$, that is, $P_{uv} = \Pr[\text{given that we are at vertex } u \text{, we move to vertex } v]$. Then

$$P_{uv} = \begin{cases} \dfrac{1}{d_u}, & v \in N(u) \\ 0, & \text{otherwise} \end{cases}$$

where $d_u$ is the degree of vertex $u$ and $N(u)$ denotes the neighbourhood of $u$.

Stationary Distribution. Given a distribution $\pi$ over the vertices of a graph, which is the proportion of time a random walk has spent at each vertex, the distribution $\pi'$ at the next step of a SRW is $\pi' = P^T \pi$. The stationary distribution $\pi_s$ is the distribution with the property $P^T \pi_s = \pi_s$.

Mixing time. The mixing time $t_m$ is the number of steps after which the distribution has converged to the stationary one, $\pi_{t_m} \to \pi_s$.

It is proven that in SRWs, as the number of steps $t \to \infty$, the stationary probability of a vertex $u$ is:

$$\pi_u = \frac{d_u}{2M} \qquad (6.1)$$
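The definitions above can be checked on a small example. The following Python sketch is illustrative only and is not taken from the report: the four-vertex graph, the function names, and the fixed step budget used as the stopping criterion are all assumptions made for the example. It runs Algorithm 1, builds the transition matrix $P$, and compares the empirical visit frequencies with $d_u / 2M$ from (6.1).

```python
import random
from collections import Counter
from fractions import Fraction


def simple_random_walk(adj, start, steps, rng):
    """Algorithm 1, with a fixed step budget as the (assumed) stopping rule."""
    visits = Counter()
    v = start
    for _ in range(steps):
        visits[v] += 1            # Visit(v)
        v = rng.choice(adj[v])    # uniformly random vertex from N(v)
    return visits


def transition_matrix(adj):
    """P[u][v] = 1/d_u if v is a neighbour of u, and 0 otherwise."""
    return {
        u: {v: Fraction(1, len(adj[u])) if v in adj[u] else Fraction(0)
            for v in adj}
        for u in adj
    }


def degree_distribution(adj):
    """pi_u = d_u / 2M, where 2M is the sum of all degrees."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {u: Fraction(len(adj[u]), two_m) for u in adj}


def step_distribution(P, pi):
    """One step of the walk on distributions: pi' = P^T pi."""
    return {v: sum(P[u][v] * pi[u] for u in P) for v in P}


# Hypothetical example graph: a triangle 0-1-2 with a pendant vertex 3 on 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

P = transition_matrix(adj)
pi = degree_distribution(adj)

# Exact check of stationarity: P^T pi == pi.
assert step_distribution(P, pi) == pi

# Empirical check: visit frequencies approach d_u / 2M for long walks.
rng = random.Random(0)
steps = 100_000
visits = simple_random_walk(adj, start=0, steps=steps, rng=rng)
for u in sorted(adj):
    print(u, visits[u] / steps, float(pi[u]))
```

Exact rational arithmetic (`Fraction`) is used so that the stationarity condition can be tested with equality rather than a tolerance. The empirical frequencies only approximate $\pi_u$, and how quickly they do so depends on the mixing time of the particular graph.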
