Upgrade Report - Department of Informatics - King's College London

However, the SRW and WRW are not the only methods we intend to use. There is a plethora of methods which have their uses, and these include:

Uniform Sampling (UNI) This method is the ideal method for unbiased sampling. While we realise that it is infeasible to use this method in some cases, we will continue to make extensive use of it where possible. In Section 9.2 we saw some practical experiments using this method and concluded that small sample sizes are sufficient to sample the lower and mid-range of the degree distribution.

Weighted Random Walk (WRW) This method has been extensively analyzed. In Section 9.3 we made use of this method to sample all high degree vertices in time sub-linear in the size of the graph. This, in conjunction with UNI sampling, could be used to sample efficiently from the entire graph in sub-linear time. We intend to make further use of these methods by relaxing the WRW requirements to only partially sample the high degree vertices rather than fully acquire them.

Simple Random Walk (SRW) The SRW method is well known and well analyzed. It is important to realize its potential and value, and to take advantage of the analyses that exist for this method in order to perform appropriate re-weighting (such as in the RWRW) to meet our requirements. One of the major advantages of this method is its very low computational time and space requirements.

Sampling With Memory This is a general category of sampling methods, which includes BFS, Depth-First Search (DFS) and RDS. At this point we make special reference to a method that has been used in the past and is included as a footnote of this report: BFS queue filtering. This is a family of methods based on applying certain selection policies to a BFS queue, that is, the queue containing the order in which the vertices will be visited during a BFS traversal.
There were many interesting results when we made use of these methods to sample from manufactured graphs, and we intend to utilize them further to sample from real graphs.

All the methods mentioned above have been used to some extent, and most of their results have been, at least, discussed. We believe that proper utilization of these methods will further help expand our capabilities in graph sampling.

11.1 Algorithm Optimizations

In the context of optimization there are several aspects one should consider:

• Algorithm complexity reduction
• Run-time reduction
• Approximation

11.1.1 Algorithm complexity reduction

The most obvious way to reduce the run-time of an algorithm is by reducing its overall complexity. However, this is not always easy, and sometimes it is impossible. In the case of the WRW it is possible, with the appropriate modifications, to search the adjacency lists of the graph using other methods, such as a binary search instead of a linear search. This would reduce the complexity of each step from O(d(u)) for a linear scan to O(log d(u)), where d(u) is the degree of the current vertex u.

11.1.2 Run-time reduction

Run-time reduction of the algorithms is something that most compilers provide by default, therefore we will not go into an in-depth analysis of it.
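
The binary-search idea from Section 11.1.1 can be illustrated with a minimal sketch. It assumes that each vertex stores the running totals of the weights of its incident edges, so that the next vertex can be found by a binary search over those totals. The function names, the dictionary-based graph representation and the weight function passed in by the caller are all hypothetical choices made for this illustration; they are not taken from the report's implementation.

```python
import bisect
import itertools
import random


def build_cumulative(weights):
    """Return running totals of the edge weights, e.g. [1, 2, 3] -> [1, 3, 6]."""
    return list(itertools.accumulate(weights))


def pick_neighbor(neighbors, cumulative, r):
    """Pick the neighbor whose weight interval contains r, for 0 <= r < cumulative[-1].

    bisect_right is a binary search, so this costs O(log d(u)) comparisons
    instead of the O(d(u)) of a linear scan over the adjacency list.
    """
    idx = bisect.bisect_right(cumulative, r)
    # Guard against r rounding up to the total weight in floating point.
    return neighbors[min(idx, len(neighbors) - 1)]


def weighted_walk(adj, weight, start, steps, seed=0):
    """Walk `steps` steps, moving to neighbor v of u with probability
    proportional to weight(u, v). Assumes every vertex has at least one
    neighbor and that all weights are positive (assumptions of this sketch).
    """
    rng = random.Random(seed)
    # Precompute the running totals once per vertex.
    tables = {u: build_cumulative([weight(u, v) for v in nbrs])
              for u, nbrs in adj.items()}
    path = [start]
    u = start
    for _ in range(steps):
        cumulative = tables[u]
        r = rng.random() * cumulative[-1]
        u = pick_neighbor(adj[u], cumulative, r)
        path.append(u)
    return path
```

The running totals cost O(d(u)) to build for each vertex, so the saving only appears when a vertex is visited many times, which is the typical situation for high degree vertices.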

11.1.3 Approximation

In cases where a precise result is not required and a “good guess” suffices, approximations can be used to calculate the results of certain operations. These approximations have the advantage of very low run-times, and if we do not require accurate results then approximations are usually preferable. In our case we can make use of approximate weights and approximate selection of the next target in order to speed up the methods.

11.2 Improvement Of Sampling Efficiency

At this point we need to stress that sampling needs to be as efficient as possible. We need to ensure that the sample size we require is the smallest possible. There are several trade-offs we need to consider for our sampling methods. SRWs are sampling methods which, in theory, require no memory. Certain optimizations can be achieved by making use of some limited memory, but in general the SRW and WRW do not require this memory in order to function. This has a clear advantage, since the graphs we are working with are too large to assume that even a good sample would fit in memory; however, it has the disadvantage of frequent vertex re-visits. These can be avoided in other methods such as BFS and similar techniques, but keeping track of a BFS tree of graphs with high branching factors is very limiting.

In order to fully answer the question of how we can improve the sampling efficiency of our methods we need to carefully consider these trade-offs and the sampling goals. When we sample we need to store the sample somewhere, since this is our end goal. This allows us to make some run-time optimizations of methods such as the SRW by “simulating” revisits, using information we obtained in previous steps, rather than performing an actual revisit.
This may violate the “no memory” policy of the SRW, but in practice we choose to do so in order to achieve our sampling goals.

Keeping the above in mind, we are able to better understand our goals and limitations and what we consider to be “reasonable” trade-offs when designing our methods. An important part of our future work will be to apply these methods, in their more relaxed form, to real networks, with the confidence that our experimental and theoretical results give us the guarantees we need to claim that the samples we obtain achieve the goals our methods are designed for.

In addition, we would be able to verify the applicability of other methods, such as the modified BFS methods, which would allow us to achieve different sampling goals. At this point, however, we would like to elaborate on what we really mean by “sampling goal”. Up to now our sampling goals were to obtain unbiased samples of the mid-range of degree distributions and, in addition, to obtain all of the high degree vertices of graphs. As part of our future work we will consider relaxing the latter goal to obtaining “most” of the high degree vertices, and will also expand our goals to sampling triangles, clusters and expansion properties.

12 Graph Generation models

In Section 5.2 we saw the preferential attachment model. This is a well analyzed and well understood model for generating graphs which share characteristics with many real online networks. However, this model does not reproduce all the characteristics that are apparent in such networks, such as community structure, constant-bound diameter and edge densification. There are models which have been proposed that simulate these characteristics, and we have also suggested some of our own models.
However, we believe there is a lot of work to be done in this area, as most of the existing models still do not simulate some key characteristics of real world networks and, in addition, many of the models have not yet been analyzed. The analysis of these models is a very important aspect of our future research goals and, as we saw with the proposed WRW method, a good analysis of such models would allow us to transfer existing knowledge of graph sampling to those graph generation models.

Special interest will be given to models such as the implicit graph model seen in Section 7.4, since this model is able to produce power-law degree distributions with coefficient c < 2, which are hard to generate using other generative models. In addition, other graph generation methods based on affiliations will be used more extensively, since they tend to better match real world network characteristics.
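
For reference, the preferential attachment model mentioned at the start of this section can be sketched as follows. This is the common textbook variant, in which each new vertex attaches m edges to distinct existing vertices chosen with probability proportional to their current degree. The exact variant used in Section 5.2 may differ. The function name, the complete-graph starting configuration and the edge-list output are assumptions made for this illustration.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Generate an edge list on vertices 0..n-1 (textbook variant, see lead-in).

    Starts from a complete graph on m + 1 vertices (an assumption of this
    sketch), then adds the remaining vertices one at a time.
    """
    if m < 1 or n < m + 1:
        raise ValueError("need m >= 1 and n >= m + 1")
    rng = random.Random(seed)
    edges = []
    # Every vertex appears in `endpoints` once per incident edge, so a
    # uniform choice from this list is a degree-proportional choice.
    endpoints = []
    for u in range(m + 1):
        for v in range(u):
            edges.append((v, u))
            endpoints.extend((v, u))
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # resample until m distinct targets are found
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges
```

A call such as preferential_attachment(1000, 3, seed=1) returns a list of (older vertex, newer vertex) pairs. The model produces a power-law-like degree distribution but, as noted above, not community structure or edge densification.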
