11.07.2015

Upgrade Report - Department of Informatics - King's College London

11.1.3 Approximation

When the precise result is not required and a “good guess” is enough, approximations can be used to compute the results of certain operations. Approximations have the advantage of very low run-times, so when accurate results are not needed they are usually preferable. In our case, we can use approximate weights and an approximate selection of the next target in order to speed up the methods.

11.2 Improvement of Sampling Efficiency

At this point we need to stress that sampling must be as efficient as possible: we need to ensure that the sample size we require is as small as possible. There are several trade-offs to consider for our sampling methods. In theory, SRWs are sampling methods that require no memory. Some optimizations can be achieved using a limited amount of memory, but in general the SRW and WRW do not require memory in order to function. This is a significant advantage, since the graphs we are working with are too large to assume that even a good sample would fit in memory; however, it comes at the cost of frequent vertex revisits. Revisits can be avoided in other methods, such as BFS and similar techniques, but keeping track of a BFS tree for graphs with high branching factors is itself very limiting.

To fully answer the question of how we can improve the sampling efficiency of our methods, we need to carefully consider these trade-offs together with the sampling goals. Whenever we sample, we need to store the sample somewhere, since the sample is our end goal.
This allows us to make run-time optimizations to methods such as the SRW: instead of performing an actual revisit, we “simulate” it using information obtained in previous steps. This departs from the “no memory” policy of the SRW, but in practice we choose to do so in order to achieve our sampling goals.

With the above in mind, we can better understand our goals and limitations, and what we consider to be “reasonable” trade-offs when designing our methods. An important part of our future work will be to apply these methods, in their more relaxed form, to real networks, confident that our experimental and theoretical results provide the guarantees we need to claim that the samples we obtain achieve the goals our methods are designed for.

In addition, we would be able to verify the applicability of other methods, such as the modified BFS methods, which would allow us to achieve different sampling goals. At this point, however, we would like to elaborate on what we mean by “sampling goal”. Up to now, our sampling goals have been to obtain unbiased samples of the mid-range of degree distributions and, in addition, to obtain all of the high-degree vertices of the graph. As part of our future work, we will consider relaxing the latter goal to obtaining “most” of the high-degree vertices, while also expanding our goals to sampling triangles, clusters and expansion properties.

12 Graph Generation Models

In Section 5.2 we saw the preferential attachment model. It is a good, well-analyzed and well-understood model for generating graphs that share characteristics with many real online networks.
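
For reference, the growth process behind preferential attachment can be sketched as follows. This is a minimal sketch assuming the common variant in which each new vertex attaches m edges to distinct existing vertices chosen with probability proportional to their current degree, starting from a clique on m + 1 vertices; the variant analysed in Section 5.2 may differ, and the function name `preferential_attachment` and the parameters `n`, `m` and `seed` are illustrative choices of ours.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph on n vertices (illustrative sketch).

    Each new vertex attaches m edges to distinct existing vertices chosen
    with probability proportional to their current degree.
    Returns the edge list as (u, v) tuples.
    """
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)

    # Seed graph: a clique on m + 1 vertices, so every initial vertex has degree m.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]

    # Every vertex appears here once per incident edge, so a uniform draw
    # from this list selects a vertex with probability proportional to degree.
    endpoints = [x for e in edges for x in e]

    for new in range(m + 1, n):
        # Draw until m distinct targets are found (a common simplification of
        # degree-proportional sampling without replacement).
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(1000, 3, seed=1)
degrees = Counter(x for e in edges for x in e)
print(len(edges), max(degrees.values()))
```

Keeping a list in which every vertex appears once per incident edge makes each degree-proportional draw a single uniform `rng.choice`. Because every new vertex brings exactly m edges, the average degree stays close to 2m as the graph grows.
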
However, this model does not reproduce all of the characteristics apparent in such networks, such as community structure, constant-bounded diameter and edge densification. Models that simulate these characteristics have been proposed, and we have also suggested some of our own. Nevertheless, we believe there is still a lot of work to be done here, as most existing models still fail to capture some key characteristics of real-world networks, and many of them have not yet been analyzed. The analysis of these models is a very important aspect of our future research goals: as we saw with the proposed WRW method, a good analysis of such models would allow us to transfer existing knowledge of graph sampling to those graph generation models.

Special interest will be given to models such as the implicit graph model seen in Section 7.4, since it is able to produce power-law degree distributions with coefficient c < 2, which are hard to generate using other generative models. In addition, graph generation methods based on affiliations will be used more extensively, since they tend to better match the characteristics of real-world networks.
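
One way to see why coefficients c < 2 are hard to reach: for a degree distribution P(k) ∝ k^(-c), the mean degree stays bounded as the maximum degree grows only when c > 2. A growth model that adds a fixed number m of edges per new vertex, such as preferential attachment, keeps the average degree near 2m, so it cannot sustain c < 2 asymptotically unless the average degree grows, i.e. the graph densifies. The sketch below is our own illustration of this standard property, not a result from this report; `truncated_power_law_mean` is a hypothetical helper that computes the mean of the distribution truncated at a maximum degree `k_max`.

```python
def truncated_power_law_mean(c, k_max):
    """Mean degree under P(k) proportional to k**(-c) for k = 1..k_max."""
    ks = range(1, k_max + 1)
    weights = [k ** (-c) for k in ks]
    return sum(k * w for k, w in zip(ks, weights)) / sum(weights)


for c in (1.5, 2.5):
    means = [truncated_power_law_mean(c, 10 ** e) for e in (2, 3, 4, 5)]
    # For c = 1.5 the mean keeps growing (roughly like sqrt(k_max));
    # for c = 2.5 it levels off below 2.
    print(c, [round(x, 2) for x in means])
```

Raising `k_max` by a factor of 1000 multiplies the c = 1.5 mean by roughly 30, while the c = 2.5 mean barely changes, which is the gap a constant-average-degree model cannot bridge.
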