11.07.2015 Views

Upgrade Report - Department of Informatics - King's College London

Figure 19: Real 2009 Twitter distribution (Green) compared to sampled Twitter distribution (Red)

From the above we can observe that the two degree distributions appear to have the same shape; however, this cannot be said with certainty, as the scales of the two data sets differ greatly. It is worth mentioning that the large data set we have acquired does not include vertices of total degree 0, which seem to take up a significant portion of the network. Also, comparing Figure 18 with Figure 12a, we can see that the anomaly at in-degree 20 is in fact an artifact of the Twitter network and still remains today. From empirical observation, a number of the users with in-degree 20 do not correspond to people but to certain services, some of them malicious. It seems that a limit, imposed either by Twitter or by some other constraint, restricts these kinds of accounts to an in-degree of 20.

All of the above has led us to the assumption that the actual Twitter network today, while much larger, has a degree distribution similar to that of the network as it was in 2009, although the slope seems to be slightly different.

9.2 Uniform Sampling (UNI): A study of efficiency

In this section we present a small case study of UNI. We performed some tests by sampling vertices uniformly at random (UAR) from a preferential attachment network and recording their degrees. The aim of this study was to measure accurately how the efficiency of the UNI method depends on the size of the sample.
The resulting picture was mostly what we expected (a nearly accurate sample of the degree distribution even for small sample sizes); however, we did make some important observations.

We will present our findings here, but before doing so we describe the method we used to test the efficiency of UNI: the widely used Kolmogorov-Smirnov Test (KV Test) [46]. This goodness of fit test is defined as follows:

Assuming that we are sampling from a distribution with Cumulative Distribution Function (c.d.f.) $F(\cdot)$, we denote the c.d.f. of our sample of size $n$ as $F_n(\cdot)$. The KV Test statistic is then defined as:

$$D_n = \sup_{-\infty < x < \infty} |F_n(x) - F(x)| \tag{9.1}$$
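
The statistic in (9.1) can be sketched in code. The following is a minimal illustration, not the code used in this study. It assumes a simple Barabási-Albert-style generator as the preferential attachment network, takes the degree distribution of the full graph as the reference c.d.f. F, and uses hypothetical function names (preferential_attachment_degrees, ecdf, ks_statistic). Because both c.d.f.s are step functions in this setting, the supremum is attained at one of the observed degree values.

```python
import random
from bisect import bisect_right


def preferential_attachment_degrees(n, m, rng):
    """Degrees of a Barabasi-Albert-style graph (an assumed stand-in model).

    Starts from a complete graph on m + 1 vertices; each later vertex attaches
    to m distinct existing vertices chosen proportionally to their degree.
    Requires n >= m + 1.
    """
    degree = [0] * n
    endpoints = []  # every vertex appears once per incident edge
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            degree[u] += 1
            degree[v] += 1
            endpoints += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # picking a random edge endpoint is degree-proportional selection
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            degree[new] += 1
            degree[old] += 1
            endpoints += [new, old]
    return degree


def ecdf(values):
    """Return the empirical c.d.f. of `values` as a function of x."""
    data = sorted(values)
    size = len(data)
    return lambda x: bisect_right(data, x) / size


def ks_statistic(sample, population):
    """D_n = sup_x |F_n(x) - F(x)| for two empirical (step) c.d.f.s."""
    f_n = ecdf(sample)
    f = ecdf(population)
    # Both functions only jump at observed values, so checking those suffices.
    points = set(sample) | set(population)
    return max(abs(f_n(x) - f(x)) for x in points)


if __name__ == "__main__":
    rng = random.Random(0)
    degrees = preferential_attachment_degrees(n=5000, m=3, rng=rng)
    for size in (50, 500, 5000):
        sample = rng.sample(degrees, size)  # UNI: vertices drawn uniformly at random
        print(size, round(ks_statistic(sample, degrees), 4))
```

Under these assumptions, D_n would be expected to shrink as the sample size grows, and it is exactly 0 when the sample is the whole vertex set.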
