11.07.2015

Upgrade Report - Department of Informatics - King's College London

In addition to the properties above, real-world networks may have many characteristics that are still unknown. This is supported by the observed efficiency of sampling methods such as the WRW, which differs from what the theory predicts. An important aspect of our research will be to determine what these properties are and to formalize a measure for graphs that indicates their "ease of sampling".

10.2 Property Testing and Estimation Algorithms

In the graph analysis section we encountered the need to measure properties that are computationally hard. Problems such as expansion testing cannot be solved by an exhaustive algorithm on graphs of this scale, so good estimates are needed instead. Ideal candidate methods for such estimation come from property testing, seen in Section 4.8, and from approximation algorithms such as the modularity algorithm discussed in Section 4.6. We intend to implement such algorithms to test properties and measure quantities. We believe the current scope of application of these algorithms is limited, and we intend to introduce new applications for them. As part of our work we also intend to investigate algorithms for the graph partitioning problems we often encounter.

There are numerous algorithms that offer approximate solutions to these problems, and we have already implemented and tested several.
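Because the exact expansion of a large graph cannot be computed exhaustively, property testers estimate it from short random walks. The sketch below is a swapped-in illustration in the spirit of the collision-counting expansion tester of Goldreich and Ron, not the method of Section 4.8: it runs many short walks from one start vertex and estimates the collision probability of their endpoints, which approaches 1/n on a good expander and stays well above 1/n when the walk is trapped in a poorly connected region. The function names, the padding of every vertex to the maximum degree, the test graphs and the parameter values are all illustrative assumptions.

```python
import random
from collections import Counter
from typing import Dict, List

Graph = Dict[int, List[int]]


def padded_walk(g: Graph, start: int, length: int, max_deg: int,
                rng: random.Random) -> int:
    """Random walk in which every vertex is padded to max_deg slots.

    A slot in [0, max_deg) is drawn uniformly: below deg(u) the walk moves
    to that neighbour, otherwise it stays put. The transition matrix is
    symmetric, so the stationary distribution is uniform over vertices.
    """
    u = start
    for _ in range(length):
        slot = rng.randrange(max_deg)
        if slot < len(g[u]):
            u = g[u][slot]
    return u


def collision_estimate(g: Graph, start: int, walks: int, length: int,
                       seed: int = 0) -> float:
    """Fraction of endpoint pairs that collide, estimating sum_v p(v)^2."""
    rng = random.Random(seed)
    max_deg = max(len(nbrs) for nbrs in g.values())
    ends = Counter(padded_walk(g, start, length, max_deg, rng)
                   for _ in range(walks))
    colliding = sum(c * (c - 1) // 2 for c in ends.values())
    return colliding / (walks * (walks - 1) // 2)


def complete_graph(n: int) -> Graph:
    """Hypothetical test graph: K_n, a strong expander."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}


def barbell_graph(k: int) -> Graph:
    """Hypothetical test graph: two K_k cliques joined by a single edge."""
    g: Graph = {u: [] for u in range(2 * k)}
    for base in (0, k):
        for u in range(base, base + k):
            g[u].extend(v for v in range(base, base + k) if v != u)
    g[k - 1].append(k)
    g[k].append(k - 1)
    return g


if __name__ == "__main__":
    # Both graphs have 20 vertices, so a good expander should give about 1/20.
    print(collision_estimate(complete_graph(20), 0, walks=400, length=10))
    print(collision_estimate(barbell_graph(10), 0, walks=400, length=10))
```

On the barbell graph the walks rarely cross the bridge edge within 10 steps, so the estimate sits noticeably above the value for the complete graph. A real tester would turn this gap into an accept/reject decision with a threshold, which this sketch deliberately leaves out.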
We intend to continue implementing such algorithms and, in addition, to investigate possible improvements in aspects such as accuracy and complexity. There are many possibilities in this area, and we have already started work on approximation algorithms for the minimum k-cut problem⁴, in order to provide our own estimate of the community structure of graphs and to determine whether the resulting cut yields conductance and modularity values comparable to those produced by other, similar algorithms.

11 Graph Sampling

In Section 9.3 we saw a WRW method that discovers vertices of degree $d > n^{a}$ in sub-linear time. There is still room for improvement in graph crawling and sampling. We must stress that the purpose of our crawling is to obtain representative samples of a real-world network, not to sample the network in its entirety. As we saw in Section 9.2, uniform sampling succeeds at this task but may be infeasible in practice. Additionally, it does not perform well at discovering the high-degree vertices of power-law graphs. This suggests that a combination of the two methods (uniform sampling and the WRW) may work better for sampling such graphs. However, the main focus of our future work will be on methods based on random walks, in order to sample efficiently and effectively from real-world networks. Random walks work surprisingly well, and this is backed by theory. In general, WRWs can be a very important tool for graph sampling if the bias is adjusted dynamically according to the desired outcome.
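A WRW differs from a simple random walk (SRW) only in how the next vertex is chosen. The sketch below assumes the bias is expressed as a weight function on edges; the names `weighted_walk` and `degree_bias` and the degree-power weighting are hypothetical, not the WRW of Section 9.3. Each step evaluates the weight of every neighbour of the current vertex u, which is exactly where an O(d(u)) cost per step arises.

```python
import random
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]
# Hypothetical bias: weight of moving from u to its neighbour v.
Bias = Callable[[Graph, int, int], float]


def degree_bias(beta: float) -> Bias:
    """Hypothetical bias favouring high-degree neighbours: w(u, v) = deg(v) ** beta.

    beta = 0 gives equal weights, i.e. a simple random walk (SRW).
    """
    return lambda g, u, v: float(len(g[v])) ** beta


def weighted_walk(g: Graph, start: int, steps: int, bias: Bias,
                  seed: int = 0) -> List[int]:
    """Weighted random walk returning the visited vertices, start included."""
    rng = random.Random(seed)
    path = [start]
    u = start
    for _ in range(steps):
        nbrs = g[u]
        # Evaluating the bias on every neighbour costs O(d(u)) per step.
        weights = [bias(g, u, v) for v in nbrs]
        u = rng.choices(nbrs, weights=weights, k=1)[0]
        path.append(u)
    return path


if __name__ == "__main__":
    # Small hypothetical graph in which vertex 1 is the high-degree hub.
    star_plus = {0: [1, 2], 1: [0, 2, 3, 4], 2: [0, 1], 3: [1], 4: [1]}
    print(weighted_walk(star_plus, 0, 10, degree_bias(2.0)))
```

Passing `degree_bias(0.0)` recovers an SRW, while a larger `beta` pulls the walk toward high-degree vertices; replacing the `bias` argument as sampling progresses is one simple way the bias could be adjusted dynamically.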
Dynamically adjusting the bias of a WRW is an area of great interest and part of our ongoing research.

However, the WRW method has some drawbacks related to its algorithmic complexity. To choose the next vertex to traverse from a vertex u, it must compute additional quantities that take O(d(u)) time. While some of these quantities can be pre-processed, the cost does not decrease enough to make the per-step complexity of the WRW comparable to that of the SRW, which takes O(1) time per step.

We know that the cover time of the graph satisfies $C_v \propto \mathrm{Diam}(G)\,n$. In our case $\mathrm{Diam}(G) \propto \log n$, so $C_v \propto n \log n$. Although the cover time of the WRW is therefore on the same scale as that of the SRW, the expected O(n log n) running time of an SRW is significantly lower than that of a WRW, because each WRW step costs O(d(u)) rather than O(1). We have already implemented several optimizations of the algorithm, including a preprocessing of the graph to determine vertex and edge weights. This preprocessing takes O(mn) time but reduces the overall running time of the walk. For the algorithm to be applicable, it is important first to analyze the complexity of the method and then to reduce its running time to values comparable to the complexity of an SRW.

⁴ The minimum k-cut problem is the problem of finding a minimum set of edges whose removal partitions the graph into k connected components.
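When the edge weights are fixed before the walk starts, part of the gap between the O(1) SRW step and the O(d(u)) WRW step can be closed by preprocessing. The sketch below assumes static weights that cost O(1) each to evaluate; the function names and the weight function are hypothetical, and the report's O(mn) preprocessing suggests its own weights are more expensive to compute. The cumulative weights of each adjacency list are stored once, and each WRW step then binary-searches them in O(log d(u)) time.

```python
import bisect
import itertools
import random
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]


def srw_step(g: Graph, u: int, rng: random.Random) -> int:
    """Simple random walk step: O(1) with an adjacency list."""
    return g[u][rng.randrange(len(g[u]))]


def precompute_cumulative(g: Graph,
                          weight: Callable[[int, int], float]) -> Dict[int, List[float]]:
    """One pass over all adjacency lists: O(m) calls to a static weight function."""
    return {u: list(itertools.accumulate(weight(u, v) for v in nbrs))
            for u, nbrs in g.items()}


def wrw_step(g: Graph, cum: Dict[int, List[float]], u: int,
             rng: random.Random) -> int:
    """Weighted step in O(log d(u)) by binary search over prefix sums.

    Assumes u has at least one neighbour with positive weight.
    """
    c = cum[u]
    x = rng.random() * c[-1]
    # hi = len(c) - 1 guards against x rounding up to the total weight.
    return g[u][bisect.bisect_right(c, x, 0, len(c) - 1)]


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    # Hypothetical static weights: moving to v has weight v.
    cum = precompute_cumulative(triangle, lambda u, v: float(v))
    rng = random.Random(0)
    print([wrw_step(triangle, cum, 0, rng) for _ in range(10)])
```

With static weights, Walker's alias method would go further and make each weighted step O(1) after O(d(u)) preprocessing per vertex. Neither trick applies directly if the bias is adjusted dynamically during the walk, since the tables would then have to be rebuilt.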
