Upgrade Report - Department of Informatics - King's College London

given graph and then applying the Kronecker multiplication will produce a graph very similar to the original graph. This can be used either to model the given network at scale or to determine how it may look when it grows.

The problem of finding the initiator, while NP-complete in general, was proven to be solvable by approximation in $O(n)$ time [37], and this may prove to be a valuable tool in analyzing networks and their temporal evolution.

The Kronecker product is an operation on two matrices of arbitrary size resulting in a block matrix. It is distinct from the ordinary matrix product.

Assume we have two matrices $M_A$ and $M_B$ of dimensions $m \times n$ and $p \times q$ respectively.

\[
M_A = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{pmatrix}
\qquad
M_B = \begin{pmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,q} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,q} \\
\vdots & \vdots & \ddots & \vdots \\
b_{p,1} & b_{p,2} & \cdots & b_{p,q}
\end{pmatrix}
\]

The Kronecker product of these matrices is symbolised as:

\[
M_A \otimes M_B
\]

and is a block operation defined as:

\[
M_A \otimes M_B = \begin{pmatrix}
a_{1,1} M_B & a_{1,2} M_B & \cdots & a_{1,n} M_B \\
a_{2,1} M_B & a_{2,2} M_B & \cdots & a_{2,n} M_B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} M_B & a_{m,2} M_B & \cdots & a_{m,n} M_B
\end{pmatrix}
\]

In the case of the Stochastic Kronecker Graph we require that $M_A = M_B$, $m = n$ and $0 < a_{ij} \le 1$. We will symbolize $M_A \otimes M_A$ as $M_A^{(2)}$, which contains elements $a^{(2)}_{ij}$. The matrix $M_A^{(i)}$ is defined as the adjacency probability matrix of a graph. In order to generate the actual graph which results from the $n$-th Kronecker multiplication, we create a graph where for each vertex pair $u_i, u_j$ the probability that there is an edge $e_{ij} = (u_i, u_j)$ is $P(e_{ij}) = a^{(n)}_{ij}$.

Further analysis of this model was done by M. Mahdian et al [42], and it was shown that there are phase transitions for the emergence of a giant component and for the connectivity.
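The Kronecker product and the edge-sampling step defined above can be illustrated with a short sketch. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in [37] or [42]: the function names, the example initiator values, and the choice to sample directed edges (self-loops included) independently are all assumptions made here for illustration.

```python
import random


def kronecker_product(a, b):
    """Kronecker product of a (m x n) and b (p x q), both lists of lists.

    The result has m*p rows and n*q columns; block (i, j) equals a[i][j] * b.
    """
    p, q = len(b), len(b[0])
    return [
        [a_row[col // q] * b[i][col % q] for col in range(len(a_row) * q)]
        for a_row in a
        for i in range(p)
    ]


def kronecker_power(initiator, k):
    """k-th Kronecker power of a square initiator matrix (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker_product(result, initiator)
    return result


def sample_graph(prob, rng):
    """Sample a directed edge list from an adjacency probability matrix.

    Assumption: each ordered pair (i, j), including i == j, is an
    independent trial that succeeds with probability prob[i][j].
    """
    size = len(prob)
    return [
        (i, j)
        for i in range(size)
        for j in range(size)
        if rng.random() < prob[i][j]
    ]


if __name__ == "__main__":
    # Hypothetical 2 x 2 initiator; the values are illustrative only.
    initiator = [[0.9, 0.5], [0.5, 0.1]]
    prob = kronecker_power(initiator, 3)  # 8 x 8 probability matrix
    edges = sample_graph(prob, random.Random(42))
    print(len(prob), "vertices,", len(edges), "sampled edges")
```

Sampling every vertex pair costs time quadratic in the number of vertices, so this direct approach is only practical for small powers of the initiator.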
Additionally, Mahdian et al. proved that the diameter beyond the connectivity threshold is constant.

5.7 Affiliation Networks

In the work of S. Lattanzi et al [33] it is pointed out that the theoretical understanding of pre-existing graph generation models failed to explain the properties which were recently observed in social networks [31,38]. They propose a new model which builds on previous work on bipartite models of social networks. This model aims to “capture the affiliation of agents to societies”.

In this model there are two distinct graphs:
• A bipartite graph that represents the affiliation network, which in [33] is referred to as B(Q, U)
• The social network graph, which in [33] is referred to as G(Q, E)

The set Q is shared between both graphs. Their intuition in creating this model is based on observations of social phenomena in online graphs (such as the citation network). In their example they make use of the citation network, and in this context Q is the set of papers and U the set of topics those papers are about. When a new paper emerges it is likely to be based on an older paper, referred to as the prototype, and it is also likely to focus on (a subset of) the topics on which the prototype focuses. Based
on this, new vertices emerging in Q will exhibit an edge-copying flavor as described in Section 5.3. In a similar fashion, when a new topic emerges it is likely to be based on, or inspired by, an existing topic.

Based on the above example, the graph G(Q, E) is constructed by taking into consideration that when an author adds references to a new paper, that author will cite most or all of the papers on that topic and some papers of general interest. It is mentioned that this intuition can be applied to other social graphs as well, and we can assume within this statement that we can consider Q to be a set of people, U the set of interests those people may have, and G(Q, E) a social interaction graph between those people. People who share the same interest are more likely to be connected in G, and in addition people without common interests may be connected as a result of the popularity of one or both of those people.

From this intuitive understanding two factors emerge: an edge-copying flavor and a flavor of preferential attachment based on degree. The suggested model contains both of these elements in some way. Since graph B(Q, U) uses the edge-copying mechanism heavily, it exhibits a power-law degree distribution and a community structure, as is proven in the paper. The graph G(Q, E), which also includes the degree preferential attachment mechanism as well as common affiliation links between vertices of Q, also exhibits this phenomenon. In addition, this model claims a densification power-law and bounded diameter.

This model is worth noting because all the above properties have been observed in most online graphs (social, citation, Peer-To-Peer (P2P) etc.), but more importantly because it provides a proven power-law degree distribution, densification and bounded diameter.

6 Graph sampling and crawling

Generally speaking, graph sampling is a very underdeveloped topic. 
Put simply, the big question is: ‘How can one sample only a part of a graph and yet maintain certain structural information that is present in the entire graph?’ This question has many interpretations and shades of meaning. In our case, for example, we are interested in getting a crawled sample of a preferential attachment graph which is a good model of the graph in its entirety. Thus this sample, studied on its own, must have certain required properties. In particular, we want the degree distribution that is observed in the entire network to be observed, at scale, in our sample of the network. Other typical examples might be the clustering coefficient or the diameter.

Work on efficient sampling of network characteristics arises in many areas. In the context of search engine design, studies on optimally sampling the URL crawl frontier to rapidly sample (e.g.) high page-rank vertices, based on knowledge of vertex degree in the current sample, can be found in e.g. [4].

Within the random graph community, trace-route sampling was used to estimate cumulative degree distributions, and methods of removing the high-degree bias from this process were studied in e.g. [1], [25].

Another approach, analyzed in [10], is the jump and crawl method to find (e.g.) all very high degree vertices. The method combines uniform sampling with inspection of the neighboring vertices, and runs in time sub-linear in the network size.

In the context of online social networks, exploration often focused on how to discover the entire network more efficiently. Until recently this was feasible for many real-world networks, before they exploded to their current size. It is no longer feasible to get a consistent snapshot of the Facebook network, for example.¹

Methods based on SRW are commonly used for graph searching and crawling, and such methods have been used and analyzed extensively. 
Stutzbach et al [53] compare the performance of Breadth-First Search (BFS) with a SRW and a Metropolis-Hastings Random Walk (MHRW) [28,43] on various classes of random graphs as a basis for sampling the degree distribution of the underlying networks. The purpose of the investigation was to sample from dynamic P2P networks. In a related study, M. Gjoka et al [27] made extensive use of the above methods to collect a sample of Facebook users. As Simple Random Walks (SRWs) are degree biased, they used a re-weighting technique to unbias the sampled degree sequence output by the random walk. This is referred to as a Re-Weighted Random Walk (RWRW) in [27]. In both of the above cases it was shown that the bias could be removed dynamically by

¹ According to the Facebook statistics page at http://www.facebook.com/press/info.php?statistics (retrieved on 02 June 2011), at the time retrieved there were over 500 million active users (the exact number was not mentioned) and the average user had 130 friends.
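The degree bias of a SRW and the two corrections mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact procedure of [27] or [53]: it assumes an undirected, connected graph stored as an adjacency-list dictionary, uses the standard inverse-degree weighting for the re-weighted estimate, and uses the standard Metropolis-Hastings acceptance rule min(1, deg(u)/deg(v)) that targets a uniform vertex distribution. All function names are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_walk(adj, start, steps, rng):
    """Visit sequence of a SRW: move to a uniformly chosen neighbor."""
    walk, current = [start], start
    for _ in range(steps):
        current = rng.choice(adj[current])
        walk.append(current)
    return walk


def metropolis_hastings_walk(adj, start, steps, rng):
    """Visit sequence of a MHRW whose stationary distribution is uniform.

    A proposed move u -> v is accepted with probability
    min(1, deg(u) / deg(v)); otherwise the walk stays at u.
    """
    walk, current = [start], start
    for _ in range(steps):
        proposal = rng.choice(adj[current])
        accept = min(1.0, len(adj[current]) / len(adj[proposal]))
        if rng.random() < accept:
            current = proposal
        walk.append(current)
    return walk


def naive_degree_distribution(adj, walk):
    """Fraction of visits per degree; biased toward high degree for a SRW."""
    counts = defaultdict(int)
    for v in walk:
        counts[len(adj[v])] += 1
    return {d: c / len(walk) for d, c in counts.items()}


def reweighted_degree_distribution(adj, walk):
    """Re-weighted estimate: each visit is weighted by 1 / degree.

    A SRW visits a vertex with probability proportional to its degree,
    so inverse-degree weights cancel that bias.
    """
    weights = defaultdict(float)
    total = 0.0
    for v in walk:
        d = len(adj[v])
        weights[d] += 1.0 / d
        total += 1.0 / d
    return {d: w / total for d, w in weights.items()}


if __name__ == "__main__":
    # Star graph: vertex 0 has degree 4, vertices 1..4 have degree 1,
    # so the true degree distribution is {4: 0.2, 1: 0.8}.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    rng = random.Random(7)
    srw = simple_random_walk(star, 0, 1000, rng)
    print("naive SRW:      ", naive_degree_distribution(star, srw))
    print("re-weighted SRW:", reweighted_degree_distribution(star, srw))
    mhrw = metropolis_hastings_walk(star, 0, 20000, rng)
    print("MHRW:           ", naive_degree_distribution(star, mhrw))
```

On the star graph the SRW alternates between the center and a leaf, so the naive estimate assigns about half of the mass to degree 4, while both the re-weighted SRW estimate and the MHRW visit frequencies come out close to the true value of 0.2.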