slides - SNAP - Stanford University

CS 322: (Social and Information) Network Analysis<br />

Jure Leskovec<br />

<strong>Stanford</strong> <strong>University</strong>


Want to generate realistic networks:<br />

Given a<br />

real network<br />

Generate a<br />

synthetic network<br />

Why synthetic graphs?<br />

Compare graph properties,<br />

e.g., degree distribution<br />

Anomaly detection, simulations, predictions, null‐<br />

models, sharing privacy‐sensitive graphs, …<br />

Q: Which network properties do we care about?<br />

Q: What is a good model and how do we fit it?<br />

11/12/2009<br />

Jure Leskovec, <strong>Stanford</strong> CS322: Network Analysis 2


Intuition: self‐similarity leads to power‐laws<br />

Try to mimic recursive graph / community<br />

growth<br />

There are many obvious (but wrong) ways:<br />

Initial graph Recursive expansion<br />

Kronecker Product is a way of generating self‐<br />

similar matrices<br />



Adjacency matrix<br />

(3x3)<br />

Intermediate stage<br />

Adjacency matrix<br />

(9x9)<br />

[PKDD ’05]<br />


The Kronecker product of matrices A and B is given by A ⊗ B:<br />

if A is N x M and B is K x L, then A ⊗ B is N*K x M*L<br />

We define the Kronecker product of two graphs as<br />

the Kronecker product of their adjacency matrices<br />
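The definition above maps directly onto NumPy's `np.kron`; a minimal sketch (the 3-node initiator here is illustrative, not one from the slides):

```python
import numpy as np

# Adjacency matrix of a small 3-node initiator graph, self-loops included.
K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

# Kronecker product of the graph with itself: since K1 is 3x3,
# K2 = K1 (x) K1 is a 9x9 adjacency matrix.
K2 = np.kron(K1, K1)

# Block [i, k] of K2 is K1[i, k] * K1, so entry [i*3+j, k*3+l] = K1[i,k]*K1[j,l].
assert K2.shape == (9, 9)
assert int(K2.sum()) == int(K1.sum()) ** 2
```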


[Leskovec et al. PKDD ‘05]<br />

Kronecker graphs: a growing sequence of graphs<br />

obtained by iterating the Kronecker product<br />

Each Kronecker multiplication exponentially<br />

increases the size of the graph:<br />

Kk has N1^k nodes and E1^k edges, so we get<br />

densification<br />

One can easily use multiple initiator matrices<br />

(K1’, K1’’, K1’’’) that can be of different sizes<br />
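The node and edge counts above can be verified numerically; a small sketch with a hypothetical 2x2 initiator (counting the 1-entries of the adjacency matrix, self-loops included, as edges):

```python
import numpy as np

# Hypothetical initiator with N1 = 2 nodes and E1 = 3 one-entries.
K1 = np.array([[1, 1],
               [0, 1]])

# Iterate the Kronecker product: K4 = K1 (x) K1 (x) K1 (x) K1.
K = K1.copy()
for _ in range(3):
    K = np.kron(K, K1)

N1, E1 = K1.shape[0], int(K1.sum())
# Kk has N1**k nodes and E1**k edges: densification.
assert K.shape == (N1 ** 4, N1 ** 4)
assert int(K.sum()) == E1 ** 4
```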



Continuing multiplying with K1 we<br />

obtain K4, and so on …<br />

K4 adjacency matrix<br />

[PKDD ’05]<br />


[Leskovec et al. PKDD ‘05]<br />


Kronecker graphs have many properties<br />

found in real networks:<br />

Properties of static networks<br />

Power‐Law like Degree Distribution<br />

Power‐law eigenvalue and eigenvector distribution<br />

Constant Diameter<br />

Properties of dynamic networks:<br />

Densification Power Law<br />

Shrinking/Stabilizing Diameter<br />

[PKDD ’05]<br />


[PKDD ’05]<br />

Theorem (constant diameter): if G1 has<br />

diameter d, then the graph Gk also has diameter d<br />

Observation: edges in Kronecker graphs:<br />

(Xij, Xkl) is an edge of G ⊗ G iff (Xi, Xk) and (Xj, Xl)<br />

are edges of G, where X are appropriate nodes<br />
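The edge observation can be checked exhaustively on a small example; a sketch with a random illustrative initiator (node Xij of G ⊗ G corresponds to the attribute pair (i, j)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random 3-node initiator adjacency matrix (illustrative only).
G1 = (rng.random((3, 3)) < 0.5).astype(int)
G2 = np.kron(G1, G1)

# (X_ij, X_kl) is an edge of G (x) G iff (X_i, X_k) and (X_j, X_l)
# are both edges of G.
n = G1.shape[0]
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                assert G2[i * n + j, k * n + l] == G1[i, k] * G1[j, l]
```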


Create an N1 x N1 probability matrix P1<br />

Compute the kth Kronecker power Pk<br />

For each entry puv of Pk, include an edge (u,v) with<br />

probability puv<br />

Kronecker multiplication:<br />

P1 = [0.5 0.2; 0.1 0.3]<br />

P2 = P1 ⊗ P1 = [0.25 0.10 0.10 0.04; 0.05 0.15 0.02 0.06; 0.05 0.02 0.15 0.06; 0.01 0.03 0.03 0.09]<br />

puv is the probability of edge (u,v); flip biased coins to<br />

obtain the instance matrix K2<br />

[PKDD ’05]<br />
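The three steps above can be sketched directly in NumPy, with the P1 from the slide (the seed and choice of k are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Initiator probability matrix from the slide.
P1 = np.array([[0.5, 0.2],
               [0.1, 0.3]])

# k-th Kronecker power: Pk = P1 (x) P1 (x) ... (x) P1.
k = 3
Pk = P1.copy()
for _ in range(k - 1):
    Pk = np.kron(Pk, P1)

# Flip a biased coin per entry: edge (u, v) appears with probability Pk[u, v].
K = (rng.random(Pk.shape) < Pk).astype(int)

assert Pk.shape == (2 ** k, 2 ** k)
assert abs(Pk[0, 0] - 0.5 ** k) < 1e-12
```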


The initiator matrix K1 is a similarity matrix<br />

Node vi is described with k binary attributes:<br />

vi(1), vi(2), …, vi(k)<br />

Probability of a link between nodes vi and vj:<br />

P(vi,vj) = ∏a K1[vi(a), vj(a)]<br />

K1 = [a b; c d], rows and columns indexed by attribute values 0, 1<br />

Example: v2 = (0,1), v3 = (1,0), so P(v2,v3) = K1[0,1] · K1[1,0] = b·c<br />

[Leskovec et al. 09]<br />
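The attribute product can be sketched in a few lines; the parameter values below are hypothetical stand-ins for a, b, c, d:

```python
import numpy as np

# Hypothetical 2x2 initiator; entries play the roles of a, b, c, d.
K1 = np.array([[0.9, 0.5],   # a  b
               [0.5, 0.1]])  # c  d

def edge_prob(vi, vj, K1=K1):
    """P(vi, vj) = prod over attributes a of K1[vi(a), vj(a)]."""
    p = 1.0
    for x, y in zip(vi, vj):
        p *= K1[x, y]
    return p

# v2 = (0,1), v3 = (1,0): P(v2, v3) = K1[0,1] * K1[1,0] = b * c.
assert abs(edge_prob((0, 1), (1, 0)) - 0.5 * 0.5) < 1e-12
```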


Given a real network G<br />

Want to estimate the initiator matrix K1 = [a b; c d]<br />

Method of moments [Owen ‘09]:<br />

Compare counts of small subgraphs (e.g., edges, triangles)<br />

and solve the resulting system of equations<br />

Maximum likelihood [Leskovec‐Faloutsos ICML ‘07]:<br />

arg maxK1 P(G | K1)<br />

SVD [VanLoan‐Pitsianis ‘93]:<br />

Can solve minK1 ‖G − K1 ⊗ K1‖F^2 using SVD<br />


Maximum likelihood estimation:<br />

arg maxK1 P(G | K1)<br />

[Leskovec‐Faloutsos ICML ‘07]<br />

Naïve estimation takes O(N! N^2):<br />

N! for the different node labelings:<br />

Our solution: Metropolis sampling, N! → (big) constant<br />

N^2 for traversing the graph adjacency matrix:<br />

Our solution: exploit the Kronecker product structure and<br />

sparsity (E << N^2), N^2 → E<br />


Maximum likelihood estimation<br />

Given a real graph G<br />

Estimate the Kronecker initiator graph Θ which maximizes<br />

arg maxΘ P(G | Θ)<br />

We need to (efficiently) calculate P(G | Θ)<br />

And maximize over Θ (e.g., using gradient descent)<br />


Given a graph G and Kronecker matrix Θ, we<br />

calculate the probability that Θ generated G, P(G|Θ)<br />

[ICML ’07]<br />

Θ = [0.5 0.2; 0.1 0.3]<br />

Θk = [0.25 0.10 0.10 0.04; 0.05 0.15 0.02 0.06; 0.05 0.02 0.15 0.06; 0.01 0.03 0.03 0.09]<br />

G = [1 0 1 1; 0 1 0 1; 1 0 1 1; 1 1 1 1]<br />

P(G | Θ) = ∏(u,v)∈G Θk[u,v] · ∏(u,v)∉G (1 − Θk[u,v])<br />
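The likelihood product can be evaluated naively in NumPy, using the Θ and G from the slide (a sketch of the O(N^2) computation, before any of the speedups discussed later):

```python
import numpy as np

# Initiator from the slide and its k-th Kronecker power (k = 2).
theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
theta_k = np.kron(theta, theta)

# Observed adjacency matrix G from the slide.
G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 1],
              [1, 1, 1, 1]])

# P(G|Theta): product of theta_k[u,v] over edges and
# (1 - theta_k[u,v]) over non-edges.
P = np.prod(np.where(G == 1, theta_k, 1.0 - theta_k))
assert 0.0 < P < 1.0
```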


[ICML ’07]<br />

Nodes are unlabeled<br />

Graphs G’ and G” should<br />

have the same probability:<br />

P(G’|Θ) = P(G”|Θ)<br />

One needs to consider all<br />

node correspondences σ:<br />

P(G | Θ) = ∑σ P(G | Θ, σ) P(σ)<br />

All correspondences are a<br />

priori equally likely<br />

There are O(N!)<br />

correspondences<br />


Assume we solved the correspondence problem<br />

[ICML ’07]<br />

Calculating<br />

P(G | Θ, σ) = ∏(u,v)∈G Θk[σu,σv] · ∏(u,v)∉G (1 − Θk[σu,σv])<br />

σ … node labeling<br />

Takes O(N^2) time<br />

Infeasible for large graphs (N ~ 10^5)<br />


Log‐likelihood<br />

Gradient of log‐likelihood<br />

Sample the permutations from P(σ|G,Θ) and<br />

average the gradients<br />

[ICML ’07]<br />


Metropolis sampling<br />

[ICML ’07]<br />

Start with a random permutation σ<br />

Do local moves on the permutation,<br />

e.g., swap node labels 1 and 4<br />

Accept the new permutation σ’:<br />

If the new permutation is better (gives a higher likelihood), accept<br />

Else accept with probability proportional to the ratio of the<br />

likelihoods (no need to calculate the normalizing constant!)<br />

Can compute efficiently: only need to account for<br />

changes in 2 rows / columns<br />
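The sampler above can be sketched as follows. This is a naive version that recomputes the full O(N^2) likelihood per step rather than the 2-row update the slide describes, and the initiator, seed, and graph are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(G, theta_k, sigma):
    """log P(G | Theta, sigma): relabel nodes by sigma, then sum the
    log-probabilities of edges and non-edges (naive O(N^2) version)."""
    P = theta_k[np.ix_(sigma, sigma)]
    return float(np.sum(np.where(G == 1, np.log(P), np.log(1.0 - P))))

def metropolis_step(G, theta_k, sigma):
    # Local move: swap two node labels.
    i, j = rng.choice(len(sigma), size=2, replace=False)
    new = sigma.copy()
    new[i], new[j] = new[j], new[i]
    # Accept if better; else with probability equal to the likelihood ratio
    # (no normalizing constant needed).
    delta = log_lik(G, theta_k, new) - log_lik(G, theta_k, sigma)
    return new if delta >= 0 or rng.random() < np.exp(delta) else sigma

theta = np.array([[0.9, 0.5], [0.5, 0.1]])   # illustrative initiator
theta_k = np.kron(theta, theta)
G = (rng.random((4, 4)) < theta_k).astype(int)
sigma = np.arange(4)
for _ in range(100):
    sigma = metropolis_step(G, theta_k, sigma)
```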


Metropolis permutation<br />

sampling algorithm<br />

Need to efficiently calculate<br />

the likelihood ratios<br />

But the permutations σ (i)<br />

and σ (i+1) only differ at 2<br />

positions<br />

So we only traverse and update<br />

2 rows (columns) of<br />

Θk<br />

We can evaluate the<br />

likelihood ratio efficiently<br />


Calculating P(G|Θ,σ) naively takes O(N^2)<br />

[ICML ’07]<br />

Idea:<br />

First calculate the likelihood of the empty graph, a graph<br />

with 0 edges<br />

Correct the likelihood for the edges that we observe in<br />

the graph<br />

By exploiting the structure of the Kronecker product<br />

we obtain a closed form for the likelihood of an<br />

empty graph<br />


We approximate the likelihood:<br />

log P(G | Θ, σ) = ∑u,v log(1 − Θk[σu,σv]) + ∑(u,v)∈G [log Θk[σu,σv] − log(1 − Θk[σu,σv])]<br />

The first term is the log‐likelihood of the empty graph;<br />

the second sum goes only over the edges<br />

Evaluating P(G | Θ, σ) takes O(E) time<br />

Real graphs are sparse, E << N^2<br />


Real graphs are sparse, so we first<br />

calculate the likelihood of the empty graph<br />

With Θ = [θ1 θ2; θ3 θ4] and Θk its kth Kronecker power,<br />

the probability of edge (i,j) is in general<br />

pij = θ1^a θ2^b θ3^c θ4^d, with a + b + c + d = k<br />

By using the Taylor approximation to pij and<br />

summing the multinomial series we<br />

obtain a closed form<br />

Taylor approximation:<br />

log(1 − x) ≈ −x − 0.5 x^2<br />
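A small numerical check of this approximation: summing Kronecker-power entries (and their squares) collapses to powers of the initiator's sums, so the empty-graph log-likelihood has a closed form. The sketch below uses the initiator from the earlier slides; the choice of k is arbitrary:

```python
import numpy as np

theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
k = 8

# Exact empty-graph log-likelihood needs the full 2^k x 2^k matrix: O(N^2).
theta_k = theta.copy()
for _ in range(k - 1):
    theta_k = np.kron(theta_k, theta)
exact = float(np.sum(np.log(1.0 - theta_k)))

# Closed form via log(1 - x) ~ -x - 0.5 x^2:
# sum of Theta_k entries = (sum of theta)^k, likewise for squares.
approx = -theta.sum() ** k - 0.5 * (theta ** 2).sum() ** k

assert abs(exact - approx) / abs(exact) < 1e-3
```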


Experimental setup<br />

[ICML ’07]<br />

Given a real graph<br />

Stochastic gradient descent from a random initial point<br />

Obtain estimated parameters<br />

Generate synthetic graphs<br />

Compare properties of both graphs<br />

We do not fit the properties themselves<br />

We fit the likelihood and then compare the graph<br />

properties<br />


Can gradient descent recover the true parameters?<br />

How nice (smooth, without local minima) is the optimization<br />

space?<br />

Generate a graph from random parameters<br />

Start at a random point and use gradient descent<br />

We recover the true parameters 98% of the time<br />

How does the algorithm converge to the true parameters with<br />

gradient descent iterations?<br />

(Plots: log‐likelihood, avg abs error, 1st eigenvalue, diameter)<br />


Real and Kronecker are very close:<br />

[Leskovec‐Faloutsos ICML ‘07]<br />

K1 = [0.99 0.54; 0.49 0.13]<br />


What do the estimated parameters tell us<br />

about the network structure?<br />

K1 = [a b; c d]<br />

(Figure: edges generated by parameters a, b, c, d)<br />

[Leskovec et al. Arxiv 09]<br />


What do the estimated parameters tell us<br />

about the network structure?<br />

K1 = [0.9 0.5; 0.5 0.1]<br />

0.9 … edges within the core<br />

0.5 … edges between core and periphery<br />

0.1 … edges within the periphery<br />

Nested core‐periphery structure<br />

[Leskovec et al. Arxiv ‘09]<br />


[Leskovec et al. Arxiv ‘09]<br />

Small and large networks are very different:<br />

K1 = [0.99 0.17; 0.17 0.82]   vs.   K1 = [0.99 0.54; 0.49 0.13]<br />
