slides - SNAP - Stanford University

CS 322: (Social and Information) Network Analysis<br />

Jure Leskovec<br />

<strong>Stanford</strong> <strong>University</strong>


Want to generate realistic networks:<br />

Given a<br />

real network<br />

Generate a<br />

synthetic network<br />

Why synthetic graphs?<br />

Compare graph properties,<br />

e.g., degree distribution<br />

Anomaly detection, simulations, predictions, null‐<br />

models, sharing privacy‐sensitive graphs, …<br />

Q: Which network properties do we care about?<br />

Q: What is a good model and how do we fit it?<br />

11/12/2009<br />

Jure Leskovec, <strong>Stanford</strong> CS322: Network Analysis 2


Intuition: self‐similarity leads to power‐laws<br />

Try to mimic recursive graph / community<br />

growth<br />

There are many obvious (but wrong) ways:<br />

Initial graph Recursive expansion<br />

Kronecker Product is a way of generating self‐<br />

similar matrices<br />



Adjacency matrix<br />

(3x3)<br />

Intermediate stage<br />

Adjacency matrix<br />

(9x9)<br />

[PKDD ’05]<br />


The Kronecker product of matrices A and B is given by A ⊗ B:<br />

if A is N x M and B is K x L, then A ⊗ B is N*K x M*L<br />

We define the Kronecker product of two graphs as<br />

the Kronecker product of their adjacency matrices<br />
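The definition above maps directly onto NumPy's `np.kron`; a minimal sketch (the 3-node initiator here is illustrative, not one from the slides):

```python
import numpy as np

# Adjacency matrix of a small 3-node initiator graph, self-loops included.
K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

# Kronecker product of the graph with itself: since K1 is 3x3,
# K2 = K1 (x) K1 is a 9x9 adjacency matrix.
K2 = np.kron(K1, K1)

# Block [i, k] of K2 is K1[i, k] * K1, so entry [i*3+j, k*3+l] = K1[i,k]*K1[j,l].
assert K2.shape == (9, 9)
assert int(K2.sum()) == int(K1.sum()) ** 2
```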


[Leskovec et al. PKDD ‘05]<br />

Kronecker graphs: a growing sequence of graphs<br />

obtained by iterating the Kronecker product<br />

Each Kronecker multiplication exponentially<br />

increases the size of the graph:<br />

Kk has N1^k nodes and E1^k edges, so we get<br />

densification<br />

One can easily use multiple initiator matrices<br />

(K1’, K1’’, K1’’’) that can be of different sizes<br />
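The node and edge counts above can be verified numerically; a small sketch with a hypothetical 2x2 initiator (counting the 1-entries of the adjacency matrix, self-loops included, as edges):

```python
import numpy as np

# Hypothetical initiator with N1 = 2 nodes and E1 = 3 one-entries.
K1 = np.array([[1, 1],
               [0, 1]])

# Iterate the Kronecker product: K4 = K1 (x) K1 (x) K1 (x) K1.
K = K1.copy()
for _ in range(3):
    K = np.kron(K, K1)

N1, E1 = K1.shape[0], int(K1.sum())
# Kk has N1**k nodes and E1**k edges: densification.
assert K.shape == (N1 ** 4, N1 ** 4)
assert int(K.sum()) == E1 ** 4
```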



Continuing multiplying with K1 we<br />

obtain K4, and so on …<br />

K4 adjacency matrix<br />

[PKDD ’05]<br />


[Leskovec et al. PKDD ‘05]<br />


Kronecker graphs have many properties<br />

found in real networks:<br />

Properties of static networks<br />

Power‐Law like Degree Distribution<br />

Power‐law eigenvalue and eigenvector distribution<br />

Constant Diameter<br />

Properties of dynamic networks:<br />

Densification Power Law<br />

Shrinking/Stabilizing Diameter<br />

[PKDD ’05]<br />


[PKDD ’05]<br />

Theorem (constant diameter): if G1 has<br />

diameter d, then the graph Gk also has diameter d<br />

Observation: edges in Kronecker graphs:<br />

(Xij, Xkl) is an edge of G ⊗ G iff (Xi, Xk) and (Xj, Xl)<br />

are edges of G, where X are appropriate nodes<br />
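The edge observation can be checked exhaustively on a small example; a sketch with a random illustrative initiator (node Xij of G ⊗ G corresponds to the attribute pair (i, j)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random 3-node initiator adjacency matrix (illustrative only).
G1 = (rng.random((3, 3)) < 0.5).astype(int)
G2 = np.kron(G1, G1)

# (X_ij, X_kl) is an edge of G (x) G iff (X_i, X_k) and (X_j, X_l)
# are both edges of G.
n = G1.shape[0]
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                assert G2[i * n + j, k * n + l] == G1[i, k] * G1[j, l]
```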


Create an N1 x N1 probability matrix P1<br />

Compute the kth Kronecker power Pk<br />

For each entry puv of Pk, include an edge (u,v) with<br />

probability puv<br />

Kronecker multiplication:<br />

P1 = [0.5 0.2; 0.1 0.3]<br />

P2 = P1 ⊗ P1 = [0.25 0.10 0.10 0.04; 0.05 0.15 0.02 0.06; 0.05 0.02 0.15 0.06; 0.01 0.03 0.03 0.09]<br />

puv is the probability of edge (u,v); flip biased coins to<br />

obtain the instance matrix K2<br />

[PKDD ’05]<br />
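The three steps above can be sketched directly in NumPy, with the P1 from the slide (the seed and choice of k are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Initiator probability matrix from the slide.
P1 = np.array([[0.5, 0.2],
               [0.1, 0.3]])

# k-th Kronecker power: Pk = P1 (x) P1 (x) ... (x) P1.
k = 3
Pk = P1.copy()
for _ in range(k - 1):
    Pk = np.kron(Pk, P1)

# Flip a biased coin per entry: edge (u, v) appears with probability Pk[u, v].
K = (rng.random(Pk.shape) < Pk).astype(int)

assert Pk.shape == (2 ** k, 2 ** k)
assert abs(Pk[0, 0] - 0.5 ** k) < 1e-12
```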


The initiator matrix K1 is a similarity matrix<br />

Node vi is described with k binary attributes:<br />

vi(1), vi(2), …, vi(k)<br />

Probability of a link between nodes vi and vj:<br />

P(vi,vj) = ∏a K1[vi(a), vj(a)]<br />

K1 = [a b; c d], rows and columns indexed by attribute values 0, 1<br />

Example: v2 = (0,1), v3 = (1,0), so P(v2,v3) = K1[0,1] · K1[1,0] = b·c<br />

[Leskovec et al. 09]<br />
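The attribute product can be sketched in a few lines; the parameter values below are hypothetical stand-ins for a, b, c, d:

```python
import numpy as np

# Hypothetical 2x2 initiator; entries play the roles of a, b, c, d.
K1 = np.array([[0.9, 0.5],   # a  b
               [0.5, 0.1]])  # c  d

def edge_prob(vi, vj, K1=K1):
    """P(vi, vj) = prod over attributes a of K1[vi(a), vj(a)]."""
    p = 1.0
    for x, y in zip(vi, vj):
        p *= K1[x, y]
    return p

# v2 = (0,1), v3 = (1,0): P(v2, v3) = K1[0,1] * K1[1,0] = b * c.
assert abs(edge_prob((0, 1), (1, 0)) - 0.5 * 0.5) < 1e-12
```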


Given a real network G<br />

Want to estimate the initiator matrix K1 = [a b; c d]<br />

Method of moments [Owen ‘09]:<br />

Compare counts of small subgraphs (e.g., edges, triangles)<br />

and solve the resulting system of equations<br />

Maximum likelihood [Leskovec‐Faloutsos ICML ‘07]:<br />

arg maxK1 P(G | K1)<br />

SVD [VanLoan‐Pitsianis ‘93]:<br />

Can solve minK1 ‖G − K1 ⊗ K1‖F^2 using SVD<br />


Maximum likelihood estimation:<br />

arg maxK1 P(G | K1)<br />

[Leskovec‐Faloutsos ICML ‘07]<br />

Naïve estimation takes O(N! N^2):<br />

N! for the different node labelings:<br />

Our solution: Metropolis sampling, N! → (big) constant<br />

N^2 for traversing the graph adjacency matrix:<br />

Our solution: exploit the Kronecker product structure and<br />

sparsity (E << N^2), N^2 → E<br />


Maximum likelihood estimation<br />

Given a real graph G<br />

Estimate the Kronecker initiator graph Θ which maximizes<br />

arg maxΘ P(G | Θ)<br />

We need to (efficiently) calculate P(G | Θ)<br />

And maximize over Θ (e.g., using gradient descent)<br />


Given a graph G and Kronecker matrix Θ, we<br />

calculate the probability that Θ generated G, P(G|Θ)<br />

[ICML ’07]<br />

Θ = [0.5 0.2; 0.1 0.3]<br />

Θk = [0.25 0.10 0.10 0.04; 0.05 0.15 0.02 0.06; 0.05 0.02 0.15 0.06; 0.01 0.03 0.03 0.09]<br />

G = [1 0 1 1; 0 1 0 1; 1 0 1 1; 1 1 1 1]<br />

P(G | Θ) = ∏(u,v)∈G Θk[u,v] · ∏(u,v)∉G (1 − Θk[u,v])<br />
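The likelihood product can be evaluated naively in NumPy, using the Θ and G from the slide (a sketch of the O(N^2) computation, before any of the speedups discussed later):

```python
import numpy as np

# Initiator from the slide and its k-th Kronecker power (k = 2).
theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
theta_k = np.kron(theta, theta)

# Observed adjacency matrix G from the slide.
G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 1],
              [1, 1, 1, 1]])

# P(G|Theta): product of theta_k[u,v] over edges and
# (1 - theta_k[u,v]) over non-edges.
P = np.prod(np.where(G == 1, theta_k, 1.0 - theta_k))
assert 0.0 < P < 1.0
```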


[ICML ’07]<br />

Nodes are unlabeled<br />

Graphs G’ and G” should<br />

have the same probability:<br />

P(G’|Θ) = P(G”|Θ)<br />

One needs to consider all<br />

node correspondences σ:<br />

P(G | Θ) = ∑σ P(G | Θ, σ) P(σ)<br />

All correspondences are a<br />

priori equally likely<br />

There are O(N!)<br />

correspondences<br />


Assume we solved the correspondence problem<br />

[ICML ’07]<br />

Calculating<br />

P(G | Θ, σ) = ∏(u,v)∈G Θk[σu,σv] · ∏(u,v)∉G (1 − Θk[σu,σv])<br />

σ … node labeling<br />

Takes O(N^2) time<br />

Infeasible for large graphs (N ~ 10^5)<br />


Log‐likelihood<br />

Gradient of log‐likelihood<br />

Sample the permutations from P(σ|G,Θ) and<br />

average the gradients<br />

[ICML ’07]<br />


Metropolis sampling<br />

[ICML ’07]<br />

Start with a random permutation σ<br />

Do local moves on the permutation,<br />

e.g., swap node labels 1 and 4<br />

Accept the new permutation σ’:<br />

If the new permutation is better (gives a higher likelihood), accept<br />

Else accept with probability proportional to the ratio of the<br />

likelihoods (no need to calculate the normalizing constant!)<br />

Can compute efficiently: only need to account for<br />

changes in 2 rows / columns<br />
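The sampler above can be sketched as follows. This is a naive version that recomputes the full O(N^2) likelihood per step rather than the 2-row update the slide describes, and the initiator, seed, and graph are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(G, theta_k, sigma):
    """log P(G | Theta, sigma): relabel nodes by sigma, then sum the
    log-probabilities of edges and non-edges (naive O(N^2) version)."""
    P = theta_k[np.ix_(sigma, sigma)]
    return float(np.sum(np.where(G == 1, np.log(P), np.log(1.0 - P))))

def metropolis_step(G, theta_k, sigma):
    # Local move: swap two node labels.
    i, j = rng.choice(len(sigma), size=2, replace=False)
    new = sigma.copy()
    new[i], new[j] = new[j], new[i]
    # Accept if better; else with probability equal to the likelihood ratio
    # (no normalizing constant needed).
    delta = log_lik(G, theta_k, new) - log_lik(G, theta_k, sigma)
    return new if delta >= 0 or rng.random() < np.exp(delta) else sigma

theta = np.array([[0.9, 0.5], [0.5, 0.1]])   # illustrative initiator
theta_k = np.kron(theta, theta)
G = (rng.random((4, 4)) < theta_k).astype(int)
sigma = np.arange(4)
for _ in range(100):
    sigma = metropolis_step(G, theta_k, sigma)
```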


Metropolis permutation<br />

sampling algorithm<br />

Need to efficiently calculate<br />

the likelihood ratios<br />

But the permutations σ (i)<br />

and σ (i+1) only differ at 2<br />

positions<br />

So we only traverse and update<br />

2 rows (columns) of<br />

Θk<br />

We can evaluate the<br />

likelihood ratio efficiently<br />


Calculating P(G|Θ,σ) naively takes O(N^2)<br />

[ICML ’07]<br />

Idea:<br />

First calculate the likelihood of the empty graph, a graph<br />

with 0 edges<br />

Correct the likelihood for the edges that we observe in<br />

the graph<br />

By exploiting the structure of the Kronecker product<br />

we obtain a closed form for the likelihood of an<br />

empty graph<br />


We approximate the likelihood:<br />

log P(G | Θ, σ) = ∑u,v log(1 − Θk[σu,σv]) + ∑(u,v)∈G [log Θk[σu,σv] − log(1 − Θk[σu,σv])]<br />

The first term is the log‐likelihood of the empty graph;<br />

the second sum goes only over the edges<br />

Evaluating P(G | Θ, σ) takes O(E) time<br />

Real graphs are sparse, E << N^2<br />


Real graphs are sparse, so we first<br />

calculate the likelihood of the empty graph<br />

With Θ = [θ1 θ2; θ3 θ4] and Θk its kth Kronecker power,<br />

the probability of edge (i,j) is in general<br />

pij = θ1^a θ2^b θ3^c θ4^d, with a + b + c + d = k<br />

By using the Taylor approximation to pij and<br />

summing the multinomial series we<br />

obtain a closed form<br />

Taylor approximation:<br />

log(1 − x) ≈ −x − 0.5 x^2<br />
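A small numerical check of this approximation: summing Kronecker-power entries (and their squares) collapses to powers of the initiator's sums, so the empty-graph log-likelihood has a closed form. The sketch below uses the initiator from the earlier slides; the choice of k is arbitrary:

```python
import numpy as np

theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
k = 8

# Exact empty-graph log-likelihood needs the full 2^k x 2^k matrix: O(N^2).
theta_k = theta.copy()
for _ in range(k - 1):
    theta_k = np.kron(theta_k, theta)
exact = float(np.sum(np.log(1.0 - theta_k)))

# Closed form via log(1 - x) ~ -x - 0.5 x^2:
# sum of Theta_k entries = (sum of theta)^k, likewise for squares.
approx = -theta.sum() ** k - 0.5 * (theta ** 2).sum() ** k

assert abs(exact - approx) / abs(exact) < 1e-3
```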


Experimental setup<br />

[ICML ’07]<br />

Given a real graph<br />

Stochastic gradient descent from a random initial point<br />

Obtain estimated parameters<br />

Generate synthetic graphs<br />

Compare properties of both graphs<br />

We do not fit the properties themselves<br />

We fit the likelihood and then compare the graph<br />

properties<br />


Can gradient descent recover the true parameters?<br />

How nice (smooth, without local minima) is the optimization<br />

space?<br />

Generate a graph from random parameters<br />

Start at a random point and use gradient descent<br />

We recover the true parameters 98% of the time<br />

How does the algorithm converge to the true parameters with<br />

gradient descent iterations?<br />

(Plots: log‐likelihood, avg abs error, 1st eigenvalue, diameter)<br />


Real and Kronecker are very close:<br />

[Leskovec‐Faloutsos ICML ‘07]<br />

K1 = [0.99 0.54; 0.49 0.13]<br />


What do the estimated parameters tell us<br />

about the network structure?<br />

K1 = [a b; c d]<br />

(Figure: edges generated by parameters a, b, c, d)<br />

[Leskovec et al. Arxiv 09]<br />


What do the estimated parameters tell us<br />

about the network structure?<br />

K1 = [0.9 0.5; 0.5 0.1]<br />

0.9 … edges within the core<br />

0.5 … edges between core and periphery<br />

0.1 … edges within the periphery<br />

Nested core‐periphery structure<br />

[Leskovec et al. Arxiv ‘09]<br />


[Leskovec et al. Arxiv ‘09]<br />

Small and large networks are very different:<br />

K1 = [0.99 0.17; 0.17 0.82]   vs.   K1 = [0.99 0.54; 0.49 0.13]<br />
