# CS 322: (Social and Information) Network Analysis

Jure Leskovec, Stanford University

Want to generate realistic networks: given a real network, generate a synthetic network.

Why synthetic graphs? To compare graph properties (e.g., degree distribution), and for anomaly detection, simulations, predictions, null models, sharing privacy-sensitive graphs, …

Q: Which network properties do we care about?

Q: What is a good model and how do we fit it?

11/12/2009, Jure Leskovec, Stanford CS322: Network Analysis

Intuition: self-similarity leads to power laws.

Try to mimic recursive graph/community growth. There are many obvious (but wrong) ways to do so: take an initial graph and expand it recursively.

The Kronecker product is a way of generating self-similar matrices.


[Figure: recursive expansion of a 3x3 adjacency matrix, through an intermediate stage, into a 9x9 adjacency matrix. From PKDD '05.]

The Kronecker product of an N x M matrix A and a K x L matrix B is the N*K x M*L block matrix

  A ⊗ B = [ a_11·B  …  a_1M·B
            …
            a_N1·B  …  a_NM·B ]

We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices.
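As a quick sanity check, the product can be computed with NumPy's `np.kron` (a minimal sketch; the 2x2 initiator is illustrative, not one of the matrices from the slides):

```python
import numpy as np

# Illustrative 2x2 initiator adjacency matrix
K1 = np.array([[1, 1],
               [1, 0]])

# Kronecker product of the graph with itself: a 4x4 adjacency matrix
K2 = np.kron(K1, K1)

print(K2.shape)  # (4, 4)
print(K2)
```

Each 1-entry of K1 is replaced by a copy of K1 and each 0-entry by a zero block, which is exactly the recursive self-similarity the slides describe.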


[Leskovec et al. PKDD '05]

A Kronecker graph is a growing sequence of graphs obtained by iterating the Kronecker product. Each Kronecker multiplication exponentially increases the size of the graph: K_k has N1^k nodes and E1^k edges, so we get densification.

One can easily use multiple initiator matrices (K1', K1'', K1''') that can be of different sizes.
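The exponential growth can be sketched directly (illustrative initiator; edges are counted as nonzero adjacency-matrix entries, self-loops included):

```python
import numpy as np

def kronecker_power(K1, k):
    """Iterate the Kronecker product k times: K_k = K1 (x) ... (x) K1."""
    K = K1
    for _ in range(k - 1):
        K = np.kron(K, K1)
    return K

K1 = np.array([[1, 1],
               [1, 0]])
N1, E1 = K1.shape[0], int(K1.sum())

for k in range(1, 4):
    Kk = kronecker_power(K1, k)
    # Nodes grow as N1^k and edges as E1^k, hence densification
    print(k, Kk.shape[0], int(Kk.sum()), N1 ** k, E1 ** k)
```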


Continuing to multiply with K1, we obtain K4 and so on …

[Figure: the K4 adjacency matrix, showing self-similar structure. From PKDD '05.]


[PKDD '05]

Kronecker graphs have many properties found in real networks.

Properties of static networks:
- Power-law-like degree distribution
- Power-law eigenvalue and eigenvector distributions
- Constant diameter

Properties of dynamic networks:
- Densification power law
- Shrinking/stabilizing diameter

[PKDD '05]

Theorem (constant diameter): if G1 has diameter d, then graph Gk also has diameter d.

Observation: edges in Kronecker graphs satisfy (X_ij, X_kl) ∈ G ⊗ G iff (X_i, X_k) ∈ G and (X_j, X_l) ∈ G, where the X are appropriate nodes.

[PKDD '05]

Stochastic Kronecker graphs: create an N1 x N1 probability matrix P1, compute the k-th Kronecker power Pk, and for each entry puv of Pk include the edge (u, v) with probability puv (flip biased coins).

Example:

  P1 = [ 0.5  0.2     P2 = P1 ⊗ P1 = [ 0.25  0.10  0.10  0.04
         0.1  0.3 ]                     0.05  0.15  0.02  0.06
                                        0.05  0.02  0.15  0.06
                                        0.01  0.03  0.03  0.09 ]

Flipping a biased coin for each entry pij of P2 yields an instance matrix K2.
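A minimal sketch of this generation procedure (naive O(N^2) coin flipping; real implementations avoid materializing the full Pk for large graphs):

```python
import numpy as np

def stochastic_kronecker(P1, k, rng):
    """Naively sample a stochastic Kronecker graph: one biased coin per entry."""
    # k-th Kronecker power of the initiator probability matrix
    Pk = P1
    for _ in range(k - 1):
        Pk = np.kron(Pk, P1)
    # Include edge (u, v) with probability Pk[u, v]
    return (rng.random(Pk.shape) < Pk).astype(int)

rng = np.random.default_rng(0)
P1 = np.array([[0.5, 0.2],
               [0.1, 0.3]])
A = stochastic_kronecker(P1, 2, rng)
print(A.shape)  # (4, 4)
```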

[Leskovec et al. '09]

The initiator matrix K1 can be read as a similarity matrix. Node vi is described by k binary attributes v_i(1), …, v_i(k), and the probability of a link between nodes vi and vj is

  P(vi, vj) = ∏_a K1[vi(a), vj(a)]

Example: with K1 = [ a  b ; c  d ] indexed by attribute values 0 and 1, the nodes v2 = (0,1) and v3 = (1,0) link with probability P(v2, v3) = K1[0,1] · K1[1,0] = b·c.
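The attribute-product view can be sketched as follows (the numeric values standing in for a, b, c, d are illustrative):

```python
import numpy as np

def link_prob(K1, vi, vj):
    """P(vi, vj) = product over attributes a of K1[vi(a), vj(a)]."""
    p = 1.0
    for a, b in zip(vi, vj):
        p *= K1[a, b]
    return p

# K1 = [[a, b], [c, d]] with illustrative numeric entries
K1 = np.array([[0.9, 0.5],
               [0.5, 0.1]])

v2, v3 = (0, 1), (1, 0)
print(link_prob(K1, v2, v3))  # b * c = 0.5 * 0.5 = 0.25
```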

Given a real network G, we want to estimate the initiator matrix K1 = [ a  b ; c  d ]. Three approaches:

- Method of moments [Owen '09]: compare counts of small subgraphs (e.g., edges, stars, triangles) in G and in the model, and solve the resulting system of equations.
- Maximum likelihood [Leskovec-Faloutsos ICML '07]: arg max_K1 P(G | K1).
- SVD [Van Loan-Pitsianis '93]: solve min_K1 || G - K1 ⊗ K1 ||_F using the SVD.
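The SVD line refers to the nearest Kronecker product problem. Below is a sketch of the Van Loan-Pitsianis rearrangement in its unconstrained form (it recovers two separate factors B and C rather than a single tied K1, so it is illustrative rather than the exact fitting procedure):

```python
import numpy as np

def nearest_kron(A, p, q, r, s):
    """Find B (p x q) and C (r x s) minimizing ||A - kron(B, C)||_F.
    Van Loan-Pitsianis: each r x s block of A becomes a row of a
    rearranged matrix R; the best rank-1 factorization of R gives
    vec(B) and vec(C)."""
    R = np.zeros((p * q, r * s))
    for i in range(p):
        for j in range(q):
            block = A[i * r:(i + 1) * r, j * s:(j + 1) * s]
            R[i * q + j, :] = block.reshape(-1)  # row-major vec of the block
    U, S, Vt = np.linalg.svd(R)
    B = np.sqrt(S[0]) * U[:, 0].reshape(p, q)
    C = np.sqrt(S[0]) * Vt[0, :].reshape(r, s)
    return B, C
```

If A is exactly a Kronecker product, R has rank 1 and the factors are recovered exactly (up to a sign that cancels in kron(B, C)).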

[Leskovec-Faloutsos ICML '07]

Maximum likelihood estimation: arg max_K1 P(G | K1).

Naïve estimation takes O(N! · N^2) time:
- N! for the different node labelings. Our solution: Metropolis sampling reduces the N! (big) factor to a constant number of samples.
- N^2 for traversing the graph adjacency matrix. Our solution: exploiting the Kronecker product structure reduces N^2 to O(E), and real graphs are sparse (E << N^2).

Maximum likelihood estimation: given a real graph G, estimate the Kronecker initiator Θ (e.g., a 2x2 matrix) that achieves

  arg max_Θ P(G | Θ)

We need to (efficiently) calculate P(G | Θ) and maximize over Θ (e.g., using gradient descent).

[ICML '07]

Given a graph G and a Kronecker initiator Θ, we calculate the probability that Θ generated G:

  P(G | Θ) = ∏_{(u,v) ∈ G} Θ_k[u,v] · ∏_{(u,v) ∉ G} (1 - Θ_k[u,v])

where Θ_k is the k-th Kronecker power of Θ. For example:

  Θ = [ 0.5  0.2     Θ_k = [ 0.25  0.10  0.10  0.04     G = [ 1 0 1 1
        0.1  0.3 ]           0.05  0.15  0.02  0.06           0 1 0 1
                             0.05  0.02  0.15  0.06           1 0 1 1
                             0.01  0.03  0.03  0.09 ]         1 1 1 1 ]
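The likelihood above can be computed densely in a few lines (a minimal O(N^2) sketch, fine for tiny graphs; the slides later reduce this cost):

```python
import numpy as np

def log_likelihood(Theta_k, G):
    """log P(G | Theta): log Theta_k[u,v] summed over edges plus
    log(1 - Theta_k[u,v]) summed over non-edges."""
    edge_term = np.sum(np.log(Theta_k[G == 1]))
    nonedge_term = np.sum(np.log(1.0 - Theta_k[G == 0]))
    return edge_term + nonedge_term

Theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
Theta_k = np.kron(Theta, Theta)
G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 1],
              [1, 1, 1, 1]])
print(log_likelihood(Theta_k, G))
```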

[ICML '07]

Nodes are unlabeled, so two isomorphic graphs G' and G'' (the same graph with node labels permuted) should have the same probability: P(G' | Θ) = P(G'' | Θ).

One therefore needs to consider all node correspondences σ:

  P(G | Θ) = Σ_σ P(G | Θ, σ) P(σ)

All correspondences are a priori equally likely, and there are O(N!) of them.

[ICML '07]

Assume we solved the correspondence problem, i.e., we are given a node labeling σ. Calculating

  P(G | Θ, σ) = ∏_{(u,v) ∈ G} Θ_k[σ_u, σ_v] · ∏_{(u,v) ∉ G} (1 - Θ_k[σ_u, σ_v])

takes O(N^2) time, which is infeasible for large graphs (N ~ 10^5).

[ICML '07]

The log-likelihood is log P(G | Θ) = log Σ_σ P(G | Θ, σ) P(σ), and its gradient can be written as an expectation over the posterior P(σ | G, Θ). So we sample permutations σ from P(σ | G, Θ) and average the gradients.

[ICML '07]

Metropolis sampling:
- Start with a random permutation σ.
- Do local moves on the permutation (e.g., swap node labels 1 and 4).
- If the new permutation σ' is better (gives higher likelihood), accept it; else accept with probability proportional to the ratio of likelihoods (no need to calculate the normalizing constant!).

Each move can be computed efficiently: we only need to account for changes in 2 rows / columns.
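A sketch of the sampler (for clarity it recomputes the full likelihood each step; the algorithm in the slides updates only the two swapped rows/columns):

```python
import numpy as np

def metropolis_permutation(Theta_k, G, n_steps, rng):
    """Metropolis sampling over node labelings sigma."""
    N = G.shape[0]
    sigma = rng.permutation(N)

    def loglik(s):
        # log P(G | Theta, s): permute Theta_k by the labeling s
        P = Theta_k[np.ix_(s, s)]
        return np.sum(np.log(np.where(G == 1, P, 1.0 - P)))

    ll = loglik(sigma)
    for _ in range(n_steps):
        i, j = rng.choice(N, size=2, replace=False)
        cand = sigma.copy()
        cand[i], cand[j] = cand[j], cand[i]  # local move: swap two labels
        ll_cand = loglik(cand)
        # Always accept improvements; otherwise accept with
        # probability equal to the likelihood ratio
        if np.log(rng.random()) < ll_cand - ll:
            sigma, ll = cand, ll_cand
    return sigma, ll

rng = np.random.default_rng(0)
Theta = np.array([[0.5, 0.2], [0.1, 0.3]])
Theta_k = np.kron(Theta, Theta)
G = (rng.random((4, 4)) < Theta_k).astype(int)
sigma, ll = metropolis_permutation(Theta_k, G, 200, rng)
print(sigma, ll)
```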

Metropolis permutation sampling requires efficiently calculating the likelihood ratios. But the permutations σ(i) and σ(i+1) only differ at 2 positions, so we only traverse and update 2 rows (columns) of Θ_k, and the likelihood ratio can be evaluated efficiently.

[ICML '07]

Calculating P(G | Θ, σ) naïvely takes O(N^2). Idea:
- First calculate the likelihood of an empty graph, i.e., a graph with 0 edges.
- Then correct the likelihood for the edges that we observe in the graph.

By exploiting the structure of the Kronecker product we obtain a closed form for the likelihood of an empty graph.

[ICML '07]

We approximate the likelihood by the empty-graph likelihood plus per-edge corrections. The sum then goes only over the edges, so evaluating P(G | Θ, σ) takes O(E) time; real graphs are sparse (E << N^2), which is why we first calculate the likelihood of the empty graph.

The probability of an edge (i, j) is in general p_ij = θ1^a θ2^b θ3^c θ4^d. Using the Taylor approximation

  log(1 - x) ≈ -x - 0.5 x^2

for the no-edge terms and summing the resulting multinomial series, we obtain a closed form for the no-edge (empty-graph) likelihood; the observed edges then contribute the edge-likelihood corrections.
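A sketch of the O(E) evaluation (the closed forms below follow from sums over Kronecker powers: the p_uv sum to (ΣΘ)^k and their squares to (ΣΘ^2)^k; the example edge list is illustrative):

```python
import numpy as np

def p_edge(Theta, k, u, v):
    """p_uv as a product over the k base-n digit pairs of (u, v)."""
    n = Theta.shape[0]
    p = 1.0
    for _ in range(k):
        p *= Theta[u % n, v % n]
        u //= n
        v //= n
    return p

def loglik_sparse(Theta, k, edges):
    """Approximate log P(G | Theta, sigma) in O(E * k) using the
    Taylor expansion log(1 - x) ~ -x - 0.5 x^2."""
    # Closed-form empty-graph term
    ll = -(Theta.sum() ** k) - 0.5 * ((Theta ** 2).sum() ** k)
    for u, v in edges:
        p = p_edge(Theta, k, u, v)
        ll += p + 0.5 * p * p  # undo the approximated no-edge term
        ll += np.log(p)        # add the edge term
    return ll

Theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
edges = [(0, 0), (0, 2), (2, 2), (3, 3)]
print(loglik_sparse(Theta, 2, edges))
```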

[ICML '07]

Experimental setup: given a real graph, run stochastic gradient descent from a random initial point to obtain estimated parameters, then generate synthetic graphs and compare the properties of both graphs.

Note that we do not fit the properties themselves; we fit the likelihood and then compare the graph properties.

Can gradient descent recover the true parameters? How nice (smooth, without local minima) is the optimization space?

Experiment: generate a graph from random parameters, then start at a random point and use gradient descent. We recover the true parameters 98% of the time.

[Figure: convergence of log-likelihood, average absolute error, 1st eigenvalue, and diameter over gradient descent iterations.]

[Leskovec-Faloutsos ICML '07]

Real and Kronecker are very close. The estimated initiator:

  K1 = [ 0.99  0.54
         0.49  0.13 ]

[Leskovec et al. Arxiv '09]

What do the estimated parameters tell us about the network structure? In K1 = [ a  b ; c  d ], entry a governs edges within one group, d edges within the other, and b, c the edges between the two groups.

[Leskovec et al. Arxiv '09]

Nested core-periphery: with

  K1 = [ 0.9  0.5
         0.5  0.1 ]

we get 0.9 edges within the core, 0.1 edges within the periphery, and 0.5 edges between core and periphery, recursively at every scale.

[Leskovec et al. Arxiv '09]

Small and large networks are very different:

  K1 = [ 0.99  0.17     vs.   K1 = [ 0.99  0.54
         0.17  0.82 ]                0.49  0.13 ]
