CS224W: Social and Information Network Analysis

Jure Leskovec, Stanford University


http://cs224w.stanford.edu

Due in 1 week: Oct 4 in class!

The idea of the reaction papers is:

To familiarize yourselves in more depth with the material covered in class

Do reading beyond what was covered.

You should be thinking beyond what you just read, and not just take other people's work for granted.

Can be done in groups of 2‐3 students

Read at least 3 papers:

Anything from course website, last year’s website

Anything from Easley‐Kleinberg

How to submit:

File: PDF or DOC with SUNetIds of team members:

E.g., if 2 members then: SUNetId1-SUNetId2.pdf

Upload to Dropbox folder at http://coursework.stanford.edu

9/28/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

On 3‐5 pages answer the following questions:

1 page: Summary

What is the main technical content of the papers?

How do they fit in the field, and with what you have learned in class so far?

What is the connection between the papers you are discussing?

1 page: Critique

Why is it interesting in relation to the corresponding section of the course?

What were the authors missing?

Was anything particularly unrealistic?

1 page: Brainstorming

What are promising further research questions in the direction of the papers?

How could they be pursued?

An idea of a better model for something? A better algorithm?

A test of a model or algorithm on a dataset or simulated data?


Erdos‐Renyi Random Graph model [Erdos‐Renyi, ‘60]

a.k.a.: Poisson/Bernoulli random graphs

Not a perfect model, but it allows interesting calculations

Two variants:

G(n,p): undirected graph on n nodes where each edge (u,v) appears i.i.d. with probability p

So the graph has exactly m edges with prob.:

$\binom{M}{m} p^m (1-p)^{M-m}$,

where $M = n(n-1)/2$ is the max number of edges

G(n,m): undirected graph with n nodes and m edges picked uniformly at random
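A minimal Python sketch of the two variants, using only the standard library; the function names `sample_gnp` and `sample_gnm` are hypothetical, and nodes are assumed to be labeled 0..n-1. G(n,p) flips one coin per node pair, while G(n,m) draws m distinct pairs uniformly without replacement.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """G(n,p): each of the M = n(n-1)/2 node pairs becomes an edge independently with prob. p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]


def sample_gnm(n, m, rng):
    """G(n,m): m distinct edges drawn uniformly at random from the M possible ones."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


rng = random.Random(0)
print(len(sample_gnp(100, 0.05, rng)))  # random; expected value p*M = 0.05 * 4950 = 247.5
print(len(sample_gnm(100, 250, rng)))   # always exactly 250
```

In G(n,p) the edge count itself is random (Binomial(M, p)); in G(n,m) it is fixed.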

What kinds of networks does such model produce?


What is the expected degree of a node?

Let $X_v$ be a random var. measuring the degree of the node v (# of incident edges): $E[X_v] = \sum_j j\,P(X_v = j)$

Linearity of expectation:

For any random variables $Y_1, Y_2, \dots, Y_k$:

If $Y = Y_1 + Y_2 + \dots + Y_k$, then $E[Y] = \sum_i E[Y_i]$

Easier way: decompose $X_v$ as $X_v = X_{v1} + X_{v2} + \dots + X_{vn}$,

where $X_{vu}$ is a {0,1}‐random variable which tells if edge (v,u) exists or not ($X_{vv} = 0$, since there are no self‐loops). So:

$E[X_v] = \sum_u E[X_{vu}] = p(n-1)$

How to think about it:

Prob. of node u linking to node v is p

u can link (flip a coin) to each of the (n-1) remaining nodes

Thus, the expected degree of node u is: p(n-1)
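To sanity-check $E[X_v] = p(n-1)$ numerically, the sketch below (hypothetical helper `degrees_gnp`, arbitrary fixed seed) samples one G(n,p) graph, accumulates every degree from the {0,1} edge indicators, and compares the empirical mean degree with p(n-1).

```python
import random


def degrees_gnp(n, p, rng):
    """Sample G(n,p) and return the degree X_v of every node v."""
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # the indicator X_uv (= X_vu) is 1
                deg[u] += 1
                deg[v] += 1
    return deg


n, p = 500, 0.02
deg = degrees_gnp(n, p, random.Random(42))
print(sum(deg) / n, p * (n - 1))  # empirical mean degree vs. E[X_v] = p(n-1) = 9.98
```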


We want $E[X_v]$ to be independent of n

So let: p = c/(n-1) for a constant c

Observation: If we build random graph G(n,p) with p = c/(n-1), we have many isolated nodes

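Why? $P[v \text{ has degree } 0] = (1-p)^{n-1} = \left(1 - \frac{c}{n-1}\right)^{n-1} \approx e^{-c}$, a constant independent of n, so about $n e^{-c}$ nodes are isolated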
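A small numerical check, assuming p = c/(n-1) as above: the hypothetical `prob_isolated` evaluates $(1 - c/(n-1))^{n-1}$ exactly and prints it next to $e^{-c}$, so one can see the limit (and that the value stays below $e^{-c}$, since $1 - x < e^{-x}$ for x > 0).

```python
import math


def prob_isolated(n, c):
    """P[v has degree 0] in G(n, p) with p = c/(n-1): all n-1 coin flips fail."""
    p = c / (n - 1)
    return (1 - p) ** (n - 1)


c = 2.0
for n in (10, 100, 10_000):
    print(n, prob_isolated(n, c), math.exp(-c))  # approaches e^{-c} from below
```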


How big do we have to make p before we’re likely to have no isolated nodes?

We know: $P[v \text{ has degree } 0] < e^{-c}$

Event we are asking about is:

I = some node is isolated

$I = \bigcup_v I_v$, where $I_v$ is the event that v is isolated

We have (union bound): $P(I) = P\left(\bigcup_v I_v\right) \le \sum_v P(I_v) < n e^{-c}$


So $P(I) < n e^{-c}$

Let’s try:

c = ln(n): $n e^{-c} = n e^{-\ln n} = n \cdot \frac{1}{n} = 1$

c = 2ln(n): $n e^{-2\ln n} = n \cdot \frac{1}{n^2} = \frac{1}{n}$

So if:

p = ln(n)/(n-1), the bound only gives P(I) < 1 (uninformative)

p = 2ln(n)/(n-1), then P(I) < 1/n → 0 as n → ∞
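The two choices of c can also be checked by direct arithmetic. This sketch (hypothetical name `union_bound_isolated`) only evaluates the upper bound $n e^{-c}$ for a few n; it does not simulate any graphs.

```python
import math


def union_bound_isolated(n, c):
    """Upper bound on P(some node is isolated) in G(n, c/(n-1)): n * e^{-c}."""
    return n * math.exp(-c)


for n in (100, 10_000, 1_000_000):
    b1 = union_bound_isolated(n, math.log(n))      # c = ln n   -> bound = 1 (uninformative)
    b2 = union_bound_isolated(n, 2 * math.log(n))  # c = 2 ln n -> bound = 1/n
    print(n, round(b1, 6), b2)
```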


Graph structure as p changes:

p = 0: 0 edges

p = 1/(n-1): giant component appears

p = const/(n-1): avg. deg. const.; lots of isolated nodes

p = log(n)/(n-1): fewer isolated nodes

p = 2*log(n)/(n-1): no isolated nodes

p = 1: complete graph

Emergence of a giant component: avg. degree k = 2m/n, or p = k/(n-1)

k = 1-ε: all components are of size O(log n)

k = 1+ε: 1 component of size Ω(n), others have size O(log n)
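A quick way to see the threshold at k = 1 is to sample G(n,p) with p = k/(n-1) and measure the largest connected component. The sketch below is an assumption-laden stand-in for such a demo, not the lecture demo itself: it uses a union-find structure, n = 1000, and an arbitrary seed; `largest_component_fraction` is a hypothetical name.

```python
import random


def largest_component_fraction(n, k, seed=0):
    """Sample G(n, p) with p = k/(n-1) and return |largest component| / n (union-find)."""
    rng = random.Random(seed)
    p = k / (n - 1)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)  # merge the components of u and v

    sizes = {}
    for x in range(n):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n


for k in (0.5, 1.5, 3.0):
    print(k, largest_component_fraction(1000, k))
```

For k below 1 the largest component should stay a small fraction of n, while for k above 1 it should cover a constant fraction; exact values depend on the seed.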

Demo!


Degree distribution is Binomial.

Let $p_k$ denote the fraction of nodes with degree k:

$p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}$

Mean = p(n-1), Var = p(1-p)(n-1)
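The binomial formula can be evaluated directly. This sketch (hypothetical `degree_pmf`; requires Python 3.8+ for `math.comb`) tabulates $p_k$ for n = 100, p = 0.1 and recomputes its mean and variance, which should agree with p(n-1) and p(1-p)(n-1).

```python
import math


def degree_pmf(n, p, k):
    """p_k = C(n-1, k) p^k (1-p)^(n-1-k): probability that a node of G(n,p) has degree k."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


n, p = 100, 0.1
pk = [degree_pmf(n, p, k) for k in range(n)]
mean = sum(k * q for k, q in enumerate(pk))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pk))
print(round(mean, 6), round(var, 6))  # should match p(n-1) = 9.9 and p(1-p)(n-1) = 8.91
```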


Assume each node has d spokes (half‐edges):

d=1: set of pairs

d=2: set of cycles

d=3: arbitrarily complicated graphs

Randomly pair them up


Configuration model:

[Figure panels: nodes with spokes; nodes with mini‐nodes]

Assume a degree sequence d1, d2, … dn

Useful for social networks because we have control over the degree sequence
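A minimal sketch of the configuration model, assuming the simplest variant that keeps self-loops and multi-edges rather than rejecting them: list every node once per spoke, shuffle, and pair consecutive spokes. `configuration_model` is a hypothetical name.

```python
import random
from collections import Counter


def configuration_model(degrees, rng):
    """Pair half-edges ("spokes") uniformly at random for a given degree sequence.

    Returns a list of edges; self-loops and multi-edges may occur in this basic variant.
    """
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]  # node v appears d_v times
    rng.shuffle(stubs)  # consecutive pairs of a shuffled list = uniform random matching
    return list(zip(stubs[0::2], stubs[1::2]))


degrees = [3, 3, 2, 2, 1, 1]
edges = configuration_model(degrees, random.Random(7))
realized = Counter()
for u, v in edges:
    realized[u] += 1
    realized[v] += 1  # a self-loop (v, v) adds 2 to v's degree
print(edges)
print([realized[v] for v in range(len(degrees))])  # equals the input degree sequence
```

Because every spoke is used exactly once, the realized degrees always equal the input sequence; with all d_i = 1 the output is a set of pairs, as on the previous slide.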


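G(V, E) has expansion α if:

$\forall S \subseteq V$: #edges leaving S $\ge \alpha \cdot \min(|S|, |V \setminus S|)$

Or equivalently: α is the minimum ratio

$\alpha = \min_{S} \frac{\#\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$ over all sets S with $S \ne \emptyset$, $S \ne V$

i.e., every set of nodes has a high surface‐to‐volume ratio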
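Expansion can be computed by brute force on tiny graphs straight from the definition. The sketch below (hypothetical `expansion`) enumerates every nonempty proper subset S, so it is exponential in |V| and only meant for illustration.

```python
from itertools import combinations


def expansion(nodes, edges):
    """alpha = min over nonempty S != V of (#edges leaving S) / min(|S|, |V - S|).
    Brute force over all subsets: small graphs only."""
    nodes = list(nodes)
    n = len(nodes)
    best = float("inf")
    for size in range(1, n):
        for subset in combinations(nodes, size):
            s = set(subset)
            leaving = sum(1 for u, v in edges if (u in s) != (v in s))
            best = min(best, leaving / min(size, n - size))
    return best


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = list(combinations(range(4), 2))
print(expansion(range(4), cycle4), expansion(range(4), k4))
```

On the 4-cycle the minimizing S is a pair of adjacent nodes (2 edges leave, min size 2), giving α = 1; on K4 any pair has 4 edges leaving, giving α = 2.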


Expansion gives us a measure of robustness: if we want to disconnect l nodes, we need to cut ≥ α·l edges

Low expansion:

α =

High expansion:

α =

Social networks:

Communities


d‐regular graph (every node has deg. d):

Expansion is at most d (when S is 1 node)

Is there a graph on n nodes (n → ∞) with max deg. d (const), so that expansion α remains const?

Examples:

n×n grid, d=4: $\alpha \le 2n/(n^2/4) = 8/n \to 0$ (S = n/2 by n/2 square in the center)

Complete binary tree: $\alpha \to 0$ for $|S| = (n/2)-1$ (S = one subtree of the root, with a single edge leaving it)
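The grid bound is easy to verify by counting. Assuming an n × n grid with n divisible by 4, the hypothetical `grid_center_square_ratio` counts the edges leaving the central n/2 × n/2 square and divides by |S| = n²/4, which should equal 2n/(n²/4) = 8/n.

```python
def grid_center_square_ratio(n):
    """n x n grid (n divisible by 4); S = the central n/2 by n/2 square.
    Returns (#edges leaving S) / |S|, an upper bound on the grid's expansion."""
    lo, hi = n // 4, n // 4 + n // 2  # S = cells with row and column in [lo, hi)

    def inside(r, c):
        return lo <= r < hi and lo <= c < hi

    leaving = 0
    for r in range(n):
        for c in range(n):
            for dr, dc in ((1, 0), (0, 1)):  # each grid edge is visited once
                r2, c2 = r + dr, c + dc
                if r2 < n and c2 < n and inside(r, c) != inside(r2, c2):
                    leaving += 1
    size = (n // 2) ** 2  # |S| = n^2/4 <= |V - S|
    return leaving / size


for n in (8, 16, 64):
    print(n, grid_center_square_ratio(n))  # equals 8/n
```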

Fact: for a random 3‐regular graph on n nodes, there is some const α (α > 0, indep. of n) such that w.h.p. the expansion of the graph is ≥ α


Fact: In a graph on n nodes with expansion α (and constant max degree d), for all pairs of nodes s and t there is a path connecting them of O((log n)/α) edges.

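Proof:

Let $S_j$ be the set of all nodes found within j steps of BFS from t.

While $|S_j| \le n/2$, at least $\alpha|S_j|$ edges leave $S_j$, and each new node absorbs at most d of them. Then: $|S_{j+1}| \ge |S_j| + \alpha|S_j|/d = |S_j|(1 + \alpha/d)$

In how many steps of BFS do we reach > n/2 nodes?

Need j so that: $(1 + \alpha/d)^j > n/2$; set $j = d \log_2(n)/\alpha$ (since $\alpha \le d$, $(1 + \alpha/d)^{d/\alpha} \ge 2$)

So, in O(log n) steps $|S_j|$ grows to Θ(n).

The same holds for BFS from s; two sets of > n/2 nodes must intersect, so s and t are at most 2j hops apart.

And the diameter of G is O(log(n)/α)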
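The arithmetic in the last step can be checked numerically. This sketch (hypothetical `bfs_steps_bound`) assumes |S_0| = 1, α ≤ d, and the per-step growth factor 1 + α/d from the proof, and confirms that j = ⌈d·log2(n)/α⌉ steps push the guaranteed size past n/2; the resulting bound on the s-t distance is 2j.

```python
import math


def bfs_steps_bound(n, d, alpha):
    """j = ceil(d * log2(n) / alpha): enough BFS steps for |S_j| > n/2
    when |S_{j+1}| >= |S_j| * (1 + alpha/d) and |S_0| = 1 (assumes alpha <= d)."""
    return math.ceil(d * math.log2(n) / alpha)


for n, d, alpha in [(10**3, 3, 0.5), (10**6, 3, 0.5), (10**6, 4, 2.0)]:
    j = bfs_steps_bound(n, d, alpha)
    grows_past_half = (1 + alpha / d) ** j > n / 2
    print(n, d, alpha, j, 2 * j, grows_past_half)  # 2j bounds the s-t distance
```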


Consequence of expansion:

Short paths: O(log n) between each pair

Working definition of a “short path”: O(log n)

This is the “best” we can do if the graph has constant degree and n nodes

But social networks have local structure:

Triadic closure: Friend of a friend is my friend

Maybe grid is a better model?

[Figure panels: pure exponential growth; triadic closure reduces growth rate]


Just before the edge (u,v) is placed, how many hops are between u and v?


[Figure: hop distance between u and v just before edge (u,v) is placed, for networks (F), (D), (A), (L) and G(n,p); a triad‐closing edge (u,v) closes a triangle with a common neighbor w]

Fraction of triad‐closing edges:

Network   % Δ
F         66%
D         28%
A         23%
L         50%
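A small sketch of how one could classify an edge as triad‐closing, assuming the network is given as an adjacency dict of the graph just before (u,v) is added; `hops_between` and `closes_triad` are hypothetical helpers, not taken from the study behind the table.

```python
from collections import deque


def hops_between(adj, u, v):
    """BFS distance from u to v in the current graph (None if they are disconnected)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None


def closes_triad(adj, u, v):
    """Edge (u, v) is triad-closing if, just before it is placed, u and v are 2 hops apart."""
    return hops_between(adj, u, v) == 2


adj = {"u": {"w"}, "w": {"u", "v"}, "v": {"w"}}  # path u - w - v
print(closes_triad(adj, "u", "v"))
```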

[Watts‐Strogatz Nature ‘98]

How to have local edges (lots of triangles) and small diameter?

Small‐world model [Watts‐Strogatz ‘98]:

Start with a low‐dimensional regular lattice

Rewire:

Add/remove edges to create shortcuts to join remote parts of the lattice

For each edge, with prob. p move the other end to a random vertex


[Watts‐Strogatz Nature ‘98]

[Figure panels: regular lattice (high clustering, high diameter); small world (high clustering, low diameter); random graph (low clustering, low diameter)]

Rewiring allows us to interpolate between a regular lattice and a random graph
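A minimal sketch of the construction, assuming a 1‑dimensional ring lattice with k neighbors per node. The hypothetical `rewire` follows the rule above (for each edge, with prob. p, move one end to a random vertex); keeping the lower-numbered endpoint fixed and skipping self-loops and duplicate edges are choices made here, not part of the slide.

```python
import random


def ring_lattice(n, k):
    """Regular ring lattice: each node linked to its k/2 nearest neighbours on each side."""
    return {frozenset((i, (i + j) % n)) for i in range(n) for j in range(1, k // 2 + 1)}


def rewire(n, edges, p, rng):
    """For each edge, with prob. p keep its lower endpoint u and move the other end to a
    uniformly random node w, avoiding self-loops and edges that already exist."""
    current = set(edges)
    for e in sorted(edges, key=sorted):  # fixed order for reproducibility
        if rng.random() < p:
            u, _ = sorted(e)
            candidates = [w for w in range(n) if w != u and frozenset((u, w)) not in current]
            if candidates:
                current.discard(e)
                current.add(frozenset((u, rng.choice(candidates))))
    return current


rng = random.Random(3)
lattice = ring_lattice(20, 4)
for p in (0.0, 0.1, 1.0):
    g = rewire(20, lattice, p, rng)
    print(p, len(g), len(g - lattice))  # edge count fixed; rewired edges typically grow with p
```

p = 0 returns the lattice unchanged, and p = 1 rewires every edge; intermediate p gives the small-world regime in between.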


Measure of clustering (local structure, i.e., triangles in a graph)

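Clustering coefficient $C_i$ of node i is: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$

$k_i$ … degree of node i; $e_i$ … number of edges between the neighbors of node i

[Figure examples: $C_i = 0$, $C_i = 1/3$, $C_i = 1$]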

Clustering coefficient: $C = \frac{1}{n}\sum_i C_i$

[Figure: clustering coefficient C = 1/n ∑ C_i vs. prob. of rewiring, p]
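The definitions translate directly into code. This sketch (hypothetical `local_clustering` and `average_clustering`, graph as a dict of neighbor sets) assigns C_i = 0 to nodes of degree below 2, which is a common convention but an assumption here.

```python
from itertools import combinations


def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)), with e_i = #edges among the neighbours of i.
    Nodes with k_i < 2 get C_i = 0 (assumed convention)."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * e / (k * (k - 1))


def average_clustering(adj):
    """C = 1/n * sum_i C_i."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Node 0 has 3 neighbours with exactly one edge among them, so C_0 = 1/3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))
```

On a ring lattice where every node links to its 2 nearest neighbors on each side, each node's 4 neighbors share 3 edges, so C_i = 6/12 = 1/2 for every node.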


Collaborations between actors (IMDB): 225,226 nodes, avg. degree k=61

Electrical power grid: 4,941 nodes, k=2.67

Network of neurons: 282 nodes, k=14

L … Average shortest path length

C … Average clustering coefficient


When we add one random connection out of each node, we get short paths. Why?

Suppose we build random edges by giving every node a half‐edge and randomly pairing them

Consider a graph where we contract 2x2 subgraphs into supernodes

Now we have 4 edges sticking out of each supernode

From the Thm. we have short paths between supernodes; we can turn this into a path in the real graph by adding at most 2 steps per hop: O(2 log n)
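The effect is easy to reproduce. For brevity, this sketch uses a 1‑dimensional ring instead of the 2‑D grid on the slide (an assumption); each node gets one half‐edge, the half‐edges are paired uniformly at random, and BFS measures the average distance from node 0. Function names are hypothetical.

```python
import random
from collections import deque


def ring_with_random_matching(n, rng):
    """Ring of n nodes (n even) plus one random edge per node: every node gets one
    half-edge and the half-edges are paired uniformly at random."""
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    stubs = list(range(n))
    rng.shuffle(stubs)
    for a, b in zip(stubs[0::2], stubs[1::2]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def avg_distance_from(adj, src):
    """Mean BFS distance from src to every other node (graph assumed connected)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values()) / (len(adj) - 1)


n = 1000
plain_ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
print(avg_distance_from(plain_ring, 0))                                   # about n/4
print(avg_distance_from(ring_with_random_matching(n, random.Random(1)), 0))  # much smaller
```

On the plain ring the average distance is exactly 250000/999 ≈ 250 for n = 1000; with the random pairing it should drop to a value on the order of log n.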


Ok, so paths are short

And people are able to find them! (without global knowledge of the network)


s only knows the locations of its friends and the location of the target t

s does not know the links of anyone but itself

Geographic navigation: s forwards the message to its friend closest to t
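Geographic navigation can be written as a greedy rule that uses only local information. In this sketch (hypothetical `greedy_route`), the graph is exposed only through a `neighbors(x)` callback and a distance function to the target, mirroring what s knows; it returns None if no friend is closer to t.

```python
def greedy_route(neighbors, dist, s, t, max_steps=10_000):
    """Decentralized geographic forwarding: the current holder looks only at its own
    friends and hands the message to the one closest to t.
    Returns the path, or None if greedy gets stuck (no friend is closer to t)."""
    path = [s]
    cur = s
    while cur != t and len(path) <= max_steps:
        nbrs = list(neighbors(cur))
        if not nbrs:
            return None
        nxt = min(nbrs, key=lambda x: dist(x, t))
        if dist(nxt, t) >= dist(cur, t):
            return None  # local minimum: no friend is closer to the target
        path.append(nxt)
        cur = nxt
    return path if cur == t else None


# Usage: nodes 0..9 on a line, plus one long-range friend of node 0 (node 7).
shortcuts = {0: 7}


def line_neighbors(x):
    nb = [y for y in (x - 1, x + 1) if 0 <= y < 10]
    if x in shortcuts:
        nb.append(shortcuts[x])
    return nb


print(greedy_route(line_neighbors, lambda a, b: abs(a - b), 0, 9))  # [0, 7, 8, 9]
```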


Model: Grid where each node has one random edge

This is a small world.

Fact: A decentralized algorithm in the Watts‐Strogatz model needs $\Omega(n^{2/3})$ steps to reach t in expectation (even though paths of length O(log n) exist).

Proof: Let’s do this in 1‐dim.: n nodes on a ring plus one random directed edge per node. Lower bound on search time is now $\Omega(n^{1/2})$. Lower bound for d‐dim.: $\Omega(n^{d/(d+1)})$
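A minimal simulation of the 1‑dim. model above, assuming uniformly random long-range targets and plain greedy forwarding as the decentralized algorithm (the lower bound applies to any decentralized algorithm; greedy is just one of them). It prints the average number of steps next to √n and log2(n) for comparison; the numbers depend on the seed, so no particular output is claimed. Function names are hypothetical.

```python
import math
import random


def ring_search_steps(n, s, t, long_link):
    """Greedy decentralized search on a ring of n nodes, where node x also has one
    directed long-range edge x -> long_link[x]. Returns the number of hops to reach t."""

    def ring_dist(a, b):
        d = abs(a - b)
        return min(d, n - d)

    steps, cur = 0, s
    while cur != t:
        options = [(cur - 1) % n, (cur + 1) % n, long_link[cur]]
        # a ring neighbour is always one step closer, so progress is guaranteed
        cur = min(options, key=lambda x: ring_dist(x, t))
        steps += 1
    return steps


def average_steps(n, trials, rng):
    """Average greedy steps over random (s, t) pairs, with uniform random long links."""
    long_link = [rng.randrange(n) for _ in range(n)]
    total = 0
    for _ in range(trials):
        s, t = rng.randrange(n), rng.randrange(n)
        total += ring_search_steps(n, s, t, long_link)
    return total / trials


rng = random.Random(0)
for n in (1_000, 10_000, 100_000):
    print(n, average_steps(n, 200, rng), round(math.sqrt(n), 1), round(math.log2(n), 1))
```

Because a ring neighbor is always one step closer to t, greedy never needs more hops than the ring distance; the long-range edges can only shorten the route.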
