CS224W: Social and Information Network Analysis

Jure Leskovec **Stanford** **University**


http://cs224w.stanford.edu

Power‐law degree distributions

What do power‐law degree networks look like?

Random network

(Erdos‐Renyi random graph)

Scale‐free (power‐law)

network

A function f is scale free if:

f(ax) = c f(x), where the constant c depends only on a
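A quick numerical check of the scale‐free property, as a minimal sketch. The exponent 2.5, the rate 1.0, and the helper names are illustrative assumptions, not values from the lecture. For a power law the ratio f(ax)/f(x) is the same at every x, while for an exponential it is not.

```python
import math

def power_law(x, alpha=2.5):
    # f(x) = x^(-alpha); alpha = 2.5 is an arbitrary illustrative exponent
    return x ** -alpha

def exponential(x, lam=1.0):
    # f(x) = exp(-lam * x), used here as a non-scale-free contrast
    return math.exp(-lam * x)

def rescaling_ratios(f, a, xs):
    # For a scale-free f, f(a*x) / f(x) equals the same constant c for all x
    return [f(a * x) / f(x) for x in xs]

xs = [1.0, 2.0, 5.0, 10.0]
pl_ratios = rescaling_ratios(power_law, 3.0, xs)      # constant: 3^(-2.5)
exp_ratios = rescaling_ratios(exponential, 3.0, xs)   # changes with x
```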

10/27/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

In the Preferential Attachment model, power‐law degrees naturally emerge [Albert‐Barabasi ‘99]

Nodes arrive in order

A new node j creates m out‐links

Prob. of linking to a node i is proportional to its degree $d_i$: $P(j \to i) = d_i / \sum_k d_k$
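The arrival process described above can be sketched as follows. This is a minimal illustration, not the original authors' code: the function name, the seed clique of m+1 nodes, and the rule that a new node picks m distinct targets are assumptions made so the sketch runs.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Grow a graph where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    endpoints = []  # every node appears once per incident edge
    # Assumption: start from a clique on m + 1 nodes so all degrees are > 0
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.append((u, v))
            endpoints += [u, v]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # A uniform draw from `endpoints` is a degree-proportional draw
            targets.add(rng.choice(endpoints))
        for i in sorted(targets):
            edges.append((j, i))
            endpoints += [j, i]
    return edges

edges = preferential_attachment(n=2000, m=3)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
```

Keeping a flat list of edge endpoints makes degree‐proportional sampling a single uniform draw; the resulting degrees are heavily skewed, with a few early nodes far above the minimum degree m.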

Note: Pref. Attachment is not the only model to generate power‐law networks

What are other mechanisms giving power‐law

degree networks?


Preferential attachment:

Power‐law degree distributions

But no local clustering

Can we get multiple properties?

[Figure panels: node degrees; clustering coefficient]


Preferential attachment is a model of a

growing network

What governs the network

growth and evolution?

P1) Node arrival process:

When nodes enter the network

P2) Edge initiation process:

Each node decides when to initiate an edge

P3) Edge destination process:

The node determines destination of the edge


[Figure: the four networks, labeled (F) Flickr, (D) Delicious, (A) Answers, (L) LinkedIn]

4 online social networks with

exact edge arrival sequence

For every edge (u,v) we know the exact time of its appearance $t_{uv}$

Directly observe mechanisms leading to global network properties

[Leskovec et al. KDD 08]

and so on for

millions…


(F) Flickr: Exponential

(D) Delicious: Linear

(A) Answers: Sub‐linear

(L) LinkedIn: Quadratic

How long do nodes live?

Node life‐time is the time between the 1st and the

last edge of a node

How often do nodes “wake up” to create edges?


LinkedIn

Lifetime a: time between node’s first and last edge

Node lifetime is exponential: p(a) = λ exp(‐λa)
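To make the exponential lifetime concrete, here is a small sketch that draws synthetic lifetimes and recovers λ with the standard maximum‐likelihood estimate (one over the sample mean). The rate 0.01, the sample size, and the function name are illustrative assumptions; no real network data is used.

```python
import random

def fit_exponential_rate(lifetimes):
    """Maximum-likelihood estimate of lambda for p(a) = lambda * exp(-lambda * a):
    the reciprocal of the sample mean."""
    return len(lifetimes) / sum(lifetimes)

rng = random.Random(42)
true_lam = 0.01  # hypothetical rate, e.g. a mean lifetime of 100 days
samples = [rng.expovariate(true_lam) for _ in range(20000)]
lam_hat = fit_exponential_rate(samples)
```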


How often do nodes “wake up” to create edges?

Edge gap δ(d): time between the d‐th and (d+1)‐st edge of a node:

Let $t_i(d)$ be the creation time of the d‐th edge of node i

$\delta_i(d) = t_i(d+1) - t_i(d)$

Then δ(d) is a distribution (histogram) of $\delta_i(d)$ over all nodes i
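The edge‐gap definition above can be computed directly from a timestamped edge list. The sketch below assumes edges are given as (u, v, t) triples and that an edge counts toward the degree of both endpoints; both the input format and that counting convention are assumptions for illustration.

```python
from collections import defaultdict

def edge_gaps(timed_edges):
    """Return {d: list of delta_i(d) over all nodes i}, where
    delta_i(d) = t_i(d+1) - t_i(d) and t_i(d) is the time of node i's d-th edge."""
    times = defaultdict(list)
    for u, v, t in timed_edges:
        # Assumption: an edge counts toward the degree of both endpoints
        times[u].append(t)
        times[v].append(t)
    gaps = defaultdict(list)
    for ts in times.values():
        ts.sort()
        for d in range(1, len(ts)):
            # ts[d - 1] is t_i(d) and ts[d] is t_i(d + 1)
            gaps[d].append(ts[d] - ts[d - 1])
    return dict(gaps)

toy_edges = [("a", "b", 1), ("a", "c", 4), ("b", "c", 6), ("a", "d", 10)]
gaps = edge_gaps(toy_edges)
```

Each list `gaps[d]` is the raw sample behind the histogram δ(d), so there is one histogram per degree d.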


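LinkedIn

Edge gap δ(d): inter‐arrival time between the d‐th and (d+1)‐st edge

Edge gaps follow a power law with exponential cutoff: $p_g(\delta(d); \alpha, \beta) \propto \delta(d)^{-\alpha} e^{-\beta \delta(d)}$

For every d we get a different plot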


As the degree of the node increases, how do α and β change?


α is const, β linear in d – gaps get smaller with d
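A small numerical illustration of why gaps shrink with degree. It assumes a power law with exponential cutoff whose cutoff rate is β·d (one reading of "β linear in d"), discretized to integer gaps. The values of α and β, the truncation at `max_gap`, and the function name are all hypothetical.

```python
import math

def mean_gap(alpha, beta, d, max_gap=10000):
    """Mean of a discretized gap distribution p(g) proportional to
    g^(-alpha) * exp(-beta * d * g), for integer gaps g = 1..max_gap."""
    gaps = range(1, max_gap + 1)
    weights = [g ** -alpha * math.exp(-beta * d * g) for g in gaps]
    total = sum(weights)
    return sum(g * w for g, w in zip(gaps, weights)) / total

# Hypothetical parameters: alpha held constant, cutoff rate growing linearly in d
means = [mean_gap(alpha=0.8, beta=0.002, d=d) for d in (1, 2, 3)]
```

With α fixed, a larger d pulls the exponential cutoff in, so the mean gap drops from d=1 to d=3.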

[Figure: probability vs. edge gap δ, one curve per degree d=1, 2, 3, with $p_g(\delta; \alpha, \beta, d) \propto \delta^{-\alpha} e^{-\beta d \delta}$]

Source node i wakes up and creates an edge

How does i select a target node j?

What is the degree of the target j?

Does preferential attachment really hold?

How many hops away is the target j?

Are edges attaching locally?


[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]

Are edges more likely to connect to higher‐degree nodes?

$p_e(k) \propto k^{\tau}$

[Figure panels: G_np, PA, Flickr]

Network      τ
G_np         0
PA           1
Flickr       1
Delicious    1
Answers      0.9
LinkedIn     0.6

[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]

Just before the edge (u,w) is placed, how many hops are between u and w?

[Figure panels: G_np, PA, Flickr]

Real edges are local.

Most of them close triangles!

Fraction of triad‐closing edges:

Network      % Δ
Flickr       66%
Delicious    28%
Answers      23%
LinkedIn     50%

[Figure: triangle u, v, w]

New triad‐closing edge (u,w) appears next

We model this as:

1. Choose u’s neighbor v

2. Choose v’s neighbor w

3. Connect (u,w)

[Figure: nodes u, v’, v, w]

Compute the edge probability p(u,w) under Random‐Random

“Score” of a graph = p(u,w)
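Under the two‐step model above with both choices made uniformly at random, the probability of reaching w from u is a sum over the intermediate neighbors v. The sketch below is one way to compute it. The adjacency‐dict representation, the toy graph, and the convention that the second step may pick any neighbor of v (including u) are assumptions for illustration.

```python
def random_random_prob(adj, u, w):
    """Probability that a two-step walk u -> v -> w lands on w, with v chosen
    uniformly among u's neighbors and w uniformly among v's neighbors."""
    prob = 0.0
    for v in adj[u]:
        if w in adj[v]:
            prob += (1.0 / len(adj[u])) * (1.0 / len(adj[v]))
    return prob

# Hypothetical toy graph: u and w share the two neighbors v1 and v2
adj = {
    "u": {"v1", "v2"},
    "v1": {"u", "w"},
    "v2": {"u", "w", "x"},
    "w": {"v1", "v2"},
    "x": {"v2"},
}
p_uw = random_random_prob(adj, "u", "w")  # 1/2 * 1/2 + 1/2 * 1/3
```

Swapping the uniform weights for degree‐, common‐friend‐, or recency‐based weights gives the other neighbor‐selection strategies.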


Improvement over the baseline:

[Table: strategy to select v (1st node) vs. strategy to select w (2nd node)]

Strategies to pick a neighbor:

random: uniformly at random

deg: proportional to its degree

com: prop. to the number of common friends

last: prop. to time since last activity

comlast: prop. to com*last

[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]

Theorem: Exponential node lifetimes and

power‐law with exponential cutoff edge gaps

lead to power‐law degree distributions

Interesting as temporal behavior predicts

structural network property


[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]

Node lifetime: $p_l(a) = \lambda e^{-\lambda a}$

For a node of lifetime a, what is its final degree D?

What is the distribution of D as a function of λ, α, β?

The 2 exp funcs “cancel”. Power‐law survives


The model of network evolution

Process                  Model
P1) Node arrival         Node arrival function is given
P2) Edge initiation      Node lifetime is exponential; edge gaps get smaller as the degree increases
P3) Edge destination     Pick edge destination using random‐random

Given the model, one can take an existing network and continue its evolution

Compare true and predicted degree exponent:


How do networks evolve at the macro level?

What are global phenomena of network growth?

Questions:

What is the relation between the number of nodes

n(t) and number of edges e(t) over time t?

How does diameter change as the network grows?

How does degree distribution evolve as the

network grows?


N(t) … nodes at time t

E(t) … edges at time t

Suppose that

N(t+1) = 2 * N(t)

Q: what is E(t+1)?

A: over‐doubled!

But obeying the Densification Power Law

[Leskovec et al. KDD 05]


[w/ Kleinberg‐Faloutsos, KDD ’05]

What is the relation between the number of nodes and the edges over time?

Prior work assumes: constant average degree over time

Networks become denser over time

Densification Power Law: $E(t) \propto N(t)^a$

a … densification exponent (1 ≤ a ≤ 2)

[Figures: E(t) vs. N(t) for Internet (a=1.2) and Citations (a=1.6)]

Densification Power Law

[Leskovec et al. KDD 05]

The number of edges grows faster than the number of nodes – average degree is increasing

$E(t) \propto N(t)^a$

or equivalently

$\log E(t) = a \log N(t) + \text{const}$

a … densification exponent: 1 ≤ a ≤ 2:

a=1: linear growth – constant out‐degree

(traditionally assumed)

a=2: quadratic growth – clique
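One common way to estimate a densification exponent is a least‐squares line through the points (log N(t), log E(t)). The sketch below does this on synthetic snapshots. The data, the constant 3, and the function name are made up for illustration and are not measurements from the cited papers.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic snapshots: the node count doubles each step and E = 3 * N^1.2
nodes = [100 * 2 ** t for t in range(8)]
edges = [3 * n ** 1.2 for n in nodes]
a_hat = densification_exponent(nodes, edges)
growth = edges[1] / edges[0]  # edge growth factor when N doubles: 2^1.2
```

For any a > 1, doubling the nodes multiplies the edges by 2^a > 2, which is the "over‐doubled" behavior.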


Prior models and intuition say

that the network diameter slowly

grows (like log N, log log N)

Diameter shrinks over time: as the network grows, the distances between the nodes slowly decrease

[w/ Kleinberg‐Faloutsos, KDD ’05]

[Figures: diameter vs. size of the graph (Internet); diameter vs. time (Citations)]

Is shrinking diameter just a consequence of densification?

[Leskovec et al. TKDD 07]

[Figure: diameter vs. size of the graph for an Erdos‐Renyi random graph with densification exponent a = 1.3]

Densifying random graph has increasing diameter

There is more to shrinking diameter than just densification

Is it the degree sequence?

Compare diameter of a:

True network (red)

Random network with the same degree distribution (blue)

[Leskovec et al. TKDD 07]

[Figure: diameter vs. size of the graph, Citations]

Densification + degree sequence give shrinking diameter

[Leskovec et al. TKDD 07]

How does degree distribution evolve to allow

for densification?

Option 1) Degree exponent γ is constant:

Fact 1: For degree exponent 1 < γ < 2: a = 2/γ

Email network


[Leskovec et al. TKDD 07]

How does degree distribution evolve to allow

for densification?

Option 2) Exponent $\gamma_n$ evolves with graph size n:

Fact 2:

Citation network


[Leskovec et al. TKDD 07]

Let’s assume the community structure

One expects many within‐group friendships and fewer cross‐group ones

How hard is it to cross communities?

[Figure: self‐similar university community structure. University splits into Science and Arts; Science into CS and Math; Arts into Drama and Music]


Assume the cross‐community linking probability of nodes at tree‐distance h is:

$f(h) = c^{-h}$

where: c ≥ 1 … the Difficulty constant

h … tree‐distance


$n = 2^k$ nodes reside in the leaves of the b‐way community hierarchy (assume b=2)

Each node then independently creates edges based on the community hierarchy: $f(h) = c^{-h}$

How many edges m are in a graph of n nodes?

Community tree evolves by a complete new level of nodes being added in each time step


[Leskovec et al. TKDD 07]

Claim: In the Community Guided Attachment graph model, the expected out‐degree of a node is proportional to:

$n^{1 - \log_b c}$ if 1 ≤ c < b

$\log_b n$ if c = b

constant if c > b


[Leskovec et al. TKDD 07]

What is the link prob.: $p(u,v) = c^{-h(u,v)}$

What is expected out‐degree of a node x?

How many nodes are at distance h?

Analyze separate cases:

Can also generalize the model

to get power‐law degrees and

densification [see TKDD 07]


Claim: The Community Guided Attachment leads to Densification Power Law with exponent:

$a = 2 - \log_b c$

a … densification exponent

b … community tree branching factor

c … difficulty constant, 1 ≤ c ≤ b


DPL: $a = 2 - \log_b c$

Gives any non‐integer Densification exponent

If c = 1: easy to cross communities

Then: a=2, quadratic growth of edges – near clique

If c = b: hard to cross communities

Then: a=1, linear growth of edges – constant out‐degree
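The relationship between the difficulty constant and the densification exponent can be checked numerically. The sketch below assumes a complete b‐ary community tree with n = b^k leaves, link probability c^(-h) at tree‐distance h, and tree‐distance measured as the height of the smallest subtree containing both leaves. The function names are illustrative.

```python
import math

def expected_edges(b, c, k):
    """Expected number of directed edges among n = b**k leaves when a pair
    at tree-distance h is linked with probability c**(-h)."""
    n = b ** k
    # (b - 1) * b**(h - 1) leaves sit at tree-distance h from any given leaf
    out_degree = sum((b - 1) * b ** (h - 1) * c ** -h for h in range(1, k + 1))
    return n * out_degree

def empirical_exponent(b, c, k1, k2):
    """Slope of log E against log N between tree heights k1 and k2."""
    e1 = expected_edges(b, c, k1)
    e2 = expected_edges(b, c, k2)
    return (math.log(e2) - math.log(e1)) / ((k2 - k1) * math.log(b))

def predicted_exponent(b, c):
    # Exponent stated by the claim: a = 2 - log_b(c)
    return 2 - math.log(c, b)

a_easy = empirical_exponent(2, 1.0, 20, 30)       # c = 1: close to 2
a_mid = empirical_exponent(2, 2 ** 0.5, 20, 30)   # c = sqrt(2): close to 1.5
```

The measured slopes approach the predicted exponent as the tree grows; intermediate values of c give non‐integer exponents between 1 and 2.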


[Leskovec et al. TKDD 07]

But we do not want to have explicit communities

Want to model graphs that densify and have shrinking diameters

Intuition:

How do we meet friends at a party?

How do we identify references when writing papers?

[Figure: nodes w, v]

The Forest Fire model has 2 parameters:

p … forward burning probability

r … backward burning probability

The model:

Each turn a new node v arrives

Uniformly at random chooses an “ambassador” w

Flip 2 geometric coins to determine the number of in‐ and out‐links of w to follow

Fire spreads recursively until it dies

New node v links to all burned nodes

[Leskovec et al. TKDD 07]
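The burning process above can be sketched in a few lines. This is a simplified illustration rather than the reference implementation. It assumes the two "geometric coins" count successes before the first failure with success probabilities p (forward, along out‐links) and r (backward, along in‐links), and that links to follow are picked uniformly among not‐yet‐burned neighbors. The function names are hypothetical.

```python
import random

def geometric(rng, prob):
    """Number of successes before the first failure; mean prob / (1 - prob)."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count

def forest_fire(n, p, r, seed=0):
    """Return {node: set of nodes it links to} for a graph grown by burning."""
    rng = random.Random(seed)
    out_links = {0: set()}  # node -> nodes it points to
    in_links = {0: set()}   # node -> nodes pointing to it
    for v in range(1, n):
        ambassador = rng.randrange(v)  # uniform over existing nodes
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            w = frontier.pop()
            forward = sorted(out_links[w] - burned)
            backward = sorted(in_links[w] - burned)
            rng.shuffle(forward)
            rng.shuffle(backward)
            # Follow a geometric number of out-links and in-links of w
            chosen = forward[:geometric(rng, p)] + backward[:geometric(rng, r)]
            for x in chosen:
                if x not in burned:
                    burned.add(x)
                    frontier.append(x)
        # The new node links to every burned node, ambassador included
        out_links[v] = set(burned)
        in_links[v] = set()
        for x in burned:
            in_links[x].add(v)
    return out_links

graph = forest_fire(n=300, p=0.35, r=0.3)
num_edges = sum(len(targets) for targets in graph.values())
```

Tracking `burned` as a set ensures each node burns at most once per arrival, so every fire terminates.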


Forest Fire generates graphs that densify and have shrinking diameter

[Figures: densification, E(t) vs. N(t), exponent 1.32; diameter vs. N(t)]

Forest Fire also generates graphs with Power‐Law degree distribution

[Figures: in‐degree (log count vs. log in‐degree); out‐degree (log count vs. log out‐degree)]

Fix backward probability r and vary forward burning probability p

Notice a sharp transition between sparse and clique‐like graphs

Sweet spot is very narrow

[Figure labels: increasing diameter, sparse graph, clique‐like graph, constant diameter, decreasing diameter]