Slides - SNAP - Stanford University

snap.stanford.edu


CS224W: Social and Information Network Analysis

Jure Leskovec, Stanford University


http://cs224w.stanford.edu


Course website:

http://cs224w.stanford.edu

Slides will be available online

Reading material will be posted online:

Chapters from the book from Jon Kleinberg and

David Easley from Cornell

Whole book is available at:

http://www.cs.cornell.edu/home/kleinber/networks-book

9/22/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 2


Contact (buddy) list

Messaging window



Observe social and communication

phenomena at a planetary scale

Largest social network analyzed to date

Questions:

What is the structure of the communication

network?



Data for June 2006

Log size:

150Gb/day (compressed)

Total: 1 month of communication data:

4.5Tb of compressed data

Activity over June 2006 (30 days)

245 million users logged in

180 million users engaged in conversations

17.5 million new accounts activated

More than 30 billion conversations

More than 255 billion exchanged messages



Activity on a typical day (June 1 2006):

1 billion conversations

93 million users log in

65 million different users talk (exchange

messages)

1.5 million invitations for new accounts sent



Fraction of country’s

population on MSN:

•Iceland: 35%

•Spain: 28%

•Netherlands, Canada, Sweden, Norway: 26%

•France, UK: 18%

•USA, Brazil: 8%



Buddy Conversation



Buddy graph

240 million people (people that login in June ’06)

9.1 billion buddy edges (friendship links)

Communication graph (take only 2‐user

conversations)

Edge if the users exchanged at least 1 message

180 million people

1.3 billion edges

30 billion conversations



Remove nodes (in some order) and observe

how network falls apart:

Number of edges deleted

Size of largest connected component

Order nodes by:

Number of links

Total conversations

Total conversation duration

Messages/conversation

Avg. sent, avg. duration
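The removal experiment above can be sketched on a toy graph. This is a minimal illustration, not the analysis run on the Messenger data: it assumes nodes are ordered once by their initial number of links (highest first), and the adjacency dictionary `toy` is made up for the example.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

def removal_curve(adj):
    """Remove nodes from highest to lowest initial degree.

    After each removal, record (nodes removed, edges deleted so far,
    size of largest connected component).
    """
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    removed = set()
    edges_deleted = 0
    curve = []
    for v in order:
        # Count only edges whose other endpoint is still present.
        edges_deleted += sum(1 for u in adj[v] if u not in removed)
        removed.add(v)
        curve.append((len(removed), edges_deleted,
                      largest_component_size(adj, removed)))
    return curve

# Hypothetical toy graph: hub 0 joined to the path 1-2-3 and to leaves 4, 5.
toy = {0: {1, 4, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: {0}, 5: {0}}
curve = removal_curve(toy)
```

Swapping the sort key (for example, total conversations per node) gives the other orderings listed above.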



Origins of the small‐world idea:

Bacon number:

Create a network of Hollywood actors

Connect two actors if they co‐appeared in a movie

Bacon number: number of steps to Kevin Bacon

As of Dec 2007, the highest (finite) Bacon number reported is 8

Only approx. 12% of all actors cannot be linked to Bacon
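A Bacon number is a shortest-path distance in the co-appearance graph, so it can be computed with breadth-first search. The sketch below assumes the graph fits in memory as a dictionary; the actor names are placeholders, not real filmographies.

```python
from collections import deque

def bacon_numbers(costars, source):
    """Breadth-first search from `source`.

    Returns a dict mapping each reachable actor to the number of
    co-appearance steps separating them from `source`.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in dist:
                dist[other] = dist[actor] + 1
                queue.append(other)
    return dist

# Hypothetical toy data: D and E never connect to Bacon.
costars = {
    "Bacon": {"A", "B"},
    "A": {"Bacon", "C"},
    "B": {"Bacon"},
    "C": {"A"},
    "D": {"E"},
    "E": {"D"},
}
numbers = bacon_numbers(costars, "Bacon")
unlinked = [a for a in costars if a not in numbers]
```

Actors missing from `numbers` are the ones that cannot be linked to the source at all, which corresponds to the unlinked fraction mentioned above.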



Erdos numbers are small

Hollywood and science are small‐worlds



What is the typical shortest path

length between any two people?

Experiment on the global friendship

network

Can’t measure, need to probe explicitly

The small‐world experiment [Stanley Milgram ’67]

Picked 300 people at random

Asked them to get a letter to a stockbroker in Boston by passing it through friends

How many steps does it take?



64 chains completed:

6.2 on the average, thus

“6 degrees of separation”

Further observations:

Milgram’s small world experiment

People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 5.7

People from the Boston area had even shorter paths: 4.4



MSN Messenger network

Number of steps

between pairs of

people

Avg. path length 6.6

90% of the people can be reached in < 8 hops

Hops   Nodes
0      1
1      10
2      78
3      3,96
4      8,648
5      3,299,252
6      28,395,849
7      79,059,497
8      52,995,778
9      10,321,008
10     1,955,007
11     518,410
12     149,945
13     44,616
14     13,740
15     4,476
16     1,542
17     536
18     167
19     71
20     29
21     16
22     10
23     3
24     2
25     3



People use different networks:

Boston vs. occupation

Criticism:

Funneling:

31 of 64 chains passed through 1 of 3 people as their final step, so not all links/nodes are equal

Choice of starting points and the target were non‐random

People refuse to participate (25% for Milgram)

Some sort of social search: People in the experiment follow

some strategy (e.g., geographic routing) instead of

forwarding the letter to everyone. They are not finding the

shortest path.

There are not many samples.

People might have used extra information resources.



What is the structure of a social network?

How do people behave in those networks, and which mechanisms do they use to route and find information?



In 2003 Dodds, Muhamad and Watts

performed the experiment using email:

18 targets of various backgrounds

24,000 first steps (~1,500 per target)

65% dropout per step

384 chains completed (1.5%)

Chain length, L


[Dodds‐Muhamad‐Watts, ’03]

Avg. chain length = 4.01

PROBLEM: Huge drop‐out rate, i.e.,

longer chains are less likely to complete



Huge drop‐out rate:

Longer chains don’t complete

Correction proposed by Harrison White. Let:

$f_j$ = true (unobserved) fraction of chains that would have length $j$

$N$ = total # of starters

$N_j$ = # starters who reached target in $j$ steps

Then: $f_j^* := N_j/N$

Assume drop‐out rate $1-\alpha$ in each step, so $f_j^* = f_j \alpha^j$

Since $\sum_j f_j = 1$, this gives $\sum_j f_j^* \alpha^{-j} = 1$

Observe $f_j^*$, calculate the average drop‐out rate $1-\alpha$, and recover $f_j = f_j^* \alpha^{-j}$
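One way to apply this correction numerically is sketched below. It assumes the per-step continuation probability (written alpha here, so the drop-out rate is 1 - alpha; the symbol is an assumption) is found from the normalization constraint that the corrected fractions sum to 1, and it solves for alpha by bisection. The function name and the input numbers are hypothetical, not data from the experiment.

```python
def corrected_lengths(observed, lo=1e-6, hi=1.0, iters=200):
    """Undo the drop-out bias in observed chain lengths.

    `observed` maps chain length j to f*_j = N_j / N.
    Finds alpha with sum_j f*_j * alpha**(-j) = 1 by bisection, then
    returns (alpha, {j: f_j}) where f_j = f*_j * alpha**(-j).
    """
    def total(alpha):
        return sum(f_star * alpha ** (-j) for j, f_star in observed.items())

    # total(alpha) decreases as alpha grows, so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(mid) > 1.0:
            lo = mid   # corrected mass too large: alpha must be bigger
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha, {j: f_star * alpha ** (-j) for j, f_star in observed.items()}

# Synthetic check: true fractions 0.2, 0.5, 0.3 observed through alpha = 0.5.
observed = {1: 0.2 * 0.5, 2: 0.5 * 0.5 ** 2, 3: 0.3 * 0.5 ** 3}
alpha, corrected = corrected_lengths(observed)
mean_corrected = sum(j * f for j, f in corrected.items())
mean_observed = sum(j * f for j, f in observed.items()) / sum(observed.values())
```

In this synthetic case the mean over completed chains understates the corrected mean, which is the bias the correction is meant to remove.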



After the correction:

Typical path length L=7

Some not well understood

phenomena in social networks:

Funneling effect: some of the target’s friends

are more likely to be the final step.

Conjecture: High reputation/authority

Effects of target’s characteristics:

structurally, why are high‐status targets easier to find?

Conjecture: Core‐periphery net structure


•N… # people assigned

to correspond to target

•Nc…# completed

chains

•r… frac. of people who

did not forward

•L… mean path length



Assume each human is connected to 100 other

people:

So:

In step 1 she can reach 100 people

In step 2 she can reach 100*100 = 10,000 people

In step 3 she can reach 100*100*100 = 1,000,000 people

In 5 steps she can reach 10 billion people

What’s wrong here?

Many edges are local (“short”):

friend of a friend
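The arithmetic behind this argument, and why local edges break it, can be checked in a few lines. The first function is the naive count with no overlap between friend circles. The second is an illustrative extreme chosen for this sketch (not a model from the slides): a large ring where everyone knows only their 100 nearest neighbours, so each extra step adds just 100 new people.

```python
def naive_reach(friends_per_person, steps):
    """People reached at exactly `steps` hops if friend circles never overlap."""
    return friends_per_person ** steps

def ring_reach(friends_per_person, steps):
    """People within `steps` hops on a large ring with only nearest-neighbour links.

    Each person knows friends_per_person / 2 neighbours on each side, so
    every step extends the reachable arc by that many people per side.
    """
    return friends_per_person * steps

naive = [naive_reach(100, k) for k in range(1, 6)]
local = [ring_reach(100, k) for k in range(1, 6)]
```

Here `naive[4]` is 10 billion while `local[4]` is 500; real social networks sit between these two extremes.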



How can we understand the small‐world phenomenon?

What is a good model?

Plan:

Simplest random graph model [Erdos‐Renyi, ‘60]

The small‐world model [Watts‐Strogatz ‘98]

Models of geographic search in networks

Connections to peer‐to‐peer networks



Erdos‐Renyi random graph model [Erdos‐Renyi, ‘60]

a.k.a. Poisson/Bernoulli random graphs

Not perfect model but interesting calculations

Two variants:

$G_{n,p}$: undirected graph on $n$ nodes where each edge $(u,v)$ appears i.i.d. with probability $p$.

So the graph has exactly $m$ edges with prob.:

$\binom{M}{m} p^m (1-p)^{M-m}$,

where $M = n(n-1)/2$ is the max number of edges

$G_{n,m}$: undirected graph with $n$ nodes and $m$ edges picked uniformly at random

What kinds of networks does such model produce?
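Both variants are easy to sample directly from their definitions. The sketch below uses only the standard library and enumerates all possible edges, so it is meant for small n; the function names and the parameter values are chosen for illustration.

```python
import itertools
import random

def sample_gnp(n, p, rng):
    """G(n,p): keep each of the n(n-1)/2 possible edges independently with prob. p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]

def sample_gnm(n, m, rng):
    """G(n,m): choose exactly m distinct edges uniformly at random."""
    all_edges = list(itertools.combinations(range(n), 2))
    return rng.sample(all_edges, m)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
gnp_edges = sample_gnp(50, 0.1, rng)
gnm_edges = sample_gnm(50, 100, rng)
```

The edge count of `gnp_edges` varies from sample to sample, while `gnm_edges` always has exactly m edges; that is the only difference between the two variants.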



What is expected degree of a node?

Let $X_v$ be a random var. measuring the degree of the node $v$ (# of incident edges): $E[X_v] = \sum_j j\,P(X_v = j)$

Linearity of expectation:

For any random variables $Y_1, Y_2, \dots, Y_k$

If $Y = Y_1 + Y_2 + \dots + Y_k$, then $E[Y] = \sum_i E[Y_i]$

Easier way: decompose $X_v$ into $X_v = \sum_{u \ne v} X_{vu}$

where $X_{vu}$ is a {0,1}‐random variable which tells if edge $(v,u)$ exists or not. So:

$E[X_v] = \sum_{u \ne v} E[X_{vu}] = p(n-1)$

How to think about it:

Prob. of node u linking to node v is p

u flips a coin for each of the (n-1) remaining nodes

Thus, the expected degree of node u is: p(n-1)
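The p(n-1) result can be sanity-checked by simulation. This is a rough sketch with arbitrarily chosen n, p and seed: it samples one random graph by flipping a coin per node pair and compares the average degree with the formula.

```python
import random

def average_degree_gnp(n, p, rng):
    """Sample one G(n,p) graph and return its average node degree."""
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:   # coin flip for edge (u, v)
                degree[u] += 1
                degree[v] += 1
    return sum(degree) / n

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n, p = 400, 0.05
observed = average_degree_gnp(n, p, rng)
expected = p * (n - 1)
```

With these values `expected` is 19.95, and a single sample should land close to it; averaging over several samples would tighten the match further.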
