CS224W: Social and Information Network Analysis

Jure Leskovec **Stanford** **University**

Jure Leskovec, **Stanford** **University**

http://cs224w.stanford.edu

Task:

Find coalitions in signed networks

Incentives:

European chocolates!

FFame

Up to 10% extra credit

DDue:

Friday midnight

No late days!

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

Today: 3 methods

(3) Trawling: Community signatures that can be efficiently extracted

(4) SSpectral t lgraph h partitioning: titi i LLaplacian l i matrix, ti 22nd deigenvector i t

(5) Overlapping communities: Clique percolation method

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 3

[Kumar et al. ‘99]

Searching for small communities in Web graph

(1) What is the signature of a

community/discussion in a Web graph

…

…

A dense 2‐layer graph

Intuition: many people all talking about the same things

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Use this to define topics:

Wh What t th the same people l on th the

left talk about on the right

4

(2) A more well‐defined well defined problem:

Enumerate complete bipartite subgraphs K s,t

Where K Ks,t = s nodes where each links to the

same t other nodes

[Kumar et al. ‘99]

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 5

Two points:

(1) The signature of a community/discussion

(2) Complete bipartite subgraph K Ks,t Ks,t = graph on s nodes, each links to the same t other nodes

Plan:

(A) From (2) get back to (1):

Via: Any dense enough graph contains a

smaller K s,t as a subgraph

(B) How do we solve (2) in a giant graph?

[Kumar et al. ‘99]

What similar problems have been solved on big non‐graph non graph data?

(3) Frequent itemset enumeration [Agrawal‐Srikant ‘99]

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 6

Marketbasket analysis:

What items are bought together in a store?

Setting:

Universe U of n items

m subsets b t of f UU: S S1, S S2, …, S Sm U

(Si is a set of items one person bought)

Frequency threshold f

Goal:

Fi Find d all ll subsets b t T s.t. t T S Si of f f f sets t S Si (items in T were bought together f times)

[Agrawa‐Srikant ‘99]

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 7

Example:

Universe of items:

U={1,2,3,4,5} {,,,,}

Itemsets:

S1={1,3,5}, S2={2,3,4}, S3={2,4,5}, S4={3,4,5}, S5={1,3,4,5}, S6={2,3,4,5} Minimum support:

f=3

Algorithm: Build up the lists

Insight: for a frequent set of size k, all

its subsets are also frequent

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 8

U={1,2,3,4,5}, U={12345} f=3

S1={1,3,5}, S2={2,3,4}, S3={2,4,5}, S S4={3,4,5}, ={345} SS={1345} 5={1,3,4,5}, SS={2345} 6={2,3,4,5}

[Agrawa‐Srikant ‘99]

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9

For i =1 = 1,…,kk

Find all frequent sets of size i by

composing sets of size ii-1 1 that

differ in 1 element

Open question:

Efficiently find only

maximal frequent sets

[Agrawa‐Srikant ‘99]

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 10

[Kumar et al. ‘99]

Claim: (3) (itemsets) solves (2) (bipartite graphs)

How?

View each node i as a

set S i of nodes i points to

K Ks,t = a set t y of f size i t

that occurs in s sets Si Looking for Ks,t set of

frequency threshold to s

and look at layer t – all

frequent sets of size t

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11

[Kumar et al. ‘99]

(2) (1): Informally, Informally every dense enough

graph G contains a bipartite Ks,t subgraph

where s and t depend on size (# of nodes) and

density (avg. degree) of G [Kovan‐Sos‐Turan ‘53]

Theorem: Let G=(X,Y,E), |X|=|Y|= n with

g g

1/

t 1

1/

t

avg. degree: d

s n

t

then G contains K stas a subgraph

s,t g p

[Will not prove it here. See online **slides**]

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 12

For the proof we will need the following fact

Recall:

a

a(

a 1)...(

a b 1)

b

b!

Let f(x) = x(x-1)(x-2)…(x-k)

Once x k, f(x) curves upward (convex)

Suppose a setting:

g(y) is convex

Want to minimize g(xi) where xi=x TTo minimize i i i g(x ( i) ) make k each h xi = x/n /

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 13

Node i, i degree d i and

neighbor set Si Put node i in buckets

for all size t subsets of

its neighbors

Potential right‐hand

sides of Ks,t (i.e., all

size t subsets of Si) 11/10/2009 Jure Leskovec, **Stanford** CS322: Network Analysis 14

Note: As soon as s nodes appear in a bucket

we have a Ks,t How many buckets node i contributes?

d i … degree of node i

i

What is the total size of all buckets?

11/10/2009 Jure Leskovec, **Stanford** CS322: Network Analysis 15

So, So the total height of

all buckets is…

Plug in:

d

s

1

/ t 1 1

1 / t

n

t

a

a(

a 1)...(

a b 1)

b

b

b !

11/10/2009 Jure Leskovec, **Stanford** CS322: Network Analysis 16

We have: Total height of all buckets

How many buckets are there?

n s

t

t!

t

n

n

t t!

What is the average height of buckets?

s t

s

t n

t ! height s

n t

! So, avg. bucket

So by pigeonhole principle, principle there must be at least

one bucket with more than s nodes in it.

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 17

Theoretical result:

[Kumar et al. ‘99]

Complete bipartite subgraphs Ks,t are embedded in

larger dense enough graphs (i (i.e., e the communities)

i.e., biparite subgraphs as signatures of communities

Algorithmic result:

Frequent itemset extraction and dynamic

programming

p g g

SCALABLE!!!

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 18

Undirected graph G(V,E): G(V E):

Bi‐partitioning Bi partitioning task:

Divide vertices into two disjoint groups (A,B)

Questions:

A 1

5 B

2

3

How can we define a “good” g ppartition

of G?

How can we efficiently identify such a partition?

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

19

4

2

6

1

3

4

5

6

What makes a good partition?

Maximize the number of within‐group connections

Mi Minimize i i th the number b of f bt between‐group

connections

2

1

3

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 20

4

5

6

A

Express partitioning objectives as a

function of the “edge cut” of the partition

CCut: SSet of f edges d with ihonly l one vertex in i a

group:

2

1

3

4

5

6

B

cut(A,B) = 2

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

21

Criterion: Minimum Minimum‐cut cut

Minimise weight of connections between groups

min minABcut(A,B) A,B cut(A,B)

Degenerate case:

Problem:

Optimal cut Minimum cut

OOnly l considers id external t lcluster l t connections ti

Does not consider internal cluster connectivity

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

22

Criterion: Normalized‐cut [ [Shi‐Malik, , ’97] ]

Connectivity between groups relative to the density of

each group

Vol(A): The total weight of the edges originating from group AA.

Why use this criterion?

Produces more balanced partitions

How do we efficiently find a good partition?

Problem: Computing optimal cut is NP‐hard

[Shi‐Malik]

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

23

A: adjacency matrix of undirected G

Aij =1 if (i,j) is an edge, else 0

x is a vector in n x is a vector in with components (x x )

n with components (x1,…, xn) just a label/value of each node of G

What is the meaning of AAx? x?

Entry y j is a sum of labels x i of neighbors of j

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 24

jth j coordinate of Ax:

Sum of the x‐values

of neighbors of j

Make this a new value at node j

Spectral Graph Theory:

Analyze the “spectrum” of matrix representing G

Spectrum: Eigenvectors of a graph, graph ordered by the

magnitude (strength) of their corresponding

eigenvalues:

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 25

Suppose all nodes in G have degree d

and G is connected

What are some eigenvalues/vectors of G?

Ax = x What is ? What x?

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 26

What if G is not connected?

Say G has 2 components, each d‐regular

What are some eigenvectors?

x= Put all 1s on A and 0s on B or vice versa

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 27

Adjacency matrix (A):

n n matrix

A=[a A [aij] ij], a aij=1 ij 1 if edge between node i and j

2

1

3

Important properties:

4

5

Symmetric matrix

Eigenvectors are real and orthogonal

6

1 2 3 4 5 6

1 0 1 1 0 1 0

2 1 0 1 0 0 0

3 1 1 0 1 0 0

4 0 0 1 0 1 1

5 1 0 0 1 0 1

6 0 0 0 1 1 0

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

28

Degree matrix (D):

n n diagonal matrix

D D=[d [d ii], ] d dii = ddegree of f node d i

2

1

3

4

5

6

1 2 3 4 5 6

1 3 0 0 0 0 0

2 0 2 0 0 0 0

3 0 0 3 0 0 0

4 0 0 0 3 0 0

5 0 0 0 0 3 0

6 0 0 0 0 0 2

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

29

Laplacian p matrix () (L):

n n symmetric matrix

2

1

3

What is trivial eigenvector/

eigenvalue?

Important properties:

4

5

6

1 2 3 4 5 6

1 3 ‐1 ‐1 0 ‐1 0

2 ‐1 2 ‐1 0 0 0

3 ‐1 ‐1 3 ‐1 0 0

4 0 0 ‐1 3 ‐1 ‐1

5 ‐1 0 0 ‐1 3 ‐1

6 0 0 0 ‐1 ‐1 2

L = D - A

Eigenvalues are non‐negative non negative real numbers

Eigenvectors are real and orthogonal

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

30

For symmetric matrix M:

2

min

x

T

Mx

x

T

Mx

x x

T

What is the meaning of min xT What is the meaning of min x Lx on G?

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 31

What else do we know about x?

x is unit vector

i th l t 1st x is orthogonal to 1 i t (1 1) th

st eigenvector (1,…,1) thus:

Then:

( x

x

min

i j

2 All lbli f

x

2

i

All labelings of

nodes so that

sum(x i)=0

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 32

)

2

Express p p partition ( (A,B) , ) as a vector

We can minimize the cut of the partition p by y

finding a non‐trivial vector x that minimizes:

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

33

The minimum value is given by the 2 nd

smallest eigenvalue λ 2 of the Laplacian

matrix L

The optimal solution for x is given by the

corresponding eigenvector λ2, referred as

the Fiedler vector

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 34

How to define a “good” good partition of a graph?

Minimise a given graph cut criterion

How to efficiently identify such a partition?

Approximate using information provided by the

eigenvalues and eigenvectors of a graph

Spectral l Clustering l i

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

35

Three basic stages:

1. Pre‐processing

Construct a matrix representation of the graph

2. Decomposition

Compute eigenvalues and eigenvectors of the matrix

Map each point to a lower‐dimensional

representation based on one or more eigenvectors

3. Grouping

Assign points to two or more clusters, based on the

new representation

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

36

p g

Pre‐processing:

Build Laplacian

matrix L of the

graph g p

p

1 2 3 4 5 6

1 3

‐1 ‐1 0 ‐1 0

2 ‐1 2 ‐1 0 0 0

3 ‐1 ‐1 3 ‐1 0 0

4 0 0 ‐1 3 ‐1 ‐1

5 ‐1 0 0 ‐1 3 ‐1

Decomposition: 3.0

0.4 0.3 0.1 0.6 ‐0.4 0.5

Find eigenvalues

and eigenvectors x

of the matrix L

Map vertices to

corresponding p g

components of 2 0.0

1.0

= X =

1

2

3

4

0.3

3.0

4.0

5.0

0.6

0.3

6 0 0 0 ‐1 ‐1 2

0.4

0.4

0.4

0.4

0.4

0.3

0.6

‐0.3

‐0.3

‐0.6

‐0.5

0.4

0.1

‐0.5

‐0.4

‐0.2

‐0.4

0.6

‐0.2

‐0.4

‐0.4

0.4

0.4

0.4

‐0.4

‐0.5

0.0

‐0.5

‐0.5

0.0

How do we now

ffind dthe h clusters? l

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 37

5

6

‐0.3

‐0.3

‐0.6

Grouping:

Sort components of reduced 1‐dimensional vector

Identify clusters by splitting the sorted vector in two

How to choose h a splitting l point?

Naïve approaches:

Split p at 0, mean or median value

More expensive approaches:

Attempt to minimise normalized cut criterion in 1‐dimension

1

2

3

4

5

6

03 0.3

0.6

0.3

‐0.33

‐0.3

‐0.6

Split p at 0:

Cluster A: Positive points

Cluster B: Negative points

1

2

3

03 0.3

0.6

0.3

**Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

38

4

5

6

‐0.3 03

‐0.3

‐0.6

A

B

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 39

How do we partition a graph into k clusters?

Two basic approaches:

Recursive bi‐partitioning [Hagen et al., ’92]

Recursively apply bi‐partitioning algorithm in a

hierarchical divisive manner

Disadvantages: Inefficient, unstable

Cluster multiple p eigenvectors g [Shi‐Malik, [ , ’00] ]

Build a reduced space from multiple eigenvectors

Commonly used in recent papers

A preferable f bl approach… h

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 40

k‐eigenvector k eigenvector Algorithm [Ng et al., al ’01] 01]

Pre‐processing:

Construct the scaled adjacency matrix A': A :

1/

2 1/

2

A'

Decomposition:

D AD

Find the eigenvalues and eigenvectors of A'

Build embedded space from the eigenvectors

corresponding to the k largest eigenvalues

Grouping:

Apply k‐means to reduced nk space to get k clusters

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 41

Approximates pp the optimal p cut [ [Shi‐Malik, , ’00] ]

Can be used to approximate the optimal k‐way normalized cut

Emphasizes cohesive clusters

Increases the h unevenness in i the h distribution di ib i of f the h data d

Associations between similar points are amplified, associations

between dissimilar points are attenuated

Th The data dt bbegins i t to “ “approximate i t a clustering” l t i ”

Well‐separated space

Transforms data to a new “embedded embedded space”, space ,

consisting of k orthogonal basis vectors

NB: Multiple eigenvectors prevent instability due to

information f loss l

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 42

Eigengap:

The difference between two consecutive eigenvalues

Most stable clustering is generally given by the

value k that maximises eigengap:

p

Example:

Eigenvalue

50

45

40

35

30

25

20

15

10

5

0

λ 1

λ 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

k

k k k1

max

k

Choose

kk=22 11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 43

2

1

METIS:

Heuristic but works really well in practice

http://glaros.dtc.umn.edu/gkhome/views/metis

Graclus:

Based on kernel k‐means

http://www.cs.utexas.edu/users/dml/Software/graclus.html

Cluto:

http://glaros.dtc.umn.edu/gkhome/views/cluto/

p //g /g / / /

Clique percorlation method:

For finding overlapping clusters

http://angel.elte.hu/cfinder/

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 44

Non‐overlapping Non overlapping vs. vs overlapping communities

11/8/2010 Jure Leskovec, **Stanford** CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 45