Fast subtree kernels on graphs - VideoLectures

Fast subtree kernels on graphs

Nino Shervashidze 

joint work with Karsten Borgwardt 

Machine Learning and Computational Biology Research Group 

Max Planck Institute for Biological Cybernetics, Tübingen 

Max Planck Institute for Developmental Biology, Tübingen 

9 December 2009 

N. Shervashidze, K. Borgwardt, Fast subtree kernels on graphs, NIPS

Introduction 


◮ Kernels are inner products in some feature space H:

$k(x, x') = \langle \phi(x), \phi(x') \rangle$.

◮ Intuitively, $k(x, x')$ is a measure of similarity of $x$ and $x'$.

◮ $x$ and $x'$ can be vectors, but also strings, trees, graphs.

◮ Kernels are used within kernel methods in 

◮ classification (SVM), 

◮ regression, 

◮ feature selection, 

◮ two-sample problems, etc. 
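
To make the inner-product view concrete, here is a minimal sketch using the homogeneous quadratic kernel on 2-dimensional vectors. This kernel is not part of the talk; it is chosen only because its feature map can be written out explicitly. The names `phi`, `dot` and `k_quadratic` are illustrative.

```python
import math

def phi(x):
    """Explicit feature map of the quadratic kernel on R^2 (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def k_quadratic(x, y):
    """Kernel evaluated directly in input space: k(x, y) = <x, y>^2."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = k_quadratic(x, y)       # kernel value computed without the feature map
rhs = dot(phi(x), phi(y))     # inner product in the feature space
print(lhs, rhs)               # both are 1.0, up to floating-point rounding
```

The same idea carries over to graphs: a graph kernel only has to correspond to an inner product of some feature vectors, which need not be computed explicitly.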















Why graph kernels?

For instance, they can be used in graph classification. 

figure by Koji Tsuda 



Overview 

Overview of graph kernels

◮ Graph kernels usually count matching subgraphs (Haussler, 1999)

◮ paths, walks, cycles, graphlets, etc.

◮ The all-subgraphs kernel is at least as hard to compute as isomorphism checking (Gärtner et al., 2003)

◮ Restricted classes of subgraphs: better runtime (and no isomorphism checking)

◮ But we still need graph kernels that

◮ can take into account node and edge labels

◮ are efficient to compute even on large graphs



[Figure: runtime for labeled graphs (axis ticks 1 to 10) versus graph size (100 to 1000 nodes), built up one curve per slide: Subtree kernel (Ramon and Gaertner, 2003), Fast Random Walk (Vishwanathan et al., 2007), Shortest Path (Borgwardt and Kriegel, 2005), 3-Graphlet (Shervashidze et al., 2009), and Weisfeiler-Lehman subtree kernel (this talk). Setup: 100 graphs, subtree height 3, alphabet size 25, max. degree n/2, n^2/2 edges.]




Subtree kernels

◮ Informally, subtree kernels iteratively look at neighborhoods of nodes.

◮ Unfolding the structure over iterations, we get a tree-like pattern, called “subtree” or “tree-walk” in the literature.

[Figure: an example graph with nodes 1 to 6 and the tree pattern obtained by unfolding the neighborhood of node 1 over two iterations.]

Subtree of height 2 rooted at node 1




The 1-dimensional Weisfeiler-Lehman algorithm (1968) 

Given two graphs G and G′

[Figure: two graphs with six nodes each; every node carries the initial label 1.]

Are they non-isomorphic?

The 1-dimensional WL algorithm may answer this question.




The 1-dimensional Weisfeiler-Lehman algorithm: Iteration 1

Each iteration of the 1-dimensional WL test comprises the following steps:

1. Multiset-label determination and sorting: O(m) via bucket sort

2. Label compression: O(m) via radix sort

3. Relabeling: O(n)

[Figure: every node of G and G′ receives a multiset-label made of its own label followed by the sorted labels of its neighbors (1,11; 1,111; or 1,1111). Label compression maps 1,11 → 2, 1,111 → 3 and 1,1111 → 4, and the nodes are relabeled accordingly.]

Are the label sets of G and G′ identical? Yes. Continue.

Iteration 2

[Figure: the sorted multiset-labels are now 2,24; 2,33; 2,34; 3,224; 3,234 and 4,2233. Label compression maps 2,24 → 5, 2,33 → 6, 2,34 → 7, 3,224 → 8, 3,234 → 9 and 4,2233 → 10, and the nodes are relabeled accordingly.]

Are the label sets of G and G′ identical? No. Output YES (the graphs are non-isomorphic).

Overall complexity: O(hm) for h iterations
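
The three steps of an iteration can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation behind the talk: graphs are given as adjacency dictionaries, a Python `dict` stands in for the bucket-sort and radix-sort based compression named on the slide, and the label collections are compared as multisets. The function name and the four toy graphs are hypothetical.

```python
from collections import Counter

def wl_refutes_isomorphism(adj_g, adj_h):
    """Return True if 1-dimensional WL shows the graphs are non-isomorphic,
    False if the test is still inconclusive after n iterations."""
    if len(adj_g) != len(adj_h):
        return True
    labels_g = {v: 1 for v in adj_g}   # unlabeled graphs: every node starts with label 1
    labels_h = {v: 1 for v in adj_h}
    for _ in range(len(adj_g)):        # at most n iterations
        table = {}                     # compression table shared by both graphs

        def refine(adj, labels):
            new = {}
            for v in adj:
                # Step 1: own label followed by the sorted neighbor labels
                key = (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                # Steps 2 and 3: compress the multiset-label and relabel the node
                new[v] = table.setdefault(key, len(table))
            return new

        labels_g, labels_h = refine(adj_g, labels_g), refine(adj_h, labels_h)
        if Counter(labels_g.values()) != Counter(labels_h.values()):
            return True
    return False

# Hypothetical toy graphs (not the ones on the slides)
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl_refutes_isomorphism(path4, star4))           # True
print(wl_refutes_isomorphism(cycle6, two_triangles))  # False
```

The second call illustrates why the algorithm only "may" answer the question: the 6-cycle and the two disjoint triangles are not isomorphic, but every node has degree 2, so the labels never diverge.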




The Weisfeiler-Lehman kernel on graphs 

Differences between test and kernel

WL kernel vs isomorphism test

The test

◮ checks sets of node labels of two graphs for identity after each iteration

◮ stops when the sets become different or when the number of iterations reaches n

◮ is computed in O(hm)

The kernel

◮ counts matching pairs of labels in two graphs after each iteration

◮ the number of iterations h is a parameter of the algorithm (in practice, h of 2 or 3 gives the best results)




Definitions 

The Weisfeiler-Lehman kernel on a pair of graphs: Initialization 

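[Figure: two labeled graphs G and G′ with six nodes each. G has node labels 1, 1, 2, 3, 4, 5; G′ has node labels 1, 2, 2, 3, 4, 5.]

Initial feature vector representations of G and G′ (one count per label 1, 2, 3, 4, 5):

$\phi_0(G) = (2, 1, 1, 1, 1)$

$\phi_0(G') = (1, 2, 1, 1, 1)$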




The Weisfeiler-Lehman kernel on a pair of graphs: Iteration 1

[Figure: the sorted multiset-labels of G are 1,4; 1,4; 2,35; 3,245; 4,1135; 5,234 and those of G′ are 1,4; 2,3; 2,45; 3,245; 4,1235; 5,234. Label compression maps 1,4 → 6, 2,3 → 7, 2,35 → 8, 2,45 → 9, 3,245 → 10, 4,1135 → 11, 4,1235 → 12 and 5,234 → 13, and the nodes are relabeled accordingly.]

Update feature vector representations of G and G′ (counts for the initial labels 1 to 5, followed by counts for the first-iteration labels 6 to 13):

$\phi_1(G) = (2, 1, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 1)$

$\phi_1(G') = (1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1)$

$k_{WL}^{(1)}(G, G') = \langle \phi_1(G), \phi_1(G') \rangle = 11$.
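
The numbers of this iteration can be reproduced with a short script. It is a sketch under one explicit assumption: the edge lists below are not given in the slide text, they are reconstructed from the multiset-labels 1,4; 2,3; 2,35; 2,45; 3,245; 4,1135; 4,1235; 5,234. The node names `a` to `f` and all helper names are hypothetical.

```python
from collections import Counter

# Node labels of G and G' (from the initial feature vectors) and edge lists
# reconstructed from the multiset-labels on the slide (an assumption).
labels_g = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5}
edges_g = [("a", "e"), ("b", "e"), ("c", "d"), ("c", "f"),
           ("d", "e"), ("d", "f"), ("e", "f")]
labels_g2 = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 4, "f": 5}
edges_g2 = [("a", "e"), ("b", "d"), ("c", "e"), ("c", "f"),
            ("d", "e"), ("d", "f"), ("e", "f")]

def multiset_labels(labels, edges):
    """Map each node to (own label, sorted tuple of neighbor labels)."""
    nbrs = {v: [] for v in labels}
    for u, v in edges:
        nbrs[u].append(labels[v])
        nbrs[v].append(labels[u])
    return {v: (labels[v], tuple(sorted(nbrs[v]))) for v in labels}

ms_g = multiset_labels(labels_g, edges_g)
ms_g2 = multiset_labels(labels_g2, edges_g2)

# Label compression: distinct multiset-labels in sorted order get new labels 6, 7, ...
distinct = sorted(set(ms_g.values()) | set(ms_g2.values()))
compress = {s: 6 + i for i, s in enumerate(distinct)}

def phi(labels, ms):
    """Counts of the original labels 1..5 followed by the compressed labels 6..13."""
    counts = Counter(labels.values()) + Counter(compress[s] for s in ms.values())
    return [counts[label] for label in range(1, 14)]

phi_g, phi_g2 = phi(labels_g, ms_g), phi(labels_g2, ms_g2)
k1 = sum(a * b for a, b in zip(phi_g, phi_g2))
print(phi_g)   # [2, 1, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 1]
print(phi_g2)  # [1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
print(k1)      # 11
```

Compressing the multiset-labels in sorted order reproduces the numbering 6 to 13 from the slide; any other injective numbering would give the same kernel value.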




The Weisfeiler-Lehman kernel on a pair of graphs more formally 

The Weisfeiler-Lehman kernel on two graphs G and G′ is defined as:

$$k_{WL}^{(h)}(G, G') = \left| \{ (s_i(v), s_i(v')) \mid f(s_i(v)) = f(s_i(v')),\ i \in \{0, \ldots, h\},\ v \in V,\ v' \in V' \} \right|,$$

where

◮ $s_i(v)$ is the sorted multiset-label of node $v$ in iteration $i$,

◮ $f$ is an injective label compression function,

◮ the sets $\{f(s_i(v)) \mid v \in V \cup V'\}$ and $\{f(s_j(v)) \mid v \in V \cup V'\}$ are disjoint for all $i \neq j$,

◮ $s_0(v)$ is the original label of $v$ in case of labeled graphs and 0 in case of unlabeled graphs,

◮ and $f(s_0(v)) = s_0(v)$.
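
The link between this definition and the feature vectors used in the worked example can be written out as follows. This is a sketch that reads the set as ranging over node pairs $(v, v')$, so that each matching pair of nodes is counted once. The symbols $\Sigma_i$ (the compressed labels occurring in iteration $i$) and $c_i(G, \sigma)$ (the number of nodes of $G$ with compressed label $\sigma$ in iteration $i$) are introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
k_{WL}^{(h)}(G, G')
  &= \sum_{i=0}^{h} \sum_{v \in V} \sum_{v' \in V'}
     \mathbf{1}\left[ f(s_i(v)) = f(s_i(v')) \right] \\
  &= \sum_{i=0}^{h} \sum_{\sigma \in \Sigma_i} c_i(G, \sigma)\, c_i(G', \sigma) \\
  &= \langle \phi_h(G), \phi_h(G') \rangle
\end{align*}
```

Here $\phi_h(G)$ stacks the counts $c_i(G, \sigma)$ for $i = 0, \ldots, h$. Because the label sets of different iterations are disjoint, each coordinate belongs to exactly one iteration, and counting matching node pairs reduces to an inner product of count vectors.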




The Weisfeiler-Lehman kernel on N graphs 

◮ Naive computation of our kernel on N graphs is $O(N^2 hm)$.

◮ Instead, perform the following steps for all graphs in each iteration:

1. Multiset-label determination and sorting

2. Label compression via hashing

3. Relabeling

◮ WL kernel for all pairs can be computed in $O(Nhm + N^2 hn)$.

◮ In practice the first term dominates the runtime.
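
A minimal sketch of the all-graphs computation, assuming graphs are passed as `(labels, adjacency)` pairs and using a Python `dict` as the hash table for label compression. The function name `wl_kernel_matrix` and the toy graphs are hypothetical, and the code is not the implementation used for the experiments.

```python
from collections import Counter

def wl_kernel_matrix(graphs, h):
    """graphs: list of (labels, adj) pairs, where labels maps node -> initial label
    and adj maps node -> list of neighbors. Returns the N x N kernel matrix."""
    current = [dict(labels) for labels, _ in graphs]
    # Iteration 0: count the original labels; keys are (iteration, label) pairs
    feats = [Counter((0, label) for label in lab.values()) for lab in current]
    for i in range(1, h + 1):
        table = {}  # one compression table shared by all N graphs
        refined = []
        for lab, (_, adj) in zip(current, graphs):
            new = {}
            for v, nbrs in adj.items():
                # 1. multiset-label determination and sorting
                key = (lab[v], tuple(sorted(lab[u] for u in nbrs)))
                # 2. label compression via hashing, 3. relabeling
                new[v] = table.setdefault(key, len(table))
            refined.append(new)
        current = refined
        for f, lab in zip(feats, current):
            f.update((i, label) for label in lab.values())
    n = len(graphs)
    # Pairwise inner products of the sparse count vectors
    return [[sum(c * feats[b][key] for key, c in feats[a].items())
             for b in range(n)] for a in range(n)]

# Hypothetical toy input: two paths on three nodes and one triangle, all unlabeled
path3 = ({0: 0, 1: 0, 2: 0}, {0: [1], 1: [0, 2], 2: [1]})
triangle = ({0: 0, 1: 0, 2: 0}, {0: [1, 2], 1: [0, 2], 2: [0, 1]})
K = wl_kernel_matrix([path3, triangle, path3], h=1)
print(K)  # [[14, 12, 14], [12, 18, 12], [14, 12, 14]]
```

The relabeling loop visits every graph once per iteration, which corresponds to the first term of the bound; the final double loop over pairs of sparse count vectors corresponds to the second term.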




Runtime behaviour on synthetic graphs 

Runtime comparison of naive and hashing approaches 

[Figure: log-log plot of runtime in seconds (10^0 to 10^5) versus number of graphs N (10^1 to 10^3) for the naive computation and the computation with hashing.]

Datasets

Experimental evaluation 

◮ MUTAG: mutagenic/non-mutagenic nitro compounds for Salmonella typhimurium

◮ NCI1 and NCI109: active/inactive compounds in an anti-cancer screen

◮ D & D: enzymes/non-enzymes

Setup 

Dataset             MUTAG    NCI1     NCI109   D & D
Maximum # nodes     28       111      111      5748
Average # nodes     17.93    29.87    29.68    284.32
# labels            7        37       54       89
Number of graphs    188      4110     4127     1178



Comparison partners

[Figure: runtime versus graph size (100 to 1000 nodes) for the Weisfeiler-Lehman subtree kernel (this talk).]



Results 

Runtime and accuracy 

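[Figure: runtime (log scale from 10 sec to 1000 days; values marked * are extrapolated) and classification accuracy (50 % to 85 %) on MUTAG, NCI1, NCI109 and D&D for the kernels WL (Weisfeiler-Lehman), RG (Ramon and Gaertner), 3 Graphlet, RW (random walk) and SP (shortest path).]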


Conclusion 

Conclusion and outlook 

◮ We have defined a subtree kernel on graphs that is able to deal with node and edge labels. Its computation time is O(Nhm):

◮ linear in the number of graphs N,

◮ linear in subtree height h,

◮ linear in the number of edges in each graph, m.

◮ Inexact matching of the subtrees?

◮ Continuous or high-dimensional node labels?























Acknowledgements 

We would like to thank Kurt Mehlhorn, Pascal Schweitzer, and Erik Jan van Leeuwen for fruitful discussions.
