
Fast subtree kernels on graphs - VideoLectures

Fast subtree kernels on graphs

Nino Shervashidze

joint work with Karsten Borgwardt

Machine Learning and Computational Biology Research Group

Max Planck Institute for Biological Cybernetics, Tübingen

Max Planck Institute for Developmental Biology, Tübingen

9 December 2009

N. Shervashidze, K. Borgwardt, Fast subtree kernels on graphs, NIPS


Introduction

◮ Kernels are inner products in some feature space $\mathcal{H}$:

$k(x, x') = \langle \phi(x), \phi(x') \rangle$.

◮ Intuitively, $k(x, x')$ is a measure of similarity of $x$ and $x'$.

◮ $x$ and $x'$ can be vectors, but also strings, trees, graphs.

◮ Kernels are used within kernel methods in

  ◮ classification (SVM),
  ◮ regression,
  ◮ feature selection,
  ◮ two-sample problems, etc.
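
As a small illustration of the definition above, the following sketch evaluates a kernel through an explicit feature map. The feature map `phi` (character counts of a string) and the function names are assumptions chosen for this example; they do not come from the talk.

```python
from collections import Counter


def phi(s: str) -> Counter:
    """Hypothetical feature map: count how often each character occurs in s."""
    return Counter(s)


def kernel(x: str, y: str) -> int:
    """k(x, y) = <phi(x), phi(y)>, an inner product of sparse count vectors."""
    fx, fy = phi(x), phi(y)
    # A Counter returns 0 for missing keys, so unmatched characters add nothing.
    return sum(count * fy[ch] for ch, count in fx.items())


print(kernel("graph", "graphs"))  # five shared characters -> 5
print(kernel("graph", "tree"))    # only 'r' is shared -> 1
```

The same pattern carries over to graphs: choose a feature map that counts substructures, then take the inner product of the count vectors.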




Why graph kernels?

For instance, they can be used in graph classification.

[Figure by Koji Tsuda]


Overview

Overview of graph kernels

◮ Graph kernels usually count matching subgraphs (Haussler, 1999):

  ◮ paths, walks, cycles, graphlets, etc.

◮ The all-subgraphs kernel is at least as hard to compute as isomorphism checking (Gärtner et al., 2003).

◮ Restricted classes of subgraphs: better runtime (and no isomorphism checking).

◮ But we still need graph kernels that

  ◮ can take into account node and edge labels,
  ◮ are efficient to compute even on large graphs.






Overview of graph kernels

[Figure: "Runtime for labeled graphs". x-axis: graph size (100 to 1000); y-axis ticks 1 to 10 (scale and units not recoverable). Curves: Subtree kernel (Ramon and Gaertner, 2003); Fast Random Walk (Vishwanathan et al., 2007); Shortest Path (Borgwardt and Kriegel, 2005); 3-Graphlet (Shervashidze et al., 2009); Weisfeiler-Lehman subtree kernel (this talk).]

100 graphs, subtree height 3, alphabet size 25, max. degree $n/2$, $n^2/2$ edges




Subtree kernels

◮ Informally, subtree kernels iteratively look at neighborhoods of nodes.

◮ Unfolding the structure over iterations, we get a tree-like pattern, called “subtree” or “tree-walk” in the literature.

[Figure: an example graph with numbered nodes and, next to it, the subtree of height 2 rooted at node 1.]
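
The unfolding described above can be sketched in a few lines. This is only an illustration: the adjacency lists below describe a hypothetical graph (not the one drawn on the slide), and the names `unfold` and `leaves` are made up for this example.

```python
def unfold(adj, root, height):
    """Return the subtree pattern of the given height rooted at `root`
    as a nested tuple (node, (child_subtrees...)).

    Nodes may repeat in the tree, because every neighbor is expanded again
    at each level (hence the alternative name "tree-walk").
    """
    if height == 0:
        return (root, ())
    children = tuple(unfold(adj, nb, height - 1) for nb in sorted(adj[root]))
    return (root, children)


def leaves(tree):
    """Collect the leaf nodes of an unfolded subtree, left to right."""
    node, children = tree
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


# Hypothetical example graph given as adjacency lists.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}

tree = unfold(adj, root=1, height=2)
print(tree)
print(leaves(tree))  # [1, 3, 1, 2, 4]
```

Note that the root itself reappears among the leaves: the pattern walks back along edges it has already used.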


The 1-dimensional Weisfeiler-Lehman algorithm (1968)

Given two graphs $G$ and $G'$:

[Figure: two graphs $G$ and $G'$ with six nodes each; every node carries the initial label 1.]

Are they non-isomorphic?

The 1-dimensional WL algorithm may answer this question.




The 1-dimensional Weisfeiler-Lehman algorithm: Iteration 1

Each iteration of the 1-dimensional WL test comprises the following steps:

1. Multiset-label determination and sorting: O(m) via bucket sort
2. Label compression: O(m) via radix sort
3. Relabeling: O(n)

Are the label sets of $G$ and $G'$ identical? Yes. Continue.

[Figure: $G$ and $G'$ during iteration 1. Sorted multiset-labels and their compressed labels: 1,11 → 2; 1,111 → 3; 1,1111 → 4.]
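
The three steps above can be sketched for a single graph as follows. This is a minimal sketch under stated assumptions: it uses Python's built-in comparison sort instead of bucket and radix sort (so it does not achieve the O(m) bound), the function name `wl_iteration` is made up, and the example graph is hypothetical, chosen only so that its node degrees are 2, 3 and 4 as in the compression table on the slide.

```python
def wl_iteration(adj, labels, compress):
    """Run one 1-dimensional WL relabeling step on a single graph.

    adj:      dict node -> list of neighbor nodes
    labels:   dict node -> current integer label
    compress: dict mapping a multiset-label (own label, sorted neighbor
              labels) to its compressed label; filled in as a side effect
    """
    # Step 1: multiset-label determination and sorting.
    multiset = {
        v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
        for v in adj
    }
    # Step 2: label compression. New labels are assigned in sorted order of
    # the multiset-labels, continuing after the largest label in use.
    next_label = max(list(labels.values()) + list(compress.values())) + 1
    for key in sorted(set(multiset.values())):
        if key not in compress:
            compress[key] = next_label
            next_label += 1
    # Step 3: relabeling.
    return {v: compress[multiset[v]] for v in adj}


# Hypothetical graph with degree sequence (4, 3, 3, 2, 2, 2).
adj = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c", "f"],
    "c": ["a", "b", "f"],
    "d": ["a", "e"],
    "e": ["a", "d"],
    "f": ["b", "c"],
}
labels = {v: 1 for v in adj}   # unlabeled graph: every node starts with label 1
compress = {}
new_labels = wl_iteration(adj, labels, compress)
print(compress)
print(new_labels)
```

With all initial labels equal to 1, the first iteration effectively relabels each node by its degree, which is why the multiset-labels 1,11, 1,111 and 1,1111 become 2, 3 and 4.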






The 1-dimensional Weisfeiler-Lehman algorithm: Iteration 2

1. Multiset-label determination and sorting: O(m) via bucket sort
2. Label compression: O(m) via radix sort
3. Relabeling: O(n)

Are the label sets of $G$ and $G'$ identical? No. Output YES (the graphs are non-isomorphic).

Overall complexity: O(hm) for h iterations

[Figure: $G$ and $G'$ during iteration 2. Sorted multiset-labels and their compressed labels: 2,24 → 5; 2,33 → 6; 2,34 → 7; 3,224 → 8; 3,234 → 9; 4,2233 → 10.]
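
Putting the iterations together, the whole test might look like the sketch below. Several points are assumptions: the label collections are compared as multisets (via `Counter`), a comparison sort replaces bucket and radix sort (so the O(hm) bound is not reproduced), and the two example trees are hypothetical graphs chosen because, like the graphs on the slides, they agree after iteration 1 and differ after iteration 2.

```python
from collections import Counter


def wl_test(adj1, adj2, labels1, labels2, max_iter=None):
    """Return True if 1-dimensional WL shows the two graphs are non-isomorphic.

    False means the test is inconclusive: the label multisets stayed
    identical in every iteration.
    """
    if max_iter is None:
        max_iter = len(adj1)  # stop after at most n iterations
    if Counter(labels1.values()) != Counter(labels2.values()):
        return True
    for _ in range(max_iter):
        # Step 1: sorted multiset-labels for the nodes of both graphs.
        ms1 = {v: (labels1[v], tuple(sorted(labels1[u] for u in adj1[v])))
               for v in adj1}
        ms2 = {v: (labels2[v], tuple(sorted(labels2[u] for u in adj2[v])))
               for v in adj2}
        # Step 2: one compression table shared by both graphs.
        keys = sorted(set(ms1.values()) | set(ms2.values()))
        start = max(max(labels1.values()), max(labels2.values())) + 1
        compress = {key: start + i for i, key in enumerate(keys)}
        # Step 3: relabel both graphs.
        labels1 = {v: compress[ms1[v]] for v in adj1}
        labels2 = {v: compress[ms2[v]] for v in adj2}
        if Counter(labels1.values()) != Counter(labels2.values()):
            return True  # label multisets differ: not isomorphic
    return False


# Hypothetical example: two trees with the same degree sequence (3,2,2,1,1,1).
g1 = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0, 5], 4: [2], 5: [3]}
g2 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
ones = {v: 1 for v in range(6)}

print(wl_test(g1, g2, ones, ones))  # True: the labels differ after iteration 2
print(wl_test(g1, g1, ones, ones))  # False: a graph is never separated from itself
```

A `False` result does not prove isomorphism; it only means the relabeling never separated the two graphs.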






The Weisfeiler-Lehman kernel on graphs

Differences between test and kernel

WL kernel vs isomorphism test

The test

◮ checks the sets of node labels of two graphs for identity after each iteration
◮ stops when the sets become different or when the number of iterations reaches n
◮ is computed in O(hm)

The kernel

◮ counts matching pairs of labels in two graphs after each iteration
◮ takes the number of iterations h as a parameter of the algorithm (in practice, h of 2 or 3 gives the best results)
◮ is computed in O(hm)
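
The kernel side of this comparison can be sketched as below. The sketch assumes that matches between the original labels (iteration 0) are included in the sum, and it uses a comparison sort rather than bucket and radix sort. The function name `wl_subtree_kernel` and the two example trees are hypothetical.

```python
from collections import Counter


def wl_subtree_kernel(adj1, labels1, adj2, labels2, h):
    """Count matching pairs of node labels in two graphs over h WL iterations."""

    def matches(l1, l2):
        # Number of node pairs (one node per graph) that carry the same label.
        c1, c2 = Counter(l1.values()), Counter(l2.values())
        return sum(n * c2[label] for label, n in c1.items())

    value = matches(labels1, labels2)  # iteration 0: original labels
    for _ in range(h):
        ms1 = {v: (labels1[v], tuple(sorted(labels1[u] for u in adj1[v])))
               for v in adj1}
        ms2 = {v: (labels2[v], tuple(sorted(labels2[u] for u in adj2[v])))
               for v in adj2}
        # Shared compression table, so equal multiset-labels get equal labels.
        keys = sorted(set(ms1.values()) | set(ms2.values()))
        start = max(max(labels1.values()), max(labels2.values())) + 1
        compress = {key: start + i for i, key in enumerate(keys)}
        labels1 = {v: compress[ms1[v]] for v in adj1}
        labels2 = {v: compress[ms2[v]] for v in adj2}
        value += matches(labels1, labels2)
    return value


# Hypothetical example: two unlabeled trees on six nodes.
g1 = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0, 5], 4: [2], 5: [3]}
g2 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
ones = {v: 1 for v in range(6)}

for h in range(3):
    print(h, wl_subtree_kernel(g1, ones, g2, ones, h))
```

Unlike the test, the loop never stops early: it always runs for the chosen number of iterations h and accumulates the match counts.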


The Weisfeiler-Lehman kernel on graphs
Definitions

The Weisfeiler-Lehman kernel on a pair of graphs: Initialization

[Figure: two node-labeled graphs G and G′, each with six nodes carrying labels from 1 to 5]

Initial feature vector representations of G and G′ (entries count the nodes with labels 1, 2, 3, 4, 5):

$\phi_0(G) = (2, 1, 1, 1, 1)$
$\phi_0(G') = (1, 2, 1, 1, 1)$
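As a small illustration of the initialization, the sketch below counts node labels and takes the inner product of the two count vectors. The label lists are read off the feature vectors on this slide; the node order and the helper name `feature_vector` are assumptions made for the example.

```python
from collections import Counter

# Node labels consistent with the feature vectors on the slide.
# The order of the nodes is arbitrary and does not affect the counts.
labels_g = [1, 1, 2, 3, 4, 5]
labels_g_prime = [1, 2, 2, 3, 4, 5]
alphabet = [1, 2, 3, 4, 5]


def feature_vector(labels, alphabet):
    """Count how many nodes carry each label of the alphabet."""
    counts = Counter(labels)
    return [counts[a] for a in alphabet]


phi0_g = feature_vector(labels_g, alphabet)              # [2, 1, 1, 1, 1]
phi0_g_prime = feature_vector(labels_g_prime, alphabet)  # [1, 2, 1, 1, 1]

# Kernel value before any relabeling: inner product of the count vectors.
k0 = sum(a * b for a, b in zip(phi0_g, phi0_g_prime))    # 7
```

With h = 0 the kernel therefore reduces to comparing label histograms, and the graph structure only enters through the later iterations.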



The Weisfeiler-Lehman kernel on graphs
Definitions

The Weisfeiler-Lehman kernel on a pair of graphs: Iteration 1

1. Multiset-label determination and sorting: O(m) via bucket sort
2. Label compression: O(m) via radix sort
3. Relabeling: O(n)

[Figure: G and G′ after each of the three steps]

Multiset-labels in G: 1,4 (twice); 2,35; 3,245; 4,1135; 5,234
Multiset-labels in G′: 1,4; 2,3; 2,45; 3,245; 4,1235; 5,234

Label compression:

1,4 → 6
2,3 → 7
2,35 → 8
2,45 → 9
3,245 → 10
4,1135 → 11
4,1235 → 12
5,234 → 13

Labels after relabeling: 6, 6, 8, 10, 11, 13 in G and 6, 7, 9, 10, 12, 13 in G′.

Update feature vector representations of G and G′ (entries 1 to 5: initialization, entries 6 to 13: 1st iteration):

$\phi_1(G) = (2, 1, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 1)$
$\phi_1(G') = (1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1)$

$k^{(1)}_{WL}(G, G') = 11.$
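The three steps of this iteration can be sketched as follows. The edge lists are a hypothetical reconstruction from the multiset-labels shown on the slide (node ids 0 to 5 are made up), the function names are illustrative, and a plain comparison sort stands in for the bucket and radix sorts named on the slide, so the sketch does not reproduce the stated O(m) bounds.

```python
from collections import Counter

# Hypothetical reconstruction of the example graphs from the multiset-labels
# on the slide. Node ids 0..5 are made up; only labels and edges matter.
LABELS_G = {0: 1, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
EDGES_G = [(0, 4), (1, 4), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]
LABELS_G2 = {0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5}
EDGES_G2 = [(0, 4), (1, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]


def adjacency(labels, edges):
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def multiset_labels(labels, adj):
    # Step 1: own label followed by the sorted labels of the neighbours.
    return {v: (labels[v], tuple(sorted(labels[u] for u in adj[v]))) for v in labels}


def wl_iteration(label_maps, adjs, next_label):
    strings = [multiset_labels(lab, adj) for lab, adj in zip(label_maps, adjs)]
    # Step 2: every distinct string gets a fresh integer label.
    table = {}
    for s in sorted({s for per_graph in strings for s in per_graph.values()}):
        table[s] = next_label
        next_label += 1
    # Step 3: relabel the nodes of both graphs.
    relabeled = [{v: table[s] for v, s in per_graph.items()} for per_graph in strings]
    return relabeled, table, next_label


def wl_kernel(labels_a, adj_a, labels_b, adj_b, h):
    label_maps, adjs = [labels_a, labels_b], [adj_a, adj_b]
    next_label = 1 + max(max(lab.values()) for lab in label_maps)
    value = 0
    for i in range(h + 1):
        count_a = Counter(label_maps[0].values())
        count_b = Counter(label_maps[1].values())
        # Number of matching label pairs in iteration i.
        value += sum(c * count_b[label] for label, c in count_a.items())
        if i < h:
            label_maps, _, next_label = wl_iteration(label_maps, adjs, next_label)
    return value


ADJ_G = adjacency(LABELS_G, EDGES_G)
ADJ_G2 = adjacency(LABELS_G2, EDGES_G2)
_, table, _ = wl_iteration([LABELS_G, LABELS_G2], [ADJ_G, ADJ_G2], 6)
k1 = wl_kernel(LABELS_G, ADJ_G, LABELS_G2, ADJ_G2, h=1)
print(table[(1, (4,))], table[(5, (2, 3, 4))], k1)  # prints: 6 13 11
```

On the reconstructed graphs the compression table assigns the labels 6 to 13 in the same order as the slide, and the kernel value after one iteration is 11.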



The Weisfeiler-Lehman kernel on graphs
Definitions

The Weisfeiler-Lehman kernel on a pair of graphs more formally

The Weisfeiler-Lehman kernel on two graphs G and G′ is defined as:

$k^{(h)}_{WL}(G, G') = \left| \{ (s_i(v), s_i(v')) \mid f(s_i(v)) = f(s_i(v')),\ i \in \{0, \ldots, h\},\ v \in V,\ v' \in V' \} \right|,$

where

◮ $s_i(v)$ is the sorted multiset-label of node v in iteration i,
◮ f is an injective label compression function,
◮ the sets $\{f(s_i(v)) \mid v \in V \cup V'\}$ and $\{f(s_j(v)) \mid v \in V \cup V'\}$ are disjoint for all $i \neq j$,
◮ $s_0(v)$ is the original label of v in case of labeled graphs and 0 in case of unlabeled graphs,
◮ and $f(s_0(v)) = s_0(v)$.
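The definition can be connected to the feature vectors of the worked example with a short calculation. The count notation $c_i(G, \sigma)$ is introduced here for illustration and does not appear on the slides; the calculation also assumes that matches are counted once per node pair $(v, v')$.

```latex
% c_i(G, \sigma): number of nodes of G whose compressed label in
% iteration i equals \sigma (notation introduced for this note).
\[
c_i(G, \sigma) = \bigl|\{\, v \in V : f(s_i(v)) = \sigma \,\}\bigr|
\]
% Matching pairs per iteration are products of counts; because labels of
% different iterations are disjoint, the sum is an inner product.
\[
k^{(h)}_{WL}(G, G')
  = \sum_{i=0}^{h} \sum_{\sigma} c_i(G, \sigma)\, c_i(G', \sigma)
  = \langle \phi_h(G), \phi_h(G') \rangle
\]
% Check for h = 1 with the feature vectors of the worked example:
\[
k^{(1)}_{WL}(G, G')
  = \underbrace{2 \cdot 1 + 1 \cdot 2 + 1 + 1 + 1}_{i = 0:\ 7}
  + \underbrace{2 \cdot 1 + 1 \cdot 1 + 1 \cdot 1}_{i = 1:\ 4}
  = 11
\]
```

Read this way, the kernel is an inner product of label-count vectors, which is what makes it a valid kernel in the sense of the introduction.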



The Weisfeiler-Lehman kernel on graphs
Definitions

The Weisfeiler-Lehman kernel on N graphs

◮ Naive computation of our kernel on N graphs is $O(N^2 h m)$.
◮ Instead, perform the following steps for all graphs in each iteration:
  1. Multiset-label determination and sorting
  2. Label compression via hashing
  3. Relabeling
◮ WL kernel for all pairs can be computed in $O(N h m + N^2 h n)$.
◮ In practice the first term dominates the runtime.
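A minimal sketch of the hashing variant is given below, assuming graphs are passed as (labels, adjacency) pairs. The names `wl_gram_matrix` and `make_graph` are made up for this example, a Python dictionary plays the role of the hash table, and the two test graphs are a hypothetical reconstruction of the example pair from this talk.

```python
from collections import Counter


def wl_gram_matrix(graphs, h):
    """graphs: list of (labels, adj); labels maps node -> int label,
    adj maps node -> list of neighbours. Returns the N x N kernel matrix."""
    n_graphs = len(graphs)
    labels = [dict(lab) for lab, _ in graphs]
    gram = [[0] * n_graphs for _ in range(n_graphs)]
    for i in range(h + 1):
        # Pairwise part: compare the label histograms of this iteration.
        hists = [Counter(lab.values()) for lab in labels]
        for a in range(n_graphs):
            for b in range(a, n_graphs):
                match = sum(c * hists[b][l] for l, c in hists[a].items())
                gram[a][b] += match
                if a != b:
                    gram[b][a] += match
        if i == h:
            break
        # Per-graph part: one dictionary shared by all graphs compresses the
        # multiset-labels, so equal strings get equal labels everywhere.
        table = {}
        new_labels = []
        for lab, (_, adj) in zip(labels, graphs):
            relabeled = {}
            for v in lab:
                key = (lab[v], tuple(sorted(lab[u] for u in adj[v])))
                relabeled[v] = table.setdefault(key, len(table))
            new_labels.append(relabeled)
        labels = new_labels
    return gram


def make_graph(labels, edges):
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return labels, adj


# Hypothetical reconstruction of the two example graphs of this talk.
g = make_graph({0: 1, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
               [(0, 4), (1, 4), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)])
g2 = make_graph({0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5},
                [(0, 4), (1, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)])
gram = wl_gram_matrix([g, g2], h=1)  # [[16, 11], [11, 14]]
```

Each graph is relabeled once per iteration instead of once per pair, which is where the Nhm term comes from; only the histogram comparison remains quadratic in N. The sketch reuses integer ids across iterations, which is harmless here because histograms are only compared within the same iteration.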



The Weisfeiler-Lehman kernel on graphs
Runtime behaviour on synthetic graphs

Runtime comparison of naive and hashing approaches

[Figure: runtime in seconds (log scale, 10^0 to 10^5) against the number of graphs N (log scale, 10^1 to 10^3), for the naive approach and the approach with hashing]



Experimental evaluation
Setup

Datasets

◮ MUTAG - mutagenic/non-mutagenic nitro compounds for Salmonella typhimurium
◮ NCI1 and NCI109 - active/inactive compounds in an anti-cancer screen
◮ D & D - enzymes/non-enzymes

Dataset             MUTAG    NCI1     NCI109   D & D
Maximum # nodes     28       111      111      5748
Average # nodes     17.93    29.87    29.68    284.32
# labels            7        37       54       89
Number of graphs    188      4110     4127     1178



Experimental evaluation
Setup

Comparison partners

Subtree kernel (Ramon and Gaertner, 2003)
Fast Random Walk (Vishwanathan et al., 2007)
Shortest Path (Borgwardt and Kriegel, 2005)
3-Graphlet (Shervashidze et al., 2009)
Weisfeiler-Lehman subtree kernel (this talk)

[Figure: runtime for labeled graphs against graph size (100 to 1000)]



Experimental evaluation
Results

Runtime and accuracy

[Figure: runtime (scale from 10 sec to 1000 days; values marked * are extrapolated) and accuracy (scale from 50 % to 85 %) of the WL, RG, 3 Graphlet, RW and SP kernels on MUTAG, NCI1, NCI109 and D & D]



Conclusion

Conclusion and outlook

◮ We have defined a subtree kernel on graphs that is able to deal with node and edge labels. Its computation time is O(Nhm):
  ◮ linear in the number of graphs N,
  ◮ linear in subtree height h,
  ◮ linear in the number of edges in each graph, m.
◮ Inexact matching of the subtrees?
◮ Continuous or high-dimensional node labels?



Conclusion

Acknowledgements

We would like to thank Kurt Mehlhorn, Pascal Schweitzer, and Erik Jan van Leeuwen for fruitful discussions.
