Online Learning over Graphs

Mark Herbster, Massimiliano Pontil, Lisa Wainer
Computer Science Department, University College London

ICML '05


Inspiration – 1

[Figure, left: a yeast protein interaction network; right: Internet hosts]


Inspiration – 2

[Figure, left: "Digits '3' and '8' graph using 3 nearest neighbours" (USPS digits 3 and 8); right: random graph $G^2_{k\text{-out}}(26; 2)$]


Online Learning Model

Aim: learn a function $g : V \to \{-1, +1\}$ corresponding to a labelling of a graph $G = (V, E)$ with $V = \{1, \ldots, n\}$.

Learning proceeds in trials. For $t = 1, \ldots, \ell$:
1. Nature selects $v_t \in V$
2. Learner predicts $\hat{y}_t \in \{-1, +1\}$
3. Nature selects $y_t \in \{-1, +1\}$
4. If $\hat{y}_t \neq y_t$ then mistakes := mistakes + 1

Learner's goal: minimise mistakes.
Bound: mistakes $\leq f(\mathrm{complexity}(g))$.
What is a natural complexity measure for a graph labelling?
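The trial loop is simple enough to state directly in code. The following Python sketch assumes hypothetical `predict(v)` and `update(v, y)` callables supplied by a concrete learner; neither name comes from the slides.

```python
# A minimal sketch of the trial-by-trial protocol above.

def run_protocol(trials, predict, update):
    """trials: iterable of (vertex, label) pairs with labels in {-1, +1}."""
    mistakes = 0
    for v_t, y_t in trials:       # 1. Nature selects v_t ... 3. then reveals y_t
        y_hat = predict(v_t)      # 2. learner predicts in {-1, +1}
        if y_hat != y_t:          # 4. count the mistake
            mistakes += 1
        update(v_t, y_t)          # learner may exploit the revealed label
    return mistakes
```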


RKHS on a Graph

Graph Laplacian $L := D - A$, where $A$ is the adjacency matrix and $D := \mathrm{diag}(d_1, \ldots, d_n)$ is the degree matrix.

Define $S_n := \{g : \sum_{i=1}^{n} g_i = 0\}$; assume $G$ is connected.

Define the inner product and norm
$$\langle f, g \rangle := f^\top L g, \quad f, g \in S_n, \qquad \|g\|^2 = \sum_{(i,j) \in E(G)} (g_i - g_j)^2.$$

Graph kernel: $K_G = L^+$ (pseudoinverse).

Reproducing property: $g_i = e_i^\top L^+ L g = K_i^\top L g = \langle K_i, g \rangle$.
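A minimal numerical sketch of these definitions, assuming NumPy; the 3-vertex path graph and the labelling `g` below are illustrative choices of ours, not examples from the slides.

```python
import numpy as np

def graph_kernel(A):
    """Return the Laplacian L = D - A and the graph kernel K_G = L^+ (pseudoinverse)."""
    D = np.diag(A.sum(axis=1))       # degree matrix
    L = D - A                        # combinatorial Laplacian
    return L, np.linalg.pinv(L)      # Moore-Penrose pseudoinverse

# Illustrative check of the norm formula on a 3-vertex path:
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L, K = graph_kernel(A)
g = np.array([1., 1., -1.])          # one edge crosses the label boundary
print(g @ L @ g)                     # -> 4.0, i.e. the sum of (g_i - g_j)^2 over edges
```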


Example

[Figure: two labellings of a graph, with $\|g\|^2 = 3 \times 4$ and $\|g\|^2 = 12 \times 4$]
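The factors of 4 follow from the norm formula on the previous slide: for a $\pm 1$ labelling only the cut edges (edges joining oppositely labelled vertices) contribute, and each contributes $(1 - (-1))^2 = 4$, so

$$\|g\|^2 \;=\; \sum_{(i,j) \in E(G)} (g_i - g_j)^2 \;=\; 4 \, \bigl|\{(i,j) \in E(G) : g_i \neq g_j\}\bigr|,$$

which suggests the two labellings in the figure correspond to cuts of 3 and 12 edges respectively.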


Projection Algorithms

Perceptron-inspired (but non-conservative).

Projection: $P(N; w) := \arg\min_{u \in N} \|u - w\|$.

Prototypical algorithm:
  Input: a sequence of closed convex sets $\{U_t\}_{t=1}^{\ell} \subset \mathcal{H}$
  Initialisation: $g_1 = 0$
  For $t = 1, \ldots, \ell$: $g_{t+1} = P(U_t; g_t)$

Three variants:
1. 1-Proj: $U_t = \{g : y_t \langle K_{v_t}, g \rangle \geq 1\}$
2. MNI: $U_t = \bigcap_{i=1}^{t} \{g : y_i \langle K_{v_i}, g \rangle = 1\}$
3. C-Proj: cycle through $\{U_1, \ldots, U_t\}$ until no "mistakes"

Prediction: $\hat{y}_t = \mathrm{sign}(\langle K_{v_t}, g_t \rangle) = \mathrm{sign}(g_{t, v_t})$.
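For the 1-Proj variant, the projection onto the halfspace $U_t$ has a closed form in the RKHS norm, since $\langle K_v, g \rangle = g_v$ and $\langle K_v, K_v \rangle = K_{vv}$. A sketch of that single update in Python (variable names are ours; MNI and C-Proj are not shown):

```python
import numpy as np

def one_proj_step(g, K, v, y):
    """One update of the 1-Proj variant: project g onto {u : y <K_v, u> >= 1}.
    By the reproducing property <K_v, g> = g[v] and <K_v, K_v> = K[v, v],
    so the RKHS projection onto the halfspace has the closed form below."""
    margin = y * g[v]
    if margin >= 1.0:                    # already inside U_t: no change
        return g
    step = (1.0 - margin) / K[v, v]      # minimal-norm move onto the boundary
    return g + step * y * K[:, v]

def predict(g, v):
    """Prediction rule from the slide: the sign of the current value at vertex v."""
    return 1 if g[v] >= 0 else -1
```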


Bounds

Theorem: Given a sequence of vertex-label pairs $\{(v_i, y_i)\}_{i=1}^{\ell} \subseteq V \times \{-1, 1\}$, where $M$ is the index set of mistaken trials, the cumulative mistakes of the algorithm are bounded by
$$|M| \leq \|g^*\|^2 \, B$$
for all consistent $g^* \in S_n$, where
$$B = \mathrm{harmonic\text{-}mean}(\{K_{v_i v_i}\}_{i \in M}) \leq \max_{i \in V} K_{ii}.$$

Mistake bounds can be converted to generalization bounds; see [CCG04, etc.].
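As a small numerical aid, the bound can be evaluated once the mistaken trials are known; the helper below is a sketch of ours, assuming NumPy, and is not from the slides.

```python
import numpy as np

def mistake_bound(norm_g_star_sq, K_diag_mistakes):
    """Evaluate |M| <= ||g*||^2 * B, with B the harmonic mean of K_{v_i v_i}
    over the mistaken trials."""
    k = np.asarray(K_diag_mistakes, dtype=float)
    B = len(k) / np.sum(1.0 / k)          # harmonic mean
    return norm_g_star_sq * B
```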


Interpretation of Bounds – 1

$d_G(p, q)$: length of the shortest path from $p$ to $q$, for $p, q \in V$.
Eccentricity: $\rho_p := \max_{q \in V} d_G(p, q)$.
Diameter: $D_G := \max_{p \in V} \rho_p$.
Algebraic connectivity: $\lambda_2$, the second-smallest eigenvalue of $L$.

Theorem: $K_{pp} \leq \min\!\left(\tfrac{1}{\lambda_2}, \rho_p\right)$.

Label partition: $(g^+, g^-) := (\{i : g_i = 1\}, \{i : g_i = -1\})$ with $|g^+| + |g^-| = n$.
Cut: $\partial(g^+, g^-) := \{(i, j) \in E(G) : i \in g^+, j \in g^-\}$.
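These quantities are straightforward to compute. The sketch below uses NumPy and networkx (our own choice; the slides name no software) and numerically checks the stated theorem $K_{pp} \leq \min(1/\lambda_2, \rho_p)$ on a given connected graph.

```python
import numpy as np
import networkx as nx

def bound_quantities(G):
    """Eccentricities, diameter, algebraic connectivity, and a numerical check of
    K_pp <= min(1/lambda_2, rho_p) for a connected graph G."""
    ecc = nx.eccentricity(G)                          # rho_p = max_q d_G(p, q)
    diameter = max(ecc.values())                      # D_G
    L = nx.laplacian_matrix(G).toarray().astype(float)
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]          # algebraic connectivity lambda_2
    K = np.linalg.pinv(L)                             # graph kernel K_G = L^+
    ok = all(K[i, i] <= min(1.0 / lam2, ecc[p]) + 1e-9
             for i, p in enumerate(G.nodes()))
    return ecc, diameter, lam2, ok

# e.g. bound_quantities(nx.path_graph(5))  -- an illustrative graph of our choosing
```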


Online Active Learning – 1

Protocol: $A$ is the index set of active trials. On an active trial $t$ the learner now selects $v_t$ itself.

Theorem: Given a sequence of vertex-label pairs $\{(v_i, y_i)\}_{i=1}^{\ell} \subseteq V \times \{-1, 1\}$, where $M$ is the index set of mistaken trials, the cumulative mistakes of the algorithm are bounded by
$$|M \setminus A| \leq \left(\|g^*\|^2 - Z_A\right) B$$
for all consistent $g^*$, where $Z_A = \sum_{t \in A} \|g_t - g_{t+1}\|^2$ is the "progress".

Idea: maximise the progress $Z_A$.


Online Active Learning – 2

Strategy: On trial $t \in A$, choose $v_t$ to max-minimise the per-trial progress $\|g_t - g_{t+1}\|^2$; thus choose the vertex
$$v_t = \arg\max_{i \in V} \, \min_{y \in \{-1, 1\}} \left\| g_t - P(\{g : \langle g, K_i \rangle y \geq 1\}; g_t) \right\|^2 = \arg\max_{i \in V} \frac{(\min(|g_{t,i}|, 1) - 1)^2}{K_{ii}}.$$

Observations:
- The numerator is the current "uncertainty" (margin) of vertex $i$.
- The denominator is a "structural" property of vertex $i$.
- Since $K_{ii} \leq \rho_i$, as an approximation posit that $K_{ii} \sim \rho_i$.

Interpretation: the criterion trades the label uncertainty ("nearness to previously labelled vertices") against the "centrality" of the vertex.
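The right-hand form of the selection rule is cheap to evaluate once $g_t$ and the kernel diagonal are in hand; a minimal sketch, assuming NumPy (names ours):

```python
import numpy as np

def select_active_vertex(g, K):
    """Query rule from the slide: argmax_i (min(|g_i|, 1) - 1)^2 / K_ii,
    trading label uncertainty against the 'centrality' term K_ii."""
    scores = (np.minimum(np.abs(g), 1.0) - 1.0) ** 2 / np.diag(K)
    return int(np.argmax(scores))
```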


Online Active Learning – 3

[Figure: hierarchical random graph $G^2_{k\text{-out}}(6; 2)$, vertices grey-scaled by $K_{pp}$: $K_{30,30} = 0.21$ (min), $K_{15,15} = 0.94$ (max)]

Observe: even if vertex 15 is maximally uncertain ($g_{15} = 0$), vertex 30 is still preferred whenever its margin satisfies $|g_{30}| \leq 0.51$.


Experiments – 1 (Random Graph)

[Figure: future cumulative error vs. trials for MNI-ag, C-proj, 1-proj, Act-st and Act-mu]

Hierarchical random graph $G^2_{k\text{-out}}(26; 2)$; 726 nodes labelled by a noisy diffusion process.


Experiments – 2 (USPS Even vs. Odd)

[Figure: future cumulative error vs. trials for 1-proj, C-proj, MNI-ag, Act-st and Act-mu]

1000 random samples, 100 per digit. Graph built via 3-NN with Euclidean distance.


Selected References

Belkin, M., & Niyogi, P. (2004). Semi-supervised learning on Riemannian manifolds. Machine Learning, 56, 209–239.

Chung, F. R. (1997). Spectral graph theory. No. 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Society.

Smola, A., & Kondor, R. (2003). Kernels and regularization on graphs. Proc. COLT 2003 (pp. 144–158).

Zhu, X., Lafferty, J., & Ghahramani, Z. (2003). Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. Proc. of the ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in ML and Data Mining (pp. 58–65).


Active Learning Demo

[Figure: active learning on digits 5 and 8 with MNI; 0 points selected]
