Online Learning over Graphs

Mark Herbster, Massimiliano Pontil, Lisa Wainer
Computer Science Department, University College London

ICML '05


Inspiration – 1

[Figure, left: a yeast protein interaction network; right: Internet hosts]


Inspiration – 2

[Figure, left: "Digits '3' and '8' graph using 3 nearest neighbours" (USPS digits 3 and 8); right: random graph $G^2_{k\text{-out}}(26; 2)$]


Online Learning Model

Aim: learn a function $g : V \to \{-1, +1\}$ corresponding to a labelling of a graph $G = (V, E)$ with $V = \{1, \ldots, n\}$.

Learning proceeds in trials. For $t = 1, \ldots, \ell$:
1. Nature selects $v_t \in V$
2. Learner predicts $\hat{y}_t \in \{-1, +1\}$
3. Nature selects $y_t \in \{-1, +1\}$
4. If $\hat{y}_t \neq y_t$ then mistakes := mistakes + 1

Learner's goal: minimise mistakes.
Bound: mistakes $\leq f(\mathrm{complexity}(g))$.
What is a natural complexity measure for a graph labelling?
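The trial loop is simple enough to state directly in code. The following Python sketch assumes hypothetical `predict(v)` and `update(v, y)` callables supplied by a concrete learner; neither name comes from the slides.

```python
# A minimal sketch of the trial-by-trial protocol above.

def run_protocol(trials, predict, update):
    """trials: iterable of (vertex, label) pairs with labels in {-1, +1}."""
    mistakes = 0
    for v_t, y_t in trials:       # 1. Nature selects v_t ... 3. then reveals y_t
        y_hat = predict(v_t)      # 2. learner predicts in {-1, +1}
        if y_hat != y_t:          # 4. count the mistake
            mistakes += 1
        update(v_t, y_t)          # learner may exploit the revealed label
    return mistakes
```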


RKHS on a Graph

Graph Laplacian $L := D - A$, where $A$ is the adjacency matrix and $D := \mathrm{diag}(d_1, \ldots, d_n)$ is the degree matrix.

Define $S_n := \{g : \sum_{i=1}^{n} g_i = 0\}$; assume $G$ is connected.

Define the inner product and norm
$$\langle f, g \rangle := f^\top L g, \quad f, g \in S_n, \qquad \|g\|^2 = \sum_{(i,j) \in E(G)} (g_i - g_j)^2.$$

Graph kernel: $K_G = L^+$ (pseudoinverse).

Reproducing property: $g_i = e_i^\top L^+ L g = K_i^\top L g = \langle K_i, g \rangle$.
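A minimal numerical sketch of these definitions, assuming NumPy; the 3-vertex path graph and the labelling `g` below are illustrative choices of ours, not examples from the slides.

```python
import numpy as np

def graph_kernel(A):
    """Return the Laplacian L = D - A and the graph kernel K_G = L^+ (pseudoinverse)."""
    D = np.diag(A.sum(axis=1))       # degree matrix
    L = D - A                        # combinatorial Laplacian
    return L, np.linalg.pinv(L)      # Moore-Penrose pseudoinverse

# Illustrative check of the norm formula on a 3-vertex path:
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L, K = graph_kernel(A)
g = np.array([1., 1., -1.])          # one edge crosses the label boundary
print(g @ L @ g)                     # -> 4.0, i.e. the sum of (g_i - g_j)^2 over edges
```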


Example

[Figure: two labellings of a graph, with $\|g\|^2 = 3 \times 4$ and $\|g\|^2 = 12 \times 4$]
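The factors of 4 follow from the norm formula on the previous slide: for a $\pm 1$ labelling only the cut edges (edges joining oppositely labelled vertices) contribute, and each contributes $(1 - (-1))^2 = 4$, so

$$\|g\|^2 \;=\; \sum_{(i,j) \in E(G)} (g_i - g_j)^2 \;=\; 4 \, \bigl|\{(i,j) \in E(G) : g_i \neq g_j\}\bigr|,$$

which suggests the two labellings in the figure correspond to cuts of 3 and 12 edges respectively.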


Projection Algorithms

Perceptron-inspired (but non-conservative).

Projection: $P(N; w) := \arg\min_{u \in N} \|u - w\|$.

Prototypical algorithm:
  Input: a sequence of closed convex sets $\{U_t\}_{t=1}^{\ell} \subset \mathcal{H}$
  Initialisation: $g_1 = 0$
  For $t = 1, \ldots, \ell$: $g_{t+1} = P(U_t; g_t)$

Three variants:
1. 1-Proj: $U_t = \{g : y_t \langle K_{v_t}, g \rangle \geq 1\}$
2. MNI: $U_t = \bigcap_{i=1}^{t} \{g : y_i \langle K_{v_i}, g \rangle = 1\}$
3. C-Proj: cycle through $\{U_1, \ldots, U_t\}$ until no "mistakes"

Prediction: $\hat{y}_t = \mathrm{sign}(\langle K_{v_t}, g_t \rangle) = \mathrm{sign}(g_{t, v_t})$.
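For the 1-Proj variant, the projection onto the halfspace $U_t$ has a closed form in the RKHS norm, since $\langle K_v, g \rangle = g_v$ and $\langle K_v, K_v \rangle = K_{vv}$. A sketch of that single update in Python (variable names are ours; MNI and C-Proj are not shown):

```python
import numpy as np

def one_proj_step(g, K, v, y):
    """One update of the 1-Proj variant: project g onto {u : y <K_v, u> >= 1}.
    By the reproducing property <K_v, g> = g[v] and <K_v, K_v> = K[v, v],
    so the RKHS projection onto the halfspace has the closed form below."""
    margin = y * g[v]
    if margin >= 1.0:                    # already inside U_t: no change
        return g
    step = (1.0 - margin) / K[v, v]      # minimal-norm move onto the boundary
    return g + step * y * K[:, v]

def predict(g, v):
    """Prediction rule from the slide: the sign of the current value at vertex v."""
    return 1 if g[v] >= 0 else -1
```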


Bounds

Theorem: Given a sequence of vertex-label pairs $\{(v_i, y_i)\}_{i=1}^{\ell} \subseteq V \times \{-1, 1\}$, where $M$ is the index set of mistaken trials, the cumulative mistakes of the algorithm are bounded by
$$|M| \leq \|g^*\|^2 \, B$$
for all consistent $g^* \in S_n$, where
$$B = \mathrm{harmonic\text{-}mean}(\{K_{v_i v_i}\}_{i \in M}) \leq \max_{i \in V} K_{ii}.$$

Mistake bounds can be converted to generalization bounds; see [CCG04, etc.].
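As a small numerical aid, the bound can be evaluated once the mistaken trials are known; the helper below is a sketch of ours, assuming NumPy, and is not from the slides.

```python
import numpy as np

def mistake_bound(norm_g_star_sq, K_diag_mistakes):
    """Evaluate |M| <= ||g*||^2 * B, with B the harmonic mean of K_{v_i v_i}
    over the mistaken trials."""
    k = np.asarray(K_diag_mistakes, dtype=float)
    B = len(k) / np.sum(1.0 / k)          # harmonic mean
    return norm_g_star_sq * B
```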


Interpretation of Bounds – 1

$d_G(p, q)$: length of the shortest path from $p$ to $q$, for $p, q \in V$.
Eccentricity: $\rho_p := \max_{q \in V} d_G(p, q)$.
Diameter: $D_G := \max_{p \in V} \rho_p$.
Algebraic connectivity: $\lambda_2$, the second-smallest eigenvalue of $L$.

Theorem: $K_{pp} \leq \min\!\left(\tfrac{1}{\lambda_2}, \rho_p\right)$.

Label partition: $(g^+, g^-) := (\{i : g_i = 1\}, \{i : g_i = -1\})$ with $|g^+| + |g^-| = n$.
Cut: $\partial(g^+, g^-) := \{(i, j) \in E(G) : i \in g^+, j \in g^-\}$.
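These quantities are straightforward to compute. The sketch below uses NumPy and networkx (our own choice; the slides name no software) and numerically checks the stated theorem $K_{pp} \leq \min(1/\lambda_2, \rho_p)$ on a given connected graph.

```python
import numpy as np
import networkx as nx

def bound_quantities(G):
    """Eccentricities, diameter, algebraic connectivity, and a numerical check of
    K_pp <= min(1/lambda_2, rho_p) for a connected graph G."""
    ecc = nx.eccentricity(G)                          # rho_p = max_q d_G(p, q)
    diameter = max(ecc.values())                      # D_G
    L = nx.laplacian_matrix(G).toarray().astype(float)
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]          # algebraic connectivity lambda_2
    K = np.linalg.pinv(L)                             # graph kernel K_G = L^+
    ok = all(K[i, i] <= min(1.0 / lam2, ecc[p]) + 1e-9
             for i, p in enumerate(G.nodes()))
    return ecc, diameter, lam2, ok

# e.g. bound_quantities(nx.path_graph(5))  -- an illustrative graph of our choosing
```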


Online Active Learning – 1

Protocol: $A$ is the index set of active trials. On an active trial $t$ the learner now selects $v_t$ itself.

Theorem: Given a sequence of vertex-label pairs $\{(v_i, y_i)\}_{i=1}^{\ell} \subseteq V \times \{-1, 1\}$, where $M$ is the index set of mistaken trials, the cumulative mistakes of the algorithm are bounded by
$$|M \setminus A| \leq \left(\|g^*\|^2 - Z_A\right) B$$
for all consistent $g^*$, where $Z_A = \sum_{t \in A} \|g_t - g_{t+1}\|^2$ is the "progress".

Idea: maximise the progress $Z_A$.


Online Active Learning – 2

Strategy: On trial $t \in A$, choose $v_t$ to max-minimise the per-trial progress $\|g_t - g_{t+1}\|^2$; thus choose the vertex
$$v_t = \arg\max_{i \in V} \, \min_{y \in \{-1, 1\}} \left\| g_t - P(\{g : \langle g, K_i \rangle y \geq 1\}; g_t) \right\|^2 = \arg\max_{i \in V} \frac{(\min(|g_{t,i}|, 1) - 1)^2}{K_{ii}}.$$

Observations:
- The numerator is the current "uncertainty" (margin) of vertex $i$.
- The denominator is a "structural" property of vertex $i$.
- Since $K_{ii} \leq \rho_i$, as an approximation posit that $K_{ii} \sim \rho_i$.

Interpretation: the criterion trades the label uncertainty ("nearness to previously labelled vertices") against the "centrality" of the vertex.
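The right-hand form of the selection rule is cheap to evaluate once $g_t$ and the kernel diagonal are in hand; a minimal sketch, assuming NumPy (names ours):

```python
import numpy as np

def select_active_vertex(g, K):
    """Query rule from the slide: argmax_i (min(|g_i|, 1) - 1)^2 / K_ii,
    trading label uncertainty against the 'centrality' term K_ii."""
    scores = (np.minimum(np.abs(g), 1.0) - 1.0) ** 2 / np.diag(K)
    return int(np.argmax(scores))
```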


Online Active Learning – 3

[Figure: hierarchical random graph $G^2_{k\text{-out}}(6; 2)$, vertices grey-scaled by $K_{pp}$: $K_{30,30} = 0.21$ (min), $K_{15,15} = 0.94$ (max)]

Observe: even if vertex 15 is maximally uncertain ($g_{15} = 0$), vertex 30 is still preferred whenever its margin satisfies $|g_{30}| \leq 0.51$.


Experiments – 1 (Random Graph)

[Figure: future cumulative error vs. trials for MNI-ag, C-proj, 1-proj, Act-st and Act-mu]

Hierarchical random graph $G^2_{k\text{-out}}(26; 2)$; 726 nodes labelled by a noisy diffusion process.


Experiments – 2 (USPS Even vs. Odd)

[Figure: future cumulative error vs. trials for 1-proj, C-proj, MNI-ag, Act-st and Act-mu]

1000 random samples, 100 per digit. Graph built via 3-NN with Euclidean distance.


Selected References

Belkin, M., & Niyogi, P. (2004). Semi-supervised learning on Riemannian manifolds. Machine Learning, 56, 209–239.

Chung, F. R. (1997). Spectral graph theory. No. 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Society.

Smola, A., & Kondor, R. (2003). Kernels and regularization on graphs. Proc. COLT 2003 (pp. 144–158).

Zhu, X., Lafferty, J., & Ghahramani, Z. (2003). Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. Proc. of the ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in ML and Data Mining (pp. 58–65).


Active Learning Demo

[Figure: active learning on digits 5 and 8 with MNI; 0 points selected]
