
The Laplace Approximation of Gaussian Process Mixture

Topic: Learning Algorithms
Preference: oral

Zhengdong Lu
Department of Computer Science and Engineering
OGI School of Science and Engineering, OHSU
zhengdon@csee.ogi.edu

Various models of Gaussian process mixture have been proposed, mainly to address non-stationarity in regression [1, 2, 3]. Typically, a Gaussian process mixture consists of $M$ Gaussian processes $F = \{f_1, f_2, \cdots, f_M\}$ with zero mean and covariance functions specified by parameters $\Theta = \{\theta_1, \theta_2, \cdots, \theta_M\}$, and a gating network $g$. For any input $x$, $g(x) = [g_1(x), g_2(x), \cdots, g_M(x)]$ with
$$g_m(x) \geq 0, \quad m = 1, 2, \cdots, M, \qquad \text{and} \qquad \sum_{m=1}^{M} g_m(x) = 1.$$
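For example, a softmax parameterization (a standard choice for gating networks; the experiments below use the gating network of [2], whose exact form we do not restate here) satisfies both constraints by construction:
$$g_m(x) = \frac{\exp(h_m(x))}{\sum_{m'=1}^{M} \exp(h_{m'}(x))},$$
where the $h_m$ are unconstrained real-valued functions of the input.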

As in [1], the probability model is as follows. Given the input $X = \{x_1, x_2, \ldots, x_N\}$, we define latent variables $Z = \{z_1, z_2, \ldots, z_N\}$ to indicate which process each $x_i$ is involved in. The probability of $Z$, given the gating network $g$ and the input $X$, is
$$P(Z \mid X, g) = \prod_{i=1}^{N} P(z_i \mid x_i, g) = \prod_{i=1}^{N} g_{z_i}(x_i). \tag{1}$$

The probability of the output $Y = \{y_1, y_2, \ldots, y_N\}$ is
$$P(Y \mid X, g, \Theta) = \sum_{Z} P(Y, Z \mid X, g, \Theta) = \sum_{Z} P(Y \mid X, Z, \Theta)\, P(Z \mid X, g), \tag{2}$$
where $P(Y \mid X, Z, \Theta) = \prod_{m=1}^{M} P(\{y_i : z_i = m\} \mid \{x_i : z_i = m\}; \theta_m)$. Here, $P(\{y_i : z_i = m\} \mid \{x_i : z_i = m\}; \theta_m)$ is the Gaussian distribution of all the $y_i$ generated by $f_m$ according to $Z$. The Gaussian process mixture given in equations (1)-(2) is suitable for situations where the domain consists of different regimes that should be described by different types of Gaussian processes. Unfortunately, regression based on equations (1)-(2) is generally intractable due to the exponential number of terms in the sum in equation (2). Traditionally this problem is addressed by sampling [1]. Our method instead considers the Laplace approximation of $P(Y \mid X, g, \Theta)$, which is a Gaussian process with zero mean and the same covariance function as the GP mixture $P(Y \mid X, g, \Theta)$. Interestingly, this approximation becomes exact when the gating network $g$ gives a hard partition of the domain [3]. It is easy to show that the covariance function of the Gaussian process mixture is
$$\mathrm{E}(y_i y_j \mid x_i, x_j, g, \Theta) = \sum_{m=1}^{M} K_m(x_i, x_j)\, P(z_i = m, z_j = m \mid x_i, x_j, g), \tag{3}$$
where $K_m(x_i, x_j)$ is the covariance function associated with process $f_m$. Using $\hat{K}(\cdot, \cdot;\, g, \Theta)$ to denote the covariance function given in equation (3), our Laplace approximation of the likelihood of $Y$ is
$$P_L(Y \mid X, \hat{K}) = \frac{1}{\sqrt{(2\pi)^N\, |\hat{K}_X + \sigma^2 I|}} \exp\!\left(-\tfrac{1}{2}\, y' (\hat{K}_X + \sigma^2 I)^{-1} y\right),$$
where $\sigma^2$ is the variance of the observation noise and $\hat{K}_X$ is the covariance matrix evaluated on $X$. We find a suitable gating network by maximizing the data likelihood $P_L$.
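As a concrete illustration, the following sketch (our own NumPy illustration; the function and variable names are hypothetical, not from the paper) builds the mixture covariance $\hat{K}_X$ of equation (3) for a soft gating network and evaluates the corresponding Laplace-approximate log-likelihood $\log P_L(Y \mid X, \hat{K})$. It assumes the independent gating of equation (1), under which $P(z_i = m, z_j = m \mid x_i, x_j, g) = g_m(x_i)\, g_m(x_j)$ for $i \neq j$, and $g_m(x_i)$ on the diagonal.

```python
import numpy as np

def rbf_kernel(X, s2):
    """Squared-exponential kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / s2)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / s2)

def mixture_covariance(G, kernel_mats):
    """K_hat(x_i, x_j) = sum_m K_m(x_i, x_j) P(z_i = m, z_j = m | x_i, x_j, g), Eq. (3).

    G           : (N, M) gating values g_m(x_i); each row sums to 1.
    kernel_mats : list of M (N, N) covariance matrices K_m evaluated on X.
    """
    N, M = G.shape
    K_hat = np.zeros((N, N))
    for m in range(M):
        P_joint = np.outer(G[:, m], G[:, m])      # P(z_i = m, z_j = m) for i != j
        np.fill_diagonal(P_joint, G[:, m])        # z_i = z_j holds trivially when i = j
        K_hat += kernel_mats[m] * P_joint
    return K_hat

def laplace_log_likelihood(y, K_hat, noise_var):
    """log P_L(Y | X, K_hat): zero-mean Gaussian with covariance K_hat + sigma^2 I."""
    C = K_hat + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K_hat + sigma^2 I)^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                  # -0.5 * log|K_hat + sigma^2 I|
            - 0.5 * len(y) * np.log(2 * np.pi))
```

Maximizing this log-likelihood over the parameters of the gating network (for example by gradient ascent, with $G$ produced by a differentiable map such as a softmax) is one way to realize the fitting procedure described above.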

Figures 1 and 2 show the regression results of our model on two toy examples. In these two experiments, we considered a mixture of two Gaussian processes, with the covariance function of process $f_m$ $(m = 1, 2)$ specified as
$$K_m(x_i, x_j) = \exp\!\left(-\|x_i - x_j\|^2 / s_m^2\right), \quad m = 1, 2,$$
where $s_m$ is the width of the RBF kernel for the $m$-th Gaussian process; both widths are chosen beforehand. We use the same gating network as in [2] and fit it with gradient descent. We report the regression result given by the predictive probability $P_L(y \mid x, X, Y, \Theta, \hat{K})$. As shown in the two figures, our model automatically finds an appropriate gating network and yields regression results significantly better than Gaussian process regression with either $f_1$ or $f_2$ alone.
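Once $\hat{K}$ is fitted, the predictive probability $P_L(y \mid x, X, Y, \Theta, \hat{K})$ reduces to standard Gaussian process prediction with covariance $\hat{K}$ and noise variance $\sigma^2$. A minimal sketch (hypothetical names; it assumes the mixture covariances between the test point and the training inputs have already been computed from equation (3)):

```python
def gp_predict(y, K_hat_X, k_star, k_ss, noise_var):
    """Predictive mean and variance at a test input x under a zero-mean GP.

    K_hat_X : (N, N) mixture covariance on the training inputs, Eq. (3)
    k_star  : (N,)   mixture covariances between x and each training input
    k_ss    : scalar mixture (prior) variance at x
    """
    C = K_hat_X + noise_var * np.eye(len(y))
    alpha = np.linalg.solve(C, y)
    mean = k_star @ alpha
    var = k_ss + noise_var - k_star @ np.linalg.solve(C, k_star)
    return mean, var
```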



Figure 1: The regression result. (a) Output of $g_1$. (b) Blue line: true curve; blue stars: noisy observations; black curve: $f_1$, $s_1^2 = 3$; red curve: $f_2$, $s_2^2 = 0.05$; green curve: Laplace approximation of the mixture of $\{f_1, f_2\}$ with fitted $\hat{K}$.

Figure 2: The regression result. (a) Output of $g_1$. (b) Blue line: true curve; blue stars: noisy observations; black curve: $f_1$, $s_1^2 = 0.5$; red curve: $f_2$, $s_2^2 = 0.01$; green curve: Laplace approximation of the mixture of $\{f_1, f_2\}$ with fitted $\hat{K}$.

References

[1] C. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In NIPS 14, 2002.

[2] V. Tresp. Mixtures of Gaussian processes. In NIPS 13, 2001.

[3] O. Williams. A switched Gaussian process for estimating disparity and segmentation in binocular stereo. In NIPS 19, 2007.

