
Williams, 2006). Such classifiers model the latent function f with a Gaussian process (GP): a non-parametric probability distribution over functions.

A GP is defined as a distribution over the functions f : X → R such that the distribution over the possible function values on any finite subset of X (such as X) is multivariate Gaussian. For a function f(x), the prior distribution over its values f on a subset x ⊂ X is completely specified by a covariance matrix K

(4)
\[
p(\mathbf{f} \mid K) = \mathcal{N}(\mathbf{f}; \mathbf{0}, K) = \frac{1}{\sqrt{\det 2\pi K}} \exp\left( -\frac{1}{2} \mathbf{f}^{\top} K^{-1} \mathbf{f} \right).
\]
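As a quick numerical check (not code from the paper), the zero-mean Gaussian density in (4) can be evaluated either with SciPy's multivariate normal or directly from the explicit formula; the small covariance matrix K and value vector f below are purely illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2x2 covariance matrix and function values (not from the paper's data).
K = np.array([[1.0, 0.5],
              [0.5, 1.0]])
f = np.array([0.2, -0.1])

# Zero-mean multivariate Gaussian density, as in equation (4).
density = multivariate_normal.pdf(f, mean=np.zeros(2), cov=K)

# The same density via the explicit formula in (4):
explicit = np.exp(-0.5 * f @ np.linalg.solve(K, f)) / np.sqrt(np.linalg.det(2 * np.pi * K))
```

The two computations agree because det(2πK) = (2π)^d det K for a d-dimensional input.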

The covariance matrix is generated by a covariance function κ : X × X → R; that is, K = κ(X, X). The GP model is expressed by the choice of κ; we consider the exponentiated quadratic (squared exponential) and rational quadratic. Note that we have chosen a zero mean function, encoding the assumption that P(y∗ = 1) = 1/2 sufficiently far from training data.
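As a minimal sketch (not the paper's own implementation), the two covariance functions could be written in NumPy as follows; the hyperparameter names `lengthscale`, `variance`, and `alpha` are illustrative choices, not taken from the text:

```python
import numpy as np

def exponentiated_quadratic(X1, X2, lengthscale=1.0, variance=1.0):
    """Exponentiated quadratic (squared exponential) covariance.
    Entries decay with the squared Euclidean distance between inputs."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def rational_quadratic(X1, X2, lengthscale=1.0, variance=1.0, alpha=1.0):
    """Rational quadratic covariance: a scale mixture of exponentiated
    quadratics over lengthscales, controlled by alpha."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * (1.0 + 0.5 * sq / (alpha * lengthscale ** 2)) ** (-alpha)
```

Both return K = κ(X1, X2) as a dense matrix; with X1 = X2 = X this gives the covariance matrix K of equation (4).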

Given training data D, we use the GP to make predictions about the function values f∗ at input x∗. With this information, we have the predictive equations

(5)
\[
p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}\big( f_* ;\, m(f_* \mid x_*, \mathcal{D}),\, V(f_* \mid x_*, \mathcal{D}) \big),
\]

where

(6)
\[
m(f_* \mid x_*, \mathcal{D}) = K(x_*, X)\, K(X, X)^{-1} \mathbf{y},
\]

(7)
\[
V(f_* \mid x_*, \mathcal{D}) = K(x_*, x_*) - K(x_*, X)\, K(X, X)^{-1} K(X, x_*).
\]
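The predictive equations (6) and (7) can be sketched directly in NumPy; this is an illustrative implementation assuming the covariance blocks have already been computed, with a small `jitter` term added for numerical stability (a standard device, not mentioned in the text):

```python
import numpy as np

def gp_predict(K_train, K_star_train, K_star_star, y, jitter=1e-8):
    """Predictive mean (6) and covariance (7) of a zero-mean GP.

    K_train:      K(X, X),   n x n covariance of the training inputs
    K_star_train: K(x*, X),  m x n cross-covariance
    K_star_star:  K(x*, x*), m x m covariance of the test inputs
    y:            training targets, length n
    """
    n = K_train.shape[0]
    # Cholesky factorisation K(X, X) = L L^T for stable solves.
    L = np.linalg.cholesky(K_train + jitter * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K(X, X)^{-1} y
    mean = K_star_train @ alpha                          # equation (6)
    v = np.linalg.solve(L, K_star_train.T)
    var = K_star_star - v.T @ v                          # equation (7)
    return mean, var
```

At the training inputs themselves, the noise-free predictive mean reproduces y and the predictive variance collapses towards zero, as (6) and (7) require.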

Inferring the label posterior p(y∗ | x∗, D) is complicated by the non-Gaussian form of the logistic (3). In order to effect inference, we use the approximate Expectation Propagation algorithm (Minka, 2001).

We tested three Gaussian process classifiers using the GPML toolbox (Rasmussen and Nickisch, 2010) on our data, built around exponentiated quadratic, rational quadratic and linear covariances. Note that the latter is equivalent to logistic regression with a Gaussian prior taken on the weights w. To validate these classifiers, we randomly selected a reduced training set of half the available data D; the remaining data formed a test set. On this test set, we evaluated how closely the algorithm's classifications matched the hand labels according to two metrics (see e.g. Murphy (2012)): the area under the receiver operat-
