The Future of Employment
Williams, 2006). Such classifiers model the latent function f with a Gaussian process (GP): a non-parametric probability distribution over functions. A GP is defined as a distribution over functions f: X → R such that the distribution over the possible function values on any finite subset of X (such as X) is multivariate Gaussian. For a function f(x), the prior distribution over its values f on a subset x ⊂ X is completely specified by a covariance matrix K:
$$p(\mathbf{f} \mid K) = \mathcal{N}(\mathbf{f}; \mathbf{0}, K) = \frac{1}{\sqrt{\det 2\pi K}} \exp\Bigl(-\tfrac{1}{2}\mathbf{f}^\top K^{-1}\mathbf{f}\Bigr). \tag{4}$$
The covariance matrix is generated by a covariance function κ: X × X → R; that is, K = κ(X, X). The GP model is expressed by the choice of κ; we consider the exponentiated quadratic (squared exponential) and rational quadratic.
Note that we have chosen a zero mean function, encoding the assumption that P(y∗ = 1) = 1/2 sufficiently far from training data.
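As an illustrative sketch (function names and hyperparameter values here are our own choices, not part of the paper's implementation), the two covariance functions above and a draw from the zero-mean prior of equation (4) can be written in a few lines of numpy:

```python
import numpy as np

def exponentiated_quadratic(X1, X2, lengthscale=1.0, variance=1.0):
    """Exponentiated quadratic (squared exponential) covariance."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def rational_quadratic(X1, X2, lengthscale=1.0, variance=1.0, alpha=1.0):
    """Rational quadratic covariance: a scale mixture of EQ kernels."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * (1 + 0.5 * sq / (alpha * lengthscale**2)) ** (-alpha)

# Covariance matrix K = kappa(X, X) on a small input set, and a draw
# f ~ N(0, K) from the zero-mean GP prior of equation (4).
X = np.linspace(0, 1, 5)[:, None]
K = exponentiated_quadratic(X, X)
f = np.random.default_rng(0).multivariate_normal(
    np.zeros(5), K + 1e-9 * np.eye(5))  # tiny jitter for numerical stability
```

The rational quadratic can be viewed as an EQ kernel whose lengthscale is itself uncertain; as α → ∞ it recovers the EQ form.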
Given training data D, we use the GP to make predictions about the function values f∗ at input x∗. With this information, we have the predictive equations
$$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}\bigl(f_*; m(f_* \mid x_*, \mathcal{D}), V(f_* \mid x_*, \mathcal{D})\bigr), \tag{5}$$
where
$$m(f_* \mid x_*, \mathcal{D}) = K(x_*, X)K(X, X)^{-1}\mathbf{y}, \tag{6}$$
$$V(f_* \mid x_*, \mathcal{D}) = K(x_*, x_*) - K(x_*, X)K(X, X)^{-1}K(X, x_*). \tag{7}$$
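Equations (6) and (7) translate directly into numpy; the sketch below (our own illustration, on assumed toy data rather than the paper's) uses an EQ kernel and a small jitter term on K(X, X) for numerical stability:

```python
import numpy as np

def kappa(X1, X2, lengthscale=0.3):
    """Exponentiated quadratic covariance (unit variance)."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_predict(X, y, Xs, noise=1e-6):
    """Predictive mean (6) and covariance (7) at test inputs Xs."""
    K = kappa(X, X) + noise * np.eye(len(X))  # jitter on K(X, X)
    Ks = kappa(Xs, X)                         # K(x*, X)
    Kss = kappa(Xs, Xs)                       # K(x*, x*)
    m = Ks @ np.linalg.solve(K, y)            # equation (6)
    V = Kss - Ks @ np.linalg.solve(K, Ks.T)   # equation (7)
    return m, V

# Toy regression data, assumed purely for illustration.
X = np.linspace(0, 1, 6)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
m, V = gp_predict(X, y, X[:2])  # predicting at training inputs recovers y
```

Solving the linear system rather than forming K(X, X)⁻¹ explicitly is the standard numerically stable choice.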
Inferring the label posterior p(y∗ | x∗, D) is complicated by the non-Gaussian form of the logistic (3). In order to effect inference, we use the approximate Expectation Propagation algorithm (Minka, 2001).
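EP constructs a deterministic Gaussian approximation to the posterior; as a simple illustration of the integral it approximates, the predictive class probability p(y∗ = 1 | x∗, D) = ∫ σ(f∗) N(f∗; m, V) df∗ can be estimated by Monte Carlo (this sketch is not EP itself, merely the target quantity):

```python
import numpy as np

def predictive_class_prob(m, v, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[sigmoid(f*)] for f* ~ N(m, v)."""
    f = np.random.default_rng(seed).normal(m, np.sqrt(v), size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-f)))

# With m = 0 the logistic is squashed symmetrically, so the
# probability is close to 1/2, matching the zero-mean prior assumption.
p = predictive_class_prob(0.0, 1.0)
```

Note that squashing a Gaussian through the logistic pulls probabilities toward 1/2 as the predictive variance v grows, which is why the variance in equation (7) matters for classification, not just the mean.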
We tested three Gaussian process classifiers using the GPML toolbox (Rasmussen and Nickisch, 2010) on our data, built around exponentiated quadratic, rational quadratic and linear covariances. Note that the latter is equivalent to logistic regression with a Gaussian prior taken on the weights w. To validate these classifiers, we randomly selected a reduced training set of half the available data D; the remaining data formed a test set. On this test set, we evaluated how closely the algorithm's classifications matched the hand labels according to two metrics (see e.g. Murphy (2012)): the area under the receiver operat-