Shared Gaussian Process Latent Variables Models - Oxford Brookes ...
2.5. NON-LINEAR
resentation. This is the fundamental background to the Kernel Trick, which is a way of non-linearizing algorithms that depend only on the inner products between data-points. Even though it is an accepted term, it is not clear where the term was first suggested. The Kernel Trick is based on the observation that, rather than finding a specific mapping $\Phi$ that takes the data to the feature space $F$, we specify a function $k(y_i, y_j)$, called the kernel function, that parameterizes the inner product between $\Phi(y_i)$ and $\Phi(y_j)$,

$$k(y_i, y_j) = \Phi(y_i)^{\mathrm{T}} \Phi(y_j). \qquad (2.32)$$
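The identity in Eq. 2.32 can be checked numerically for a kernel whose feature map is small enough to write down explicitly. The sketch below (not from the thesis; the degree-2 polynomial kernel and its feature map are a standard textbook choice) evaluates the kernel directly and compares it against the explicit inner product in feature space:

```python
import numpy as np

def phi(y):
    """Explicit feature map for the degree-2 polynomial kernel on R^2.
    Maps y = (y1, y2) to (y1^2, y2^2, sqrt(2)*y1*y2)."""
    y1, y2 = y
    return np.array([y1**2, y2**2, np.sqrt(2) * y1 * y2])

def k(yi, yj):
    """Polynomial kernel k(yi, yj) = (yi^T yj)^2, evaluated without
    ever constructing the feature space explicitly."""
    return np.dot(yi, yj) ** 2

yi = np.array([1.0, 2.0])
yj = np.array([3.0, 0.5])

# Eq. 2.32: the kernel equals the inner product of the mapped points.
lhs = k(yi, yj)                 # kernel evaluated in input space
rhs = np.dot(phi(yi), phi(yj))  # Phi(yi)^T Phi(yj) in feature space
assert np.isclose(lhs, rhs)
```

For richer kernels such as the RBF kernel the feature space is infinite-dimensional, so only the left-hand side of this comparison is computable; that is precisely what makes the trick useful.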
Evaluated between each pair of points in the data-set, the kernel function $k$ specifies the kernel matrix $K(Y, Y)$, which is the Gram matrix in the feature space $F$. From Eq. 2.17 we know that the Gram matrix and the distance matrix are interchangeable representations for centered data. Therefore, as long as the kernel function $k$ specifies a valid Gram matrix $K$, there is an underlying geometrical representation of the data in $F$. The class of kernel functions that specify geometrically representable feature spaces are known as Mercer Kernels [41, 50]. Mercer Kernels are positive semidefinite, i.e. in the spectral decomposition of the resulting kernel matrix $K$ all eigenvalues are non-negative. Intuitively this can be understood through Eq. 2.17: if an eigenvalue were negative, then adding the corresponding basis vector would reduce the distance between two points, which is not possible in a Euclidean space. When a kernel function is used to represent the data, the feature space $F$ is known as a kernel induced feature space.
One advantage of using a kernel induced feature space is that if we aim to