Shared Gaussian Process Latent Variables Models - Oxford Brookes ...
2.5. NON-LINEAR
resentation. This is the fundamental background to the Kernel Trick, which is a way of non-linearizing algorithms that depend only on the inner products between data-points. Even though it is an accepted term, it is not clear where the term was first suggested. The Kernel Trick is based on the observation that, rather than finding a specific mapping $\Phi$ that takes the data to the feature space $F$, we specify a function $k(y_i, y_j)$, called the kernel function, that parameterizes the inner product between $\Phi(y_i)$ and $\Phi(y_j)$,

$$k(y_i, y_j) = \Phi(y_i)^{\mathrm{T}} \Phi(y_j). \qquad (2.32)$$
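The identity in Eq. 2.32 can be checked numerically for a kernel whose feature map is small enough to write down explicitly. The sketch below (not from the thesis; the degree-2 polynomial kernel and its feature map are a standard textbook choice) evaluates the kernel directly and compares it against the explicit inner product in feature space:

```python
import numpy as np

def phi(y):
    """Explicit feature map for the degree-2 polynomial kernel on R^2.
    Maps y = (y1, y2) to (y1^2, y2^2, sqrt(2)*y1*y2)."""
    y1, y2 = y
    return np.array([y1**2, y2**2, np.sqrt(2) * y1 * y2])

def k(yi, yj):
    """Polynomial kernel k(yi, yj) = (yi^T yj)^2, evaluated without
    ever constructing the feature space explicitly."""
    return np.dot(yi, yj) ** 2

yi = np.array([1.0, 2.0])
yj = np.array([3.0, 0.5])

# Eq. 2.32: the kernel equals the inner product of the mapped points.
lhs = k(yi, yj)                 # kernel evaluated in input space
rhs = np.dot(phi(yi), phi(yj))  # Phi(yi)^T Phi(yj) in feature space
assert np.isclose(lhs, rhs)
```

For richer kernels such as the RBF kernel the feature space is infinite-dimensional, so only the left-hand side of this comparison is computable; that is precisely what makes the trick useful.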
Evaluated between each pair of points in the data-set, the kernel function $k$ specifies the kernel matrix $K(Y, Y)$, which is the Gram matrix in the feature space $F$. From Eq. 2.17 we know that the Gram matrix and the distance matrix are interchangeable representations for centered data. Therefore, as long as the kernel function $k$ specifies a valid Gram matrix $K$, there is an underlying geometrical representation of the data in $F$. The class of kernel functions that specify geometrically representable feature spaces are known as Mercer Kernels [41, 50]. Mercer Kernels are positive semidefinite, i.e. in the spectral decomposition of the resulting kernel matrix $K$ all eigenvalues are non-negative. Intuitively this can be understood through Eq. 2.17: if an eigenvalue were negative, then adding the corresponding basis vector would reduce the distance between two points, which is not possible in a Euclidean space. When a kernel function is used to represent the data, the feature space $F$ is known as a kernel induced feature space.
One advantage of using a kernel induced feature space is that if we aim to