Mixture Density Mercer Kernels - Intelligent Systems Division - NASA
2 Notation

• p is the dimension of the data.
• M is the space of models from which the mixture density models are drawn.
• F is the feature space, which may be a high-dimensional (but finite) space or, more generally, an infinite-dimensional Hilbert space.
• N is the number of data points x_i drawn from a p-dimensional space.
• M is the number of probabilistic models used in generating the kernel function.
• C is the number of mixture components in each probabilistic model. In principle one can use a different number of mixture components in each model; here, however, we choose a fixed number for simplicity.
• x_i is a p × 1 real column vector that represents the data sampled from a data set X.
• Φ : R^p → F is generally a nonlinear mapping to a high-dimensional, possibly infinite-dimensional, feature space F. This mapping operator may be defined explicitly or implicitly via a kernel function.
• K(x_i, x_j) = Φ^T(x_i)Φ(x_j) ∈ R is the kernel function that measures the similarity between data points x_i and x_j. If K is a Mercer kernel, it can be written as this inner product of the mapped points. As i and j sweep through the N data points, K generates an N × N kernel matrix.
• Θ is the entire set of parameters that specify a mixture model.

3 Mixture Models: A Sample from a Model Space

In this section, we briefly motivate the development of Mixture Density Kernels by showing that the combined effect of model misspecification and a finite data set can lead to high uncertainty in the estimate of a mixture model. We closely follow the arguments given in [15].

Suppose that a data set X is generated by drawing a finite sample from a mixture density function f(Λ*, Θ*), where Λ* defines the true density function (say, a Gaussian mixture density) and is a sample from a large but finite class of models M, and Θ* defines the true set of parameters of that density function.
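The finite-sample effect is easy to demonstrate numerically. The following sketch (a hypothetical illustration, not code from this paper; it assumes scikit-learn's GaussianMixture is available) fits the same C-component model to M bootstrap resamples of one data set and measures how much the estimated component means vary:

```python
# Sketch: parameter uncertainty of mixture models fit to finite data.
# Hypothetical example assuming NumPy and scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Finite sample from a "true" two-component 1-D Gaussian mixture
# with component means at -2 and 3.
X = np.concatenate([rng.normal(-2.0, 1.0, 100),
                    rng.normal(3.0, 1.0, 100)]).reshape(-1, 1)

# Fit C = 2 component models on M = 5 bootstrap resamples of X.
M, C = 5, 2
means = []
for m in range(M):
    Xb = X[rng.integers(0, len(X), len(X))]  # bootstrap resample
    gmm = GaussianMixture(n_components=C, random_state=m).fit(Xb)
    means.append(np.sort(gmm.means_.ravel()))

# Spread of the estimated component means across the M fits: even with a
# fixed generating model, each fit lands on slightly different parameters.
spread = np.ptp(np.array(means), axis=0)
print(spread)
```

The spread across the M fits is the parameter uncertainty that motivates averaging over an ensemble of models rather than trusting any single fit.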
In the case of the Gaussian mixture density, these parameters would be the means and covariance matrices of each Gaussian component, and the number of components that comprise the model. We can compute the probability of obtaining the correct model given the data as follows (see Smyth and Wolpert, 1998 for a detailed discussion). The posterior probability of the true density f* ≡ f(Λ*, Θ*) given the data X is:

P(f(Λ*, Θ*)|X) = ∫_M ∫_{R(Λ)} P(Λ, Θ|X) δ(f* − f(Λ, Θ)) dΛ dΘ

where the first integral is taken over the model space M, the second integral is taken over the region R(Λ) of the parameter space that is appropriate for the model Λ, and δ is the Dirac delta function. Using Bayes' rule, it is possible to expand the posterior into a product of the posterior of the model uncertainty and the posterior of the parameter uncertainty. Thus, we have:

P(f*|X) = ∫_M ∫_{R(Λ)} P(Λ|X) P(Θ|Λ, X) δ(f* − f(Λ, Θ)) dΛ dΘ
        = (1/P(X)) ∫_M ∫_{R(Λ)} P(Θ|Λ, X) P(Λ, X) δ(f* − f(Λ, Θ)) dΛ dΘ

The first equation above shows that there are two sources of variation in the estimation of the density function: the first is due to model misspecification, and the second is due to parameter uncertainty. The second equation shows that if prior information is available, it can be used to modify the likelihood of the data in order to obtain a better estimate of the true density function. The goal of this paper is to seek a representation of the posterior P(x_i|Θ) that reduces these errors by embedding x_i in a high-dimensional feature space that defines a kernel function.

4 Review of Kernel Functions

Mercer kernel functions can be viewed as a measure of the similarity between two data points that are embedded in a high-dimensional, possibly infinite-dimensional, feature space. For a finite sample of data X, the kernel func-
