Bayesian Minimum Mean-Square Error Estimation for ... - IEEE Xplore
TABLE I: SUMMARY OF CANCER CLASSIFICATION STUDIES BASED ON LESS THAN 50 SAMPLE POINTS

The performance of an error estimator concerns the relation between the true error and the estimate. The full probabilistic relation is characterized by the joint distribution of the random vector $(\varepsilon_n, \hat{\varepsilon}_n)$ of the true and estimated errors, where $n$ denotes the sample size. A commonly used performance measure, and the one we use here, is the root-mean-square (RMS) error, which is the square root of the mean-square error (MSE) between the estimated and true errors,

\[
\mathrm{RMS}(\hat{\varepsilon}_n) = \sqrt{E\big[(\hat{\varepsilon}_n - \varepsilon_n)^2\big]}. \qquad (1)
\]

The RMS can also be expressed in terms of bias and deviation variance,

\[
\mathrm{RMS}(\hat{\varepsilon}_n) = \sqrt{\mathrm{Bias}(\hat{\varepsilon}_n)^2 + \mathrm{Var}_{\mathrm{dev}}(\hat{\varepsilon}_n)},
\]

where

\[
\mathrm{Bias}(\hat{\varepsilon}_n) = E[\hat{\varepsilon}_n - \varepsilon_n]
\quad \text{and} \quad
\mathrm{Var}_{\mathrm{dev}}(\hat{\varepsilon}_n) = \mathrm{Var}(\hat{\varepsilon}_n - \varepsilon_n).
\]

Our interest in this paper is to optimize error estimation relative to RMS, or equivalently MSE, across a family of feature-label distributions, given the sample (and, implicitly, the classifier designed from the sample). Optimization is not new to classifier error estimation. Under the assumption that the error estimator is a linear combination of counting estimators, the weights have been optimized relative to a given feature-label distribution and classification rule [20]. Here, however, we do not wish to impose a form on the estimator, nor do we wish to assume a known feature-label distribution. Indeed, if we knew the feature-label distribution, then we could find the exact error. The point here is that optimization can be done across a space of distributions, and if the mass of the random parameter governing the space is sufficiently concentrated around the parameter value corresponding to the actual feature-label distribution from which the data have come, then this estimate will be reasonably good for the actual distribution, all of this in a manner to be explained. This leads naturally to a Bayesian approach for estimating classifier errors in which the class-conditional distributions satisfy a parametric model and prior distributions govern the parameter probabilities. Given the sample data, we find posterior distributions for the parameters and define the Bayesian error estimator to be the estimate of the true error that minimizes the mean-square error (MSE) relative to the parameter space and sampling distribution.

Here, in Part I of a two-part study, we define the Bayesian MMSE error estimator, discuss basic properties such as robustness to the priors and unbiasedness, apply it to a discrete classification problem using different priors, and examine its performance with the discrete histogram rule when compared to classical point-based estimators. In Part II, also in this same issue of the IEEE TRANSACTIONS ON SIGNAL PROCESSING, we derive the Bayesian MMSE error estimator for linear discrimination in the multivariate Gaussian model. Both discrete classification and LDA in the Gaussian model are classical problems; indeed, the form of the LDA classifier and the distribution of the true error go back to [21] and [22], respectively; however, characterization of the classical point-based error estimators is much more recent.
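As a concrete illustration of (1) and the bias/deviation-variance decomposition, the following minimal sketch computes RMS, bias, and deviation variance from simulated pairs of true and estimated errors and checks that the decomposition holds. The generating model and the names true_err and est_err are illustrative assumptions only; in practice such pairs would come from repeated classifier design and error estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Monte Carlo draws of (true error, estimated error) over
# repeated samples of size n; a toy model, not data from the paper.
true_err = rng.uniform(0.1, 0.3, size=100_000)
est_err = true_err + rng.normal(loc=0.02, scale=0.05, size=true_err.size)

dev = est_err - true_err                 # deviation of the estimate from the truth
rms = np.sqrt(np.mean(dev ** 2))         # RMS, Eq. (1)
bias = np.mean(dev)                      # Bias = E[estimated - true]
var_dev = np.var(dev)                    # deviation variance = Var(estimated - true)

# Check the decomposition RMS^2 = Bias^2 + Var_dev
assert np.isclose(rms ** 2, bias ** 2 + var_dev)
print(f"RMS={rms:.4f}  Bias={bias:.4f}  Var_dev={var_dev:.5f}")
```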
The joint distributions of the true error with the resubstitution and leave-one-out cross-validation error estimators in the case of the discrete histogram rule were only published in 2005 [23], the marginal distributions of the resubstitution and leave-one-out error estimators for LDA in the Gaussian model with known covariance in 2009 [24], and the joint distributions of the true error with the latter in 2010 [25]. This paper pushes the study of error estimation ahead by placing it in a rigorous optimization framework rather than relying on ad hoc "intuitive" estimation rules.

Bayesian error estimation for classification is not completely new, although we know of no work in recent years. In the 1960s, two papers made small forays into the area. In [26], a Bayesian error estimator is given for the univariate Gaussian model with known covariance matrices. In [27], the problem is addressed in the multivariate Gaussian model for a particular linear classification rule based on Fisher's discriminant, for a common unknown covariance matrix and known class probabilities, by using a specific prior on the means and the inverse of the covariance matrix. In neither case were the properties or performance of these estimators considered. In Part II, we derive the Bayesian MMSE error estimator for an arbitrary linear classification rule in the multivariate Gaussian model with unknown, independent covariance matrices and unknown class probabilities, using a general class of priors on the means and an intermediate parameter that allows us to impose structure on the covariance matrix.
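To make the estimators compared in Part I concrete, the following sketch contrasts resubstitution, leave-one-out cross-validation, and a posterior-expectation (MMSE-style) error estimate for the discrete histogram rule on a toy two-class discrete model. The number of bins, the uniform Dirichlet priors on the bin probabilities, the known equal class priors, the per-class sampling, and all variable names are assumptions made for illustration; this is not the paper's general derivation.

```python
import numpy as np

rng = np.random.default_rng(1)

b = 8                    # number of discrete bins (cells)
c0 = c1 = 0.5            # class priors, assumed known and equal for this sketch
p = rng.dirichlet(np.ones(b))   # true class-0 bin probabilities (toy ground truth)
q = rng.dirichlet(np.ones(b))   # true class-1 bin probabilities

n0 = n1 = 15             # small per-class sample, in the spirit of Table I
x0 = rng.choice(b, size=n0, p=p)
x1 = rng.choice(b, size=n1, p=q)
U = np.bincount(x0, minlength=b)    # class-0 counts per bin
V = np.bincount(x1, minlength=b)    # class-1 counts per bin

# Discrete histogram rule: assign a bin to class 1 when its class-1 count
# exceeds its class-0 count (ties go to class 0).
psi = (V > U).astype(int)

# True error, computable here only because the toy distribution is known.
true_err = c0 * p[psi == 1].sum() + c1 * q[psi == 0].sum()

# Resubstitution: error rate of the designed rule on its own training data.
resub = (U[psi == 1].sum() + V[psi == 0].sum()) / (n0 + n1)

# Leave-one-out: hold out one point, re-design the rule, test on that point.
mistakes = 0
for j, lbl in [(x, 0) for x in x0] + [(x, 1) for x in x1]:
    U_, V_ = U.copy(), V.copy()
    if lbl == 0:
        U_[j] -= 1
    else:
        V_[j] -= 1
    pred = int(V_[j] > U_[j])
    mistakes += int(pred != lbl)
loo = mistakes / (n0 + n1)

# Posterior-expectation estimate under an assumed uniform Dirichlet prior on
# each class's bin probabilities: E[p_j | data] = (U_j + 1) / (n0 + b),
# and similarly for class 1.
Ep = (U + 1) / (n0 + b)
Eq = (V + 1) / (n1 + b)
bayes = c0 * Ep[psi == 1].sum() + c1 * Eq[psi == 0].sum()

print(f"true={true_err:.3f}  resub={resub:.3f}  loo={loo:.3f}  bayes={bayes:.3f}")
```

Because the error of the designed rule is linear in the bin probabilities, the posterior expectation above is exactly the expected true error given the data under the assumed model, which is the sense in which it is an MMSE estimate.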
