Bayesian Minimum Mean-Square Error Estimation for ... - IEEE Xplore

DALTON AND DOUGHERTY: BAYESIAN MINIMUM MEAN-SQUARE ERROR ESTIMATION FOR CLASSIFICATION ERROR—PART I

When finding the posterior probabilities for the parameters, we need only consider the sample points from the corresponding class. We find $\pi^*(\theta_y)$ using Bayes' rule:

$$\pi^*(\theta_y) \propto \pi(\theta_y)\, f(S_y \mid \theta_y), \quad (10)$$

where $S_y$ denotes the sample points from class $y$ and the constant of proportionality can be found by normalizing the integral of $\pi^*(\theta_y)$ to 1. The term $f(S_y \mid \theta_y)$ is called the likelihood function.

Although we call $\pi(\theta_y)$ the "prior probabilities," they are not required to be valid density functions. In particular, the priors are called "improper" if the integral of $\pi(\theta_y)$ is infinite, i.e., if $\pi$ induces a $\sigma$-finite measure but not a finite probability measure. Such priors can be used to represent uniform weight for all parameters in an unbounded range, rather than truncating the range of each parameter to a finite range. When improper priors are used, Bayes' rule does not apply, so we take (10) as a definition, but normalize the posterior distributions to have a unit integral as usual. This is sometimes justified in the sense that the posterior distribution (or a Bayesian estimate) obtained from an improper prior is equivalent to a limit of posterior distributions (or Bayesian estimates) from some sequence of proper prior distributions [29]–[31]. However, extra care must be taken to ensure that the resulting posterior density can be normalized and makes sense.

For the class prior probabilities, we only need to consider the size of each class:

$$\pi^*(c) \propto \pi(c)\, c^{n_0} (1-c)^{n_1}, \quad (11)$$

where we have taken advantage of the fact that $n_0$ given $c$ has a binomial distribution.

We present three useful models for the prior distributions of the a priori class probabilities: beta, uniform, and known.
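Equation (11) can be checked numerically: with a flat prior on $c$, the normalized posterior weights on a grid should reproduce the Beta-binomial posterior mean $(n_0+1)/(n+2)$. A minimal sketch, where the class counts are hypothetical:

```python
import numpy as np

# Numerical check of (11) with a flat prior pi(c) = 1 on [0, 1].
# Hypothetical sample counts: n0 = 8 class-0 points, n1 = 12 class-1 points.
n0, n1 = 8, 12

# Discretize c and apply (11): pi*(c) is proportional to c^n0 * (1 - c)^n1.
c_grid = np.linspace(0.0, 1.0, 1001)
weights = c_grid**n0 * (1.0 - c_grid)**n1
weights /= weights.sum()                # normalize to unit total mass

# Posterior mean on the grid vs. the flat-prior Beta-binomial closed form.
grid_mean = float((c_grid * weights).sum())
closed_form = (n0 + 1) / (n0 + n1 + 2)
```

The grid mean and the closed form agree to within the discretization error of the grid.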
As we will see, the Bayesian MMSE error estimator requires only the posterior expectation, $E_{\pi^*}[c]$. If we assume the prior distribution for $c$ is beta distributed with parameters $\alpha$ and $\beta$, then the posterior distribution for $c$ can be simplified from (11). From this Beta-binomial model, the form of $\pi^*(c)$ is still a beta distribution and, in particular,

$$\pi^*(c) = \frac{c^{n_0+\alpha-1} (1-c)^{n_1+\beta-1}}{B(n_0+\alpha,\, n_1+\beta)},$$

where $B$ is the beta function. The expectation of this distribution is given by [32]

$$E_{\pi^*}[c] = \frac{n_0+\alpha}{n+\alpha+\beta}.$$

In the special case where we have uniform priors, which assume that initially all parameters are equally likely, we have $\alpha = \beta = 1$, and

$$\pi^*(c) = \frac{c^{n_0} (1-c)^{n_1}}{B(n_0+1,\, n_1+1)}, \quad (12)$$

$$E_{\pi^*}[c] = \frac{n_0+1}{n+2}. \quad (13)$$

Finally, to apply a known prior we define the parameter $c$ to have a trivial sample space with one point. Then the expectation is simply the known value for $c$, regardless of the data. Note that if stratified sampling is used, $c$ is essentially given in the data and $E_{\pi^*}[c] = n_0/n$.

D. Evaluating the Bayesian MMSE Error Estimator

Owing to the posterior independence between $c$ and $\theta_y$, and since $\varepsilon_y$ is a function of $\theta_y$ only, the Bayesian MMSE error estimator can be expressed as

$$\hat{\varepsilon} = E_{\pi^*}[c]\, E_{\pi^*}[\varepsilon_0] + (1 - E_{\pi^*}[c])\, E_{\pi^*}[\varepsilon_1], \quad (14)$$

where $E_{\pi^*}[c]$ depends on our prior assumptions about the prior class probability. For example, if we assume flat priors for $c$ and apply (13), then

$$\hat{\varepsilon} = \frac{n_0+1}{n+2}\, E_{\pi^*}[\varepsilon_0] + \frac{n_1+1}{n+2}\, E_{\pi^*}[\varepsilon_1].$$

$E_{\pi^*}[\varepsilon_y]$ may be viewed as the posterior expectation for the error contributed by class $y$. With a fixed classifier and given $\theta_y$, the true error, $\varepsilon_y(\theta_y)$, is deterministic and

$$E_{\pi^*}[\varepsilon_y] = \int \varepsilon_y(\theta_y)\, \pi^*(\theta_y)\, d\theta_y. \quad (15)$$

In all examples here and in the sequel, we derive $E_{\pi^*}[\varepsilon_y]$ for each class using (15), find $E_{\pi^*}[c]$ according to our prior model for $c$, and refer to (14) for the complete Bayesian error estimator. When a closed-form solution is not available, (15) can be approximated using the following method. First, discretize the parameter space by generating a list of parameters to test.
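Combining (13) with (14) is a one-line computation. A minimal sketch, where the class counts and the posterior-expected class errors are hypothetical placeholder values:

```python
# Flat-prior Bayesian MMSE error estimate, eq. (14) with E[c] from (13).
# The expected class errors exp_eps0 and exp_eps1 would come from (15);
# the numeric values passed below are hypothetical placeholders.

def bayesian_mmse_error(n0, n1, exp_eps0, exp_eps1):
    """Combine class-conditional expected errors via E[c] = (n0+1)/(n+2)."""
    n = n0 + n1
    e_c = (n0 + 1) / (n + 2)          # flat-prior posterior mean of c, eq. (13)
    return e_c * exp_eps0 + (1.0 - e_c) * exp_eps1

estimate = bayesian_mmse_error(8, 12, 0.15, 0.25)
```

With these placeholders the class-0 term is weighted by 9/22 and the class-1 term by 13/22, the two weights summing to 1 as required.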
In our simulations, we generated separate lists (uniformly spaced for flat priors) for each class near the estimated parameters, since these will have the most influence on the Bayesian error estimator. Then calculate the posterior probabilities, $\pi^*(\theta_y)$, for every set of parameters in the list and normalize these to sum to 1. With the classifier fixed from the observed samples, $\varepsilon_y(\theta_y)$ is deterministic for every set of parameters, so we may compute the true error corresponding to each $\theta_y$ in the list. The approximate Bayesian error estimate is then found by averaging the true errors, $\varepsilon_y(\theta_y)$, weighted by the probabilities $\pi^*(\theta_y)$. Computing the approximate error estimate this way is very laborious, and the results may deviate from the exact error estimate if the sample space is not discretized finely enough. From a study using Gaussian distributions, as a rule of thumb we found that each degree of freedom in the parameters should be discretized into 6 to 15 bins, and the size of the list of parameters to test tends to grow exponentially with the number of degrees of freedom.

IV. DISCRETE CLASSIFICATION

We next illustrate the Bayesian error estimator applied to the discrete classification setting. Discrete classification, also
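The grid approximation of (15) described above can be sketched for a hypothetical one-dimensional Gaussian class with known unit variance and unknown mean; the sample values, the classifier threshold, and the grid settings are all illustrative assumptions:

```python
import math

# Grid approximation of E[eps_0] in (15) for a 1-D Gaussian class with known
# variance 1 and unknown mean theta, under a flat prior on theta.
# The fixed classifier assigns class 0 when x < t.

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_class_error(samples, t, half_width=3.0, bins=601):
    """Grid theta near the estimated mean, weight each grid point by its
    (flat-prior) posterior probability, normalize the weights to sum to 1,
    and average the corresponding true errors."""
    mean = sum(samples) / len(samples)
    step = 2.0 * half_width / (bins - 1)
    grid = [mean - half_width + i * step for i in range(bins)]
    # Flat prior: posterior weight is proportional to the Gaussian likelihood.
    log_w = [-0.5 * sum((x - th) ** 2 for x in samples) for th in grid]
    peak = max(log_w)                          # subtract max for stability
    w = [math.exp(lw - peak) for lw in log_w]
    total = sum(w)
    # True error of class 0 at theta: P(X >= t | theta) = 1 - Phi(t - theta).
    errs = [1.0 - phi(t - th) for th in grid]
    return sum(wi * e for wi, e in zip(w, errs)) / total

samples0 = [-0.3, 0.4, -1.1, 0.2, -0.5]   # hypothetical class-0 sample
e0 = expected_class_error(samples0, t=1.0)
```

Because this toy model has a closed-form answer (the posterior on theta is Gaussian), the grid result can be checked against it; in higher-dimensional models the grid grows exponentially, as noted above.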
