Lehrstuhl für Informatik VI                              Tuesday, June 21st, 2011
Rheinisch–Westfälische Technische Hochschule Aachen      Simon Wiesler & Patrick Lehnen
Prof. Dr.–Ing. H. Ney

9. Exercise Sheet Pattern Recognition and Neural Networks

The solutions may be submitted in groups of up to three students until the next exercise lesson on Friday, July 1st, 2011, either in the secretariat of the Lehrstuhl für Informatik VI, by email to wiesler@cs.rwth-aachen.de and lehnen@cs.rwth-aachen.de, or at the exercise lesson. The problems marked with (** ...) are optional. Bachelor students do not need to submit this exercise sheet; all points are optional for them.
1. (Repetition) What is ...?

Give a description of each of the terms listed below. Each description should be about one paragraph long.

(a) Bayes decision rule (** 1P)
(b) Bayes error (** 1P)
(c) Maximum Likelihood (** 1P)
2. k-Nearest Neighbour Classification (* 5P)

The k-nearest neighbour method is an extension of the nearest neighbour classifier: the classification decision c(x) is taken by a majority vote among the k training samples closest to the test sample x, whereas the nearest neighbour method looks only at the single closest training sample. For k = 1, the k-nearest neighbour method reduces to the ordinary nearest neighbour classifier.
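The majority vote described above can be sketched as follows. This is an illustrative Python/NumPy sketch, not the Netlab implementation the exercise asks for; all function and variable names are invented:

```python
import numpy as np

def knn_classify(k, train_x, train_c, x):
    """Classify x by majority vote among the k nearest training samples.

    Illustrative sketch: Euclidean distance, ties broken towards the
    smallest class label by np.argmax over the vote counts.
    """
    dists = np.linalg.norm(train_x - x, axis=1)   # distance to every training sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    votes = np.bincount(train_c[nearest])         # count the class labels among them
    return int(np.argmax(votes))                  # majority class

# Tiny example: two clusters in 2D
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_c = np.array([0, 0, 1, 1])
print(knn_classify(3, train_x, train_c, np.array([0.2, 0.1])))  # -> 0
```

For k = 1 this degenerates to the ordinary nearest neighbour decision, as stated above.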
Use the k-nearest neighbour implementation from “Netlab”¹ for classification on the USPS digit recognition task (see Exercise Sheet 6). Tabulate the error rate on the test and training set for k in the range from 1 to 10. Interpret your results: what effect does the parameter k have?

Hints: You will need the functions knn and knnfwd from Netlab, which are described in the help pages “knn.html” and “knnfwd.html” and the demo “demknn1.html”. A Matlab solution framework is provided at pub/PatRec/usps/kNN framework.m on our server wasserstoff.informatik.rwth-aachen.de.
¹ http://www1.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/
3. Gaussian Mixture Distributions and the EM Algorithm

Consider the estimation of the parameters of a Gaussian mixture distribution with a pooled diagonal covariance matrix,

    p(x|k) = Σ_{i=1}^{I_k} c_{k,i} · N(x | µ_{k,i}, Σ),

using the Expectation Maximization (EM) algorithm, applied to the image recognition system for handwritten digits from the US Postal Service (USPS) corpus as discussed in Exercise 6 (cf. directories pub/PatRec/usps/ and pub/PatRec/usps/SOLUTION on our server wasserstoff.informatik.rwth-aachen.de).
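For reference, the mixture density above can be evaluated numerically as sketched below. This is an illustrative Python/NumPy sketch (the exercise itself uses Matlab); all names are invented, and the log-sum-exp trick is used for numerical stability:

```python
import numpy as np

def log_mixture_density(x, c, mu, var):
    """log p(x|k) = log sum_i c_{k,i} N(x | mu_{k,i}, Sigma) for one class k,
    with a shared diagonal covariance Sigma = diag(var).

    x   : (D,)   observation
    c   : (I,)   mixture weights of class k
    mu  : (I, D) mean vectors of class k
    var : (D,)   pooled diagonal variances sigma_d^2
    """
    diff = x[None, :] - mu                                        # (I, D)
    log_gauss = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=1)
    return np.logaddexp.reduce(np.log(c) + log_gauss)             # log-sum-exp

# Sanity check: for I = 1 this is the usual Gaussian log-density;
# a 2D standard normal evaluated at the origin gives -log(2*pi).
val = log_mixture_density(np.zeros(2), np.array([1.0]), np.zeros((1, 2)), np.ones(2))
print(val)  # -> -1.8378... = -log(2*pi)
```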
(a) Derive the EM estimates of the mean vectors µ_{k,i}, the pooled diagonal covariance matrix Σ with Σ_dd ≡ σ_d², and the mixture weights c_{k,i} of Gaussian mixture distributions, for k = 1, ..., K and i = 1, ..., I_k. (* 5P)

(b) Implement the EM estimation of the above Gaussian mixture distributions for USPS. Use the solution presented in Exercise 6 as a starting point and use the Maximum Likelihood training result from Exercise 6 as initialization, which you find in the correct format at pub/PatRec/usps/usps.mixture.initparam on our server wasserstoff. Your implementation should take the number of EM re-estimation iterations as an additional input parameter.

In order to be able to increase the number I_k of densities per class, double each density by perturbing it by a small amount in opposite directions, µ → {µ − ε, µ + ε}, and distribute the mixture weight equally between both new densities. Make sure that the perturbations ε are proportional to, and smaller than, the corresponding variances.

Assume the training data is given by D-dimensional observation vectors x_n ∈ ℝ^D, n = 1, ..., N. Use the format introduced in pub/PatRec/usps/usps.mixture.README on our server wasserstoff to read the initial and store the resulting parameter sets. Use pub/PatRec/usps/usps.train to produce parameter sets for I_k = 2 and I_k = 4 using 10 EM iterations each. Monitor the average score (log-likelihood) per observation for each iteration and check that it increases. (* 6P)

(c) Generalize the recognizer for the USPS problem from Exercise 6 to the case of mixture densities together with pooled diagonal covariance matrices. Recognize pub/PatRec/usps/usps.test using the parameters obtained above for I_k = 2 and I_k = 4 and compare the results to those obtained in Exercise 6. (* 4P)
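One possible shape of a single EM iteration and of the density splitting for part (b), sketched in Python/NumPy rather than Matlab. All names are invented; the sketch handles the mixture of one class only, with the variances pooled over its densities (with several classes, the squared deviations would be accumulated over all classes before normalizing, so that Σ is pooled globally):

```python
import numpy as np

def em_step(x, c, mu, var):
    """One EM iteration for a single class's Gaussian mixture with a
    diagonal covariance pooled over its densities (illustrative sketch).

    x   : (N, D) training observations of this class
    c   : (I,)   mixture weights
    mu  : (I, D) mean vectors
    var : (D,)   diagonal variances sigma_d^2
    """
    N = x.shape[0]
    # E-step: responsibilities gamma_{n,i} ~ c_i N(x_n | mu_i, var),
    # computed in the log domain for numerical stability
    diff = x[:, None, :] - mu[None, :, :]                         # (N, I, D)
    log_p = np.log(c) - 0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
    log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)  # log p(x_n)
    gamma = np.exp(log_p - log_norm)                              # (N, I)
    # M-step: closed-form re-estimates
    Ni = gamma.sum(axis=0)
    c_new = Ni / N                                                # mixture weights
    mu_new = (gamma.T @ x) / Ni[:, None]                          # mean vectors
    diff_new = x[:, None, :] - mu_new[None, :, :]
    var_new = np.sum(gamma[:, :, None] * diff_new**2, axis=(0, 1)) / N
    return c_new, mu_new, var_new, log_norm.mean()  # last value: avg. log-likelihood

def split_densities(c, mu, var, factor=0.1):
    """Double the number of densities: mu -> {mu - eps, mu + eps}, with eps
    proportional to (and smaller than) the corresponding variances, and the
    weight of each density shared equally by its two offspring."""
    eps = factor * var
    return np.concatenate([c / 2, c / 2]), np.concatenate([mu - eps, mu + eps])
```

Monitoring the returned average log-likelihood over the iterations provides the required check that it never decreases.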
Hints: Use the initial parameter set pub/PatRec/usps/usps.mixture.initparam for the case I_k = 1 as a test case both for your implementation of the training (without previous splitting, the parameters for the case I_k = 1 must remain unchanged) and for your classifier (the result must be the same as that of the old implementation using pooled diagonal covariances with one Gaussian per class).
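The generalized recognizer of part (c) reduces to the Bayes decision rule over the per-class mixture scores. A minimal Python/NumPy sketch, assuming uniform class priors and invented names (class_params maps each class label k to its weights and means, with the variances pooled over all classes):

```python
import numpy as np

def log_mixture_density(x, c, mu, var):
    # log p(x|k) for one class's mixture, via log-sum-exp (names invented)
    diff = x[None, :] - mu                                        # (I, D)
    log_gauss = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=1)
    return np.logaddexp.reduce(np.log(c) + log_gauss)

def classify(x, class_params, var):
    """Bayes decision rule with uniform priors: r(x) = argmax_k log p(x|k).

    class_params : dict mapping class label k -> (c_k, mu_k)
    var          : (D,) diagonal variances pooled over all classes
    """
    scores = {k: log_mixture_density(x, c, mu, var)
              for k, (c, mu) in class_params.items()}
    return max(scores, key=scores.get)
```

With I_k = 1 for every class, this reproduces the single-Gaussian classifier with pooled diagonal covariances from Exercise 6, which is exactly the consistency check the hint above describes.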