Data Mining: Practical Machine Learning Tools and ... - LIDeCC

The normalization process makes the final result correct. Note that the final outcome is not a particular cluster but rather the probabilities with which $x$ belongs to cluster A and cluster B.

The EM algorithm

The problem is that we know neither of these things: not the distribution that each training instance came from, nor the five parameters of the mixture model. So we adopt the procedure used for the k-means clustering algorithm and iterate. Start with initial guesses for the five parameters, use them to calculate the cluster probabilities for each instance, use these probabilities to reestimate the parameters, and repeat. (If you prefer, you can start with guesses for the classes of the instances instead.) This is called the EM algorithm, for expectation-maximization. The first step, calculation of the cluster probabilities (which are the "expected" class values), is "expectation"; the second, calculation of the distribution parameters, is "maximization" of the likelihood of the distributions given the data.

A slight adjustment must be made to the parameter estimation equations to account for the fact that it is only cluster probabilities, not the clusters themselves, that are known for each instance. These probabilities just act like weights. If $w_i$ is the probability that instance $i$ belongs to cluster A, the mean and standard deviation for cluster A are

$$\mu_A = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$

$$\sigma_A^2 = \frac{w_1 (x_1 - \mu_A)^2 + w_2 (x_2 - \mu_A)^2 + \cdots + w_n (x_n - \mu_A)^2}{w_1 + w_2 + \cdots + w_n}$$

where now the $x_i$ are all the instances, not just those belonging to cluster A. (This differs in a small detail from the estimate for the standard deviation given on page 101. Technically speaking, this is a "maximum likelihood" estimator for the variance, whereas the formula on page 101 gives an "unbiased" estimator. The difference is not important in practice.)

Now consider how to terminate the iteration. The k-means algorithm stops when the classes of the instances don't change from one iteration to the next; a "fixed point" has been reached. In the EM algorithm things are not quite so easy: the algorithm converges toward a fixed point but never actually gets there. We can see how close it is getting, however, by calculating the overall likelihood of the data given the current values of the five parameters. This overall likelihood is obtained by multiplying the probabilities of the individual instances $i$:

$$\prod_i \left( p_A \Pr[x_i \mid A] + p_B \Pr[x_i \mid B] \right)$$
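To make the procedure concrete, the following is a minimal sketch of this two-cluster EM loop in Python. It assumes, as in the discussion above, a one-dimensional mixture of two Gaussians with five parameters $\mu_A$, $\sigma_A$, $\mu_B$, $\sigma_B$, and $p_A$ (with $p_B = 1 - p_A$); the function names `em_two_gaussians`, `normal_pdf`, and `log_likelihood` and the tolerance-based stopping rule are illustrative choices, not taken from the book.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, mu_a, sigma_a, mu_b, sigma_b, p_a):
    """Log of the overall likelihood: the product over instances of the
    mixture density, computed as a sum of logs to avoid underflow."""
    return sum(math.log(p_a * normal_pdf(x, mu_a, sigma_a) +
                        (1.0 - p_a) * normal_pdf(x, mu_b, sigma_b))
               for x in xs)

def em_two_gaussians(xs, mu_a, sigma_a, mu_b, sigma_b, p_a, tol=1e-9):
    """Iterate expectation and maximization until the overall
    log-likelihood stops improving by more than tol."""
    prev_ll = log_likelihood(xs, mu_a, sigma_a, mu_b, sigma_b, p_a)
    while True:
        # Expectation step: cluster probabilities w_i = Pr[A | x_i] by
        # Bayes' rule, normalized so the two probabilities sum to 1.
        ws = []
        for x in xs:
            a = p_a * normal_pdf(x, mu_a, sigma_a)
            b = (1.0 - p_a) * normal_pdf(x, mu_b, sigma_b)
            ws.append(a / (a + b))
        # Maximization step: weighted mean and standard deviation, exactly
        # as in the equations above. Every instance contributes to both
        # clusters, weighted by its cluster probability. (A production
        # implementation would guard against a sigma collapsing to zero.)
        sw = sum(ws)                      # total weight for cluster A
        mu_a = sum(w * x for w, x in zip(ws, xs)) / sw
        sigma_a = math.sqrt(sum(w * (x - mu_a) ** 2 for w, x in zip(ws, xs)) / sw)
        sv = len(xs) - sw                 # cluster B weights are 1 - w_i
        mu_b = sum((1.0 - w) * x for w, x in zip(ws, xs)) / sv
        sigma_b = math.sqrt(sum((1.0 - w) * (x - mu_b) ** 2
                                for w, x in zip(ws, xs)) / sv)
        p_a = sw / len(xs)
        # Termination: EM converges toward its fixed point without ever
        # reaching it, so stop once the log-likelihood gain is negligible.
        ll = log_likelihood(xs, mu_a, sigma_a, mu_b, sigma_b, p_a)
        if ll - prev_ll < tol:
            return mu_a, sigma_a, mu_b, sigma_b, p_a
        prev_ll = ll
```

Comparing log-likelihoods rather than the raw product is a standard practical choice: with many instances the product of probabilities underflows floating-point range, whereas the logarithm turns it into a stable sum. The stopping rule, iterating until the increase in log-likelihood becomes negligible, is the usual answer to the fact that EM approaches its fixed point only in the limit.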
