Data Mining: Practical Machine Learning Tools and ... - LIDeCC
$-w_0 - w_1 a_1 - \dots - w_k a_k = 0.$

Because this is a linear equality in the attribute values, the boundary is a linear plane, or hyperplane, in instance space. It is easy to visualize sets of points that cannot be separated by a single hyperplane, and these cannot be discriminated correctly by logistic regression.

Multiresponse linear regression suffers from the same problem. Each class receives a weight vector calculated from the training data. Focus for the moment on a particular pair of classes. Suppose the weight vector for class 1 is

$w_0^{(1)} + w_1^{(1)} a_1 + w_2^{(1)} a_2 + \dots + w_k^{(1)} a_k$

and the same for class 2 with appropriate superscripts. Then an instance will be assigned to class 1 rather than class 2 if

$w_0^{(1)} + w_1^{(1)} a_1 + \dots + w_k^{(1)} a_k > w_0^{(2)} + w_1^{(2)} a_1 + \dots + w_k^{(2)} a_k.$

In other words, it will be assigned to class 1 if

$(w_0^{(1)} - w_0^{(2)}) + (w_1^{(1)} - w_1^{(2)}) a_1 + \dots + (w_k^{(1)} - w_k^{(2)}) a_k > 0.$

This is a linear inequality in the attribute values, so the boundary between each pair of classes is a hyperplane. The same holds true when performing pairwise classification. The only difference is that the boundary between two classes is governed by the training instances in those classes and is not influenced by the other classes. (A short numeric sketch of this pairwise comparison appears at the end of this section.)

Linear classification using the perceptron

Logistic regression attempts to produce accurate probability estimates by maximizing the probability of the training data. Of course, accurate probability estimates lead to accurate classifications. However, it is not necessary to perform probability estimation if the sole purpose of the model is to predict class labels. A different approach is to learn a hyperplane that separates the instances pertaining to the different classes; let's assume that there are only two of them. If the data can be separated perfectly into two groups using a hyperplane, it is said to be linearly separable. It turns out that if the data is linearly separable, there is a very simple algorithm for finding a separating hyperplane.

The algorithm is called the perceptron learning rule. Before looking at it in detail, let's examine the equation for a hyperplane again:

$w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0.$

Here, $a_1, a_2, \dots, a_k$ are the attribute values, and $w_0, w_1, \dots, w_k$ are the weights that define the hyperplane. We will assume that each training instance $a_1, a_2, \dots$ is extended by an additional attribute $a_0$ that always has the value 1 (as we did in the case of linear regression). This extension, which is called the bias, just folds the constant term into the weighted sum, so that $w_0$ can be treated like any other weight.
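To make the pairwise comparison between class weight vectors concrete, here is a minimal sketch in Python (not from the book; the weight vectors and the instance are made-up illustrative values). It checks that comparing the two class scores directly is equivalent to testing on which side of the difference-vector hyperplane the instance lies:

```python
# Illustrative sketch: choosing between two classes in multiresponse
# linear regression. All values below are invented for illustration.

w1 = [0.5, 1.2, -0.7]   # class 1 weights: w0, w1, w2
w2 = [0.1, 0.4,  0.9]   # class 2 weights
a  = [1.0, 2.0,  1.5]   # instance, with a0 = 1 prepended for the bias

score1 = sum(w * x for w, x in zip(w1, a))
score2 = sum(w * x for w, x in zip(w2, a))

# Comparing the two scores is the same as testing the sign of the
# hyperplane defined by the difference vector w1 - w2.
diff = [u - v for u, v in zip(w1, w2)]
assert (score1 > score2) == (sum(d * x for d, x in zip(diff, a)) > 0)

print("class 1" if score1 > score2 else "class 2")
```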
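The perceptron learning rule itself is spelled out in the text that follows this page; as a preview, here is a minimal sketch of the standard formulation, assuming two linearly separable classes labeled +1 and -1 and instances already extended with the bias attribute $a_0 = 1$. The data values are invented purely for illustration:

```python
# Minimal sketch of the standard perceptron learning rule, assuming
# linearly separable data. Instances include a0 = 1; labels are +1/-1.
# The data below is made up for illustration.

data = [([1.0,  2.0,  1.0], +1),
        ([1.0,  0.5,  0.3], +1),
        ([1.0, -1.0, -0.5], -1),
        ([1.0, -0.4, -2.0], -1)]

w = [0.0, 0.0, 0.0]          # weights, including the bias weight w0

converged = False
while not converged:
    converged = True
    for a, label in data:
        score = sum(wi * ai for wi, ai in zip(w, a))
        predicted = +1 if score > 0 else -1
        if predicted != label:
            # Misclassified: add the instance to the weight vector for
            # a positive example, subtract it for a negative one.
            w = [wi + label * ai for wi, ai in zip(w, a)]
            converged = False

print("separating hyperplane weights:", w)
```

For linearly separable data the loop is guaranteed to terminate, by the perceptron convergence theorem; the resulting weights define a separating hyperplane, though not necessarily one with a large margin.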
