CHAPTER 4 | ALGORITHMS: THE BASIC METHODS

To see why this works, consider the situation after an instance a pertaining to the first class has been added:

$(w_0 + a_0)a_0 + (w_1 + a_1)a_1 + (w_2 + a_2)a_2 + \cdots + (w_k + a_k)a_k.$

This means the output for a has increased by

$a_0 \times a_0 + a_1 \times a_1 + a_2 \times a_2 + \cdots + a_k \times a_k.$

This number is always positive. Thus the hyperplane has moved in the correct direction for classifying instance a as positive. Conversely, if an instance belonging to the second class is misclassified, the output for that instance decreases after the modification, again moving the hyperplane in the correct direction.

These corrections are incremental and can interfere with earlier updates. However, it can be shown that the algorithm converges in a finite number of iterations if the data is linearly separable. Of course, if the data is not linearly separable, the algorithm will not terminate, so an upper bound needs to be imposed on the number of iterations when this method is applied in practice.

The resulting hyperplane is called a perceptron, and it's the grandfather of neural networks (we return to neural networks in Section 6.3). Figure 4.10(b) represents the perceptron as a graph with nodes and weighted edges, imaginatively termed a "network" of "neurons." There are two layers of nodes: input and output. The input layer has one node for every attribute, plus an extra node that is always set to one. The output layer consists of just one node. Every node in the input layer is connected to the output layer. The connections are weighted, and the weights are those numbers found by the perceptron learning rule.

When an instance is presented to the perceptron, its attribute values serve to "activate" the input layer. They are multiplied by the weights and summed up at the output node. If the weighted sum is greater than 0 the output signal is 1, representing the first class; otherwise, it is -1, representing the second.

Linear classification using Winnow

The perceptron algorithm is not the only method that is guaranteed to find a separating hyperplane for a linearly separable problem. For datasets with binary attributes there is an alternative known as Winnow, shown in Figure 4.11(a). The structure of the two algorithms is very similar. Like the perceptron, Winnow only updates the weight vector when a misclassified instance is encountered: it is mistake driven.

The two methods differ in how the weights are updated. The perceptron rule employs an additive mechanism that alters the weight vector by adding (or subtracting) the instance's attribute vector. Winnow employs multiplicative updates and alters weights individually by multiplying them by the user-specified parameter α (or its inverse). The attribute values a_i are either 0 or 1 because we are dealing with binary attributes.
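To make the contrast between the two update rules concrete, here is a minimal sketch of the perceptron learning rule described above. This is not the book's code (its examples use Weka); the function name, the +1/-1 class encoding, and the iteration cap are our assumptions.

```python
# Minimal sketch of the perceptron learning rule (hypothetical names,
# not the book's Weka code). Classes are assumed to be +1 and -1.

def train_perceptron(instances, classes, max_iterations=100):
    """Return a weight vector, including the bias weight."""
    # The extra input node that is always set to one plays the role of a bias.
    extended = [[1.0] + list(x) for x in instances]
    w = [0.0] * len(extended[0])
    for _ in range(max_iterations):  # cap iterations: data may not be separable
        mistakes = 0
        for x, y in zip(extended, classes):
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if output != y:          # mistake driven: change w only on errors
                # Add the instance's attribute vector for class +1,
                # subtract it for class -1.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:            # every instance classified correctly
            break
    return w
```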

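A corresponding sketch of Winnow's multiplicative update, under the same assumptions. The threshold theta and its default value are our choice for illustration, not taken from Figure 4.11(a).

```python
# Minimal sketch of Winnow's multiplicative update (hypothetical names).
# Attributes are binary (0 or 1); alpha > 1 is the user-specified
# update parameter; theta is the classification threshold.

def train_winnow(instances, classes, alpha=2.0, theta=None, max_iterations=100):
    """Return a weight vector for binary instances; classes are +1 and -1."""
    n = len(instances[0])
    if theta is None:
        theta = n / 2.0              # illustrative default, not from the book
    w = [1.0] * n                    # weights start at one and stay positive
    for _ in range(max_iterations):
        mistakes = 0
        for x, y in zip(instances, classes):
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else -1
            if predicted != y:       # mistake driven, like the perceptron
                mistakes += 1
                for i in range(n):
                    if x[i] == 1:            # only attributes present in x change
                        if y == 1:
                            w[i] *= alpha    # promote: output was too small
                        else:
                            w[i] /= alpha    # demote: output was too large
        if mistakes == 0:
            break
    return w
```

Note how a weight is touched only when its attribute value is 1: multiplying by alpha or its inverse leaves the contribution of absent attributes unchanged, which is what makes the multiplicative scheme effective when only a few attributes are relevant.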