
Data Mining: Practical Machine Learning Tools and ... - LIDeCC


336 CHAPTER 7 | TRANSFORMATIONS: ENGINEERING THE INPUT AND OUTPUT

We have identified one property of a good error-correcting code: the code words must be well separated in terms of their Hamming distance. Because they comprise the rows of the code table, this property is called row separation. There is a second requirement that a good error-correcting code should fulfill: column separation. The Hamming distance between every pair of columns must be large, as must the distance between each column and the complement of every other column. In Table 7.1(b), the seven columns are separated from one another (and their complements) by at least 1 bit.

Column separation is necessary because if two columns are identical (or if one is the complement of another), the corresponding classifiers will make the same errors. Error correction is weakened if the errors are correlated, in other words, if many bit positions are simultaneously incorrect. The greater the distance between columns, the more errors are likely to be corrected.

With fewer than four classes it is impossible to construct an effective error-correcting code, because good row separation and good column separation cannot be achieved simultaneously. For example, with three classes there are only eight possible columns (2^3), four of which are complements of the other four. Moreover, columns with all zeroes or all ones provide no discrimination. This leaves just three possible columns, and the resulting code is not error correcting at all. (In fact, it is the standard "one-per-class" encoding.)

If there are few classes, an exhaustive error-correcting code such as the one in Table 7.1(b) can be built. In an exhaustive code for k classes, the columns comprise every possible k-bit string, except for complements and the trivial all-zero or all-one strings. Each code word contains 2^(k-1) - 1 bits.
The code is constructed as follows: the code word for the first class consists of all ones; that for the second class has 2^(k-2) zeroes followed by 2^(k-2) - 1 ones; the third has 2^(k-3) zeroes followed by 2^(k-3) ones followed by 2^(k-3) zeroes followed by 2^(k-3) - 1 ones; and so on. The ith code word consists of alternating runs of 2^(k-i) zeroes and ones, the last run being one short.

With more classes, exhaustive codes are infeasible because the number of columns increases exponentially and too many classifiers have to be built. In that case more sophisticated methods are employed, which can build a code with good error-correcting properties from a smaller number of columns.

Error-correcting output codes do not work for local learning algorithms such as instance-based learners, which predict the class of an instance by looking at nearby training instances. In the case of a nearest-neighbor classifier, all output bits would be predicted using the same training instance. The problem can be circumvented by using different attribute subsets to predict each output bit, decorrelating the predictions.
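The run-based construction described above is mechanical enough to sketch in code. The following Python sketch (the function names are illustrative, not from the book) builds the exhaustive code table for k classes and decodes a vector of predicted output bits by picking the class whose code word is nearest in Hamming distance:

```python
def exhaustive_code(k):
    """Exhaustive error-correcting code table for k classes.

    Row i (1-based) is the code word for class i: alternating runs of
    2^(k-i) zeroes and ones, truncated to 2^(k-1) - 1 bits, so the last
    run comes up one short. The first class starts with ones instead,
    which yields the all-ones code word.
    """
    n = 2 ** (k - 1) - 1              # code word length
    table = []
    for i in range(1, k + 1):         # classes 1..k
        run = 2 ** (k - i)            # run length for class i
        bits, val = [], 1 if i == 1 else 0
        while len(bits) < n:
            bits.extend([val] * run)
            val = 1 - val
        table.append(bits[:n])        # truncation shortens the last run
    return table

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(outputs, table):
    """Map predicted output bits to the class (1-based) whose code word
    is nearest in Hamming distance."""
    return 1 + min(range(len(table)), key=lambda i: hamming(outputs, table[i]))

# For k = 4 this reproduces a 7-bit exhaustive code:
# class 1: 1111111, class 2: 0000111, class 3: 0011001, class 4: 0101010
table = exhaustive_code(4)

# Every pair of code words is 4 bits apart (row separation), so any
# single-bit error is corrected:
flipped = [1, 0, 0, 0, 1, 1, 1]       # class 2's code word with bit 1 flipped
print(decode(flipped, table))          # -> 2
```

Column separation can be checked the same way: transpose the table and measure the pairwise Hamming distances between the columns, and between each column and the complements of the others.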
