…attributes, it will be easy to learn how to tell the classes apart with a simple decision tree or rule algorithm. Discretizing a2 is no problem. For a1, however, the first and last intervals will have opposite labels (dot and triangle, respectively). The second will have whichever label happens to occur most in the region from 0.3 through 0.7 (it is in fact dot for the data in Figure 7.4). Either way, this label must inevitably be the same as one of the adjacent labels, whatever the class probability happens to be in the middle region. Thus this discretization will not be achieved by any method that minimizes the error counts, because such a method cannot produce adjacent intervals with the same label.

The point is that what changes as the value of a1 crosses the boundary at 0.3 is not the majority class but the class distribution. The majority class remains dot. The distribution, however, changes markedly, from 100% before the boundary to just over 50% after it. And the distribution changes again as the boundary at 0.7 is crossed, from 50% to 0%. Entropy-based discretization methods are sensitive to changes in the distribution even though the majority class does not change. Error-based methods are not.

Converting discrete to numeric attributes

There is a converse problem to discretization. Some learning algorithms, notably the nearest-neighbor instance-based method and numeric prediction techniques involving regression, naturally handle only attributes that are numeric. How can they be extended to nominal attributes?

In instance-based learning, as described in Section 4.7, discrete attributes can be treated as numeric by defining the "distance" between two nominal values that are the same as 0 and between two values that are different as 1, regardless of the actual values involved. Rather than modifying the distance function, this can be achieved using an attribute transformation: replace a k-valued nominal attribute with k synthetic binary attributes, one for each value, indicating whether the attribute has that value or not. If the attributes have equal weight, this achieves the same effect on the distance function. The distance is insensitive to the attribute values because only "same" or "different" information is encoded, not the shades of difference that may be associated with the various possible values of the attribute. More subtle distinctions can be made if the attributes have weights reflecting their relative importance.

If the values of the attribute can be ordered, more possibilities arise. For a numeric prediction problem, the average class value corresponding to each value of a nominal attribute can be calculated from the training instances and used to determine an ordering; this technique was introduced for model trees in Section 6.5. (It is hard to come up with an analogous way of ordering attribute values for a classification problem.) An ordered nominal attribute can be replaced with an integer in the obvious way, but this implies not just…
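Returning to the discretization discussion above, the following minimal sketch illustrates why a boundary such as the one at 0.3 is invisible to an error-based criterion but visible to an entropy-based one. The class counts are illustrative assumptions only (100% dot below the boundary, just over 50% dot inside it); they are not the actual data of Figure 7.4, and the helper functions are ad hoc rather than any particular implementation.

import math

def entropy(counts):
    """Entropy (in bits) of a list of class counts, e.g. (dots, triangles)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_entropy(left, right):
    """Weighted average entropy of a two-way split."""
    n = sum(left) + sum(right)
    return (sum(left) * entropy(left) + sum(right) * entropy(right)) / n

def split_errors(left, right):
    """Errors made by predicting the majority class in each interval."""
    return (sum(left) - max(left)) + (sum(right) - max(right))

# Illustrative (dots, triangles) counts: all dots below 0.3,
# just over 50% dots between 0.3 and 0.7.
below = (10, 0)
middle = (6, 5)
merged = (below[0] + middle[0], below[1] + middle[1])

# The error count is identical with or without a cut at 0.3, because the
# majority class (dot) is the same on both sides ...
print(split_errors(below, middle), "errors with a cut at 0.3")
print(sum(merged) - max(merged), "errors with no cut")

# ... but the cut lowers the weighted entropy, so an entropy-based
# discretizer still has a reason to place a boundary there.
print(round(entropy(merged), 3), "bits without the cut")
print(round(split_entropy(below, middle), 3), "bits with the cut")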

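The transformation of a k-valued nominal attribute into k synthetic binary attributes can be sketched as follows. This is only an illustration of the idea described above, using the familiar outlook attribute (sunny, overcast, rainy) as the example; the function names are hypothetical.

def one_hot(value, values):
    """Replace a k-valued nominal attribute with k synthetic binary
    attributes, one per possible value (1 if the attribute has that
    value, 0 otherwise)."""
    return [1 if value == v else 0 for v in values]

def hamming(x, y):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(x, y))

outlook_values = ["sunny", "overcast", "rainy"]

print(one_hot("sunny", outlook_values))     # [1, 0, 0]
print(one_hot("overcast", outlook_values))  # [0, 1, 0]

# Identical nominal values give distance 0; any two different values give
# the same fixed distance (two indicator bits flip), so only "same" versus
# "different" is encoded, mirroring the 0/1 distance on nominal values.
print(hamming(one_hot("sunny", outlook_values),
              one_hot("sunny", outlook_values)))   # 0
print(hamming(one_hot("sunny", outlook_values),
              one_hot("rainy", outlook_values)))   # 2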
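For the ordering idea used with model trees, a small sketch of computing the average class value per nominal value and deriving an ordering might look like this. The training pairs below are invented for illustration; they are not data from the book, and the function is an assumption rather than a specific implementation.

from collections import defaultdict

def order_by_mean_target(instances):
    """Given (nominal_value, numeric_class_value) training pairs, return the
    nominal values ordered by their average class value, together with the
    integer code each value receives under that ordering."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in instances:
        sums[value] += target
        counts[value] += 1
    ordered = sorted(sums, key=lambda v: sums[v] / counts[v])
    return ordered, {v: i for i, v in enumerate(ordered)}

# Hypothetical training data: (nominal value, numeric class value)
training = [("sunny", 25.0), ("sunny", 30.0),
            ("overcast", 46.0), ("overcast", 43.0),
            ("rainy", 52.0), ("rainy", 55.0)]

ordered, codes = order_by_mean_target(training)
print(ordered)  # values sorted by average class value
print(codes)    # e.g. {'sunny': 0, 'overcast': 1, 'rainy': 2}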