13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



4.3 DIVIDE-AND-CONQUER: CONSTRUCTING DECISION TREES

Table 4.6 The weather data with identification codes.

ID code  Outlook   Temperature  Humidity  Windy  Play
a        sunny     hot          high      false  no
b        sunny     hot          high      true   no
c        overcast  hot          high      false  yes
d        rainy     mild         high      false  yes
e        rainy     cool         normal    false  yes
f        rainy     cool         normal    true   no
g        overcast  cool         normal    true   yes
h        sunny     mild         high      false  no
i        sunny     cool         normal    false  yes
j        rainy     mild         normal    false  yes
k        sunny     mild         normal    true   yes
l        overcast  mild         high      true   yes
m        overcast  hot          normal    false  yes
n        rainy     mild         high      true   no

Figure 4.5 Tree stump for the ID code attribute. [Figure: the root tests ID code; branches a, b, c, ..., m, n lead to the leaves no, no, yes, ..., yes, no.]

Table 4.6 gives the weather data with this extra attribute. Branching on ID code produces the tree stump in Figure 4.5. The information required to specify the class given the value of this attribute is

info([0,1]) + info([0,1]) + info([1,0]) + ... + info([1,0]) + info([0,1]),

which is zero because each of the 14 terms is zero. This is not surprising: the ID code attribute identifies the instance, which determines the class without any ambiguity, just as Table 4.6 shows. Consequently, the information gain of this attribute is just the information at the root, info([9,5]) = 0.940 bits. This is greater than the information gain of any other attribute, and so ID code will inevitably be chosen as the splitting attribute. But branching on the identification code is no good for predicting the class of unknown instances and tells nothing about the structure of the decision, which after all are the twin goals of machine learning.
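The argument can be checked numerically. The following is a minimal Python sketch, not code from the book: the names `WEATHER`, `ATTRIBUTES`, `info`, and `info_gain` are illustrative choices, and the rows are transcribed from Table 4.6. It assumes the usual weighted-average form of the information after a split, in which each branch's info value is weighted by the fraction of instances reaching it. For ID code every branch holds a single instance, so every term is zero either way.

```python
from collections import Counter, defaultdict
from math import log2

# Rows transcribed from Table 4.6:
# (ID code, Outlook, Temperature, Humidity, Windy, Play)
WEATHER = [
    ("a", "sunny", "hot", "high", "false", "no"),
    ("b", "sunny", "hot", "high", "true", "no"),
    ("c", "overcast", "hot", "high", "false", "yes"),
    ("d", "rainy", "mild", "high", "false", "yes"),
    ("e", "rainy", "cool", "normal", "false", "yes"),
    ("f", "rainy", "cool", "normal", "true", "no"),
    ("g", "overcast", "cool", "normal", "true", "yes"),
    ("h", "sunny", "mild", "high", "false", "no"),
    ("i", "sunny", "cool", "normal", "false", "yes"),
    ("j", "rainy", "mild", "normal", "false", "yes"),
    ("k", "sunny", "mild", "normal", "true", "yes"),
    ("l", "overcast", "mild", "high", "true", "yes"),
    ("m", "overcast", "hot", "normal", "false", "yes"),
    ("n", "rainy", "mild", "high", "true", "no"),
]
ATTRIBUTES = ["ID code", "Outlook", "Temperature", "Humidity", "Windy"]


def info(counts):
    """Entropy in bits of a class distribution given as counts, e.g. [9, 5]."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_gain(rows, attr_index):
    """Information at the root minus the weighted information after the split."""
    root = info(list(Counter(row[-1] for row in rows).values()))
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr_index]].append(row[-1])  # class labels per branch
    after_split = sum(
        len(labels) / len(rows) * info(list(Counter(labels).values()))
        for labels in branches.values()
    )
    return root - after_split


gains = {name: info_gain(WEATHER, i) for i, name in enumerate(ATTRIBUTES)}
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {gain:.3f} bits")
```

Under these assumptions the gain for ID code equals the information at the root, info([9,5]), and it ranks above every other attribute, which is exactly the bias toward many-valued attributes that the text describes.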
