Dougherty et al. (1995) give a brief account of supervised and unsupervised discretization, along with experimental results comparing the entropy-based method with equal-width binning and the 1R method. Frank and Witten (1999) describe the effect of using the ordering information in discretized attributes. Proportional k-interval discretization for Naïve Bayes was proposed by Yang and Webb (2001). The entropy-based method for discretization, including the use of the MDL stopping criterion, was developed by Fayyad and Irani (1993). The bottom-up statistical method using the χ² test is due to Kerber (1992), and its extension to an automatically determined significance level is described by Liu and Setiono (1997). Fulton et al. (1995) investigate the use of dynamic programming for discretization and derive the quadratic time bound for a general impurity function (e.g., entropy) and the linear one for error-based discretization. The example used for showing the weakness of error-based discretization is adapted from Kohavi and Sahami (1996), who were the first to clearly identify this phenomenon.

Principal components analysis is a standard technique that can be found in most statistics textbooks. Fradkin and Madigan (2003) analyze the performance of random projections. The TF × IDF metric is described by Witten et al. (1999b).

The experiments on using C4.5 to filter its own training data were reported by John (1995). The more conservative approach of a consensus filter involving several learning algorithms has been investigated by Brodley and Friedl (1996). Rousseeuw and Leroy (1987) describe the detection of outliers in statistical regression, including the least median of squares method; they also present the telephone data of Figure 7.6. It was Quinlan (1986) who noticed that removing noise from the training instances' attributes can decrease a classifier's performance on similarly noisy test instances, particularly at higher noise levels.

Combining multiple models is a popular research topic in machine learning, with many related publications. The term bagging (for "bootstrap aggregating") was coined by Breiman (1996b), who investigated the properties of bagging theoretically and empirically for both classification and numeric prediction. Domingos (1999) introduced the MetaCost algorithm. Randomization was evaluated by Dietterich (2000) and compared with bagging and boosting. Bay (1999) suggests using randomization for ensemble learning with nearest-neighbor classifiers. Random forests were introduced by Breiman (2001).

Freund and Schapire (1996) developed the AdaBoost.M1 boosting algorithm and derived theoretical bounds for its performance. Later, they improved these bounds using the concept of margins (Freund and Schapire 1999). Drucker (1997) adapted AdaBoost.M1 for numeric prediction. The LogitBoost algorithm was developed by Friedman et al. (2000). Friedman (2001) describes how to make boosting more resilient in the presence of noisy data.
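For orientation, the following is a minimal Python sketch of the instance-reweighting loop at the heart of AdaBoost.M1, not the authors' published pseudocode or any particular library's implementation; the make_base_learner factory, its sample_weight fitting argument, and n_rounds are assumed, illustration-only conventions.

```python
import numpy as np

def adaboost_m1(X, y, make_base_learner, n_rounds=10):
    """Sketch of AdaBoost.M1-style training: reweight instances after each round."""
    y = np.asarray(y)
    n = len(y)
    weights = np.full(n, 1.0 / n)            # start with uniform instance weights
    models, vote_weights = [], []
    for _ in range(n_rounds):
        model = make_base_learner()           # assumed to accept per-instance weights
        model.fit(X, y, sample_weight=weights)
        pred = model.predict(X)
        err = weights[pred != y].sum()        # weighted training error of this round
        if err == 0 or err >= 0.5:            # AdaBoost.M1 stops once error reaches 1/2
            break
        beta = err / (1.0 - err)
        weights[pred == y] *= beta            # shrink weights of correctly classified instances
        weights /= weights.sum()              # renormalize to keep a weight distribution
        models.append(model)
        vote_weights.append(np.log(1.0 / beta))  # this model's weight in the final vote
    return models, vote_weights
```

Predictions would then be made by a weighted vote over the stored models, each contributing its entry in vote_weights.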
