7.7 FURTHER READING

Domingos (1997) describes how to derive a single interpretable model from an ensemble using artificial training examples. Bayesian option trees were introduced by Buntine (1992), and majority voting was incorporated into option trees by Kohavi and Kunz (1997). Freund and Mason (1999) introduced alternating decision trees; experiments with multiclass alternating decision trees were reported by Holmes et al. (2002). Landwehr et al. (2003) developed logistic model trees using the LogitBoost algorithm.

Stacked generalization originated with Wolpert (1992), who presented the idea in the neural network literature, and was applied to numeric prediction by Breiman (1996a). Ting and Witten (1997a) compared different level-1 models empirically and found that a simple linear model performs best; they also demonstrated the advantage of using probabilities as level-1 data. A combination of stacking and bagging has also been investigated (Ting and Witten 1997b).

The idea of using error-correcting output codes for classification gained wide acceptance after a paper by Dietterich and Bakiri (1995); Ricci and Aha (1998) showed how to apply such codes to nearest-neighbor classifiers.

Blum and Mitchell (1998) pioneered the use of co-training and developed a theoretical model for the use of labeled and unlabeled data from different independent perspectives. Nigam and Ghani (2000) analyzed the effectiveness and applicability of co-training, relating it to the traditional use of standard EM to fill in missing values. They also introduced the co-EM algorithm. Nigam et al. (2000) thoroughly explored how the EM clustering algorithm can use unlabeled data to improve an initial classifier built by Naïve Bayes, as reported in the Clustering for classification section. Up to this point, co-training and co-EM were applied mainly to small two-class problems; Ghani (2002) used error-correcting output codes to address multiclass situations with many classes. Brefeld and Scheffer (2004) extended co-EM to use a support vector machine rather than Naïve Bayes. Seeger (2001) casts some doubt on whether these new algorithms really do have anything to offer over traditional ones, properly used.
