
deal with weighted instances in Section 6.5 under Locally weighted linear regression (page 252). One way of obtaining probability estimates from support vector machines is to fit a one-dimensional logistic model to the output, effectively performing the logistic regression described in Section 4.6 on the output (see the sketch following the Further reading notes below). Excellent results have been reported for text classification using co-EM with the support vector machine (SVM) classifier. It outperforms other variants of SVM and seems quite robust to varying proportions of labeled and unlabeled data.

The ideas of co-training and EM, and particularly their combination in the co-EM algorithm, are interesting, thought provoking, and have striking potential. But just what makes them work is still controversial and poorly understood. These techniques are the subject of current research: they have not yet entered the mainstream of machine learning and been harnessed for practical data mining.

7.7 Further reading

Attribute selection, under the term feature selection, has been investigated in the field of pattern recognition for decades. Backward elimination, for example, was introduced in the early 1960s (Marill and Green 1963). Kittler (1978) surveys the feature selection algorithms that have been developed for pattern recognition. Best-first search and genetic algorithms are standard artificial intelligence techniques (Winston 1992, Goldberg 1989).

The experiments that show the performance of decision tree learners deteriorating when new attributes are added are reported by John (1997), who gives a nice explanation of attribute selection. The idea of finding the smallest attribute set that carves up the instances uniquely is from Almuallim and Dietterich (1991, 1992) and was further developed by Liu and Setiono (1996). Kibler and Aha (1987) and Cardie (1993) both investigated the use of decision tree algorithms to identify features for nearest-neighbor learning; Holmes and Nevill-Manning (1995) used 1R to order features for selection. Kira and Rendell (1992) used instance-based methods to select features, leading to a scheme called RELIEF for Recursive Elimination of Features. Gilad-Bachrach et al. (2004) show how this scheme can be modified to work better with redundant attributes. The correlation-based feature selection method was developed by Hall (2000).

The use of wrapper methods for feature selection is due to John et al. (1994) and Kohavi and John (1997), and genetic algorithms have been applied within a wrapper framework by Vafaie and DeJong (1992) and Cherkauer and Shavlik (1996). The selective Naïve Bayes learning method is due to Langley and Sage (1994). Guyon et al. (2002) present and evaluate the recursive feature elimination scheme in conjunction with support vector machines. The method of raced search was developed by Moore and Lee (1994).
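Returning to the remark above about obtaining probability estimates from a support vector machine, the following is a minimal sketch, in Python, of fitting a one-dimensional logistic model to the raw SVM scores on held-out data. It is an illustration only, not code from the book: the scikit-learn classes, the synthetic data, and the split into training and calibration sets are assumptions made for the example.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic two-class data standing in for real labeled instances (an assumption
# for this sketch); half is used to train the SVM, half to fit the logistic model.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

# Train a plain SVM; its decision_function yields a one-dimensional real-valued score.
svm = LinearSVC().fit(X_train, y_train)
scores = svm.decision_function(X_cal)

# Fit a one-dimensional logistic model to those scores, turning raw SVM output
# into class probability estimates.
calibrator = LogisticRegression().fit(scores.reshape(-1, 1), y_cal)

# Calibrated probabilities for a few instances.
new_scores = svm.decision_function(X_cal[:5]).reshape(-1, 1)
print(calibrator.predict_proba(new_scores))

Fitting the logistic model on data the SVM was not trained on helps keep the resulting probability estimates from being overconfident.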
