10.4 LEARNING ALGORITHMS

Table 10.5 (continued)

        Name             Function
        SMOreg           Sequential minimal optimization algorithm for support vector regression
        VotedPerceptron  Voted perceptron algorithm
        Winnow           Mistake-driven perceptron with multiplicative updates
Lazy    IB1              Basic nearest-neighbor instance-based learner
        IBk              k-nearest-neighbor classifier
        KStar            Nearest neighbor with generalized distance function
        LBR              Lazy Bayesian Rules classifier
        LWL              General algorithm for locally weighted learning
Misc.   Hyperpipes       Extremely simple, fast learner based on hypervolumes in instance space
        VFI              Voting feature intervals method, simple and fast

attributes using supervised discretization. NaiveBayesUpdateable is an incremental version that processes one instance at a time; it can use a kernel estimator but not discretization. NaiveBayesMultinomial implements the multinomial Bayes classifier (Section 4.2, page 95). ComplementNaiveBayes builds a Complement Naïve Bayes classifier as described by Rennie et al. (2003) (the TF × IDF and length normalization transforms used in this paper can be performed using the StringToWordVector filter).

AODE (averaged one-dependence estimators) is a Bayesian method that averages over a space of alternative Bayesian models that have weaker independence assumptions than Naïve Bayes (Webb et al., 2005). The algorithm may yield more accurate classification than Naïve Bayes on datasets with nonindependent attributes.

BayesNet learns Bayesian networks under the assumptions made in Section 6.7: nominal attributes (numeric ones are prediscretized) and no missing values (any such values are replaced globally). There are two different algorithms for estimating the conditional probability tables of the network.
Search is done using K2 or the TAN algorithm (Section 6.7) or more sophisticated methods based on hill-climbing, simulated annealing, tabu search, and genetic algorithms. Optionally, search speed can be improved using AD trees (Section 6.7). There is also an algorithm that uses conditional independence tests to learn the structure of the network; alternatively, the network structure can be loaded from an XML (extensible markup language) file. More details on the implementation of Bayesian networks in Weka can be found in Bouckaert (2004). You can observe the network structure by right-clicking the history item and selecting Visualize graph. Figure 10.18(a) shows the graph for the nominal version of the weather data, which in fact corresponds to the Naïve Bayes result
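To make the idea behind NaiveBayesUpdateable concrete, the following is a minimal sketch of an incremental Naïve Bayes learner for nominal attributes: because the model is nothing more than per-class and per-attribute-value frequency counts, it can absorb one instance at a time, exactly the property that makes the incremental version possible. The sketch is in Python rather than Weka's Java, and the class and method names are illustrative, not Weka's API.

```python
from collections import defaultdict
import math

class IncrementalNaiveBayes:
    """Sketch of an incremental Naive Bayes classifier for nominal
    attributes. The model is just frequency counts, so update() can
    process one instance at a time (the NaiveBayesUpdateable idea).
    Names are illustrative, not Weka's actual API."""

    def __init__(self, n_attributes):
        self.n_attributes = n_attributes
        self.class_counts = defaultdict(int)
        # counts[(class, attribute_index, attribute_value)] -> frequency
        self.counts = defaultdict(int)
        # distinct values seen per attribute, for Laplace smoothing
        self.values = [set() for _ in range(n_attributes)]

    def update(self, instance, label):
        """Incorporate a single labeled instance into the counts."""
        self.class_counts[label] += 1
        for i, v in enumerate(instance):
            self.counts[(label, i, v)] += 1
            self.values[i].add(v)

    def predict(self, instance):
        """Return the class with the highest Laplace-smoothed
        log posterior under the attribute-independence assumption."""
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, cc in self.class_counts.items():
            score = math.log((cc + 1) / (total + len(self.class_counts)))
            for i, v in enumerate(instance):
                score += math.log(
                    (self.counts[(c, i, v)] + 1) / (cc + len(self.values[i]))
                )
            if score > best_score:
                best, best_score = c, score
        return best
```

For example, feeding in a handful of weather-style instances (outlook, windy) one at a time and then calling predict() classifies unseen combinations from the accumulated counts; nothing needs to be retrained from scratch when a new instance arrives. Note that Weka's NaiveBayesUpdateable can additionally use a kernel estimator for numeric attributes, which this nominal-only sketch omits.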