13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



INDEX

Fisher, R. A., 15
flat file, 45
F-measure, 172
FN (false negatives), 162
folds, 150
forward pruning, 34, 192
forward selection, 292, 294
forward stagewise additive modeling, 325–327
Fourier analysis, 25
FP (false positives), 162
freedom, degrees of, 93, 155
functional dependencies, 350
functions in Weka, 404–405, 409–410

G
gain ratio, 104
GainRatioAttributeEval, 423
gambling, 160
garbage in, garbage out. See cost of errors; data cleaning; error rate
Gaussian-distribution assumption, 92
Gaussian kernel function, 252
generalization as search, 30–35
  bias, 32–35
  enumerating concept space, 31–32
generalized distance functions, 241–242
generalized exemplars, 236
general-to-specific search bias, 34
genetic algorithms, 38
genetic algorithm search procedures, 294, 341
GeneticSearch, 424
getOptions(), 482
getting to know your data, 60
global discretization, 297
globalInfo(), 472
global optimization, 205–207
Gosset, William, 184
gradient descent, 227, 229, 230
Grading, 417
graphical models, 283
GraphViewer, 431
gray bar in margin of textbook (optional sections), 30
greedy search, 33
GreedyStepwise, 423–424
growing set, 202

H
Hamming distance, 335
hand-labeled data, 338
hapax legomena, 310
hard instances, 322
hash table, 280
hazard detection system, 23–24
hidden attributes, 272
hidden layer, 226, 231, 232
hidden units, 226, 231, 234
hierarchical clustering, 139
highly-branching attribute, 86
high-performance rule inducers, 188
histogram equalization, 298
historical literary mystery, 358
holdout method, 146, 149–150, 333
homeland defense, 357
HTML, 355
hypermetrope, 13
hyperpipes, 139
Hyperpipes, 414
hyperplane, 124, 125
hyperrectangle, 238–239
hyperspheres, 133
hypertext markup language (HTML), 355
hypothesis testing, 29

I
IB1, 413
IB3, 237
IBk, 413
ID3, 105
Id3, 404
identification code, 86, 102–104
implementation—real-world schemes, 187–283
  Bayesian networks, 271–283
  classification rules, 200–214
  clustering, 254–271
  decision tree, 189–199
  instance-based, 236–243
  linear models, 214–235
