13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



514 INDEX

implementation—real-world schemes (continued)
    numeric prediction, 243–254
    See also individual subject headings
inaccurate values, 59–60. See also cost of errors; data cleaning; error rate
incremental algorithms, 346
incrementalClassifier, 434
IncrementalClassifierEvaluator, 431
incremental clustering, 255–260
incremental learning in Weka, 433–435
incremental reduced-error pruning, 203, 205
independent attributes, 267
index(), 472
induction, 29
inductive logic programming, 48, 60, 75, 351
Induct system, 214
industrial usage. See implementation—real-world schemes
inferring rudimentary rules, 84–88
InfoGainAttributeEval, 422–423
informational loss function, 159–160, 161
information-based heuristic, 201
information extraction, 354
information gain, 99
information retrieval, 171
information value, 102
infrequent words, 353
inner cross-validation, 286
input, 41–60
    ARFF format, 53–55
    assembling the data, 52–53
    attribute, 49–52
    attribute types, 56–57
    concept, 42–45
    data engineering, 286–287, 288–315. See also engineering input and output
    data preparation, 52–60
    getting to know your data, 60
    inaccurate values, 59–60
    instances, 45
    missing values, 58
    sparse data, 55–56
input layer, 224
instance in Weka, 450
Instance, 451
instance-based learning, 78, 128–136, 235–243
    ball tree, 133–135
    distance functions, 128–129, 239–242
    finding nearest neighbors, 129–135
    generalized distance functions, 241–242
    generalized exemplars, 236
    kD-trees, 130–132
    missing values, 129
    pruning noisy exemplars, 236–237
    redundant exemplars, 236
    simple method, 128–136, 235–236
    weighting attributes, 237–238
    Weka, 413–414
instance-based learning methods, 291
instance-based methods, 34
instance-based representation, 76–80
instance filters in Weka, 394, 400–401, 403
instances, 45
Instances, 451
instance space, 79
instance weights, 166, 321–322
integer-valued attributes, 49
intensive care patients, 29
interval, 88
interval quantities, 50–51
intrusion detection systems, 357
invertSelection, 382
in vitro fertilization, 3
iris dataset, 15–16
iris setosa, 15
iris versicolor, 15
iris virginica, 15
ISO-8601 combined date and time format, 55
item, 113
item sets, 113, 114–115
iterative distance-based clustering, 137–138

J
J4.8, 373–377
J48, 404, 450
Javadoc indices, 456
JDBC database, 445
JRip, 409
junk email filtering, 356–357
