
Data Mining: Practical Machine Learning Tools and ... - LIDeCC

10.4 LEARNING ALGORITHMS

there are predefined values that can be used instead of integers: i is the number of attributes, o the number of class values, a the average of the two, and t their sum. The default, a, was used to generate Figure 10.20(a).

The parameters learningRate and momentum set values for these variables, which can be overridden in the graphical interface. A decay parameter causes the learning rate to decrease with time: it divides the starting value by the epoch number to obtain the current rate. This sometimes improves performance and may stop the network from diverging. The reset parameter automatically resets the network with a lower learning rate and begins training again if it is diverging from the answer (this option is only available if the graphical user interface is not used).

The trainingTime parameter sets the number of training epochs. Alternatively, a percentage of the data can be set aside for validation (using validationSetSize): training then continues until performance on the validation set starts to deteriorate consistently, or until the specified number of epochs is reached. If the percentage is set to zero, no validation set is used. The validationThreshold parameter determines how many consecutive times the validation-set error can deteriorate before training is stopped.

The nominalToBinaryFilter filter is specified by default in the MultilayerPerceptron object editor; turning it off may improve performance on data in which the nominal attributes are really ordinal. The attributes can be normalized (with normalizeAttributes), and a numeric class can be normalized too (with normalizeNumericClass): both may improve performance.

Lazy classifiers

Lazy learners store the training instances and do no real work until classification time.
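The learning-rate decay and validation-based early stopping described above for MultilayerPerceptron can be sketched as follows. This is a minimal illustration of the two rules, not Weka's implementation; the function names are invented for the sketch.

```python
def current_learning_rate(start_rate, epoch, decay=True):
    """Decay as described above: divide the starting rate by the epoch number."""
    return start_rate / epoch if decay else start_rate

def train_with_early_stopping(epoch_val_errors, max_epochs, validation_threshold):
    """Stop when the validation-set error deteriorates `validation_threshold`
    consecutive times, or when `max_epochs` is reached, whichever comes first.
    `epoch_val_errors` stands in for the error measured after each epoch.
    Returns the epoch at which training stopped."""
    best = float("inf")
    consecutive_worse = 0
    for epoch, err in enumerate(epoch_val_errors[:max_epochs], start=1):
        if err < best:
            best = err
            consecutive_worse = 0
        else:
            consecutive_worse += 1
            if consecutive_worse >= validation_threshold:
                return epoch
    return min(len(epoch_val_errors), max_epochs)
```

With a starting rate of 0.3, the decayed rate at epoch 3 is 0.1; and a run whose validation error bottoms out at epoch 3 and then worsens three times in a row stops at epoch 6 when validationThreshold is 3.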
IB1 is a basic instance-based learner (Section 4.7) that finds the training instance closest in Euclidean distance to the given test instance and predicts the same class as that training instance. If several instances qualify as the closest, the first one found is used. IBk is a k-nearest-neighbor classifier that uses the same distance metric. The number of nearest neighbors (default k = 1) can be specified explicitly in the object editor or determined automatically using leave-one-out cross-validation, subject to an upper limit given by the specified value. Predictions from more than one neighbor can be weighted according to their distance from the test instance, and two different formulas are implemented for converting the distance into a weight. The number of training instances kept by the classifier can be restricted by setting the window size option. As new training instances are added, the oldest ones are removed to maintain the number of training instances at this size. KStar is a nearest-neighbor method with a generalized distance function based on transformations (Section 6.4, pages 241–242).
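The behavior of IB1 and unweighted IBk can be sketched in a few lines. This is an illustrative reimplementation, not Weka's code: with k = 1 it mimics IB1, including breaking distance ties in favor of the first training instance found; with k > 1 it returns the majority class among the k nearest neighbors (distance weighting is omitted).

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_knn(train, test_instance, k=1):
    """train: list of (attribute_vector, class_label) pairs.
    Python's sort is stable, so equally distant instances keep their
    training order, and the first one found wins a tie (as in IB1)."""
    neighbours = sorted(train, key=lambda inst: euclidean(inst[0], test_instance))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

For example, with training set [((0, 0), 'a'), ((1, 1), 'b'), ((5, 5), 'b')], the test point (0.2, 0.1) is classified 'a' with k = 1, while (3, 3) is classified 'b' with k = 3.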
