Data Mining: Practical Machine Learning Tools and ... - LIDeCC

234 CHAPTER 6 | IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES

Radial basis function networks

Another popular type of feedforward network is the radial basis function (RBF) network. It has two layers, not counting the input layer, and differs from a multilayer perceptron in the way that the hidden units perform computations. Each hidden unit essentially represents a particular point in input space, and its output, or activation, for a given instance depends on the distance between its point and the instance—which is just another point. Intuitively, the closer these two points, the stronger the activation. This is achieved by using a nonlinear transformation function to convert the distance into a similarity measure. A bell-shaped Gaussian activation function, whose width may be different for each hidden unit, is commonly used for this purpose. The hidden units are called RBFs because the points in instance space for which a given hidden unit produces the same activation form a hypersphere or hyperellipsoid. (In a multilayer perceptron, this is a hyperplane.)

The output layer of an RBF network is the same as that of a multilayer perceptron: it takes a linear combination of the outputs of the hidden units and—in classification problems—pipes it through the sigmoid function.

The parameters that such a network learns are (a) the centers and widths of the RBFs and (b) the weights used to form the linear combination of the outputs obtained from the hidden layer. A significant advantage over multilayer perceptrons is that the first set of parameters can be determined independently of the second set and still produce accurate classifiers.

One way to determine the first set of parameters is to use clustering, without looking at the class labels of the training instances at all.
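The Gaussian activation described above can be sketched in a few lines. This is an illustrative fragment, not code from the book; the function name and the example center and width are made up for the demonstration. The activation is exp(-d²/(2σ²)), where d is the distance between the instance and the hidden unit's center and σ is the unit's width:

```python
import math

def rbf_activation(instance, center, width):
    """Gaussian activation of one RBF hidden unit: instances close to
    the center give activations near 1, distant ones give values near 0."""
    sq_dist = sum((x - c) ** 2 for x, c in zip(instance, center))
    return math.exp(-sq_dist / (2 * width ** 2))

# A hidden unit centered at (0, 0) with width 1.0 (illustrative values):
print(rbf_activation((0.0, 0.0), (0.0, 0.0), 1.0))  # exactly 1.0 at the center
print(rbf_activation((3.0, 0.0), (0.0, 0.0), 1.0))  # small, three widths away
```

All instances at the same distance from the center receive the same activation, which is why the equal-activation contours form the hyperspheres mentioned above (hyperellipsoids arise when the width differs per dimension).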
The simple k-means clustering algorithm described in Section 4.8 can be applied, clustering each class independently to obtain k basis functions for each class. Intuitively, the resulting RBFs represent prototype instances. Then the second set of parameters can be learned, keeping the first parameters fixed. This involves learning a linear model using one of the techniques we have discussed (e.g., linear or logistic regression). If there are far fewer hidden units than training instances, this can be done very quickly.

A disadvantage of RBF networks is that they give every attribute the same weight because all are treated equally in the distance computation. Hence they cannot deal effectively with irrelevant attributes—in contrast to multilayer perceptrons. Support vector machines share the same problem. In fact, support vector machines with Gaussian kernels (i.e., "RBF kernels") are a particular type of RBF network, in which one basis function is centered on every training instance, and the outputs are combined linearly by computing the maximum margin hyperplane. This has the effect that only some RBFs have a nonzero weight—the ones that represent the support vectors.
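The two-stage training procedure can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the book's implementation: stage one's k-means step is replaced by hand-picked prototype centers, and the logistic output layer is fitted by plain stochastic gradient descent; all names, data, and parameter values are invented for the example.

```python
import math

def rbf_features(instance, centers, width):
    """Gaussian activations of the fixed hidden layer for one instance."""
    return [math.exp(-sum((x - c) ** 2 for x, c in zip(instance, ctr))
                     / (2 * width ** 2)) for ctr in centers]

def fit_output_weights(X, y, centers, width, lr=0.5, epochs=500):
    """Stage two: fit a logistic output layer on the RBF activations by
    stochastic gradient descent. The centers and width (stage one) are
    held fixed throughout, as the text describes."""
    w = [0.0] * (len(centers) + 1)          # +1 for the bias weight
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            h = [1.0] + rbf_features(xi, centers, width)
            z = sum(wj * hj for wj, hj in zip(w, h))
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the linear combination
            for j, hj in enumerate(h):
                w[j] += lr * (yi - p) * hj  # logistic-regression update
    return w

def classify(instance, w, centers, width):
    h = [1.0] + rbf_features(instance, centers, width)
    return 1 if sum(wj * hj for wj, hj in zip(w, h)) > 0 else 0

# Stage one stand-in: one prototype center per class (in the book this
# would come from clustering each class with k-means).
X = [(0.1, 0.0), (0.3, 0.2), (2.8, 3.1), (3.2, 2.9)]
y = [0, 0, 1, 1]
centers = [(0.2, 0.1), (3.0, 3.0)]
w = fit_output_weights(X, y, centers, width=1.0)
print(classify((0.0, 0.2), w, centers, 1.0))  # a point near the class-0 prototype
print(classify((3.1, 3.0), w, centers, 1.0))  # a point near the class-1 prototype
```

Because the hidden-layer activations are fixed once the centers are chosen, stage two is just a linear-model fit over a small feature vector, which is why it is fast when there are far fewer hidden units than training instances.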
