Data Mining: Practical Machine Learning Tools and ... - LIDeCC
CHAPTER 7 | TRANSFORMATIONS: ENGINEERING THE INPUT AND OUTPUT

There is some experimental evidence, using Naïve Bayes throughout as the learner, that this bootstrapping procedure outperforms one that employs all the features from both perspectives to learn a single model from the labeled data. It relies on having two different views of an instance that are redundant but not completely correlated. Various domains have been proposed, from spotting celebrities in televised newscasts using video and audio separately to mobile robots with vision, sonar, and range sensors. The independence of the views reduces the likelihood of both hypotheses agreeing on an erroneous label.

EM and co-training

On datasets with two feature sets that are truly independent, experiments have shown that co-training gives better results than using EM as described previously. Even better performance, however, can be achieved by combining the two into a modified version of co-training called co-EM. Co-training trains two classifiers representing different perspectives, A and B, and uses both to add new examples to the training pool by choosing whichever unlabeled examples they classify most positively or negatively. The new examples are few in number and deterministically labeled. Co-EM, on the other hand, trains perspective A on the labeled data and uses it to probabilistically label all the unlabeled data. Next it trains classifier B on both the labeled data and the unlabeled data with classifier A's tentative labels, and then it probabilistically relabels all the data for use by classifier A. The process iterates until the classifiers converge. This procedure seems to perform consistently better than co-training because it does not commit to the class labels that are generated by classifiers A and B but rather reestimates their probabilities at each iteration.

The range of applicability of co-EM, like co-training, is still limited by the requirement for multiple independent perspectives. But there is some experimental evidence to suggest that even when there is no natural split of features into independent perspectives, benefits can be achieved by manufacturing such a split and using co-training (or, better yet, co-EM) on the split data. This seems to work even when the split is made randomly; performance could surely be improved by engineering the split so that the feature sets are maximally independent. Why does this work? Researchers have hypothesized that these algorithms succeed partly because the split makes them more robust to the assumptions that their underlying classifiers make.

There is no particular reason to restrict the base classifier to Naïve Bayes. Support vector machines probably represent the most successful technology for text categorization today. However, for the EM iteration to work it is necessary that the classifier labels the data probabilistically; it must also be able to use probabilistically weighted examples for training. Support vector machines can easily be adapted to do both. We explained how to adapt learning algorithms to
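To make the contrast between the two procedures concrete, here is a minimal sketch of the co-training loop described above. It is not the implementation used in the experiments cited here; it assumes two feature views X1 and X2 given as NumPy arrays over the same instances, a label array y that is only meaningful at the labeled indices, and scikit-learn's MultinomialNB (which expects non-negative, count-like features) standing in for the Naïve Bayes base learner.

```python
# A minimal co-training sketch (illustrative, not the evaluated implementation).
# Assumptions: X1 and X2 are NumPy feature views over the same instances, y holds
# correct labels at the `labeled` indices (placeholders elsewhere are overwritten),
# and MultinomialNB plays the role of the Naive Bayes base learner.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1, X2, y, labeled, unlabeled, n_per_class=1, n_rounds=10):
    """Each view repeatedly adds its most confidently classified unlabeled
    examples of every class, with hard labels, to the shared training pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(n_rounds):
        clf1 = MultinomialNB().fit(X1[labeled], y[labeled])
        clf2 = MultinomialNB().fit(X2[labeled], y[labeled])
        for clf, X in ((clf1, X1), (clf2, X2)):
            if not unlabeled:
                break
            proba = clf.predict_proba(X[unlabeled])
            chosen = {}  # position in the unlabeled pool -> hard label
            for c, label in enumerate(clf.classes_):
                for i in np.argsort(proba[:, c])[::-1][:n_per_class]:
                    chosen.setdefault(int(i), label)
            for i in sorted(chosen, reverse=True):   # pop from the back to keep positions valid
                idx = unlabeled.pop(i)
                y[idx] = chosen[i]                   # a deterministic, committed label
                labeled.append(idx)
    return clf1, clf2
```

Note that each pass commits to hard labels for only a handful of examples per view, which is exactly the property that co-EM relaxes.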
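The co-EM variant can be sketched in the same setting. The function names and the per-class weighted-copy trick below are illustrative assumptions rather than the procedure's reference code, but the trick is a standard way to feed probabilistic labels to a learner that accepts weighted examples: every unlabeled instance appears once per class, weighted by its tentative class probability, so no hard commitment is ever made.

```python
# A co-EM sketch under the same assumptions as the co-training sketch above.
# Assumes the labeled data contains every class, so all fitted models share the
# same (sorted) class ordering and predict_proba columns line up.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def fit_soft(X, y, labeled, unlabeled, proba, classes):
    """Train on the labeled data (weight 1) plus per-class weighted copies of
    the unlabeled data, using the tentative class probabilities as weights."""
    X_parts = [X[labeled]]
    y_parts = [y[labeled]]
    w_parts = [np.ones(len(labeled))]
    for c, label in enumerate(classes):
        X_parts.append(X[unlabeled])
        y_parts.append(np.full(len(unlabeled), label))
        w_parts.append(proba[:, c])
    return MultinomialNB().fit(np.vstack(X_parts),
                               np.concatenate(y_parts),
                               sample_weight=np.concatenate(w_parts))

def co_em(X1, X2, y, labeled, unlabeled, n_iter=20):
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf_a = MultinomialNB().fit(X1[labeled], y[labeled])    # view A, labeled data only
    classes = clf_a.classes_
    for _ in range(n_iter):
        proba_a = clf_a.predict_proba(X1[unlabeled])        # A tentatively labels all unlabeled data
        clf_b = fit_soft(X2, y, labeled, unlabeled, proba_a, classes)
        proba_b = clf_b.predict_proba(X2[unlabeled])        # B probabilistically relabels for A
        clf_a = fit_soft(X1, y, labeled, unlabeled, proba_b, classes)
    return clf_a, clf_b
```

A two-view split can be manufactured at random from a single feature matrix, as discussed above; the seed and the odd/even column split here are arbitrary illustrative choices:

```python
rng = np.random.default_rng(0)
cols = rng.permutation(X.shape[1])       # X is the assumed full feature matrix
X1, X2 = X[:, cols[::2]], X[:, cols[1::2]]
```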