
A Comparison of Decision Tree Ensemble Creation Techniques


178 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 1, JANUARY 2007

TABLE 5
Number of Trees and Test Set Accuracy of the Stopping Criteria for Random Forests and Bagging

… estimate quirks on data sets with a small number of examples. Small data sets (number of examples < 1,000) can often have a very low error estimate with a rather small number of decision trees (50 to 100), but then the addition of more trees results in a greater error rate in both the out-of-bag error and the test set error, as might be shown in a 10-fold cross-validation. This behavior is contrary to many experiments which have shown that test set error steadily decreases with an increasing number of classifiers until it plateaus. We speculate that this is a result of instability in the predictions leading to a "lucky guess" by the ensemble for such data sets. Since the decision to stop building additional classifiers is more effective, in a time-saving sense, for large data sets, we believe it is more important to concentrate on data sets with a larger number of examples.

We have developed an algorithm which appears to provide a reasonable solution to the problem of deciding when enough classifiers have been created for an ensemble. It works by first smoothing the out-of-bag error graph with a sliding window in order to reduce the variance. We have chosen a window size of 5 for our experiments. After the smoothing has been completed, the algorithm takes windows of size 20 on the smoothed data points and determines the maximum accuracy within that window. It continues to process windows of size 20 until the maximum accuracy within that window no longer increases.
At this point, the stopping criterion has been reached and the algorithm returns the ensemble with the maximum raw accuracy from within that window. The algorithm is shown in Algorithm 1.

Algorithm 1 Algorithm for deciding when to stop building classifiers
1: SlideSize ← 5, SlideWindowSize ← 5, BuildSize ← 20
2: A[n] ← raw ensemble accuracy with n trees
3: S[n] ← average ensemble accuracy with n trees over the previous SlideWindowSize trees
4: W[n] ← maximum smoothed value
5: repeat
6:   Add BuildSize more trees to the ensemble
7:   NumTrees = NumTrees + BuildSize
     // Update A[] with raw accuracy estimates obtained from out-of-bag error
8:   for x ← NumTrees − BuildSize to NumTrees do
9:     A[x] ← VotedAccuracy(Tree_1 … Tree_x)
10:  end for
     // Update S[] with averaged accuracy estimates
11:  for x ← NumTrees − BuildSize to NumTrees do
12:    S[x] ← Average(A[x − SlideSize] … A[x])
13:  end for
     // Update maximum smoothed accuracy within window
14:  W[NumTrees/BuildSize − 1] ← max(S[NumTrees − BuildSize] … S[NumTrees])
15: until W[NumTrees/BuildSize − 1] ≤ W[NumTrees/BuildSize − 2]
16: Stop at tree argmax_j (A[j] | j ∈ [NumTrees − 2·BuildSize] … [NumTrees − BuildSize])

6.2 Experiments
We compare the stopping points and the resulting test set accuracy of ensembles built out to 2,000 trees using Random Forests-lg and a 10-fold cross-validation. For this comparison we examine 1) the stopping point of our algorithm, 2) the stopping point obtained by taking the minimum out-of-bag error over all 2,000 trees, and 3) an oracle algorithm which looks at the lowest observed error on the test set over the 2,000 created trees (as trees are added sequentially). Thirteen of the previously used data sets with greater than 1,000 examples are used.
The results are shown in Table 5. For most data sets, the out-of-bag error continues to decrease long into the training stage. This often does not result in any improvement of test set performance. Across all 13 data sets, the total gain by using the minimum out-of-bag error rather than our algorithm was only 0.06 percent on average. Comparing our algorithm to the oracle, the accuracy loss is less than 0.25 percent per data set. In comparing the number of trees used, our method uses many fewer trees than the other methods. On average, we use 1,140 fewer trees compared to the minimum out-of-bag error and 755 fewer trees compared to the oracle method. While these numbers are clearly influenced by the maximum number of trees chosen to build, it is also evident that looking at the maximum out-of-bag accuracy causes the algorithm to continue building a large number of trees.

We have also tested this method on the bagged trees without the use of random forests. We generated half (1,000) the number of trees used in the previous experiment in order to shorten the previously observed large overestimation of the number of trees using the minimum out-of-bag error alone and to reduce the training time. The results for this experiment are shown in Table 5. The use of our algorithm results in an average net loss of 0.12 percent per data set compared to the minimum out-of-bag error, while using 431 fewer trees. Compared to the oracle method, there is a net loss of 0.25 percent per data set (consistent with the previous experiment) while using 442 fewer trees.
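As an illustration, the whole of Algorithm 1 can be rendered in Python. This is a minimal sketch under assumptions, not the authors' implementation: `raw_accuracy(n)` is a hypothetical stand-in for VotedAccuracy(Tree_1 … Tree_n), the out-of-bag voted accuracy of the first n trees, and `max_trees` caps the run as in the experiments.

```python
SLIDE_SIZE = 5    # smoothing window (SlideSize/SlideWindowSize in Algorithm 1)
BUILD_SIZE = 20   # trees added per iteration (BuildSize)

def stop_building(raw_accuracy, max_trees=2000):
    """Return the tree count at which Algorithm 1 would stop.

    raw_accuracy(n) plays the role of VotedAccuracy(Tree_1 ... Tree_n),
    i.e., the out-of-bag accuracy of an ensemble of the first n trees.
    """
    A = [None]    # A[n]: raw accuracy with n trees (1-based, as in the paper)
    S = [None]    # S[n]: trailing average of A over SLIDE_SIZE values
    W = []        # W[k]: maximum smoothed accuracy within the k-th window
    num_trees = 0
    while num_trees < max_trees:
        num_trees += BUILD_SIZE                       # add BuildSize more trees
        for n in range(num_trees - BUILD_SIZE + 1, num_trees + 1):
            A.append(raw_accuracy(n))                 # raw out-of-bag estimate
            lo = max(1, n - SLIDE_SIZE + 1)           # smooth with sliding window
            S.append(sum(A[lo:n + 1]) / (n - lo + 1))
        # maximum smoothed accuracy within the newest window of 20
        W.append(max(S[num_trees - BUILD_SIZE + 1:num_trees + 1]))
        if len(W) >= 2 and W[-1] <= W[-2]:            # maximum no longer increases
            lo = max(1, num_trees - 2 * BUILD_SIZE)
            hi = num_trees - BUILD_SIZE
            # return the tree count with the best raw accuracy in that window
            return max(range(lo, hi + 1), key=lambda j: A[j])
    return num_trees
```

With an accuracy curve that rises and then plateaus, the sketch stops one build window after the smoothed maximum stops improving and returns the best raw-accuracy point inside that window, mirroring lines 14-16 of Algorithm 1.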
