anytime algorithms for learning anytime classifiers saher ... - Technion

More documents

Recommendations

Info

Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008 the size of a tree and its accuracy. To test the significance of the correlation, we used the non-parametric Spearman correlation test on each of the samples. 2 Spearman’s coefficient ρ measures the monotonic association of two variables, without making any assumptions about their frequency distribution. For a paired sample of X and Y , ρ is defined as 1−6 � d 2 i /(n(n2 −1)), where di is the difference in the statistical rank of xi and yi. There is a special correction for this formula in the presence of ties. Table B.1 lists summarizing statistics for the RTG, SID3, and LSID3 samplers. The validity of Occam’s empirical principle, tested by Spearman’s method, is listed in the rightmost column. For each sampling method, for each dataset, we count how many times, out of the 10 folds, the null hypothesis H0 (which states that the variables are not correlated) can be rejected at an α = 5% significance level, against the alternative hypothesis that the correlation is negative. The results indicate that when random sampling (RTG) in used, Occam’s empirical principle, as measured by Spearman’s test, is valid for almost all problems, except for Scale. The results for the SID3 sampling method indicate that even when focusing on smaller trees, simplicity is beneficial as a bias for accuracy. Again, except for the Scale dataset, there is a strong inverse correlation between size and accuracy. The numbers indicate that the correlation is weaker than the RTG case, yet still significant across most domains. The LSID3 samples focus on very small trees. In several cases, LSID3 could reach the smallest tree possible. Again the negative correlation was found to be significant for most domains. 3 However, the number of cases where the null hypothesis could not be rejected is higher than in the previous samplers. One possible reason for this phenomenon is that LSID3 samples trees from a much tighter size-range. Hence, there is an increased probability for finding two trees, T1 and T2, with the size and accuracy of T1 being greater than T2. As indicated by the frequency curves, the different sampling methods cover different portions of the space, but sometimes overlap. An interesting question is whether the conclusions hold if we analyze all samples together. Note that it is not statistically valid to merge the samples of RTG, SID3, and LSID3 due to their different distributions. Nevertheless, we measured the correlation statistics for the combined samples and found that the results are very similar, with a strong negative correlation between the size of a tree and its accuracy. To illustrate the correlation between the size of a tree and its accuracy, we grouped the trees from a single run into bins, according to their size, and calculated the average accuracy for each bin. We did this for each of the sampling 2All statistics were computed using the The R Project package (R Development Core Team, 2005). 3Note that in some cases, significance tests could not be computed due to the large number of ties (indicated by na in the table). 150
Technion - Computer Science Department - Ph.D. Thesis PHD-2008-12 - 2008 Table B.2: Testing Occam’s empirical principle when the trees are pruned. For each method we report the accuracy, tree size, percentage of pruned leaves, and Spearman’s correlation coefficient (ρ) averaged over all 10 partitions. We also report the number of times (out of 10) that a negative correlation between the size of the unpruned trees and their accuracy after pruning was found to be statistically significant with p = 0.95. RTG SID3 LSID3 Dataset Acc. Size %P ρ √ Acc. Size %P ρ √ Acc. Size %P ρ √ Breast-w 93±3 128±14 82 -0.1 9 93±3 108±11 77 -0.1 7 94±1 77±4 70 -0.3 7 Bupa 63±7 213±10 86 0 7 65±7 94±6 46 0 5 62±7 69±3 11 0 2 Car 81±3 647±77 93 -0.6 10 84±4 520±95 87 -0.7 10 92±2 285±13 56 -0.3 9 Cleveland 55±6 188±7 73 0 5 53±7 134±7 48 0 6 47±7 98±5 12 -0.1 7 Corral 75±20 15±3 67 -0.1 9 78±20 10±2 46 -0.3 10 86±13 7±1 9 0 7 Glass 59±9 135±8 73 -0.1 10 64±9 57±5 25 -0.2 10 68±8 39±3 5 -0.1 9 Hungerian 78±5 125±10 92 0 5 76±6 65±6 65 0 5 71±7 47±3 23 -0.1 5 Iris 92±7 39±9 82 -0.2 10 94±4 12±2 47 -0.1 7 94±3 8±0 20 -0.1 5 Monks-1 91±6 203±42 89 -0.6 10 95±6 113±55 77 -0.7 10 100±0 28±4 2 -0.1 na Monks-2 66±2 294±9 97 0 6 66±2 289±8 97 0 5 67±4 259±3 78 0 4 Monks-3 94±5 171±46 92 -0.6 10 97±3 77±33 84 -0.6 10 99±0 38±2 65 -0.2 na Mux-20 57±6 388±14 84 -0.1 10 58±7 249±13 61 -0.2 10 88±12 89±35 32 -0.9 10 Nursery 88±2 3271±551 97 -0.7 10 92±2 1583±295 90 -0.7 10 96±1 656±54 61 -0.5 10 Scale 69±4 394±11 90 0 3 69±4 389±11 90 0 4 69±4 352±5 89 0.2 0 Splice 63±5 1977±101 91 -0.6 10 67±5 1514±112 81 -0.6 10 92±2 355±23 58 -0.4 10 Tic-tac 78±4 468±34 88 -0.3 10 81±4 311±30 77 -0.4 10 87±3 166±11 36 -0.1 6 Voting 94±5 52±12 92 -0.3 10 96±2 26±5 89 -0.1 8 96±1 15±2 79 0 5 Wine 82±10 73±12 84 -0.2 10 91±7 13±3 22 -0.2 9 92±4 7±1 8 -0.1 6 Xor-5 55±12 136±8 85 -0.2 10 56±13 108±11 77 -0.4 10 97±7 39±11 18 -0.7 10 Zoo 90±8 24±5 53 -0.2 10 91±7 18±4 34 -0.2 9 94±4 11±1 5 0 na methods. Bins with less than 5 observations were discarded. Figure B.5 plots the results for the Nursery, Glass, and Multiplexer-20 datasets. The error bars represent confidence intervals, with α = 5%. Again, the graphs show a strong correlation between size and accuracy, confirming Occam’s empirical principle. For the Nursery and Multiplexer-20 datasets the correlation is strong for all 3 samplers. For Glass, the correlation is weaker when the trees are very small. These graphs, which represent each size range in its own bin, indicate that the positive support we showed for Occam’s principle is not a result of a size bias in our sampling methods. B.3.2 Pruned Decision Trees Formally, Occam’s empirical principle, as defined in Section B.1, is applicable only to consistent hypotheses. Most decision tree learners, however, do not necessarily 151
Page 1 and 2:
Technion - Computer Science Departm
Page 3 and 4:
Page 5 and 6:
Page 7 and 8:
Page 9 and 10:
Page 11 and 12:
Page 13 and 14:
Page 15 and 16:
Page 17 and 18:
Page 19 and 20:
Page 21 and 22:
Page 23 and 24:
Page 25 and 26:
Page 27 and 28:
Page 29 and 30:
Page 31 and 32:
Page 33 and 34:
Page 35 and 36:
Page 37 and 38:
Page 39 and 40:
Page 41 and 42:
Page 43 and 44:
Page 45 and 46:
Page 47 and 48:
Page 49 and 50:
Page 51 and 52:
Page 53 and 54:
Page 55 and 56:
Page 57 and 58:
Page 59 and 60:
Page 61 and 62:
Page 63 and 64:
Page 65 and 66:
Page 67 and 68:
Page 69 and 70:
Page 71 and 72:
Page 73 and 74:
Page 75 and 76:
Page 77 and 78:
Page 79 and 80:
Page 81 and 82:
Page 83 and 84:
Page 85 and 86:
Page 87 and 88:
Page 89 and 90:
Page 91 and 92:
Page 93 and 94:
Page 95 and 96:
Page 97 and 98:
Page 99 and 100:
Page 101 and 102:
Page 103 and 104:
Page 105 and 106:
Page 107 and 108:
Page 109 and 110:
Page 111 and 112:
Page 113 and 114:
Page 115 and 116: Technion - Computer Science Departm
Page 165: Technion - Computer Science Departm
show all

anytime algorithms for learning anytime classifiers saher ... - Technion

Create successful ePaper yourself

Delete template?

Save as template?