- Page 1: Nathalie Japkowicz Mohak Shah Unive
- Page 5 and 6: Example I: Which Classifier is better
- Page 7 and 8: Example III: What do our results me
- Page 9 and 10: Book on which the tutorial is based
- Page 11 and 12: What these steps depend on These s
- Page 14 and 15: Performance Measures Outline Ontol
- Page 16 and 17: Confusion Matrix-Based Perform
- Page 18 and 19: Pairs of Measures and Compounded Me
- Page 20 and 21: Some issues with performance measur
- Page 22 and 23: Some issues with performance measur
- Page 24 and 25: Graphical Measures ROC Analysis, AU
- Page 26 and 27: AUC
- Page 28 and 29: Recent Developments I: Smooth ROC C
- Page 30 and 31: Recent Developments II: The H Measu
- Page 32 and 33: Probabilistic Measures I: RMSE The
- Page 34 and 35: Other Measures I: A Multi-Crite
- Page 36 and 37: True class Pos Neg Yes 82 17 No 12
- Page 38 and 39: Illustration on a Multiclass domain
- Page 40 and 41: Multiple Annotations
- Page 42 and 43: Such measurements are also desired
- Page 44 and 45: General Agreement Statistic
- Page 46: Multiple raters over multiple classes
- Page 49 and 50: Hold-out approach Set aside a
- Page 51 and 52: A Tighter bound Based on binomial
- Page 53 and 54: Binomial vs. Gaussian assumptions B
- Page 55 and 56: Hold-out sample size bound Th
- Page 57 and 58: What implicitly guides re-samp
- Page 59 and 60: An ontology of error estimation techn
- Page 61 and 62: Simple Resampling: Some variations o
- Page 63 and 64: Observations Leave-One-Ou
- Page 65 and 66: Multiple Resampling: Bootstrapping
- Page 67 and 68: Multiple Resampling: ε0 Bootstrappi
- Page 69 and 70: Discussion Bootstrap can be useful
- Page 71: What to watch out when selecting err
- Page 74 and 75: Statistical Significance Testing Stat
- Page 76 and 77: Choosing a Statistical Test There ar
- Page 78 and 79: Issues with hypothesis testing: But
- Page 80 and 81: Parametric vs. Non-parametric
- Page 82 and 83: Comparing 2 algorithms on a single
- Page 84 and 85: Comparing 2 algorithms on a single
- Page 86 and 87: Effect size A typical interpretati
- Page 88 and 89: Comparing 2 algorithms on a single
- Page 90 and 91: Comparing 2 algorithms on a single
- Page 92 and 93: Comparing 2 algorithms on a single
- Page 94 and 95: Comparing 2 algorithms on a multiple
- Page 96 and 97: Comparing 2 algorithms on a multiple
- Page 98 and 99: Comparing multiple algorithms on a m
- Page 100 and 101: Illustration of the Friedman test Do
- Page 102 and 103: Nemenyi Test: Illustration Computin
- Page 104 and 105: Considerations to keep in mind while
- Page 106 and 107: Pros and Cons of Repository Data P
- Page 108 and 109: Pros and Cons of Web-Based Exc
- Page 110 and 111: Evaluation Space Mapping Artificial
- Page 114 and 115: Where to look for evaluation metrics
- Page 116 and 117: Where to look for Statistical Tests?
- Page 118 and 119: Some Concluding Remarks
- Page 120: References Too many to put down he
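The outline above only names its topics. The sketch below is not the tutorial's own code. It shows how the confusion-matrix-based measures listed for pages 16 and 17 can be computed from the four cell counts. It also computes Cohen's kappa, one common chance-corrected agreement statistic; the slides' exact agreement formulation is not visible in this outline. The helper names `safe_div` and `confusion_measures`, the zero-denominator convention, and the example counts are all hypothetical.

```python
def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when den is zero (an assumed convention, not a standard)."""
    return num / den if den else 0.0


def confusion_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary performance measures from the four cells of a confusion matrix.

    tp, fp, fn, tn: counts of true positives, false positives,
    false negatives and true negatives (hypothetical helper).
    """
    total = tp + fp + fn + tn
    accuracy = safe_div(tp + tn, total)
    precision = safe_div(tp, tp + fp)      # positive predictive value
    recall = safe_div(tp, tp + fn)         # sensitivity / true positive rate
    specificity = safe_div(tn, tn + fp)    # true negative rate
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Cohen's kappa: observed agreement p_o corrected by the chance agreement p_e
    # implied by the predicted (row) and true (column) marginal totals.
    p_o = accuracy
    p_e = safe_div((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), total ** 2)
    kappa = safe_div(p_o - p_e, 1 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(confusion_measures(tp=40, fp=10, fn=10, tn=40))
```

With these illustrative counts, accuracy, precision, recall and F1 all come out near 0.8. Kappa is near 0.6, because half of the observed agreement would be expected by chance from the balanced marginals.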