ule, which progressively refines the classification and can also be used to classify future data.

The structure of a decision tree and its components (the root node, branches, decision nodes and leaf nodes) is shown in Figure 3.4. Once built, a decision tree can be used to classify an instance by starting at the root node of the tree and moving through it until a terminal node is encountered. The class assigned to that terminal node then corresponds to the predicted outcome for the instance in question.

Rule Based Classifiers

The purpose of a classification technique is to find a classification model that summarizes the relationship between a set of attributes and the class. The classifier is defined by a set of rules. Decision tree induction is a supervised learning technique, and the resulting tree can be converted into rules that represent a summary of the knowledge content of the data. The rules of such a classifier are determined by the paths from the root to the leaf nodes of the decision tree, with one rule created for each root-to-leaf path and each branch along the path contributing a condition. Every path can therefore be expressed as a nested 'if-statement', which can be read directly as a rule.

Most decision tree induction algorithms are based on the original work of Hunt et al. [36]. Other well-known decision tree algorithms are CART [9], C4.5 [60] and OC1 [46]. These algorithms differ in how they handle missing attribute values and continuous attributes, in their splitting criteria, and in how they establish the size of the tree.

Pruning

The maximal tree gives the best possible prediction on the data set it was trained on, i.e. every instance is correctly classified by following the generated rules. This is referred to as 100% predictive accuracy or a misclassification rate of 0.0, where the misclassification rate is defined as the ratio of wrongly classified instances to the total number of instances n.

However, a maximal tree is unlikely to classify new data very well because it has been tailored to the idiosyncrasies of one particular data set. Data sets contain a certain amount of noise, which can arise from errors in measuring the attribute and class values of an instance. There is also uncertainty in the selection of the attribute set, and not all relevant properties of an instance are known. If a classifier fits the instances too closely, it may fit noisy instances rather than a representative sample of the population. This is undesirable because the decision tree will not necessarily fit other data sets from the same population equally well. This phenomenon is well known in data mining as 'over-fitting' and poses a challenge to all predictive and regression models [6]. The success of any predictive model depends on its ability to distinguish between noisy instances and genuine patterns in the data.

The remedy for over-fitting is the use of a separate test data set together with a pruning technique in which parts of the tree are eliminated. The premise is that both the training and test data sets contain similar errors or noise. The misclassification error on the test set is calculated for every sub-tree (sharing the same root), and the sub-tree with the smallest error is selected for the final model.

While the misclassification error with respect to the training data decreases with the tree's complexity, the error with respect to the test set reaches an optimum somewhere between the root of the tree and the maximal tree.
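As a rough illustration of this selection procedure, the sketch below grows a maximal tree, generates a nested sequence of pruned sub-trees, and keeps the one with the smallest test-set misclassification rate. It is a minimal example only: scikit-learn, its cost-complexity pruning path and the synthetic data set are assumptions made here for illustration and are not part of the thesis.

```python
# Minimal pruning sketch (assumptions: scikit-learn, numpy, synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a data set of attribute values and class labels.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Candidate pruning strengths; alpha = 0 corresponds to the maximal tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

best_alpha, best_error = 0.0, np.inf
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    # Misclassification rate: wrongly classified instances / total instances n.
    test_error = np.mean(tree.predict(X_test) != y_test)
    if test_error < best_error:
        best_alpha, best_error = alpha, test_error

print(f"selected alpha = {best_alpha:.4f}, "
      f"test misclassification rate = {best_error:.3f}")
```

The training error is smallest for the maximal tree, but the sub-tree chosen here is the one whose test-set error is lowest, which matches the trade-off described above.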
When the size of the tree is too small, under-fitting occurs: an under-fitted decision tree will perform poorly on the training data set as well as on an independent data set. When the size of the tree is too large, over-fitting occurs: an over-fitted decision tree will perform perfectly on the training data set but poorly on an independent data set. The pruned tree has the optimal size and is expected to perform well on both training and test data. Not only does the final pruned tree perform well as a predictive model; the number of instances supporting each leaf node will be larger than in the maximal tree, and the rules derived from it will be less complex than those of the maximal tree.
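Those derived rules are simply the nested if-statements described under Rule Based Classifiers above. As a purely hypothetical illustration, the sketch below hand-codes a small pruned tree over two invented attributes, income and age, and lists the one rule obtained from each root-to-leaf path; the attribute names, thresholds and class labels are assumptions for illustration, not results from the thesis.

```python
# Hypothetical pruned decision tree written as nested if-statements.
# Each root-to-leaf path corresponds to exactly one classification rule.

def classify(instance):
    """Follow one root-to-leaf path and return the class of the leaf reached."""
    if instance["income"] <= 30_000:        # root node test
        if instance["age"] <= 25:           # internal decision node
            return "high vulnerability"     # leaf node (rule 1)
        return "medium vulnerability"       # leaf node (rule 2)
    return "low vulnerability"              # leaf node (rule 3)

# The same tree expressed explicitly as rules, one per root-to-leaf path.
rules = [
    "IF income <= 30000 AND age <= 25 THEN class = high vulnerability",
    "IF income <= 30000 AND age >  25 THEN class = medium vulnerability",
    "IF income >  30000               THEN class = low vulnerability",
]

print(classify({"income": 25_000, "age": 22}))   # -> high vulnerability
for rule in rules:
    print(rule)
```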
