25.10.2016

SAP HANA Predictive Analysis Library (PAL)


3.2.3 C4.5 Decision Tree

A decision tree is used as a classifier to determine an appropriate action or decision, from a predetermined set of actions, for a given case. It helps you identify which factors to consider and how each factor has historically been associated with the different outcomes of the decision. A decision tree uses a tree-like structure of conditions and their possible consequences. Each node of a decision tree is either a leaf node or a decision node:

● Leaf node: holds the value of the dependent (target) variable.
● Decision node: contains one condition that tests an attribute value. Each outcome of the condition leads to a branch, which ends in either a sub-tree or a leaf node.
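The two node types can be sketched as a small Python data structure. This is a minimal illustration, not PAL's internal tree model format; the class names `Leaf` and `DecisionNode`, the `classify` helper, and the weather-style example attributes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

@dataclass
class Leaf:
    """Leaf node: holds the predicted value of the target variable."""
    label: Any

@dataclass
class DecisionNode:
    """Decision node: tests one attribute; each outcome maps to a branch."""
    attribute: str
    branches: Dict[Any, "Node"] = field(default_factory=dict)
    default: Any = None  # label returned when an outcome has no branch

Node = Union[Leaf, DecisionNode]

def classify(node: Node, case: Dict[str, Any]) -> Any:
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while isinstance(node, DecisionNode):
        outcome = case.get(node.attribute)
        if outcome not in node.branches:
            return node.default
        node = node.branches[outcome]
    return node.label

# Hypothetical tree: "outlook" is tested first, then "windy" on rainy days.
tree = DecisionNode("outlook", {
    "sunny": Leaf("play"),
    "overcast": Leaf("play"),
    "rainy": DecisionNode("windy", {True: Leaf("stay"), False: Leaf("play")}),
}, default="play")

print(classify(tree, {"outlook": "rainy", "windy": True}))  # stay
```

Classification is a single root-to-leaf walk, so its cost is bounded by the depth of the tree.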

As a classification algorithm, C4.5 builds decision trees from a set of training data using the concept of information entropy. The training data is a set of already classified samples. At each node of the tree, C4.5 chooses the attribute that most effectively splits the samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy) that results from choosing an attribute to split the data, and the attribute with the highest normalized information gain is chosen to make the decision. C4.5 then proceeds recursively until a stopping criterion is met, such as the minimum number of cases in a leaf node.
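The attribute-selection step can be sketched in Python, assuming the usual C4.5 definitions: information gain is the parent entropy minus the size-weighted entropy of the subsets, and it is normalized by the split information (the entropy of the subset sizes) to give the gain ratio. This is a simplified sketch for discrete attributes only; it omits C4.5's refinement of considering only attributes with at least average gain, and the function names and sample rows are hypothetical, not part of PAL.

```python
import math
from collections import Counter, defaultdict
from typing import Any, Dict, List, Sequence

def entropy(labels: Sequence[Any]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows: List[Dict[str, Any]], attribute: str, target: str) -> float:
    """Information gain from splitting on `attribute`, divided by the split information."""
    n = len(rows)
    subsets: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    # Entropy remaining after the split, weighted by subset size.
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information: entropy of the subset sizes; penalizes many-valued attributes.
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows: List[Dict[str, Any]], attributes: List[str], target: str) -> str:
    """Pick the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))

# Hypothetical, already classified training samples.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
]
print(best_attribute(rows, ["outlook", "windy"], "play"))  # outlook
```

On these rows, `outlook` leaves only the two rainy samples mixed, so its gain ratio (about 0.42) is well above that of `windy` (about 0.08). A full builder would call `best_attribute` recursively on each subset until the stopping criterion is met.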

The C4.5 decision tree functions implemented in PAL support both discrete and continuous values. In the PAL implementation, Reduced Error Pruning (REP) is used as the pruning method.
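Reduced Error Pruning can be sketched as follows. The sketch assumes a separate set of labelled validation cases (how PAL obtains its pruning data is not described in this section) and works bottom-up: each decision node is replaced by a leaf carrying the majority training label at that node whenever doing so does not increase the error on the validation cases that reach it. The classes, the `majority` field, and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union

@dataclass
class Leaf:
    label: Any

@dataclass
class Decision:
    attribute: str
    branches: Dict[Any, "Node"] = field(default_factory=dict)
    majority: Any = None  # majority training label here; becomes the leaf label if pruned

Node = Union[Leaf, Decision]
Example = Tuple[Dict[str, Any], Any]  # (attribute values, true label)

def predict(node: Node, case: Dict[str, Any]) -> Any:
    """Follow branches to a leaf; fall back to the node's majority label for unseen outcomes."""
    while isinstance(node, Decision):
        child = node.branches.get(case.get(node.attribute))
        if child is None:
            return node.majority
        node = child
    return node.label

def errors(node: Node, examples: List[Example]) -> int:
    """Number of validation cases the (sub)tree misclassifies."""
    return sum(predict(node, x) != y for x, y in examples)

def reduced_error_prune(node: Node, validation: List[Example]) -> Node:
    """Bottom-up REP: prune children first, then try replacing this node with a leaf."""
    if isinstance(node, Leaf):
        return node
    for outcome, child in list(node.branches.items()):
        # Each child is judged only on the validation cases routed to it.
        routed = [(x, y) for x, y in validation if x.get(node.attribute) == outcome]
        node.branches[outcome] = reduced_error_prune(child, routed)
    leaf = Leaf(node.majority)
    # Ties favour the smaller tree.
    return leaf if errors(leaf, validation) <= errors(node, validation) else node

# Hypothetical tree and validation cases.
tree = Decision("outlook", {
    "sunny": Decision("windy", {True: Leaf("no"), False: Leaf("yes")}, majority="no"),
    "overcast": Leaf("yes"),
}, majority="yes")
validation = [
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "overcast", "windy": False}, "yes"),
]
pruned = reduced_error_prune(tree, validation)
print(pruned.branches["sunny"])  # Leaf(label='no')
```

Here the `windy` test under `sunny` is removed because a single `no` leaf fits the validation cases at least as well, while the root split is kept because collapsing it would misclassify both sunny cases.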

Prerequisites
● The column order and column number of the predicted data are the same as those used in tree model building.
● The last column of the training data is used as the predicted field and is of discrete type. The predicted data set has an ID column.
● The table used to store the tree model is a column table.
● The target column of the training data must not contain null values, and every other column should have at least one valid (non-null) value.
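The null-related training-data prerequisites can be checked on the client side before calling PAL. The sketch below is a hypothetical helper, not part of PAL: it represents the table as a list of rows in column order, with Python `None` standing in for SQL NULL, and checks only the null constraints on the training data.

```python
from typing import Any, List, Optional, Sequence

def check_training_data(columns: Sequence[str],
                        rows: List[Sequence[Optional[Any]]]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: rows are tuples in column order and None stands in for NULL.
    """
    problems: List[str] = []
    target = columns[-1]  # the last column is the predicted (target) field
    if any(row[-1] is None for row in rows):
        problems.append(f"target column {target!r} contains nulls")
    for i, name in enumerate(columns[:-1]):
        if all(row[i] is None for row in rows):
            problems.append(f"column {name!r} has no non-null value")
    return problems

columns = ["OUTLOOK", "TEMP", "CLASS"]
rows = [("sunny", None, "no"), ("rainy", None, None)]
print(check_training_data(columns, rows))
```

Both problems are reported here: `CLASS` contains a null, and `TEMP` has no valid value at all.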

Note
The C4.5 decision tree treats null as a special value.
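One plausible reading of treating null as a special value is that a missing attribute value forms its own branch outcome instead of being dropped or imputed. The sketch below illustrates that reading only; PAL's internal handling is not described in this section, and the `partition` helper is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

def partition(rows: List[Dict[str, Any]], attribute: str) -> Dict[Any, List[Dict[str, Any]]]:
    """Group rows by the value of `attribute`.

    A missing value (None) is kept as its own group, i.e. as one more branch
    outcome, instead of being dropped or replaced.
    """
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row.get(attribute)].append(row)
    return dict(groups)

rows = [{"color": "red"}, {"color": None}, {"color": "blue"}, {"color": None}]
print({k: len(v) for k, v in partition(rows, "color").items()})  # {'red': 1, None: 2, 'blue': 1}
```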

CREATEDTWITHC45

This function creates a decision tree from the input training data.

Procedure Generation

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'CREATEDTWITHC45', '', '', );
