25.10.2016 Views

SAP HANA Predictive Analysis Library (PAL)


3.1.4 Cluster Assignment

Cluster assignment assigns new data to clusters that were previously generated by a clustering method such as K-means, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), or SOM (Self-Organizing Maps).

This algorithm requires that the corresponding clustering procedure save its cluster information, or cluster model, which also includes the control parameters for consistency. It assumes that the new data comes from a distribution similar to that of the previous data, and it does not update the cluster information.

For clusters generated by K-means, the distance between each piece of new data and every cluster center is calculated, and the new data is assigned to the cluster whose center is nearest.
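
The nearest-center rule can be sketched in a few lines of Python. This is an illustration only, not the PAL implementation: the function name `assign_to_nearest_center`, the sample centers, and the use of Euclidean distance are assumptions made for the example.

```python
import math


def assign_to_nearest_center(point, centers):
    """Return (cluster_index, distance) for the center closest to `point`.

    Assumption: Euclidean distance. The saved cluster model may specify a
    different distance measure, which would replace math.dist here.
    """
    best_index, best_distance = -1, math.inf
    for index, center in enumerate(centers):
        distance = math.dist(point, center)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# Hypothetical cluster centers taken from an earlier K-means run.
centers = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 1.0), (9.0, 8.0)]

# The centers are only read, never updated, matching the description above.
assignments = [assign_to_nearest_center(p, centers)[0] for p in new_points]
```

The centers are treated as read-only, which mirrors the statement that cluster assignment does not update the cluster information.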

For clusters generated by DBSCAN, all core objects are stored. For each piece of new data, the algorithm tries to find a core object in some formed cluster whose distance to the new data is less than the value of the RADIUS parameter. If such a core object is found, the new data is assigned to the corresponding cluster; otherwise it is assigned to cluster -1, indicating that it is noise. A piece of data may be within RADIUS of core objects from more than one cluster, which is handled in the following two cases:

● If the number of core objects whose distance to the new data is less than RADIUS is less than the MINPTS parameter value, meaning that the new data is a border object, the new data is assigned to the cluster containing the core object nearest to the new data.

● If the number of such core objects is not less than MINPTS, which means the new data is also a core object, it is assigned to cluster -2, indicating that it belongs to more than one cluster. In this case, re-running the DBSCAN function is strongly recommended.
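
The DBSCAN rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PAL implementation: the function name `assign_dbscan`, the representation of core objects as `(coordinates, cluster_id)` pairs, Euclidean distance, and the reading that the count compared against MINPTS is the number of core objects closer than RADIUS are all assumptions made for the example.

```python
import math

NOISE = -1              # no core object within RADIUS
MULTIPLE_CLUSTERS = -2  # new data is itself a core object spanning clusters


def assign_dbscan(point, core_objects, radius, minpts):
    """Assign `point` using stored core objects.

    core_objects: iterable of (coordinates, cluster_id) pairs (assumed layout).
    """
    # Collect every core object closer than RADIUS to the new point.
    neighbours = []
    for coords, cluster_id in core_objects:
        distance = math.dist(point, coords)
        if distance < radius:
            neighbours.append((distance, cluster_id))

    if not neighbours:
        return NOISE

    clusters = {cluster_id for _, cluster_id in neighbours}
    if len(clusters) == 1:
        return clusters.pop()

    # The point is near core objects from more than one cluster.
    if len(neighbours) < minpts:
        # Border object: take the cluster of the nearest core object.
        # Ties on distance fall back to the smaller cluster id.
        return min(neighbours)[1]
    return MULTIPLE_CLUSTERS


# Hypothetical core objects from two clusters laid out along the x-axis.
core_objects = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((5.0, 0.0), 1), ((6.0, 0.0), 1)]

inside = assign_dbscan((0.5, 0.5), core_objects, radius=2.5, minpts=3)
noise = assign_dbscan((20.0, 20.0), core_objects, radius=2.5, minpts=3)
border = assign_dbscan((3.2, 0.0), core_objects, radius=2.5, minpts=3)
shared = assign_dbscan((3.2, 0.0), core_objects, radius=2.5, minpts=2)
```

In this sketch the same point at (3.2, 0.0) is a border object when MINPTS is 3 and a core object shared between clusters when MINPTS is 2, which shows how the two cases differ only in the neighbour count.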

For clusters generated by SOM, as with K-means, the distances between the new data and the weight vectors are calculated, and the new data is assigned to the cluster with the smallest distance.
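
For SOM, the lookup can be pictured as finding the map unit whose weight vector is closest to the new data. The sketch below is illustrative only: the name `best_matching_unit`, the dictionary keyed by `(row, column)` grid position, and Euclidean distance are assumptions, and how map units correspond to cluster IDs in a saved model is not covered here.

```python
import math


def best_matching_unit(point, weights):
    """Return the grid position whose weight vector is nearest to `point`.

    weights: dict mapping (row, col) -> weight vector (assumed layout).
    """
    return min(weights, key=lambda unit: math.dist(point, weights[unit]))


# Hypothetical 2 x 2 map with two-dimensional weight vectors.
weights = {
    (0, 0): (0.0, 0.0),
    (0, 1): (0.0, 1.0),
    (1, 0): (1.0, 0.0),
    (1, 1): (1.0, 1.0),
}

bmu = best_matching_unit((0.9, 0.2), weights)
```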

Prerequisites

● No missing or null data in the inputs.

● Data types must be identical to those in the corresponding clustering procedure.

● The data types of the ID columns in the data input table and the result output table must be identical.
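
Before calling the function, the first two prerequisites can be checked on the client side. The following is a rough sketch, assuming the input rows have already been fetched as Python tuples; the helper `check_prerequisites` and the `expected_types` tuple are hypothetical and are not part of PAL.

```python
import math


def check_prerequisites(rows, expected_types):
    """Return a list of problems found in `rows`; an empty list means OK.

    expected_types: one Python type per column, mirroring the column types
    used by the original clustering run (an assumption for this sketch).
    """
    problems = []
    for row_number, row in enumerate(rows):
        if len(row) != len(expected_types):
            problems.append(f"row {row_number}: wrong number of columns")
            continue
        for column, (value, expected) in enumerate(zip(row, expected_types)):
            # Treat None and NaN as missing data.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {row_number}, column {column}: missing value")
            elif type(value) is not expected:
                problems.append(
                    f"row {row_number}, column {column}: "
                    f"expected {expected.__name__}, got {type(value).__name__}"
                )
    return problems


# Hypothetical input: an integer ID followed by two numeric features.
expected_types = (int, float, float)
rows = [
    (1, 0.5, 1.2),
    (2, None, 3.0),   # missing value
    (3, 1.0, "x"),    # wrong type
]

problems = check_prerequisites(rows, expected_types)
```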

CLUSTERASSIGNMENT

This function assigns data directly to clusters based on a previously saved cluster model, without re-running the full clustering procedure. It currently supports the K-means, DBSCAN, and SOM clustering methods.

Procedure Generation

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE ('AFLPAL', 'CLUSTERASSIGNMENT', '', '', );
