25.10.2016 Views

SAP HANA Predictive Analysis Library (PAL)


PAL_GMM_RESULTSMODEL_TBL:

3.1.7 K-Means

In predictive analysis, k-means clustering is a method of cluster analysis. The k-means algorithm partitions n observations or records into k clusters in which each observation belongs to the cluster with the nearest center. In marketing and customer relationship management, the algorithm is applied to customer data to track customer behavior and support strategic business initiatives. Organizations can thus divide their customers into segments based on variables such as demographics, customer behavior, customer profitability, risk, lifetime value, or retention probability.
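For reference, "each observation belongs to the cluster with the nearest center" corresponds to the standard k-means objective. The formulation below is the textbook one with squared Euclidean distance, written with the means m_1, ..., m_k used later in this section; it is not taken from the PAL documentation, and the other distance measures PAL offers would change the distance term.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - m_j \rVert^2,
\qquad
m_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

Neither step of the iterative refinement described below increases this sum, which is why the iteration eventually stops changing the assignments.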

Clustering groups records according to an algorithm or mathematical formula that attempts to find centroids, or centers, around which similar records gravitate. The most common algorithm uses an iterative refinement technique and is also referred to as Lloyd's algorithm:

Given an initial set of k means m1, ..., mk, the algorithm proceeds by alternating between two steps:

● Assignment step: assigns each observation to the cluster with the closest mean.
● Update step: recalculates each mean as the center (centroid) of the observations assigned to its cluster.

The algorithm repeats these two steps until the assignments no longer change.
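The two steps above can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm under stated assumptions, not PAL's implementation: it uses Euclidean distance only, takes the initial means as an argument, keeps a mean unchanged if its cluster becomes empty, and caps the number of iterations with a hypothetical `max_iter` parameter.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lloyd_kmeans(points: List[Point], means: List[Point], max_iter: int = 100):
    """Alternate assignment and update steps until assignments stop changing.

    Returns (means, assignments), where assignments[i] is the cluster
    index of points[i]. max_iter is a hypothetical safety cap.
    """
    means = [tuple(m) for m in means]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: index of the closest mean for each observation.
        new_assignments = [
            min(range(len(means)), key=lambda j: euclidean(p, means[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no assignment changed, so the algorithm has converged
        assignments = new_assignments
        # Update step: each mean becomes the centroid of its assigned observations.
        for j in range(len(means)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: keep the old mean if a cluster is empty
                means[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return means, assignments
```

For example, with `data = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]` and initial means `data[0]` and `data[3]`, `lloyd_kmeans(data, [data[0], data[3]])` converges after one update with assignments `[0, 0, 0, 1, 1, 1]`.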

The k-means implementation in PAL supports multithreading, data normalization, several distance measures, and cluster quality measurement (silhouette). The implementation does not support categorical data directly, but such data can be handled through data transformation, as described below. Two methods for choosing the starting centers are supported: first K and random K.
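The two starting methods and the silhouette score can be sketched as below. The function names are hypothetical, distance is fixed to Euclidean, and "random K" is assumed to mean drawing k distinct records at random; PAL's exact definitions (including how it scores points in singleton clusters) may differ. The silhouette shown is the standard per-point score s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all points, where values near 1 indicate compact, well-separated clusters and negative values suggest misassigned points.

```python
import math
import random
from typing import Optional, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance only; PAL supports several measures, this sketch uses one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_first_k(points, k):
    """'First K' (assumed meaning): the first k records become the initial centers."""
    return [tuple(p) for p in points[:k]]


def init_random_k(points, k, seed: Optional[int] = None):
    """'Random K' (assumed meaning): k distinct records drawn at random."""
    rng = random.Random(seed)
    return [tuple(p) for p in rng.sample(list(points), k)]


def silhouette(points, labels) -> float:
    """Mean silhouette over all points.

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of another cluster.
    Points in singleton clusters get s(i) = 0 (a common convention, assumed here).
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

With the four points `(0, 0), (0, 1), (10, 0), (10, 1)`, the labeling `[0, 0, 1, 1]` scores about 0.90, while the poor labeling `[0, 1, 0, 1]` scores below zero.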

Support for Categorical Attributes

If an attribute is of category type, it is converted to a binary vector and then used as a numerical attribute. For example, in the table below, "Gender" is of category type.

Table 34:

| Customer ID | Age | Income | Gender |
|---|---|---|---|
| T1 | 31 | 10,000 | Female |
| T2 | 27 | 8,000 | Male |

Because "Gender" has two distinct values, it will be converted into a binary vector with two dimensions:

Table 35:

| Customer ID | Age | Income | Gender_1 | Gender_2 |
|---|---|---|---|---|
| T1 | 31 | 10,000 | 0 | 1 |
| T2 | 27 | 8,000 | 1 | 0 |
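The conversion from Table 34 to Table 35 can be sketched as a one-hot encoding. The helper `one_hot` is hypothetical. Table 35 maps Male to Gender_1 and Female to Gender_2, so the order of the binary columns is not simply the order in which values appear; this sketch lets the caller pass the category order explicitly and otherwise falls back to sorted order, which is an assumption rather than PAL's documented behavior.

```python
from typing import Dict, List, Optional, Sequence


def one_hot(rows: List[Dict[str, object]], column: str,
            categories: Optional[Sequence[str]] = None) -> List[Dict[str, object]]:
    """Replace a categorical column with binary columns column_1 .. column_m.

    categories fixes which value maps to column_1, column_2, ...; if omitted,
    the sorted distinct values are used (assumed order, not PAL's).
    """
    if categories is None:
        categories = sorted({str(r[column]) for r in rows})
    out = []
    for r in rows:
        # Copy every other attribute unchanged, dropping the categorical one.
        new = {k: v for k, v in r.items() if k != column}
        for idx, cat in enumerate(categories, start=1):
            new[f"{column}_{idx}"] = 1 if str(r[column]) == cat else 0
        out.append(new)
    return out


table_34 = [
    {"Customer ID": "T1", "Age": 31, "Income": 10000, "Gender": "Female"},
    {"Customer ID": "T2", "Age": 27, "Income": 8000, "Gender": "Male"},
]
# Passing ["Male", "Female"] reproduces the column order shown in Table 35.
for row in one_hot(table_34, "Gender", categories=["Male", "Female"]):
    print(row)
```

The binary columns can then be fed to the clustering step like any other numerical attribute.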
