07.12.2012 Views

Play-Persona: Modeling Player Behaviour in Computer Games


7. RESULTS

This section presents the main findings of the clustering approaches applied to the data. A
pre-processing analysis of the data complements, and is followed by, the design of an ESOM approach
for unsupervised learning of the data and the identification of the different player styles.

7.1 Pre-processing and Initial Cluster Analysis

All six extracted features are uniformly normalized into [0, 1] before any clustering analysis is
performed. Note that the cause-of-death features (Do, De and Df) already lie in [0, 1], being
percentages of the total number of deaths.
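As an illustration of the normalization step, the following is a minimal sketch that assumes "uniformly normalized" means a linear min-max rescaling of each feature; the text does not state the exact formula. The function name `min_max_normalize` and the toy data are hypothetical and are not taken from the study.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` linearly into [0, 1].

    Assumption: min-max scaling; a constant column is mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Toy data (hypothetical): three players, two features each.
players = [[10.0, 0.2], [20.0, 0.5], [30.0, 0.8]]
scaled = min_max_normalize(players)
```

A feature that is already a fraction of a total, as the cause-of-death features are, needs no rescaling to lie in [0, 1]; in this sketch it would still be stretched to its observed minimum and maximum.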

To gain some initial insight into the possible number of clusters in the data, we apply the
k-means clustering algorithm to the normalized data for all k values less than or equal to 20. The
number of player observations (6-dimensional vector samples) and the sum of the Euclidean
distances between each player instance and its corresponding cluster centroid (quantization error)
are calculated for all 20 trials of the k-means algorithm. The analysis shows that the percent
decrease of the mean quantization error due to the increase of k is notably high when k = 3 and
k = 4. For k = 3 and k = 4 this value equals 19.06% and 13.11% respectively, while it lies between
7% and 2% for k > 4. Thus, the k-means clustering analysis provides the first indication of the
existence of 3 or 4 main clusters within the data.
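The procedure described above can be sketched as follows. This is an illustrative implementation only, not the code used in the study: it runs a plain Lloyd-style k-means with a deterministic farthest-point seeding (an assumption made here for reproducibility), computes the quantization error as the sum of Euclidean distances to the nearest centroid, and reports the percent decrease as k grows. All names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the current seeds.
        centroids.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def quantization_error(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid."""
    return sum(min(euclidean(p, c) for c in centroids) for p in points)


def percent_decreases(errors):
    """Percent drop in error for each step from k - 1 to k clusters."""
    return [100.0 * (prev - cur) / prev for prev, cur in zip(errors, errors[1:])]


# Toy data (hypothetical): three well-separated groups in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
ks = [1, 2, 3, 4, 5]
errors = [quantization_error(points, k_means(points, k)) for k in ks]
decreases = percent_decreases(errors)  # entry i belongs to k = ks[i + 1]
best_k = ks[decreases.index(max(decreases)) + 1]
```

On this toy data the largest percent decrease occurs at the step to k = 3, which matches the number of groups built into it. On real data the drop-off is usually less sharp, as the 19.06% and 13.11% figures reported for the player data suggest.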

An alternative approach to k-means for cluster analysis is hierarchical clustering. This
approach seeks to build a hierarchy of the clusters present in the data. The squared Euclidean
distance is used as a measure of dissimilarity between data vector pairs and Ward’s clustering
method [20] is utilized to specify the clusters in the data; the resulting dendrogram is depicted
in Fig. 3. (A dendrogram is a treelike diagram that illustrates the merging of data sets into
clusters. It consists of many U-shaped lines connecting the clusters, while the height of each U
represents the squared Euclidean distance between the two clusters being connected.)
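To make the merging process concrete, here is a minimal, naive sketch of agglomerative clustering with Ward's criterion. It is not the implementation used in the study. At each step it merges the pair of clusters whose union causes the smallest increase in the within-cluster sum of squares, and it records that increase as the merge height. Dendrogram tools may scale heights differently, so the values are not directly comparable to those in Fig. 3. The function name and toy data are hypothetical.

```python
def ward_merge_heights(points):
    """Naive Ward clustering; returns the cost of each merge in order."""
    # Each cluster is stored as (centroid, size).
    clusters = [([float(v) for v in p], 1) for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ca, na), (cb, nb) = clusters[i], clusters[j]
                sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward's criterion: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ca, na), (cb, nb) = clusters[i], clusters[j]
        merged = (
            [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)],
            na + nb,
        )
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        heights.append(cost)
    return heights


# Toy data (hypothetical): two tight pairs that are far apart.
heights = ward_merge_heights([(0, 0), (0, 1), (5, 5), (5, 6)])
```

The two tight pairs merge first at low cost, and the final merge between them is far more expensive. This jump in height is what appears as a tall U in a dendrogram.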

Depending on where the designer sets the squared Euclidean distance threshold, T, a different
number of clusters can be observed. For instance, 3, 4 and 5 clusters of data can be identified if
6.56 > T > 4.72, 4.72 > T > 4.25 and 4.25 > T > 3.74, respectively.
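The relation between the threshold and the cluster count can be sketched as a simple count of merges that lie above the cut. The sketch reads the quoted thresholds as the decimals 6.56, 4.72, 4.25 and 3.74. The height of the final merge (from two clusters to one) is not given in the text, so `math.inf` stands in for it as a placeholder. The constant and function names are hypothetical, and the result is only meaningful for 3.74 < T < 6.56, the range the text covers.

```python
import math

# Merge heights implied by the thresholds in the text, from the top of the
# dendrogram downwards. math.inf is a placeholder (assumption) for the
# unreported height of the final merge from two clusters to one.
MERGE_HEIGHTS = [math.inf, 6.56, 4.72, 4.25, 3.74]


def clusters_at_threshold(threshold, merge_heights=MERGE_HEIGHTS):
    """Number of clusters left when the dendrogram is cut at `threshold`.

    Every merge above the cut is undone, and each undone merge adds one cluster.
    """
    return 1 + sum(1 for h in merge_heights if h > threshold)


counts = {t: clusters_at_threshold(t) for t in (5.0, 4.5, 4.0)}
```

Cutting at T = 5.0, 4.5 and 4.0 yields 3, 4 and 5 clusters respectively, in agreement with the intervals stated above.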

Both clustering approaches demonstrate that the feature vectors of the 1365 players can be
clustered into a small number of different player types. k-means statistics provide indications
for 3 or 4 clusters, while Ward’s dendrogram shows the existence of 2 populated and 2 smaller
clusters, respectively, in the middle and at the edges of the illustration, resulting in four
clusters. By further splitting the
