Play-Persona: Modeling Player Behaviour in Computer Games
7. RESULTS
This section presents the main findings of the clustering approaches applied to the data. A
pre-processing analysis of the data complements, and is followed by, the design of the ESOM approach
for unsupervised learning of the data and the identification of the different player styles.
7.1 Pre-processing and Initial Cluster Analysis
All six extracted features are uniformly normalized into [0, 1] before any clustering analysis is
performed. Note that the cause-of-death features (Do, De and Df) are already normalized in [0, 1],
being percentages of the total number of deaths.
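
The normalization step can be illustrated with a minimal sketch. It assumes that "uniformly normalized" means linear min-max scaling of each feature column, which the text does not state explicitly; the function name `min_max_normalize` and the toy data are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale each feature column of `rows` linearly into [0, 1].

    Assumption: min-max scaling; the source only says "uniformly normalized".
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical toy data: three players, two features each.
players = [[10.0, 0.2], [20.0, 0.5], [30.0, 0.8]]
normalized_players = min_max_normalize(players)
```

After scaling, every feature contributes on the same range to the Euclidean distances used by the clustering algorithms that follow.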
To gain a first insight into the number of clusters present in the data, we apply the k-means
clustering algorithm to the normalized data for all k values less than or equal to 20. The
number of player observations (6-dimensional vector samples) and the sum of the Euclidean
distances between each player instance and its corresponding cluster centroid (quantization error)
are calculated for all 20 trials of the k-means algorithm. The analysis shows that the percent
decrease of the mean quantization error due to the increase of k is notably high when k = 3 and
k = 4: it equals 19.06% and 13.11%, respectively, while it lies between 7% and 2% for k > 4.
Thus, the k-means clustering analysis provides the first indication of the existence of 3 or 4
main clusters within the data.
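
The kind of analysis described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: it uses a plain Lloyd's k-means with random restarts, takes the mean distance of each sample to its nearest centroid as the mean quantization error, and runs on hypothetical two-dimensional toy data rather than the six-dimensional player features. All function names and parameters are assumptions.

```python
import math
import random


def kmeans(points, k, rng, iterations=50):
    """Plain Lloyd's k-means; returns the final centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def mean_quantization_error(points, centroids):
    """Mean Euclidean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def error_curve(points, max_k, restarts=30, seed=0):
    """Best mean quantization error found for each k in 1..max_k."""
    rng = random.Random(seed)
    return {
        k: min(
            mean_quantization_error(points, kmeans(points, k, rng))
            for _ in range(restarts)
        )
        for k in range(1, max_k + 1)
    }


def percent_decrease(errors):
    """Percent drop in error when moving from k - 1 to k clusters."""
    return {
        k: 100.0 * (errors[k - 1] - errors[k]) / errors[k - 1]
        for k in sorted(errors)
        if k - 1 in errors
    }


# Hypothetical toy data: three well-separated blobs in the unit square.
data_rng = random.Random(1)
blob_centres = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
samples = [
    [data_rng.gauss(cx, 0.03), data_rng.gauss(cy, 0.03)]
    for cx, cy in blob_centres
    for _ in range(30)
]

errors = error_curve(samples, max_k=5)
decreases = percent_decrease(errors)
```

On data with a few dominant groups, the percent decrease is large up to the true number of groups and flattens afterwards, which is the pattern the text reports for k = 3 and k = 4.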
Hierarchical clustering is an alternative to k-means for cluster analysis. This approach seeks
to build a hierarchy of the clusters present in the data. The squared Euclidean distance
is used as a measure of dissimilarity between data vector pairs and Ward’s clustering method [20]
is utilized to specify the clusters in the data; the resulting dendrogram is depicted in Fig. 3. (A
dendrogram is a treelike diagram that illustrates the merging of data sets into clusters. It consists
of many U-shaped lines connecting the clusters, and the height of each U represents the squared
Euclidean distance between the two clusters being connected.)
Depending on where the designer sets the squared Euclidean distance threshold, T, a different
number of clusters can be observed. For instance, 3, 4 and 5 clusters of data can be identified if
6.56 > T > 4.72, 4.72 > T > 4.25 and 4.25 > T > 3.74, respectively.
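
The relation between the threshold T and the number of clusters can be sketched with a naive agglomerative procedure. This is an illustrative assumption, not the authors' code: the merge height used here is the increase in within-cluster sum of squares (a common statement of Ward's criterion), whose scale may differ from the heights plotted in Fig. 3. The function names and toy points are hypothetical, and the cubic-time loop is only suitable for small examples.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_merge_heights(points):
    """Agglomerate with Ward's criterion; return the cost of each merge in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))) * squared_distance(
                    centroid(a), centroid(b)
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        heights.append(cost)
    return heights


def clusters_at_threshold(heights, threshold):
    """Number of clusters obtained by cutting the dendrogram at `threshold`."""
    return 1 + sum(1 for h in heights if h > threshold)


# Hypothetical toy data: three tight groups of three points each.
toy_points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
    [0.0, 1.0], [0.1, 1.0], [0.0, 1.1],
]
heights = ward_merge_heights(toy_points)
three = clusters_at_threshold(heights, 1.0)
two = clusters_at_threshold(heights, 2.0)
```

Lowering the threshold undoes more merges and therefore yields more clusters, which mirrors the intervals of T listed above. For a data set of the size used in the paper, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be a more practical choice, keeping in mind that its reported heights follow a different scaling.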
Both clustering approaches demonstrate that the feature vectors of the 1365 players can be clustered
into a low number of different player types. The k-means statistics provide indications for 3 or 4
clusters, while Ward’s dendrogram shows the existence of 2 populated and 2 smaller clusters, in the
middle and at the edges of the illustration respectively, resulting in four clusters. By further
splitting the