28.12.2013 Views

Using Cluster Analysis in Persona Development


can use the standard measurement scale tool to convert the responses into the 1 to 7 rating. The questions asked were:

1: If you are a teacher, what course do you prefer to teach?
A: Courses that discuss facts
B: Courses that discuss theory

2: Which one do you think is a better compliment?
A: You are rational
B: You are emotional

3: When making a decision, which one is more important?
A: Take all factors into consideration
B: Focus on the feelings and viewpoints of people

etc.

4) Online Survey

We notified each participant by email and telephone of the online survey's web address and purpose. The participants were asked to fill in the online questionnaire (see Figure 1). The results of the survey were imported directly into our database.

We used cluster analysis to group the users into subgroups (vertical rows). We entered the data (see Figure 2) into statistical software and obtained the output shown in Figure 3.

Because complete-linkage clustering with Euclidean distance is simple and quite efficient, we chose them as the linkage rule and the distance measure. The steps in our cluster analysis calculations are:

Step 0: Each participant is first put in a separate cluster. This means there are 24 clusters initially, denoted $C_1, C_2, \ldots, C_{24}$. At this stage the distance between two clusters is defined to be the distance between the two participants they contain; that is, $d_{C_i C_j} = d_{ij}$. Let $t = 1$ be the index of the iterative process.

Figure 1. The screenshot of the online survey form¹

Figure 2. Data matrix based on users’ goals

Figure 3. The statistical output of the user clustering

B. Data Analysis - Clustering

We organized the survey results into a data matrix (see Figure 2), in which the columns are the 27 dimensions and the rows are the participants' records. Note that personal demographic information such as age, income, gender, and job was not used in the cluster analysis and therefore does not appear in this data matrix.
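
To make the shape of this data matrix concrete, the following minimal Python sketch builds a 24-by-27 matrix of ratings and computes the pairwise Euclidean distances between participants. The ratings are random placeholders on the 1 to 7 scale, not the survey data, and the names `data_matrix` and `euclidean_distances` are assumptions made for this illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the placeholder data is reproducible
N_PARTICIPANTS, N_DIMENSIONS = 24, 27

# Placeholder ratings on the 1-to-7 scale; these are NOT the study's data.
# Each row is one participant, each column one of the 27 dimensions.
data_matrix = [
    [random.randint(1, 7) for _ in range(N_DIMENSIONS)]
    for _ in range(N_PARTICIPANTS)
]


def euclidean_distances(rows):
    """Return the square matrix of pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


distances = euclidean_distances(data_matrix)
```

Here `distances[i][j]` is the distance between participants i+1 and j+1. The matrix is symmetric with zeros on its diagonal.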

Step 1: Find the smallest distance between any two clusters. Denote these closest clusters $C_i$ and $C_j$.

Step 2: Amalgamate clusters $C_i$ and $C_j$ to form a new cluster, denoted $C_{n+t}$, where $n = 24$ is the number of participants.

Step 3: Define the distance between the new cluster $C_{n+t}$ and each remaining cluster $C_k$ by the complete-linkage rule:

$d_{C_{n+t} C_k} = \max\{d_{C_i C_k}, d_{C_j C_k}\}$.

Step 4: Add cluster $C_{n+t}$ as a new cluster and remove clusters $C_i$ and $C_j$. Let $t = t + 1$.

Step 5: Return to Step 1 and continue until only one cluster of size 24 remains.
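
Steps 0 to 5 can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used by the statistical software: the function name `agglomerate`, its `update` parameter, and the four-participant distance matrix are all assumptions made for the example. With `update=max` the distance update is the complete-linkage rule named in the text; passing `min` instead would give single linkage.

```python
import itertools


def agglomerate(distances, update=max):
    """Merge clusters pairwise until one remains.

    `distances` is a symmetric matrix of participant distances.
    Returns the merge history as (i, j, distance, new_label) tuples.
    """
    n = len(distances)
    # Step 0: every participant starts in its own cluster, labelled 1..n.
    members = {label: [label] for label in range(1, n + 1)}
    d = {(a + 1, b + 1): float(distances[a][b])
         for a, b in itertools.combinations(range(n), 2)}

    def dist(a, b):
        return d[(a, b) if a < b else (b, a)]

    history = []
    t = 1
    while len(members) > 1:  # Step 5: repeat until one cluster is left
        # Step 1: find the closest pair of current clusters.
        i, j = min(itertools.combinations(sorted(members), 2),
                   key=lambda pair: dist(*pair))
        # Step 2: the merged cluster receives the label n + t.
        new = n + t
        # Step 3: distance from the new cluster to every remaining cluster.
        for k in members:
            if k not in (i, j):
                d[(k, new)] = update(dist(i, k), dist(j, k))
        # Step 4: record the merge, add the new cluster, remove the old two.
        history.append((i, j, dist(i, j), new))
        members[new] = members.pop(i) + members.pop(j)
        t += 1
    return history


# Hypothetical distances between four participants (illustration only).
example = [
    [0, 1, 5, 7],
    [1, 0, 4, 6],
    [5, 4, 0, 2],
    [7, 6, 2, 0],
]
merges = agglomerate(example)
```

Each tuple in `merges` corresponds to one junction of the tree diagram, and its third entry is the distance level at which the two clusters were joined.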

The result, known as a tree diagram (see Figure 3), clearly indicates which observations are joined at each step of the analysis.

Based on the output and after discussion with our client, we obtained two clusters: (participants 1, 12, 24, 13, 18, 20, 3, 16, 23, 15) and (participants 2, 4, 9, 11, 6, 10, 8, 21, 17, 5, 7, 14, 19, 22) (see Figure 3). The distance level between the two clusters is around 14. This two-cluster user classification formed the basis from which we created two typical user profiles.
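
Reading a fixed number of clusters off the tree amounts to replaying the merges and stopping early. The sketch below assumes the merge history is recorded as (i, j, distance, new_label) tuples in the order the merges happened. The function name `cut_into_clusters` and the four-participant history are hypothetical and are not taken from the study.

```python
def cut_into_clusters(merges, n, k):
    """Replay the merge history of n participants until k clusters remain."""
    clusters = {label: [label] for label in range(1, n + 1)}
    for i, j, _distance, new in merges:
        if len(clusters) <= k:
            break  # stop before the merges that would leave fewer than k clusters
        clusters[new] = sorted(clusters.pop(i) + clusters.pop(j))
    return sorted(clusters.values())


# Hypothetical merge history for four participants (illustrative values only).
history = [(1, 2, 1.0, 5), (3, 4, 2.0, 6), (5, 6, 7.0, 7)]
groups = cut_into_clusters(history, n=4, k=2)

# The distance of the final merge is the level separating the two groups.
separation = history[-1][2]
```

In this toy history the two groups are separated at level 7.0, which plays the same role as the level of about 14 read from Figure 3. Library routines offer the same operation, for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` followed by `fcluster` with `criterion="maxclust"`.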

Note: Figures 1, 4, and 5 are intentionally small and hide details to preserve the proprietary information in them.
