Using Cluster Analysis in Persona Development

Nan Tu 

Research Center for Modern Logistics 

Graduate School at Shenzhen, Tsinghua University 

Shenzhen, 518055, P.R. China 

dr.nan.tu@gmail.com 

Xiao Dong / Pei-Luen Patrick Rau / Tao Zhang 

Department of Industrial Engineering, 

Tsinghua University, 

Beijing, 100084, P.R. China

Abstract—Personas are user models that represent the user 

characteristics. In this paper we describe a Persona creation process that combines quantitative methods, such as cluster analysis, with qualitative methods, such as observation and interview, to produce convincing and representative Personas. We illustrate the Persona creation process through a case study, in which we use cluster analysis to group users by their similarities in goals and decision-making preferences.

Keywords- Personas, user profiles, cluster analysis, user 

observation and interview 

I. INTRODUCTION 

The Persona, as a concept, has its roots in marketing. Cooper proposed the Persona as an interaction design technique for use in product design [2]. Over the past few years, Personas have generated enthusiasm in both the academic and practitioner communities.

Most papers and field reports describe Persona creation using qualitative methods such as ethnographic studies and user observations. Grudin and Pruitt argue that finding representative users is the key to Persona creation [3]. Sinha shows that quantitative methods can be used to identify important underlying groups of information needs [7].

This paper illustrates the Persona creation process through a case study. We show that qualitative and quantitative methods can be combined to create Personas.

II. RELATED RESEARCH 

We categorize the related research into two domains: (1) the definition and creation of Personas; (2) statistical methods, in particular cluster analysis.

A. Personas Definition and Its Creation 

Goodwin defines Personas as: “User models, or Personas, 

are fictional, detailed archetypical characters that represent 

distinct groups of behaviors, goals and motivations observed 

and identified during the research phase.”[1] By this definition, 

Personas have the following characteristics: 

• A Persona is a detailed user model that represents archetypical users. In other words, a Persona has the characteristics of a group of similar users. A Persona is not a real person but a fictitious one.

• A Persona is defined by his or her goals. Cooper explains that Personas are used in goal-directed design [2]. He further categorizes a Persona’s goals as personal goals, corporate goals, practical goals and false goals. Goals differ from tasks in that a goal is an end condition, while a task is an intermediate process that is necessary to accomplish goals.

• A Persona is created by analyzing real users’ goals, behaviors and motivations. The real users’ data may come from marketing research or from user studies, which include interviews, questionnaires, observations, ethnographic studies, etc. Ethnography studies the users’ behaviors in a natural setting (field observations), while the other user study methods are used mostly in a laboratory environment.

Sinha pointed out that current Persona development processes emphasize precision (building detailed descriptions) but not accuracy (identifying representative users) [7]. Upon reviewing current state-of-the-art research and practice, we noticed that Persona creation is viewed as a qualitative process of observing, interviewing and abstracting. However, qualitative methods have met some criticism: the created Personas may not represent the typical users, the Personas are too subjective, and different creators will arrive at different Personas.

To address the above criticisms, we propose an approach that combines quantitative and qualitative methods in Persona creation. We use a case study to illustrate this approach.

B. Statistical Methods: Principal Components Analysis and Cluster Analysis

Marketing data and ethnographic study data may consist of large, multidimensional data sets. Principal Components Analysis (PCA) can be used to reduce the dimensionality of the data, while Cluster Analysis (CA) can group the data based on their similarities.

Principal Components Analysis (PCA) is used to reduce the dimensionality of large data sets by identifying important underlying factors. Sinha shows how it can be used in the Persona creation process [7]. In that example, 32 dimensions of the restaurant experience were first identified, and restaurant finders were asked to rate the importance of each on a scale of 1 to 5 (1 = not important, 5 = very important). PCA was then used to reduce the dimensions to 5 major components. Each component can be regarded as an independent cluster of needs.
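As a rough illustration of the idea, the following Python sketch finds the first principal component of a small ratings matrix by power iteration on the covariance matrix. The ratings are made up for illustration and are not Sinha's data, the function names are hypothetical, and a real analysis would extract several components with a statistics package rather than only the first.

```python
import math

def covariance_matrix(rows):
    """Sample covariance between the columns of a list-of-lists data matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
            for a in range(m)]

def leading_component(cov, iterations=200):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    return eigenvalue, v

# Made-up importance ratings (1 to 5): 6 respondents x 3 dimensions.
# The first two dimensions tend to be rated together; the third runs opposite.
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 3],
    [2, 1, 2],
    [5, 5, 1],
    [1, 1, 3],
]
cov = covariance_matrix(ratings)
eigenvalue, loadings = leading_component(cov)
# Share of the total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Dimensions with large loadings of the same sign move together and can be read as one underlying need; `explained` indicates how much of the rating variance that single component accounts for.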

Cluster Analysis (CA) involves the categorization of data. It divides a large group of observations into subsets so that observations within each subset are relatively similar while observations in different subsets are relatively dissimilar. Two major types of cluster analysis are widely used: hierarchical methods (in which the k-cluster solution is constructed by joining together two clusters from the k+1 cluster solution) and partitioning methods (in which the observations are separated into a given number of subsets, and the k-cluster solution and the k+1 cluster solution are not necessarily nested) [4]. In both methods, there is no definitive answer regarding how many clusters should be chosen. It is up to the analyst to determine the “best” cluster solution.

Because its objective is to address heterogeneity in the data, cluster analysis has become a common tool for marketing researchers to develop empirical groupings of persons, products, and usage occasions that share certain common characteristics. While its primary use has been market segmentation, there is growing interest in applying cluster analysis to classify relevant buyer characteristics and identify homogeneous groups of customers [6]. The results of cluster analysis can contribute to the definition of a classification scheme, indicate rules for assigning new cases to classes, provide measures of definition, size and change of broad concepts, or find representative users and their classification from a large sample, which is most important in user experience research.

III. METHOD 

We worked with a company to develop Personas for its online travel service business. The company’s main business is selling airline tickets, hotel bookings and tour packages through its websites and telephone booking system. The company has been in business for a few years and has enjoyed stable growth of its core business.

We were given two typical user descriptions by the company’s marketing department. The descriptions include gender, age, annual income, family members, frequency of using the company’s service, etc. Our task was to research and write the Personas for the company’s online ticket booking business.

A. Participants and Procedure 

1) Recruiting participants: 

The two typical user profiles given to us were based on the marketing department’s recommendation. We refined the recruiting profile in two ways:

1: Include people who have not used the company’s online booking system but have experience on competitors’ websites similar to that of our user base.

2: Find out the users’ goals and their decision-making process.

We decided to use an online survey to gather more user data. We recruited a total of 24 participants from two sources. Although more participants would be appropriate for the quantitative analysis, we were limited by the project budget and time. First, we selected some participants from the name list given to us by the company’s marketing department. These participants had used the company’s service and were willing to participate in the company’s future customer research. Then, we placed an advertisement that specified the type of people we were looking for. Through the advertisement, we recruited some participants who had not used the company website but had similar experience with competitors’ products.

2) Defining dimensions 

In the Persona Creation and Usage Toolkit [5], Olsen suggests that Personas should include information in the following categories:

• Persona’s Biographic Background 

• Business’ Relation to Persona 

• Persona’s Relation to Product/Business 

• Specific Goals/ Needs/ Attitudes 

• Specific Knowledge / Proficiency 

• Context of Usage 

• Interaction Characteristics of Usage 

• Information Characteristics of Usage 

• Sensory/Immersive Characteristics of Use 

• Emotional Characteristics of Usage 

• Accessibility Issues 

He also outlines the dimensions in each of the categories. Using the categories and dimensions outlined by Olsen as a template, and after discussion with the company representatives, we identified 45 dimensions to be used in our survey. Among them:

1: 18 dimensions are used in the Persona definitions. These dimensions, such as the Persona’s biographic background, appear in the final Persona definitions but do not contribute to the user cluster analysis.

2: 27 dimensions are used in the clustering of the users. These dimensions represent user goals and behaviors, such as:

• What are your spending habits when purchasing travel products?

• How do you select a travel agent?

• How frequently do you travel? etc.

3) Measuring dimensions 

For each dimension, we asked the participants to rate it on a scale of 1 to 7, with 1 being the lowest and 7 the highest. For dimensions that cannot easily be measured by the participants’ subjective ratings, we used standard measurement tools. For example, to determine whether a participant’s spending habit is emotional or rational, we asked 7 indirect questions. We then used a standard measurement scale to convert the answers into a 1 to 7 rating. The questions asked were:

1: If you were a teacher, which course would you prefer to teach?

A: Courses that discuss facts

B: Courses that discuss theory

2: Which one do you think is the better compliment?

A: You are rational

B: You are emotional

3: When making a decision, which one is more important?

A: Taking all factors into consideration

B: Focusing on the feelings and viewpoints of people

etc.
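The conversion from the seven indirect answers to a 1 to 7 rating is not spelled out in the text. The sketch below shows one possible scoring rule, purely as an assumption: it treats every answer "B" as the emotional option, counts them, and rescales the count linearly onto the 1 to 7 range. The function name, the coding of A and B, and the direction of the scale (1 = most rational, 7 = most emotional) are all hypothetical and are not the measurement tool used in the study.

```python
def rational_emotional_rating(answers):
    """Map seven 'A'/'B' answers to a rating from 1 to 7.

    Assumption (not from the paper): 'B' is the emotional option for every
    question, 1 means most rational and 7 means most emotional.
    """
    if len(answers) != 7 or any(a not in ("A", "B") for a in answers):
        raise ValueError("expected seven answers, each 'A' or 'B'")
    emotional = sum(1 for a in answers if a == "B")
    # Linear rescale of the count 0..7 onto 1..7, rounded to a whole rating.
    return round(1 + 6 * emotional / 7)

all_rational = rational_emotional_rating("AAAAAAA")
all_emotional = rational_emotional_rating("BBBBBBB")
mixed = rational_emotional_rating("ABABABA")
```

Whatever rule is used, the point is that the derived rating lands on the same 1 to 7 scale as the directly rated dimensions, so it can enter the data matrix alongside them.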

4) Online Survey 

We notified each participant by email and telephone of the online survey’s web address and the purpose of the survey. The participants were asked to fill in the online questionnaire (see Figure 1). The results of the survey were imported directly into our database.

Figure 1. The screenshot of the online survey form

B. Data Analysis - Clustering

We organized the survey results into a data matrix (see Figure 2), in which the columns are the 27 dimensions and the rows are the records of the participants. Note that personal demographic information such as age, income, gender and job was not used in the cluster analysis and therefore does not appear in this data matrix.

Figure 2. Data matrix based on users’ goals

We used cluster analysis to group the users (the rows of the data matrix) into subgroups. We input the data (see Figure 2) into statistical software and obtained the output shown in Figure 3.

Figure 3. The statistical output of the user clustering

Because complete linkage clustering with Euclidean distance is simple and quite efficient, we chose them as the clustering rule and the distance measure. The steps in our cluster analysis calculations are:

Step 0: Each participant is first put in a separate cluster. This means there are 24 clusters initially, and we use $C_1, C_2, \ldots, C_{24}$ to denote these clusters. The distance between two clusters is defined to be the distance between the two participants they contain; that is, $d_{C_i C_j} = d_{ij}$. Let $t = 1$ be the index of the iterative process.

Step 1: Find the smallest distance between any two clusters. Denote these closest clusters $C_i$ and $C_j$.

Step 2: Amalgamate clusters $C_i$ and $C_j$ to form a new cluster denoted $C_{n+t}$, where $n = 24$ is the number of participants.

Step 3: Define the distance between the new cluster $C_{n+t}$ and each remaining cluster $C_k$ as follows:

$d_{C_{n+t} C_k} = \min\{d_{C_i C_k}, d_{C_j C_k}\}$.

Step 4: Add cluster $C_{n+t}$ as a new cluster and remove clusters $C_i$ and $C_j$. Let $t = t + 1$.

Step 5: Return to Step 1 and continue until only one cluster of size 24 remains.
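The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the statistical software used in the study: the function names and the toy data are hypothetical. Note also that the text names complete linkage, whereas the Step 3 update takes the minimum, which corresponds to single linkage; the sketch therefore takes the update rule as a parameter, so that `min` follows Step 3 as printed and `max` gives complete linkage.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def agglomerate(rows, linkage=min, stop_at=1):
    """Agglomerative clustering following Steps 0 to 5.

    rows    -- one list of ratings per participant
    linkage -- min follows the Step 3 update as printed; max is complete linkage
    stop_at -- stop once this many clusters remain (1 builds the full tree)
    Returns (clusters, merges); each merge is (i, j, new_label, distance).
    """
    n = len(rows)
    # Step 0: each participant is its own cluster, labelled 1..n.
    clusters = {i: [i] for i in range(1, n + 1)}
    dist = {(i, j): euclidean(rows[i - 1], rows[j - 1])
            for i in range(1, n + 1) for j in range(i + 1, n + 1)}

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    merges, t = [], 1
    while len(clusters) > max(stop_at, 1):
        labels = sorted(clusters)
        # Step 1: find the two closest clusters.
        i, j = min(((a, b) for x, a in enumerate(labels) for b in labels[x + 1:]),
                   key=lambda pair: d(*pair))
        # Step 2: the merged cluster gets the label n + t.
        new = n + t
        # Step 3: distance from the new cluster to every remaining cluster.
        for k in labels:
            if k not in (i, j):
                dist[(k, new)] = linkage(d(i, k), d(j, k))
        # Step 4: record the merge, replace the two old clusters, advance t.
        merges.append((i, j, new, d(i, j)))
        clusters[new] = clusters.pop(i) + clusters.pop(j)
        t += 1
    # Step 5 is the loop condition above.
    return clusters, merges

# Toy data (hypothetical): 5 participants rated on 2 dimensions, scale 1 to 7.
toy = [[1, 2], [2, 1], [6, 7], [7, 6], [1, 1]]
two_groups, _ = agglomerate(toy, linkage=max, stop_at=2)
_, full_tree = agglomerate(toy, linkage=max)
```

Calling the function with `stop_at=2` returns a two-cluster solution directly, while the `merges` list from a full run records which clusters were joined at which distance, which is the information a tree diagram displays.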

The result, known as a tree diagram (see Figure 3), clearly indicates which observations are joined together at which step of the analysis.

Based on the outputs and after discussing with our client, 

we obtained 2 clusters: (participants 1, 12, 24, 13, 18, 20, 3, 16, 

23, 15) and (participants 2, 4, 9, 11, 6, 10, 8, 21, 17, 5, 7, 14, 19, 

22) (see Figure 3). The distance level between the two clusters 

is around 14. This 2-cluster user classification formed the basis 

from which we created two typical user profiles. 

Note: Figures 1, 4 and 5 are intentionally small and hide details to preserve proprietary information in them.

C. Identify Typical User 

In order to generate a user profile for each group, we needed to identify the participants who could represent each group. We rearranged the data matrix (see Figure 2) so that participants in the same group were adjacent to each other, and calculated the average scores for each group. We noticed that the average scores of the two groups differ on quite a few dimensions, particularly on dimensions such as:

• Rational/Emotional 

• Hotel budget 

We then compared the scores of each participant to the average scores of his or her group. We reviewed the data and selected, for each group, the participant whose scores are closest to the group’s average scores, giving preference to the dimensions that make the two groups different, such as Rational/Emotional and Hotel budget. In this way we selected two real people, each representing the participants of one group.
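One way to make this selection reproducible is to compute each participant's distance to the group mean, with larger weights on the dimensions that separate the groups. The Python sketch below is an illustration under assumptions: the text describes a manual review rather than a formula, and the weighted Euclidean distance, the function names, the weights and the sample ratings are all hypothetical.

```python
import math

def group_mean(rows):
    """Column-wise average of a list of rating vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def most_typical(group, weights=None):
    """Return the id of the participant closest to the group's mean ratings.

    group   -- dict mapping participant id to a list of ratings
    weights -- optional per-dimension weights; larger values give preference
               to dimensions that distinguish the groups (an assumption here)
    """
    ids = sorted(group)
    mean = group_mean([group[i] for i in ids])
    w = weights or [1.0] * len(mean)

    def distance(pid):
        return math.sqrt(sum(wj * (x - m) ** 2
                             for wj, x, m in zip(w, group[pid], mean)))

    return min(ids, key=distance)

# Hypothetical ratings for one cluster on three dimensions (scale 1 to 7).
cluster = {"P1": [2, 6, 3], "P2": [3, 5, 5], "P3": [7, 1, 4]}
typical_unweighted = most_typical(cluster)
# Giving all the weight to the third dimension changes the choice.
typical_weighted = most_typical(cluster, weights=[0.0, 0.0, 1.0])
```

In this toy example the group mean is [4, 4, 4]; P2 is closest overall, while P3 matches the mean exactly on the third dimension, which shows how the choice of weights can change who counts as typical.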

D. User Observation and Interview 

In order to develop the user profiles and the final Personas, 

we conducted a user observation and interview. The objective 

is to understand the users’ behavior, goals and requirements, 

obtain more detailed user information and create the basis for 

the Personas illustration. Various types of user information, 

including name, age, gender, working experience, education 

level, hobbies, internet usage, traveling experiences, 

preferences, expectations, individual requirements, etc. were 

collected during the observation and interview process. 

The user observation and interview process was divided into three phases: preparation and pilot test, formal test, and data analysis. Preparation included designing the scenarios and writing the questionnaire and interview script. A scenario is a concise description of a person using the website to achieve an end goal; it consists of an actor, background information, context assumptions, a goal, a task procedure and events. The scenarios we used were designed to reflect the major user tasks performed in the company’s online business and to reveal the user goals and requirements. For example, typical scenarios of using a travel service website are determining a destination, selecting travel time, and booking air tickets and hotels.

We recruited 9 of the original 24 participants of our online survey to take part in our user observation and interview. The 2 typical users identified earlier were among the 9 participants selected.

E. User Profile and Persona Creation 

Based on the online survey and user observation and 

interview results, we wrote a user profile for each of the 2 

typical users. The 2 user profiles contained the following 

information: 

• Demographic information: Name, Age, Working 

experience, Income, Traveling experience, Internet and 

computer experience, Traveling service website experience. 

• Typical activities in scenarios, based on abstraction of 

the interview records: Involvement description, User goal and 

respective task procedure (planning, information searching, 

decision making) 

After creating the initial user profiles for these two typical users, we made a few revisions to enrich the content with other participants’ information. The goal was to create user profiles that truly represent all the members of each group. This process is consistent with Cooper’s idea of “Design for just one person” [2]. Personas are the fictitious people that best represent the very users whose requirements we want to satisfy. The results of the revision and enrichment were the two Personas for our project.

We separated the two Personas into a primary and a secondary Persona (see Figures 4 and 5). The primary Persona is the most important, or target, user who must be satisfied, because failing to satisfy this Persona’s requirements means failing to satisfy the other Persona as well. Conversely, the primary Persona’s requirements cannot be satisfied if the website is designed for the other Persona.

Figure 4. Primary Persona of this project

Figure 5. Secondary Persona of this project

IV. CONCLUSIONS AND DISCUSSION 

In this project, we recruited 24 users based on the two typical user descriptions given by the marketing department. We identified 27 dimensions that represent the users’ goals and decision-making process. We designed an online questionnaire to gather the users’ demographic information and information about their goals and decision-making process. We used cluster analysis to divide the participants into two groups. From these two groups, we identified two typical users and created a detailed user profile for each of them. We interviewed and observed 9 of the 24 participants while they were interacting with the system, and then enriched and refined the two user profiles based on these interviews and observations. Finally, we constructed the primary and secondary Personas.

The company then used the Personas in its website redesign project and reported very satisfactory results. The designers were pleased to have a very concrete user to design the website for, and they could design a business flow that better serves the users’ requirements. For example, the designers created a shortcut path for the more business-oriented Persona, as these users know what they are purchasing and want to buy their tickets and leave the site. The designers also created a BBS (online discussion board) for the casual users who have no particular objective in mind when they visit the website. These users simply want more information on destinations, discount tickets, etc. The designers also reported more confidence when designing the site, as they had learned more about the users during the Persona creation process.

Through the experience gained in this project, we learned that Personas can be created using both quantitative and qualitative methods. The final Personas are more representative and less ambiguous, so that all the team members can agree upon them.

By first using quantitative methods to identify the clusters underlying the Personas, we addressed the limitations of a qualitative-only approach. Although the data from 24 users is rather limited for real statistical analysis, we showed that a Persona can be created in two steps: clustering (quantitative method) and abstracting (qualitative method). The clustering of the users is based on the similarity between the users’ goals, behaviors and motivations. Once the users are clustered, a user profile is created from the characteristics of each group of users. Qualitative methods such as user observation and interview then provide the detailed information used to enrich and refine the typical user profiles into Personas.

In the future, we are looking for opportunities to perform the cluster analysis on a larger data set. We also wish to find out how Personas created using quantitative methods such as cluster analysis differ from those created using qualitative methods such as observation and interviewing.

ACKNOWLEDGMENT 

We would like to thank Songyue Wang for helping us with the user studies, and Danshuo Zhuang and Yun Wen for helping us with report writing and data analysis.

REFERENCES 

[1] Calde, S., K. Goodwin, and R. Reimann. SHS Orcas: the First Integrated Information System for Long-Term Healthcare Facility Management. in Case Studies of the CHI2002/AIGA Experience Design FORUM. 2002. ACM Press.

[2] Cooper, A., The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How To Restore The Sanity. 1st ed. 1999: Sams.

[3] Grudin, J. and J. Pruitt. Personas, Participatory Design, and Product Development: An Infrastructure for Engagement. in the Participatory Design Conference. 2002.

[4] Lattin, J., D. Carroll, and P. Green, Analyzing Multivariate Data (Duxbury Applied Series). 1st ed. 2002: Duxbury Press.

[5] Olsen, G. Persona Creation and Usage Toolkit. http://www.iasummit.org/finalpapers/86/86_Handout_or__final__paper.pdf, 2004.

[6] Punj, G. and D.W. Stewart, Cluster Analysis in Marketing Research: Review and Suggestions for Application. Journal of Marketing Research, 1983. 20(2): p. 134-148.

[7] Sinha, R. Persona Development for Information-Rich Domains. in CHI 2003 extended abstracts on Human Factors in Computing Systems. 2003. ACM Press.
