Part-based PCA for Facial Feature Extraction and Classification
Yisu Zhao, Xiaojun Shen, Nicolas D. Georganas, Emil M. Petriu
Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER Lab)
School of Information Technology and Engineering
University of Ottawa, K1N 6N5, Canada
{yzhao/ shen / georganas / petriu}@discover.uottawa.ca
Abstract – With the latest advances in the fields of computer vision, image processing and pattern recognition, facial expression recognition is becoming increasingly feasible for human-computer interaction in Virtual Environments (VEs). In order to achieve subject-independent facial feature extraction and classification, we present part-based PCA (Principal Component Analysis) for facial feature extraction and apply a modified PCA reconstruction method for expression classification. Part-based PCA is employed to minimize the influence of the individual differences which hinder facial expression recognition. To obtain part-based PCA, a novel feature detection and extraction approach based on multi-step integral projection is proposed. The features are automatically detected and located by multi-step integral projection curves without being manually picked, and PCA is applied to the detected areas instead of the whole face. To address the problem that the features extracted by PCA are not the best features for classification, we propose a modified PCA reconstruction method. We divide the training set into 7 classes and carry out PCA reconstruction on each class independently. The expression class of an input image is identified by measuring the similarity between the input image and the reconstructed image. Experiments on the JAFFE database demonstrate that part-based PCA outperforms traditional PCA, achieving a higher recognition rate.

Keywords – facial feature extraction and classification, part-based PCA, multi-step integral projection.
I. INTRODUCTION

Human computer interfaces (HCI) have evolved from text-based interfaces through 2D graphical interfaces and multimedia-supported interfaces to fully-fledged multi-participant Virtual Environment (VE) systems [1]. Instead of relying on traditional two-dimensional HCI devices such as keyboards and mice, VE applications require utilizing various modalities and technologies and integrating them into a more immersive user experience [2]. For more immersive human-computer interaction in a virtual environment, integrating multimodal sensory information such as hand gestures, speech, sound, body posture and facial expressions is necessary. Communication between humans and the VE can be more natural if the computer can detect and express human affective states through devices that sense hand gestures, body position, facial expressions, voice, etc. The most expressive way humans display their emotional state is through facial expressions. Research in social psychology [3-7] suggests that facial expressions form the major modality in human communication and are a visible manifestation of a person's affective state. Many applications, such as VEs, video-conferencing and synthetic face animation, require efficient facial expression recognition in order to achieve the desired results [8]. The facial expressions under examination were defined by psychologists as a set of six universal facial emotions: happiness, sadness, anger, disgust, fear, and surprise [9].
A generic facial expression recognition system usually has a sequential configuration of processing steps: face detection, pre-processing, feature detection, feature extraction and classification. Early research on facial expression recognition needed markers for facial feature point detection. In our research, we perform principal component analysis (PCA) for facial feature detection and extraction without any markers. PCA is a popular technique which has been successfully used in face recognition [10] [11]. Part-based PCA is used in order to avoid the influence of individual differences. A novel method called multi-step integral projection is used to automatically locate and detect the eye and mouth areas. We also develop a modified PCA reconstruction method for further expression classification, which is applied to the extracted areas.
II. METHODOLOGY

1. Face Detection and Pre-processing
In order to build a system capable of automatically capturing facial feature positions in a face scene, the first step is to detect and extract the human face from the background image. We make use of the robust, automated real-time face detection scheme proposed by Viola and Jones [12] [13], which consists of a cascade of classifiers trained by AdaBoost. Their algorithm also introduces the concept of the "integral image" to compute a rich set of Haar-like features (see Fig 1). Each classifier employs integral image filters, which allow the features to be computed very quickly at any location and scale. For each stage in the cascade, a subset of features is chosen using a feature selection procedure based on AdaBoost. The Viola-Jones algorithm is approximately 15 times faster than previous approaches while achieving accuracy equivalent to the best published results [12]. Fig 2 shows the detection of a human face using the Viola-Jones algorithm.
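The integral-image mechanism used by these filters can be sketched as follows (a minimal numpy illustration; the function names and the two-rectangle layout are ours, not from the Viola-Jones papers):

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero border: ii[r, c] = sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r, c, h, w):
    # Sum over img[r:r+h, c:c+w] in four table look-ups, at any location and scale.
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    # A two-rectangle Haar-like feature: left half minus right half.
    half = w // 2
    return box_sum(ii, r, c, h, half) - box_sum(ii, r, c + half, h, half)
```

Any rectangle sum costs four look-ups regardless of its size, which is what makes evaluating thousands of Haar-like features per window affordable.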
978-1-4244-4218-8/09/$25.00 ©2009 IEEE

Since input images are affected by the type of camera, illumination conditions, background information and so forth, we need to normalize the face images before
feature detection and extraction. The aim of pre-processing is to eliminate the differences between input images as far as possible, so that features can be detected and extracted under the same conditions. Expression representation can be sensitive to translation, scaling, and rotation of the head in an image; to combat the effect of these unwanted transformations, the pre-processing steps are:
1) Transform the face video into face images.
2) Convert the input color images into gray-scale images.
3) Normalize the face images to the same size of 128 × 128 pixels. Scale normalization is used to align all facial features. We use the Lanczos resampling method [14] to resize images.
4) Smooth the face images to remove noise using a mean-based fast median filter [15].
5) Perform grayscale equalization [16] to reduce the influence of illumination variation and ethnicity. Although the Gabor transformation is insensitive to illumination variation, histogram equalization further improves the results.
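The five steps above can be sketched as a small numpy-only pipeline (a simplification: nearest-neighbour resizing stands in for the Lanczos resampling of step 3, and a plain 3×3 median filter for the mean-based fast median filter of step 4; all function names are ours):

```python
import numpy as np

def to_grayscale(rgb):
    # Step 2: standard luma weights (the paper only says "convert to gray-scale").
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=(128, 128)):
    # Step 3 placeholder: nearest-neighbour instead of Lanczos resampling [14].
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[np.ix_(rows, cols)]

def median_filter3(img):
    # Step 4 placeholder: plain 3x3 median instead of the fast variant of [15].
    padded = np.pad(img, 1, mode='edge')
    stack = [padded[r:r + img.shape[0], c:c + img.shape[1]]
             for r in range(3) for c in range(3)]
    return np.median(np.stack(stack), axis=0)

def equalize(img):
    # Step 5: grayscale histogram equalization over 256 levels.
    img = img.astype(np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) * 255 / max(cdf.max() - cdf.min(), 1)
    return cdf[img].astype(np.uint8)

def preprocess(rgb_frame):
    # Steps 2-5 chained; step 1 (video to frames) happens upstream.
    gray = to_grayscale(rgb_frame)
    return equalize(median_filter3(resize_nearest(gray)))
```

The output is a normalized 128 × 128 uint8 face image regardless of the input frame size.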
(1) (2) (3) (4)
Fig 1. A set of Haar-like features
a) Input Video b) Detection c) Detected face
Fig 2. Face Detection using Haar-like features
2. Facial Feature Detection and Classification using Part-based PCA and PCA Reconstruction
After the pre-processing step, we can employ the pre-processed image for facial feature detection and extraction. We choose the PCA algorithm in our research because of its speed and simplicity. PCA (Principal Component Analysis) is a popular technique for data dimensionality reduction and has been widely used in computer vision, for example in face recognition and object recognition [17]. It reduces a complex data set to a lower dimension in order to reveal the hidden, simplified structure underlying it. It has been called one of the most valuable results from applied linear algebra [18].
2.1 Applying PCA in Facial Expression Recognition
Assume the width and height of the image are n and m pixels respectively; the size of the transformed vector of this image is d = n*m. Given k pre-processed facial expression images as training data, we convert these images into corresponding column image vectors τ = {τ_i, i = 1, 2, 3, …, k}. Compute the mean of the training data ψ:

ψ = (1/k) Σ_{i=1}^{k} τ_i

The difference vector φ_i is defined as φ_i = τ_i − ψ, where ψ is the average vector of the τ_i. The covariance matrix over all training samples is obtained as:

C = (1/k) Σ_{i=1}^{k} φ_i φ_i^T = (1/k) A A^T

where A = [φ_1, φ_2, …, φ_k].
The principal components are then the eigenvectors of C, where C is a d×d matrix with d eigenvectors V_1, V_2, …, V_d and d eigenvalues λ_1, λ_2, …, λ_d. However, it is time-consuming to determine d eigenvalues and eigenvectors, so it is necessary to reduce the computational complexity. According to SVD (Singular Value Decomposition), AA^T and A^T A have the same non-zero eigenvalues λ_i. As a result, instead of directly computing the eigenvectors u_i of the matrix AA^T, the eigenvectors v_i of the much smaller matrix A^T A are computed. The eigenvector u_i of AA^T can then be obtained by

u_i = (1/√λ_i) A v_i
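This "small matrix" trick is the classic eigenfaces shortcut, and can be sketched with numpy (a minimal sketch under the definitions above; the function name and the explicit renormalization are ours):

```python
import numpy as np

def fit_pca(images, num_components):
    # images: k vectors of length d = n*m.
    T = np.array(images, dtype=float)        # k x d
    psi = T.mean(axis=0)                     # mean vector
    A = (T - psi).T                          # d x k, columns are phi_i
    # Eigen-decompose the small k x k matrix A^T A instead of the d x d AA^T.
    small = A.T @ A / len(images)
    lam, v = np.linalg.eigh(small)           # ascending eigenvalues
    order = np.argsort(lam)[::-1][:num_components]
    v = v[:, order]
    # Map back to image space: u_i proportional to A v_i, then normalize,
    # which absorbs the 1/sqrt(lambda_i) factor.
    U = A @ v
    U /= np.linalg.norm(U, axis=0)
    return psi, U                            # U: d x num_components
```

For k training images and d pixels with k ≪ d, this replaces a d×d eigenproblem with a k×k one at no loss: A v_i is an eigenvector of AA^T for every eigenvector v_i of A^T A.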
The face image can be represented by projecting the data in the image space onto the face space, from which we obtain the projection vectors. By sorting the eigenvalues in descending order, we select the corresponding eigenvectors: the bigger an eigenvalue is, the more important its corresponding eigenvector is. These eigenvectors compose a much lower-dimensional matrix, which greatly reduces the dimension compared with the original vectors τ_i. If we reconstruct images using these eigenvectors, they appear to be face-like images (see Fig 3).
Fig 3. Face-like image
Face recognition makes use of the differences between individuals, while facial expression recognition takes advantage of the differences among expressions. That is, the differences between individuals which are useful for face recognition may become interference in facial expression recognition, where individual differences must be neglected. Traditionally, PCA is applied to a whole face image, which contains many individual differences. This explains why PCA is commonly used in face recognition while few studies show it being used in facial expression recognition [19]. Another problem that handicaps PCA for facial expression recognition is that the features extracted by PCA are the best features for expressing the data set, not for classification. Based on these two factors that might affect expression recognition results, we propose part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification.
2.2 Part-based PCA
To avoid the influence of personal differences, instead of applying PCA to the whole facial image, we apply PCA to parts of the face image, so that only useful facial regions are analyzed. This refines useful information and abandons useless information, such as the disturbance of facial form and the ratios of facial features. The most important areas of the human face for classifying expressions are the eyes, eyebrows and mouth; other areas contribute little or even encumber facial expression recognition. In this section, we propose a new facial feature location approach called multi-step integral projection for feature area detection. With each integral projection step, the location of the eye and mouth areas becomes more accurate.
The integral projection technique was originally proposed by Kanade [20]. The basic idea of gray-level integral projection is to accumulate gray-level sums in the vertical and horizontal directions respectively. The vertical gray-level integral projection captures the variations along the horizontal direction of an image; the horizontal gray-level integral projection captures the variations along the vertical direction. Suppose there is an m*n image whose gray level at each pixel is I(x, y). The vertical projection function is defined as

S_y(x) = Σ_{y=1}^{n} I(x, y)

and the horizontal projection function as

S_x(y) = Σ_{x=1}^{m} I(x, y)
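With the image stored as a numpy array indexed [y, x], the two projection functions reduce to axis sums (a minimal sketch; the function names are ours):

```python
import numpy as np

def vertical_projection(img):
    # S_y(x): accumulate gray levels over y for each column x;
    # reflects variation along the horizontal direction.
    return img.sum(axis=0)

def horizontal_projection(img):
    # S_x(y): accumulate gray levels over x for each row y;
    # reflects variation along the vertical direction.
    return img.sum(axis=1)
```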
Before employing integral projection for facial feature detection, we need to convert the input image into a binary image. Image binarization is one of the main techniques for image segmentation: it segments an image into foreground and background, so that the image appears in only two gray levels, the brightest level 255 and the darkest level 0. The foreground contains the information of interest. The most important part of image binarization is threshold selection, which is useful in many image processing tasks. We use a nonparametric and unsupervised method of automatic threshold selection called the Otsu method [21]. This method has been widely used as a classical thresholding technique since it is not sensitive to non-uniform illumination. The gray-level histogram indicates the number of pixels of an image at each gray level, that is, the distribution of the pixels of the image (see Fig 4). The main idea of Otsu is to dichotomize the gray-level histogram into two classes by a threshold level: the threshold we are looking for is the gray level between 0 and K that maximizes the between-class variance σ².
Fig 4. The gray-level histogram

Using the threshold, we convert the original picture into a binary image, that is, an image with pixel values 0 and 255 representing black and white respectively. The black pixels form the foreground, which contains the facial feature information we are interested in, while the white background is ignored as useless information.
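Otsu's between-class variance search over the gray-level histogram can be sketched directly (a minimal sketch of the method of [21]; function names are ours):

```python
import numpy as np

def otsu_threshold(gray):
    # Exhaustively pick the threshold t maximizing the between-class
    # variance w0 * w1 * (m0 - m1)^2 of the two histogram classes.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    grand_sum = (hist * np.arange(256)).sum()
    best_t, best_var = 0, -1.0
    cum, cum_sum = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_sum += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total
        m0 = cum_sum / cum
        m1 = (grand_sum - cum_sum) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    # Foreground/background split at the Otsu threshold (values 0 and 255).
    return np.where(gray > otsu_threshold(gray), 255, 0).astype(np.uint8)
```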
After image binarization, we make use of multi-step integral projection to obtain the facial feature positions. The first step is to apply horizontal integral projection to the original binary image. Fig 5 shows the resulting vertical and horizontal projection curves. Suppose I(x, y) is the gray value of an image; the horizontal integral projection over the interval [y1, y2] and the vertical projection over the interval [x1, x2], denoted H(y) and V(x) respectively, are defined as:

H(y) = (1/(x2 − x1)) Σ_{x=x1}^{x2} I(x, y)

V(x) = (1/(y2 − y1)) Σ_{y=y1}^{y2} I(x, y)
The horizontal projection indicates the x-axis positions of the eyebrows, eyes, and mouth. Taking the x-axis position of the eyes as the central point and double the length from eyebrow to eye as the region, the vertical projection then indicates the y-axis positions of the left eye and right eye. Since the original integral projection curves are irregular, we smooth them with Bezier curves [22, 23], which are used in computer graphics to model smooth curves at all scales. For any four points A(x_A, y_A), B(x_B, y_B), C(x_C, y_C), D(x_D, y_D), the curve starts at A(x_A, y_A) and ends at D(x_D, y_D), the so-called end points; B(x_B, y_B) and C(x_C, y_C) are the control points. Any coordinate (x_t, y_t) on the curve is:

x_t = x_A (1 − t)³ + 3 x_B t (1 − t)² + 3 x_C t² (1 − t) + x_D t³
y_t = y_A (1 − t)³ + 3 y_B t (1 − t)² + 3 y_C t² (1 − t) + y_D t³
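Evaluating the cubic above at a sequence of parameter values yields the smoothed curve samples (a minimal sketch; the per-segment grouping of curve points is our simplification, not the paper's fitting procedure):

```python
def cubic_bezier(t, a, b, c, d):
    # One coordinate of a cubic Bezier at parameter t in [0, 1]:
    # a and d are the end points, b and c the control points.
    s = 1.0 - t
    return a * s**3 + 3 * b * t * s**2 + 3 * c * t**2 * s + d * t**3

def smooth_curve(points, samples_per_segment=8):
    # Treat each run of four consecutive curve values as one Bezier segment.
    out = []
    for i in range(0, len(points) - 3, 3):
        a, b, c, d = points[i:i + 4]
        for j in range(samples_per_segment):
            out.append(cubic_bezier(j / samples_per_segment, a, b, c, d))
    return out
```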
Fig 5. Integral projection curves

If we rotate the projection curve 90 degrees, aligning the image with its vertical position, we can detect the positions of the eye and mouth regions by observing the horizontal gray-level projection curves. In the horizontal gray-level integral projection, wave troughs are formed on the curves because black pixels predominate in the areas of the eyebrows, eyes, and mouth. By locating these wave troughs, we can locate the eye and mouth areas. In the horizontal gray-level projection, from left to right, the first minimum represents the position of the eyebrows and the second minimum represents the position of the eyes; from right to left, the first minimum represents the position of the mouth. Fig 6 shows the positions of the eyebrows, eyes and mouth.

Fig 6. The position of the eyebrows, eyes and mouth

However, due to image complexity and noise, there may be small wave troughs in the projection curve which interfere with eye and mouth location detection. Therefore, we need to smoothen the integral projection curves, filter out minor wave troughs, and eliminate disturbing information. After smoothening, four main wave troughs appear, representing the eyebrows, eyes, nose and mouth respectively, see Fig 7.

Fig 7. Horizontal projection after smoothening

The detailed steps to detect the eyes area are as follows. Let the vertical length of the face image be H. Set the first wave trough from the top after 0.15H, which corresponds to the vertical position of the eyebrows, as H1; set the second wave trough, which corresponds to the vertical position of the eyes, as H2. The starting vertical position of the eyes area is H1 − 0.2 × VH, where VH = H2 − H1, while the ending vertical position is H2 + 0.8 × VH. This yields the extracted eyes area shown in Fig 8.

Fig 8. The extracted eyes area

We apply the same method for mouth area detection. Again let the vertical length of the face image be H. Set the first wave trough from the top after 0.7H as H4, and the closest wave trough above H4 as H3. The starting vertical position of the mouth area is H3 + 0.4 × VH, where VH = H4 − H3, while the ending vertical position is H4 + 0.7 × VH. The extracted mouth area is shown in Fig 9.

Fig 9. The extracted mouth area

Based on the extracted eyes and mouth areas, we still need to refine them for the subsequent Gabor transformation. We apply integral projection once more, this time with vertical projection. The vertical projection curves of the eyes and mouth areas are shown in Fig 10.

Fig 10. Vertical projection curves of eyes and mouth areas

The final extracted eyes area is the range from the leftmost maximum value W1 to the rightmost maximum value W2 (see Fig 11), while the final extracted mouth area lies between the rightmost maximum value W1 from the middle to the left and the leftmost maximum value W2 from the middle to the right (see Fig 12).
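The trough-to-region rules for the eyes and mouth areas can be written down directly (a minimal sketch of the H1/H2 and H3/H4 formulas above; function names are ours, and trough detection itself is assumed done upstream):

```python
def eye_region(h1, h2):
    # h1: eyebrow trough row (first trough after 0.15*H),
    # h2: eye trough row; returns (start, end) rows of the eyes area.
    vh = h2 - h1
    return h1 - 0.2 * vh, h2 + 0.8 * vh

def mouth_region(h3, h4):
    # h4: first trough after 0.7*H, h3: closest trough above h4;
    # returns (start, end) rows of the mouth area.
    vh = h4 - h3
    return h3 + 0.4 * vh, h4 + 0.7 * vh
```

For example, troughs at rows 40 (eyebrows) and 60 (eyes) give an eyes area spanning rows 36 to 76.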
Fig 11. Final extracted eyes area

Fig 12. Final extracted mouth area

2.3 PCA Reconstruction for Facial Expression Classification

Using the idea of multi-step integral projection, we can extract the eye and mouth regions and then apply part-based PCA to them. This solves the problem of individual differences that arises when applying whole-face PCA. To solve the problem that the features extracted by PCA are the best features for expressing the data set rather than for classification, we propose a modified PCA reconstruction method [24]. The main idea is the following: instead of having a single training set for PCA reconstruction, we divide the training set into different classes according to the different facial expressions. For the JAFFE database [25], we divide the training set into seven classes, each representing one facial expression, and then perform PCA on each class independently.

Let V be the matrix whose columns are the first k eigenvectors of C. The projection of an image τ_i onto this space is given by

P_i = V^T (τ_i − ψ)

The face image can be represented by P_i. From this projection, we can obtain the reconstructed image R_i:

R_i = V P_i + ψ

Since the input image is much more similar to one expression training set than to the others, the image reconstructed from that class's eigenvectors will have less distortion than the images reconstructed from the eigenvectors of the other training expressions. For an input happy-expression image, if we reconstruct the eyes area and mouth area in each expression class independently, we find that the happy class shows the best similarity with the input image (see Fig 13). By measuring the similarity (a distance measure) between the input image and the reconstructed image of each class, we can identify the expression class of the input image.

Fig 13. Comparison of original eye and mouth images with reconstructed images

We performed experiments on the JAFFE database [25] using image sequences and compared the results of traditional PCA and part-based PCA. Applying traditional PCA with neural networks, we achieve 89.10% on trained data and 69.76% on untrained data, while applying part-based PCA with modified PCA reconstruction, we obtain 93.71% on trained data and 76.67% on untrained data. The results demonstrate that part-based PCA outperforms traditional PCA in terms of recognition accuracy.

3. Error analysis

Some of the incorrect facial expression recognitions are due to the differences between expression images in the database not being obvious. Another reason is that the boundaries between some expressions, such as anger and disgust, disgust and sadness, or fear and sadness, are not very clear. These two reasons cause most of the false recognition results. Table 1 lists some incorrect classification examples: row a) lists the expressions the images are supposed to be classified into; row b) lists our classification results. Some of these expressions are difficult even for human beings to classify correctly. As a result, the above two factors encumber the recognition rates to some extent.

Table 1. Incorrect classification examples
Images
a) Sadness Surprise Disgust Surprise
b) Happiness Happiness Sadness Neutral
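The per-class reconstruction rule of Section 2.3 can be sketched as follows (a minimal sketch assuming each class already has a fitted mean ψ and eigenvector matrix V; function names are ours, and Euclidean distance stands in for the paper's unspecified similarity measure):

```python
import numpy as np

def reconstruction_error(x, psi, V):
    # Project x into one class's eigenspace and reconstruct:
    # P = V^T (x - psi), R = V P + psi; return the residual norm.
    p = V.T @ (x - psi)
    r = V @ p + psi
    return np.linalg.norm(x - r)

def classify_expression(x, class_models):
    # class_models: {label: (psi, V)} with per-class mean and eigenvectors;
    # the class whose reconstruction distorts x least wins.
    return min(class_models,
               key=lambda label: reconstruction_error(x, *class_models[label]))
```

An input image close to one class's subspace reconstructs almost perfectly there and poorly under the other classes, which is exactly the distortion gap the method exploits.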
III. CONCLUSION

For more immersive human-computer interaction in virtual environments, applying multimodal sensory information such as hand gestures, speech, sound, body posture and facial expressions is necessary. In this paper, our research focuses on facial expression recognition to express human affective states. We present part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification. Part-based PCA is proposed to minimize the influence of individual differences. In order to achieve part-based PCA, a novel feature detection and extraction approach based on multi-step integral projection is proposed. The features can be accurately detected and located by multi-step integral projection curves, and PCA is applied to the detected areas instead of the whole face. To solve the problem that the features extracted by PCA are the best features for expressing the data set rather than for classification, we propose a modified PCA reconstruction method. We divide the training set into 7 classes and carry out PCA reconstruction on each class independently. The expression is recognized by measuring the similarity between the input image and the reconstructed images.
IV. REFERENCES

[1] Q. Chen, "Real-time vision-based hand gesture recognition using Haar-like features," in Proc. IEEE Instrumentation and Measurement Technology Conference (IMTC), 2007, pp. 1-6.
[2] M. Turk, "Gesture recognition," in Handbook of Virtual Environment Technology, Lawrence Erlbaum Associates, Inc., 2001.
[3] K. Matsumura, Y. Nakamura, and K. Matsui, "Mathematical representation and image generation of human faces by metamorphosis," Electron. Commun. Jpn., vol. 80, pp. 36-46, 1997.
[4] P. Ekman, "Facial expression and emotion," Am. Psychol., vol. 48, pp. 384-392, 1993.
[5] P. Ekman, "Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique," Psychol. Bull., vol. 115, pp. 268-287, 1994.
[6] P. Ekman, "Emotions inside out. 130 years after Darwin's 'The Expression of the Emotions in Man and Animals'," Ann. NY Acad. Sci., vol. 1000, pp. 1-6, 2003.
[7] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 1424-1445, 2000.
[8] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
[9] P. Ekman and W. V. Friesen, Emotion in the Human Face. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[10] T. Kurozumi, Y. Shinza, Y. Kenmochi, and K. Kotani, "Facial individuality and expression analysis by eigenspace method based on class features or multiple discriminant analysis," in Proc. International Conference on Image Processing, vol. 1, 1999, pp. 648-652.
[11] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[12] P. Viola and M. Jones, "Robust real-time object detection," Cambridge Research Laboratory Technical Report Series CRL 2001/01, pp. 1-24, 2001.
[13] P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[14] K. Turkowski, "Filters for common resampling tasks," Apple Computer, Apr. 1990.
[15] L. Zhang et al., "Mean-based fast median filter," Journal of Tsinghua University, vol. 44, part 9, pp. 1157-1159, 2004.
[16] J. Sumbera, "Histogram equalization," CS-4802 Digital Image Processing, Lab #2.
[17] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[18] J. Shlens, "A tutorial on principal component analysis," 2005.
[19] A. J. Calder, A. M. Burton, P. Miller, A. W. Young, and S. Akamatsu, "A principal component analysis of facial expressions," Vision Research, vol. 41, pp. 1179-1208, 2001.
[20] T. Kanade, "Picture processing system by computer complex and recognition of human faces," Ph.D. thesis, Kyoto University, Japan, Nov. 1973.
[21] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[22] G. García Mateos, A. Ruiz, and P. E. López-de-Teruel, "Face detection using integral projection models," in Proc. Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, 2002, pp. 644-653.
[23] Z. H. Zhou and X. Geng, "Projection functions for eye detection," Pattern Recognition, vol. 37, pp. 1049-1056, 2004.
[24] L. M. Borja and O. Fuentes, "Object detection using image reconstruction with PCA," Image and Vision Computing, vol. 27, pp. 2-9, 2009.
[25] M. Lyons, J. Budynek, and S. Akamatsu, "Automatic classification of single facial images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 1357-1362, 1999.