Part-based PCA for Facial Feature Extraction and Classification
Yisu Zhao, Xiaojun Shen, Nicolas D. Georganas, Emil M. Petriu
Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER Lab)
School of Information Technology and Engineering
University of Ottawa, K1N 6N5, Canada
{yzhao/ shen / georganas / petriu}@discover.uottawa.ca
Abstract – With the latest advances in the fields of computer vision, image processing and pattern recognition, facial expression recognition is becoming increasingly feasible for human-computer interaction in Virtual Environments (VEs). In order to achieve subject-independent facial feature extraction and classification, we present part-based PCA (Principal Component Analysis) for facial feature extraction and apply a modified PCA reconstruction method for expression classification. Part-based PCA is employed to minimize the influence of the individual differences which hinder facial expression recognition. To obtain part-based PCA, a novel feature detection and extraction approach based on multi-step integral projection is proposed. The features are automatically detected and located by multi-step integral projection curves without being manually picked, and PCA is applied to the detected areas instead of the whole face. To address the problem that the features extracted by PCA are not the best features for classification, we propose a modified PCA reconstruction method. We divide the training set into 7 classes and carry out PCA reconstruction on each class independently. The expression class of an input image is identified by measuring the similarity between the input image and the reconstructed image. Experiments on the JAFFE database demonstrate that part-based PCA outperforms traditional PCA, achieving a higher recognition rate.

Keywords – facial feature extraction and classification, part-based PCA, multi-step integral projection.
I. INTRODUCTION

Human computer interfaces (HCI) have evolved from text-based interfaces through 2D graphical interfaces and multimedia-supported interfaces to fully-fledged multi-participant Virtual Environment (VE) systems [1]. Instead of relying on traditional two-dimensional HCI devices such as keyboards and mice, VE applications require utilizing various modalities and technologies and integrating them into a more immersive user experience [2]. For more immersive human-computer interaction in a virtual environment, integrating multimodal sensory information such as hand gestures, speech, sound, body posture and facial expressions is necessary. Communication between humans and the VE can be more natural if the computer can detect and express human affective states through devices that sense hand gestures, body position, facial expressions, voice, etc. The most expressive way humans display their emotional state is through facial expressions. Research in social psychology [3-7] suggests that facial expressions form the major modality in human communication and are a visible manifestation of a person's affective state. Many applications, such as VEs, video-conferencing and synthetic face animation, require efficient facial expression recognition in order to achieve the desired results [8]. The facial expressions under examination were defined by psychologists as a set of six universal facial emotions: happiness, sadness, anger, disgust, fear, and surprise [9].
A generic facial expression recognition system usually has a sequential configuration of processing steps: face detection, pre-processing, feature detection, feature extraction and classification. Early research on facial expression recognition needed markers for facial feature point detection. In our research, we perform principal component analysis (PCA) for facial feature detection and extraction without any markers. PCA is a popular technique which has been successfully used in face recognition [10] [11]. Part-based PCA is used in order to avoid the influence of individual differences. A novel method called multi-step integral projection is used to automatically locate and detect the eye and mouth areas. We also develop a modified PCA reconstruction method for further expression classification, which is applied to the extracted areas.
II. METHODOLOGY

1. Face Detection and Pre-processing
In order to build a system capable of automatically capturing facial feature positions in a face scene, the first step is to detect and extract the human face from the background image. We make use of the robust, automated real-time face detection scheme proposed by Viola and Jones [12] [13], which consists of a cascade of classifiers trained by AdaBoost. Their algorithm also introduces the concept of the "integral image" to compute a rich set of Haar-like features (see Fig 1). Each classifier employs integral image filters, which allow the features to be computed very quickly at any location and scale. For each stage in the cascade, a subset of features is chosen using a feature selection procedure based on AdaBoost. The Viola-Jones algorithm is approximately 15 times faster than previous approaches while achieving accuracy equivalent to the best published results [12]. Fig 2 shows the detection of a human face using the Viola-Jones algorithm.
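The integral-image mechanism used by these filters can be sketched as follows (a minimal numpy illustration; the function names and the two-rectangle layout are ours, not from the Viola-Jones papers):

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero border: ii[r, c] = sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r, c, h, w):
    # Sum over img[r:r+h, c:c+w] in four table look-ups, at any location and scale.
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    # A two-rectangle Haar-like feature: left half minus right half.
    half = w // 2
    return box_sum(ii, r, c, h, half) - box_sum(ii, r, c + half, h, half)
```

Any rectangle sum costs four look-ups regardless of its size, which is what makes evaluating thousands of Haar-like features per window affordable.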
978-1-4244-4218-8/09/$25.00 ©2009 IEEE

Since input images are affected by the type of camera, illumination conditions, background information and so forth, we need to normalize the face images before
feature detection and extraction. The aim of pre-processing is to eliminate the differences between input images as far as possible, so that features can be detected and extracted under the same conditions. Expression representation can be sensitive to translation, scaling, and rotation of the head in an image; to combat the effect of these unwanted transformations, the pre-processing steps are:
1) Transform the face video into face images.
2) Convert the input color images into gray-scale images.
3) Normalize the face images to the same size of 128 × 128 pixels. Scale normalization is used to align all facial features. We use the Lanczos resampling method [14] to resize images.
4) Smooth the face images to remove noise using a mean-based fast median filter [15].
5) Perform grayscale equalization [16] to reduce the influence of illumination variation and ethnicity. Although the Gabor transformation is insensitive to illumination variation, histogram equalization further improves the results.
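The five steps above can be sketched as a small numpy-only pipeline (a simplification: nearest-neighbour resizing stands in for the Lanczos resampling of step 3, and a plain 3×3 median filter for the mean-based fast median filter of step 4; all function names are ours):

```python
import numpy as np

def to_grayscale(rgb):
    # Step 2: standard luma weights (the paper only says "convert to gray-scale").
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=(128, 128)):
    # Step 3 placeholder: nearest-neighbour instead of Lanczos resampling [14].
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[np.ix_(rows, cols)]

def median_filter3(img):
    # Step 4 placeholder: plain 3x3 median instead of the fast variant of [15].
    padded = np.pad(img, 1, mode='edge')
    stack = [padded[r:r + img.shape[0], c:c + img.shape[1]]
             for r in range(3) for c in range(3)]
    return np.median(np.stack(stack), axis=0)

def equalize(img):
    # Step 5: grayscale histogram equalization over 256 levels.
    img = img.astype(np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) * 255 / max(cdf.max() - cdf.min(), 1)
    return cdf[img].astype(np.uint8)

def preprocess(rgb_frame):
    # Steps 2-5 chained; step 1 (video to frames) happens upstream.
    gray = to_grayscale(rgb_frame)
    return equalize(median_filter3(resize_nearest(gray)))
```

The output is a normalized 128 × 128 uint8 face image regardless of the input frame size.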
(1) (2) (3) (4)
Fig 1. A set of Haar-like features
a) Input Video b) Detection c) Detected face
Fig 2. Face Detection using Haar-like features
2. Facial Feature Detection and Classification using Part-based PCA and PCA Reconstruction
After the pre-processing step, we can employ the pre-processed image for facial feature detection and extraction. We choose the PCA algorithm in our research because of its speed and simplicity. PCA (Principal Component Analysis) is a popular technique for data dimensionality reduction and has been widely used in computer vision, for example in face recognition and object recognition [17]. It reduces a complex data set to a lower dimension in order to reveal the hidden, simplified structure underlying it. It has been called one of the most valuable results from applied linear algebra [18].
2.1 Applying PCA in Facial Expression Recognition
Assume the width and height of the image are n and m pixels respectively; the size of the transformed vector of this image is d = n*m. Given k pre-processed facial expression images as training data, we convert these images into corresponding column image vectors τ = {τ_i, i = 1, 2, 3, …, k}. Compute the mean of the training data ψ:

ψ = (1/k) Σ_{i=1}^{k} τ_i

The difference vector φ_i is defined as φ_i = τ_i − ψ, where ψ is the average vector of the τ_i. The covariance matrix over all training samples is obtained as:

C = (1/k) Σ_{i=1}^{k} φ_i φ_i^T = (1/k) A A^T

where A = [φ_1, φ_2, …, φ_k].
The principal components are then the eigenvectors of C, where C is a d×d matrix with d eigenvectors V_1, V_2, …, V_d and d eigenvalues λ_1, λ_2, …, λ_d. However, it is time-consuming to determine d eigenvalues and eigenvectors, so it is necessary to reduce the computational complexity. According to SVD (Singular Value Decomposition), AA^T and A^T A have the same non-zero eigenvalues λ_i. As a result, instead of directly computing the eigenvectors u_i of the matrix AA^T, the eigenvectors v_i of the much smaller matrix A^T A are computed. The eigenvector u_i of AA^T can then be obtained by

u_i = (1/√λ_i) A v_i
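This "small matrix" trick is the classic eigenfaces shortcut, and can be sketched with numpy (a minimal sketch under the definitions above; the function name and the explicit renormalization are ours):

```python
import numpy as np

def fit_pca(images, num_components):
    # images: k vectors of length d = n*m.
    T = np.array(images, dtype=float)        # k x d
    psi = T.mean(axis=0)                     # mean vector
    A = (T - psi).T                          # d x k, columns are phi_i
    # Eigen-decompose the small k x k matrix A^T A instead of the d x d AA^T.
    small = A.T @ A / len(images)
    lam, v = np.linalg.eigh(small)           # ascending eigenvalues
    order = np.argsort(lam)[::-1][:num_components]
    v = v[:, order]
    # Map back to image space: u_i proportional to A v_i, then normalize,
    # which absorbs the 1/sqrt(lambda_i) factor.
    U = A @ v
    U /= np.linalg.norm(U, axis=0)
    return psi, U                            # U: d x num_components
```

For k training images and d pixels with k ≪ d, this replaces a d×d eigenproblem with a k×k one at no loss: A v_i is an eigenvector of AA^T for every eigenvector v_i of A^T A.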
The face image can be represented by projecting the data in the image space onto the face space, from which we obtain the projection vectors. By sorting the eigenvalues in descending order, we select the corresponding eigenvectors: the bigger an eigenvalue is, the more important its corresponding eigenvector is. These eigenvectors compose a much lower-dimensional matrix, which greatly reduces the dimension compared with the original vectors τ_i. If we reconstruct images using these eigenvectors, they appear to be face-like images (see Fig 3).
Fig 3. Face-like image
Face recognition makes use of the differences between individuals, while facial expression recognition takes advantage of the differences among expressions. That is, the differences between individuals which are useful for face recognition may become interference in facial expression recognition, where individual differences must be neglected. Traditionally, PCA is applied to a whole face image, which contains many individual differences. This explains why PCA is commonly used in face recognition while few studies show it being used in facial expression recognition [19]. Another problem that handicaps PCA for facial expression recognition is that the features extracted by PCA are the best features for expressing the data set, not for classification. Based on these two factors that might affect expression recognition results, we propose part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification.
2.2 Part-based PCA
To avoid the influence of personal differences, instead of applying PCA to the whole facial image, we apply PCA to parts of the face image, so that only useful facial regions are analyzed. This refines useful information and abandons useless information, such as the disturbance of facial form and the ratios of facial features. The most important areas of the human face for classifying expressions are the eyes, eyebrows and mouth; other areas contribute little or even encumber facial expression recognition. In this section, we propose a new facial feature location approach called multi-step integral projection for feature area detection. With each integral projection step, the location of the eye and mouth areas becomes more accurate.
The integral projection technique was originally proposed by Kanade [20]. The basic idea of gray-level integral projection is to accumulate gray-level sums in the vertical and horizontal directions respectively. The vertical gray-level integral projection captures the variations along the horizontal direction of an image; the horizontal gray-level integral projection captures the variations along the vertical direction. Suppose there is an m*n image whose gray level at each pixel is I(x, y). The vertical projection function is defined as

S_y(x) = Σ_{y=1}^{n} I(x, y)

and the horizontal projection function as

S_x(y) = Σ_{x=1}^{m} I(x, y)
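With the image stored as a numpy array indexed [y, x], the two projection functions reduce to axis sums (a minimal sketch; the function names are ours):

```python
import numpy as np

def vertical_projection(img):
    # S_y(x): accumulate gray levels over y for each column x;
    # reflects variation along the horizontal direction.
    return img.sum(axis=0)

def horizontal_projection(img):
    # S_x(y): accumulate gray levels over x for each row y;
    # reflects variation along the vertical direction.
    return img.sum(axis=1)
```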
Before employing integral projection for facial feature detection, we need to convert the input image into a binary image. Image binarization is one of the main techniques for image segmentation: it segments an image into foreground and background, so that the image appears in only two gray levels, the brightest level 255 and the darkest level 0. The foreground contains the information of interest. The most important part of image binarization is threshold selection, which is useful in many image processing tasks. We use a nonparametric and unsupervised method of automatic threshold selection called the Otsu method [21]. This method has been widely used as a classical thresholding technique since it is not sensitive to non-uniform illumination. The gray-level histogram indicates the number of pixels of an image at each gray level, that is, the distribution of the pixels of the image (see Fig 4). The main idea of Otsu is to dichotomize the gray-level histogram into two classes by a threshold level: the threshold we are looking for is the gray level between 0 and K that maximizes the between-class variance σ².
Fig 4. The gray-level histogram

Using the threshold, we convert the original picture into a binary image, that is, an image with pixel values 0 and 255 representing black and white respectively. The black pixels form the foreground, which contains the facial feature information we are interested in, while the white background is ignored as useless information.
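Otsu's between-class variance search over the gray-level histogram can be sketched directly (a minimal sketch of the method of [21]; function names are ours):

```python
import numpy as np

def otsu_threshold(gray):
    # Exhaustively pick the threshold t maximizing the between-class
    # variance w0 * w1 * (m0 - m1)^2 of the two histogram classes.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    grand_sum = (hist * np.arange(256)).sum()
    best_t, best_var = 0, -1.0
    cum, cum_sum = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_sum += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total
        m0 = cum_sum / cum
        m1 = (grand_sum - cum_sum) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    # Foreground/background split at the Otsu threshold (values 0 and 255).
    return np.where(gray > otsu_threshold(gray), 255, 0).astype(np.uint8)
```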
After image binarization, we make use of multi-step integral projection to obtain the facial feature positions. The first step is to apply horizontal integral projection to the original binary image. Fig 5 shows the resulting vertical and horizontal projection curves. Suppose I(x, y) is the gray value of an image; the horizontal integral projection over the interval [y1, y2] and the vertical projection over the interval [x1, x2], denoted H(y) and V(x) respectively, are defined as:

H(y) = (1/(x2 − x1)) Σ_{x=x1}^{x2} I(x, y)

V(x) = (1/(y2 − y1)) Σ_{y=y1}^{y2} I(x, y)
The horizontal projection indicates the x-axis positions of the eyebrows, eyes, and mouth. Taking the x-axis position of the eyes as the central point and double the length from eyebrow to eye as the region, the vertical projection then indicates the y-axis positions of the left eye and right eye. Since the original integral projection curves are irregular, we smooth them with Bezier curves [22, 23], which are used in computer graphics to model smooth curves at all scales. For any four points A(x_A, y_A), B(x_B, y_B), C(x_C, y_C), D(x_D, y_D), the curve starts at A(x_A, y_A) and ends at D(x_D, y_D), the so-called end points; B(x_B, y_B) and C(x_C, y_C) are the control points. Any coordinate (x_t, y_t) on the curve is:

x_t = x_A (1 − t)³ + 3 x_B t (1 − t)² + 3 x_C t² (1 − t) + x_D t³
y_t = y_A (1 − t)³ + 3 y_B t (1 − t)² + 3 y_C t² (1 − t) + y_D t³
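Evaluating the cubic above at a sequence of parameter values yields the smoothed curve samples (a minimal sketch; the per-segment grouping of curve points is our simplification, not the paper's fitting procedure):

```python
def cubic_bezier(t, a, b, c, d):
    # One coordinate of a cubic Bezier at parameter t in [0, 1]:
    # a and d are the end points, b and c the control points.
    s = 1.0 - t
    return a * s**3 + 3 * b * t * s**2 + 3 * c * t**2 * s + d * t**3

def smooth_curve(points, samples_per_segment=8):
    # Treat each run of four consecutive curve values as one Bezier segment.
    out = []
    for i in range(0, len(points) - 3, 3):
        a, b, c, d = points[i:i + 4]
        for j in range(samples_per_segment):
            out.append(cubic_bezier(j / samples_per_segment, a, b, c, d))
    return out
```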
Fig 5. Integral projection curves

If we rotate the projection curve 90 degrees, aligning the image with its vertical position, we can detect the positions of the eye and mouth regions by observing the horizontal gray-level projection curves. In the horizontal gray-level integral projection, wave troughs are formed on the curves because black pixels predominate in the areas of the eyebrows, eyes, and mouth. By locating these wave troughs, we can locate the eye and mouth areas. In the horizontal gray-level projection, from left to right, the first minimum represents the position of the eyebrows and the second minimum represents the position of the eyes; from right to left, the first minimum represents the position of the mouth. Fig 6 shows the positions of the eyebrows, eyes and mouth.

Fig 6. The position of the eyebrows, eyes and mouth

However, due to image complexity and noise, there may be small wave troughs in the projection curve which interfere with eye and mouth location detection. Therefore, we need to smoothen the integral projection curves, filter out minor wave troughs, and eliminate disturbing information. After smoothening, four main wave troughs appear, representing the eyebrows, eyes, nose and mouth respectively, see Fig 7.

Fig 7. Horizontal projection after smoothening

The detailed steps to detect the eyes area are as follows. Let the vertical length of the face image be H. Set the first wave trough from the top after 0.15H, which corresponds to the vertical position of the eyebrows, as H1; set the second wave trough, which corresponds to the vertical position of the eyes, as H2. The starting vertical position of the eyes area is H1 − 0.2 × VH, where VH = H2 − H1, while the ending vertical position is H2 + 0.8 × VH. This yields the extracted eyes area shown in Fig 8.

Fig 8. The extracted eyes area

We apply the same method for mouth area detection. Again let the vertical length of the face image be H. Set the first wave trough from the top after 0.7H as H4, and the closest wave trough above H4 as H3. The starting vertical position of the mouth area is H3 + 0.4 × VH, where VH = H4 − H3, while the ending vertical position is H4 + 0.7 × VH. The extracted mouth area is shown in Fig 9.

Fig 9. The extracted mouth area

Based on the extracted eyes and mouth areas, we still need to refine them for the subsequent Gabor transformation. We apply integral projection once more, this time with vertical projection. The vertical projection curves of the eyes and mouth areas are shown in Fig 10.

Fig 10. Vertical projection curves of eyes and mouth areas

The final extracted eyes area is the range from the leftmost maximum value W1 to the rightmost maximum value W2 (see Fig 11), while the final extracted mouth area lies between the rightmost maximum value W1 from the middle to the left and the leftmost maximum value W2 from the middle to the right (see Fig 12).
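The trough-to-region rules for the eyes and mouth areas can be written down directly (a minimal sketch of the H1/H2 and H3/H4 formulas above; function names are ours, and trough detection itself is assumed done upstream):

```python
def eye_region(h1, h2):
    # h1: eyebrow trough row (first trough after 0.15*H),
    # h2: eye trough row; returns (start, end) rows of the eyes area.
    vh = h2 - h1
    return h1 - 0.2 * vh, h2 + 0.8 * vh

def mouth_region(h3, h4):
    # h4: first trough after 0.7*H, h3: closest trough above h4;
    # returns (start, end) rows of the mouth area.
    vh = h4 - h3
    return h3 + 0.4 * vh, h4 + 0.7 * vh
```

For example, troughs at rows 40 (eyebrows) and 60 (eyes) give an eyes area spanning rows 36 to 76.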
Fig 11. Final extracted eyes area

Fig 12. Final extracted mouth area

2.3 PCA Reconstruction for Facial Expression Classification

Using the idea of multi-step integral projection, we can extract the eye and mouth regions and then apply part-based PCA to them. This solves the problem of individual differences that arises when applying whole-face PCA. To solve the problem that the features extracted by PCA are the best features for expressing the data set rather than for classification, we propose a modified PCA reconstruction method [24]. The main idea is the following: instead of having a single training set for PCA reconstruction, we divide the training set into different classes according to the different facial expressions. For the JAFFE database [25], we divide the training set into seven classes, each representing one facial expression, and then perform PCA on each class independently.

Let V be the matrix whose columns are the first k eigenvectors of C. The projection of an image τ_i onto this space is given by

P_i = V^T (τ_i − ψ)

The face image can be represented by P_i. From this projection, we can obtain the reconstructed image R_i:

R_i = V P_i + ψ

Since the input image is much more similar to one expression training set than to the others, the image reconstructed from that class's eigenvectors will have less distortion than the images reconstructed from the eigenvectors of the other training expressions. For an input happy-expression image, if we reconstruct the eyes area and mouth area in each expression class independently, we find that the happy class shows the best similarity with the input image (see Fig 13). By measuring the similarity (a distance measure) between the input image and the reconstructed image of each class, we can identify the expression class of the input image.

Fig 13. Comparison of original eye and mouth images with reconstructed images

We performed experiments on the JAFFE database [25] using image sequences and compared the results of traditional PCA and part-based PCA. Applying traditional PCA with neural networks, we achieve 89.10% on trained data and 69.76% on untrained data, while applying part-based PCA with modified PCA reconstruction, we obtain 93.71% on trained data and 76.67% on untrained data. The results demonstrate that part-based PCA outperforms traditional PCA in terms of recognition accuracy.

3. Error analysis

Some of the incorrect facial expression recognitions are due to the differences between expression images in the database not being obvious. Another reason is that the boundaries between some expressions, such as anger and disgust, disgust and sadness, or fear and sadness, are not very clear. These two reasons cause most of the false recognition results. Table 1 lists some incorrect classification examples: row a) lists the expressions the images are supposed to be classified into; row b) lists our classification results. Some of these expressions are difficult even for human beings to classify correctly. As a result, the above two factors encumber the recognition rates to some extent.

Table 1. Incorrect classification examples
Images
a) Sadness Surprise Disgust Surprise
b) Happiness Happiness Sadness Neutral
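The per-class reconstruction rule of Section 2.3 can be sketched as follows (a minimal sketch assuming each class already has a fitted mean ψ and eigenvector matrix V; function names are ours, and Euclidean distance stands in for the paper's unspecified similarity measure):

```python
import numpy as np

def reconstruction_error(x, psi, V):
    # Project x into one class's eigenspace and reconstruct:
    # P = V^T (x - psi), R = V P + psi; return the residual norm.
    p = V.T @ (x - psi)
    r = V @ p + psi
    return np.linalg.norm(x - r)

def classify_expression(x, class_models):
    # class_models: {label: (psi, V)} with per-class mean and eigenvectors;
    # the class whose reconstruction distorts x least wins.
    return min(class_models,
               key=lambda label: reconstruction_error(x, *class_models[label]))
```

An input image close to one class's subspace reconstructs almost perfectly there and poorly under the other classes, which is exactly the distortion gap the method exploits.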
III. CONCLUSION

For more immersive human-computer interaction in virtual environments, applying multimodal sensory information such as hand gestures, speech, sound, body posture and facial expressions is necessary. In this paper, our research focuses on facial expression recognition to express human affective states. We present part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification. Part-based PCA is proposed to minimize the influence of individual differences. In order to achieve part-based PCA, a novel feature detection and extraction approach based on multi-step integral projection is proposed. The features can be accurately detected and located by multi-step integral projection curves, and PCA is applied to the detected areas instead of the whole face. To solve the problem that the features extracted by PCA are the best features for expressing the data set rather than for classification, we propose a modified PCA reconstruction method. We divide the training set into 7 classes and carry out PCA reconstruction on each class independently. The expression is recognized by measuring the similarity between the input image and the reconstructed images.
IV. REFERENCES

[1] Q. Chen, "Real-time vision-based hand gesture recognition using Haar-like features," in Proc. IEEE Instrumentation and Measurement Technology Conference (IMTC), 2007, pp. 1-6.
[2] M. Turk, "Gesture recognition," in Handbook of Virtual Environment Technology, Lawrence Erlbaum Associates, Inc., 2001.
[3] K. Matsumura, Y. Nakamura, and K. Matsui, "Mathematical representation and image generation of human faces by metamorphosis," Electron. Commun. Jpn., vol. 80, pp. 36-46, 1997.
[4] P. Ekman, "Facial expression and emotion," Am. Psychol., vol. 48, pp. 384-392, 1993.
[5] P. Ekman, "Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique," Psychol. Bull., vol. 115, pp. 268-287, 1994.
[6] P. Ekman, "Emotions inside out. 130 years after Darwin's 'The Expression of the Emotions in Man and Animals'," Ann. NY Acad. Sci., vol. 1000, pp. 1-6, 2003.
[7] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 1424-1445, 2000.
[8] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
[9] P. Ekman and W. V. Friesen, Emotion in the Human Face. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[10] T. Kurozumi, Y. Shinza, Y. Kenmochi, and K. Kotani, "Facial individuality and expression analysis by eigenspace method based on class features or multiple discriminant analysis," in Proc. International Conference on Image Processing, vol. 1, 1999, pp. 648-652.
[11] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[12] P. Viola and M. Jones, "Robust real-time object detection," Cambridge Research Laboratory Technical Report Series CRL 2001/01, pp. 1-24, 2001.
[13] P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[14] K. Turkowski, "Filters for common resampling tasks," Apple Computer, Apr. 1990.
[15] L. Zhang et al., "Mean-based fast median filter," Journal of Tsinghua University, vol. 44, part 9, pp. 1157-1159, 2004.
[16] J. Sumbera, "Histogram equalization," CS-4802 Digital Image Processing, Lab #2.
[17] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[18] J. Shlens, "A tutorial on principal component analysis," 2005.
[19] A. J. Calder, A. M. Burton, P. Miller, A. W. Young, and S. Akamatsu, "A principal component analysis of facial expressions," Vision Research, vol. 41, pp. 1179-1208, 2001.
[20] T. Kanade, "Picture processing system by computer complex and recognition of human faces," Ph.D. thesis, Kyoto University, Japan, Nov. 1973.
[21] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[22] G. García Mateos, A. Ruiz, and P. E. López-de-Teruel, "Face detection using integral projection models," in Proc. Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, 2002, pp. 644-653.
[23] Z. H. Zhou and X. Geng, "Projection functions for eye detection," Pattern Recognition, vol. 37, pp. 1049-1056, 2004.
[24] L. M. Borja and O. Fuentes, "Object detection using image reconstruction with PCA," Image and Vision Computing, vol. 27, pp. 2-9, 2009.
[25] M. Lyons, J. Budynek, and S. Akamatsu, "Automatic classification of single facial images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 1357-1362, 1999.