
Part-based PCA for Facial Feature Extraction and Classification

Yisu Zhao, Xiaojun Shen, Nicolas D. Georganas, Emil M. Petriu
Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER Lab)
School of Information Technology and Engineering
University of Ottawa, K1N 6N5, Canada
{yzhao / shen / georganas / petriu}@discover.uottawa.ca

Abstract – With the latest advances in the fields of computer vision, image processing and pattern recognition, facial expression recognition is becoming more and more feasible for human-computer interaction in Virtual Environments (VEs). In order to achieve subject-independent facial feature extraction and classification, we present part-based PCA (Principal Component Analysis) for facial feature extraction and apply a modified PCA reconstruction method for expression classification. Part-based PCA is employed to minimize the influence of individual differences, which hinder facial expression recognition. To obtain part-based PCA, a novel feature detection and extraction approach based on multi-step integral projection is proposed. The features are automatically detected and located by multi-step integral projection curves, without being manually picked, and PCA is applied in the detected areas instead of the whole face. To address the problem that the features extracted by PCA are not the best features for classification, we propose a modified PCA reconstruction method: we divide the training set into 7 classes and carry out PCA reconstruction on each class independently. The expression class of an input image is identified by measuring the similarity between the input image and the reconstructed image of each class. Experiments on the JAFFE database demonstrate that part-based PCA outperforms traditional PCA with a higher recognition rate.

Keywords – facial feature extraction and classification, part-based PCA, multi-step integral projection.

I. INTRODUCTION

Human-computer interfaces (HCI) have evolved from text-based interfaces through 2D graphical interfaces and multimedia-supported interfaces to fully-fledged multi-participant Virtual Environment (VE) systems [1]. Instead of relying on traditional two-dimensional HCI devices such as keyboards and mice, VE applications require several different modalities and technologies to be integrated into a more immersive user experience [2]. For a more immersive human-computer interaction in virtual environments, integrating multimodal sensory information such as hand gestures, speech, sound, body posture and facial expressions is necessary. Communication between humans and the VE can be more natural if the computer can detect and express human affective states through devices that sense hand gestures, body position, facial expressions, voice, etc. The most expressive way humans display their emotional state is through facial expressions. Research in social psychology [3-7] suggests that facial expressions form the major modality in human communication and are a visible manifestation of the affective state of a person. Many applications, such as VEs, video-conferencing and synthetic face animation, require efficient facial expression recognition in order to achieve the desired results [8]. The facial expressions under examination were defined by psychologists as a set of six universal facial emotions: happiness, sadness, anger, disgust, fear and surprise [9].

A generic facial expression recognition system usually has a sequential configuration of processing steps: face detection, pre-processing, feature detection, feature extraction and classification. Early research on facial expression recognition needed markers for facial feature point detection. In our research, we perform principal component analysis (PCA) for facial feature detection and extraction without any markers. PCA is a popular technique that has been successfully used in face recognition [10] [11]. Part-based PCA is used in order to avoid the influence of individual differences. A novel method called multi-step integral projection is used to automatically locate and detect the eye and mouth areas. We also develop a modified PCA reconstruction method for the subsequent expression classification; it is applied to the extracted areas.

II. METHODOLOGY

1. Face Detection and Pre-processing

In order to build a system capable of automatically capturing facial feature positions in a face scene, the first step is to detect and extract the human face from the background image. We make use of the robust, automated real-time face detection scheme proposed by Viola and Jones [12] [13], which consists of a cascade of classifiers trained by AdaBoost. Their algorithm also introduces the concept of the "integral image" to compute a rich set of Haar-like features (see Fig 1). Each classifier employs integral image filters, which allows the features to be computed very quickly at any location and scale. For each stage in the cascade, a subset of features is chosen using a feature selection procedure based on AdaBoost. The Viola-Jones algorithm is approximately 15 times faster than previous approaches while achieving accuracy equivalent to the best published results [12]. Fig 2 shows the detection of a human face using the Viola-Jones algorithm.

Fig 1. A set of Haar-like features

Fig 2. Face detection using Haar-like features: a) input video, b) detection, c) detected face
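As a concrete illustration, the following minimal sketch uses OpenCV's pretrained frontal-face Haar cascade, an AdaBoost-trained cascade of Haar-like features in the spirit of Viola-Jones; the cascade file and the input/output file names are assumptions for illustration, not the configuration used in the paper.

```python
import cv2

# Pretrained Haar cascade shipped with OpenCV (an assumption, not the paper's model).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

def detect_faces(frame_bgr):
    """Return a list of (x, y, w, h) face rectangles found in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # detectMultiScale scans the (internally computed) integral image
    # at multiple scales and locations.
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if __name__ == "__main__":
    image = cv2.imread("face.jpg")            # hypothetical input frame
    for (x, y, w, h) in detect_faces(image):
        face = image[y:y + h, x:x + w]        # cropped face for the later steps
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detected.jpg", image)
```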

Since input images are affected by the type of camera, illumination conditions, background information and so forth, we need to normalize the face images before feature detection and extraction. The aim of pre-processing is to eliminate the differences between input images as far as possible, so that features can be detected and extracted under the same conditions. Expression representation can be sensitive to translation, scaling and rotation of the head in an image. To combat the effect of these unwanted transformations, the pre-processing steps are as follows (a code sketch of steps 2-5 is given after the list):

1) Transform the face video into face images.
2) Convert the input color images into gray-scale images.
3) Normalize the face images to the same size of 128 × 128 pixels. Scale normalization is used to align all facial features. We use the Lanczos resampling method [14] to resize images.
4) Smooth the face images to remove noise using the mean-based fast median filter [15].
5) Perform gray-scale (histogram) equalization [16] to reduce the influence of illumination variation and ethnicity. Although the Gabor transformation is insensitive to illumination variation, histogram equalization further improves the results.
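The sketch below strings steps 2-5 together for a single cropped face, assuming OpenCV: its Lanczos interpolation mode stands in for the resampling of [14], a plain median filter for the mean-based fast median filter of [15], and equalizeHist for the histogram equalization step.

```python
import cv2

def preprocess_face(face_bgr, size=128):
    """Normalize a cropped face image before feature detection."""
    # 2) colour -> gray-scale
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    # 3) scale normalization to 128x128 pixels (Lanczos interpolation, cf. [14])
    gray = cv2.resize(gray, (size, size), interpolation=cv2.INTER_LANCZOS4)
    # 4) noise smoothing; a plain median filter stands in for the
    #    mean-based fast median filter of [15]
    gray = cv2.medianBlur(gray, 3)
    # 5) histogram equalization against illumination variation
    gray = cv2.equalizeHist(gray)
    return gray
```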


2. Facial Feature Detection and Classification using Part-based PCA and PCA Reconstruction

After the pre-processing step, we can use the pre-processed image for facial feature detection and extraction. We choose the PCA algorithm for its speed and simplicity. PCA (Principal Component Analysis) is a popular technique for data dimensionality reduction and has been widely used in computer vision, for example in face recognition and object recognition [17]. It reduces a complex data set to a lower dimension in order to reveal the hidden, simplified structure underlying it, and it has been called one of the most valuable results from applied linear algebra [18].

2.1 Applying PCA in Facial Expression Recognition

Assume the width and height of the image are $n$ and $m$ pixels respectively, so the size of the vectorized image is $d = n \times m$. Given $k$ pre-processed facial expression images as training data, we convert them into the corresponding column image vectors $\tau = \{\tau_i,\ i = 1, 2, \dots, k\}$. The mean of the training data is

$$\psi = \frac{1}{k}\sum_{i=1}^{k}\tau_i$$

The difference vector is defined as $\phi_i = \tau_i - \psi$, where $\psi$ is the average vector of the $\tau_i$. The covariance matrix over all training samples is

$$C = \frac{1}{k}\sum_{i=1}^{k}\phi_i\phi_i^{T} = \frac{1}{k}AA^{T}, \qquad A = [\phi_1, \phi_2, \dots, \phi_k]$$

The principal components are the eigenvectors of $C$, which is a $d \times d$ matrix with $d$ eigenvectors $V_1, V_2, \dots, V_d$ and $d$ eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_d$. However, determining all $d$ eigenvalues and eigenvectors directly is time-consuming, so the computational complexity must be reduced. By singular value decomposition (SVD), $AA^{T}$ and $A^{T}A$ share the same non-zero eigenvalues $\lambda_i$. As a result, instead of directly computing the eigenvectors $u_i$ of the matrix $AA^{T}$, we compute the eigenvectors $v_i$ of the much smaller matrix $A^{T}A$. The eigenvector $u_i$ of $AA^{T}$ is then recovered as

$$u_i = \frac{1}{\lambda_i} A v_i$$

The face image can be represented by projecting the data in the image space onto the face space, from which we obtain the projection vectors. By sorting the eigenvalues in descending order, we select the corresponding eigenvectors: the larger an eigenvalue is, the more important the corresponding eigenvector. These eigenvectors compose a matrix of much lower dimension, greatly reducing the dimensionality compared with the original vectors $\tau_i$. If we reconstruct images from these eigenvectors, they appear to be face-like images (see Fig 3).

Fig 3. Face-like images
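A minimal NumPy sketch of this eigen-decomposition, using the small $A^{T}A$ matrix as described above, might look as follows; the unit-norm normalization of the recovered eigenvectors is an implementation choice for the sketch, not taken from the paper.

```python
import numpy as np

def train_pca(images, num_components):
    """images: (k, d) array, each row a flattened (pre-processed) face region."""
    k, d = images.shape
    psi = images.mean(axis=0)                  # mean vector psi
    A = (images - psi).T                       # d x k matrix of difference vectors
    # Eigen-decompose the small k x k matrix A^T A instead of the d x d matrix A A^T.
    small_cov = A.T @ A / k
    eigvals, eigvecs = np.linalg.eigh(small_cov)
    order = np.argsort(eigvals)[::-1][:num_components]   # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Map back to eigenvectors of A A^T: u_i is proportional to A v_i.
    U = A @ eigvecs
    U /= np.linalg.norm(U, axis=0)             # normalize each eigenvector to unit length
    return psi, U                              # mean and (d, num_components) basis
```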

Face recognition exploits the differences between individuals, while facial expression recognition exploits the differences between expressions. The individual differences that are useful for face recognition therefore become interference in facial expression recognition, where individual differences must be neglected. Traditionally, PCA is applied to the whole face image, which contains many individual differences. This explains why PCA is commonly used in face recognition while few studies use it for facial expression recognition [19]. Another problem that handicaps PCA for facial expression recognition is that the features extracted by PCA are the best features for representing the data set, not for classification. Based on these two factors that affect expression recognition results, we propose part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification.

2.2 Part-based PCA

To avoid the influence of personal differences, instead of applying PCA to the whole facial image, we apply PCA to the parts of the face image where only useful facial regions are analyzed. This retains useful information and discards useless information, such as the disturbance caused by facial shape and the ratios between facial features. The most important areas of the human face for classifying expressions are the eyes, eyebrows and mouth; other areas contribute little or even encumber facial expression recognition. In this section, we propose a new facial feature location approach, called multi-step integral projection, for feature area detection. With each integral projection step, the locations of the eye and mouth areas become more accurate.

The integral projection technique was originally proposed by Kanade [20]. The basic idea of gray-level integral projection is to accumulate gray-level sums along the vertical and horizontal directions, respectively. The vertical gray-level integral projection captures the variations along the horizontal direction of an image; the horizontal gray-level integral projection captures the variations along the vertical direction. Suppose there is an $m \times n$ image whose gray level at each pixel is $I(x, y)$. The vertical projection function is defined as

$$S_y(x) = \sum_{y=1}^{n} I(x, y)$$

and the horizontal projection function as

$$S_x(y) = \sum_{x=1}^{m} I(x, y)$$
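As a small sketch (assuming a NumPy image array indexed [row, column], i.e. [y, x]), the two projection functions are simply column and row sums:

```python
import numpy as np

def vertical_projection(image):
    """S_y(x): sum of gray levels down each column (variation along x)."""
    return image.sum(axis=0)

def horizontal_projection(image):
    """S_x(y): sum of gray levels across each row (variation along y)."""
    return image.sum(axis=1)
```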

Before employing integral projection for facial feature detection, we need to convert the input image into a binary image. Image binarization is one of the main techniques for image segmentation: it segments an image into foreground and background, so that the image contains only two gray levels, the brightest level 255 and the darkest level 0. The foreground contains the information of interest. The most important part of image binarization is threshold selection. Image thresholding is a useful method in many image processing tasks. We use a nonparametric and unsupervised method of automatic threshold selection, the Otsu method [21]. This method has been widely used as the classical thresholding technique since it is not sensitive to non-uniform illumination. The gray-level histogram gives the number of pixels of an image at each gray level, i.e., the distribution of the pixels of the image (see Fig 4). The main idea of the Otsu method is to dichotomize the gray-level histogram into two classes by a threshold level: the threshold we are looking for is the gray level in $[0, K]$ that maximizes the between-class variance $\sigma^2$.

Fig 4. The gray-level histogram
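A minimal sketch of this step with OpenCV's built-in Otsu thresholding (rather than a from-scratch implementation) could be:

```python
import cv2

def binarize(gray):
    """Binarize a gray-scale face image with Otsu's threshold [21].

    Dark facial features (eyebrows, eyes, mouth) come out as foreground (0),
    the rest of the face as white background (255).
    """
    # THRESH_OTSU picks the threshold that maximizes the between-class variance.
    threshold, binary = cv2.threshold(gray, 0, 255,
                                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return threshold, binary
```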

Using the threshold, we convert the original picture into a binary image, i.e., an image with pixel values 0 and 255, representing black and white respectively. The black pixels form the foreground, which contains the facial feature information we are interested in, while the white background is ignored as useless information.

After image binarization, we use multi-step integral projection to obtain the facial feature positions. The first step is to apply horizontal integral projection to the original binary image. Fig 5 shows the resulting vertical and horizontal projection curves. Suppose $I(x, y)$ is the gray value of an image; the horizontal integral projection $H(y)$ and the vertical integral projection $V(x)$, taken over the intervals $[x_1, x_2]$ and $[y_1, y_2]$ respectively, are defined as

$$H(y) = \frac{1}{x_2 - x_1}\sum_{x = x_1}^{x_2} I(x, y)$$

$$V(x) = \frac{1}{y_2 - y_1}\sum_{y = y_1}^{y_2} I(x, y)$$

The horizontal projection indicates the x-axis positions of the eyebrows, eyes, and mouth. Taking the x-axis position of the eyes as the center and twice the eyebrow-to-eye distance as the region, the vertical projection of that region indicates the y-axis positions of the left eye and right eye. Since the original integral projection curves are irregular, we use Bezier curves [22, 23], which are used in computer graphics to model smooth curves at all scales. For any four points $A(x_A, y_A)$, $B(x_B, y_B)$, $C(x_C, y_C)$, $D(x_D, y_D)$, the curve starts at $A(x_A, y_A)$ and ends at $D(x_D, y_D)$, the so-called end points; $B(x_B, y_B)$ and $C(x_C, y_C)$ are the control points. Therefore, any coordinate $(x_t, y_t)$ on the curve is:

$$x_t = x_A(1 - t)^3 + 3x_B(1 - t)^2 t + 3x_C(1 - t)t^2 + x_D t^3$$

$$y_t = y_A(1 - t)^3 + 3y_B(1 - t)^2 t + 3y_C(1 - t)t^2 + y_D t^3$$
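A small sketch of this cubic Bezier evaluation, as used to smooth the projection curves; the sampling resolution is an arbitrary choice for illustration:

```python
import numpy as np

def cubic_bezier(A, B, C, D, samples=50):
    """Evaluate a cubic Bezier segment with end points A, D and control points B, C.

    A..D are (x, y) pairs; returns an array of points on the smooth curve.
    """
    t = np.linspace(0.0, 1.0, samples)[:, None]
    P = np.array([A, B, C, D], dtype=float)
    return ((1 - t) ** 3 * P[0] + 3 * (1 - t) ** 2 * t * P[1]
            + 3 * (1 - t) * t ** 2 * P[2] + t ** 3 * P[3])
```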


Fig 5. Integral projection curves

If we rotate the projection curve by 90 degrees, aligning the image with its vertical position, we can detect the positions of the eye and mouth regions by observing the horizontal gray-level projection curves. In the horizontal gray-level integral projection, wave troughs corresponding to the facial features are formed on the curves because black pixels dominate in the areas of the eyebrows, eyes, and mouth. By observing the wave troughs, we can locate the positions of the eye and mouth areas. In the horizontal gray-level projection, from left to right, the first minimum represents the position of the eyebrows and the second minimum the position of the eyes; from right to left, the first minimum represents the position of the mouth. Fig 6 shows the positions of the eyebrows, eyes and mouth.

Fig 6. The positions of the eyebrows, eyes and mouth

However, due to image complexity and noise, there may be small wave troughs in the projection curve that interfere with locating the eyes and mouth. We therefore smooth the integral projection curves, filter out minor wave troughs, and eliminate the disturbing information. After smoothing, four main wave troughs appear, representing the eyebrows, eyes, nose and mouth respectively (see Fig 7).

Fig 7. Horizontal projection after smoothing

The detailed steps to detect the eye area are as follows. Suppose the vertical length of the face image is H. Set the first wave trough below 0.15H from the top, which corresponds to the vertical position of the eyebrows, as H1; set the second wave trough, which corresponds to the vertical position of the eyes, as H2. The starting vertical position of the eye area is H1 − 0.2 × VH, where VH = H2 − H1, and the ending vertical position is H2 + 0.8 × VH. The extracted eye area is shown in Fig 8.

Fig 8. The extracted eyes area

We apply the same method for mouth area detection. Again let the vertical length of the face image be H. Set the first wave trough below 0.7H from the top as H4, and the closest wave trough above H4 as H3. The starting vertical position of the mouth area is H3 + 0.4 × VH, where VH = H4 − H3, and the ending vertical position is H4 + 0.7 × VH. The extracted mouth area is shown in Fig 9.

Fig 9. The extracted mouth area

Based on the extracted eye and mouth areas, we still need to refine them for the subsequent Gabor transformation. We apply integral projection once more, this time with vertical projection. The vertical projection curves of the eye and mouth areas are shown in Fig 10.

Fig 10. Vertical projection curves of the eye and mouth areas

The final extracted eye area is the range from the leftmost maximum value W1 to the rightmost maximum value W2 (see Fig 11), while the final extracted mouth area lies between the rightmost maximum value W1 going from the middle to the left and the leftmost maximum value W2 going from the middle to the right (see Fig 12).
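To make the vertical bounds concrete, the following sketch turns the trough positions H1-H4 (found on the smoothed horizontal projection with some standard trough detector, which is assumed here) into the eye and mouth strips using the offsets above:

```python
def clamp(v, height):
    """Clamp a vertical coordinate into the image."""
    return int(min(max(v, 0), height - 1))

def eye_mouth_strips(h1, h2, h3, h4, height):
    """Vertical bounds of the eye and mouth strips from projection troughs.

    h1: eyebrow trough, h2: eye trough (first troughs below 0.15 * height);
    h3: trough above the mouth, h4: mouth trough (first trough below 0.7 * height).
    """
    vh_eye = h2 - h1
    eye_bounds = (clamp(h1 - 0.2 * vh_eye, height),
                  clamp(h2 + 0.8 * vh_eye, height))
    vh_mouth = h4 - h3
    mouth_bounds = (clamp(h3 + 0.4 * vh_mouth, height),
                    clamp(h4 + 0.7 * vh_mouth, height))
    return eye_bounds, mouth_bounds
```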



Fig 11. Final extracted eye area

Fig 12. Final extracted mouth area

2.3 PCA Reconstruction for Facial Expression Classification

Using multi-step integral projection we extract the eye and mouth regions and then apply part-based PCA to them. This solves the problem of individual differences that arises when applying PCA to the whole face. To address the problem that the features extracted by PCA are the best features for representing the data set rather than for classification, we propose a modified PCA reconstruction method [24]. The main idea is the following: instead of having a single training set for PCA reconstruction, we divide the training set into different classes according to the facial expressions. For the JAFFE database [25], we divide the training set into seven classes, each representing one facial expression, and perform PCA on each class independently.

Let $V$ be the matrix whose columns are the first $k$ eigenvectors of $C$. The projection of an image $\tau_i$ onto this space is

$$P_i = V^{T}(\tau_i - \psi)$$

The face image can be represented by $P_i$. From this projection we obtain the reconstructed image $R_i$:

$$R_i = V P_i + \psi$$

Since the input image is much more similar to one expression training set than to the others, its reconstruction from that class's eigenvectors shows less distortion than the images reconstructed from the eigenvectors of the other expression classes. For an input happy-expression image, if we reconstruct the eye area and mouth area in each expression class independently, the happy class shows the best similarity with the input image (see Fig 13). By measuring the similarity (a distance measure) between the input image and the reconstructed image of each class, we can identify the expression class of the input image.

Fig 13. Comparison of the original eye and mouth images with the reconstructed images
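A minimal sketch of this per-class reconstruction classifier, reusing the train_pca helper from the earlier PCA sketch; the Euclidean distance and the dictionary of per-expression training sets are illustrative assumptions:

```python
import numpy as np

# Assumes train_pca(images, num_components) from the earlier PCA sketch.

def train_class_models(training_sets, num_components=20):
    """training_sets: {expression_name: (k, d) array of flattened eye/mouth regions}."""
    return {name: train_pca(images, num_components)
            for name, images in training_sets.items()}

def classify(x, models):
    """Return the expression whose PCA reconstruction of x has the smallest error."""
    best_name, best_err = None, np.inf
    for name, (psi, U) in models.items():
        p = U.T @ (x - psi)           # project into this class's eigenspace
        r = U @ p + psi               # reconstruct from this class's eigenvectors
        err = np.linalg.norm(x - r)   # Euclidean distance as the similarity measure
        if err < best_err:
            best_name, best_err = name, err
    return best_name
```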

We performed experiments on the JAFFE database [25] using image sequences and compared the results of traditional PCA and part-based PCA. Applying traditional PCA with neural networks, we achieve 89.10% on trained data and 69.76% on untrained data, while applying part-based PCA with modified PCA reconstruction we obtain 93.71% on trained data and 76.67% on untrained data. The results demonstrate that part-based PCA outperforms traditional PCA in terms of recognition accuracy.

3. Error Analysis

Some of the incorrect facial expression recognitions occur because the differences between some expression images in the database are not obvious. Another reason is that the boundaries between some expressions, such as anger and disgust, disgust and sadness, or fear and sadness, are not very clear. These two factors cause most of the false recognition results. Table 1 lists some incorrect classification examples: row a) gives the expressions the images are supposed to be classified into, and row b) gives our classification results. Some of these expressions are impossible even for human beings to classify correctly. As a result, the two factors above limit the recognition rates to some extent.

Table 1. Incorrect classification examples (images omitted)
a) Expected expression: Sadness, Surprise, Disgust, Surprise
b) Our classification: Happiness, Happiness, Sadness, Neutral

III. CONCLUSION

For a more immersive human-computer interaction in virtual environments, applying multimodal sensory information such as hand gestures, speech, sound, body posture and facial expressions is necessary. In this paper, our research focuses on facial expression recognition as a way to capture human affective states. We present part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification. Part-based PCA is proposed to minimize the influence of individual differences. In order to achieve part-based PCA, a novel feature detection and extraction approach based on multi-step integral projection is proposed: the features are accurately detected and located by multi-step integral projection curves, and PCA is applied to the detected areas instead of the whole face. To solve the problem that the features extracted by PCA are the best features for representing the data set rather than for classification, we propose a modified PCA reconstruction method. We divide the training set into 7 classes and carry out PCA reconstruction on each class independently. The expression is recognized by measuring the similarity between the input image and the reconstructed images.

IV. REFERENCES

[1] Q. Chen, "Real-time Vision-based Hand Gesture Recognition Using Haar-like Features," in Proc. IEEE IMTC, pp. 1-6, 2007.
[2] M. Turk, "Gesture Recognition," in Handbook of Virtual Environment Technology, Lawrence Erlbaum Associates, Inc., 2001.
[3] K. Matsumura, Y. Nakamura, and K. Matsui, "Mathematical representation and image generation of human faces by metamorphosis," Electron. Commun. Jpn., vol. 80, pp. 36-46, 1997.
[4] P. Ekman, "Facial expression and emotion," Am. Psychol., vol. 48, pp. 384-392, 1993.
[5] P. Ekman, "Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique," Psychol. Bull., vol. 115, pp. 268-287, 1994.
[6] P. Ekman, "Emotions inside out. 130 years after Darwin's 'The Expression of the Emotions in Man and Animals'," Ann. NY Acad. Sci., vol. 1000, pp. 1-6, 2003.
[7] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 1424-1445, 2000.
[8] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
[9] P. Ekman and W. V. Friesen, Emotion in the Human Face. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[10] T. Kurozumi, Y. Shinza, Y. Kenmochi, and K. Kotani, "Facial individuality and expression analysis by eigenspace method based on class features or multiple discriminant analysis," in Proceedings of the International Conference on Image Processing, vol. 1, pp. 648-652, 1999.
[11] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 19, pp. 743-756, 1997.
[12] P. Viola and M. Jones, "Robust real-time object detection," Cambridge Research Laboratory Technical Report Series CRL 2001/01, pp. 1-24, 2001.
[13] P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[14] K. Turkowski (Apple Computer), "Filters for Common Resampling Tasks," 10 April 1990.
[15] L. Zhang, I. Chen, Z. Gao, et al., "Mean-based fast median filter," Journal of Tsinghua University, vol. 44, part 9, pp. 1157-1159, 2004.
[16] J. Sumbera, "Histogram Equalization," CS-4802 Digital Image Processing, Lab #2.
[17] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[18] J. Shlens, "A Tutorial on Principal Component Analysis," 2005.
[19] A. J. Calder, A. M. Burton, P. Miller, A. W. Young, and S. Akamatsu, "A principal component analysis of facial expressions," Vision Research, vol. 41, pp. 1179-1208, 2001.
[20] T. Kanade, "Picture Processing System by Computer Complex and Recognition of Human Faces," Ph.D. thesis, Kyoto University, Japan, Nov. 1973.
[21] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[22] G. G. Mateos, A. Ruiz, and P. E. Lopez-de-Teruel, "Face Detection Using Integral Projection Models," in Proceedings of the Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, pp. 644-653, 2002.
[23] Z. H. Zhou and X. Geng, "Projection functions for eye detection," Pattern Recognition, vol. 37, pp. 1049-1056, 2004.
[24] L. M. Borja, O. Fuentes, and J. D. Foley, "Object detection using image reconstruction with PCA," Image and Vision Computing, vol. 27, pp. 2-9, 2009.
[25] M. Lyons, J. Budynek, and S. Akamatsu, "Automatic classification of single facial images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 1357-1362, 1999.
