Part-based PCA for Facial Feature Extraction and Classification

advantage of the differences among expressions; that is, the differences between individuals that are useful for face recognition may become interference in facial expression recognition, since individual differences must be neglected when recognizing expressions. Traditionally, PCA is applied to a whole face image, which contains many individual differences. This helps explain why PCA is commonly used in face recognition while few studies apply it to facial expression recognition [19]. Another problem that handicaps PCA for facial expression recognition is that the features it extracts are not those best suited for classification but rather for representing the data set. Based on these two factors that might affect expression recognition results, we propose part-based PCA for facial feature extraction and apply a modified PCA reconstruction method for expression classification.

2.2 Part-based PCA

To avoid the influence of personal differences, instead of applying PCA to the whole facial image, we apply PCA to parts of the face image so that only useful facial regions are analyzed. This refines the useful information and discards the useless, such as the disturbance of facial form and the ratios of facial features. The most important areas of the human face for classifying expression are the eyes, eyebrows, and mouth; other areas contribute little or even encumber facial expression recognition. In this section, we propose a new facial feature location approach called multi-step integral projection for feature area detection. With each integral projection step, the locations of the eye and mouth areas become more accurate.

The integral projection technique was originally proposed by Kanade [20]. The basic idea of gray-level integral projection is to accumulate gray-level sums along the vertical and the horizontal direction respectively. The vertical gray-level integral projection captures the variation along the horizontal direction of an image, while the horizontal gray-level integral projection captures the variation along the vertical direction. Suppose there is an $m \times n$ image whose gray level at each pixel is $I(x, y)$. The definition of the vertical projection function is

$$S_y(x) = \sum_{y=1}^{n} I(x, y)$$

The definition of the horizontal projection function is

$$S_x(y) = \sum_{x=1}^{m} I(x, y)$$
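Both projections reduce to axis sums over the image array. The sketch below is an illustrative implementation (not the authors' code), assuming the image is stored as a NumPy array indexed img[y, x], i.e. [row, column]:

```python
import numpy as np

def vertical_projection(img: np.ndarray) -> np.ndarray:
    # S_y(x) = sum over y of I(x, y): one value per column,
    # reflecting variation along the horizontal direction.
    return img.sum(axis=0)

def horizontal_projection(img: np.ndarray) -> np.ndarray:
    # S_x(y) = sum over x of I(x, y): one value per row,
    # reflecting variation along the vertical direction.
    return img.sum(axis=1)
```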

Before employing integral projection for facial feature detection, we need to convert the input image into a binary image. Image binarization is one of the main techniques for image segmentation: it segments an image into foreground and background, so the image appears in only two gray levels, the brightest level 255 and the darkest level 0. The foreground contains the information of interest. The most important part of image binarization is threshold selection, and image thresholding is a useful tool in many image processing tasks. We use a nonparametric and unsupervised method of automatic threshold selection called the Otsu method [21]. This method has been widely used as a classical thresholding technique since it is not sensitive to non-uniform illumination. The gray-level histogram records the number of pixels of an image at each gray level, that is, the distribution of the image's pixels (see Fig 4). The main idea of Otsu is to dichotomize the gray-level histogram into two classes by a threshold level: the threshold we are looking for is the gray level between 0 and K at which the between-class variance σ² reaches its maximum.

Fig 4. The gray-level histogram<br />

Using this threshold, we convert the original picture into a binary image, that is, an image with pixel values 0 and 255, representing black and white respectively. The black pixels form the foreground, which contains the facial feature information we are interested in, while the white background is ignored as useless information.
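As a concrete illustration, the Otsu threshold can be computed from the 256-bin histogram by exhaustively maximizing the between-class variance. This is a minimal sketch of the standard algorithm [21] for an 8-bit grayscale image, not the paper's implementation:

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    # Gray level k in 0..255 maximizing the between-class variance
    # sigma_B^2(k) = (mu_T * omega(k) - mu(k))^2 / (omega(k) * (1 - omega(k))).
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # class-0 probability up to level k
    mu = np.cumsum(p * np.arange(256))   # cumulative first moment up to k
    mu_t = mu[-1]                        # global mean gray level
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / denom
    sigma_b2[denom == 0] = 0.0           # an empty class contributes nothing
    return int(np.argmax(sigma_b2))

def binarize(img: np.ndarray) -> np.ndarray:
    # Dark (feature) pixels become foreground 0; bright pixels become 255.
    k = otsu_threshold(img)
    return np.where(img <= k, 0, 255).astype(np.uint8)
```

For reference, OpenCV's cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) performs the same threshold selection.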

After image binarization, we make use of multi-step integral projection to obtain the facial feature positions. The first step is applying horizontal integral projection to the original binary image. Fig 5 shows the resulting vertical and horizontal projection curves. Suppose $I(x, y)$ is the gray value of an image. The horizontal integral projection within the interval $[x_1, x_2]$ and the vertical integral projection within the interval $[y_1, y_2]$ can be defined respectively as $H(y)$ and $V(x)$; thus we have:

$$H(y) = \frac{1}{x_2 - x_1} \sum_{x=x_1}^{x_2} I(x, y)$$

$$V(x) = \frac{1}{y_2 - y_1} \sum_{y=y_1}^{y_2} I(x, y)$$
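Under the same img[y, x] indexing assumption as the earlier sketch, these windowed, mean-normalized projections translate directly (the 1/(x2 − x1) and 1/(y2 − y1) factors follow the formulas above):

```python
import numpy as np

def H(img: np.ndarray, x1: int, x2: int) -> np.ndarray:
    # H(y): mean gray level of each row, restricted to columns x1..x2.
    return img[:, x1:x2 + 1].sum(axis=1) / (x2 - x1)

def V(img: np.ndarray, y1: int, y2: int) -> np.ndarray:
    # V(x): mean gray level of each column, restricted to rows y1..y2.
    return img[y1:y2 + 1, :].sum(axis=0) / (y2 - y1)
```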

The horizontal projection indicates the vertical positions of the eyebrows, eyes, and mouth. Taking the eye row as the center and twice the eyebrow-to-eye distance as the height of the region, the vertical projection within that region indicates the horizontal positions of the left and right eyes. Since the raw integral projection curves are irregular, we smooth them with Bézier curves [22, 23], which are used in computer graphics to model smooth curves at all scales. For any four points $A(x_A, y_A)$, $B(x_B, y_B)$, $C(x_C, y_C)$, $D(x_D, y_D)$, the curve starts at $A(x_A, y_A)$ and ends at $D(x_D, y_D)$; these are the so-called end points. $B(x_B, y_B)$ and $C(x_C, y_C)$ are called the control points. Therefore, any coordinate $(x_t, y_t)$ on the curve is:

$$x_t = x_A (1-t)^3 + 3 x_B\, t (1-t)^2 + 3 x_C\, t^2 (1-t) + x_D\, t^3$$

$$y_t = y_A (1-t)^3 + 3 y_B\, t (1-t)^2 + 3 y_C\, t^2 (1-t) + y_D\, t^3, \qquad t \in [0, 1]$$
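As a small worked sketch, the curve can be sampled at evenly spaced parameter values and used to replace an irregular stretch of a projection curve; the four points below are hypothetical, chosen only for illustration:

```python
import numpy as np

def cubic_bezier(A, B, C, D, num: int = 100) -> np.ndarray:
    # Sample the cubic Bezier with end points A, D and control
    # points B, C at `num` parameter values t in [0, 1].
    A, B, C, D = (np.asarray(p, dtype=float) for p in (A, B, C, D))
    t = np.linspace(0.0, 1.0, num)[:, None]
    return ((1 - t) ** 3 * A
            + 3 * t * (1 - t) ** 2 * B
            + 3 * t ** 2 * (1 - t) * C
            + t ** 3 * D)

# e.g. smoothing one segment of a horizontal projection curve:
# points = cubic_bezier((0, 50), (10, 20), (20, 20), (30, 55))
```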

