
7th Intern. Conf. on Computer Analysis of Images and Patterns, Kiel, Germany, September 1997.

Face Recognition by Elastic Bunch Graph Matching*

Laurenz Wiskott¹†, Jean-Marc Fellous²‡, Norbert Krüger¹, and Christoph von der Malsburg¹,²

¹ Institut für Neuroinformatik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
http://www.neuroinformatik.ruhr-uni-bochum.de

² Computer Science Department, University of Southern California, Los Angeles, CA 90089, USA

* Supported by grants from the German Federal Ministry for Science and Technology (413-5839-01 IN 101 B9) and from ARPA and the U.S. Army Research Lab (01/93/K-109).
† Current address: Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, San Diego, CA 92186-5800, http://www.cnl.salk.edu/CNL, wiskott@salk.edu.
‡ Current address: Volen Center for Complex Systems, Brandeis University, Waltham, MA 02254-9110.

Abstract

We present a system for recognizing human faces from single images out of a large database with one image per person. The task is difficult because of image variation in terms of position, size, expression, and pose. The system collapses most of this variance by extracting concise face descriptions in the form of image graphs. In these, fiducial points on the face (eyes, mouth etc.) are described by sets of wavelet components (jets). Image graph extraction is based on a novel approach, the bunch graph, which is constructed from a small set of sample image graphs. Recognition is based on a straightforward comparison of image graphs. We report recognition experiments on the FERET database and the Bochum database, including recognition across pose.

1 Introduction

The system presented here is based on a face recognition system described in [3]. Faces are represented by labeled graphs based on a Gabor wavelet transform. A matching process allows finding image graphs of new faces automatically. Recognition is then based on a simple similarity measure between the image graph and a gallery of model graphs. We have made three major extensions to the preceding system in order to handle larger galleries and larger variations in pose, and to increase the matching accuracy. Firstly, we use the phase of the complex Gabor wavelet coefficients to achieve a more accurate location of the nodes of the graph. Secondly, we employ object-adapted graphs, so that nodes refer to specific facial landmarks, called fiducial points. The correct correspondences between two faces can then be found across large viewpoint changes. Thirdly, we have introduced a new data structure, called the bunch graph, which serves as a generalized representation of faces by combining jets of a small set of individual faces. This allows the system to find the fiducial points in one matching process, which eliminates the need for matching each model graph individually. This reduces computational effort significantly. A more detailed description of this system is given in [8].

2 The System

2.1 Preprocessing with Gabor Wavelets

The representation of local features is based on a Gabor wavelet transform. It is defined as a convolution of the grey value image with a family of Gabor kernels

$$\psi_j(\vec{x}) = \frac{k_j^2}{\sigma^2} \exp\!\left(-\frac{k_j^2 x^2}{2\sigma^2}\right) \left[\exp(i\,\vec{k}_j \cdot \vec{x}) - \exp\!\left(-\frac{\sigma^2}{2}\right)\right] \qquad (1)$$

in the shape of plane waves with wave vector $\vec{k}_j$, restricted by a Gaussian envelope function. We employ a discrete set of 5 different frequencies, index $\nu = 0,\ldots,4$, and 8 orientations, index $\mu = 0,\ldots,7$, yielding wave vectors $\vec{k}_j = (k_{jx}, k_{jy}) = (k_\nu \cos\varphi_\mu,\; k_\nu \sin\varphi_\mu)$ with $j = \mu + 8\nu$, $k_\nu = 2^{-\frac{\nu+2}{2}}\pi$, and $\varphi_\mu = \mu\frac{\pi}{8}$. This sampling evenly covers a band in frequency space. The width $\sigma/k$ of the Gaussian is controlled by the parameter $\sigma = 2\pi$. The second term in the bracket of (1) makes the kernels DC-free, i.e. the integral $\int \psi_j(\vec{x})\, d^2\vec{x}$ vanishes. One speaks of a wavelet transform since the family of kernels is self-similar, all kernels being generated from one mother wavelet by dilation and rotation.

A jet $\mathcal{J}$ is defined as the set $\{J_j\}$ of 40 complex coefficients obtained for one image point by this convolution. It can be written as

$$J_j = a_j \exp(i\phi_j) \qquad (2)$$

with amplitudes $a_j(\vec{x})$, which slowly vary with position, and phases $\phi_j(\vec{x})$, which rotate with a rate set by the spatial frequency or wave vector $\vec{k}_j$ of the kernels.
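For illustration, a minimal NumPy sketch of this preprocessing might look as follows. It is not the authors' implementation: the 65×65 kernel window is an assumed size, and the jet is obtained by correlating each kernel with a local patch (the complex conjugate of the convolution of Eq. (1), so amplitudes are unaffected), whereas a full system would transform the whole image, e.g. via FFT.

```python
import numpy as np

def gabor_kernel(nu, mu, size=65, sigma=2 * np.pi):
    """Kernel psi_j of Eq. (1) with j = mu + 8*nu; 'size' is an assumed window."""
    k = 2 ** (-(nu + 2) / 2) * np.pi               # k_nu
    phi = mu * np.pi / 8                           # varphi_mu
    kx, ky = k * np.cos(phi), k * np.sin(phi)      # wave vector k_j
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    gauss = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * (x ** 2 + y ** 2)
                                           / (2 * sigma ** 2))
    # subtracting exp(-sigma^2/2) makes the kernel DC-free (its integral vanishes)
    return gauss * (np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2))

def jet(image, x0, y0, size=65):
    """The 40 complex coefficients J_j of Eq. (2) at image point (x0, y0),
    computed as the correlation of each kernel with the local patch."""
    half = size // 2
    patch = image[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1].astype(float)
    return np.array([(gabor_kernel(nu, mu, size) * patch).sum()
                     for nu in range(5) for mu in range(8)])
```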

Due to phase rotation, jets taken from image points with a relative distance $\vec{d}$ of a few pixels have very different coefficients, although representing almost the same local feature. This can cause severe problems for matching. We therefore either ignore the phase or compensate for its variation explicitly. The former leads to the similarity function

$$S_a(\mathcal{J}, \mathcal{J}') = \frac{\sum_j a_j a'_j}{\sqrt{\sum_j a_j^2 \sum_j a_j'^2}}, \qquad (3)$$

already used in [1, 3]. Compensating for phase variation leads to the similarity function

$$S_\phi(\mathcal{J}, \mathcal{J}') = \frac{\sum_j a_j a'_j \cos(\phi_j - \phi'_j - \vec{d} \cdot \vec{k}_j)}{\sqrt{\sum_j a_j^2 \sum_j a_j'^2}}. \qquad (4)$$

In order to compute it, the displacement $\vec{d}$ has to be estimated. This can be done by maximizing $S_\phi$ in its Taylor expansion with respect to $\vec{d}$. If applied iteratively from low to high frequencies, displacements of up to eight pixels can be estimated [8], which is useful to guide the matching process.
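Continuing the sketch, the two similarity functions and a least-squares displacement estimate from the Taylor expansion of Eq. (4) might be written as follows. The phase wrapping and the closed-form normal equations are our reading of the procedure; the iterative low-to-high-frequency refinement of [8] is omitted.

```python
# the 40 wave vectors k_j, in the same j = mu + 8*nu order as jet() above
k_vectors = np.array([[2 ** (-(nu + 2) / 2) * np.pi * np.cos(mu * np.pi / 8),
                       2 ** (-(nu + 2) / 2) * np.pi * np.sin(mu * np.pi / 8)]
                      for nu in range(5) for mu in range(8)])

def similarity_a(J1, J2):
    """Phase-insensitive jet similarity, Eq. (3)."""
    a1, a2 = np.abs(J1), np.abs(J2)
    return (a1 * a2).sum() / np.sqrt((a1 ** 2).sum() * (a2 ** 2).sum())

def similarity_phi(J1, J2, d, k_vectors):
    """Phase-sensitive jet similarity, Eq. (4), given an estimated displacement d."""
    a1, a2 = np.abs(J1), np.abs(J2)
    dphi = np.angle(J1) - np.angle(J2) - k_vectors @ d
    return (a1 * a2 * np.cos(dphi)).sum() / np.sqrt((a1 ** 2).sum() * (a2 ** 2).sum())

def estimate_displacement(J1, J2, k_vectors):
    """Taylor expansion of Eq. (4): maximizing S_phi reduces to minimizing
    sum_j w_j (dphi_j - d.k_j)^2 with weights w_j = a_j a'_j, solved in closed form."""
    w = np.abs(J1) * np.abs(J2)
    dphi = np.angle(J1) - np.angle(J2)
    dphi = np.mod(dphi + np.pi, 2 * np.pi) - np.pi   # wrap phases to (-pi, pi]
    gamma = (k_vectors * w[:, None]).T @ k_vectors   # 2x2 normal matrix
    return np.linalg.solve(gamma, (w * dphi) @ k_vectors)
```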

2.2 Face Representation

For faces, we have defined a set of fiducial points, e.g. the pupils, the corners of the mouth, the tip of the nose, the top and bottom of the ears, etc. A labeled graph $G$ representing a face consists of $N$ nodes on these fiducial points at positions $\vec{x}_n$, $n = 1,\ldots,N$, and $E$ edges between them. The nodes are labeled with jets $\mathcal{J}_n$. The edges are labeled with distances $\Delta\vec{x}_e = \vec{x}_n - \vec{x}_{n'}$, $e = 1,\ldots,E$, where edge $e$ connects node $n'$ with $n$. Hence the edge labels are two-dimensional vectors. (When we want to refer to the geometrical structure of a graph, unlabeled by jets, we call it a grid.) This face graph is object-adapted, since the nodes are selected from face-specific points (fiducial points); see Fig. 2.

Graphs for different head poses differ in geometry and local features. Although the fiducial points refer to corresponding object locations, some may be occluded, and jets as well as distances vary due to rotation in depth. To be able to compare graphs for different poses, we manually define pointers to associate corresponding nodes in the different graphs.

In order to find fiducial points in new faces, one needs a general representation rather than models of individual faces. This representation should cover a wide range of possible variations in the appearance of faces, such as differently shaped eyes, mouths, or noses, different types of beards, variations due to gender, age, and race, etc. It is obvious that it would be too expensive to cover each feature combination by a separate graph. We instead combine a representative set of individual model graphs into a stack-like structure, called a face bunch graph (FBG) (see Fig. 1). Each model has the same grid structure and the nodes refer to identical fiducial points. A set of jets referring to one fiducial point is called a bunch. An eye bunch, for instance, may include jets from closed, open, female, and male eyes etc. to cover these local variations. During the location of fiducial points in a face not seen before, the procedure described in the next section selects the best fitting jet, called the local expert, from the bunch dedicated to each fiducial point. Thus, the full combinatorics of jets in the bunch graph is available, covering a much larger range of facial variation than represented in the constituting model graphs themselves.

Assume that for a particular pose there are $M$ model graphs $G^{B_m}$ ($m = 1,\ldots,M$) of identical structure, taken from different model faces. The corresponding FBG $B$ is then given the same structure; its nodes are labeled with bunches of jets $\mathcal{J}_n^{B_m}$ and its edges are labeled with the averaged distances $\Delta\vec{x}_e^B = \sum_m \Delta\vec{x}_e^{B_m} / M$.

Figure 1: The face bunch graph (FBG) serves as a representation of faces in general. It is designed to cover all possible variations in the appearance of faces. The FBG combines information from a number of face graphs. Its nodes are labeled with sets of jets, called bunches, and its edges are labeled with averages of distance vectors. During comparison to an image, the best fitting jet in each bunch is selected independently, indicated by grey shading.
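As a data-structure sketch (illustrative, with assumed field names rather than the authors' API), a face graph and a face bunch graph can be held like this:

```python
from dataclasses import dataclass

@dataclass
class FaceGraph:
    jets:  list  # N jets, one 40-component complex array per fiducial point
    edges: list  # E entries (n, n_prime, dx) with dx = x_n - x_{n'} (2-vector)

@dataclass
class BunchGraph:
    bunches:   list  # N bunches: one list of jets per fiducial point, one jet per model
    avg_edges: list  # E entries (n, n_prime, dx averaged over the M model graphs)
```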

2.3 Generating Face Representations by Elastic Bunch Graph Matching

So far we have only described how individual faces and general knowledge about faces are represented by labeled graphs and the FBG, respectively. How can these graphs be generated? The simplest method is to do so manually. We have used this method to generate initial graphs for the system, one graph for each pose, together with pointers to indicate which pairs of nodes in graphs for different poses correspond to each other. Once the system has an FBG (possibly consisting of only one manually defined model), graphs for new images can be generated automatically by elastic bunch graph matching. Initially, when the FBG contains only a few faces, it is necessary to review and correct the resulting matches, but once the FBG is rich enough (approximately 70 graphs) one can rely on the matching and generate large galleries of model graphs automatically.

A key role in elastic bunch graph matching is played by a function evaluating the graph similarity between an image graph and the FBG of identical pose. It depends on the jet similarities and on the distortion of the image grid relative to the FBG grid. For an image graph $G^I$ with nodes $n = 1,\ldots,N$ and edges $e = 1,\ldots,E$ and an FBG $B$ with model graphs $m = 1,\ldots,M$ the similarity is defined as

$$S_B(G^I, B) = \frac{1}{N} \sum_n \max_m S_\phi\!\left(\mathcal{J}_n^I, \mathcal{J}_n^{B_m}\right) - \frac{\lambda}{E} \sum_e \frac{\left(\Delta\vec{x}_e^I - \Delta\vec{x}_e^B\right)^2}{\left(\Delta\vec{x}_e^B\right)^2}, \qquad (5)$$

where $\lambda$ determines the relative importance of jets and metric structure. $\mathcal{J}_n$ are the jets at node $n$ and $\Delta\vec{x}_e$ are the distance vectors used as labels at edges $e$. Since the FBG provides several jets for each fiducial point, the best one is selected and used for comparison. These best fitting jets serve as local experts for the image face.
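A direct transcription of Eq. (5) over the structures sketched above might read as follows; the value of $\lambda$ is made up for illustration, as the paper does not state one here.

```python
def graph_similarity(G, B, k_vectors, lam=2.0):
    """Eq. (5): mean best-in-bunch jet similarity minus lambda times the
    mean squared relative distortion of the edges. lam is illustrative."""
    def best_in_bunch(j, bunch):
        # pick the 'local expert': the bunch jet most similar to the image jet,
        # with the displacement of Eq. (4) estimated per pair
        return max(similarity_phi(j, jb, estimate_displacement(j, jb, k_vectors),
                                  k_vectors) for jb in bunch)
    jet_term = np.mean([best_in_bunch(j, bunch)
                        for j, bunch in zip(G.jets, B.bunches)])
    distortion = np.mean([np.sum((np.asarray(di) - np.asarray(db)) ** 2)
                          / np.sum(np.asarray(db) ** 2)
                          for (_, _, di), (_, _, db) in zip(G.edges, B.avg_edges)])
    return jet_term - lam * distortion
```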

The goal of elastic bunch graph matching on a probe image is to find the fiducial points and thus to extract from the image a graph which maximizes the similarity with the FBG as defined in Equation (5). In practice, one has to apply a heuristic algorithm to come close to the optimum within reasonable time. We use a coarse-to-fine approach in which we introduce the degrees of freedom of the FBG progressively: translation, scale, aspect ratio, and finally local distortions. Phase information is used only in the latter steps. The resulting graph is called the image graph and is stored as a representation for the individual face of the image.
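The first, purely translational stage of such a heuristic could be sketched as follows: the undistorted average grid of the FBG is scanned rigidly over the image, so the distortion term of Eq. (5) vanishes and only the jet term matters. Step size, window margin, and the node layout as offsets from a grid center are assumptions; the later scale, aspect-ratio, and local-distortion moves are omitted, and a real implementation would cache the wavelet transform of the whole image rather than recompute jets per offset.

```python
def scan_translation(image, B, rel_positions, k_vectors, step=4, half=32):
    """Slide the rigid FBG grid over the image, keeping the center position
    that maximizes Eq. (5). rel_positions: N x 2 node offsets from the center."""
    h, w = image.shape
    rel = np.asarray(rel_positions, dtype=int)
    best, best_s = None, -np.inf
    for cy in range(half, h - half, step):
        for cx in range(half, w - half, step):
            nodes = rel + (cx, cy)
            if (nodes < half).any() or (nodes[:, 0] >= w - half).any() \
                    or (nodes[:, 1] >= h - half).any():
                continue  # keep every Gabor window inside the image
            G = FaceGraph(jets=[jet(image, x, y) for x, y in nodes],
                          edges=B.avg_edges)  # rigid grid: distortion term is zero
            s = graph_similarity(G, B, k_vectors)
            if s > best_s:
                best, best_s = (cx, cy), s
    return best
```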

Figure 2: Object-adapted grids for different poses. The nodes are positioned automatically by elastic graph matching against the corresponding face bunch graphs. The two top images show originals with widely differing size, with grids as used for the normalization stage, with many nodes on the outline. The other images are already rescaled to normal size. Their grids have more nodes on the face, which is more appropriate for recognition. (The grids used in the experiments had about 14 additional nodes, which are not shown here for simplicity.)

To minimize computing effort and to optimize reliability we extract a face representation in two stages. The first stage, called the normalization stage, has the purpose of estimating the position and size of the face in the original image, so that the image can be scaled and cut to standard size. The second stage takes this image as input and extracts a precise image graph appropriate for face recognition purposes. The two stages differ in emphasis. The first one has to deal with greater uncertainty as to size and position of the head and has to optimize the reliability with which it finds the face, but there is no need to find fiducial points with any precision or to extract data important for face recognition. The second stage can start with little uncertainty as to position and size of the head but has to extract a detailed face graph with high precision. The two stages took about 20 plus 10 seconds on a SPARCstation 10-512 with a 50 MHz processor.

2.4 Recognition

After having extracted model graphs from the gallery images and image graphs from the probe images, recognition is possible with relatively little computational effort by comparing an image graph with all model graphs and picking the one with the highest similarity value. The similarity function is based on the average similarity between corresponding jets. For image and model graphs referring to different poses, we compare jets according to the manually provided correspondences. Here we use the similarity function without phase (Eq. 3). A comparison against a gallery of 250 individuals took slightly less than a second.
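For same-pose comparisons this amounts to ranking the gallery by the mean of Eq. (3) over corresponding nodes. A minimal sketch, assuming the gallery is a dict from identity to FaceGraph:

```python
def recognize(probe, gallery):
    """Return the gallery identity whose model graph is most similar to the
    probe's image graph, using the phase-insensitive similarity of Eq. (3)."""
    def graph_score(model):
        return np.mean([similarity_a(jp, jm)
                        for jp, jm in zip(probe.jets, model.jets)])
    return max(gallery, key=lambda name: graph_score(gallery[name]))
```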

3 Experiments

For the experiments we used image galleries taken from the ARPA/ARL FERET database provided by the US Army Research Laboratory. The poses are: frontal view, half-profile right or left (rotated by about 40°-70°), and profile right or left. For most faces there are two frontal views with different facial expression. Apart from a few exceptions there are no disguises, variations in hair-style or in clothing. The background is always a homogeneous light or grey, except for smoothly varying shadows. The size of the faces varies by about a factor of three (but is constant for each individual, information which we could have used to improve recognition rates, but didn't). The format of the original images is 256×384 pixels.

Model gallery      Probe images        First rank       First 10 ranks
                                       #      %         #      %
250 fa             250 fb              245    98        248    99
250 hr             181 hl              103    57        147    81
250 pr             250 pl              210    84        236    94
249 fa + 1 fb      171 hl + 79 hr      44     18        111    44
171 hl + 79 hr     249 fa + 1 fb       42     17        95     38
170 hl + 80 hr     217 pl + 33 pr      22     9         67     27
217 pl + 33 pr     170 hl + 80 hr      31     12        80     32

Table 1: Recognition results for cross-runs between different galleries (f: frontal views; a, b: expression a and b; h: half profiles; p: profiles; l, r: left and right). Each gallery contained only one image per person; the different compositions in the four bottom lines are due to the fact that not all poses were available for all people. The table shows how often the correct model was identified as rank one and how often it was among the first 10 (4%).

We used various model and probe galleries with faces of different pose. Each model gallery contained 250 faces with just one image per person. Recognition results are shown in Table 1. The recognition rate is very high for frontal against frontal images (first row). This is mainly due to the fact that in this database two frontal views show only little variation, and any face recognition system should perform well under these circumstances (cf. the results on the Bochum database below for a more challenging example). Before comparing left against right poses we flipped all right-pose images over. The recognition rate is high for right profile against left profile (84%, third row). The sharply reduced recognition rate of 57% (second row) when comparing left and right half-profiles is probably due to the poor control of rotation angle in the database; inspection of the images shows that right and left rotation angles differ by up to 30°. When comparing half profiles with either frontal views or profiles, another reduction in recognition rate is observed (although even a correct recognition rate of 10% out of a gallery of 250 is still high above chance level, which would be 0.4%!).

We also did experiments on the Bochum database, which is described in greater detail in [3]. In this database the differences in facial expression are stronger than in the FERET database; some faces were even half covered by hair or a hand. First-rank recognition rates against a model gallery of 108 frontal views were 91% on frontal views of different facial expression, 94% on faces rotated in depth by 11°, and 88% on faces rotated by 22°. The results are similar to those in [3] and demonstrate robustness against variations in facial expression and rotation in depth up to 22°.

4 Discussion

The system presented is general and flexible. It is designed for an in-class recognition task, i.e. for recognizing members of a known class of objects. We have applied it to face recognition, but the system is in no way specialized to faces and we assume that it can be directly applied to other in-class recognition tasks, such as recognizing individuals of a given animal species, given the same level of standardization of the images. In contrast to many neural network systems, no extensive training for new faces or new object classes is required. Only a moderate number of typical examples have to be inspected to build up a bunch graph, and individuals can then be recognized after storing a single image.

We tested the system with respect to rotation in depth and differences in facial expression. We did not investigate robustness to other variations, such as illumination changes or structured background. The performance is high on faces of the same pose. We also showed robustness against rotation in depth up to about 22°. For large rotation angles the performance degrades significantly. Our system performs well compared to other systems; results of a blind test of different systems on the FERET database were published in [5, 6].

In comparison to the system [3], on the basis of which we have developed the system presented here, we have made several major modifications. We now utilize wavelet phase information for accurate node localization; previously, node localization was rather imprecise. We have introduced the potential to specialize the system to specific object types and to handle different poses with the help of object-adapted grids. The face bunch graph is able to represent a wide variety of faces, which allows matching on face images of unseen individuals. These improvements make it possible to extract an image graph from a new face image in one matching process. Even if the person in the new image is not included in the FBG, the image graph reliably refers to the fiducial points. This accelerates recognition from large databases considerably, since for each probe image, correct node positions need to be searched only once instead of in each attempted match to a gallery image, as was previously necessary. We did not expect, and the system does not show, an improvement in terms of recognition rates compared to the preceding system.


Besides the ability to handle larger galleries and larger rotation in depth, the main achievement of the system presented here is that the increased matching accuracy, the object-adapted graphs, and the face bunch graph provide the basis for further developments [2, 4, 7].

Acknowledgements: This work has benefited from the help of many colleagues. We particularly wish to thank Irving Biederman, Michael Lyons, Thomas Maurer, Jonathon Phillips, Ladan Shams, Marni Stewart Bartlett, and Jan Vorbrüggen. For the experiments we have used the FERET database of facial images collected under the ARPA/ARL FERET program and the Bochum database.

References

[1] Buhmann, J., Lange, J., von der Malsburg, C., Vorbrüggen, J. C., and Würtz, R. P. Object recognition with Gabor functions in the dynamic link architecture: Parallel implementation on a transputer network. In Neural Networks for Signal Processing, B. Kosko, Ed. Prentice Hall, Englewood Cliffs, NJ 07632, 1992, pp. 121-159.

[2] Krüger, N. An algorithm for the learning of weights in discrimination functions using a priori constraints. Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.

[3] Lades, M., Vorbrüggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R. P., and Konen, W. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers 42, 3 (1993), 300-311.

[4] Maurer, T., and von der Malsburg, C. Linear feature transformations to recognize faces rotated in depth. In Proceedings of the International Conference on Artificial Neural Networks, ICANN'95 (Paris, Oct. 1995), pp. 353-358.

[5] Phillips, P. J., Rauss, P. J., and Der, S. Z. FERET (face recognition technology) recognition algorithm development and test report. Tech. Rep. ARL-TR-995, U.S. Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783-1197, Oct. 1996.

[6] Rauss, P. J., Phillips, J., Hamilton, M. K., and DePersia, A. T. FERET (face-recognition technology) recognition algorithms. In Proc. of the Fifth Automatic Target Recognizer System and Technology Symposium (1996).

[7] Wiskott, L. Phantom faces for face analysis. Pattern Recognition 30, 6 (1996).

[8] Wiskott, L., Fellous, J.-M., Krüger, N., and von der Malsburg, C. Face recognition by elastic bunch graph matching. Tech. Rep. IR-INI 96-08, Institut für Neuroinformatik, Ruhr-Universität Bochum, D-44780 Bochum, Germany, 1996.
