Abstract book (pdf) - ICPR 2010
CONTENTS<br />
Organizing Committees<br />
Tracks and Co-Chairs<br />
Message from the General Chair<br />
Message from the Technical Program Chairs<br />
Technical Program Overview<br />
Technical Program for Monday<br />
Technical Program for Tuesday<br />
Technical Program for Wednesday<br />
Technical Program for Thursday<br />
Organizing Committees<br />
Conference Chair<br />
Aytül Erçil<br />
Sabanci University<br />
Turkey<br />
Technical Co-Chairs<br />
Kim Boyer<br />
Rensselaer<br />
Polytechnic Institute<br />
USA<br />
Müjdat Çetin<br />
Sabanci University<br />
Turkey<br />
Seong-Whan Lee<br />
Korea University<br />
Korea<br />
Advisory Committee<br />
Sergey Ablameyko<br />
National Academy of Sciences<br />
Belarus<br />
Hüseyin Abut<br />
San Diego<br />
State University<br />
USA<br />
Jake Aggarwal<br />
University of Texas<br />
USA<br />
Horst Bunke<br />
University of Bern<br />
Switzerland<br />
Rama Chellappa<br />
University of Maryland<br />
USA<br />
Igor B. Gurevich<br />
Russian Academy of Sciences<br />
Russia<br />
Anil K. Jain<br />
Michigan State University<br />
USA<br />
Takeo Kanade<br />
Carnegie Mellon University<br />
USA<br />
Rangachar Kasturi<br />
University of South Florida<br />
USA<br />
Josef Kittler<br />
University of Surrey<br />
UK<br />
Brian Lovell<br />
University of Queensland<br />
Australia<br />
Theo Pavlidis<br />
Stony Brook University<br />
USA<br />
Pietro Perona<br />
California Institute of Technology<br />
USA<br />
Fatih Porikli<br />
MERL<br />
USA<br />
Alberto Sanfeliu<br />
Polytechnic University of Catalonia<br />
Spain<br />
Bülent Sankur<br />
Bogazici University<br />
Turkey<br />
Bernhard Schölkopf<br />
Max Planck Institutes<br />
Germany<br />
Mubarak Shah<br />
University of Central Florida<br />
USA<br />
Tieniu Tan<br />
National Laboratory of<br />
Pattern Recognition<br />
China<br />
Sergios Theodoridis<br />
University of Athens<br />
Greece<br />
Plenary Speakers Committee<br />
Anil K. Jain<br />
Michigan State University<br />
USA<br />
Tutorials<br />
Denis Laurendeau<br />
Laval University<br />
Canada<br />
Arun Ross<br />
West Virginia University<br />
USA<br />
Birsen Yazıcı<br />
Rensselaer<br />
Polytechnic Institute<br />
USA<br />
Workshops<br />
Selim Aksoy<br />
Bilkent University<br />
Turkey<br />
Theo Gevers<br />
University of Amsterdam<br />
The Netherlands<br />
Denis Laurendeau<br />
Laval University<br />
Canada<br />
Bülent Sankur<br />
Bogazici University<br />
Turkey<br />
Contest Organization<br />
Selim Aksoy<br />
Bilkent University<br />
Turkey<br />
Zehra Çataltepe<br />
Istanbul Technical University<br />
Turkey<br />
Devrim Ünay<br />
Bahcesehir University<br />
Turkey<br />
Publicity<br />
Enis Çetin<br />
Bilkent University<br />
Turkey<br />
Pınar Duygulu Şahin<br />
Bilkent University<br />
Turkey<br />
Asian Liaisons<br />
Karthik Nandakumar<br />
Institute for Infocomm Research<br />
Singapore<br />
Yunhong Wang<br />
Beihang University<br />
China<br />
European Liaisons<br />
Javier Ortega-Garcia<br />
Universidad Autonoma de Madrid<br />
Spain<br />
Fabio Roli<br />
University of Cagliari<br />
Italy<br />
American Liaisons<br />
Deniz Erdoğmuş<br />
Northeastern University<br />
USA<br />
Publications<br />
Nafiz Arıca<br />
Naval Academy<br />
Turkey<br />
Cem Ünsalan<br />
Yeditepe University<br />
Turkey<br />
Local Arrangements<br />
Ayşın Baytan Ertüzün<br />
Bogazici University<br />
Turkey<br />
Mustafa Ünel<br />
Sabanci University<br />
Turkey<br />
Finance<br />
Gülbin Akgün<br />
Sabanci University<br />
Turkey<br />
Hakan Erdoğan<br />
Sabanci University<br />
Turkey<br />
Sponsorship<br />
Fatoş Yarman Vural<br />
Middle East Technical University<br />
Turkey<br />
Exhibits<br />
Olcay Kurşun<br />
Istanbul University<br />
Turkey
Tracks and Co-Chairs<br />
Track I: Computer Vision<br />
Joachim Buhmann<br />
ETH Zurich, Switzerland<br />
Xiaoyi Jiang<br />
University of Münster, Germany<br />
Jussi Parkkinen<br />
University of Joensuu, Finland<br />
Alper Yılmaz<br />
Ohio State University, USA<br />
Area Co-Chairs:<br />
Ahmet Ekin, Philips Research Europe, The Netherlands<br />
Georgy Gimel’farb, University of Auckland, New Zealand<br />
Muhittin Gökmen, Istanbul Technical University, Turkey<br />
Atsushi Imiya, Chiba University, Japan<br />
Nikos Paragios, Ecole Centrale de Paris, France<br />
Fatih Porikli, MERL, USA<br />
Sudeep Sarkar, University of South Florida, USA<br />
Bernt Schiele, TU Darmstadt, Germany<br />
Yaser Ajmal Sheikh, Carnegie Mellon, USA<br />
Dacheng Tao, Nanyang Technological University, Singapore<br />
Track II: Pattern Recognition and Machine Learning<br />
G. Sanniti di Baja<br />
Istituto di Cibernetica Eduardo Caianiello, Italy<br />
Mario Figueiredo<br />
Instituto Superior Técnico, Portugal<br />
Bilge Günsel<br />
Istanbul Technical University, Turkey<br />
D.Y. Yeung<br />
Hong Kong University of Science and Technology, China<br />
Area Co-Chairs:<br />
Ethem Alpaydın, Bogazici University, Turkey<br />
Gunilla Borgefors, CBA Uppsala, Sweden<br />
Yang Gao, Nanjing University, China<br />
Simone Marinai, University of Florence, Italy<br />
Aleix Martinez, The Ohio State University, USA<br />
Petr Somol, UTIA, Czech Republic<br />
Tolga Taşdizen, University of Utah, USA<br />
Zhi-Hua Zhou, Nanjing University, China<br />
Track III: Signal, Speech, Image and Video Processing<br />
Maria Petrou<br />
Imperial College, UK<br />
Kazuya Takeda<br />
Nagoya University, Japan<br />
Murat Tekalp<br />
Koc University, Turkey<br />
Jean-Philippe Thiran<br />
Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland<br />
Track IV:<br />
Biometrics and Human Computer Interaction<br />
Lale Akarun<br />
Bogazici University, Turkey<br />
Patrick Flynn<br />
University of Notre Dame, USA<br />
B. Vijaya Kumar<br />
Carnegie Mellon, USA<br />
Stan Z. Li<br />
Chinese Academy of Sciences, China<br />
Track V: Multimedia and Document Analysis,<br />
Processing and Retrieval<br />
Nozha Boujemaa<br />
INRIA, France<br />
David Doermann<br />
University of Maryland, USA<br />
B. S. Manjunath<br />
University of California, USA<br />
Nicu Sebe<br />
University of Trento, Italy<br />
Berrin Yanıkoğlu<br />
Sabanci University, Turkey<br />
Track VI: Bioinformatics and Biomedical Applications<br />
Rachid Deriche<br />
INRIA, France<br />
Tianzi Jiang<br />
Chinese Academy of Sciences, China<br />
Elena Marchiori<br />
Radboud University, Netherlands<br />
Dimitris Metaxas<br />
Rutgers, The State University of New Jersey, USA<br />
Gözde Ünal<br />
Sabanci University, Turkey<br />
Message from the General Chair<br />
It is my great honor and privilege to welcome all of you to the 20th International Conference on Pattern Recognition.<br />
Over the past 40 years, this conference has brought together the research communities of industry and academia from all over<br />
the world to discuss important issues, challenges, and solutions in pattern-recognition-related problems. The conference<br />
has established itself as a forum at which research as well as practical aspects of pattern recognition are enthusiastically<br />
addressed. We hope to continue this tradition by offering you another successful forum with an interesting program.<br />
Once again we have a very strong technical program, with technical sessions on computer vision, pattern recognition and<br />
machine learning, signal, speech, image and video processing, biometrics and human computer interaction, multimedia<br />
and document analysis, processing and retrieval, and bioinformatics and biomedical applications. We are also fortunate to have distinguished<br />
invited speakers: Christopher Bishop from Microsoft Research Cambridge, Shree K. Nayar from Columbia University, and<br />
Prabhakar Raghavan from Yahoo! Research, who will share their experiences and vision with us. The conference also has<br />
an extremely varied program: there will be 7 interesting tutorials that are an integral part of the program, as well as 9<br />
workshops that allow an even deeper focus on areas that are of interest to the conference participants. A new feature in the<br />
program this year is the organization of 9 contests which will provide a setting where participants will have the opportunity<br />
to evaluate their algorithms using publicly available datasets, and discuss technical topics in an atmosphere that fosters<br />
active exchange of ideas.<br />
A number of organizations, namely, Tüpraş (TR), Tübitak (TR), Havelsan (TR), Cybersoft (TR), Savronik (TR), Chryso<br />
(TR), Star Alliance (TR), Mitsubishi Electric Research Laboratories (USA), IBM Research (USA) and Elsevier (USA),<br />
kindly served as supporters of the Conference. We are most grateful to these organizations for their financial support and<br />
encouragement. The conference is technically co-sponsored by IEEE Computer Society, continuing our desire to seek<br />
closer collaboration between our two communities.<br />
During this period, I have had the opportunity to work closely with some of the best people in our community. We are extremely<br />
grateful to Prof. Müjdat Çetin and Osman Rahmi Fıçıcı, who worked day and night beyond their professional duties<br />
to make the conference a success. The success of any conference depends heavily on the quality of the selected papers.<br />
For selecting the best out of many excellent submitted papers, we are indebted to the technical co-chairs Müjdat Çetin,<br />
Kim Boyer and Seong-Whan Lee, all the track chairs, and the external referees for their hard work that has continued to<br />
uphold the high standard that is now customary for this conference series.<br />
Special thanks also to the conference organizing committee, in particular, Ayşın Ertüzün and Mustafa Ünel (Local arrangement<br />
chairs), Cem Ünsalan (Publication chair), Fatoş Yarman Vural (Sponsorship chair), Anil Jain (Plenary speakers chair),<br />
Hakan Erdoğan and Gülbin Akgün (Finance chairs), Olcay Kurşun (Exhibits chair), Denis Laurendeau, Arun Ross and<br />
Birsen Yazıcı (Tutorial chairs), Selim Aksoy, Theo Gevers, Denis Laurendeau and Bülent Sankur (Workshop chairs), Selim<br />
Aksoy, Zehra Çataltepe and Devrim Ünay (Contest chairs) and Pınar Duygulu, Enis Çetin (Publicity chairs). There would<br />
be no conference without them. We also thank IAPR ex-co members and past chairs of this event for their continued<br />
support and advice in helping us. We would also like to thank Sabancı University President Prof. Nihat Berker for his support<br />
and encouragement. Our special thanks go to the Teamcon staff members, who provided critical support overseeing<br />
all the logistics and making the smooth operation of the entire conference possible.<br />
Finally, no conference can ever take place without the support of those individuals who submit their original research results,<br />
or without the participants, who honor the conference with their presence.<br />
We hope that you will find the conference both enjoyable and valuable, and also enjoy the architectural, cultural and<br />
natural beauty of Istanbul, and Turkey.<br />
Aytül Erçil<br />
ICPR 2010 General Chair<br />
Sabancı University, Faculty of Engineering and Natural Sciences<br />
Message from the Technical Program Chairs<br />
The full technical program committee joins the three of us in welcoming you to the 2010 International Conference on Pattern<br />
Recognition in beautiful, fascinating İstanbul! This is the 20th edition of ICPR, world famous as the flagship conference<br />
of the International Association for Pattern Recognition. For nearly 40 years, ICPR has been the international forum<br />
for reporting the latest advances across a wide spectrum of fields including pattern recognition and machine learning,<br />
computer vision, image and signal understanding, medical image analysis, biometrics and human-computer interaction,<br />
multimedia and document analysis, bioinformatics and biomedical applications, and more.<br />
The conference program is the work of many people, whose names you will find in the accompanying lists. Track Chairs,<br />
in some cases supported by Area Chairs, pored over thoughtful, well-written reviews provided by an extensive set of referees<br />
drawn from the broad IAPR community. Preliminary decisions were funneled to a set of Track Chairs and Müjdat<br />
Çetin, who met in İstanbul to finalize the program. Papers submitted by Track Chairs were processed by Kim Boyer. General<br />
Chair’s and Technical Program Chairs’ papers were handled by a senior researcher and a separate set of reviewers in<br />
a process completely external to the main paper management system. Seong-Whan Lee took the point on awards.<br />
In all, we received 2140 submissions and accepted 1147, for an acceptance rate of 54%. Of the accepted papers, we were<br />
able to accommodate 385 for oral presentation and 762 as posters. This submission number continues an upward trend for<br />
ICPR, and underscores the health of our scientific community. A slight tightening of the acceptance rate ensures a high-quality<br />
meeting, and indeed was necessary to fit into the space and time constraints. It is, however, undoubtedly true that<br />
many quality submissions were left out. This is an unfortunate byproduct of the compressed time window in which such<br />
a large number of decisions need to be made.<br />
We thank all of the authors who took the time to prepare and submit their work. We are also deeply grateful to all of the<br />
reviewers, and especially the Track and Area Chairs who devoted so much time and expertise to bringing forth a quality<br />
meeting.<br />
We are confident that ICPR 2010 will prove to be a rewarding experience, both scientifically as you interact with others<br />
at the meeting, and culturally as you enjoy the rich heritage, local cuisine, crafts, shopping, and so much more that İstanbul<br />
has to offer.<br />
We look forward to seeing you during our time together, here where the continents meet.<br />
Müjdat Çetin, Kim Boyer, and Seong-Whan Lee<br />
Technical Program Chairs<br />
Technical Program for Monday<br />
August 23, 2010<br />
09:00-09:30, MoOT10 Anadolu Auditorium<br />
Opening Session<br />
09:30-10:30, MoP1L1 Anadolu Auditorium<br />
K.S. Fu Prize Lecture:<br />
Towards the Unification of Structural and Statistical Pattern Recognition<br />
Horst Bunke<br />
Plenary Session<br />
Research Group on Computer Vision and Artificial Intelligence (IAM)<br />
University of Bern, Switzerland<br />
Statistical pattern recognition is characterized by the use of feature vectors for pattern representation, while the structural<br />
approach is based on symbolic data structures, such as strings, trees, and graphs. Clearly, symbolic data structures have a<br />
higher representational power than feature vectors because they allow one to directly model relationships that may exist<br />
between the individual parts of a pattern. However, many operations that are needed in classification, clustering, and other<br />
pattern recognition tasks are not defined for graphs. Consequently, there has been a lack of algorithmic tools in the domain<br />
of structural pattern recognition since its beginning. This talk gives an overview of the development of the field of structural<br />
pattern recognition and shows various attempts to bridge the gap between statistical and structural pattern recognition, i.e.<br />
to make algorithmic tools originally developed for feature vectors applicable to symbolic data structures.<br />
MoAT1 Anadolu Auditorium<br />
Image Analysis - I Regular Session<br />
Session chair: Aksoy, Selim (Bilkent Univ.)<br />
11:00-11:20, Paper MoAT1.1<br />
Minimizing Geometric Distance by Iterative Linear Optimization<br />
Chen, Yisong, Peking Univ.<br />
Sun, Jiewei, Peking Univ.<br />
Wang, Guoping, Peking Univ.<br />
This paper proposes an algorithm that solves planar homography by iterative linear optimization. We iteratively employ the<br />
direct linear transformation (DLT) algorithm to robustly estimate the homography induced by a given set of point correspondences<br />
under perspective transformation. By simple on-the-fly homogeneous coordinate adjustment we progressively minimize<br />
the difference between the algebraic error and the geometric error. When the difference is sufficiently close to zero,<br />
the geometric error is equivalently minimized and the homography is reliably solved. Backward covariance propagation is<br />
employed to perform error analysis. The experiments show that the algorithm is able to find the global minimum despite erroneous<br />
initialization. It gives a very precise estimate at low computational cost and greatly outperforms existing techniques.<br />
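As context for the iterative scheme described above, a minimal sketch of the underlying DLT step is given below. This is a plain one-shot DLT, not the authors' iterative coordinate adjustment, and the test homography and point set are purely illustrative.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the direct
    linear transformation (DLT): two linear equations per correspondence,
    null vector taken from the SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # fix the scale ambiguity

# Four corners of a unit square mapped through a known homography
H_true = np.array([[1.2, 0.1, 3.0], [0.0, 0.9, -1.0], [1e-3, 2e-3, 1.0]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
pts = np.c_[src, np.ones(4)] @ H_true.T
dst = pts[:, :2] / pts[:, 2:]
H = dlt_homography(src, dst)
print(np.allclose(H, H_true, atol=1e-6))    # True
```

With noise-free correspondences the null space of the stacked system recovers the homography exactly, which is the baseline the paper's iterative reweighting improves on in the noisy case.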
11:20-11:40, Paper MoAT1.2<br />
Hyper Least Squares and its Applications<br />
Rangarajan, Prasanna, Southern Methodist Univ.<br />
Kanatani, Kenichi, Okayama Univ.<br />
Niitsuma, Hirotaka, Okayama Univ.<br />
Sugaya, Yasuyuki, Toyohashi Univ. of Tech.<br />
We present a new form of least squares (LS), called “hyper LS”, for geometric problems that frequently appear in computer<br />
vision applications. Doing rigorous error analysis, we maximize the accuracy by introducing a normalization that eliminates<br />
statistical bias up to second order noise terms. Our method yields a solution comparable to maximum likelihood (ML)<br />
without iterations, even in large noise situations where ML computation fails.<br />
11:40-12:00, Paper MoAT1.3<br />
Integrating a Discrete Motion Model into GMM based Background Subtraction<br />
Wolf, Christian, INSA de Lyon<br />
Jolion, Jean-Michel, Univ. de Lyon<br />
GMM based algorithms have become the de facto standard for background subtraction in video sequences, mainly because<br />
of their ability to track multiple background distributions, which allows them to handle complex scenes including moving<br />
trees, flags moving in the wind etc. However, it is not always easy to determine which distributions of the mixture belong<br />
to the background and which distributions belong to the foreground, which disturbs the results of the labeling process for<br />
each pixel. In this work we tackle this problem by taking the labeling decision jointly for all pixels of several consecutive<br />
frames, minimizing a global energy function that takes into account spatial and temporal relationships. A discrete approximate<br />
optical-flow-like motion model is integrated into the energy function, which is solved with Ishikawa’s convex graph cuts algorithm.<br />
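For reference, the per-pixel GMM baseline that such methods build on can be sketched as follows. This is a toy Stauffer-Grimson-style update for a single grayscale pixel; the learning rate, match threshold, and initial variance are illustrative assumptions, and the paper's spatio-temporal energy and graph-cut step are not modeled here.

```python
import numpy as np

def gmm_step(pixel, means, vars_, weights, lr=0.05, th=2.5):
    """One Stauffer-Grimson-style update of a per-pixel Gaussian mixture:
    match the pixel to the closest component (within th standard
    deviations), adapt that component, otherwise replace the weakest one.
    Returns True if the pixel matched an existing component."""
    z = np.abs(pixel - means) / np.sqrt(vars_)
    k = int(np.argmin(z))
    if z[k] < th:                                  # matched: adapt component k
        weights *= 1.0 - lr
        weights[k] += lr
        means[k] += lr * (pixel - means[k])
        vars_[k] += lr * ((pixel - means[k]) ** 2 - vars_[k])
        matched = True
    else:                                          # no match: replace weakest
        k = int(np.argmin(weights))
        means[k], vars_[k], weights[k] = pixel, 900.0, lr
        matched = False
    weights /= weights.sum()
    return matched

# A pixel that repeatedly observes a stable background value of 50
means, vars_ = np.array([50.0, 0.0]), np.array([900.0, 900.0])
weights = np.array([0.5, 0.5])
for _ in range(100):
    gmm_step(50.0, means, vars_, weights)
bg = gmm_step(50.0, means, vars_, weights)     # background-like observation
fg = gmm_step(200.0, means, vars_, weights)    # foreground-like observation
print(bg, fg)                                  # True False
```

Deciding which mixture components count as background is exactly the labeling ambiguity the paper addresses with its global energy formulation.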
12:00-12:20, Paper MoAT1.4<br />
Saliency based on Multi-Scale Ratio of Dissimilarity<br />
Huang, Rui, Huazhong Univ. of Science and Tech.<br />
Sang, Nong, Huazhong Univ. of Science and Tech.<br />
Liu, Leyuan, Huazhong Univ. of Science and Tech.<br />
Tang, Qiling, Huazhong Univ. of Science and Tech.<br />
Recently, many vision applications tend to utilize saliency maps derived from input images to guide them to focus on processing<br />
salient regions in images. In this paper, we propose a simple and effective method to quantify the saliency for<br />
each pixel in images. Specifically, we define the saliency for a pixel in a ratio form, where the numerator measures the<br />
number of dissimilar pixels in its center-surround and the denominator measures the total number of pixels in its center-surround.<br />
The final saliency is obtained by combining these ratios of dissimilarity over multiple scales. The<br />
saliency map generated by our method not only has high resolution but also looks more reasonable. Finally, we<br />
apply our saliency map to extract the salient regions in images, and compare the performance with some state-of-the-art<br />
methods over an established ground-truth which contains 1000 images.<br />
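The ratio idea can be illustrated with a naive single-channel sketch. The square windows standing in for the center-surround, the fixed dissimilarity threshold tau, and the radii are all assumptions for illustration; the paper's actual surround definition and combination rule may differ.

```python
import numpy as np

def ratio_saliency(img, radii=(1, 2, 4), tau=25.0):
    """Saliency of each pixel as the fraction of surround pixels whose
    intensity differs from the center by more than tau, averaged over
    several window radii (multi-scale ratio of dissimilarity)."""
    h, w = img.shape
    sal = np.zeros((h, w))
    for r in radii:
        for y in range(h):
            for x in range(w):
                win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
                # numerator: dissimilar pixels; denominator: window size
                sal[y, x] += np.mean(np.abs(win - img[y, x]) > tau)
    return sal / len(radii)

# A bright square on a dark background: saliency peaks near its boundary
img = np.zeros((16, 16))
img[6:10, 6:10] = 255.0
sal = ratio_saliency(img)
print(sal[7, 5] > sal[0, 0])   # boundary pixel beats a flat far-away pixel
```

Because the ratio is bounded in [0, 1] at every scale, the multi-scale average stays directly comparable across pixels.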
12:20-12:40, Paper MoAT1.5<br />
Online Principal Background Selection for Video Synopsis<br />
Feng, Shikun, Chinese Acad. of Sciences<br />
Liao, Shengcai, Chinese Acad. of Sciences<br />
Yuan, Zhiyong, Wuhan Univ.<br />
Li, Stan Z., Chinese Acad. of Sciences<br />
Video synopsis provides a means for fast browsing of activities in video. Principal background selection (PBS) is an important<br />
step in video synopsis. Existing methods make PBS in an offline way and at a high memory cost. In this paper we<br />
propose a novel background selection method, “online principal background selection” (OPBS). The OPBS selects n<br />
principal backgrounds from N backgrounds in an online fashion with a low memory cost, making it possible to build an<br />
efficient online video synopsis system. Another advantage is that, with OPBS, the selected backgrounds are related not only<br />
to background changes over time but also to video activities. Experimental results demonstrate the advantages of the proposed<br />
OPBS.<br />
MoAT2 Marmara Hall<br />
Support Vector Machines Regular Session<br />
Session chair: Alpaydin, Ethem (Bogazici Univ.)<br />
11:00-11:20, Paper MoAT2.1<br />
Large Margin Classifier based on Affine Hulls<br />
Cevikalp, Hakan, Eskisehir Osmangazi Univ.<br />
Yavuz, Hasan Serhan, Eskisehir Osmangazi Univ.<br />
This paper introduces a geometrically inspired large-margin classifier that can be a better alternative to the Support Vector<br />
Machines (SVMs) for classification problems with a limited number of training samples. In contrast to the SVM classifier,<br />
we approximate classes with affine hulls of their class samples rather than convex hulls, which may be unrealistically<br />
tight in high-dimensional spaces. To find the best separating hyperplane between any pair of classes approximated with<br />
the affine hulls, we first compute the closest points on the affine hulls and connect these two points with a line segment.<br />
The optimal separating hyperplane is chosen to be the hyperplane that is orthogonal to the line segment and bisects the<br />
line. To allow soft margin solutions, we first reduce affine hulls in order to alleviate the effects of outliers and then search<br />
for the best separating hyperplane between these reduced models. Multi-class classification problems are dealt with by constructing<br />
and combining several binary classifiers, as in SVM. The experiments on several databases show that the proposed<br />
method compares favorably with the SVM classifier.<br />
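The geometric core of the hard-margin case can be sketched as follows: a plain least-squares construction of the closest points between two affine hulls. Hull reduction for the soft margin and the multi-class machinery are omitted, and the data are toy illustrations.

```python
import numpy as np

def affine_hull(X, tol=1e-8):
    """Affine hull of row-sample matrix X as mean + orthonormal basis."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[s > tol * max(s.max(), 1.0)].T   # keep significant directions

def affine_hull_classifier(X1, X2):
    """Separating hyperplane between two affine hulls: find the closest
    points on the hulls by least squares, then take the perpendicular
    bisector of the segment joining them as the decision boundary."""
    mu1, U1 = affine_hull(X1)
    mu2, U2 = affine_hull(X2)
    coef = np.linalg.lstsq(np.hstack([U1, -U2]), mu2 - mu1, rcond=None)[0]
    p1 = mu1 + U1 @ coef[:U1.shape[1]]             # closest point on hull 1
    p2 = mu2 + U2 @ coef[U1.shape[1]:]             # closest point on hull 2
    w = p1 - p2
    b = w @ (p1 + p2) / 2.0
    return lambda x: 1 if x @ w - b > 0 else 2     # 1 iff on hull-1 side

X1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # class 1: line y = 0
X2 = np.array([[0.0, 3.0], [1.0, 3.0], [2.0, 3.0]])   # class 2: line y = 3
clf = affine_hull_classifier(X1, X2)
print(clf(np.array([5.0, 0.5])), clf(np.array([-3.0, 2.5])))   # 1 2
```

Note that two affine hulls in general position may intersect, in which case the segment degenerates; the paper's reduced hulls avoid this, whereas this sketch simply assumes disjoint hulls.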
11:20-11:40, Paper MoAT2.2<br />
2D Shape Recognition using Information Theoretic Kernels<br />
Bicego, Manuele, Univ. of Verona<br />
Torres Martins, André Filipe, Inst. Superior Técnico<br />
Murino, Vittorio, Univ. of Verona<br />
Aguiar, Pedro M. Q., Inst. for Systems and Robotics / Inst. Superior Tecnico<br />
Figueiredo, Mario A. T., Inst. Superior Técnico<br />
In this paper, a novel approach for contour based 2D shape recognition is proposed, using a class of information theoretic<br />
kernels recently introduced. These kernels, based on a non-extensive generalization of classical Shannon information<br />
theory, are defined on probability measures. In the proposed approach, chain code representations are first extracted<br />
from the contours; then n-gram statistics are computed and used as input to the information theoretic kernels. We tested<br />
different versions of such kernels, using support vector machine and nearest neighbor classifiers. An experimental evaluation<br />
on the Chicken Pieces dataset shows that the proposed approach significantly outperforms the current state-of-the-art<br />
methods.<br />
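The pipeline can be sketched as follows, with the caveat that a Jensen-Shannon kernel is used here as a simplified stand-in for the non-extensive (Tsallis-based) kernels of the paper, and the chain codes are made up for illustration.

```python
import numpy as np
from collections import Counter

def ngram_distribution(chain_code, n=2):
    """n-gram statistics of a contour chain code as a probability table."""
    grams = Counter(chain_code[i:i + n] for i in range(len(chain_code) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def js_kernel(p, q):
    """Jensen-Shannon kernel k(p, q) = 1 - JS(p, q) on probability tables
    (JS in bits, so k lies in [0, 1] and equals 1 iff p == q)."""
    keys = sorted(set(p) | set(q))
    P = np.array([p.get(k, 0.0) for k in keys])
    Q = np.array([q.get(k, 0.0) for k in keys])
    M = (P + Q) / 2.0
    def kl(a, b):
        nz = a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))
    return 1.0 - 0.5 * (kl(P, M) + kl(Q, M))

p = ngram_distribution("0011223344556677")   # one contour's chain code
q = ngram_distribution("0011223344556670")   # a nearly identical contour
r = ngram_distribution("7777000077770000")   # a very different contour
print(js_kernel(p, q) > js_kernel(p, r))     # True: similar shapes score higher
```

The resulting kernel values can be fed to an SVM or used directly for nearest-neighbor classification, matching the two classifier setups the abstract mentions.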
11:40-12:00, Paper MoAT2.3<br />
Time Series Classification using Support Vector Machine with Gaussian Elastic Metric Kernel<br />
Zhang, Dongyu, Harbin Inst. of Tech.<br />
Zuo, Wangmeng, Harbin Inst. of Tech.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Zhang, Hongzhi, Harbin Inst. of Tech.<br />
Motivated by the great success of dynamic time warping (DTW) in time series matching, the Gaussian DTW kernel was<br />
developed for support vector machine (SVM)-based time series classification. Counter-examples, however, were subsequently<br />
reported showing that the Gaussian DTW kernel usually cannot outperform the Gaussian RBF kernel in the SVM framework.<br />
In this paper, by extending the Gaussian RBF kernel, we propose one novel class of Gaussian elastic metric kernel (GEMK),<br />
and present two examples of GEMK: Gaussian time warp edit distance (GTWED) kernel and Gaussian edit distance with<br />
real penalty (GERP) kernel. Experimental results on UCR time series data sets show that, in terms of classification accuracy,<br />
SVM with GEMK is much superior to SVM with the Gaussian RBF kernel and the Gaussian DTW kernel, and to state-of-the-art<br />
similarity measure methods.<br />
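A minimal sketch of the kernel construction is given below, with plain DTW standing in for the elastic distance; TWED and ERP, the metrics the paper actually plugs in, have extra parameters omitted here, and DTW-based Gram matrices are not guaranteed positive semi-definite.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gaussian_elastic_kernel(a, b, sigma=1.0, dist=dtw):
    """Gaussian kernel on top of an elastic distance, exp(-d^2 / 2 sigma^2);
    the GEMK family plugs elastic metrics such as TWED or ERP into dist."""
    d = dist(a, b)
    return float(np.exp(-d * d / (2.0 * sigma ** 2)))

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])   # same shape, time-shifted
c = np.array([2.0, 2.0, 2.0, 2.0, 2.0])        # a flat, dissimilar series
print(gaussian_elastic_kernel(a, b) > gaussian_elastic_kernel(a, c))   # True
```

Because warping absorbs the time shift, the shifted copy of the series scores kernel value 1 while the flat series scores near 0.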
12:00-12:20, Paper MoAT2.4<br />
Multiplicative Update Rules for Multilinear Support Tensor Machines<br />
Kotsia, Irene, Queen Mary Univ. of London<br />
Patras, Ioannis, Queen Mary Univ. of London<br />
In this paper, we formulate the Multilinear Support Tensor Machines (MSTMs) problem in a way similar to the Non-negative<br />
Matrix Factorization (NMF) algorithm. A novel set of simple and robust multiplicative update rules is proposed in<br />
order to find the multilinear classifier. Update rules are provided for both hard and soft margin MSTMs, and the existence<br />
of a bias term is also investigated. We present results on standard gait and action datasets and report faster convergence with<br />
equivalent classification performance in comparison to standard MSTMs.<br />
12:20-12:40, Paper MoAT2.5<br />
Support Vectors Selection for Supervised Learning using an Ensemble Approach<br />
Guo, Li, Univ. of Bordeaux 3<br />
Boukir, Samia, Univ. of Bordeaux 3<br />
Chehata, Nesrine, Univ. of Bordeaux 3<br />
Support Vector Machines (SVMs) are popular for pattern classification. However, training an SVM requires large memory<br />
and long processing time, especially for large datasets, which limits its applicability. To speed up SVM training, we<br />
present a new efficient support vector selection method based on ensemble margin, a key concept in ensemble classifiers.<br />
This algorithm exploits a new version of the margin of ensemble-based classification and selects the smallest-margin<br />
instances as support vectors. Our experimental results show that our method reduces training set size significantly without<br />
degrading the performance of the resulting SVM classifiers.<br />
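The selection rule can be sketched as follows; bagged 1-NN classifiers stand in for the paper's ensemble, and the data, committee size, and margin definition details are illustrative assumptions.

```python
import numpy as np

def ensemble_margins(X, y, rounds=50, seed=0):
    """Ensemble margin of each training sample: (votes for its true class
    minus votes for the best other class) / committee size, under a bagged
    committee of 1-NN classifiers. Low-margin samples lie near the class
    boundary and are retained as candidate support vectors."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    col = np.searchsorted(classes, y)
    votes = np.zeros((len(X), len(classes)))
    for _ in range(rounds):
        idx = rng.integers(0, len(X), len(X))      # bootstrap replicate
        Xb, yb = X[idx], y[idx]
        for i, x in enumerate(X):
            label = yb[np.argmin(np.linalg.norm(Xb - x, axis=1))]
            votes[i, np.searchsorted(classes, label)] += 1
    true_v = votes[np.arange(len(X)), col]
    votes[np.arange(len(X)), col] = -1             # mask the true class
    return (true_v - votes.max(axis=1)) / rounds

# Two clusters whose last points sit in the gap between the classes
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.4, 0.5],
              [5.0, 0.0], [5.0, 1.0], [4.0, 1.0], [2.6, 0.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
m = ensemble_margins(X, y)
print(m.round(2))
```

Points deep inside a cluster are classified the same way by nearly every bootstrap replicate (margin near 1), while the two gap points flip labels across replicates and end up with the smallest margins, so they would be the ones kept for SVM training.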
MoAT3 Topkapı Hall A<br />
Motion and Multiple-View Vision – I Regular Session<br />
Session chair: Hancock, Edwin (Univ. of York)<br />
11:00-11:20, Paper MoAT3.1<br />
Estimating Apparent Motion on Satellite Acquisitions with a Physical Dynamic Model<br />
Huot, Etienne, INRIA and UVSQ<br />
Herlin, Isabelle, INRIA<br />
Mercier, Nicolas, INRIA<br />
Plotnikov, Evgeny, National Academy of Sciences, Ukraine<br />
The paper presents a motion estimation method based on data assimilation in a dynamic model, named Image Model, expressing<br />
the physical evolution of a quantity observed on the images. The application concerns the retrieval of apparent<br />
surface velocity from a sequence of satellite data, acquired over the ocean. The Image Model includes a shallow-water<br />
approximation for the dynamics of the velocity field (the evolutions of the two components of motion are linked by the<br />
water layer thickness) and a transport equation for the image field. For retrieving the surface velocity, a sequence of Sea<br />
Surface Temperature (SST) acquisitions is assimilated in the Image Model with a 4D-Var method. This is based on the<br />
minimization of a cost function including the discrepancy between model outputs and SST data and a regularization term.<br />
Several types of regularization norms have been studied. Results are discussed to analyze the impact of the different components<br />
of the assimilation system.<br />
11:20-11:40, Paper MoAT3.2<br />
Multiple View Geometries for Mirrors and Cameras<br />
Fujiyama, Shinji, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
In this paper, we analyze the multiple view geometry for a camera and mirrors, and propose a method for computing the<br />
geometry of the camera and mirrors accurately from fewer corresponding points than the existing methods. The geometry<br />
between a camera and mirrors can be described as the multiple view geometry for a real camera and virtual cameras. We<br />
show that very strong constraints on geometries can be obtained in addition to the ordinary multilinear constraints. By<br />
using these constraints, we can estimate multiple view geometry more accurately from fewer corresponding points than<br />
usual. The experimental results show the efficiency of the proposed method.<br />
11:40-12:00, Paper MoAT3.3<br />
Perspective Reconstruction and Camera Auto-Calibration as Rectangular Polynomial Eigenvalue Problem<br />
Pernek, Ákos, MTA SZTAKI, BME<br />
Hajder, Levente, MTA SZTAKI<br />
Motion-based 3D reconstruction (SfM) with missing data has been a challenging computer vision task since the late 90s.<br />
Under perspective camera model, one of the most difficult problems is camera auto-calibration which means determining<br />
the intrinsic camera parameters without using any known calibration object or assuming special properties of the scene.<br />
This paper presents a novel algorithm to perform camera auto-calibration from multiple images while dealing with the missing<br />
data problem. The method supposes semi-calibrated cameras (every intrinsic camera parameter except for the focal<br />
length is considered to be known) and constant focal length over all the images. The solution requires at least one image<br />
pair having at least eight common measured points. Tests verified that the algorithm is numerically stable and produces<br />
accurate results both on synthetic and real test sequences.<br />
12:00-12:20, Paper MoAT3.4<br />
Multi-Camera Platform Calibration using Multi-Linear Constraints<br />
Nyman, Patrik, Lund Univ.<br />
Heyden, Anders, Lund Univ.<br />
Astroem, Kalle, Lund Univ.<br />
We present a novel calibration method for multi-camera platforms, based on multi-linear constraints. The calibration<br />
method can recover the relative orientation between the different cameras on the platform, even when there are no corresponding<br />
feature points between the cameras, i.e. there are no overlaps between the cameras. It is shown that two translational<br />
motions in different directions are sufficient to linearly recover the rotational part of the relative orientation. Then<br />
two general motions, including both translation and rotation, are sufficient to linearly recover the translational part of the<br />
relative orientation. However, as a consequence of the speed-scale ambiguity, the absolute scale of the translational part<br />
cannot be determined if no prior information about the motions is known, e.g. from dead reckoning. It is shown that in the<br />
case of planar motion, the vertical component of the translational part cannot be determined. However, if at least one<br />
feature point can be seen in two different cameras, this vertical component can also be estimated. Finally, the performance<br />
of the proposed method is shown in simulated experiments.<br />
12:20-12:40, Paper MoAT3.5<br />
A Game-Theoretic Approach to Robust Selection of Multi-View Point Correspondence<br />
Rodolà, Emanuele, Univ. Ca’ Foscari Venezia<br />
Albarelli, Andrea, Univ. Ca’ Foscari di Venezia<br />
Torsello, Andrea, Univ. Ca’ Foscari<br />
In this paper we introduce a robust matching technique that allows very accurate selection of corresponding feature points<br />
from multiple views. Robustness is achieved by enforcing global geometric consistency at an early stage of the matching<br />
process, without the need for subsequent verification through reprojection. Global consistency is reduced to pairwise<br />
compatibility by making use of the size and orientation information provided by common feature descriptors, thus projecting<br />
what is a high-order compatibility problem into a pairwise setting. A game-theoretic approach is then used to select a<br />
maximally consistent set of candidate matches, where highly compatible matches are enforced while incompatible correspondences<br />
are driven to extinction.<br />
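The selection process described above, in which highly compatible matches reinforce one another while incompatible ones die out, can be sketched with discrete replicator dynamics over a pairwise compatibility matrix. This is a minimal illustration, not the authors' implementation; the matrix `A` below is a made-up example:<br />

```python
def replicator_dynamics(A, iters=100):
    """Discrete replicator dynamics over a non-negative pairwise
    compatibility (payoff) matrix A. Population mass concentrates on
    mutually compatible candidate matches; incompatible ones go extinct."""
    n = len(A)
    x = [1.0 / n] * n                       # uniform initial population
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        avg = sum(x[i] * Ax[i] for i in range(n))
        if avg == 0.0:
            break
        x = [x[i] * Ax[i] / avg for i in range(n)]  # replicator update
    return x

# Hypothetical example: candidates 0 and 1 support each other,
# candidate 2 is incompatible with both and should be driven to extinction.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
support = replicator_dynamics(A)
```

Candidates whose final mass exceeds a small threshold form the selected, mutually consistent match set.<br />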
MoAT4 Dolmabahçe Hall A<br />
Ensemble Learning Regular Session<br />
Session chair: Roli, Fabio (Univ. of Cagliari)<br />
11:00-11:20, Paper MoAT4.1<br />
A Bias-Variance Analysis of Bootstrapped Class-Separability Weighting for Error-Correcting Output Code Ensemble<br />
Smith, Raymond, Univ. of Surrey<br />
Windeatt, Terry, Univ. of Surrey<br />
We investigate the effects, in terms of a bias-variance decomposition of error, of applying class-separability weighting<br />
plus bootstrapping in the construction of error-correcting output code ensembles of binary classifiers. Evidence is presented<br />
to show that bias tends to be reduced at low training strength values whilst variance tends to be reduced across the full<br />
range. The relative importance of these effects, however, varies depending on the stability of the base classifier type.<br />
11:20-11:40, Paper MoAT4.2<br />
Multi-Class AdaBoost with Hypothesis Margin<br />
Jin, Xiaobo, Chinese Acad. of Sciences<br />
Hou, Xinwen, Chinese Acad. of Sciences<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
Most AdaBoost algorithms for multi-class problems, such as AdaBoost.MH and LogitBoost, have to decompose the multi-class<br />
classification into multiple binary problems. This paper proposes a new multi-class AdaBoost algorithm based<br />
on the hypothesis margin, called AdaBoost.HM, which directly combines multi-class weak classifiers. The hypothesis margin<br />
maximizes the output for the positive class while minimizing the maximal output over the negative classes. We<br />
discuss the upper bound of the training error for AdaBoost.HM and for a previous multi-class learning algorithm, AdaBoost.M1.<br />
Our experiments using feedforward neural networks as weak learners show that the proposed AdaBoost.HM<br />
yields higher classification accuracies than AdaBoost.M1 and AdaBoost.MH while remaining computationally efficient<br />
in training.<br />
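The hypothesis margin described above can be sketched per sample as the output of the true class minus the maximal output among the competing classes (a minimal stand-alone illustration, not the authors' code):<br />

```python
def hypothesis_margin(outputs, label):
    """Hypothesis margin of one sample: the classifier output for the
    true class minus the maximal output among all competing classes.
    A positive margin means the sample is correctly classified."""
    rival = max(s for k, s in enumerate(outputs) if k != label)
    return outputs[label] - rival

# Sample with true class 0 is classified correctly (positive margin);
# sample with true class 2 is not (negative margin).
m_good = hypothesis_margin([0.7, 0.2, 0.1], 0)
m_bad = hypothesis_margin([0.3, 0.4, 0.3], 2)
```

A boosting scheme in this spirit would reweight samples to increase this margin rather than a one-vs-rest binary margin.<br />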
11:40-12:00, Paper MoAT4.3<br />
A Score Decidability Index for Dynamic Score Combination<br />
Lobrano, Carlo, DIEE- Univ. of Cagliari<br />
Tronci, Roberto, Univ. of Cagliari<br />
Giacinto, Giorgio, Univ. of Cagliari<br />
Roli, Fabio, Univ. of Cagliari<br />
In two-class problems, the combination of the outputs (scores) of an ensemble of classifiers is widely used to attain high<br />
performance. Dynamic combination techniques, which estimate the combination parameters on a pattern-by-pattern basis,<br />
usually provide better performance than static combination techniques. In this paper, we propose an Index of Decidability,<br />
derived from the Wilcoxon-Mann-Whitney statistic, which is used to estimate the combination parameters. Reported<br />
results on a multimodal biometric dataset show the effectiveness of the proposed dynamic combination mechanisms<br />
in terms of misclassification errors.<br />
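For context, the Wilcoxon-Mann-Whitney statistic on which the index is based equals the empirical AUC: the probability that a randomly chosen positive score exceeds a randomly chosen negative one. A minimal sketch of that statistic (the paper's derived index itself is not reproduced here):<br />

```python
def wmw_auc(pos_scores, neg_scores):
    """Wilcoxon-Mann-Whitney statistic: fraction of (positive, negative)
    score pairs in which the positive score is larger; ties count one
    half. Equivalent to the area under the empirical ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

auc = wmw_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Here 8 of the 9 positive/negative pairs are correctly ordered, so the statistic is 8/9.<br />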
12:00-12:20, Paper MoAT4.4<br />
AUC-Based Combination of Dichotomizers: Is Whole Maximization also Effective for Partial Maximization?<br />
Ricamato, Maria Teresa, Univ. degli Studi di Cassino<br />
Tortorella, Francesco, Univ. degli Studi di Cassino<br />
The combination of classifiers is an established technique for improving classification performance. When dealing with<br />
two-class classification problems, a frequently used performance measure is the area under the ROC curve (AUC), since<br />
it is more informative than accuracy. However, in many applications, such as medical or biometric ones, operating points<br />
with a false positive rate above a given value are of no practical use and thus irrelevant for evaluating the performance of<br />
the system. In these cases, performance should be measured by looking only at the interesting part of the ROC curve.<br />
Consequently, the optimization goal is to maximize only a part of the AUC instead of the whole area. In this paper we<br />
propose a method tailored to these situations, which builds a linear combination of two dichotomizers maximizing the<br />
partial AUC (pAUC). Another aim of the paper is to understand whether methods that maximize the AUC can also<br />
maximize the pAUC. An empirical comparison between algorithms maximizing the AUC and the proposed method shows<br />
that the latter is more effective for pAUC maximization than methods designed to globally optimize the AUC.<br />
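The evaluation quantity discussed above, the partial AUC, can be computed from scores by integrating the empirical ROC curve only up to the false-positive-rate bound of interest. A minimal sketch (not the paper's optimization method, just the measure):<br />

```python
def partial_auc(pos, neg, fpr_max):
    """Empirical partial AUC: area under the ROC curve restricted to
    false positive rates in [0, fpr_max], via trapezoidal integration."""
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    pts = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(1 for s in neg if s >= t) / len(neg)
        tpr = sum(1 for s in pos if s >= t) / len(pos)
        pts.append((fpr, tpr))
    area = 0.0
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        if f0 >= fpr_max:
            break                      # past the region of interest
        f_hi = min(f1, fpr_max)
        if f1 > f0:
            # interpolate the TPR at the clipped right edge
            t_hi = t0 + (t1 - t0) * (f_hi - f0) / (f1 - f0)
            area += 0.5 * (t0 + t_hi) * (f_hi - f0)
    return area

# A perfectly separating dichotomizer attains pAUC = fpr_max.
p = partial_auc([0.9, 0.8], [0.2, 0.1], 0.5)
```

With `fpr_max=1.0` this reduces to the ordinary AUC.<br />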
12:20-12:40, Paper MoAT4.5<br />
Random Prototypes-Based Oracle for Selection-Fusion Ensembles<br />
Armano, Giuliano, Univ. of Cagliari<br />
Hatami, Nima, Univ. of Cagliari<br />
Classifier ensembles based on the selection-fusion strategy have recently aroused considerable interest. The main idea underlying<br />
this strategy is to use miniensembles instead of monolithic base classifiers in an ensemble in order to improve the overall<br />
performance. This paper proposes a classifier selection method to be used in selection-fusion strategies. The method first<br />
splits the original classification problem according to some prototypes randomly selected from the training<br />
data, and then builds a classifier on each subset. The trained classifiers, together with an oracle used to switch between<br />
them, form a classifier-selection miniensemble. Compared with other methods used in the selection-fusion framework,<br />
the proposed method has proven to be more efficient in the decomposition process, with no limitation on the number of<br />
resulting partitions. Experimental results on some datasets from the UCI repository show the validity of the proposed method.<br />
MoAT5 Dolmabahçe Hall B<br />
Detection and Segmentation of Audio Signals Regular Session<br />
Session chair: Erdogan, Hakan (Sabanci Univ.)<br />
11:00-11:20, Paper MoAT5.1<br />
Noise-Robust Voice Activity Detector based on Hidden Semi-Markov Models<br />
Liu, Xianglong, Beihang Univ.<br />
Liang, Yuan, Beihang Univ.<br />
Lou, Yihua, Beihang Univ.<br />
Li, He, Beihang Univ.<br />
Shan, Baosong, Beihang Univ.<br />
This paper focuses on speech duration distributions, which are usually invariant to noise, and proposes a noise-robust,<br />
real-time voice activity detector (VAD) that uses a hidden semi-Markov model (HSMM) to explicitly model state durations.<br />
Motivated by statistical observations and tests on TIMIT and the IEEE sentence database, we use Weibull distributions<br />
to approximate state durations and estimate their parameters by maximum likelihood. The final VAD decision is made<br />
according to a likelihood ratio test (LRT) incorporating state prior knowledge and modified forward variables. An efficient<br />
recursion for calculating the modified forward variables is devised, and a dynamic adjustment scheme is used to update<br />
the parameters. Experiments on noisy speech data show that the proposed method performs more robustly and accurately<br />
than the standard ITU-T G.729B VAD and AMR2.<br />
11:20-11:40, Paper MoAT5.2<br />
Simultaneous Segmentation and Modelling of Signals based on an Equipartition Principle<br />
Panagiotakis, Costas, Univ. of Crete<br />
We propose a general framework for simultaneous segmentation and modelling of signals based on an Equipartition Principle<br />
(EP). According to the EP, the signal is divided into segments with equal reconstruction errors by selecting the most<br />
suitable model to describe each segment. In addition, by taking change detection on the signal model into account, an<br />
efficient signal reconstruction is also obtained. The model selection concerns both the kind and the order of the model.<br />
The proposed methodology is flexible with respect to different error criteria and signal features.<br />
11:40-12:00, Paper MoAT5.3<br />
Voice Activity Detection based on Complex Exponential Atomic Decomposition and Likelihood Ratio Test<br />
Deng, Shiwen, Harbin Inst. of Tech.<br />
Han, Jiqing, Harbin Inst. of Tech.<br />
Voice activity detection (VAD) algorithms based on Discrete Fourier Transform (DFT) coefficients are widely found<br />
in the literature. However, some shortcomings of modeling a signal in the DFT domain can easily degrade the performance<br />
of a VAD in noisy environments. To overcome this problem, this paper presents a novel approach that uses the complex<br />
coefficients derived from a complex exponential atomic decomposition of the signal. These coefficients are modeled by a<br />
complex Gaussian probability distribution, and a statistical model is employed to derive the decision rule from the<br />
likelihood ratio test. According to the experimental results, the proposed VAD method shows better performance than the<br />
VAD based on DFT coefficients in various noise environments.<br />
12:00-12:20, Paper MoAT5.4<br />
Speaker Change Detection based on the Pairwise Distance Matrix<br />
Seo, Jin S., Gangneung-Wonju National Univ.<br />
Speaker change detection is most commonly done by statistically determining whether two adjacent segments of a<br />
speech stream are significantly different or not. In this paper, we propose a novel method to detect speaker change points<br />
based on the minimum statistics of the pairwise distance matrix of feature vectors. The use of the minimum statistics<br />
makes it possible to compare similar acoustic groups, which is effective in suppressing phonetic variation.<br />
Experimental results showed that the proposed method is promising for the speaker change detection problem.<br />
12:20-12:40, Paper MoAT5.5<br />
Real-Time User Position Estimation in Indoor Environments using Digital Watermarking for Audio Signals<br />
Kaneto, Ryosuke, Osaka Univ.<br />
Nakashima, Yuta, Osaka Univ.<br />
Babaguchi, Noboru, Osaka Univ.<br />
In this paper, we propose a method for estimating the position of a user holding a microphone in an indoor environment<br />
using digital watermarking for audio signals. The proposed method utilizes detection strengths, which are calculated<br />
while detecting spread-spectrum-based watermarks. Taking into account delays and attenuation of the watermarked<br />
signals emitted from multiple loudspeakers, among other factors, we construct a model of the detection strengths. The user<br />
position is estimated in real time using this model. The experimental results indicate that user positions are estimated with<br />
a root mean squared error of 1.3 m on average when the user is static. We demonstrate that the proposed method<br />
successfully estimates the user position even when the user moves.<br />
MoAT6 Topkapı Hall B<br />
Human Computer Interaction Regular Session<br />
Session chair: Drygajlo, Andrzej (EPFL)<br />
11:00-11:20, Paper MoAT6.1<br />
Gaze Probing: Event-Based Estimation of Objects being Focused On<br />
Yonetani, Ryo, Kyoto Univ.<br />
Kawashima, Hiroaki, Kyoto Univ.<br />
Hirayama, Takatsugu, Kyoto Univ.<br />
Matsuyama, Takashi, Kyoto Univ.<br />
We propose a novel method to estimate the object that a user is focusing on by using the synchronization between the<br />
movements of objects and a user’s eyes as a cue. We first design an event as a characteristic motion pattern, and we then<br />
embed it within the movement of each object. Since the user’s ocular reactions to these events are easily detected using a<br />
passive camera-based eye tracker, we can successfully estimate the object that the user is focusing on as the one whose<br />
movement is most synchronized with the user’s eye reaction. Experimental results obtained from the application of this<br />
system to dynamic content (consisting of scrolling images) demonstrate the effectiveness of the proposed method over<br />
existing methods.<br />
11:20-11:40, Paper MoAT6.2<br />
A Covariate Shift Minimisation Method to Alleviate Non-Stationarity Effects for an Adaptive Brain-Computer Interface<br />
Satti, Abdul Rehman, Univ. of Ulster<br />
Guan, Cuntai, Inst. For Infocomm Res.<br />
Coyle, Damien, Univ. of Ulster<br />
Prasad, Girijesh, Univ. of Ulster<br />
The non-stationary nature of the electroencephalogram (EEG) poses a major challenge for the successful operation of a<br />
brain-computer interface (BCI) when deployed over multiple sessions. The changes between the early training measurements<br />
and the proceeding multiple sessions can originate as a result of alterations in the subject’s brain process, new<br />
cortical activities, change of recording conditions and/or change of operation strategies by the subject. These differences<br />
and alterations over multiple sessions cause deterioration in BCI system performance if periodic or continuous adaptation<br />
to the signal processing is not carried out. In this work, the covariate shift is analyzed over multiple sessions to determine<br />
the non-stationarity effects and an unsupervised adaptation approach is employed to account for the degrading effects this<br />
might have on performance. To improve the system’s online performance, we propose a covariate shift minimization<br />
(CSM) method, which takes into account the distribution shift in the feature-set domain to reduce the feature-set overlap<br />
and imbalance between classes. The analysis and the results demonstrate the importance of CSM, as this method not<br />
only improves the accuracy of the system but also significantly reduces the classification imbalance between<br />
classes.<br />
11:40-12:00, Paper MoAT6.3<br />
A Probabilistic Language Model for Hand Drawings<br />
Akce, Abdullah, Univ. of Illinois at Urbana-Champaign<br />
Bretl, Timothy, Univ. of Illinois at Urbana-Champaign<br />
Probabilistic language models are critical to applications in natural language processing that include speech recognition,<br />
optical character recognition, and interfaces for text entry. In this paper, we present a systematic way to learn a similar<br />
type of probabilistic language model for hand drawings from a database of existing artwork by representing each stroke<br />
as a sequence of symbols. First, we propose a language in which the symbols are circular arcs with length fixed by a scale<br />
parameter and with curvature chosen from a fixed low-cardinality set. Then, we apply an algorithm based on dynamic<br />
programming to represent each stroke of the drawing as a sequence of symbols from our alphabet. Finally, we learn the<br />
probabilistic language model by constructing a Markov model. We compute the entropy of our language on a test set,<br />
measured as the expected number of bits required per symbol. Our language model might be applied in future work<br />
to create a drawing interface for noisy and low-bandwidth input devices, for example an electroencephalograph (EEG)<br />
that admits one binary command per second. The results indicate that by leveraging our language model, the performance<br />
of such an interface would be enhanced by about 20 percent.<br />
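The bits-per-symbol measure described above can be sketched with a smoothed bigram Markov model. This is a generic illustration under stated assumptions: the paper's symbols are circular arcs, whereas plain characters and add-one smoothing stand in here:<br />

```python
from collections import Counter
from math import log2

def train_bigram(sequences):
    """Count unigram contexts and bigram transitions over symbol sequences."""
    uni, bi = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            uni[a] += 1
            bi[a, b] += 1
    return uni, bi

def bits_per_symbol(test_sequences, uni, bi, alphabet_size, alpha=1.0):
    """Cross-entropy in bits per symbol of a test set under the
    add-alpha smoothed bigram model."""
    bits, count = 0.0, 0
    for seq in test_sequences:
        for a, b in zip(seq, seq[1:]):
            p = (bi[a, b] + alpha) / (uni[a] + alpha * alphabet_size)
            bits -= log2(p)
            count += 1
    return bits / count

# Toy two-symbol alphabet: a strictly alternating sequence is highly
# predictable, so the entropy falls well below 1 bit per symbol.
uni, bi = train_bigram(["ababab"])
h = bits_per_symbol(["ababab"], uni, bi, alphabet_size=2)
```

Lower bits per symbol means fewer binary commands needed per drawing stroke, which is the source of the interface speed-up the abstract reports.<br />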
12:00-12:20, Paper MoAT6.4<br />
AR-PCA-HMM Approach for Sensorimotor Task Classification in EEG-Based Brain-Computer Interfaces<br />
Argunsah, Ali Ozgur, Inst. Gulbenkian de Ciencia<br />
Cetin, Mujdat, Sabanci Univ.<br />
We propose an approach based on Hidden Markov models (HMMs) combined with principal component analysis (PCA)<br />
for classification of four-class single trial motor imagery EEG data for brain computer interfacing (BCI) purposes. We extract<br />
autoregressive (AR) parameters from EEG data and use PCA to decrease the number of features for better training<br />
of HMMs. We present experimental results demonstrating the improvements provided by our approach over an existing<br />
HMM-based EEG single trial classification approach as well as over state-of-the-art classification methods.<br />
12:20-12:40, Paper MoAT6.5<br />
Design, Implementation and Evaluation of a Real-Time P300-Based Brain-Computer Interface System<br />
Amcalar, Armagan, Sabanci Univ.<br />
Cetin, Mujdat, Sabanci Univ.<br />
We present a new end-to-end brain-computer interface system based on electroencephalography (EEG). Our system exploits<br />
the P300 signal in the brain, a positive deflection in event-related potentials, caused by rare events. P300 can be<br />
used for various tasks, perhaps the most well-known being a spelling device. We have designed a flexible visual stimulus<br />
mechanism that can be adapted to user preferences and developed and implemented EEG signal processing, learning and<br />
classification algorithms. Our classifier is based on Bayes linear discriminant analysis, in which we have explored various<br />
choices and improvements. We have designed data collection experiments for offline and online decision-making and<br />
have proposed modifications in the stimulus and decision-making procedure to increase online efficiency. We have evaluated<br />
the performance of our system on 8 healthy subjects on a spelling task and have observed that our system achieves<br />
higher average speed than state-of-the-art systems reported in the literature for a given classification accuracy.<br />
MoAT7 Dolmabahçe Hall C<br />
Video Classification and Retrieval Regular Session<br />
Session chair: Sarkar, Sudeep (Univ. of South Florida)<br />
11:00-11:20, Paper MoAT7.1<br />
Motion-Sketch based Video Retrieval using a Trellis Levenshtein Distance<br />
Hu, Rui, Univ. of Surrey<br />
Collomosse, John Philip, Univ. of Surrey<br />
We present a fast technique for retrieving video clips using free-hand sketched queries. Visual keypoints within each video<br />
are detected and tracked to form short trajectories, which are clustered to form a set of space-time tokens summarising<br />
video content. A Viterbi process matches a space-time graph of tokens to a description of colour and motion extracted<br />
from the query sketch. Inaccuracies in the sketched query are ameliorated by computing path cost using a Levenshtein<br />
(edit) distance. We evaluate over datasets of sports footage.<br />
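The Levenshtein (edit) distance used for the path cost above can be sketched with the standard dynamic-programming recurrence (a generic implementation over plain strings, not the trellis variant of the paper):<br />

```python
def levenshtein(a, b):
    """Levenshtein (edit) distance: minimum number of insertions,
    deletions and substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

d = levenshtein("kitten", "sitting")  # classic example: 3 edits
```

In the paper this cost is evaluated over token sequences inside a Viterbi trellis rather than over raw strings.<br />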
11:20-11:40, Paper MoAT7.2<br />
Extracting Key Sub-Trajectory Features for Supervised Tactic Detection in Sports Video<br />
Zhang, Yi, Chinese Acad. of Sciences<br />
Xu, Changsheng, Chinese Acad. of Sciences<br />
Lu, Hanqing, Chinese Acad. of Sciences<br />
Tactic analysis is receiving more attention in sports video analysis for its assistance to coaches and players. This paper<br />
proposes an efficient key sub-trajectory feature representation of ball trajectories for tactic analysis. Ball trajectories are<br />
modeled with a generalized suffix tree, in which frequent sub-trajectory patterns are searched for. Key sub-trajectory patterns<br />
are extracted by further filtering these frequent sub-trajectory patterns. Instead of directly using individual sub-trajectories<br />
as features to train tactic detectors, we take key sub-trajectory patterns as a whole. Key sub-trajectory feature representation<br />
effectively removes noise, reduces the dimension of features, and improves the performance of supervised learning to<br />
detect tactics.<br />
11:40-12:00, Paper MoAT7.3<br />
A New Symmetry based on Proximity of Wavelet-Moments for Text Frame Classification in Video<br />
Palaiahnakote, Shivakumara, National Univ. of Singapore<br />
Dutta, Anjan, Univ. Autonoma de Barcelona<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Pal, Umapada, Indian Statistical Inst.<br />
This paper proposes the use of a new symmetry property based on the proximity of median moments in the wavelet domain.<br />
The method divides a given frame into 16 equally sized blocks in order to classify true text frames. The average of the<br />
high-frequency subbands of a block is used for computing median moments, which brighten the text pixels in a block of a<br />
video frame. K-means clustering with K=2 is then applied to the median moments of the block to classify it as a probable<br />
text block. For the classified blocks, average wavelet median moments are computed over a sliding window. We introduce<br />
a Max-Min cluster to classify the probable text pixels in each probable text block. Four quadrants are formed around the<br />
centroid of the probable text pixels. The new concept, called symmetry, is introduced to identify true text blocks based on<br />
the proximity between probable text pixels in each quadrant. If the frame produces at least one true text block, it is<br />
considered a text frame; otherwise, it is a non-text frame. The method is tested on three datasets to evaluate its robustness<br />
in classifying text frames in terms of recall and precision.<br />
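The K=2 clustering step described above can be sketched for scalar features with a minimal two-centre k-means (the input values below are placeholders, not the paper's median moments):<br />

```python
def two_means_1d(values, iters=100):
    """K-means with K=2 on scalar values, initialised at the extremes.
    Returns the two cluster centres as (low, high)."""
    c = [min(values), max(values)]
    for _ in range(iters):
        low, high = [], []
        for v in values:
            (low if abs(v - c[0]) <= abs(v - c[1]) else high).append(v)
        new = [sum(low) / len(low) if low else c[0],
               sum(high) / len(high) if high else c[1]]
        if new == c:
            break                      # assignments stable: converged
        c = new
    return c

# Two well-separated groups of moment-like values.
centres = two_means_1d([1.0, 2.0, 1.5, 10.0, 11.0, 10.5])
```

In the paper, the cluster with the larger centre would mark the probable text blocks.<br />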
12:00-12:20, Paper MoAT7.4<br />
Edge based Binarization for Video Text Images<br />
Zhou, Zhiwei, National Univ. of Singapore<br />
Li, Linlin, Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
This paper introduces an edge-based binarization method for video text images, especially images with complex<br />
backgrounds or low contrast. The method first detects the contour of the text, utilizes a local thresholding<br />
method to decide the inner side of the contour, and then fills the contour to form characters that are recognizable to<br />
OCR software. Experimental results show that our method is especially effective on images with complex backgrounds<br />
and low contrast.<br />
12:20-12:40, Paper MoAT7.5<br />
Detecting Group Turn Patterns in Conversations using Audio-Video Change Scale-Space<br />
Krishnan, Ravikiran, Univ. of South Florida<br />
Sarkar, Sudeep, Univ. of South Florida<br />
Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative<br />
to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and<br />
video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. We consider<br />
the evolution of the detected change, using the Bayesian Information Criterion (BIC) at multiple temporal scales, to build<br />
an audio-visual change scale-space. Peaks detected in this representation yield group-turn-based conversational changes at<br />
different temporal scales. Conversation overlaps, changes and their inferred models offer an intermediate-level description<br />
of meeting videos that can be useful for summarization and indexing of meetings. Results on the NIST meeting room<br />
dataset showed a true positive rate of 88%.<br />
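The BIC change criterion underlying the scale-space above can be sketched for a one-dimensional feature sequence: a candidate change point is accepted when two Gaussian segment models beat a single Gaussian by more than the model-complexity penalty. This is a generic 1-D illustration (audio change detection typically uses full-covariance models over MFCC vectors), not the authors' multi-scale construction:<br />

```python
from math import log

def delta_bic(x, t, lam=1.0):
    """Delta-BIC for candidate change point t in a 1-D sequence x:
    positive values favour modelling x[:t] and x[t:] as two Gaussians
    over a single Gaussian for the whole sequence."""
    def half_n_log_var(seg):
        n = len(seg)
        mu = sum(seg) / n
        var = sum((v - mu) ** 2 for v in seg) / n
        return 0.5 * n * log(var)
    penalty = lam * 0.5 * 2.0 * log(len(x))   # 2 free params: mean, variance
    return (half_n_log_var(x) - half_n_log_var(x[:t])
            - half_n_log_var(x[t:]) - penalty)

# A sequence with an abrupt level shift at index 12 scores positive there.
score = delta_bic([0.0, 1.0] * 6 + [10.0, 11.0] * 6, 12)
```

Sweeping `t` over the sequence at several window sizes and keeping the peaks gives the flavour of a change scale-space.<br />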
14:00-15:00, MoP2L1 Anadolu Auditorium<br />
Embracing Uncertainty: The New Machine Intelligence<br />
Christopher M. Bishop Plenary Session<br />
Microsoft Research Cambridge, UK<br />
Professor Chris Bishop is Chief Research Scientist at Microsoft Research Cambridge. He also has a Chair in computer<br />
science at the University of Edinburgh, and is a Fellow of Darwin College Cambridge. Chris is the author of the leading<br />
textbook “Pattern Recognition and Machine Learning” (Springer, 2006). His research interests include probabilistic approaches<br />
to machine learning, as well as their application to fields such as biomedical sciences and healthcare.<br />
The first successful applications of machine intelligence were based on expert systems constructed using rules elicited<br />
from human experts. Limitations in the applicability of this approach helped drive the second generation of machine intelligence<br />
methods, as typified by neural networks and support vector machines, which can be characterised as black-box statistical<br />
models fitted to large data sets. In this talk I will describe a new paradigm for machine intelligence, based on probabilistic<br />
graphical models, which has emerged over the last five years and which allows strong prior knowledge from<br />
domain experts to be combined with machine learning techniques to enable a new generation of large-scale applications.<br />
The talk will be illustrated with tutorial examples as well as real-world case studies.<br />
MoBT1 Marmara Hall<br />
Tracking and Surveillance – I Regular Session<br />
Session chair: Goldgof, Dmitry (Univ of South Florida)<br />
15:30-15:50, Paper MoBT1.1<br />
Improved Shadow Removal for Robust Person Tracking in Surveillance Scenarios<br />
Sanin, Andres, NICTA<br />
Sanderson, Conrad, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
Shadow detection and removal is an important step employed after foreground detection, in order to improve the segmentation<br />
of objects for tracking. Methods reported in the literature typically have a significant trade-off between the shadow<br />
detection rate (classifying true shadow areas as shadows) and the shadow discrimination rate (discrimination between<br />
shadows and foreground). We propose a method that is able to achieve good performance in both cases, leading to improved<br />
tracking in surveillance scenarios. Chromaticity information is first used to create a mask of candidate shadow pixels, followed<br />
by employing gradient information to remove foreground pixels that were incorrectly included in the mask. Experiments<br />
on the CAVIAR dataset indicate that the proposed method leads to considerable improvements in multiple object<br />
tracking precision and accuracy.<br />
15:50-16:10, Paper MoBT1.2<br />
Multi-Cue Integration for Multi-Camera Tracking<br />
Chen, Kuan-Wen, National Taiwan Univ.<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
For target tracking across multiple cameras with disjoint views, previous works usually employed multiple cues and<br />
focused on learning a better matching model for each cue separately. However, to the best of our knowledge, none of them<br />
has discussed how to integrate these cues to improve performance. In this paper, we look into the multi-cue integration<br />
problem and propose an unsupervised learning method, since a complicated training phase is not always viable. In the<br />
experiments, we evaluate several types of score fusion methods and show that our approach learns well and can be applied<br />
to large camera networks more easily.<br />
16:10-16:30, Paper MoBT1.3<br />
Learning Pedestrian Trajectories with Kernels<br />
Ricci, Elisa, Fondazione Bruno Kessler<br />
Tobia, Francesco, Fondazione Bruno Kessler<br />
Zen, Gloria, Fondazione Bruno Kessler<br />
We present a novel method for learning pedestrian trajectories which is able to describe complex motion patterns such as<br />
multiple crossing paths. The approach adopts Kernel Canonical Correlation Analysis (KCCA) to build a mapping between<br />
the physical location space and the trajectory pattern space. To model crossing paths, we rely on a clustering algorithm<br />
based on kernel K-means with a Dynamic Time Warping (DTW) kernel. We demonstrate the effectiveness of our method<br />
by incorporating the learned motion model into a multi-person tracking algorithm and testing it on several video surveillance<br />
sequences.<br />
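The DTW measure inside the kernel above can be sketched with the standard dynamic-programming recurrence (a generic scalar-sequence version, not the authors' trajectory-specific variant):<br />

```python
def dtw(a, b, dist=lambda u, v: abs(u - v)):
    """Dynamic Time Warping distance between two sequences: the minimal
    accumulated point-wise cost over all monotone alignments."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j],      # stretch a
                                                     D[i][j - 1],      # stretch b
                                                     D[i - 1][j - 1])  # advance both
    return D[n][m]

# Time-warped copies of the same path align with zero cost.
d_same = dtw([1, 2, 3], [1, 2, 2, 3])
```

A kernel such as `exp(-dtw(a, b) / sigma)` can then be plugged into kernel K-means, though DTW-based kernels are not guaranteed to be positive semi-definite.<br />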
16:30-16:50, Paper MoBT1.4<br />
Bag of Features Tracking<br />
Yang, Fan, Dalian Univ. of Tech.<br />
Lu, Hu-Chuan, Dalian Univ. of Tech.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
In this paper, we propose a visual tracking approach based on the “bag of features” (BoF) algorithm. We randomly sample<br />
image patches within the object region in training frames to construct two codebooks using RGB and LBP features,<br />
instead of the single codebook of traditional BoF. Tracking is accomplished by searching for the highest similarity between<br />
candidates and the codebooks. In addition, an updating mechanism and a result-refinement scheme are included in BoF<br />
tracking. We fuse the patch-based approach and the global template-based approach into a unified framework. Experiments<br />
demonstrate that our approach is robust in handling occlusion, scaling and rotation.<br />
16:50-17:10, Paper MoBT1.5<br />
Gradient Constraints Can Improve Displacement Expert Performance<br />
Tresadern, Philip Andrew, Univ. of Manchester<br />
Cootes, Tim, The Univ. of Manchester<br />
The ‘displacement expert’ has recently proven popular for rapid tracking applications. In this paper, we note that experts<br />
are typically constrained only to produce approximately correct parameter updates at training locations. However, we<br />
show that incorporating constraints on the gradient of the displacement field within the learning framework results in an<br />
expert with better convergence and fewer local minima. We demonstrate this proposal for facial feature localization in<br />
static images and object tracking over a sequence.<br />
MoBT2 Topkapı Hall B<br />
Dimensionality Reduction Regular Session<br />
Session chair: Somol, Petr (Institute of Information Theory and Automation)<br />
15:30-15:50, Paper MoBT2.1<br />
Temporal Extension of Laplacian Eigenmaps for Unsupervised Dimensionality Reduction of Time Series<br />
Lewandowski, Michal, Kingston Univ.<br />
Martinez-Del-Rincon, Jesus, Kingston Univ.<br />
Makris, Dimitrios, Kingston Univ.<br />
Nebel, Jean-Christophe, Kingston Univ.<br />
A novel non-linear dimensionality reduction method, called Temporal Laplacian Eigenmaps, is introduced to process<br />
time series data efficiently. In this embedding-based approach, temporal information is intrinsic to the objective function,<br />
which produces descriptions of low-dimensional spaces with time coherence between data points. Since the proposed<br />
scheme also includes a bidirectional mapping between the data and embedded spaces and automatic tuning of key parameters,<br />
it offers the same benefits as mapping-based approaches. Experiments on a couple of computer vision applications demonstrate<br />
the superiority of the new approach to other dimensionality reduction methods in terms of accuracy. Moreover, its<br />
lower computational cost and generalisation abilities suggest it is scalable to larger datasets.<br />
15:50-16:10, Paper MoBT2.2<br />
Orthogonal Locality Sensitive Fuzzy Discriminant Analysis in Sleep-Stage Scoring<br />
Khushaba, Rami N., Univ. of Tech. Sydney<br />
Elliott, Rosalind, Univ. of Tech. Sydney<br />
Alsukker, Akram, Univ. of Tech. Sydney<br />
Al-Ani, Ahmed, Univ. of Tech. Sydney<br />
Mckinley, Sharon, Univ. of Tech. Sydney<br />
Sleep-stage scoring plays an important role in analyzing the sleep patterns of people. Studies have revealed that Intensive Care Unit (ICU) patients do not usually get enough quality sleep, and hence analyzing their sleep patterns is of increased importance. Because sleep data are usually collected from a number of Electroencephalogram (EEG), Electromyogram (EMG) and Electrooculography (EOG) channels, the feature set size can become large, which may affect the development of on-line scoring systems. Hence, a dimensionality reduction step is needed. One of the powerful dimensionality reduction approaches is based on the concept of Linear Discriminant Analysis (LDA). Unlike existing variants of LDA, this paper presents a new method that considers the fuzzy nature of input measurements while preserving their local structure. Practical results indicate the significance of preserving the local structure of sleep data, which is achieved by the proposed method, thereby attaining results superior to those of other dimensionality reduction methods.<br />
16:10-16:30, Paper MoBT2.3<br />
A Recursive Online Kernel PCA Algorithm<br />
Hasanbelliu, Erion, Univ. of Florida<br />
Sanchez-Giraldo, Luis Gonzalo, Univ. of Florida<br />
Principe, Jose, Univ. of Florida<br />
In this paper, we describe a new method for performing kernel principal component analysis that is online and has a fast convergence rate. The method follows the Rayleigh quotient to obtain a fixed-point update rule for extracting the leading eigenvalue and eigenvector. Online deflation is used to estimate the remaining components. These operations are performed in a reproducing kernel Hilbert space (RKHS) with linear-order memory and computational complexity. The derivation of the method and several applications are presented.<br />
16:30-16:50, Paper MoBT2.4<br />
Effective Dimensionality Reduction based on Support Vector Machine<br />
Moon, Sangwoo, Univ. of Tennessee<br />
Qi, Hairong, Univ. of Tennessee<br />
This paper presents an effective dimensionality reduction method based on the support vector machine. By utilizing mapping vectors from the support vector machine for dimensionality reduction purposes, we obtain features which are computationally efficient, providing high classification accuracy and robustness, especially in noisy environments. These characteristics are acquired from the generalization capability of the support vector machine, which minimizes the structural risk. To further reduce dimensionality, this paper introduces a redundancy removal process based on an asymmetric I relation measure with a kernel function. Experimental results show that the proposed dimensionality reduction method provides the most appropriate trade-off between classification accuracy and robustness in a relatively low dimensional space.<br />
16:50-17:10, Paper MoBT2.5<br />
Prototype Selection for Dissimilarity Representation by a Genetic Algorithm<br />
Plasencia, Yenisel, CENATAV, Cuba<br />
Garcia, Edel, Advanced Tech. Application Center<br />
Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales, Colombia<br />
Duin, Robert, TU Delft<br />
Dissimilarities can be a powerful way to represent objects like strings, graphs and images for which it is difficult to find<br />
good features. The resulting dissimilarity space may be used to train any classifier appropriate for feature spaces. There is,<br />
however, a strong need for dimension reduction. Straightforward procedures for prototype selection as well as feature selection<br />
have been used for this in the past. Complicated sets of objects may need more advanced procedures to overcome local minima.<br />
In this paper it is shown that genetic algorithms, previously used for feature selection, may be used for building good<br />
dissimilarity spaces as well, especially when small sets of prototypes are needed for computational reasons.<br />
MoBT3 Topkapı Hall A<br />
Motion and Multiple-View Vision – II Regular Session<br />
Session chair: Torsello, Andrea (Univ. Ca’ Foscari)<br />
15:30-15:50, Paper MoBT3.1<br />
Multiple View Geometry for Non-Rigid Motions Viewed from Curvilinear Motion Projective Cameras<br />
Wan, Cheng, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
This paper presents a tensorial representation of multiple projective cameras with arbitrary curvilinear motions. It enables us to define the multilinear relationship of image points derived from non-rigid object motions viewed from multiple cameras with arbitrary curvilinear motions. We show that the new multilinear relationship is useful for generating images of non-rigid object motions viewed from cameras with arbitrary curvilinear motions. The method is tested on real image sequences.<br />
15:50-16:10, Paper MoBT3.2<br />
Estimating Nonrigid Shape Deformation using Moments<br />
Liu, Wei, Florida Inst. of Tech.<br />
Ribeiro, Eraldo, Florida Inst. of Tech.<br />
Image moments have been widely used for designing robust shape descriptors that are invariant to rigid transformations. In this work, we address the problem of estimating non-rigid deformation fields based on image moment variations. By using a single family of polynomials both to parameterize the deformation field and to define image moments, we can represent image moment variations as a system of quadratic functions and solve for the deformation parameters. As a result, we can recover the deformation field between two images without solving the correspondence problem. Additionally, our method is highly robust to image noise. The method was tested on both synthetically deformed MPEG-7 shapes and cardiac MRI sequences.<br />
16:10-16:30, Paper MoBT3.3<br />
Optical Flow Estimation using Diffusion Distances<br />
Wartak, Szymon, Univ. of York<br />
Bors, Adrian, Univ. of York<br />
In this paper we apply the diffusion framework to dense optical flow estimation. Local image information is represented by matrices of gradients between paired locations. Diffusion distances are modelled as sums of eigenvectors weighted by their eigenvalues, extracted following the eigendecomposition of these matrices. Local optical flow is estimated by correlating diffusion distances characterizing features from different frames. A feature confidence factor is defined based on the local correlation efficiency compared to that of its neighbourhood. High-confidence optical flow estimates are propagated to areas of lower confidence.<br />
16:30-16:50, Paper MoBT3.4<br />
Novel Multi View Structure Estimation based on Barycentric Coordinates<br />
Ruether, Matthias, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Traditionally, multi-view stereo algorithms estimate three-dimensional structure from corresponding points by linear triangulation or bundle adjustment. This introduces systematic errors in the case of inaccurate camera calibration and partial occlusion. The errors are not negligible in applications requiring high accuracy, such as micro-metrology or quality inspection. We show how the accuracy of structure estimation can be significantly increased by using a barycentric coordinate representation for central perspective projection. Experiments show a reduction of geometric error by 50% compared with bundle adjustment. The error remains consistently low, even under partial occlusion.<br />
16:50-17:10, Paper MoBT3.5<br />
Estimation of Non-Rigid Surface Deformation using Developable Surface Model<br />
Watanabe, Yoshihiro, Univ. of Tokyo<br />
Nakashima, Takashi, Univ. of Tokyo<br />
Komuro, Takashi, Univ. of Tokyo<br />
Ishikawa, Masatoshi, Univ. of Tokyo<br />
There is a strong demand for a method of acquiring a non-rigid shape under deformation with high accuracy and high resolution. However, this is difficult to achieve because of performance limitations in measurement hardware. In this paper, we propose a model-based method for estimating the non-rigid deformation of a developable surface. The model is based on geometric characteristics of the surface, which are important in various applications. This method improves the accuracy of surface estimation and planar development from a low-resolution point cloud. Experiments using curved documents showed the effectiveness of the proposed method.<br />
MoBT4 Dolmabahçe Hall A<br />
Ocular Biometrics Regular Session<br />
Session chair: Zhang, David (The Hong Kong Polytechnic Univ.)<br />
15:30-15:50, Paper MoBT4.1<br />
On the Fusion of Periocular and Iris Biometrics in Non-Ideal Imagery<br />
Woodard, Damon, Clemson Univ.<br />
Pundlik, Shrinivas, Clemson Univ.<br />
Miller, Philip, Clemson Univ.<br />
Jillela, Raghavender, West Virginia Univ.<br />
Ross, Arun, West Virginia Univ.<br />
Human recognition based on the iris biometric is severely impacted when encountering non-ideal images of the eye characterized<br />
by occluded irises, motion and spatial blur, poor contrast, and illumination artifacts. This paper discusses the<br />
use of the periocular region surrounding the iris, along with the iris texture patterns, in order to improve the overall recognition<br />
performance in such images. Periocular texture is extracted from a small, fixed region of the skin surrounding the<br />
eye. Experiments on images extracted from the Near Infra-Red (NIR) face videos of the Multi Biometric Grand Challenge (MBGC) dataset demonstrate that valuable information is contained in the periocular region and that it can be fused with the iris texture to improve overall identification accuracy in non-ideal situations.<br />
15:50-16:10, Paper MoBT4.2<br />
Genetic-Based Type II Feature Extraction for Periocular Biometric Recognition: Less is More<br />
Adams, Joshua, North Carolina A&T Univ.<br />
Woodard, Damon, Clemson Univ.<br />
Dozier, Gerry, North Carolina A&T State Univ.<br />
Miller, Philip, Clemson Univ.<br />
Bryant, Kelvin, North Carolina A&T State Univ.<br />
Glenn, George, North Carolina A&T State Univ.<br />
Given an image from a biometric sensor, it is important for the feature extraction module to extract an original set of<br />
features that can be used for identity recognition. This form of feature extraction has been referred to as Type I feature extraction.<br />
For some biometric systems, Type I feature extraction is used exclusively. However, a second form of feature extraction<br />
does exist and is concerned with optimizing/minimizing the original feature set given by a Type I feature extraction<br />
method. This second form of feature extraction has been referred to as Type II feature extraction (feature selection). In<br />
this paper, we present a genetic-based Type II feature extraction system, referred to as GEFE (Genetic & Evolutionary Feature Extraction), for optimizing the feature sets returned by Local Binary Pattern Type I feature extraction for periocular biometric recognition. Our results show that not only does GEFE dramatically reduce the number of features needed, but the evolved feature sets also have higher recognition rates.<br />
16:10-16:30, Paper MoBT4.3<br />
Multispectral Eye Detection: A Preliminary Study<br />
Whitelam, Cameron, WVU<br />
Jafri, Zain, WVU<br />
Bourlai, Thirimachos, WVU<br />
In this paper, the problem of eye detection across three different bands, i.e., the visible, multispectral, and short-wave infrared (SWIR), is studied in order to illustrate the advantages and limitations of multi-band eye localization. The contributions of this work are two-fold. First, a multi-band database of 30 subjects is assembled and used to illustrate the challenges associated with the problem. Second, a set of experiments is performed in order to demonstrate the feasibility of multi-band eye detection. Experiments show that the eyes in face images captured under different bands can be detected with promising results. Finally, we illustrate that recognition performance in all studied bands is favorably affected by the geometric normalization of raw face images based on our proposed detection methodology. To the best of our knowledge, this is the first time that this problem has been investigated in the open literature in the context of human eye localization across different bands.<br />
16:30-16:50, Paper MoBT4.4<br />
Entropy of Feature Point-Based Retina Templates<br />
Jeffers, Jason, RMIT Univ.<br />
Arakala, Arathi, RMIT Univ.<br />
Horadam, Kathy, RMIT Univ.<br />
This paper studies the amount of distinctive information contained in a privacy-protecting and compact template of a retinal image created from the locations of crossings and bifurcations in the choroidal vasculature, otherwise called feature points. Using a training set of 20 different retinas, we build a template generator that simulates one million imposter comparisons and computes the number of imposter retina comparisons that successfully match at various thresholds. The template entropy thus computed was used to validate a theoretical model of imposter comparisons. The simulator and the model both estimate that 20 bits of entropy can be achieved by the feature point-based template. Our results reveal the distinctiveness of feature point-based retinal templates, hence establishing their potential as a biometric identifier for high-security and memory-intensive applications.<br />
16:50-17:10, Paper MoBT4.5<br />
Hierarchical Fusion of Face and Iris for Personal Identification<br />
Zhang, Xiaobo, Chinese Acad. of Sciences<br />
Sun, Zhenan, Chinese Acad. of Sciences<br />
Tan, Tieniu, Chinese Acad. of Sciences<br />
Most existing face and iris fusion schemes are concerned with improving performance on good-quality images acquired under controlled environments. In this paper, we propose a hierarchical fusion scheme for low-quality images acquired under uncontrolled situations. In the training stage, canonical correlation analysis (CCA) is adopted to construct a statistical mapping from face to iris at the pixel level. In the testing stage, the probe face image is first used to obtain a subset of candidate gallery samples via regression between the probe face and gallery irises; then, ordinal representation and sparse representation are performed on these candidate samples for iris recognition and face recognition, respectively. Finally, score-level fusion via min-max normalization is performed to make the final decision. Experimental results on our low-quality database show the superior performance of the proposed method.<br />
MoBT5 Anadolu Auditorium<br />
Image Analysis – II Regular Session<br />
Session chair: Mirmehdi, Majid (Univ. of Bristol)<br />
15:30-15:50, Paper MoBT5.1<br />
Wavelet-Based Texture Retrieval using a Mixture of Generalized Gaussian Distributions<br />
Allili, Mohand Said, Univ. du Québec en Outaouais<br />
In this paper, we address the texture retrieval problem using wavelet distributions. We propose a new statistical scheme to represent the marginal distribution of the wavelet coefficients using a mixture of generalized Gaussian distributions (MoGG). The MoGG captures a wide range of histogram shapes, which provides a better description of texture and enhances texture discrimination. We propose a similarity measure based on the Kullback-Leibler distance (KLD), which is calculated using the Metropolis-Hastings MCMC sampling algorithm. We show that our approach yields better texture retrieval results than previous methods that use only a single probability density function (pdf) for wavelet representation, or texture energy distribution.<br />
15:50-16:10, Paper MoBT5.2<br />
Adaptive Color Curve Models for Image Matting<br />
Cho, Sunyoung, Yonsei Univ.<br />
Byun, Hyeran, Yonsei Univ.<br />
Image matting is the process of extracting a foreground element from a single image with limited user input. To solve this inherently ill-posed problem, various methods exist that use a specific color model. One representative method assumes that the colors of the foreground and background elements satisfy a linear color model. Another recent method considers line-point and point-point color models. In this paper we present a new adaptive color curve model for image matting. We assume that the colors of a local region form a curve. Based on the pixels in the local region, we adaptively construct a curve model using a quadratic Bezier curve. This curve model enables us to derive a matting equation, using the quadratic formula, for estimating the alphas of pixels forming a curve. We show that our model estimates alpha mattes comparably to, or more accurately than, recent existing methods.<br />
16:10-16:30, Paper MoBT5.3<br />
Fast and Accurate Approximation of the Euclidean Opening Function in Arbitrary Dimension<br />
Coeurjolly, David, CNRS – Univ. Claude Bernard Lyon 1<br />
In this paper, we present a fast and accurate approximation of the Euclidean opening function, a widely used tool in mathematical morphology for analyzing binary shapes, since it allows us to define a local thickness distribution. The proposed algorithm can be defined in arbitrary dimension thanks to existing techniques for computing the discrete power diagram.<br />
16:30-16:50, Paper MoBT5.4<br />
Non-Ring Filters for Robust Detection of Linear Structures<br />
Läthén, Gunnar, Linköping Univ.<br />
Cros, Olivier, Linköping Univ.<br />
Knutsson, Hans,<br />
Borga, Magnus, Linköping Univ.<br />
Many applications in image analysis include the problem of linear structure detection, e.g. segmentation of blood vessels<br />
in medical images, roads in satellite images, etc. A simple and efficient solution is to apply linear filters tuned to the structures<br />
of interest and extract line and edge positions from the filter output. However, if the filter is not carefully designed,<br />
artifacts such as ringing can distort the results and hinder a robust detection. In this paper, we study the ringing effects<br />
using a common Gabor filter for linear structure detection, and suggest a method for generating non-ring filters in 2D and<br />
3D. The benefits of the non-ring design are motivated by results on both synthetic and natural images.<br />
16:50-17:10, Paper MoBT5.5<br />
Incremental Distance Transforms (IDT)<br />
Schouten, Theo, Radboud Univ. Nijmegen<br />
Van Den Broek, Egon L., Univ. of Twente<br />
A new generic scheme for incremental implementations of distance transforms (DT) is presented: Incremental Distance Transforms (IDT). This scheme is applied to the city-block, Chamfer, and three recent exact Euclidean DT (E2DT). A benchmark shows that for all five DT, the incremental implementation results in a significant speedup of 3.4-10 times. However, significant differences (i.e., up to 12.5 times) among the DT remain. The FEED transform, one of the recent E2DT, was even shown to be faster than both the city-block and Chamfer DT. Thus, through a very efficient incremental processing scheme for DT, the computational burden of E2DT is relieved.<br />
MoBT6 Dolmabahçe Hall B<br />
Document Segmentation Regular Session<br />
Session chair: Srihari, Sargur (Univ. at Buffalo)<br />
15:30-15:50, Paper MoBT6.1<br />
Text Separation from Mixed Documents using a Tree-Structured Classifier<br />
Peng, Xujun, State Univ. at Buffalo<br />
Setlur, Srirangaraj, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Sitaram, Ramachandrula, HP Lab. India<br />
In this paper, we propose a tree-structured multi-class classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, we boost the tree using all training data at each node with different weights. The evaluation of the proposed method is presented on a set of machine-printed documents which have been annotated by multiple writers in an office/collaborative environment.<br />
15:50-16:10, Paper MoBT6.2<br />
Document Segmentation using Pixel-Accurate Ground Truth<br />
An, Chang, Lehigh Univ.<br />
Yin, Dawei, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
We compare methodologies for trainable document image content extraction, using a variety of ground-truth policies: loose, tight, and pixel-accurate. The goal is to achieve pixel-accurate segmentation of document images. Which ground-truth policy is best has been debated. "Loose" truth is obtained by sweeping rectangles to enclose entire text blocks etc., and can be an efficient manual task. "Tight" truth requires more care, and more time, to enclose individual text lines. Pixel-accurate truth, in which only foreground pixels are labeled, can be obtained by applying the PARC PixLabeler tool; in our experience this tool was as quick to use as loose truthing. We have compared the accuracy of all three truthing policies, and report that tight truth supports higher accuracy than loose truth, and pixel-accurate truth yields the highest accuracy. We have also experimented with morphological expansions of pixel-accurate truth, expanding sets of foreground pixels morphologically, and report that expanded pixel-accurate truth supports higher accuracy than pixel-accurate truth.<br />
16:10-16:30, Paper MoBT6.3<br />
An Adaptive Script-Independent Block-Based Text Line Extraction<br />
Ziaratban, Majid, Amirkabir Univ. of Technology<br />
Faez, Karim, Amirkabir Univ. of Technology<br />
In this paper, a novel script-independent block-based text line extraction technique is proposed for multi-skewed document images. Three parameters are defined to adapt the method to various writing styles. Extensive experiments on different datasets demonstrate that the proposed algorithm outperforms previous methods.<br />
16:30-16:50, Paper MoBT6.4<br />
Automated Quality Assurance for Document Logical Analysis<br />
Meunier, Jean-Luc, XRCE<br />
We consider here the general problem of converting documents available in print-ready or image format into a structured<br />
format that reflects the logical structure of the document. One aspect of the problem involves reconstructing conventional<br />
constructs such as titles, headings, captions, footnotes, etc. In practice, another important aspect involves putting in place<br />
some automated Quality Assessment (QA) method. We propose here a method to automate the QA in the case of a homogeneous<br />
collection by considering multiple documents at once instead of focusing only on the document being processed.<br />
16:50-17:10, Paper MoBT6.5<br />
The PAGE (Page Analysis and Ground-Truth Elements) Format Framework<br />
Pletschacher, Stefan, Univ. of Salford<br />
Antonacopoulos, Apostolos, Univ. of Salford<br />
There is a plethora of established and proposed document representation formats but none that can adequately support individual<br />
stages within an entire sequence of document image analysis methods (from document image enhancement to<br />
layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation<br />
framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections,<br />
binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation<br />
of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications<br />
such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition<br />
series.<br />
MoBT7 Dolmabahçe Hall C<br />
Computer Aided Detection and Diagnosis Regular Session<br />
Session chair: Unal, Gozde (Sabanci Univ.)<br />
15:30-15:50, Paper MoBT7.1<br />
Dyslexia Diagnostics by Centerline-Based Shape Analysis of the Corpus Callosum<br />
Elnakib, Ahmed, Univ. of Louisville<br />
El-Baz, Ayman, Univ. of Auckland<br />
Casanova, Manuel, Univ. of Louisville<br />
Switala, Andrew, Univ. of Louisville<br />
Dyslexia severely impairs learning abilities, so improved diagnostic methods are called for. Neuropathological studies have revealed abnormal anatomy of the Corpus Callosum (CC) in dyslexic brains. We explore the possibility of distinguishing between dyslexic and normal (control) brains by quantitative CC shape analysis in 3D magnetic resonance images (MRI). Our approach consists of three steps: (i) segmenting the CC from a given 3D MRI using the learned CC shape and visual appearance; (ii) extracting the centerline of the CC; and (iii) classifying the subject as dyslexic or normal based on the estimated length of the CC centerline using a k-nearest neighbor classifier. Experiments revealed significant differences (at the 95% confidence level) between the CC centerlines of 14 normal and 16 dyslexic subjects. Our initial classification suggests that the proposed centerline-based shape analysis of the CC is a promising supplement to current dyslexia diagnostics.<br />
15:50-16:10, Paper MoBT7.2<br />
A Probabilistic Information Fusion Approach to MR-Based Automated Diagnosis of Dementia<br />
Akgul, Ceyhun Burak, Vistek Machine Vision and Automation<br />
Ekin, Ahmet, Philips Res. Europe<br />
In this work, we present a probabilistic information fusion approach for the diagnosis of dementia from cross-sectional<br />
magnetic resonance (MR) images. The approach relies on first mapping the outputs of a support vector classifier (SVM)<br />
trained on image features to probabilities and then on combining these probabilities with the class-conditional distributions<br />
of neuropsychiatric test scores, such as the mini-mental state examination (MMSE). The SVM classifier is trained and<br />
tested on 121 subjects drawn from the Open Access Series of Imaging Studies (OASIS) database. Two independent sets<br />
of MMSE related statistics are estimated from data, one from the training set in OASIS and the other from the Alzheimer’s<br />
Disease Neuroimaging Initiative (ADNI) database. The probabilistic fusion of image-based SVM decisions with non-visual MMSE information exhibits very steep receiver operating characteristic curves on the test set, giving 92% accuracy at the equal error rate operating point.<br />
16:10-16:30, Paper MoBT7.3<br />
Two-Level Algorithm for MCs Detection in Mammograms using Diverse-Adaboost-SVM<br />
Harirchi, Farshad, K. N. Toosi Univ. of Tech.<br />
Radparvar, Parham, K. N. Toosi Univ. of Tech.<br />
Abrishami Moghaddam, Hamid, K. N. Toosi Univ. of Tech.<br />
Dehghan, Faramarz, K. N. Toosi Univ. of Tech.<br />
Giti, Masoumeh, Tehran Univ. of Medical Sciences<br />
Clustered microcalcifications (MCs) are one of the early signs of breast cancer. In this paper, we propose a new computer-aided diagnosis (CAD) system for automatic detection of MCs in two steps. First, pixels corresponding to potential microcalcifications are found using a multilayer feed-forward neural network. The input of this network consists of 4 wavelet and 2 gray-level features. The output of the network is then transformed into potential microcalcification objects using spatial 4-point connectivity. Second, we extract 25 features from the potential MC objects and use Diverse Adaboost SVM (DA-SVM) and 3 other classifiers to detect individual MCs. A free-response operating characteristic (FROC) curve is used to evaluate the performance of the CAD system. A mean TP detection rate of 90.44% is achieved at the cost of 1.043 FPs per image using DA-SVM, showing quite satisfactory detection performance of the CAD system.<br />
16:30-16:50, Paper MoBT7.4<br />
An Image Analysis Approach for Detecting Malignant Cells in Digitized H&E-Stained Histology Images of Follicular<br />
Lymphoma<br />
Sertel, Olcay, The Ohio State Univ.<br />
Catalyurek, Umit, The Ohio State Univ.<br />
Lozanski, Gerard, The Ohio State Univ.<br />
Shana’Ah, Arwa, The Ohio State Univ.<br />
Gurcan, Metin, The Ohio State Univ.<br />
The gold standard in follicular lymphoma (FL) diagnosis and prognosis is histopathological examination of tumor tissue<br />
samples. However, the qualitative manual evaluation is tedious and subject to considerable inter- and intra-reader variations.<br />
In this study, we propose an image analysis system for quantitative evaluation of digitized FL tissue slides. The developed<br />
system uses a robust feature space analysis method, namely the mean shift algorithm followed by a hierarchical grouping<br />
to segment a given tissue image into basic cytological components. We then apply further morphological operations to<br />
achieve the segmentation of individual cells. Finally, we generate a likelihood measure to detect candidate cancer cells<br />
using a set of clinically driven features. The proposed approach has been evaluated on a dataset consisting of 100 region<br />
of interest (ROI) images and achieves a promising 89% average accuracy in detecting target malignant cells.<br />
16:50-17:10, Paper MoBT7.5<br />
Microaneurysm (MA) Detection via Sparse Representation Classifier with MA and Non-MA Dictionary Learning<br />
Zhang, Bob, Univ. of Waterloo<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
You, Jane, The Hong Kong Pol. Univ.<br />
Karray, Fakhri, Univ. of Waterloo<br />
Diabetic retinopathy (DR) is a common complication of diabetes that damages the retina and leads to sight loss if treated late. In its earliest stage, DR can be diagnosed by microaneurysm (MA) detection. Although some algorithms have been developed, the accurate detection of MAs in color retinal images is still a challenging problem. In this paper we propose a new method to detect MAs based on the Sparse Representation Classifier (SRC). We first roughly locate MA candidates using multiscale Gaussian correlation filtering, and then classify these candidates with the SRC. In particular, two dictionaries, one for MA and one for non-MA, are learned from example MA and non-MA structures and are used in the SRC process. Experimental results on the ROC database show that the proposed method can effectively distinguish MA from non-MA objects.<br />
MoBT8 Lower Foyer<br />
Object Detection and Recognition; Performance Evaluation of Computer Vision Algorithms; Computer Vision<br />
Applications Poster Session<br />
Session chair: Chen, Chu-Song (Academia Sinica)<br />
15:00-17:10, Paper MoBT8.1<br />
A Neurobiologically Motivated Stochastic Method for Analysis of Human Activities in Video<br />
Sethi, Ricky, Univ. of California, Riverside<br />
Roy-Chowdhury, Amit, Univ. of California, Riverside<br />
In this paper, we develop a neurobiologically-motivated statistical method for video analysis that simultaneously searches the combined motion and form space in a concerted and efficient manner using well-known Markov chain Monte Carlo (MCMC) techniques. Specifically, we leverage an MCMC variant called Hamiltonian Monte Carlo (HMC), which we extend to utilize data-based proposals rather than the blind proposals of traditional HMC, thus creating the Data-Driven HMC (DDHMC). We demonstrate the efficacy of our system on real-life video sequences.<br />
15:00-17:10, Paper MoBT8.2<br />
Arbitrary Stereoscopic View Generation using Multiple Omnidirectional Image Sequences<br />
Hori, Maiya, Nara Inst. of Science and Tech.<br />
Kanbara, Masayuki, Nara Inst. of Science and Tech.<br />
Yokoya, Naokazu, Nara Inst. of Science and Tech.<br />
This paper proposes a novel method for generating arbitrary stereoscopic views from multiple omnidirectional image sequences.<br />
Although conventional methods for arbitrary view generation with an image-based rendering approach can create<br />
binocular views, the positions and directions of viewpoints for stereoscopic vision are limited to a small range. In this research,<br />
we attempt to generate arbitrary stereoscopic views from omnidirectional image sequences captured along various<br />
multiple paths. To generate a high-quality stereoscopic view from a number of images captured at various viewpoints, appropriate<br />
ray information needs to be selected. In this paper, appropriate ray information is selected from a number of<br />
omnidirectional images using a penalty function expressed as ray similarity. In experiments, we show the validity of this<br />
penalty function by generating stereoscopic views from multiple real image sequences.<br />
15:00-17:10, Paper MoBT8.3<br />
Fast Odometry Integration in Local Bundle Adjustment-Based Visual SLAM<br />
Eudes, Alexandre, CEA LIST<br />
Lhuillier, Maxime, LASMEA<br />
Naudet Collette, Sylvie, CEA LIST, LVIC<br />
Dhome, Michel, Blaise Pascal Univ.<br />
Simultaneous Localisation And Mapping (SLAM) for a camera moving in a scene is a long-term research problem.<br />
Here we improve a recent visual SLAM method that applies Local Bundle Adjustment (LBA) to selected key-frames of a<br />
video: we show how to correct the scale drift observed in long monocular video sequences using an additional odometry<br />
sensor. Our method and results are interesting for several reasons: (1) the pose accuracy is improved on real examples; (2)<br />
we do not sacrifice the consistency between the reconstructed 3D points and image features to fit the odometry data; (3) the<br />
modification of the original visual SLAM method is not difficult.<br />
15:00-17:10, Paper MoBT8.4<br />
Classifying Textile Designs using Bags of Shapes<br />
Jia, Wei, Univ. of Dundee<br />
Mckenna, Stephen James, Univ. of Dundee<br />
The use of region shape descriptors was investigated for categorisation of textile design images. Images were segmented<br />
using MRF pixel labelling and the shapes of regions obtained were described with generic Fourier descriptors. Each image<br />
was represented as a bag of shapes. A simple yet competitive classification scheme based on nearest neighbour class-based<br />
matching was used. Classification performance was compared to that obtained when using bags of SIFT features.<br />
15:00-17:10, Paper MoBT8.5<br />
Driver Body-Height Prediction for an Ergonomically Optimized Ingress using a Single Omnidirectional Camera<br />
Scharfenberger, Christian, TU-Munich<br />
Chakraborty, Samarjit, TU-Munich<br />
Faerber, Georg, TU-Munich<br />
Maximizing passenger comfort is an important research topic in the domain of automotive systems engineering. In particular,<br />
automatic adjustment of the seat position according to driver height significantly increases the level of comfort<br />
during ingress. In this paper, we present a new method to estimate the height of approaching car drivers based on a single<br />
omnidirectional camera integrated into the side-view mirror of a car. Towards this, we propose mathematical descriptions<br />
of standard parking scenarios, allowing for accurate height estimation. First, approaching drivers are extracted from<br />
image frames captured by the camera. Second, the scenario and height are initially estimated based on gathered samples<br />
of angles to the head and foot points of an approaching driver. An iterative optimization process removes outliers and refines<br />
the initially estimated scenario and height. Finally, we present a number of experimental results based on image sequences<br />
captured from real-life ingress scenarios.<br />
15:00-17:10, Paper MoBT8.6<br />
Torchlight Navigation<br />
Felsberg, Michael, Linköping Univ.<br />
Larsson, Fredrik, Linköping Univ.<br />
Wang, Han, Nanyang Tech. Univ.<br />
Ynnerman, Anders, Linköping Univ.<br />
Schön, Thomas, Linköping Univ.<br />
A common computer vision task is navigation and mapping. Many indoor navigation tasks require depth knowledge of<br />
flat, unstructured surfaces (walls, floor, ceiling). With passive illumination only, this is an ill-posed problem. Inspired by<br />
small children using a torchlight, we use a spotlight for active illumination. Using our torchlight approach, depth and orientation<br />
estimation of unstructured, flat surfaces boils down to estimation of ellipse parameters. The extraction of ellipses<br />
is very robust and requires little computational effort.<br />
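The ellipse-parameter estimation this problem reduces to can be illustrated with a plain least-squares conic fit on synthetic points (a sketch; a real system would likely use an ellipse-constrained fit such as Fitzgibbon's direct method):<br />

```python
import numpy as np

def fit_conic(x, y):
    """Least-squares conic fit a x^2 + b xy + c y^2 + d x + e y + f = 0
    (general conic, not the ellipse-specific constrained fit)."""
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]          # coefficients, defined up to scale

# synthetic points on the ellipse (x/3)^2 + (y/2)^2 = 1
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
xs, ys = 3 * np.cos(t), 2 * np.sin(t)
a, b, c, d, e, f = fit_conic(xs, ys)
# for this axis-aligned, centered ellipse: b = d = e = 0 and a/c = 4/9
```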
15:00-17:10, Paper MoBT8.7<br />
Adaptive Image Projection Onto Non-Planar Screen using Projector-Camera Systems<br />
Yamanaka, Takashi, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
In this paper, we propose a method for projecting images onto non-planar screens using projector-camera systems while eliminating<br />
distortion in the projected images. In this system, point-to-point correspondences between a projector image and a camera<br />
image must be extracted. To find correspondences, the epipolar geometry between the projector and the camera is used.<br />
By applying a dynamic programming method on epipolar lines, correspondences between the projector image and the camera image<br />
are obtained. Furthermore, in order to achieve faster and more robust matching, the non-planar screen is approximately<br />
represented by a B-spline surface. The small number of B-spline surface parameters is estimated rapidly from corresponding<br />
pixels on epipolar lines. Experimental results show the proposed method works well for projecting images<br />
onto non-planar screens.<br />
15:00-17:10, Paper MoBT8.8<br />
Analysis and Adaptation of Integration Time in PMD Camera for Visual Servoing<br />
Gil, Pablo, Univ. of Alicante<br />
Pomares, Jorge, Univ. of Alicante<br />
Torres, Fernando, Univ. of Alicante<br />
Depth perception of the objects in a scene can be useful for tracking or for applying visual servoing in mobile systems. 3D<br />
time-of-flight (ToF) cameras provide range images which give measurements in real time to improve these types of tasks.<br />
However, the distance computed from these range images varies strongly with the integration-time parameter. This paper<br />
presents an analysis for the online adaptation of the integration time of ToF cameras. This online adaptation is necessary in order<br />
to capture images under the best conditions irrespective of the changes in distance (between camera and objects) caused by<br />
the camera's movement when it is mounted on a robotic arm.<br />
15:00-17:10, Paper MoBT8.9<br />
Detecting Paper Fibre Cross Sections in Microtomy Images<br />
Kontschieder, Peter, Graz Univ. of Tech.<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Kritzinger, Johannes, Graz Univ. of Tech.<br />
Bauer, Wolfgang, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
The goal of this work is the fully-automated detection of cellulose fibre cross sections in microtomy images. A lack of<br />
significant appearance information makes edges the only reliable cue for detection. We present a novel and highly discriminative<br />
edge fragment descriptor that represents angular relations between fragment points. We train a Random Forest<br />
with a plurality of these descriptors including their respective center votes. In such a way, the Random Forest exploits the<br />
knowledge about the object centroid for detection using a generalized Hough voting scheme. In the experiments we found<br />
that our method is able to robustly detect fibre cross sections in microtomy images and can therefore serve as initialization<br />
for successive fibre segmentation or tracking algorithms.<br />
15:00-17:10, Paper MoBT8.10<br />
Active Calibration of Camera-Projector Systems based on Planar Homography<br />
Park, Soon-Yong, Kyungpook National Univ.<br />
Park, Go Gwang, Kyungpook National Univ.<br />
This paper presents a simple and active calibration technique for camera-projector systems based on planar homography.<br />
From the camera image of a planar calibration pattern, we generate a projector image of the pattern through the homography<br />
between the camera and the projector. To determine the coordinates of the pattern corners from the view of the projector,<br />
we actively project a corner marker from the projector to align the marker with the printed pattern corners. Calibration is<br />
done in two steps. First, the four outer corners of the pattern are identified. Second, all other inner corners are identified. The<br />
pattern image from the projector is then used to calibrate the projector. Experimental results on two types of camera-projector<br />
systems show that the projection errors of both camera and projector are less than 1 pixel.<br />
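The planar homography underlying such a calibration can be estimated from four or more corner correspondences with the standard direct linear transform (DLT); a minimal sketch on synthetic data (the H_true and corner values below are invented):<br />

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)       # null vector = homography up to scale
    return H / H[2, 2]

# four pattern corners mapped through a known synthetic homography
H_true = np.array([[1.2, 0.1, 5.0], [0.0, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], float)
pts = np.hstack([src, np.ones((4, 1))]) @ H_true.T
dst = pts[:, :2] / pts[:, 2:]      # perspective divide
H = homography_dlt(src, dst)
```

With exact correspondences the DLT recovers H_true exactly (up to numerical precision), since four points in general position determine a unique homography.<br />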
15:00-17:10, Paper MoBT8.11<br />
Abnormal Traffic Detection using Intelligent Driver Model<br />
Sultani, Waqas, Seoul National Univ.<br />
Choi, Jin Young, Seoul National Univ.<br />
We present a novel approach for detecting and localizing abnormal traffic using the intelligent driver model. Specifically, we<br />
advect particles over the video sequence. By treating each particle as a car, we compute driver behavior using the intelligent driver<br />
model. The behaviors are learned using latent Dirichlet allocation, and frames are classified as abnormal using a likelihood<br />
threshold criterion. In order to localize the abnormality, we compute spatial gradients of the behaviors and construct a finite-time<br />
Lyapunov field. Finally, the region of abnormality is segmented using the watershed algorithm. The effectiveness of the<br />
proposed approach is validated using videos from stock footage websites.<br />
15:00-17:10, Paper MoBT8.12<br />
Detection of Moving Objects with Removal of Cast Shadows and Periodic Changes using Stereo Vision<br />
Moro, Alessandro, Univ. of Trieste<br />
Terabayashi, Kenji, Chuo Univ.<br />
Umeda, Kazunori, Chuo Univ.<br />
In this paper we present a method for the detection of moving objects in unknown and generic environments under cast<br />
shadows and periodic movements of non-relevant objects (such as waving leaves), using a combination of non-parametric<br />
thresholding algorithms and local cast shadow analysis with stereo camera information. Good detection rates were achieved<br />
in several environments under different lighting conditions, and objects could be detected independently of scene illumination,<br />
shadows, and periodic changes.<br />
15:00-17:10, Paper MoBT8.13<br />
Localized Image Matte Evaluation by Gradient Correlation<br />
Yao, Guilin, Harbin Inst. of Tech.<br />
Yao, Hongxun, Harbin Inst. of Tech.<br />
In natural image matting, various kinds of algorithms have recently been proposed, and alpha matting results have<br />
been generated for comparison and for composition into new backgrounds. However, all these methods have to compare an<br />
alpha matte to the ground truth in order to obtain a final pixel-wise evaluation of the results. When input datasets<br />
are used only for testing and no ground-truth mattes exist, it is not possible to perform such comparisons<br />
or to generate quantitative results. In this paper we combine the two ideas above and propose a<br />
new pixel-wise alpha matte evaluation method. The approach uses local windows to measure the gradient correlation<br />
between the image and the matte. An optimal image channel minimizing the image variance is also selected at each<br />
window in order to perform the correlation more accurately. Experimental results show that our system can generate a precise<br />
evaluation result for each pixel of each matte without ground truth.<br />
15:00-17:10, Paper MoBT8.14<br />
Multiple Plane Detection in Image Pairs using J-Linkage<br />
Fouhey, David Ford, Middlebury Coll.<br />
Scharstein, Daniel, Middlebury Coll.<br />
Briggs, Amy, Middlebury Coll.<br />
We present a new method for the robust detection and matching of multiple planes in pairs of images. Such planes can<br />
serve as stable landmarks for vision-based urban navigation. Our approach starts from SIFT matches and generates multiple<br />
local homography hypotheses using the recent J-linkage technique by Toldo and Fusiello, a robust randomized multimodel<br />
estimation algorithm. These hypotheses are then globally merged, spatially analyzed, robustly fitted, and checked<br />
for stability. When tested on more than 30,000 image pairs taken from panoramic views of a college campus, our method<br />
yields no false positives and recovers 72% of the matchable building walls identified by a human, despite significant occlusions<br />
and viewpoint changes.<br />
15:00-17:10, Paper MoBT8.15<br />
Contextual Features for Head Pose Estimation in Football Games<br />
Launila, Andreas, Royal Inst. of Tech. (KTH)<br />
Sullivan, Josephine, Royal Inst. of Tech. (KTH)<br />
We explore the benefits of using contextual features for head pose estimation in football games. Contextual features are<br />
derived from knowledge of the position of all players and combined with image based features derived from low-resolution<br />
footage. Using feature selection and combination techniques, we show that contextual features can aid head pose estimation<br />
in football games and potentially be an important complement to the image based features traditionally used.<br />
15:00-17:10, Paper MoBT8.16<br />
Coarse-To-Fine Multiclass Nested Cascades for Object Detection<br />
Verschae, Rodrigo, Univ. de Chile<br />
Ruiz-Del-Solar, Javier, Univ. de Chile<br />
Building robust and fast object detection systems is an important goal of computer vision. When several<br />
object types must be detected, the computational burden of running several class-specific classifiers in parallel becomes<br />
a problem, and both accuracy and training time can be greatly affected. Seeking to address these<br />
problems, we extend cascade classifiers to the multiclass case by proposing the use of multiclass coarse-to-fine (CTF)<br />
nested cascades. The presented results show that the proposed system scales well with the number of classes, both at<br />
training and at running time.<br />
15:00-17:10, Paper MoBT8.17<br />
Visual SLAM with an Omnidirectional Camera<br />
Rituerto, Alejandro, Univ. de Zaragoza<br />
Puig, Luis, Univ. de Zaragoza<br />
Guerrero, Jose J., Univ. de Zaragoza<br />
In this work we integrate the Spherical Camera Model for catadioptric systems into a Visual-SLAM application. The Spherical<br />
Camera Model is a projection model that unifies central catadioptric and conventional cameras. To integrate this model<br />
into the Extended Kalman Filter-based SLAM, we must linearize the direct and the inverse projections. We have performed<br />
initial experimentation with omnidirectional and conventional real sequences including challenging trajectories.<br />
The results confirm that the omnidirectional camera gives much better orientation accuracy, improving the estimated camera<br />
trajectory.<br />
15:00-17:10, Paper MoBT8.18<br />
Shape Index SIFT: Range Image Recognition using Local Features<br />
Bayramoglu, Neslihan, Middle East Tech. Univ.<br />
Alatan, A. Aydin, Middle East Tech. Univ.<br />
Range image recognition has gained importance in recent years due to developments in acquiring, displaying, and storing<br />
such data. In this paper, we present a novel method for matching range surfaces. Our method utilizes local surface properties<br />
and represents the geometry of local regions efficiently. Integrating the Scale Invariant Feature Transform (SIFT) with the<br />
shape index (SI) representation of the range images allows matching of surfaces with different scales and orientations. We<br />
apply the method to scaled, rotated, and occluded range images and demonstrate its effectiveness by comparison with<br />
previous studies.<br />
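The shape index can be computed directly from the principal curvatures of the range surface; a small sketch (sign conventions for the curvatures, and hence for which end of the scale is a cap or a cup, vary between authors):<br />

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from principal curvatures (sorted so k1 >= k2).

    Ranges over [-1, 1]: +1 spherical cap, +0.5 ridge, 0 saddle,
    -0.5 rut, -1 spherical cup, under the convention that convex
    surfaces have positive curvature. Undefined at planar points.
    """
    k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)
    # arctan2 handles the umbilic case k1 == k2 (zero denominator)
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

print(shape_index(1.0, 0.0))   # a ridge-like point
```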
15:00-17:10, Paper MoBT8.19<br />
Windows Detection using K-Means in CIE-Lab Color Space<br />
Recky, Michal, ICG TU Graz<br />
Leberl, Franz, ICG TU Graz<br />
In this paper, we present a method for window detection robust enough to process the complex facades of historical buildings.<br />
The method provides results even for facades under severe perspective distortion. Our algorithm can detect<br />
many different window types and does not require a learning step. We achieve this through an extended gradient<br />
projection method and the introduction of a color descriptor, based on k-means clustering in the CIE-Lab color space, into the<br />
process. This method is an important step towards creating large 3D city models in an automated workflow from large online<br />
image databases or industrial systems. As such, it was designed to provide a high level of robustness when processing<br />
a large variety of image types.<br />
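The k-means clustering at the core of the color descriptor can be sketched as plain Lloyd's iteration on Lab pixel triples (toy data and a deliberately simple initialization; a real pipeline would first convert RGB to CIE-Lab, e.g. with skimage's rgb2lab, and use k-means++):<br />

```python
import numpy as np

def kmeans(pixels, k, iters=20):
    """Plain Lloyd's k-means on an (N, 3) array of CIE-Lab pixels."""
    # toy init: evenly spaced samples (k-means++ would be better)
    centers = pixels[:: max(1, len(pixels) // k)][:k].astype(float)
    for _ in range(iters):
        # assign each pixel to the nearest center (squared L2 in Lab)
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(0)
    return labels, centers

# two well-separated synthetic "Lab" color populations, e.g. dark
# bluish window glass against a bright facade wall
rng = np.random.default_rng(1)
glass = rng.normal([30.0, 5.0, -20.0], 1.0, size=(50, 3))
wall = rng.normal([70.0, 10.0, 30.0], 1.0, size=(50, 3))
labels, centers = kmeans(np.vstack([glass, wall]), k=2)
```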
15:00-17:10, Paper MoBT8.20<br />
Robust Figure Extraction on Textured Background: A Game-Theoretic Approach<br />
Albarelli, Andrea, Univ. Ca’ Foscari Venezia<br />
Rodolà, Emanuele, Univ. Ca’ Foscari Venezia<br />
Cavallarin, Alberto, Univ. Ca’ Foscari Venezia<br />
Torsello, Andrea, Univ. Ca’ Foscari Venezia<br />
Feature-based image matching relies on the assumption that the features contained in the model are distinctive enough. When<br />
both model and data present a sizeable amount of clutter, the signal-to-noise ratio falls and detection becomes more challenging.<br />
If such clutter exhibits a coherent structure, as is the case for textured backgrounds, matching becomes even harder.<br />
In fact, the large number of repeatable features extracted from the texture dims the strength of the relatively few interesting<br />
points of the object itself. In this paper we introduce a game-theoretic approach that distinguishes foreground features<br />
from background ones. In addition, the same technique can be used to deal with the object matching itself. The whole procedure<br />
is validated by applying it to a practical scenario and by comparing it with a standard point-pattern matching technique.<br />
15:00-17:10, Paper MoBT8.21<br />
Image Retrieval of First-Person Vision for Pedestrian Navigation in Urban Area<br />
Kameda, Yoshinari, Univ. of Tsukuba<br />
Ohta, Yuich, Univ. of Tsukuba<br />
We propose a new computer vision approach to locate a walking pedestrian from a first-person-vision camera image in practical<br />
situations. We assume reference points have been registered with other first-person-vision images. We utilize SURF and<br />
define seven matching criteria, derived from the properties of first-person vision, to reject false matches. We have<br />
implemented a preliminary system that can respond to a query within half a second for a path approximately 1 km long<br />
in a downtown Tokyo area where pedestrians and vehicles are always present in the images.<br />
15:00-17:10, Paper MoBT8.22<br />
Unexpected Human Behavior Recognition in Image Sequences using Multiple Features<br />
Zweng, Andreas, Vienna Univ. of Tech.<br />
Kampel, Martin, Vienna Univ. of Tech.<br />
This paper presents a novel approach for unexpected behavior recognition in image sequences, with attention to high-density<br />
crowd scenes. Due to occlusions, object tracking in such scenes is challenging, and in cases of low resolution or poor image<br />
quality it is not robust enough to reliably detect abnormal behavior. The wide variety of possible actions performed by<br />
humans and the problem of occlusions make action recognition unsuitable for behavior recognition in high-density crowd<br />
scenes. The approach presented in this paper uses features based on motion information, instead of detecting<br />
actions or events, in order to detect abnormality. Experiments demonstrate the potential of the approach.<br />
15:00-17:10, Paper MoBT8.23<br />
Object Recognition based on N-Gram Expression of Human Actions<br />
Kojima, Atsuhiro, Osaka Prefecture Univ.<br />
Miki, Hiroshi, Osaka Prefecture Univ.<br />
Kise, Koichi, Osaka Prefecture Univ.<br />
In this paper, we propose a novel method for recognizing objects by observing human actions, based on bag-of-features. The<br />
key contribution of our method is that human actions are represented as n-grams of symbols and used to identify specific<br />
object categories. First, features of human actions taken on an object are extracted from video images and encoded into symbols.<br />
Then, n-grams are generated from the sequence of symbols and registered for the corresponding object category. In the recognition<br />
phase, actions taken on the object are converted into a set of n-grams in the same way and compared with those representing<br />
the object categories. We performed experiments to recognize objects in an office environment and confirmed the effectiveness<br />
of our method.<br />
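The n-gram registration and comparison steps can be sketched as follows (the symbol vocabulary and the Jaccard comparison score are invented for illustration; the paper's exact matching measure may differ):<br />

```python
def ngrams(symbols, n):
    """All contiguous n-grams of an action-symbol sequence."""
    return {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}

def match_category(observed, categories, n=2):
    """Pick the category whose registered n-gram set overlaps most
    with the n-grams of the observed action sequence."""
    obs = ngrams(observed, n)
    def score(ng):
        return len(obs & ng) / len(obs | ng) if obs | ng else 0.0
    return max(categories, key=lambda c: score(categories[c]))

# invented action-symbol vocabularies for two object categories
categories = {
    "cup": ngrams(["grasp", "lift", "drink", "putdown"], 2),
    "door": ngrams(["reach", "grasp", "turn", "push"], 2),
}
```

An observed sequence such as `["grasp", "lift", "drink"]` shares the bigrams (grasp, lift) and (lift, drink) with the cup category and none with the door category.<br />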
15:00-17:10, Paper MoBT8.24 CANCELED<br />
Image Feature Associations via Local Semantic Structure<br />
Parrish, Nicholas, Colorado State Univ.<br />
Draper, Bruce A., Colorado State Univ.<br />
Most research in object recognition suffers from two distinct weaknesses that limit its effectiveness in natural environments.<br />
First, it tends to rely on labeled training images to learn object models. Second, it tends to assume that the goal is<br />
to recognize a single, dominant foreground object. This paper presents a different method of object recognition that learns<br />
to recognize objects in natural scenes without supervision. The approach uses semantic co-occurrence information of local<br />
image features to form object models (called percepts) from groups of image features. These percepts are used to recognize<br />
objects in novel images. It will be shown that this approach is capable of learning object categories without supervision,<br />
and of recognizing objects in complex multi-object scenes. It will also be shown that it outperforms nearest-neighbor<br />
scene recognition.<br />
15:00-17:10, Paper MoBT8.25<br />
Unifying Approach for Fast License Plate Localization and Super-Resolution<br />
Nguyen, Chu Duc, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
This paper addresses the localization and super-resolution of license plates in a unifying approach. Higher-quality license<br />
plate images can be obtained by applying super-resolution to successive lower-resolution plate images. All existing methods assume that<br />
plate zones are correctly extracted from every frame. However, accurate localization requires sufficient image<br />
quality, which is not always available in real video. Super-resolution on all pixels is a possible but very time-consuming alternative.<br />
We propose a framework which successfully interlaces these two modules. First, coarse candidates are found by a weak<br />
but fast license plate detector based on edge map sub-sampling. Then, an improved fast MAP-based super-resolution, using<br />
local phase accurate registration and an edge-preserving prior, is applied to these regions of interest. Finally, our robust ICHT-based<br />
localizer rejects false alarms and localizes the high-resolution license plate more accurately. Experiments conducted<br />
on synthetic and real data demonstrate the robustness of our approach and its real-time potential.<br />
15:00-17:10, Paper MoBT8.26<br />
Dimensionality Reduction for Distributed Vision Systems using Random Projection<br />
Sulic, Vildana, Univ. of Ljubljana<br />
Pers, Janez, Univ. of Ljubljana<br />
Kristan, Matej, Univ. of Ljubljana<br />
Kovacic, Stanislav, Univ. of Ljubljana<br />
Dimensionality reduction is an important issue in the context of distributed vision systems. Processing dimensionality-reduced<br />
data requires far fewer network resources (e.g., storage space, network bandwidth) than processing the original data.<br />
In this paper we explore the performance of the random projection method for distributed smart cameras. In our tests, random<br />
projection is compared to principal component analysis in terms of recognition efficiency (i.e., object recognition).<br />
The results obtained on the COIL-20 image data set show good performance of random projection in comparison to<br />
principal component analysis, which requires distribution of a subspace and therefore consumes more network<br />
resources. This indicates that the random projection method can elegantly solve the problem of subspace distribution in embedded<br />
and distributed vision systems. Moreover, even without explicit orthogonalization or normalization of the random<br />
projection transformation subspace, the method achieves good object recognition efficiency.<br />
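The key property random projection exploits can be shown in a few lines: a seeded random matrix approximately preserves pairwise distances (the Johnson-Lindenstrauss lemma), so cameras only need to share a seed rather than transmit a learned subspace. All dimensions and data below are invented:<br />

```python
import numpy as np

# toy stand-ins for image feature vectors from a smart camera
rng = np.random.default_rng(0)
d, k, n = 1024, 64, 200          # original dim, reduced dim, #samples
X = rng.normal(size=(n, d))

# every camera can regenerate R from a shared seed, so no subspace
# (unlike PCA's) ever has to be distributed over the network
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                        # dimensionality-reduced data

# Johnson-Lindenstrauss: pairwise distances are roughly preserved
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(orig, proj)
```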
15:00-17:10, Paper MoBT8.27<br />
Sensor Fusion for Cooperative Head Localization<br />
Del Bimbo, Alberto, Univ. of Florence<br />
Dini, Fabrizio, Univ. of Florence<br />
Lisanti, Giuseppe, Univ. of Florence<br />
Pernici, Federico, Univ. of Florence<br />
In modern video surveillance systems, pan-tilt-zoom (PTZ) cameras have the potential to allow the coverage of<br />
wide areas with a much smaller number of sensors, compared to the common approach of fixed camera networks. This<br />
paper describes a general framework that aims at exploiting the capabilities of modern PTZ cameras in order to acquire<br />
high-resolution images of body parts, such as the head, from the observation of pedestrians moving in a wide outdoor<br />
area. The framework allows organizing the sensors in a network with arbitrary topology and establishing pairwise<br />
master-slave relationships between them. In this way a slave camera can be steered to acquire imagery of a target, taking<br />
into account both target and zooming uncertainties. Experiments show good performance in localizing a target's head, independently<br />
of the zooming factor of the slave camera.<br />
15:00-17:10, Paper MoBT8.28<br />
Shared Random Ferns for Efficient Detection of Multiple Categories<br />
Villamizar Vergel, Michael, CSIC-UPC<br />
Moreno-Noguer, Francesc, CSIC-UPC<br />
Andrade Cetto, Juan, CSIC-UPC<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
We propose a new algorithm for detecting multiple object categories that exploits the fact that different categories may<br />
share common features but with different geometric distributions. This yields an efficient detector which, in contrast to<br />
existing approaches, considerably reduces the computation cost at runtime, where the feature computation step is traditionally<br />
the most expensive. More specifically, at the learning stage we compute common features by applying the same<br />
Random Ferns over the Histograms of Oriented Gradients on the training images. We then apply a boosting step to build<br />
discriminative weak classifiers, and learn the specific geometric distribution of the Random Ferns for each class. At<br />
runtime, only a few Random Ferns have to be densely computed over each input image, and their geometric distribution<br />
allows performing the detection. The proposed method has been validated in public datasets achieving competitive detection<br />
results, which are comparable with state-of-the-art methods that use specific features per class.<br />
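A single random fern of the kind shared across categories here can be sketched as a set of pairwise feature comparisons packed into a bit index (a toy version on a bare feature vector; the paper applies shared ferns over Histograms of Oriented Gradients):<br />

```python
import numpy as np

class RandomFern:
    """A fern: m random pairwise comparisons on a feature vector,
    packed into an m-bit index into a 2**m-entry histogram."""

    def __init__(self, dim, m, rng):
        # each row picks two feature positions to compare
        self.pairs = rng.integers(0, dim, size=(m, 2))

    def index(self, x):
        bits = x[self.pairs[:, 0]] > x[self.pairs[:, 1]]
        return int(bits @ (1 << np.arange(len(bits))))

rng = np.random.default_rng(0)
fern = RandomFern(dim=36, m=8, rng=rng)   # e.g. a 36-bin HOG block
x = rng.normal(size=36)
idx = fern.index(x)                       # an index in [0, 255]
```

Because the comparisons are fixed once at training time, evaluating the same few ferns densely over an image is cheap, and per-class geometric distributions of the resulting indices can then be looked up rather than recomputed.<br />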
15:00-17:10, Paper MoBT8.29<br />
Age Recognition in the Wild<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Jahanbekam, Amirhossein, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
In this paper, we present a novel approach to age recognition from facial images. The method we propose combines<br />
several established features in order to characterize facial appearance and aging patterns. Since we explicitly consider<br />
age recognition in the wild, i.e. vast amounts of unconstrained Internet images, the methods we employ are tailored towards<br />
speed and efficiency. For evaluation, we test different classifiers on common benchmark data and on a new data set of unconstrained<br />
images harvested from the Internet. Extensive experimental evaluation shows state-of-the-art performance on<br />
the benchmarks, very high accuracy on the novel data set, and superior runtime performance; to our knowledge, this is<br />
the first time that automatic age recognition has been carried out on a large Internet data set.<br />
15:00-17:10, Paper MoBT8.30<br />
EKF-SLAM and Machine Learning Techniques for Visual Robot Navigation<br />
Casarrubias-Vargas, Heriberto, CINVESTAV<br />
Petrilli-Barceló, Alberto E., CINVESTAV<br />
Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />
In this work we propose the use of machine learning techniques to improve Simultaneous Localization and Mapping<br />
(SLAM) using an extended Kalman filter (EKF) and visual information for robot navigation. We use the Viola and<br />
Jones approach to look for specific visual landmarks in the environment. The landmarks are used to improve the robot localization<br />
in the EKF-SLAM system. Our experiments validate the efficiency of our algorithm.<br />
15:00-17:10, Paper MoBT8.31<br />
Boosting Clusters of Samples for Sequence Matching in Camera Networks<br />
Takala, Valtteri, Univ. of Oulu<br />
Cai, Yinghao, Univ. of Oulu<br />
Pietikäinen, Matti, Univ. of Oulu<br />
This study introduces a novel classification algorithm for learning and matching sequences in view-independent object<br />
tracking. The proposed learning method uses adaptive boosting and classification trees on a wide collection (shape, pose,<br />
color, texture, etc.) of image features that constitute a model for tracked objects. The temporal dimension is taken into account<br />
by using k-means clusters of sequence samples. Most of the utilized object descriptors also have a temporal quality.<br />
We argue that with a proper boosting approach and a decent number of reasonably descriptive image features it is feasible<br />
to do view-independent sequence matching in sparse camera networks. The experiments on real-life surveillance data support<br />
this statement.<br />
15:00-17:10, Paper MoBT8.32<br />
Saliency Detection and Object Localization in Indoor Environments<br />
Rudinac, Maja, Delft Univ. of Tech.<br />
Jonker, Pieter, Delft Univ. of Tech.<br />
In this paper we present a scene exploration method for the identification of interest regions in unknown indoor environments<br />
and the position estimation of the objects located in those regions. Our method consists of two stages: First, we<br />
generate a saliency map of the scene based on the spectral residual of three color channels, and interest points are detected<br />
in this map. Second, we propose and evaluate a method for the clustering of neighboring interest regions, the rejection of<br />
outliers, and the estimation of the positions of potential objects. Once the location of objects in the scene is known, recognition<br />
of objects/object classes can be performed or the locations can be used for grasping the object. The main contribution<br />
of this paper lies in a computationally inexpensive method for the localization of multiple salient objects in a scene. The<br />
performance obtained on a dataset of indoor scenes shows that our method performs well, is very fast, and is hence highly<br />
suitable for real-world applications, such as mobile robots and surveillance.<br />
15:00-17:10, Paper MoBT8.33<br />
Bubble Tag Identification using an Invariant–Under–Perspective Signature<br />
Patraucean, Viorica, Univ. of Toulouse<br />
Gurdjos, Pierre, Univ. of Toulouse<br />
Conter, Jean, Univ. of Toulouse<br />
We have at our disposal a large database containing images of various configurations of coplanar circles, randomly laid out,<br />
called Bubble Tags. The images are taken from different viewpoints. Given a new image (query image), the goal is to<br />
find in the database the image containing the same bubble tag as the query image. We propose representing the images<br />
through projective invariant signatures which allow identifying the bubble tag without passing through a Euclidean reconstruction<br />
step. This is justified by the size of the database, which imposes the use of queries in 1D/vectorial form, i.e.<br />
not in 2D/matrix form. The experiments carried out confirm the efficiency of our approach, in terms of precision and complexity.<br />
15:00-17:10, Paper MoBT8.35<br />
The Role of Polarity in Haar-Like Features for Face Detection<br />
Landesa-Vázquez, Iago, Univ. de Vigo<br />
Alba Castro, Jose Luis, Univ. of Vigo<br />
Human vision is primarily based on local contrast perception and its polarity. Viola and Jones proposed, in their well-known<br />
face detector framework, a boosted cascade of weak classifiers based on Haar-like features which encode local<br />
contrast and polarity information. Nevertheless, contrast polarity invariance, which is not directly modeled in their framework,<br />
has been shown to be perceptually relevant for the human capability of detecting faces. In this paper we study, from<br />
both algorithmic and perceptual points of view, the effect of enhancing Haar-like features with polarity invariance and<br />
how it may improve cascaded classifiers.<br />
15:00-17:10, Paper MoBT8.36<br />
A Human Detection Framework for Heavy Machinery<br />
Heimonen, Teuvo Antero, Univ. of Oulu<br />
Heikkilä, Janne, Univ. of Oulu<br />
A stereo camera based human detection framework for heavy machinery is proposed. The framework allows easy integration<br />
of different human detection and image segmentation methods. This integration is essential for diverse and challenging<br />
- 46 -
work machine environments, in which traditional, one-detector-based human detection approaches have been found to be<br />
insufficient. The framework is based on the idea of pixel-wise human probabilities, which are obtained by several separate<br />
detection trials following binomial distribution. The framework has been evaluated with extensive image sequences of<br />
authentic work machine environments, and it has proven to be feasible. Promising detection performance was achieved<br />
by utilizing publicly available human detectors.<br />
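The binomial view of repeated detection trials can be made concrete: if n independent detectors each fire at a background pixel with false-alarm rate p0, the number of votes follows a binomial distribution, and its tail bounds how often k or more spurious votes occur. The fusion rule below is an illustrative reading of the abstract, not the authors' exact formulation.<br />

```python
from math import comb

import numpy as np

def pixelwise_probability(masks):
    """Empirical per-pixel success rate over n binary detection trials."""
    return np.stack([np.asarray(m, dtype=float) for m in masks]).mean(axis=0)

def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): how likely it is that k or more
    of n independent detectors fire at a pure-background pixel."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
               for i in range(k, n + 1))
```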
15:00-17:10, Paper MoBT8.37<br />
Building a Videorama with Shallow Depth of Field<br />
Bae, Soonmin, Boston Coll.<br />
Jiang, Hao, Boston Coll.<br />
This paper presents a new automatic approach to building a videorama with shallow depth of field. We stitch the static background<br />
of video frames and render the dynamic foreground onto the enlarged background after foreground/background segmentation.<br />
To this end, we extract the depth information from a two-view video stream. We show that the depth cues combined<br />
with color cues improve segmentation. Finally, we use the depth cues to synthesize the shallow depth of field effects in the<br />
final videorama. Our approach stabilizes the camera motion as if the video was captured from a static camera and improves<br />
the visual quality with the increased field of view and shallow depth of field effects.<br />
15:00-17:10, Paper MoBT8.38<br />
Fast Training of Object Detection using Stochastic Gradient Descent<br />
Wijnhoven, Rob, ViNotion BV<br />
De With, Peter H. N., Eindhoven Univ. of Tech. / CycloMedia<br />
Training datasets for object detection problems are typically very large and Support Vector Machine (SVM) implementations<br />
are computationally complex. As opposed to these complex techniques, we use Stochastic Gradient Descent (SGD) algorithms<br />
that use only a single new training sample in each iteration and process samples in a stream-like fashion. We have incorporated<br />
SGD optimization in an object detection framework. The object detection problem is typically highly asymmetric, because<br />
of the limited variation in object appearance, compared to the background. Incorporating SGD speeds up the optimization<br />
process significantly, requiring only a single iteration over the training set to obtain results comparable to state-of-the-art<br />
SVM techniques. SGD optimization is linearly scalable in time and the obtained speedup in computation time is two to three<br />
orders of magnitude. We show that by considering only part of the total training set, SGD converges quickly to the overall<br />
optimum.<br />
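The stream-like update the abstract describes can be sketched with a Pegasos-style hinge-loss SGD. This is a generic formulation under assumed defaults (step size 1/(lambda*t), L2 regularization), not the authors' exact implementation.<br />

```python
import numpy as np

def sgd_linear_svm(X, y, lam=1e-3, epochs=1, seed=0):
    """Hinge-loss linear classifier trained by stochastic gradient descent.

    Each iteration touches a single sample, so the training set is consumed
    in a stream-like fashion; one epoch is one pass over the data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos-style decaying step
            margin = y[i] * (w @ X[i])
            w *= 1.0 - eta * lam             # gradient of the L2 regularizer
            if margin < 1.0:                 # hinge loss active for sample i
                w += eta * y[i] * X[i]
    return w
```

On well-separated data a single pass already yields a usable separator, which is the kind of speedup the abstract reports.<br />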
15:00-17:10, Paper MoBT8.39<br />
Assessing Water Quality by Video Monitoring Fish Swimming Behavior<br />
Serra-Toro, Carlos, Univ. Jaume I<br />
Montoliu, Raúl, Univ. Jaume I<br />
Traver, V. Javier, Univ. Jaume I<br />
Hurtado-Melgar, Isabel M., Univ. Jaume I<br />
Núñez-Redó, Manuela, Univ. Jaume I<br />
Cascales, Pablo, Univ. Jaume I<br />
Animals are known to alter their behavior in response to changes in their environments. Therefore, automatic visual monitoring<br />
of animal behavior is currently of great interest because of its many applications. In this paper, a video-based system<br />
is proposed for analyzing the swimming patterns of fishes so that the presence of toxins in the water can be inferred. This<br />
problem is challenging, among other reasons, because how fishes react when swimming in contaminated water is neither<br />
really known nor well defined. A novel use of recurrence plots is proposed, and very compact and simple descriptors based<br />
on this recurrence representation are found to be highly discriminative between videos of fishes in clean and polluted water.<br />
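A recurrence plot is simply a thresholded pairwise-distance matrix of the tracked trajectory, and compact descriptors such as the recurrence rate fall out directly. The paper's own descriptors are not detailed in the abstract, so the sketch below shows only the textbook starting point.<br />

```python
import numpy as np

def recurrence_plot(traj, eps):
    """Binary recurrence plot: R[i, j] = 1 when states i and j of the
    trajectory lie within eps of each other."""
    d = np.linalg.norm(traj[:, None, :] - traj[None, :, :], axis=-1)
    return (d < eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent pairs, ignoring the trivial diagonal: one of
    the simplest descriptors read off a recurrence plot (needs n >= 2)."""
    n = len(R)
    return (R.sum() - n) / (n * (n - 1))
```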
15:00-17:10, Paper MoBT8.40<br />
Detecting Wires in Cluttered Urban Scenes using a Gaussian Model<br />
Candamo, Joshua, Univ. of South Florida<br />
Goldgof, Dmitry, Univ. of South Florida<br />
Kasturi, Rangachar, Univ. of South Florida<br />
Godavarthy, Sridhar, Univ. of South Florida<br />
- 47 -
A novel wire detection algorithm for use by unmanned aerial vehicles (UAV) in low altitude urban reconnaissance is presented.<br />
This is of interest to urban search and rescue and military reconnaissance operations. Detection of wires plays an<br />
important role, because thin wires are hard to discern by tele-operators and automated systems. Our algorithm is based on<br />
identification of linear patterns in images. Most existing methods that search for linear patterns use a simple model of a<br />
line, which does not take into account the line surroundings. We propose the use of a robust Gaussian model to approximate<br />
the intensity profile of a line and its surroundings which allows effective discrimination of wires from other visually similar<br />
linear patterns. The algorithm is able to cope with highly cluttered urban backgrounds, moderate rain, and mist. Experimental<br />
results show a 17.7% detection improvement over the baseline.<br />
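The key modeling choice is that a thin wire's cross-sectional intensity profile looks like a Gaussian bump on a background level, whereas ramps and step edges do not. The scoring rule below is illustrative (the paper's robust fitting is not reproduced); it correlates a cross-section with a centred Gaussian template.<br />

```python
import numpy as np

def gaussian_profile(x, amp, mu, sigma, base):
    """Gaussian bump of height amp on a flat background level."""
    return base + amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def wire_score(profile, sigma=1.5):
    """Normalized correlation between an intensity cross-section and a
    centred Gaussian template: close to 1 for wire-like ridges, near 0
    for ramps and step edges."""
    x = np.arange(len(profile), dtype=float)
    template = gaussian_profile(x, 1.0, x.mean(), sigma, 0.0)
    p = profile - profile.mean()
    t = template - template.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float(p @ t / denom) if denom > 0 else 0.0
```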
15:00-17:10, Paper MoBT8.41<br />
Abandoned Objects Detection based on Radial Reach Correlation of Double Illumination Invariant Foreground Masks<br />
Li, Xunli, Peking Univ.<br />
Zhang, Chao, Peking Univ.<br />
Zhang, Duo,<br />
This paper proposes an automatic and robust method to detect and recognize abandoned objects for video surveillance<br />
systems. Two Gaussian Mixture Models (long-term and short-term) in the RGB color space are constructed to<br />
obtain two binary foreground masks. By refining the foreground masks with the Radial Reach Filter (RRF) method, the influence<br />
of illumination changes is greatly reduced. The height/width ratio and a linear SVM classifier based on the HOG (Histogram<br />
of Oriented Gradient) descriptor are also used to recognize left baggage. Tests on the datasets of PETS2006,<br />
PETS2007 and our own videos show that the proposed method can detect very small abandoned objects<br />
within low-quality surveillance videos, and it is also robust to varying illumination and dynamic backgrounds.<br />
15:00-17:10, Paper MoBT8.42<br />
Unsupervised Visual Object Categorisation via Self-Organisation<br />
Kinnunen, Juha Teemu Ensio, Lappeenranta Univ. of Tech.<br />
Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />
Lensu, Lasse, Lappeenranta Univ. of Tech.<br />
Kalviainen, Heikki, Lappeenranta Univ. of Tech.<br />
Visual object categorisation (VOC) has become one of the most actively investigated topics in computer vision. In the<br />
mainstream studies, the topic is considered as a supervised problem, but recently, the ultimate challenge has been posed:<br />
Unsupervised visual object categorisation. Hitherto only a few methods have been published, all of them being computationally<br />
demanding successors of their supervised counterparts. In this study, we address this problem with a simple and<br />
effective method: competitive learning leading to self-organisation (self-categorisation). The unsupervised competitive<br />
learning approach is implemented using the Kohonen self-organising map algorithm (SOM). The SOM is used to perform<br />
both unsupervised codebook generation and object categorisation. We present our method in detail and compare results<br />
to the supervised approach.<br />
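The competitive-learning core is compact: a Kohonen SOM's unit weights double as an unsupervised codebook, and a sample's best-matching unit is its (self-)category. The sketch below assumes a 1-D lattice and generic feature vectors; the paper's actual features and map topology are not specified in the abstract.<br />

```python
import numpy as np

def train_som(data, n_units=8, epochs=20, seed=0):
    """1-D Kohonen SOM: competitive learning with a neighborhood kernel
    that shrinks over time. The unit weights form the codebook."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_units, data.shape[1]))
    for e in range(epochs):
        lr = 0.5 * (1 - e / epochs)
        radius = max(1.0, n_units / 2 * (1 - e / epochs))
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))  # best match
            dist = np.abs(np.arange(n_units) - bmu)         # lattice dist
            h = np.exp(-(dist ** 2) / (2 * radius ** 2))    # neighborhood
            w += lr * h[:, None] * (x - w)
    return w

def assign(data, w):
    """Map each sample to its best-matching unit (self-categorisation)."""
    return np.array([np.argmin(np.linalg.norm(w - x, axis=1)) for x in data])
```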
15:00-17:10, Paper MoBT8.43<br />
A Novel Shape Feature for Fast Region-Based Pedestrian Recognition<br />
Shahrokni, Ali, Univ. of Reading<br />
Gawley, Darren, Univ. of Adelaide<br />
Ferryman, James, Univ. of Reading<br />
A new class of shape features for region classification and high-level recognition is introduced. The novel Randomised<br />
Region Ray (RRR) features can be used to train binary decision trees for object category classification using an abstract<br />
representation of the scene. In particular, we address the problem of human detection using an over-segmented input image.<br />
We therefore do not rely on pixel values for training, instead we design and train specialised classifiers on the sparse set<br />
of semantic regions which compose the image. Thanks to the abstract nature of the input, the trained classifier has the potential<br />
to be fast and applicable to extreme imagery conditions. We demonstrate and evaluate its performance in people<br />
detection using a pedestrian dataset.<br />
- 48 -
15:00-17:10, Paper MoBT8.44<br />
Road Change Detection from Multi-Spectral Aerial Data<br />
Mancini, Adriano, Univ. Pol. Delle Marche<br />
Frontoni, Emanuele, Univ. Pol. Delle Marche<br />
Zingaretti, Primo, Univ. Pol. Delle Marche<br />
The paper presents a novel approach to automate the Change Detection (CD) problem for the specific task of road extraction.<br />
Manual approaches to CD fail in terms of the time for releasing updated maps; on the contrary, automatic approaches,<br />
based on machine learning and image processing techniques, make it possible to update large areas in a short time with an accuracy<br />
and precision comparable to those obtained by human operators. This work is focused on the road-graph update starting<br />
from aerial, multi-spectral data. Georeferenced ground data, acquired by a GPS and an inertial sensor, are integrated with<br />
aerial data to speed up the change detector. After road extraction by means of a binary AdaBoost classifier, the old road-graph<br />
is updated exploiting a particle filter. In particular, this filter proves very useful for linking (tracking) parts of roads not extracted<br />
by the classifier due to the presence of occlusions (e.g., shadows, trees).<br />
15:00-17:10, Paper MoBT8.45<br />
Object Recognition and Localization via Spatial Instance Embedding<br />
Ikizler Cinbis, Nazli, Boston Univ.<br />
Sclaroff, Stan, Boston Univ.<br />
We propose an approach for improving object recognition and localization using spatial kernels together with instance embedding.<br />
Our approach treats each image as a bag of instances (image features) within a multiple instance learning framework,<br />
where the relative locations of the instances are considered as well as the appearance similarity of the localized image features.<br />
The introduced spatial kernel augments the recognition power of the instance embedding in an intuitive and effective way,<br />
providing increased localization performance. We test our approach over two object datasets and present promising results.<br />
15:00-17:10, Paper MoBT8.46<br />
Co-Recognition of Actions in Video Pairs<br />
Shin, Young Min, Seoul National Univ.<br />
Cho, Minsu, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
In this paper, we present a method that recognizes single or multiple common actions between a pair of video sequences.<br />
We establish an energy function that evaluates geometric and photometric consistency, and solve the action recognition<br />
problem by optimizing the energy function. The proposed stochastic inference algorithm based on the Monte Carlo method<br />
explores the video pair from the local spatio-temporal interest point matches to find the common actions. Our algorithm<br />
works in an unsupervised way without prior knowledge about the type and the number of common actions. Experiments<br />
show that our algorithm produces promising results on single and multiple action recognition.<br />
15:00-17:10, Paper MoBT8.47<br />
Detecting Moving Objects using a Camera on a Moving Platform<br />
Lin, Chung-Ching, Georgia Inst. of Tech.<br />
Wolf, Marilyn, Georgia Inst. of Tech.<br />
This paper proposes a new ego-motion estimation and background/foreground classification method to effectively segment<br />
moving objects from videos captured by a moving camera on a moving platform. Existing methods for moving-camera<br />
detection impose serious constraints. In our approach, an ellipsoid scene shape is applied in the motion model and a complicated<br />
ego-motion estimation formula is derived. A genetic algorithm is introduced to accurately solve the ego-motion parameters.<br />
After motion recovery, the noisy result is refined by motion vector correlation and the foreground is classified by a pixel-level probability<br />
model. Experimental results show that the method achieves significant detection performance without further<br />
restrictions and performs effectively in complex detection environments.<br />
- 49 -
15:00-17:10, Paper MoBT8.48<br />
A Unified Probabilistic Approach to Feature Matching and Object Segmentation<br />
Kim, Tae Hoon, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
Lee, Sang Uk, Seoul National Univ.<br />
This paper deals with feature matching and segmentation of common objects in a pair of images, simultaneously. For the<br />
feature matching problem, the matching likelihoods of all feature correspondences are obtained by combining their discriminative<br />
power with the spatial coherence constraint that favors their spatial aggregation via object segmentation. At<br />
the same time, for the object segmentation problem, our algorithm estimates the object likelihood that each subregion is<br />
a commonly existing part in two images by the affinity propagation of the resulting matching likelihoods. Since these two<br />
problems are related to each other, our main idea to solve them is to integrate all the priors about them into a unified framework<br />
that consists of several correlated quadratic cost functions. Eventually, all matching and object likelihoods are estimated<br />
simultaneously as the solution of a linear system of equations. Based on these likelihoods, we finally recover the optimal<br />
feature matches and the common object parts by imposing simple sequential mapping and thresholding techniques, respectively.<br />
The experiments demonstrate the superiority of our algorithm compared with the conventional methods.<br />
15:00-17:10, Paper MoBT8.49<br />
Automatic Restoration of Scratch in Old Archive<br />
Kim, Kyung-Tai, Konkuk Univ.<br />
Kim, Byunggeun, Konkuk Univ.<br />
Kim, Eun Yi, Konkuk Univ.<br />
This paper presents a scratch restoration method that can deal with scratches of various lengths and widths in old film. The<br />
proposed method consists of detection and reconstruction. The detection is performed using texture and shape properties<br />
of the scratches: first, each pixel is classified as scratch or non-scratch using a neural network (NN)-based texture<br />
classifier, and then some false alarms are removed by shape filtering. Thereafter, the detected region is reconstructed.<br />
Here, the reconstruction is formulated as an energy minimization problem, and a genetic algorithm is used for optimization.<br />
Experimental results on well-known old films show the effectiveness of the proposed method.<br />
15:00-17:10, Paper MoBT8.50<br />
Automatic Building Detection in Aerial Images using a Hierarchical Feature based Image Segmentation<br />
Izadi, Mohammad, Simon Fraser Univ.<br />
Saeedi, Parvaneh, Simon Fraser Univ.<br />
This paper introduces a novel automatic building detection method for aerial images. The proposed method incorporates<br />
a hierarchical multilayer feature based image segmentation technique using color. A number of geometrical/regional attributes<br />
are defined to identify potential regions in multiple layers of segmented images. A tree-based mechanism is utilized<br />
to inspect segmented regions using their spatial relationships with each other and their regional/geometrical characteristics.<br />
This process allows the creation of a set of candidate regions that are validated as rooftops based on the overlap between<br />
existing and predicted shadows of each region according to the image acquisition information. Experimental results show<br />
an overall shape accuracy and completeness of 96%.<br />
15:00-17:10, Paper MoBT8.51<br />
Making Visual Object Categorization More Challenging: Randomized Caltech-101 Data Set<br />
Kinnunen, Juha Teemu Ensio, Lappeenranta Univ. of Tech.<br />
Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />
Lensu, Lasse, Lappeenranta Univ. of Tech.<br />
Lankinen, Jukka, Lappeenranta Univ. of Tech.<br />
Kalviainen, Heikki, Lappeenranta Univ. of Tech.<br />
Visual object categorization is one of the most active research topics in computer vision, and Caltech-101 data set is one<br />
of the standard benchmarks for evaluating method performance. Despite its wide use, the data set has certain weaknesses:<br />
i) the objects are practically in a standard pose and scale in the middle of the images, and ii) the background varies too<br />
little in certain categories making it more discriminative than the foreground objects. In this work, we demonstrate how<br />
these weaknesses bias the evaluation results in an undesired manner. In addition, we reduce the bias effect by replacing<br />
- 50 -
the backgrounds with random landscape images from Google and by applying random Euclidean transformations to the<br />
foreground objects. We demonstrate how the proposed randomization process makes visual object categorization more<br />
challenging, improving the relative results of methods which categorize objects by their visual appearance and are invariant<br />
to pose changes. The new data set is made publicly available for other researchers.<br />
15:00-17:10, Paper MoBT8.52<br />
A Reliability Assessment Paradigm for Automated Video Tracking Systems<br />
Chen, Chung-Hao, North Carolina Central Univ.<br />
Yao, Yi, GE Global Res.<br />
Koschan, Andreas, The Univ. of Tennessee<br />
Abidi, Mongi, The Univ. of Tennessee<br />
Most existing performance evaluation methods concentrate on defining separate metrics over a wide range of conditions<br />
and generating standard benchmarking video sequences for examining the effectiveness of video tracking systems. In<br />
other words, these methods attempt to design a robustness margin or factor for the system. These methods are deterministic,<br />
in that a robustness factor, for example, 2 or 3 times the expected number of subjects to track or the strength of illumination,<br />
would be required in the design. This often results in over-design, thus increasing costs, or under-design, causing<br />
failure by unanticipated factors. In order to overcome these limitations, we propose in this paper an alternative framework<br />
to analyze the physics of the failure process and, through the concept of reliability, determine the time to failure in automated<br />
video tracking systems. The benefit of our proposed framework is that we can provide a unified and statistical index<br />
to evaluate the performance of an automated video tracking system for a task to be performed. At the same time, the uncertainty<br />
problem about a failure process, which may be caused by the system's complexity, imprecise measurements of the<br />
relevant physical constants and variables, or the indeterminate nature of future events, can be addressed accordingly based<br />
on our proposed framework.<br />
15:00-17:10, Paper MoBT8.53<br />
Road Sign Detection in Images: A Case Study<br />
Belaroussi, Rachid, Univ. Paris Est,INRETS-LCPC<br />
Foucher, Philippe, Lab. Des Ponts et Chaussées<br />
Tarel, Jean-Philippe, LCPC<br />
Soheilian, Bahman, Ins. Géographique National,<br />
Charbonnier, Pierre, ERA27 LCPC – LRPC<br />
Paparoditis, Nicolas, Inst. Geographique National<br />
Road sign identification in images is an important issue, in particular for vehicle safety applications. It is usually tackled<br />
in three stages: detection, recognition and tracking, and evaluated as a whole. To progress towards better algorithms, we<br />
focus in this paper on the first stage of the process, namely road sign detection. More specifically, we compare, on the<br />
same ground-truth image database, results obtained by three algorithms that sample different state-of-the-art approaches.<br />
The three tested algorithms: Contour Fitting, Radial Symmetry Transform, and pair-wise voting scheme, all use color and<br />
edge information and are based on geometrical models of road signs. The test dataset is made of 847 images (960x1080) of<br />
complex urban scenes (available at www.itowns.fr/benchmarking.html). They feature 251 road signs of different shapes<br />
(circular, rectangular, triangular), sizes and types. The pros and cons of the three algorithms are discussed, allowing new<br />
research perspectives to be drawn.<br />
15:00-17:10, Paper MoBT8.54<br />
ImageCLEF@ICPR Contest: Challenges, Methodologies and Results of the Photo Annotation Task<br />
Nowak, Stefanie, Fraunhofer Inst. For Digital Media Tech.<br />
The Photo Annotation Task is performed as one task in the ImageCLEF@ICPR contest and poses the challenge to annotate<br />
53 visual concepts in Flickr photos. Altogether 12 research teams met the multilabel classification challenge and submitted<br />
solutions. The participants were provided with a training and a validation set consisting of 5,000 and 3,000 annotated images,<br />
respectively. The test was performed on 10,000 images. Two evaluation paradigms have been applied, the evaluation per<br />
concept and the evaluation per example. The evaluation per concept was performed by calculating the Equal Error Rate and<br />
the Area Under Curve (AUC). The evaluation per example utilizes a recently proposed Ontology Score. For the concepts, an<br />
average AUC of 86.5% could be achieved, including concepts with an AUC of 96%. The classification performance for each<br />
image ranged between 59% and 100% with an average score of 85%.<br />
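Both per-concept measures are standard and easy to state exactly: the AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative (ties counting half), and the Equal Error Rate is the point of a threshold sweep where false-accept and false-reject rates cross. A small reference sketch:<br />

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank statistic: the probability
    that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def equal_error_rate(scores, labels):
    """Sweep thresholds for the point where false-accept (FAR) and
    false-reject (FRR) rates cross; return their mean at the closest gap."""
    negs, poss = labels.count(0), labels.count(1)
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s, l in zip(scores, labels) if l == 0) / negs
        frr = sum(s < t for s, l in zip(scores, labels) if l == 1) / poss
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```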
- 51 -
15:00-17:10, Paper MoBT8.55<br />
Task-Oriented Evaluation of Super-Resolution Techniques<br />
Tian, Li, NTT Corp.<br />
Suzuki, Akira, NTT Cyber Space Lab.<br />
Koike, Hideki, NTT Corp.<br />
The goal of super-resolution (SR) techniques is to enhance the resolution of low-resolution (LR) images. How to evaluate<br />
the performance of an SR algorithm is often overlooked as researchers keep producing new algorithms. This paper presents<br />
a task-oriented method for evaluating SR techniques. Our method includes both objective and subjective measures and is<br />
designed from the viewpoint of how SR impacts many essential image processing and vision tasks. We evaluate some<br />
state-of-the-art SR algorithms and the results suggest that different SR algorithms should be utilized for different applications.<br />
In general, the results reflect the consistency and conflict between objective and subjective measures, as well as between computer<br />
vision systems and human vision systems.<br />
15:00-17:10, Paper MoBT8.56<br />
FeEval – a Dataset for Evaluation of Spatio-Temporal Local Features<br />
Stoettinger, Julian, TU Vienna<br />
Zambanini, Sebastian, TU Vienna<br />
Khan, Rehanullah, TU Vienna<br />
Hanbury, Allan, Information Retrieval Facility<br />
The most successful approaches to video understanding and video matching use local spatio-temporal features as a sparse<br />
representation for video content. Until now, no principled evaluation of these features has been done. We present FeEval,<br />
a dataset for the evaluation of such features. For the first time, this dataset allows for a systematic measurement of the stability<br />
and the invariance of local features in videos. FeEval consists of 30 original videos from a great variety of different<br />
sources, including HDTV shows, 1080p HD movies and surveillance cameras. The videos are iteratively varied by increasing<br />
blur and noise, increasing or decreasing light, median filtering, compression quality, scale and rotation, leading to a total<br />
of 1710 video clips. Homography matrices are provided for geometric transformations. The surveillance videos are taken<br />
from 4 different angles in a calibrated environment. Similar to prior work on 2D images, this leads to a repeatability and<br />
matching measurement in videos for spatio-temporal features, estimating the overlap of features under increasing changes<br />
in the data.<br />
15:00-17:10, Paper MoBT8.57<br />
Performance Evaluation Tools for Zone Segmentation and Classification (PETS)<br />
Seo, Wontaek, Univ. of Maryland<br />
Agrawal, Mudit, Univ. of Maryland<br />
Doermann, David, Univ. of Maryland<br />
This paper describes a set of Performance Evaluation Tools (PETS) for document image zone segmentation and classification.<br />
The tools allow researchers and developers to evaluate, optimize and compare their algorithms by providing a<br />
variety of quantitative performance metrics. The evaluation of segmentation quality is based on the pixel-based overlaps<br />
between two sets of zones proposed by Randriamasy and Vincent. PETS extends the approach by providing a set of metrics<br />
for overlap analysis, RLE and polygonal representations of zones, and introduces type-matching to evaluate zone classification.<br />
The software is available for research use.<br />
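At its core, pixel-based zone overlap is set arithmetic on zone pixel sets. The simplified sketch below scores each ground-truth zone by its best intersection-over-union among detections; PETS itself implements the fuller Randriamasy-Vincent analysis plus type-matching, which this does not reproduce.<br />

```python
def zone_overlap_scores(gt_zones, det_zones):
    """Pixel-based overlap between ground-truth and detected zones.

    Zones are sets of (x, y) pixel coordinates; each ground-truth zone is
    scored by its best intersection-over-union among all detections."""
    scores = []
    for g in gt_zones:
        best = 0.0
        for d in det_zones:
            union = len(g | d)
            if union:
                best = max(best, len(g & d) / union)
        scores.append(best)
    return scores
```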
MoBT9 Upper Foyer<br />
Feature Extraction; Classification; Clustering; Bayesian Methods Poster Session<br />
Session chair: Pietikäinen, Matti (Univ of Oulu)<br />
15:00-17:10, Paper MoBT9.1<br />
Shape Filling Rate for Silhouette Representation and Recognition<br />
An, Guocheng, Chinese Acad. of Sciences<br />
Zhang, Fengjun, Chinese Acad. of Sciences<br />
Wang, Hong’An, Chinese Acad. of Sciences<br />
Dai, Guozhong, Chinese Acad. of Sciences<br />
- 52 -
Research on complex shape recognition has shown that the shape context algorithm is sensitive to relative position variation<br />
of articulation. To address this problem, a shape recognition method is proposed based on the local shape filling rate of various<br />
object silhouettes. We take each landmark point as a circle center and vary its radius. Then, under a particular radius, the<br />
ratio between the covered silhouette pixels and the total pixels is defined as the local shape filling rate. Thus, different radii<br />
may form different local shape filling rates. All landmark points with different radii constitute a characteristic matrix<br />
which effectively reflects the overall statistical properties of the object shape. Experiments on a variety of shape databases<br />
show that the novel method is insensitive to articulation and little influenced by the number of landmark points, so our algorithm<br />
has strong power in describing object details.<br />
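The characteristic matrix described above can be written down directly: one row per landmark, one column per radius, each entry the fraction of disc pixels covered by the silhouette. In this sketch discs are clipped at the image border, an assumption the abstract does not settle.<br />

```python
import numpy as np

def shape_filling_rates(silhouette, landmarks, radii):
    """Characteristic matrix of local shape filling rates.

    silhouette: binary 2-D array; landmarks: list of (row, col) centers;
    radii: list of disc radii. Entry (i, j) is the fraction of pixels
    inside disc j around landmark i that belong to the silhouette."""
    h, w = silhouette.shape
    yy, xx = np.mgrid[0:h, 0:w]
    M = np.zeros((len(landmarks), len(radii)))
    for i, (cy, cx) in enumerate(landmarks):
        d2 = (yy - cy) ** 2 + (xx - cx) ** 2
        for j, r in enumerate(radii):
            disc = d2 <= r * r          # disc clipped at the image border
            M[i, j] = silhouette[disc].sum() / disc.sum()
    return M
```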
15:00-17:10, Paper MoBT9.2<br />
Learning GMM using Elliptically Contoured Distributions<br />
Li, Bo, Beijing Inst. of Tech.<br />
Liu, Wenju, Chinese Acad. of Sciences<br />
Dou, Lihua, Beijing Inst. of Tech.<br />
Model order selection and parameter estimation for Gaussian mixture model (GMM) are important issues for clustering<br />
analysis and density estimation. Most methods for model selection usually add a penalty term in the objective function<br />
that can penalize the models and choose an optimal one from a set of candidate models. This paper presents a simple and<br />
novel approach to determine the number of components and simultaneously estimate the parameters for GMM. By introducing<br />
the degenerating model, the proposed approach overcomes the drawback that the likelihood estimate is a non-decreasing<br />
function and cannot be used to select the number of components. The degenerating model is a more general form<br />
of the mixture component density and it can degenerate into the component density or a crater-like density when its parameter<br />
K varies from 1 to a larger value. The likelihood of the crater-like density evaluated for the training data approaches<br />
zero. This characteristic of the degenerating model forms the foundation of the proposed approach. The experimental<br />
results show a robust and evident performance improvement of the approach.<br />
15:00-17:10, Paper MoBT9.3<br />
FIND: A Neat Flip Invariant Descriptor<br />
Guo, Xiaojie, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
In this paper, we introduce a novel Flip Invariant Descriptor (FIND). FIND improves the degraded performance resulting<br />
from image flips and reduces both space and time costs. The flip invariance of FIND enables the otherwise intractable flip detection to<br />
be achieved easily, instead of duplicating the procedure. To alleviate the pressure brought by the increasing<br />
scale of image and video data, FIND utilizes a concise structure with less storage space. Compared to SIFT, FIND reduces<br />
descriptor length by 35.94%. We compare FIND against SIFT with respect to accuracy, speed and space cost. An application<br />
to image search over a database of 3.27 million descriptors is also shown.<br />
15:00-17:10, Paper MoBT9.4<br />
Matching Image with Multiple Local Features<br />
Cao, Yudong, Beijing Univ. of Posts and Telecommunications/ Liaoning Univ. of Tech<br />
Zhang, Honggang, Beijing Univ. of Posts and Telecommunications<br />
Gao, Yanyan, Beijing Univ. of Posts and Telecommunications<br />
Xu, Xiaojun, Beijing Univ. of Posts and Telecommunications<br />
Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />
In this paper, we present a fused feature composed of Affine-SIFT, MSER and color moment invariants. The fused<br />
feature is more robust and distinctive than any single local feature. Instead of simply adding the three local features together, an<br />
efficient two-level matching strategy is devised for the fused feature, which speeds up the establishment of local<br />
correspondences. To remove some false positives, an affine transformation is estimated with a weighted RANSAC<br />
that reduces the number of iterations. The experimental results show that our approach achieves more accurate correspondences.<br />
Finally, we discuss the prospect of applying the fused feature and matching strategy to image retrieval.<br />
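The weighted RANSAC variant is not specified in the abstract. As a point of reference, a minimal plain-RANSAC estimator of a 2D affine transform from point correspondences (a generic sketch, not the authors' method; all parameter choices are hypothetical) might look like:

```python
import numpy as np

def fit_affine(src, dst):
    # Solve dst ~ [x y 1] @ M by least squares; M is 3x2.
    X = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M

def ransac_affine(src, dst, n_iter=200, thresh=1.0, seed=None):
    """Plain RANSAC: repeatedly fit an affine map to 3 random
    correspondences and keep the model with the most inliers."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(X @ M - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best model.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

A weighted variant would bias the minimal-sample selection or the inlier count by match quality, which is what reduces the required number of iterations.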
15:00-17:10, Paper MoBT9.5<br />
Lipreading: A Graph Embedding Approach<br />
Zhou, Ziheng, Univ. of Oulu<br />
Zhao, Guoying, Univ. of Oulu<br />
Pietikäinen, Matti, Univ. of Oulu<br />
In this paper, we propose a novel graph embedding method for the problem of lipreading. To characterize the temporal<br />
connections among video frames of the same utterance, a new distance metric is defined on a pair of frames and graphs<br />
are constructed to represent the video dynamics based on the distances between frames. Audio information is used to assist<br />
in calculating such distances. For each utterance, a subspace of the visual feature space is learned from a well-defined intrinsic<br />
and penalty graph within a graph-embedding framework. Video dynamics are found to be well preserved along<br />
some dimensions of the subspace. Discriminatory cues are then decoded from curves of the projected visual features to<br />
classify different utterances.<br />
15:00-17:10, Paper MoBT9.6<br />
Face Recognition using a Multi-Manifold Discriminant Analysis Method<br />
Yang, Wankou, Southeast Univ. Nanjing<br />
Sun, Changyin, Southeast Univ. Nanjing<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
In this paper, we propose a Multi-Manifold Discriminant Analysis (MMDA) method for face feature extraction and face<br />
recognition, based on graph-embedded learning under the Fisher discriminant analysis framework. In MMDA,<br />
a within-class graph and a between-class graph are designed to characterize the within-class compactness and the between-class<br />
separability, respectively, seeking the discriminant matrix that simultaneously maximizes the between-class<br />
scatter and minimizes the within-class scatter. In addition, the within-class graph can also represent the sub-manifold<br />
information and the between-class graph the multi-manifold information. The proposed MMDA is evaluated<br />
on the FERET face database, and the experimental results demonstrate that MMDA works well in feature<br />
extraction and leads to good recognition performance.<br />
15:00-17:10, Paper MoBT9.7<br />
Globally-Preserving based Locally Linear Embedding<br />
Hui, Kanghua, Chinese Acad. of Sciences<br />
Wang, Chunheng, Chinese Acad. of Sciences<br />
Xiao, Baihua, Chinese Acad. of Sciences<br />
The locally linear embedding (LLE) algorithm is a powerful method for nonlinear dimensionality<br />
reduction. In this paper, a new method called globally-preserving based LLE (GPLLE) is proposed. It not only<br />
preserves the local neighborhood, but also keeps distant samples far apart. This solves a problem that LLE<br />
may encounter: LLE preserves only the local neighborhood and cannot prevent distant samples from drawing near.<br />
Moreover, GPLLE can estimate the intrinsic dimensionality d of the manifold structure. The experimental results show that<br />
GPLLE consistently achieves better classification performance than LLE based on the estimated d.<br />
15:00-17:10, Paper MoBT9.8<br />
3d Human Pose Estimation by an Annealed Two-Stage Inference Method<br />
Wang, Yuan-Kai, Fu Jen Univ.<br />
Cheng, Kuang-You, Fu Jen Univ.<br />
This paper proposes a novel human motion capture method that locates human body joint position and reconstructs the<br />
human pose in 3D space from monocular images. We propose a two-stage framework including 2D and 3D probabilistic<br />
graphical models which can solve the occlusion problem for the estimation of human joint positions. The 2D and 3D<br />
models adopt directed acyclic structure to avoid error propagation of inference in the models. Both the 2D and 3D models<br />
utilize the Expectation-Maximization algorithm to learn prior distributions of the models. An annealed Gibbs sampling<br />
method is proposed for the two-stage framework to infer the maximum a posteriori distributions of joint positions. The annealing<br />
process can efficiently explore the modes of the distributions and find solutions in high-dimensional space. Experiments<br />
are conducted on the HumanEva dataset to show the effectiveness of the proposed method. The experimental data are<br />
image sequences of walking motion with a full 180° turn around a region, which causes occlusion of poses and loss of<br />
image observations. Experimental results show that the proposed two-stage approach can efficiently estimate more accurate<br />
human poses from monocular images.<br />
15:00-17:10, Paper MoBT9.9<br />
Extended Locality Preserving Discriminant Analysis for Face Recognition<br />
Yang, Liping, Chongqing Univ.<br />
Gong, Weiguo, Chongqing Univ.<br />
Gu, Xiaohua, Chongqing Univ.<br />
In this paper, an extended locality preserving discriminant analysis (ELPDA) method is proposed. To address the disadvantages<br />
of original locality preserving discriminant analysis (LPDA), a new locality preserving between-class scatter,<br />
which is characterized by each sample and its k nearest out-of-class neighbors, is defined. Moreover, the small<br />
sample size problem is also avoided by solving a new optimization function. Experimental results on AR and FERET subsets<br />
illustrate the effectiveness of the proposed method for face recognition.<br />
15:00-17:10, Paper MoBT9.10<br />
Beyond “Near-Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval<br />
Baluja, Shumeet, Google, Inc.<br />
Covell, Michele, Google, Inc.<br />
Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we<br />
present a two-tier similar-image retrieval system with the efficiency characteristics found in simpler systems designed to<br />
recognize near-duplicates. We compare the efficiency of lookups based on random projections and learned hashes to<br />
100-times-more-frequent exemplar sampling. Both approaches significantly improve on the results from exemplar sampling,<br />
despite having significantly lower computational costs. Learned-hash keys provide the best result, in terms of both recall<br />
and efficiency.<br />
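The paper's learned hash functions are not described here, but the random-projection baseline it compares against can be sketched with the standard sign-of-random-projection hashing construction (a generic illustration, not the authors' code):

```python
import numpy as np

def hash_codes(X, n_bits=32, seed=0):
    """Random-hyperplane LSH: each bit is the sign of one random
    projection, so nearby vectors tend to share most bits."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two hash codes.
    return int(np.sum(a != b))
```

Similar images then map to codes at small Hamming distance, so lookups reduce to cheap bit comparisons rather than full feature-space search.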
15:00-17:10, Paper MoBT9.11<br />
Rare Class Classification on SVM<br />
He, He, The Hong Kong Pol. Univ.<br />
Ghodsi, Ali, University of Waterloo<br />
The problem of classification on highly imbalanced datasets has been studied extensively in the literature. Most classifiers<br />
show significant deterioration in performance when dealing with skewed datasets. In this paper, we first examine the underlying<br />
reasons for SVM’s deterioration on imbalanced datasets. We then propose two modifications for the soft margin<br />
SVM, where we change or add constraints to the optimization problem. The proposed methods are compared with regular<br />
SVM, cost-sensitive SVM and two re-sampling methods. Our experimental results demonstrate that the constrained SVM<br />
consistently outperforms the competing methods.<br />
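The abstract does not detail the two proposed constraint modifications, but the cost-sensitive baseline it compares against can be illustrated with a per-class weighted hinge loss trained by subgradient descent (a minimal generic sketch, not the paper's formulation; all parameters are illustrative):

```python
import numpy as np

def weighted_linear_svm(X, y, class_cost, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM with per-class misclassification costs,
    trained by subgradient descent on
        lam * ||w||^2 + mean_i c_i * max(0, 1 - y_i (w.x_i + b)).
    y takes values in {-1, +1}; class_cost maps label -> cost."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    c = np.array([class_cost[int(yi)] for yi in y])
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0          # points inside the margin
        gw = 2 * lam * w - (c[active, None] * y[active, None] * X[active]).sum(0) / n
        gb = -(c[active] * y[active]).sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Giving the rare class a larger cost pushes the decision boundary away from it, which is the usual remedy for the imbalance-induced deterioration the abstract describes.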
15:00-17:10, Paper MoBT9.12<br />
Package Boosting for Readaption of Cascaded Classifiers<br />
Szczot, Magdalena, Daimler AG<br />
Löhlein, Otto, Daimler AG<br />
Forster, Julian, Daimler AG<br />
Palm, Günther, Univ. of Ulm<br />
This contribution presents an efficient and useful way to readapt a cascaded classifier. We introduce Package Boosting<br />
which combines the advantages of Real Adaboost and Online Boosting for the realization of the strong learners in each<br />
cascade layer. We also examine the conditions which need to be fulfilled by a cascade in order to meet the requirements<br />
of an online algorithm and present the evaluation results of the system.<br />
15:00-17:10, Paper MoBT9.13<br />
Baby-Posture Classification from Pressure-Sensor Data<br />
Boughorbel, Sabri, Philips Res. Lab.<br />
Bruekers, Fons, Philips Res. Lab.<br />
Breebaart, Jeroen, Philips Res. Lab.<br />
The activity, and more specifically the posture, of babies is an important aspect of their safety and development.<br />
In this paper, we study the automatic classification of baby posture using a pressure-sensitive mat. The posture classification<br />
problem is formulated as the design of features that describe the pressure patterns induced by the child, in combination<br />
with generic classifiers. Novel rotation-invariant features are constructed from high-order statistics computed over concentric<br />
rings around the center of gravity. Non-constant ring radii are used to ensure uniform cell areas and therefore<br />
equal importance of features. A vote fusion of various generic classifiers is used for classification. Temporal information<br />
was shown to improve the classification performance. The obtained results are promising and open new opportunities for<br />
applications and further research in the area of baby safety and development.<br />
15:00-17:10, Paper MoBT9.14<br />
Vector Quantization Mappings for Speaker Verification<br />
Brew, Anthony, Univ. Coll. Dublin<br />
Cunningham, Pádraig, Univ. Coll. Dublin<br />
In speaker verification several techniques have emerged to map variable length utterances into a fixed dimensional space<br />
for classification. One popular approach uses Maximum A-Posteriori (MAP) adaptation of a Gaussian Mixture Model<br />
(GMM) to create a super-vector. This paper investigates using Vector Quantisation (VQ) as the global model to provide a<br />
similar mapping. This less computationally complex mapping gives comparable results to its GMM counterpart while<br />
also providing the ability for an efficient iterative update enabling media files to be scanned with a fixed length window.<br />
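As a rough illustration of the VQ mapping idea, a k-means codebook can map a variable-length sequence of feature frames to a fixed-length codeword histogram (one standard construction, assumed here for illustration rather than taken from the paper):

```python
import numpy as np

def train_codebook(frames, k=8, n_iter=25, seed=0):
    """Lloyd's k-means over all training frames; the centers are the
    VQ codebook (the 'global model')."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(frames[:, None] - centers[None], axis=2)
        assign = d.argmin(1)
        for j in range(k):
            pts = frames[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def utterance_vector(frames, centers):
    """Map a variable-length utterance to a fixed-length, normalized
    histogram of nearest-codeword counts."""
    d = np.linalg.norm(frames[:, None] - centers[None], axis=2)
    hist = np.bincount(d.argmin(1), minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Because the histogram can be updated one frame at a time, this kind of mapping supports the efficient sliding-window scanning the abstract mentions.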
15:00-17:10, Paper MoBT9.15<br />
Maximum Entropy Model based Classification with Feature Selection<br />
Dukkipati, Ambedkar, Indian Inst. of Science<br />
Yadav, Abhay Kumar, Indian Inst. of Science<br />
M, Narasimha Murty, Indian Inst. of Science<br />
In this paper, we propose a classification algorithm based on the maximum entropy principle. This algorithm finds the<br />
most appropriate class-conditional maximum entropy distributions for classification. No prior knowledge about the form<br />
of the density function for estimating the class-conditional density is assumed, except that information is given in the form<br />
of expected values of features. This algorithm also incorporates a method to select relevant features for classification. The<br />
proposed algorithm is suitable for large data-sets and is demonstrated by simulation results on some real world benchmark<br />
data-sets.<br />
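Under expected-value feature constraints, the maximum entropy distribution takes an exponential-family form, and for discriminative classification this is equivalent to multinomial logistic regression. A minimal sketch of that equivalence (not the authors' algorithm, and without their feature-selection step):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, y, n_classes, lr=0.5, epochs=300):
    """Gradient descent on the multinomial logistic (maxent) likelihood.
    At the optimum, model feature expectations match the empirical ones.
    Append a constant column to X if a bias term is needed."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]               # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n        # gradient of cross-entropy
    return W
```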
15:00-17:10, Paper MoBT9.16<br />
Dimensionality Reduction by Minimal Distance Maximization<br />
Xu, Bo, Chinese Acad. of Sciences<br />
Huang, Kaizhu, Chinese Acad. of Sciences<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
In this paper, we propose a novel discriminant analysis method, called Minimal Distance Maximization (MDM). In contrast<br />
to the traditional LDA, which actually maximizes the average divergence among classes, MDM attempts to find a low-dimensional<br />
subspace that maximizes the minimal (worst-case) divergence among classes. This “minimal” setting solves<br />
the problem caused by the “average” setting of LDA that tends to merge similar classes with smaller divergence when<br />
used for multi-class data. Furthermore, we elegantly formulate the worst-case problem as a convex problem, making the<br />
algorithm solvable for larger data sets. Experimental results demonstrate the advantages of our proposed method against<br />
five other competitive approaches on one synthetic and six real-life data sets.<br />
15:00-17:10, Paper MoBT9.17<br />
Possibilistic Clustering based on Robust Modeling of Finite Generalized Dirichlet Mixture<br />
Ben Ismail, Maher, Univ. of Louisville<br />
Frigui, Hichem, Univ. of Louisville<br />
We propose a novel possibilistic clustering algorithm based on robust modelling of the Generalized Dirichlet (GD) finite<br />
mixture. The algorithm generates two types of membership degrees. The first one is a posterior probability that indicates<br />
the degree to which the point fits the estimated distribution. The second membership represents the degree of typicality<br />
and is used to identify and discard noise points. The algorithm minimizes a single objective function to optimize the GD mixture<br />
parameters and possibilistic membership values. This optimization is done iteratively by dynamically updating the Dirichlet<br />
mixture parameters and the membership values in each iteration. We compare the performance of the proposed algorithm<br />
with an EM based approach. We show that the possibilistic approach is more robust.<br />
15:00-17:10, Paper MoBT9.18<br />
Cluster-Pairwise Discriminant Analysis<br />
Makihara, Yasushi, The Inst. of Scientific and Industrial Res. Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
Pattern recognition problems often suffer from large intra-class variation caused by situational factors such as pose,<br />
walking speed, and clothing in gait recognition. This paper describes a method of discriminant subspace analysis<br />
focused on situation cluster pairs. In the training phase, both a situation cluster discriminant subspace and class discriminant<br />
subspaces for each situation cluster pair are learned using training samples of non-recognition-target classes. In the testing phase, given<br />
a matching pair of patterns of recognition-target classes, the posterior over situation cluster pairs is estimated first, and then<br />
the distance is calculated in the corresponding cluster-pairwise class discriminant subspace. Experiments with both<br />
simulated and real data show the effectiveness of the proposed method.<br />
15:00-17:10, Paper MoBT9.19<br />
Online Discriminative Kernel Density Estimation<br />
Kristan, Matej, Univ. of Ljubljana<br />
Leonardis, Ales, Univ. of Ljubljana<br />
We propose a new method for online estimation of probabilistic discriminative models. The method is based on the recently<br />
proposed online Kernel Density Estimation (oKDE) framework which produces Gaussian mixture models and allows<br />
adaptation using only a single data point at a time. The oKDE builds reconstructive models from the data, and we extend<br />
it to take into account the interclass discrimination through a new distance function between the classifiers. We arrive at<br />
an online discriminative Kernel Density Estimator (odKDE). We compare the odKDE to oKDE, batch state-of-the-art<br />
KDEs and support vector machine (SVM) on a standard database. The odKDE achieves comparable classification performance<br />
to that of best batch KDEs and SVM, while allowing online adaptation, and produces models of lower complexity<br />
than the oKDE.<br />
15:00-17:10, Paper MoBT9.20<br />
Local Outlier Detection based on Kernel Regression<br />
Gao, Jun, Chinese Acad. of Sciences<br />
Hu, Weiming, Chinese Acad. of Sciences<br />
Li, Wei, Chinese Acad. of Sciences<br />
Zhang, Zhongfei, State Univ. of New York, Binghamton<br />
Wu, Ou, Chinese Acad. of Sciences<br />
Outlier detection remains an important and attractive task in knowledge discovery in databases. In this paper, a novel<br />
approach named Multi-scale Local Kernel Regression is proposed. It recasts the unsupervised outlier detection problem<br />
as classical non-parametric regression. After preprocessing the original data with a basic local density-based<br />
method, it applies a local kernel regression estimator over multi-scale neighborhoods to determine outliers. Experiments<br />
on several real-life data sets demonstrate that this approach is promising in detection performance.<br />
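The multi-scale method itself is not given in the abstract, but its kernel-regression ingredient can be illustrated with leave-one-out Nadaraya-Watson residuals: a point that its neighbors predict poorly is a candidate outlier (an illustrative sketch with a Gaussian kernel, not the paper's estimator):

```python
import numpy as np

def loo_nw_residuals(x, y, bandwidth=0.3):
    """Leave-one-out Nadaraya-Watson regression residuals: each point's
    value is predicted from all other points, weighted by a Gaussian
    kernel on distance; a large residual flags a candidate outlier."""
    diffs = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    np.fill_diagonal(K, 0.0)                 # exclude the point itself
    y_hat = (K @ y) / K.sum(axis=1)          # local weighted average
    return np.abs(y - y_hat)
```

Running the same estimator at several bandwidths is one way to realize the "multiple scale neighborhoods" idea.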
15:00-17:10, Paper MoBT9.21<br />
Verification under Increasing Dimensionality<br />
Hendrikse, Anne, Univ. of Twente<br />
Veldhuis, Raymond, Univ. of Twente<br />
Spreeuwers, Luuk, Univ. of Twente<br />
Verification decisions are often based on second order statistics estimated from a set of samples. Ongoing growth of computational<br />
resources allows for considering more and more features, increasing the dimensionality of the samples. If the<br />
dimensionality is of the same order as the number of samples used in the estimation or even higher, then the accuracy of<br />
the estimate decreases significantly. In particular, the eigenvalues of the covariance matrix are estimated with a bias, and<br />
the estimates of the eigenvectors differ considerably from the real eigenvectors. We show how a classical approach to verification<br />
in high dimensions is severely affected by these problems, and we show how bias correction methods can reduce<br />
these problems.<br />
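The eigenvalue bias described above is easy to reproduce: with the true covariance equal to the identity, every true eigenvalue is 1, yet the sample spectrum spreads widely once the dimensionality approaches the sample count. A small demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 120                     # dimensionality close to sample count
X = rng.standard_normal((n, d))     # true covariance is the identity
S = np.cov(X, rowvar=False)         # sample covariance estimate
eig = np.linalg.eigvalsh(S)
# All true eigenvalues are 1, but the estimates are strongly biased:
print(eig.min(), eig.max())
```

The spread follows the Marchenko-Pastur law; bias-correction methods of the kind the abstract mentions aim to shrink the spectrum back toward its true values.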
15:00-17:10, Paper MoBT9.22<br />
Discriminant Feature Manifold for Facial Aging Estimation<br />
Fang, Hui, Swansea Univ.<br />
Grant, Phil, Swansea Univ.<br />
Min, Chen, Swansea Univ.<br />
Computerised facial aging estimation, which has the potential for many applications in human-computer interactions, has<br />
been investigated by many computer vision researchers in recent years. In this paper, a feature-based discriminant subspace<br />
is proposed to extract more discriminating and robust representations for aging estimation. After aligning all the faces by<br />
a piece-wise affine transform, orthogonal locality preserving projection (OLPP) is employed to project local binary patterns<br />
(LBP) from the faces into an age-discriminant subspace. The feature extracted from this manifold is more distinctive for<br />
age estimation than the features used in state-of-the-art methods. Based on the public database FG-NET,<br />
the performance of the proposed feature is evaluated by using two different regression techniques, quadratic function and<br />
neural-network regression. The proposed feature subspace achieves the best performance based on both types of regression.<br />
15:00-17:10, Paper MoBT9.23<br />
Tensor Voting based Color Clustering<br />
Nguyen Dinh, Toan, Chonnam National Univ.<br />
Park, Jonghyun, Chonnam National Univ.<br />
Lee, Chilwoo, Chonnam National Univ.<br />
Lee, Gueesang, Chonnam National Univ.<br />
A novel color clustering algorithm based on tensor voting is proposed. Each color feature vector is encoded by a second<br />
order tensor. Tensor voting is then applied to estimate the number of dominant colors and perform color clustering by exploiting<br />
the shape and data density of the color clusters. The experimental results show that the proposed method generates<br />
good results in image segmentation, especially for images containing multi-colored text.<br />
15:00-17:10, Paper MoBT9.24<br />
An Improved Structural EM to Learn Dynamic Bayesian Nets<br />
De Campos, Cassio, Dalle Molle Inst. For Artificial Intelligence<br />
Zeng, Zhi, Rensselaer Pol. Inst.<br />
Ji, Qiang, RPI<br />
This paper addresses the problem of learning structure of Bayesian and Dynamic Bayesian networks from incomplete<br />
data based on the Bayesian Information Criterion. We describe a procedure to map the problem of the dynamic case into<br />
a corresponding augmented Bayesian network through the use of structural constraints. Because the algorithm is exact<br />
and anytime, it is well suited to a structural Expectation–Maximization (EM) method where the only source of approximation<br />
is the EM itself. We show empirically that using a global maximizer inside the structural EM is computationally<br />
feasible and leads to more accurate models.<br />
15:00-17:10, Paper MoBT9.25<br />
Gaussian Process Learning from Order Relationships using Expectation Propagation<br />
Wang, Ruixuan, Univ. of Dundee<br />
Mckenna, Stephen James, Univ. of Dundee<br />
A method for Gaussian process learning of a scalar function from a set of pair-wise order relationships is presented. Expectation<br />
propagation is used to obtain an approximation to the log marginal likelihood which is optimised using an analytical<br />
expression for its gradient. Experimental results show that the proposed method performs well compared with a<br />
previous method for Gaussian process preference learning.<br />
15:00-17:10, Paper MoBT9.26<br />
Feature Ranking based on Decision Border<br />
Diamantini, Claudia, Univ. Pol. Delle Marche<br />
Gemelli, Alberto, Univ. Pol. Delle Marche<br />
Potena, Domenico, Univ. Pol. Delle Marche<br />
In this paper a Feature Ranking algorithm for classification is proposed, which is based on the notion of Bayes decision<br />
border. The method elaborates upon the results of the Decision Border Feature Extraction approach, exploiting properties<br />
of eigenvalues and eigenvectors of the orthogonal transformation to calculate the discriminative importance weights of<br />
the original features. Non-parametric classification is also considered by resorting to Labeled Vector Quantizer neural<br />
networks trained with the BVQ algorithm. The choice of this architecture leads to a cheap implementation of the ranking algorithm,<br />
which we call BVQ-FR. The effectiveness of BVQ-FR is tested on real datasets. The novelty of the method is to use a<br />
feature extraction technique to assess the weights of the original features, as opposed to the heuristic methods commonly used.<br />
15:00-17:10, Paper MoBT9.27<br />
Three-Layer Spatial Sparse Coding for Image Classification<br />
Dai, Dengxin, Wuhan Univ.<br />
Yang, Wen, Wuhan Univ.<br />
Wu, Tianfu, Lotus Hill Res. Inst.<br />
In this paper, we propose a three-layer spatial sparse coding (TSSC) method for image classification, with three objectives:<br />
recognizing image categories without a learning phase, naturally incorporating the spatial configuration of images, and<br />
counteracting intra-class variance. The method begins by representing the test images in a spatial pyramid<br />
as the signals to be recovered, and taking all image patches sampled at multiple scales from the labeled images as the<br />
bases. Then, three sets of coefficients are introduced into the standard sparse coding to obtain the TSSC: one to penalize spatial<br />
inconsistencies between the pyramid cells and the corresponding selected bases, one to guarantee the sparsity of selected images,<br />
and the other to guarantee the sparsity of selected categories. Finally, the test images are classified according to a simple<br />
image-to-category similarity defined on the coding coefficients. In experiments, we test our method on two publicly available<br />
datasets and achieve significantly more accurate results than the conventional sparse coding with only a modest increase<br />
in computational complexity.<br />
15:00-17:10, Paper MoBT9.28<br />
Theoretical Analysis of a Performance Measure for Imbalanced Data<br />
Garcia, Vicente, Univ. Jaume I<br />
Mollineda, Ramón A., Univ. Jaume I<br />
Sanchez, J. Salvador, Univ. Jaume I<br />
This paper analyzes a generalization of a new metric to evaluate the classification performance in imbalanced domains,<br />
combining some estimate of the overall accuracy with a plain index about how dominant the class with the highest individual<br />
accuracy is. A theoretical analysis shows the merits of this metric when compared to other well-known measures.<br />
15:00-17:10, Paper MoBT9.29<br />
Cluster Preserving Embedding<br />
Zhan, Yubin, National Univ. of Defense Tech.<br />
Yin, Jianping, National Univ. of Defense Tech.<br />
Most existing dimensionality reduction methods obtain the low-dimensional embedding by preserving a certain property<br />
of the data, such as locality or neighborhood relationships. However, the intrinsic cluster structure of the data, which plays a key<br />
role in analyzing and utilizing the data, has been ignored by state-of-the-art dimensionality reduction methods. Hence,<br />
in this paper we propose a novel dimensionality reduction method called Cluster Preserving Embedding (CPE), in which<br />
the cluster structure of the original data is preserved via the robust path-based similarity between pairwise points.<br />
We present two different methods to preserve this similarity. One is the Multidimensional Scaling (MDS) way, which tries<br />
to preserve the similarity matrix accurately; the other is a Laplacian-style way, which preserves the topological partial<br />
order of the similarities rather than the similarities themselves. Encouraging experimental results on a toy data set and handwritten<br />
digits from MNIST database demonstrate the effectiveness of our Cluster Preserving Embedding method.<br />
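The MDS route mentioned above can be illustrated with classical (Torgerson) MDS, which recovers an embedding from a distance matrix via the eigendecomposition of the double-centered Gram matrix (a generic sketch using plain Euclidean distances, not the paper's path-based similarity):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: given a matrix of pairwise distances D, recover
    coordinates whose distances approximate D, via the top eigenvectors
    of the double-centered Gram matrix B = -0.5 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]          # largest eigenvalues first
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

When D is truly Euclidean and of rank at most `dim`, the recovery is exact up to rotation and translation.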
15:00-17:10, Paper MoBT9.30<br />
Color Image Analysis by Quaternion Zernike Moments<br />
Chen, Beijing, Southeast Univ.<br />
Shu, Huazhong, Southeast Univ.<br />
Zhang, Hui, Southeast Univ.<br />
Chen, Gang, Southeast Univ.<br />
Luo, Limin, Southeast Univ.<br />
Moments and moment invariants are useful tools in pattern recognition and image analysis. Conventional methods to deal<br />
with color images are based on RGB decomposition or graying. In this paper, by using the theory of quaternions, we introduce<br />
a set of quaternion Zernike moments (QZMs) for color images in a holistic manner. It is shown that the QZMs<br />
can be obtained via the conventional Zernike moments of each channel. We also construct a set of combined invariants to<br />
rotation and translation (RT) using the modulus of central QZMs. Experimental results show that the proposed descriptors<br />
are more efficient than the existing ones.<br />
15:00-17:10, Paper MoBT9.32<br />
Topic-Sensitive Tag Ranking<br />
Jin, Yan’An, Huazhong Univ. of Science and Tech.<br />
Li, Ruixuan, Huazhong Univ. of Science and Tech.<br />
Lu, Zhengding, Huazhong Univ. of Science and Tech.<br />
Wen, Kunmei, Huazhong Univ. of Science and Tech.<br />
Gu, Xiwu, Huazhong Univ. of Science and Tech.<br />
Social tagging is an increasingly popular way to describe and classify documents on the web. However, the quality of the<br />
tags varies considerably since the tags are authored freely. How to rate the tags becomes an important issue. In this paper,<br />
we propose a topic-sensitive tag ranking (TSTR) approach to rate the tags on the web. We employ a generative probabilistic<br />
model to associate each tag with a distribution of topics. Then we construct a tag graph according to the co-tag relationships<br />
and perform a topic-level random walk over the graph to suggest a ranking score for each tag at different topics. Experimental<br />
results validate the effectiveness of the proposed tag ranking approach.<br />
15:00-17:10, Paper MoBT9.33<br />
Water Reflection Detection using a Flip Invariant Shape Detector<br />
Zhang, Hua, Tianjin Univ.<br />
Guo, Xiaojie, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
Water reflection detection is a tough task in computer vision, since the reflection is distorted by ripples irregularly. This<br />
paper proposes an effective method to detect water reflections. We introduce a descriptor that is not only invariant to<br />
scales, rotations and affine transformations, but also tolerant to the flip transformation and even non-rigid distortions, such<br />
as ripple effects. We analyze the structure of our descriptor and show how it outperforms the existing mirror feature descriptors<br />
in the context of water reflection. The experimental results demonstrate that our method is able to detect the<br />
water reflections.<br />
15:00-17:10, Paper MoBT9.34<br />
CDP Mixture Models for Data Clustering<br />
Ji, Yangfeng, Peking Univ.<br />
Lin, Tong, Peking Univ.<br />
Zha, Hongbin, Peking Univ.<br />
In Dirichlet process (DP) mixture models, the number of components is implicitly determined by the sampling parameters<br />
of the Dirichlet process. However, such models usually produce many small mixture components when modeling<br />
real-world data, especially high-dimensional data. In this paper, we propose a new class of Dirichlet process mixture<br />
models with some constrained principles, named constrained Dirichlet process (CDP) mixture models. Based on general<br />
DP mixture models, we add a resampling step to obtain latent parameters. In this way, CDP mixture models can suppress<br />
noise and generate the compact patterns of the data. Experimental results on data clustering show the remarkable performance<br />
of the CDP mixture models.<br />
15:00-17:10, Paper MoBT9.35<br />
A Simple Approach to Find the Best Wavelet Basis in Classification Problems<br />
Faradji, Farhad, Univ. of British Columbia<br />
Ward, Rabab K., Univ. of British Columbia<br />
Birch, Gary E., Neil Squire Society<br />
In this paper, we address the problem of finding the best wavelet basis in wavelet packet analysis for applications based<br />
on classification. We implement and evaluate our proposed method in the design of a self-paced 2-state mental task-based<br />
brain-computer interface (BCI) as one possible type of classification-based application. The autoregressive coefficients<br />
of the best wavelet basis are concatenated to form the feature vector. The 2-stage classification process is based on quadratic<br />
discriminant analysis and majority voting. Seventeen wavelets from 2 different families are tested. A cross-validation<br />
process is performed twice for model selection and system performance evaluation. The results show that the proposed<br />
method can be well applied to BCI systems.<br />
15:00-17:10, Paper MoBT9.36<br />
Learning Probabilistic Models of Contours<br />
Amate, Laure, Univ. of Nice-Sophia Antipolis, CNRS<br />
Rendas, Maria João, Univ. of Nice-Sophia Antipolis, CNRS<br />
We present a methodology for learning spline-based probabilistic models for sets of contours, proposing a new Monte<br />
Carlo variant of the EM algorithm to estimate the parameters of a family of distributions defined over the set of spline<br />
functions (with fixed complexity). The proposed model effectively captures the major morphological properties of the observed<br />
set of contours as well as its variability, as the simulation results presented demonstrate.<br />
15:00-17:10, Paper MoBT9.37<br />
Local Sparse Representation based Classification<br />
Li, Chun-Guang, Beijing Univ. of Posts and Telecommunications<br />
Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />
Zhang, Honggang, Beijing Univ. of Posts and Telecommunications<br />
In this paper, we address the computational complexity issue in Sparse Representation based Classification (SRC). In<br />
SRC, it is time-consuming to find a global sparse representation. To remedy this deficiency, we propose a Local Sparse<br />
Representation based Classification (LSRC) scheme, which performs sparse decomposition in a local neighborhood. In<br />
LSRC, instead of solving the L1-norm constrained least squares problem over all training samples, we solve a similar<br />
problem in a local neighborhood of each test sample. Experiments on the face recognition data sets ORL and Extended Yale<br />
B demonstrate that the proposed LSRC algorithm reduces the computational complexity while maintaining comparable<br />
classification accuracy and robustness.<br />
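A rough sketch of the local scheme, using a ridge least-squares solve as a cheap stand-in for the L1-constrained problem in the paper (all parameter choices are hypothetical):

```python
import numpy as np

def lsrc_predict(X_train, y_train, x, k=10, ridge=1e-3):
    """Local representation-based classification: restrict the dictionary
    to the k training samples nearest to x, represent x over them, and
    assign the class whose atoms reconstruct x with smallest residual.
    (The paper solves an L1-constrained problem; a ridge solve is used
    here as a simple stand-in.)"""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]                  # local neighborhood
    A, yk = X_train[idx].T, y_train[idx]
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(k), A.T @ x)
    best, best_r = None, np.inf
    for c in np.unique(yk):
        mask = yk == c
        r = np.linalg.norm(x - A[:, mask] @ coef[mask])   # class residual
        if r < best_r:
            best, best_r = c, r
    return best
```

Restricting the dictionary to k samples is what cuts the per-query cost relative to decomposing over the full training set.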
15:00-17:10, Paper MoBT9.38<br />
Manifold Modeling with Learned Distance in Random Projection Space for Face Recognition<br />
Tsagkatakis, Grigorios, Rochester Inst. of Tech.<br />
Savakis, Andreas, Rochester Inst. of Tech.<br />
In this paper, we propose the combination of manifold learning and distance metric learning for the generation of a representation<br />
that is both discriminative and informative, and we demonstrate that this approach is effective for face recognition.<br />
Initial dimensionality reduction is achieved using random projections, a computationally efficient and data independent<br />
linear transformation. Distance metric learning is then applied to increase the separation between classes and improve the<br />
accuracy of nearest neighbor classification. Finally, a manifold learning method is used to generate a mapping between<br />
the randomly projected data and a low dimensional manifold. Face recognition results suggest that the combination of<br />
distance metric learning and manifold learning can increase performance. Furthermore, random projections can be applied<br />
as an initial step without significantly affecting the classification accuracy.<br />
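The first stage of the pipeline, data-independent random projection, can be sketched as below; the Gaussian matrix and scaling are one standard Johnson-Lindenstrauss-style choice, assumed here rather than taken from the paper:

```python
import numpy as np

def random_projection(X, d_out, seed=0):
    """Data-independent linear dimensionality reduction (sketch).

    Projects the rows of X onto d_out dimensions using a Gaussian
    random matrix scaled so that pairwise distances are preserved in
    expectation.  This illustrates only the initial reduction step;
    the distance metric learning and manifold mapping stages of the
    paper are not reproduced.
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], d_out)) / np.sqrt(d_out)
    return X @ R
```

Because the projection matrix never looks at the data, it costs a single matrix multiply and can be reused across datasets.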
15:00-17:10, Paper MoBT9.39<br />
Part Detection, Description and Selection based on Hidden Conditional Random Fields<br />
Lu, Wenhao, Tsinghua Univ.<br />
Wang, Shengjin, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
In this paper, the problem of part detection, description and selection is discussed. This problem is crucial in the learning<br />
algorithms of part-based models, but cannot be solved well when some candidate parts are extracted from background.<br />
This paper studies this problem and introduces a new algorithm, HCRF-PS (Hidden Conditional Random Fields for Part<br />
Selection), for part detection, description, especially selection. Our algorithm is distinguished for its power to optimize<br />
multiple kinds of information at the same time, including texture, color, location and part label. Finally, we did some experiments<br />
with HCRF-PS algorithm which give good results on both virtual and real data.<br />
15:00-17:10, Paper MoBT9.40<br />
Boosting Bayesian MAP Classification<br />
Piro, Paolo, CNRS/Univ. of Nice-Sophia Antipolis<br />
Nock, Richard, Univ. des Antilles et de la Guyane<br />
Nielsen, Frank, Ec. Pol.<br />
Barlaud, Michel, CNRS/Univ. of Nice-Sophia Antipolis<br />
In this paper we redefine and generalize the classic k-nearest neighbors (k-NN) voting rule in a Bayesian maximum-aposteriori<br />
(MAP) framework. To this end, annotated examples are used to estimate pointwise class probabilities in the<br />
feature space, thus giving rise to a new instance-based classification rule. Namely, we propose to ``boost’’ the classic k-<br />
NN rule by inducing a strong classifier from a combination of sparse training data, called ``prototypes’’. In order to learn<br />
these prototypes, our MapBoost algorithm globally minimizes a multiclass exponential risk defined over the training data,<br />
which depends on the class probabilities estimated at sample points themselves. We tested our method for image categorization<br />
on three benchmark databases. Experimental results show that MapBoost significantly outperforms classic k-NN<br />
(up to 8%). Interestingly, due to the supervised selection of sparse prototypes and the multiclass classification framework,<br />
the accuracy improvement is obtained with a considerable computational cost reduction.<br />
15:00-17:10, Paper MoBT9.41<br />
Weighting of the K-Nearest-Neighbors<br />
Chernoff, Konstantin, Univ. of Copenhagen<br />
Nielsen, Mads<br />
This paper presents two distribution independent weighting schemes for k-Nearest-Neighbors (kNN). Applying the first<br />
scheme in a Leave-One-Out (LOO) setting corresponds to performing complete b-fold cross validation (b-CCV), while<br />
applying the second scheme corresponds to performing bootstrapping in the limit of infinite iterations. We demonstrate<br />
that the soft kNN errors obtained through b-CCV can be obtained by applying the weighted kNN in a LOO setting, and<br />
that the proposed weighting schemes can decrease the variance and improve the generalization of kNN in a CV setting.<br />
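A generic rank-weighted kNN vote of the kind the paper builds on can be sketched as follows; the specific weight profiles that emulate b-CCV and bootstrapping are derived in the paper and are not reproduced here — any weight vector can be plugged in:

```python
import numpy as np

def weighted_knn(X_train, y_train, x, weights):
    """Rank-weighted k-Nearest-Neighbors vote (sketch).

    weights[i] is the vote mass assigned to the (i+1)-th nearest
    neighbour of x; uniform weights recover the classic kNN rule.
    """
    k = len(weights)
    # rank training samples by distance to the query point
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    votes = {}
    for w, lab in zip(weights, y_train[order]):
        votes[lab] = votes.get(lab, 0.0) + w
    return max(votes, key=votes.get)
```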
15:00-17:10, Paper MoBT9.42<br />
Learning Sparse Face Features: Application to Face Verification<br />
Buyssens, Pierre, Greyc UMR6072<br />
Revenu, Marinette, GREYC UMR 6072<br />
We present a low resolution face recognition technique based on a Convolutional Neural Network approach. The network<br />
is trained to reconstruct a reference image per subject. In classical feature-based approaches, a first stage of feature extraction<br />
is followed by a classification to perform the recognition. In classical Convolutional Neural Network approaches,<br />
feature extraction stages are stacked (interlaced with pooling layers) with classical neural layers on top to form the complete<br />
architecture of the network. This paper addresses two questions: 1. Does pretraining the filters in an unsupervised<br />
manner improve the recognition rate compared to filters learned in a purely supervised scheme? 2. Is there<br />
an advantage to pretraining more than one feature extraction stage? We show in particular that refining the filters<br />
during the supervised training improves the results.<br />
15:00-17:10, Paper MoBT9.43<br />
Image Feature Extraction using 2D Mel-Cepstrum<br />
Cakir, Serdar, Bilkent Univ.<br />
Cetin, E., Bilkent Univ.<br />
In this paper, a feature extraction method based on two-dimensional (2D) mel-cepstrum is introduced. Feature matrices<br />
resulting from the 2D mel-cepstrum, Fourier LDA approach and original image matrices are individually applied to the<br />
Common Matrix Approach (CMA) based face recognition system. For each of these feature extraction methods, recognition<br />
rates are obtained on the AR, ORL and Yale face databases. Experimental results indicate that the recognition<br />
rates obtained by the 2D mel-cepstrum method are superior to those obtained using the Fourier LDA approach<br />
and raw image matrices. This indicates that 2D mel-cepstral analysis can be used in image feature extraction problems.<br />
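A plain 2D cepstrum feature matrix can be sketched as the inverse 2D DFT of the log magnitude spectrum; the mel-scale frequency warping that distinguishes the paper's 2D mel-cepstrum, and the block size `keep`, are omitted or assumed here:

```python
import numpy as np

def cepstrum_2d(img, keep=8, eps=1e-8):
    """2D cepstrum feature matrix (sketch, without mel warping).

    Takes the inverse 2D DFT of the log magnitude spectrum and keeps
    the low-quefrency keep x keep block as the feature matrix.
    """
    spec = np.abs(np.fft.fft2(img)) + eps      # eps guards log(0)
    ceps = np.real(np.fft.ifft2(np.log(spec)))
    return ceps[:keep, :keep]
```

The low-quefrency block summarizes the smooth spectral envelope of the image, which is the part used as a compact feature.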
15:00-17:10, Paper MoBT9.44<br />
Entropy Estimation and Multi-Dimensional Scale Saliency<br />
Suau, Pablo, Univ. of Alicante<br />
Escolano, Francisco, Univ. of Alicante<br />
In this paper we survey two multi-dimensional Scale Saliency approaches based on graphs and the k-d partition algorithm.<br />
In the latter case we introduce a new divergence metric and we show experimentally its suitability. We also show an application<br />
of multi-dimensional Scale Saliency to texture discrimination. We demonstrate that the use of multi-dimensional<br />
data can improve the performance of texture retrieval based on feature extraction.<br />
15:00-17:10, Paper MoBT9.45<br />
A Novel Facial Localization for Three-Dimensional Face using Multi-Level Partition of Unity Implicits<br />
Hu, Yuan, Shanghai Jiao Tong Univ.<br />
Yan, Jingqi, Shanghai Jiao Tong Univ.<br />
Li, Wei, Shanghai Jiao Tong Univ.<br />
Shi, Pengfei, Shanghai Jiao Tong Univ.<br />
This paper presents a novel facial localization method for 3D faces in the presence of pose and expression variation.<br />
The idea of using Multi-level Partition of Unity (MPU) Implicits in a hierarchical way is proposed for reconstructing the<br />
face surface. Based on the analysis of curvature features, the nose and eyehole regions can be uniquely detected on the<br />
lower-level reconstructed face surface. Experimental results show that this method is invariant to pose, holes, noise and<br />
expression. An overall performance of 99.18% is achieved.<br />
15:00-17:10, Paper MoBT9.46<br />
Automated Feature Weighting in Fuzzy Declustering-Based Vector Quantization<br />
Ng, Theam Foo, Univ. of New South Wales@ADFA<br />
Pham, Tuan D., Univ. of New South Wales@ADFA<br />
Sun, Changming, CSIRO<br />
Feature weighting plays an important role in improving the performance of clustering techniques. We propose an automated<br />
feature weighting scheme for fuzzy declustering-based vector quantization (FDVQ), namely the AFDVQ algorithm, for enhancing effectiveness<br />
and efficiency in classification. The proposed AFDVQ imposes weights on the modified fuzzy c-means (FCM)<br />
so that it automatically calculates feature weights based on their degrees of importance rather than treating them equally.<br />
Moreover, extensions of the FDVQ and AFDVQ algorithms based on generalized improved fuzzy partitions (GIFP), known<br />
as GIFP-FDVQ and GIFP-AFDVQ respectively, are proposed. Experimental results on real data (original and noisy<br />
data) and modified data (biased and noisy-biased data) demonstrate that the proposed algorithms outperform<br />
standard algorithms in classifying clusters, especially for biased data.<br />
15:00-17:10, Paper MoBT9.47<br />
A Discriminative and Heteroscedastic Linear Feature Transformation for Multiclass Classification<br />
Lee, Hung-Shin, National Taiwan Univ.<br />
Wang, Hsin-Min, Acad. Sinica<br />
Chen, Berlin, National Taiwan Normal Univ.<br />
This paper presents a novel discriminative feature transformation, named full-rank generalized likelihood ratio discriminant<br />
analysis (fGLRDA), on the grounds of the likelihood ratio test (LRT). fGLRDA attempts to seek a feature space, which is<br />
linearly isomorphic to the original n-dimensional feature space and is characterized by a full-rank transformation matrix,<br />
under the assumption that all the class-discrimination information resides in a d-dimensional subspace, by making<br />
the most confusing situation, described by the null hypothesis, as unlikely as possible, without the homoscedasticity<br />
assumption on class distributions. Our experimental results demonstrate that fGLRDA can yield moderate performance<br />
improvements over existing methods, such as linear discriminant analysis (LDA), on the speaker identification task.<br />
15:00-17:10, Paper MoBT9.48<br />
Sparse Representation Classifier Steered Discriminative Projection<br />
Yang, Jian, Nanjing Univ. of Science and Tech.<br />
Chu, Delin, National Univ. of Singapore<br />
The sparse representation-based classifier (SRC) has been developed and shows great potential for pattern classification.<br />
This paper aims to gain a discriminative projection such that SRC achieves the optimum performance in the projected<br />
pattern space. We use the decision rule of SRC to steer the design of a dimensionality reduction method, which is coined<br />
the sparse representation classifier steered discriminative projection (SRC-DP). SRC-DP matches SRC optimally in theory.<br />
Experiments are done on the AR and extended Yale B face image databases, and results show the proposed method is<br />
more effective than other dimensionality reduction methods with respect to the sparse representation-based classifier.<br />
15:00-17:10, Paper MoBT9.49<br />
Designing a Pattern Stabilization Method using Scleral Blood Vessels for Laser Eye Surgery<br />
Kaya, Aydin, Hacettepe Univ.<br />
Can, Ahmet Burak, Hacettepe Univ.<br />
Çakmak, Hasan Basri, Ataturk Research Hospital<br />
In laser eye surgery, the accuracy of the operation depends on coherent eye tracking and registration techniques. The main<br />
approach used in image-processing-based eye trackers is the extraction and tracking of the pupil and limbus regions. In the<br />
eye registration step, iris region features extracted from infrared images are generally used. The registration step determines<br />
the angular shift of the eye origin by comparing the eye position on the operation table with the eye topology obtained before<br />
the operation. Registration is only applied at the beginning, but patients' movements do not stop during the operation. Hence,<br />
we present a method for pattern stabilization that can be repeated at regular intervals during the operation. We use scleral<br />
blood vessels as features due to their texture and their resistance to errors caused by pupil center shift and corneal ablation.<br />
15:00-17:10, Paper MoBT9.51<br />
Aggregation of Probabilistic PCA Mixtures with a Variational-Bayes Technique over Parameters<br />
Bruneau, Pierrick, Nantes Univ.<br />
Gelgon, Marc, Nantes Univ.<br />
Picarougne, Fabien, Nantes Univ.<br />
This paper proposes a solution to the problem of aggregating versatile probabilistic models, namely mixtures of probabilistic<br />
principal component analyzers. These models are a powerful generative form for capturing high-dimensional, non<br />
Gaussian, data. They simultaneously perform mixture adjustment and dimensionality reduction. We demonstrate how such<br />
models may be advantageously aggregated by accessing mixture parameters only, rather than original data. Aggregation<br />
is carried out through Bayesian estimation with a specific prior and an original variational scheme. Experimental results<br />
illustrate the effectiveness of the proposal.<br />
15:00-17:10, Paper MoBT9.52<br />
Kernel Uncorrelated Adjacent-Class Discriminant Analysis<br />
Jing, Xiaoyuan, Nanjing Univ. of Posts and Telecommunications<br />
Li, Sheng, Nanjing Univ. of Posts and Telecommunications<br />
Yao, Yongfang, Nanjing Univ. of Posts and Telecommunications<br />
Bian, Lusha, Nanjing Univ. of Posts and Telecommunications<br />
Yang, Jingyu, Nanjing Univ. of Science and Tech.<br />
In this paper, a kernel uncorrelated adjacent-class discriminant analysis (KUADA) approach is proposed for image recognition.<br />
The optimal nonlinear discriminant vector obtained by this approach can differentiate one class from its adjacent<br />
classes, i.e., its nearest neighbor classes, by constructing the specific between-class and within-class scatter matrices in<br />
kernel space using the Fisher criterion. In this manner, KUADA acquires all discriminant vectors class by class. Furthermore,<br />
KUADA makes every discriminant vector satisfy locally statistical uncorrelated constraints by using the corresponding<br />
class and part of its most adjacent classes. Experimental results on the public AR and CAS-PEAL face databases<br />
demonstrate that the proposed approach outperforms several representative nonlinear discriminant methods.<br />
15:00-17:10, Paper MoBT9.53<br />
A Meta-Learning Approach to Conditional Random Fields using Error-Correcting Output Codes<br />
Ciompi, Francesco, Univ. de Barcelona<br />
Pujol, Oriol, UB<br />
Radeva, Petia, CVC<br />
We present a meta-learning framework for the design of potential functions for Conditional Random Fields. The design<br />
of both node potential and edge potential is formulated as a classification problem where margin classifiers are used. The<br />
set of state transitions for the edge potential is treated as a set of different classes, thus defining a multi-class learning<br />
problem. The Error-Correcting Output Codes (ECOC) technique is used to deal with the multi-class problem. Furthermore,<br />
the point defined by the combination of margin classifiers in the ECOC space is interpreted in a probabilistic manner, and<br />
the obtained distance values are then converted into potential values. The proposed model exhibits very promising results<br />
when applied to two real detection problems.<br />
15:00-17:10, Paper MoBT9.54<br />
Statistical Modeling of Image Degradation based on Quality Metrics<br />
Chetouani, Aladine, Inst. Galilée – Univ. Paris 13<br />
Beghdadi, Azeddine, Univ. Paris 13<br />
Deriche, Mohamed, KFUPM<br />
A plethora of Image Quality Metrics (IQMs) has been proposed during the last two decades. However, at present,<br />
there is no accepted IQM able to predict the perceptual level of image degradation across different types of visual distortions.<br />
Some measures are well adapted to one set of degradations but inefficient for others. Indeed, the efficiency of any<br />
IQM has been shown to depend upon the type of degradation. Thus, we propose here a new approach for predicting the<br />
type of degradation before using IQMs. The basic idea is first to identify the type of distortion using a Bayesian approach,<br />
then select the most appropriate IQM for estimating image quality for that specific type of distortion. The performance of<br />
the proposed method is evaluated in terms of classification accuracy across different types of degradations.<br />
15:00-17:10, Paper MoBT9.55<br />
Performance Evaluation of Automatic Feature Discovery Focused within Error Clusters<br />
Wang, Sui-Yu, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
We report performance evaluation of our automatic feature discovery method on the publicly available Gisette dataset: a<br />
set of 29 features discovered by our method ranks 129 among all 411 current entries on the validation set. Our approach<br />
is a greedy forward selection algorithm guided by error clusters. The algorithm finds error clusters in the current feature<br />
space, then projects one tight cluster into the null space of the feature mapping, where a new feature that helps to classify<br />
these errors can be discovered. This method assumes a ``data-rich’’ problem domain and works well when a large amount<br />
of labeled data is available. The result on the Gisette dataset shows that our method is competitive with many current<br />
feature selection algorithms. We also provide analytical results showing that our method is guaranteed to lower the error<br />
rate on Gaussian distributions and that our approach may outperform the standard Linear Discriminant Analysis (LDA)<br />
method in some cases.<br />
15:00-17:10, Paper MoBT9.56<br />
Optimized Entropy-Constrained Vector Quantization of Lossy Vector Map Compression<br />
Chen, Minjie, Univ. of Eastern Finland<br />
Xu, Mantao, Carestream Health Corp. Shanghai, China<br />
Fränti, Pasi, Univ. of Eastern Finland<br />
Quantization plays an important part in lossy vector map compression, for which the existing solutions are based on either<br />
a fixed-size open-loop codebook or a simple uniform quantization. In this paper, we propose an entropy-constrained<br />
vector quantization that optimizes both the structure and the size of the codebook at the same time using a closed-loop approach.<br />
In order to lower the distortion to a desirable level, we exploit a two-level design strategy, where the vector quantization<br />
codebook is designed only for the most common vectors and the remaining (outlier) vectors are coded by uniform quantization.<br />
15:00-17:10, Paper MoBT9.57<br />
Nonnegative Embeddings and Projections for Dimensionality Reduction and Information Visualization<br />
Zafeiriou, Stefanos, Imperial Coll. of London<br />
Laskaris, Nikolaos, AiiA-Lab. AUTH,<br />
In this paper, we propose novel algorithms for low dimensionality nonnegative embedding of vectorial and/or relational<br />
data, as well as nonnegative projections for dimensionality reduction. We start by introducing a novel algorithm for Metric<br />
Multidimensional Scaling (MMS). We propose algorithms for Nonnegative Locally Linear Embedding (NLLE) and Nonnegative<br />
Laplacian Eigenmaps (NLE). By reformulating the problem of MMS, NLLE and NLE for finding projections<br />
we propose algorithms for Nonnegative Principal Component Analysis (NPCA), for Nonnegative Orthogonal Neighbourhood<br />
Preserving Projections (NONPP) and Nonnegative Orthogonal Locality Preserving Projections (NOLPP). We demonstrate<br />
some first preliminary results of the proposed methods in data visualization.<br />
15:00-17:10, Paper MoBT9.58<br />
Unsupervised Learning from Linked Documents<br />
Guo, Zhen, SUNY at Binghamton<br />
Zhu, Shenghuo, NEC Lab.<br />
Chi, Yun, NEC Lab.<br />
Zhang, Zhongfei, State Univ. of New York, Binghamton<br />
Gong, Yihong, NEC Lab. America, Inc.<br />
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. In traditional<br />
topic models, which play an important role in unsupervised learning, the link information is either totally ignored<br />
or treated as a feature similar to content. We believe that neither approach is capable of accurately capturing the relations<br />
represented by links. To address the limitation of traditional topic models, in this paper we propose a citation-topic (CT)<br />
model that explicitly considers the document relations represented by links. In the CT model, instead of being treated as<br />
yet another feature, links are used to form the structure of the generative model. As a result, in the CT model a given document<br />
is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is<br />
related to the given document. We apply the CT model to several document collections and the experimental comparisons<br />
against state-of-the-art approaches demonstrate very promising performance.<br />
15:00-17:10, Paper MoBT9.59<br />
Tensor Power Method for Efficient MAP Inference in Higher-Order MRFs<br />
Semenovich, Dimitri, Univ. of New South Wales<br />
Sowmya, Arcot, Univ. of New South Wales<br />
We present a new efficient algorithm for maximizing energy functions with higher order potentials suitable for MAP inference<br />
in discrete MRFs. Initially we relax the integer constraints on the problem and obtain potential label assignments<br />
using a higher-order (tensor) power method. Then we utilise an ascent procedure similar to the classic ICM algorithm to<br />
converge to a solution meeting the original integer constraints.<br />
15:00-17:10, Paper MoBT9.60<br />
Detection and Characterization of Anomalous Entities in Social Communication Networks<br />
Gupta, Nithi, Tata Consultancy Services<br />
Dey, Lipika, Tata Consultancy Services<br />
Social networks generated from emails or calls provide enormous geospatial and interaction information about subscribers.<br />
These have served as important inputs to intelligence analysts. In this paper, we propose an efficient algorithm for anomaly<br />
detection from social networks. Anomalous users are detected based on their behavioral dissimilarity from others. A rich<br />
feature set is proposed for outlier detection. A method for providing visual explanation for the results is also proposed.<br />
15:00-17:10, Paper MoBT9.61<br />
Mahalanobis-Based Adaptive Nonlinear Dimension Reduction<br />
Aouada, Djamila, Univ. of Luxembourg, SnT<br />
Baryshnikov, Yuliy, Bell Lab.<br />
Krim, Hamid, NCSU<br />
We define a new adaptive embedding approach for data dimension reduction applications. Our technique entails a local<br />
learning of the manifold of the initial data, with the objective of defining local distance metrics that take into account the<br />
different correlations between the data points. We choose to illustrate the properties of our work on the isomap algorithm.<br />
We show through multiple simulations that the new adaptive version of isomap is more robust to noise than the original<br />
non-adaptive one.<br />
15:00-17:10, Paper MoBT9.62<br />
Maximum Likelihood Estimation of Gaussian Mixture Models using Particle Swarm Optimization<br />
Ari, Caglar, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
We present solutions to two problems that prevent the effective use of population-based algorithms in clustering problems.<br />
The first solution presents a new representation for arbitrary covariance matrices that allows independent updating of individual<br />
parameters while retaining the validity of the matrix. The second solution involves an optimization formulation<br />
for finding correspondences between different parameter orderings of candidate solutions. The effectiveness of the proposed<br />
solutions is demonstrated on a novel clustering algorithm based on particle swarm optimization for the estimation of<br />
Gaussian mixture models.<br />
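One way to let particles update covariance parameters independently while keeping every matrix valid, as the first solution above requires, is a Cholesky-factor parameterization; this is a standard construction sketched as an illustration, not the paper's exact representation:

```python
import numpy as np

def cov_from_params(theta, d):
    """Map an unconstrained parameter vector to a valid covariance (sketch).

    theta holds the d*(d+1)/2 lower-triangular entries of a Cholesky
    factor; exponentiating the diagonal makes it positive, so any real
    vector yields a symmetric positive-definite covariance matrix, and
    each entry can be perturbed independently.
    """
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = theta
    L[np.diag_indices(d)] = np.exp(np.diag(L))  # force positive diagonal
    return L @ L.T
```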
15:00-17:10, Paper MoBT9.63<br />
Object Discovery by Clustering Correlated Visual Word Sets<br />
Fuentes Pineda, Gibran, The Univ. of Electro-Communications<br />
Koga, Hisashi, Univ. of Electro-Communications<br />
Watanabe, Toshinori, Univ. of Electro-Communications<br />
This paper presents a novel approach to discovering particular objects from a set of unannotated images. We aim to find<br />
discriminative feature sets that can effectively represent particular object classes (as opposed to object categories). We<br />
achieve this by mining correlated visual word sets from the bag-of-features model. Specifically, we consider that a visual<br />
word set belongs to the same object class if all its visual words consistently occur together in the same image. To efficiently<br />
find such sets, we apply Min-LSH to the occurrence vector of each visual word. An agglomerative hierarchical clustering<br />
is further performed to eliminate redundancy and obtain more representative sets. We also propose a simple and efficient<br />
strategy for quantizing the feature descriptors based on locality-sensitive hashing. By experiment, we show that our approach<br />
can efficiently discover objects despite clutter and slight viewpoint variations.<br />
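The min-hash grouping step above rests on the fact that the probability two sets share a min-hash equals their Jaccard similarity; a minimal sketch (the salted-hash construction is illustrative, not the paper's implementation) is:

```python
import random

def minhash_signature(item_set, n_hashes=50, seed=0):
    """Min-hash signature of a visual word's occurrence set (sketch).

    Visual words occurring in nearly the same images get nearly
    identical signatures, so matching signature entries can be used
    to group correlated visual words, as in a Min-LSH step.
    """
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n_hashes)]
    # each salt defines one pseudo-random ordering of the universe
    return [min(hash((s, x)) & 0xFFFFFFFF for x in item_set) for s in salts]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching min-hashes estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```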
Technical Program for Tuesday<br />
August 24, 2010<br />
TuAT1 Marmara Hall<br />
Object Detection and Recognition – I Regular Session<br />
Session chair: Jiang, Xiaoyi (Univ. of Münster)<br />
09:00-09:20, Paper TuAT1.1<br />
Learning an Efficient and Robust Graph Matching Procedure for Specific Object Recognition<br />
Revaud, Jerome, Univ. de Lyon, CNRS<br />
Lavoue, Guillaume, Univ. de Lyon, CNRS<br />
Ariki, Yasuo, Kobe Univ.<br />
Baskurt, Atilla, LIRIS, INSA Lyon<br />
We present a fast and robust graph matching approach for 2D specific object recognition in images. From a small number<br />
of training images, a model graph of the object to learn is automatically built. It contains its local key points as well as<br />
their spatial proximity relationships. Training is based on a selection of the most efficient subgraphs using the mutual information.<br />
The detection uses dynamic programming with a lattice and thus is very fast. Experiments demonstrate that<br />
the proposed method outperforms state-of-the-art specific object detectors under realistic noise conditions.<br />
09:20-09:40, Paper TuAT1.2<br />
A New Biologically Inspired Feature for Scene Image Classification<br />
Jiang, Aiwen, Chinese Acad. of Sciences<br />
Wang, Chunheng, Chinese Acad. of Sciences<br />
Xiao, Baihua, Chinese Acad. of Sciences<br />
Dai, Ruvei, Chinese Acad. of Sciences<br />
Scene classification is a hot topic in the pattern recognition and computer vision areas. In this paper, based on past research<br />
in vision neuroscience, we propose a new biologically inspired feature method for scene image classification. The new<br />
feature accounts for the visual processing from simple cells to complex cells in the V1 area, as well as the spatial layout of the<br />
scene gist signature. It provides a different line of model revision to account for some nonlinearities in the V1 area. We compare<br />
it with the traditional HMAX model and the recently proposed ScSPM model in experiments on a popular 15-scene dataset. We<br />
show that our proposed method has many important differences and merits. The experimental results also show that our<br />
method outperforms state-of-the-art models such as ScSPM and KSPM.<br />
09:40-10:00, Paper TuAT1.3<br />
On a Quest for Image Descriptors based on Unsupervised Segmentation Maps<br />
Koniusz, Piotr, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
This paper investigates segmentation-based image descriptors for object category recognition. In contrast to commonly<br />
used interest points the proposed descriptors are extracted from pairs of adjacent regions given by a segmentation method.<br />
In this way we exploit semi-local structural information from the image. We propose to use the segments as spatial bins<br />
for descriptors of various image statistics based on gradient, colour and region shape. Proposed descriptors are validated<br />
on standard recognition benchmarks. Results show they outperform state-of-the-art reference descriptors with 5.6x less<br />
data and achieve comparable results to them with 8.6x less data. The proposed descriptors are complementary to SIFT<br />
and achieve state-of-the-art results when combined together within a kernel based classifier.<br />
10:00-10:20, Paper TuAT1.4<br />
An RST-Tolerant Shape Descriptor for Object Detection<br />
Su, Chih-Wen, Acad. Sinica<br />
Liao, Mark, Acad. Sinica, Taiwan<br />
Liang, Yu-Ming, Acad. Sinica<br />
Tyan, Hsiao-Rong, Chung Yuan Christian Univ.<br />
In this paper, we propose a new object detection method that does not need a learning mechanism. Given a hand-drawn<br />
model as a query, we can detect and locate objects that are similar to the query model in cluttered images. To ensure the<br />
invariance with respect to rotation, scaling, and translation (RST), high curvature points (HCPs) on edges are detected<br />
first. Each pair of HCPs is then used to determine a circular region and all edge pixels covered by the circular region are<br />
transformed into a polar histogram. Finally, we use these local descriptors to detect and locate similar objects within<br />
images. The experimental results show that the proposed method outperforms existing state-of-the-art work.<br />
10:20-10:40, Paper TuAT1.5<br />
Inverse Multiple Instance Learning for Classifier Grids<br />
Sternig, Sabine, Graz Univ. of Tech.<br />
Roth, Peter M., Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Recently, classifier grids have shown to be a considerable alternative for object detection from static cameras. However,<br />
one drawback of such approaches is drifting if an object is not moving over a long period of time. Thus, the goal of this<br />
work is to increase the recall of such classifiers while preserving their accuracy and speed. In particular, this is realized<br />
by adapting ideas from Multiple Instance Learning within a boosting framework. Since the set of positive samples is well<br />
defined, we apply this concept to the negative samples extracted from the scene: Inverse Multiple Instance Learning. By<br />
introducing temporal bags, we can ensure that each bag contains at least one sample having a negative label, providing<br />
the required stability. The experimental results demonstrate that the proposed approach obtains state-of-the-art detection results<br />
while showing superior classification results in the presence of non-moving objects.<br />
TuAT2 Topkapı Hall B<br />
Clustering Regular Session<br />
Session chair: Tasdizen, Tolga (Univ. of Utah)<br />
09:00-09:20, Paper TuAT2.1<br />
On Dynamic Weighting of Data in Clustering with K-Alpha Means<br />
Chen, Si-Bao, Anhui Univ.<br />
Wang, Hai-Xian, Southeast Univ.<br />
Luo, Bin, Anhui Univ.<br />
Although many methods of refining initialization have appeared, the sensitivity of K-Means to initial centers is still an<br />
obstacle in applications. In this paper, we investigate a new class of clustering algorithm, K-Alpha Means (KAM), which<br />
is insensitive to the initial centers. With K-Harmonic Means as a special case, KAM dynamically weights data points<br />
while iteratively updating the centers, deemphasizing data points that are close to a center and emphasizing data points<br />
that are not close to any center. By replacing the minimum operator in K-Means with the alpha-mean operator, KAM significantly<br />
improves the clustering performance.<br />
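The dynamic weighting idea can be sketched with one update step of K-Harmonic Means, the special case named above; the exponent `p` and the small distance floor are assumed values, and the general alpha-mean operator of KAM is not reproduced:

```python
import numpy as np

def khm_step(X, centers, p=3.5):
    """One K-Harmonic Means center update (sketch).

    Points far from every center receive large weights w, points
    already close to a center receive small ones, which is what makes
    the method insensitive to the initial centers.
    """
    # pairwise distances: (n_points, n_centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-8
    m = d ** (-p - 2) / (d ** (-p - 2)).sum(axis=1, keepdims=True)  # soft membership
    w = (d ** (-p - 2)).sum(axis=1) / (d ** (-p)).sum(axis=1) ** 2  # dynamic weight
    mw = m * w[:, None]
    return (mw.T @ X) / mw.sum(axis=0)[:, None]
```

Iterating `khm_step` from almost any starting centers pulls them toward the cluster means, unlike K-Means, which can get stuck near a poor initialization.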
09:20-09:40, Paper TuAT2.2<br />
ARImp: A Generalized Adjusted Rand Index for Cluster Ensembles<br />
Zhang, Shaohong, City Univ. of Hong Kong<br />
Wong, Hau-San, City Univ. of Hong Kong<br />
Adjusted Rand Index (ARI) is one of the most popular measures for evaluating the consistency between two partitions of data<br />
sets in pattern recognition. In this paper, ARI is generalized to a new measure, Adjusted Rand Index between<br />
a similarity matrix and a cluster partition (ARImp), to evaluate the consistency between a set of clustering solutions (or<br />
cluster partitions) and their associated consensus matrix in a cluster ensemble. The generalization property of ARImp from<br />
ARI is proved and its preservation of desirable properties of ARI is illustrated with simulated experiments. Also, we show<br />
with application experiments on several real data sets that ARImp can serve as a filter to identify the less effective cluster<br />
ensemble methods.<br />
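For reference, the pair-counting ARI that ARImp generalizes can be computed as follows (a minimal sketch; degenerate partitions where the denominator vanishes are not handled):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting Adjusted Rand Index between two partitions of the same items."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())   # pairs agreeing in both
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)                    # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Identical partitions score 1.0 regardless of label names; independent partitions score near 0.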
09:40-10:00, Paper TuAT2.3<br />
On the Scalability of Evidence Accumulation Clustering<br />
Lourenço, André, Inst. Superior de Engenharia de Lisboa (ISEL), Inst. Superior Técnico (IST), IT<br />
Fred, Ana Luisa Nobre, Inst. Superior Técnico<br />
Jain, Anil, Michigan State Univ.<br />
This work focuses on the scalability of the Evidence Accumulation Clustering (EAC) method. We first address the space<br />
complexity of the co-association matrix. The sparseness of the matrix is related to the construction of the clustering ensemble.<br />
Using a split and merge strategy combined with a sparse matrix representation, we empirically show that a linear<br />
space complexity is achievable in this framework, leading to the scalability of the EAC method to clustering large data sets.<br />
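The co-association matrix that dominates EAC's space complexity counts, for each pair of samples, the fraction of ensemble partitions that co-cluster them; a sparse accumulation can be sketched as follows (illustrative only, not the authors' split-and-merge implementation):

```python
from collections import defaultdict
from itertools import combinations

def co_association(ensemble):
    """Sparse co-association matrix: entry (i, j) is the fraction of
    partitions in which samples i and j share a cluster."""
    counts = defaultdict(int)
    for labels in ensemble:
        clusters = defaultdict(list)
        for idx, lab in enumerate(labels):
            clusters[lab].append(idx)
        for members in clusters.values():
            for i, j in combinations(members, 2):
                counts[(i, j)] += 1          # only co-occurring pairs are stored
    n = len(ensemble)
    return {pair: c / n for pair, c in counts.items()}
```

Only pairs that co-occur at least once are stored, which is where the sparseness discussed above comes from.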
10:00-10:20, Paper TuAT2.4<br />
A Hierarchical Clustering Method for Color Quantization<br />
Zhang, Jun, Waseda Univ.<br />
Hu, Jinglu, Waseda Univ.<br />
In this paper, we propose a hierarchical frequency sensitive competitive learning (HFSCL) method to achieve color quantization<br />
(CQ). In HFSCL, the appropriate number of quantized colors and the palette can be obtained by an adaptive procedure<br />
following a binary tree structure with nodes and layers. Starting from the root node, which contains all colors in an<br />
image, a binary tree is generated until all nodes have been examined by the split conditions. In each node of the tree, a frequency<br />
sensitive competitive learning (FSCL) network is used to achieve a two-way division. To avoid over-splitting, a merging condition<br />
is defined to merge clusters that are close enough to each other at each layer. Experimental results show that HFSCL<br />
has the desired ability for CQ.<br />
10:20-10:40, Paper TuAT2.5<br />
Combining Real and Virtual Graphs to Enhance Data Clustering<br />
Wang, Liang, The Univ. of Melbourne<br />
Leckie, Christopher, The Univ. of Melbourne<br />
Kotagiri, Rao, Univ. of Melbourne<br />
Fusion of multiple information sources can yield significant benefits in accomplishing certain learning tasks. This paper<br />
exploits the sparse representation of signals for the problem of data clustering. The method is built within the framework<br />
of spectral clustering algorithms, and convexly combines a real graph constructed from the given physical features with<br />
a virtual graph constructed from sparse reconstructive coefficients. The experimental results on several real-world data<br />
sets show that fusing the real and virtual graphs obtains better (or at least comparable) results than using<br />
either graph alone.<br />
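A minimal sketch of the underlying idea: convexly combine two affinity graphs, then apply a standard spectral embedding (the mixing weight alpha and the Ng-Jordan-Weiss-style normalization are assumptions here; the paper's virtual graph is built from sparse reconstructive coefficients):

```python
import numpy as np

def combined_spectral_embedding(W_real, W_virtual, k, alpha=0.5):
    """Convexly combine two affinity matrices and return the row-normalized
    spectral embedding (leading k eigenvectors of the normalized affinity)."""
    W = alpha * W_real + (1 - alpha) * W_virtual
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetric normalized affinity
    vals, vecs = np.linalg.eigh(S)           # ascending eigenvalue order
    Y = vecs[:, -k:]                         # leading k eigenvectors
    # row-normalize before running k-means on the embedding
    return Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
```

Rows of the embedding belonging to the same block of the combined graph coincide, while rows from different blocks are orthogonal, so any standard clusterer can finish the job.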
TuAT3 Topkapı Hall A<br />
3D Shape Recovery Regular Session<br />
Session chair: Sato, Jun (Nagoya Institute of Technology)<br />
09:00-09:20, Paper TuAT3.1<br />
Calibration Method for Line Structured Light Vision Sensor based on Vanish Points and Lines<br />
Wei, Zhenzhong, Beihang Univ.<br />
Xie, Meng, Beihang Univ. Ministry of Education<br />
Zhang, Guangjun, Beihang Univ.<br />
Line structured light vision sensor (LSLVS) calibration establishes the location relationship between the camera and<br />
the light plane projector. This paper proposes a geometrical calibration method for LSLVS based on the properties of vanishing<br />
points and lines, obtained by randomly moving a planar target. The method consists of two steps: (1) the vanishing point of the light<br />
stripe projected by the light plane is found in each target image, and all the obtained vanishing points form the vanishing line of<br />
the light plane, which helps determine the normal of the light plane; (2) one 3D feature point on the light plane is<br />
acquired (one is enough, though more may be used) to determine the d parameter of the light plane. Then the equation of<br />
the light plane in the camera coordinate system can be solved. Computer simulations and real experiments have<br />
been carried out to validate our method, and the real calibration reaches an accuracy of 0.141 mm within a<br />
field of view of about 300 mm × 200 mm.<br />
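Once step (1) yields the light-plane normal and step (2) a 3D point on the plane, the remaining parameter follows directly from the plane equation n·X + d = 0; a sketch (function and variable names are our own, not the paper's):

```python
import numpy as np

def light_plane_equation(normal, point_on_plane):
    """Given the plane normal (from the vanishing line, step 1) and one 3D
    point on the light plane (step 2), return (a, b, c, d) such that
    a*x + b*y + c*z + d = 0 in the camera coordinate system."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)                       # unit normal
    d = -float(n @ np.asarray(point_on_plane, float))
    return (*n, d)
```

With more than one feature point, d would instead be estimated as the mean (or least-squares fit) over all points.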
09:20-09:40, Paper TuAT3.2<br />
A Color Invariant based Binary Coded Structured Light Range Scanner for Shiny Objects<br />
Benveniste, Rifat, Yeditepe Univ.<br />
Unsalan, Cem, Yeditepe Univ.<br />
Object range data provide valuable information in recognition and modeling applications. Therefore, it is extremely important<br />
to reliably extract the range data from a given object. There are various range scanners based on different principles.<br />
Among these, structured light based range scanners deserve special attention. In these systems, coded light stripes are projected<br />
onto the object. Using the bending of these light stripes on the object and the triangulation principle, range information<br />
can be obtained. Since this method is simple and fast, it is used in most industrial range scanners. Unfortunately,<br />
these range scanners cannot scan shiny objects reliably. The main reason is either highlights on the shiny object or the<br />
ambient light in the environment. These disturb the coding by illumination. As the code is changed, the range data extracted<br />
from it will also be disturbed. In this study, we propose a color invariant based binary coded structured light range scanner<br />
to solve this problem. The color invariant used can eliminate the effects of highlights on the object and the ambient light<br />
from the environment. This way, we can extract the range data of shiny objects in a robust manner. To test our method, we<br />
developed a prototype range scanner. We provide the obtained range data of various test objects with our range scanner.<br />
09:40-10:00, Paper TuAT3.3<br />
Improving Shape-From-Focus by Compensating for Image Magnification Shift<br />
Pertuz, Said, Rovira I Virgili Univ.<br />
Puig, Domenec, Rovira I Virgili Univ.<br />
Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />
Images taken with different focus settings are used in shape-from-focus to reconstruct the depth map of a scene. A problem<br />
when acquiring images with different focus settings is the shift of image features due to changes in magnification. This<br />
paper shows that those changes affect the shape-from-focus performance and that the final reconstruction can be improved<br />
by compensating for that shift. The proposed scheme takes into account the effects due to magnification changes between<br />
near and far focused images and it is able to determine the depth of the scene points with higher accuracy than traditional<br />
techniques. Experimental results of the application of the proposed method are shown.<br />
10:00-10:20, Paper TuAT3.4<br />
Quasi-Dense Wide Baseline Matching for Three Views<br />
Koskenkorva, Pekka, Univ. of Oulu<br />
Kannala, Juho, Univ. of Oulu<br />
Brandt, Sami Sebastian, Univ. of Oulu<br />
This paper proposes a method for computing a quasi-dense set of matching points between three views of a scene. The<br />
method takes a sparse set of seed matches between pairs of views as input and then propagates the seeds to neighboring<br />
regions. The proposed method is based on the best-first match propagation strategy, which is here extended from two-view<br />
matching to the case of three views. The results show that utilizing the three-view constraint during the correspondence<br />
growing improves the accuracy of matching and reduces the occurrence of outliers. In particular, compared with<br />
two-view stereo, our method is more robust for repeating texture. Since the proposed approach is able to produce high<br />
quality depth maps from only three images, it could be used in multi-view stereo systems that fuse depth maps from multiple<br />
views.<br />
10:20-10:40, Paper TuAT3.5<br />
Robust Shape from Polarisation and Shading<br />
Huynh, Cong Phuoc, Australian National Univ.<br />
Robles-Kelly, Antonio, National ICT Australia<br />
Hancock, Edwin, Univ. of York<br />
In this paper, we present an approach to robust estimation of shape from single-view multi-spectral polarisation images.<br />
The developed technique tackles the problem of recovering the azimuth angle of surface normals robust to image noise<br />
and a low degree of polarisation. We note that the linear least-squares estimation results in a considerable phase shift from<br />
the ground truth in the presence of noise and weak polarisation in multispectral and hyperspectral imaging. This paper<br />
discusses the utility of robust statistics to discount the large error attributed to outliers and noise. Combining this approach<br />
with Shape from Shading, we fully recover the surface shape. We demonstrate the effectiveness of the robust estimator<br />
compared to the linear least-squares estimator through shape recovery experiments on both synthetic and real images.<br />
TuAT4 Dolmabahçe Hall A<br />
Signal Separation and Classification Regular Session<br />
Session chair: Erzin, Engin (Koc Univ.)<br />
09:00-09:20, Paper TuAT4.1<br />
Classifying Three-Way Seismic Volcanic Data by Dissimilarity Representation<br />
Porro, Diana, Advanced Tech. Application Center<br />
Duin, Robert, TU Delft<br />
Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales<br />
Talavera, Isneri, Advanced Tech. Application Center<br />
Londoño-Bonilla, John Makario, Inst. Colombiano de Geología y Minería<br />
Multi-way data analysis is a multivariate data analysis technique with wide application in several fields. Nevertheless,<br />
the development of classification tools for this type of representation is still incipient. In this paper we study the dissimilarity<br />
representation for the classification of three-way data, as dissimilarities allow the representation of multi-dimensional objects<br />
in a natural way. As an example, the classification of seismic volcanic events is used. It is shown that, in this application,<br />
dissimilarity-based classification on 2D spectrograms performs better than on 1D spectral features.<br />
09:20-09:40, Paper TuAT4.2<br />
Improved Blur Insensitivity for Decorrelated Local Phase Quantization<br />
Heikkilä, Janne, Univ. of Oulu<br />
Ojansivu, Ville, Univ. of Oulu<br />
Rahtu, Esa, Univ. of Oulu<br />
This paper presents a novel blur-tolerant decorrelation scheme for the local phase quantization (LPQ) texture descriptor. As opposed<br />
to previous methods, the introduced model can be applied with virtually any kind of blur regardless of the point spread<br />
function. The new technique also takes into account the changes in the image characteristics originating from the blur<br />
itself. The implementation does not suffer from multiple solutions like the decorrelation in the original LPQ, but still retains the<br />
same run-time computational complexity. The texture classification experiments illustrate considerable improvements in<br />
the performance of LPQ descriptors in the case of blurred images and show only negligible loss of accuracy with sharp<br />
images.<br />
09:40-10:00, Paper TuAT4.3<br />
Ensemble Discriminant Sparse Projections Applied to Music Genre Classification<br />
Kotropoulos, Constantine, Aristotle Univ. of Thessaloniki<br />
Arce, Gonzalo, Univ. of Delaware<br />
Panagakis, Yannis, Aristotle Univ. of Thessaloniki<br />
Resorting to the rich, psycho-physiologically grounded, properties of the slow temporal modulations of music recordings,<br />
a novel classifier ensemble is built, which applies discriminant sparse projections. More specifically, overcomplete dictionaries<br />
are learned and sparse coefficient vectors are extracted to optimally approximate the slow temporal modulations<br />
of the training music recordings. The sparse coefficient vectors are then projected to the principal subspaces of their within-class<br />
and between-class covariance matrices. Decisions are taken with respect to the minimum Euclidean distance from<br />
the class mean sparse coefficient vectors, which undergo the aforementioned projections. The application of majority<br />
voting to the decisions taken by 10 individual classifiers, which are trained on the 10 training folds defined by stratified<br />
10-fold cross-validation on the GTZAN dataset, yields a music genre classification accuracy of 84.96% on average. The<br />
latter exceeds by 2.46% the highest accuracy previously reported without employing any sparse representations.<br />
10:00-10:20, Paper TuAT4.4<br />
Single Channel Speech Separation using Source-Filter Representation<br />
Stark, Michael, Graz Univ. of Tech.<br />
Wohlmayr, Michael, Graz Univ. of Tech.<br />
Pernkopf, Franz, Graz Univ. of Tech.<br />
We propose a fully probabilistic model for source-filter based single channel source separation. In particular, we perform<br />
separation in a sequential manner, where we estimate the source-driven aspects by a factorial HMM used for multi-pitch<br />
estimation. Afterwards, these pitch tracks are combined with the vocal tract filter model to form an utterance dependent<br />
model. Additionally, we introduce a gain estimation approach to enable adaptation to arbitrary mixing levels in the speech<br />
mixtures. We thoroughly evaluate this system and finally end up with a speaker-independent model.<br />
10:20-10:40, Paper TuAT4.5<br />
Nonlinear Blind Source Separation using Slow Feature Analysis with Random Features<br />
Ma, Kuijun, Chinese Acad. of Sciences<br />
Tao, Qing, Chinese Acad. of Sciences<br />
Wang, Jue, Chinese Acad. of Sciences<br />
We develop an algorithm, RSFA, to perform nonlinear blind source separation with temporal constraints. The algorithm is<br />
based on slow feature analysis using random Fourier features for shift-invariant kernels, followed by a selection procedure<br />
to obtain the sought-after signals. This method not only obtains remarkable results in a short computing time, but also handles<br />
situations with multiple types of mixtures excellently. In kernel methods, since the problem is unsupervised,<br />
the need for multiple kernels is ubiquitous. Experiments on music excerpts illustrate the strong performance of our<br />
method.<br />
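The random Fourier features used by RSFA follow the standard Rahimi-Recht construction for shift-invariant kernels; a sketch for the Gaussian kernel (the bandwidth parameterization is an assumption):

```python
import numpy as np

def random_fourier_features(X, n_features=500, gamma=1.0, seed=0):
    """Map X (n x d) to z(X) so that z(x) @ z(y) approximates the Gaussian
    kernel exp(-gamma * ||x - y||^2) for a shift-invariant kernel method."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # frequencies sampled from the kernel's spectral density
    W = rng.normal(0.0, np.sqrt(2 * gamma), (d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

Slow feature analysis can then run linearly on z(X), avoiding the cubic cost of full kernel matrices, which is where the short computing time mentioned above comes from.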
TuAT5 Anadolu Auditorium<br />
Image Analysis – III Regular Session<br />
Session chair: Kittler, Josef (Univ. of Surrey)<br />
09:00-09:20, Paper TuAT5.1<br />
Canonical Image Selection by Visual Context Learning<br />
Zhou, Wengang, Univ. of Science and Tech. of China<br />
Lu, Yijuan, Texas State Univ. at San Marcos<br />
Li, Houqiang, Univ. of Science and Tech. of China<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Canonical image selection aims to select a subset of photos that best summarizes a photo collection. In this paper, we define<br />
canonical images as those that contain the most important and distinctive visual words. We propose to use visual context<br />
learning to discover visual word significance and develop a Weighted Set Coverage algorithm to select canonical images<br />
containing distinctive visual words. Experiments with web image datasets demonstrate that the canonical images selected<br />
by our approach not only represent the collected photos, but also exhibit a diverse set of views with minimal<br />
redundancy.<br />
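The Weighted Set Coverage selection can be illustrated with a standard greedy maximum-coverage loop (a sketch that assumes each image is represented as a set of visual-word ids with learned significance weights; not necessarily the authors' exact algorithm):

```python
def select_canonical(images, weights, k):
    """Greedy weighted set coverage: repeatedly pick the image whose
    visual words add the most uncovered significance weight.

    images:  dict image name -> set of visual-word ids
    weights: dict visual-word id -> significance weight
    """
    covered, chosen = set(), []
    for _ in range(k):
        best, best_gain = None, 0.0
        for name, words in images.items():
            if name in chosen:
                continue
            gain = sum(weights[w] for w in words - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:          # nothing left adds new weight
            break
        chosen.append(best)
        covered |= images[best]
    return chosen
```

Because each round only rewards uncovered words, the selected subset is diverse with minimal redundancy, matching the behavior described above.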
09:20-09:40, Paper TuAT5.2<br />
Exposing Digital Image Forgeries by using Canonical Correlation Analysis<br />
Zhang, Chi, Beijing Univ. of Tech.<br />
Zhang, Hongbin, Beijing Univ. of Tech.<br />
In this paper, we propose a new method to detect forgeries in digital images by using photo-response non-uniformity<br />
(PRNU) noise features. The method utilizes canonical correlation analysis (CCA) to measure the linear correlation<br />
between two sets of PRNU noise estimates from images taken by the same camera. The linear correlation<br />
maximizes the correlation between the noise reference pattern (or PRNU noise estimate) and PRNU noise features from<br />
the same camera. To further improve the detection accuracy, the difference of variance between an image region and<br />
its smoothed version is used to categorize the image region as heavily textured or non-heavily textured.<br />
For either region class, a Neyman-Pearson decision is used to calculate<br />
the corresponding threshold and obtain the final detection result.<br />
09:40-10:00, Paper TuAT5.3<br />
Adding Affine Invariant Geometric Constraint for Partial-Duplicate Image Retrieval<br />
Wu, Zhipeng, Chinese Acad. of Sciences<br />
Xu, Qianqian, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Cui, Peng, Chinese Acad. of Sciences<br />
Li, Liang, Chinese Acad. of Sciences<br />
The proliferation of large numbers of partial-duplicate images on the internet brings a new challenge to image retrieval<br />
systems. Rather than taking the image as a whole, researchers bundle the local visual words detected by MSER into groups<br />
and add a simple relative-ordering geometric constraint to the bundles. Experiments show that bundled features are<br />
much more discriminative than single features. However, this weak geometric constraint is only applicable when there is<br />
no significant rotation between duplicate images, and it cannot handle image flips or large rotation<br />
transformations. In this paper, we improve the bundled features with an affine invariant geometric constraint. It employs<br />
the area-ratio invariance property of affine transformations to build an affine invariant matrix for bundled visual words. Such<br />
an affine invariant geometric constraint can cope well with flips, rotations and other transformations. Experimental results on<br />
the internet partial-duplicate image database verify the improvement it brings to the original bundled-features approach. Since<br />
currently there is no available public corpus for partial-duplicate image retrieval, we also publish our dataset for future<br />
studies.<br />
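The property the constraint builds on is that an affine map x ↦ Ax + t scales every area by |det A|, so ratios of areas are affine invariant; a quick numerical check (illustrative only):

```python
import numpy as np

def tri_area(p, q, r):
    """Signed area of the 2D triangle pqr."""
    u, v = q - p, r - p
    return 0.5 * (u[0] * v[1] - u[1] * v[0])

# two triangles in the same image plane
pts = [np.array(v, float) for v in [(0, 0), (4, 0), (0, 3), (1, 1), (5, 2), (2, 6)]]
A = np.array([[2.0, 1.0], [0.5, 3.0]])   # arbitrary affine map (det != 0)
t = np.array([7.0, -2.0])
mapped = [A @ p + t for p in pts]

ratio_before = tri_area(*pts[:3]) / tri_area(*pts[3:])
ratio_after = tri_area(*mapped[:3]) / tri_area(*mapped[3:])
# both areas scale by det(A) = 5.5, so the ratio is unchanged
```

This is why a matrix of area ratios over bundled visual words survives flips, rotations and general affine transformations.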
10:00-10:20, Paper TuAT5.4<br />
Outlier-Resistant Dissimilarity Measure for Feature-Based Image Matching<br />
Palenichka, Roman, Univ. of Quebec<br />
Lakhssassi, Ahmed, Univ. of Quebec<br />
Zaremba, Marek, Univ. of Quebec<br />
A novel dissimilarity measure is proposed to perform correspondence image matching for object recognition, image registration<br />
and content-based image retrieval. This is feature-based matching, which assumes an image representation (object<br />
description) in the form of a set of multi-location descriptor vectors. The proposed measure, called the intersection matching<br />
distance, eliminates outliers (false or missing feature points) while matching two sets of descriptor<br />
vectors in a transformation-invariant way. A block-subdivision algorithm for time-efficient image matching is also described.<br />
10:20-10:40, Paper TuAT5.5<br />
The University of Surrey Visual Concept Detection System at ImageCLEF@ICPR: Working Notes<br />
Tahir, Muhammad Atif, Univ. of Surrey<br />
Fei, Yan, Univ. of Surrey<br />
Barnard, Mark, Univ. of Surrey<br />
Awais, Muhammad, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
Kittler, Josef, Univ. of Surrey<br />
Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system<br />
in the ImageCLEF@ICPR Visual Concept Detection Task, which ranked first for large-scale visual concept detection<br />
tasks in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of the hierarchical<br />
measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering,<br />
structured scene or object representation via localised histograms of vector codes, similarity measure for kernel construction<br />
and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis<br />
with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we<br />
obtain the best performance of all 12 submissions to this task.<br />
TuAT6 Dolmabahçe Hall B<br />
Texture Regular Session<br />
Session chair: Theodoridis, Sergios (Univ. of Athens)<br />
09:00-09:20, Paper TuAT6.1<br />
On Adapting Pixel-Based Classification to Unsupervised Texture Segmentation<br />
Melendez, Jaime, Rovira I Virgili Univ.<br />
Puig, Domenec, Univ. Rovira I Virgili<br />
Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />
An inherent problem of unsupervised texture segmentation is the absence of previous knowledge regarding the texture<br />
patterns present in the images to be segmented. A new efficient methodology for unsupervised image segmentation based<br />
on texture is proposed. It takes advantage of a supervised pixel-based texture classifier trained with feature vectors associated<br />
with a set of texture patterns initially extracted through a clustering algorithm. Therefore, the final segmentation is<br />
achieved by classifying each image pixel into one of the patterns obtained from the previous clustering process. Multi-sized<br />
evaluation windows following a top-down approach are applied during pixel classification in order to improve accuracy.<br />
The proposed technique has been experimentally validated on MeasTex, VisTex and Brodatz compositions, as well<br />
as on complex ground and aerial outdoor images. Comparisons with state-of-the-art unsupervised texture segmenters are<br />
also provided.<br />
09:20-09:40, Paper TuAT6.2<br />
Natural Material Recognition with Illumination Invariant Textural Features<br />
Vacha, Pavel, Inst. of Information Theory and Automation<br />
Haindl, Michael, Inst. of Information Theory and Automation<br />
The visual appearance of natural materials fundamentally depends on illumination conditions, which significantly complicates<br />
real scene analysis. We propose textural features based on fast Markovian statistics, which are simultaneously invariant<br />
to illumination colour and robust to illumination direction. No knowledge of illumination conditions is required, and<br />
recognition is possible from a single training image per material. Material recognition is tested on the currently most realistic<br />
visual representation – the Bidirectional Texture Function (BTF) – using the Amsterdam Library of Textures (ALOT), which<br />
contains 250 natural materials acquired under different illumination conditions. Our proposed features significantly outperform<br />
several leading alternatives including Local Binary Patterns (LBP, LBP-HF) and Gabor features.<br />
09:40-10:00, Paper TuAT6.3<br />
Gaze-Motivated Compression of Illumination and View Dependent Textures<br />
Filip, Jiri, Inst. of Information Theory and Automation of the AS CR<br />
Haindl, Michael, Inst. of Information Theory and Automation<br />
Chantler, Michael J., Heriot-Watt Univ.<br />
Illumination and view dependent textures provide ample information on the appearance of real materials at the cost of enormous<br />
data storage requirements. Hence, past research focused mainly on compression and modelling of these data; however,<br />
few papers have explicitly addressed the way in which humans perceive the compressed data. We analyzed human<br />
gaze information to determine appropriate texture statistics. These statistics were then exploited in a pilot illumination<br />
and view direction dependent data compression algorithm. Our results showed that taking into account local texture variance<br />
can increase compression of current methods more than twofold, while preserving original realistic appearance and<br />
allowing fast data reconstruction.<br />
10:00-10:20, Paper TuAT6.4<br />
Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets<br />
Alvarez, Susana, Univ. Rovira I Virgili<br />
Salvetella, Anna, Univ. Autònoma de Barcelona<br />
Vanrell, Maria, Univ. Autònoma de Barcelona<br />
Otazu, Xavier, Univ. Autònoma de Barcelona<br />
Color and texture are visual cues of different natures, and their integration in a useful visual descriptor is not an obvious step.<br />
One way to combine both features is to compute texture descriptors independently on each color channel. A second way<br />
is to integrate the features at the descriptor level, in which case the problem of normalizing both cues arises. Significant progress<br />
in object recognition in recent years has provided the bag-of-words framework, which again deals with the problem of<br />
feature combination through the definition of vocabularies of visual words. Inspired by this framework, here we present<br />
perceptual textons that allow fusing color and texture at the level of p-blobs, which is our feature detection step.<br />
Feature representation is based on two uniform spaces representing the attributes of the p-blobs. The low dimensionality<br />
of these texton spaces allows us to bypass the usual problems of previous approaches: firstly, there is no need for normalization<br />
between cues; and secondly, vocabularies are directly obtained from the perceptual properties of the texton spaces without<br />
any learning step. Our proposal improves the current state of the art in color-texture descriptors in an image retrieval experiment<br />
over a highly diverse texture dataset from Corel.<br />
10:20-10:40, Paper TuAT6.5<br />
Illumination Estimation of 3D Surface Texture based on Active Basis<br />
Dong, Junyu, Ocean Univ. of China<br />
Su, Liyuan, Ocean Univ. of China<br />
Duan, Yuanxu, Alcatel-Lucent R&D<br />
This paper describes an approach to estimate illumination directions of 3D surface texture based on Active Basis. Instead<br />
of applying Gabor wavelet transform to extract texture features, we represent our texture features with a simple Haar<br />
feature to improve efficiency. The Active Basis model can be learned from training image patches by the shared pursuit<br />
algorithm. A base histogram can then be obtained from each model. We measure the illumination direction by minimizing<br />
the Euclidean distance and the entropy difference of base histograms between the test image and the training sets.<br />
Experimental results demonstrate the effectiveness and accuracy of the proposed approach.<br />
TuAT7 Dolmabahçe Hall C<br />
Security and Privacy Regular Session<br />
Session chair: Veldhuis, Raymond (Univ of Twente)<br />
09:00-09:20, Paper TuAT7.1<br />
Binary Discriminant Analysis for Face Template Protection<br />
Feng, Y C, Hong Kong Baptist Univ.<br />
Yuen, Pong C, Hong Kong Baptist Univ.<br />
A biometric cryptosystem (BC) is a very secure approach for template protection because the stored template is encrypted.<br />
The key issues in the BC approach include (i) limited capability in handling intra-class variations and (ii) the requirement of binary input.<br />
To overcome these problems, this paper adopts the concept of discriminant analysis and develops a new binary<br />
discriminant analysis (BDA) method to convert a real-valued template into a binary template. Experimental results on CMU-<br />
PIE and FRGC face databases show that the proposed BDA method outperforms existing template binarization schemes.<br />
09:20-09:40, Paper TuAT7.2<br />
Renewable Minutiae Templates with Tunable Size and Security<br />
Yang, Bian, Gjovik Univ. Coll.<br />
Busch, Christoph, Gjovik Univ. Coll.<br />
Gafurov, Davrondzhon, Gjovik Univ. Coll.<br />
Bours, Patrick, Gjovik Univ. Coll.<br />
A renewable fingerprint minutiae template generation scheme is proposed that utilizes random projection for template diversification<br />
in a security-enhanced way. The scheme first achieves absolute pre-alignment over local minutiae quadruplets<br />
in the original template, resulting in a fixed-length feature vector; it then encrypts the feature vector by projecting it onto<br />
multiple random matrices and quantizing the projected result; and finally it post-processes the resultant binary vector in a size-<br />
and security-tunable way to obtain the final protected minutia vicinity. Experiments on the fingerprint database<br />
FVC2002DB2_A demonstrate the desirable biometric performance of the proposed scheme.<br />
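The core of such a scheme, projecting a fixed-length feature vector onto key-seeded random matrices and quantizing the result, can be sketched as follows (the parameter values and the sign-based binarization rule are assumptions, not the paper's):

```python
import numpy as np

def protect_template(feature_vec, user_key, n_projections=8, out_bits=32):
    """Renewable-template sketch: project a fixed-length feature vector onto
    key-seeded random matrices and binarize (quantize) each projection.
    Re-issuing a compromised template amounts to choosing a new user_key."""
    rng = np.random.default_rng(user_key)    # key drives the diversification
    bits = []
    for _ in range(n_projections):
        R = rng.standard_normal((out_bits, len(feature_vec)))
        bits.append((R @ feature_vec > 0).astype(np.uint8))
    return np.concatenate(bits)
```

The same feature vector with the same key always yields the same binary template, while a new key yields an unrelated one, which is the renewability property discussed above.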
09:40-10:00, Paper TuAT7.3<br />
Tokenless Cancelable Biometrics Scheme for Protecting IrisCodes<br />
Ouda, Osama, Chiba Univ.<br />
Tsumura, Norimichi, Chiba Univ.<br />
Nakaguchi, Toshiya, Chiba Univ.<br />
In order to satisfy the requirements of the cancelable biometrics construct, cancelable biometrics techniques rely on other<br />
authentication factors, such as password keys and/or user-specific tokens, in the transformation process. However, such<br />
multi-factor authentication techniques suffer from the same issues as traditional knowledge-based and token-based<br />
authentication systems. This paper presents a new one-factor cancelable biometrics scheme for protecting IrisCodes.<br />
The proposed method is based solely on Iris Codes; however, it satisfies the requirements of revocability, diversity and<br />
noninvertibility without deteriorating the recognition performance. Moreover, the transformation process is easy to implement<br />
and can be integrated simply with current iris matching systems. The impact of the proposed transformation<br />
process on the recognition accuracy is discussed and its noninvertibility is analyzed. The effectiveness of the proposed<br />
method is confirmed experimentally using CASIA-IrisV3-Interval dataset.<br />
10:00-10:20, Paper TuAT7.4<br />
A Novel Fingerprint Template Protection Scheme based on Distance Projection Coding<br />
Wang, Ruifang, Chinese Acad. of Sciences<br />
Yang, Xin, Chinese Acad. of Sciences<br />
Liu, Xia, Harbin University of Science and Technology<br />
Zhou, Sujing, Chinese Acad. of Sciences<br />
Li, Peng, Chinese Acad. of Sciences<br />
Cao, Kai, Chinese Acad. of Sciences<br />
Tian, Jie, Chinese Acad. of Sciences<br />
The biometric template, which is stored in the form of raw data, has become the greatest potential threat to the security of<br />
biometric authentication systems. As the compromise of biometric data is permanent, the protection of biometric data<br />
is particularly important. Consequently, biometric template protection technologies have recently attracted considerable research attention.<br />
One of the most popular template protection methods is the biometric cryptosystem method. In this paper, we design<br />
a codebook named distance projection for biometric coding to generate a secured biometric template, and propose a novel<br />
fingerprint biometric cryptosystem scheme based on the codebook. Experimental results on FVC2002 DB2 show that the<br />
proposed scheme obtains positive results on both security and authentication accuracy.<br />
10:20-10:40, Paper TuAT7.5<br />
Combination of Symmetric Hash Functions for Secure Fingerprint Matching<br />
Kumar, Gaurav, State Univ. of New York at Buffalo<br />
Tulyakov, Sergey, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Fingerprint based secure biometric authentication systems have received considerable research attention lately, where the<br />
major goal is to provide an anonymous, multipliable and easily revocable methodology for fingerprint verification. In our<br />
previous work, we have shown that symmetric hash functions are very effective in providing such secure fingerprint representation<br />
and matching since they are independent of order of minutiae triplets as well as location of singular points<br />
(e.g. core and delta). In this paper, we extend our prior work by generating a combination of symmetric hash functions,<br />
which increases the security of fingerprint matching by an exponential factor. Firstly, we extract k-plets from each fingerprint<br />
image and generate a unique key for combining multiple hash functions up to an order of (k-1). Each of these keys is generated<br />
using the features extracted from minutiae k-plets such as bin index of smallest angles in each k-plet. This combination<br />
provides extra security in the face of brute-force attacks, where the compromise of a few hash functions<br />
does not compromise the overall matching. Our experimental results suggest that the EER obtained using the combination<br />
of hash functions (4.98%) is comparable with the baseline system (3.0%), with the added advantage of being more secure.<br />
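The order-invariance that symmetric hash functions provide can be illustrated with elementary power sums over minutiae treated as complex numbers; this is a hedged sketch of the principle, not the authors' actual hash construction:<br />

```python
# Illustrative sketch: order-invariant hashing of a minutiae triplet via
# symmetric power sums (hypothetical functions, not the paper's exact ones).
def symmetric_hashes(points):
    """Map minutiae (x, y) pairs, viewed as complex numbers x + iy, to
    values independent of the order in which the points are listed."""
    zs = [complex(x, y) for x, y in points]
    h1 = sum(zs)                  # power sum of order 1
    h2 = sum(z * z for z in zs)   # power sum of order 2
    h3 = sum(z ** 3 for z in zs)  # power sum of order 3
    return h1, h2, h3

# Any permutation of the triplet yields identical hash values.
a = symmetric_hashes([(1, 2), (3, 4), (5, 6)])
b = symmetric_hashes([(5, 6), (1, 2), (3, 4)])
assert a == b
```

Because each power sum is symmetric in its arguments, the hash carries no information about minutiae ordering, which is what makes the representation independent of how the triplet is enumerated.<br />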
TuAT8 Lower Foyer<br />
Structural Methods and Speech/Image Analysis Poster Session<br />
Session chair: Aguiar, Pedro M. Q. (Institute for Systems and Robotics / Instituto Superior Tecnico)<br />
09:00-11:10, Paper TuAT8.1<br />
Face Recognition based on Illumination Adaptive LDA<br />
Liu, Zhonghua, Nanjing Univ. of Science and Tech.<br />
Zhou, Jingbo, Nanjing Univ. of Science and Tech.<br />
Jin, Zhong, Nanjing Univ. of Science and Tech.<br />
The variation of facial appearance due to illumination degrades face recognition systems considerably, and is well<br />
known as one of the bottlenecks in face recognition. However, the variations of each subject that are due to changes<br />
of illumination are extremely similar to each other. By collecting offline many face classes, each with many images<br />
under different lighting conditions, a common within-class scatter matrix describing the within-class illumination variations<br />
of all the face classes can be obtained. Based on this, illumination adaptive linear discriminant analysis (IALDA) is proposed<br />
to solve illumination variation problems in face recognition when each face class has only one training sample under the<br />
standard lighting conditions. In the IALDA method, the illumination direction of an input face image is firstly estimated.<br />
Then the corresponding LDA feature, which is robust to the variations between the images under the estimated lighting<br />
conditions and the standard lighting conditions, is extracted. Experiments on the face databases demonstrate the effectiveness<br />
of the proposed method.<br />
09:00-11:10, Paper TuAT8.2<br />
Topological Dynamic Bayesian Networks<br />
Bouchaffra, Djamel, Grambling State Univ.<br />
The objective of this research is to embed topology within the dynamic Bayesian network (DBN) formalism. This extension<br />
of a DBN (that encodes statistical or causal relationships) to a topological DBN (TDBN) allows continuous mappings<br />
(e.g., topological homeomorphisms), topological relations (e.g., homotopy equivalences) and invariance properties (e.g.,<br />
surface genus, compactness) to be exploited. The mission of a TDBN is not limited to classifying objects but extends to revealing<br />
how these objects are topologically related. Because the TDBN formalism uses geometric constructors that project a<br />
discrete space onto a continuous space, it is well suited to identify objects that undergo smooth deformation. Experimental<br />
results in face identification across ages represent conclusive evidence that the fusion of statistics and topology embodied<br />
by the TDBN concept holds promise. The TDBN formalism outperformed the DBN approach in facial identification across<br />
ages.<br />
09:00-11:10, Paper TuAT8.3<br />
Vector Space Embedding of Undirected Graphs with Fixed-Cardinality Vertex Sequences for Classification<br />
Richiardi, Jonas, Ec. Pol. Fédérale de Lausanne<br />
Van De Ville, Dimitri, Ec. Pol. Fédérale de Lausanne<br />
Riesen, Kaspar, Univ. of Bern<br />
Bunke, Horst, Univ. of Bern<br />
Simple weighted undirected graphs with a fixed number of vertices and fixed vertex orderings can be used to represent<br />
data and patterns in a wide variety of scientific and engineering domains. Classification of such graphs by existing graph<br />
matching methods performs rather poorly because these methods do not exploit this specificity. As an alternative, methods relying<br />
on vector-space embedding hold promise. We propose two such techniques that can be deployed as a front-end<br />
for any pattern recognition classifiers: one has low computational cost but generates high-dimensional spaces, while<br />
the other is more computationally demanding but can yield relatively low-dimensional vector space representations. We<br />
show experimental results on an fMRI brain state decoding task and discuss the shortfalls of graph edit distance for the<br />
type of graph under consideration.<br />
09:00-11:10, Paper TuAT8.4<br />
Hierarchical Large Margin Nearest Neighbor Classification<br />
Chen, Qiaona, East China Normal Univ.<br />
Sun, Shiliang, East China Normal Univ.<br />
Distance metric learning has exhibited its great power to enhance performance in metric related pattern recognition tasks.<br />
The recent large margin nearest neighbor classification (LMNN) improves the performance of k-nearest neighbor classification<br />
by learning a global distance metric. However, it does not consider the locality of data distributions, which is<br />
crucial in determining a proper metric. In this paper, we propose a novel local distance metric learning method called hierarchical<br />
LMNN (HLMNN), which first builds a hierarchical structure by grouping data points according to overlapping<br />
ratios that we define, and then learns distance metrics sequentially. Experimental results on real-world data sets, including<br />
comparisons with the traditional k-nearest neighbor and the state-of-the-art LMNN show the effectiveness of the proposed<br />
HLMNN.<br />
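As background, the setting that LMNN and HLMNN operate in is k-nearest neighbor classification under a learned linear metric; the sketch below uses a fixed matrix L as a stand-in for a learned one, and toy data of our own:<br />

```python
import numpy as np

# Sketch: k-NN classification with a Mahalanobis-style distance ||L(a - b)||.
# In (H)LMNN the matrix L is learned from data; here it is a fixed example.
def knn_predict(X_train, y_train, x, L, k=3):
    """Classify x by majority vote among its k nearest training points."""
    d = np.linalg.norm((X_train - x) @ L.T, axis=1)  # distances under L
    nearest = np.argsort(d)[:k]                      # k closest indices
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
L = np.eye(2)  # identity metric = plain Euclidean k-NN
print(knn_predict(X, y, np.array([0.05, 0.0]), L, k=3))
```

Replacing the identity with a learned L is what metric learning contributes: the same voting rule, but with distances reshaped to pull same-class neighbors together.<br />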
09:00-11:10, Paper TuAT8.5<br />
Adapting Information Theoretic Clustering to Binary Images<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
We consider the problem of finding points of interest along local curves of binary images. Information theoretic vector<br />
quantization is a clustering algorithm that shifts cluster centers towards the modes of principal curves of a data set. Its<br />
runtime characteristics, however, do not allow for efficient processing of many data points. In this paper, we show how to<br />
solve this problem when dealing with data on a 2D lattice. Borrowing concepts from signal processing, we adapt information<br />
theoretic clustering to the quantization of binary images and gain significant speedup.<br />
09:00-11:10, Paper TuAT8.6<br />
Nearest-Manifold Classification with Gaussian Processes<br />
Jun, Goo, Univ. of Texas at Austin<br />
Ghosh, Joydeep, Univ. of Texas<br />
Manifold models for nonlinear dimensionality reduction provide useful low-dimensional representations of high-dimensional<br />
data. Most manifold models are unsupervised algorithms and map the entire data onto a single manifold. Heterogeneous<br />
data with multiple classes are often better modeled by multiple manifolds rather than by a single global manifold,<br />
but there is no explicit way to compare instances embedded in different subspaces. We propose a novel low-to-high dimensional<br />
mapping using Gaussian processes that offers comparisons in the original space. Based on the mapping, we<br />
propose a nearest-manifold classification algorithm for high-dimensional data. Experimental results show that the proposed<br />
algorithm provides good classification accuracies for problems well-modeled by multiple manifolds.<br />
09:00-11:10, Paper TuAT8.7<br />
Mining Exemplars for Object Modelling using Affinity Propagation<br />
Xia, Shengping, Univ. of York<br />
Liu, Jianjun, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
This paper focuses on the problem of locating object class exemplars from a large corpus of images using affinity propagation.<br />
We use attributed relational graphs to represent groups of local invariant features together with their spatial arrangement.<br />
Rather than mining exemplars from the entire graph corpus, we prefer to cluster object specific exemplars. Firstly,<br />
we obtain an object specific cluster of graphs using similarity propagation. The popular affinity propagation method is then<br />
individually applied to each object specific cluster. Using this clustering method, we can obtain object specific exemplars<br />
together with a high precision for the data associated with each exemplar. Experiments are performed on over 80K images<br />
spanning 500 objects, and demonstrate the efficiency and scalability of the method.<br />
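Affinity propagation's defining feature, selecting exemplars from among the data points themselves, can be seen on a toy example with scikit-learn (illustrative data, unrelated to the paper's image corpus):<br />

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, -0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])

# Affinity propagation exchanges "responsibility" and "availability"
# messages until a set of exemplars emerges; no cluster count is fixed.
ap = AffinityPropagation(random_state=0).fit(X)
print(ap.cluster_centers_indices_)  # indices of the chosen exemplars
print(ap.labels_)                   # exemplar assignment of every point
```

Each cluster center is an actual data point, which is exactly the property exploited here for mining representative exemplars.<br />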
09:00-11:10, Paper TuAT8.8<br />
Background Filtering for Improving of Object Detection in Images<br />
Qin, Ge, Univ. of Surrey<br />
Vrusias, Bogdan, Univ. of Surrey<br />
Gillam, Lee, Univ. of Surrey<br />
We propose a method for improving object recognition in street scene images by identifying and filtering out background<br />
aspects. We analyse the semantic relationships between foreground and background objects and use the information obtained<br />
to remove areas of the image that are misclassified as foreground objects. We show that such background filtering<br />
improves the performance of four traditional object recognition methods by over 40%. Our method is independent of the<br />
recognition algorithms used for individual objects, and can be extended to generic object recognition in other environments<br />
by adapting other object models.<br />
09:00-11:10, Paper TuAT8.9<br />
Sparse Local Discriminant Projections for Feature Extraction<br />
Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />
Jin, Zhong, Nanjing Univ. of Science and Tech.<br />
Yang, Jian, Nanjing Univ. of Science and Tech.<br />
Wong, W.K., The Hong Kong Pol. Univ.<br />
One of the major disadvantages of linear dimensionality reduction algorithms, such as Principal Component Analysis<br />
(PCA) and Linear Discriminant Analysis (LDA), is that the projections are linear combinations of all the original features<br />
or variables, and all weights in the linear combination, known as loadings, are typically non-zero. Thus, they lack physical<br />
interpretation in many applications. In this paper, we propose a novel supervised learning method called Sparse Local<br />
Discriminant Projections (SLDP) for linear dimensionality reduction. SLDP introduces a sparse constraint into the objective<br />
function and obtains a set of sparse projective axes with a direct physical interpretation. The sparse projections can be efficiently<br />
computed by the elastic net combined with spectral analysis. The experimental results show that SLDP gives<br />
an explicit interpretation of its projections and achieves competitive performance compared with other dimensionality<br />
reduction techniques.<br />
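The sparsity mechanism at work here, an elastic-net penalty driving many loadings exactly to zero, can be illustrated on toy regression data; this shows the penalty's effect only, not the SLDP objective itself:<br />

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data: the target depends on only 2 of 20 features, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] * 3.0 + X[:, 1] * -2.0 + rng.normal(scale=0.1, size=200)

# The L1 part of the elastic net zeroes out irrelevant loadings,
# unlike an ordinary least-squares fit where all weights are non-zero.
en = ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X, y)
print(np.count_nonzero(en.coef_))   # only a few non-zero loadings survive
```

The surviving non-zero coefficients point directly at the relevant input variables, which is the "physical interpretation" that sparse projections buy.<br />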
09:00-11:10, Paper TuAT8.10<br />
Information-Theoretic Feature Selection from Unattributed Graphs<br />
Bonev, Boyan, Univ. of Alicante<br />
Escolano, Francisco, Univ. of Alicante<br />
Giorgi, Daniela, National Res. Council<br />
Biasotti, Silvia, CNR – IMATI<br />
In this work we evaluate purely structural graph measures for 3D object classification. We extract spectral features from<br />
different Reeb graph representations. Information-theoretic feature selection gives an insight on which are the most relevant<br />
features.<br />
09:00-11:10, Paper TuAT8.11<br />
Head Pose Estimation based on Random Forests for Multiclass Classification<br />
Huang, Chen, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
Fang, Chi, Tsinghua Univ.<br />
Head pose estimation remains a unique challenge for computer vision systems due to identity variation, illumination<br />
changes, noise, etc. Previous statistical approaches such as PCA and linear discriminant analysis (LDA), and machine learning<br />
methods, including SVM and AdaBoost, struggle to achieve both accuracy and robustness. In this paper, we propose<br />
to use Gabor feature based random forests as the classification technique since they naturally handle such multi-class classification<br />
problem and are accurate and fast. The two sources of randomness, random inputs and random features, make<br />
random forests robust and able to deal with large feature spaces. In addition, we implement LDA as the node test to improve<br />
the discriminative power of individual trees in the forest, with each node generating either a constant or a variable number of<br />
child nodes. Experiments are carried out on two public databases to show that the proposed algorithm outperforms other<br />
approaches in both accuracy and computational efficiency.<br />
09:00-11:10, Paper TuAT8.12<br />
Differential Morphological Decomposition Segmentation: A Multi-Scale Object based Image Description<br />
Gueguen, Lionel, JRC – European Commission<br />
Soille, Pierre, Ec. Joint Res. Centre<br />
Pesaresi, Martino, Ec. Joint Res. Centre<br />
In order to describe and extract image information content, segmentation is a well-known approach to representing the information<br />
in terms of objects. Image segmentation is a common image processing technique aiming at disintegrating an<br />
image into a partition of its support. Hierarchical or fuzzy segmentations extend this definition in order<br />
to provide a covering of the image support with overlapping segments. In this paper, we propose a novel approach for<br />
breaking up an image into multi-scale overlapping objects. The image is decomposed by granulometry or differential morphological<br />
pyramid, resulting in a discrete scale-space representation. Then, the scale-space transform is segmented by a<br />
region based method. Projecting the obtained scale-space partition into space constitutes the disintegrated image representation,<br />
which enables a multi-scale object based image description.<br />
09:00-11:10, Paper TuAT8.13<br />
Efficient Learning to Label Images<br />
Jia, Ke, Australian National Univ. National ICT Australia<br />
Cheng, Li, NICTA<br />
Liu, Nianjun, NICTA<br />
Wang, Lei, The Australian National Univ.<br />
Conditional random field methods (CRFs) have gained popularity for image labeling tasks in recent years. In this paper,<br />
we describe an alternative discriminative approach, by extending the large margin principle to incorporate spatial correlations<br />
among neighboring pixels. In particular, by explicitly enforcing the submodular condition, graph-cuts is conveniently<br />
integrated as the inference engine to attain the optimal label assignment efficiently. Our approach allows learning<br />
a model with thousands of parameters, and is shown to be capable of readily incorporating higher-order scene context.<br />
Empirical studies on a variety of image datasets suggest that our approach performs competitively compared to state-of-the-art<br />
scene labeling methods.<br />
09:00-11:10, Paper TuAT8.14<br />
NAVIDOMASS: Structural-Based Approaches towards Handling Historical Documents<br />
Jouili, Salim, LORIA<br />
Coustaty, Mickaël, Univ. of La Rochelle<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Ogier, Jean-Marc, Univ. de la Rochelle<br />
In the context of the NAVIDOMASS project, this paper addresses the clustering of historical document<br />
images. We propose a structural framework to handle data-sets of ancient ornamental letters. The contribution<br />
consists, firstly, of examining the structural (i.e. graph) representation of the ornamental letters; secondly, graph matching<br />
is applied to the resulting graph-based representations. In addition, a comparison between the structural (graph)<br />
and statistical (generic Fourier descriptor) techniques is drawn.<br />
09:00-11:10, Paper TuAT8.15<br />
Median Graph Shift: A New Clustering Algorithm for Graph Domain<br />
Jouili, Salim, LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Lacroix, Vinciane, Royal Military Acad. Belgium<br />
In the context of unsupervised clustering, a new algorithm for the domain of graphs is introduced. In this paper, the key idea<br />
is to adapt the mean-shift clustering and its variants proposed for the domain of feature vectors to graph clustering. These algorithms<br />
have been applied successfully in image analysis and computer vision domains. The proposed algorithm works in<br />
an iterative manner by shifting each graph towards the median graph in a neighborhood. Both the set median graph and the<br />
generalized median graph are tested for the shifting procedure. In the experimental part, a set of cluster validation indices is<br />
used to evaluate our clustering algorithm, and a comparison with the well-known k-means algorithm is provided.<br />
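For reference, the vector-space mean-shift step that the algorithm transplants to the graph domain (replacing the kernel-weighted mean with a median graph) can be sketched as follows; the bandwidth and data are illustrative:<br />

```python
import numpy as np

# Sketch of one mean-shift iteration in vector space: move the query
# point to the Gaussian-kernel-weighted mean of the data. The paper's
# graph version shifts each graph towards a median graph instead.
def mean_shift_step(points, x, bandwidth=1.0):
    d2 = np.sum((points - x) ** 2, axis=1)        # squared distances
    w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian weights
    return (w[:, None] * points).sum(axis=0) / w.sum()

pts = np.array([[0.0, 0.0], [0.2, 0.0], [0.1, 0.1]])
x = np.array([1.0, 1.0])
for _ in range(20):                               # iterate to convergence
    x = mean_shift_step(pts, x)
print(np.round(x, 2))  # settles near the centre of the cluster
```

In the graph adaptation, distances come from graph (dis)similarity and the weighted mean has no direct analogue, which is why the set median or generalized median graph takes its place.<br />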
09:00-11:10, Paper TuAT8.16<br />
A Discrete Labelling Approach to Attributed Graph Matching using SIFT Features<br />
Sanroma, Gerard, Univ. Rovira I Virgili<br />
Alquezar, Rene, Univ. Pol. De Catalunya<br />
Serratosa, Francesc, Univ. Rovira I Virgili<br />
Local invariant feature extraction methods are widely used for image-features matching. There exist a number of approaches<br />
aimed at the refinement of the matches between image-features. It is a common strategy among these approaches<br />
to use geometrical criteria to reject a subset of outliers. One limitation of the outlier rejection design is that it is unable to<br />
add new useful matches. We present a new model that integrates the local information of the SIFT descriptors along with<br />
global geometrical information to estimate a new robust set of feature-matches. Our approach encodes the geometrical information<br />
by means of graph structures while posing the estimation of the feature-matches as a graph matching problem.<br />
Some comparative experimental results are presented.<br />
09:00-11:10, Paper TuAT8.17<br />
A Conductance Electrical Model for Representing and Matching Weighted Undirected Graphs<br />
Igelmo, Manuel, Univ. Pol. De Catalunya<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
Ferrer, Miquel, Univ. Pol. De Catalunya<br />
In this paper we propose a conductance electrical model to represent weighted undirected graphs that allows us to efficiently<br />
compute approximate graph isomorphism in large graphs. The model is built by transforming a graph into an electrical<br />
circuit. Edges in the graph become conductances in the electrical circuit. This model follows the laws of electrical<br />
circuit theory, and we can potentially use all the existing theory and tools of this field to derive other approximate techniques<br />
for graph matching. In the present work, we use the proposed circuit model to derive approximate graph isomorphism<br />
solutions.<br />
09:00-11:10, Paper TuAT8.18<br />
Computing the Barycenter Graph by Means of the Graph Edit Distance<br />
Bardaji, Itziar, Univ. Pol. De Catalunya<br />
Ferrer, Miquel, Univ. Pol. De Catalunya<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
The barycenter graph has been shown to be an alternative way to obtain the representative of a given set of graphs. In this paper<br />
we propose an extension of the original algorithm which makes use of the graph edit distance in conjunction with the<br />
weighted mean of a pair of graphs. Our main contribution is that we can apply the method to attributed graphs with any<br />
kind of labels in both the nodes and the edges, equipped with a distance function less constrained than in previous approaches.<br />
Experiments done on four different datasets support the validity of the method, giving good approximations of<br />
the barycenter graph.<br />
09:00-11:10, Paper TuAT8.19<br />
Refined Morphological Methods of Moment Computation<br />
Suk, Tomas, Inst. of Information Theory and Automation<br />
Flusser, Jan, Inst. of Information Theory and Automation<br />
A new method of moment computation based on decomposition of the object into rectangular blocks is presented. The decomposition<br />
is accomplished by means of the distance transform. The method is compared with earlier morphological methods,<br />
namely with erosion decomposition into squares. All the methods are also compared with direct computation from the definition.<br />
09:00-11:10, Paper TuAT8.20<br />
Robust Computation of the Polarisation Image<br />
Saman, Gule, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
In this paper we show how to make the computation of polarisation information from multiple polariser-angle images robust.<br />
We make two contributions. First, we show how to use M-estimators to make robust moment estimates of the mean<br />
intensity, polarisation and phase. Second, we show how directional statistics can be used to smooth the phase-angle, and<br />
to improve its estimation when the polarisation is small. We apply the resulting techniques to polariser images and perform<br />
surface quality inspection. Compared to polarisation information delivered by the three-point method, our estimates reveal<br />
finer surface detail.<br />
09:00-11:10, Paper TuAT8.21<br />
Fast Polar and Spherical Fourier Descriptors for Feature Extraction<br />
Yang, Zhuo, Waseda Univ.<br />
Kamata, Sei-Ichiro, Waseda Univ.<br />
Polar Fourier Descriptor (PFD) and Spherical Fourier Descriptor (SFD) are rotation-invariant feature descriptors for two<br />
dimensional (2D) and three dimensional (3D) image retrieval and pattern recognition tasks. They have been demonstrated to be<br />
superior to other methods in describing rotation-invariant features of 2D and 3D images. However, in<br />
order to increase the computation speed, a fast computation method is needed, especially for applications such as real-time systems<br />
and large image databases. This paper presents a fast computation method for PFD and SFD based on mathematical<br />
properties of trigonometric functions and associated Legendre polynomials. The proposed fast PFD and SFD are 8 and 16<br />
times faster than the traditional ones, significantly speeding up the computation process.<br />
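The rotation invariance that Fourier-based descriptors rely on comes from taking coefficient magnitudes: rotating a pattern circularly shifts its angular signature, which changes only the phase of the DFT coefficients. A minimal 1-D illustration of this principle (not the descriptor itself):<br />

```python
import numpy as np

# A periodic angular signature sampled on a uniform grid; rolling the
# samples is exactly a rotation of the underlying pattern.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
signature = np.cos(3 * theta) + 0.5 * np.sin(5 * theta)
rotated = np.roll(signature, 7)          # circular shift = rotation

# DFT magnitudes are unchanged by the shift, so they form a
# rotation-invariant descriptor.
d1 = np.abs(np.fft.fft(signature))
d2 = np.abs(np.fft.fft(rotated))
print(np.allclose(d1, d2))
```

PFD and SFD apply the same idea in polar and spherical coordinates, where the fast methods above exploit structure in the trigonometric and Legendre basis functions.<br />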
09:00-11:10, Paper TuAT8.22<br />
RBM-Based Silhouette Encoding for Human Action Modelling<br />
Marin-Jimenez, Manuel Jesus, Univ. of Cordoba<br />
Perez De La Blanca, Nicolas, UGR<br />
Mendoza Perez, Maria Angeles, Univ. de Granada<br />
In this paper we evaluate the use of Restricted Boltzmann Machines (RBM) in the context of learning and recognizing<br />
human actions. The features used as a basis are binary silhouettes of persons. We test the proposed approach on two datasets<br />
of human actions where binary silhouettes are available: ViHASi (synthetic data) and Weizmann (real data). In addition,<br />
on Weizmann dataset, we combine features based on optical flow with the associated binary silhouettes. The results show<br />
that thanks to the use of RBM-based models, very informative and shorter feature vectors can be obtained for the classification<br />
tasks, improving the classification performance.<br />
09:00-11:10, Paper TuAT8.23<br />
Shape Classification using Tree-Unions<br />
Wang, Bo, Huazhong Univ. of Science and Tech.<br />
Shen, Wei, Huazhong Univ. of Science and Tech.<br />
Liu, Wenyu, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Bai, Xiang, Huazhong Univ. of Science and Tech.<br />
In this paper, we propose a novel approach to shape classification. A new shape tree based on junction nodes can represent<br />
the global structure in a simple way. The statistical distribution of junctions can be learned by merging the shape trees. In<br />
the process of learning, context of a junction node is obtained to improve the rate of classification. We illustrate the utility<br />
of the proposed method on the problem of 2D shape classification using the new shape tree representation.<br />
09:00-11:10, Paper TuAT8.24<br />
Sparse Coding of Linear Dynamical Systems with an Application to Dynamic Texture Recognition<br />
Ghanem, Bernard, Univ. of Illinois at Urbana-Champaign<br />
Ahuja, Narendra,<br />
Given a sequence of observable features of a linear dynamical system (LDS), we propose the problem of finding a representation<br />
of the LDS which is sparse in terms of a given dictionary of LDSs. Since LDSs do not belong to Euclidean<br />
space, traditional sparse coding techniques do not apply. We propose a probabilistic framework and an efficient MAP algorithm<br />
to learn this sparse code. Since dynamic textures (DTs) can be modeled as LDSs, we validate our framework and<br />
algorithm by applying them to the problems of DT representation and DT recognition. In the case of occlusion, we show<br />
that this sparse coding scheme outperforms conventional DT recognition methods.<br />
09:00-11:10, Paper TuAT8.25<br />
Background Modeling by Combining Joint Intensity Histogram with Time-Sequential Data<br />
Kita, Yasuyo, National Inst. of Advanced Industrial Science and Technology<br />
In this paper, a method for detecting changes from time-sequential images of outdoor scenes taken at intervals of several<br />
minutes is proposed. Recently, per-pixel statistical background intensity models using Gaussian mixture models<br />
(GMM) have shown their effectiveness for detecting changes in video streams. However, when the time interval between<br />
consecutive images is long, not enough frames can be sampled to build a useful GMM. To robustly build a<br />
pixel-wise background model at time t0 from a small number of preceding and following frames, we propose to use the joint intensity<br />
histogram of the images at times t0 and t0 + 1, H(It0, It0+1). Under the background dominance condition, the background probability<br />
distribution for each intensity level at t0 can be estimated from H(It0, It0+1). By taking this background probability<br />
distribution per intensity as a prior probability, a GMM that models the variation in each pixel can be robustly calculated even<br />
from only a few frames. Experimental results using actual field monitoring images show the advantage of the proposed<br />
method.<br />
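A simplified sketch of the joint-histogram idea, assuming background dominance: row-normalising the joint histogram of two consecutive frames gives, for each intensity at t0, an empirical distribution over the next frame's intensity. This is illustrative code for that single step, not the paper's full GMM pipeline:<br />

```python
import numpy as np

# Under background dominance, most pixels keep (nearly) the same
# intensity between frames, so each row of the row-normalised joint
# histogram approximates a per-intensity background probability.
def background_probability(frame0, frame1, levels=256):
    H, _, _ = np.histogram2d(frame0.ravel(), frame1.ravel(),
                             bins=levels, range=[[0, levels], [0, levels]])
    row_sums = H.sum(axis=1, keepdims=True)
    P = np.divide(H, row_sums, out=np.zeros_like(H), where=row_sums > 0)
    return P  # P[i, j] ~ Pr(intensity j at t0+1 | intensity i at t0)

f0 = np.full((8, 8), 100, dtype=np.uint8)
f1 = f0.copy()
f1[0, 0] = 200            # one changed (foreground) pixel
P = background_probability(f0, f1)
print(P[100, 100])        # high: staying at intensity 100 dominates
```

In the paper this per-intensity distribution serves as the prior from which a per-pixel GMM is then fitted robustly despite the small number of frames.<br />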
09:00-11:10, Paper TuAT8.26<br />
2LDA: Segmentation for Recognition<br />
Perina, Alessandro, Univ. of Verona<br />
Cristani, Marco, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
Following the trend of segmentation for recognition, we present 2LDA, a novel generative model to automatically segment<br />
an image into two segments, background and foreground, while inferring a latent Dirichlet allocation (LDA) topic distribution<br />
on both segments. The idea is to merge two separate modules, LDA and the segmentation module, explicitly considering<br />
(and exchanging) the uncertainty between them. The resulting model adds spatial relationships to LDA, which in turn<br />
helps in using the topics to segment an image. The experimental results show that, unlike LDA, our model can be used to<br />
recognize objects, and also outperforms state-of-the-art algorithms.<br />
09:00-11:10, Paper TuAT8.27<br />
Modeling and Generalization of Discrete Morse Terrain Decompositions<br />
De Floriani, L.<br />
Magillo, Paola, Univ. of Genova<br />
Vitali, Maria, DISI, Univ. of Genova<br />
We address the problem of morphological analysis of real terrains. We describe a morphological model for a terrain by<br />
considering extensions of Morse theory to the discrete case. We propose a two-level model of the morphology of a terrain<br />
based on a graph joining the critical points of the terrain through integral lines. We present a new set of generalization operators<br />
specific for discrete piece-wise linear terrain models, which are used to reduce noise and the size of the morphological<br />
representation. We show results of our approach on real terrains.<br />
09:00-11:10, Paper TuAT8.28<br />
Region Description using Extended Local Ternary Patterns<br />
Liao, Wen-Hung, National Chengchi Univ.<br />
The local binary pattern (LBP) operator is a computationally efficient local texture descriptor and has found many useful<br />
applications. However, its sensitivity to noise and the high dimensionality of the histogram associated with even a moderately sized<br />
neighborhood have raised some concerns. In this paper, we attempt to improve the original LBP by proposing a novel extension<br />
named extended local ternary pattern (ELTP). We will investigate the characteristics of ELTP in terms of noise<br />
sensitivity, discriminability and computational efficiency. Preliminary experimental results show the better efficacy of<br />
ELTP over the original LBP.<br />
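The ternary coding underlying LTP-style descriptors (the direction ELTP extends) codes each neighbour as -1, 0, or +1 depending on whether it lies below, within, or above a tolerance band around the centre pixel; this sketch uses an illustrative threshold and patch, not the paper's parameters:<br />

```python
import numpy as np

# Sketch of a local ternary code on a 3x3 patch: the tolerance band
# around the centre gives a 0 code, which is what makes the ternary
# variant less noise-sensitive than plain LBP's binary thresholding.
def local_ternary_code(patch, t=5):
    """3x3 patch -> 8 ternary codes, clockwise from top-left."""
    c = int(patch[1, 1])
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = []
    for i, j in order:
        d = int(patch[i, j]) - c
        codes.append(0 if abs(d) <= t else (1 if d > 0 else -1))
    return codes

patch = np.array([[50, 52, 90],
                  [40, 50, 50],
                  [50, 10, 55]], dtype=np.uint8)
print(local_ternary_code(patch, t=5))
```

A histogram of such codes over a region yields the texture descriptor; ELTP's extensions address the histogram's dimensionality and the choice of the tolerance t.<br />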
09:00-11:10, Paper TuAT8.29<br />
A Novel Multi-View Agglomerative Clustering Algorithm based on Ensemble of Partitions on Different Views<br />
Mirzaei, Hamidreza, SFU<br />
In this paper, we propose a new algorithm that extends hierarchical clustering methods and introduce a Multi-View<br />
Agglomerative Clustering approach to handle objects with multi-view representations. Experiments on real-world datasets indicate<br />
that our algorithm, by considering the relationships among multiple views, can provide solutions of improved quality in<br />
the multi-view setting. We find empirically that the multi-view version of our agglomerative clustering, independent of<br />
the linkage method and given any number of views, greatly improves on its single-view counterparts.<br />
09:00-11:10, Paper TuAT8.30<br />
Hydroacoustic Signal Classification using Kernel Functions for Variable Feature Sets<br />
Tuma, Matthias, Ruhr-Univ. Bochum<br />
Igel, Christian, Ruhr-Univ. Bochum<br />
Prior, Mark, Preparatory Commission for the CTBTO<br />
Large-scale geophysical monitoring systems raise the need for real-time feature extraction and signal classification. We<br />
study support vector machine (SVM) classification of hydroacoustic signals recorded by the Comprehensive Nuclear-<br />
Test-Ban Treaty’s verification network. Due to constraints in the early signal processing, most samples have incomplete<br />
feature sets with values missing not at random. We propose kernel functions explicitly incorporating Boolean representations<br />
of the missingness pattern through dedicated sub-kernels. For kernels with more than a few parameters, gradient-based<br />
model selection algorithms were employed. In the case of binary classification, an increase in classification accuracy<br />
as compared to baseline SVM and linear classifiers was observed. In the multi-class case we evaluated four different formulations<br />
of multi-class SVMs. Here, neither SVMs with standard nor with problem-specific kernels outperformed a baseline<br />
linear discriminant analysis.<br />
09:00-11:10, Paper TuAT8.31<br />
Large Margin Discriminant Hashing for Fast K-Nearest Neighbor Classification<br />
Shibata, Tomoyuki, Toshiba Corp.<br />
Kubota, Susumu, Toshiba Corp.<br />
Ito, Satoshi, Toshiba Corp.<br />
Since the k-nearest neighbor (k-NN) classification is computationally demanding in terms of time and memory, approximate<br />
nearest neighbor (ANN) algorithms that utilize dimensionality reduction and hashing are gathering interest. Dimensionality<br />
reduction saves memory usage for storing training patterns and hashing techniques significantly reduce the<br />
computation required for distance calculation. Several ANN methods have been proposed which make k-NN classification<br />
applicable to those tasks that have a large number of training patterns with very high-dimensional features. Though conventional<br />
ANN methods try to approximate the Euclidean distance calculation in the original high-dimensional feature space<br />
with a much lower-dimensional subspace, the Euclidean distance in the original feature space is not necessarily optimal for<br />
classification. According to recent studies, metric learning is effective in improving the accuracy of k-NN classification.<br />
In this paper, the Large Margin Discriminant Hashing (LMDH) method, which projects input patterns into a low-dimensional<br />
subspace with a metric optimized for k-NN classification, is proposed.<br />
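The pipeline the abstract describes, binary hashing plus Hamming-distance k-NN voting, can be illustrated with a generic random-projection (LSH-style) sketch. This is not LMDH itself: the paper would learn the projection matrix with a large-margin criterion, whereas here it is drawn at random, and all names below are illustrative.

```python
import numpy as np

def train_hash(X, n_bits, rng):
    # Random hyperplane projections (LSH baseline); LMDH would instead
    # learn these projections with a large-margin criterion.
    return rng.standard_normal((X.shape[1], n_bits))

def to_codes(X, W):
    # Binary codes: the sign of each projection, stored as 0/1.
    return (X @ W > 0).astype(np.uint8)

def knn_predict(codes_train, y_train, codes_test, k=3):
    preds = []
    for c in codes_test:
        # Hamming distance = number of differing bits
        d = np.count_nonzero(codes_train != c, axis=1)
        nn = np.argsort(d)[:k]
        vals, cnt = np.unique(y_train[nn], return_counts=True)
        preds.append(vals[np.argmax(cnt)])  # majority vote
    return np.array(preds)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))
y = (X[:, 0] > 0).astype(int)          # toy labels
W = train_hash(X, 16, rng)
pred = knn_predict(to_codes(X, W), y, to_codes(X[:5], W))
```

Replacing `train_hash` with a learned, discriminative projection is precisely where the proposed method departs from this baseline.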
09:00-11:10, Paper TuAT8.32<br />
Robust Frame-To-Frame Hybrid Matching<br />
Chen, Lei, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
Wang, Zhongli, Beijing Inst. of Tech.<br />
In this paper, we propose a hybrid approach for addressing the feature-based matching problem. We aim to obtain robust and<br />
accurate correspondences between features from image frames in unknown and unstructured environments. The approach<br />
incorporates image texture analysis, 2-D analytic signal theory and color modeling. It takes advantage of geometric<br />
invariant property in texture and monogenic signal information as well as photometric invariant property in HSV color<br />
information. The detected features are well localized with high accuracy and the selected matches are robust to changes<br />
in scale, blur, viewpoint, and illumination. Experiments conducted on a standard benchmark dataset demonstrate the effectiveness<br />
and reliability of our approach.<br />
- 88 -
09:00-11:10, Paper TuAT8.33<br />
A Fast Extension for Sparse Representation on Robust Face Recognition<br />
Qiu, Hui-Ning, Sun Yat-sen Univ.<br />
Pham, Duc-Son, Curtin Univ. of Tech.<br />
Venkatesh, Svetha, Curtin Univ. of Tech.<br />
Liu, Wanquan, Curtin Univ. of Tech.<br />
Lai, Jian-Huang, Sun Yat-sen Univ.<br />
We extend a recent Sparse Representation-based Classification (SRC) algorithm for face recognition to work on 2D images<br />
directly, aiming to reduce the computational complexity whilst still maintaining performance. Our contributions include:<br />
(1) a new 2D extension of the SRC algorithm; (2) an incremental computing procedure which can reduce the eigen-decomposition<br />
expense of each 2D-SRC for sequential input data; and (3) extensive numerical studies to validate the proposed<br />
methods.<br />
09:00-11:10, Paper TuAT8.34<br />
A MANOVA of Major Factors of RIU-LBP Feature for Face Recognition<br />
Luo, Jie, Shanghai Univ.<br />
Fang, Yuchun, Shanghai Univ.<br />
Cai, Qiyun, Shanghai Univ.<br />
The Local Binary Patterns (LBP) feature is one of the most popular representation schemes for face recognition. The four<br />
factors deciding its effectiveness are the blocking number, the image resolution, and the sampling radius and sampling density of the LBP<br />
operator. Numerous previous studies have adopted various combinations of values for these factors based on experimental comparisons.<br />
However, which factor among them contributes the most? Numerous revisions have been made to the LBP operator,<br />
for it is believed that the LBP coding is the most essential factor. Is this true? In this paper, using the simple and classical<br />
Multivariate Analysis of Variance (MANOVA), we discover that the blocking number contributes the most, though all<br />
four factors have a significant effect on the recognition rate. In addition, with the same analysis, we disclose the detailed effect<br />
of each factor and their interactions on the precision of LBP features.<br />
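For readers unfamiliar with the operator under study, the basic 8-neighbor LBP code (radius 1, before the rotation-invariant uniform mapping and the block-wise histogramming that RIU-LBP adds) can be sketched as follows; the function name and fixed neighbor ordering are illustrative choices, not the paper's:

```python
import numpy as np

def lbp8(image):
    # Basic 3x3 LBP: each interior pixel receives an 8-bit code whose
    # bits indicate whether each neighbor is >= the center pixel.
    img = np.asarray(image, dtype=float)
    c = img[1:-1, 1:-1]
    # neighbor offsets in a fixed circular order
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code
```

The factors the MANOVA studies then correspond to the image resolution fed in, the radius and number of samples (fixed here to 1 and 8), and how many blocks the code image is split into before histogramming.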
09:00-11:10, Paper TuAT8.35<br />
Consistent Estimators of Median and Mean Graph<br />
Jain, Brijnesh J., Berlin Univ. of Tech.<br />
Obermayer, Klaus, Berlin Univ. of Tech.<br />
The median and mean graph are basic building blocks for statistical graph analysis and unsupervised pattern recognition<br />
methods such as central clustering and graph quantization. This contribution provides sufficient conditions for consistent<br />
estimators of true but unknown central points of a distribution on graphs.<br />
09:00-11:10, Paper TuAT8.36<br />
Efficient Encoding of N-D Combinatorial Pyramids<br />
Fourey, Sébastien, GREYC Ensicaen & Univ. of Caen<br />
Brun, Luc, ENSICAEN<br />
Combinatorial maps define a general framework which allows one to encode any subdivision of an n-D orientable quasi-manifold<br />
with or without boundaries. Combinatorial pyramids are defined as stacks of successively reduced combinatorial<br />
maps. Such pyramids provide a rich framework which allows one to encode fine properties of objects (either shapes or partitions).<br />
Combinatorial pyramids have first been defined in 2D, then extended using n-D generalized combinatorial maps.<br />
We motivate and present here an implicit and efficient way to encode pyramids of n-D combinatorial maps.<br />
- 89 -
09:00-11:10, Paper TuAT8.37<br />
View-Invariant Object Recognition with Visibility Maps<br />
Raytchev, Bisser, Hiroshima Univ.<br />
Mino, Tetsuya, Hiroshima Univ.<br />
Tamaki, Toru, Hiroshima Univ.<br />
Kaneda, Kazufumi, Hiroshima Univ.<br />
In this paper we propose a new framework for view-invariant 3D object recognition, based on what we call Visibility Maps.<br />
A Visibility Map (VM) encodes a compact model of an arbitrary 3D object for which a set of images taken from different<br />
views is available. Representative local invariant features extracted from each image are selectively combined to form a visibility<br />
basis, in terms of which an arbitrary view of the modeled object can be represented. A metric which incorporates geometric<br />
information is also provided for comparing test images to the model, and can be used for recognition.<br />
09:00-11:10, Paper TuAT8.38<br />
Normalized Sum-Over-Paths Edit Distances<br />
García, Silvia, Univ. Catholique de Louvain<br />
Fouss, François, Facultés Univ. Catholiques de Mons<br />
Shimbo, Masashi, Graduate School of Information Science<br />
Saerens, Marco, Univ. Catholique de Louvain<br />
In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are<br />
investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances, which compute the expected<br />
cost over all sequence alignments while favoring low-cost ones, thereby favoring good alignments. Such distances consider two<br />
sequences tied by many optimal or nearly optimal alignments as more similar than two sequences sharing only one, optimal,<br />
alignment. They depend on a parameter, and reduce to the standard distances (the edit distance or the longest common subsequence)<br />
as this parameter tends to 0, while having the same time complexity. This paper puts the emphasis on applying a normalization<br />
of the expected cost. Experimental results for clustering and classification tasks performed on four OCR<br />
data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit distances,<br />
the normalized SoP edit distances clearly outperform the non-randomized measures, i.e., the standard edit distance and longest<br />
common subsequence.<br />
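For reference, the standard edit distance that the SoP variants recover in the limit of the parameter is computed by the classic Wagner-Fischer dynamic program; a minimal sketch (costs as keyword arguments are an illustrative choice):

```python
def edit_distance(s, t, sub_cost=1, indel_cost=1):
    # Wagner-Fischer dynamic program: D[i][j] is the cheapest
    # alignment cost of the prefixes s[:i] and t[:j].
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * indel_cost
    for j in range(1, n + 1):
        D[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = D[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else sub_cost)
            D[i][j] = min(match,
                          D[i - 1][j] + indel_cost,   # delete s[i-1]
                          D[i][j - 1] + indel_cost)   # insert t[j-1]
    return D[m][n]
```

The SoP distances replace the `min` over alignment operations with a softened expectation over all alignment paths, which is what lets many near-optimal alignments jointly lower the distance.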
09:00-11:10, Paper TuAT8.39<br />
Effective Multi-Level Image Representation for Image Categorization<br />
Li, Hao, Peking Univ.<br />
Peng, Yuxin, Peking Univ.<br />
This paper proposes a novel approach for image categorization based on effective multi-level image representation (MLIR).<br />
On one hand, to exploit fully the information of segmented regions at different levels in the image, we recursively segment<br />
the image into a hierarchical structure. On the other hand, to represent the information at different levels in a uniform manner,<br />
we construct a visual vocabulary based on the image regions of the hierarchical structure by a random sampling strategy.<br />
The intermediate feature mapping is then adopted to form a multi-level image representation, which encodes the information<br />
of the image at different levels and can be very useful for distinguishing images from different categories. Experimental results<br />
on the widely used COREL data set have shown that our proposed approach can achieve significant improvements compared<br />
with the state-of-the-art methods.<br />
09:00-11:10, Paper TuAT8.40<br />
Classification of Volcano Events Observed by Multiple Seismic Stations<br />
Duin, Robert, TU Delft<br />
Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales, Colombia<br />
Londoño-Bonilla, John Makario, Inst. Colombiano de Geología y Minería (INGEOMINAS), Colombia<br />
Seismic events in and around volcanoes, like tremors, earthquakes, icequakes and lightning strikes, are usually observed<br />
by multiple stations. The question arises whether classifiers trained for one seismic station can be used for classifying observations<br />
by other stations, and, moreover, whether a combination of station signals improves the classification performance<br />
for a single station. We study this for seismic time signals represented by spectra and spectrograms obtained from five seismic<br />
stations on the Nevado del Ruiz volcano in Colombia.<br />
- 90 -
09:00-11:10, Paper TuAT8.41<br />
A Variational Bayesian EM Algorithm for Tree Similarity<br />
Takasu, Atsuhiro, National Inst. of Informatics<br />
Fukagawa, Daiji, National Inst. of Informatics<br />
Akutsu, Tatsuya, Kyoto Univ.<br />
In recent times, a vast amount of tree-structured data has been generated. For mining, retrieving, and integrating such data,<br />
we need a fine-grained tree similarity measure that can be adapted to objective data. To achieve this goal, this paper (1)<br />
proposes a probabilistic generative model that generates pairs of similar trees, and (2) derives a learning algorithm for estimating<br />
the parameters of the model based on the variational Bayesian expectation maximization (VBEM) method. This<br />
method can handle rooted, ordered, and labeled trees. We show that the tree similarity model obtained via the VBEM technique<br />
performs better than that obtained via maximum likelihood estimation by tuning the hyperparameters.<br />
09:00-11:10, Paper TuAT8.42<br />
Enhancing Image Classification with Class-Wise Clustered Vocabularies<br />
Wojcikiewicz, Wojciech, Fraunhofer Inst. FIRST<br />
Kawanabe, Motoaki, Fraunhofer FIRST and TU Berlin<br />
Binder, Alexander, Fraunhofer Inst. FIRST, Berlin<br />
In recent years bag-of-visual-words representations have gained increasing popularity in the field of image classification.<br />
Their performance relies highly on creating a good visual vocabulary from a set of image features (e.g. SIFT). For real-world<br />
photo archives such as Flickr, codebooks with more than a few thousand words are desirable, which is infeasible<br />
with standard k-means clustering. In this paper, we propose a two-step procedure which can generate more informative<br />
codebooks efficiently by class-wise k-means and a novel procedure for word selection. Our approach compared favorably<br />
to the standard k-means procedure on the PASCAL VOC data sets.<br />
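The first step of the two-step procedure, class-wise k-means followed by concatenation of the per-class centers, can be sketched as follows. This is illustrative only: the paper's second step (informative word selection) is omitted, and plain Lloyd iterations stand in for an efficient k-means implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd iterations; sufficient for a small illustration.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center
        assign = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                C[j] = pts.mean(axis=0)
    return C

def classwise_codebook(features_by_class, words_per_class):
    # Run k-means separately on each class's features and concatenate
    # the centers into one vocabulary; much cheaper than one global
    # k-means with the same total number of words.
    return np.vstack([kmeans(F, words_per_class, seed=c)
                      for c, F in enumerate(features_by_class)])

rng = np.random.default_rng(1)
feats = [rng.standard_normal((100, 8)) + c for c in range(3)]  # toy "SIFT" features
vocab = classwise_codebook(feats, words_per_class=10)
```

Because each per-class clustering touches only that class's features, the cost scales with the number of classes rather than with the square of the vocabulary size.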
09:00-11:10, Paper TuAT8.43<br />
Efficiently Computing Optimal Consensus of Digital Line Fitting<br />
Kenmochi, Yukiko, Univ. Paris-Est<br />
Buzer, Lilian, ESIEE<br />
Talbot, Hugues, ESIEE<br />
Given a set of discrete points in a 2D digital image containing noise, we formulate our problem as robust digital line<br />
fitting. More precisely, we seek the maximum subset whose points are included in a digital line, called the optimal consensus.<br />
The paper presents an efficient method for exactly computing the optimal consensus by using topological<br />
sweep, which yields quadratic time complexity and linear space complexity with respect to the number<br />
of input points.<br />
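The optimal-consensus notion can be made concrete with a brute-force baseline that tries digital lines supported by pairs of points and counts how many points each covers. This sketch uses a simplified membership test (unreduced integer coefficients and thickness max(|a|,|b|)), and is emphatically not the paper's O(n²)-time, O(n)-space topological-sweep algorithm:

```python
from itertools import combinations

def consensus(points, a, b, mu, width):
    # Points lying on the digital line 0 <= a*x + b*y + mu < width.
    return [p for p in points if 0 <= a * p[0] + b * p[1] + mu < width]

def best_consensus_brute(points, thickness=1):
    # Exhaustive baseline: for every pair of points take its direction,
    # anchor the line at every point, and keep the largest consensus.
    best = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        a, b = y2 - y1, x1 - x2                # normal of the direction
        width = thickness * max(abs(a), abs(b), 1)
        for p in points:
            mu = -(a * p[0] + b * p[1])        # anchor the line at p
            cur = consensus(points, a, b, mu, width)
            if len(cur) > len(best):
                best = cur
    return best

pts = [(0, 0), (1, 1), (2, 2), (3, 3), (5, 0)]
best = best_consensus_brute(pts)   # the four collinear diagonal points
```

The contribution of the paper is to reach the same optimum without enumerating all pair/anchor combinations.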
09:00-11:10, Paper TuAT8.44<br />
Learning a Joint Manifold Representation from Multiple Data Sets<br />
Torki, Marwan, Rutgers Univ.<br />
Elgammal, Ahmed, Rutgers Univ.<br />
Lee, Chan-Su, Yeungnam Univ.<br />
The problem we address in this paper is how to learn a joint representation from data lying on multiple manifolds. We are<br />
given multiple data sets, and there is an underlying common manifold among the different data sets. We propose a framework<br />
to learn an embedding of all the points on all the manifolds in a way that preserves the local structure on each manifold<br />
and, at the same time, collapses all the different manifolds into one manifold in the embedding space, while preserving<br />
the implicit correspondences between the points across different data sets. The proposed solution works as an extension to<br />
current state-of-the-art spectral-embedding approaches to handle multiple manifolds.<br />
09:00-11:10, Paper TuAT8.45<br />
A Multi-Scale Approach to Decompose a Digital Curve into Meaningful Parts<br />
Nguyen, Thanh Phuong, LORIA<br />
Debled-Rennesson, Isabelle, LORIA – Nancy Univ.<br />
A multi-scale approach is proposed for polygonal representation of a digital curve by using the notion of blurred segment<br />
and a split-and-merge strategy. Its main idea is to decompose the curve into meaningful parts that are represented by detected<br />
dominant points at the appropriate scale. The method uses no threshold and performs the decomposition automatically.<br />
09:00-11:10, Paper TuAT8.46<br />
A Memetic Algorithm for Selection of 3D Clustered Features with Applications in Neuroscience<br />
Björnsdotter, Malin, Univ. of Gothenburg<br />
Wessberg, Johan, Univ. of Gothenburg<br />
We propose a Memetic algorithm for feature selection in volumetric data containing spatially distributed clusters of informative<br />
features, typically encountered in neuroscience applications. The proposed method complements a conventional genetic<br />
algorithm with a local search utilizing inherent spatial relationships to efficiently identify informative feature clusters across<br />
multiple regions of the search volume. First, we demonstrate the utility of the algorithm on simulated data containing informative<br />
feature clusters of varying contrast-to-noise ratios. The Memetic algorithm identified a majority of the relevant<br />
features whereas a conventional genetic algorithm detected only a subset sufficient for fitness maximization. Second, we<br />
applied the algorithm to authentic functional magnetic resonance imaging (fMRI) brain activity data from a motor task study,<br />
where the Memetic algorithm identified expected brain regions and subsequent brain activity prediction in new individuals<br />
was accurate at an average of 76% correct classification. The proposed algorithm constitutes a novel method for efficient<br />
volumetric feature selection and is applicable in any 3D data scenario. In particular, the algorithm is a promising alternative<br />
for sensitive brain activity mapping and decoding.<br />
09:00-11:10, Paper TuAT8.47<br />
Pose Estimation of Known Objects by Efficient Silhouette Matching<br />
Reinbacher, Christian, Graz Tech. Univ.<br />
Ruether, Matthias, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Pose estimation is essential for automated handling of objects. In many computer vision applications only the object silhouettes<br />
can be acquired reliably, because untextured or slightly transparent objects do not allow for other features. We propose<br />
a pose estimation method for known objects, based on hierarchical silhouette matching and unsupervised clustering. The<br />
search hierarchy is created by an unsupervised clustering scheme, which makes the method less sensitive to parametrization,<br />
and still exploits spatial neighborhood for efficient hierarchy generation. Our evaluation shows a decrease in matching time<br />
of 80% compared to an exhaustive matching and scalability to large models.<br />
09:00-11:10, Paper TuAT8.48<br />
Learning Non-Linear Dynamical Systems by Alignment of Local Linear Models<br />
Joko, Masao, The Univ. of Tokyo<br />
Kawahara, Yoshinobu, Osaka Univ.<br />
Yairi, Takehisa, Univ. of Tokyo<br />
Learning dynamical systems is one of the important problems in many fields. In this paper, we present an algorithm for<br />
learning non-linear dynamical systems which works by aligning local linear models, based on a probabilistic formulation of<br />
subspace identification. Because the procedure for constructing a state sequence in subspace identification can be interpreted<br />
as the CCA between past and future observation sequences, we can derive a latent variable representation for this problem.<br />
Therefore, in a manner similar to recent works on learning mixtures of probabilistic models, we obtain a framework<br />
for constructing a state space by aligning local linear coordinates. This leads to a practical algorithm for learning non-linear<br />
dynamical systems. Finally, we apply our method to motion capture data and show that our algorithm works well.<br />
09:00-11:10, Paper TuAT8.49<br />
A Column Generation Approach for the Graph Matching Problem<br />
Silva, Freire, Alexandre, Univ. of Sao Paulo<br />
Cesar Jr., R. M., Univ. of Sao Paulo<br />
Ferreira, C.E., Univ. of Sao Paulo<br />
Graph matching plays a central role in different problems for structural pattern recognition. Examples of applications include<br />
matching 3D CAD models, shape matching and medical imaging, to name but a few. In this paper, we present a new integer<br />
linear formulation for the problem and employ a combinatorial optimization technique, called column generation, in order<br />
to solve instances of the problem. We also present computational experiments with generated instances.<br />
09:00-11:10, Paper TuAT8.50<br />
Pattern Recognition using Functions of Multiple Instances<br />
Zare, Alina, Univ. of Florida<br />
Gader, Paul, Univ. of Florida<br />
The Functions of Multiple Instances (FUMI) method for learning a target prototype from data points that are functions of<br />
target and non-target prototypes is introduced. In this paper, a specific case is considered where, given data points which are<br />
convex combinations of a target prototype and several non-target prototypes, the Convex-FUMI (C-FUMI) method learns<br />
the target and non-target patterns, the number of non-target patterns, and determines the weights (or proportions) of all the<br />
prototypes for each data point. For this method, training data need only binary labels indicating whether the data contains or<br />
does not contain some proportion of the target prototype; the specific target weights for the training data are not needed.<br />
After learning the target prototype using the binary labeled training data, target detection is performed on test data. Results<br />
showing detection of skin in hyperspectral imagery and sub-pixel target detection in simulated data are presented.<br />
09:00-11:10, Paper TuAT8.51<br />
Linear Decomposition of Planar Shapes<br />
Faure, Alexandre, LAIC Univ. d’Auvergne<br />
Feschet, Fabien, Univ. d’Auvergne Clermont-Ferrand 1<br />
The issue of decomposing digital shapes into sets of digital primitives has been widely studied over the years. Practically all<br />
existing approaches require perfect or cleaned shapes. Those are obtained using various pre-processing techniques such as<br />
thinning or skeletonization. The aim of this paper is to bypass the use of such pre-processings, in order to obtain decompositions<br />
of shapes directly from connected components. This method has the advantage of taking into account the intrinsic<br />
thickness of digital shapes, and provides a decomposition which is also robust to<br />
09:00-11:10, Paper TuAT8.52<br />
Sketched Symbol Recognition with a Latent-Dynamic Conditional Model<br />
Deufemia, Vincenzo, Univ. di Salerno<br />
Risi, Michele, Univ. of Salerno<br />
Tortora, Genoveffa, Univ. di Salerno<br />
In this paper we present a recognizer of sketched symbols based on Latent-Dynamic Conditional Random Fields (LDCRF),<br />
a discriminative model for sequence classification. The LDCRF model classifies unsegmented sequences of strokes into domain<br />
symbols by taking into account contextual and temporal information. In particular, LDCRFs learn the extrinsic dynamics<br />
among strokes by modeling a continuous stream of symbol labels, and learn internal stroke sub-structure by using intermediate<br />
hidden states. The performance of our work is evaluated in the electric circuit domain.<br />
09:00-11:10, Paper TuAT8.53<br />
Canonical Patterns of Oriented Topologies<br />
Mankowski, Walter, Drexel Univ.<br />
Shokoufandeh, Ali, Drexel Univ.<br />
Salvucci, Dario, Drexel Univ.<br />
A common problem in many areas of behavioral research is the analysis of the large volume of data recorded during the execution<br />
of the tasks being studied. Recent work has proposed the use of an automated method based on canonical sets to<br />
identify the most representative patterns in a large data set, and described an initial experiment in identifying canonical web-browsing<br />
patterns. However, there is a significant limitation to the method: it requires the similarity matrix to be symmetric,<br />
and thus can only be used for problems that can be modeled as unoriented topologies. In this paper we propose a novel enhancement<br />
to the method to support oriented topologies by allowing the similarity matrix to be nonsymmetric. We demonstrate<br />
the power of this new technique by applying the new method to find canonical lane changes in a driving simulator experiment.<br />
- 93 -
09:00-11:10, Paper TuAT8.54<br />
Hierarchical Anomality Detection based on Situation<br />
Nishio, Shuichi, Advanced Telecommunication Res. Inst. International<br />
Okamoto, Hiromi, Nara Women’s Univ.<br />
Babaguchi, Noboru, Osaka Univ.<br />
In this paper, we propose a novel anomality detection method based on external situational information and hierarchical<br />
analysis of behaviors. Past studies model normal behaviors to detect anomality as outliers. However, normal behaviors tend<br />
to differ by situations. Our method combines a set of simple classifiers with pedestrian trajectories as inputs. As mere path<br />
information is not sufficient for detecting anomality, trajectories are first decomposed into hierarchical features of different<br />
abstraction levels and then fed to appropriate classifiers corresponding to the situations they belong to. The effectiveness of the method<br />
is tested using real-environment data.<br />
09:00-11:10, Paper TuAT8.55<br />
Image Classification using Subgraph Histogram Representation<br />
Ozdemir, Bahadir, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
We describe an image representation that combines the representational power of graphs with the efficiency of the bag-of-words<br />
model. For each image in a data set, first, a graph is constructed from local patches of interest regions and their spatial<br />
arrangements. Then, each graph is represented with a histogram of subgraphs selected using a frequent subgraph mining algorithm<br />
on the whole data set. Using the subgraphs as the visual words of the bag-of-words model and transforming the<br />
graphs into a vector space using this model enables statistical classification of images using support vector machines. Experiments<br />
using images cut from a large satellite scene show the effectiveness of the proposed representation in classification<br />
of complex types of scenes into eight high-level semantic classes.<br />
09:00-11:10, Paper TuAT8.56<br />
Oriented Boundary Graph: A Framework to Design and Implement 3D Segmentation Algorithms<br />
Baldacci, Fabien, Univ. de Bordeaux<br />
Braquelaire, Achille, Univ. de Bordeaux<br />
Domenger, Jean Philippe, Univ. de Bordeaux<br />
In this paper we show the interest of a topological model to represent 3D segmented images which is a good compromise between<br />
the complete but time-consuming representations and the partial but insufficiently expressive ones. We show that this<br />
model, called the Oriented Boundary Graph, provides an effective framework for both volumetric image analysis and segmentation.<br />
The Oriented Boundary Graph provides an efficient implementation of a set of primitives suitable for the design of complex<br />
segmentation algorithms and for implementing the computation of the segmented-image characteristics needed by such algorithms.<br />
We first present the framework and give the time complexity of its main primitives. Then, we give some examples of the use<br />
of this framework in order to efficiently design non-trivial image analysis operations and image segmentation algorithms.<br />
Those examples are applied on 3D CT-scan data.<br />
09:00-11:10, Paper TuAT8.57<br />
Hierarchical Segmentation of Complex Structures<br />
Akcay, Huseyin Gokhan, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
Soille, Pierre, EC Joint Res. Centre<br />
We present an unsupervised hierarchical segmentation algorithm for detection of complex heterogeneous image structures<br />
that are comprised of simpler homogeneous primitive objects. An initial segmentation step produces regions corresponding<br />
to primitive objects with uniform spectral content. Next, the transitions between neighboring regions are modeled and clustered.<br />
We assume that the clusters that are dense and large enough in this transition space can be considered as significant.<br />
Then, the neighboring regions belonging to the significant clusters are merged to obtain the next level in the hierarchy. The<br />
experiments show that the algorithm that iteratively clusters and merges region groups is able to segment high-level complex<br />
structures in a hierarchical manner.<br />
- 94 -
TuAT9 Upper Foyer<br />
Biometrics Poster Session<br />
Session chair: Dobrišek, Simon (University of Ljubljana)<br />
09:00-11:10, Paper TuAT9.1<br />
Image Specific Error Rate: A Biometric Performance Metric<br />
Tabassi, Elham, NIST<br />
Image-specific false match and false non-match error rates are defined by inheriting concepts from the biometric zoo. These<br />
metrics support failure mode analyses by allowing association of a covariate (e.g., dilation for iris recognition) with a matching<br />
error rate without having to consider the covariate of a comparison image. Image-specific error rates are also useful in detection<br />
of ground truth errors in test datasets. Images with higher image-specific error rates are more “difficult” to recognize,<br />
so these metrics can be used to assess the level of difficulty of test corpora or to partition a corpus into sets with varying levels<br />
of difficulty. Results on the use of image-specific error rates for ground-truth error detection, covariate analysis and corpus partitioning<br />
are presented.<br />
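One plausible reading of these metrics (an illustrative sketch, not necessarily NIST's exact formulation): from a similarity-score matrix and a decision threshold, the image-specific FNMR of an image is the fraction of its genuine comparisons that fail to match, and its image-specific FMR is the fraction of its impostor comparisons that falsely match.

```python
import numpy as np

def image_specific_rates(scores, labels, thresh):
    # scores[i, j]: similarity between image i and image j (i != j);
    # labels[i]: subject identity of image i.
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)          # exclude self-comparisons
    diff = ~same
    np.fill_diagonal(diff, False)
    fnmr = np.array([  # genuine comparisons that fail to match
        (scores[i, same[i]] < thresh).mean() if same[i].any() else np.nan
        for i in range(n)])
    fmr = np.array([   # impostor comparisons that falsely match
        (scores[i, diff[i]] >= thresh).mean() if diff[i].any() else np.nan
        for i in range(n)])
    return fnmr, fmr

labels = np.array([0, 0, 1, 1])
scores = np.array([[1.0, 0.9, 0.2, 0.1],
                   [0.9, 1.0, 0.3, 0.2],
                   [0.2, 0.3, 1.0, 0.8],
                   [0.1, 0.2, 0.8, 1.0]])
fnmr, fmr = image_specific_rates(scores, labels, thresh=0.5)
```

Images with high `fnmr` or `fmr` values are the "difficult" ones the abstract refers to, and images whose rates are implausibly high are candidates for ground-truth labeling errors.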
09:00-11:10, Paper TuAT9.2<br />
Low Cost and Usable Multimodal Biometric System based on Keystroke Dynamics and 2D Face Recognition<br />
Giot, Romain, Univ. de Caen, Basse-Normandie – CNRS<br />
Hemery, Baptiste, Univ. de CAEN<br />
Rosenberger, Christophe, Lab. GREYC<br />
We propose in this paper a low cost multimodal biometric system combining keystroke dynamics and 2D face recognition.<br />
The objective of the proposed system is to be used while keeping in mind: good performance, acceptability, and respect of<br />
privacy. Different fusion methods have been used (min, max, mul, svm, weighted sum configured with genetic algorithms,<br />
and, genetic programming) on the scores of three keystroke dynamics algorithms and two 2D face recognition ones. This<br />
multimodal biometric system improves the recognition rate in comparison with each individual method. On a chimeric database<br />
composed of 100 individuals, the best keystroke dynamics method obtains an EER of 8.77%, the best face recognition<br />
one has an EER of 6.38%, while the best proposed fusion system provides an EER of 2.22%.<br />
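The simple fusion rules compared in the paper operate directly on normalized matcher scores; here is a sketch of such score-level fusion together with a basic EER estimate. This is illustrative only: the paper additionally tunes weighted-sum weights with genetic algorithms and includes SVM and genetic-programming fusers, none of which are shown.

```python
import numpy as np

def fuse(scores, rule="sum", w=None):
    # scores: (n_samples, n_matchers) normalized similarity scores.
    s = np.asarray(scores, dtype=float)
    if rule == "min":
        return s.min(axis=1)
    if rule == "max":
        return s.max(axis=1)
    if rule == "mul":
        return s.prod(axis=1)
    # weighted sum; the weights could be tuned, e.g., by a GA
    w = np.ones(s.shape[1]) / s.shape[1] if w is None else np.asarray(w)
    return s @ w

def eer(genuine, impostor):
    # Sweep thresholds over all observed scores and return the operating
    # point where false non-match and false match rates are balanced.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = 1.0
    for t in thresholds:
        fnmr = (genuine < t).mean()
        fmr = (impostor >= t).mean()
        best = min(best, max(fnmr, fmr))
    return best
```

A fused system improves when the fused genuine and impostor score distributions overlap less than those of any single matcher, which is what the reported EER drop from 6.38% to 2.22% reflects.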
09:00-11:10, Paper TuAT9.3<br />
Parallel versus Hierarchical Fusion of Extended Fingerprint Features<br />
Zhao, Qijun, The Hong Kong Pol. Univ.<br />
Liu, Feng, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Extended fingerprint features such as pores, dots and incipient ridges have been increasingly attracting attention from researchers<br />
and engineers working on automatic fingerprint recognition systems. A variety of methods have been proposed to<br />
combine these features with the traditional minutiae features. This paper comparatively analyses the parallel and hierarchical<br />
fusion approaches on a high resolution fingerprint image dataset. Based on the results, a novel and more effective hierarchical<br />
approach is presented for combining minutiae, pores, dots and incipient ridges.<br />
09:00-11:10, Paper TuAT9.4<br />
Feature Band Selection for Multispectral Palmprint Recognition<br />
Guo, Zhenhua, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Palmprint is a unique and reliable biometric characteristic with high usability. Many palmprint recognition algorithms and<br />
systems have been successfully developed in the past decades. Most of the previous works use white light sources for illumination.<br />
Recently, much research attention has been attracted to developing new biometric systems with both high<br />
accuracy and high anti-spoof capability. Multispectral palmprint imaging and recognition can be a potential solution for such<br />
systems because it can acquire more discriminative information for personal identity recognition. One crucial step in developing<br />
such systems is determining the minimal number of spectral bands and selecting the most representative bands to<br />
build the multispectral imaging system. This paper presents preliminary studies on feature band selection by analyzing hyperspectral<br />
palmprint data (420nm~1100nm). Our experiments showed that two spectral bands, at 700nm and 960nm, could provide<br />
the most discriminative information about the palmprint. This finding could be used as guidance for designing multispectral<br />
palmprint systems in the future.<br />
09:00-11:10, Paper TuAT9.5<br />
Automatic Gender Recognition using Fusion of Facial Strips<br />
Lee, Ping-Han, National Taiwan Univ.<br />
Hung, Jui-Yu, National Taiwan Univ.<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
We propose a fully automatic system that detects and normalizes faces in images and recognizes their genders. To boost the<br />
recognition accuracy, we correct the in-plane and out-of-plane rotations of faces, and align faces based on estimated eye positions.<br />
To perform gender recognition, a face is first decomposed into several horizontal and vertical strips. Then, a regression<br />
function for each strip gives an estimate of the likelihood that the strip sample belongs to a specific gender. The likelihoods<br />
from all strips are concatenated to form a new feature, based on which a gender classifier gives the final decision. The proposed<br />
approach achieved an accuracy of 88.1% in recognizing genders of faces in images collected from the World-Wide<br />
Web. For faces in the FERET dataset, our system achieved an accuracy of 98.8%, outperforming all six state-of-the-art<br />
algorithms compared in this paper.<br />
09:00-11:10, Paper TuAT9.6<br />
Benchmarking Local Orientation Extraction in Fingerprint Recognition<br />
Cappelli, Raffaele, Univ. of Bologna<br />
Maltoni, Davide, Univ. of Bologna<br />
Turroni, Francesco, Univ. of Bologna<br />
The computation of local orientations is a fundamental step in fingerprint recognition. Although a large number of approaches<br />
have been proposed in the literature, no systematic quantitative evaluations have been done yet, mainly due to the lack of<br />
proper datasets with associated ground truth information. In this paper we propose a new benchmark (which includes two<br />
datasets and an accuracy metric) and report preliminary results obtained by testing four well-known local orientation extraction<br />
algorithms.<br />
09:00-11:10, Paper TuAT9.7<br />
Efficient Finger Vein Localization and Recognition<br />
Li, Xu, Civil Aviation Univ. of China<br />
Yang, Jinfeng, Civil Aviation Univ. of China<br />
In order to achieve accurate recognition of human finger veins (FV), this paper addresses the problems of finger vein localization<br />
and vein feature extraction. An inherent physical property of human fingers, the inter-phalangeal joint prior, is used to<br />
localize the region of interest (ROI) of vein images and to remove uninformative vein imagery. In addition,<br />
vein images are characterized as a series of energy features through steerable filters. Experimental results show the promising<br />
performance of the proposed algorithm for human vein identification.<br />
09:00-11:10, Paper TuAT9.8<br />
Learning the Relationship between High and Low Resolution Images in Kernel Space for Face Super Resolution<br />
Zou, Wilman, W W, Hong Kong Baptist Univ.<br />
Yuen, Pong C, Hong Kong Baptist Univ.<br />
This paper proposes a new nonlinear face super resolution algorithm to address an important issue in face recognition from<br />
surveillance video, namely recognition of low resolution face images with nonlinear variations. The proposed method learns<br />
the nonlinear relationship between low resolution and high resolution face images in a (nonlinear) kernel feature<br />
space. Moreover, a discriminative term can be easily included in the proposed framework. Experimental results on the CMU-<br />
PIE and FRGC v2.0 databases show that the proposed method outperforms existing methods as well as recognition based on<br />
high resolution images.<br />
09:00-11:10, Paper TuAT9.9<br />
Robust Regression for Face Recognition<br />
Naseem, Imran, The Univ. of Western Australia<br />
Togneri, Roberto, The Univ. of Western Australia<br />
Bennamoun, Mohammed, The Univ. of Western Australia<br />
In this paper we address the problem of illumination invariant face recognition. Using the fundamental concept that, in<br />
general, patterns from a single object class lie on a linear subspace [2], we develop a linear model representing a probe<br />
image as a linear combination of class-specific galleries. In the presence of noise, the well-conditioned inverse problem<br />
is solved using robust Huber estimation, and the decision is ruled in favor of the class with the minimum reconstruction<br />
error. The proposed Robust Linear Regression Classification (RLRC) algorithm is extensively evaluated on two standard<br />
databases and shows a good performance index compared to state-of-the-art robust approaches.<br />
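The minimum-reconstruction-error decision rule can be sketched in a few lines (a minimal numpy sketch under stated assumptions; the function names and the IRLS solver for the Huber estimate are illustrative, not the authors' implementation):<br />

```python
import numpy as np

def huber_fit(X, y, delta=1.0, iters=20):
    # Robust regression via iteratively reweighted least squares (IRLS):
    # residuals beyond `delta` are down-weighted as in the Huber loss.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        a = np.abs(r)
        w = np.where(a > delta, delta / np.maximum(a, 1e-12), 1.0)
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    return beta

def rlrc_classify(probe, galleries, delta=1.0):
    # galleries: {class_label: (d, n_c) matrix of vectorized gallery images}.
    # The probe is regressed on each class-specific gallery; the class with
    # the smallest robust reconstruction error wins.
    errors = {}
    for label, G in galleries.items():
        beta = huber_fit(G, probe, delta)
        errors[label] = np.linalg.norm(probe - G @ beta)
    return min(errors, key=errors.get)
```

Here each column of a gallery matrix is assumed to be one vectorized training image of that class.<br />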
09:00-11:10, Paper TuAT9.10<br />
Recognition of Blurred Faces via Facial Deblurring Combined with Blur-Tolerant Descriptors<br />
Hadid, Abdenour, Univ. of Oulu<br />
Nishiyama, Masashi, Toshiba Corp.<br />
Sato, Yoichi, Univ. of Tokyo<br />
Blur is often present in real-world images and significantly affects the performance of face recognition systems. To improve<br />
the recognition of blurred faces, we propose a new approach which inherits the advantages of two recent methods. The<br />
idea consists of first reducing the amount of blur in the images via deblurring and then extracting blur-tolerant descriptors<br />
for recognition. We assess our analysis on real blurred face images (FRGC 1.0 database) and also on face images artificially<br />
degraded by focus blur (FERET database), demonstrating significant performance enhancement compared to the state-of-the-art.<br />
09:00-11:10, Paper TuAT9.11<br />
Diffusion-Based Face Selective Smoothing in DCT Domain to Illumination Invariant Face Recognition<br />
Ezoji, Mehdi, Amirkabir Univ. of Tech.<br />
Faez, Karim, Amirkabir Univ. of Tech.<br />
In this paper, a diffusion-based iterative algorithm is proposed for illumination invariant face representation using image<br />
selective smoothing in the DCT domain. We split the image I into three parts, (R+w)+L: an illumination invariant<br />
component R, an oscillating component w, and a smooth component L. At each iteration, the influence of the different frequency<br />
sub-bands of the image is determined and the additive oscillating component is reduced. The experimental results confirm<br />
that our approach provides a suitable representation for overcoming illumination variations.<br />
09:00-11:10, Paper TuAT9.12<br />
BioHashing for Securing Fingerprint Minutiae Templates<br />
Belguechi, Rima, National School of Computer Science<br />
Rosenberger, Christophe, Lab. GREYC<br />
Ait Aoudia, Samy, National School of Computer Science<br />
The storage of fingerprints is an important issue as this biometric modality is increasingly deployed in real applications.<br />
The a priori impossibility of revoking a biometric template (as one would a password) in case of theft is a major concern for privacy<br />
reasons. We propose in this paper a new method to secure fingerprint minutiae templates by storing a bio code while keeping<br />
good recognition results. We show the efficiency of the method in comparison to some published methods for different<br />
scenarios.<br />
09:00-11:10, Paper TuAT9.13<br />
Fusion of an Isometric Deformation Modeling Approach using Spectral Decomposition and a Region-Based Approach<br />
using ICP for Expression-Invariant 3D Face Recognition<br />
Smeets, Dirk, K.U.Leuven<br />
Fabry, Thomas, K.U.Leuven<br />
Hermans, Jeroen, K.U.Leuven<br />
Vandermeulen, Dirk<br />
Suetens, Paul, K.U.Leuven<br />
The recognition of faces under varying expressions is one of the current challenges in the face recognition community. In<br />
this paper, we propose a method fusing different complementary approaches each dealing with expression variations. The<br />
first approach uses an isometric deformation model and is based on the largest singular values of the geodesic distance<br />
matrix as an expression-invariant shape descriptor. The second approach performs recognition on the more rigid parts of<br />
the face that are less affected by expression variations. Several fusion techniques are examined for combining the approaches.<br />
The presented method is validated on a subset of 900 faces of the BU-3DFE face database resulting in an equal<br />
error rate of 5.85% for the verification scenario and a rank 1 recognition rate of 94.48% for the identification scenario<br />
using the sum rule as fusion technique. This result outperforms other 3D expression-invariant face recognition methods<br />
on the same database.<br />
09:00-11:10, Paper TuAT9.14<br />
Towards a Best Linear Combination for Multimodal Biometric Fusion<br />
Chia, Chaw, Nottingham Trent Univ.<br />
Sherkat, Nasser, Nottingham Trent Univ.<br />
Nolle, Lars, Nottingham Trent Univ.<br />
Owing to its effectiveness and ease of implementation, the Sum rule has been widely applied in the biometric research field.<br />
Different matcher information has been used to set the weighting parameters of the weighted Sum rule. In this work, a new<br />
parameter is devised to reduce the genuine/impostor distribution overlap. It is shown that the overlap region width has the<br />
best generalization performance as a weighting parameter amongst other commonly used matcher information. Furthermore,<br />
it is illustrated that the equally weighted Sum rule can generally perform better than the Equal Error Rate and d-prime<br />
weighted Sum rules. The publicly available NIST-BSSR1 multimodal biometric and XM2VTS score sets have<br />
been used.<br />
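The overlap-region-width weighting can be illustrated as follows (a hedged sketch: the abstract does not give the exact formula, so the overlap measure and the inverse-width weighting below are our assumptions):<br />

```python
import numpy as np

def overlap_width(genuine, impostor):
    # Width of the score region where genuine and impostor scores overlap;
    # zero when the two score sets are perfectly separated.
    return max(0.0, float(np.max(impostor) - np.min(genuine)))

def fuse(scores, genuine_sets, impostor_sets):
    # Weighted Sum rule: matchers with a narrower overlap region get larger weights.
    widths = np.array([overlap_width(g, i)
                       for g, i in zip(genuine_sets, impostor_sets)])
    w = 1.0 / (widths + 1e-9)   # small constant avoids division by zero
    w = w / w.sum()
    return float(np.dot(w, scores))
```

With a perfectly separating matcher, its weight dominates the fused score.<br />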
09:00-11:10, Paper TuAT9.15<br />
Slap Fingerprint Segmentation for Live-Scan Devices and Ten-Print Cards<br />
Zhang, Yongliang, Zhejiang Univ. of Technology<br />
Xiao, Gang, Zhejiang Univ. of Technology<br />
Li, Yanmiao, Jiaotong Univ. Dalian<br />
Wu, Hongtao, Hebei Univ. of Tech.<br />
Huang, Yaping, Zhejiang Univ. of Technology<br />
Presented here is a highly accurate and computationally efficient algorithm for slap fingerprint segmentation. The<br />
main advantages of this algorithm are as follows: 1) third-order cumulants are used to roughly segment the foreground; 2) frequency<br />
domain analysis is carried out in local areas for binarization and fine segmentation; 3) cumulative sum analysis<br />
is applied to extract the knuckle lines; 4) two shape features of the ellipse are adopted to calculate the confidence of each<br />
fingertip candidate. Experimental results show that the algorithm is more robust against noise and more precise,<br />
not only for live-scan four-finger slaps but also for ten-print-card five-finger slaps.<br />
09:00-11:10, Paper TuAT9.16<br />
A Metric of Information Gained through Biometric Systems<br />
Takahashi, Kenta, Hitachi Ltd.<br />
Murakami, Takao, Hitachi Ltd.<br />
We propose a metric of information gained through biometric matching systems. Firstly, we discuss how the information<br />
about the identity of a person is derived from biometric samples through a biometric system, and define the “biometric<br />
system entropy” or BSE. Then we prove that the BSE can be approximated asymptotically by the Kullback-Leibler divergence<br />
D(f_G(x) || f_I(x)), where f_G(x) and f_I(x) are the PDFs of matching scores between samples from the same individual and<br />
between samples across the population, respectively. We also discuss how to evaluate D(f_G || f_I) of a biometric system and show a numerical example of<br />
face and fingerprint matching systems.<br />
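The quantity D(f_G || f_I) can be estimated from matching-score samples; a minimal histogram-based sketch (the binning choices and the base-2 logarithm, giving bits, are our assumptions):<br />

```python
import numpy as np

def kl_from_scores(genuine, impostor, bins=32):
    # Histogram estimate of D(f_G || f_I) over a common score range.
    lo = min(np.min(genuine), np.min(impostor))
    hi = max(np.max(genuine), np.max(impostor))
    pg, _ = np.histogram(genuine, bins=bins, range=(lo, hi))
    pi, _ = np.histogram(impostor, bins=bins, range=(lo, hi))
    eps = 1e-12                      # guards against empty bins
    pg = pg / pg.sum() + eps
    pi = pi / pi.sum() + eps
    return float(np.sum(pg * np.log2(pg / pi)))  # bits
```

Well-separated genuine and impostor score distributions yield a large divergence, i.e. much identity information.<br />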
09:00-11:10, Paper TuAT9.17<br />
Probabilistic Measure for Signature Verification based on Bayesian Learning<br />
Pu, Danjun, State Univ. of New York at Buffalo<br />
Srihari, Sargur<br />
Signature verification is a common task in forensic document analysis. The goal is to make a decision whether a questioned<br />
signature belongs to a set of known signatures of an individual or not. In a typical forgery case a very limited number of<br />
known signatures may be available, with as few as four or five knowns [Stev95]. Here we describe a fully Bayesian<br />
approach which overcomes the limitation of having too few genuine samples. The algorithm has three steps: Step 1: Learn<br />
prior distributions of parameters from a population of known signatures; Step 2: Determine the posterior distributions of<br />
parameters using the genuine samples of a particular person; Step 3: Determine probabilities of the query from both genuine<br />
and forgery classes and the Log Likelihood Ratio (LLR) of the query. Rather than giving a hard decision, this method provides<br />
a probabilistic measure (the LLR) of the decision, and the performance of the Bayesian learning is improved especially in the<br />
case of limited known samples.<br />
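Step 3's log likelihood ratio can be illustrated with a toy one-dimensional score model (Gaussian class-conditional densities are an assumption for illustration only; the paper obtains its densities from the Bayesian learning of Steps 1 and 2):<br />

```python
import math

def gaussian_logpdf(x, mu, sigma):
    # Log density of a univariate Gaussian.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def llr(x, genuine_params, forgery_params):
    # Log likelihood ratio of a query feature under the genuine and forgery
    # models; positive values favour the genuine hypothesis.
    return gaussian_logpdf(x, *genuine_params) - gaussian_logpdf(x, *forgery_params)
```

The sign and magnitude of the LLR convey both the decision and its strength, rather than a hard accept/reject.<br />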
09:00-11:10, Paper TuAT9.18<br />
Gender Classification using a Single Frontal Image Per Person: Combination of Appearance and Geometric based<br />
Features<br />
Mozaffari, Saeed, Semnan Univ.<br />
Behravan, Hamid, Semnan Univ.<br />
Akbari, Rohollah, Qazvin Azad Univ.<br />
Today, many social interactions and services depend on gender. In this paper, we introduce a single-image gender classification<br />
algorithm using a combination of appearance-based and geometric-based features: the Discrete Cosine<br />
Transform (DCT), Local Binary Patterns (LBP), and a geometrical distance feature (GDF). The novel GDF feature proposed<br />
in this paper is inspired by physiological differences between male and female faces. Combining the appearance-based<br />
features (DCT and LBP) with the geometric-based feature (GDF) leads to higher gender classification accuracy.<br />
Our system estimates the gender of the input image based on a majority rule: if the results of the DCT and LBP features are not<br />
identical, gender classification is based on the GDF feature. The proposed method was evaluated on two databases, AR<br />
and an ethnic database. Experimental results show that the novel geometric feature improves the gender classification accuracy by<br />
13%.<br />
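The majority rule described above can be sketched directly (the function and label names are illustrative):<br />

```python
def classify_gender(dct_pred, lbp_pred, gdf_pred):
    # Majority rule from the abstract: when the two appearance-based
    # predictions (DCT and LBP) agree, take their common answer; otherwise
    # fall back to the geometric distance feature (GDF).
    if dct_pred == lbp_pred:
        return dct_pred
    return gdf_pred
```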
09:00-11:10, Paper TuAT9.19<br />
Residual Analysis for Fingerprint Orientation Modeling<br />
Jirachaweng, Suksan, Kasetsart Univ.<br />
Hou, Zujun, Inst. For Infocomm Res.<br />
Li, Jun, Inst. For Infocomm Res.<br />
Yau, Wei-Yun, Inst. For Infocomm Res.<br />
Areekul, Vutipong, Kasetsart Univ.<br />
This paper presents a novel method for fingerprint orientation modeling, which executes in two phases. First, the orientation<br />
field is reconstructed through fitting to a lower-order Legendre polynomial basis to capture the global orientation<br />
pattern. Then the preliminary model around the singular region is dynamically refined by fitting to a higher-order Legendre<br />
polynomial basis. The singular region is automatically detected through analysis of the residual field between<br />
the original orientation field and the orientation model. The method has been evaluated using the FVC 2004 data<br />
sets and compared with state-of-the-art methods. Experiments show that the proposed method attains higher accuracy in fingerprint<br />
matching and better singularity preservation.<br />
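The coarse global fit of the first phase can be sketched with numpy's Legendre utilities (an illustrative sketch: the doubled-angle representation and least-squares fit are standard practice, but the basis order is our assumption, and the dynamic refinement step is omitted):<br />

```python
import numpy as np
from numpy.polynomial import legendre

def fit_orientation(x, y, theta, deg):
    # Fit cos(2*theta) and sin(2*theta) to a 2D Legendre basis; doubling the
    # angle removes the 180-degree ambiguity of ridge orientations.
    V = legendre.legvander2d(x, y, [deg, deg])
    c_cos = np.linalg.lstsq(V, np.cos(2 * theta), rcond=None)[0]
    c_sin = np.linalg.lstsq(V, np.sin(2 * theta), rcond=None)[0]
    return c_cos, c_sin

def eval_orientation(x, y, c_cos, c_sin, deg):
    # Reconstruct the modeled orientation at the query points.
    V = legendre.legvander2d(x, y, [deg, deg])
    return 0.5 * np.arctan2(V @ c_sin, V @ c_cos)
```

A higher `deg` around detected singular regions would mimic the paper's refinement phase.<br />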
09:00-11:10, Paper TuAT9.20<br />
Dynamic Amelioration of Resolution Mismatches for Local Feature based Identity Inference<br />
Wong, Yongkang, NICTA<br />
Sanderson, Conrad, NICTA<br />
Mau, Sandra, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
While existing face recognition systems based on local features are robust to issues such as misalignment, they can exhibit<br />
accuracy degradation when comparing images of differing resolutions. This is common in surveillance environments<br />
where a gallery of high resolution mugshots is compared to low resolution CCTV probe images, or where the size of a<br />
given image is not a reliable indicator of the underlying resolution (e.g. poor optics). To alleviate this degradation, we<br />
propose a compensation framework which dynamically chooses the most appropriate face recognition system for a given<br />
pair of image resolutions. This framework applies a novel resolution detection method which does not rely on the size of<br />
the input images, but instead exploits the sensitivity of local features to resolution using a probabilistic multi-region histogram<br />
approach. Experiments on a resolution-modified version of the “Labeled Faces in the Wild” dataset show that the<br />
proposed resolution detector frontend obtains a 99% average accuracy in selecting the most appropriate face recognition<br />
system, resulting in higher overall face discrimination accuracy (across several resolutions) compared to the individual<br />
baseline face recognition systems.<br />
09:00-11:10, Paper TuAT9.21<br />
Patch-Based Similarity HMMs for Face Recognition with a Single Reference Image<br />
Vu, Ngoc-Son, Gipsa-Lab.<br />
Caplier, Alice, GIPSA-Lab. Grenoble Univ.<br />
In this paper we present a new architecture for face recognition with a single reference image, which completely separates<br />
the training process from the recognition process. In the training stage, by using a database containing various individuals,<br />
the spatial relations between face components are represented by two Hidden Markov Models (HMMs), one modeling<br />
within-subject similarities, and the other modeling inter-subject differences. This allows us during the recognition stage<br />
to take a pair of face images, neither of which has been seen before, and to determine whether or not they come from the<br />
same individual. Whilst other face-recognition HMMs use the Maximum Likelihood criterion, we test our approach using<br />
both the Maximum Likelihood and Maximum a Posteriori (MAP) criteria, and find that MAP provides better results. Importantly,<br />
the training database can be entirely separated from the gallery and test images: this means that adding new individuals<br />
to the system can be done without re-training. We present results based upon models trained on the FERET training<br />
dataset, and demonstrate that these give satisfactory recognition rates on both the FERET database itself and more impressively<br />
the unseen AR database. When compared to other HMM based face recognition techniques, our algorithm is of<br />
much lower complexity due to the small size of our observation sequence.<br />
09:00-11:10, Paper TuAT9.22<br />
How to Control Acceptance Threshold for Biometric Signatures with Different Confidence Values?<br />
Makihara, Yasushi, The Inst. of Scientific and Industrial Res., Osaka Univ.<br />
Hossain, Md. Altab, Osaka Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
In biometric verification, authentication is granted when the distance between biometric signatures from the enrollment and test<br />
phases is less than an acceptance threshold, and performance is usually evaluated by a so-called Receiver Operating<br />
Characteristic (ROC) curve expressing a trade-off between the False Rejection Rate (FRR) and False Acceptance Rate (FAR).<br />
On the other hand, it is also well known that performance is significantly affected by situation differences between the<br />
enrollment and test phases. This paper describes a method to adaptively control the acceptance threshold with quality measures<br />
derived from situation differences so as to optimize the ROC curve. We show that the optimal evolution of the adaptive<br />
threshold in the domain of the distance and quality measure is equivalent to a constant evolution in the domain of the error<br />
gradient defined as a ratio of a total error rate to a total acceptance rate. An experiment with simulation data demonstrates<br />
that the proposed method outperforms the previous methods, particularly under a lower FAR or FRR tolerance condition.<br />
09:00-11:10, Paper TuAT9.23<br />
Binary Representations of Fingerprint Spectral Minutiae Features<br />
Xu, Haiyun, Univ. of Twente<br />
Veldhuis, Raymond, Univ. of Twente<br />
A fixed-length binary representation of a fingerprint has the advantages of fast operation and small template storage.<br />
For many biometric template protection schemes, a binary string is also required as input. The spectral minutiae representation<br />
is a method to represent a minutiae set as a fixed-length real-valued feature vector. In order to be able to apply the<br />
spectral minutiae representation with a template protection scheme, we introduce two novel methods to quantize the<br />
spectral minutiae features into binary strings: Spectral Bits and Phase Bits. The experiments on the FVC2002 database<br />
show that the binary representations can even outperform the spectral minutiae real-valued features.<br />
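A simplest-possible quantization of a real-valued feature vector into a binary string looks like this (illustrative only; the paper's Spectral Bits and Phase Bits schemes are more elaborate than this median-threshold sketch):<br />

```python
import numpy as np

def binarize(feature):
    # One bit per coefficient: 1 if the value exceeds the vector's median.
    return (feature > np.median(feature)).astype(np.uint8)

def hamming_similarity(a, b):
    # Fraction of matching bits between two binary templates.
    return float(np.mean(a == b))
```

Binary templates allow fast Hamming-distance matching and are directly usable as input to template protection schemes.<br />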
09:00-11:10, Paper TuAT9.24<br />
Attacking Iris Recognition: An Efficient Hill-Climbing Technique<br />
Rathgeb, Christian, Univ. of Salzburg<br />
Uhl, Andreas, Univ. of Salzburg<br />
In this paper we propose a modified hill-climbing attack on iris biometric systems. Applying our technique we are able to<br />
effectively gain access to iris biometric systems with very low effort. Furthermore, we demonstrate that reconstructing approximations<br />
of the original iris images is highly non-trivial.<br />
09:00-11:10, Paper TuAT9.25<br />
Face Recognition at-a-Distance using Texture, Dense- and Sparse-Stereo Reconstruction<br />
Rara, Ham, CVIP Lab. Univ. of Louisville<br />
Ali, Asem, Univ. of Louisville<br />
Elhabian, Shireen, Univ. of Louisville<br />
Starr, Thomas, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
This paper introduces a framework for long-distance face recognition using dense and sparse stereo reconstruction, with<br />
texture of the facial region. Two methods to determine correspondences of the stereo pair are used in this paper: (a) dense<br />
global stereo-matching using maximum-a-posteriori Markov Random Fields (MAP-MRF) algorithms and (b) Active Appearance<br />
Model (AAM) fitting of both images of the stereo pair and using the fitted AAM mesh as the sparse correspondences.<br />
Experiments are performed using combinations of different features extracted from the dense and sparse<br />
reconstructions, as well as facial texture. The cumulative match characteristic (CMC) curves generated using the proposed<br />
framework confirm the feasibility of the proposed work for long-distance recognition of human faces.<br />
09:00-11:10, Paper TuAT9.26<br />
Automatic Asymmetric 3D-2D Face Recognition<br />
Huang, Di, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Wang, Yunhong, Beihang Univ.<br />
Chen, Liming, Ec. Centrale de Lyon<br />
In recent years, 3D face recognition has been considered a major solution to the unsolved issues of reliable 2D face recognition,<br />
i.e., lighting and pose variations. However, 3D techniques are currently limited by their high registration<br />
and computation cost. In this paper, an asymmetric 3D-2D face recognition method is presented, enrolling in textured 3D<br />
whilst performing automatic identification using only 2D facial images. The goal is to limit the use of 3D data to where it<br />
really helps to improve face recognition accuracy. The proposed approach contains two separate matching steps: Sparse<br />
Representation Classifier (SRC) is applied to 2D-2D matching, while Canonical Correlation Analysis (CCA) is exploited<br />
to learn the mapping between range LBP faces (3D) and texture LBP faces (2D). Both matching scores are combined for<br />
the final decision. Moreover, we propose a new preprocessing pipeline to enhance robustness to lighting and pose effects.<br />
The proposed method achieves better experimental results on the FRGC v2.0 dataset than 2D methods do, while avoiding<br />
the cost and inconvenience of the data acquisition and computation of 3D approaches.<br />
09:00-11:10, Paper TuAT9.27<br />
Model and Score Adaptation for Biometric Systems: Coping with Device Interoperability and Changing Acquisition<br />
Conditions<br />
Poh, Norman, Univ. of Surrey<br />
Kittler, Josef, Univ. of Surrey<br />
Marcel, Sebastien, IDIAP Res. Inst. EPFL<br />
Matrouf, Driss, Univ. d’Avignon et des Pays de Vaucluse<br />
Bonastre, Jean-Francois, Univ. d’Avignon et des Pays de Vaucluse<br />
The performance of biometric systems can be significantly affected by changes in signal quality. In this paper, two types<br />
of changes are considered: changes in the acquisition environment and in the sensing devices. We investigated three solutions: (i)<br />
model-level adaptation, (ii) score-level adaptation (normalisation), and (iii) the combination of the two, called compound<br />
adaptation. In order to cope with the above changing conditions, the model-level adaptation attempts to update the parameters<br />
of the expert systems (classifiers). This approach requires the authenticity of the candidate samples used for adaptation<br />
be known (corresponding to supervised adaptation), or can be estimated (unsupervised adaptation). In comparison, the<br />
score-level adaptation merely involves post processing the expert output, with the objective of rendering the associated<br />
decision threshold to be dependent only on the class priors despite the changing acquisition conditions. Since the above<br />
adaptation strategies treat the underlying biometric experts/classifiers as a black-box, they can be applied to any unimodal<br />
or multimodal biometric system, thus facilitating system-level integration and performance optimisation. Our contributions<br />
are: (i) the proposal of compound adaptation; (ii) the investigation and comparison of two different quality-dependent score normalisation<br />
strategies; and (iii) an empirical comparison of the merits of the above three solutions on the BANCA face (video)<br />
and speech databases.<br />
09:00-11:10, Paper TuAT9.28<br />
Online Boosting OC for Face Recognition in Continuous Video Stream<br />
Huo, Hongwen, Peking Univ.<br />
Feng, Jufu, Peking Univ.<br />
In this paper, we present a novel online face recognition approach for video streams called online boosting OC (output<br />
code). Recently, boosting has been successfully used in many fields such as object detection and tracking. It is a kind<br />
of large-margin classifier for binary classification problems and is also efficient for online learning. However, face recognition<br />
is a typical multi-class problem, so it is difficult to use boosting for face recognition, especially in an online<br />
version. In our work, we combine online boosting and the OC algorithm to solve real-time online multi-class classification<br />
problems. We evaluate online boosting OC in real-world experiments on face recognition in continuous video streams, and<br />
the results show that our algorithm is accurate and robust.<br />
09:00-11:10, Paper TuAT9.29<br />
On the Dimensionality Reduction for Sparse Representation based Face Recognition<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Yang, Meng, The Hong Kong Pol. Univ.<br />
Feng, Zhizhao, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Face recognition (FR) is an active yet challenging topic in computer vision applications. As a powerful tool to represent<br />
high dimensional data, recently sparse representation based classification (SRC) has been successfully used for FR. This<br />
paper discusses the dimensionality reduction (DR) of face images under the framework of SRC. Although one important<br />
merit of SRC is that it is insensitive to DR or feature extraction, a well trained projection matrix can lead to higher FR rate<br />
at a lower dimensionality. An SRC-oriented unsupervised DR algorithm is proposed in this paper, and the experimental results<br />
on benchmark face databases demonstrate the improvements brought by the proposed DR algorithm over PCA- or<br />
random-projection-based DR under the SRC framework.<br />
09:00-11:10, Paper TuAT9.30<br />
Improved Fingerprint Image Segmentation and Reconstruction of Low Quality Areas<br />
Mieloch, Krzysztof, Univ. of Goettingen<br />
Munk, Axel, Univ. of Goettingen<br />
Mihailescu, Preda, Univ. of Goettingen<br />
One of the main reasons for false recognition is noise added to fingerprint images during the acquisition step. Hence, improving<br />
the enhancement step affects the general accuracy of automatic recognition systems. In one of our previous<br />
publications we introduced hierarchically linked extended features – a new set of features which not only includes additional<br />
fingerprint features individually but also contains information about their relationships, such as line adjacency<br />
information at minutiae points or links between neighbouring fingerprint lines. In this work we present the application of<br />
the extended features to preprocessing and enhancement. We use structural information for improving the segmentation<br />
step, as well as for connecting disrupted fingerprint lines and recovering missing minutiae. Experiments show a decrease in<br />
the matching error rate.<br />
09:00-11:10, Paper TuAT9.31<br />
An Efficient Method for Offline Text Independent Writer Identification<br />
Ghiasi, Golnaz, Amirkabir Univ. of Tech.<br />
Safabakhsh, Reza, Amirkabir Univ. of Tech.<br />
This paper proposes an efficient method for text independent writer identification using a codebook. The occurrence histogram<br />
of the shapes in the codebook is used to create a feature vector for the handwriting. There is a wide variety of<br />
shapes in the connected components obtained from handwriting, so small fragments of connected components should<br />
be used to avoid complex patterns. A new and more efficient method is introduced for this purpose. To evaluate the method,<br />
writer identification is conducted on three varieties of a Farsi database, containing texts of short, medium and<br />
large lengths. Experimental results show the efficiency of the method, especially for short texts.<br />
09:00-11:10, Paper TuAT9.32<br />
Study on Color Spaces for Single Image Enrolment Face Authentication<br />
Hemery, Baptiste, Univ. de CAEN<br />
Schwartzmann, Jean-Jacques, Orange Lab.<br />
Rosenberger, Christophe, Lab. GREYC<br />
We propose in this paper to study different color spaces for representing an image for face authentication.<br />
We use a generic algorithm based on matching keypoints using SIFT descriptors computed on one color component.<br />
Ten color spaces have been studied on four large and significant benchmark databases (ENSIB, FACES94, AR and FERET).<br />
We show that not all color spaces provide the same efficiency and that using the color information allows an interesting<br />
improvement of verification results.<br />
09:00-11:10, Paper TuAT9.33<br />
Estimation of Fingerprint Orientation Field by Weighted 2D Fourier Expansion Model<br />
Tao, Xunqiang, Chinese Acad. of Sciences<br />
Yang, Xin, Chinese Acad. of Sciences<br />
Cao, Kai, Chinese Acad. of Sciences<br />
Wang, Ruifang, Chinese Acad. of Sciences<br />
Li, Peng, Chinese Acad. of Sciences<br />
Tian, Jie<br />
Accurate estimation of the fingerprint orientation field is an essential module in fingerprint recognition. This paper proposes<br />
a novel technique for improving fingerprint orientation field estimation using a fingerprint orientation model based on weighted<br />
2D Fourier expansion (W-FOMFE). The motivation for the proposed method is twofold: 1) the original FOMFE is<br />
sensitive to abrupt changes in the orientation field; 2) blocks of different quality should have different impacts on FOMFE.<br />
Thus, we take the Harris-corner strength (HCS) into account for orientation field estimation. In our<br />
method, we first calculate the fingerprint’s HCS; then use the HCS to remove abrupt changes in the orientation field; and finally,<br />
incorporate the normalized HCS as a weight into the original FOMFE. We test our method on FVC2004 DB1. Experimental<br />
results show that our method (W-FOMFE) provides better orientation field estimation than FOMFE.<br />
09:00-11:10, Paper TuAT9.34<br />
Iterative Fingerprint Enhancement with Matched Filtering and Quality Diffusion in Spatial-Frequency Domain<br />
Sutthiwichaiporn, Prawit, Kasetsart Univ.<br />
Areekul, Vutipong, Kasetsart Univ.<br />
Jirachaweng, Suksan, Kasetsart Univ.<br />
The proposed fingerprint enhancement algorithm utilizes the power spectrum in the spatial-frequency domain. The input fingerprint<br />
is partitioned and assessed as high/low quality zones using a signal-to-noise ratio (SNR) approach. For high quality<br />
zones, the signal spectrum with noise suppression is used to shape an enhancement filter in the frequency domain. Then, the algorithm<br />
feeds neighboring enhanced zones back in order to repair unreliable low quality regions. The proposed algorithm outperforms<br />
the Gabor and STFT approaches in fingerprint matching experiments on FVC2004 Db2 and Db3.<br />
- 103 -
09:00-11:10, Paper TuAT9.35<br />
Cancelable Face Recognition using Random Multiplicative Transform<br />
Wang, Yongjin, Univ. of Toronto<br />
Hatzinakos, Dimitrios, Univ. of Toronto<br />
The generation of cancelable and privacy preserving biometric templates is important for the pervasive deployment of<br />
biometric technology in a wide variety of applications. This paper presents a novel approach for cancelable biometric authentication<br />
using random multiplicative transform. The proposed method transforms the original biometric feature vector<br />
through element-wise multiplication with a random vector, and the sorted index numbers of the resulting vector in the<br />
transformed domain are stored as the biometric template. The changeability and privacy-protecting properties of the generated<br />
biometric template are analyzed in detail. The effectiveness of the proposed method is well supported by extensive<br />
experimentation on a face verification problem.<br />
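The template-generation step described above (element-wise multiplication by a user-specific random vector, then storing the sorted index order) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code; `key_seed` stands in for however the random vector is generated and stored in practice.

```python
import numpy as np

def cancelable_template(feature, key_seed):
    """Transform a biometric feature vector and keep only the sorted index order."""
    rng = np.random.default_rng(key_seed)
    r = rng.uniform(0.5, 2.0, size=feature.shape)  # user-specific random vector
    return np.argsort(feature * r)                 # index order = stored template

feature = np.array([0.2, 0.9, 0.5, 0.7])
t1 = cancelable_template(feature, key_seed=1)
t2 = cancelable_template(feature, key_seed=1)   # same key -> same template
t3 = cancelable_template(feature, key_seed=2)   # new key -> a revocable, different view
```

Because only index orderings are stored, the original feature values are not directly recoverable from the template, and issuing a new seed cancels a compromised template.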
09:00-11:10, Paper TuAT9.36<br />
Evaluation of Multi-Frame Fusion based Face Classification under Shadow<br />
Canavan, Shaun, SUNY Binghamton<br />
Johnson, Benjamin, Youngstown State Univ.<br />
Reale, Michael, Binghamton Univ.<br />
Zhang, Yong, Youngstown State Univ.<br />
Yin, Lijun, SUNY Binghamton<br />
Sullins, John, Youngstown State Univ.<br />
A video sequence of a head moving across a large pose angle contains much richer information than a single-view image,<br />
and hence has greater potential for identification purposes. This paper explores and evaluates the use of a multi-frame<br />
fusion method to improve face recognition in the presence of strong shadow. The dataset includes videos of 257 subjects<br />
who rotated their heads from 0 to 90 degrees. Experiments were carried out using ten video frames per subject, fused at<br />
the score level. The primary findings are: (i) a significant performance increase was observed, with the recognition rate<br />
doubling from 40% using a single frame to 80% using ten frames; (ii) the performance of multi-frame fusion is<br />
strongly related to its inter-frame variation, which measures its information diversity.<br />
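Score-level fusion of this kind can be sketched generically: each frame produces a match score per gallery identity, the per-identity scores are combined across frames, and the best fused score wins. A minimal sketch; the sum rule used here is an assumption, since the abstract does not state which fusion rule was applied.

```python
import numpy as np

def fuse_frames(frame_scores):
    """frame_scores: (n_frames, n_identities) match scores.
    Returns fused per-identity scores and the top identity."""
    fused = frame_scores.mean(axis=0)       # sum-rule (average) fusion across frames
    return fused, int(fused.argmax())

# Three frames, four gallery identities; identity 2 is correct but wins
# clearly only after fusion smooths out per-frame noise.
scores = np.array([[0.30, 0.40, 0.35, 0.10],
                   [0.20, 0.10, 0.60, 0.30],
                   [0.25, 0.20, 0.55, 0.15]])
fused, best = fuse_frames(scores)
```

Note that on frame 0 alone the wrong identity (1) would win; fusion recovers identity 2, mirroring the single-frame versus multi-frame gap reported above.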
09:00-11:10, Paper TuAT9.37<br />
Finger-Vein Authentication based on Wide Line Detector and Pattern Normalization<br />
Huang, Beining, Peking Univ.<br />
Dai, Yanggang, Peking Univ.<br />
Li, Rongfeng, Peking Univ.<br />
Tang, Darun, Peking Univ.<br />
Li, Wenxin, Peking Univ.<br />
In finger-vein authentication, two problems arise in practice. One is that the quality of the vein image is reduced under<br />
poor environmental conditions; the other is the irregular distortion of the image caused by variations in finger pose. Both<br />
problems raise error rates. In this paper, we introduce a wide line detector for feature extraction, which obtains precise<br />
width information of the vein and increases the information extracted from low-quality images. We also develop a new<br />
pattern normalization model based on the hypothesis that the finger’s cross-sections are approximately elliptical and that<br />
the veins which can be imaged lie close to the finger surface. It effectively reduces the distortion caused by pose. In our<br />
experiments on a database containing 50,700 images, our method shows advantages in dealing with low-quality data<br />
collected from a practical personal authentication system.<br />
09:00-11:10, Paper TuAT9.38<br />
Performance Evaluation of Micropattern Representation on Gabor Features for Face Recognition<br />
Zhao, Sanqiang, Griffith Univ. / National ICT Australia<br />
Gao, Yongsheng, Griffith Univ.<br />
Zhang, Baochang, Beihang Univ.<br />
Face recognition using micropattern representation has recently received much attention in the computer vision and pattern<br />
recognition community. Previous research demonstrated that micropattern representation based on Gabor features<br />
achieves better performance than its direct use on gray-level images. This paper conducts a comparative performance<br />
evaluation of micropattern representations on four forms of Gabor features for face recognition. Three evaluation rules<br />
are proposed and observed for a fair comparison. To reduce the high feature dimensionality, uniform quantization<br />
is used to partition the spatial histograms. The experimental results reveal that: 1) micropattern representation based on<br />
Gabor magnitude features outperforms the other three representations, whose performances are comparable;<br />
and 2) micropattern representation based on the combination of Gabor magnitude and phase features performs<br />
best.<br />
09:00-11:10, Paper TuAT9.39<br />
Block Pyramid based Adaptive Quantization Watermarking for Multimodal Biometric Authentication<br />
Ma, Bin, Beihang Univ.<br />
Li, Chunlei, Beihang Univ.<br />
Wang, Yunhong, Beihang Univ.<br />
Zhang, Zhaoxiang, Beihang Univ.<br />
Wang, Yiding, North China Univ. of Tech.<br />
This paper proposes a novel robust watermarking scheme that embeds fingerprint minutiae into face images for multimodal<br />
biometric authentication. First, a block pyramid is layered according to block-wise face region distinctiveness estimated<br />
by AdaBoost; upper levels indicate informative spatial regions. Then, we adopt a first-order statistics QIM method to perform<br />
watermark embedding in each pyramid level. Watermark bits with higher priority are embedded into upper pyramid<br />
levels with a larger embedding strength. By jointly differentiating host image regions and watermark bit priorities, our<br />
scheme achieves a trade-off among watermarking robustness, capacity and fidelity. Experimental results demonstrate<br />
that our approach guarantees the robustness of hidden biometric data while preserving the distinctiveness of host biometric<br />
images.<br />
09:00-11:10, Paper TuAT9.40<br />
A Topologic Approach to User-Dependent Key Extraction from Fingerprints<br />
Gudkov, Vladimir, Sonda<br />
Ushmaev, Oleg, Russian Acad. of Sciences<br />
The paper briefly describes an approach to key extraction from fingerprint images based on topological descriptors of<br />
minutiae point neighborhoods. The approach allows the design of biometric encryption procedures with variable key<br />
length and successful decryption rates.<br />
09:00-11:10, Paper TuAT9.41<br />
Robust Face Recognition using Block-Based Bag of Words<br />
Li, Zisheng, The Univ. of Electro-Communications<br />
Imai, Jun-Ichi, The Univ. of Electro-Communications<br />
Kaneko, Masahide, The Univ. of Electro-Communications<br />
A novel block-based bag-of-words (BboW) method is proposed for robust face recognition. In our approach, a face image<br />
is partitioned into multiple blocks; dense SIFT features are then calculated and vector-quantized into codewords<br />
within each block. Finally, the histograms of codeword distributions over the local blocks are concatenated to represent<br />
the face image. Experimental results on the AR database show that, using only one neutral-expression frame per person for<br />
training, our method obtains excellent face recognition results on face images with extreme expressions, varying illumination,<br />
and partial occlusions. Our method also achieves an average recognition rate of 100% on the XM2VTS database.<br />
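The block-wise histogram construction can be sketched as follows, assuming the dense descriptors have already been assigned to codeword indices (SIFT extraction and vocabulary learning are omitted; the grid and vocabulary sizes here are illustrative, not the paper's settings).

```python
import numpy as np

def bbow_descriptor(codeword_map, grid=(2, 2), vocab_size=8):
    """codeword_map: 2D array of codeword indices (one per dense sample point).
    Returns the concatenation of per-block normalized codeword histograms."""
    hists = []
    for row in np.array_split(codeword_map, grid[0], axis=0):
        for block in np.array_split(row, grid[1], axis=1):
            h = np.bincount(block.ravel(), minlength=vocab_size).astype(float)
            hists.append(h / max(h.sum(), 1.0))   # per-block L1 normalization
    return np.concatenate(hists)

# An 8x8 map of codeword assignments, split into a 2x2 grid of blocks.
cw = np.random.default_rng(0).integers(0, 8, size=(8, 8))
desc = bbow_descriptor(cw)
```

Keeping one histogram per block, rather than one global histogram, is what preserves the spatial layout that makes the representation robust to partial occlusions.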
09:00-11:10, Paper TuAT9.42<br />
Analysis of Fingerprint Pores for Vitality Detection<br />
Marcialis, Gian Luca, Univ. of Cagliari<br />
Roli, Fabio, Univ. of Cagliari<br />
Tidu, Alessandra, Univ. of Cagliari<br />
Spoofing is an open issue for fingerprint recognition systems. It consists of submitting an artificial replica of a genuine<br />
user’s fingerprint. Current sensors provide an image which is then processed as a true fingerprint. Recently, so-called<br />
3rd-level features, namely pores, which are visible in high-definition fingerprint images, have been used for matching. In this<br />
paper, we propose to analyse pore locations to characterize the liveness of fingerprints. Experimental results on a large<br />
dataset of spoofed and live fingerprints show the benefits of the proposed approach.<br />
09:00-11:10, Paper TuAT9.43<br />
Applying Dissimilarity Representation to Off-Line Signature Verification<br />
Batista, Luana, École de Tech. Supérieure<br />
Granger, Eric, École de Tech. Supérieure<br />
Sabourin, R., École de Tech. Supérieure<br />
In this paper, a two-stage off-line signature verification system based on dissimilarity representation is proposed. In the<br />
first stage, a set of discrete left-to-right HMMs trained with different numbers of states and codebook sizes is used to<br />
measure similarity values that populate new feature vectors. These vectors are then input to the second stage, which provides<br />
the final classification. Experiments were performed using two different classification techniques – AdaBoost, and<br />
Random Subspaces with SVMs – and a real-world signature verification database. Results indicate that the proposed<br />
system performs significantly better than other reference signature verification systems from the literature.<br />
09:00-11:10, Paper TuAT9.44<br />
3D Face Decomposition and Region Selection against Expression Variations<br />
Günlü, Göksel, Gazi Univ.<br />
Bilge, Hasan Sakir, Gazi Univ.<br />
3D face recognition exploits shape information in addition to the texture information used in 2D systems. Using the<br />
whole 3D face is sensitive to undesired conditions such as expression variations. To overcome this problem, we investigate<br />
a new approach that decomposes the whole 3D face into sub-regions and independently extracts features from each<br />
sub-region. A 3D DCT is applied to each sub-region and the most discriminating DCT coefficients are selected. The nose<br />
region contributes the most to the list of discriminating coefficients. Furthermore, a better recognition rate is achieved by<br />
using only the nose region. The highest rank-one recognition score in our experiments is 98.97%. The results of the<br />
proposed approach are compared to other methods that use the FRGC v2 database.<br />
09:00-11:10, Paper TuAT9.45<br />
Fusion of Qualities for Frame Selection in Video Face Verification<br />
Villegas, Mauricio, Univ. Pol. De Valencia<br />
Paredes, Roberto, Univ. Pol. De Valencia<br />
It is known that the use of video can help improve the performance of face verification systems. However, processing<br />
video on resource-constrained devices is prohibitive. In order to reduce the load of the algorithms, a quality-based selection<br />
of frames can be applied. Generally, several quality measures are available, and thus a good fusion scheme is required. This<br />
paper addresses the problem of fusing quality measures such that the resulting quality improves the performance of frame<br />
selection. A comparison of different methods for fusing qualities is presented. Also, some new quality measures based on<br />
time derivatives are proposed, which are shown to be beneficial for estimating the overall quality. Finally, a curve is proposed<br />
which shows that the qualities used for frame selection effectively improve verification performance, independent<br />
of the number of frames selected or the method employed for obtaining the overall biometric score.<br />
09:00-11:10, Paper TuAT9.46<br />
A Person Retrieval Solution using Finger Vein Patterns<br />
Tang, Darun, Peking Univ.<br />
Huang, Beining, Peking Univ.<br />
Li, Rongfeng, Peking Univ.<br />
Li, Wenxin, Peking Univ.<br />
Dai, Yanggang, Peking Univ.<br />
Personal identification based on finger vein patterns is a newly developed biometric technique, and several practical<br />
systems have been deployed in recent years. We developed a finger vein verification system for checking attendance and<br />
have collected a database of 0.8 million finger vein samples. Based on this database, we propose a person retrieval solution<br />
that searches for an image in the database and returns a response in acceptable time. To fit the retrieval solution, we designed<br />
a new encoding method. The experimental results show that our solution can return a result in about 10 seconds when<br />
working on a database of 50,700 samples. At the same time, the error rate is nearly the same as that of linear search.<br />
09:00-11:10, Paper TuAT9.47<br />
Multi-Classifier Q-Stack Aging Model for Adult Face Verification<br />
Li, Weifeng, Swiss Federal Inst. of Tech. Lausanne (EPFL)<br />
Drygajlo, Andrzej, Swiss Federal Inst. of Tech. Lausanne (EPFL)<br />
The influence of age progression on the performance of multi-classifier face verification systems is a challenging and<br />
largely open research problem that deserves increasing attention. In this paper, we propose to manage the influence of aging<br />
on adult face verification with a multi-classifier Q-stack age modeling technique, which uses age as a class-independent<br />
metadata quality measure together with scores from baseline classifiers, combining global and local patterns, in order to<br />
obtain better recognition rates. This allows for improved long-term class separation by introducing a 2D parameterized<br />
decision boundary in the scores-age space using a short-term enrollment model. This new method, based on the concept of<br />
classifier stacking and an age-dependent decision boundary, compares favorably with the conventional face verification<br />
approach, which uses an age-independent decision threshold calculated only in the score space at the time of enrollment.<br />
The proposed approach is evaluated on the MORPH database.<br />
09:00-11:10, Paper TuAT9.48<br />
Quality-Based Fusion for Multichannel Iris Recognition<br />
Vatsa, Mayank, IIIT Delhi<br />
Singh, Richa, IIIT Delhi<br />
Ross, Arun, West Virginia Univ.<br />
Noore, Afzel, West Virginia Univ.<br />
We propose a quality-based fusion scheme for improving the recognition accuracy using color iris images characterized<br />
by three spectral channels – Red, Green and Blue. In the proposed method, quality scores are employed to select two channels<br />
of a color iris image which are fused at the image level using a Redundant Discrete Wavelet Transform (RDWT). The<br />
fused image is then used in a score-level fusion framework along with the remaining channel to improve recognition accuracy.<br />
Experimental results on a heterogeneous color iris database demonstrate the efficacy of the technique when compared<br />
against other score-level and image-level fusion methods. The proposed method can potentially benefit the use of color<br />
iris images in conjunction with their NIR counterparts.<br />
09:00-11:10, Paper TuAT9.49<br />
Iris Image Retrieval based on Macro-Features<br />
Sam Sunder, Manisha, West Virginia Univ.<br />
Ross, Arun, West Virginia Univ.<br />
Most iris recognition systems use the global and local texture information of the iris in order to recognize individuals. In<br />
this work, we investigate the use of macro-features that are visible on the anterior surface of the iris in RGB images for<br />
matching and retrieval. These macro-features correspond to structures such as moles, freckles, nevi, melanoma, etc. and<br />
may not be present in all iris images. Given an image of a macro-feature, the goal is to determine if it can be used to successfully<br />
retrieve the associated iris from the database. To address this problem, we use features extracted by the Scale-<br />
Invariant Feature Transform (SIFT) to represent and match macro-features. Experiments using a subset of 770 distinct<br />
irides from the Miles Research Iris Database suggest the possibility of using macro-features for iris characterization and<br />
retrieval.<br />
09:00-11:10, Paper TuAT9.50<br />
A Gradient Descent Approach for Multi-Modal Biometric Identification<br />
Basak, Jayanta, IBM Res.<br />
Kate, Kiran, IBM Res. – India<br />
Tyagi, Vivek, IBM Res. - India<br />
Ratha, Nalini, IBM Res.<br />
While biometrics-based identification is a key technology in many critical applications, such as searching for an identity<br />
in a watch list or checking for duplicates in a citizen ID card system, there are many technical challenges in building a<br />
solution, because the database can be very large (often hundreds of millions of entries) and the underlying biometric<br />
engines have intrinsic errors. Multi-modal biometrics is often proposed as a way to improve the underlying biometric<br />
accuracy. In this paper, we propose a score-based fusion scheme tailored for identification applications. The proposed<br />
algorithm uses a gradient descent method to learn weights for each modality such that the weighted sum of genuine scores<br />
is larger than the weighted sum of all the impostor scores. During the identification phase, the top K candidates from each<br />
modality are retrieved and a super-set of identities is constructed. Using the learnt weights, we compute the weighted score<br />
for all the candidates in the super-set. The highest-scoring candidate is declared the top candidate for identification. The<br />
proposed algorithm has been tested on the NIST BSSR-1 dataset, and results in terms of both accuracy and speed (execution<br />
time) are shown to be far superior to the published results on this dataset.<br />
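The weight-learning step can be illustrated with a hinge-style gradient descent that adjusts each modality weight until the weighted genuine score exceeds every weighted impostor score. This is a minimal sketch of the idea, not the authors' algorithm; the margin, learning rate, and non-negativity clipping are all assumptions.

```python
import numpy as np

def learn_fusion_weights(genuine, impostors, lr=0.1, epochs=200, margin=1.0):
    """genuine: (M,) per-modality genuine scores; impostors: (N, M) impostor scores.
    Gradient descent on a hinge loss enforcing genuine.w >= impostor.w + margin."""
    w = np.full(genuine.shape, 1.0 / len(genuine))
    for _ in range(epochs):
        gaps = (genuine - impostors) @ w            # (N,) margins vs each impostor
        viol = gaps < margin                        # hinge-active impostors
        if not viol.any():
            break
        grad = -(genuine - impostors[viol]).sum(axis=0)
        w = np.maximum(w - lr * grad, 0.0)          # keep weights non-negative
    return w

# Modality 0 separates genuine from impostors; modality 1 does not.
genuine = np.array([0.9, 0.2])
impostors = np.array([[0.5, 0.8],
                      [0.4, 0.9]])
w = learn_fusion_weights(genuine, impostors)
```

With equal weights the genuine weighted score (0.55) loses to both impostors (0.65 each); after training, the discriminative modality dominates and the genuine score ranks first.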
09:00-11:10, Paper TuAT9.51<br />
Robust ECG Biometrics by Fusing Temporal and Cepstral Information<br />
Li, Ming, Univ. of Southern California<br />
Narayanan, Shrikanth, Univ. of Southern California<br />
The use of vital signs as a biometric is a potentially viable approach in a variety of application scenarios such as security<br />
and personalized health care. In this paper, a novel robust Electrocardiogram (ECG) biometric algorithm based on both<br />
temporal and cepstral information is proposed. First, in the time domain, after pre-processing and normalization, each<br />
heartbeat of the ECG signal is modeled by Hermite polynomial expansion (HPE) and support vector machine (SVM).<br />
Second, in the homomorphic domain, cepstral features are extracted from the ECG signals and modeled by Gaussian mixture<br />
modeling (GMM). In the GMM framework, heteroscedastic linear discriminant analysis and a GMM supervector kernel<br />
are used to perform feature dimension reduction and discriminative modeling, respectively. Finally, fusion of both temporal<br />
and cepstral system outcomes at the score level is used to improve the overall performance. Experimental results show that<br />
the proposed hybrid approach achieves 98.3% accuracy and 0.5% equal error rate on the MIT-BIH Normal Sinus Rhythm<br />
Database.<br />
09:00-11:10, Paper TuAT9.52<br />
A Comparative Study of Facial Landmark Localization Methods for Face Recognition using HOG Descriptors<br />
Monzo, David, Univ. Pol. Valencia<br />
Albiol, Alberto, Univ. Pol. Valencia<br />
Albiol, Antonio, Univ. Pol. Valencia<br />
Mossi, Jose M., Univ. Pol. Valencia<br />
This paper compares several approaches to extracting facial landmarks and studies their influence on face recognition.<br />
In order to obtain fair comparisons, we use the same number of facial landmarks and the same type of descriptors<br />
(HOG descriptors) for each approach. The comparative results are obtained using the FERET and FRGC datasets and show<br />
that better recognition rates are obtained when landmarks are located at real facial fiducial points. However, if the automatic<br />
detection of these points is compromised by the difficulty of the images, better results are obtained using fixed landmark grids.<br />
09:00-11:10, Paper TuAT9.53<br />
Confidence Weighted Subspace Projection Techniques for Robust Face Recognition in the Presence of Partial Occlusions<br />
Struc, Vitomir, Univ. of Ljubljana<br />
Dobrišek, Simon, Univ. of Ljubljana<br />
Pavesic, Nikola, Univ. of Ljubljana<br />
Subspace projection techniques are known to be susceptible to the presence of partial occlusions in the image data. To<br />
overcome this susceptibility, we present in this paper a confidence weighting scheme that assigns weights to pixels according<br />
to a measure, which quantifies the confidence that the pixel in question represents an outlier. With this procedure<br />
the impact of the occluded pixels on the subspace representation is reduced and robustness to partial occlusions is obtained.<br />
Next, the confidence weighting concept is improved by a local procedure for the estimation of the subspace representation.<br />
Both the global weighting approach and the local estimation procedure are assessed in face recognition experiments on<br />
the AR database, where encouraging results are obtained with partially occluded facial images.<br />
09:00-11:10, Paper TuAT9.54<br />
Face Recognition across Pose with Automatic Estimation of Pose Parameters through AAM-Based Landmarking<br />
Teijeiro-Mosquera, Lucía, Univ. de Vigo<br />
Alba Castro, Jose Luis, Univ. of Vigo<br />
Gonzalez-Jimenez, Daniel, Univ. of Vigo<br />
In this paper we present a fully automatic system for face recognition across pose, where no frontal view is needed for<br />
enrollment or testing. The system uses three Active Appearance Models (AAMs): the first is a generic multi-resolution AAM,<br />
while the remaining two are trained to cope with left/right variations (i.e., pose-dependent AAMs). During the fitting<br />
stage, pose is automatically estimated using eigenvector analysis, and a synthetic face is generated through texture warping.<br />
Experiments on the CMU PIE database show promising results compared to the performance achieved with manually<br />
landmarked faces.<br />
09:00-11:10, Paper TuAT9.55<br />
Cross-Spectral Face Verification in the Short Wave Infrared (SWIR) Band<br />
Bourlai, Thirimachos, WVU<br />
Kalka, Nathan, WVU<br />
Ross, Arun, West Virginia Univ.<br />
Cukic, Bojan, WVU<br />
Hornak, Lawrence, WVU<br />
The problem of face verification across the short wave infrared (SWIR) spectrum is studied in order to illustrate the<br />
advantages and limitations of SWIR face verification. The contributions of this work are twofold. First, a database of 50<br />
subjects is assembled and used to illustrate the challenges associated with the problem. Second, a set of experiments is<br />
performed in order to demonstrate the possibility of SWIR cross-spectral matching. Experiments also show that images<br />
captured under different SWIR wavelengths can be matched to visible images with promising results. The role of multispectral<br />
fusion in improving recognition performance in SWIR images is finally illustrated. To the best of our knowledge,<br />
this is the first time cross-spectral SWIR face recognition is being investigated in the open literature.<br />
09:00-11:10, Paper TuAT9.56<br />
Decision Fusion for Patch-Based Face Recognition<br />
Topçu, Berkay, Sabancı Univ.<br />
Erdogan, Hakan, Sabanci Univ.<br />
Patch-based face recognition is a recent method which uses the idea of analyzing face images locally in order to reduce<br />
the effects of illumination changes and partial occlusions. Feature fusion and decision fusion are two distinct ways to<br />
make use of the extracted local features. Apart from the well-known decision fusion methods, a novel approach for<br />
calculating weights for the weighted sum rule is proposed in this paper. Improvements in recognition accuracy are shown,<br />
and the superiority of decision fusion over feature fusion is advocated. On the challenging AR database, we obtain significantly<br />
better results than conventional methods and feature fusion methods by using a validation-accuracy weighting<br />
scheme and nearest-neighbor discriminant analysis for dimension reduction.<br />
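A weighted-sum decision fusion of the kind described above can be sketched as follows, with each patch classifier's weight set proportional to its validation accuracy. The numbers are illustrative, and this is only a reading of the general scheme, not the paper's exact formulation.

```python
import numpy as np

def weighted_sum_fusion(patch_scores, val_acc):
    """patch_scores: (P, C) class scores from P patch classifiers over C classes.
    val_acc: (P,) validation accuracies used as fusion weights."""
    w = np.asarray(val_acc, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to one
    fused = w @ patch_scores              # weighted sum rule
    return fused, int(fused.argmax())

# Two weak patches vote for class 0, but the reliable third patch favors class 1.
scores = np.array([[0.7, 0.3],
                   [0.7, 0.3],
                   [0.2, 0.8]])
acc = [0.5, 0.5, 0.99]
fused, label = weighted_sum_fusion(scores, acc)
```

With equal weights the two weaker patches would outvote the reliable one (class 0); accuracy weighting flips the decision to class 1, which is the point of weighting patches by how trustworthy they proved on validation data.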
09:00-11:10, Paper TuAT9.57<br />
Video based Palmprint Recognition<br />
Methani, Chhaya, IIIT-H<br />
Namboodiri, Anoop, International Inst. of Information Tech.<br />
The use of a camera as a biometric sensor is desirable due to its ubiquity and low cost, especially for mobile devices.<br />
The palmprint is an effective modality in such cases due to its discriminative power, ease of presentation, and the scale and<br />
size of its texture for capture by commodity cameras. However, the unconstrained nature of pose and lighting introduces<br />
several challenges in the recognition process. Even minor changes in the pose of the palm can induce significant changes<br />
in the visibility of the lines. We turn this property to our advantage by capturing a short video, in which the natural palm<br />
motion induces minor pose variations, providing additional texture information. We propose an efficient method to register<br />
multiple frames of the video without requiring correspondences. Experimental results on a set of 100 different palms<br />
show that the use of multiple frames reduces the error rate from 12.75% to 4.7%. We also propose a method for detecting<br />
poor-quality samples due to specularities and motion blur, which further reduces the EER to 1.8%.<br />
09:00-11:10, Paper TuAT9.58<br />
Profile Lip Reading for Vowel and Word Recognition<br />
Saitoh, Takeshi, Kyushu Inst. of Tech.<br />
Konishi, Ryosuke, Tottori Univ.<br />
This paper focuses on the profile view, which is the second most typical angle after the frontal face, and proposes a<br />
profile-view lip reading method. We apply the normalized cost method to detect the profile contour. Five feature points<br />
(the tip of the nose, upper lip, lip corner, lower lip, and chin) were detected from the contour, and eight features obtained<br />
from the five feature points were defined. We gathered two types of utterance scenes: five Japanese vowels and 20 Japanese<br />
words. We selected 20 combinations based on the eight features and carried out recognition experiments. Recognition rates<br />
of 99% for vowel recognition and 86% for word recognition were obtained with five features: two lip heights, two protrusion<br />
lengths, and one lip angle.<br />
11:10-12:10, TuPL1 Anadolu Auditorium<br />
Computational Cameras: Redefining the Image<br />
Shree Nayar Plenary Session<br />
Columbia University, USA<br />
Shree K. Nayar received his PhD degree in Electrical and Computer Engineering from the Robotics Institute at Carnegie<br />
Mellon University in 1990. He is currently the T. C. Chang Professor of Computer Science at Columbia University. He<br />
co-directs the Columbia Vision and Graphics Center. He also heads the Columbia Computer Vision Laboratory (CAVE),<br />
which is dedicated to the development of advanced computer vision systems. His research is focused on three areas: the<br />
creation of novel cameras, the design of physics based models for vision, and the development of algorithms for scene<br />
understanding. His work is motivated by applications in the fields of digital imaging, computer graphics, and robotics.<br />
He has received best paper awards at ICCV 1990, ICPR 1994, CVPR 1994, ICCV 1995, CVPR 2000 and CVPR 2004.<br />
He is the recipient of the David Marr Prize (1990 and 1995), the David and Lucile Packard Fellowship (1992), the National<br />
Young Investigator Award (1993), the NTT Distinguished Scientific Achievement Award (1994), the Keck Foundation<br />
Award for Excellence in Teaching (1995) and the Columbia Great Teacher Award (2006). In February 2008, he was elected<br />
to the National Academy of Engineering.<br />
The computational camera embodies the convergence of the camera and the computer. It uses new optics to select rays<br />
from the scene in unusual ways, and an appropriate algorithm to process the selected rays. This ability to manipulate<br />
images before they are recorded and process the recorded images before they are presented is a powerful one. It enables<br />
us to experience our visual world in rich and compelling ways.<br />
TuBT1 Anadolu Auditorium<br />
Image Analysis – IV Regular Session<br />
Session chair: Hlavac, Vaclav (Czech Technical Univ.)<br />
13:30-13:50, Paper TuBT1.1<br />
Joint Image GMM and Shading MAP Estimation<br />
Shekhovtsov, Alexander, Czech Tech. Univ. in Prague<br />
Hlavac, Vaclav, Czech Tech. Univ.<br />
We consider a simple statistical model of the image, in which the image is represented as a sum of two parts: one part is<br />
explained by an i.i.d. color Gaussian mixture and the other part by a (piecewise) smooth gray scale shading function. The<br />
smoothness is ensured by a quadratic (Tikhonov) or total variation regularization. We derive an EM algorithm to estimate<br />
simultaneously the parameters of the mixture model and the shading. Our algorithms for both kinds of regularization<br />
solve for the shading and the mean parameters of the mixture model jointly.<br />
13:50-14:10, Paper TuBT1.2<br />
Continuous Markov Random Field Optimization using Fusion Move Driven Markov Chain Monte Carlo Technique<br />
Kim, Wonsik, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
Many vision applications have been formulated as Markov Random Field (MRF) problems. Although many of them are<br />
discrete labeling problems, a continuous formulation often achieves great improvements in solution quality in some<br />
applications, such as stereo matching and optical flow. In a continuous formulation, however, it is much more difficult<br />
to optimize the target functions. In this paper, we propose a new method called the fusion move driven Markov Chain Monte<br />
Carlo method (MCMC-F) that combines the Markov Chain Monte Carlo method and the fusion move to solve continuous<br />
MRF problems effectively. The algorithm exploits the powerful fusion move while fully exploring the whole solution space.<br />
We evaluate it on the stereo matching problem. We empirically demonstrate that the proposed algorithm is more stable<br />
and always finds lower energy states than state-of-the-art optimization techniques.<br />
14:10-14:30, Paper TuBT1.3<br />
Approximate Belief Propagation by Hierarchical Averaging of Outgoing Messages<br />
Ogawara, Koichi, Kyushu Univ.<br />
This paper presents an approximate belief propagation algorithm that replaces the outgoing messages from a node with<br />
their average and propagates messages from a low-resolution graph to the original graph hierarchically. The<br />
proposed method reduces the computational time by half to two-thirds and reduces the required amount of memory by<br />
60% compared with the standard belief propagation algorithm when applied to an image. The proposed method was<br />
implemented on CPU and GPU, and was evaluated on the Middlebury stereo benchmark dataset in comparison with the<br />
standard belief propagation algorithm. It is shown that the proposed method outperforms the standard algorithm in terms<br />
of both computational time and the required amount of memory, with a minor loss of accuracy.<br />
14:30-14:50, Paper TuBT1.4<br />
Cascaded Background Subtraction using Block-Based and Pixel-Based Codebooks<br />
Guo, Jing-Ming, National Taiwan Univ. of Science and Tech.<br />
Hsu, Chih-Sheng, National Taiwan Univ. of Science and Tech.<br />
This paper presents a cascaded scheme with block-based and pixel-based code<strong>book</strong>s for background subtraction. The<br />
code<strong>book</strong> is mainly used to compress information to achieve high efficient processing speed. In the block-based stage, 12<br />
intensity values are employed to represent a block. The algorithm extends the concept of the Block Truncation Coding<br />
(BTC), and thus it can further improve the processing efficiency by enjoying its low complexity advantage. In detail, the<br />
block-based stage can remove most of the noise without reducing the True Positive (TP) rate, yet it has low precision. To<br />
overcome this problem, the pixel-based stage is adopted to enhance the precision, which also can reduce the False Positive<br />
(FP) rate. Moreover, this study also presents a color model and a match function which can classify an input pixel as<br />
shadow, highlight, background, or foreground. As documented in the experimental results, the proposed algorithm can<br />
provide performance superior to that of previous approaches.<br />
14:50-15:10, Paper TuBT1.5<br />
Moving Cast Shadow Removal based on Local Descriptors<br />
Qin, Rui, Chinese Acad. of Sciences<br />
Liao, Shengcai, Chinese Acad. of Sciences<br />
Lei, Zhen, Chinese Acad. of Sciences<br />
Li, Stan Z., Chinese Acad. of Sciences<br />
Moving cast shadow removal is an important yet difficult problem in video analysis and applications. This paper presents<br />
a novel algorithm for the detection of moving cast shadows, based on a local texture descriptor called the Scale Invariant<br />
Local Ternary Pattern (SILTP). An assumption is made that the texture properties of cast shadows bear similar patterns<br />
to those of the background beneath them. The likelihood of cast shadows is derived using information in both color and<br />
texture. An online learning scheme is employed to update the shadow model adaptively. Finally, the posterior probability<br />
of the cast shadow region is formulated by further incorporating prior contextual constraints using a Markov Random Field<br />
(MRF) model. The optimal solution is found using graph cuts. Experimental results tested on various scenes demonstrate<br />
the robustness of the algorithm.<br />
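The SILTP descriptor itself is compact enough to sketch. Below is a minimal illustration of a scale invariant local ternary pattern code for one 3×3 patch, assuming the common formulation with a tolerance factor `tau`; the function name, neighbour ordering and two-bit encoding are illustrative choices, not taken from the paper:

```python
import numpy as np

def siltp_code(patch, tau=0.05):
    """Scale Invariant Local Ternary Pattern code for a 3x3 patch centre.

    Each of the 8 neighbours is encoded with two bits: '01' if it is
    brighter than (1+tau)*centre, '10' if darker than (1-tau)*centre,
    and '00' otherwise."""
    c = patch[1, 1]
    upper, lower = (1 + tau) * c, (1 - tau) * c
    # 8 neighbours in clockwise order starting from the top-left
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = []
    for i, j in idx:
        v = patch[i, j]
        bits.append("01" if v > upper else "10" if v < lower else "00")
    return "".join(bits)

patch = np.array([[100, 100, 100],
                  [100, 100, 100],
                  [100, 100, 100]], dtype=float)
assert siltp_code(patch) == "00" * 8                # flat patch: all zeros
assert siltp_code(3 * patch) == siltp_code(patch)   # invariant to scaling
```

Because the thresholds scale with the centre intensity, multiplying the whole patch by a constant (as a cast shadow roughly does to the background beneath it) leaves the code unchanged, which is the property the detection algorithm relies on.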
TuBT2 Topkapı Hall A<br />
Feature Extraction – I Regular Session<br />
Session chair: Franke, Katrin (Gjøvik Univ. College)<br />
13:30-13:50, Paper TuBT2.1<br />
Local Rotation Invariant Patch Descriptors for 3D Vector Fields<br />
Fehr, Janis, Univ. Freiburg<br />
In this paper, we present two novel methods for the fast computation of local rotation invariant patch descriptors for 3D<br />
vectorial data. Patch-based algorithms have recently become a very popular approach for a wide range of 2D computer<br />
vision problems. Our local rotation invariant patch descriptors allow an extension of these methods to 3D vector fields.<br />
Our approaches are based on a harmonic representation for local spherical 3D vector field patches, which enables us to<br />
derive fast algorithms for the computation of rotation invariant power spectrum and bispectrum feature descriptors of such<br />
patches.<br />
13:50-14:10, Paper TuBT2.2<br />
Anomaly Detection for Longwave FLIR Imagery using Kernel Wavelet-RX<br />
Mehmood, Asif, US Army Res. Lab.<br />
Nasrabadi, Nasser, US Army Res. Lab.<br />
This paper describes a new kernel wavelet-based anomaly detection technique for long-wave (LW) Forward Looking Infrared<br />
(FLIR) imagery. The proposed approach, called the kernel wavelet-RX algorithm, is essentially an extension of the<br />
wavelet-RX algorithm (combination of wavelet transform and RX anomaly detector) to a high dimensional feature space<br />
(possibly infinite) via a certain nonlinear mapping function of the input data. The wavelet-RX algorithm in this high dimensional<br />
feature space can easily be implemented in terms of kernels that implicitly compute dot products in the feature<br />
space (kernelizing the wavelet-RX algorithm). In our kernel wavelet-RX algorithm, a 2-D wavelet transform is first applied<br />
to decompose the input image into uniform subbands. A number of significant subbands (high energy subbands) are concatenated<br />
together to form a subband-image cube. The kernel RX algorithm is then applied to these subband-image cubes<br />
obtained from wavelet decomposition of the LW database images. Experimental results are presented for the proposed<br />
kernel wavelet-RX, wavelet-RX and the classical CFAR algorithm for detecting anomalies (targets) in a large database of<br />
LW imagery. The ROC plots show that the proposed kernel wavelet-RX algorithm outperforms the wavelet-RX as well<br />
as the classical CFAR detector.<br />
14:10-14:30, Paper TuBT2.3<br />
Detection of Salient Image Points using Principal Subspace Manifold Structure<br />
Paiva, Antonio, Univ. of Utah<br />
Tasdizen, Tolga, Univ. of Utah<br />
This paper presents a method to find salient image points in images with regular patterns based on deviations from the<br />
overall manifold structure. The two main contributions are that: (i) the features used to extract salient points are derived directly<br />
and in an unsupervised manner from image neighborhoods, and (ii) the manifold structure is utilized, thus avoiding the<br />
assumption that data lies in clusters and the need to do density estimation. We illustrate the concept for the detection of<br />
fingerprint minutiae, fabric defects, and interesting regions of seismic data.<br />
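The core idea of scoring deviation from the overall structure of a regular pattern can be sketched with a linear (principal subspace) special case; this is a simplified stand-in, not the paper's method, and all names and the PCA-residual formulation below are illustrative assumptions:

```python
import numpy as np

def subspace_saliency(patches, k=4):
    """Saliency of each patch = reconstruction error after projecting
    onto the k-dimensional principal subspace of all patches.

    Patches lying on the dominant (regular-pattern) structure get low
    scores; deviating patches get high scores."""
    X = patches - patches.mean(axis=0)
    # principal directions via SVD of the centred data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k]                  # k x d basis of the principal subspace
    recon = X @ P.T @ P         # projection back into patch space
    return np.linalg.norm(X - recon, axis=1)

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))
regular = rng.normal(size=(99, 2)) @ basis     # patches on a 2-D subspace
outlier = rng.normal(size=(1, 16)) * 5         # off-subspace "salient" patch
scores = subspace_saliency(np.vstack([regular, outlier]), k=2)
assert scores[-1] == scores.max()              # the outlier is most salient
```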
14:30-14:50, Paper TuBT2.4<br />
Triangle-Constraint for Finding More Good Features<br />
Guo, Xiaojie, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
We present a novel method for finding more good feature pairs between two sets of features. We first select matched features<br />
by the bi-matching method as seed points, then organize these seed points using the Delaunay triangulation algorithm.<br />
Finally, we use the Triangle-Constraint (T-C) to increase both the number of correct matches and the matching score (the ratio<br />
between the number of correct matches and the total number of matches). The experimental evaluation shows that our method is<br />
robust to most geometric and photometric transformations, including rotation, scale change, blur, viewpoint change,<br />
JPEG compression and illumination change, and significantly improves both the number of correct matches and the matching<br />
score.<br />
14:50-15:10, Paper TuBT2.5<br />
Compressing Sparse Feature Vectors using Random Ortho-Projections<br />
Rahtu, Esa, Univ. of Oulu<br />
Salo, Mikko, Univ. of Helsinki<br />
Heikkilä, Janne, Univ. of Oulu<br />
In this paper we investigate the usage of random ortho-projections in the compression of sparse feature vectors. The study<br />
is carried out by evaluating the compressed features in classification tasks instead of concentrating on reconstruction accuracy.<br />
In the random ortho-projection method, the mapping for the compression can be obtained without any further<br />
knowledge of the original features. This makes the approach favorable if training data is costly or impossible to obtain.<br />
The independence from the data also enables one to embed the compression scheme directly into the computation of the<br />
original features. Our study is inspired by the results in compressive sensing, which state that up to a certain compression<br />
ratio and with high probability, such projections result in no loss of information. In comparison to learning-based compression,<br />
namely principal component analysis (PCA), the random projections achieved comparable performance even<br />
at high compression ratios, depending on the sparsity of the original features.<br />
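A minimal sketch of the data-independent compression step, assuming a random projection with orthonormalized rows (the dimensions, QR construction and sparse test vectors are illustrative, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 1024, 256                        # original and compressed dimensions

# Random ortho-projection: orthonormal rows drawn without seeing any data,
# so the mapping can be fixed before the features are ever computed.
Q, _ = np.linalg.qr(rng.normal(size=(d, m)))
R = Q.T                                 # m x d, satisfies R @ R.T == I_m

def sparse_vec(k=10):
    """A k-sparse feature vector in the original d-dimensional space."""
    v = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    v[idx] = rng.normal(size=k)
    return v

x = sparse_vec()
xc = R @ x                              # compressed feature

# For classification, geometry should be roughly preserved: the norm of an
# orthoprojection onto a random m-dim subspace concentrates at sqrt(m/d).
ratio = np.linalg.norm(xc) / np.linalg.norm(x)
assert abs(ratio - np.sqrt(m / d)) < 0.15
```

Note the deliberate design choice the abstract highlights: `R` is built from random numbers alone, so no training data is needed to define the compression.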
TuBT3 Marmara Hall<br />
Object Detection and Recognition – II Regular Session<br />
Session chair: Porikli, Fatih (MERL)<br />
13:30-13:50, Paper TuBT3.1<br />
Learning Discriminative Features based on Distribution<br />
Shen, Jifeng, Southeast Univ.<br />
Yang, Wankou, Southeast Univ.<br />
Sun, Changyin, Southeast Univ.<br />
In this paper, a novel feature named adaptive projection LBP (APLBP) is proposed for face detection. To improve its discriminative<br />
power, the distribution information of the training samples is embedded into the proposed feature. APLBP is generated<br />
by LDA, which adaptively maximizes the margin between positive and negative samples by exploiting the near-Gaussian<br />
distribution of the training samples. Asymmetric Gentle AdaBoost is utilized to train a strong classifier,<br />
and a nested cascade is applied to construct the final detector. Experimental results on the MIT+CMU database<br />
demonstrate that the APLBP feature outperforms several existing features due to its excellent discriminative power with<br />
fewer features.<br />
13:50-14:10, Paper TuBT3.2<br />
Sub-Category Optimization for Multi-View Multi-Pose Object Detection<br />
Das, Dipankar, Saitama Univ.<br />
Kobayashi, Yoshinori, Saitama Univ.<br />
Kuno, Yoshinori, Saitama Univ.<br />
Object category detection with large appearance variation is a fundamental problem in computer vision. The appearance<br />
of object categories can change due to intra-class variability, viewpoint, and illumination. For object categories with large<br />
appearance change, a sub-categorization-based approach is necessary. This paper proposes a sub-category optimization<br />
approach that automatically divides an object category into an appropriate number of sub-categories based on appearance<br />
variation. Instead of using a predefined intra-category sub-categorization based on domain knowledge or validation<br />
datasets, we divide the sample space by unsupervised clustering based on discriminative image features. Then the clustering<br />
performance is verified using a sub-category discriminant analysis. Based on the clustering performance of the unsupervised<br />
approach and the sub-category discriminant analysis results, we determine an optimal number of sub-categories per object category.<br />
Extensive experimental results are shown using two standard and the authors’ own databases. The comparison<br />
results show that our approach outperforms the state-of-the-art methods.<br />
14:10-14:30, Paper TuBT3.3<br />
Learning and Detection of Object Landmarks in Canonical Object Space<br />
Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />
Ilonen, Jarmo, Lappeenranta Univ. of Tech.<br />
This work contributes to part-based object detection and recognition by introducing an enhanced method for local part<br />
detection. The method is based on complex-valued multiresolution Gabor features and their ranking using multiple hypothesis<br />
testing. In the present work, our main contribution is the introduction of a canonical object space, where objects<br />
are represented in their "expected pose and visual appearance". The canonical space circumvents the problem of geometric<br />
image normalisation prior to feature extraction. In addition, we define a compact set of Gabor filter parameters, from<br />
which the optimal values can be easily derived. These enhancements make our method an attractive landmark detector<br />
for part-based object detection and recognition methods.<br />
14:30-14:50, Paper TuBT3.4<br />
Multiple-Shot Person Re-Identification by HPE Signature<br />
Bazzani, Loris, Univ. of Verona<br />
Cristani, Marco, Univ. of Verona<br />
Perina, Alessandro, Univ. of Verona<br />
Farenzena, Michela, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
In this paper, we propose a novel appearance-based method for person re-identification, that condenses a set of frames of<br />
the same individual into a highly informative signature, called Histogram Plus Epitome, HPE. It incorporates complementary<br />
global and local statistical descriptions of the human appearance, focusing on the overall chromatic content, via<br />
histogram representation, and on the presence of recurrent local patches, via epitome estimation. The matching of HPEs<br />
provides strong performance under low resolution, occlusions, and pose and illumination variations, setting new state-of-the-art<br />
results on all the datasets considered.<br />
14:50-15:10, Paper TuBT3.5<br />
Building Detection in a Single Remotely Sensed Image with a Point Process of Rectangles<br />
Benedek, Csaba, Computer and Automation Res. Inst. Hungarian<br />
Descombes, Xavier, INRIA<br />
Zerubia, Josiane, INRIA<br />
In this paper we introduce a probabilistic approach to building extraction in remotely sensed images. To cope with data<br />
heterogeneity we construct a flexible hierarchical framework which can create various building appearance models from<br />
different elementary feature-based modules. A global optimization process attempts to find the optimal configuration of<br />
buildings, considering simultaneously the observed data, prior knowledge, and interactions between the neighboring building<br />
parts. The proposed method is evaluated on various aerial image sets containing more than 500 buildings, and the<br />
results are matched against two state-of-the-art techniques.<br />
TuBT4 Dolmabahçe Hall A<br />
Model Selection and Clustering Regular Session<br />
Session chair: Shapiro, Linda (Univ. of Washington)<br />
13:30-13:50, Paper TuBT4.1<br />
A Relationship between Generalization Error and Training Samples in Kernel Regressors<br />
Tanaka, Akira, Hokkaido Univ.<br />
Imai, Hideyuki, Hokkaido Univ.<br />
Kudo, Mineichi, Hokkaido Univ.<br />
Miyakoshi, Masaaki, Hokkaido Univ.<br />
A relationship between generalization error and training samples in kernel regressors is discussed in this paper. The generalization<br />
error can be decomposed into two components. One is a distance between an unknown true function and an<br />
adopted model space. The other is a distance between an estimated function and the orthogonal projection of the unknown<br />
true function onto the model space. In our previous work, we gave a framework to evaluate the first component. In this<br />
paper, we theoretically analyze the second one and show that a larger set of training samples usually causes a larger generalization<br />
error.<br />
13:50-14:10, Paper TuBT4.2<br />
Localized Multiple Kernel Regression<br />
Gönen, Mehmet, Bogazici Univ.<br />
Alpaydin, Ethem, Bogazici Univ.<br />
Multiple kernel learning (MKL) uses a weighted combination of kernels where the weight of each kernel is optimized<br />
during training. However, MKL assigns the same weight to a kernel over the whole input space. Our main objective is the<br />
formulation of the localized multiple kernel learning (LMKL) framework that allows kernels to be combined with different<br />
weights in different regions of the input space by using a gating model. In this paper, we apply the LMKL framework to<br />
regression estimation and derive a learning algorithm for this extension. Canonical support vector regression may overfit<br />
unless the kernel parameters are selected appropriately; we see that even if we provide more kernels than necessary, LMKL<br />
uses only as many as needed and does not overfit due to its inherent regularization.<br />
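The locally combined kernel can be sketched as follows, assuming the gated form k(x_i, x_j) = Σ_m η_m(x_i) K_m(x_i, x_j) η_m(x_j) with a softmax gating model, as used in the LMKL literature; the RBF kernels, parameter values and names below are illustrative:

```python
import numpy as np

def gating(x, V):
    """Softmax gating: weight of each kernel at input x (assumed form)."""
    z = V @ x
    e = np.exp(z - z.max())
    return e / e.sum()

def lmkl_kernel(x1, x2, V, gammas=(0.1, 10.0)):
    """Locally combined RBF kernels: each kernel's contribution is
    modulated by the gating value at *both* inputs, so different
    regions of the input space can prefer different kernels."""
    g1, g2 = gating(x1, V), gating(x2, V)
    ks = [np.exp(-g * np.sum((x1 - x2) ** 2)) for g in gammas]
    return sum(g1[m] * g2[m] * ks[m] for m in range(len(ks)))

rng = np.random.default_rng(2)
V = rng.normal(size=(2, 3))              # gating parameters, 2 kernels in R^3
x1, x2 = rng.normal(size=3), rng.normal(size=3)
assert abs(lmkl_kernel(x1, x2, V) - lmkl_kernel(x2, x1, V)) < 1e-12  # symmetric
assert 0 < lmkl_kernel(x1, x1, V) <= 1.0  # k(x,x) = sum of eta_m(x)^2 <= 1
```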
14:10-14:30, Paper TuBT4.3<br />
Probabilistic Clustering using the Baum-Eagon Inequality<br />
Rota Bulo’, Samuel, Univ. Ca’ Foscari di Venezia<br />
Pelillo, Marcello, Ca’ Foscari Univ.<br />
The paper introduces a framework for clustering data objects in a similarity-based context. The aim is to cluster objects<br />
into a given number of classes without imposing a hard partition, but allowing for a soft assignment of objects to clusters.<br />
Our approach uses the assumption that similarities reflect the likelihood of objects being in the same class in order to<br />
derive a probabilistic model for estimating the unknown cluster assignments. This leads to a polynomial optimization in the<br />
probability domain, which is tackled by means of a result due to Baum and Eagon. Experiments on both synthetic and real<br />
standard datasets show the effectiveness of our approach.<br />
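The Baum-Eagon result underlying this optimization can be illustrated on the simplest case: a quadratic form P(x) = xᵀAx with nonnegative coefficients over the probability simplex, where the growth transform x_i ← x_i (∂P/∂x_i) / Σ_j x_j (∂P/∂x_j) never decreases P. This toy sketch is not the paper's full model; the matrix and names are illustrative:

```python
import numpy as np

def growth_transform(A, x, iters=50):
    """Baum-Eagon update for P(x) = x^T A x (A nonnegative) over the
    probability simplex. The inequality guarantees P never decreases."""
    vals = []
    for _ in range(iters):
        vals.append(x @ A @ x)
        g = A @ x                 # gradient of P up to a factor of 2
        x = x * g / (x @ g)       # multiplicative update, stays on simplex
    vals.append(x @ A @ x)
    return x, vals

rng = np.random.default_rng(3)
S = rng.random((5, 5))
S = (S + S.T) / 2                 # symmetric nonnegative "similarity" matrix
x0 = np.full(5, 1 / 5)            # uniform soft assignment as a start
x, vals = growth_transform(S, x0)
assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))  # monotone ascent
assert abs(x.sum() - 1) < 1e-9                              # still on simplex
```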
14:30-14:50, Paper TuBT4.4<br />
Ensemble Clustering via Random Walker Consensus Strategy<br />
Abdala, Daniel Duarte, Univ. of Münster<br />
Wattuya, Pakaket, Univ. of Münster<br />
Jiang, Xiaoyi, Univ. of Münster<br />
In this paper we present the adaptation of a random walker algorithm for combination of image segmentations to work<br />
with clustering problems. To achieve this, we pre-process the ensemble of clusterings to generate its graph representation.<br />
We show experimentally that a very small neighborhood produces results similar to those of larger choices.<br />
This fact alone improves the computational time needed to produce the final consensual clustering. We also present an experimental<br />
comparison of our results against other graph-based and well-known combination clustering methods in<br />
order to assess the quality of this approach.<br />
14:50-15:10, Paper TuBT4.5<br />
Bhattacharyya Clustering with Applications to Mixture Simplifications<br />
Nielsen, Frank, Ecole Polytechnique/SONY CLS<br />
Boltz, Sylvain, Ecole Polytechnique/SONY CLS<br />
Schwander, Olivier, Ecole Polytechnique/SONY CLS<br />
The Bhattacharyya distance (BD) is a widely used distance in statistics to compare probability density functions (PDFs). It has<br />
shown strong statistical properties (in terms of Bayes error) and it relates to Fisher information. It also has practical advantages,<br />
since it is closely related to measuring the overlap of the supports of the PDFs. Unfortunately, even with common<br />
parametric models on PDFs, few closed-form formulas are known. Moreover, the BD centroid estimation was limited to<br />
univariate Gaussian PDFs in the literature, and no convergence guarantees were provided. In this paper, we propose a<br />
closed-form formula for BD on a general class of parametric distributions named exponential families. We show that the<br />
BD is a Burbea-Rao divergence for the log normalizer of the exponential family. We propose an efficient iterative scheme<br />
to compute a BD centroid on exponential families. Finally, these results allow us to define a Bhattacharyya hierarchical<br />
clustering algorithm (BHC). It can be viewed as a generalization of k-means on the BD. Results on image segmentation<br />
show the stability of the method.<br />
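For reference, the univariate Gaussian special case mentioned in the abstract has a well-known closed form, D_B = (μ₁−μ₂)²/(4(σ₁²+σ₂²)) + ½ ln((σ₁²+σ₂²)/(2σ₁σ₂)), sketched below; the paper's contribution is generalizing such formulas to all exponential families:

```python
import math

def bhattacharyya_gauss(m1, s1, m2, s2):
    """Closed-form Bhattacharyya distance between the univariate
    Gaussians N(m1, s1^2) and N(m2, s2^2)."""
    v1, v2 = s1 * s1, s2 * s2
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * s1 * s2)))

assert bhattacharyya_gauss(0, 1, 0, 1) == 0.0   # identical PDFs: distance 0
assert bhattacharyya_gauss(0, 1, 3, 2) == bhattacharyya_gauss(3, 2, 0, 1)  # symmetric
```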
TuBT5 Dolmabahçe Hall B<br />
Watermarking and Authentication Regular Session<br />
Session chair: Sankur, Bülent (Boğaziçi Univ.)<br />
13:30-13:50, Paper TuBT5.1<br />
High Capacity Data Hiding for Binary Image Authentication<br />
Guo, Meng, Beijing Univ. of Tech.<br />
Zhang, Hongbin, Beijing Univ. of Tech.<br />
This paper proposes a novel data hiding scheme with high capacity for binary images, including document images, halftone<br />
images, scanned figures, text and signatures. In our scheme, the embedding efficiency and the placement of embedding<br />
changes are considered simultaneously. Given an M×N image block, the upper bound on the number of bits that can be<br />
embedded by the scheme is n·log2(M×N/n + 1) when changing at most n pixels. Experimental results show that the proposed<br />
scheme can embed more data while maintaining better quality, and has wider applications than existing schemes.<br />
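The stated capacity bound is easy to evaluate numerically; a quick sketch, with the block size and number of changeable pixels chosen arbitrarily for illustration:

```python
import math

def capacity_bound(M, N, n):
    """Upper bound on the number of embeddable bits in an MxN block
    when at most n pixels may be changed: n * log2(M*N/n + 1)."""
    return n * math.log2(M * N / n + 1)

# e.g. a 16x16 block with at most 4 changed pixels
bits = capacity_bound(16, 16, 4)
assert 24 < bits < 25            # 4 * log2(65), roughly 24.1 bits
```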
13:50-14:10, Paper TuBT5.2<br />
Secure Self-Recovery Image Authentication using Randomly-Sized Blocks<br />
Hassan, Ammar M., Otto-von-Guericke Univ.<br />
Al-Hamadi, Ayoub, IESK<br />
Michaelis, Bernd, IESK<br />
Hasan, Yassin M. Y., Assiut Univ.<br />
Wahab, Mohamed A. A., Minia Univ.<br />
In this paper, a secure variable-size block-based image authentication technique is proposed that can not only localize<br />
alterations but also recover the missing data. An image undergoes recursive arbitrarily-asymmetric binary tree<br />
partitioning to obtain randomly-sized blocks spanning the entire image. To enhance reliability of altered block recovery,<br />
multiple description coding (MDC) is utilized to generate two block descriptions. Block signature copies and the two<br />
block descriptions are embedded into two relatively-distant blocks making a doubly linked chain. The experimental results<br />
demonstrate that the proposed technique successfully both localizes and compensates for alterations. Furthermore, it is robust<br />
against the vector quantization (VQ) attack.<br />
14:10-14:30, Paper TuBT5.3<br />
Blind Wavelet based Logo Watermarking Resisting to Cropping<br />
Soheili, Mohammadreza, Tarbiat Moallem Univ.<br />
In this paper we propose a blind wavelet-based logo watermarking scheme focused on resistance to cropping. The binary<br />
logo is embedded in the LL2 sub-band of the host image using a quantization technique. To increase the robustness of the proposed<br />
algorithm, two-dimensional parity bits are added to the binary logo. Experimental results show that the proposed watermarking<br />
method can resist not only cropping attacks but also some common signal processing attacks, such as JPEG compression,<br />
average and median filtering, rotation and scaling.<br />
14:30-14:50, Paper TuBT5.4<br />
The New Blockwise Algorithm for Large-Scale Images Robust Watermarking<br />
Mitekin, Vitaly, Russian Acad. of Sciences<br />
Glumov, Nikolay, Russian Acad. of Sciences<br />
A new algorithm for digital watermarking of large-scale digital images is proposed. The algorithm<br />
provides watermark robustness to a wide range of host image distortions and has a number of advantages compared to<br />
existing algorithms for robust watermarking.<br />
14:50-15:10, Paper TuBT5.5<br />
Lossless ROI Medical Image Watermarking Technique with Enhanced Security and High Payload Embedding<br />
Kundu, Malay Kumar, Indian Statistical Inst.<br />
Das, Sudeb, Indian Statistical Inst.<br />
In this article, a new fragile, blind, high payload capacity, ROI (Region of Interest) preserving Medical image watermarking<br />
(MIW) technique in the spatial domain for gray scale medical images is proposed. We present a watermarking scheme<br />
that combines lossless data compression and encryption techniques in application to medical images. The effectiveness of<br />
the proposed scheme, proven through experiments on various medical images using image quality metrics<br />
such as PSNR, MSE and MSSIM, enables us to argue that the method will help to maintain Electronic Patient Record (EPR)/DICOM<br />
data privacy and medical image integrity.<br />
TuBT6 Topkapı Hall B<br />
Face Recognition – I Regular Session<br />
Session chair: Ross, Arun (West Virginia Univ.)<br />
13:30-13:50, Paper TuBT6.1<br />
Efficient Facial Attribute Recognition with a Spatial Codebook<br />
Ijiri, Yoshihisa, OMRON Corp.<br />
Lao, Shihong, OMRON Corp.<br />
Han, Tony X., Univ. of Missouri<br />
Murase, Hiroshi, Nagoya Univ.<br />
There is a large number of possible facial attributes such as hairstyle, with/without glasses, with/without mustache, etc.<br />
Considering the large number of facial attributes and their combinations, it is difficult to build attribute classifiers for all possible<br />
combinations needed in various applications, especially at the designing stage. To tackle this important and challenging<br />
problem, we propose a novel, efficient facial attribute recognition algorithm using a learned spatial codebook.<br />
The Maximum Entropy and Maximum Orthogonality (MEMO) criterion is followed to learn the spatial codebook. With<br />
a spatial codebook constructed at the designing stage, attribute classifiers can be trained on demand with a small number<br />
of exemplars with high accuracy on the testing data. Meanwhile, up to 600 times speedup is achieved in the on-demand<br />
training process, compared to the current state-of-the-art method. The effectiveness of the proposed method is supported by<br />
convincing experimental results.<br />
13:50-14:10, Paper TuBT6.2<br />
Feature Space Hausdorff Distance for Face Recognition<br />
Chen, Shaokang, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
We propose a novel face image similarity measure based on the Hausdorff distance (HD). In contrast to conventional HD-based<br />
measures, which are generally applied in the image space (such as edge maps or gradient images), the proposed<br />
HD-based similarity measure is applied in the feature space. By extending the concept of HD using a variable radius and<br />
reference set, we can generate a neighbourhood set for HD measures in feature space and then apply this concept for classification.<br />
Experiments on the Labeled Faces in the Wild and FRGC datasets show that the proposed measure improves<br />
the overall classification performance quite dramatically, especially under the highly desirable low false acceptance rate<br />
conditions.<br />
14:10-14:30, Paper TuBT6.3<br />
How to Measure Biometric Information?<br />
Sutcu, Yagiz, Pol. Inst. of New York Univ.<br />
Sencar, Husrev Taha, TOBB Univ. of Ec. and Tech.<br />
Memon, Nasir, Pol. Inst. of New York Univ.<br />
Being able to measure the actual information content of biometrics is very important but also a challenging problem. The main<br />
difficulty is related not only to the selected feature representation of the biometric data, but also to the matching<br />
algorithm employed in biometric systems. In this paper, we propose a new measure of biometric information<br />
using relative entropy between intra-user and inter-user distance distributions. As an example, we evaluated the proposed<br />
measure on a face image dataset.<br />
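A minimal sketch of such a relative-entropy measure, assuming histogram estimates of the intra-user (genuine) and inter-user (impostor) distance distributions; the binning, smoothing constant and synthetic score data are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def relative_entropy(intra, inter, bins=20):
    """KL divergence D(intra || inter) between histograms of genuine
    (intra-user) and impostor (inter-user) match distances; larger
    values suggest more discriminative (informative) features."""
    lo = min(intra.min(), inter.min())
    hi = max(intra.max(), inter.max())
    p, _ = np.histogram(intra, bins=bins, range=(lo, hi))
    q, _ = np.histogram(inter, bins=bins, range=(lo, hi))
    p = (p + 1e-9) / (p + 1e-9).sum()   # smooth to avoid log(0)
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log2(p / q)))

rng = np.random.default_rng(4)
intra = rng.normal(0.2, 0.05, 1000)     # genuine distances: small
inter = rng.normal(0.8, 0.10, 1000)     # impostor distances: large
weak  = rng.normal(0.75, 0.10, 1000)    # weak feature: overlaps impostors
# Well-separated distributions carry more biometric information
assert relative_entropy(intra, inter) > relative_entropy(weak, inter)
```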
14:30-14:50, Paper TuBT6.4<br />
Intensity-Based Congealing for Unsupervised Joint Image Alignment<br />
Storer, Markus, Graz Univ. of Tech.<br />
Urschler, Martin, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
We present an approach for unsupervised alignment of an ensemble of images called congealing. Our algorithm is based<br />
on image registration using the mutual information measure as a cost function. The cost function is optimized by a standard<br />
gradient descent method in a multiresolution scheme. As opposed to other congealing methods, which use the SSD measure,<br />
the mutual information measure is better suited as a similarity measure for registering images since no prior assumptions<br />
on the relation of intensities between images are required. We present alignment results on the MNIST handwritten digit<br />
database and on facial images obtained from the CVL database.<br />
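The mutual information cost at the heart of the method can be sketched from a joint intensity histogram; the bin count and synthetic images below are illustrative assumptions:

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """MI between two images from their joint intensity histogram.
    Used as a registration cost because it needs no assumption about
    how intensities in one image map to intensities in the other."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = h / h.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(5)
img = rng.integers(0, 256, size=(32, 32)).astype(float)
inverted = 255 - img                      # deterministic, non-identity mapping
noise = rng.integers(0, 256, size=(32, 32)).astype(float)
# MI stays high under any consistent intensity mapping (unlike SSD),
# and drops for unrelated images
assert mutual_information(img, inverted) > mutual_information(img, noise)
```

This is why the congealing cost needs no prior assumption on the intensity relation between images: an intensity inversion would ruin an SSD score but leaves the MI essentially unchanged.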
14:50-15:10, Paper TuBT6.5<br />
An Illumination Quality Measure for Face Recognition<br />
Rizo-Rodriguez, Dayron, Advanced Tech. Application Center<br />
Mendez-Vazquez, Heydi, Advanced Tech. Application Center<br />
Garcia, Edel, Advanced Tech. Application Center<br />
A method to determine whether or not face images are affected by lighting problems is proposed. The method is the result<br />
of combining the analysis of lighting effect on face regions with the analysis of special areas which have a weight on the<br />
decision. Good results were obtained in classifying well- and badly-illuminated images. The proposed method was inserted<br />
into a face recognition framework in order to apply the preprocessing step only to those images affected by illumination<br />
variations. The good performance achieved in verification and identification experiments confirms that it is better to apply<br />
the proposed methodology than to preprocess all images when the lighting conditions are variable.<br />
TuBT7 Dolmabahçe Hall C<br />
Biomedical Image Segmentation Regular Session<br />
Session chair: Kato, Zoltan (Univ. of Szeged)<br />
13:30-13:50, Paper TuBT7.1<br />
Cascaded Segmentation of Grained Cell Tissue with Active Contour Models<br />
Moeller, Birgit, Martin-Luther-Univ. Halle-Wittenberg<br />
Stöhr, Nadine, ZAMED, Martin Luther Univ. Halle-Wittenberg<br />
Hüttelmaier, Stefan, ZAMED, Martin Luther Univ. Halle-Wittenberg<br />
Posch, Stefan, Martin-Luther-Univ. Halle-Wittenberg<br />
Cell tissue in microscope images is often grained, and its intensities do not agree well with the Gaussian distribution assumptions<br />
widely used in many segmentation approaches. We present a new cascaded segmentation scheme for inhomogeneous<br />
cell tissue based on active contour models. Cell regions are iteratively expanded from initial nuclei regions applying a<br />
data-dependent number of optimization levels. Experimental results on a set of microscope images from a human hepatoma<br />
cell line demonstrate the high quality of the results with regard to the cell segmentation task and biomedical investigations.<br />
13:50-14:10, Paper TuBT7.2<br />
Live Cell Segmentation in Fluorescence Microscopy via Graph Cut<br />
Lesko, Milan, Univ. of Szeged<br />
Kato, Zoltan, Univ. of Szeged<br />
Nagy, Antal, Univ. of Szeged<br />
Gombos, Imre, Hungarian Acad. of Sciences<br />
Torok, Zsolt, Hungarian Acad. of Sciences<br />
Vigh Jr, Laszlo, Univ. of Szeged<br />
Vigh, Laszlo, Hungarian Acad. of Sciences<br />
We propose a novel Markovian segmentation model which takes edge information into account. By construction, the<br />
model uses only pairwise interactions and its energy is submodular. Thus the exact energy minimum is obtained via a max-flow/min-cut<br />
algorithm. The method has been quantitatively evaluated on synthetic images as well as on fluorescence microscopy<br />
images of live cells.<br />
14:10-14:30, Paper TuBT7.3<br />
Retinal Blood Vessels Segmentation using the Radial Projection and Supervised Classification<br />
Peng, Qinmu, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Zhou, Long, Wuhan Pol. Univ.<br />
Cheung, Yiu-Ming, Hong Kong Baptist Univ.<br />
The low-contrast and narrow blood vessels in retinal images are difficult to extract but useful in revealing certain<br />
systemic diseases. Motivated by the goal of improving the detection of such vessels, we propose a radial projection method<br />
to locate the vessel centerlines. Then supervised classification is used to extract the major structures of the vessels.<br />
The final segmentation is obtained as the union of the two types of vessels after removal schemes. Our approach is tested<br />
on the STARE database; the results demonstrate that our algorithm yields better segmentation.<br />
14:30-14:50, Paper TuBT7.4<br />
Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During Speech<br />
Fasel, Ian, Univ. of Arizona<br />
Berry, Jeff, Univ. of Arizona<br />
Ultrasound has become a useful tool for speech scientists studying mechanisms of language sound production. State-of-the-art<br />
methods for extracting tongue contours from ultrasound images of the mouth, typically based on active contour<br />
snakes, require considerable manual interaction by an expert linguist. In this paper we describe a novel method for fully<br />
automatic extraction of tongue contours based on a hierarchy of restricted Boltzmann machines (RBMs), i.e. deep belief<br />
networks (DBNs). Usually, DBNs are first trained generatively on sensor data, then discriminatively to predict human-provided<br />
labels of the data. In this paper we introduce the translational RBM (tRBM), which allows the DBN to make use<br />
of both human labels and raw sensor data at all stages of learning. This method yields performance in contour extraction<br />
comparable to human labelers, without any temporal smoothing or human intervention, and runs in real-time.<br />
14:50-15:10, Paper TuBT7.5<br />
Automated Gland Segmentation and Classification for Gleason Grading of Prostate Tissue Images<br />
Nguyen, Kien, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Allen, Ronald, BioImagene<br />
The well-known Gleason grading method for an H&E prostatic carcinoma tissue image uses morphological features of histology<br />
patterns within a tissue slide to classify it into 5 grades. We have developed an automated gland segmentation and<br />
classification method that will be used for automated Gleason grading of a prostatic carcinoma tissue image. We demonstrate<br />
the performance of the proposed classification system for a three-class classification problem (benign, grade 3 carcinoma<br />
and grade 4 carcinoma) on a dataset containing 78 tissue images and achieve a classification accuracy of 88.84%. In comparison<br />
to other segmentation-based methods, our approach combines the similarity of morphological patterns associated<br />
with a grade with domain knowledge, such as the appearance of nuclei and blue mucin, for the grading task.<br />
TuCT1 Topkapı Hall B<br />
Face Recognition – II Regular Session<br />
Session chair: Tistarelli, Massimo (Univ. of Sassari)<br />
15:40-16:00, Paper TuCT1.1<br />
Multi-Resolution Local Appearance-Based Face Verification<br />
Gao, Hua, Karlsruhe Inst. of Tech.<br />
Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />
Fischer, Mika, Karlsruhe Inst. of Tech.<br />
Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />
Facial analysis based on local regions/blocks usually outperforms holistic approaches because it is less sensitive to local<br />
deformations and occlusions. Moreover, modeling local features enables us to avoid the problem of high dimensionality<br />
of the feature space. In this paper, we model the local face blocks with Gabor features and project them into a discriminant<br />
identity space. The similarity score of a face pair is determined by fusion of the local classifiers. To acquire complementary<br />
information in different scales of face images, we integrate the local decisions from various image resolutions. The proposed<br />
multi-resolution block-based face verification system is evaluated on Experiment 4 of the Face Recognition Grand Challenge<br />
(FRGC) version 2.0. We obtained a 92.5% verification rate at 0.1% FAR, which is the highest performance reported<br />
on this experiment so far in the literature.<br />
16:00-16:20, Paper TuCT1.2<br />
Partial Face Biometry using Shape Decomposition on 2D Conformal Maps of Faces<br />
Szeptycki, Przemyslaw, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Zeng, Wei, Wayne State Univ.<br />
Gu, Xianfeng, State Univ. of New York at Stony Brook<br />
Samaras, Dimitris, Stony Brook Univ.<br />
In this paper, we introduce a new approach for partial 3D face recognition, which makes use of shape decomposition over<br />
the rigid part of a face. To explore the descriptiveness of shape dissimilarity over an isometric part of a face, which has<br />
a lower probability of being influenced by expression, we transform a 3D shape to a 2D domain using conformal mapping and<br />
use shape decomposition as a similarity measurement. In our work we investigate several classifiers as well as several<br />
shape descriptors for recognition purposes. Recognition tests on a subset of the FRGC data set show approximately 80%<br />
rank-one recognition rate using only the eyes and nose part of the face.<br />
16:20-16:40, Paper TuCT1.3<br />
Gender Classification using Interlaced Derivative Patterns<br />
Shobeirinejad, Ameneh, Griffith Univ.<br />
Gao, Yongsheng, Griffith Univ.<br />
Automated gender recognition has become an interesting and challenging research problem in recent years, with potential<br />
applications in the security industry and human-computer interaction systems. In this paper we present a novel feature representation,<br />
namely Interlaced Derivative Patterns (IDP), which is a derivative-based technique to extract discriminative<br />
facial features for gender classification. The proposed technique operates on a neighborhood around a pixel and concatenates<br />
the extracted regional feature distributions to form a feature vector. The experimental results demonstrate the effectiveness<br />
of the IDP method for gender classification, showing that the proposed approach achieves 29.6% relative error<br />
reduction compared to Local Binary Patterns (LBP), while it performs over four times faster than Local Derivative Patterns<br />
(LDP).<br />
16:40-17:00, Paper TuCT1.4<br />
Heterogeneous Face Recognition: Matching NIR to Visible Light Images<br />
Klare, Brendan, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Matching near-infrared (NIR) face images to visible light (VIS) face images offers a robust approach to face recognition<br />
with unconstrained illumination. In this paper we propose a novel method of heterogeneous face recognition that uses a<br />
common feature-based representation for both NIR and VIS images. Linear discriminant analysis is performed<br />
on a collection of random subspaces to learn discriminative projections. NIR and VIS images are matched (i) directly<br />
using the random subspace projections, and (ii) using sparse representation classification. Experimental results demonstrate<br />
the effectiveness of the proposed approach for matching NIR and VIS face images.<br />
17:00-17:20, Paper TuCT1.5<br />
Clustering Face Carvings: Application to Devatas of Angkor Wat<br />
Klare, Brendan, Michigan State Univ.<br />
Mallapragada, Pavan Kumar, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Davis, Kent, DatAsia Inc.<br />
We propose a framework for clustering and visualization of images of face carvings at archaeological sites. The pairwise<br />
similarities among face carvings are computed by performing Procrustes analysis on local facial features (eyes, nose,<br />
mouth, etc.). The distance between corresponding face features is computed using point distribution models; the final pairwise<br />
similarity is the weighted sum of feature similarities. A web-based interface is provided to allow domain experts to<br />
interactively assign different weights to each face feature, and display hierarchical clustering results in 2D or 3D projections<br />
obtained by multidimensional scaling. The proposed framework has been successfully applied to the devata goddesses<br />
depicted in the ancient Angkor Wat temple. The resulting clusterings and visualization will enable a systematic anthropological,<br />
ethnological and artistic analysis of nearly 1,800 stone portraits of devatas of Angkor Wat.<br />
TuCT2 Topkapı Hall A<br />
Feature Extraction – II Regular Session<br />
Session chair: Covell, Michele (Google, Inc.)<br />
15:40-16:00, Paper TuCT2.1<br />
Action Recognition using Spatial-Temporal Context<br />
Hu, Qiong, Chinese Acad. of Sciences<br />
Qin, Lei, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
The spatial-temporal local features and the bag of words representation have been widely used in the action recognition<br />
field. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in<br />
ambiguity in the action recognition task, especially for videos in the wild. In this paper, we solve this problem by utilizing the<br />
volumetric context around a video-word. Here, a local histogram of the video-word distribution is calculated, which is referred<br />
to as the context and further clustered into contextual words. To effectively use the contextual information, the descriptive<br />
video-phrases (ST-DVPs) and the descriptive video-cliques (ST-DVCs) are proposed. A general framework for ST-DVP<br />
and ST-DVC generation is described, and then action recognition can be done based on all these representations and their<br />
combinations. The proposed method is evaluated on two challenging human action datasets: the KTH dataset and the<br />
YouTube dataset. Experiment results confirm the validity of our approach.<br />
16:00-16:20, Paper TuCT2.2<br />
Feature Extraction for Simple Classification<br />
Stuhlsatz, André, Univ. of Applied Sciences Duesseldorf<br />
Lippel, Jens, Univ. of Applied Sciences Duesseldorf<br />
Zielke, Thomas, Univ. of Applied Sciences Duesseldorf<br />
Constructing a recognition system based on raw measurements for different objects usually requires expert knowledge of<br />
domain specific data preprocessing, feature extraction, and classifier design. We seek to simplify this process in a way<br />
that can be applied without any knowledge about the data domain and the specific properties of different classification algorithms.<br />
That is, a recognition system should be simple to construct and simple to operate in practical applications. For<br />
this, we have developed a nonlinear feature extractor for high-dimensional complex patterns, using Deep Neural Networks<br />
(DNN). Trained partly supervised and partly unsupervised, the DNN effectively implements a nonlinear discriminant analysis<br />
based on a Fisher criterion in a feature space of very low dimensions. Our experiments show that the automatically extracted<br />
features work very well with simple linear discriminants, while the recognition rates improve only minimally if more sophisticated<br />
classification algorithms like Support Vector Machines (SVM) are used instead.<br />
16:20-16:40, Paper TuCT2.3<br />
Towards a Generic Feature-Selection Measure for Intrusion Detection<br />
Nguyen, Hai Thanh, Gjøvik Univ. Coll.<br />
Franke, Katrin, Gjøvik Univ. Coll.<br />
Petrovic, Slobodan, Gjøvik Univ. Coll.<br />
Performance of a pattern recognition system depends strongly on the employed feature-selection method. We perform an<br />
in-depth analysis of two main measures used in the filter model: the correlation-feature-selection (CFS) measure and the<br />
minimal-redundancy-maximal-relevance (mRMR) measure. We show that these measures can be fused and generalized<br />
into a generic feature-selection (GeFS) measure. Further on, we propose a new feature-selection method that ensures globally<br />
optimal feature sets. The new approach is based on solving a mixed 0-1 linear programming problem (M01LP) by<br />
using the branch-and-bound algorithm. In this M01LP problem, the number of constraints and variables is linear, O(n),<br />
in the number n of features in the full set. In order to evaluate the quality of our GeFS measure, we chose the design of an intrusion<br />
detection system (IDS) as a possible application. Experimental results obtained over the KDD Cup’99 test data set<br />
for IDS show that the GeFS measure removes 93% of irrelevant and redundant features from the original data set, while<br />
maintaining or even improving the classification accuracy.<br />
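The CFS measure analyzed in this abstract has a well-known closed form (Hall's merit heuristic). The sketch below is a generic illustration under that standard definition, not the authors' GeFS formulation; the function and variable names are our own.<br />

```python
import math

def cfs_merit(k, avg_feat_class_corr, avg_feat_feat_corr):
    """Hall's CFS merit for a subset of k features: rewards average
    feature-class correlation, penalizes average feature-feature redundancy."""
    return (k * avg_feat_class_corr) / math.sqrt(k + k * (k - 1) * avg_feat_feat_corr)

# A single feature's merit is just its class correlation.
single = cfs_merit(1, 0.5, 0.0)  # 0.5
# With equal class correlation, a redundant subset scores lower.
redundant = cfs_merit(4, 0.5, 0.9)
diverse = cfs_merit(4, 0.5, 0.1)
```

The denominator grows with the redundancy term, which is exactly the trade-off the GeFS measure generalizes.<br />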
16:40-17:00, Paper TuCT2.4<br />
Discriminative Basis Selection using Non-Negative Matrix Factorization<br />
Jammalamadaka, Aruna, Univ. of California, Santa Barbara<br />
Joshi, Swapna, Univ. of California, Santa Barbara<br />
Shanmuga Vadivel, Karthikeyan, Univ. of California, Santa Barbara<br />
Manjunath, B. S., Univ. of California, Santa Barbara<br />
Non-negative matrix factorization (NMF) has proven to be useful in image classification applications such as face recognition.<br />
We propose a novel discriminative basis selection method for classification of image categories based on the popular<br />
term frequency-inverse document frequency (TF-IDF) weight used in information retrieval. We extend the algorithm to<br />
incorporate color, and overcome the drawbacks of using unaligned images. Our method is able to choose visually significant<br />
bases which best discriminate between categories and thus prune the classification space to increase correct classifications.<br />
We apply our technique to ETH-80, a standard image classification benchmark dataset. Our results show that our algorithm<br />
outperforms other state-of-the-art techniques.<br />
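The TF-IDF weight this abstract builds on is standard in information retrieval. A minimal generic sketch follows, assuming simple per-document term counts; it illustrates the weighting scheme only, not the authors' NMF basis-selection code.<br />

```python
import math

def tf_idf(term_counts_per_doc):
    """Weight each term by term frequency times inverse document frequency."""
    n_docs = len(term_counts_per_doc)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for counts in term_counts_per_doc:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    weighted = []
    for counts in term_counts_per_doc:
        total = sum(counts.values())
        weighted.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weighted

docs = [{"cat": 2, "dog": 1}, {"dog": 3}]
w = tf_idf(docs)
# "dog" appears in every document, so its idf is log(2/2) = 0
```

Terms occurring in every document receive zero weight, which is what makes the scheme discriminative between categories.<br />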
17:00-17:20, Paper TuCT2.5<br />
Recognizing Dance Motions with Segmental SVD<br />
Deng, Liqun, Univ. of Science & Tech. of China<br />
Leung, Howard, City Univ. of Hong Kong<br />
Gu, Naijie, Univ. of Science & Tech. of China<br />
Yang, Yang, Univ. of Science & Tech. of China<br />
In this paper, a novel concept of segmental singular value decomposition (SegSVD) is proposed to represent a motion<br />
pattern with a hierarchical structure. A similarity measure based on the SegSVD representation is also proposed. SegSVD<br />
is capable of capturing the temporal information of the time series. It is effective in matching patterns in a time series in<br />
which the start and end points of the patterns are not known in advance. We evaluate the performance of our method on<br />
both isolated motion classification and continuous motion recognition for dance movements. Experiments show that our<br />
method outperforms existing work in terms of recognition accuracy.<br />
TuCT3 Marmara Hall<br />
Object Detection and Recognition – III Regular Session<br />
Session chair: Nixon, Mark (Univ. of Southampton)<br />
15:40-16:00, Paper TuCT3.1<br />
Multi-Class Graph Boosting with Subgraph Sharing for Object Recognition<br />
Zhang, Bang, Univ. of New South Wales, National ICT Australia<br />
Ye, Getian, Univ. of New South Wales<br />
Wang, Yang, National ICT Australia, Univ. of New South Wales<br />
Wang, Wei, Univ. of New South Wales<br />
Xu, Jie, National ICT Australia, Univ. of New South Wales<br />
Herman, Gunawan, National ICT Australia, Univ. of New South Wales<br />
Yang, Jun, National ICT Australia, Univ. of New South Wales<br />
In this paper, we propose a novel multi-class graph boosting algorithm to recognize different visual objects. The proposed<br />
method treats subgraphs as features to construct base classifiers, and utilizes the popular error-correcting output code scheme to<br />
solve the multi-class problem. Both factors, the base classifiers and the error-correcting coding matrix, are considered simultaneously,<br />
and subgraphs that are shareable across different classes are used to improve the classification performance. The<br />
experimental results on multi-class object recognition show the effectiveness of the proposed algorithm.<br />
16:00-16:20, Paper TuCT3.2<br />
Level-Set Segmentation of Brain Tumors using a New Hybrid Speed Function<br />
Cho, Wanhyun, Chonnam National Univ.<br />
Park, Jonghyun, Chonnam National Univ.<br />
Park, Soonyoung, Mokpo National Univ.<br />
Kim, Soohyung, Chonnam National Univ.<br />
Kim, Sunworl, Chonnam National Univ.<br />
Ahn, Gukdong, Chonnam National Univ.<br />
Lee, Myungeun, Chonnam National Univ.<br />
Lee, Gueesang, Chonnam National Univ.<br />
This paper presents a new hybrid speed function needed to perform image segmentation within the level-set framework.<br />
This speed function provides a general form that incorporates the alignment term as a part of the driving force for the<br />
proper edge direction of an active contour by using the probability term derived from the region partition scheme and, for<br />
regularization, the geodesic contour term. First, we use the Gradient Vector Flow field as an external force for active contours.<br />
This is computed as the diffusion of the gradient vectors of a gray-level edge map derived from the image. Second, we<br />
partition the image domain by progressively fitting statistical models to the intensity of each region. Here we adopt two<br />
Gaussian distributions to model the intensity distribution of the inside and outside of the evolving curve partitioning the<br />
image domain. Third, we use an active contour model that computes geodesics, or minimal-distance curves, which<br />
allows stable boundary detection when the model’s gradients suffer from large variations, including gaps or noise.<br />
Finally, we test the accuracy and robustness of the proposed method for various medical images. Experimental results<br />
show that our method can properly segment low contrast, complex images.<br />
16:20-16:40, Paper TuCT3.3<br />
The Impact of Color on Bag-of-Words based Object Recognition<br />
Rojas Vigo, David Augusto, Computer Vision Center Barcelona<br />
Shahbaz Khan, Fahad, Computer Vision Center Barcelona<br />
Van De Weijer, Joost, Computer Vision Center Barcelona<br />
Gevers, Theo, Univ. of Amsterdam<br />
In recent years several works have aimed at exploiting color information in order to improve the bag-of-words based<br />
image representation. There are two stages in which color information can be applied in the bag-of-words framework.<br />
Firstly, feature detection can be improved by choosing highly informative color-based regions. Secondly, feature description,<br />
typically focusing on shape, can be improved with a color description of the local patches. Although both approaches<br />
have been shown to improve results, their combined merits have not yet been analyzed. Therefore, in this paper we investigate<br />
the combined contribution of color to both the feature detection and extraction stages. Experiments performed on two<br />
challenging data sets, namely Flower and Pascal VOC 2009, clearly demonstrate that incorporating color in both feature<br />
detection and extraction significantly improves the overall performance.<br />
16:40-17:00, Paper TuCT3.4<br />
Pyramidal Model for Image Semantic Segmentation<br />
Passino, Giuseppe, Queen Mary, Univ. of London<br />
Patras, Ioannis, Queen Mary, Univ. of London<br />
Izquierdo, Ebroul, Queen Mary, Univ. of London<br />
We present a new hierarchical model applied to the problem of image semantic segmentation, that is, the association of<br />
each pixel in an image with a category label (e.g. tree, cow, building, ...). This problem is usually addressed with a combination<br />
of an appearance-based pixel classification and a pixel context model. In our proposal, the images are initially<br />
over-segmented into dense patches. The proposed pyramidal model naturally embeds the compositional nature of a scene to<br />
achieve a multi-scale contextualisation of patches. This is obtained by imposing an order on the patches aggregation operations<br />
towards the final scene. The nodes of the pyramid (that is, a dendrogram) thus represent patch clusters, or superpatches.<br />
The probabilistic model favours the homogeneous labelling of super-patches that are likely to contain a single<br />
object instance, modelling the uncertainty in identifying such super-patches. The proposed model has several advantages,<br />
including computational efficiency and expandability. Initial results place the model in line with other works<br />
in the recent literature.<br />
17:00-17:20, Paper TuCT3.5<br />
Multi-View based Estimation of Human Upper-Body Orientation<br />
Rybok, Lukas, Karlsruhe Inst. of Tech.<br />
Voit, Michael, Fraunhofer Inst. of Optronics<br />
Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />
Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />
Knowledge of the body orientation of humans can improve the speed and performance of many service components<br />
of a smart-room. Since many such components run in parallel, an estimator to acquire this knowledge needs a very low<br />
computational complexity. In this paper we address these two points with a fast and efficient algorithm using the smart-room’s<br />
multiple camera output. The estimation is based on silhouette information only and is performed for each camera<br />
view separately. The single view results are fused within a Bayesian filter framework. We evaluate our system on a subset<br />
of videos from the CLEAR 2007 dataset and achieve an average correct classification rate of 87.8%, while the estimation<br />
itself just takes 12 ms when four cameras are used.<br />
TuCT4 Dolmabahçe Hall A<br />
Structural Methods Regular Session<br />
Session chair: Ghosh, Joydeep (Univ. of Texas)<br />
15:40-16:00, Paper TuCT4.1<br />
An Iterative Algorithm for Approximate Median Graph Computation<br />
Ferrer, Miquel, Univ. Pol. de Catalunya<br />
Bunke, Horst, Univ. of Bern<br />
Recently, the median graph has been shown to be a good choice to obtain a representative of a given set of graphs. It has<br />
been successfully applied to graph-based classification and clustering. In this paper we exploit a theoretical property of<br />
the median, which has not yet been utilized in the past, to derive a new iterative algorithm for approximate median graph<br />
computation. Experiments done using five different graph databases show that the proposed approach yields, in four out<br />
of these five datasets, better medians than two of the previous existing methods.<br />
16:00-16:20, Paper TuCT4.2<br />
A Supergraph-Based Generative Model<br />
Han, Lin, Univ. of York<br />
Wilson, Richard, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
This paper describes a method for constructing a generative model for sets of graphs. The method is posed in terms of<br />
learning a supergraph from which the samples can be obtained by edit operations. We construct a probability distribution<br />
for the occurrence of nodes and edges over the supergraph. We use the EM algorithm to learn both the structure of the supergraph<br />
and the correspondences between the nodes of the sample graphs and those of the supergraph, which are treated<br />
as missing data. In the experimental evaluation of the method, we a) prove that our supergraph learning method can lead<br />
to an optimal or suboptimal supergraph, and b) show that our proposed generative model gives good graph classification<br />
results.<br />
16:20-16:40, Paper TuCT4.3<br />
Levelings and Flatzone Morphology<br />
Meyer, Fernand, Mines-ParisTech<br />
Successive levelings are applied to document images. The residues of successive levelings are made of flat zones for<br />
which morphological transforms are described.<br />
16:40-17:00, Paper TuCT4.4<br />
Combining Force Histogram and Discrete Lines to Extract Dashed Lines<br />
Debled-Rennesson, Isabelle, LORIA – Nancy Univ.<br />
Wendling, Laurent, Univ. Paris Descartes<br />
A new method to extract dashed lines in technical documents is proposed in this paper by combining force histograms and<br />
discrete lines. The aim is to study the spatial location of pairs of connected components using the force histogram and to<br />
refine the recognition by considering surrounding discrete lines. This new model is fast and allows a good extraction of<br />
occluded patterns in the presence of noise. Efficient common methods require several thresholds to process technical<br />
documents. The proposed method requires only a few thresholds, which can be set automatically from the data.<br />
17:00-17:20, Paper TuCT4.5<br />
Heat Flow-Thermodynamic Depth Complexity in Networks<br />
Escolano, Francisco, Univ. of Alicante<br />
Lozano, Miguel Angel, Univ. of Alicante<br />
Hancock, Edwin, Univ. of York<br />
In this paper we establish a formal link between network complexity in terms of Birkhoff-von Neumann decompositions<br />
and heat flow complexity (in terms of quantifying the heat flowing through the network at a given inverse temperature).<br />
Furthermore, we also define heat flow complexity in terms of thermodynamic depth, which results in a novel approach<br />
for characterizing networks and quantifying their complexity. In our experiments we characterize several protein-protein interaction<br />
(PPI) networks and then highlight their evolutionary differences.<br />
TuCT5 Anadolu Auditorium<br />
Image Analysis – V Regular Session<br />
Session chair: Kasturi, Rangachar (Univ. of South Florida)<br />
15:40-16:00, Paper TuCT5.1<br />
Content Adaptive Hash Lookups for Near-Duplicate Image Search by Full or Partial Image Queries<br />
Harmanci, Oztan, Anvato Inc.<br />
Haritaoglu, Ismail, Pol. Rain Inc.<br />
In this paper we present a scalable and high performance near-duplicate image search method. The proposed algorithm<br />
follows the common paradigm of computing local features around repeatable scale-invariant interest points. Unlike existing<br />
methods, much shorter hashes (40 bits) are used. By leveraging the shortness of the hashes, a novel high-performance<br />
search algorithm is introduced which analyzes the reliability of each bit of a hash and performs content adaptive hash<br />
lookups by adaptively adjusting the “range” of each hash bit based on reliability. Matched features are post-processed to<br />
determine the final match results. We experimentally show that the algorithm can detect cropped, resized, print-scanned<br />
and re-encoded images and pieces from images among thousands of images. The proposed algorithm can search for a<br />
200x200 piece of an image in a database of 2,250 images of size 2400x4000 in 0.020 seconds on a 2.5 GHz Intel Core 2.<br />
16:00-16:20, Paper TuCT5.2<br />
The Good, the Bad, and the Ugly: Predicting Aesthetic Image Labels<br />
Wu, Yaowen, RWTH Aachen Univ. Fraunhofer Inst. IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
Automatic classification of the aesthetic content of a picture is one of the challenges in the emerging discipline of computational<br />
aesthetics. Any suitable solution must cope with the facts that aesthetic experiences are highly subjective and<br />
that a commonly agreed upon theory of their psychological constituents is still missing. In this paper, we present results<br />
obtained from an empirical basis of several thousand images. We train SVM based classifiers to predict aesthetic adjectives<br />
rather than aesthetic scores and we introduce a probabilistic post processing step that alleviates effects due to misleadingly<br />
labeled training data. Extensive experimentation indicates that aesthetics classification is possible to a large extent. In particular,<br />
we find that previously established low-level features are well suited to recognize beauty. Robust recognition of<br />
unseemliness, on the other hand, appears to require more high-level analysis.<br />
16:20-16:40, Paper TuCT5.3<br />
Information Fusion for Combining Visual and Textual Image Retrieval<br />
Zhou, Xin, Geneva Univ. Hospitals and Univ. of Geneva<br />
Depeursinge, Adrien, Geneva Univ. Hospitals and Univ. of Geneva<br />
Müller, Henning, Univ. of Applied Sciences Sierre, Switzerland<br />
In this paper, classical fusion approaches such as the maximum combination (combMAX), the sum combination (combSUM) and<br />
the product of the sum and the number of nonzero scores (combMNZ) were employed, and the trade-off between two fusion effects<br />
(the chorus and dark horse effects) was studied based on the sum of n maximums. Various normalization strategies were<br />
tried out. The fusion algorithms are evaluated using the best four visual and textual runs of the ImageCLEF medical image<br />
retrieval tasks of 2008 and 2009. The results show that the fused runs outperform the best original runs and that multi-modality fusion<br />
statistically outperforms single-modality fusion. The logarithmic rank penalization proves to be the most stable normalization.<br />
The dark horse effect is in competition with the chorus effect, and either can produce the best fusion performance<br />
depending on the nature of the input data.<br />
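The classical fusion rules named in this abstract (Fox and Shaw's combMAX, combSUM, combMNZ) are standard and simple to state. The sketch below is a generic illustration over hypothetical score dictionaries, not the authors' evaluation code.<br />

```python
def comb_max(runs):
    """combMAX: take the maximum score a document receives in any run."""
    fused = {}
    for run in runs:
        for doc, s in run.items():
            fused[doc] = max(fused.get(doc, 0.0), s)
    return fused

def comb_sum(runs):
    """combSUM: sum a document's scores over all runs (missing = 0)."""
    fused = {}
    for run in runs:
        for doc, s in run.items():
            fused[doc] = fused.get(doc, 0.0) + s
    return fused

def comb_mnz(runs):
    """combMNZ: combSUM times the number of runs with a nonzero score."""
    sums = comb_sum(runs)
    counts = {}
    for run in runs:
        for doc, s in run.items():
            if s > 0:
                counts[doc] = counts.get(doc, 0) + 1
    return {doc: sums[doc] * counts.get(doc, 0) for doc in sums}

# Hypothetical visual and textual runs for the same query.
visual = {"img1": 0.75, "img2": 0.25}
textual = {"img1": 0.5, "img3": 0.5}
fused = comb_mnz([visual, textual])
```

combMNZ amplifies the chorus effect (documents retrieved by several runs), while combMAX favors the dark horse effect (a single very confident run), matching the trade-off studied in the paper.<br />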
16:40-17:00, Paper TuCT5.4<br />
Perceptual Image Retrieval by Adding Color Information to the Shape Context Descriptor<br />
Rusiñol, Marçal, Univ. Autònoma de Barcelona<br />
Nourbakhsh, Farshad, Computer Vision Center / Univ. Autònoma de Barcelona<br />
Karatzas, Dimosthenis, Univ. Autonoma de Barcelona<br />
Valveny, Ernest, Computer Vision Center / Univ. Autònoma de Barcelona<br />
Llados, Josep, Computer Vision Center<br />
In this paper we present a method for the retrieval of images in terms of perceptual similarity. Local color information is<br />
added to the shape context descriptor in order to obtain an object description integrating both shape and color as visual cues.<br />
We use a color naming algorithm in order to represent the color information from a perceptual point of view. The proposed<br />
method has been tested in two different applications, an object retrieval scenario based on color sketch queries and a color<br />
trademark retrieval problem. Experimental results show that adding color information significantly outperforms<br />
the sole use of the shape context descriptor.<br />
17:00-17:20, Paper TuCT5.5<br />
Weighted Boundary Points for Shape Analysis<br />
Zhang, Jing, Univ. of South Florida<br />
Kasturi, Rangachar, Univ. of South Florida<br />
Shape analysis is an active and important branch of the computer vision research field. In recent years, many geometrical, topological,<br />
and statistical features have been proposed and widely used for shape-related applications. In this paper, based on<br />
the properties of the Distance Transform, we present a new shape feature, the weight of a boundary point. By computing the shortest<br />
distances between boundary points and the distance contours of a transformed shape, every boundary point is assigned a weight<br />
that contains the interior structure information of the shape. To evaluate the proposed shape feature, we tested the<br />
weighted boundary points on shape matching and shape decomposition. The experimental results demonstrate its validity.<br />
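The Distance Transform underlying this feature is a standard operation: each foreground cell is labeled with its distance to the nearest background cell. A brute-force sketch on a small binary grid follows, purely illustrative and not the authors' implementation.<br />

```python
def distance_transform(grid):
    """Euclidean distance transform of a binary grid, brute force:
    foreground cells (1) get the distance to the nearest background cell (0),
    background cells get 0. O(n^2), fine for illustration only."""
    h, w = len(grid), len(grid[0])
    background = [(y, x) for y in range(h) for x in range(w) if grid[y][x] == 0]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 1:
                out[y][x] = min(
                    ((y - by) ** 2 + (x - bx) ** 2) ** 0.5
                    for by, bx in background
                )
    return out

shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dt = distance_transform(shape)
# Every interior cell of this thin shape is one step from the background.
```

The level sets of `dt` are the distance contours against which the paper measures boundary-point weights.<br />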
TuCT6 Dolmabahçe Hall B<br />
Speech and Speaker Recognition Regular Session<br />
Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />
15:40-16:00, Paper TuCT6.1<br />
Dimension-Decoupled Gaussian Mixture Model for Short Utterance Speaker Recognition<br />
Stadelmann, Thilo, Univ. of Marburg<br />
Freisleben, Bernd, Univ. of Marburg<br />
The Gaussian Mixture Model (GMM) is often used in conjunction with Mel-frequency cepstral coefficient (MFCC) feature<br />
vectors for speaker recognition. A great challenge is to use these techniques in situations where only small sets of training<br />
and evaluation data are available, which typically results in poor statistical estimates and, ultimately, poor recognition scores. Based<br />
on the observation of marginal MFCC probability densities, we suggest greatly reducing the number of free parameters in<br />
the GMM by modeling the single dimensions separately after proper preprocessing. Saving about 90% of the free parameters<br />
as compared to an already optimized GMM and thus making the estimates more stable, this approach considerably improves<br />
recognition accuracy over the baseline as the utterances get shorter and saves a huge amount of computing time both in training<br />
and evaluation, enabling real-time performance. The approach is easy to implement and to combine with other short-utterance<br />
approaches, and applicable to other features as well.<br />
16:00-16:20, Paper TuCT6.2<br />
Modeling Syllable-Based Pronunciation Variation for Accented Mandarin Speech Recognition<br />
Zhang, Shilei, IBM Res.<br />
Shi, Qin, IBM Res. – China<br />
Qin, Yong, IBM Res. – China<br />
Pronunciation variation is a natural and inevitable phenomenon in accented Mandarin speech recognition applications.<br />
In this paper, we integrate knowledge-based and data-driven approaches for syllable-based pronunciation variation<br />
modeling to improve the performance of a Mandarin speech recognition system for speakers with a Southern accent. First,<br />
we generate the syllable-based pronunciation variation rules of the Southern accent observed in the training corpus with a Chinese<br />
linguistics expert. Second, we augment the dictionary with multiple pronunciation variants and pronunciation probabilities<br />
derived from forced-alignment statistics of the training data. The acoustic models are then retrained based on the expanded<br />
dictionary. Finally, pronunciation variation adaptation is performed at the decoding stage to further fit the data by<br />
taking the distribution of variation-rule clusters in the test set into account. The experimental results show that the proposed<br />
method provides a flexible framework to effectively improve recognition performance for accented speech.<br />
16:20-16:40, Paper TuCT6.3<br />
Automatic Pronunciation Transliteration for Chinese-English Mixed Language Keyword Spotting<br />
Zhang, Shilei, IBM Res.<br />
Shuang, Zhiwei, IBM Res. – China<br />
Qin, Yong, IBM Res. – China<br />
This paper presents an automatic pronunciation transliteration method with acoustic and contextual analysis for a Chinese-<br />
English mixed-language keyword spotting (KWS) system. In practice, we often need to develop robust Chinese-English<br />
mixed-language spoken language technology without Chinese-accented English acoustic data. In this paper, we exploit a<br />
pronunciation conversion method based on syllable-based characteristic analysis of pronunciation and data-driven phoneme-<br />
pair mappings to solve the mixed-language problem using only well-trained Chinese models. One obvious advantage of this<br />
method is that it provides a flexible framework for automatically converting the pronunciation of English keywords to<br />
Chinese. The efficiency of the proposed method was demonstrated on a KWS task over a mixed-language database.<br />
16:40-17:00, Paper TuCT6.4<br />
Learning Virtual HD Model for Bi-Model Emotional Speaker Recognition<br />
Huang, Ting, Zhejiang Univ.<br />
Yang, Yingchun, Zhejiang Univ.<br />
Pitch mismatch between training and testing is one of the important factors causing performance degradation in speaker<br />
recognition systems. In this paper, we adopt the missing feature theory and specify the Unreliable Region (UR) as the<br />
parts of the utterance with high emotion-induced pitch variation. To model these regions, a virtual HD (High Different<br />
from neutral, with large pitch offset) model for each target speaker was built from virtual speech, which was converted<br />
from neutral speech by the Pitch Transformation Algorithm (PTA). In the PTA, a polynomial transformation function<br />
is learned to model the relationship between the average pitch of the neutral and the high-pitched utterances. Compared<br />
with the traditional GMM-UBM and our previous method, our new method obtained promising identification rate (IR)<br />
increases of 1.88% and 0.84% on the MASC corpus, respectively.<br />
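The PTA's polynomial mapping between neutral and high-pitched average pitch can be sketched, in its simplest degree-1 form, as a least-squares fit; the pitch values below are invented for illustration:<br />

```python
def fit_linear(xs, ys):
    """Least-squares line y = a*x + b, the degree-1 case of a polynomial
    mapping average neutral pitch to average high-pitched pitch."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# hypothetical per-speaker average pitch pairs in Hz (neutral, emotional)
neutral = [110.0, 125.0, 140.0, 160.0]
high = [150.0, 172.0, 195.0, 224.0]
a, b = fit_linear(neutral, high)
transform = lambda f0: a * f0 + b  # PTA-style pitch mapping
```

Higher-degree polynomials would be fitted analogously with more coefficients.<br />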
17:00-17:20, Paper TuCT6.5<br />
Role of Synthetically Generated Samples on Speech Recognition in a Resource-Scarce Language<br />
Chakraborty, Rupayan, St. Thomas’ Coll. of Eng. & Tech.<br />
Garain, Utpal, Indian Statistical Inst.<br />
Speech recognition systems that make use of statistical classifiers require a large number of training samples. However,<br />
the collection of real samples has always been a difficult problem due to the substantial amount of human intervention<br />
and cost involved. Considering this problem, this paper presents a novel method for generating synthetic samples from<br />
a handful of real samples and investigates the role of these samples in designing a speech recognition system. Speaker-dependent,<br />
limited-vocabulary, isolated word recognition in an Indian language (Bengali) has been taken as a reference to<br />
demonstrate the potential of the proposed framework. The role of synthetic samples is demonstrated by showing a significant<br />
improvement in recognition accuracy. A maximum improvement of 10% is achieved using the proposed approach.<br />
TuCT7 Dolmabahçe Hall C<br />
Fingerprint Regular Session<br />
Session chair: Sankur, Bülent (Bogazici Univ.)<br />
15:40-16:00, Paper TuCT7.1<br />
Detecting Altered Fingerprints<br />
Feng, Jianjiang, Tsinghua Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Ross, Arun, West Virginia Univ.<br />
The widespread deployment of Automated Fingerprint Identification Systems (AFIS) in law enforcement and border<br />
control applications has prompted some individuals with criminal background to evade identification by purposely altering<br />
their fingerprints. Available fingerprint quality assessment software cannot detect most of the altered fingerprints since<br />
the implicit image quality does not always degrade due to alteration. In this paper, we classify the alterations observed in<br />
an operational database into three categories and propose an algorithm to detect altered fingerprints. Experiments were<br />
conducted on both real-world altered fingerprints and synthetically generated altered fingerprints. At a false alarm rate of<br />
7%, the proposed algorithm detected 92% of the altered fingerprints, while NFIQ, a well-known fingerprint quality<br />
assessment tool, detected only 20% of them.<br />
16:00-16:20, Paper TuCT7.2<br />
A Variational Formulation for Fingerprint Orientation Modeling<br />
Hou, Zujun, Inst. For Infocomm Res.<br />
Yau, Wei-Yun, Inst. For Infocomm Res.<br />
Fingerprint orientation plays an important role in fingerprint recognition. This paper proposes a framework for modeling<br />
the fingerprint orientation field based on the variational principle. The proposed method does not require any prior information<br />
about the structure of the acquired fingerprints. Comparisons have been made with state-of-the-art methods in fingerprint<br />
orientation modeling.<br />
16:20-16:40, Paper TuCT7.3<br />
Fingerprint Pore Matching based on Sparse Representation<br />
Liu, Feng, The Hong Kong Pol. Univ.<br />
Zhao, Qijun, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
This paper proposes an improved direct fingerprint pore matching method. It measures the differences between pores by<br />
using the sparse representation technique. The coarse pore correspondences are then established and weighted based on<br />
the obtained differences. The false correspondences among them are finally removed by using the weighted RANSAC algorithm.<br />
Experimental results have shown that the proposed method can greatly improve the accuracy of existing methods.<br />
16:40-17:00, Paper TuCT7.4<br />
Latent Fingerprint Core Point Prediction based on Gaussian Processes<br />
Su, Chang, Univ. at Buffalo, State Univ. of New York<br />
Srihari, Sargur, Univ. at Buffalo, State Univ. of New York<br />
Core point prediction is of critical importance to latent fingerprint individuality assessment. While tremendous efforts<br />
have been made in core point detection, locating core points in latent fingerprints continues to be a difficult problem<br />
because latent prints usually contain only partial images, with core points left outside the print. A novel method is proposed<br />
that predicts the locations and orientations of core points in latent fingerprints. The method is based on Gaussian processes<br />
and provides probabilistic predictions rather than binary decisions. The accuracy of the method is illustrated<br />
by experiments on a real-life latent fingerprint data set.<br />
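The underlying machinery can be sketched with generic Gaussian process regression, assuming a squared-exponential kernel and a single scalar output (e.g. one core-point coordinate regressed from 2D positions on the partial print); this is not the authors' exact model:<br />

```python
import math

def rbf(a, b, length=1.0):
    # squared-exponential kernel on 2D coordinates
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, b):
    # naive Gaussian elimination with partial pivoting (small systems only)
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(train_x, train_y, test_x, noise=1e-6, length=1.0):
    """Posterior mean and variance of a GP at one test input."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)
    k_star = [rbf(test_x, a, length) for a in train_x]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(test_x, test_x, length) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var
```

The posterior variance is what turns the prediction probabilistic rather than a binary decision.<br />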
17:00-17:20, Paper TuCT7.5<br />
Towards a Better Understanding of the Performance of Latent Fingerprint Recognition in Realistic Forensic Conditions<br />
Puertas, Maria, Univ. Autonoma de Madrid<br />
Ramos, Daniel, Univ. Autonoma de Madrid<br />
Fierrez, Julian, Univ. Autonoma de Madrid<br />
Ortega-Garcia, Javier, Univ. Autonoma de Madrid<br />
Exposito-Marquez, Nicomedes, Departamento de Identificacion, Servicio de Criminalistica de la Guardia Civil, Ministerio<br />
del Interior, Spain<br />
This work studies the performance of a state-of-the-art fingerprint recognition technology in several practical scenarios<br />
of interest in forensic casework. First, the differences in performance between manual and automatic minutiae extraction<br />
for latent fingerprints are presented. Then, automatic minutiae extraction is analyzed using three different types of fingerprints:<br />
latent, rolled and plain. The experiments are carried out using a database of latent finger marks and fingerprint impressions<br />
from real forensic cases. The results show high performance degradation in automatic minutiae extraction<br />
compared to manual extraction by human experts. Moreover, high degradation in performance on latent finger marks can<br />
be observed in comparison to fingerprint impressions.<br />
TuBCT8 Upper Foyer<br />
3D Shape Recovery; Image and Physics-Based Modeling; Motion and Multi-View Vision; Tracking and Surveillance<br />
Poster Session<br />
Session chair: Jiang, Xiaoyi (Univ. of Münster)<br />
13:30-16:30, Paper TuBCT8.1<br />
Online Next-Best-View Planning for Accuracy Optimization using an Extended E-Criterion<br />
Trummer, Michael, Friedrich-Schiller Univ. of Jena<br />
Munkelt, Christoph, Fraunhofer Society<br />
Denzler, Joachim, Friedrich-Schiller Univ. of Jena<br />
Next-best-view (NBV) planning is an important aspect of three-dimensional (3D) reconstruction within controlled environments,<br />
such as a camera mounted on a robotic arm. NBV methods aim at a purposive 3D reconstruction that satisfies<br />
predefined goals and limitations. Up to now, the literature mainly presents NBV methods for range sensors, model-based<br />
approaches or algorithms that address the reconstruction of a finite set of primitives. For this work, we use an intensity<br />
camera without active illumination. We present a novel combined online approach comprising feature tracking, 3D reconstruction,<br />
and NBV planning that addresses arbitrary unknown objects. In particular, we focus on accuracy optimization<br />
based on the reconstruction uncertainty. To this end, we introduce an extension of the statistical E-criterion to model directional<br />
uncertainty, and we present a closed-form, optimal solution to this NBV planning problem. Our experimental<br />
evaluation demonstrates the effectiveness of our approach using an absolute error measure.<br />
13:30-16:30, Paper TuBCT8.2<br />
Non Contact 3D Measurement Scheme for Transparent Objects using UV Structured Light<br />
Rantoson, Rindra, LE2I<br />
Fofi, David, Le2i UMR CNRS 5158<br />
Stolz, Christophe, LE2I<br />
Meriaudeau, Fabrice, LE2I<br />
This paper introduces a novel 3D measurement scheme based on UV laser triangulation to ascertain the shape of transparent<br />
objects. Transparent objects are extremely difficult to scan with traditional 3D scanners because of the refraction observed<br />
in the visible range; the object surface therefore needs to be powdered before being digitized with commercial scanners.<br />
Our approach is a non-contact measurement scheme that deals with the refraction problem in the visible environment. The<br />
object shape is computed by a classical triangulation method based on stereovision constraints. The proposed acquisition<br />
system is composed of two classical visible-range cameras and a UV laser source. The use of the UV laser in the triangulation<br />
system constitutes the novelty of the proposed approach: the fluorescence generated by the UV radiation makes it possible<br />
to acquire 3D data of transparent surfaces with a classical stereovision scheme.<br />
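The classical stereovision triangulation step can be sketched with the midpoint method: take the midpoint of the shortest segment between the two back-projected rays of the laser spot seen by the two cameras. Purely illustrative; the camera geometry below is made up:<br />

```python
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]
def scale(u, s): return [a * s for a in u]

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2,
    where c1, c2 are camera centres and d1, d2 the viewing directions."""
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b  # zero only for parallel rays
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)
```

With noisy real rays the two closest points differ slightly, and the midpoint is a common compromise.<br />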
13:30-16:30, Paper TuBCT8.3<br />
Extending Fast Marching Method under Point Light Source Illumination and Perspective Projection<br />
Iwahori, Yuji, Chubu Univ.<br />
Iwai, Kazuki, Chubu Univ.<br />
Woodham, Robert J., Univ. of British Columbia<br />
Kawanaka, Haruki, Aichi Prefectural Univ.<br />
Fukui, Shinji, Aichi Univ. of Education<br />
Kasugai, Kunio, Aichi Medical Univ.<br />
An endoscope is a medical instrument that acquires images inside the human body. An endoscope carries its own light<br />
source. Classic shape-from-shading can be used to recover the 3-D shape of objects in view. Recent implementations have<br />
used the Fast Marching Method (FMM). Previous FMM approaches recover 3-D shape under assumptions of parallel light<br />
source illumination and orthographic projection. This paper extends the FMM approach to recover the 3-D shape under<br />
more realistic conditions of endoscopy, namely nearby point light source illumination and perspective projection. The new<br />
approach is demonstrated through experiment and is seen to improve performance.<br />
13:30-16:30, Paper TuBCT8.4<br />
Effective Structure-From-Motion for Hybrid Camera Systems<br />
Bastanlar, Yalin, Middle East Tech. Univ.<br />
Temizel, Alptekin, Middle East Tech. Univ.<br />
Yardimci, Yasemin, Middle East Tech. Univ.<br />
Sturm, Peter, INRIA<br />
We describe a pipeline for structure-from-motion with mixed camera types, namely omnidirectional and perspective cameras.<br />
The steps of the pipeline can be summarized as calibration, point matching, pose estimation, triangulation and bundle<br />
adjustment. For these steps, we either propose improved methods or modify existing perspective camera methods to make<br />
the pipeline more effective and automatic when employed for hybrid camera systems.<br />
13:30-16:30, Paper TuBCT8.5<br />
Single View Metrology Along Orthogonal Directions<br />
Peng, Kun, Peking Univ.<br />
Hou, Lulu, Peking Univ.<br />
Ren, Ren, Peking Univ.<br />
Ying, Xianghua, Peking Univ.<br />
Zha, Hongbin, Peking Univ.<br />
In this paper, we describe how 3D metric measurements can be determined from a single uncalibrated image when only<br />
minimal geometric information is available in the image; this minimal information is just the orthogonal vanishing points.<br />
Given such limited information, we show that length ratios along different orthogonal directions can be directly computed.<br />
This discovery seems to oppose common sense: usually, in the calibration process, all edge lengths of a cuboid are known;<br />
here, the cuboid edge lengths are unknown, yet their ratios can be recovered from the image. 3D metric measurements<br />
can then be directly computed from the image using our linear method.<br />
13:30-16:30, Paper TuBCT8.6<br />
Depth Perception Model based on Fixational Eye Movements using Bayesian Statistical Inference<br />
Tagawa, Norio, Tokyo Metropolitan Univ.<br />
Small vibrations of the eyeball, which occur when we fix our gaze on an object, are called "fixational eye movements." It has<br />
been reported that such involuntary eye movements also contribute to monocular depth perception. In this study, we focus<br />
on "tremor", the smallest type of fixational eye movement, and construct a depth perception model based on tremor<br />
using the MAP-EM algorithm. Its effectiveness is confirmed through numerical evaluations using artificial images.<br />
13:30-16:30, Paper TuBCT8.7<br />
One-Shot Scanning using a Color Stripe Pattern<br />
Li, Renju, Peking Univ.<br />
Zha, Hongbin, Peking Univ.<br />
Structured light 3D scanning has many applications such as 3D modeling, animation, motion analysis, deformation measurement<br />
and so on. Traditional structured light methods make use of a sequence of patterns to obtain the dense 3D data of<br />
objects. However, few methods have been proposed to achieve pixel-wise reconstruction using only one pattern. In this<br />
paper, we propose a one-shot scanning system based on a novel stripe pattern, which uses color stripes with a quadratic<br />
intensity distribution in each stripe. The color distribution is based on a De Bruijn sequence with six colors and order<br />
three. Graph cut is utilized to decode the color information, and the resulting code is calculated using local intensity. Compared<br />
with traditional methods, the proposed method uses only one pattern and achieves pixel-wise reconstruction. Experimental<br />
results show that our one-shot scanning system can robustly capture 3D data with high accuracy.<br />
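The stripe colouring rests on a De Bruijn sequence of order 3 over 6 symbols, so any window of 3 consecutive stripes identifies its position in the pattern uniquely. A standard construction (the colour names are placeholders, not the paper's palette):<br />

```python
def de_bruijn(k: int, n: int) -> list:
    """Cyclic sequence over k symbols in which every length-n string of
    symbols appears exactly once (standard Lyndon-word construction)."""
    a = [0] * k * n
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return seq

colors = ["red", "green", "blue", "cyan", "magenta", "yellow"]
stripes = [colors[s] for s in de_bruijn(6, 3)]  # 6^3 = 216 stripes
```

Decoding then reduces to recognising three adjacent stripe colours and looking up that triple in the sequence.<br />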
13:30-16:30, Paper TuBCT8.8<br />
Face Appearance Reconstruction based on a Regional Statistical Craniofacial Model (RSCM)<br />
Yan-Fei, Zhang, Northwest Univ.<br />
Ming-Quan, Zhou, Northwest Univ.<br />
Geng, Guohua, Northwest Univ.<br />
Feng, Jun, Northwest Univ.<br />
The reconstruction of facial soft tissue is an essential processing phase in a number of fields. In this paper, we propose a face<br />
appearance reconstruction algorithm based on a Regional Statistical Craniofacial Model (RSCM). Specifically, the<br />
shape of the craniofacial model is decomposed into several segments, such as the eye, nose and mouth regions, and<br />
the joint statistical models of the different regions are constructed independently to address the small sample size problem.<br />
The face reconstruction task is formulated as a missing data problem and is likewise solved region by region.<br />
Finally, the recovered regions are assembled together to obtain a complete face model. The experimental results show<br />
that the proposed reconstruction scheme achieves a lower error rate than a state-of-the-art method.<br />
13:30-16:30, Paper TuBCT8.9<br />
3D Human Pose Reconstruction using Millions of Exemplars<br />
Jiang, Hao, Boston Coll.<br />
We propose a novel exemplar-based method to estimate 3D human poses from single images by using only the joint correspondences.<br />
Due to the inherent depth ambiguity, estimating 3D poses from a monocular view is a challenging problem.<br />
We solve the problem by searching through millions of exemplars for optimal poses. Compared with traditional parametric<br />
schemes, our method is able to handle very large pose databases, reduces parameter tweaking, is easier to train and is more<br />
effective for complex 3D pose reconstruction. The proposed method estimates upper-body poses and lower-body poses<br />
sequentially, which implicitly squares the size of the exemplar database and enables us to reconstruct unconstrained poses<br />
efficiently. Our implementation, based on a kd-tree, achieves real-time performance. Experiments on a variety of images<br />
show that the proposed method is efficient and effective.<br />
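The core lookup can be sketched with a brute-force stand-in for the kd-tree (the exemplar records, joint layout and pose labels below are hypothetical):<br />

```python
def pose_distance(joints_a, joints_b):
    # sum of squared differences between corresponding 2D joints
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(joints_a, joints_b))

def nearest_exemplar(query_joints, exemplars):
    """Return the stored 3D pose of the exemplar whose 2D joints
    best match the query (linear scan instead of a kd-tree)."""
    best = min(exemplars,
               key=lambda e: pose_distance(query_joints, e["joints2d"]))
    return best["pose3d"]

# tiny made-up database: two exemplars with two joints each
db = [
    {"joints2d": [(0.0, 0.0), (1.0, 1.0)], "pose3d": "standing"},
    {"joints2d": [(0.0, 0.0), (2.0, 0.0)], "pose3d": "arms-out"},
]
result = nearest_exemplar([(0.1, 0.0), (1.9, 0.1)], db)
```

A kd-tree replaces the linear scan with a logarithmic-time search, which is what makes millions of exemplars tractable in real time.<br />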
13:30-16:30, Paper TuBCT8.10<br />
Recovering 3D Shape using an Improved Fast Marching Method<br />
Zou, Chengming, Wuhan Univ. of Tech.<br />
Hancock, Edwin, Univ. of York<br />
In this paper we present an improved shape-from-shading (SFS) method based on an improved fast marching method. We<br />
commence by showing how to recover 3D shape from a single image using an improved fast marching method to solve the<br />
SFS problem. Then we use the level set method, constrained by energy minimization, to evolve the 3D shape. Finally, we show<br />
that the method can recover stable surface estimates from both synthetic and real-world images of complex objects. The experimental<br />
results show that the resulting method is both robust and accurate.<br />
13:30-16:30, Paper TuBCT8.11<br />
The Motion Dynamics Approach to the PnP Problem<br />
Wang, Bo, Chinese Acad. of Sciences<br />
Sun, Fengmei, North China University of Technology<br />
We propose a new motion dynamics approach to solve the PnP problem, in which a dynamic simulation system is constructed<br />
from springs and balls. The equivalence between minimizing the energy of the dynamic system and solving the PnP problem<br />
is proved. Under the assumption that resistances exist, the original PnP problem can be solved<br />
by simulating the movement of the balls.<br />
13:30-16:30, Paper TuBCT8.12<br />
Eigenbubbles: An Enhanced Apparent BRDF Representation<br />
Kumar, Ritwik, Harvard Univ.<br />
Vemuri, Baba C., Univ. of Florida<br />
Banerjee, Arunava, Univ. of Florida<br />
In this paper we address the problem of relighting faces in the presence of cast shadows and specularities. We present a solution<br />
to this problem by capturing the spatially varying Apparent Bidirectional Reflectance Distribution Function (ABRDF)<br />
fields of human faces using Spline Modulated Spherical Harmonics and representing them with a few salient spherical<br />
functions called Eigenbubbles. Through extensive experiments on the Extended Yale B and the CMU PIE benchmark datasets,<br />
we demonstrate that the proposed method clearly outperforms state-of-the-art techniques in synthesized image quality. Furthermore,<br />
we show that our framework allows for ABRDF field compression and can also be used to enhance the performance of<br />
face recognition algorithms.<br />
13:30-16:30, Paper TuBCT8.13<br />
Reactive Object Tracking with a Single PTZ Camera<br />
Al Haj, Murad, Univ. Autonoma de Barcelona<br />
Bagdanov, Andrew D., Univ. Autonoma de Barcelona<br />
Gonzalez, Jordi, Centre de Visio per Computador<br />
Roca, F. Xavier, Univ. Autonoma de Barcelona<br />
In this paper we describe a novel approach to reactive tracking of moving targets with a pan-tilt-zoom camera. The approach<br />
uses an extended Kalman filter to jointly track the object position in the real world, its velocity in 3D and the camera intrinsics,<br />
in addition to the rate of change of these parameters. The filter outputs are used as inputs to PID controllers which<br />
continuously adjust the camera motion in order to reactively track the object at a constant image velocity while simultaneously<br />
maintaining a desirable target scale in the image plane. We provide experimental results on simulated and real<br />
tracking sequences to show how our tracker is able to accurately estimate both 3D object position and camera intrinsics<br />
with very high precision over a wide range of focal lengths.<br />
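The PID control loop the camera commands pass through can be sketched as follows; the gains are arbitrary, and in the paper's setting the error would be, e.g., the pixel offset between the tracked target and the image centre:<br />

```python
class PID:
    """Textbook PID controller producing, e.g., a pan rate from a pixel error."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error: float, dt: float) -> float:
        # accumulate the integral term and difference the derivative term
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

Each frame, something like `pan_rate = pid.update(target_x - image_cx, dt)` would be sent to the pan motor; analogous loops drive tilt and zoom.<br />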
13:30-16:30, Paper TuBCT8.14<br />
An Experimental Study of Image Components and Data Metrics for Illumination-Robust Variational Optical Flow<br />
Chetverikov, Dmitry, MTA SZTAKI<br />
Molnar, Jozsef, ELTE<br />
Illumination-robust optical flow algorithms are needed in numerous machine vision applications such as vision-based intelligent<br />
vehicles, surveillance and traffic monitoring. Recently, we have proposed an implicit nonlinear scheme for variational<br />
optical flow that assumes no particular analytical form of energy functional and can accommodate various image<br />
components and data metrics. Using test data with brightness and colour illumination changes, we study different features<br />
and metrics and demonstrate that cross-correlation is superior to the L1 metric for all combinations of the features.<br />
13:30-16:30, Paper TuBCT8.15<br />
Multiple Human Tracking based on Multi-View Upper-Body Detection and Discriminative Learning<br />
Xing, Junliang, Tsinghua Univ.<br />
Ai, Haizhou, Tsinghua Univ.<br />
Lao, Shihong, OMRON Corp.<br />
This paper focuses on the problem of tracking multiple humans in dense environments which is very challenging due to<br />
recurring occlusions between different humans. To cope with the difficulties it presents, an offline boosted multi-view<br />
upper-body detector is used to automatically initialize a new human trajectory and is capable of dealing with partial human<br />
occlusions. What is more, an online learning process is proposed to learn discriminative human observations, including<br />
discriminative interest points and color patches, to effectively track each human when even more occlusions occur. The<br />
offline and online observation models are neatly integrated into the particle filter framework to robustly track multiple<br />
highly interactive humans. Experimental results on the CAVIAR dataset as well as many other challenging real-world cases<br />
demonstrate the effectiveness of the proposed method.<br />
13:30-16:30, Paper TuBCT8.16<br />
Visual Tracking using Sparsity Induced Similarity<br />
Liu, Huaping, Tsinghua Univ.<br />
Sun, Fuchun, Tsinghua Univ.<br />
Sparse signal reconstruction has recently gained considerable interest and is applied in many fields. In this paper, we propose<br />
a new approach that utilizes the sparsity-induced similarity to construct the tracking algorithm. Compared with the state-of-the-art,<br />
the advantage of this approach is that the sparse representation needs to be calculated only once, and therefore<br />
the time cost is dramatically decreased. In addition, extensive experimental comparisons show that the proposed approach<br />
is more robust than some existing approaches.<br />
13:30-16:30, Paper TuBCT8.17<br />
An Information Fusion Approach for Multiview Feature Tracking<br />
Ataer-Cansizoglu, Esra, Boston Univ.<br />
Betke, Margrit, Boston Univ.<br />
We propose an information fusion approach to tracking objects from different viewpoints that can detect and recover from<br />
tracking failures. We introduce a reliability measure that is a combination of terms associated with correlation-based template<br />
matching and the epipolar geometry of the cameras. The measure is computed to evaluate the performance of 2D<br />
trackers in each camera view and detect tracking failures. The 3D object trajectory is constructed using stereoscopy and<br />
evaluated to predict the next 3D position of the object. In case of track loss in one camera view, the projection of the predicted<br />
3D position onto the image plane of this view is used to reinitialize the lost 2D tracker. We conducted experiments<br />
with 34 subjects to evaluate our proposed system on videos of facial feature movements during human-computer interaction.<br />
The system successfully detected feature loss and gave promising results on accurate re-initialization of the feature.<br />
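The epipolar-geometry term of such a reliability measure can be sketched as the distance of the second view's tracked point from the epipolar line induced by the first view; the fundamental matrix below corresponds to a rectified horizontal stereo pair and is chosen purely for illustration:<br />

```python
import math

def epipolar_distance(F, x1, x2):
    """Pixel distance from point x2 (view 2) to the epipolar line F @ x1
    of point x1 (view 1); F is a 3x3 fundamental matrix."""
    p1 = (x1[0], x1[1], 1.0)
    line = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    return abs(line[0] * x2[0] + line[1] * x2[1] + line[2]) / \
        math.hypot(line[0], line[1])

# rectified pair: corresponding points must share the same image row
F_rect = [[0.0, 0.0, 0.0],
          [0.0, 0.0, -1.0],
          [0.0, 1.0, 0.0]]
```

A large distance, or a low template-correlation score, would flag the 2D track as lost and trigger the 3D-prediction-based re-initialization.<br />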
13:30-16:30, Paper TuBCT8.18<br />
Monocular 3D Tracking of Deformable Surfaces using Linear Programming<br />
Wang, Chenhao, Shanghai Jiao Tong Univ.<br />
Li, Xiong, Shanghai Jiao Tong Univ.<br />
Liu, Yuncai, Shanghai Jiao Tong Univ.<br />
We present a method for 3D shape reconstruction of inextensible deformable surfaces from monocular image sequences.<br />
The key to our approach is to represent the surface as a 3D triangulated mesh and formulate the reconstruction problem as<br />
a sequence of Linear Programming (LP) problems that can be solved effectively. The LP problem consists of data constraints,<br />
which are 3D-to-2D keypoint correspondences, and shape constraints, which prevent large changes of the edge orientation<br />
between consecutive frames. Furthermore, we use a refined bisection algorithm to accelerate the computing speed.<br />
The robustness and efficiency of our approach are validated on both synthetic and real data.<br />
13:30-16:30, Paper TuBCT8.19<br />
Exploiting Visual Quasi-Periodicity for Automated Chewing Event Detection using Active Appearance Models and<br />
Support Vector Machines<br />
Cadavid, Steven, Univ. of Miami<br />
Abdel-Mottaleb, Mohamed, Univ. of Miami<br />
We present a method that automatically detects chewing events in surveillance video of a subject. Firstly, an Active Appearance<br />
Model (AAM) is used to track a subject’s face across the video sequence. It is observed that the variations in the<br />
AAM parameters across chewing events demonstrate a distinct periodicity. We utilize this property to discriminate between<br />
chewing and non-chewing facial actions such as talking. A feature representation is constructed by applying spectral analysis<br />
to a temporal window of model parameter values. The estimated power spectra subsequently undergo non-linear dimensionality<br />
reduction via spectral regression. The low-dimensional representations of the power spectra are employed<br />
to train a Support Vector Machine (SVM) binary classifier to detect chewing events. Experimental results yielded a cross<br />
validated percentage agreement of 93.4%, indicating that the proposed system provides an efficient approach to automated<br />
chewing detection.<br />
13:30-16:30, Paper TuBCT8.20<br />
Slip and Fall Events Detection by Analyzing the Integrated Spatiotemporal Energy Map<br />
Huang, Chung-Lin, National Tsing-Hua Univ.<br />
Liao, Tim, National Tsing-Hua Univ.<br />
This paper presents a new method to detect slip and fall events by analyzing an integrated spatiotemporal energy (ISTE)<br />
map. The ISTE map encodes both motion and the time of its occurrence as our motion feature. The extracted human shape is<br />
represented by an ellipse that provides crucial information about human motion activities. We use these features to detect<br />
the events in videos with a non-fixed frame rate. This work assumes that the person lies on the ground with very little motion<br />
after the fall accident. Experimental results show that our method is effective for fall and slip detection.<br />
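The ellipse representation of an extracted silhouette can be sketched from image moments: a tall, near-vertical ellipse suggests standing, a wide, near-horizontal one lying down. The coordinates and any threshold on the angle or axis ratio below are illustrative, not the paper's:<br />

```python
import math

def ellipse_from_points(points):
    """Centroid, major-axis orientation and axis ratio of the ellipse
    fitted to a set of foreground (x, y) pixels via image moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)  # major-axis angle
    common = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    l1 = (mu20 + mu02 + common) / 2  # variance along the major axis
    l2 = (mu20 + mu02 - common) / 2  # variance along the minor axis
    return (cx, cy), theta, math.sqrt(l1 / max(l2, 1e-12))
```

Tracking the angle and ratio over time (via the ISTE map's timing information) is what distinguishes a fall from, say, sitting down slowly.<br />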
13:30-16:30, Paper TuBCT8.21<br />
Color Constancy using Standard Deviation of Color Channels<br />
Choudhury, Anustup, Univ. of Southern California<br />
Medioni, Gerard, Univ. of Southern California<br />
We address the problem of color constancy and propose a new method to achieve it based on the statistics<br />
of images with a color cast. In such images, the standard deviation of one color channel is significantly different<br />
from those of the other color channels. This observation also applies to local patches, and the ratio of the maximum<br />
to minimum standard deviation of the color channels in local patches is used as a prior to select a pixel color as the illumination<br />
color. We provide extensive validation of our method on commonly used datasets with images under varying illumination<br />
conditions, and show our method to be robust to the choice of dataset and at least as good as current state-of-the-art color<br />
constancy approaches.<br />
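The channel-spread prior can be sketched as follows, operating on lists of RGB tuples; what counts as "significantly different" is left open here, as in the abstract:<br />

```python
def channel_stds(pixels):
    """Per-channel standard deviation of a list of (r, g, b) pixels."""
    stds = []
    for ch in range(3):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / len(vals)
        stds.append((sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5)
    return stds

def cast_score(pixels):
    """Ratio of max to min channel std; large values suggest a colour
    cast in the patch (the prior used to pick an illuminant pixel)."""
    stds = channel_stds(pixels)
    return max(stds) / max(min(stds), 1e-9)
```

A patch scoring near 1 has balanced channels, while a high score marks a patch whose dominant channel plausibly reflects the illuminant colour.<br />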
13:30-16:30, Paper TuBCT8.22<br />
Recognizing Human Actions using Key Poses<br />
Baysal, Sermetcan, Bilkent Univ.<br />
Kurt, Mehmet Can, Bilkent Univ.<br />
Duygulu, Pinar, Bilkent Univ.<br />
In this paper, we explore the idea of using only pose, without utilizing any temporal information, for human action recognition.<br />
In contrast to the other studies using complex action representations, we propose a simple method, which relies on<br />
extracting key poses from action sequences. Our contribution is two-fold. Firstly, representing the pose in a frame as a<br />
collection of line-pairs, we propose a matching scheme between two frames to compute their similarity. Secondly, to<br />
extract key poses for each action, we present an algorithm, which selects the most representative and discriminative poses<br />
from a set of candidates. Our experimental results on the KTH and Weizmann datasets have shown that pose information by<br />
itself is quite effective in grasping the nature of an action and sufficient to distinguish one action from another.<br />
13:30-16:30, Paper TuBCT8.23<br />
Action Recognition using Three-Way Cross Correlations Feature of Local Motion Attributes<br />
Matsukawa, Tetsu, Univ. of Tsukuba<br />
Kurita, Takio, National Inst. of Advanced Industrial Science and Technology<br />
This paper proposes a spatio-temporal feature using three-way cross-correlations of local motion attributes for action<br />
recognition. Recently, the cubic higher-order local auto-correlation (CHLAC) feature has shown high classification<br />
performance for action recognition. In previous research, the CHLAC feature was applied to binary motion image sequences<br />
that indicate moving or static points. However, each binary motion image loses information about the type of motion, such<br />
as the timing of change or the motion direction. We can therefore further improve classification accuracy by extending<br />
CHLAC to multivalued motion image sequences that consider several types of local motion attributes. The proposed<br />
method can also be viewed as an extension of the popular bag-of-features approach. Experimental results on two datasets<br />
show that the proposed method outperforms the CHLAC feature and the bag-of-features approach.<br />
- 134 -
13:30-16:30, Paper TuBCT8.24<br />
Discriminative Level Set for Contour Tracking<br />
Li, Wei, Chinese Acad. of Sciences<br />
Conventional contour tracking algorithms with level set often use generative models to construct the energy function. For<br />
tracking through cluttered and noisy background, however, a generative model may not be discriminative enough. In this<br />
paper we integrate the discriminative methods into a level set framework when constructing the level set energy function.<br />
We train a set of weak classifiers to distinguish the object from the background. Each weak classifier is designed to select<br />
the most discriminative feature space, and the classifiers are integrated via AdaBoost according to their training errors. We also introduce a<br />
novel interaction term to explore the correlation between pixels near the object edge. This term, together with the discriminative<br />
model, enhances the discriminative power of the level set. The experimental results show that the contour<br />
tracked by our approach is more accurate than that of conventional algorithms with generative models. Our algorithm successfully<br />
tracks the object contour even in a cluttered environment.<br />
13:30-16:30, Paper TuBCT8.25<br />
Tracking Objects with Adaptive Feature Patches for PTZ Camera Visual Surveillance<br />
Xie, Yi, Beijing Inst. of Tech.<br />
Lin, Liang, Lotushill Inc<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
Compared to traditional tracking with fixed cameras, PTZ-camera-based tracking is more challenging due to (i) the<br />
lack of reliable background modeling and subtraction, and (ii) sudden and drastic changes in the appearance and scale of the target.<br />
To tackle these problems, this paper proposes a novel tracking algorithm using patch-based object models and<br />
demonstrates its advantages with a PTZ camera in the application of visual surveillance. In our method, the target model<br />
is learned and represented by a set of feature patches whose discriminative power is higher than that of others. The target model<br />
is matched and evaluated by both appearance and motion consistency measurements. The homography between frames is<br />
also calculated for scale adaptation. Experiments on several surveillance videos show that our method outperforms<br />
state-of-the-art approaches.<br />
13:30-16:30, Paper TuBCT8.26<br />
Counting Moving People in Videos by Salient Points Detection<br />
Conte, Donatello, Univ. di Salerno<br />
Foggia, Pasquale, Univ. di Salerno<br />
Percannella, Gennaro, Univ. di Salerno<br />
Tufano, Francesco, Univ. degli Studi di Salerno<br />
Vento, Mario, Univ. degli Studi di Salerno<br />
This paper presents a novel method to count people for video surveillance applications. The problem is faced by establishing<br />
a mapping between some scene features and the number of people. Moreover, the proposed technique takes specifically<br />
into account problems due to perspective. In the experimental evaluation, the method has been compared with<br />
the algorithm by Albiol et al., which provided the highest performance at the PETS 2009 contest on people counting, using<br />
the same datasets. The results confirm that the proposed method improves the accuracy, while retaining the robustness of<br />
Albiol’s algorithm.<br />
13:30-16:30, Paper TuBCT8.27<br />
Visualization of Customer Flow in an Office Complex over a Long Period<br />
Onishi, Masaki, National Inst. of Advanced Industrial Science and Technology<br />
Yoda, Ikushi, National Inst. of Advanced Industrial Science and Technology<br />
In facility management, analysis of customer trajectories in office complexes is considered critical. In this paper, we<br />
propose a novel approach for the visualization of customer flow in an office complex over a long period of time. We expressed<br />
the variation in the trajectories with respect to time by using a mixture model; this was used for the visualization<br />
of the trajectory flows. The effectiveness of our approach was evaluated from the results of the customer flow analysis experiments<br />
that were conducted in an office complex.<br />
- 135 -
13:30-16:30, Paper TuBCT8.28<br />
Incremental MPCA for Color Object Tracking<br />
Wang, Dong, Department of Electronic Engineering<br />
Lu, Hu-Chuan, Dalian Univ. of Tech.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
The task of visual tracking is to deal with dynamic image streams that change over time. For color object tracking, although<br />
a color object is in essence a third-order tensor, little attention has been paid to this attribute. In this paper, we propose a<br />
novel Incremental Multiple Principal Component Analysis (IMPCA) method for online learning of dynamic tensor streams.<br />
When a newly added tensor set arrives, the mean tensor and the covariance matrices of the different modes can be updated easily,<br />
and the projection matrices can then be calculated efficiently from the covariance matrices. Finally, we apply our IMPCA method<br />
to color object tracking using a Bayesian inference framework. Experiments are performed on several challenging public video sequences and our<br />
own. The experimental results demonstrate that the proposed method achieves good performance.<br />
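The incremental statistics update that the abstract relies on can be sketched with the standard merge formulas for a running mean and scatter matrix (an illustrative stand-in for one mode of the tensor model, not the paper's IMPCA derivation; the function name and batch interface are hypothetical):

```python
import numpy as np

def update_mean_scatter(mean, scatter, n, new_samples):
    """Merge running statistics with a newly arrived sample batch:
    the mean vector and the scatter matrix (sum of outer products of
    centered samples) are updated without revisiting old data."""
    X = np.atleast_2d(new_samples)
    m = X.shape[0]
    batch_mean = X.mean(axis=0)
    merged_mean = (n * mean + m * batch_mean) / (n + m)
    Xc = X - batch_mean
    batch_scatter = Xc.T @ Xc
    # Cross term accounts for the shift between the two batch means
    d = (batch_mean - mean).reshape(-1, 1)
    merged_scatter = scatter + batch_scatter + (n * m / (n + m)) * (d @ d.T)
    return merged_mean, merged_scatter, n + m
```

The merged statistics equal a batch recomputation over all samples, so projection matrices can be refreshed from the updated scatter as new frames arrive.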
13:30-16:30, Paper TuBCT8.29<br />
Epipolar-Based Stereo Tracking without Explicit 3D Reconstruction<br />
Gaschler, Andre Karlheinz, Tech. Univ. München<br />
Burschka, Darius, Tech. Univ. München<br />
Hager, Gregory<br />
We present a general framework for tracking image regions in two views simultaneously based on sum-of-squared differences<br />
(SSD) minimization. Our method allows for motion models up to affine transformations. Contrary to earlier approaches, we<br />
incorporate the well-known epipolar constraints directly into the SSD optimization process. Since the epipolar geometry can<br />
be computed from the image directly, no prior calibration is necessary. Our algorithm has been tested in different applications<br />
including camera localization, wide-baseline stereo, object tracking and medical imaging. We show experimental results on<br />
robustness and accuracy compared to the known ground truth given by a conventional tracking device.<br />
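A minimal illustration of the SSD-minimization idea, restricted here to integer translations found by exhaustive search rather than the paper's affine, epipolar-constrained optimization (all names are hypothetical):

```python
import numpy as np

def ssd_track(image, template, search=5, init=(0, 0)):
    """Locate `template` in `image` by exhaustive sum-of-squared-differences
    (SSD) minimization over integer translations within +/-`search` pixels
    of the initial position `init` (row, col)."""
    th, tw = template.shape
    best, best_pos = np.inf, init
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = init[0] + dy, init[1] + dx
            # Skip candidate windows that fall outside the image
            if y < 0 or x < 0 or y + th > image.shape[0] or x + tw > image.shape[1]:
                continue
            patch = image[y:y + th, x:x + tw]
            ssd = np.sum((patch - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best
```

In practice the full method optimizes richer motion models with gradient-based updates; this brute-force sketch only shows the objective being minimized.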
13:30-16:30, Paper TuBCT8.30<br />
Human Body Parts Tracking using Sequential Markov Random Fields<br />
Cao, Xiao-Qin, City Univ. of Hong Kong<br />
Zeng, Jia, Soochow University<br />
Liu, Zhi-Qiang, City Univ. of Hong Kong<br />
Automatically tracking human body parts is a difficult problem because of background clutter, missing body parts, and the<br />
high degrees of freedom and complex kinematics of the articulated human body. This paper presents sequential Markov<br />
random fields (SMRFs) for tracking and labeling moving human body parts automatically by learning the spatio-temporal<br />
structures of human motions in the presence of occlusions and clutter. We employ a hybrid strategy, where the temporal dependencies<br />
between two successive human poses are described by a sequential Monte Carlo method, and the spatial relationships<br />
between body parts within a pose are described by Markov random fields. Efficient inference and learning algorithms<br />
are developed based on relaxation labeling. Experimental results show that the SMRF can effectively track human body<br />
parts in natural scenes.<br />
13:30-16:30, Paper TuBCT8.31<br />
Action Recognition in Videos using Nonnegative Tensor Factorization<br />
Krausz, Barbara, Fraunhofer IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Recognizing human actions is of vital interest in video surveillance or ambient assisted living. We consider an action as a<br />
sequence of body poses which are themselves a linear combination of body parts. In an offline procedure, nonnegative tensor<br />
factorization is used to extract basis images that represent body parts. The weighting coefficients are obtained by filtering a<br />
frame with the set of basis images. Since the basis images are obtained from nonnegative tensor factorization, they are separable<br />
and filtering can be implemented efficiently. The weighting coefficients encode dynamics and are used for action<br />
recognition. In the proposed action recognition framework, neither explicit detection and tracking of humans nor background<br />
subtraction is needed. Furthermore, for recognizing location-specific actions, we implicitly take scene objects into account.<br />
- 136 -
13:30-16:30, Paper TuBCT8.32<br />
Action Detection in Crowded Videos using Masks<br />
Guo, Ping, Beijing Jiaotong Univ.<br />
Miao, Zhenjiang, Beijing Jiaotong Univ.<br />
In this paper, we investigate the task of human action detection in crowded videos. Different from action analysis in clean<br />
scenes, action detection in crowded environments is difficult due to the cluttered backgrounds, high densities of people<br />
and partial occlusions. This paper proposes a method for action detection based on masks. No human segmentation or<br />
tracking technique is required. To cope with the cluttered and crowded backgrounds, shape and motion templates are built<br />
and the shape templates are used as masks for feature refining. In order to handle the partial occlusion problem, only the<br />
moving body parts in each motion are involved in action training. Experiments using our approach are conducted on the<br />
CMU dataset with encouraging results.<br />
13:30-16:30, Paper TuBCT8.33<br />
3D Model based Vehicle Tracking using Gradient based Fitness Evaluation under Particle Filter Framework<br />
Zhang, Zhaoxiang, Beihang Univ.<br />
Huang, Kaiqi, Chinese Academy of Sciences<br />
Tan, Tieniu, Chinese Academy of Sciences<br />
Wang, Yunhong, Beihang Univ.<br />
We address the problem of 3D-model-based vehicle tracking from monocular videos of calibrated traffic scenes. A 3D<br />
wire-frame model is set up as prior information, and an efficient fitness evaluation method based on image gradients is introduced<br />
to estimate the fitness score between the projection of the vehicle model and the image data, which is then incorporated<br />
into a particle-filter-based framework for robust vehicle tracking. Numerous experiments are conducted, and the experimental<br />
results demonstrate the effectiveness of our approach for accurate vehicle tracking and its robustness to noise and occlusions.<br />
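The generic predict-weight-resample cycle of the particle filter framework the abstract builds on can be sketched as follows (a 1D toy state with a Gaussian motion model and a synthetic fitness function, not the paper's vehicle model or gradient-based fitness evaluation):

```python
import numpy as np

def particle_filter_step(particles, motion_std, fitness, rng):
    """One predict-weight-resample cycle of a generic particle filter."""
    # Predict: diffuse each state hypothesis with Gaussian motion noise
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: score every particle with the fitness evaluation
    weights = np.array([fitness(p) for p in particles])
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```

In the paper's setting the state would be the vehicle pose and the fitness would come from gradient agreement between the projected wire-frame model and the image.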
13:30-16:30, Paper TuBCT8.34<br />
Recovering 3D Shape and Light Source Positions from Non-Planar Shadows<br />
Yamashita, Yukihiro, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
Recently, the Shadow Graph has been proposed for recovering 3D shapes from shadows projected on curved surfaces. Unfortunately,<br />
this method incurs a large computational cost. In this paper, we introduce the 1D Shadow Graph, which can be<br />
used for recovering 3D shapes at a much smaller computational cost. We also extend our method so that we can estimate<br />
both 3D shapes and light source positions simultaneously when both are unknown.<br />
13:30-16:30, Paper TuBCT8.35<br />
3D Contour Model Creation for Stereo-Vision Systems<br />
Maruyama, Kenichi, National Inst. of Advanced Industrial Science and Tech.<br />
Kawai, Yoshihiro, National Inst. of Advanced Industrial Science and Tech.<br />
Tomita, Fumiaki, National Inst. of Advanced Industrial Science and Tech.<br />
The present paper describes a method for automatic 3D contour model creation for stereo-vision systems. The object<br />
model is a triangular surface mesh and a set of aspect models, which consists of model features and model points. Model<br />
features and model points are generated using 3D contours, which are estimated by the projected images of the triangular<br />
surface mesh from multiple discrete viewing directions. Using a non-photorealistic rendering approach, we extract not<br />
only the outer contours but also the inner contours of the projected images. Using both the inner and outer contours of the<br />
projected images, we create the object model which has 3D inner contour features and 3D contour generator features. Experimental<br />
results obtained using the 3D localization algorithm demonstrate the effectiveness of the proposed model.<br />
- 137 -
13:30-16:30, Paper TuBCT8.36<br />
Multibody Motion Classification using the Geometry of 6 Points in 2D Images<br />
Nordberg, Klas, Linköping Univ.<br />
Zografos, Vasileios, Linköping Univ.<br />
We propose a method for segmenting an arbitrary number of moving objects using the geometry of 6 points in 2D images to<br />
infer motion consistency. This geometry allows us to determine whether or not observations of 6 points over several frames<br />
are consistent with a rigid 3D motion. The matching between observations of the 6 points and an estimated model of their<br />
configuration in 3D space is quantified in terms of a geometric error derived from distances between the points and 6 corresponding<br />
lines in the image. This leads to a simple motion inconsistency score, derived from the geometric errors of the<br />
6 points, which in the ideal case is zero when the motion of the points can be explained by a rigid 3D motion. Initial<br />
clusters are determined in the spatial domain and merged in the motion trajectory domain based on this score. Each point is then<br />
assigned to a cluster by assigning the point to the segment with the lowest score. Our algorithm has been tested with real image<br />
sequences from the Hopkins155 database with very good results, competing with state-of-the-art methods, particularly<br />
for degenerate motion sequences. In contrast to motion segmentation methods based on multi-body factorization, which assume<br />
an affine camera model, the proposed method allows the mapping from 3D space to the 2D image to be fully projective.<br />
13:30-16:30, Paper TuBCT8.37<br />
Reflection Removal in Colour Videos<br />
Conte, Donatello, Univ. di Salerno<br />
Foggia, Pasquale, Univ. di Salerno<br />
Percannella, Gennaro, Univ. di Salerno<br />
Tufano, Francesco, Univ. degli Studi di Salerno<br />
Vento, Mario, Univ. degli Studi di Salerno<br />
This paper presents a novel method for reflection removal in the context of an object detection system. The method is based<br />
on chromatic properties of the reflections and does not require a geometric model of the objects. An experimental evaluation<br />
of the proposed method has been performed on a large database, showing its effectiveness.<br />
13:30-16:30, Paper TuBCT8.38<br />
A Compound MRF Texture Model<br />
Haindl, Michael, Inst. of Information Theory and Aut.<br />
Havlicek, Vojtech, Inst. of Information Theory and Aut.<br />
This paper describes a novel compound Markov random field model capable of realistic modelling of the multispectral bidirectional<br />
texture function, which is currently the most advanced representation of the visual properties of surface materials. The<br />
proposed compound Markov random field model combines a non-parametric control random field with an analytically solvable<br />
wide-sense Markov representation for single regions, and thus avoids demanding Markov chain Monte Carlo methods<br />
for both parameter estimation and compound random field synthesis.<br />
13:30-16:30, Paper TuBCT8.39<br />
Shape Prototype Signatures for Action Recognition<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Riemenschneider, Hayko, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Recognizing human actions in video sequences is frequently based on analyzing the shape of the human silhouette as the<br />
main feature. In this paper we introduce a method for recognizing different actions by comparing signatures of similarities<br />
to pre-defined shape prototypes. In training, we build a vocabulary of shape prototypes by clustering a training set of human<br />
silhouettes and calculate prototype similarity signatures for all training videos. During testing, a prototype signature is calculated<br />
for the test video and is aligned to each training signature by dynamic time warping. A simple voting scheme over<br />
the similarities to the training videos provides action classification results and temporal alignments to the training videos.<br />
Experimental evaluation on a reference data set demonstrates that state-of-the-art results are achieved.<br />
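The dynamic time warping alignment used to compare prototype signatures can be sketched as follows (scalar per-frame signatures for brevity; the paper's signatures are vectors of similarities to shape prototypes, and the function name is hypothetical):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D signature sequences,
    allowing non-linear temporal alignment of the frames."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best alignment extends a match, an insertion, or a deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

The test signature would be aligned against each training signature this way, with a voting scheme over the resulting distances giving the action label.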
- 138 -
13:30-16:30, Paper TuBCT8.40<br />
Shape Guided Maximally Stable Extremal Region (MSER) Tracking<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Riemenschneider, Hayko, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Maximally Stable Extremal Regions (MSERs) are one of the most prominent interest region detectors in computer vision<br />
due to their powerful properties and low computational demands. In general, MSERs are detected in single images, but given<br />
image sequences as input, the repeatability of MSER detection can be improved by exploiting correspondences between<br />
subsequent frames through feature-based analysis. Such an approach fails during fast movements, in heavily cluttered scenes, and<br />
in images containing several similarly sized regions, because of the simple feature-based analysis. In this paper we propose an<br />
extension of MSER tracking that considers shape similarity as a strong cue for defining the frame-to-frame correspondences.<br />
Efficient calculation of shape similarity scores ensures that real-time capability is maintained. Experimental evaluation<br />
demonstrates improved repeatability and an application for tracking weakly textured, planar objects.<br />
13:30-16:30, Paper TuBCT8.41<br />
Locating People in Images by Optimal Cue Integration<br />
Atienza-Vanacloig, Vicente, Pol. Univ. of Valencia<br />
Rosell Ortega, Juan, Pol. Univ. of Valencia<br />
Andreu-Garcia, Gabriela, Pol. Univ. of Valencia<br />
Valiente, Jose Miguel, Pol. Univ. of Valencia<br />
This paper describes an approach to segment and locate people in crowded scenarios, with application to a surveillance system<br />
for airport dependencies. To obtain robust operation, the system analyzes a variety of visual cues (color, motion and shape)<br />
and integrates them optimally. A general method for the automatic inference of optimal cue integration rules is presented. This<br />
scheme, based on supervised training on video sequences, avoids the need to explicitly formulate combination rules based<br />
on a priori constraints. The performance of the system is at least as good as classical fusion strategies such as those based on<br />
voting, because the optimized decision engine implicitly includes these and other strategies.<br />
13:30-16:30, Paper TuBCT8.42<br />
Visual Tracking Algorithm using Pixel-Pair Feature<br />
Nishida, Kenji, National Inst. of Advanced Industrial Science and Tech.<br />
Kurita, Takio, National Inst. of Advanced Industrial Science and Tech.<br />
Ogiuchi, Yasuo, Sumitomo Electric Industries Ltd.<br />
Higashikubo, Masakatsu, Sumitomo Electric Industries Ltd.<br />
A novel visual tracking algorithm is proposed in this paper. The algorithm uses pixel-pair features to discriminate between<br />
an image patch with the object in the correct position and image patches with the object in an incorrect position. The pixel-pair<br />
feature is robust to illumination changes, and also to partial occlusion when appropriate<br />
features are selected in every video frame. The tracking precision for a deforming object (a skier) is examined, and an occlusion<br />
detection method is also described.<br />
13:30-16:30, Paper TuBCT8.43<br />
Self-Calibration of Radially Symmetric Distortion by Model Selection<br />
Fujiki, Jun, National Inst. of Advanced Industrial Science and Tech.<br />
Hino, Hideitsu, Waseda Univ.<br />
Usami, Yumi, Waseda Univ.<br />
Akaho, Shotaro, National Inst. of Advanced Industrial Science and Tech.<br />
Murata, Noboru, Waseda Univ.<br />
For the self-calibration of general radially symmetric distortion (RSD) in omnidirectional cameras such as fish-eye lenses, calibration<br />
parameters are usually estimated so that curved lines, which are supposed to be straight in the real world, are mapped<br />
to straight lines in the calibrated image, which is assumed to be taken by an ideal pin-hole camera. In this paper, a method<br />
of calibrating RSD is introduced based on the notion of principal component analysis (PCA). In the proposed method, the<br />
distortion function, which maps a distorted image to an ideal pin-hole camera image, is assumed to be a linear combination<br />
of a certain class of basis functions, and an algorithm for estimating its coefficients using line patterns is given. Then a<br />
method of selecting good basis functions is proposed, which aims to realize appropriate calibration in practice. Experimental<br />
results for synthetic data and real images are presented to demonstrate the performance of our calibration method.<br />
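The core linear-estimation idea, expressing the undistortion map as a linear combination of basis functions and solving for the coefficients by least squares, can be sketched with a monomial basis (an illustrative choice on synthetic radius samples; the paper's basis class, line-pattern constraints, and model selection step are not reproduced here):

```python
import numpy as np

def fit_distortion(r_d, r_u, degree=3):
    """Fit the undistortion map r_u = sum_k c_k * r_d**k (k = 1..degree)
    as a linear combination of monomial basis functions via least squares."""
    # Stack the basis function evaluations as columns of the design matrix
    B = np.stack([r_d ** k for k in range(1, degree + 1)], axis=1)
    coeffs, *_ = np.linalg.lstsq(B, r_u, rcond=None)
    return coeffs

# Synthetic check: samples generated from a known cubic distortion model
r_d = np.linspace(0.0, 1.0, 50)
true_c = np.array([1.0, 0.0, 0.25])   # r_u = r_d + 0.25 * r_d**3
r_u = r_d + 0.25 * r_d ** 3
c = fit_distortion(r_d, r_u)
```

In the actual method the coefficients are constrained by observed line patterns rather than known radius pairs; the linear-in-coefficients structure is what makes that estimation tractable.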
13:30-16:30, Paper TuBCT8.44<br />
A Global Spatio-Temporal Representation for Action Recognition<br />
Deng, Chao, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
Liu, Hanyu, Univ. of Southern Mississippi<br />
Chen, Jian, Univ. of Southern Mississippi<br />
In this paper we introduce an effective method to construct a global spatio-temporal representation for action recognition.<br />
This representation is inspired by the fact that human actions can be treated as 3D shapes induced by the silhouettes in the<br />
space-time volume. We estimate the silhouettes, which contain detailed shape information about the action, and present an<br />
efficient sampling method to extract interest points along the silhouettes. Each local interest point is represented by a spatio-temporal<br />
descriptor based on 2D DAISY. Our global space-time representation is the integration of these local descriptors<br />
in order along the silhouette. In this manner, we utilize not only the static shape information but also the spatio-temporal<br />
cue. We have obtained impressive results on publicly available action datasets.<br />
13:30-16:30, Paper TuBCT8.45<br />
Super-Resolution Texture Mapping from Multiple View Images<br />
Iiyama, Masaaki, Kyoto Univ.<br />
Kakusho, Koh, Kwansei Gakuin Univ.<br />
Minoh, Michihiko, Kyoto Univ.<br />
This paper presents artifact-free super-resolution texture mapping from multiple-view images. The multiple-view images<br />
are upscaled with a learning-based super-resolution technique and are mapped onto a 3D mesh model. However, mapping<br />
multiple-view images onto a 3D model is not an easy task, because artifacts may appear when different upscaled images are<br />
mapped onto neighboring meshes. We define a cost function that becomes large when artifacts appear on neighboring meshes,<br />
and our method seeks the image-and-mesh assignment that minimizes the cost function. Experimental results with real images<br />
demonstrate the effectiveness of our method.<br />
13:30-16:30, Paper TuBCT8.46<br />
Automatic Weak Calibration of Master-Slave Surveillance System based on Mosaic Image<br />
Li, You, Shanghai Jiao Tong University<br />
Song, Li, Shanghai Jiao Tong University<br />
Wang, Jia, Shanghai Jiao Tong University<br />
A master-slave camera surveillance system is composed of one (or more) wide-FOV (field-of-view) static camera and one (or<br />
more) dynamic PTZ (pan-tilt-zoom) camera. In such a system, the master camera monitors a wide field and provides positional<br />
information about objects of interest to the slave camera so that it can dynamically track them. This paper describes a novel method<br />
for the calibration of master-slave surveillance systems. The method uses a mosaic image created from snapshots of the slave camera to estimate<br />
the relationship between the static master camera plane and the pan-tilt controls of the slave camera. Compared with other approaches,<br />
this solution provides an efficient and automatic way to calibrate a master-slave system.<br />
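The final fitting step of such a calibration, mapping master-camera pixels to slave pan-tilt controls once correspondences are available, can be sketched as a least-squares affine fit (a simplification of the mosaic-based estimation described above; the affine form, the function name, and the sample values are assumptions):

```python
import numpy as np

def fit_pixel_to_pantilt(pixels, pantilts):
    """Least-squares affine map [pan, tilt]^T = M @ [u, v, 1]^T from
    corresponding master-camera pixels and slave pan/tilt readings."""
    # Homogeneous pixel coordinates: append a column of ones
    P = np.hstack([pixels, np.ones((len(pixels), 1))])
    M, *_ = np.linalg.lstsq(P, pantilts, rcond=None)
    return M.T  # 2 x 3 affine calibration matrix

# Hypothetical correspondences generated from a known ground-truth map
pixels = np.array([[100.0, 50.0], [400.0, 60.0], [120.0, 300.0], [380.0, 310.0]])
M_true = np.array([[0.1, 0.0, -20.0], [0.0, 0.08, -5.0]])
pantilts = np.hstack([pixels, np.ones((4, 1))]) @ M_true.T
M = fit_pixel_to_pantilt(pixels, pantilts)
```

A real system would replace the affine form with the homography/pan-tilt relationship estimated from the mosaic; the least-squares machinery stays the same.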
13:30-16:30, Paper TuBCT8.47<br />
Reconstruction-Free Parallel Planes Identification from Uncalibrated Images<br />
Habed, Adlane, Univ. de Bourgogne<br />
Amintabar, Amirhasan, Univ. of Windsor<br />
Boufama, Boubakeur, Univ. of Windsor<br />
This paper proposes a new method for identifying parallel planes in a scene from three or more uncalibrated images. By<br />
using the fact that parallel planes intersect at infinity, we were able to devise a linear relationship between the inter-image<br />
homographies of the parallel planes and the plane at infinity. This relationship is combined with the so-called modulus constraint<br />
for identifying pairs of parallel planes solely from point correspondences. Experiments with both synthetic and real<br />
images have validated our method.<br />
- 140 -
13:30-16:30, Paper TuBCT8.48<br />
Accurate Dense Stereo by Constraining Local Consistency on Superpixels<br />
Mattoccia, Stefano, Univ. of Bologna<br />
Segmentation is a low-level vision cue often deployed by stereo algorithms to assume that disparity within superpixels<br />
varies smoothly. In this paper, we show that constraining, on a superpixel basis, the cues provided by a recently proposed<br />
technique, which explicitly models local consistency among neighboring points, yields accurate and dense disparity fields.<br />
Our proposal, starting from the initial disparity hypotheses of a fast dense stereo algorithm based on scan line optimization,<br />
demonstrates its effectiveness by enabling us to obtain results comparable to top-ranked algorithms based on iterative disparity<br />
optimization methods.<br />
13:30-16:30, Paper TuBCT8.49<br />
On-Line Structure and Motion Estimation based on a Novel Parameterized Extended Kalman Filter<br />
Haner, Sebastian, Lund Univ. of Tech.<br />
Heyden, Anders, Lund Univ.<br />
Estimation of structure and motion in computer vision systems can be performed using a dynamic systems approach,<br />
where states and parameters in a perspective system are estimated. We present a novel on-line method for structure and<br />
motion estimation in densely sampled image sequences. The proposed method is based on an extended Kalman filter and<br />
a novel parameterization. We assume calibrated cameras and derive a dynamic system describing the motion of the camera<br />
and the image formation. By a change of coordinates, we represent this system by normalized image coordinates and the<br />
inverse depths. Then we apply an extended Kalman filter for estimation of both structure and motion. The performance of<br />
the proposed method is demonstrated in both simulated and real experiments. We furthermore compare our method to the<br />
unified inverse depth parameterization and show that we achieve superior results.<br />
13:30-16:30, Paper TuBCT8.51<br />
Discriminant and Invariant Color Model for Tracking under Abrupt Illumination Changes<br />
Scandaliaris, Jorge, CSIC-UPC<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
The output of a color imaging sensor, or apparent color, can change considerably with illumination conditions and<br />
scene geometry. In this work we take into account the dependence of apparent color on illumination in an attempt<br />
to find appropriate color models for the typical conditions found in outdoor settings. We evaluate three color-based trackers:<br />
one based on hue, another based on an intrinsic image representation, and a third based on a proposed combination of<br />
a chromaticity model with a physically reasoned adaptation of the target model. The evaluation is done on outdoor sequences<br />
with challenging illumination conditions, and shows that the proposed method improves the average track completeness<br />
by over 22% over the hue-based tracker and the track closeness by over 7% over the tracker based on the<br />
intrinsic image representation.<br />
13:30-16:30, Paper TuBCT8.52<br />
Using Local Affine Invariants to Improve Image Matching<br />
Fleck, Daniel, George Mason Univ.<br />
Duric, Zoran, George Mason Univ.<br />
A method to classify tentative feature matches as inliers or outliers to a transformation model is presented. It is well known<br />
that ratios of areas of corresponding shapes are affine invariants [6]. Our algorithm uses consistency of ratios of areas in<br />
pairs of images to classify matches as inliers or outliers. The method selects four matches within a region, and generates<br />
all possible corresponding triangles. All matches are classified as inliers or outliers based on the variance among the ratio<br />
of areas of the triangles. The selected inliers are used to compute a homography transformation. We present experimental<br />
results showing significant improvements over the baseline RANSAC algorithm for pairs of images from the Zurich Building<br />
Database.<br />
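The affine invariance the method relies on is easy to verify numerically: under any affine map, every triangle's area scales by the same factor |det A|, so ratios of areas are preserved (a minimal sketch with arbitrary points, not the paper's match-classification procedure):

```python
import numpy as np

def tri_area(p, q, r):
    """Signed area of triangle pqr via the 2D cross product."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

# Four points and an arbitrary affine map x -> A x + t
pts = np.array([[0.0, 0.0], [2.0, 0.5], [1.0, 3.0], [4.0, 2.0]])
A = np.array([[1.3, 0.4], [-0.2, 0.9]])
t = np.array([5.0, -2.0])
mapped = pts @ A.T + t

# Ratio of areas of two triangles built from the four points:
# both areas scale by det(A) under the map, so the ratio cancels
r_before = tri_area(*pts[[0, 1, 2]]) / tri_area(*pts[[0, 1, 3]])
r_after = tri_area(*mapped[[0, 1, 2]]) / tri_area(*mapped[[0, 1, 3]])
```

In the algorithm, low variance among such ratios over the triangles of four matched points marks the matches as inliers to an affine model.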
- 141 -
13:30-16:30, Paper TuBCT8.53<br />
Segmenting Video Foreground using a Multi-Class MRF<br />
Dickinson, Patrick, Univ. of Lincoln<br />
Hunter, Andrew, Univ. of Lincoln<br />
Appiah, Kofi, Lincoln Univ.<br />
Methods of segmenting objects of interest from video data typically use a background model to represent an empty, static<br />
scene. However, dynamic processes in the background, such as moving foliage and water, can act to undermine the robustness<br />
of such methods and result in false positive object detections. Techniques for reducing errors have been proposed,<br />
including Markov Random Field (MRF) based pixel classification schemes, and also the use of region-based models. The<br />
work we present here combines these two approaches, using a region-based background model to provide robust likelihoods<br />
for multi-class MRF pixel labelling. Our initial results show the effectiveness of our method, by comparing performance<br />
with an analogous per-pixel likelihood model.<br />
13:30-16:30, Paper TuBCT8.54<br />
Real-Time Pose Regression with Fast Volume Descriptor Computation<br />
Hirai, Michiro, NAIST<br />
Ukita, Norimichi, Nara Inst. of Science and Tech.<br />
Kidode, Masatsugu, NAIST<br />
We present a real-time method for estimating the pose of a human body using its 3D volume obtained from synchronized<br />
videos. The method achieves pose estimation by pose regression from its 3D volume. While the 3D volume allows us to<br />
estimate the pose robustly against self-occlusions, 3D volume analysis incurs a high computational cost. We<br />
propose fast and stable volume tracking with efficient volume representation in a low dimensional dynamical model. Experimental<br />
results demonstrated that pose estimation of a body with a significantly deformable clothing could run at around<br />
60 fps.<br />
TuBCT9 Lower Foyer<br />
Document Analysis Poster Session<br />
Session chair: Arica, Nafiz (Turkish Naval Academy)<br />
13:30-16:30, Paper TuBCT9.1<br />
Robust Staffline Thickness and Distance Estimation in Binary and Gray-Level Music Scores<br />
Cardoso S., Jaime, Univ. do Porto<br />
Silva, Rebelo, Ana Maria, Univ. do Porto<br />
The optical recognition of handwritten musical scores by computers remains far from ideal. Most OMR algorithms rely<br />
on an estimation of the staff line thickness and the vertical line distance within the same staff. Subsequent operations can<br />
use these values as references, dismissing the need for predetermined threshold values. In this work we improve on<br />
previous conventional estimates of these two reference lengths. We start by proposing a new method for binarized music<br />
scores and then extend the approach to gray-level music scores. An experimental study with 50 images is used to assess<br />
the merits of the novel method.<br />
13:30-16:30, Paper TuBCT9.2<br />
Hierarchical Decomposition of Handwriting Deformation Vector Field for Improving Recognition Accuracy<br />
Wakahara, Toru, Hosei Univ.<br />
Uchida, Seiichi, Kyushu Univ.<br />
This paper addresses the problem of how to extract, describe, and evaluate handwriting deformation from the deterministic<br />
viewpoint for improving recognition accuracy. The key ideas are threefold. The first is to extract handwriting deformation<br />
vector field (DVF) between a pair of input and target images by 2D warping. The second is to hierarchically decompose<br />
the DVF by a parametric deformation model of global/local affine transformation, where local affine transformation is iteratively<br />
applied to the DVF by decreasing window sizes. The third is to accept only low-order deformation components<br />
as natural, within-class handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B<br />
show that correlation-based matching absorbing components of global affine transformation and local affine transformation<br />
up to the 3rd order achieved a recognition rate of 92.1%, higher than the 87.0% obtained by the original 2D warping.<br />
13:30-16:30, Paper TuBCT9.3<br />
Prototype-Based Methodology for the Statistical Analysis of Local Features in Stereotypical Handwriting Tasks<br />
O’Reilly, Christian, École Pol. De Montreal<br />
Plamondon, Réjean, École Pol. De Montréal<br />
A three-step methodology is proposed to derive consistent sets of local features which may be easily compared between<br />
the different samples of a stereotypical human handwriting movement, allowing the statistical analysis of its local variability.<br />
This technique is illustrated using the Sigma-Lognormal modeling of on-line triangular trajectory patterns obtained from<br />
a standardized neuromuscular task. The overall approach can be adapted and generalized to the analysis of the end-effector<br />
kinematics of many planar upper limb movements.<br />
13:30-16:30, Paper TuBCT9.4<br />
The Snippet Statistics of Font Recognition<br />
Lidke, Jakub, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
This paper considers the topic of automatic font recognition. The task is to recognize a specific font from a text snippet.<br />
Unlike previous contributions, we evaluate how the frequencies of certain letters or words influence automatic recognition<br />
systems. The evaluation provides estimates on the general feasibility of font recognition under various changing conditions.<br />
Results on a data-set containing 747 different fonts show that precision can vary between 16% and 94%, depending on<br />
(i) which letters are provided, (ii) how many letters are provided, and (iii) which language is used, as these factors considerably<br />
influence the text snippet statistics. As a second contribution, we introduce a novel bag-of-features based approach<br />
to font recognition.<br />
13:30-16:30, Paper TuBCT9.5<br />
A Study of Designing Compact Recognizers of Handwritten Chinese Characters using Multiple-Prototype based<br />
Classifiers<br />
Wang, Yongqiang, The Univ. of Hong Kong<br />
Huo, Qiang, Microsoft Res. Asia<br />
We present a study of designing compact recognizers of handwritten Chinese characters using multiple-prototype based<br />
classifiers. A modified Quickprop algorithm is proposed to optimize a sample-separation-margin based minimum classification<br />
error objective function. Split vector quantization technique is used to compress classifier parameters. Benchmark<br />
results are reported for classifiers with different footprints trained from about 10 million samples on a recognition task<br />
with a vocabulary of 9282 character classes which include 9119 Chinese characters, 62 alphanumeric characters, 101<br />
punctuation marks and symbols.<br />
13:30-16:30, Paper TuBCT9.6<br />
Membership Functions for Zoning-Based Recognition of Handwritten Digits<br />
Impedovo, Sebastiano, Univ. degli Studi di Bari<br />
Impedovo, Donato, Pol. Di Bari<br />
Pirlo, Giuseppe, Univ. degli Studi di Bari<br />
Modugno, Raffaele, Univ. of Bari “Aldo Moro”<br />
This paper focuses on the role of membership functions in zoning-based classification. In fact, the effectiveness of a zoning<br />
method depends not only on the way in which the pattern image is partitioned by the zoning, but also on the criteria<br />
adopted to define the way in which a feature influences the diverse zones. For this purpose, an experimental investigation<br />
is presented that focuses on the most effective way in which a feature spreads its influence on the zones of the pattern image.<br />
The experimental tests have been carried out in the field of handwritten digit recognition, using the numeral digits of the<br />
CEDAR database. The results point out that the membership function has paramount relevance for classification performance<br />
and demonstrate that the exponential model outperforms other membership functions.<br />
13:30-16:30, Paper TuBCT9.7<br />
Scribe Identification in Medieval English Manuscripts<br />
Gilliam, Tara, Univ. of York<br />
Wilson, Richard, Univ. of York<br />
Clark, John A., Univ. of York<br />
In this paper we present work on automated scribe identification on a new Middle-English manuscript dataset from around<br />
the 14th–15th centuries. We discuss the image and textual problems encountered in processing historical documents, and<br />
demonstrate the effect of accounting for manuscript style on the writer identification rate. The grapheme codebook method<br />
is used to achieve a Top-1 classification accuracy of up to 77% with a modification to the distance measure. The performance<br />
of the Sparse Multinomial Logistic Regression classifier is compared against five k-nn classifiers. We also consider<br />
classification against the principal components and propose a method for visualising the principal component vectors in<br />
terms of the original grapheme features.<br />
13:30-16:30, Paper TuBCT9.8<br />
Recognition of Handwritten Arabic (Indian) Numerals using Freeman’s Chain Codes and Abductive Network Classifier<br />
Lawal, Isah Abdullahi, King Fahd Univ. of Petroleum & Minerals<br />
Abdel-Aal, Radwan E., King Fahd Univ. of Petroleum & Minerals<br />
Mahmoud, Sabri A., King Fahd Univ. of Petroleum & Minerals<br />
Accurate automatic recognition of handwritten Arabic numerals has several important applications, e.g. in banking transactions,<br />
automation of postal services, and other data entry related applications. A number of modelling and machine learning<br />
techniques have been used for handwritten Arabic numeral recognition, including neural networks, support vector<br />
machines, and hidden Markov models. This paper proposes applying abductive networks to the problem. We studied the<br />
performance of an abductive network architecture on a dataset of 21120 samples of handwritten 0-9 digits produced by 44<br />
writers. We developed a new feature set using histograms of contour points chain codes. Recognition rates as high as<br />
99.03% were achieved, which surpass the performance reported in the literature for other recognition techniques on the<br />
same data set. Moreover, the technique achieves a significant reduction in the number of features required.<br />
13:30-16:30, Paper TuBCT9.9<br />
A SVM-HMM based Online Classifier for Handwritten Chemical Symbols<br />
Zhang, Yang, Nankai Univ.<br />
Shi, Guangshun, Nankai Univ.<br />
Wang, Kai, Nankai Univ.<br />
This paper presents a novel double-stage classifier for the task of handwritten chemical symbol recognition. The first stage is<br />
rough classification, in which an SVM is used to distinguish non-ring structure (NRS) and organic ring structure (ORS) symbols,<br />
while an HMM is used for fine recognition in the second stage. A point-sequence-reordering algorithm is proposed<br />
to improve the recognition accuracy of ORS symbols. Our test data set contains 101 chemical symbols, 9090 training<br />
samples and 3232 test samples. Finally, we obtained top-1 accuracy of 93.10% and top-3 accuracy of 98.08% based on<br />
the test data set.<br />
13:30-16:30, Paper TuBCT9.10<br />
Symbol Recognition Combining Vectorial and Pixel-Level Features for Line Drawings<br />
Su, Feng, Nanjing Univ.<br />
Lu, Tong, Nanjing Univ.<br />
Yang, Ruoyu, Nanjing Univ.<br />
In this paper, we present an approach for symbol representation and recognition in line drawings, integrating both the vector-based<br />
structural description and pixel-level statistical features of the symbol. For the former, a vectorial template is<br />
defined on the basis of the vectorization model and exploited in segmenting symbols from the line network. For the latter,<br />
a Radon-transform-based signature is employed to characterize shapes at the symbol and component levels. Experimental<br />
results on real technical drawings are presented to show the promising aspect of our approach.<br />
13:30-16:30, Paper TuBCT9.11<br />
Writing Order Recovery from Off-Line Handwriting by Graph Traversal<br />
Cordella, Luigi P., Univ. di Napoli Federico II<br />
De Stefano, Claudio, Univ. di Napoli Federico II<br />
Marcelli, Angelo, Univ. of Salerno<br />
Santoro, Adolfo, Univ. of Salerno<br />
We present a method to recover the dynamic writing order from static images of handwriting. The static handwriting is<br />
initially represented by its skeleton, which is then converted into a graph, whose arcs correspond to the skeleton branches,<br />
and nodes to either end points or branch points of the skeleton. Criteria derived from handwriting generation are then applied<br />
to transform the graph in such a way that all its nodes, but the first and the last, have an even degree, so that it can be traversed<br />
from the first to the last by using Fleury's algorithm. The experimental results show that combining criteria derived<br />
from handwriting generation models with graph traversal makes it possible to reconstruct the original sequence produced by a<br />
writer even in the case of complex handwriting, i.e., handwriting with retracing, crossings and pen-ups.<br />
13:30-16:30, Paper TuBCT9.12<br />
Holistic Urdu Handwritten Word Recognition using Support Vector Machine<br />
Sagheer, Malik Waqas, CENPARMI, Concordia Univ.<br />
He, Chun Lei, Concordia Univ.<br />
Nobile, Nicola, Concordia Univ. CENPARMI<br />
Suen, Ching Y.<br />
Since the Urdu language has more isolated letters than Arabic and Farsi, research on Urdu handwritten words is desirable.<br />
We present a novel approach that uses compound features and a Support Vector Machine (SVM) for offline Urdu word recognition.<br />
Due to the cursive style of Urdu, a classification using a holistic approach is efficiently adopted. Compound feature<br />
sets, which combine structural and gradient (directional) features, are extracted from each Urdu word. Experiments<br />
have been conducted on the CENPARMI Urdu Words Database, and a high recognition accuracy of 97.00% has been<br />
achieved.<br />
13:30-16:30, Paper TuBCT9.13<br />
A Framework for the Combination of Different Arabic Handwritten Word Recognition Systems<br />
El Abed, Haikal, Braunschweig Tech. Univ.<br />
Märgner, Volker, Braunschweig Tech. Univ.<br />
In this paper we present a framework for the combination of different Arabic handwritten word recognition systems<br />
to achieve a decision with a higher performance. This performance can be expressed by lower rejection rates and higher<br />
recognition rates. The methods used range from voting schemes based on the results of different recognizers to a neural network<br />
decision based on normalized confidences. This work presents an extension of the well-known combination methods to<br />
a large lexicon, from a maximum of 30 classes (e.g., 10 classes for digit classification) to 937 classes for the<br />
IfN/ENIT-database. In addition, different reject rules based on the evaluation and analysis of individual and combined<br />
system outputs are discussed. Different threshold functions for reject levels are tested and evaluated. Tests with a set of<br />
recognizers that participated in the ICDAR 2007 competition, based on a set coming from the IfN/ENIT-database,<br />
show a word error rate (WER) of 5.29% without reject and, with a reject rate of less than 25%, even a word error rate of<br />
less than 1%.<br />
13:30-16:30, Paper TuBCT9.15<br />
Degraded Character Recognition by Image Quality Evaluation<br />
Liu, Chunmei, Tongji Univ.<br />
Character image quality plays an important role in degraded character recognition, as it indicates the recognition<br />
difficulty. This paper proposes a novel approach to degraded character recognition based on three kinds of independent degradation<br />
sources. It is composed of two stages: character image quality evaluation and character recognition. Firstly, it presents<br />
a dual-evaluation scheme to evaluate the image quality of the input character. Secondly, according to the evaluation result,<br />
the appropriate character recognition sub-systems are invoked adaptively. These sub-systems are trained on character sets whose image<br />
qualities are similar to the input's quality, and have dedicated features and classifiers respectively. Experimental results<br />
demonstrate that the proposed approach greatly improves the performance of a degraded character recognition system.<br />
13:30-16:30, Paper TuBCT9.16<br />
Offline Arabic Handwriting Identification using Language Diacritics<br />
Lutf, Mohammed, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Li, Hong, Huazhong Univ. of Science and Tech.<br />
In this paper, we present an approach for writer identification using off-line Arabic handwriting. The proposed method introduces<br />
Arabic writing in a new form, representing it by its basic components instead of its alphabet. We<br />
split the input document into two parts: one for the letters and the other for the diacritics. We extract all diacritics from the<br />
input image, calculate the LBP histogram for each diacritic, and concatenate these histograms to use them as handwriting<br />
features. We use the IFN/ENIT database in the experiments reported here and our tests involve 287 writers. The results<br />
show that our method is very effective and makes handling Arabic handwriting easier than before.<br />
13:30-16:30, Paper TuBCT9.17<br />
Removing Rule-Lines from Binary Handwritten Arabic Document Images using Directional Local Profile<br />
Shi, Zhixin, SUNY at Buffalo<br />
Setlur, Srirangaraj, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
In this paper, we present a novel approach for detecting and removing pre-printed rule-lines from binary handwritten<br />
Arabic document images. The proposed technique is based on a directional local profiling approach for the detection of<br />
the rule-line locations. Then a refined adaptive vertical run-length search is designed to remove the rule-line pixels<br />
without much damage to the text. The method is also tolerant to variations in the rule-lines such as broken lines, orientation<br />
changes and variation in the thickness of the rule-lines. Analysis of experimental results on the DARPA MADCAT Arabic<br />
handwritten document data indicates that the method is robust and is capable of correctly removing rule-lines.<br />
13:30-16:30, Paper TuBCT9.18<br />
A Bag-of-Pages Approach to Unordered Multi-Page Document Classification<br />
Gordo, Albert, Univ. Autònoma de Barcelona<br />
Perronnin, Florent, Xerox Res. Centre Europe<br />
We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a<br />
novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook<br />
of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider<br />
several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly<br />
outperforms a baseline system.<br />
13:30-16:30, Paper TuBCT9.19<br />
Fast Seamless Skew and Orientation Detection in Document Images<br />
Konya, Iuliu Vasile, Fraunhofer IAIS<br />
Eickeler, Stefan, Fraunhofer IAIS<br />
Seibert, Christoph, Fraunhofer IAIS<br />
Reliable and generic methods for skew detection are a necessity for any large-scale digitization project. As one of the<br />
first processing steps, skew detection and correction have a heavy influence on all further document analysis modules, such<br />
as geometric and logical layout analysis. This paper introduces a generic, scale-independent algorithm capable of accurately<br />
detecting the global skew angle of document images within the range [-90,90] degrees. By using the same framework, the<br />
algorithm is then extended for Roman script documents so as to cope with the full range [-180,180) degrees of possible<br />
skew angles. Despite its generality, the improved algorithm is very fast and requires no explicit parameters. Experiments<br />
on a combined test set comprising around 110000 real-life images show the accuracy and robustness of the proposed<br />
method.<br />
13:30-16:30, Paper TuBCT9.20<br />
Unsupervised Block Covering Analysis for Text-Line Segmentation of Arabic Ancient Handwritten Document Images<br />
Boussellaa, Wafa, Univ. of Sfax<br />
Zahour, Abderrazek, Havre Univ.<br />
El Abed, Haikal, Braunschweig Tech. Univ.<br />
Benabdelhafid, Abdellatif, Havre Univ.<br />
Alimi, Adel M., Univ. of Sfax<br />
This paper presents a new method for automatic text-line extraction from Arabic historical handwritten documents that present<br />
overlapping and multi-touching character problems. Our approach is based on block covering analysis using an unsupervised<br />
technique. The algorithm first performs a statistical block analysis which computes the optimal number of vertical strips for document<br />
decomposition. Then, our algorithm achieves a fuzzy baseline detection using the fuzzy C-means algorithm. Finally,<br />
blocks are assigned to their corresponding lines. Experimental results show that the proposed method achieves a high accuracy<br />
of about 95% for detecting text lines in Arabic historical handwritten document images written in different scripts.<br />
13:30-16:30, Paper TuBCT9.21<br />
A Bi-Modal Handwritten Text Corpus: Baseline Results<br />
Pastor, Moises, Univ. Pol. De Valencia<br />
Toselli, Alejandro Héctor, Univ. Pol. De Valencia<br />
Casacuberta, Francisco, Univ. Pol. De Valencia<br />
Vidal, Enrique, Univ. Pol. De Valencia<br />
Handwritten text is generally captured through two main modalities: off-line and on-line. Smart approaches to handwritten<br />
text recognition (HTR) may take advantage of both modalities if they are available. This is for instance the case in computer-assisted<br />
transcription of text images, where on-line text can be used to interactively correct errors made by a main offline<br />
HTR system. We present here baseline results on the biMod-IAM-PRHLT corpus, which was recently compiled for<br />
experimentation with techniques aimed at solving the proposed multi-modal HTR problem, and is being used in one of the<br />
official ICPR-2010 contests.<br />
13:30-16:30, Paper TuBCT9.22<br />
Feature Selection using Multiobjective Optimization for Named Entity Recognition<br />
Ekbal, Asif, Univ. of Heidelberg<br />
Saha, Sriparna, Univ. of Heidelberg<br />
Garbe, Christoph S., Univ. of Heidelberg<br />
Appropriate feature selection is a very crucial issue in any machine learning framework, especially in Maximum Entropy<br />
(ME). In this paper, the selection of appropriate features for constructing an ME based Named Entity Recognition (NER) system<br />
is posed as a multiobjective optimization (MOO) problem. Two classification quality measures, namely recall and precision,<br />
are simultaneously optimized using the search capability of a popular evolutionary MOO technique, NSGA-II. The<br />
proposed technique is evaluated to determine suitable feature combinations for NER in two languages, namely Bengali and<br />
English that have significantly different characteristics. Evaluation results yield the recall, precision and F-measure values<br />
of 70.76%, 81.88% and 75.91%, respectively for Bengali, and 78.38%, 81.27% and 79.80%, respectively for English. Comparison<br />
with an existing ME based NER system shows that our proposed feature selection technique is more efficient than<br />
heuristic-based feature selection.<br />
13:30-16:30, Paper TuBCT9.23<br />
Redif Extraction in Handwritten Ottoman Literary Texts<br />
Can, Ethem Fatih, Bilkent Univ.<br />
Duygulu, Pinar, Bilkent Univ.<br />
Can, Fazli, Bilkent Univ.<br />
Kalpakli, Mehmet, Bilkent Univ.<br />
Repeated patterns, rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They provide<br />
integrity of a poem by connecting its parts and bring a melody to its voice. In Ottoman literature, poets wrote their works by<br />
making use of the rhymes and redifs of previous poems according to the nazire (creative imitation) tradition either to prove<br />
their expertise or to show respect towards old masters. Automatic recognition of redifs would provide important data mining<br />
opportunities in literary analyses of Ottoman poetry, the majority of which is in handwritten form. In this study, we propose<br />
a matching criterion and a method, Redif Extraction using Contour Segments (RECS), that uses this criterion to<br />
detect redifs in handwritten Ottoman literary texts using only visual analysis. Our method provides a success rate of<br />
0.682 on a test collection of 100 poems.<br />
13:30-16:30, Paper TuBCT9.24<br />
Analysis of Local Features for Handwritten Character Recognition<br />
Uchida, Seiichi, Kyushu Univ.<br />
Liwicki, Marcus, DFKI<br />
This paper investigates a part-based recognition method of handwritten digits. In the proposed method, the global structure<br />
of digit patterns is discarded by representing each pattern by just a set of local feature vectors. The method then comprises<br />
two steps. First, each of the J local feature vectors of a target pattern is classified into one of ten categories (``0''-``9'') by<br />
nearest neighbor discrimination with a large database of reference vectors. Second, the category of the target pattern is<br />
determined by majority voting on the J local recognition results. Despite a pessimistic expectation, we have reached<br />
recognition rates much higher than 90% for the task of digit recognition.<br />
13:30-16:30, Paper TuBCT9.25<br />
Detect Visual Spoofing in Unicode-Based Text<br />
Qiu, Bite, City Univ. of Hong Kong<br />
Fang, Ning, City Univ. of Hong Kong<br />
Liu, Wenyin, City U of HK<br />
Visual spoofing in Unicode-based text is anticipated to become a severe web security problem in the near future as more and more<br />
Unicode-based web documents come into use. In this paper, to detect whether a suspicious Unicode character in a word is<br />
visual spoofing or not, the context of the suspicious character is utilized by employing a Bayesian framework. Specifically,<br />
two contexts are taken into consideration: simple context and general context. Simple context of a suspicious character is<br />
the word where the character exists while general context consists of all homoglyphs of the character within Universal Character<br />
Set (UCS). Three decision rules are designed and used jointly for convicting a suspicious character. Preliminary evaluations<br />
and user study show that the proposed approach can detect Unicode-based visual spoofing with high effectiveness<br />
and efficiency.<br />
13:30-16:30, Paper TuBCT9.26<br />
Comparing Several Techniques for Offline Recognition of Printed Mathematical Symbols<br />
Álvaro, Francisco, Inst. Tecnológico de Informática<br />
Sánchez, Joan Andreu, Univ. Pol. De Valencia<br />
Automatic recognition of printed mathematical symbols is a fundamental problem in the recognition of mathematical expressions.<br />
Several classification techniques have been used previously, but there are very few works that compare different classification<br />
techniques on the same database and under the same experimental conditions. In this work we have tested classical<br />
and novel classification techniques for mathematical symbol recognition on two databases.<br />
13:30-16:30, Paper TuBCT9.27<br />
Symbol Classification using Dynamic Aligned Shape Descriptor<br />
Fornés, Alicia, Computer Vision Center<br />
Escalera, Sergio, UB<br />
Llados, Josep, Computer Vision Center<br />
Valveny, Ernest, Univ. Autònoma de Barcelona<br />
Shape representation is a difficult task because of several symbol distortions, such as occlusions, elastic deformations, gaps<br />
or noise. In this paper, we propose a new descriptor and distance computation for coping with the problem of symbol recognition<br />
in the domain of Graphical Document Image Analysis. The proposed D-Shape descriptor encodes the arrangement information<br />
of object parts in a circular structure, allowing different levels of distortion. The classification is performed using<br />
a cyclic Dynamic Time Warping based method, allowing distortions and rotation. The methodology has been validated on<br />
different data sets, showing very high recognition rates.<br />
13:30-16:30, Paper TuBCT9.28<br />
Document Logo Detection and Recognition using Bayesian Model<br />
Wang, Hongye, Tsinghua Univ.<br />
Chen, Youbin, Tsinghua Univ.<br />
This paper presents a simple, dynamic approach to logo detection and recognition in document images. Although there<br />
is literature on both logo detection and logo recognition, current methods lack the adaptability to variable real-world<br />
documents. In this paper we first observe this deficiency from a different point of view and reveal its inherent<br />
cause. Then we reorganize the structure of the logo detection and recognition procedures and integrate them into a<br />
unified framework. By applying feedback and selecting proper features, we make our framework dynamic and interactive.<br />
Experiments show that the proposed method outperforms existing methods in document processing domain.<br />
13:30-16:30, Paper TuBCT9.29<br />
An Efficient Staff Removal Approach from Printed Musical Documents<br />
Dutta, Anjan, Univ. Autonoma de Barcelona<br />
Pal, Umapada, Indian Statistical Inst.<br />
Fornés, Alicia, Computer Vision Center<br />
Llados, Josep, Computer Vision Center<br />
Staff removal is an important preprocessing step in Optical Music Recognition (OMR). The process aims to remove<br />
the stafflines from a musical document and retain only the musical symbols; these symbols are later used effectively to<br />
identify the music information. This paper proposes a simple but robust method to remove stafflines from printed musical<br />
scores. In the proposed methodology we have considered a staffline segment as a horizontal linkage of vertical black runs<br />
with uniform height. We have used the neighbouring properties of a staffline segment to validate it as a true segment. We<br />
have considered the dataset along with the deformations described in \cite{ex8} for evaluation purposes. From our experiments<br />
we have obtained encouraging results.<br />
13:30-16:30, Paper TuBCT9.30<br />
Combining Spectral and Spatial Features for Robust Foreground-Background Separation<br />
Lettner, Martin, Vienna Univ. of Tech.<br />
Sablatnig, Robert, Vienna Univ. of Tech.<br />
Foreground-background separation in multispectral images of damaged manuscripts can benefit from both spectral and<br />
spatial information. Therefore, we incorporate a Markov Random Field which provides a powerful tool to combine both<br />
features simultaneously. Higher order models enable the inclusion of spatial constraints based on stroke characteristics.<br />
We apply belief propagation for inference and include the higher order potentials by upgrading the message update. The<br />
proposed segmentation method requires no training and is independent of script, size, and style of characters. We will<br />
demonstrate the robust performance on a set of degraded documents and on synthetic images.<br />
13:30-16:30, Paper TuBCT9.31<br />
Unsupervised Learning of Stroke Tagger for Online Kanji Handwriting Recognition<br />
Blondel, Mathieu, Kobe Univ.<br />
Seki, Kazuhiro, Kobe Univ.<br />
Uehara, Kuniaki, Kobe Univ.<br />
Traditionally, HMM-based approaches to online Kanji handwriting recognition have relied on a hand-made dictionary,<br />
mapping characters to primitives such as strokes or substrokes. We present an unsupervised way to learn a stroke tagger<br />
from data, which we eventually use to automatically generate such a dictionary. In addition to not requiring a prior handmade<br />
dictionary, our approach can improve the recognition accuracy by exploiting unlabeled data when the amount of labeled<br />
data is limited.<br />
13:30-16:30, Paper TuBCT9.32<br />
A Baseline Dependent Approach for Persian Handwritten Character Segmentation<br />
Alaei, Alireza, Univ. of Mysore<br />
Nagabhushan, P., Univ. of Mysore<br />
Pal, Umapada, Indian Statistical Inst.<br />
In this paper, an efficient approach to segmenting a Persian off-line handwritten text-line into characters is presented. The proposed<br />
algorithm first traces the baseline of the input text-line image and straightens it. Subsequently, it over-segments<br />
each word/subword using features extracted from histogram analysis and then removes extra segmentation points using<br />
baseline-dependent as well as language-dependent rules. We tested the proposed character segmentation scheme<br />
on two different datasets. On a test set of 899 Persian words/subwords created by us, 90.26% of the characters were segmented<br />
correctly. On another dataset of 200 handwritten Arabic word images [11] we obtained 93.49% correct segmentation<br />
accuracy.<br />
13:30-16:30, Paper TuBCT9.33<br />
Bayesian Networks Learning Algorithms for Online Form Classification<br />
Philippot, Emilie, Univ. Nancy 2, Loria<br />
Belaid, Yolande, Univ. Nancy 2, Loria<br />
Belaid, Abdel, Univ. Nancy 2, Loria<br />
In this paper a new method is presented for the recognition of online forms filled manually by a digital-type clip. This<br />
writing system transmits only the written fields without the pre-printed form. The form recognition consists in retrieving<br />
the original form directly from the filled fields without any context, which is a very challenging problem. We propose a<br />
method based on Bayesian networks. The networks use the conditional probabilities between fields in order to infer the<br />
real form. Two learning algorithms of form structures are employed to test their suitability for the case studied. The tests<br />
were conducted on a set of 3200 forms provided by the Act image company, a specialist in interactive writing processes.<br />
The first experiments show a recognition rate of more than 97%.<br />
13:30-16:30,Paper TuBCT9.34<br />
Bangla and English City Name Recognition for Indian Postal Automation<br />
Pal, Umapada, Indian Statistical Inst.<br />
Roy, R.K., Indian Statistical Inst.<br />
Kimura, Fumitaka, Mie Univ.<br />
Because of multi-lingual behavior, the destination address block of a postal document from an Indian state may be written in two<br />
or more scripts. From a statistical analysis of Indian postal documents, we noted that about 22.04% of Indian postal documents<br />
are written in two scripts. Because of the inter-mixing of these scripts in postal address writing, it is very difficult to<br />
identify the script in which a city name is written. To avoid such identification difficulties, in this paper we propose a<br />
lexicon-driven bi-lingual (English and Bangla) city name recognition scheme for Indian postal automation. We obtained<br />
93.19% accuracy when tested on 11875 city name samples.<br />
13:30-16:30,Paper TuBCT9.35<br />
Shape Code based Word-Image Matching for Retrieval of Indian Multi-Lingual Documents<br />
Tarafdar, Arundhati, Indian Statistical Inst.<br />
Mandal, Ranju, Indian Statistical Inst.<br />
Pal, Srikanta, Indian Statistical Inst.<br />
Pal, Umapada, Indian Statistical Inst.<br />
Kimura, Fumitaka, Mie Univ.<br />
In the current scenario, retrieving information from document images is a challenging problem. In this paper we propose<br />
a shape code based word-image matching (word-spotting) technique for retrieval of multilingual documents written in Indian<br />
languages. Here, each query word image to be searched is represented by a primitive shape code using (i) zonal information<br />
of extreme points, (ii) vertical shape based features, (iii) crossing count (with respect to vertical bar position), (iv)<br />
loop shape and position, and (v) background information. Each candidate word (a word having a similar aspect ratio and<br />
topological feature to the query word) of the document is also coded accordingly. Then, an inexact string matching technique<br />
is used to measure the similarity between the primitive codes generated from the query word image and each candidate<br />
word of the document with which the query image is to be searched. Based on the similarity score, we retrieve the<br />
document where the query image is found. Experimental results on Bangla, Devnagari and Gurumukhi scripts document<br />
image databases confirm the feasibility and efficiency of our proposed approach.<br />
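The inexact string matching between primitive shape codes can be illustrated with a standard edit-distance computation. This sketch assumes the codes are plain strings and uses unit operation costs; the paper's matcher may weight operations differently, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two primitive
    shape-code strings (unit insert/delete/substitute costs)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def similarity(query_code, word_code):
    """Normalised similarity in [0, 1]; 1 means identical codes."""
    dist = edit_distance(query_code, word_code)
    return 1.0 - dist / max(len(query_code), len(word_code), 1)
```

A candidate word is then retrieved when `similarity` between its code and the query code exceeds a threshold.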
13:30-16:30,Paper TuBCT9.36<br />
Stochastic Segment Model Adaptation for Offline Handwriting Recognition<br />
Prasad, Rohit, Raytheon BBN Tech.<br />
Bhardwaj, Anurag, SUNY Buffalo<br />
Subramanian, Krishna, Raytheon BBN Tech.<br />
Cao, Huaigu, Raytheon BBN Tech.<br />
Natarajan, P., BBN Tech.<br />
In this paper, we present techniques for unsupervised adaptation of stochastic segment models to improve accuracy on<br />
large vocabulary offline handwriting recognition (OHR) tasks. We build upon our previous work on stochastic segment<br />
modeling for Arabic OHR. In our previous work, stochastic character segments for each n-best hypothesis were generated<br />
by a hidden Markov model (HMM) recognizer, and then a segmental model was used as an additional knowledge source<br />
for re-ranking the n-best list. Here, we describe a novel framework for unsupervised adaptation. It integrates both HMM<br />
and segment model adaptation to achieve significant gains over un-adapted recognition. Experimental results demonstrate<br />
the efficacy of our proposed method on a large corpus of handwritten Arabic documents.<br />
13:30-16:30,Paper TuBCT9.37<br />
Shape-Based Image Retrieval using a New Descriptor based on the Radon and Wavelet Transforms<br />
Nacereddine, Nafaa, LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Ziou, Djemel, Sherbrooke Univ.<br />
Hamami, Latifa, Ec. Nationale Pol.<br />
In this paper, the Radon transform is used to design a new descriptor, called the Phi-signature, that is invariant to the usual geometric<br />
transformations. Experiments show the effectiveness of the multilevel representation of the descriptor built from the Phi-signature<br />
and R.<br />
13:30-16:30,Paper TuBCT9.38<br />
CUDA Implementation of Deformable Pattern Recognition and its Application to MNIST Handwritten Digit Database<br />
Mizukami, Yoshiki, Yamaguchi Univ.<br />
Tadamura, Katsumi, Yamaguchi Univ.<br />
Warrell, Jonathan, Oxford Brookes University<br />
Li, Peng, Univ. Coll. London<br />
Prince, Simon, Univ. Coll. London<br />
In this study we propose a deformable pattern recognition method with CUDA implementation. In order to achieve the<br />
proper correspondence between foreground pixels of input and prototype images, a pair of distance maps are generated<br />
from input and prototype images, whose pixel values are given based on the distance to the nearest foreground pixel. Then<br />
a regularization technique computes the horizontal and vertical displacements based on these distance maps. The dissimilarity<br />
is measured based on the eight-directional derivative of input and prototype images in order to leverage characteristic<br />
information on the curvature of line segments that might be lost after the deformation. The prototype-parallel displacement<br />
computation on CUDA and the gradual prototype elimination technique are employed for reducing the computational time<br />
without sacrificing accuracy. A simulation shows that the proposed method with the k-nearest neighbor classifier gives<br />
an error rate of 0.57% on the MNIST handwritten digit database.<br />
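The distance maps used to establish correspondences between foreground pixels can be sketched as below. This brute-force version is for illustration only (the paper computes the maps in a CUDA kernel), and all names are hypothetical.

```python
import numpy as np

def distance_map(img):
    """Brute-force map of the Euclidean distance from every pixel to the
    nearest foreground pixel (img == 1). Fine for tiny images; a real
    implementation would use a linear-time distance transform or, as in
    the paper, a GPU kernel."""
    fg = np.argwhere(img == 1)
    h, w = img.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            # distance from (y, x) to its closest foreground pixel
            out[y, x] = np.sqrt(((fg - (y, x)) ** 2).sum(axis=1)).min()
    return out

# toy example: a single foreground pixel in the centre of a 3x3 image
demo = np.zeros((3, 3), dtype=int)
demo[1, 1] = 1
dm = distance_map(demo)
```

A pair of such maps, one from the input and one from the prototype, then drives the regularised estimation of the displacement fields.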
13:30-16:30,Paper TuBCT9.39<br />
Text Independent Writer Identification for Bengali Script<br />
Chanda, Sukalpa, GJØVIK Univ. Coll.<br />
Franke, Katrin, Gjøvik Univ. Coll.<br />
Pal, Umapada, Indian Statistical Inst.<br />
Wakabayashi, Tetsushi, Mie Univ.<br />
Automatic identification of an individual based on his/her handwriting characteristics is an important forensic tool. In a<br />
computational forensic scenario, the presence of a large amount of text/information in a questioned document cannot always<br />
be ensured. At the same time, compromising system reliability in such situations is not desirable. We here propose a system<br />
to handle such adverse situations in the context of Bengali script. Experiments with a discrete directional feature and a gradient<br />
feature are reported here, using a Support Vector Machine (SVM) as the classifier. We obtained promising results: 95.19%<br />
writer identification accuracy for the top choice and 99.03% when considering the top three choices.<br />
13:30-16:30,Paper TuBCT9.40<br />
Document Image Retrieval using Feature Combination in Kernel Space<br />
Hassan, Ehtesham, Indian Inst. of Tech. Delhi<br />
Chaudhury, Santanu, Indian Inst. of Tech. Delhi<br />
Gopal, M, Indian Inst. of Tech. Delhi<br />
This paper presents the application of multiple features to word-based document image indexing and retrieval. A novel framework<br />
is proposed that performs Multiple Kernel Learning (MKL) for indexing using kernel-based Distance Based Hashing. A<br />
Genetic Algorithm based framework is used for optimization. Two different features representing the structural organization<br />
of word shape are defined. The optimal combination of the two features for indexing is learned by performing MKL. Retrieval<br />
results for a document collection in the Devanagari script are presented.<br />
13:30-16:30,Paper TuBCT9.41<br />
A Novel Handwritten Urdu Word Spotting based on Connected Components Analysis<br />
Sagheer, Malik Waqas, CENPARMI, Concordia Univ.<br />
Nobile, Nicola, Concordia Univ. CENPARMI<br />
He, Chun Lei, Concordia Univ.<br />
Suen, Ching Y.<br />
We propose a novel word spotting system for Urdu words within handwritten text lines. Spatial information of diacritics<br />
is integrated into the detection of the main connected components during candidate word generation. An Urdu word recognition<br />
system is effectively designed and applied to classify the candidate words. In this word recognition system, compound<br />
features and an SVM were adapted. The verification/rejection process was based on the outputs from the Urdu word recognition<br />
system, and the image’s global features were applied to achieve a promising result. As a result, a high 92.11% correct<br />
segmentation rate and a 50.75% word spotting precision rate were achieved, while maintaining a 70.1% recall on CENPARMI’s<br />
Urdu Database.<br />
13:30-16:30,Paper TuBCT9.42<br />
Computer Assisted Transcription of Text Images: Results on the GERMANA Corpus and Analysis of Improvements<br />
Needed for Practical Use<br />
Romero Gomez, Verónica, Univ. Pol. De Valencia<br />
Toselli, Alejandro Héctor, Univ. Pol. De Valencia<br />
Vidal, Enrique, Univ. Pol. De Valencia<br />
We present a study of the application of Computer Assisted Transcription of Text Images (CATTI) to a task which is much<br />
closer to real applications than tasks previously studied. The new task consists in the transcription of a new, publicly<br />
available historic handwritten document called GERMANA. A detailed analysis of the main factors influencing system<br />
performance is presented, and some strategies to circumvent them are proposed.<br />
13:30-16:30,Paper TuBCT9.43<br />
OCR Post-Processing using Weighted Finite-State Transducers<br />
Llobet, Rafael, Univ. Pol. De Valencia<br />
Navarro Cerdán, José Ramón, Univ. Pol. De Valencia<br />
Perez-Cortes, Juan-Carlos, Univ. Pol. De Valencia<br />
Arlandis, Joaquim, Univ. Pol. De Valencia<br />
A new approach for Stochastic Error-Correcting Language Modeling based on Weighted Finite-State Transducers (WFSTs)<br />
is proposed as a method to post-process the results of an optical character recognizer (OCR). Instead of using the recognized<br />
string as an input to the transducer, in our approach the complete set of OCR hypotheses, a sequence of vectors of a posteriori<br />
class probabilities, is used to build a WFST that is then composed with independent WFSTs for the error and language<br />
models. This combines the practical advantages of a decoupled (OCR + post-processor) model with the full power<br />
of an integrated model.<br />
13:30-16:30,Paper TuBCT9.44<br />
Top down Analysis of Line Structure in Handwritten Documents<br />
Kasiviswanathan, Harish, Univ. at Buffalo<br />
Ball, Gregory R., Univ. at Buffalo<br />
Srihari, Sargur, Univ. at Buffalo<br />
One of the most challenging tasks in analyzing handwritten documents is to tackle the inherent skew introduced<br />
by the writer’s handwriting, segment the handwritten lines, and estimate the skew angle and its direction. Unlike in printed<br />
documents, complexities such as variable spacing between words and lines, variable line skew, variable line width and height,<br />
and overlapping words and lines arise in handwritten documents. This paper explores the application of the Radon<br />
transform to processing handwritten documents and compares its performance with the Hough transform for segmenting lines<br />
and detecting skew. The computational advantage of the Radon transform over the Hough transform, with equally good results,<br />
makes it an ideal choice for processing handwritten documents.<br />
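The idea of projection-based skew detection can be sketched as follows: for each candidate angle, the ink pixels are projected along that direction (a discrete Radon-style projection), and the angle whose projection profile has maximal variance is chosen, since sharp peaks mean the projection direction matches the text lines. The function name and angle grid are illustrative, not the authors' implementation.

```python
import numpy as np

def estimate_skew(img, angles=np.linspace(-5, 5, 21)):
    """Estimate the global skew (in degrees) of a binary text image by
    projecting ink pixels along candidate directions and picking the
    angle with the most peaked (highest-variance) projection profile."""
    ys, xs = np.nonzero(img)
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        t = np.deg2rad(a)
        # signed distance of each ink pixel to the line family at angle a
        proj = ys * np.cos(t) - xs * np.sin(t)
        hist, _ = np.histogram(proj, bins=img.shape[0])
        score = hist.var()                    # peaked profile -> high variance
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle

# toy page: two perfectly horizontal text "lines"
page = np.zeros((40, 40), dtype=int)
page[10, :] = 1
page[20, :] = 1
skew_deg = estimate_skew(page)
```

On the toy page the estimated skew is (close to) zero, as the lines are horizontal.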
13:30-16:30,Paper TuBCT9.45<br />
Unsupervised Evaluation Methods based on Local Gray-Intensity Variances for Binarization of Historical Documents<br />
Ramírez-Ortegón, Marte Alejandro, Freie Univ. Berlin<br />
Rojas, Raul, Freie Univ. Berlin<br />
We attempt to evaluate the efficacy of six unsupervised evaluation methods for tuning Sauvola’s threshold in optical character<br />
recognition (OCR) applications. We propose local implementations of well-known measures based on gray-intensity variances.<br />
Additionally, we derive four new measures from them using the unbiased variance estimator and gray-intensity<br />
logarithms. In our experiment, we selected the well-binarized images according to each measure and computed the accuracy<br />
of the recognized text of each. The results show that the weighted and uniform variances (using logarithms) are suitable<br />
measures for OCR applications.<br />
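Sauvola's local threshold, and a gray-intensity variance measure of the kind being evaluated, can be sketched as follows. The exact weightings in the paper's six measures differ, so treat this as an illustration; the parameter values and function names are assumptions.

```python
import numpy as np

def sauvola_threshold(window, k=0.5, R=128.0):
    """Sauvola's local threshold for one grey-level window:
    t = m * (1 + k * (s / R - 1)), where m and s are the local mean and
    standard deviation; k is the parameter the unsupervised measures
    are used to tune."""
    m, s = window.mean(), window.std()
    return m * (1.0 + k * (s / R - 1.0))

def weighted_variance(gray, binary):
    """Within-class gray-intensity variance of a binarisation, weighted
    by class size: lower values indicate more homogeneous foreground
    and background classes."""
    fg, bg = gray[binary == 1], gray[binary == 0]
    score = 0.0
    for cls in (fg, bg):
        if cls.size:
            score += cls.size / gray.size * cls.var()
    return score
```

Candidate thresholds would be ranked by such a measure, and the best-scoring binarisation passed to the OCR engine.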
13:30-16:30,Paper TuBCT9.46<br />
On the Significance of Stroke Size and Position for Online Handwritten Devanagari Word Recognition: An Empirical<br />
Study<br />
Bharath, A, Hewlett-Packard Lab. India<br />
Madhvanath, Sriganesh, Hewlett-Packard Lab. India<br />
Stroke size and position are considered important information for online recognition of handwritten characters and<br />
words in the Oriental and Indic families of scripts, especially because of their multi-stroke and two-dimensional nature. In an<br />
Indic script such as Devanagari, the vowel diacritics (matras) can occur at any position around the base consonant, and<br />
there are even pairs of matras which have similar shapes and differ only in their position with respect to the base consonant.<br />
In this paper, we study the relevance of stroke size and position information for the recognition of online handwritten Devanagari<br />
words by comparing three different preprocessing schemes. Our experimental results indicate that the word recognition<br />
accuracy achieved using a preprocessing scheme that completely disregards the original sizes and positions of the<br />
strokes (and symbols) is comparable with the scheme that retains them, when the input is in discrete style, and contextual<br />
knowledge in the form of a lexicon is available.<br />
13:30-16:30,Paper TuBCT9.47<br />
Noise Tolerant Script Identification of Oriental and English Documents using a Downgraded Pixel Density Feature<br />
Wang, Ning, Concordia Univ.<br />
Lam, Louisa, Concordia Univ.<br />
Suen, Ching Y.<br />
Document Script Identification (DSI) is a very useful application in document processing. This paper presents a method<br />
for this application that uses a new noise tolerant feature, the Downgraded Pixel Density feature. Compared to other<br />
features widely used in existing DSI solutions, this new feature is much more robust to variations in slant, font and style<br />
of printed documents. Experimental results show that the method achieves promising identification performance.<br />
13:30-16:30,Paper TuBCT9.48<br />
Using Spatial Relations for Graphical Symbol Description<br />
K. C., Santosh, INRIA – LORIA and INPL<br />
Wendling, Laurent, Univ. Paris Descartes<br />
Lamiroy, Bart, LORIA – INPL<br />
In this paper, we address the use of unified spatial relations for symbol description. We present a topologically guided directional<br />
relation signature. It references a unique point set instead of one entity in a pair, thus avoiding problems related<br />
to erroneous choices of reference entities and preserving symmetry. We experimentally validate our method by showing its<br />
ability to serve in a symbol retrieval application, based only on a spatial relational descriptor that represents the links between<br />
the decomposed structural patterns, called “vocabulary”, in a spatial relational graph.<br />
13:30-16:30,Paper TuBCT9.49<br />
Automatic Discrimination between Confusing Classes with Writing Styles Verification in Arabic Handwritten Numeral<br />
Recognition<br />
He, Chun Lei, Concordia Univ.<br />
Lam, Louisa, Concordia Univ.<br />
Suen, Ching Y.<br />
In handwriting recognition, confusing/conflicting writing styles can result in irreducible errors, so the study of writing<br />
style consistencies is important for applications. In Arabic Handwritten Numeral Recognition, most errors occur between<br />
samples of classes two and three due to their very similar shapes in some writing styles. In this paper, an automated writing<br />
style detection process is effectively implemented in the pair-wise verification of samples in these two classes. As a result,<br />
the recognition results improve significantly, with a 25% reduction of previous errors. With rejection, when the<br />
LDA (Linear Discriminant Analysis) measurement rejection threshold is adjusted to maintain the same error rate, the<br />
recognition rate increases from 96.87% to 97.81%.<br />
13:30-16:30,Paper TuBCT9.50<br />
Random Subspace Method in Text Categorization<br />
Gangeh, Mehrdad, Univ. of Waterloo<br />
Kamel, Mohamed S, Univ. of Waterloo<br />
Duin, Robert, TU Delft<br />
In text categorization (TC), which is a supervised technique, a feature vector of terms or phrases is usually used to represent<br />
the documents. Due to the huge number of terms in even a moderate-size text corpus, the high-dimensional feature space is<br />
an intrinsic problem in TC. The random subspace method (RSM), a technique that divides the feature space into smaller subspaces,<br />
each submitted to a (base) classifier (BC) in an ensemble, can be an effective approach to reducing the dimensionality of the<br />
feature space. Inspired by similar research on functional magnetic resonance imaging (fMRI) of the brain, we here address<br />
the estimation of the ensemble parameters, i.e., the ensemble size (L) and the dimensionality of the feature subsets (M), by defining<br />
three criteria: usability, coverage, and diversity of the ensemble. We show that a relatively medium M and small L yield<br />
an ensemble that improves the performance of a single support vector machine, which is considered the state-of-the-art<br />
in TC.<br />
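The random subspace construction can be sketched as follows. For clarity a toy nearest-centroid classifier stands in for the SVM base learner, and all names, as well as the choice of L and M, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rsm(X, y, L=5, M=2):
    """Random subspace ensemble: each of the L base classifiers is
    trained on a random M-dimensional subset of the features. Here the
    base classifier is a simple nearest-centroid rule."""
    ensemble = []
    for _ in range(L):
        feats = rng.choice(X.shape[1], size=M, replace=False)
        cents = {c: X[y == c][:, feats].mean(axis=0) for c in np.unique(y)}
        ensemble.append((feats, cents))
    return ensemble

def predict_rsm(ensemble, x):
    """Majority vote over the base-classifier decisions."""
    votes = []
    for feats, cents in ensemble:
        sub = x[feats]
        votes.append(min(cents, key=lambda c: np.linalg.norm(sub - cents[c])))
    return max(set(votes), key=votes.count)
```

With well-separated classes, every random subspace preserves the separation, so the vote is unanimous; in TC the interplay of L and M against usability, coverage and diversity is exactly what the paper studies.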
13:30-16:30,Paper TuBCT9.51<br />
Shape-DNA: Effective Character Restoration and Enhancement for Arabic Text Documents<br />
Caner, Gulcin, Pol. Rain, Inc.<br />
Haritaoglu, Ismail, Pol. Rain, Inc.<br />
We present a novel learning-based image restoration and enhancement technique for improving character recognition performance<br />
of OCR products for degraded documents or documents/text captured with mobile devices such as cameraphones.<br />
The proposed technique is language independent and can simultaneously increase the effective resolution and<br />
restore broken characters with artifacts due to the image capturing device, such as a low-quality/low-resolution camera, or due<br />
to previous pre-processing such as extracting text region from the document image. The proposed technique develops a<br />
predictive relationship between high-resolution training images and their low-resolution/degraded counterparts, and exploits<br />
this relationship in a probabilistic scheme to generate a high resolution image from a low quality, low-resolution text<br />
image. We present a fast and scalable implementation of the proposed character restoration algorithm to improve the text<br />
recognition for document/text images captured by mobile phones. Experimental results demonstrate that the system effectively<br />
increases OCR performance for documents captured by mobile imaging devices, from levels of 50% to levels of<br />
over 80% for non-Latin document/scene text images at 120 dpi.<br />
13:30-16:30,Paper TuBCT9.52<br />
Linguistic Adaptation for Whole-Book Recognition<br />
Xiu, Pingping, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images,<br />
using automatic adaptation to improve accuracy. Our algorithm expects to be given approximate iconic and linguistic<br />
models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries, and then, guided entirely<br />
by evidence internal to the test set, corrects the models, yielding improved accuracy. The iconic model describes<br />
image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence<br />
probabilities. In previous work, we reported that adapting the iconic model alone (with a perfect linguistic model)<br />
was able to automatically reduce the word error rate on a 180-page book by a large factor. In this paper, we propose an<br />
algorithm that adapts both the iconic model and the linguistic model alternately to improve<br />
both models on the fly. The linguistic model adaptation method, which we report here, identifies new words and adds<br />
them to the dictionary. With 64.6% of words missing from the dictionary, our previous algorithm reduced the word error rate from<br />
40.2% to 23.2%. The new algorithm drives the word error rate down further, from 23.2% to 16.0%.<br />
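The dictionary-growing step of linguistic adaptation can be sketched as follows. The count and confidence thresholds here are hypothetical stand-ins for the paper's evidence-driven criteria, and all names are illustrative.

```python
from collections import Counter

def adapt_lexicon(dictionary, recognized, min_count=3, min_conf=0.9):
    """Promote words that are absent from the dictionary but recognised
    repeatedly with high confidence across the book to new dictionary
    entries. `recognized` is a list of (word, confidence) pairs."""
    counts = Counter(w for w, conf in recognized if conf >= min_conf)
    new_words = {w for w, n in counts.items()
                 if n >= min_count and w not in dictionary}
    return dictionary | new_words
```

The enlarged dictionary then feeds back into the next round of iconic-model adaptation, alternating as the abstract describes.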
13:30-16:30,Paper TuBCT9.53<br />
Online Arabic Handwriting Modeling System based on the Grapheme Segmentation<br />
Boubaker, Houcine, Univ. of Sfax<br />
El Baati, Abed El Karim, Univ. of Sfax<br />
Kherallah, Monji, Univ. of Sfax<br />
Alimi, Adel. M., Univ. of Sfax<br />
El Abed, Haikal, Braunschweig Tech. Univ.<br />
We present in this paper a new approach to online Arabic handwriting modeling based on grapheme segmentation.<br />
This segmentation relies on prior detection of the baseline. It involves the detection of two types of topologically meaningful<br />
points: the backs of the valleys adjoining the baseline and the angular points. The feature extraction stage<br />
models the shapes of the segmented graphemes with relevant geometric parameters and estimates their diacritics’<br />
fuzzy affectation rates. The test results show a significant improvement in recognition rate with the introduction of new<br />
pertinent parameters.<br />
Technical Program for Wednesday<br />
August 25, 2010<br />
WeAT1 Marmara Hall<br />
Tracking and Surveillance - II Regular Session<br />
Session chair: Yilmaz, Alper (The Ohio State Univ.)<br />
09:00-09:20, Paper WeAT1.1<br />
The Fusion of Deep Learning Architectures and Particle Filtering Applied to Lip Tracking<br />
Carneiro, Gustavo, Tech. Univ. of Lisbon<br />
Nascimento, Jacinto, Inst. de Sistemas e Robótica<br />
This work introduces a new pattern recognition model for segmenting and tracking lip contours in video sequences. We<br />
formulate the problem as a general nonrigid object tracking method, where the computation of the expected segmentation<br />
is based on a filtering distribution. This is a difficult task because one has to compute the expected value using the whole<br />
parameter space of segmentation. As a result, we compute the expected segmentation using sequential Monte Carlo sampling<br />
methods, where the filtering distribution is approximated with a proposal distribution to be used for sampling. The<br />
key contribution of this paper is the formulation of this proposal distribution using a new observation model based on<br />
deep belief networks and a new transition model. The efficacy of the model is demonstrated in publicly available databases<br />
of video sequences of people talking and singing. Our method produces results comparable to state-of-the-art models, while<br />
showing potential to be more robust to imaging conditions.<br />
09:20-09:40, Paper WeAT1.2<br />
Robust Head-Shoulder Detection by PCA-Based Multilevel HOG-LBP Detector for People Counting<br />
Zeng, Chengbin, Beijing Univ. of Posts and Telecommunications<br />
Ma, Huadong, Beijing Univ. of Posts and Telecommunications<br />
Robustly counting the number of people for surveillance systems has widespread applications. In this paper, we propose<br />
a robust and rapid head-shoulder detector for people counting. By combining the multilevel HOG (Histograms of Oriented<br />
Gradients) with the multilevel LBP (Local Binary Pattern) as the feature set, we can detect the head-shoulders of people<br />
robustly, even when partial occlusions occur. To further improve the detection performance, Principal Components<br />
Analysis (PCA) is used to reduce the dimension of the multilevel HOG-LBP feature set. Our experiments show<br />
that the PCA-based multilevel HOG-LBP descriptors are more discriminative and more robust than the state-of-the-art algorithms.<br />
For the application of real-time people-flow estimation, we also incorporate our detector into particle filter<br />
tracking and achieve convincing accuracy.<br />
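Reducing a concatenated HOG-LBP feature set with PCA can be sketched via an SVD, as below. This is a generic illustration of the dimensionality-reduction step, not the authors' implementation, and the names are assumptions.

```python
import numpy as np

def pca_project(features, n_components):
    """Reduce a (samples x dims) feature matrix -- e.g. concatenated
    multilevel HOG and LBP descriptors -- to n_components dimensions
    with PCA computed via the SVD of the centred data."""
    mean = features.mean(axis=0)
    centered = features - mean
    # rows of vt are the principal directions, by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Because the top components capture the dominant variance, a detector trained on the projected features can match the full feature set at a fraction of the dimensionality.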
09:40-10:00, Paper WeAT1.3<br />
Adaptive Motion Model for Human Tracking using Particle Filter<br />
Ghaeminia, Mohammad Hossein, Iran Univ. of Science and Tech.<br />
Shabani, Amir-Hossein, Univ. of Waterloo<br />
Baradaran Shokouhi, Shahriar, Iran Univ. of Science & Tech.<br />
This paper presents a novel approach to modeling the complex motion of humans using a probabilistic autoregressive moving<br />
average model. The parameters of the model are adaptively tuned during the course of tracking by utilizing the main<br />
varying components of the pdf of the target’s acceleration and velocity. This motion model, along with the color histogram<br />
as the measurement model, has been incorporated in the particle filtering framework for human tracking. The proposed<br />
method is evaluated by PETS benchmark in which the targets have non-smooth motion and suddenly change their motion<br />
direction. Our method competes with the state-of-the-art techniques for human tracking in the real world scenario.<br />
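One predict/weight/resample cycle of a basic particle filter can be sketched as follows. For clarity the state is 1-D position with a fixed constant-velocity motion model, whereas the paper adapts an autoregressive model online; all names and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, velocities, measurement,
                         proc_noise=0.5, meas_noise=1.0):
    """One iteration of a bootstrap particle filter with a fixed
    constant-velocity motion model over a 1-D position state."""
    # predict: move each particle by its velocity plus process noise
    particles = particles + velocities + rng.normal(0, proc_noise, len(particles))
    # weight: Gaussian likelihood of the measurement given each particle
    w = np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    w /= w.sum()
    # resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], velocities[idx]
```

Running a few cycles against measurements from a target moving at constant speed keeps the particle cloud centred on the target; the paper's contribution is to re-estimate the motion model itself as the target manoeuvres.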
10:00-10:20, Paper WeAT1.4<br />
Bayesian GOETHE Tracking<br />
Wirkert, Sebastian, Ec. Centrale de Lyon<br />
Dellandréa, Emmanuel, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Occlusions pose serious challenges when tracking multiple targets. By severely changing the measurement, they imply<br />
strong inter-target dependencies. Exact computation of these dependencies is not feasible. The GOETHE approximations<br />
preserve much of the information while staying computationally affordable.<br />
10:20-10:40, Paper WeAT1.5<br />
A Combined Self-Configuring Method for Object Tracking in Colour Video<br />
Rosell Ortega, Juan, Pol. Univ. of Valencia<br />
Andreu-Garcia, Gabriela, Pol. Univ. of Valencia<br />
Rodas-Jordà, Angel, Pol. Univ. of Valencia<br />
Atienza-Vanacloig, Vicente, Pol. Univ. of Valencia<br />
This paper introduces a novel approach to background modelling. We initially propose a method to extract scene<br />
parameters from a sequence of frames. These parameters, together with an initial background model, are used as a starting<br />
point for a background subtraction method based on fuzzy logic. Our method permits modelling the background and detecting<br />
moving objects in a video sequence without user intervention. The algorithm is designed to work with CIEL*a*b*<br />
coordinates with multi-modal support, and avoids the user parameters and fixed or probabilistic thresholds usually found in<br />
traditional background subtraction methods. Quantitative and qualitative results obtained with a well-known benchmark<br />
and comparisons with other approaches justify the model.<br />
WeAT2 Dolmabahçe Hall A<br />
Shape Modeling - I Regular Session<br />
Session chair: De Floriani, L.<br />
09:00-09:20, Paper WeAT2.1<br />
A Geometric Invariant Shape Descriptor based on the Radon, Fourier, and Mellin Transforms<br />
Hoang, Thai V., Univ. Nancy 2-LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
A new shape descriptor invariant to geometric transformation based on the Radon, Fourier, and Mellin transforms is proposed.<br />
The Radon transform converts a geometric transformation applied to a shape image into transformations in the<br />
columns and rows of the Radon image. Invariances to translation, rotation, and scaling are obtained by applying 1D Fourier-<br />
Mellin and Fourier transforms on the columns and rows of the shape’s Radon image respectively. Experimental results on<br />
different datasets show the usefulness of the proposed shape descriptor.<br />
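The translation-invariance building block of such descriptors (applying a 1D Fourier transform and keeping only magnitudes) can be illustrated as follows. The full descriptor also applies a Fourier-Mellin step for scale invariance, which is omitted here; the function name is an assumption.

```python
import numpy as np

def shift_invariant_signature(profile):
    """1-D Fourier magnitude of a Radon row/column profile: taking the
    FFT magnitude discards the phase that encodes (circular)
    translation, giving a shift-invariant signature."""
    return np.abs(np.fft.fft(profile))
```

Since a circular shift of the profile changes only the phase of its Fourier coefficients, two shifted copies of the same profile yield identical signatures.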
09:20-09:40, Paper WeAT2.2<br />
Fundamental Geodesic Deformations in Spaces of Treelike Shapes<br />
Feragen, Aasa, Univ. of Copenhagen<br />
Lauze, Francois, Univ. of Copenhagen<br />
Nielsen, Mads<br />
This paper presents a new geometric framework for analysis of planar treelike shapes for applications such as shape matching,<br />
recognition and morphology, using the geometry of the space of treelike shapes. Mathematically, the shape space is<br />
given the structure of a stratified set which is a quotient of a normed vector space with a metric inherited from the vector<br />
space norm. We give examples of geodesic paths in tree-space corresponding to fundamental deformations of small trees,<br />
and discuss how these deformations are key building blocks for understanding deformations between larger trees.<br />
09:40-10:00, Paper WeAT2.3<br />
Shape Interpolation with Flattenings<br />
Meyer, Fernand, Mines-ParisTech<br />
This paper presents the binary flattenings of shapes, first as a connected operator suppressing particles or holes, second as<br />
an erosion in a particular lattice of shapes. Using this erosion, it is then possible to construct a distance from a shape to<br />
another and derive from it an interpolation function between shapes.<br />
10:00-10:20, Paper WeAT2.4<br />
Circularity Measuring in Linear Time<br />
Nguyen, Thanh Phuong, LORIA<br />
Debled-Rennesson, Isabelle, LORIA - Nancy Univ.<br />
We propose a new circularity measure, inspired by the shape matching tools of Arkin [Arkin91] and Latecki [Latecki00],<br />
that is constructed in a tangent space. We then introduce a linear algorithm that uses this measure for circularity measuring.<br />
This method can also be regarded as a method for circular object recognition. Experimental results show the robustness<br />
of this simple method.<br />
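For comparison, the classic isoperimetric ratio is the simplest circularity baseline; the tangent-space measure proposed in the paper is designed to be a more discriminative alternative to this ratio.

```python
import math

def isoperimetric_circularity(area, perimeter):
    """Classic circularity baseline 4*pi*A / P**2: equals 1 for a
    perfect disc and decreases as the shape becomes less circular."""
    return 4.0 * math.pi * area / perimeter ** 2
```

A unit disc (area pi, perimeter 2*pi) scores exactly 1, while a unit square (area 1, perimeter 4) scores pi/4.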
10:20-10:40, Paper WeAT2.5<br />
Multiscale Analysis from 1D Parametric Geometric Decomposition of Shapes<br />
Feschet, Fabien, Univ. d’Auvergne Clermont-Ferrand 1<br />
This paper deals with the construction of a non-parametric multiscale analysis from a 1D parametric decomposition of<br />
shapes where the elements of the decomposition are geometric primitives. We focus on the case of linear structures in<br />
shapes but our construction readily extends to the case of any geometric primitives. One key point of the construction is<br />
that it is truly multiscale in the sense that a higher level is a sublevel of a lower one and that it preserves symmetries of<br />
shapes. We performed experiments to show the simplification it provides on classical shapes. Results are promising.<br />
WeAT3 Dolmabahçe Hall B<br />
Image and Physics-Based Modeling Regular Session<br />
Session chair: Heyden, Anders (Lund Univ.)<br />
09:00-09:20, Paper WeAT3.1<br />
Region-Based Image Transform for Transition between Object Appearances<br />
Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />
Kono, Yuki, Nagoya Univ.<br />
Ide, Ichiro, Nagoya Univ.<br />
Murase, Hiroshi, Nagoya Univ.<br />
We propose a method of region-based image transform to achieve accurate transition between object appearances. A<br />
view-transition model (VTM) is one of the statistical methods that learn appearance transition from a sample image dataset of<br />
a large number of objects with various appearances. However, the VTM method has a practical problem: the appearance<br />
transition cannot be performed accurately if a sufficient number of learning samples is not available in the dataset. To<br />
cope with the problem, the proposed method first determines the regions of input and output images whose pixel values<br />
mutually affect each other during appearance transition, then transforms iteratively between partial images in the regions.<br />
We conducted experiments using actual image datasets. The results show that the proposed method transforms appearances<br />
more accurately than the VTM method.<br />
09:20-09:40, Paper WeAT3.2<br />
Extended Multiple View Geometry for Lights and Cameras from Photometric and Geometric Constraints<br />
Kato, Kazuki, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
In this paper, we derive a novel multilinear relationship for close light sources and cameras. In this multilinear relationship,<br />
image intensities and image point coordinates can be handled in a single framework. We first derive a linear representation<br />
of image intensity taken under a general close light source. We next analyze multiple view geometry among close light<br />
sources and cameras, and derive novel multilinear constraints among image intensity and image coordinates. In particular,<br />
we study the details of the multilinear relationship among 7 lights and a camera. Finally, we show some experimental<br />
results, and show that the new multilinear relationship can be used for linearly generating images illuminated by arbitrary<br />
close light sources.<br />
09:40-10:00, Paper WeAT3.3<br />
Near-Regular BTF Texture Model<br />
Haindl, Michael, Inst. of Information Theory and Automation<br />
Hatka, Martin, Inst. of Information Theory and Automation<br />
In this paper we present a method for seamless enlargement and editing of an intricate near-regular type of bidirectional<br />
texture function (BTF), which simultaneously contains both regular periodic and stochastic components. Such BTF textures<br />
cannot be convincingly synthesised using either simple tiling or purely stochastic models. However, these textures<br />
are ubiquitous in many man-made environments and also in some natural scenes, so they are required for realistic<br />
appearance visualisation. The principle of the presented BTF-NR synthesis and editing method is to automatically separate<br />
the periodic and random components from one or more input textures. Each of these components is subsequently independently<br />
modelled using its corresponding optimal method. The regular texture part is modelled using our roller method, while the<br />
random part is synthesised from its estimated, exceptionally efficient Markov random field based representation. Both independently<br />
enlarged texture components from the original measured textures, representing one (enlargement) or several<br />
(editing) materials, are combined in the resulting synthetic near-regular texture.<br />
10:00-10:20, Paper WeAT3.4<br />
Detecting Vorticity in Optical Flows of Fluids<br />
Doshi, Ashish, Univ. of Surrey<br />
Bors, Adrian, Univ. of York<br />
In this paper we apply the diffusion framework to dense optical flow estimation. Local image information is represented<br />
by matrices of gradients between paired locations. Diffusion distances are modelled as sums of eigenvectors weighted by<br />
their eigenvalues, extracted following the eigendecomposition of these matrices. Local optical flow is estimated by correlating<br />
diffusion distances characterizing features from different frames. A feature confidence factor is defined based on<br />
the local correlation efficiency when compared to that of its neighbourhood. High confidence optical flow estimates are<br />
propagated to areas of lower confidence.<br />
10:20-10:40, Paper WeAT3.5<br />
Modeling Facial Skin Motion Properties in Video and its Application to Matching Faces across Expressions<br />
Manohar, Vasant, Raytheon BBN Tech.<br />
Shreve, Matthew, Univ. of South Florida<br />
Goldgof, Dmitry, Univ. of South Florida<br />
Sarkar, Sudeep, Univ. of South Florida<br />
In this paper, we propose a method to model the material constants (Young’s modulus) of the skin in subregions of the<br />
face from the motion observed in multiple facial expressions and present its relevance to an image analysis task such as<br />
face verification. On a public database consisting of 40 subjects undergoing a set of facial motions associated with<br />
anger, disgust, fear, happiness, sadness, and surprise expressions, we present an expression-invariant strategy for matching faces<br />
using the Young’s modulus of the skin. Results show that it is indeed possible to match faces across expressions using the<br />
material properties of their skin.<br />
WeAT4 Topkapı Hall A<br />
Kernel Methods Regular Session<br />
Session chair: Aksoy, Selim (Bilkent Univ.)<br />
09:00-09:20, Paper WeAT4.1<br />
AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach<br />
Zhang, Ziming, Simon Fraser Univ.<br />
Li, Ze-Nian, Simon Fraser Univ.<br />
Drew, Mark S.<br />
In this paper, we propose a novel large-margin based approach for multiple kernel learning (MKL) using biconvex optimization,<br />
called Adaptive Multiple Kernel Learning (AdaMKL). To learn the weights for support vectors and the kernel<br />
coefficients, AdaMKL minimizes the objective function alternately by learning one component while fixing the other at a<br />
time, and in this way only one convex formulation needs to be solved. We also propose a family of biconvex objective<br />
functions with an arbitrary Lp-norm (p>=1) of kernel coefficients. As our experiments show, AdaMKL performs comparably<br />
with state-of-the-art convex optimization based MKL approaches, but its learning is much simpler and faster.<br />
09:20-09:40, Paper WeAT4.2<br />
Von Mises-Fisher Mean Shift for Clustering on a Hypersphere<br />
Kobayashi, Takumi, Nat. Inst. of Advanced Industrial Science<br />
Otsu, Nobuyuki, Nat. Inst. of Advanced Industrial Science<br />
We propose a method of clustering sample vectors on a hypersphere. Sample vectors are normalized in many cases, especially<br />
when applying kernel functions, and thus lie on a (unit) hypersphere. Considering the constraint of the hypersphere,<br />
the proposed method utilizes the von Mises-Fisher distribution in the framework of mean shift. It is also extended to the<br />
kernel-based clustering method via the kernel trick to cope with complex distributions. The algorithms of the proposed methods<br />
are based on simple matrix calculations. In the experiments, including a practical motion clustering task, the proposed<br />
methods produce favorable clustering results.<br />
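The core update the abstract describes, mean shift constrained to the unit hypersphere via the von Mises-Fisher distribution, can be sketched as follows. This is a minimal sketch: the concentration parameter kappa, the use of all samples as kernel support, and the convergence criterion are assumptions, and the kernelized extension is omitted.

```python
import numpy as np

def vmf_mean_shift(X, kappa=10.0, n_iter=50, tol=1e-8):
    """Mean shift on the unit hypersphere with von Mises-Fisher weights.

    X : (n, d) array of unit-norm sample vectors.
    Each mode estimate is shifted toward the kappa-weighted mean of all
    samples and re-projected onto the sphere until convergence.
    """
    M = X.copy()
    for _ in range(n_iter):
        # vMF kernel weights: exp(kappa * cosine similarity)
        W = np.exp(kappa * (M @ X.T))                   # (n, n)
        M_new = W @ X                                   # weighted means
        M_new /= np.linalg.norm(M_new, axis=1, keepdims=True)  # back onto sphere
        if np.max(np.abs(M_new - M)) < tol:
            return M_new
        M = M_new
    return M
```

Samples whose mode estimates converge to the same point on the sphere are assigned to the same cluster, mirroring ordinary Euclidean mean-shift clustering.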
09:40-10:00, Paper WeAT4.3<br />
Nonlinear Mappings for Generative Kernels on Latent Variable Models<br />
Carli, Anna, Univ. of Verona<br />
Bicego, Manuele, Univ. of Verona<br />
Baldo, Sisto, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
Generative kernels have emerged in recent years as an effective method for mixing discriminative and generative approaches.<br />
In particular, in this paper, we focus on kernels defined on generative models with latent variables (e.g. the states<br />
in a Hidden Markov Model). The basic idea underlying these kernels is to compare objects, via an inner product, in a feature<br />
space whose dimensions are related to the latent variables of the model. Here we propose to enhance these kernels via<br />
a nonlinear normalization of the space, namely a nonlinear mapping of the space dimensions able to exploit their discriminative<br />
characteristics. In this paper we investigate three possible nonlinear mappings for two HMM-based generative kernels,<br />
testing them on different sequence classification problems, with very promising results.<br />
10:00-10:20, Paper WeAT4.4<br />
Multiple Kernel Learning with High Order Kernels<br />
Wang, Shuhui, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Previous Multiple Kernel Learning (MKL) approaches employ different kernels through their linear combination. Though some<br />
improvements have been achieved over methods using a single kernel, the advantages of employing multiple kernels for<br />
machine learning are far from fully developed. In this paper, we propose to use high-order kernels to enhance the<br />
learning of MKL when a set of original kernels is given. High-order kernels are generated as products of real powers<br />
of the original kernels. We incorporate the original kernels and high-order kernels into a unified localized kernel logistic<br />
regression model. To avoid over-fitting, we apply group LASSO regularization to the kernel coefficients of each training<br />
sample. Experiments on image classification show that our approach outperforms many existing MKL approaches.<br />
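The construction of high-order kernels as products of real powers of the base kernels can be sketched as below. The exponent tuples are illustrative assumptions, and the paper's localized kernel logistic regression and group LASSO stages are omitted.

```python
import numpy as np

def high_order_kernels(base_grams, orders):
    """Generate high-order Gram matrices as element-wise products of
    real powers of base kernel Gram matrices.

    base_grams : list of (n, n) Gram matrices with non-negative entries
                 (e.g. RBF kernels), one per base kernel.
    orders     : list of exponent tuples, one exponent per base kernel,
                 e.g. (1.0, 0.5) -> K1 * K2**0.5 (element-wise).
    """
    out = []
    for exps in orders:
        K = np.ones_like(base_grams[0])
        for G, p in zip(base_grams, exps):
            K = K * np.power(G, p)   # element-wise product of powers
        out.append(K)
    return out
```

The resulting matrices can then be appended to the original kernel set before the coefficients are learned.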
10:20-10:40, Paper WeAT4.5<br />
Kernel-Based Implicit Regularization of Structured Objects<br />
Dupé, François-Xavier, GREYC<br />
Bougleux, Sébastien, Univ. de Caen<br />
Brun, Luc, ENSICAEN<br />
Lezoray, Olivier, Univ. de Caen<br />
Elmoataz, Abderrahim, Univ. de Caen<br />
Weighted graph regularization provides a rich framework for regularizing functions defined over the vertices of a weighted<br />
graph. Until now, such a framework has been defined only for real or multivalued functions, thereby restricting the regularization framework<br />
to numerical data. On the other hand, several kernels have been defined on structured objects such as strings or graphs. Using positive<br />
definite kernels, each original object is associated by the "kernel trick" with one element of a Hilbert space. Consequently, this<br />
paper proposes to extend the weighted graph regularization framework to objects implicitly defined by their kernel, thereby performing<br />
the regularization within the Hilbert space associated with the kernel. This work opens the door to the regularization of structured objects.<br />
WeAT5 Topkapı Hall B<br />
Face Analysis Regular Session<br />
Session chair: Lovell, Brian Carrington (The Univ. of Queensland)<br />
09:00-09:20, Paper WeAT5.1<br />
Face Sketch Synthesis via Sparse Representation<br />
Chang, Liang, Beijing Normal Univ.<br />
Zhou, Mingquan, Beijing Normal Univ.<br />
Han, Yanjun, Chinese Acad. of Sciences<br />
Deng, Xiaoming, Chinese Acad. of Sciences<br />
Face sketch synthesis from a photo is challenging because the psychological mechanism of sketch generation is difficult<br />
to express precisely with rules. Current learning-based sketch synthesis methods concentrate on learning the rules by<br />
optimizing cost functions over low-level image features. In this paper, a new face sketch synthesis method is presented,<br />
inspired by recent advances in sparse signal representation and by evidence from neuroscience that the human brain probably perceives<br />
images using high-level features which are sparse. Sparse representations are desirable in sketch synthesis because sparseness<br />
adaptively selects the most relevant samples, which give the best representations of the input photo. We assume that<br />
a face photo patch and its corresponding sketch patch follow the same sparse representation. In the feature extraction,<br />
we select succinct high-level features using the sparse coding technique, and in the sketch synthesis process each sketch<br />
patch is synthesized with respect to the high-level features by solving an $l_1$-norm optimization. Experiments on<br />
the CUHK database show that our synthesized sketches resemble the true sketches fairly well.<br />
09:20-09:40, Paper WeAT5.2<br />
Restoration of a Frontal Illuminated Face Image based on KPCA<br />
Xie, Xiaohua, Sun Yat-sen Univ.<br />
Zheng, Wei-Shi, Queen Mary Univ. of London<br />
Lai, Jian-Huang, Sun Yat-sen Univ.<br />
Suen, Ching Y.<br />
In this paper, we propose a novel illumination-normalization method. By combining Kernel Principal<br />
Component Analysis (KPCA) with the pre-image technique, this method can restore a frontal-illuminated face image from<br />
a single non-frontal-illuminated face image. In this method, a frontal-illumination subspace is first learned by KPCA. For<br />
each input face image, we project its large-scale features, which are affected by illumination variations, onto this subspace<br />
to normalize the illumination. Then the frontal-illuminated face image is reconstructed by combining the small-scale and the<br />
normalized large-scale features. Unlike most existing techniques, the proposed method does not require any shape modeling<br />
or lighting estimation. As a holistic reconstruction, the KPCA + pre-image technique incurs less local distortion. Compared<br />
to directly applying the KPCA + pre-image technique to the original image, our proposed method is better at<br />
processing images of faces outside the training set. Experiments on the CMU-PIE and Extended Yale B face databases<br />
show that the proposed method outperforms state-of-the-art algorithms.<br />
09:40-10:00, Paper WeAT5.3<br />
A Bayesian Approach to Face Hallucination using DLPP and KRR<br />
Tanveer, Muhammad, National Univ. of Science and Tech.<br />
Rao, Naveed Iqbal, National Univ. of Sciences and Tech.<br />
Low-resolution faces are a major barrier to efficient face recognition and identification in several applications, primarily<br />
surveillance systems. To mitigate this problem, we propose a novel learning-based two-step approach using Direct<br />
Locality Preserving Projections (DLPP), Maximum a Posteriori (MAP) estimation, and Kernel Ridge Regression (KRR)<br />
for super-resolution of face images, i.e., face hallucination. First, using DLPP for manifold learning and<br />
MAP estimation, a smooth global high-resolution image is obtained. In the second step, to introduce high-frequency components,<br />
KRR is used to model the residue high-resolution image, which is then added to the global image to obtain the final<br />
high-quality, detailed hallucinated face image. Experimental results show that the proposed system is robust and<br />
efficient in synthesizing, from low-resolution faces, images similar to the original high-resolution faces.<br />
10:00-10:20, Paper WeAT5.4<br />
Face Hallucination under an Image Decomposition Perspective<br />
Liang, Yan, Sun Yat-sen Univ.<br />
Lai, Jian-Huang, Sun Yat-sen Univ.<br />
Xie, Xiaohua, Sun Yat-sen Univ.<br />
Liu, Wanquan, Curtin Univ. of Tech.<br />
In this paper we propose to convert the task of face hallucination into an image decomposition problem, and then use the<br />
morphological component analysis (MCA) for hallucinating a single face image, based on a novel three-step framework.<br />
Firstly, a low-resolution input image is up-sampled by interpolation. Then, the MCA is employed to decompose the interpolated<br />
image into a high-resolution image and an unsharp mask, as MCA can properly decompose a signal into special<br />
parts according to typical dictionaries. Finally, a residue compensation, which is based on the neighbor reconstruction of<br />
patches, is performed to enhance the facial details. The proposed method can effectively exploit the facial properties for<br />
face hallucination under the image decomposition perspective. Experimental results demonstrate the effectiveness of our<br />
method, in terms of the visual quality of the hallucinated face images.<br />
10:20-10:40, Paper WeAT5.5<br />
Gender Classification using Local Directional Pattern (LDP)<br />
Jabid, Taskeed, Kyung Hee Univ.<br />
Kabir, Md. Hasanul, Kyung Hee Univ.<br />
Chae, Oksam, Kyung Hee Univ.<br />
In this paper, we present a novel texture descriptor, the Local Directional Pattern (LDP), to represent facial images for gender<br />
classification. The face area is divided into small regions, from which LDP histograms are extracted and concatenated<br />
into a single vector to efficiently represent the face image. The classification is performed using support vector machines<br />
(SVMs), which have been shown to be superior to traditional pattern classifiers for the gender classification problem. Experimental<br />
results on images collected from the FERET face database show the superiority of the proposed method, which<br />
achieves 95.05% accuracy.<br />
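As a rough illustration of the descriptor pipeline (per-pixel LDP codes, region-wise histograms, concatenation into one face vector), the sketch below assumes the common Kirsch-mask formulation of LDP with the three strongest directions and a 4x4 region grid; these choices and the helper names are assumptions, and the SVM stage is omitted.

```python
import numpy as np

def kirsch_masks():
    """The 8 Kirsch compass masks, obtained by rotating the ring of
    coefficients around the centre of the 3x3 base mask."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = np.array([-3, -3, 5, 5, 5, -3, -3, -3], dtype=float)
    masks = []
    for k in range(8):
        m = np.zeros((3, 3))
        for (i, j), v in zip(ring, np.roll(vals, k)):
            m[i, j] = v
        masks.append(m)
    return masks

def filter3(img, m):
    """Valid-mode 3x3 filter response via shifted slices (no SciPy)."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(3):
        for j in range(3):
            out += m[i, j] * img[i:i + H - 2, j:j + W - 2]
    return out

def ldp_codes(img, k=3):
    """Per-pixel LDP code: one bit for each of the k strongest absolute
    directional responses among the 8 Kirsch masks."""
    resp = np.stack([np.abs(filter3(img, m)) for m in kirsch_masks()])
    thresh = np.sort(resp, axis=0)[-k]          # k-th largest per pixel
    bits = (resp >= thresh).astype(int)
    return (bits * (1 << np.arange(8))[:, None, None]).sum(axis=0)

def face_descriptor(img, grid=(4, 4)):
    """Concatenate normalised 256-bin LDP histograms over a region grid."""
    codes = ldp_codes(img)
    h, w = codes.shape
    feats = []
    for gi in range(grid[0]):
        for gj in range(grid[1]):
            block = codes[gi * h // grid[0]:(gi + 1) * h // grid[0],
                          gj * w // grid[1]:(gj + 1) * w // grid[1]]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            feats.append(hist / block.size)
    return np.concatenate(feats)
```

The concatenated vector would then be fed to an SVM trained on labelled male/female faces.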
WeAT6 Anadolu Auditorium<br />
Document Analysis - I Regular Session<br />
Session chair: Baird, Henry (Lehigh Univ.)<br />
09:00-09:20, Paper WeAT6.1<br />
Generating Sets of Classifiers for the Evaluation of Multi-Expert Systems<br />
Impedovo, Donato, Pol. di Bari<br />
Pirlo, Giuseppe, Univ. degli Studi di Bari<br />
This paper addresses the problem of multi-classifier system evaluation by artificially generated classifiers. For this purpose,<br />
a new technique is presented for the generation of sets of artificial abstract-level classifiers with different characteristics<br />
at the individual-level (i.e. recognition performance) and at the collective-level (i.e. degree of similarity). The technique<br />
has been used to generate sets of classifiers simulating different working conditions in which the performance of combination<br />
methods can be estimated. The experimental tests demonstrate the effectiveness of the approach in generating simulated<br />
data useful to investigate the performance of combination methods for abstract-level classifiers.<br />
09:20-09:40, Paper WeAT6.2<br />
Imbalance and Concentration in K-NN Classification<br />
Yin, Dawei, Lehigh Univ.<br />
An, Chang, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
We propose algorithms for ameliorating difficulties in fast approximate k Nearest Neighbors (kNN) classifiers that arise<br />
from imbalances among classes in numbers of samples, and from concentrations of samples in small regions of feature<br />
space. These problems can occur with a wide range of binning kNN algorithms such as k-D trees and our variant, hashed<br />
k-D trees. The principal method we discuss automatically rebalances training data and estimates concentration in each k-D<br />
hash bin separately, which then controls how many samples should be kept in each bin. We report an experiment on<br />
86.7M training samples which shows a 7-times speedup and higher minimum per-class recall, compared to previously reported<br />
methods. The context of these experiments is the need for image classifiers able to handle an unbounded variety of<br />
inputs: in our case, highly versatile document classifiers which require training sets as large as a billion training samples.<br />
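The per-bin rebalancing idea can be illustrated with a minimal sketch. The cap-per-class rule below is an assumption, since the paper's exact rebalancing and concentration-estimation procedures are not detailed in the abstract.

```python
from collections import defaultdict

def rebalance_bin(samples, labels, cap):
    """Keep at most `cap` training samples per class within one hash bin.

    A minimal sketch of per-bin rebalancing: classes over-represented in
    the bin are truncated so no single class can dominate the
    nearest-neighbour votes drawn from that bin.
    """
    kept_samples, kept_labels = [], []
    counts = defaultdict(int)
    for s, l in zip(samples, labels):
        if counts[l] < cap:
            kept_samples.append(s)
            kept_labels.append(l)
            counts[l] += 1
    return kept_samples, kept_labels
```

In a full system the cap itself would vary per bin, driven by the estimated concentration of samples in that region of feature space.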
09:40-10:00, Paper WeAT6.3<br />
Gaussian Mixture Models for Arabic Font Recognition<br />
Slimane, Fouad, Univ. of Fribourg<br />
Kanoun, Slim, ENIS<br />
Alimi, Adel M., Univ. of Sfax<br />
Ingold, Rolf, Univ. of Fribourg<br />
Hennebert, Jean, Univ. of Applied Sciences<br />
We present in this paper a new approach for Arabic font recognition. Our proposal is to use a fixed-length sliding window<br />
for the feature extraction and to model feature distributions with Gaussian Mixture Models (GMMs). This approach presents<br />
a double advantage. First, we do not need to perform a priori segmentation into characters, which is a difficult task<br />
for Arabic text. Second, we use versatile and powerful GMMs able to finely model distributions of features in large multidimensional<br />
input spaces. We report on the evaluation of our system on the APTI (Arabic Printed Text Image) database<br />
using 10 different fonts and 10 font sizes. Considering the variability of the different font shapes and the fact that our<br />
system is independent of the font size, the obtained results are convincing and compare well with competing systems.<br />
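The modeling stage (one GMM per font over sliding-window feature vectors, recognition by maximum total log-likelihood) might look like the following sketch using scikit-learn; the diagonal covariances, component count, and synthetic feature dimensions are assumptions of this sketch, not details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_font_models(features_per_font, n_components=4):
    """Fit one GMM per font on its sliding-window feature vectors.

    features_per_font : dict mapping font name -> (n, d) feature array.
    """
    models = {}
    for font, X in features_per_font.items():
        models[font] = GaussianMixture(
            n_components=n_components, covariance_type="diag",
            random_state=0).fit(X)
    return models

def recognize_font(models, X):
    """Return the font whose GMM gives the highest total log-likelihood
    over the window features X extracted from one text-line image."""
    return max(models, key=lambda f: models[f].score_samples(X).sum())
```

Because the score is accumulated over all windows of a line, no character segmentation is required, which is the main point the abstract makes for Arabic text.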
10:00-10:20, Paper WeAT6.4<br />
Transfer of Supervision for Improved Address Standardization<br />
Kothari, Govind, IBM<br />
Faruquie, Tanveer, IBM Res. India<br />
Subramaniam, L. Venkata, IBM Res. India<br />
K, Hima Prasad, IBM Res. India<br />
Mohania, Mukesh, IBM Res. India<br />
Address cleansing is very challenging, particularly for geographies with high variability in how addresses are written. Supervised learners<br />
can be easily trained for different data sources. However, training requires labeled corpora for each data source,<br />
which are time-consuming and labor-intensive to create. We propose a method to automatically transfer supervision from a<br />
given labeled source to an unlabeled target source using a hierarchical Dirichlet process. Each Dirichlet process models data<br />
from one source. The component distribution shared across these Dirichlet processes captures the semantic relation between<br />
data sources. A feature projection on the component distributions from multiple sources is used to transfer supervision.<br />
10:20-10:40, Paper WeAT6.5<br />
Bag of Characters and SOM Clustering for Script Recognition and Writer Identification<br />
Marinai, Simone, Univ. of Florence<br />
Miotti, Beatrice, Univ. of Florence<br />
Soda, Giovanni, Univ. di Firenze<br />
In this paper, we describe a general approach for script (and language) recognition in printed documents and for writer<br />
identification in handwritten documents. The method is based on a bag-of-visual-words strategy where the visual words<br />
correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words<br />
in the case of script recognition) are classified by comparing their vectorial representations with those of a training set<br />
using cosine similarity. The comparison is improved using a similarity score that takes into account the<br />
SOM organization of the cluster centroids. Promising results are presented for both printed documents and handwritten<br />
musical scores.<br />
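The baseline classification step (cosine similarity between bag-of-characters vectors) can be sketched as below; the SOM-based refinement of the similarity score is omitted, and the toy vectors are illustrative.

```python
import numpy as np

def cosine_classify(query, train_vecs, train_labels):
    """Assign the label of the most cosine-similar training vector.

    query       : (d,) bag-of-visual-words count vector of an unknown page.
    train_vecs  : (n, d) training vectors; train_labels : n labels.
    """
    q = query / (np.linalg.norm(query) + 1e-12)
    T = train_vecs / (np.linalg.norm(train_vecs, axis=1, keepdims=True) + 1e-12)
    return train_labels[int(np.argmax(T @ q))]
```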
WeAT7 Dolmabahçe Hall C<br />
Gait and Gesture Regular Session<br />
Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />
09:00-09:20, Paper WeAT7.1<br />
Multi-View Gait Recognition based on Motion Regression using Multilayer Perceptron<br />
Kusakunniran, Worapan, Univ. of New South Wales<br />
Wu, Qiang, Univ. of Tech. Sydney<br />
Zhang, Jian, National ICT Australia<br />
Li, Hongdong, Australian National Univ.<br />
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, it is challenging<br />
to obtain reliable gait features when the viewing angle changes, because the body appearance can differ under<br />
the various viewing angles. In this paper, the above problem is formulated as a regression problem in which a novel View<br />
Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates<br />
the gait feature under an unknown viewing angle based on motion information in a well-selected Region of Interest<br />
(ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles<br />
into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have<br />
been obtained on a widely adopted benchmark database.<br />
09:20-09:40, Paper WeAT7.2<br />
Robust Gait Recognition against Speed Variation<br />
Aqmar, Muhammad Rasyid, Tokyo Inst. of Tech.<br />
Shinoda, Koichi, Tokyo Inst. of Tech.<br />
Furui, Sadaoki<br />
Variations in walking speed have a strong impact on gait recognition. We propose a gait recognition method<br />
that is robust against walking-speed variations. It is based on a combination of Fisher discriminant analysis (FDA)-based<br />
cubic higher-order local auto-correlation (CHLAC) and the statistical framework provided by hidden Markov models<br />
(HMMs). The HMMs in this method identify the phase of each gait even when walking speed changes nonlinearly, and<br />
the CHLAC features capture the within-phase spatio-temporal characteristics of each individual. We compared the performance<br />
of our method with other conventional methods in an evaluation using three different databases, i.e., USH,<br />
USF-NIST, and Tokyo Tech DB. Ours was equal to or better than the others when the speed did not change too much, and<br />
was significantly better when the speed varied across and within a gait sequence.<br />
09:40-10:00, Paper WeAT7.3<br />
Gait Recognition using Period-Based Phase Synchronization for Low Frame-Rate Videos<br />
Mori, Atsushi, Osaka Univ.<br />
Makihara, Yasushi, The Inst. of Scientific and Industrial Res., Osaka Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
This paper proposes a method for period-based gait trajectory matching in the eigenspace using phase synchronization for<br />
low frame-rate videos. First, a gait period is detected by maximizing the normalized autocorrelation of the gait silhouette<br />
sequence for the temporal axis. Next, a gait silhouette sequence is expressed as a trajectory in the eigenspace and the gait<br />
phase is synchronized by time stretching and time shifting of the trajectory based on the detected period. In addition, multiple<br />
period-based matching results are integrated via statistical procedures for more robust matching in the presence of<br />
fluctuations among gait sequences. Results of experiments conducted with 185 subjects to evaluate the performance of<br />
gait verification at various spatial and temporal resolutions demonstrate the effectiveness of the proposed method.<br />
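The first step the abstract describes (gait period detection by maximizing normalized autocorrelation along the temporal axis) can be sketched as follows; the mean removal and the period search range are assumptions of this sketch.

```python
import numpy as np

def detect_gait_period(seq, min_period=2, max_period=None):
    """Detect the gait period of a silhouette sequence by maximising the
    normalised autocorrelation along the temporal axis.

    seq : (T, ...) array, one (flattened or 2-D) silhouette per frame.
    """
    T = seq.shape[0]
    X = seq.reshape(T, -1).astype(float)
    X -= X.mean(axis=0)                      # remove the static component
    max_period = max_period or T // 2
    best_p, best_r = min_period, -np.inf
    for p in range(min_period, max_period + 1):
        a, b = X[:-p].ravel(), X[p:].ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        r = (a @ b) / denom if denom > 0 else 0.0
        if r > best_r:                       # normalised autocorrelation
            best_p, best_r = p, r
    return best_p
```

The detected period would then drive the time stretching and shifting that synchronizes trajectory phase before matching.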
10:00-10:20, Paper WeAT7.4<br />
Body Motion Analysis for Multi-Modal Identity Verification<br />
Williams, George, NYU<br />
Taylor, Graham, NYU<br />
Smolskiy, Kirill, NYU<br />
Bregler, Christoph, NYU<br />
This paper shows how Body Motion Signature Analysis, a new soft-biometrics technique, can be used for identity verification.<br />
It extracts motion features from the upper body of people and estimates so-called super-features for input<br />
to a classifier. We demonstrate how this new technique can be used to identify people based on their motion alone, or to<br />
significantly improve hard-biometrics techniques. For example, face verification on this domain achieves a 6.45%<br />
Equal Error Rate (EER), and the combined verification performance of motion features and face reduces the error to 4.96%<br />
using an adaptive score-level integration method. The more ambiguous motion-only performance is 17.1% EER.<br />
10:20-10:40, Paper WeAT7.5<br />
Robust Sign Language Recognition with Hierarchical Conditional Random Fields<br />
Yang, Hee-Deok, Chosun Univ.<br />
Lee, Seong-Whan, Korea Univ.<br />
Sign language spotting is the task of detection and recognition of signs (words in the predefined vocabulary) and fingerspellings<br />
(a combination of continuous alphabets that are not found in signs) in a signed utterance. The internal structures<br />
of signs and fingerspellings differ significantly. Therefore, it is difficult to spot signs and fingerspellings simultaneously.<br />
In this paper, a novel method for spotting signs and fingerspellings is proposed, which can distinguish signs, fingerspellings,<br />
and nonsign patterns. This is achieved through a hierarchical framework consisting of three steps: (1) Candidate segments<br />
of signs and fingerspellings are discriminated with a two-layer conditional random field (CRF). (2) Hand shapes of detected<br />
signs and fingerspellings are verified by BoostMap embeddings. (3) The motions of fingerspellings are verified in order<br />
to distinguish those which have similar hand shapes and differ only in hand trajectories. Experiments demonstrate that the<br />
proposed method can spot signs and fingerspellings from utterance data at rates of 83% and 78%, respectively.<br />
WeAT8 Upper Foyer<br />
Image and Video Processing Poster Session<br />
Session chair: Koch, Reinhard (Univ. of Kiel)<br />
09:00-11:10, Paper WeAT8.1<br />
Compressive Sampling Recovery for Natural Images<br />
Shang, Fei, Beijing Inst. of Tech.<br />
Du, Huiqian, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
Compressive sampling (CS) is a novel data collection and coding theory which allows us to recover sparse or compressible<br />
signals from a small set of measurements. This paper presents a new model for natural image recovery, in which the smooth<br />
l0 norm and the approximate total-variation (TV) norm are adopted simultaneously. By using first-order gradient descent,<br />
the speed of the algorithm for this new model can be guaranteed. Experimental results demonstrate that the principle of the<br />
model is correct and the performance is as good as that based on TV model. The computing speed of the proposed method<br />
is two orders of magnitude faster than that of interior point method and two times faster than that of the Nesta optimization<br />
based on TV model.<br />
09:00-11:10, Paper WeAT8.3<br />
De-Ghosting for Image Stitching with Automatic Content-Awareness<br />
Tang, Yu, The Univ. of Aizu<br />
Shin, Jungpil, The Univ. of Aizu<br />
Ghosting artifacts are a common problem in image stitching, and eliminating them is not an easy task. In this<br />
paper, we propose an intuitive technique that computes a stitching line from a novel energy map, which is essentially a<br />
combination of a gradient map, indicating the presence of structures, and a prominence map, determining the attractiveness<br />
of a region. We consider a region significant only if it is both structural and attractive. Using this improved<br />
energy map, the stitching line can easily skirt around moving objects or salient parts, based on the philosophy that<br />
human eyes mostly notice only the salient features of an image. We compare the results of our method to those of 4 state-of-the-art<br />
image stitching methods, and it turns out that our method outperforms all 4 in removing ghosting artifacts.<br />
09:00-11:10, Paper WeAT8.4<br />
Content-Adaptive Automatic Image Sharpening<br />
Kobayashi, Tatsuya, Nagoya City Univ.<br />
Tajima, Johji, Nagoya City Univ.<br />
Optimal sharpness differs from image to image, depending on the content. In general, human observers prefer images of<br />
artificial objects to be sharper and those of natural objects to be less sharp. We have developed a content-adaptive automatic image sharpening<br />
algorithm that relies on the length of lines extracted from the image. It is applicable to images with various regions,<br />
such as those containing natural and artificial objects. The proposed algorithm is expected to be used in the image processing<br />
modules of image input/output devices, e.g. digital cameras, printers, etc.<br />
09:00-11:10, Paper WeAT8.5<br />
Irradiance Preserving Image Interpolation<br />
Giachetti, Andrea, Univ. di Verona<br />
In this paper we present a new image upscaling (single-image super-resolution) algorithm. It is based on the refinement<br />
of a simple pixel decimation followed by an optimization step that maximizes the smoothness of the second-order derivatives<br />
of the image intensity while keeping the sum of the brightness values of each subdivided pixel (i.e. the estimated irradiance<br />
over the area) constant. The method is physically grounded and creates images that appear very sharp with reduced artifacts.<br />
Subjective and objective tests demonstrate the high quality of the results obtained.<br />
09:00-11:10, Paper WeAT8.7<br />
Interpolation and Sampling on a Honeycomb Lattice<br />
Strand, Robin, Uppsala Univ.<br />
In this paper, we focus on the three-dimensional honeycomb point-lattice in which the Voronoi regions are hexagonal<br />
prisms. The ideal interpolation function is derived by using a Fourier transform of the sampling lattice. From these results,<br />
the sampling efficiency of the lattice follows.<br />
09:00-11:10, Paper WeAT8.8<br />
Optimization of Topological Active Models with Multiobjective Evolutionary Algorithms<br />
Novo Buján, Jorge, Varpa group, Univ. of A Coruña<br />
Santos, Jose, Univ. of A Coruña<br />
Gonzalez Penedo, Manuel Francisco, Univ. of A Coruña<br />
Fernández Arias, Alba, VARPA Group, Univ. of A Coruña<br />
In this work we use the evolutionary multiobjective methodology for the optimization of topological active models, a deformable<br />
model that integrates features of region-based and boundary-based segmentation techniques. The model deformation<br />
is controlled by energy functions that must be minimized. As in other deformable models, a correct segmentation<br />
is achieved through the optimization of the model, governed by energy parameters that must be experimentally tuned.<br />
Evolutionary multiobjective optimization gives a solution to this problem by considering the optimization of several objectives<br />
in parallel. Concretely, we use the SPEA2 algorithm, adapted to our application, to search for the Pareto-optimal<br />
individuals. The proposed method was tested on several representative images from different domains, yielding highly accurate<br />
results.<br />
09:00-11:10, Paper WeAT8.9<br />
Fast Super-Resolution using Weighted Median Filtering<br />
Nasonov, Andrey, Lomonosov Moscow State Univ.<br />
Krylov, Andrey S., Lomonosov Moscow State Univ.<br />
A non-iterative method of image super-resolution based on weighted median filtering with Gaussian weights is proposed.<br />
Visual tests and basic edge metrics were used to examine the method. It was shown that the weighted median filtering<br />
reduces the errors caused by inaccurate motion vectors.<br />
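As a rough illustration of the core operation (a sketch only; the exact weights and window handling in the paper may differ), a weighted median with Gaussian weights can be computed by sorting the window values and picking the one at which the cumulative weight reaches half the total:<br />

```python
import math

def gaussian_weights(size, sigma=1.0):
    """1-D Gaussian weights centred on the window midpoint."""
    c = size // 2
    return [math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for i in range(size)]

def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]
```

Unlike a plain mean, the weighted median discards isolated outliers such as samples displaced by inaccurate motion vectors.<br />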
09:00-11:10, Paper WeAT8.10<br />
Geodesic Thin Plate Splines for Image Segmentation<br />
Lombaert, Herve, Ec. Pol. de Montreal<br />
Cheriet, Farida, Ec. Pol. de Montreal<br />
Thin Plate Splines are often used in image registration to model deformations. The physical analogy involves a thin<br />
sheet of metal that is deformed and forced to pass through a set of control points. The Thin Plate Spline equation minimizes<br />
that thin plate bending energy. Rather than using Euclidean distances between control points for image deformation, we<br />
are using geodesic distances for image segmentation. Control points become seed points and force the thin plate to pass<br />
through given heights. Intuitively, the thin plate surface in the vicinity of a seed point within a region should have similar<br />
heights. The minimally bent thin plate actually gives a “confidence” map telling what the closest seed point is for every<br />
surface point. The Thin Plate Spline has a closed-form solution which is fast to compute and globally optimal. This method<br />
shows comparable results to the Graph Cuts method.<br />
09:00-11:10, Paper WeAT8.11<br />
Gestures and Lip Shape Integration for Cued Speech Recognition<br />
Heracleous, Panikos, Advanced Telecommunications Res. Inst. International<br />
Beautemps, Denis, Gipsa-Lab.<br />
Hagita, Norihiro, Advanced Telecommunications Res. Inst. International<br />
In this article, automatic recognition of Cued Speech in French based on hidden Markov models (HMMs) is presented.<br />
Cued Speech is a visual mode which uses hand shapes in different positions, in combination with the lip patterns of speech,<br />
to make all the sounds of spoken language clearly understandable to deaf and hearing-impaired people. The aim of Cued<br />
Speech is to overcome the problems of lipreading and thus enable deaf children and adults to understand full spoken language.<br />
In this study, the lip shape component is fused with the hand component using multistream HMM decision fusion to<br />
realize Cued Speech recognition, and continuous phoneme recognition experiments using data from a normal-hearing and<br />
a deaf cuer were conducted. In the case of the normal-hearing cuer, the obtained phoneme accuracy was 83.5%, and in the<br />
case of the deaf cuer 82.1%.<br />
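Multistream decision fusion of the kind described above can be sketched as a weighted combination of per-class log-likelihoods from the two streams (the stream weight and the toy scores below are illustrative, not values from the paper):<br />

```python
def fuse_streams(lip_loglik, hand_loglik, lip_weight=0.6):
    """Combine per-phoneme log-likelihoods from the lip and hand streams
    and pick the best-scoring phoneme (multistream decision fusion)."""
    hand_weight = 1.0 - lip_weight
    scores = {ph: lip_weight * lip_loglik[ph] + hand_weight * hand_loglik[ph]
              for ph in lip_loglik}
    return max(scores, key=scores.get)
```

Varying the stream weight trades off how much the decision relies on lip shape versus hand shape.<br />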
09:00-11:10, Paper WeAT8.12<br />
IFLT based Real-Time Framework for Image-Matching<br />
Janney, Pranam, Univ. of New South Wales<br />
Geers, Glenn, National ICT Australia<br />
In this paper we show that the features generated by the recently presented Invariant Features of Local Textures (IFLT)<br />
technique can be used in a SIFT-like framework to deliver real-time pointwise image matching with performance comparable<br />
to existing state-of-the-art image matching systems. The proposed framework also saves a considerable<br />
amount of computation time.<br />
09:00-11:10, Paper WeAT8.13<br />
High-Order Circular Derivative Pattern for Image Representation and Recognition<br />
Zhao, Sanqiang, Griffith Univ. / National ICT Australia<br />
Gao, Yongsheng, Griffith Univ.<br />
Caelli, Terry, National ICT Australia<br />
Micropattern based image representation and recognition, e.g. Local Binary Pattern (LBP), has proved successful<br />
over the past few years due to its advantages of illumination tolerance and computational efficiency. However, LBP only<br />
encodes the first-order radial-directional derivatives of spatial images and is inadequate to completely describe the discriminative<br />
features for classification. This paper proposes a new Circular Derivative Pattern (CDP) which extracts high-order<br />
derivative information of images along circular directions. We argue that the high-order circular derivatives contain<br />
more detailed and more discriminative information than the first-order LBP in terms of recognition accuracy. Experimental<br />
evaluation through face recognition on the FERET database and insect classification on the NICTA Biosecurity Dataset<br />
demonstrated the effectiveness of the proposed method.<br />
09:00-11:10, Paper WeAT8.14<br />
Automatic Face Replacement in Video based on 2D Morphable Model<br />
Min, Feng, WuHan Inst. of Tech.<br />
Sang, Nong, Huazhong Univ. of Science and Tech.<br />
Wang, Zhefu, Wuhan Inst. of Tech.<br />
This paper presents an automatic face replacement approach in video based on a 2D morphable model. Our approach includes<br />
three main modules: face alignment, face morphing, and face fusion. Given a source image and a target video, Active Shape<br />
Models (ASM) are applied to the source image and target frames for face alignment. Then the source face shape is warped to<br />
match the target face shape by a 2D morphable model. The color and lighting of the source face are adjusted to keep them consistent<br />
with those of the target face, and the source face is seamlessly blended into the target face. Our approach is fully automatic, without user intervention,<br />
and generates natural and realistic results.<br />
09:00-11:10, Paper WeAT8.15<br />
3D Deformable Surfaces with Locally Self-Adjusting Parameters – a Robust Method to Determine Cell Nucleus Shapes<br />
Keuper, Margret, Univ. of Freiburg<br />
Schmidt, Thorsten, Univ. of Freiburg<br />
Padeken, Jan, Max-Planck-Institute of Immunobiology<br />
Heun, Patrick, Max-Planck-Inst. of Immunobiology<br />
Palme, Klaus, Univ. of Freiburg<br />
Burkhardt, Hans, Univ. of Freiburg<br />
Ronneberger, Olaf, Univ. of Freiburg<br />
When using deformable models for the segmentation of biological data, the choice of the best weighting parameters for<br />
the internal and external forces is crucial. Especially when dealing with 3D fluorescence microscopic data and cells within<br />
dense tissue, object boundaries are sometimes not visible. In these cases, a single weighting parameter set for the whole contour<br />
is not desirable. We present a method for the dynamic adjustment of the weighting parameters that depends only<br />
on the underlying data and does not need any prior information. The method is especially apt to handle blurred, noisy, and<br />
deficient data, as is often the case in biological microscopy.<br />
09:00-11:10, Paper WeAT8.16<br />
Decomposition of Dynamic Textures using Morphological Component Analysis: A New Adaptative Strategy<br />
Dubois, Sloven, Univ. de La Rochelle<br />
Péteri, Renaud, Univ. of La Rochelle<br />
Ménard, Michel, Univ. de La Rochelle<br />
The research context of this work is dynamic texture analysis and characterization. Many dynamic textures can be modeled<br />
as a large scale propagating wave and local oscillating phenomena. The Morphological Component Analysis algorithm is<br />
used to retrieve these components using a well chosen dictionary. We define a new strategy for adaptive thresholding in<br />
the Morphological Component Analysis framework, which greatly reduces the computation time when applied on videos.<br />
Tests on synthetic and real image sequences illustrate the efficiency of the proposed method, and future prospects are<br />
finally outlined.<br />
09:00-11:10, Paper WeAT8.17<br />
Anisotropic Contour Completion for Cell Microinjection Targeting<br />
Becattini, Gabriele, Italian Inst. of Tech.<br />
Mattos, Leonardo, Italian Inst. of Tech.<br />
Caldwell, Darwin G., Italian Inst. of Tech.<br />
This paper shows a novel application of the diffusion tensor for anisotropic image processing. The designed system aims<br />
at spotting and localizing injection points on a population of adherent cells lying on a Petri dish. The overall procedure<br />
is described including pre-filtering, ridge enhancement, cell segmentation, shape analysis and injection point detection.<br />
The anisotropic contour completion (ACC) employed is equivalent to a dilation with a continuous elliptic structural element<br />
that takes into account the local orientation of the contours to be closed, preventing extension towards the normal direction.<br />
Experiments carried out on real images from an optical microscope revealed a remarkable reliability with up to 86% of<br />
cells in the field of view correctly segmented and targeted for microinjection.<br />
09:00-11:10, Paper WeAT8.18<br />
Active Contours with Thresholding Value for Image Segmentation<br />
Chen, Gang, Chinese Acad. of Sciences<br />
Zhang, Haiying, Chinese Acad. of Sciences<br />
Chen, Iron, Chinese Acad. of Sciences<br />
Yang, Wen, Wuhan Univ.<br />
In this paper, we propose an active contour with a threshold value to detect objects while discarding unimportant<br />
parts rather than extracting all information. The basic idea of our model is to introduce a weight matrix into region-based<br />
active contours, which can enhance the weight of the main parts while filtering out weak intensities such as shadows, illumination<br />
and so on. Moreover, the threshold value used to set the weight matrix can be chosen manually for accurate image segmentation.<br />
Thus, the proposed method can extract objects of interest in practice. Coupled partial differential equations are used to<br />
implement this method with level set algorithms. Experimental results show the advantages of our method in terms of accuracy<br />
for image segmentation.<br />
09:00-11:10, Paper WeAT8.19<br />
An Iterative Method for Superresolution of Optical Flow Derived by Energy Minimisation<br />
Mochizuki, Yoshihiko, Chiba Univ.<br />
Kameda, Yusuke, Chiba Univ.<br />
Imiya, Atsushi, IMIT, Chiba Univ.<br />
Sakai, Tomoya, Chiba Univ.<br />
Super-resolution is a technique to recover a high-resolution image from a low-resolution image. We develop a variational<br />
super-resolution method for subpixel-accurate optical flow computation. We combine<br />
variational super-resolution and variational optical flow computation to obtain super-resolution optical flow.<br />
09:00-11:10, Paper WeAT8.20<br />
Non-Rigid Image Registration for Historical Manuscript Restoration<br />
Wang, Jie, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
This paper presents a non-rigid registration method for the restoration of double-sided historical manuscripts. Firstly, the<br />
gradient direction maps of the two images of a manuscript are examined to identify candidate control points. Then the<br />
correspondences of these points are established by minimizing a dissimilarity measure consisting of intensity, gradient and<br />
displacement terms. To fully capture the spatial relationship between the two images, a mapping function is defined as the combination<br />
of a global affine and a local B-spline transformation. The cost function for optimization consists of two parts:<br />
normalized mutual information for the goal of similarity and space integral of the square of the second order derivatives<br />
for smoothness. To evaluate the proposed method, a wavelet based restoration procedure is applied to registered images.<br />
Real documents from the National Archives of Singapore are used for testing and the experimental results are impressive.<br />
09:00-11:10, Paper WeAT8.21<br />
An Effective Decentralized Nonparametric Quickest Detection Approach<br />
Yang, Dayu, Univ. of Tennessee<br />
Qi, Hairong, Univ. of Tennessee<br />
This paper studies decentralized quickest detection schemes that can be deployed in a sensing environment where data<br />
streams are simultaneously collected from multiple distributed channels to jointly support the detection. Existing<br />
decentralized detection approaches are largely parametric and require knowledge of the pre-change and post-change distributions.<br />
In this paper, we first present an effective nonparametric detection procedure based on a Q-Q distance measure.<br />
We then describe two implementation schemes, binary quickest detection and local decision fusion by majority voting,<br />
that realize decentralized nonparametric detection. Experimental results show that the proposed method has a comparable<br />
performance to the parametric CUSUM test in binary detection. Its decision fusion-based implementation also outperforms<br />
the other three popular fusion rules under the parametric framework.<br />
09:00-11:10, Paper WeAT8.22<br />
On the Design of a Class of Odd-Length Biorthogonal Wavelet Filter Banks for Signal and Image Processing<br />
Baradarani, Aryaz, Univ. of Windsor<br />
Mendapara, Pankajkumar, Univ. of Windsor<br />
Wu, Q. M. Jonathan, Univ. of Windsor<br />
In this paper, we introduce an approach to the design of odd-length biorthogonal wavelet filter banks based on semidefinite<br />
programming employing Bernstein polynomials. The method is systematic and renders a simple optimization problem,<br />
yet it offers wavelet filters ranging from maximally flat to maximal passband/stopband width. The odd-length biorthogonal<br />
filter pairs are then used in multi-focus imaging to obtain a fully-focused image from a set of registered semi-focused<br />
input images at varying focus employing the distance transform and exponentially decaying function on the subbands in<br />
wavelet domain. Various images are tested and experimental results compare favorably to recent results in literature.<br />
09:00-11:10, Paper WeAT8.23<br />
Implicit Feature-Based Alignment System for Radiotherapy<br />
Yamakoshi, Ryoichi, Mitsubishi Electric Corp.<br />
Hirasawa, Kousuke, Mitsubishi Electric Corp.<br />
Okuda, Haruhisa, Mitsubishi Electric Corp.<br />
Kage, Hiroshi, Mitsubishi Electric Corp.<br />
Sumi, Kazuhiko, Mitsubishi Electric Corp.<br />
Ivanov, Yuri, MERL, USA<br />
Sakamoto, Hidenobu, Mitsubishi Electric Corp.<br />
Yanou, Toshihiro, Hyogo Ion Beam Medical Center, Tokyo<br />
Suga, Daisaku, Hyogo Ion Beam Medical Center, Tokyo<br />
Murakami, Masao, Hyogo Ion Beam Medical Center, Tokyo<br />
In this paper we present a robust alignment algorithm for correcting the effects of out-of-plane rotation to be used for automatic<br />
alignment of the Computed Tomography (CT) volumes and the generally low quality fluoroscopic images for radiotherapy<br />
applications. Analyzing not only in-plane but also out-of-plane rotation effects on the Digitally Reconstructed<br />
Radiograph (DRR) images, we develop a simple alignment algorithm that extracts a set of implicit features from the DRR.<br />
Using these SIFT-based features, we align DRRs with the fluoroscopic images of the patient and evaluate the alignment<br />
accuracy. We compare our approach with traditional techniques based on gradient-based operators and show that our algorithm<br />
performs faster while in most cases delivering higher accuracy.<br />
09:00-11:10, Paper WeAT8.24<br />
3D Vertebrae Segmentation in CT Images with Random Noises<br />
Aslan, Melih Seref, Univ. of Louisville<br />
Ali, Asem, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
Arnold, Ben, Image Analysis, Inc<br />
Chen, Dongqing, Univ. of Louisville<br />
Ping, Xiang, Image Analysis, Inc.<br />
Exposure levels (X-ray tube amperage and peak kilovoltage) are associated with various noise levels and radiation doses.<br />
When higher exposure levels are applied, the CT images have a higher signal-to-noise ratio (SNR). However,<br />
the patient receives a higher radiation dose in this case. In this paper, we use our robust 3D framework to segment vertebral<br />
bodies (VBs) in clinical computed tomography (CT) images with different noise levels. A matched filter is employed<br />
to detect the VB region automatically. In the graph cuts method, a VB (object) and surrounding organs (background) are<br />
represented using gray-level distribution models which are approximated by a linear combination of Gaussians (LCG).<br />
The initial segmentation based on the LCG models is then iteratively refined using a Markov-Gibbs random field (MGRF)<br />
with analytically estimated potentials. Experiments on the data sets show that the proposed segmentation approach is more<br />
accurate and robust than other known alternatives.<br />
09:00-11:10, Paper WeAT8.25<br />
An Improved Method for Cirrhosis Detection using Liver’s Ultrasound Images<br />
Fujita, Yusuke, Yamaguchi Univ.<br />
Hamamoto, Yoshihiko, Yamaguchi Univ.<br />
Segawa, Makoto, Yamaguchi Univ.<br />
Terai, Shuji, Yamaguchi Univ.<br />
Sakaida, Isao, Yamaguchi Univ.<br />
This paper describes an improved method for cirrhosis detection in the liver using Gabor features from ultrasound images.<br />
There are three main contributions of our cirrhosis detection method. The first contribution of this method is to combine<br />
weak classifiers using the AdaBoost algorithm. The second one is to use an artificial dataset to avoid the problem of<br />
overfitting the limited training dataset. The third one is to apply voting classification with the use of multiple regions of interest<br />
(ROIs). Although the accuracy rate of a single classifier designed with only the original dataset was 56%, that of the proposed<br />
method was 80% in cross-validation.<br />
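The first contribution (combining weak classifiers with AdaBoost) reduces at prediction time to a weighted vote; a minimal sketch with toy stumps, not the paper's Gabor-feature classifiers:<br />

```python
def adaboost_predict(x, weighted_stumps):
    """AdaBoost prediction: sign of the alpha-weighted sum of weak
    classifier votes. Each stump maps a sample to +1 or -1."""
    score = sum(alpha * stump(x) for alpha, stump in weighted_stumps)
    return 1 if score >= 0 else -1
```

Each weight alpha would be learned during boosting from the stump's weighted training error; here the stumps and weights are hypothetical.<br />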
09:00-11:10, Paper WeAT8.26<br />
A Dual Pass Video Stabilization System using Iterative Motion Estimation and Adaptive Motion Smoothing<br />
Pan, Pan, Fujitsu R&D Center Co., Ltd.<br />
Minagawa, Akihiro, Fujitsu Lab. LTD<br />
Sun, Jun, Fujitsu R&D Center Co., LTD<br />
Hotta, Yoshinobu, Fujitsu Lab. LTD.<br />
Naoi, Satoshi, Fujitsu R&D Center Co., LTD<br />
In this paper, we propose a novel dual pass video stabilization system using iterative motion estimation and adaptive<br />
motion smoothing. In the first pass, the transformation matrix to stabilize each frame is computed. The global motion estimation<br />
is carried out by a novel iterative method. The intentional motion is estimated using adaptive window smoothing.<br />
Before the beginning of the second pass, we obtain the optimal trim size for a specific video based on the statistics of the<br />
transformation parameters. In the second pass, the stabilized video is composed according to the optimal trim size. Experimental<br />
results show the superior performance of the proposed method in comparison to other existing methods.<br />
09:00-11:10, Paper WeAT8.27<br />
A Modified Particle Swarm Optimization Applied in Image Registration<br />
Niazi, Muhammad Khalid Khan, Uppsala Univ.<br />
Nystrom, Ingela, Uppsala Univ.<br />
We report a modified version of the particle swarm optimization (PSO) algorithm and its application to image registration.<br />
The modified version benefits from both the Gaussian and the uniform distribution when updating the velocity equation<br />
in the PSO algorithm. Which of the two distributions is selected depends on the direction of the cognitive and social components<br />
in the velocity equation. This direction checking and selection of the appropriate distribution provide the particles<br />
with an ability to jump out of local minima. The registration results achieved by this new version prove its robustness<br />
and ability to find a global minimum.<br />
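For reference, the standard PSO velocity update that the paper modifies looks like the sketch below; in the paper's variant the uniform draws are replaced by Gaussian ones depending on the direction of the cognitive (pbest - x) and social (gbest - x) components (how the switch is made is our assumption, not a detail given in the abstract):<br />

```python
import random

def pso_velocity(v, x, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Standard PSO velocity update with uniform random factors r1, r2:
    inertia term + cognitive pull toward pbest + social pull toward gbest."""
    rng = rng or random.Random()
    r1, r2 = rng.random(), rng.random()
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

The inertia weight w and acceleration coefficients c1, c2 shown here are common textbook defaults.<br />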
09:00-11:10, Paper WeAT8.28<br />
Image Segmentation based on Adaptive Fuzzy-C-Means Clustering<br />
Ayech, Mohamed Walid, Pol. de Recherche Informatique du Centre<br />
El Kalti, Karim, Faculty of Science of Monastir Tunisia<br />
El Ayeb, Bechir, Pol. de Recherche Informatique du Centre<br />
The Fuzzy-C-Means (FCM) clustering method is widely used in image segmentation. However, the major drawback of<br />
this method is its sensitivity to noise. In this paper, we propose a variant of this method which aims at resolving this<br />
problem. Our approach is based on an adaptive distance which is calculated according to the spatial position of the pixel<br />
in the image. The obtained results show a significant performance improvement of our approach compared to the<br />
standard version of FCM, especially regarding robustness to noise and the accuracy of the edges between regions.<br />
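For context, the standard FCM membership update, which an adaptive-distance variant such as the one above would modify by replacing the distance term, can be sketched for scalar samples as:<br />

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard Fuzzy C-Means membership of a scalar sample `x` in each
    cluster centre, with fuzzifier m. An adaptive variant would replace
    abs(x - c) with a spatially weighted distance."""
    d = [abs(x - c) for c in centers]
    if any(di == 0.0 for di in d):          # sample sits exactly on a centre
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
            for i in range(len(centers))]
```

The memberships always sum to one, and m controls how fuzzy the cluster boundaries are.<br />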
09:00-11:10, Paper WeAT8.29<br />
Multi-Spectral Satellite Image Registration using Scale-Restricted SURF<br />
Teke, Mustafa, Middle East Tech. Univ.<br />
Temizel, Alptekin, Middle East Tech. Univ.<br />
Satellites generally have arrays of sensors having different resolution and wavelength parameters. For some applications,<br />
images acquired from different viewpoints and positions are required to be aligned. This alignment process could be<br />
achieved by matching the image features followed by image registration. In this paper registration of multispectral satellite<br />
images using Speeded Up Robust Features (SURF) method is examined. The performance of SURF for registration of<br />
high resolution satellite images captured at different bands is evaluated. The scale restriction (SR) method, which has recently<br />
been proposed for SIFT, is adapted to SURF to improve multispectral image registration performance. Matching performance<br />
between different bands using SURF, U-SURF, SURF with SR and U-SURF with SR is tested, and the robustness of<br />
these with respect to orientation and scale is evaluated.<br />
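The scale-restriction idea can be illustrated independently of any particular SURF implementation: during nearest-neighbour descriptor matching, a candidate pair is rejected when the two keypoints' detected scales differ by more than a fixed ratio. The (scale, descriptor) tuples below are hypothetical stand-ins for real keypoints:<br />

```python
def match_scale_restricted(kps1, kps2, max_ratio=1.5):
    """Nearest-neighbour matching with scale restriction: pairs whose
    keypoint scales differ by more than `max_ratio` are ignored.
    Each keypoint is a (scale, descriptor) tuple."""
    matches = []
    for i, (s1, d1) in enumerate(kps1):
        best, best_dist = None, float("inf")
        for j, (s2, d2) in enumerate(kps2):
            if max(s1, s2) / min(s1, s2) > max_ratio:
                continue                      # scale restriction
            dist = sum((a - b) ** 2 for a, b in zip(d1, d2))
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            matches.append((i, best))
    return matches
```

The restriction prunes matches that are geometrically implausible between bands of similar ground resolution, even when their descriptors happen to be close.<br />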
09:00-11:10, Paper WeAT8.30<br />
Automatic Attribute Threshold Selection for Blood Vessel Enhancement<br />
Kiwanuka, Fred Noah, Univ. of Groningen<br />
Wilkinson, Michael H.f., Univ. of Groningen<br />
Attribute filters allow enhancement and extraction of features without distorting their borders, and never introduce new<br />
image features. These are highly desirable properties in biomedical imaging, where accurate shape analysis is paramount.<br />
However, setting the attribute-threshold parameters has to date only been done manually. This paper explores simple, fast<br />
and automated methods of computing attribute threshold parameters based on image segmentation, thresholding and data<br />
clustering techniques. Though several techniques perform well on blood-vessel filtering, the choice of technique appears<br />
to depend on the imaging mode.<br />
09:00-11:10, Paper WeAT8.31<br />
Initialisation-Free Active Contour Segmentation<br />
Xie, Xianghua, Swansea Univ.<br />
Mirmehdi, Majid, Univ. of Bristol<br />
We present a region based active contour model which does not require any initialisation and is capable of modelling<br />
multi-modal image regions. Its external force is based on statistical learning and grouping of image primitives in multiscale,<br />
and its numerical solution is carried out using radial basis function interpolation and time-dependent expansion<br />
coefficient updating. The initialisation-free property makes it attractive for applications such as detecting an unknown number<br />
of objects with unknown topologies.<br />
09:00-11:10, Paper WeAT8.32<br />
On Clock Offset Estimation in Wireless Sensor Networks with Weibull Distributed Network Delays<br />
Ahmad, Aitzaz, Texas A&M Univ. Coll. Station<br />
Noor, Amina, Texas A&M Univ. Coll. Station<br />
Serpedin, Erchin, Texas A&M Univ. Coll. Station<br />
Nounou, Hazem, Texas A&M Univ.<br />
Nounou, Mohamed, Texas A&M Univ.<br />
We consider the problem of Maximum Likelihood (ML) estimation of clock parameters in a two-way timing exchange<br />
scenario where the random delays assume a Weibull distribution, which represents a more generalized model. The ML estimate<br />
of the clock offset for the case of exponential distribution was obtained earlier. Moreover, it was reported that when<br />
the fixed delay is known, MLE is not unique. We determine the uniformly minimum variance unbiased (UMVU) estimators<br />
for exponential distribution under such a scenario and produce biased estimators having lower MSE than UMVU for all<br />
values of clock offset. We then consider the case when shape parameter is greater than one and reduce the corresponding<br />
optimization problems to their equivalent convex forms, thus guaranteeing convergence to a global minimum.<br />
09:00-11:10, Paper WeAT8.33<br />
Parallel Algorithm of Two-Dimensional Discrete Cosine Transform based on Special Data Representation<br />
Chicheva, Marina, Image Processing System Inst. of RAS<br />
The paper investigates the efficiency of a parallel approach to the two-dimensional discrete cosine transform. An algorithm<br />
based on data representation in hypercomplex algebra is proposed.<br />
09:00-11:10, Paper WeAT8.34<br />
Parallel Scales for More Accurate Displacement Estimation in Phase-Based Image Registration<br />
Forsberg, Daniel, Linköping Univ.<br />
Andersson, Mats, Linköping Univ.<br />
Knutsson, Hans<br />
Phase-based methods are commonly applied in image registration. Phase-difference methods employ only a<br />
single scale, although the algorithms are normally iterated over multiple scales, whereas phase-congruency<br />
methods utilize the phase from multiple scales simultaneously. This paper presents an extension to phase-difference<br />
methods employing parallel scales to achieve more accurate displacements. Results are also presented clearly favouring<br />
the use of parallel scales over single scale in more than 95% of the 120 tested cases.<br />
09:00-11:10, Paper WeAT8.35<br />
A Comprehensive Evaluation on Non-Deterministic Motion Estimation<br />
Wu, Changzhu, Northwestern Pol. Univ.<br />
Wang, Qing, Northwestern Pol. Univ.<br />
When computing optical flow with region-based matching, very few flow vectors can be reliably obtained, especially in<br />
high-contrast areas or those with little texture. Instead of using a single pixel from the reference frame, non-deterministic<br />
motion estimation utilizes multiple pixels within a neighborhood to represent the corresponding pixel in the current frame. Although<br />
remarkable improvement has been made with this method, the weight associated with each reference pixel is quite sensitive<br />
to the selection of its standard deviation. To address this issue, a dual probability is presented in this paper. Intuitively, it<br />
enhances the weights of pixels that are more similar to their counterparts in the current frame, while suppressing the rest<br />
of them. Experimental results show that the proposed method is effective to deal with intense motion and occlusion, especially<br />
in the case of reducing the adverse impact of noise.<br />
09:00-11:10, Paper WeAT8.36<br />
A Full-View Spherical Image Format<br />
Li, Shigang, Faculty of Engineering<br />
Hai, Ying, Tottori Univ.<br />
This paper proposes a full-view spherical image format which is based on the geodesic division of a sphere. In comparison<br />
with the conventional 3D array representation which consists of five parallelograms, the proposed spherical image format<br />
is a simple 2D array representation. Algorithms for finding the neighboring pixels of a given pixel of a spherical image<br />
and for mapping between spherical coordinates and spherical image pixels are also given.<br />
09:00-11:10, Paper WeAT8.37<br />
Shift-Map Image Registration<br />
Svärm, Linus, Lund Univ.<br />
Strandmark, Petter, Lund Univ.<br />
Shift-map image processing is a new framework based on energy minimization over a large space of labels. The optimization<br />
utilizes alpha-expansion moves and iterative refinement over a Gaussian pyramid. In this paper we extend the range<br />
of applications to image registration. To do this, new data and smoothness terms have to be constructed. We note a great<br />
improvement when we measure pixel similarities with the dense DAISY descriptor. The main contributions of this paper<br />
are: (1) the extension of the shift-map framework to include image registration; we register images for which SIFT only<br />
provides 3 correct matches; (2) a publicly available implementation of shift-map image processing (e.g. inpainting, registration).<br />
We conclude by comparing shift-map registration to a recent method for optical flow, with favorable results.<br />
09:00-11:10, Paper WeAT8.38<br />
An Adaptive Method for Efficient Detection of Salient Visual Object from Color Images<br />
Brezovan, Marius, Univ. of Craiova<br />
Burdescu, Dumitru Dan, Univ. of Craiova<br />
Ganea, Eugen, Univ. of Craiova<br />
Stanescu, Liana, Univ. of Craiova<br />
Stoica, Cosmin, Univ. of Craiova<br />
This paper presents an efficient graph-based method to detect salient objects from color images and to extract their color<br />
and geometric features. Unlike the majority of segmentation methods, our method is totally adaptive and does not<br />
require any parameter to be chosen in order to produce a better segmentation. The proposed segmentation method uses a<br />
hexagonal structure defined on the set of image pixels and performs two different steps: a pre-segmentation step that<br />
will produce a maximum spanning tree of the connected components of the visual graph constructed on the hexagonal<br />
structure of an image, and the final segmentation step that will produce a minimum spanning tree of the connected components,<br />
representing the visual objects, by using dynamic weights based on the geometric features of the regions. Experimental<br />
results are presented indicating a good performance of our method.<br />
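The spanning-tree machinery the method relies on is standard; a minimal Kruskal sketch with union-find is shown below (a maximum spanning tree, as used in the pre-segmentation step, is obtained by negating the weights). This is generic graph code, not the paper's hexagonal-structure implementation:<br />

```python
def kruskal_mst(n, edges):
    """Minimum spanning tree of a graph with nodes 0..n-1.
    `edges` is a list of (weight, u, v) tuples."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```

Segmentation methods of this family typically stop merging components when an edge weight exceeds a region-dependent threshold, rather than building the full tree.<br />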
09:00-11:10, Paper WeAT8.39<br />
Robust Matching in an Uncertain World<br />
Sur, Frédéric, INPL / INRIA Nancy Grand Est<br />
Finding point correspondences which are consistent with a geometric constraint is one of the cornerstones of many computer<br />
vision problems. This is a difficult task because of spurious measurements leading to ambiguously matched points<br />
and because of uncertainty in point location. In this article we address these problems and propose a new robust algorithm<br />
that explicitly takes account of location uncertainty. We propose applications to SIFT matching and 3D data fusion.<br />
09:00-11:10, Paper WeAT8.41<br />
Recursive Dynamically Variable Step Search Motion Estimation Algorithm for High Definition Video<br />
Tasdizen, Ozgur, Sabanci Univ.<br />
Hamzaoglu, Ilker, Sabanci Univ.<br />
For High Definition (HD) video formats, computational complexity of Full Search (FS) Motion Estimation (ME) algorithm<br />
is prohibitively high, whereas the Peak Signal-to-Noise Ratio obtained by fast search ME algorithms is low. Therefore, in<br />
this paper, we propose Recursive Dynamically Variable Step Search (RDVSS) ME algorithm for real-time processing of<br />
HD video formats. RDVSS algorithm dynamically determines the search patterns that will be used for each Macro block<br />
(MB) based on the motion vectors of its spatial and temporal neighboring MBs. RDVSS performs very close to FS while<br />
searching far fewer locations than FS, and it outperforms successful fast search ME algorithms by searching<br />
more locations than these algorithms. In addition, the RDVSS algorithm can be efficiently implemented by a reconfigurable<br />
systolic array based ME hardware.<br />
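The full-search baseline that RDVSS is compared against can be sketched as an exhaustive SAD minimization; the frames, block size, and search range below are illustrative, and the RDVSS search-pattern logic itself is not reproduced.<br />

```python
# Toy full-search block matching: find the displacement of a block
# between two frames by minimizing the sum of absolute differences
# (SAD). RDVSS searches a dynamically chosen subset of these
# locations instead of all of them.

def sad(ref, cur, bx, by, dx, dy, bs):
    # SAD between the current block at (bx, by) and the reference
    # block displaced by (dx, dy).
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(ref, cur, bx, by, bs, rng):
    """Return the (dx, dy) motion vector with minimum SAD."""
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if 0 <= bx + dx and bx + dx + bs <= len(ref[0]) \
               and 0 <= by + dy and by + dy + bs <= len(ref):
                cost = sad(ref, cur, bx, by, dx, dy, bs)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# Reference frame with a bright 2x2 patch at (1, 1); in the current
# frame the patch has moved to (2, 2).
ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        ref[y][x] = 9
        cur[y + 1][x + 1] = 9
print(full_search(ref, cur, 2, 2, 2, 2))  # -> (-1, -1)
```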
09:00-11:10, Paper WeAT8.42<br />
Spatial and Temporal Enhancement of Depth Images Captured by a Time-of-Flight Depth Sensor<br />
Kim, Sung-Yeol, The University of Tennessee<br />
Cho, Ji-Ho, Gwangju Institute of Science and Tech.<br />
Koschan, Andreas, The University of Tennessee<br />
Abidi, Mongi, The University of Tennessee<br />
In this paper, we present a new method to enhance depth images captured by a time-of-flight (TOF) depth sensor spatially<br />
and temporally. In practice, depth images obtained from TOF depth sensors have critical problems, such as optical noise,<br />
unmatched boundaries, and temporal inconsistency. In this work, we improve depth quality by performing a<br />
newly-designed joint bilateral filtering, color segmentation-based boundary refinement, and motion estimation-based temporal<br />
consistency. Experimental results show that the proposed method significantly minimizes the inherent problems of<br />
the depth images so that we can use them to generate a dynamic and realistic 3D scene.<br />
- 177 -
09:00-11:10, Paper WeAT8.43<br />
Transition Thresholds for Binarization of Historical Documents<br />
Ramírez-Ortegón, Marte Alejandro, Free Univ. of Berlin<br />
Rojas, Raul, Freie Univ. Berlin<br />
This paper extends the transition method for binarization based on transition pixels, a generalization of edge pixels. This<br />
method originally computes transition thresholds using the quantile thresholding algorithm, which has a critical parameter.<br />
We achieved an automatic version of the transition method by computing the transition thresholds with Rosin&rsquo;s algorithm.<br />
We experimentally tested four variants of the transition method combining the density and cumulative distribution<br />
functions of transition values, with gray-intensity thresholds based on the normal and lognormal density functions. The<br />
results of our experiments show that these unsupervised methods yield superior binarization compared with top-ranked<br />
algorithms.<br />
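Rosin&rsquo;s unimodal thresholding, which the abstract uses to remove the quantile method&rsquo;s critical parameter, can be sketched directly: draw a line from the histogram peak to the end of its tail and pick the bin farthest from that line. The histogram below is illustrative, not the paper&rsquo;s transition-value data.<br />

```python
# Rosin's unimodal thresholding: the threshold is the bin with the
# greatest perpendicular distance from the line joining the histogram
# peak to the end of its tail.

def rosin_threshold(hist):
    peak = max(range(len(hist)), key=lambda i: hist[i])
    # Last non-empty bin marks the end of the tail.
    tail = max(i for i, h in enumerate(hist) if h > 0)
    x1, y1, x2, y2 = peak, hist[peak], tail, hist[tail]
    norm = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    best, best_d = peak, 0.0
    for i in range(peak + 1, tail + 1):
        # Perpendicular distance from (i, hist[i]) to the peak-tail line.
        d = abs((y2 - y1) * i - (x2 - x1) * hist[i] + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best, best_d = i, d
    return best

# Sharply peaked histogram with a long, low tail.
hist = [2, 40, 12, 7, 4, 3, 2, 2, 1, 1]
print(rosin_threshold(hist))  # -> 3
```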
09:00-11:10, Paper WeAT8.44<br />
Image Quality Metrics: PSNR vs. SSIM<br />
Horé, Alain, Sherbrooke Univ.<br />
Ziou, Djemel, Sherbrooke Univ.<br />
In this paper, we analyse two well-known objective image quality metrics, the peak-signal-to-noise ratio (PSNR) as well<br />
as the structural similarity index measure (SSIM), and we derive a simple mathematical relationship between them which<br />
works for various kinds of image degradations such as Gaussian blur, additive Gaussian white noise, jpeg and jpeg2000<br />
compression. A series of tests realized on images extracted from the Kodak database gives a better understanding of the<br />
similarity and difference between the SSIM and the PSNR.<br />
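The two quantities being related can be made concrete with small pure-Python implementations; the SSIM here is a single-window global variant (real SSIM averages a locally windowed version of this statistic), with the usual K1 = 0.01, K2 = 0.03 constants and a peak of 255 assumed.<br />

```python
# PSNR and a single-window SSIM on small grayscale arrays.

import math

def psnr(a, b, peak=255.0):
    n = len(a) * len(a[0])
    mse = sum((a[i][j] - b[i][j]) ** 2
              for i in range(len(a)) for j in range(len(a[0]))) / n
    return float('inf') if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def ssim_global(a, b, peak=255.0):
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = [[52, 55], [61, 59]]
b = [[54, 55], [60, 58]]
print(round(psnr(a, b), 2))        # MSE = 1.5 -> about 46.37 dB
print(round(ssim_global(a, b), 4))
```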
09:00-11:10, Paper WeAT8.45<br />
Coarse Scale Feature Extraction using the Spiral Architecture Structure<br />
Coleman, Sonya, Univ. of Ulster<br />
Scotney, Bryan, Univ. of Ulster<br />
Gardiner, Bryan, Univ. of Ulster<br />
The Spiral Architecture has been developed as a fast way of indexing a hexagonal pixel-based image. In combination with<br />
spiral addition and spiral multiplication, methods have been developed for hexagonal image processing operations such<br />
as translation and rotation. Using the Spiral Architecture as the basis for our operator structure, we present a general approach<br />
to the computation of adaptive coarse scale Laplacian operators for use on hexagonal pixel-based images. We evaluate<br />
the proposed operators using simulated hexagonal images and demonstrate improved performance when compared<br />
with rectangular Laplacian operators such as the Marr-Hildreth operator.<br />
09:00-11:10, Paper WeAT8.46<br />
Visual Perception Driven Registration of Mammograms<br />
Boucher, Arnaud, Univ. Paris Descartes<br />
Cloppet, Florence, Paris Descartes Univ.<br />
Vincent, Nicole, Paris Descartes Univ.<br />
Jouve, Pierre Emmanuel, Fenics Company<br />
This paper aims to develop a methodology to register pairs of temporal mammograms. Control points based on anatomical<br />
features are detected in an automated way; image semantics are thereby used to extract landmarks from these control<br />
points. A referential is generated from these control points; based on this referential, the studied images are realigned using<br />
different levels of observation, leading to both rigid and pseudo non-rigid transforms according to expert mammogram<br />
reading.<br />
- 178 -
09:00-11:10, Paper WeAT8.47<br />
Robust Fourier-Based Image Alignment with Gradient Complex Image<br />
Su, Hong-Ren, National Tsing Hua Univ.<br />
Lai, Shang-Hong, National Tsing Hua Univ.<br />
Tsai, Ya-Hui, Industrial Tech. Res. Inst.<br />
The paper proposes a robust image alignment framework based on Fourier transform of a gradient complex image. The<br />
proposed Fourier-based algorithm can handle translation, rotation, and scaling, and it is robust against noise and non-uniform<br />
illumination. The proposed alignment algorithm is further extended to work under occlusion by partitioning the template<br />
and performing the Fourier-based alignment for all partitioned sub-templates in a voting framework. Our experiments<br />
show superior alignment results by using the proposed robust Fourier-based alignment over the previous related methods.<br />
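The Fourier route to alignment can be illustrated with 1-D phase correlation: a shift between two signals appears as a peak in the inverse transform of the normalized cross-power spectrum. This sketch uses a naive O(N^2) DFT to stay dependency-free and does not reproduce the paper&rsquo;s 2-D gradient complex images or its rotation, scaling, and occlusion handling.<br />

```python
# 1-D phase correlation for circular-shift estimation.

import cmath

def dft(x, sign=-1):
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def phase_correlate(a, b):
    """Estimate the circular shift s such that b[k] == a[k - s]."""
    fa, fb = dft(a), dft(b)
    # Normalized cross-power spectrum: keeps only phase information.
    cross = [x * y.conjugate() / max(abs(x * y.conjugate()), 1e-12)
             for x, y in zip(fb, fa)]
    corr = dft(cross, sign=+1)       # inverse DFT up to a 1/N factor
    return max(range(len(corr)), key=lambda i: corr[i].real)

a = [0, 0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 3, 1, 0]       # a shifted right by 2
print(phase_correlate(a, b))        # -> 2
```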
09:00-11:10, Paper WeAT8.48<br />
Rate Control of H.264 Encoded Sequences by Dropping Frames in the Compressed Domain<br />
Kapotas, Spyridon, Hellenic Open Univ.<br />
Skodras, Athanassios N., Hellenic Open Univ.<br />
A new technique for controlling the bitrate of H.264 encoded sequences is presented. Bitrate control is achieved by dropping<br />
frames directly in the compressed domain. The dropped frames are carefully selected so as to either eliminate or cause<br />
non-perceptible drift errors in the decoder. The technique is well suited to H.264 encoded sequences, such as movies and TV news,<br />
which are transmitted over wireless networks.<br />
09:00-11:10, Paper WeAT8.49<br />
Statistical Analysis of Kalman Filters by Conversion to Gauss Helmert Models with Applications to Process Noise Estimation<br />
Petersen, Arne, Christian-Albrechts-Univ. of Kiel<br />
Koch, Reinhard, Univ. of Kiel<br />
This paper introduces a reformulation of the extended Kalman Filter using the Gauss-Helmert model for least squares estimation.<br />
By proving the equivalence of both estimators it is shown how the methods of statistical analysis in least squares<br />
estimation can be applied to the prediction and update process in Kalman filtering. In particular, the efficient computation<br />
of the reliability (or redundancy) matrix allows the implementation of self-supervising systems. As an application, an unparameterized<br />
method for estimating the variances of the filter&rsquo;s process noise is presented.<br />
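The predict/update cycle the paper recasts in Gauss-Helmert form looks as follows in the simplest scalar, constant-position case; the noise variances q and r here are assumed values, not estimates produced by the paper&rsquo;s method.<br />

```python
# Scalar Kalman filter predict/update cycle (constant-position model).

def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by process noise q.
        p = p + q
        # Update: blend prediction and measurement by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

est = kalman_1d([1.1, 0.9, 1.05, 0.95, 1.0], r=0.1)
print(round(est[-1], 3))  # settles near the true value 1.0
```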
09:00-11:10, Paper WeAT8.50<br />
Color Adjacency Modeling for Improved Image and Video Segmentation<br />
Price, Brian, Brigham Young Univ.<br />
Morse, Bryan, Brigham Young Univ.<br />
Cohen, Scott, Adobe Systems<br />
Color models are often used for representing object appearance for foreground segmentation applications. The relationships<br />
between colors can be just as useful for object selection. In this paper, we present a method of modeling color adjacency<br />
relationships. By using color adjacency models, the importance of an edge in a given application can be determined and<br />
scaled accordingly. We apply our model to foreground segmentation of similar images and video. We show that given one<br />
previously-segmented image, we can greatly reduce the error when automatically segmenting other images by using our<br />
color adjacency model to weight the likelihood that an edge is part of the desired object boundary.<br />
09:00-11:10, Paper WeAT8.51<br />
Paired Transform Slice Theorem of 2-D Image Reconstruction from Projections<br />
Dursun, Serkan, Univ. of Texas at San Antonio<br />
Du, Nan, Univ. of Texas at San Antonio<br />
Grigoryan, Artyom M., Univ. of Texas at San Antonio<br />
This paper discusses the paired transform-based method of reconstruction of 2-D images from their projections. The complete<br />
set of basic functions of the 2-D discrete paired transform is defined by specific directions, i.e., the transform is directional<br />
- 179 -
and can be calculated from the projection data. A simple formula is presented for image reconstruction without<br />
calculating the 2-D discrete Fourier transform in the case when the size of the image is L^r x L^r, where L is prime. The image<br />
reconstruction is described by the discrete model that is used in the series expansion methods of image reconstruction.<br />
The proposed method of reconstruction has been implemented and successfully applied for modeled images on Cartesian<br />
grid of sizes up to 256x256.<br />
09:00-11:10, Paper WeAT8.52<br />
Segmentation of Cervical Cell Images<br />
Kale, Asli, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
The key step of a computer-assisted screening system that aims at early diagnosis of cervical cancer is the accurate segmentation<br />
of cells. In this paper, we propose a two-phase approach to cell segmentation in Pap smear test images with the<br />
challenges of inconsistent staining, poor contrast, and overlapping cells. The first phase consists of segmenting an image<br />
by a non-parametric hierarchical segmentation algorithm that uses spectral and shape information as well as the gradient<br />
information. The second phase aims to obtain nucleus regions and cytoplasm areas by classifying the segments resulting<br />
from the first phase based on their spectral and shape features. Experiments using two data sets show that our method performs<br />
well for images containing both a single cell and many overlapping cells.<br />
09:00-11:10, Paper WeAT8.53<br />
Principal Contour Extraction and Contour Classification to Detect Coronal Loops from the Solar Images<br />
Durak, Nurcan, Univ. of Louisville<br />
Nasraoui, Olfa, Univ. of Louisville<br />
In this paper, we describe a system that determines coronal loop existence from a given solar image region in two stages:<br />
1) extracting principal contours from the solar image regions, 2) deciding whether the extracted contours are in a loop<br />
shape. In the first stage, we propose a principal contour extraction method that achieves 88% accuracy in extracting the<br />
desired contours from the cluttered regions. In the second stage, we analyze the extracted contours in terms of their geometric<br />
features such as linearity, elliptical features, curvature, proximity, smoothness, and corner points. To distinguish<br />
loop contours from other forms, we train an AdaBoost classifier based on C4.5 decision trees by using geometric features<br />
of 150 loop contours and 250 non-loop contours. Our system achieves 85% F1-Score from 10-fold cross validation experiments.<br />
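One of the geometric features listed above, linearity, can be sketched as the dominance of the larger eigenvalue of the 2x2 covariance of the contour points; the exact feature definitions used in the paper may differ.<br />

```python
# Contour linearity as an eigenvalue-dominance score in [0.5, 1.0]:
# 1.0 for a perfectly straight contour.

def linearity(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form.
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = max(tr ** 2 / 4 - det, 0.0) ** 0.5
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc
    return lam1 / (lam1 + lam2) if lam1 + lam2 > 0 else 0.0

line = [(i, 2 * i) for i in range(10)]          # perfectly straight
arc = [(i, (i - 4.5) ** 2) for i in range(10)]  # curved
print(round(linearity(line), 3))  # -> 1.0
print(round(linearity(arc), 3))
```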
09:00-11:10, Paper WeAT8.54<br />
Human Shadow Removal with Unknown Light Source<br />
Chen, Chia-Chih, The Univ. of Texas at Austin<br />
Aggarwal, J. K., The Univ. of Texas at Austin<br />
In this paper, we present a shadow removal technique which effectively eliminates a human shadow cast from an unknown<br />
direction of light source. A multi-cue shadow descriptor is proposed to characterize the distinctive properties of shadows.<br />
We employ a 3-stage process to detect then remove shadows. Our algorithm improves the shadow detection accuracy by<br />
imposing the spatial constraint between the foreground subregions of human and shadow. We collect a dataset containing<br />
81 human-shadow images for evaluation. Both descriptor ROC curves and qualitative results demonstrate the superior<br />
performance of our method.<br />
09:00-11:10, Paper WeAT8.55<br />
Generalizing Tableau to Any Color of Teaching Boards<br />
Oliveira, Daniel Marques, Univ. Federal de Pernambuco<br />
Lins, Rafael Dueire, Univ. Federal de Pernambuco<br />
Teaching boards are omnipresent in classrooms throughout the world. Tableau is a software environment for processing<br />
images from teaching boards acquired using portable digital cameras and cell phones. The previous versions of Tableau<br />
were restricted to white-board processing. This paper generalizes the enhancement algorithm to work with boards of any<br />
color, being the first software environment able to process non-white boards.<br />
- 180 -
09:00-11:10, Paper WeAT8.56<br />
Enhancing the Filtering-out of the Back-to-Front Interference in Color Documents with a Neural Classifier<br />
Silva, Gabriel De França Pereira E, Univ. Federal de Pernambuco<br />
Lins, Rafael Dueire, Univ. Federal de Pernambuco<br />
Silva, João Marcelo Monte Da, Univ. Federal de Pernambuco<br />
Banerjee, Serene, Hewlett-Packard Labs - India<br />
Kuchibhotla, Anjaneyulu, Hewlett-Packard Labs - India<br />
Thielo, Marcelo, Hewlett-Packard Labs - Brazil<br />
Back-to-front, show-through, or bleeding are the names given to the interference that appears whenever one writes or<br />
prints on both sides of translucent paper. Such interference degrades image binarization and document transcription via<br />
OCR. The technical literature presents several algorithms to remove the back-to-front noise, but no algorithm is good<br />
enough in all cases. This article presents a new technique to remove such noise in color documents, which makes use of<br />
neural classifiers to evaluate the intensity of the interference and, in addition, to indicate the existence of blur.<br />
Such a classifier allows tuning the parameters of an algorithm for back-to-front interference removal and document enhancement.<br />
09:00-11:10, Paper WeAT8.57<br />
A Scale Estimation Algorithm using Phase-Based Correspondence Matching for Electron Microscope Images<br />
Suzuki, Ayako, Tohoku Univ.<br />
Ito, Koichi, Tohoku Univ.<br />
Aoki, Takafumi, Tohoku Univ.<br />
Tsuneta, Ruriko, Hitachi, Ltd., Central Res. Lab.<br />
This paper proposes a multi-stage scale estimation algorithm using phase-based correspondence matching for electron<br />
microscope images. Consider a sequence of microscope images of the same target object, where the image magnification<br />
is gradually increased so that the final image has a very large scale factor S (e.g., S=1,000) with respect to the initial image.<br />
The problem considered in this paper is to estimate the overall scale factor S of the given image sequence. The proposed<br />
scale estimation technique provides a new methodology for high-accuracy magnification calibration of electron microscopes.<br />
Experimental evaluation using Mandelbrot images as a precisely scale-controlled image sequence shows that the<br />
proposed method can estimate the scale factor S=1,000 with approximately 0.1%-scale error. This paper also describes an<br />
application of the proposed algorithm to the magnification calibration of an actual STEM (Scanning Transmission Electron<br />
Microscope).<br />
09:00-11:10, Paper WeAT8.58<br />
Edge Drawing: An Heuristic Approach to Robust Real-Time Edge Detection<br />
Topal, Cihan, Anadolu Univ.<br />
Akinlar, Cuneyt, Anadolu Univ.<br />
Genc, Yakup, Siemens Corp. Res.<br />
We propose a new edge detection algorithm that works by computing a set of anchor edge points in an image and then<br />
linking these anchor points by drawing edges between them. The resulting edge map consists of perfectly contiguous, one-<br />
pixel-wide edges. The performance tests show that our algorithm is up to 16% faster than the fastest known edge detection<br />
algorithm, i.e., OpenCV implementation of the Canny edge detector. We believe that our edge detector is a novel step in<br />
edge detection and would be very suitable for the next generation real-time image processing and computer vision applications.<br />
09:00-11:10, Paper WeAT8.59<br />
MPEG-2 Video Watermarking using Pattern Consideration<br />
Mansouri, Azadeh, Shahid Beheshti Univ.<br />
Mahmoudi Aznaveh, Ahmad, Shahid Beheshti Univ.<br />
Torkamani-Azar, Farah, Shahid Beheshti Univ.<br />
This paper proposes a new method for digital video watermarking in compressed domain. Both the embedding and extracting<br />
phases are performed after entropy decoding. Consequently, fully decompressing the compressed video is not<br />
necessary, making this scheme an appropriate choice for real-time applications. Furthermore, taking the structural information<br />
into account leads to presenting a robust watermarking scheme along with less quality degradation. To select suitable<br />
- 181 -
coefficients for embedding the watermark, three different aspects, imperceptibility, security, and bit rate increase, have<br />
been considered. These performance factors are adjusted by defining three priority matrices. In addition, a content based<br />
key is proposed in order to overcome the collusion attack. The flexibility of our method in providing the desired characteristics<br />
is another advantage.<br />
09:00-11:10, Paper WeAT8.60<br />
Lip Segmentation using Level Set Method: Fusing Landmark Edge Distance and Image Information<br />
Banimahd, Seyed Reza, Sahand Univ. of Tech.<br />
Ebrahimnezhad, Hossein, Sahand Univ. of Tech.<br />
Lip segmentation is an essential step in audio-visual processing systems. In this paper, we incorporate the color and edge<br />
information in a level set formulation for extraction of the lip contour. We build two auxiliary images by mixing<br />
different color spaces to extract the landmark edges for the upper and lower parts of the lip. The performance of this approach on<br />
the VidTIMIT database is tested and an accuracy of 91.2% is reached.<br />
09:00-11:10, Paper WeAT8.61<br />
Adaptive Color Independent Components based SIFT Descriptors for Image Classification<br />
Ai, Danni, Ritsumeikan Univ.<br />
Han, Xian-Hua, Ritsumeikan Univ.<br />
Ruan, Xiang, Omron Corporation<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
This paper proposes an adaptive color independent components based SIFT descriptor (termed CIC-SIFT) for image classification.<br />
Our motivation is to seek an adaptive and efficient color space for color SIFT feature extraction. Our work has<br />
two key contributions. First, based on independent component analysis (ICA), an adaptive and efficient color space is<br />
proposed for color image representation. Second, in this ICA-based color space, a discriminative CIC-SIFT descriptor is<br />
calculated for image classification. The experimental results indicate that (1) contrast between objects and background can<br />
be enhanced on the ICA-based color space and (2) the CIC-SIFT descriptor outperforms other conventional color SIFT<br />
descriptors on image classification.<br />
WeAT9 Lower Foyer<br />
Bioinformatics and Biomedical Applications Poster Session<br />
Session chair: Unay, Devrim (Bahcesehir Univ.)<br />
09:00-11:10, Paper WeAT9.1<br />
Joint Registration and Segmentation of Histological Volume Data by Diffusion-Based Label Adaption<br />
Bollenbeck, Felix, Fraunhofer Inst. for Factory Operation and Automation<br />
Seiffert, Udo, Fraunhofer IFF Magdeburg<br />
Three-dimensional serial section imaging delivers high spatial resolution and histological detail, which facilitates analysis<br />
of differentiation and development by exact labelling of tissues and cells, at a level unavailable to other 3-D imaging modalities. We<br />
propose an algorithm for interleaved reconstruction and segmentation of tissues in serial section volumes by diffusion-based<br />
registration and adaptation of two-dimensional reference labellings. Iterative refinement of the global image congruence<br />
and local deformation of labellings delivers an efficient algorithm for processing of large volume data-sets. The<br />
benefits of the approach are shown by means of reconstruction and segmentation of giga-voxel serial section volumes of<br />
plant specimen.<br />
09:00-11:10, Paper WeAT9.2<br />
The Use of Genetic Programming for Learning 3D Craniofacial Shape Quantification<br />
Atmosukarto, Indriyati, Univ. of Washington<br />
Shapiro, Linda,<br />
Heike, Carrie, Seattle Children’s Hospital, Craniofacial Center<br />
Craniofacial disorders commonly result in various head shape dysmorphologies. The goal of this work is to quantify the<br />
various 3D shape variations that manifest in the different facial abnormalities in individuals with a craniofacial disorder<br />
- 182 -
called 22q11.2 Deletion Syndrome. Genetic programming (GP) is used to learn the different 3D shape quantifications.<br />
Experimental results show that the GP method achieves a higher classification rate than those of human experts and existing<br />
computer algorithms.<br />
09:00-11:10, Paper WeAT9.3<br />
Identification of Ancestry Informative Markers from Chromosome-Wide Single Nucleotide Polymorphisms using Symmetrical<br />
Uncertainty Ranking<br />
Piroonratana, Theera, King Mongkut’s Univ. of Tech.<br />
Wongseree, Waranyu, King Mongkut’s Univ. of Tech.<br />
Usavanarong, Touchpong, King Mongkut’s Univ. of Tech.<br />
Assawamakin, Anunchai, Mahidol Univ.<br />
Limwongse, Chanin, Mahidol Univ.<br />
Chaiyaratana, Nachol, King Mongkut’s Univ. of Tech.<br />
Ancestry informative markers (AIMs) have been proven to contain necessary information for population classification. In<br />
this article, round robin symmetrical uncertainty ranking for preliminary AIM screening is proposed. Each single nucleotide<br />
polymorphism (SNP) is assigned a rank based on its ability to separate two populations from each other. In a multi-population<br />
scenario, all possible population pairs are considered and the screened SNP set incorporates top-ranked SNPs from<br />
every pair-wise comparison. After the preliminary screening, SNPs are further screened by a wrapper which is embedded<br />
with a naive Bayes classifier. A classification model is subsequently constructed from the finally screened SNPs via a<br />
naive Bayes classifier. The application of the proposed procedure to the HapMap data indicates that AIM panels can be<br />
found on all chromosomes. Each panel consists of 11 to 24 SNPs and can be used to completely classify the CEU, CHB,<br />
JPT and YRI populations. Moreover, all panels are smaller than the AIM panels reported in previous studies.<br />
09:00-11:10, Paper WeAT9.4<br />
Evaluation of a New Point Clouds Registration Method based on Group Averaging Features<br />
Temerinac-Ott, Maja, Univ. of Freiburg<br />
Keuper, Margret, Univ. of Freiburg<br />
Burkhardt, Hans, Univ. of Freiburg<br />
Registration of point clouds is required in the processing of large biological data sets. The trade-off between computation<br />
time and accuracy of the registration is the main challenge in this task. We present a novel method for registering point<br />
clouds in two and three dimensional space based on Group Averaging on the Euclidean transformation group. It is applied<br />
on a set of neighboring points whose size directly controls computing time and accuracy. The method is evaluated regarding<br />
dependencies of the computing time and the registration accuracy versus the point density assuming their random distribution.<br />
Results are verified in two biological applications on 2D and 3D images.<br />
09:00-11:10, Paper WeAT9.5<br />
Cell Tracking in Video Microscopy using Bipartite Graph Matching<br />
Chowdhury, Ananda, Jadavpur Univ.<br />
Chatterjee, Rohit, Jadavpur Univ.<br />
Ghosh, Mayukh, Jadavpur Univ.<br />
Ray, Nilanjan, Univ. of Alberta<br />
Automated visual tracking of cells from video microscopy has many important biomedical applications. In this paper, we<br />
model the problem of cell tracking over pairs of video microscopy image frames as a minimum weight matching problem in<br />
bipartite graphs. The bipartite matching essentially establishes one-to-one correspondences between the cells in different<br />
frames. A key advantage of using bipartite matching is the inherent scalability, which arises from its polynomial time complexity.<br />
We propose two different tracking methods based on bipartite graph matching and properties of Gaussian distributions.<br />
In both the methods, i) the centers of the cells appearing in two frames are treated as vertices of a bipartite graph and ii) the<br />
weight matrix contains information about distance between the cells (in two frames) and cell velocity. In the first method,<br />
we identify fast-moving cells based on distance and filter them out using Gaussian distributions before the matching is<br />
applied. In the second method, we remove false matches using Gaussian distributions after the bipartite graph matching is<br />
employed. Experimental results indicate that both the methods are promising while the second method has higher accuracy.<br />
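The one-to-one assignment at the core of both methods can be sketched with a brute-force minimum-weight bipartite matching; the coordinates and plain-distance cost below are illustrative, and the paper&rsquo;s velocity term and Gaussian outlier filtering are omitted. A polynomial-time solver (e.g. the Hungarian algorithm) would replace the factorial enumeration in practice.<br />

```python
# Brute-force minimum-weight bipartite matching between cell centres
# in two consecutive frames.

from itertools import permutations

def match_cells(frame_a, frame_b):
    """Return the b-index assigned to each a-index, minimizing total cost."""
    def cost(a, b):
        # Euclidean distance between two cell centres.
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    best, best_total = None, float('inf')
    for perm in permutations(range(len(frame_b))):
        total = sum(cost(frame_a[i], frame_b[j]) for i, j in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return list(best)

a = [(0, 0), (5, 5), (10, 0)]
b = [(11, 1), (1, 1), (6, 4)]   # the same cells, slightly moved
print(match_cells(a, b))        # -> [1, 2, 0]
```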
- 183 -
09:00-11:10, Paper WeAT9.6<br />
Human State Classification and Prediction for Critical Care Monitoring by Real-Time Bio-Signal Analysis<br />
Li, Xiaokun, DCM Res. LLC<br />
Porikli, Fatih, MERL<br />
To address the challenges in critical care monitoring, we present a multi-modality bio-signal modeling and analysis<br />
framework for real-time human state classification and prediction. The novel bioinformatic framework is developed<br />
to solve the human state classification and prediction issues from two aspects: a) achieve 1:1 mapping between the biosignal<br />
and the human state via discriminant feature analysis and selection by using probabilistic principal component<br />
analysis (PPCA); b) avoid time-consuming data analysis and extensive integration resources by using Dynamic Bayesian<br />
Network (DBN). In addition, intelligent and automatic selection of the most suitable sensors from the bio-sensor array is<br />
also integrated in the proposed DBN.<br />
09:00-11:10, Paper WeAT9.7<br />
Automated Cephalometric Landmark Identification using Shape and Local Appearance Models<br />
Keustermans, Johannes, K.U. Leuven<br />
Mollemans, Wouter, Medicim nv.<br />
Vandermeulen, Dirk<br />
Suetens, Paul, K.U.Leuven<br />
In this paper a method is presented for the automated identification of cephalometric anatomical landmarks in craniofacial<br />
cone-beam CT images. This method makes use of statistical models, incorporating both local appearance and shape knowledge<br />
obtained from training data. Firstly, the local appearance model captures the local intensity pattern around each<br />
anatomical landmark in the image. Secondly, the shape model contains a local and a global component. The former improves<br />
the flexibility, whereas the latter improves the robustness of the algorithm. Using a leave-one-out approach to the<br />
training data, we assess the overall accuracy of the method. The mean and median error values for all landmarks are equal<br />
to 2.55mm and 1.72mm, respectively.<br />
09:00-11:10, Paper WeAT9.8<br />
Color Analysis for Segmenting Digestive Organs in VCE<br />
Vu, Hai, The Inst. of Scientific and Industrial Res. Osaka<br />
Echigo, Tomio, Osaka Electro-Communication Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
Yagi, Keiko, Kobe Pharmaceutical Univ.<br />
Shiba, Masatsugu, Osaka City Univ.<br />
Higuchi, Kazuhide, Osaka City Univ.<br />
Arakawa, Tetsuo, Osaka City Univ.<br />
This paper presents an efficient method for automatically segmenting the digestive organs in a Video Capsule Endoscopy<br />
(VCE) sequence. The method is based on unique characteristics of color tones of the digestive organs. We first introduce<br />
a color model of the gastrointestinal (GI) tract containing the color components of GI wall and non-wall regions. Based<br />
on the wall regions extracted from images, the distribution along the time dimension for each color component is exploited<br />
to learn the dominant colors that are candidates for discriminating digestive organs. The strongest candidates are then<br />
combined to construct a representative signal to detect the boundary of two adjacent regions. The results of the experiments<br />
are comparable with previous works, but the computational cost is lower.<br />
09:00-11:10, Paper WeAT9.9<br />
A New Application of MEG and DTI to Word Recognition<br />
Meng, Lu, Northeastern Univ.<br />
Xiang, Jing, CCHMC<br />
Zhao, Hong, Northeastern Univ.<br />
Zhao, Dazhe, Northeastern Univ.<br />
This paper presents a novel application of magnetoencephalography (MEG) and diffusion tensor imaging (DTI) to word<br />
recognition, in which the spatiotemporal signature and the neural network of brain activation associated with word recognition<br />
were investigated. The word stimuli consisted of matched and mismatched words, which were visually and acoustically<br />
- 184 -
presented simultaneously. Twenty participants were recruited to distinguish and give different reactions to these two<br />
types of stimuli. The neural activations caused by their reactions were recorded by an MEG system and a 3T DTI<br />
scanner. The virtual sensor technique and wavelet beamformer source analysis, which are state-of-the-art methods, were<br />
used to study the MEG and DTI data. Three responses were evoked in the MEG waveform and M160 was identified in<br />
the left temporal-occipital junction. All the results coincided with the conclusions of previous studies, which indicates that<br />
the integration of virtual sensors and wavelet beamforming is an effective technique for analyzing MEG and DTI data.<br />
09:00-11:10, Paper WeAT9.10<br />
A Hypothesis Testing Approach for Fluorescent Blob Identification<br />
Wu, Le-Shin, Indiana Univ.<br />
Shaw, Sidney, Indiana Univ.<br />
Template matching is a common approach for identifying fluorescent objects within a biological image, but deciding<br />
a threshold value for judging the goodness of the matching score is a difficult task. In this paper, we<br />
propose a framework that dynamically chooses appropriate threshold values for correct object identification at a non-arbitrary<br />
statistical power based on the local measure of signal and noise. We validate the feasibility of our proposed framework<br />
by presenting simulation experiments conducted with both synthetic and live-cell data sets. The experimental results<br />
suggest that our auto-thresholding algorithm and local signal-to-noise ratio estimation can provide a solid means for effective<br />
spot identification in place of an ad hoc fitted threshold value or minimization method.<br />
09:00-11:10, Paper WeAT9.11<br />
Automated Detection of Nucleoplasmic Bridges for DNA Damage Scoring in Binucleated Cells<br />
Sun, Changming, CSIRO<br />
Vallotton, Pascal, CSIRO<br />
Fenech, Michael, CSIRO<br />
Thomas, Phil, CSIRO<br />
Quantification of DNA damage, which may be caused by radiation or exposure to chemicals, is very important and can be<br />
very time-consuming and subject to variability if carried out visually. The scoring of DNA damage includes<br />
biomarkers such as micronuclei, nucleoplasmic bridges, and nuclear buds as scored in cytokinesis-blocked binucleated<br />
cells. In this paper, we present a new algorithm based on a shortest path technique that enables us to detect the nucleoplasmic<br />
bridges joining two nuclei in cell images of binucleated cells. The effectiveness of our algorithm is illustrated using<br />
a set of cell images. We believe that this is the first time that a feasible automated nucleoplasmic bridge detection system<br />
has been reported.<br />
09:00-11:10, Paper WeAT9.12<br />
Multiple Model Estimation for the Detection of Curvilinear Segments in Medical X-Ray Images using Sparse-Plus-<br />
Dense-RANSAC<br />
Papalazarou, Chrysi, Eindhoven Univ. of Tech.<br />
De With, Peter H. N., Eindhoven Univ. of Tech. / CycloMedia<br />
Rongen, Peter, Philips Healthcare<br />
In this paper, we build on the RANSAC method to detect multiple instances of objects in an image, where the objects are<br />
modeled as curvilinear segments with distinct endpoints. Our approach differs from previously presented work in that it<br />
incorporates soft constraints, based on a dense image representation, that guide the estimation process in every step. This<br />
enables (1) better correspondence with image content, (2) explicit endpoint detection and (3) a reduction in the number of<br />
iterations required for accurate estimation. In the case of curvilinear objects examined in this paper, these constraints are<br />
formulated as binary image labels, where the estimation proved to be robust to mislabeling, e.g. in case of intersections.<br />
Results for both synthetic and real data from medical X-ray images show the improvement from incorporating soft image-based<br />
constraints.<br />
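For readers unfamiliar with the RANSAC baseline the paper above builds on, a minimal single-model RANSAC line fit looks as follows. This is a generic sketch; the paper's sparse-plus-dense variant, soft constraints, and endpoint detection are not reproduced.

```python
import random

def fit_line(p, q):
    # Line through two points as (a, b, c) with a*x + b*y + c = 0, unit-normalized.
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    n = (a * a + b * b) ** 0.5
    return a / n, b / n, -(a * x1 + b * y1) / n

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Repeatedly fit a line to a random minimal sample and keep the
    hypothesis with the largest inlier set (point-to-line distance < tol)."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p == q:
            continue  # degenerate sample: identical coordinates
        a, b, c = fit_line(p, q)
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b, c), inliers
    return best, best_inliers
```

Detecting multiple curvilinear segments, as in the paper, would run such a step sequentially while removing inliers of each accepted model.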
- 185 -
09:00-11:10, Paper WeAT9.13<br />
Statistical Texture Modeling for Medical Volume using Generalized N-Dimensional Principal Component Analysis<br />
Method and 3D Volume Morphing<br />
Qiao, Xu, Ritsumeikan Univ.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
In this paper, a statistical texture modeling method is proposed for medical volumes. As the shape of a human organ<br />
varies greatly from one case to another, 3D volume morphing is applied to normalize all the volume datasets to the same<br />
shape, removing shape variations. In order to deal with the problems of high dimensionality and the small number of medical<br />
samples, we propose an effective image compression method named Generalized N-dimensional Principal Component<br />
Analysis (GND-PCA) to construct a statistical model. Experiments applied on liver volumes show good performance on<br />
generalization using our method. A simple experiment is employed to show that the features extracted by the statistical<br />
texture model can discriminate between different types of data, such as normal and abnormal.<br />
09:00-11:10, Paper WeAT9.14 CANCELED<br />
Distinguishing Patients with Gastritis and Cholecystitis from the Healthy by Analyzing Wrist Radial Arterial Doppler<br />
Blood Flow Signals<br />
Jiang, Xiaorui, Harbin Inst. of Tech.<br />
Zhang, Dongyu, Harbin Inst. of Tech.<br />
Wang, Kuanquan, Harbin Inst. of Tech.<br />
Zuo, Wangmeng, Harbin Inst. of Tech.<br />
This paper tries to fill the gap between Traditional Chinese Pulse Diagnosis (TCPD) and Doppler diagnosis by applying<br />
digital signal analysis and pattern classification techniques to wrist radial arterial Doppler blood flow signals. Doppler<br />
blood flow signals (DBFS) of patients with cholecystitis, gastritis and healthy people are classified by L2-soft margin<br />
SVM and 5 linear classifiers using the proposed feature - piecewise axially integrated bispectra (PAIB). A 5-fold cross<br />
validation is used for performance evaluation. The classification accuracies between any two groups of subjects are<br />
greater than 93%. Gastritis can be recognized with higher accuracy than cholecystitis. Cholecystitis can be recognized<br />
with higher accuracy on left hand data than right. The findings in this paper partly conform to the theory of TCPD. Though<br />
the sample size is relatively small, we could still argue that the methods proposed here are effective and could serve as an<br />
assistive tool for TCPD.<br />
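The 5-fold cross-validation protocol used above is standard; as a generic illustration, a fold generator and an accuracy loop might look like the following. The SVM and PAIB features themselves are not reproduced; `train_fn` and `predict_fn` are placeholders for any classifier.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation,
    assigning samples round-robin so fold sizes differ by at most one."""
    folds = [list(range(f, n_samples, k)) for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

def cross_validated_accuracy(xs, ys, train_fn, predict_fn, k=5):
    """Average test accuracy over k folds; the model is always trained
    on data disjoint from the fold it is evaluated on."""
    accs = []
    for train, test in k_fold_splits(len(xs), k):
        model = train_fn([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict_fn(model, xs[i]) for i in test]
        accs.append(sum(p == ys[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / k
```

In practice, samples would be shuffled (or stratified by class) before splitting.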
09:00-11:10, Paper WeAT9.15<br />
Pelvic Organs Dynamic Features Analysis for MRI Sequences Discrimination<br />
Rahim, Mehdi, Univ. Paul Cézanne<br />
Bellemare, Marc-Emmanuel, Univ. Paul Cézanne<br />
Pirro, Nicolas, Hôpital La Timone<br />
Bulot, Rémy, Univ. Paul Cézanne<br />
Dynamic magnetic resonance imaging (MRI) acquisitions are used in the clinical assessment of the pelvic organs’ behaviour<br />
during an abdominal strain. The main organs (bladder, uterus-vagina, rectum) undergo deformations and intrinsic movements<br />
along a sequence. Anatomical references and measurements are generally used by clinicians to evaluate pathology<br />
grades. In this context, we have established quantitative elements, which consist of deformation and movement features,<br />
for the pelvic dynamic characterization, by using shape descriptors computed from organ contours. Moreover, the deformation<br />
and movement features have been assessed for their relevance to efficient sequence discrimination and pathology<br />
detection.<br />
09:00-11:10, Paper WeAT9.16<br />
Multiple Atlas Inference and Population Analysis with Spectral Clustering<br />
Sfikas, Giorgos, Univ. of Ioannina<br />
Heinrich, Christian, Univ. de Strasbourg<br />
Nikou, Christophoros, Univ. of Ioannina<br />
In medical imaging, constructing an atlas and bringing an image set into a single common reference frame may easily lead<br />
the analysis to erroneous conclusions, especially when the population under study is heterogeneous. In this paper, we propose<br />
a framework based on spectral clustering that is capable of partitioning an image population into sets that require a<br />
- 186 -
separate atlas, and identifying the most suitable templates to be used as coordinate reference frames. The spectral analysis<br />
step relies on pairwise distances that express anatomical differences between subjects as a function of the diffeomorphic<br />
warp required to match one subject onto the other, plus residual information. The methodology is validated numerically<br />
on artificial and medical imaging data.<br />
09:00-11:10, Paper WeAT9.17<br />
Automatic Pathology Annotation on Medical Images: A Statistical Machine Translation Framework<br />
Gong, Tianxia, National Univ. of Singapore<br />
Li, Shimiao, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Pang, Boon Chuan, National Neuroscience Inst. Tan Tock Seng Hospital<br />
Lim, Tchoyoson, National Neuroscience Inst. Tan Tock Seng Hospital<br />
Lee, Cheng Kiang, National Neuroscience Inst. Tan Tock Seng Hospital<br />
Tian, Qi, Inst. for Infocomm Res.<br />
Zhang, Zhuo, Inst. for Infocomm Res.<br />
Large numbers of medical images are produced daily in hospitals and medical institutions, and the need to efficiently process,<br />
index, search and retrieve these images is great. In this paper, we propose a pathology-based medical image annotation<br />
framework using a statistical machine translation approach. After pathology terms and regions of interest (ROIs) are extracted<br />
from training text and images respectively, we use machine translation model IBM Model 1 to iteratively learn the<br />
alignment between the ROIs and the pathology terms and generate an ROI-to-pathology translation table. In the testing phase,<br />
we annotate the ROI in the image with the pathology label of the highest probability in the translation table. The overall<br />
annotation results and the retrieval performance are promising to doctors and medical professionals.<br />
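IBM Model 1, used above to align ROIs with pathology terms, learns a translation table by expectation-maximization. A compact sketch of the standard algorithm (the toy data and variable names are ours, not the paper's):

```python
from collections import defaultdict

def ibm_model1(pairs, n_iter=10):
    """pairs: list of (source_tokens, target_tokens), e.g. (ROI labels,
    pathology terms). Returns t[(f, e)] = P(target term f | source label e),
    learned by the IBM Model 1 EM iterations."""
    # Uniform initialization over the observed target vocabulary.
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(n_iter):
        count = defaultdict(float)   # expected alignment counts
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over alignments
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate the translation table.
        t = defaultdict(float, {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t
```

Annotation then labels each ROI with the term of highest probability in the learned table, as the abstract describes.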
09:00-11:10, Paper WeAT9.18<br />
3D Cell Nuclei Fluorescence Quantification using Sliding Band Filter<br />
Quelhas, Pedro, INEB- Inst. de Engenharia Biomedica<br />
Mendonça, Ana Maria, INEB - Inst. de Engenharia Biomédica<br />
Aurélio, Campilho, Faculdade de Engenharia da Univ. do Porto<br />
Plant development is orchestrated by transcription factors whose expression has become observable in living plants through<br />
the use of fluorescence microscopy. However, the exact quantification of expression levels is still not solved and most<br />
analysis is only performed through visual inspection. With the objective of automating the quantification of cell nuclei<br />
fluorescence we present a new approach to detect cell nuclei in 3D fluorescence confocal microscopy, based on the use of<br />
the sliding band convergence filter (SBF). The SBF detects cell nuclei and estimates their shape with high accuracy<br />
in each 2D image plane. For 3D detection, individual 2D shapes are joined into 3D estimates and then corrected based on<br />
the analysis of the fluorescence profile. The final nuclei detection achieves precision/recall of 0.779/0.803, respectively, and<br />
an average Dice coefficient of 0.773.<br />
09:00-11:10, Paper WeAT9.19<br />
AP-Based Consensus Clustering for Gene Expression Time Series<br />
Chiu, Tai-Yu, National Tsing Hua Univ.<br />
Hsu, Ting-Chieh, National Tsing Hua Univ.<br />
Wang, Jia-Shung, National Tsing Hua Univ.<br />
We propose an unsupervised approach for analyzing gene time-series datasets. Our method combines Affinity Propagation<br />
(AP) and the spirit of consensus clustering: extracting multiple partitions from different time intervals. Without prior<br />
knowledge of the total number of clusters and exemplars, this method preserves the relationships between genes across different<br />
time intervals and eliminates the influence of noise and outliers. We demonstrate our method with both synthetic and<br />
real gene expression datasets showing significant improvement in accuracy and efficiency.<br />
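The consensus-clustering idea invoked above can be illustrated generically with a co-association matrix over partitions from different time intervals. This sketch uses simple thresholded connected components to form the consensus, not the paper's AP machinery:

```python
def coassociation(partitions, n):
    """partitions: list of label lists over the same n items, e.g. one
    clustering per time interval. Returns the fraction of partitions in
    which each pair of items shares a cluster."""
    m = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(partitions)
    return m

def consensus_groups(partitions, n, threshold=0.5):
    """Greedy consensus: link items co-clustered in more than `threshold`
    of the partitions, then return connected-component labels."""
    m = coassociation(partitions, n)
    group = list(range(n))          # union-find parent array
    def find(i):
        while group[i] != i:
            i = group[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] > threshold:
                group[find(j)] = find(i)
    return [find(i) for i in range(n)]
```

Pairs that co-cluster only in a minority of intervals (e.g. due to noise) fall below the threshold and are kept apart in the consensus.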
- 187 -
09:00-11:10, Paper WeAT9.21<br />
Unsupervised Tissue Image Segmentation through Object-Oriented Texture<br />
Tosun, Akif Burak, Bilkent Univ.<br />
Sokmensuer, Cenk, Hacettepe Univ.<br />
Gunduz-Demir, Cigdem, Bilkent Univ.<br />
This paper presents a new algorithm for the unsupervised segmentation of tissue images. It relies on using the spatial information<br />
of cytological tissue components. As opposed to the previous study, it not only uses this information in<br />
defining its homogeneity measures, but also uses it in its region growing process. This algorithm has been implemented<br />
and tested. Its visual and quantitative results are compared with those of the previous study. The results show that the proposed<br />
segmentation algorithm is more robust, giving better accuracies with a smaller number of segmented regions.<br />
09:00-11:10, Paper WeAT9.22<br />
Automated Tracking of Vesicles in Phase Contrast Microscopy Images<br />
Usenik, Peter, Univ. of Ljubljana<br />
Vrtovec, Tomaž, Univ. of Ljubljana<br />
Pernus, Franjo, Univ. of Ljubljana<br />
Likar, Bostjan, Univ. of Ljubljana<br />
We propose an algorithm for automated tracking of the contours of phospholipid vesicles, which can be used to evaluate<br />
the power, magnitude and frequency distribution of vesicle contour movements induced by thermal fluctuations. The algorithm<br />
was tested on vesicles of different structural composition that were exposed to varying temperatures. The results<br />
show that the proposed algorithm is fast, robust and reliable, and that the resulting description of vesicle contours enables<br />
straightforward spectral analysis of their fluctuations, which can be also used for the determination of other vesicle properties,<br />
e.g. the bending rigidity or spontaneous curvature.<br />
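The spectral analysis of contour fluctuations mentioned above can be illustrated with a direct DFT of the contour's radius signal. This is an illustrative sketch only; a real analysis would use an FFT and the physical normalization appropriate for bending-rigidity estimation.

```python
import cmath
import math

def fluctuation_spectrum(radii):
    """Magnitude spectrum of a closed-contour radius signal via a direct
    DFT of the mean-subtracted radii; peaks reveal the dominant
    fluctuation modes of the vesicle contour."""
    n = len(radii)
    mean_r = sum(radii) / n
    centered = [r - mean_r for r in radii]
    return [abs(sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))) / n
            for k in range(n // 2 + 1)]
```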
09:00-11:10, Paper WeAT9.23<br />
Automatic Detection and Segmentation of Focal Liver Lesions in Contrast Enhanced CT Images<br />
Militzer, Arne, Friedrich-Alexander-Univ. Erlangen-Nuremberg<br />
Hager, Tobias, Friedrich-Alexander-Univ. Erlangen-Nuremberg<br />
Jäger, Florian, Pattern Recognition Lab. Univ. of Erlangen<br />
Tietjen, Christian, Siemens Healthcare<br />
Hornegger, Joachim, Friedrich-Alexander-Univ.<br />
In this paper a novel system for automatic detection and segmentation of focal liver lesions in CT images is presented. It<br />
utilizes a probabilistic boosting tree to classify points in the liver as either lesion or parenchyma, thus providing both detection<br />
and segmentation of the lesions at the same time and fully automatically. To make the segmentation more robust,<br />
an iterative classification scheme is integrated, that incorporates knowledge gained from earlier iterations into later decisions.<br />
Finally, a comprehensive evaluation of both the segmentation and the detection performance for the most common<br />
hypodense lesions is given. Detection rates of 77% were achieved, with a sensitivity of 0.95 and a specificity of 0.93<br />
for lesion segmentation at the same settings.<br />
09:00-11:10, Paper WeAT9.24<br />
Automatic Diagnosis of Masses by using Level Set Segmentation and Shape Description<br />
Oliver, Arnau, Univ. of Girona<br />
Torrent, Albert, Univ. of Girona<br />
Llado, Xavier, Univ. of Girona<br />
Martí, Joan, Univ. of Girona<br />
We present here an approach for automatic mass diagnosis in mammographic images. Our strategy contains three main<br />
steps. Firstly, regions of interest containing the mass and background are segmented using a level set algorithm based on<br />
region information. Secondly, the characterisation of each segmented mass is obtained using the Zernike moments for<br />
modelling its shape. The final step is the diagnosis of masses as benign or malignant lesions, which is done using the Gentleboost<br />
algorithm that also assigns a likelihood value to the final result. The experimental evaluation, performed using<br />
two different digitised databases and Receiver Operating Characteristics (ROC) analysis, proves the feasibility of our proposal,<br />
showing the benefits of a correct shape description for improving automatic mass diagnosis.<br />
- 188 -
09:00-11:10, Paper WeAT9.25<br />
3D Reconstruction of Tumors for Applications in Laparoscopy using Conformal Geometric Algebra<br />
Machucho, Rubén, CINVESTAV, Unidad Guadalajara<br />
Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />
This paper presents a method for the 3D reconstruction of tumors for applications in laparoscopy. It uses simultaneously<br />
recorded stereo endoscopic and ultrasound images. The ultrasound probe is tracked throughout the stereo<br />
endoscopic images using a particle filter, and an auxiliary method based on thresholding in HSV space is used in order<br />
to improve the tracking. Then, the 3D pose of the ultrasound probe is calculated using conformal geometric algebra. The<br />
2D ultrasound images have been segmented using two methods: the level sets method and morphological operators, and<br />
a comparison between their performances has been done. Finally, the processed ultrasound images are compounded into<br />
a 3D volume, using the calculated ultrasound pose.<br />
09:00-11:10, Paper WeAT9.26<br />
Vessel Bend-Based Cup Segmentation in Retinal Images<br />
Joshi, Gopal Datt, IIIT Hyderabad<br />
Sivaswamy, Jayanthi, IIIT Hyderabad<br />
Karan, Kundan, AECS, Madurai<br />
Ranganath, Prashanth, AECS, Madurai<br />
Krishnadas, S. R., AECS, Madurai<br />
In this paper, we present a method for cup boundary detection from monocular colour fundus image to help quantify cup<br />
changes. The method is based on anatomical evidence such as vessel bends at cup boundary, considered relevant by glaucoma<br />
experts. Vessels are modeled and detected in a curvature space to better handle inter-image variations. Bends in a<br />
vessel are robustly detected using a region of support concept, which automatically selects the right scale for analysis. A<br />
reliable subset called r-bends is derived using a multi-stage strategy, and local spline fitting is used to obtain the desired<br />
cup boundary. The method has been successfully tested on 133 images comprising 32 normal and 101 glaucomatous<br />
images against three glaucoma experts. The proposed method shows high sensitivity in cup-to-disk ratio-based glaucoma<br />
detection and local assessment of the detected cup boundary shows good consensus with the expert markings.<br />
09:00-11:10, Paper WeAT9.27<br />
A Spot Segmentation Approach for 2D Gel Electrophoresis Images based on 2D Histograms<br />
Zacharia, Eleni, Univ. of Athens<br />
Kostopoulou, Eirini, Univ. of Athens<br />
Maroulis, Dimitris, Univ. of Athens<br />
Kossida, Sophia, Foundation of Biomedical Res. of the Acad. of Athens<br />
Spot segmentation, an essential stage in the processing of 2D gel electrophoresis images, remains a challenging process. The<br />
available software programs and techniques fail to separate overlapping protein spots correctly and cannot detect low-intensity<br />
spots without human intervention. This paper presents an original approach to spot segmentation in 2D gel electrophoresis<br />
images. The proposed approach is based on 2D histograms of these images. Experiments conducted<br />
on a set of 16-bit 2D gel electrophoresis images demonstrate that the proposed method is very effective and<br />
it outperforms existing techniques even when it is applied to images containing several overlapping spots as well as to<br />
images containing spots of various intensities, sizes and shapes.<br />
09:00-11:10, Paper WeAT9.28<br />
Automated Tracking of the Carotid Artery in Ultrasound Image Sequences using a Self Organizing Neural Network<br />
Hamid Muhammed, Hamed, Royal Inst. of Tech. (KTH)<br />
Azar, Jimmy C., STH, KTH<br />
An automated method for the segmentation and tracking of moving vessel walls in 2D ultrasound image sequences is introduced.<br />
The method was tested on simulated and real ultrasound image sequences of the carotid artery. Tracking was<br />
achieved via a self-organizing neural network known as Growing Neural Gas. This topology-preserving algorithm assigns<br />
a net of nodes connected by edges that distributes itself within the vessel walls and adapts to changes in topology with<br />
time. The movement of the nodes was analyzed to uncover the dynamics of the vessel wall. In this way, radial and longitudinal<br />
strain and strain rates have been estimated. Finally, wave intensity signals were computed from these measurements.<br />
- 189 -
The proposed method improves upon wave intensity wall analysis (WIWA) and opens up the possibility of easy and<br />
efficient analysis and diagnosis of vascular disease through noninvasive ultrasonic examination.<br />
09:00-11:10, Paper WeAT9.29<br />
Quantification of Subcellular Molecules in Tissue MicroArray<br />
Can, Ali, General Electric<br />
Gerdes, Michael, General Electric<br />
Bello, Musodiq, General Electric<br />
Quantifying expression levels of proteins with subcellular resolution is critical to many applications ranging from biomarker<br />
discovery to treatment planning. In this paper, we present a fully automated method and a new metric that quantifies<br />
the expression of target proteins in immunohistochemically stained tissue microarray (TMA) samples. The proposed<br />
metric is superior to existing intensity or ratio-based methods. We compared performance with the majority decision of a<br />
group of 19 observers scoring estrogen receptor (ER) status, achieving a detection rate of 96% with 90% specificity. The<br />
presented methods will accelerate the processes of biomarker discovery and transitioning of biomarkers from research<br />
bench to clinical utility.<br />
09:00-11:10, Paper WeAT9.30<br />
Actual Midline Estimation from Brain CT Scan using Multiple Regions Shape Matching<br />
Chen, Wenan, Virginia Commonwealth Univ.<br />
Ward, Kevin, Virginia Commonwealth Univ.<br />
Kayvan, Najarian, Virginia Commonwealth Univ.<br />
Computer assisted medical image processing can extract vital information that may be elusive to human eyes. In this paper,<br />
an algorithm is proposed to automatically estimate the position of the actual midline from the brain CT scans using multiple<br />
regions shape matching. The method matches feature points identified from a set of ventricle templates, extracted from<br />
MRI, with the corresponding feature points in the segmented ventricles from CT images. Then based on the matched<br />
feature points, the position of the actual midline is estimated. The proposed multiple regions shape matching algorithm<br />
addresses the deformation problem arising from the intrinsic multiple regions nature of the brain ventricles. Experiments<br />
on CT scans from patients with traumatic brain injuries (TBI) show promising results; in particular, the proposed algorithm<br />
proves to be quite robust.<br />
09:00-11:10, Paper WeAT9.31<br />
Boosting Alzheimer Disease Diagnosis using PET Images<br />
Silveira, Margarida, Inst. Superior Técnico / Inst. de Sistema e Robótica<br />
Marques, Jorge S., Inst. Superior Técnico<br />
Alzheimer’s disease (AD) is one of the most frequent types of dementia. Currently there is no cure for AD, and early diagnosis<br />
is crucial to the development of treatments that can delay the disease progression. Brain imaging can be a biomarker<br />
for Alzheimer’s disease. This has been shown in several works with MR images, but in the case of functional imaging<br />
such as PET, further investigation is still needed to determine its ability to diagnose AD, especially at the early stage of<br />
Mild Cognitive Impairment (MCI). In this paper we study the use of PET images of the ADNI database for the diagnosis<br />
of AD and MCI. We adopt a Boosting classification method, a technique based on a mixture of simple classifiers, which<br />
performs feature selection concurrently with the segmentation and is thus well suited to high-dimensional problems. The Boosting<br />
classifier achieved an accuracy of 90.97% in the detection of AD and 79.63% in the detection of MCI.<br />
09:00-11:10, Paper WeAT9.32<br />
Efficient Quantitative Information Extraction from PCR-RFLP Gel Electrophoresis Images<br />
Maramis, Christos, Aristotle Univ. of Thessaloniki<br />
Delopoulos, Anastasios, Aristotle Univ. of Thessaloniki<br />
For the purpose of PCR-RFLP analysis, as in the case of human papillomavirus (HPV) typing, quantitative information<br />
needs to be extracted from images resulting from one-dimensional gel electrophoresis by associating the image intensity<br />
with the concentration of biological material at the corresponding position on a gel matrix. However, the background intensity<br />
of the image stands in the way of quantifying this association. We propose a novel, efficient methodology for modeling<br />
- 190 -
the image background with a polynomial function and prove that this can benefit the extraction of accurate information<br />
from the lane intensity profile when modeled by a superposition of properly shaped parametric functions.<br />
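The polynomial background model described above can be sketched as a least-squares fit to the lane intensity profile followed by subtraction. This is a generic illustration via normal equations; the paper's properly shaped parametric peak functions are not reproduced.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (adequate
    for the low degrees used in background modeling)."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, m))) / A[i][i]
    return coeffs  # lowest degree first

def subtract_background(profile, degree=2):
    """Fit a polynomial background to a 1D lane profile and subtract it."""
    xs = list(range(len(profile)))
    c = polyfit(xs, profile, degree)
    return [y - sum(c[i] * x ** i for i in range(len(c)))
            for x, y in zip(xs, profile)]
```

After background removal, the residual profile is what would be fitted by the peak-shaped parametric functions mentioned in the abstract.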
09:00-11:10, Paper WeAT9.33<br />
Heart Murmur Classification using Complexity Signatures<br />
Kumar, Dinesh, Univ. of Coimbra<br />
Carvalho, Paulo, Univ. of Coimbra<br />
Couceiro, Ricardo, Univ. of Coimbra<br />
Antunes, Manuel, Univ. Hospital of Coimbra<br />
Paiva, Rui Pedro, Univ. of Coimbra<br />
Henriques, Jorge, Univ. of Coimbra<br />
In this work, we propose a two-stage classifier based on the analysis of the heart sound’s complexity for murmur identification<br />
and classification. The first stage of the classifier verifies if the heart sound (HS) exhibits murmurs. To this end,<br />
the chaotic nature of the signal is assessed using the Lyapunov exponents (LEs). The second stage of the method is devoted<br />
to the classification of the type of murmur. In contrast to current state-of-the-art methods for murmur classification, a<br />
reduced set of features is proposed. This set includes both well-known and new features designed to capture the<br />
morphological and the chaotic nature of murmurs. The classification scheme is evaluated with three classification methods:<br />
Learning Vector Quantization, Gaussian Mixture Models and Support Vector Machines. The achieved results are comparable<br />
to results reported in the literature, while relying on a significantly smaller set of features.<br />
09:00-11:10, Paper WeAT9.34<br />
3D Filtering for Injury Detection in Brain MRI<br />
Sun, Yu, Univ. of California, Riverside<br />
Bhanu, Bir, Univ. of California Riverside<br />
This paper introduces a brain injury detection approach, using a 3D filtering technique, for images acquired by magnetic<br />
resonance imaging (MRI). The proposed method uses the symmetry property of brain MRI on both 2D<br />
images and 3D volumetric information of the MRI sequences. The approach consists of two key steps: (1) each slice of a<br />
brain image is segmented into different parts using a region growing algorithm, and a symmetry affinity matrix is computed,<br />
(2) non-symmetric regions are extracted, and they are further used to detect brain injury. The Kalman filter is explicitly<br />
used in step (2) to filter out the non-injury regions in 3D. Experiments demonstrate the high efficiency of the<br />
method in detecting brain injuries.<br />
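The Kalman filtering step above can be illustrated in its simplest scalar form: smoothing a noisy per-slice measurement (e.g. a symmetry score) along the slice axis so isolated non-injury responses are suppressed. A generic sketch, not the paper's exact formulation; q, r and the constant-state model are illustrative assumptions.

```python
def kalman_smooth(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant hidden state observed
    with noise: process variance q, measurement variance r."""
    x, p = x0, p0
    out = []
    for z in measurements:
        p += q                    # predict: state constant, uncertainty grows
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # update with the innovation z - x
        p *= (1 - k)
        out.append(x)
    return out
```

Applied slice by slice, the filtered estimate tracks regions that persist across the 3D volume while attenuating one-slice outliers.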
09:00-11:10, Paper WeAT9.35<br />
Prediction of Protein Sub-Nuclear Location by Clustering mRMR Ensemble Feature Selection<br />
Sakar, Cemal Okan, Bahcesehir Univ.<br />
Kursun, Olcay, Istanbul Univ.<br />
Seker, Huseyin, De Montfort Univ.<br />
Gürgen, Fikret, Boğaziçi Univ.<br />
In many applications of pattern recognition in the bioinformatics and biomedical fields, input variables are organized into<br />
natural partitions that are called views in the literature. Mutual information can be used to select a minimal yet capable<br />
subset of views. Ignoring the presence of views, dismantling them, and treating their variables intermixed with those<br />
of others at best results in a complex, uninterpretable predictive system for researchers in these fields. Moreover, it would<br />
require measuring or computing the majority of the views. We use the clustering indices of the views and rank the views according<br />
to the unique information they have with the target using minimum redundancy-maximum relevance (mRMR)<br />
approach. We also propose an ensemble approach to reduce the random variations in clusterings.<br />
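The mRMR criterion used above greedily picks features (or views) that are informative about the target but not redundant with what is already selected. A toy sketch with discrete variables (our own illustration; the clustering-index and ensemble machinery of the paper are not shown):

```python
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """Mutual information (bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR: at each step pick the feature maximizing relevance
    I(f; target) minus the mean redundancy with the already selected
    features. `features` maps name -> list of discrete values."""
    selected = []
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for name, vals in features.items():
            if name in selected:
                continue
            rel = mutual_info(vals, target)
            red = (sum(mutual_info(vals, features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            if rel - red > best_score:
                best, best_score = name, rel - red
        selected.append(best)
    return selected
```

Note how a feature carrying the same information as one already chosen loses to a complementary, individually weaker feature.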
09:00-11:10, Paper WeAT9.36<br />
Multivariate Brain Mapping by Random Subspaces<br />
Sona, Diego, Fondazione Bruno Kessler<br />
Avesani, Paolo, Fondazione Bruno Kessler<br />
- 191 -
Functional neuroimaging uses imaging technologies to record functional brain activity in real time. Among these<br />
techniques, functional magnetic resonance imaging produces data encoded as sequences of 3D images<br />
of thousands of voxels. The main investigation performed on this data, termed brain mapping, aims at producing functional<br />
maps of the brain, i.e. detecting the portion of voxels concerned with specific perceptual or cognitive<br />
brain activities. This challenge can be shaped as a problem of feature selection. The excessive feature-to-instance ratio<br />
characterizing this data is a major issue for the computation of statistically robust maps. We propose a solution based on<br />
a Random Subspace Method that extends the reference approach (Search Light) adopted by the neuroscientific community.<br />
A comparison of the two methods is supported by the results of an empirical evaluation.<br />
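The Random Subspace Method invoked above evaluates many small random subsets of features (voxels) and accumulates evidence for features that appear in informative subsets. A minimal sketch, assuming a nearest-centroid scorer as the base learner (our choice, not the paper's):

```python
import random

def random_subspace_scores(X, y, n_subspaces=200, subspace_size=2, seed=0):
    """Score each random feature subset by its nearest-centroid training
    accuracy and credit every feature in the subset proportionally.
    Features that often occur in discriminative subsets accumulate credit."""
    rng = random.Random(seed)
    n_features = len(X[0])
    credit = [0.0] * n_features
    for _ in range(n_subspaces):
        idx = rng.sample(range(n_features), subspace_size)
        # Class centroids restricted to this subspace.
        cent = {}
        for c in set(y):
            rows = [x for x, lbl in zip(X, y) if lbl == c]
            cent[c] = [sum(r[i] for r in rows) / len(rows) for i in idx]
        correct = 0
        for x, lbl in zip(X, y):
            d = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, cent[c]))
                 for c in cent}
            correct += min(d, key=d.get) == lbl
        acc = correct / len(X)
        for i in idx:
            credit[i] += acc / n_subspaces
    return credit
```

Thresholding the resulting credit map would yield a brain map analogous in spirit to the Search Light output the paper compares against.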
09:00-11:10, Paper WeAT9.37<br />
Dual Channel Colocalization for Cell Cycle Analysis using 3D Confocal Microscopy<br />
Jaeger, Stefan, Chinese Academy of Sciences<br />
Casas-Delucchi, Corella S., Tech. Univ. Darmstadt<br />
Cardoso, M. Cristina, Tech. Univ. Darmstadt<br />
Palaniappan, Kannappan, Univ. of Missouri<br />
We present a cell cycle analysis that aims to improve our previous work by adding another channel and using one<br />
more dimension. The data we use is a set of 3D images of mouse cells captured with a spinning disk confocal microscope.<br />
All images are available in two channels showing the chromocenters and the fluorescently marked protein PCNA, respectively.<br />
In the present paper, we will describe our recent colocalization study in which we use Hessian-based blob detectors<br />
in combination with radial features to measure the degree of overlap between both channels. We show that colocalization<br />
performed in such a way provides additional discriminative power and allows us to distinguish between phases that we<br />
were not able to distinguish with a single 2D channel.<br />
09:00-11:10, Paper WeAT9.38<br />
Automated Cell Phase Classification for Zebrafish Fluorescence Microscope Images<br />
Lu, Yanting, Nanjing Univ. of Science and Tech.<br />
Lu, Jianfeng, Nanjing Univ. of Science and Tech.<br />
Liu, Tianming, Univ. of Georgia<br />
Yang, Jingyu, Univ. of Georgia<br />
Automated cell phenotype image classification is an interesting bioinformatics problem. In this paper, an automated cell<br />
phase classification framework is investigated for zebrafish presomitic mesoderm (PSM) images. Low image resolution,<br />
gradual transitions between adjacent categories, and the irregularity of real cell images make this classification task tough but<br />
intriguing. The proposed framework first segments the zebrafish image into cell patches by a two-stage segmentation procedure,<br />
then extracts the feature set NF9, designed especially for this low-resolution image set, on each cell patch, and finally<br />
employs a support vector machine (SVM) as the cell classifier. At present, the total accuracy achieved by NF9 is 75%.<br />
09:00-11:10, Paper WeAT9.39<br />
Data-Driven Lung Nodule Models for Robust Nodule Detection in Chest CT<br />
Farag, Amal, Univ. of Louisville<br />
Graham, James, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
The quality of the lung nodule models determines the success of lung nodule detection. This paper describes aspects of<br />
our data-driven approach for modeling lung nodules using the texture and shape properties of real nodules to form an average<br />
model template per nodule type. The ELCAP low-dose CT (LDCT) scan database is used to create the required statistics<br />
for the models based on modern computer vision techniques. These models suit various machine learning approaches<br />
for nodule detection including Bayesian methods, SVM and Neural Networks, and computations may be enhanced through<br />
genetic algorithms and AdaBoost. The quality of the new nodule models is studied with respect to parametric models,<br />
showing significant improvements in both sensitivity and specificity.<br />
- 192 -
09:00-11:10, Paper WeAT9.41<br />
Segmentation of Anatomical Structures in Brain MR Images using Atlases in FSL - a Quantitative Approach<br />
Soldea, Octavian, Sabanci Univ.<br />
Ekin, Ahmet, Philips Res. Europe<br />
Soldea, Diana Florentina, Sabanci Univ.<br />
Unay, Devrim, Bahcesehir Univ.<br />
Cetin, Mujdat, Sabanci Univ.<br />
Ercil, Aytul, Sabanci Univ.<br />
Uzunbas, Gokhan Mustafa, Rutgers State University<br />
Firat, Zeynep, Yeditepe University Hospital<br />
Cihangiroglu, Mutlu, Yeditepe University Hospital<br />
Segmentation of brain structures from MR images is crucial in understanding the disease progress, diagnosis, and treatment<br />
monitoring. Atlases, showing the expected locations of the structures, are commonly used to start and guide the segmentation<br />
process. In many cases, the quality of the atlas may have a significant effect on the final result. In the literature,<br />
commonly used atlases may be obtained from one subject’s data, built only from healthy subjects, or depict only certain structures,<br />
which limits their accuracy. Anatomical variations, pathologies, and imaging artifacts can all aggravate the problems related to<br />
the application of atlases. In this paper, we propose to use multiple atlases that are as different from each other as<br />
possible in order to handle such problems. To this effect, we have built a library of atlases and computed their similarity<br />
values to each other. Our study showed that the existing atlases have varying levels of similarity for different structures.<br />
09:00-11:10, Paper WeAT9.42<br />
Graphical Model-Based Tracking of Curvilinear Structures in Bio-Image Sequences<br />
Koulgi, Pradeep, Univ. of California, Santa Barbara<br />
Sargin, Mehmet Emre, Univ. of California, Santa Barbara<br />
Rose, Kenneth, Univ. of California, Santa Barbara<br />
Manjunath, B. S., Univ. of California, Santa Barbara<br />
Tracking of curvilinear structures is a task of fundamental importance in the quantitative analysis of biological structures<br />
such as neurons, blood vessels, retinal interconnects, microtubules, etc. The state of the art HMM-based contour tracking<br />
scheme for tracking microtubules, while performing well in most scenarios, can miss the track if, during its growth, it intersects<br />
another microtubule in its neighbourhood. In this paper we present a graphical model-based tracking algorithm<br />
which propagates across frames information about the dynamics of all the microtubules. This allows the algorithm to faithfully<br />
differentiate the contour of interest from others that contribute to the clutter, and maintain tracking accuracy. We<br />
present results of experiments on real microtubule images captured using fluorescence microscopy, and show that our proposed<br />
scheme outperforms the existing HMM-based scheme.<br />
11:10-12:10, WePL1 Anadolu Auditorium<br />
The Quantitative Analysis of User Behavior Online: Data, Models and Algorithms<br />
Prabhakar Raghavan Plenary Session<br />
Yahoo! Research, USA<br />
Prabhakar Raghavan has been the head of Yahoo! Research since 2005. His research interests include text and web mining,<br />
and algorithm design. He is a consulting professor of Computer Science at Stanford University and editor-in-chief of the<br />
Journal of the ACM. Prior to joining Yahoo!, he was the chief technology officer at Verity and has held a number of technical<br />
and managerial positions at IBM Research. Prabhakar received his PhD from Berkeley and is a fellow of the ACM<br />
and of the IEEE.<br />
By blending principles from mechanism design, algorithms, machine learning and massive distributed computing, the search<br />
industry has become good at optimizing monetization on sound scientific principles. This represents a successful and<br />
growing partnership between computer science and microeconomics. When it comes to understanding how online users<br />
respond to the content and experiences presented to them, we have more of a lacuna in the collaboration between computer<br />
science and certain social sciences. We will use a concrete technical example from image search results presentation, developing<br />
in the process some algorithmic and machine learning problems of interest in their own right. We then use this<br />
example to motivate the kinds of studies that need to grow between computer science and the social sciences; a critical<br />
element of this is the need to blend large-scale data analysis with smaller-scale eye-tracking and “individualized” lab studies.<br />
WeBT1 Marmara Hall<br />
Tracking and Surveillance - III Regular Session<br />
Session chair: Liao, Mark (Univ. of Southampton)<br />
13:30-13:50, Paper WeBT1.1<br />
Object Tracking by Structure Tensor Analysis<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Kluckner, Stefan, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Covariance matrices have recently been a popular choice for versatile tasks like recognition and tracking due to their powerful<br />
properties as local descriptors and their low computational demands. This paper outlines similarities of covariance matrices<br />
to the well-known structure tensor. We show that the generalized version of the structure tensor is a powerful descriptor and<br />
that it can be calculated in constant time by exploiting the properties of integral images. To measure the similarities between<br />
several structure tensors, we describe an approximation scheme which allows comparison in a Euclidean space. Such an approach<br />
is also much more efficient than the common, computationally demanding Riemannian manifold distances. Experimental<br />
evaluation proves the applicability to the task of object tracking, demonstrating improved performance compared to<br />
covariance tracking.<br />
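The constant-time window computation exploited above is standard: after building integral images of the gradient products, the structure tensor of any rectangular window costs a handful of table lookups. A minimal sketch, not the authors' implementation (function names and the windowing interface are illustrative):<br />

```python
import numpy as np

def integral(img):
    """Summed-area table with a zero row/column prepended, for O(1) box sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in constant time from an integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def structure_tensor(img, r0, c0, r1, c1):
    """2x2 structure tensor of a rectangular window via integral images."""
    gy, gx = np.gradient(img.astype(float))
    ii_xx = integral(gx * gx)  # in a tracker these three tables would be
    ii_yy = integral(gy * gy)  # built once per frame, so every candidate
    ii_xy = integral(gx * gy)  # window afterwards costs O(1)
    jxx = box_sum(ii_xx, r0, c0, r1, c1)
    jyy = box_sum(ii_yy, r0, c0, r1, c1)
    jxy = box_sum(ii_xy, r0, c0, r1, c1)
    return np.array([[jxx, jxy], [jxy, jyy]])
```

In a tracking loop the three integral images would be computed once per frame and reused for every candidate window.<br />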
13:50-14:10, Paper WeBT1.2<br />
Prototype Learning using Metric Learning based Behavior Recognition<br />
Zhu, Pengfei, Chinese Acad. of Sciences<br />
Hu, Weiming, Chinese Acad. of Sciences<br />
Yuan, Chunfeng, Chinese Acad. of Sciences<br />
Li, Li, Chinese Acad. of Sciences<br />
Behavior recognition is an attractive direction in the computer vision domain. In this paper, we propose a novel behavior<br />
recognition method based on prototype learning using metric learning. The prototype learning algorithm improves the classification<br />
performance of the nearest-neighbor classifier and reduces its storage and computation requirements, while the metric learning<br />
algorithm is used to further improve the performance of the prototype learning. We use a compound feature<br />
combining local features and motion features to recognize human behaviors. The experimental results show the effectiveness<br />
of our method.<br />
14:10-14:30, Paper WeBT1.3<br />
Are Correlation Filters Useful for Human Action Recognition?<br />
Ali, Saad, Carnegie Mellon Univ.<br />
Lucey, Simon, CSIRO<br />
It has been argued in recent work that correlation filters are attractive for human action recognition from videos. Motivation<br />
for their employment in this classification task lies in their ability to: (i) specify where the filter should peak in contrast to<br />
all other shifts in space and time, (ii) tolerate some degree of noise and intra-class variation (allowing learning<br />
from multiple examples), and (iii) be computed deterministically with low computational overhead. Specifically, Maximum<br />
Average Correlation Height (MACH) filters have exhibited encouraging results [Mikel] on a variety of human<br />
action datasets. Here, we challenge the utility of correlation filters, like the MACH filter, in these circumstances. First, we<br />
demonstrate empirically that performance identical to the MACH filter can be attained by simply taking the average<br />
of the same action-specific training examples. Second, we characterize theoretically and empirically under what circumstances<br />
a MACH filter becomes equivalent to the average of the action-specific training examples. Based on this characterization,<br />
we offer an alternative type of filter, based on a discriminative paradigm, that circumvents the inherent limitations of<br />
correlation filters for action recognition and demonstrate improved action recognition performance.<br />
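The paper's first empirical point, that a plain average of the action-specific examples can stand in for a MACH filter, is easy to sketch. The toy version below (illustrative only; the 2D setting and all names are assumptions) averages training examples into a template and classifies by the peak of a Fourier-domain circular cross-correlation:<br />

```python
import numpy as np

def average_filter(examples):
    """Baseline filter: simply the mean of same-shape training examples."""
    return np.mean(examples, axis=0)

def correlation_peak(signal, filt):
    """Peak of the circular cross-correlation, computed in the Fourier domain."""
    corr = np.real(np.fft.ifftn(np.fft.fftn(signal) * np.conj(np.fft.fftn(filt))))
    return corr.max()

def classify(test, filters):
    """Assign the test array to the class whose filter correlates best."""
    scores = {name: correlation_peak(test, f) for name, f in filters.items()}
    return max(scores, key=scores.get)
```

Because the correlation is circular, the peak score is invariant to cyclic shifts of the test input, which is the shift tolerance the abstract refers to.<br />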
14:30-14:50, Paper WeBT1.4<br />
Tracking Hand Rotation and Grasping from an IR Camera using Cylindrical Manifold Embedding<br />
Lee, Chan-Su, Yeungnam Univ.<br />
Park, Shin Won, Yeungnam Univ.<br />
This paper presents a new approach for tracking hand rotation and grasping from a single IR camera. Due to the complexity and<br />
ambiguity of hand pose, it is difficult to track hand pose and view variations simultaneously from a single camera. We propose<br />
a cylindrical manifold embedding for one-dimensional hand pose variation and cyclic viewpoint variation. A hand pose shape<br />
from a specific viewpoint can be generated from an embedding point on the cylindrical manifold after learning nonlinear<br />
generative models from the embedding space to the corresponding observed shape. Hand grasping with simultaneous hand<br />
rotation is tracked using a particle filter on the manifold space. Experimental results for synthetic and real data show accurate<br />
tracking of a grasping hand with rotation. The proposed approach shows potential for advanced user interfaces in dark environments.<br />
14:50-15:10, Paper WeBT1.5<br />
Particle Filter Tracking with Online Multiple Instance Learning<br />
Ni, Zefeng, Univ. of California, Santa Barbara<br />
Sunderrajan, Santhoshkumar, Univ. of California, Santa Barbara<br />
Rahimi, Amir, Univ. of California, Santa Barbara<br />
Manjunath, B. S., Univ. of California, Santa Barbara<br />
This paper addresses the problem of object tracking by learning a discriminative classifier to separate the object from its<br />
background. The online-learned classifier is used to adaptively model the object’s appearance and its background. To solve the<br />
typical problem of erroneous training examples generated during tracking, an online multiple instance learning (MIL) algorithm<br />
is used by allowing false positive examples. In addition, a particle filter is applied to make best use of the learned classifier<br />
and to help generate a more representative set of training examples for the online MIL learning. The effectiveness of the<br />
proposed algorithm is demonstrated in some challenging environments for human tracking.<br />
WeBT2 Topkapı Hall A<br />
Pattern Recognition Systems and Applications - I Regular Session<br />
Session chair: Fred, Ana Luisa Nobre (Instituto Superior Técnico)<br />
13:30-13:50, Paper WeBT2.1<br />
A Test of Granger Non-Causality based on Nonparametric Conditional Independence<br />
Seth, Sohan, Univ. of Florida<br />
Principe, Jose, Univ. of Florida<br />
In this paper we describe a test of Granger non-causality from the perspective of a new measure of nonparametric conditional<br />
independence. We apply the proposed test on two synthetic nonlinear problems where linear Granger causality fails and<br />
show that the proposed method is able to derive the true causal connectivity effectively.<br />
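For context, the linear Granger test that the abstract says fails on nonlinear problems reduces to comparing residual variances of two lagged least-squares fits. A minimal illustrative sketch of that linear baseline, not of the authors' nonparametric conditional-independence test (names and the score normalization are assumptions):<br />

```python
import numpy as np

def lagmat(x, p):
    """Matrix whose t-th row holds the p most recent lags of x, aligned with x[p:]."""
    return np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])

def granger_score(x, y, p=2):
    """Linear Granger score of y -> x: relative drop in residual sum of
    squares when lags of y are added to an AR(p) model of x."""
    X_own = lagmat(x, p)
    X_full = np.column_stack([X_own, lagmat(y, p)])
    target = x[p:]
    def rss(A):
        D = np.column_stack([np.ones(len(A)), A])  # design with intercept
        beta, *_ = np.linalg.lstsq(D, target, rcond=None)
        resid = target - D @ beta
        return (resid ** 2).sum()
    r0, r1 = rss(X_own), rss(X_full)
    return (r0 - r1) / r0
```

A score near zero means the lags of y add no linear predictive power for x; the nonparametric test in the paper targets exactly the cases where this linear score stays near zero despite a true (nonlinear) causal link.<br />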
13:50-14:10, Paper WeBT2.2<br />
Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification<br />
Larios, Natalia, Univ. of Washington<br />
Soran, Bilge, Univ. of Washington<br />
Shapiro, Linda,<br />
Martinez-Muñoz, Gonzalo, Univ. Autonoma de Madrid<br />
Lin, Junyuan, Oregon State Univ.<br />
Dietterich, Thomas G., Oregon State Univ.<br />
This paper proposes an image classification method based on extracting image features using Haar random forests and combining<br />
them with a spatial matching kernel SVM. The method works by combining multiple efficient, yet powerful, learning<br />
algorithms at every stage of the recognition process. On the task of identifying aquatic stonefly larvae, the method has state-of-the-art<br />
or better performance, but with much higher efficiency.<br />
14:10-14:30, Paper WeBT2.3<br />
Incorporating Lane Estimation as Context Source in Pedestrian Recognition Task<br />
Szczot, Magdalena, Daimler AG<br />
Dannenmann, Iris, Daimler AG<br />
Löhlein, Otto, Daimler AG<br />
This contribution presents a method for incorporating information given by a lane estimation system into the pedestrian<br />
recognition task. The lane in front of the vehicle is represented by a three-dimensional set of points belonging to the middle<br />
of the road. A cascaded classifier solves the first stage of the pedestrian recognition task, delivering a list of detections in a camera<br />
image. We present a fusion system which combines the information provided by the cascaded classifier and the lane estimation.<br />
The fusion system delivers a probability map of the environment in front of the vehicle. The map indicates regions in<br />
front of the vehicle which, with a certain probability, contain a relevant detected pedestrian.<br />
14:30-14:50, Paper WeBT2.4<br />
PILL-ID: Matching and Retrieval of Drug Pill Imprint Images<br />
Lee, Young-Beom, Korea Univ.<br />
Park, Unsang, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Automatic illicit drug pill matching and retrieval is becoming an important problem due to an increase in the number of<br />
tablet type illicit drugs being circulated in our society. We propose an automatic method to match drug pill images based on<br />
the imprints appearing on the tablet. This will help identify the source and manufacturer of the illicit drugs. The feature<br />
vector extracted from tablet images is based on edge localization and invariant moments. Instead of storing a single template<br />
for each pill type, we generate multiple templates during the edge detection process. This circumvents the difficulties during<br />
matching due to variations in illumination and viewpoint. Experimental results using a set of real drug pill images (822 illicit<br />
drug pill images and 1,294 legal drug pill images) showed 76.74% rank-1 and 93.02% rank-20 matching accuracy.<br />
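The invariant moments mentioned above are commonly Hu-style moment invariants computed from the (edge) image; the paper's exact feature vector is not reproduced here. A sketch of the first two invariants, which are insensitive to translation, scale, and rotation of the imprint:<br />

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a non-negative image (e.g. an edge map)."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()

def hu_invariants(img):
    """First two Hu moment invariants from scale-normalized central moments."""
    mu00 = central_moment(img, 0, 0)
    eta = lambda p, q: central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)
    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([h1, h2])
```

Matching would then compare such vectors (here truncated to two components for brevity) between query and database imprints.<br />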
14:50-15:10, Paper WeBT2.5<br />
Identifying Gender from Unaligned Facial Images by Set Classification<br />
Chu, Wen-Sheng, Acad. Sinica<br />
Huang, Chun-Rong, Acad. Sinica<br />
Chen, Chu-Song, Acad. Sinica<br />
Rough face alignments lead to suboptimal performance of face identification systems. In this study, we present a novel approach<br />
for identifying gender from facial images without proper face alignment. Instead of using only a single test input,<br />
we generate an image set by randomly cropping out a set of image patches from a neighborhood of the face detection region.<br />
Each image set is represented as a subspace and compared with other image sets by measuring the canonical correlation between<br />
two associated subspaces. By finding an optimal discriminative transformation for all training subspaces, the proposed<br />
approach with unaligned facial images is shown to outperform the state-of-the-art methods with face alignment.<br />
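The comparison step described above, canonical correlations between two patch-set subspaces, amounts to the singular values of the product of their orthonormal bases. A minimal sketch (illustrative, not the authors' code; the learned discriminative transformation is omitted):<br />

```python
import numpy as np

def subspace(patches, dim):
    """Orthonormal basis (via SVD) spanning a set of vectorized image patches."""
    X = np.stack([p.ravel() for p in patches], axis=1).astype(float)
    X = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]

def canonical_correlations(U1, U2):
    """Cosines of the principal angles between two subspaces:
    the singular values of U1^T U2 (all in [0, 1])."""
    return np.linalg.svd(U1.T @ U2, compute_uv=False)
```

A simple set-to-set similarity is then, e.g., the mean of these correlations; identical subspaces yield all ones.<br />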
WeBT3 Dolmabahçe Hall A<br />
Shape Modeling - II Regular Session<br />
Session chair: Imiya, Atsushi (Chiba Univ.)<br />
13:30-13:50, Paper WeBT3.1<br />
Detection of Shapes in 2D Point Clouds Generated from Images<br />
Su, Jingyong, Florida State Univ.<br />
Zhu, Zhiqiang, Florida State Univ.<br />
Srivastava, Anuj, Florida State Univ.<br />
Huffer, Fred W., Florida State Univ.<br />
We present a novel statistical framework for detecting pre-determined shape classes in 2D cluttered point clouds, which are<br />
in turn extracted from images. In this model-based approach, we use a 1D Poisson process for sampling points on shapes, a<br />
2D Poisson process for points from background clutter, and an additive Gaussian model for noise. Combining these with a<br />
previously proposed stochastic model on shapes of continuous 2D contours, and optimizing over unknown pose and scale, we develop a<br />
generalized likelihood ratio test for shape detection. We demonstrate the efficiency of this method and its robustness to clutter<br />
using both simulated and real data.<br />
13:50-14:10, Paper WeBT3.2<br />
Gait Learning-Based Regenerative Model: A Level Set Approach<br />
Al-Huseiny, Muayed Sattar, Univ. of Southampton<br />
Mahmoodi, Sasan, Univ. of Southampton<br />
Nixon, Mark, Univ. of Southampton<br />
We propose a learning method for gait synthesis from a sequence of shapes (frames) with the ability to extrapolate to novel<br />
data. It involves the application of PCA, first to reduce the data dimensionality to certain features, and second to model corresponding<br />
features derived from the training gait cycles as a Gaussian distribution. This approach transforms a non-Gaussian<br />
shape deformation problem into a Gaussian one by considering features of entire gait cycles as vectors in a Gaussian space.<br />
We show that these features, which we formulate as continuous functions, can be modeled by PCA. We also use this model<br />
to in-between (generate intermediate unknown) shapes in the training cycle. Furthermore, this paper demonstrates that the<br />
derived features can be used in the identification of pedestrians.<br />
14:10-14:30, Paper WeBT3.3<br />
Scale-Space Spectral Representation of Shape<br />
Bates, Jonathan, Florida State Univ.<br />
Liu, Xiuwen, Florida State Univ.<br />
Mio, Washington, Florida State Univ.<br />
We construct a scale space of shape of closed Riemannian manifolds, equipped with metrics derived from spectral representations<br />
and the Hausdorff distance. The representation depends only on the intrinsic geometry of the manifolds, making it<br />
robust to pose and articulation. The computation of shape distance involves an optimization problem over the 2^p-element<br />
group of all p-bit strings, which is approached with Markov chain Monte Carlo techniques. The methods are applied to cluster<br />
surfaces in 3D space.<br />
14:30-14:50, Paper WeBT3.4<br />
Learning Metrics for Shape Classification and Discrimination<br />
Fan, Yu, Florida State Univ.<br />
Houle, David, Florida State Unversity<br />
Mio, Washington, Florida State Univ.<br />
We propose a family of shape metrics that generalize the classical Procrustes distance by attributing weights to general linear<br />
combinations of landmarks. We develop an algorithm to learn a metric that is optimally suited to a given shape classification<br />
problem. Shape discrimination experiments are carried out with phantom data, as well as landmark data representing the<br />
shape of the wing of different species of fruit flies.<br />
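A weighted generalization of the Procrustes distance of the kind described can be sketched by applying a landmark-weighting matrix before the orthogonal alignment; W = I recovers the classical distance. This is an illustrative reconstruction, not the authors' metric-learning algorithm:<br />

```python
import numpy as np

def procrustes_distance(X, Y, W=None):
    """Procrustes-style distance between landmark configurations X, Y (k x 2),
    with an optional k x k weight matrix acting on linear combinations of
    landmarks. Alignment is over the full orthogonal group, so this sketch
    also allows reflections."""
    if W is None:
        W = np.eye(len(X))
    A = W @ (X - X.mean(0))        # remove translation, then weight
    B = W @ (Y - Y.mean(0))
    A = A / np.linalg.norm(A)      # remove scale
    B = B / np.linalg.norm(B)
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt                     # optimal orthogonal alignment
    return np.linalg.norm(A @ R - B)
```

Learning the metric would then amount to optimizing W for a given classification task, which is the part the paper contributes.<br />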
14:50-15:10, Paper WeBT3.5<br />
Non-Parametric 3D Shape Warping<br />
Hillenbrand, Ulrich, German Aerospace Center (DLR)<br />
A method is presented for non-rigid alignment of a source shape to a target shape through estimating and interpolating pointwise<br />
correspondences between their surfaces given as point clouds. The resulting mapping can be non-smooth and non-isometric,<br />
relate shapes across large variations, and find partial matches. It does not require a parametric model or a prior of<br />
deformations. Results are shown for some objects from the Princeton Shape Benchmark and a range scan.<br />
WeBT4 Dolmabahçe Hall B<br />
Image Denoising Regular Session<br />
Session chair: Skodras, A. (Hellenic Open Univ.)<br />
13:30-13:50, Paper WeBT4.1<br />
Edge Preserving Image Denoising in Reproducing Kernel Hilbert Spaces<br />
Bouboulis, Pantelis, Univ. of Athens<br />
Slavakis, Konstantinos, Univ. of Peloponnese<br />
Theodoridis, Sergios, Univ. of Athens<br />
The goal of this paper is the development of a novel approach to the problem of noise removal, based on the theory of<br />
Reproducing Kernel Hilbert Spaces (RKHS). The problem is cast as an optimization task in an RKHS, by taking advantage<br />
of the celebrated semiparametric Representer Theorem. Examples verify that in the presence of Gaussian noise the proposed<br />
method performs relatively well compared to wavelet-based techniques and outperforms them significantly in the presence<br />
of impulse or mixed noise.<br />
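By the representer theorem invoked above, the RKHS minimizer is a finite kernel expansion over the training points, so a heavily simplified denoiser can be written as Gaussian-kernel ridge regression of intensity on pixel coordinates. A toy sketch for small images (illustrative; the paper's semiparametric formulation and loss are richer):<br />

```python
import numpy as np

def kernel_ridge_denoise(img, sigma=2.0, lam=0.1):
    """Fit f(x, y) ~ intensity by Gaussian-kernel ridge regression; the
    representer theorem gives f as a kernel expansion over all pixels."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))          # Gram matrix of the RKHS kernel
    alpha = np.linalg.solve(K + lam * np.eye(len(pts)), img.ravel())
    return (K @ alpha).reshape(h, w)            # evaluate the expansion
```

The ridge term lam penalizes the RKHS norm, which is what suppresses the rough (noisy) component of the image.<br />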
13:50-14:10, Paper WeBT4.2<br />
Multichannel Image Regularisation using Anisotropic Geodesic Filtering<br />
Grazzini, Jacopo, Los Alamos National Lab.<br />
Soille, Pierre, EC Joint Res. Centre<br />
Dillard, Scott, Los Alamos National Lab.<br />
This paper extends a recent image-dependent regularisation approach introduced in [Grazzini and Soille, PR09&CCIS09]<br />
aiming at edge-preserving smoothing. For that purpose, geodesic distances equipped with a Riemannian metric need to be<br />
estimated in local neighbourhoods. By deriving an appropriate metric from the gradient structure tensor, the associated geodesic<br />
paths are constrained to follow salient features in images. Building on this, we design a generalised anisotropic geodesic<br />
filter, incorporating not only a measure of the edge strength, as in the original method, but also further directional information<br />
about the image structures. The proposed filter is particularly efficient at smoothing heterogeneous areas while preserving<br />
relevant structures in multichannel images.<br />
14:10-14:30, Paper WeBT4.3<br />
Local Jet based Similarity for NL-Means Filtering<br />
Manzanera, Antoine, ENSTA-ParisTech<br />
Reducing the dimension of local descriptors in images is useful for performing pixel comparisons faster. We show here that, for<br />
computing the NL-means denoising filter, image patches can be favourably replaced by a vector of spatial derivatives (the local<br />
jet) to calculate the similarity between pixels. First, we present the basic, limited-range implementation and compare it with<br />
the original NL-means. We use a fast estimation of the noise variance to automatically adjust the decay parameter of the<br />
filter. Next, we present the unlimited-range implementation using nearest-neighbour search in the local jet space, based on<br />
a binary search tree representation.<br />
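The core substitution, local-jet vectors in place of patches inside the NL-means weights, can be written down directly. The brute-force version below corresponds to an unlimited-range filter without the binary-search-tree acceleration; parameter names are illustrative:<br />

```python
import numpy as np

def local_jet(img):
    """Per-pixel descriptor: intensity plus spatial derivatives up to 2nd order."""
    gy, gx = np.gradient(img.astype(float))
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    return np.stack([img, gx, gy, gxx, gxy, gyy], axis=-1)

def nlmeans_jet(img, h=0.5):
    """NL-means where the patch distance is replaced by local-jet distance.
    Brute force, O(N^2): every pixel is weighted against every other."""
    jet = local_jet(img)
    flat = jet.reshape(-1, jet.shape[-1])
    vals = img.ravel().astype(float)
    out = np.empty_like(vals)
    for i in range(len(vals)):
        d2 = ((flat - flat[i]) ** 2).sum(1)      # jet distance to all pixels
        wgt = np.exp(-d2 / (h * h))              # decay parameter h
        out[i] = (wgt * vals).sum() / wgt.sum()
    return out.reshape(img.shape)
```

The 6-dimensional jet replaces, say, a 49-dimensional 7x7 patch, which is precisely the dimensionality reduction motivating the approach.<br />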
14:30-14:50, Paper WeBT4.4<br />
Image Denoising based on Fuzzy and Intra-Scale Dependency in Wavelet Transform Domain<br />
Saeedi, Jamal, Amirkabir Univ. of Tech.<br />
Moradi, Mohammad Hassan, Amirkabir Univ. of Tech.<br />
Abedi, Ali, Amirkabir Univ. of Tech.<br />
In this paper, we propose a new wavelet shrinkage algorithm based on fuzzy logic. Fuzzy logic is used to take neighbor dependency and<br />
the uncorrelated nature of noise into account in wavelet-based image denoising. For this reason, we use a fuzzy feature for enhancing wavelet coefficient<br />
information in the shrinkage step. Then a fuzzy membership function shrinks wavelet coefficients based on the fuzzy feature. We<br />
examine our image denoising algorithm in the dual-tree discrete wavelet transform domain, a shiftable, modified version of the discrete<br />
wavelet transform. Extensive comparisons with state-of-the-art image denoising algorithms indicate that our algorithm<br />
has better performance in noise suppression and edge preservation.<br />
14:50-15:10, Paper WeBT4.5<br />
Noise-Insensitive Contrast Enhancement for Rendering High-Dynamic-Range Images<br />
Lin, Hsueh-Yi Sean, Lunghwa Univ. of Science and Tech.<br />
The process of compressing high luminance values into the displayable range inevitably incurs a loss of image contrast. Although a<br />
local adaptation process, such as the two-scale contrast reduction scheme, is capable of preserving details during the HDR compression<br />
process, it cannot be used to enhance the local contrasts of image contents. Moreover, the effect of noise artifacts cannot be eliminated when<br />
detail manipulation is subsequently performed. We propose a new tone reproduction scheme, which incorporates local contrast enhancement<br />
and noise suppression processes, for the display of HDR images. Our experimental results show that the proposed scheme is<br />
indeed effective in enhancing local contrasts of image contents and suppressing noise artifacts while increasing the visibility of HDR<br />
scenes.<br />
WeBT5 Topkapı Hall B<br />
Feature Extraction for Face Recognition Regular Session<br />
Session chair: Govindaraju, Venu (Univ. at Buffalo)<br />
13:30-13:50, Paper WeBT5.1<br />
Monogenic Binary Pattern (MBP): A Novel Feature Extraction and Representation Model for Face Recognition<br />
Yang, Meng, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, Lin, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
A novel feature extraction method, namely monogenic binary pattern (MBP), is proposed in this paper based on the theory<br />
of monogenic signal analysis, and the histogram of MBP (HMBP) is subsequently presented for robust face representation<br />
and recognition. MBP consists of two parts: one is monogenic magnitude encoded via uniform LBP, and the other is monogenic<br />
orientation encoded as quadrant-bit codes. The HMBP is established by concatenating the histograms of MBP of all<br />
sub-regions. Compared with the well-known and powerful Gabor filtering based LBP schemes, one clear advantage of<br />
HMBP is its lower time and space complexity because monogenic signal analysis needs fewer convolutions and generates<br />
more compact feature vectors. The experimental results on the AR and FERET face databases validate that the proposed<br />
MBP algorithm has better performance than or comparable performance with state-of-the-art local feature based methods<br />
but with significantly lower time and space complexity.<br />
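The uniform-LBP encoding used for the monogenic magnitude is a standard building block: 8-neighbour codes with at most two circular bit transitions get individual histogram bins (there are 58 such patterns) and all remaining codes share one bin. A minimal sketch, independent of the MBP specifics:<br />

```python
import numpy as np

def lbp8(img, r, c):
    """8-neighbour LBP code of pixel (r, c), clockwise from top-left."""
    center = img[r, c]
    nbrs = [img[r-1, c-1], img[r-1, c], img[r-1, c+1], img[r, c+1],
            img[r+1, c+1], img[r+1, c], img[r+1, c-1], img[r, c-1]]
    return sum(int(v >= center) << i for i, v in enumerate(nbrs))

def is_uniform(code):
    """A pattern is 'uniform' if its circular bit string has <= 2 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

def uniform_lbp_histogram(img):
    """58 bins for the uniform patterns plus one bin for all the rest."""
    uniform_codes = sorted(c for c in range(256) if is_uniform(c))
    index = {c: i for i, c in enumerate(uniform_codes)}
    hist = np.zeros(len(uniform_codes) + 1)
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            hist[index.get(lbp8(img, r, c), len(uniform_codes))] += 1
    return hist / hist.sum()
```

The HMBP descriptor described above would concatenate such histograms over sub-regions, alongside the quadrant-bit orientation codes.<br />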
13:50-14:10, Paper WeBT5.2<br />
Automatic Frequency Band Selection for Illumination Robust Face Recognition<br />
Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />
Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />
Varying illumination conditions cause a dramatic change in facial appearance that leads to a significant drop in face recognition<br />
algorithms’ performance. In this paper, to overcome this problem, we utilize an automatic frequency band selection<br />
scheme. The proposed approach is incorporated into a local appearance-based face recognition algorithm, which employs<br />
the discrete cosine transform (DCT) for processing local facial regions. From the extracted DCT coefficients, the approach<br />
determines the ones that should be used for classification. Extensive experiments conducted on the extended Yale face<br />
database B have shown that benefiting from frequency information provides robust face recognition under changing illumination<br />
conditions.<br />
14:10-14:30, Paper WeBT5.3<br />
Directed Random Subspace Method for Face Recognition<br />
Harandi, Mehrtash, NICTA<br />
Nili Ahmadabadi, Majid, Univ. of Tehran<br />
Nadjar Araabi, Babak, Univ. of Tehran<br />
Bigdeli, Abbas, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
With growing attention to ensemble learning, various ensemble methods for face recognition have been<br />
proposed in recent years that show promising results. Among diverse ensemble construction approaches, the random subspace method has<br />
received considerable attention in face recognition. Although random feature selection in the random subspace method improves<br />
accuracy in general, it is not free of serious difficulties and drawbacks. In this paper we present a learning scheme<br />
to overcome some of the drawbacks of random feature selection in the random subspace method. The proposed learning<br />
method derives a feature discrimination map based on a measure of accuracy and uses it in a probabilistic recall mode to<br />
construct an ensemble of subspaces. Experiments on different face databases revealed that the proposed method gives superior<br />
performance over well-known benchmarks and state-of-the-art ensemble methods.<br />
14:30-14:50, Paper WeBT5.4<br />
Raw vs. Processed: How to Use the Raw and Processed Images for Robust Face Recognition under Varying Illumination<br />
Xu, Li, Chinese Acad. of Sciences<br />
Lei, Huang, Chinese Acad. of Sciences<br />
Liu, Changping, Chinese Acad. of Sciences<br />
Many previous image processing methods discard low-frequency components of images to extract illumination invariants<br />
for face recognition. However, such methods may cause distortion of the processed images and perform poorly under normal<br />
lighting. In this paper, a new method is proposed to deal with the illumination problem in face recognition. Firstly, we define<br />
a score to denote the relative difference between the first and second largest similarities between the query input and the individuals<br />
in the gallery classes. Then, according to the score, we choose the appropriate images, raw or processed, to use in<br />
the recognition. Experiments on the ORL, CMU-PIE and Extended Yale B face databases show that our adaptive method<br />
gives more robust results after combination and performs better than the traditional fusion operators, the sum and the maximum<br />
of similarities.<br />
14:50-15:10, Paper WeBT5.5<br />
Discriminative Prototype Learning in Open Set Face Recognition<br />
Han, Zhongkai, Tsinghua Univ.<br />
Fang, Chi, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
We address the problem of prototype design for open set face recognition (OSFR) using a single sample image. Normalized<br />
Correlation (NC), also known as Cosine Distance, offers many benefits in accuracy and robustness compared to other distance<br />
measurements in the OSFR problem. Inspired by classical Learning Vector Quantization (LVQ), a novel discriminative<br />
learning method is proposed to design a discriminative prototype used by the NC classifier. Specifically, we develop an objective<br />
function that fixes the NC score between the prototype and within-class samples at a high level and minimizes the<br />
similarity between the prototype and between-class samples. Several experiments conducted on benchmark databases<br />
demonstrate the superior performance of the designed prototype compared to the original one.<br />
WeBT6 Anadolu Auditorium<br />
Document Analysis - II Regular Session<br />
Session chair: Lopresti, Daniel (Lehigh Univ.)<br />
13:30-13:50, Paper WeBT6.1<br />
On-Line Handwriting Word Recognition using a Bi-Character Model<br />
Prum, Sophea, Univ. of La Rochelle<br />
Visani, Muriel, Univ. of La Rochelle<br />
Ogier, Jean-Marc, Univ. de la Rochelle<br />
This paper deals with on-line handwriting recognition. Analytic approaches have attracted increasing interest during<br />
the last ten years. These approaches rely on a preliminary segmentation stage, which remains one of the most difficult<br />
problems and may strongly affect the quality of the global recognition process. In order to circumvent this problem, this<br />
paper introduces a bi-character model, where each character is recognized jointly with its neighboring characters. This<br />
model yields two main advantages. First, it reduces the number of confusions due to connections between characters<br />
during the character recognition step. Second, it avoids some possible confusions at the character recognition level during<br />
the word recognition stage. Our experiments on significant databases show interesting improvements: the<br />
recognition rate is increased from 65% to 83% by using this bi-character strategy.<br />
13:50-14:10, Paper WeBT6.2<br />
Ruling Line Removal in Handwritten Page Images<br />
Lopresti, Daniel, Lehigh Univ.<br />
Kavallieratou, Ergina, Univ. of the Aegean<br />
In this paper we present a procedure for removing ruling lines from a handwritten document image that does not break existing<br />
characters. We take advantage of common ruling line properties such as uniform width, predictable spacing, position<br />
vs. text, etc. The proposed process has no effect on document images without ruling lines, hence no a priori discrimination<br />
is required. The system is evaluated on synthetic page images in five different languages.<br />
14:10-14:30, Paper WeBT6.3<br />
Script Identification – a Han & Roman Script Perspective<br />
Chanda, Sukalpa, GJØVIK Univ. Coll.<br />
Pal, Umapada, Indian Statistical Inst.<br />
Franke, Katrin, Gjøvik Univ. Coll.<br />
Kimura, Fumitaka, Mie Univ.<br />
All Han-based scripts (Chinese, Japanese, and Korean) possess similar visual characteristics. Hence system development<br />
for identification of Chinese, Japanese and Korean scripts from a single document page is quite challenging. It is noted<br />
that a Han-based document page might also have Roman script in them. A multi-script OCR system dealing with Chinese,<br />
Japanese, Korean, and Roman scripts, demands identification of scripts before execution of respective OCR modules. We<br />
propose a system to address this problem using directional features along with a Gaussian Kernel-based Support Vector<br />
Machine. We obtained promising results: 98.39% script identification accuracy at the character level and 99.85% at the<br />
block level when no rejection was considered.<br />
14:30-14:50, Paper WeBT6.4<br />
Robust 1D Barcode Recognition on Mobile Devices<br />
Rocholl, Johann, Stuttgart Univ.<br />
Klenk, Sebastian, Stuttgart Univ.<br />
Heidemann, Gunther, Stuttgart Univ.<br />
In the following we will describe a novel method for decoding linear barcodes from blurry camera images. Our goal was<br />
to develop an algorithm that can be used on mobile devices to recognize product numbers from EAN or UPC barcodes.<br />
14:50-15:10, Paper WeBT6.5<br />
Fast Logo Detection and Recognition in Document Images<br />
Li, Zhe, Siemens AG<br />
Schulte-Austum, Matthias, Siemens AG<br />
Neschen, Martin, Recosys GmbH<br />
The scientific significance of automatic logo detection and recognition is growing steadily because of the increasing<br />
requirements of intelligent document image analysis and retrieval. In this paper, we introduce a system architecture which<br />
is aiming at segmentation-free and layout-independent logo detection and recognition. Along with the unique logo feature<br />
design, a novel way to ensure the geometrical relationships among the features, and different optimizations in the recognition<br />
process, this system can achieve improvements concerning both the recognition performance and the running time.<br />
The experimental results on several sets of real-world documents demonstrate the effectiveness of our approach.<br />
WeBT7 Dolmabahçe Hall C<br />
Classification in Biomedicine Regular Session<br />
Session chair: Gurcan, Metin (Ohio State Univ.)<br />
13:30-13:50, Paper WeBT7.1<br />
Joint Independent Component Analysis of Brain Perfusion and Structural Magnetic Resonance Images in Dementia<br />
Tosun, Duygu, Center for Imaging Neurodegenerative Diseases<br />
Rosen, Howard, UCSF<br />
Miller, Bruce L., UCSF<br />
Weiner, Michael W., UCSF<br />
Schuff, Norbert, UCSF<br />
Magnetic Resonance Imaging (MRI) provides various imaging modes to study the brain. We tested the benefits of joint<br />
analysis of multimodality MRI data using joint independent components analysis (jICA) in comparison to unimodality<br />
analyses. Specifically, we designed a jICA to decompose the joint distributions of multimodality MRI data across image<br />
voxels and subjects into independent components that explain joint variations between image modalities across subjects.<br />
We applied jICA to structural and perfusion-weighted MRI data from 12 patients diagnosed with behavioral variant frontotemporal<br />
dementia (bvFTD) and 12 healthy elderly individuals. While unimodality analyses showed<br />
widespread brain atrophy and hypoperfusion in the patients, jICA further revealed links between atrophy and hypoperfusion<br />
in specific brain regions. Moreover, significant links were confined to the right brain hemisphere in FTLD, consistent with<br />
the clinical symptoms. Considering the multimodality effect size between bvFTD patients and controls, the brain atrophy and hypoperfusion<br />
regions identified by multimodality jICA yielded a large effect size, while the regions identified by unimodality<br />
analysis of atrophy and hypoperfusion differences yielded only a medium effect size between bvFTD patients<br />
and controls. The findings demonstrate the power of jICA to effectively evaluate multimodality brain imaging data.<br />
13:50-14:10, Paper WeBT7.2<br />
Endoscopic Image Classification using Edge-Based Features<br />
Häfner, Michael, St. Elisabeth Hospital<br />
Gangl, Alfred, Medical Univ. of Vienna<br />
Liedlgruber, Michael, Univ. of Salzburg<br />
Uhl, Andreas, Univ. of Salzburg<br />
Vécsei, Andreas, St. Anna Children’s Hospital<br />
Wrba, Friedrich, Medical Univ. of Vienna<br />
We present a system for automated colon cancer detection based on pit pattern classification. In contrast to previous<br />
work we exploit the visual nature of the underlying classification scheme by extracting features based on detected edges.<br />
To focus on the most discriminative subset of features we use a greedy forward feature subset selection. The classification<br />
is then carried out using the k-nearest neighbors (k-NN) classifier. The results obtained are very promising and show that<br />
an automated classification of the given imagery is feasible by using the proposed method.<br />
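The greedy forward feature subset selection mentioned above, wrapped around a k-NN scorer, can be sketched as follows. This is a generic illustration under stated assumptions (leave-one-out accuracy as the selection criterion, Euclidean distance), not the authors' code; all names are hypothetical.<br />

```python
import numpy as np

def knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-NN classifier (Euclidean distance)."""
    n = len(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude the sample itself
    correct = 0
    for i in range(n):
        nn = np.argsort(d[i])[:k]          # indices of the k nearest samples
        correct += np.bincount(y[nn]).argmax() == y[i]
    return correct / n

def forward_select(X, y, k=3):
    """Greedily add the feature that most improves LOO k-NN accuracy."""
    remaining, chosen, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        scores = {f: knn_accuracy(X[:, chosen + [f]], y, k) for f in remaining}
        f, s = max(scores.items(), key=lambda kv: kv[1])
        if s <= best:                      # no further improvement: stop
            break
        chosen.append(f)
        remaining.remove(f)
        best = s
    return chosen, best
```

The stopping rule keeps only features that strictly improve the score, which is one common way to "focus on the most discriminative subset".<br />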
14:10-14:30, Paper WeBT7.3<br />
Biclustering of Expression Microarray Data with Topic Models<br />
Bicego, Manuele, Univ. of Verona<br />
Lovato, Pietro, Univ. of Verona<br />
Ferrarini, Alberto, Univ. of Verona<br />
Delledonne, Massimo, Univ. of Verona<br />
This paper presents an approach to extracting biclusters from expression microarray data using topic models, a class of probabilistic<br />
models that can detect interpretable groups of highly correlated genes and samples. Starting from a topic<br />
model learned from the expression matrix, some automatic rules to extract biclusters are presented, which overcome the<br />
drawbacks of previous approaches. The methodology has been positively tested with synthetic benchmarks, as well as<br />
with a real experiment involving two different species of grape plants (Vitis vinifera and Vitis riparia).<br />
14:30-14:50, Paper WeBT7.4<br />
A Multiple Instance Learning Approach Toward Optimal Classification of Pathology Slides<br />
Dundar, Murat, IUPUI<br />
Badve, Sunil, Indiana Univ.<br />
Raykar, Vikas, Siemens Medical<br />
Jain, Rohit, IUPUI<br />
Sertel, Olcay, The Ohio State Univ.<br />
Gurcan, Metin, The Ohio State Univ.<br />
Pathology slides are diagnosed based on the histological descriptors extracted from regions of interest (ROIs) identified<br />
on each slide by the pathologists. A slide usually contains multiple regions of interest and a positive (cancer) diagnosis is<br />
confirmed when at least one of the ROIs in the slide is identified as positive. For a negative diagnosis the pathologist has<br />
to rule out cancer for each and every ROI available. Our research is motivated toward computer-assisted classification of<br />
digitized slides. The objective in this study is to develop a classifier to optimize classification accuracy at the slide level.<br />
Traditional supervised training techniques which are trained to optimize classifier performance at the ROI level yield suboptimal<br />
performance in this problem. We propose a multiple instance learning approach based on the implementation of<br />
the large margin principle with different loss functions defined for positive and negative samples. We consider the classification<br />
of intraductal breast lesions as a case study, and perform experimental studies comparing our approach against<br />
the state-of-the-art.<br />
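The slide-level decision rule stated above (positive if at least one ROI is positive, negative only if every ROI is) is the classic multiple-instance "max" aggregation, sketched here as a minimal illustration; score values and the threshold are hypothetical, not the paper's trained classifier.<br />

```python
import numpy as np

def slide_label(roi_scores, threshold=0.0):
    """Aggregate ROI-level classifier scores into a slide-level call.

    A slide is positive (1) if at least one ROI score exceeds the
    threshold, and negative (0) only if every ROI falls below it --
    the rule described in the abstract above.
    """
    return int(np.max(roi_scores) > threshold)
```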
14:50-15:10, Paper WeBT7.5<br />
Gaussian ERP Kernel Classifier for Pulse Waveforms Classification<br />
Zuo, Wangmeng, Harbin Inst. of Tech.<br />
Zhang, Dongyu, Harbin Inst. of Tech.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Wang, Kuanquan, Harbin Inst. of Tech.<br />
Li, Naimin, Harbin Inst. of Tech.<br />
While advances in sensor and signal processing techniques have provided effective tools for quantitative research on traditional<br />
Chinese pulse diagnosis (TCPD), the automatic classification of pulse waveforms remains a difficult problem.<br />
To address this issue, this paper proposes a novel edit distance with real penalty (ERP)-based k-nearest neighbors (KNN)<br />
classifier, drawing on recent progress in time series matching and KNN classification. Taking advantage of the metric<br />
property of ERP, we first develop a Gaussian ERP kernel, and then embed it into kernel difference-weighted KNN classifier.<br />
The proposed Gaussian ERP kernel classifier is evaluated on a dataset which includes 2470 pulse waveforms. Experimental<br />
results show that the proposed classifier is much more accurate than several other pulse waveform classification approaches.<br />
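The ERP distance and the Gaussian kernel built on it can be sketched as follows. This follows the standard ERP dynamic program (gaps penalised relative to a constant g), shown as an illustration; the gap constant, bandwidth, and function names are assumptions, not the paper's settings.<br />

```python
import numpy as np

def erp(a, b, g=0.0):
    """Edit distance with Real Penalty (ERP) between two 1-D sequences.

    Gaps are penalised by the distance to a constant reference value g,
    which makes ERP a proper metric (triangle inequality holds) -- the
    property the Gaussian kernel below relies on.
    """
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[1:, 0] = np.cumsum(np.abs(np.asarray(a, float) - g))  # all-gap cost
    D[0, 1:] = np.cumsum(np.abs(np.asarray(b, float) - g))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i-1, j-1] + abs(a[i-1] - b[j-1]),  # match
                          D[i-1, j] + abs(a[i-1] - g),          # gap in b
                          D[i, j-1] + abs(b[j-1] - g))          # gap in a
    return D[n, m]

def gaussian_erp_kernel(a, b, sigma=1.0):
    """Gaussian kernel on the ERP metric."""
    return np.exp(-erp(a, b) ** 2 / (2 * sigma ** 2))
```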
WeCT1 Marmara Hall<br />
Tracking and Surveillance - IV Regular Session<br />
Session chair: Carneiro, Gustavo (Technical Univ. of Lisbon)<br />
15:40-16:00, Paper WeCT1.1<br />
Human 3D Motion Recognition based on Spatial-Temporal Context of Joints<br />
Zhao, Qiong, Univ. of Science and Tech. of China<br />
Wang, Lihua, City Univ. of Hong Kong<br />
Ip, Horace,<br />
Zhou, Xuehai, Univ. of Science and Tech. of China<br />
The paper presents a novel human motion recognition method based on a new form of the Hidden Markov Models, called<br />
spatial-temporal hidden Markov models (ST-HMM), which can be learnt from a sequence of joint positions. To cope with<br />
the high dimensionality of the pose space, in this paper, we exploit the spatial dependency between each pair of spatially<br />
connected joints in the articulated skeletal structure, as well as the temporal dependency due to the continuous movement<br />
of each of the joints. The spatial-temporal contexts of these joints are learnt from the sequences of joint movements and<br />
captured by our ST-HMM. Results of recognizing 11 different action classes on a large number of motion capture sequences<br />
as well as synthetic tracking data show that our approach outperforms the traditional HMM approach in terms of robustness<br />
and recognition rates.<br />
16:00-16:20, Paper WeCT1.2<br />
Matching Groups of People by Covariance Descriptor<br />
Cai, Yinghao, Univ. of Oulu<br />
Takala, Valtteri, Univ. of Oulu<br />
Pietikäinen, Matti, Univ. of Oulu<br />
In this paper, we present a new solution to the problem of matching groups of people across multiple non-overlapping<br />
cameras. Similar to the problem of matching individuals across cameras, matching groups of people also faces challenges<br />
such as variations of illumination conditions, poses and camera parameters. Moreover, people often swap their positions<br />
while walking in a group. In this paper, we propose to use the covariance descriptor for appearance matching of group images.<br />
The covariance descriptor is shown to be discriminative, capturing both the appearance and statistical properties<br />
of image regions. Furthermore, it offers a natural way of combining multiple heterogeneous features with a<br />
relatively low dimensionality. Experimental results on two different datasets demonstrate the effectiveness of the proposed<br />
method.<br />
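A region covariance descriptor and the usual generalized-eigenvalue distance between two such descriptors can be sketched as below. The per-pixel feature set (position, intensity, gradient magnitudes) is one common choice, not necessarily the one used in the paper.<br />

```python
import numpy as np

def covariance_descriptor(region):
    """Covariance descriptor of a grey-level image region.

    Per-pixel feature vector: (x, y, intensity, |dI/dx|, |dI/dy|);
    the descriptor is the 5x5 covariance of these features over the
    region, combining heterogeneous features in a low-dimensional way.
    """
    region = region.astype(float)
    ys, xs = np.mgrid[0:region.shape[0], 0:region.shape[1]]
    gy, gx = np.gradient(region)
    F = np.stack([xs, ys, region, np.abs(gx), np.abs(gy)])
    return np.cov(F.reshape(5, -1))    # rows = features, cols = pixels

def covariance_distance(c1, c2):
    """Riemannian distance between covariance matrices:
    sqrt(sum_i ln^2 lambda_i), where lambda_i are the generalized
    eigenvalues of (c1, c2)."""
    lam = np.linalg.eigvals(np.linalg.solve(c1, c2)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

Because the generalized eigenvalues of (c2, c1) are the reciprocals of those of (c1, c2), the distance is symmetric, as required for matching across cameras.<br />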
16:20-16:40, Paper WeCT1.3<br />
Boosting Incremental Semi-Supervised Discriminant Analysis for Tracking<br />
Wang, Heng, Chinese Acad. of Sciences<br />
Hou, Xinwen, Chinese Acad. of Sciences<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
Tracking has recently been formulated as the problem of discriminating the object from its nearby background, where the classifier<br />
is updated by new samples successively arriving during tracking. Depending on whether the samples are labeled, the<br />
tracker can be designed in a supervised or semi-supervised manner. This paper proposes a novel semi-supervised algorithm<br />
for tracking by combining Semi-supervised Discriminant Analysis (SDA) with an online boosting framework. Using the<br />
local geometric structure information from the samples, the SDA-based weak classifier is made more robust to outliers.<br />
Meanwhile, we design an incremental updating mechanism for SDA so that it can adapt to appearance changes. We further<br />
propose an Extended SDA (ESDA) algorithm, which gives better discrimination ability. Results on several challenging<br />
video sequences demonstrate the effectiveness of the method.<br />
16:40-17:00, Paper WeCT1.4<br />
Optical Rails: View-Based Track Following with Hemispherical Environment Model and Orientation View<br />
Descriptors<br />
Dederscheck, David, Goethe Univ. Frankfurt<br />
Zahn, Martin, Goethe Univ. Frankfurt<br />
Friedrich, Holger, Goethe Univ. Frankfurt<br />
Mester, Rudolf, Goethe Univ. Frankfurt<br />
We present a purely view-based method for robot navigation along a prerecorded track using compact omnidirectional<br />
view-descriptors. This paper focuses on a new model for the navigation environment to determine the steering direction<br />
by efficient holistic comparison of views. The concept of view descriptors based on low-order expansion of local orientation<br />
vectors into spherical harmonic basis functions is augmented by a linear illumination model, providing discriminative<br />
view matching even under illumination changes.<br />
17:00-17:20, Paper WeCT1.5<br />
Forward-Backward Error: Automatic Detection of Tracking Failures<br />
Kalal, Zdenek, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
Matas, Jiri, CTU Prague<br />
This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error,<br />
i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured.<br />
We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories<br />
in video sequences. We demonstrate that the approach is complementary to commonly used normalized<br />
cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance<br />
is achieved on challenging benchmark video sequences which include non-rigid objects.<br />
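The Forward-Backward check described above is simple enough to sketch directly: track forward, track back, and measure the round-trip discrepancy. The `track` interface below is hypothetical (any point tracker can be plugged in); this is an illustration, not the authors' implementation.<br />

```python
import numpy as np

def forward_backward_error(track, points):
    """Forward-Backward error for a set of tracked points.

    track(points, forward=...) -> new point positions (any point
    tracker).  Each point is tracked forward in time and then backward;
    the FB error is the distance between the original position and the
    backtracked one.  Large errors indicate tracking failures.
    """
    fwd = track(points, forward=True)
    back = track(fwd, forward=False)
    return np.linalg.norm(points - back, axis=1)

def reliable_points(track, points, max_error=2.0):
    """Keep only points whose FB error stays below max_error --
    the trajectory-selection rule described in the abstract."""
    return points[forward_backward_error(track, points) < max_error]
```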
WeCT2 Topkapı Hall A<br />
Pattern Recognition Systems and Applications - II Regular Session<br />
Session chair: Marinai, Simone (Univ. of Florence)<br />
15:40-16:00, Paper WeCT2.1<br />
Scene-Adaptive Human Detection with Incremental Active Learning<br />
Joshi, Ajay, Univ. of Minnesota, Twin Cities<br />
Porikli, Fatih, MERL<br />
In many computer vision tasks, scene changes hinder the generalization ability of trained classifiers. For instance, a human<br />
detector trained with one set of images is unlikely to perform well in different scene conditions. In this paper, we propose<br />
an incremental learning method for human detection that can take generic training data and build a new classifier adapted<br />
to the new deployment scene. Two operation modes are proposed: i) a completely autonomous mode wherein the first few<br />
empty frames of video are used for adaptation, and ii) an active learning approach with the user in the loop, for more challenging<br />
scenarios including situations where empty initialization frames may not exist. Results show the strength of the<br />
proposed methods for quick adaptation.<br />
16:00-16:20, Paper WeCT2.2<br />
Direct Printability Prediction in VLSI using Features from Orthogonal Transforms<br />
Kryszczuk, Krzysztof, IBM Zurich Res. Lab.<br />
Hurley, Paul, IBM Zurich Res. Lab.<br />
Sayah, Robert, IBM Systems and Tech. Group<br />
Full-chip printability simulations for VLSI layouts use analytical and heuristic physical process models, and require an<br />
explicit creation of a mask and image. This is a computationally expensive task, often prohibitively so, especially when<br />
prototyping new designs. In this paper we show that using orthogonal transform-based, fixed-length feature vector representations<br />
of 22nm VLSI layouts to perform classification-based rapid printability prediction can help avoid or reduce<br />
the number of simulations. Furthermore, in order to overcome the problem of scarcity of training data, we show<br />
how re-scaled, abundant 45nm designs can train error prediction models for new, native 22nm designs. Our experiments,<br />
run on M1 layer data and line width errors, demonstrate the viability of the proposed approach.<br />
16:20-16:40, Paper WeCT2.3<br />
Improving Performance of Network Traffic Classification Systems by Cleaning Training Data<br />
Gargiulo, Francesco, Univ. of Naples Federico II<br />
Sansone, Carlo, Univ. of Naples Federico II<br />
In this paper we propose an algorithm for detecting and removing mislabeled training samples in an adversarial<br />
learning context, in which a malicious user tries to camouflage training patterns in order to limit the classification system's<br />
performance. In particular, we describe how this algorithm can be effectively applied to the problem of identifying HTTP<br />
traffic flowing through TCP port 80, where mislabeled samples can be forced by using port-spoofing attacks.<br />
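The abstract does not spell out the cleaning algorithm, so the sketch below shows one common instantiation of training-set cleaning: an edited-nearest-neighbour style filter that drops samples whose neighbours disagree with their label. All names and thresholds are assumptions; the paper's actual algorithm may differ.<br />

```python
import numpy as np

def clean_training_set(X, y, k=3, min_agreement=0.5):
    """Filter likely-mislabeled samples by k-NN label agreement.

    A sample is kept only if at least min_agreement of its k nearest
    neighbours (excluding itself) share its label.  Samples whose
    labels were camouflaged by an adversary tend to sit inside the
    opposite class and get filtered out.
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample cannot vouch for itself
    keep = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        nn = np.argsort(d[i])[:k]
        keep[i] = np.mean(y[nn] == y[i]) >= min_agreement
    return X[keep], y[keep]
```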
16:40-17:00, Paper WeCT2.4<br />
Bayesian Networks for Predicting IVF Blastocyst Development<br />
Uyar, Asli, Bogazici Univ.<br />
Bener, Ayse, Bogazici Univ.<br />
Ciray, H. Nadir, Bahceci Woman Healthcare Centre<br />
Bahceci, Mustafa, Bahceci Woman Healthcare Centre<br />
In in-vitro fertilization (IVF) treatment, blastocyst stage embryo transfers at day 5 result in higher pregnancy rates. However,<br />
there is a risk of transfer cancelation due to embryonic developmental failure. Clinicians need reliable models in<br />
predicting blastocyst development. In this study, we apply Bayesian networks in order to investigate cause-effect relationships<br />
of the variables of interest in embryo growth process and to predict blastocyst development. We have analyzed 7745<br />
embryo records including embryo morphological characteristics and patient-related data. Experimental results revealed<br />
that Bayesian networks can predict blastocyst development with a 63.5% true positive rate and a 33.8% false positive rate.<br />
17:00-17:20, Paper WeCT2.5<br />
Spectral Invariant Representation for Spectral Reflectance Image<br />
Ibrahim, Abdelhameed, Chiba Univ.<br />
Tominaga, Shoji, Chiba Univ.<br />
Horiuchi, Takahiko, Chiba Univ.<br />
Although spectral images contain a large amount of information compared with color images, image acquisition is affected<br />
by several factors such as shading and specular highlights. Many researchers have introduced color invariant and<br />
spectral invariant representations for these factors using the standard dichromatic reflection model of inhomogeneous dielectric<br />
materials. However, these representations are inadequate for other materials like metal. This paper proposes a<br />
more general spectral invariant representation for obtaining reliable spectral reflectance images. Our invariant representation<br />
is derived from the standard dichromatic reflection model for dielectric materials and the extended dichromatic reflection<br />
model for metals. We prove that the invariant formulas for spectral images of most natural objects preserve spectral<br />
information and are invariant to highlights, shading, surface geometry, and illumination intensity. The method is applied<br />
to the problem of material classification and image segmentation of a raw circuit board. Experiments are done with real<br />
spectral images to examine the performance of the proposed method.<br />
WeCT3 Dolmabahçe Hall A<br />
Active Contours and Related Methods Regular Session<br />
Session chair: Burkhardt, Hans (Univ. of Freiburg)<br />
15:40-16:00, Paper WeCT3.1<br />
Level Set based Segmentation using Local Feature Distribution<br />
Xie, Xianghua, Swansea Univ.<br />
We propose a level set based framework to segment textured images. The snake deforms in the image domain in searching<br />
for object boundaries by minimizing an energy functional, which is defined based on dynamically selected local distribution<br />
of orientation-invariant features. We also explore the use of user initialization to simplify the segmentation and improve accuracy.<br />
Experimental results on both synthetic and real data show significant improvements compared to direct modeling of filtering<br />
responses or piecewise constant modeling.<br />
16:00-16:20, Paper WeCT3.2<br />
Mean Shift Gradient Vector Flow: A Robust External Force Field for 3D Active Surfaces<br />
Keuper, Margret, Univ. of Freiburg<br />
Padeken, Jan, Max-Planck-Inst. of Immunobiology<br />
Heun, Patrick, Max-Planck-Inst. of Immunobiology<br />
Burkhardt, Hans, Univ. of Freiburg<br />
Ronneberger, Olaf, Univ. of Freiburg<br />
Gradient vector flow snakes are a very common method in bio-medical image segmentation. The use of gradient vector flow<br />
herein brings some major advantages like a large capture range and a good adaption of the snakes in concave regions. In<br />
some cases though, the application of gradient vector flow can also have undesired effects, e.g. if only parts of an image are<br />
strongly blurred, the remaining weak gradients will be smoothed away. Also, large gradients resulting from small but bright<br />
image structures usually have strong impact on the overall result. To tackle this problem, we present an improvement of the<br />
gradient vector flow, using the mean shift procedure and show its advantages on the segmentation of 3D cell nuclei.<br />
16:20-16:40, Paper WeCT3.3<br />
Adaptive Diffusion Flow for Parametric Active Contours<br />
Wu, Yuwei, Beijing Inst. of Tech.<br />
Wang, Yuanquan, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
This paper proposes a novel external force for active contours, called adaptive diffusion flow (ADF). We reconsider the generative<br />
mechanism of gradient vector flow (GVF) diffusion process from the perspective of image restoration, and exploit a<br />
harmonic hypersurface minimal functional to substitute for the smoothness energy term of GVF, alleviating the possible leakage<br />
problem. Meanwhile, a Laplacian functional is incorporated in the ADF framework to ensure that the vector flow diffuses<br />
mainly along the normal direction in homogeneous regions of an image. Experiments on synthetic and real images demonstrate<br />
the good properties of the ADF snake, including noise robustness, weak edge preserving, and concavity convergence.<br />
16:40-17:00, Paper WeCT3.4<br />
Using Snakes with Asymmetric Energy Terms for the Detection of Varying-Contrast Edges in SAR Images<br />
Seppke, Benjamin, Univ. of Hamburg<br />
Dreschler-Fischer, Leonie, Univ. of Hamburg<br />
Hübbe, Nathanael, Univ. of Hamburg<br />
Active contour methods, like snakes, have become a basic tool in computer vision and image analysis in recent years.<br />
They have proven to be adequate for the task of finding boundary features like broken edges in an image. However, when<br />
applying the basic snake technique to synthetic aperture radar (SAR) remote sensing images, the detection of varying-contrast<br />
edges may not be satisfactory. This is caused by the special imaging technique of SAR and the well-known speckle noise.<br />
In this paper we propose the use of asymmetric external energy terms to cope with this problem. We show first results of the<br />
method for the detection of edges of tidal creeks using an ENVISAT ASAR image. These creeks can be found in the World<br />
Heritage Site Wadden Sea located at the German Bight (North Sea).<br />
17:00-17:20, Paper WeCT3.5<br />
Length Increasing Active Contour for the Segmentation of Small Blood Vessels<br />
Rivest-Hénault, David, École de Tech. Supérieure<br />
Deschênes, Sylvain, Sainte-Justine Hospital<br />
Lapierre, Chantale, Hospital Sainte-Justine<br />
Cheriet, Mohammed, École de Tech. Supérieure<br />
A new level-set based active contour method for the segmentation of small blood vessels and other elongated structures<br />
is presented. Its main particularity is the presence of a length increasing force in the contour driving equation. The effect<br />
of this force is to push the active contour in the direction of thin elongated shapes. Although the proposed force is not<br />
stable in general, our experiments show that with a few precautions it can successfully be integrated in a practical segmentation<br />
scheme and that it helps to segment a longer part of the structures of interest. For the segmentation of blood vessels,<br />
this may reduce the amount of user interaction needed: only a small region inside the structure of interest needs to be<br />
specified.<br />
WeCT4 Anadolu Auditorium<br />
Graphical Models and Bayesian Methods Regular Session<br />
Session chair: Murino, Vittorio (Univ. of Verona)<br />
15:40-16:00, Paper WeCT4.1<br />
Using Sequential Context for Image Analysis<br />
Paiva, Antonio, Univ. of Utah<br />
Jurrus, Elizabeth, Univ. of Utah<br />
Tasdizen, Tolga, Univ. of Utah<br />
This paper proposes the sequential context inference (SCI) algorithm for Markov random field (MRF) image analysis.<br />
This algorithm is designed primarily for fast inference on an MRF model, but its application also requires a specific modeling<br />
architecture. The architecture is composed of a sequence of stages, each modeling the conditional probability of the<br />
labels, conditioned on a neighborhood of the input image and output of the previous stage. By learning the model at each<br />
stage sequentially with regard to the true output labels, the stages learn different models which can cope with errors in<br />
the previous stage.<br />
16:00-16:20, Paper WeCT4.2<br />
Recovery Video Stabilization using MRF-MAP Optimization<br />
Kim, Soo Wan, Seoul National Univ.<br />
Yi, Kwang Moo, Automation and System Res. Inst. Univ.<br />
Oh, Songhwai, Seoul National Univ.<br />
Choi, Jin Young, Seoul National University<br />
In this paper, we propose a novel approach for video stabilization using Markov random field (MRF) modeling and maximum<br />
a posteriori (MAP) optimization. We build an MRF model describing a sequence of unstable images and find joint<br />
pixel matchings over all image sequences with MAP optimization via Gibbs sampling. The resulting displacements of<br />
matched pixels in consecutive frames indicate the camera motion between frames and can be used to remove the camera<br />
motion to stabilize image sequences. The proposed method shows robust performance even when a scene has moving<br />
foreground objects and brings more accurate stabilization results. The performance of our algorithm is evaluated on outdoor<br />
scenes.<br />
16:20-16:40, Paper WeCT4.3<br />
Annealed SMC Samplers for Dirichlet Process Mixture Models<br />
Ülker, Yener, Istanbul Tech. Univ.<br />
Gunsel, Bilge, Istanbul Tech. Univ.<br />
Cemgil, Ali Taylan, Bogazici Univ.<br />
In this work we propose a novel algorithm that approximates sequentially the Dirichlet Process Mixtures (DPM) model<br />
posterior. The proposed method takes advantage of the Sequential Monte Carlo (SMC) samplers framework to design an<br />
effective annealing procedure that prevents the algorithm from getting trapped in a local mode. We evaluate the performance in<br />
a Bayesian density estimation problem with unknown number of components. The simulation results suggest that the proposed<br />
algorithm represents the target posterior much more accurately and provides significantly smaller Monte Carlo error<br />
when compared to particle filtering.<br />
16:40-17:00, Paper WeCT4.4<br />
Bayesian Inference for Nonnegative Matrix Factor Deconvolution Models<br />
Kirbiz, Serap, Istanbul Tech. Univ.<br />
Cemgil, Ali Taylan, Bogazici Univ.<br />
Gunsel, Bilge, Istanbul Tech. Univ.<br />
In this paper we develop a probabilistic interpretation and a full Bayesian inference for the nonnegative matrix factor deconvolution<br />
(NMFD) model. Our ultimate goal is unsupervised extraction of multiple sound objects from a single channel auditory<br />
scene. The proposed method facilitates automatic model selection and determination of the sparsity criteria. Our approach<br />
retains attractive features of standard NMFD based methods such as fast convergence and easy implementation. We demonstrate<br />
the use of this algorithm in the log-frequency magnitude spectrum domain, where we employ it to perform model<br />
order selection and control sparseness directly.<br />
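NMFD reduces to plain NMF when the convolutive extent is a single frame; as background, the classic multiplicative updates for the KL-divergence objective are sketched below. The paper itself performs Bayesian inference rather than these point-estimate updates, so this is only the underlying building block; names and defaults are illustrative.<br />

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative factorisation V ~ W @ H via multiplicative updates
    minimising KL divergence (Lee-Seung style) -- the single-frame
    special case of the NMFD model discussed above."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # update H: H <- H * (W^T (V/WH)) / (column sums of W)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = W @ H + eps
        # update W: W <- W * ((V/WH) H^T) / (row sums of H)
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H
```

In the audio setting described above, V would be a magnitude spectrogram, W the spectral templates and H their activations over time.<br />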
17:00-17:20, Paper WeCT4.5<br />
A Graph Matching Algorithm using Data-Driven Markov Chain Monte Carlo Sampling<br />
Lee, Jungmin, Seoul National Univ.<br />
Cho, Minsu, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
We propose a novel stochastic graph matching algorithm based on data-driven Markov Chain Monte Carlo (DDMCMC)<br />
sampling. The algorithm explores the solution space efficiently and avoids local minima by taking advantage of<br />
the spectral properties of the given graphs in data-driven proposals. Thus, it makes graph matching robust to deformation<br />
and outliers arising in practical correspondence problems. Our comparative experiments using synthetic<br />
and real data demonstrate that the algorithm outperforms the state-of-the-art graph matching algorithms.<br />
WeCT5 Topkapı Hall B<br />
Image Processing Applications Regular Session<br />
Session chair: Zafeiriou, Stefanos (Imperial College of London)<br />
15:40-16:00, Paper WeCT5.1<br />
Tensor-Driven Hyperspectral Denoising: A Strong Link for Classification Chains?<br />
Martín-Herrero, Julio, Univ. de Vigo<br />
Ferreiro-Armán, Marcos, Univ. de Vigo<br />
We show how a tensor-driven anisotropic diffusion denoising method affects the performance of a classifier trained to<br />
discriminate among vine varieties in noisy hyperspectral images. We compare the classification statistics on the original<br />
and denoised images and discuss the convenience of this kind of preprocessing for classification in hyperspectral images.<br />
16:00-16:20, Paper WeCT5.2<br />
Search Strategies for Image Multi-Distortion Estimation<br />
Caron, Andre Louis, Univ. of Sherbrooke<br />
Jodoin, Pierre-Marc, Univ. of Sherbrooke<br />
Charrier, Christophe, Univ. de Caen<br />
In this paper, we present a method for estimating the amount of Gaussian noise and Gaussian blur in a distorted image.<br />
Our method is based on the MS-SSIM framework which, although designed to measure image quality, is used to estimate<br />
the amount of blur and noise in a degraded image given a reference image. Various search strategies such as Newton, Simplex,<br />
and brute force search are presented and rigorously compared. Based on quantitative results, we show that the amount<br />
of blur and noise in a distorted image can be recovered with an accuracy up to 0.95% and 5.40%, respectively. To our<br />
knowledge, such precision has never been achieved before.<br />
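The brute-force variant of the search described above can be sketched as follows: re-degrade the reference with each candidate blur level and keep the best match. Plain MSE stands in for the MS-SSIM criterion of the paper, and only blur (not noise) is searched, purely to keep the sketch short; all names are hypothetical.<br />

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (pure NumPy)."""
    if sigma <= 0:
        return img.astype(float)
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img.astype(float), r, mode="reflect")
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 0, tmp)

def estimate_blur(reference, distorted, sigmas):
    """Brute-force search over candidate blur levels: re-degrade the
    reference with each sigma and keep the one whose result is closest
    to the distorted image."""
    errors = [np.mean((gaussian_blur(reference, s) - distorted) ** 2)
              for s in sigmas]
    return sigmas[int(np.argmin(errors))]
```

Newton or Simplex search, as compared in the paper, would replace the exhaustive loop with an iterative minimiser over the same objective.<br />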
16:20-16:40, Paper WeCT5.3<br />
Development of a High-Definition and Multispectral Image Capturing System for Digital Archiving of Early Modern<br />
Tapestries of the Kyoto Gion Festival<br />
Tsuchida, Masaru, NTT Corp.<br />
Tanaka, Hiromi, Ritsumei Univ.<br />
Yano, Keiji, Ritsumeikan Univ.<br />
We developed a two-shot 6-band image capturing system consisting of a large-format camera, a customized interference<br />
filter, and a scanning digital back to capture 185-M-pixel images. The interference filter is set in front of the camera lens<br />
to obtain a 6-band image, that is, two 3-band images, one taken with the filter and the other without it. After correction of<br />
optical aberrations caused by the interference filter as well as system arrangement errors, the two images are combined<br />
into a 6-band image. The 6-band image was converted into a color-managed RGB image with an embedded ICC profile. In experiments,<br />
object images were captured as several divided parts and synthesized into an almost 500-M-pixel image by using an<br />
image stitching technique. Resolution of the captured images is 0.02 mm/pixel. This paper discusses the camera system<br />
with its focus on some early modern tapestries used in the Kyoto Gion Festival. After the experiments, we interviewed a<br />
craftsman to assess the image’s importance in archiving and analyzing fabric structures.<br />
16:40-17:00, Paper WeCT5.4<br />
Appearance Control using Projection with Model Predictive Control<br />
Amano, Toshiyuki, Nara Inst. of Science and Tech.<br />
Kato, Hirokazu, Nara Inst. of Science and Tech.<br />
This paper proposes a unified technique for irradiance correction and appearance enhancement of a real scene. The<br />
proposed method employs a model predictive control (MPC) algorithm for a projector-camera system and enables arbitrary appearance<br />
control of the real world, much like photo-retouching software. In the experiments, appearance control including saturation enhancement,<br />
color removal, phase control, edge enhancement, image blur, selective brightening, and other enhancements<br />
of the real scene is demonstrated.<br />
17:00-17:20, Paper WeCT5.5<br />
Decision Trees for Fast Thinning Algorithms<br />
Grana, Costantino, Univ. degli Studi di Modena e Reggio Emilia<br />
Borghesani, Daniele, Univ. degli Studi di Modena e Reggio Emilia<br />
Cucchiara, Rita, Univ. degli Studi di Modena e Reggio Emilia<br />
We propose a new efficient approach for neighborhood exploration, optimized with decision tables and decision trees,<br />
suitable for local algorithms in image processing. In this work, it is employed to speed up two widely used thinning techniques.<br />
The performance gain is shown over a large freely available dataset of scanned document images.<br />
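The decision-table idea can be sketched as follows (a generic illustration built on the classic Zhang-Suen deletion test, not the authors' exact tables): the test over the 8-neighbourhood is precomputed once for all 256 neighbourhood codes, so the per-pixel work at run time reduces to a table lookup.

```python
import numpy as np

def zhang_suen_deletable(nb, step):
    # nb: the 8 neighbours P2..P9, clockwise from north. Returns True when the
    # classic Zhang-Suen conditions mark the centre pixel for deletion.
    b = sum(nb)                                    # number of object neighbours
    a = sum(nb[i] == 0 and nb[(i + 1) % 8] == 1    # 0->1 transitions around the ring
            for i in range(8))
    if not (2 <= b <= 6 and a == 1):
        return False
    if step == 0:
        return nb[0] * nb[2] * nb[4] == 0 and nb[2] * nb[4] * nb[6] == 0
    return nb[0] * nb[2] * nb[6] == 0 and nb[0] * nb[4] * nb[6] == 0

# Decision tables: one boolean per 8-bit neighbourhood code, per sub-iteration.
tables = [np.array([zhang_suen_deletable([(code >> i) & 1 for i in range(8)], step)
                    for code in range(256)], dtype=bool)
          for step in (0, 1)]

# At run time a pixel's neighbourhood is packed into one byte and looked up:
code = 0b00000011            # only the two northern-most neighbours are set
deletable = tables[0][code]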
WeCT6 Dolmabahçe Hall B<br />
Iris Regular Session<br />
Session chair: Kittler, Josef (Univ. of Surrey)<br />
15:40-16:00, Paper WeCT6.1<br />
Personal Identification from Iris Images using Localized Radon Transform<br />
Zhou, Yingbo, The Hong Kong Pol. Univ.<br />
Kumar, Ajay, The Hong Kong Pol. Univ.<br />
Personal identification using iris images has attracted a great deal of attention in the literature and offers high accuracy. However,<br />
the computational complexity in the feature extraction from the normalized iris images is still of key concern and further<br />
efforts are required to develop efficient feature extraction approaches. In this paper, we investigate a new approach for the<br />
efficient and effective extraction of iris features using localized Radon transforms. The feature extraction process exploits<br />
the orientation information from the local iris texture features using finite Radon transform. The dominant orientation<br />
from these Radon transform features is used to generate a binarized/compact feature representation. The similarity between<br />
two feature vectors is computed from the minimum matching distance that can account for the variations resulting from<br />
translation and rotation of the images. The feasibility of this approach is rigorously evaluated on two publicly available<br />
iris image databases, i.e., the IITD iris image database v1 and the CASIA v3 iris image database. We also investigate multi-scale<br />
analysis of iris images to enhance the performance. The experimental results presented in this paper are highly promising<br />
and suggest a computationally attractive alternative for online iris identification.<br />
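The orientation-coding step can be illustrated with a toy sketch (our own simplification: four discrete projections stand in for the localized/finite Radon transform, and the patch is synthetic). Each direction's projection is scored by its variance, and the most energetic one is kept as the dominant orientation:

```python
import numpy as np

def dominant_orientation(patch):
    # Project the patch along a few discrete directions (a crude finite-Radon
    # stand-in) and return the index of the direction with the strongest response.
    projections = [
        patch.sum(axis=0),   # responds to vertical structure
        patch.sum(axis=1),   # responds to horizontal structure
        np.array([np.trace(patch, k)
                  for k in range(-patch.shape[0] + 1, patch.shape[1])]),        # diagonal
        np.array([np.trace(patch[::-1], k)
                  for k in range(-patch.shape[0] + 1, patch.shape[1])]),        # anti-diagonal
    ]
    return int(np.argmax([p.var() for p in projections]))

patch = np.zeros((8, 8))
patch[:, 3] = 1.0            # a vertical stripe in the iris texture
code = dominant_orientation(patch)
```

Coding every patch of the normalised iris this way yields the compact binarised representation the abstract describes.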
16:00-16:20, Paper WeCT6.2<br />
Segmentation of Unideal Iris Images using Game Theory<br />
Roy, Kaushik, Concordia Univ.<br />
Bhattacharya, Prabir, Concordia Univ.<br />
Suen, Ching Y.<br />
Robust localization of inner/outer boundary from an iris image plays an important role in iris recognition. However, the<br />
conventional iris/pupil localization methods using the region-based segmentation or the gradient-based boundary finding<br />
are often hampered by non-linear deformations, pupil dilations, head rotations, motion blurs, reflections, non-uniform intensities,<br />
low image contrast, camera angles and diffusions, and presence of eyelids and eyelashes. The novelty of this research<br />
effort is that we apply a parallel game-theoretic decision-making procedure using the modified Chakraborty<br />
and Duncan’s algorithm, which integrates the region-based segmentation and gradient-based boundary finding methods<br />
and fuses the complementary strengths of each of these individual methods. This integrated scheme forms a unified approach,<br />
which is robust to noise and poor localization.<br />
16:20-16:40, Paper WeCT6.3<br />
Iris-Biometric Hash Generation for Biometric Database Indexing<br />
Rathgeb, Christian, Univ. of Salzburg<br />
Uhl, Andreas, Univ. of Salzburg<br />
Performing identification on large-scale biometric databases requires an exhaustive linear search. Since biometric data<br />
does not have any natural sorting order, indexing databases, in order to minimize the response time of the system, represents<br />
a great challenge. In this work we propose a biometric hash generation technique for the purpose of biometric database<br />
indexing, applied to iris biometrics. Experimental results demonstrate that the presented approach substantially accelerates biometric<br />
identification.<br />
16:40-17:00, Paper WeCT6.4<br />
A Robust Iris Localization Method using an Active Contour Model and Hough Transform<br />
Koh, Jaehan, SUNY Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Chaudhary, Vipin, SUNY Buffalo<br />
Iris segmentation is one of the crucial steps in building an iris recognition system since it affects the accuracy of the iris<br />
matching significantly. This segmentation should accurately extract the iris region despite the presence of noise sources such as<br />
varying pupil sizes, shadows, specular reflections and highlights. Considering these obstacles, several attempts have been<br />
made in robust iris localization and segmentation. In this paper, we propose a robust iris localization method that uses an<br />
active contour model and a circular Hough transform. Experimental results on 100 images from CASIA iris image database<br />
show that our method achieves 99% accuracy and is about 2.5 times faster than Daugman’s method in locating the pupillary<br />
and the limbic boundaries.<br />
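The circular Hough voting at the core of such localization can be sketched as follows (a generic toy implementation on a synthetic edge map; grid sizes and angle sampling are our choices): each edge pixel votes for every centre that would place it on a circle of a candidate radius, and the accumulator peak gives the boundary.

```python
import numpy as np

def circular_hough(edges, radii):
    # Each edge pixel votes for every centre that would put it on a circle of
    # radius r; the accumulator peak gives (radius, centre_y, centre_x).
    h, w = edges.shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
    for ri, r in enumerate(radii):
        cy = np.round(ys[:, None] - r * np.sin(thetas)).astype(int)
        cx = np.round(xs[:, None] - r * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc[ri], (cy[ok], cx[ok]), 1)   # unbuffered accumulation
    ri, by, bx = np.unravel_index(int(np.argmax(acc)), acc.shape)
    return radii[ri], by, bx

# Synthetic "pupil" boundary: a circle of radius 10 centred at (32, 32).
edges = np.zeros((64, 64), dtype=bool)
tt = np.linspace(0.0, 2.0 * np.pi, 200)
edges[np.round(32 + 10 * np.sin(tt)).astype(int),
      np.round(32 + 10 * np.cos(tt)).astype(int)] = True
r, cy, cx = circular_hough(edges, radii=[8, 10, 12])
```

In the paper's pipeline, the active contour then refines this coarse circle against the true, non-circular boundary.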
17:00-17:20, Paper WeCT6.5<br />
ISIS: Iris Segmentation for Identification Systems<br />
Nappi, Michele, Univ. of Salerno<br />
Riccio, Daniel, Univ. of Salerno<br />
De Marsico, Maria, Sapienza Univ. of Rome<br />
Advances in processing procedures make the iris a realistic candidate for the role of the biometric of the future. Precise detection<br />
and segmentation for such a biometric are a crucial ongoing research area. We propose an iris segmentation technique and<br />
show that it is more reliable than existing ones.<br />
WeCT7 Dolmabahçe Hall C<br />
Handwriting Recognition Regular Session<br />
Session chair: Doermann, David (Univ. of Maryland)<br />
15:40-16:00, Paper WeCT7.1<br />
Consensus Network based Hypotheses Combination for Arabic Offline Handwriting Recognition<br />
Prasad, Rohit, Raytheon BBN Tech.<br />
Kamali, Matin, BBN Tech.<br />
Belanger, David, Raytheon BBN Tech.<br />
Rosti, Antti-Veikko, Raytheon BBN Tech.<br />
Matsoukas, Spyros, Raytheon BBN Tech.<br />
Natarajan, P., BBN Tech.<br />
Offline handwriting recognition (OHR) is an extremely challenging task because of many factors including variations in<br />
writing style, writing device and material, and noise in the scanning and collection process. Due to the diverse nature of<br />
the above challenges, it is highly unlikely that a single recognition technique can address all the characteristics of real-world<br />
handwritten documents. Therefore, one must consider designing different systems, each addressing specific challenges<br />
in the handwritten corpus, and then combining the hypotheses from these diverse systems. To that end, we present<br />
an innovative approach for combining hypotheses from multiple handwriting recognition systems. Our approach is based<br />
on generating a consensus network using hypotheses from a diverse set of handwriting recognition systems. Next, we decode<br />
the consensus network for producing the best possible hypothesis given an error criterion. Experimental results on<br />
an Arabic OHR task show that our combination algorithm outperforms the NIST ROVER technique and results in a 7%<br />
relative reduction in the word error rate over the single best OHR system.<br />
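The final decoding step over a consensus network can be illustrated with a toy majority vote (hypothetical word hypotheses; a real system weights each slot's words by posterior probability and builds the network by aligning recognition lattices):

```python
from collections import Counter

def consensus_decode(aligned):
    # aligned: one word sequence per system, already aligned slot-by-slot;
    # "" marks a deletion (epsilon) in that system's hypothesis.
    out = []
    for slot in zip(*aligned):
        word, _ = Counter(slot).most_common(1)[0]
        if word:                      # a majority epsilon deletes the slot
            out.append(word)
    return out

hyps = [["the", "cat", "",    "sat"],
        ["the", "cap", "has", "sat"],
        ["a",   "cat", "",    "sat"]]
result = consensus_decode(hyps)
```

Each slot is decided independently, which is what lets the combined hypothesis mix words from different systems.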
16:00-16:20, Paper WeCT7.2<br />
A Novel Lexicon Reduction Method for Arabic Handwriting Recognition<br />
Wshah, Safwan, SUNY Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Li, Huiping, Applied Media Analysis Inc.<br />
Cheng, Yanfen, Wuhan Univ. of Tech.<br />
In this paper, we present a method for lexicon size reduction which can be used as an important pre-processing step for offline<br />
Arabic word recognition. The method involves extraction of dot descriptors and PAWs (Pieces of Arabic Words).<br />
Then the number and position of dots and the number of the PAWs are used to eliminate unlikely candidates. The extraction<br />
of the dot descriptors is based on defined rules followed by a convolutional neural network for verification. The reduction<br />
algorithm makes use of the combination of two features with a dynamic matching scheme. On the IFN/ENIT database of<br />
26459 Arabic handwritten word images we achieved a reduction rate of 87% with accuracy above 93%.<br />
16:20-16:40, Paper WeCT7.3<br />
A Novel Verification System for Handwritten Words Recognition<br />
Guichard, Laurent, IRISA - INRIA<br />
Toselli, Alejandro Héctor, Univ. Pol. de Valencia<br />
Couasnon, Bertrand, Irisa / Insa<br />
In the field of isolated handwritten word recognition, the development of highly effective verification systems to reject<br />
words presenting ambiguities is still an active research topic. In this paper, a novel verification system based on support<br />
vector machine scoring and multiple reject class-dependent thresholds is presented. In essence, a set of support vector machines<br />
appended to a standard HMM-based recognition system provides class-dependent confidence measures employed<br />
by the verification mechanism to accept or reject the recognized hypotheses. Experimental results on the RIMES database<br />
show that this approach outperforms other state-of-the-art approaches.<br />
16:40-17:00, Paper WeCT7.4<br />
Multi-Template GAT/PAT Correlation for Character Recognition with a Limited Quantity of Data<br />
Wakahara, Toru, Hosei Univ.<br />
Yamashita, Yukihiko, Tokyo Inst. of Tech.<br />
This paper addresses the problem of improving the accuracy of character recognition with a limited quantity of data. The<br />
key ideas are twofold. One is distortion-tolerant template matching via hierarchical global/partial affine transformation<br />
(GAT/PAT) correlation to absorb both linear and nonlinear distortions in a parametric manner. The other is use of multiple<br />
templates per category obtained by k-means clustering in a gradient feature space for dealing with topological distortion.<br />
Recognition experiments using the handwritten numerical database IPTP CDROM1B show that the proposed method<br />
achieves a recognition rate of 97.9%, much higher than the 85.8% obtained by conventional simple correlation<br />
matching with a single template per category. Furthermore, comparative experiments show that k-NN classification<br />
using the tangent distance and the GAT correlation technique achieves recognition rates of 97.5% and 98.7%, respectively.<br />
17:00-17:20, Paper WeCT7.5<br />
Structure Adaptation of HMM Applied to OCR<br />
Ait Mohand, Kamel, Univ. of Rouen<br />
Paquet, Thierry, Univ. of Rouen<br />
Ragot, Nicolas, Univ. François Rabelais Tours<br />
Heutte, Laurent, Univ. of Rouen<br />
In this paper we present a new algorithm for the adaptation of Hidden Markov Models (HMMs). The principle of<br />
our iterative adaptive algorithm is to alternate an HMM structure adaptation stage with an HMM Gaussian MAP adaptation<br />
stage for the parameters. This algorithm is applied to the recognition of printed characters to adapt the character models of<br />
a polyfont general-purpose character recognizer to new fonts, never seen during training. A comparison of<br />
the results with those of the classical MAP adaptation scheme shows a slight increase in recognition performance.<br />
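The Gaussian MAP stage follows the standard conjugate-prior mean update; a minimal sketch (the relevance factor and data are illustrative assumptions, not values from the paper):

```python
import numpy as np

def map_adapt_mean(mu0, data, tau=10.0):
    # MAP update of a Gaussian mean: a blend of the prior mean mu0 and the
    # adaptation data, weighted by the relevance factor tau.
    n = len(data)
    return (tau * mu0 + data.sum(axis=0)) / (tau + n)

mu0 = np.array([0.0, 0.0])              # mean from the polyfont model
data = np.tile([2.0, 4.0], (30, 1))     # 30 feature vectors from the new font
mu_adapted = map_adapt_mean(mu0, data, tau=10.0)
```

With few samples the prior dominates; as adaptation data grows, the estimate shifts toward the new font's statistics.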
WeBCT8 Upper Foyer<br />
SVM, NN, Kernel and Learning; Object Detection and Recognition Poster Session<br />
Session chair: Ross, Arun (West Virginia Univ.)<br />
13:30-16:30, Paper WeBCT8.1<br />
Multi-Class Pattern Classification in Imbalanced Data<br />
Ghanem, Amal Saleh, Univ. of Bahrain<br />
Venkatesh, Svetha, Curtin Univ. of Tech.<br />
West, Geoff, Curtin Univ. of Tech.<br />
The majority of multi-class pattern classification techniques are proposed for learning from balanced datasets. However,<br />
in several real-world domains, the datasets have imbalanced data distribution, where some classes of data may have few<br />
training examples compared to other classes. In this paper we present our research in learning from imbalanced multi-class<br />
data and propose a new approach, named Multi-IM, to deal with this problem. Multi-IM derives its fundamentals<br />
from the probabilistic relational technique (PRMs-IM), designed for learning from imbalanced relational data for the two-class<br />
problem. Multi-IM extends PRMs-IM to a generalized framework for multi-class imbalanced learning for both relational<br />
and non-relational domains.<br />
13:30-16:30, Paper WeBCT8.2<br />
Deep Quantum Networks for Classification<br />
Zhou, Shusen, Harbin Inst. of Tech.<br />
Chen, Qingcai, Harbin Inst. of Tech.<br />
Wang, Xiaolong, Harbin Inst. of Tech.<br />
This paper introduces a new type of deep learning method named Deep Quantum Network (DQN) for classification. DQN<br />
inherits the capability of modeling the structure of a feature space with fuzzy sets. First, we propose the architecture of<br />
DQN, which consists of quantum neurons and sigmoid neurons and can guide the embedding of samples so that they become<br />
separable in a new Euclidean space. The parameters of DQN are initialized through greedy layer-wise unsupervised learning. Then, the parameter<br />
space of the deep architecture and the quantum representation are refined by supervised learning based on a global gradient-descent<br />
procedure. An exponential loss function is introduced in this paper to guide the supervised learning procedure.<br />
Experiments conducted on standard datasets show that DQN outperforms other feed-forward neural networks and neuro-fuzzy<br />
classifiers.<br />
13:30-16:30, Paper WeBCT8.3<br />
Nonlinear Combination of Multiple Kernels for Support Vector Machines<br />
Li, Jinbo, East China Normal Univ.<br />
Sun, Shiliang, East China Normal Univ.<br />
Support vector machines (SVMs) are effective kernel methods to solve pattern recognition problems. Traditionally, they<br />
adopt a single kernel chosen beforehand, which makes them lack flexibility. The recent multiple kernel learning (MKL)<br />
overcomes this issue by optimizing over a linear combination of kernels. Despite its success, MKL neglects useful information<br />
generated from the nonlinear interaction of different kernels. In this paper, we propose SVMs based on the nonlinear<br />
combination of multiple kernels (NCMK), which surmounts this drawback of MKL through its potential to exploit<br />
more information. We show that our method can be formulated as a semi-definite programming (SDP) problem and then solved<br />
by interior-point algorithms. Empirical studies on several data sets indicate that the presented approach is very effective.<br />
13:30-16:30, Paper WeBCT8.4<br />
Data Transformation of the Histogram Feature in Object Detection<br />
Zhang, Rongguo, Chinese Acad. of Sciences<br />
Xiao, Baihua, Chinese Acad. of Sciences<br />
Wang, Chunheng, Chinese Acad. of Sciences<br />
Detecting objects in images is very important for several application domains in computer vision. This paper presents an<br />
experimental study on data transformation of the feature vector in object detection. We use the modified Pyramid of Histograms<br />
of Orientation Gradients descriptor and the SVM classifier to form an object detection model. We apply a simple<br />
transformation to the histogram features before training and testing. This transformation equals a small change in the<br />
kernel function for Support Vector Machines. Applying the transformation is much quicker than changing the kernel, yet obtains better results. Experimental<br />
evaluations on the UIUC Image Database and TU Darmstadt Database show that the transformed features perform<br />
better than the raw features, and this transformation improves the linear separability of the histogram feature.<br />
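One transformation with exactly this kernel-equivalence property (our example; the abstract does not state the paper's formula) is the Hellinger map: taking the element-wise square root of an L1-normalised histogram makes a plain linear kernel equal to the Bhattacharyya coefficient of the original histograms.

```python
import numpy as np

def hellinger_map(h):
    # Element-wise square root of an L1-normalised histogram.
    h = h / h.sum()
    return np.sqrt(h)

a = np.array([4.0, 1.0, 3.0])
b = np.array([2.0, 2.0, 4.0])

# Linear kernel on the transformed features ...
linear_on_mapped = hellinger_map(a) @ hellinger_map(b)
# ... equals the Bhattacharyya coefficient on the raw histograms.
bhattacharyya = np.sum(np.sqrt((a / a.sum()) * (b / b.sum())))
```

This is why transforming the features once is far cheaper than evaluating a non-linear kernel at every SVM dot product.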
13:30-16:30, Paper WeBCT8.5<br />
A New Learning Formulation for Kernel Classifier Design<br />
Sato, Atsushi, NEC<br />
This paper presents a new learning formulation for classifier design called “General Loss Minimization”. The formulation<br />
is based on Bayes decision theory, which can handle various losses as well as prior probabilities. A learning method for<br />
RBF kernel classifiers is derived from the formulation. Experimental results reveal that the classification accuracy of<br />
the proposed method is almost the same as or better than that of the Support Vector Machine (SVM), while the number of<br />
reference vectors obtained by the proposed method is much smaller than the number of support vectors of the SVM.<br />
13:30-16:30, Paper WeBCT8.6<br />
Variable Selection for Five-Minute Ahead Electricity Load Forecasting<br />
Koprinska, Irena, Univ. of Sydney<br />
Sood, Rohen, Univ. of Sydney<br />
Agelidis, Vassilios, Univ. of Sydney<br />
We use autocorrelation analysis to extract 6 nested feature sets of previous electricity loads for 5-minute ahead electricity<br />
load forecasting. We evaluate their predictive power using Australian electricity data. Our results show that the most important<br />
variables for accurate prediction are previous loads from the forecast day and from 1, 2 and 7 days ago. By also using load<br />
variables from 3 and 6 days ago, we achieved small further improvements. The 3 bigger feature sets (37-51 features), when<br />
used with linear regression and support vector regression algorithms, were more accurate than the benchmarks. The overall<br />
best prediction model in terms of accuracy and training time was linear regression using the set of 51 features.<br />
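A minimal sketch of such a lag-based linear model (synthetic data; the lag set and series are our assumptions, not the Australian data): a design matrix of past loads at the selected lags is built and fitted by ordinary least squares.

```python
import numpy as np

# Toy periodic "load": a daily cycle plus noise stands in for 5-minute load data.
rng = np.random.default_rng(1)
period = 288                                    # 5-minute samples per day
t = np.arange(period * 10)
load = 100 + 20 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, t.size)

def lagged_matrix(series, lags):
    # Design matrix of past loads at the given lags, plus an intercept column.
    start = max(lags)
    X = np.column_stack([series[start - l: len(series) - l] for l in lags])
    return np.column_stack([np.ones(len(X)), X]), series[start:]

lags = [1, 2, period, 2 * period]               # recent samples + 1 and 2 days ago
X, y = lagged_matrix(load, lags)
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares fit
mae = np.abs(X @ w - y).mean()                  # in-sample mean absolute error
```

Autocorrelation analysis is what selects which lags enter `lags` in the first place.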
13:30-16:30, Paper WeBCT8.7<br />
Enhancing Web Page Classification via Local Co-Training<br />
Du, Youtian, Xi’an Jiaotong Univ.<br />
Guan, Xiaohong, Xi’an Jiaotong Univ., Tsinghua University<br />
Cai, Zhongmin, Xi’an Jiaotong Univ.<br />
In this paper we propose a new multi-view semi-supervised learning algorithm called Local Co-Training (LCT). The proposed<br />
algorithm employs a set of local models with vector outputs to model the relations among examples in a local region<br />
on each view, and iteratively refines the dominant local models (i.e. the local models related to the unlabeled examples<br />
chosen for enriching the training set) using unlabeled examples through the co-training process. Compared with previous co-training<br />
style algorithms, local co-training has two advantages: firstly, it has higher classification precision by introducing<br />
local learning; secondly, only the dominant local models need to be updated, which significantly decreases the computational<br />
load. Experiments on the WebKB and Cora datasets demonstrate that the LCT algorithm can effectively exploit unlabeled<br />
data to improve the performance of web page classification.<br />
13:30-16:30, Paper WeBCT8.8<br />
Robust Face Recognition using Multiple Self-Organized Gabor Features and Local Similarity Matching<br />
Aly, Saleh, Kyushu Univ.<br />
Shimada, Atsushi, Kyushu Univ.<br />
Tsuruta, Naoyuki, Fukuoka Univ.<br />
Taniguchi, Rin-Ichiro, Kyushu Univ.<br />
Gabor-based face representation has achieved enormous success in face recognition. However, one drawback of Gabor-based<br />
face representation is the huge amount of data that must be stored. Due to the nonlinear structure of the data obtained<br />
from Gabor response, classical linear projection methods like principal component analysis fail to learn the distribution<br />
of the data. A nonlinear projection method based on a set of self-organizing maps is employed to capture this nonlinearity<br />
and to represent faces in a new reduced feature space. The Multiple Self-Organized Gabor Features (MSOGF) algorithm<br />
is used to represent the input image using all winner indices from each SOM map. A new local matching algorithm based<br />
on the similarity between local features is also proposed to classify unlabeled data. Experimental results on the FERET database<br />
prove that the proposed method is robust to expression variations.<br />
13:30-16:30, Paper WeBCT8.9<br />
Exploring Pattern Selection Strategies for Fast Neural Network Training<br />
Vajda, Szilard, Tech. Univ. of Dortmund<br />
Fink, Gernot, TU Dortmund Univ.<br />
Nowadays, neural network strategies are a widely considered solution in pattern recognition. In this paper we<br />
propose three different strategies to select patterns more efficiently for fast learning in such a neural framework by<br />
reducing the number of available training patterns. All the strategies rely on the idea of dealing only with samples close to<br />
the decision boundaries of the classifiers. The effectiveness (accuracy, speed) of these methods is confirmed through different<br />
experiments on the MNIST handwritten digit data [1], Bangla handwritten numerals [2] and the Shuttle data from<br />
the UCI machine learning repository [3].<br />
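The boundary-sample idea can be sketched as follows (our own toy criterion, not one of the paper's three strategies: the distance difference to two class centroids serves as a cheap margin proxy, and only the most ambiguous samples are kept for training).

```python
import numpy as np

def boundary_subset(X, y, keep_frac=0.5):
    # Rank samples by how balanced their distances to the class centroids are
    # (a cheap margin proxy) and keep only the most ambiguous fraction.
    centroids = [X[y == c].mean(axis=0) for c in np.unique(y)]
    d = np.stack([np.linalg.norm(X - m, axis=1) for m in centroids], axis=1)
    margin = np.abs(d[:, 0] - d[:, 1])      # small margin = near the boundary
    keep = np.argsort(margin)[: int(len(X) * keep_frac)]
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
Xs, ys = boundary_subset(X, y, keep_frac=0.25)   # train the network on Xs only
```

Training only on this subset shrinks each epoch while preserving the samples that actually shape the decision boundary.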
13:30-16:30, Paper WeBCT8.10<br />
The Detection of Concept Frames using Clustering Multi-Instance Learning<br />
Tax, David, Delft Univ. of Tech.<br />
Hendriks, E. , Delft Univ. of Tech.<br />
Valstar, Michel, Imperial Coll.<br />
Pantic, M., Imperial Coll.<br />
The classification of sequences requires the combination of information from different time points. In this paper the detection<br />
of facial expressions is considered. Experiments on the detection of certain facial muscle activations in videos<br />
show that it is not always required to model the sequences fully, but that the presence of specific frames (the concept<br />
frames) can be sufficient for reliable detection of certain facial expression classes. For the detection of these concept<br />
frames a standard classifier is often sufficient, although a more advanced clustering approach performs better in some<br />
cases.<br />
13:30-16:30, Paper WeBCT8.11<br />
Kernel Domain Description with Incomplete Data: Using Instance-Specific Margins to Avoid Imputation<br />
Gripton, Adam, Heriot-Watt Univ.<br />
Lu, Weiping, Heriot-Watt Univ.<br />
We present a method of performing kernel space domain description of a dataset with incomplete entries without the need<br />
for imputation, allowing kernel features of a class of data with missing features to be rigorously described. This addresses<br />
the problem that completion of absent data is usually required before kernel classifiers, such as support vector domain description<br />
(SVDD), can be applied; equally, few existing techniques for incomplete data adequately address the issue of<br />
kernel spaces. Our method, which we call instance-specific domain description (ISDD), uses a parametrisation framework<br />
to compute minimal kernelised distances between data points with missing features through a series of optimisation runs,<br />
allowing evaluation of the kernel distance while avoiding subjective completions of missing data. We compare results of<br />
our method against those achieved by SVDD applied to an imputed dataset, using synthetic and experimental datasets<br />
where feature absence has a non-trivial structure. We show that our method can achieve tighter sphere bounds when applied<br />
to linear and quadratic kernels.<br />
13:30-16:30, Paper WeBCT8.12<br />
Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts<br />
Fausser, Stefan, Univ. of Ulm<br />
Schwenker, Friedhelm, Univ. of Ulm<br />
Having a large game-tree complexity and being EXPTIME-complete, English Draughts, recently weakly solved after<br />
almost two decades of effort, is still hard for intelligent computer agents to learn. In this paper we present a Temporal-Difference<br />
method that is nonlinearly approximated by a 4-layer multi-layer perceptron. We have built multiple English Draughts<br />
playing agents, each starting with a randomly initialized strategy, which use this method during self-play to improve their<br />
strategies. We show that the agents are learning by comparing their winning rates relative to their parameters. Our best<br />
agent wins against the computer draughts programs Neuro Draughts, KCheckers and CheckerBoard with the easych engine<br />
and loses to Chinook, GuiCheckers and CheckerBoard with the strong cake engine. Overall our best agent has reached<br />
an amateur league level.<br />
13:30-16:30, Paper WeBCT8.13<br />
Learning the Kernel Combination for Object Categorization<br />
Zhang, Deyuan, Harbin Inst. of Tech.<br />
Wang, Xiaolong, Harbin Inst. of Tech.<br />
Liu, Bingquan, Harbin Inst. of Tech.<br />
Although Support Vector Machines (SVM) succeed in classifying several image databases using image descriptors proposed<br />
in the literature, no single descriptor can be optimal for general object categorization. This paper describes a novel framework<br />
to learn the optimal combination of kernels corresponding to multiple image descriptors before SVM training, which leads<br />
to a quadratic programming problem that can be solved efficiently. Our framework takes into account the variation of the kernel matrix<br />
and imbalanced dataset, which are common in real world image categorization tasks. Experimental results on Graz-01<br />
and Caltech-101 image databases show the effectiveness and robustness of our algorithm.<br />
13:30-16:30, Paper WeBCT8.14<br />
SemiCCA: Efficient Semi-Supervised Learning of Canonical Correlations<br />
Kimura, Akisato, NTT Corp.<br />
Kameoka, Hirokazu, NTT Corp.<br />
Sugiyama, Masashi, Tokyo Inst. of Tech.<br />
Nakano, Takuho, University of Tokyo<br />
Maeda, Eisaku, Communication Science Lab.<br />
Sakano, Hitoshi, NTT<br />
Ishiguro, Katsuhiko, NTT<br />
Canonical correlation analysis (CCA) is a powerful tool for analyzing multi-dimensional paired data. However, CCA tends<br />
to perform poorly when the number of paired samples is limited, which is often the case in practice. To cope with this<br />
problem, we propose a semi-supervised variant of CCA named “SemiCCA” that allows us to incorporate additional unpaired<br />
samples for mitigating overfitting. The proposed method smoothly bridges the eigenvalue problems of CCA and<br />
principal component analysis (PCA), and thus its solution can be computed efficiently just by solving a single (generalized)<br />
eigenvalue problem, as in the original CCA. Preliminary experiments with artificially generated samples and PASCAL VOC<br />
data sets demonstrate the effectiveness of the proposed method.<br />
13:30-16:30, Paper WeBCT8.15<br />
Spatial String Matching for Image Classification<br />
Liu, Yunqiang, Barcelona Media - Innovation Center<br />
Caselles, Vicent, Univ. Pompeu Fabra<br />
This paper presents a spatial string matching method to incorporate spatial information into the bag-of-words model, which<br />
represents an image as an unordered distribution of local features. Spatial constraints among neighboring features are explored<br />
in order to achieve better discrimination power for image classification. The features from neighboring points are<br />
combined together and taken as a spatial string, and then our method matches the images according to the similarity of<br />
string pairs. The categorization problem can be formulated using a KNN or SVM classifier based on the spatial string matching<br />
kernel. The proposed method is able to capture spatial dependencies across the neighboring features. Experimental<br />
results show promising performance for image classification tasks.<br />
13:30-16:30, Paper WeBCT8.16<br />
A Semi-Supervised Gaussian Mixture Model for Image Segmentation<br />
Martínez-Usó, Adolfo, Univ. Jaume I<br />
Pla, F., Univ. Jaume I<br />
Martínez Sotoca, Jose, Univ. Jaume I<br />
In this paper, the results of a semi-supervised approach based on the Expectation-Maximisation algorithm for model-based<br />
clustering are presented. We show in this work that, if the appropriate generative model is chosen, the classification accuracy<br />
on clustering for image segmentation can be significantly improved by the combination of a reduced set of labelled<br />
data and a large set of unlabelled data. This technique has been tested on real images as well as on medical images from<br />
a dermatology application. The preliminary results are quite promising. Not only have the unsupervised accuracies been<br />
improved as expected, but the segmentation results obtained are considerably better than those obtained by other powerful<br />
and well-known unsupervised image segmentation techniques.<br />
13:30-16:30, Paper WeBCT8.17<br />
Adding Classes Online in Error Correcting Output Codes Framework<br />
Escalera, Sergio, UB<br />
Masip, David, CVC, UOC<br />
Puertas, Eloi, Univ. de Barcelona<br />
Radeva, Petia, CVC<br />
Pujol, Oriol, UB<br />
This article proposes a general extension of the Error Correcting Output Codes (ECOC) framework to the online learning<br />
scenario. As a result, the final classifier handles the addition of new classes independently of the base classifier used. Validation<br />
on the UCI database and two real machine vision applications shows that the online problem-dependent ECOC proposal<br />
provides a feasible and robust way for handling new classes using any base classifier.<br />
13:30-16:30, Paper WeBCT8.18<br />
Training Multi-Level Features for the RobotVision@ICPR 2010 Challenge<br />
Paris, Sebastien, Univ. de la Méditerranée<br />
Glotin, Herve, LSIS<br />
This paper combines and proposes two novel multi-level spatial pyramidal (sp) features: spELBP (Extended Local Binary<br />
Pattern), spELBOP (Extended Local Binary Orientation Pattern) and spHOEE (Histogram of Oriented Edge Energy).<br />
These features feed state-of-the-art SVM algorithms for the localization of a robot in indoor environments. Two tasks are<br />
associated with the RobotVision@ICPR 2010 Challenge: the first uses only a frame of stereoscopic images; the second<br />
takes into account the dynamics of the robot to improve results. Our scores ranked 3rd for Task 1 and 1st for Task 2.<br />
13:30-16:30, Paper WeBCT8.19<br />
Subclass Error Correcting Output Codes using Fisher’s Linear Discriminant Ratio<br />
Arvanitopoulos, Nikolaos, Aristotle Univ. of Thessaloniki<br />
Bouzas, Dimitrios, Aristotle Univ. of Thessaloniki<br />
Tefas, Anastasios, Aristotle Univ. of Thessaloniki<br />
Error-Correcting Output Codes (ECOC) with sub-classes provide a common way to solve multi-class classification problems.<br />
According to this approach, a multi-class problem is decomposed into several binary ones based on the maximization of<br />
the mutual information (MI) between the classes and their respective labels. The MI is modelled through the fast quadratic<br />
mutual information (FQMI) procedure. However, FQMI is not applicable on large datasets due to its high algorithmic<br />
complexity. In this paper we propose Fisher’s Linear Discriminant Ratio (FLDR) as an alternative decomposition criterion<br />
which is of much less computational complexity and achieves in most experiments conducted better classification performance.<br />
Furthermore, we compare FLDR against FQMI for facial expression recognition over the Cohn-Kanade database.<br />
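As background for the decomposition described above, the final ECOC step assigns the class whose codeword best matches the binary classifiers' outputs. A minimal sketch with a hypothetical one-vs-all coding matrix (illustrative only, not the authors' sub-class design):

```python
import numpy as np

# Hypothetical one-vs-all ECOC coding matrix for 3 classes:
# rows = classes, columns = binary problems, entries in {-1, +1}.
CODES = np.array([
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
])

def ecoc_decode(binary_outputs):
    """Assign the class whose codeword is closest, in Hamming distance,
    to the signs of the binary classifier outputs."""
    signs = np.sign(binary_outputs)
    hamming = np.sum(CODES != signs, axis=1)
    return int(np.argmin(hamming))

print(ecoc_decode([0.9, -0.4, -0.7]))  # codeword (+1, -1, -1) -> class 0
```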
13:30-16:30, Paper WeBCT8.20<br />
Pattern Recognition Method using Ensembles of Regularities Found by Optimal Partitioning<br />
Senko, Oleg, Inst. of Russian Acad. of Sciences<br />
Kuznetsova, Anna, Inst. of Russian Acad. of Sciences<br />
A new pattern recognition method based on ensembles of syndromes is considered. The method, referred to<br />
as Multi-model Statistically Weighted Syndromes (MSWS), is a further development of the earlier Statistically Weighted<br />
Syndromes (SWS) method. Syndromes are subregions of the prognostic feature space where the content of objects from one<br />
of the classes differs significantly from the content of the same class in neighboring subregions. Syndromes serve as<br />
simple base classifiers that are combined with the help of a weighted voting procedure. A method of optimal partitioning of the<br />
input feature space is used to search for syndromes; syndromes are selected depending on the quality of data separation<br />
and the complexity of the partitioning model (partition family). The performance of MSWS is compared with that of<br />
SWS and alternative techniques on several applied tasks. The influence of the characteristics of syndrome<br />
selection on recognition ability is also studied.<br />
13:30-16:30, Paper WeBCT8.21<br />
A Geometric Radial Basis Function Network for Robot Perception and Action<br />
Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />
Vázquez Santacruz, Eduardo, CINVESTAV, Unidad Guadalajara<br />
This paper presents a new hypercomplex-valued Radial Basis Function (RBF) network: a Geometric RBF Network (GRBF-N)<br />
designed in the geometric algebra framework, which constitutes a generalization of the standard real-valued RBF network.<br />
The geometric RBF can be used in real time to estimate changes in linear transformations between sets of geometric entities.<br />
Experiments using stereo image sequences validate the proposal.<br />
13:30-16:30, Paper WeBCT8.22<br />
Kernel on Graphs based on Dictionary of Paths for Image Retrieval<br />
Haugeard, Jean-Emmanuel, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />
Philipp-Foliguet, Sylvie, ENSEA/UCP/CNRS<br />
Gosselin, Philippe Henri, CNRS<br />
Recent approaches of graph comparison consider graphs as sets of paths. Kernels on graphs are then computed from<br />
kernels on paths. A common strategy for graph retrieval is to perform pairwise comparisons. In this paper, we propose to<br />
follow a different strategy: we collect a set of paths into a dictionary and then project each graph onto this dictionary.<br />
Graphs can then be classified using powerful classification methods such as SVM. Furthermore, we collect the paths<br />
through interaction with a user. This strategy is ten times faster than a straightforward pairwise comparison of paths. Experiments have<br />
been carried out on a database of city windows.<br />
13:30-16:30, Paper WeBCT8.23<br />
An Efficient Active Constraint Selection Algorithm for Clustering<br />
Vu, Viet-Vu, Univ. Pierre et Marie Curie - Paris 6<br />
Labroche, Nicolas, Univ. Pierre et Marie Curie - Paris 6<br />
Bouchon-Meunier, Bernadette, Univ. Pierre et Marie Curie - Paris 6<br />
In this paper, we address the problem of active query selection for clustering with constraints. The objective is to automatically<br />
determine a set of queries and their associated must-link and cannot-link constraints to help constraint-based clustering<br />
algorithms converge. Some work on active constraint learning has already been proposed, but it applies only<br />
to K-Means-like clustering algorithms, which are known to be limited to spherical clusters, whereas we are interested in constraint-based<br />
clustering algorithms that deal with clusters of arbitrary shapes and sizes (such as Constrained-DBSCAN and<br />
Constrained-Hierarchical Clustering). Our novel approach relies on a k-nearest-neighbors graph to estimate the dense<br />
regions of the data space and generates queries at the frontier between clusters, where cluster membership is most uncertain.<br />
Experiments show that our framework improves the performance of constraint-based clustering algorithms.<br />
13:30-16:30, Paper WeBCT8.24<br />
Fuzzy Support Vector Machines for ECG Arrhythmia Detection<br />
Özcan, N. Özlem, Boğaziçi Univ.<br />
Gürgen, Fikret, Boğaziçi Univ.<br />
Among cardiovascular diseases, heart attacks are a leading cause of death around the world. Pre-monitoring and pre-diagnosis<br />
help to prevent heart attacks and strokes, and the ECG plays a key role in this regard. In recent studies, SVMs with different<br />
kernel functions and parameter values have been applied to classification of ECG data. The classification model of the SVM can<br />
be improved by assigning membership values to the inputs. The SVM combined with fuzzy theory, FSVM, is exercised on the UCI<br />
Arrhythmia Database. Five different membership functions are defined. It is shown that the accuracy of classification can<br />
be improved by defining appropriate membership functions. ANFIS is used to interpret the resulting classification<br />
model. The ANFIS model of the ECG data is compared to, and found consistent with, medical knowledge.<br />
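The five membership functions themselves are not specified in the abstract; as one hedged illustration, a distance-to-class-centroid membership (a common FSVM choice, not necessarily one of the functions used by the authors) can be sketched as:

```python
import numpy as np

def centroid_membership(X, eps=1e-8):
    """Assign each training sample a fuzzy membership in (0, 1]:
    samples far from their class centroid (likely outliers) get
    smaller weights in the FSVM objective."""
    centroid = X.mean(axis=0)
    dists = np.linalg.norm(X - centroid, axis=1)
    return 1.0 - dists / (dists.max() + eps)

X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])  # last point is an outlier
m = centroid_membership(X)
print(m)  # the outlier receives the smallest membership
```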
13:30-16:30, Paper WeBCT8.25<br />
ROC Analysis and Cost-Sensitive Optimization for Hierarchical Classifiers<br />
Paclik, Pavel, PR Sys Design<br />
Lai, Carmen, TU Delft<br />
Landgrebe, Thomas, De Beers<br />
Duin, Robert, TU Delft<br />
Instead of solving complex pattern recognition problems using a single complicated classifier, it is often beneficial to<br />
leverage our prior knowledge and decompose the problem into parts. These may be tackled using specific feature subsets<br />
and simpler classifiers resulting in a hierarchical system. In this paper, we propose an efficient and scalable approach for<br />
cost-sensitive optimization of a general hierarchical classifier using ROC analysis. This allows the designer to view the<br />
hierarchy of trained classifiers as a system, and tune it according to the application needs.<br />
13:30-16:30, Paper WeBCT8.26<br />
Variational Mixture of Experts for Classification with Applications to Landmine Detection<br />
Yuksel, Seniha Esen, Univ. of Florida<br />
Gader, Paul, Univ. of Florida<br />
In this paper, we (1) provide a complete framework for classification using Variational Mixture of Experts (VME); (2) derive<br />
the variational lower bound; and (3) apply the method to landmine, or simply mine, detection and compare the results<br />
to the Mixtures of Experts trained with Expectation Maximization (EMME). VME has previously been used for regression<br />
and Waterhouse explained how to apply VME to classification (which we call VMEC). However, the steps to train<br />
the model were not made clear, since the equations were applicable to vector-valued parameters as opposed to the matrices for<br />
each expert. Also, a variational lower bound was not provided. The variational lower bound provides an excellent stopping<br />
criterion that resists over-training. We demonstrate the efficacy of the method on real-world mine classification, in which<br />
training robust mine classification algorithms is difficult because of the small number of samples per class. In our experiments,<br />
VMEC consistently improved performance over EMME.<br />
13:30-16:30, Paper WeBCT8.27<br />
A Unifying Framework for Learning the Linear Combiners for Classifier Ensembles<br />
Erdogan, Hakan, Sabanci Univ.<br />
Sen, Mehmet Umut, Sabanci Univ.<br />
For classifier ensembles, an effective combination method is to combine the outputs of each classifier using a linearly<br />
weighted combination rule. There are multiple ways to linearly combine classifier outputs and it is beneficial to analyze<br />
them as a whole. We present a unifying framework for multiple linear combination types in this paper. This unification<br />
enables using the same learning algorithms for different types of linear combiners. We present various ways to train the<br />
weights using regularized empirical loss minimization. We propose using the hinge loss for better performance as compared<br />
to the conventional least-squares loss. We analyze the effects of using hinge loss for various types of linear weight training<br />
by running experiments on three different databases. We show that, in certain problems, linear combiners with fewer parameters<br />
may perform as well as those with a much larger number of parameters, even in the presence of regularization.<br />
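The simplest member of this family of combiners uses a single weight per classifier; a hedged sketch with hypothetical score matrices (the paper's framework also covers richer per-class weightings):

```python
import numpy as np

def linear_combine(scores, w):
    """Combine per-classifier class-score matrices with one weight per
    classifier: scores has shape (n_classifiers, n_classes)."""
    return w @ scores  # weighted sum of the classifiers' score rows

# Hypothetical outputs of 3 classifiers on a 4-class problem.
scores = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.7, 0.1, 0.1, 0.1],  # a weak, disagreeing classifier
])
w = np.array([0.45, 0.45, 0.10])  # learned weights would downweight it
combined = linear_combine(scores, w)
print(int(np.argmax(combined)))  # -> 1
```

In the paper, such weights are learned by regularized empirical loss minimization (e.g. with the hinge loss) rather than set by hand.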
13:30-16:30, Paper WeBCT8.28<br />
Reinforcement Learning for Robust and Efficient Real-World Tracking<br />
Cohen, Andre, Rutgers Univ.<br />
Pavlovic, Vladimir, Rutgers Univ.<br />
In this paper we present a new approach for combining several independent trackers into one robust real-time tracker. Unlike<br />
previous work that employs multiple tracking objectives in unison, our tracker determines an optimal sequence<br />
of individual trackers given the characteristics present in the video and the desire to achieve maximally efficient tracking.<br />
This allows for the selection of fast, less robust trackers when little movement is sensed, while using more robust but computationally<br />
intensive trackers in more dynamic scenes. We test this approach on the problem of real-world face tracking.<br />
Results show that this approach is a viable method for combining several independent trackers into one robust real-time<br />
tracker capable of tracking faces in varied lighting conditions, video resolutions, and with occlusions.<br />
13:30-16:30, Paper WeBCT8.29<br />
An Efficient and Stable Algorithm for Learning Rotations<br />
Arora, Raman, Univ. of Washington<br />
Sethares, William A., Univ. of Wisconsin-Madison<br />
This paper analyses the computational complexity and stability of an online algorithm recently proposed for learning rotations.<br />
The proposed algorithm involves multiplicative updates that are matrix exponentials of skew-symmetric matrices comprising<br />
the Lie algebra of the rotation group. The rank-deficiency of the skew-symmetric matrices involved in the updates is exploited<br />
to reduce the updates to a simple quadratic form. The Lyapunov stability of the algorithm is established and the application<br />
of the algorithm to registration of point-clouds in n-dimensional Euclidean space is discussed.<br />
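In the 3-D case the matrix exponential of a skew-symmetric matrix has a closed form via Rodrigues' formula; a minimal sketch of that special case (illustrative background, not the paper's reduced quadratic-form update):

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix S with S @ x = v x x (cross product)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(v):
    """Matrix exponential of skew(v) via Rodrigues' formula, yielding
    a rotation by angle ||v|| about the axis v / ||v||."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    S = skew(v / theta)
    return np.eye(3) + np.sin(theta) * S + (1.0 - np.cos(theta)) * (S @ S)

R = exp_so3(np.array([0.0, 0.0, np.pi / 2]))  # 90 degrees about the z-axis
# R is orthogonal with determinant +1, i.e. a proper rotation.
```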
13:30-16:30, Paper WeBCT8.30<br />
An Incremental Learning Algorithm for Nonstationary Environments and Class Imbalance<br />
Ditzler, Greg, Rowan Univ.<br />
Chawla, Nitesh, Univ. of Notre Dame<br />
Polikar, Robi, Rowan Univ.<br />
Learning in a non-stationary environment and in the presence of class imbalance has been receiving more recognition from the computational<br />
intelligence community, but little work has been done to create an algorithm or a framework that can handle both issues simultaneously. We<br />
have recently introduced a new member to the Learn++ family of algorithms, Learn++.NSE, which is designed to track non-stationary environments.<br />
However, this algorithm does not work well when there is class imbalance, as it was not designed to handle this problem. On<br />
the other hand, SMOTE, a popular algorithm that can handle class imbalance, is not designed to learn in nonstationary environments, because<br />
it is a method for oversampling the data. In this work we describe and present preliminary results for integrating SMOTE and Learn++.NSE<br />
to create an algorithm that is robust to learning in a non-stationary environment and under class imbalance.<br />
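For reference, SMOTE's oversampling step interpolates a minority-class sample toward one of its minority-class nearest neighbors; a minimal sketch generating one synthetic point (not the full algorithm, and independent of the Learn++.NSE integration):

```python
import numpy as np

def smote_sample(x, minority, k=3, rng=np.random.default_rng(0)):
    """Generate one SMOTE-style synthetic point: pick one of the k
    nearest minority-class neighbors of x and interpolate a random
    fraction of the way toward it."""
    dists = np.linalg.norm(minority - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]  # skip x itself at distance 0
    nn = minority[rng.choice(neighbors)]
    return x + rng.random() * (nn - x)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(minority[0], minority)
# synthetic lies on the segment from minority[0] toward one neighbor
```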
13:30-16:30, Paper WeBCT8.31<br />
Feature-Based Partially Occluded Object Recognition<br />
Fan, Na, East China Normal Univ.<br />
We propose a framework that combines geometry, color and texture information among pairwise feature points into a graph and finds the correct<br />
assignments from all candidates using graph-matching techniques. Thanks to our informative similarity matrix, objects can still be recognized<br />
under severe occlusion, and matching errors can be greatly reduced when images are taken from very different view angles or are partially occluded.<br />
13:30-16:30, Paper WeBCT8.32<br />
A Sample Pre-Mapping Method Enhancing Boosting for Object Detection<br />
Ren, Haoyu, Chinese Acad. of Sciences<br />
Hong, Xiaopeng, Harbin Inst. of Tech.<br />
Heng, Cher Keng, Panasonic Singapore Lab. Pte Ltd<br />
Liang, Luhong, Chinese Acad. of Sciences<br />
Chen, Xilin, Chinese Acad. of Sciences<br />
We propose a novel method to improve the training efficiency and accuracy of boosted classifiers for object detection.<br />
The key step of the proposed method is a sample pre-mapping on the original space, by reference to a selected reference<br />
sample, before the samples are fed into the weak classifiers. The reference sample corresponds to an approximation of the optimal separating<br />
hyper-plane in an implicit high-dimensional space, so that the resulting classifier can achieve performance<br />
similar to that of a kernel method while incurring only the computational cost of a linear classifier in both training and detection. We employ<br />
two different non-linear mappings to verify the proposed method under the boosting framework. Experimental results show<br />
that the proposed approach achieves performance comparable with commonly used methods on public datasets in both<br />
pedestrian detection and car detection.<br />
13:30-16:30, Paper WeBCT8.33<br />
Context Inspired Pedestrian Detection in Far-field Videos<br />
Ma, Wenhua, Chinese Acad. of Sciences<br />
He, Peng, Chinese Acad. of Sciences<br />
Lei, Huang, Chinese Acad. of Sciences<br />
Liu, Changping, Chinese Acad. of Sciences<br />
A novel pedestrian detection method that integrates context information with sliding-window search is proposed. The method<br />
uses cues such as corners, motion, and appearance to localize pedestrians in far-field videos without performing brute-force search.<br />
Corners direct attention to a set of conspicuous locations that serve as starting points for the search, while motion<br />
detection restricts the search area to the foreground mask. Based on these two cues, sliding-window search is applied<br />
to confirm the exact locations of pedestrians. Experiments demonstrate that the proposed method is efficient in detecting<br />
pedestrians in far-field videos.<br />
13:30-16:30, Paper WeBCT8.34<br />
Theme-Based Multi-Class Object Recognition and Segmentation<br />
Wu, Shilin, Chinese Acad. of Sciences<br />
Geng, Jiajia, Chinese Acad. of Sciences<br />
Zhu, Feng, Chinese Acad. of Sciences<br />
In this paper, we propose a new theme-based CRF model and investigate its performance on class based pixel-wise segmentation<br />
of images. By including the theme of an image, we also propose a new texture-environment potential to represent<br />
texture environment of a pixel, which alone gives satisfactory recognition results. The pixel-wise segmentation accuracy<br />
is remarkably improved by introducing the texture potential. We compare our results to recently published results on the MSRC<br />
21-class database and show that our theme-based CRF model significantly outperforms the current state-of-the-art. In particular,<br />
by assigning a theme to each image, our model obtains greatly improved accuracy on structured classes with high<br />
visual variability and few training examples, classes whose accuracy is very low in most related work.<br />
13:30-16:30, Paper WeBCT8.35<br />
Boosted Sigma Set for Pedestrian Detection<br />
Hong, Xiaopeng, Harbin Inst. of Tech.<br />
Chang, Hong, Chinese Acad. of Sciences<br />
Chen, Xilin, Chinese Acad. of Sciences<br />
Gao, Wen, Peking Univ.<br />
This paper presents a new method to detect pedestrians in still images, using Sigma Sets as image region descriptors in a<br />
boosting framework. A Sigma Set encodes the second-order statistics of an image region implicitly, in the form of a point set.<br />
Compared with the covariance matrix, the traditional second-order-statistics-based region descriptor, which requires computationally<br />
demanding operations on the Riemannian manifold, the Sigma Set preserves similar robustness and discriminative<br />
power more efficiently, because classification on Sigma Sets can be performed directly in vector space. Experimental<br />
results on the INRIA and DaimlerChrysler pedestrian datasets show the effectiveness and efficiency of the<br />
proposed method.<br />
13:30-16:30, Paper WeBCT8.36<br />
Reverse Indexing for Reading Graffiti Tags<br />
Thurau, Christian, Fraunhofer IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
In this paper, we consider the problem of automatically reading graffiti tags. As a preparatory step, we create a large set<br />
of synthetic graffiti-like characters, generated from publicly available TrueType fonts. For each character in the database,<br />
we extract a number of scale-independent local binary descriptors. Then, using binary non-negative matrix factorization,<br />
a sufficient number of basis functions is learned. The basis-function coefficients of novel images can then be used directly<br />
to hash characters from the database of prototypes. Finally, graffiti tags are recognized by means of a localized spatial<br />
voting scheme.<br />
13:30-16:30, Paper WeBCT8.37<br />
Generic Object Recognition by Tree Conditional Random Field based on Hierarchical Segmentation<br />
Okumura, Takeshi, Kobe Univ.<br />
Takiguchi, Tetsuya, Kobe Univ.<br />
Ariki, Yasuo, Kobe Univ.<br />
Generic object recognition by computer has been strongly demanded in recent years in fields such as robot vision and image retrieval.<br />
Conventional methods use a Conditional Random Field (CRF) that recognizes the class of each region using<br />
features extracted from local regions and the class co-occurrence between adjoining regions. However, the<br />
discriminative ability of features extracted from local regions is insufficient, and these methods<br />
are not robust to scale variance. To solve this problem, we propose a method that integrates the recognition results at<br />
multiple scales using a tree conditional random field based on hierarchical segmentation. On an image dataset of 7<br />
classes, the proposed method improves the recognition rate by 2.2%.<br />
13:30-16:30, Paper WeBCT8.38<br />
A Fast Approach for Pixelwise Labeling of Facade Images<br />
Fröhlich, Björn, Friedrich-Schiller Univ. of Jena<br />
Rodner, Erik, Friedrich-Schiller Univ. of Jena<br />
Denzler, Joachim, Friedrich-Schiller Univ. of Jena<br />
Facade classification is an important subtask for automatically building large 3D city models. In the following we present<br />
an approach for pixelwise labeling of facade images using an efficient Randomized Decision Forest classifier and robust<br />
local color features. Experiments are performed with a popular facade dataset and a new, demanding dataset of pixelwise<br />
labeled images from the LabelMe project. Our method achieves high recognition rates and is significantly faster in both<br />
training and testing than other methods based on expensive feature-transformation techniques.<br />
13:30-16:30, Paper WeBCT8.39<br />
Real-Time Traffic Sign Detection: An Evaluation Study<br />
Li, Ying, IBM T. J. Watson Res. Center<br />
Guan, Weiguang, IBM<br />
Pankanti, Sharath<br />
This paper presents an experimental evaluation of three different traffic sign detection approaches, which detect or localize<br />
various types of traffic signs from real-time videos. Specifically, the first approach exploits geometric features to identify<br />
traffic signs, while the other two are developed based on SVM (Support Vector Machine) and AdaBoost learning mechanisms.<br />
We describe each of the three approaches, conduct a detailed comparison among them, and examine their pros and<br />
cons. Our conclusions should lead to useful guidelines for developing a real-time traffic sign detector.<br />
13:30-16:30, Paper WeBCT8.40<br />
Image Categorization by Learned Nonlinear Subspace of Combined Visual-Words and Low-Level Features<br />
Han, Xian-Hua, Ritsumeikan Univ.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
Ruan, Xiang, Omron Corporation<br />
Image category recognition is important to access visual information on the level of objects and scene types. This paper<br />
presents a new algorithm for the automatic recognition of object and scene classes. Compact and yet discriminative visual-words<br />
and low-level-features object class subspaces are automatically learned from a set of training images by a Supervised<br />
Nonlinear Neighborhood Embedding (SNNE) algorithm, which can learn an adaptive nonlinear subspace by<br />
preserving the neighborhood structure of the visual feature space. The main contribution of this paper is twofold: i) an<br />
optimally compact and discriminative feature subspace is learned by the proposed SNNE algorithm for each feature<br />
space (visual-word and low-level features); ii) an effective merging of the different feature subspaces can be implemented simply.<br />
High classification accuracy is demonstrated on different databases, including a scene database (Simplicity) and an object<br />
recognition database (Caltech). We confirm that the proposed strategy performs much better than state-of-the-art methods on these<br />
databases.<br />
13:30-16:30, Paper WeBCT8.41<br />
Can Motion Segmentation Improve Patch-Based Object Recognition?<br />
Ulges, Adrian, DFKI<br />
Breuel, Thomas<br />
Patch-based methods, which constitute the state of the art in object recognition, are often applied to video data, where<br />
motion information provides a valuable clue for separating objects of interest from the background. We show that such<br />
motion-based segmentation improves the robustness of patch-based recognition with respect to clutter. Our approach,<br />
which employs segmentation information to rule out incorrect correspondences between training and test views, is demonstrated<br />
empirically to distinctly outperform baselines operating on unsegmented images. Relative improvements reach<br />
50% for the recognition of specific objects, and 33% for object category retrieval.<br />
13:30-16:30, Paper WeBCT8.42<br />
Semi-Supervised and Interactive Semantic Concept Learning for Scene Recognition<br />
Han, Xian-Hua, Ritsumeikan Univ.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
Ruan, Xiang, Omron Corporation<br />
In this paper, we present a novel semi-supervised and interactive concept learning algorithm for scene recognition by local<br />
semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and to<br />
model the semantic content of images. The basic idea of semantic modeling is to classify local image regions into semantic<br />
concept classes such as water, sunset, or sky [1]. However, manually labeling concept samples for training a semantic<br />
model is fairly expensive, and the labeling results are, to some extent, subjective to the operators. In this paper, using<br />
the proposed semi-supervised and interactive learning algorithm, training samples and new concepts can be obtained accurately<br />
and efficiently. Through extensive experiments, we demonstrate that the image concept representation is well<br />
suited to modeling the semantic content of heterogeneous scene categories, and thus to recognition and retrieval. Furthermore,<br />
higher recognition accuracy can be achieved by adding new training samples and concepts obtained<br />
by the proposed algorithm.<br />
13:30-16:30, Paper WeBCT8.43<br />
Dense Structure Inference for Object Classification in Aerial LIDAR Dataset<br />
Kim, Eunyoung, Univ. of Southern California<br />
Medioni, Gerard, Univ. of Southern California<br />
We present a framework to classify small freeform objects in 3D aerial scans of a large urban area. The system first identifies<br />
large structures such as the ground surface and roofs of buildings densely built in the scene, by fitting planar patches<br />
and grouping adjacent patches similar in pose together. Then, it segments initial object candidates which represent the<br />
visible surface of an object using the identified structures. To deal with the sparse density of the points representing each candidate,<br />
we also propose a novel method to infer a dense 3D structure from the given sparse and noisy points without any meshes<br />
- 222 -
and iterations. To label object candidates, we build a tree-structure database of object classes, which captures latent patterns<br />
in shape of 3D objects in a hierarchical manner. We demonstrate our system on the aerial LIDAR dataset acquired from a<br />
few square kilometers of Ottawa.<br />
13:30-16:30, Paper WeBCT8.44<br />
Data-Driven Foreground Object Detection from a Non-Stationary Camera<br />
Sun, Shih-Wei, Acad. Sinica, Taiwan<br />
Huang, Fay, National Ilan Univ. Taiwan<br />
Liao, Mark, Acad. Sinica, Taiwan<br />
In this paper, we propose a data-driven foreground object detection technique which can detect foreground objects from<br />
a moving camera. We propose to build a data-driven consensus foreground object template (CFOT) and then detect the<br />
foreground object region in each frame. The proposed foreground object detection technique is equipped with the following<br />
functions: (1) the ability to detect a foreground object captured by a fast-moving camera; (2) the ability to detect a<br />
low-contrast (spatially/temporally) foreground object; and (3) the ability to detect a foreground object against a dynamic background.<br />
Our method makes three contributions: (1) a newly proposed data-driven foreground region decision process for generating<br />
the CFOT, which has been shown to be robust and efficient; (2) a foreground object probability proposed for properly<br />
dealing with imperfect initial foreground region estimations; and (3) a CFOT generated for precise foreground object<br />
detection.<br />
13:30-16:30, Paper WeBCT8.45<br />
Efficient Shape Retrieval under Partial Matching<br />
Demirci, Fatih, TOBB Univ. of Ec. and Tech.<br />
Indexing into large database systems is essential for a number of applications. This paper presents a new indexing structure<br />
that overcomes an important restriction of a previous indexing technique, using a recently developed theorem from the<br />
domain of matrix analysis. Specifically, given a set of distance values computed by a distance function that does not necessarily<br />
satisfy the triangle inequality, this paper shows that computing the nearest distance values that obey the properties<br />
of a metric enables us to overcome the limitations of the previous indexing algorithm. We demonstrate the proposed framework<br />
in the context of a recognition task.<br />
13:30-16:30, Paper WeBCT8.46<br />
Component Identification in the 3D Model of a Building<br />
Xu, Mai, Imperial Coll.<br />
Petrou, Maria, Imperial Coll.<br />
Jahangiri, Mohammad, Imperial Coll.<br />
This paper addresses the problem of identifying the components (such as balconies and windows) of the 3D model of a<br />
building. A novel method, based on a voting scheme, is presented for solving this problem. Intuitively, interference<br />
(such as shadows and occlusions) rarely happens at the same place when a scene is viewed from different<br />
directions or at different times. In the spirit of this intuition, the voting-based method combines information from various images to<br />
identify and segment the components of a building.<br />
13:30-16:30, Paper WeBCT8.48<br />
Multi-Scale Color Local Binary Patterns for Visual Object Classes Recognition<br />
Zhu, Chao, Ec. Centrale de Lyon<br />
Bichot, Charles-Edmond, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
The Local Binary Pattern (LBP) operator is a computationally efficient yet powerful feature for analyzing local texture<br />
structures. While the LBP operator has been successfully applied to tasks as diverse as texture classification, texture segmentation,<br />
face recognition and facial expression recognition, it has rarely been used in the domain of Visual Object<br />
Classes (VOC) recognition, mainly due to its limited power in dealing with the various changes in lighting and viewing<br />
conditions of real-world scenes. In this paper, we propose six novel multi-scale color LBP operators in order to increase<br />
the photometric invariance and discriminative power of the original LBP operator. Experimental results on the<br />
PASCAL VOC 2007 image benchmark show significant accuracy improvements by the proposed operators as compared<br />
with both the original LBP and other popular texture descriptors such as Gabor filters.<br />
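For background, the original grayscale LBP compares each pixel to its 8 neighbors and packs the comparison bits into a code. A minimal single-pixel sketch (the paper's multi-scale color variants extend this idea per channel and per radius):

```python
import numpy as np

# Fixed clockwise order of the 8 neighbors, starting at the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c): bit i is 1 if the i-th neighbor
    is >= the center pixel."""
    center = img[r, c]
    code = 0
    for i, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr, c + dc] >= center:
            code |= 1 << i
    return code

img = np.array([[9, 9, 9],
                [1, 5, 1],
                [1, 1, 1]])
print(lbp_code(img, 1, 1))  # top row >= 5 -> bits 0, 1, 2 set -> 7
```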
13:30-16:30, Paper WeBCT8.49<br />
Object Localization by Propagating Connectivity via Superfeatures<br />
Chakraborty, Ishani, Rutgers Univ.<br />
Elgammal, Ahmed, Rutgers Univ.<br />
In this paper, we propose a part-based approach to localize objects in cluttered images. We represent object parts as boundary<br />
segments and image patches. A semi-local grouping of parts named superfeatures encodes appearance and connectivity<br />
within a neighborhood. To match parts, we integrate inter-feature similarities and intra-feature connectivity via a relaxation<br />
labeling framework. Additionally, we use a global elliptical shape prior to match the shape of the solution space to that of<br />
the object. Finally, we demonstrate the efficacy of the method for detecting various objects in cluttered images by<br />
comparing it to simple object models.<br />
13:30-16:30, Paper WeBCT8.50<br />
Efficient Object Detection and Matching using Feature Classification<br />
Dornaika, Fadi, Univ. of the Basque Country<br />
Chakik, Fadi, Lebanese Univ.<br />
This paper presents a new approach for efficient object detection and matching in images and videos. We propose a<br />
classification stage that classifies the features extracted from new images into object features and non-object<br />
features. This binary classification scheme has turned out to be an efficient tool for object detection and<br />
matching. By means of this classification, not only does the matching process become more robust and faster, but robust<br />
object registration also becomes fast. We provide quantitative evaluations showing the advantages of using the classification<br />
stage for object matching and registration. Our approach could lend itself nicely to real-time object tracking and detection.<br />
13:30-16:30, Paper WeBCT8.51<br />
A Discriminative Model for Object Representation and Detection via Sparse Features<br />
Song, Xi, Beijing Inst. of Tech.<br />
Luo, Ping, Sun Yat-Sen Univ.<br />
Lin, Liang, Sun Yat-Sen Univ.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
This paper proposes a discriminative model that represents an object category with a batch of boosted image patches, motivated<br />
by detecting and localizing objects with sparse features. Instead of designing features carefully and category-specifically<br />
as in previous work, we extract a massive number of local image patches from the positive object instances and<br />
quantize them as weak classifiers. Then we extend the Adaboost algorithm for learning the patch-based model integrating<br />
object appearance and structure information. With the learned model, a few features are activated to localize instances in<br />
the testing images. In the experiments, we apply the proposed method to several public datasets and achieve improved<br />
performance.<br />
13:30-16:30, Paper WeBCT8.52<br />
A Robust Recognition Technique for Dense Checkerboard Patterns<br />
Dao, Vinh Ninh, The Univ. of Tokyo<br />
Sugimoto, Masanori, The Univ. of Tokyo<br />
The checkerboard pattern is widely used in computer vision techniques for camera calibration and simple geometry acquisition,<br />
both in practical use and research. However, most of the current techniques fail to recognize the checkerboard<br />
pattern under distorted, occluded or discontinuous conditions, especially when the checkerboard pattern is dense. This<br />
paper proposes a novel checkerboard recognition technique that is robust to noise, surface distortion or discontinuity, supporting<br />
checkerboard recognition in dynamic conditions for a wider range of applications. When the checkerboard pattern<br />
is used in a projector camera system for geometry reconstruction, by using epipolar geometry, this technique can recognize<br />
the corresponding positions of the crossing points, even if the checkerboard pattern is only partly detected.<br />
13:30-16:30, Paper WeBCT8.53<br />
Spike-Based Convolutional Network for Real-Time Processing<br />
Pérez-Carrasco, Jose-Antonio, Univ. de Sevilla<br />
Serrano-Gotarredona, Carmen, Univ. de Sevilla<br />
Acha-Piñero, Begoña, Univ. de Sevilla<br />
Serrano-Gotarredona, Teresa, Univ. de Sevilla<br />
Linares-Barranco, Bernabe, Univ. de Sevilla<br />
In this paper we propose the first bio-inspired, non-frame-based, six-layer convolutional network (ConvNet) that can be implemented<br />
with already physically available spike-based electronic devices. The system was designed to recognize people in<br />
three different positions: standing, lying, or upside down. The inputs were spikes obtained with a motion retina chip. We<br />
provide simulation results showing recognition delays of 16 milliseconds from stimulus onset (time-to-first-spike) with a<br />
recognition rate of 94%. The weight-sharing property of ConvNets and the use of the AER protocol allow a great reduction in<br />
the number of both trainable parameters and connections: only 748 trainable parameters and 123 connections in our AER<br />
system, out of the 506,998 connections that would be required in a frame-based implementation.<br />
13:30-16:30, Paper WeBCT8.54<br />
Learning Affordances for Categorizing Objects and Their Properties<br />
Dag, Nilgun, Middle East Tech. Univ.<br />
Atil, Ilkay, Middle East Tech. Univ.<br />
Kalkan, Sinan, Middle East Tech. Univ.<br />
Sahin, Erol, Middle East Tech. Univ.<br />
In this paper, we demonstrate that simple interactions with objects in the environment lead to a manifestation of the perceptual<br />
properties of objects. This is achieved by deriving a condensed representation of the effects of actions (called effect prototypes<br />
in the paper), and investigating the relevance between perceptual features extracted from the objects and the actions that can<br />
be applied to them. With this at hand, we show that the agent can categorize (i.e., partition) its raw sensory perceptual feature<br />
vector, extracted from the environment, which is an important step for development of concepts and language. Moreover,<br />
after learning how to predict the effect prototypes of objects, the agent can categorize objects based on the predicted effects<br />
of actions that can be applied on them.<br />
13:30-16:30, Paper WeBCT8.55<br />
Feature Pairs Connected by Lines for Object Recognition<br />
Awais, Muhammad, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
In this paper we exploit image edges and segmentation maps to build features for object category recognition. We build a<br />
parametric line based image approximation to identify the dominant edge structures. Line ends are used as features described<br />
by histograms of gradient orientations. We then form descriptors based on connected line ends to incorporate weak topological<br />
constraints which improve their discriminative power. Using point pairs connected by an edge assures higher repeatability<br />
than a random pair of points or edges. The results are compared with the state of the art and show a significant improvement on<br />
the challenging Pascal VOC 2007 recognition benchmark. Kernel-based fusion is performed to emphasize the complementary<br />
nature of our descriptors with respect to the state-of-the-art features.<br />
13:30-16:30, Paper WeBCT8.56<br />
Using Gait Features for Improving Walking People Detection<br />
Bouchrika, Imed, Univ. of Southampton<br />
Carter, John, Univ. of Southampton<br />
Nixon, Mark, Univ. of Southampton<br />
Morzinger, Roland, Joanneum Res.<br />
Thallinger, Georg, Joanneum Res.<br />
In this paper, we explore a new approach for enriching the HoG method for pedestrian detection in an unconstrained outdoor<br />
environment. The proposed algorithm is based on gait motion, since the rhythmic footprint pattern of walking people<br />
is a stable and characteristic feature for their detection. The novelty of our approach is motivated<br />
by the latest research on people identification using gait. The experimental results confirmed the robustness of our method<br />
in enhancing HoG to detect walking people, as well as in discriminating between a single walking subject, groups of people and<br />
vehicles, with a detection rate of 100%. Furthermore, the results revealed the potential of our method to be used in visual surveillance<br />
systems for identity tracking over different camera views.<br />
13:30-16:30, Paper WeBCT8.57<br />
Learning-Based Vehicle Detection using Up-Scaling Schemes and Predictive Frame Pipeline Structures<br />
Tsai, Yi-Min, National Taiwan Univ.<br />
Huang, Keng-Yen, National Taiwan Univ.<br />
Tsai, Chih-Chung, National Taiwan Univ.<br />
Chen, Liang-Gee, National Taiwan Univ.<br />
This paper aims at detecting preceding vehicles at a variety of distances. A sub-region up-scaling scheme significantly raises<br />
the far-distance detection capability. Three frame pipeline structures involving object predictors are explored to further enhance<br />
accuracy and efficiency. The proposed methodology achieves a 140-meter detection distance, with a 97.1% detection rate and a<br />
4.2% false alarm rate. Finally, a benchmark of several learning-based vehicle detection approaches is provided.<br />
13:30-16:30, Paper WeBCT8.58<br />
Dynamic Hand Pose Recognition using Depth Data<br />
Suryanarayan, Poonam, The Pennsylvania State Univ.<br />
Subramanian, Anbumani, HP Lab.<br />
Mandalapu, Dinesh, HP Lab.<br />
Hand pose recognition has been a problem of great interest to the computer vision and human-computer interaction communities<br />
for many years, and current solutions either require additional accessories at the user end or enormous computation<br />
time. These limitations arise mainly from the high dexterity of the human hand and the occlusions created in the limited view of<br />
the camera. This work utilizes the depth information and a novel algorithm to recognize scale and rotation invariant hand<br />
poses dynamically. We have designed a volumetric shape descriptor enfolding the hand to generate a 3D cylindrical histogram<br />
and achieved robust pose recognition in real time.<br />
13:30-16:30, Paper WeBCT8.59<br />
A Hierarchical GIST Model Embedding Multiple Biological Feasibilities for Scene Classification<br />
Han, Yina, Xi’an Jiaotong Univ.<br />
Liu, Guizhong, Xi’an Jiaotong Univ.<br />
We propose a hierarchical GIST model embedding multiple biological feasibilities for scene classification. In the perceptual<br />
layer, the spatial layout of Gabor features is extracted in a bio-vision guided way: introducing diagnostic color information and<br />
tuning the orientations and scales of the Gabor filters, as well as the spatial pooling size, to biologically feasible values. In the<br />
conceptual layer, for the first time, we attempt to build a computational model of the biological conceptual GIST by a<br />
kernel-PCA-based prototype representation, which is task-oriented like the biological GIST, and also in accordance with the<br />
unsupervised learning assumption in the primary visual cortex and prototype-similarity-based categorization in human cognition.<br />
Using around 200 dimensions, our model is shown to outperform existing GIST models, and to achieve state-of-the-art<br />
performance on four scene datasets.<br />
13:30-16:30, Paper WeBCT8.60<br />
Road Network Extraction using Edge Detection and Spatial Voting<br />
Sirmacek, Beril, Deutsches Zentrum fur Luft und Raumfahrt<br />
Unsalan, Cem, Yeditepe Univ.<br />
Road network detection from very high resolution satellite images is important for two main reasons. First, the detection<br />
result can be used in automated map making. Second, the detected network can be used in trajectory planning for unmanned<br />
aerial vehicles. Although an expert can label road pixels in a given satellite image, this operation is prone to errors. Therefore,<br />
an automated system is needed to detect the road network in a given satellite image in a robust manner. In this study, we propose<br />
a novel approach to detect the road network from a given panchromatic Ikonos satellite image. Our method has five<br />
main steps. First, we apply nonlinear bilateral filtering to smooth the given image. Then, we extract Canny edges and the<br />
gradient information as local features. Using these local features, we generate a spatial voting matrix. This voting matrix<br />
indicates the possible locations of the road network pixels. By processing this voting matrix in an iterative manner, we detect<br />
initial road pixels. Finally, we apply a tracking algorithm on the voting matrix to detect the missing road pixels. We tested<br />
our method on various satellite images and provided the extracted road networks in the experiments section.<br />
13:30-16:30, Paper WeBCT8.61<br />
Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration<br />
Soda, Paolo, Univ. Campus Bio-Medico di Roma<br />
Iannello, Giulio, Univ. Campus Bio-Medico di Roma<br />
Decomposition methods are multiclass classification schemes where the polychotomy is reduced into several dichotomies.<br />
Each dichotomy is addressed by a classifier trained on a training set derived from the original one on the basis of the decomposition<br />
rule adopted. These new training sets may present a disproportion between the classes, harming the global recognition<br />
accuracy. Indeed, traditional learning algorithms are biased towards the majority class, resulting in poor predictive accuracy<br />
over the minority one. This paper investigates whether the application of learning methods specifically tailored to imbalanced<br />
training sets introduces any performance improvement when used by the dichotomizers of decomposition methods. The results<br />
on five public datasets show that the application of these learning methods improves the global performance of decomposition<br />
schemes.<br />
13:30-16:30, Paper WeBCT8.62<br />
The Balanced Accuracy and its Posterior Distribution<br />
Brodersen, Kay Henning, ETH Zurich<br />
Ong, Cheng Soon, ETH Zurich<br />
Stephan, Klaas Enno, Univ. of Zurich<br />
Buhmann, Joachim M., Swiss Federal Inst. of Tech. Zurich<br />
Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples<br />
have been identified with their correct class labels. In practice, generalizability is frequently estimated by averaging the accuracies<br />
obtained on individual cross-validation folds. This procedure, however, is problematic in two ways. First, it does<br />
not allow for the derivation of meaningful confidence intervals. Second, it leads to an optimistic estimate when a biased<br />
classifier is tested on an imbalanced dataset. We show that both problems can be overcome by replacing the conventional<br />
point estimate of accuracy by an estimate of the posterior distribution of the balanced accuracy.<br />
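The replacement the abstract argues for can be sketched numerically. Below is a minimal illustration under flat Beta priors; the function name and its arguments are hypothetical, not the authors' code: with a Beta(1, 1) prior, sensitivity and specificity have Beta posteriors, and the balanced accuracy is their average.<br />

```python
import numpy as np

def balanced_accuracy_posterior(tp, fn, tn, fp, n_samples=100_000, seed=0):
    """Draw samples from the posterior of the balanced accuracy.

    With flat Beta(1, 1) priors, sensitivity has posterior
    Beta(tp + 1, fn + 1) and specificity has Beta(tn + 1, fp + 1);
    the balanced accuracy is their average.
    """
    rng = np.random.default_rng(seed)
    sens = rng.beta(tp + 1, fn + 1, n_samples)
    spec = rng.beta(tn + 1, fp + 1, n_samples)
    return 0.5 * (sens + spec)

# A biased classifier on an imbalanced test set: all 90 positives right,
# all 10 negatives wrong. Conventional accuracy is 0.90, yet the posterior
# of the balanced accuracy concentrates near 0.5 and yields a credible interval.
post = balanced_accuracy_posterior(tp=90, fn=0, tn=0, fp=10)
lo, hi = np.percentile(post, [2.5, 97.5])
```

Unlike averaging fold accuracies, the posterior directly exposes both problems the abstract names: it gives a meaningful interval, and it is not fooled by class imbalance.<br />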
WeBCT9, Lower Foyer<br />
Multimedia Analysis and Retrieval, Poster Session<br />
Session chair: Cetin, E. (Bilkent Univ.)<br />
13:30-16:30, Paper WeBCT9.1<br />
A Study on Detecting Patterns in Twitter Intra-Topic User and Message Clustering<br />
Cheong, Marc, Monash Univ.<br />
Lee, Vincent C S, Monash Univ.<br />
Timely detection of hidden patterns is key to the analysis and estimation of the driving determinants in mission-critical decision<br />
making. This study applies Cheong and Lee’s context-aware content analysis framework to extract latent properties<br />
from Twitter messages (tweets). In addition, we incorporate an unsupervised Self-Organizing Feature Map (SOM) as a machine<br />
learning-based clustering tool that has not been investigated in the context of opinion mining and sentiment analysis<br />
using microblogging. Our experimental results reveal the detection of interesting patterns for topics of interest which are<br />
latent and cannot be easily detected from the observed tweets without the aid of machine learning tools.<br />
13:30-16:30, Paper WeBCT9.2<br />
Classification of Near-Duplicate Video Segments based on Their Appearance Patterns<br />
Ide, Ichiro, Nagoya Univ.<br />
Shamoto, Yuji, Nagoya Univ.<br />
Deguchi, Daisuke, Nagoya Univ.<br />
Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />
Murase, Hiroshi, Nagoya Univ.<br />
We propose a method that analyzes the structure of a large volume of general broadcast video data by the appearance patterns<br />
of near-duplicate video segments. We define six classification rules based on the appearance patterns of near-duplicate video<br />
segments according to their roles, and evaluate them on more than 1,000 hours of actual broadcast video data.<br />
13:30-16:30, Paper WeBCT9.3<br />
Motion Vector based Features for Content based Video Copy Detection<br />
Tasdemir, Kasim, Bilkent Univ.<br />
Cetin, E., Bilkent Univ.<br />
In this article, we propose a motion vector based feature set for Content-Based Copy Detection (CBCD) of video clips. The<br />
motion vectors of image frames are one of the signatures of a given video. However, they are not descriptive enough when<br />
consecutive image frames are used, because most vectors are too small. To overcome this problem, we calculate motion vectors<br />
at a lower frame rate than the actual frame rate of the video. As a result, we obtain longer vectors, which form a robust<br />
parameter set representing a given video. Experimental results are presented.<br />
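As an illustrative sketch (not the authors' implementation), a motion vector can be computed by exhaustive block matching between two frames; sampling frames at a lower rate, as the abstract proposes, makes the recovered vectors longer and hence more descriptive.<br />

```python
import numpy as np

def block_motion_vector(prev, curr, y, x, block=8, search=8):
    """Exhaustive block matching: find the displacement (dy, dx) of the
    block at (y, x) in `curr` relative to `prev` by minimizing the sum
    of absolute differences (SAD)."""
    ref = curr[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            cand = prev[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = int(np.abs(cand - ref).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Two synthetic "frames" that differ by a global shift; with a larger
# temporal gap between the frames the shift, and thus the vector, grows.
rng = np.random.default_rng(1)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, (3, 5), axis=(0, 1))
mv = block_motion_vector(prev, curr, 16, 16)
```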
13:30-16:30, Paper WeBCT9.4<br />
A Statistical Learning Approach to Spatial Context Exploitation for Semantic Image Analysis<br />
Papadopoulos, Georgios Th., Centre for Res. and Tech. Hellas<br />
Mezaris, Vasileios, Centre for Res. and Tech. Hellas<br />
Kompatsiaris, Yiannis, Centre for Res. and Tech. Hellas<br />
Strintzis, Michael-Gerasimos,<br />
In this paper, a statistical learning approach to spatial context exploitation for semantic image analysis is presented. The proposed<br />
method constitutes an extension of the key parts of the authors’ previous work on spatial context utilization, where a Genetic<br />
Algorithm (GA) was introduced for exploiting fuzzy directional relations after performing an initial classification of image regions<br />
to semantic concepts using solely visual information. In the extensions reported in this work, a more elaborate approach<br />
is followed during the spatial knowledge acquisition and modeling process. Additionally, the impact of every resulting spatial<br />
constraint on the final outcome is adaptively adjusted. Experimental results as well as comparative evaluation on three datasets<br />
of varying complexity in terms of the total number of supported semantic concepts demonstrate the efficiency of the proposed<br />
method.<br />
13:30-16:30, Paper WeBCT9.5<br />
Wavelet-Based Texture Retrieval Modeling the Magnitudes of Wavelet Detail Coefficients with a Generalized Gamma Distribution<br />
De Ves Cuenca, Esther, Univ. of Valencia<br />
Benavent, Xaro, Univ. of Valencia<br />
Ruedin, Ana María Clara, Univ. de Buenos Aires<br />
Acevedo, Daniel Germán, Univ. de Buenos Aires<br />
Seijas, Leticia María, Univ. de Buenos Aires<br />
This paper presents a texture descriptor based on the fine detail coefficients at three resolution levels of a translation-invariant<br />
undecimated wavelet transform. First, we consider vertical and horizontal wavelet detail coefficients at the same position as the<br />
components of a bivariate random vector, and the magnitude and angle of these vectors are computed. The magnitudes are modeled<br />
by a Generalized Gamma distribution. Their parameters, together with the circular histograms of angles, are used to characterize<br />
each texture image of the database. The Kullback-Leibler divergence is used as the similarity measurement. Retrieval<br />
experiments, in which we compare two wavelet transforms, are carried out on the Brodatz texture collection. Results reveal the<br />
good performance of this wavelet-based texture descriptor obtained via the Generalized Gamma distribution.<br />
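The Kullback-Leibler divergence used as the similarity measurement can be sketched in its generic discrete form over quantized magnitude histograms (an illustration only; the authors may use a closed-form expression for fitted Generalized Gamma models):<br />

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q) between two
    histograms, normalized to sum to one; `eps` guards empty bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Note that the KL divergence is asymmetric, D(p || q) != D(q || p), so retrieval systems sometimes symmetrize it by summing both directions.<br />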
13:30-16:30, Paper WeBCT9.6<br />
3D-Shape Retrieval using Curves and HMM<br />
Tabia, Hedi, Lagis Univ. Lille 1<br />
Daoudi, Mohamed, TELECOM Lille1<br />
Vandeborre, Jean-Philippe, Univ. of Lille 1<br />
Colot, Olivier, Univ. Lille 1<br />
In this paper, we propose a new approach for 3D-shape matching. This approach comprises an off-line step and an on-line step.<br />
In the off-line step, an alphabet, of which any shape can be composed, is constructed. First, 3D-objects are subdivided into a set<br />
of 3D-parts. The subdivision consists of extracting from each object a set of feature points with associated curves. Then the whole<br />
set of 3D-parts is clustered into different classes from a semantic point of view. After that, each class is modeled by a Hidden<br />
Markov Model (HMM). The HMM, which represents a character in the alphabet, is trained using the set of curves corresponding<br />
to the class parts. Hence, any 3D-object can be represented by a set of characters. The on-line step consists of comparing the set<br />
of characters representing the 3D-object query with that of each object in the given dataset. The experimental results obtained<br />
on the TOSCA dataset show that the system performs efficiently in retrieving similar 3D-models.<br />
13:30-16:30, Paper WeBCT9.7<br />
Fast Fingerprint Retrieval with Line Detection<br />
Lian, Hui-Cheng, Shanghai University<br />
In this paper, a retrieval method is proposed for audio and video fingerprinting systems by adopting a line detection technique.<br />
To achieve fast retrieval, lines are generated from the sub-fingerprints of the query and the database, and non-candidate lines are<br />
filtered out, so that the distance between the query and the references can be computed quickly. To demonstrate the superiority of<br />
this method, audio fingerprints and video fingerprints are generated for comparison. The experimental results indicate that the<br />
proposed method outperforms the direct hashing method.<br />
13:30-16:30, Paper WeBCT9.8<br />
A High-Dimensional Access Method for Approximated Similarity Search in Text Mining<br />
Artigas-Fuentes, Fernando José, Univ. de Oriente, CERPAMID<br />
Badía-Contelles, José Manuel, Univ. Jaume I, Castellón<br />
Gil-García, Reynaldo, Univ. de Oriente, CERPAMID<br />
In this paper, a new access method for very high-dimensional data space is proposed. The method uses a graph structure and<br />
pivots for indexing objects, such as documents in text mining. It also applies a simple search algorithm that uses distance- or<br />
similarity-based functions in order to obtain the k-nearest neighbors of novel query objects. This method shows good selectivity<br />
over very high-dimensional data spaces, and better performance than other state-of-the-art methods. Although it is a probabilistic<br />
method, it shows a low error rate. The method is evaluated on data sets from the well-known Reuters corpus<br />
version 1 collection (RCV1-v2), dealing with thousands of dimensions.<br />
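A standard pivot trick in metric-space indexing (illustrative, not necessarily the authors' exact scheme) lower-bounds the query-object distance from precomputed distances to pivots via the triangle inequality, so most objects can be skipped without computing their true distance:<br />

```python
import numpy as np

def pivot_lower_bound(q_to_pivots, o_to_pivots):
    """Triangle-inequality lower bound on d(q, o) given the distances
    of q and o to a shared set of pivots:
    d(q, o) >= max_p |d(q, p) - d(o, p)|."""
    return float(np.max(np.abs(np.asarray(q_to_pivots) - np.asarray(o_to_pivots))))

def filter_candidates(q_to_pivots, objects_to_pivots, radius):
    """Keep only objects whose lower bound is within `radius`; the true
    (expensive) distance is then computed only for these candidates."""
    return [i for i, o in enumerate(objects_to_pivots)
            if pivot_lower_bound(q_to_pivots, o) <= radius]
```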
13:30-16:30, Paper WeBCT9.9<br />
3D Model Comparison through Kernel Density Matching<br />
Wang, Yiming, Nanjing Univ.<br />
Lu, Tong, Nanjing Univ.<br />
Gao, Rongjun, Nanjing Univ.<br />
Liu, Wenyin, City U of HK<br />
A novel 3D shape matching method is proposed in this paper. We first extract angular and distance feature pairs from preprocessed<br />
3D models, then estimate their kernel densities after quantifying the feature pairs into a fixed number of bins. During<br />
3D matching, we adopt the KL-divergence as the distance for 3D comparison. Experimental results show that our method is effective<br />
in matching similar 3D shapes, and robust to model deformations and rotation transformations.<br />
13:30-16:30, Paper WeBCT9.10<br />
Improving the Efficiency of Content-Based Multimedia Exploration<br />
Beecks, Christian, RWTH Aachen Univ.<br />
Wiedenfeld, Sascha, RWTH Aachen Univ.<br />
Seidl, Thomas, RWTH Aachen Univ.<br />
Visual exploration systems enable users to search, browse, and explore voluminous multimedia databases in an interactive and<br />
playful manner. Whether users know the database’s contents in advance or not, these systems guide the user’s exploration<br />
process by visualizing the database contents and allowing him or her to issue queries intuitively. In order to improve the efficiency<br />
of content-based visual exploration systems, we propose an efficient query evaluation scheme which aims at reducing the total<br />
number of costly similarity computations. We evaluate our approach on different state-of-the-art image databases.<br />
13:30-16:30, Paper WeBCT9.11<br />
Tertiary Hash Tree: Indexing Structure for Content-Based Image Retrieval<br />
Tak, Yoon-Sik, Korea Univ.<br />
Hwang, Eenjun, Korea Univ.<br />
Dominant features for content-based image retrieval usually consist of high-dimensional values. So far, much research<br />
has been done on indexing such values for fast retrieval. Still, many existing indexing schemes suffer from performance<br />
degradation due to the curse of dimensionality. As an alternative, heuristic algorithms have been proposed to calculate<br />
the result with high probability at the cost of accuracy. In this paper, we propose a new hash tree-based indexing structure<br />
called tertiary hash tree for indexing high-dimensional feature values. Tertiary hash tree provides several advantages compared<br />
to the traditional extendible hash structure in terms of resource usage and search performance. Through extensive experiments,<br />
we show that our proposed index structure achieves outstanding performance.<br />
13:30-16:30, Paper WeBCT9.12<br />
An Augmented Reality Setup with an Omnidirectional Camera based on Multiple Object Detection<br />
Hayashi, Tomoki, Keio Univ.<br />
Uchiyama, Hideaki, Keio Univ.<br />
Pilet, Julien, Keio Univ.<br />
Saito, Hideo, Keio Univ.<br />
We propose a novel augmented reality (AR) setup with an omnidirectional camera on a tabletop display. The table acts as<br />
a mirror on which real playing cards appear augmented with virtual elements. The omnidirectional camera captures and recognizes<br />
its surroundings based on a feature-based image retrieval approach which achieves fast and scalable registration. It<br />
allows our system to superimpose virtual visual effects on the omnidirectional camera image. In our AR card game, users sit<br />
around a tabletop display and show a card to the other players. The system recognizes it and augments it with virtual elements<br />
in the omnidirectional image acting as a mirror. While playing the game, the users can interact with each other directly and<br />
through the display. Our setup is a new, simple, and natural approach to augmented reality. It opens new doors to traditional<br />
card games.<br />
13:30-16:30, Paper WeBCT9.13<br />
Enhancing SVM Active Learning for Image Retrieval using Semi-Supervised Bias-Ensemble<br />
Wu, Jun, Dalian Maritime Univ.<br />
Lu, Ming-Yu, Dalian Maritime Univ.<br />
Wang, Chun-Li, Dalian Maritime Univ.<br />
Support vector machine (SVM) based active learning plays a key role in alleviating the burden of labeling in relevance<br />
feedback. However, most SVM-based active learning algorithms are challenged by the small example problem and the<br />
asymmetric distribution problem. This paper proposes a novel active learning scheme that deals with SVM ensembles under the<br />
semi-supervised setting to address the first problem. For the second problem, a bias-ensemble mechanism is developed to<br />
guide the classification model to pay more attention to the positive examples than to the negative ones. An empirical study<br />
shows that the proposed scheme is significantly more effective than some existing approaches.<br />
13:30-16:30, Paper WeBCT9.14<br />
Interactive Browsing of Remote JPEG 2000 Image Sequences<br />
Garcia Ortiz, Juan Pablo, Univ. of Almeria<br />
Ruiz, Gonzalez V., Univ. of Almeria<br />
Garcia, I., Univ. of Almeria<br />
Müller, D., European Space Agency/NASA<br />
Dimitoglou, G., European Space Agency/NASA<br />
This paper studies a novel prefetching scheme for the remote browsing of sequences of high-resolution JPEG 2000 images.<br />
Using this scheme, a user is able to randomly select any of the remote images for analysis, repeating this process with<br />
other images after some undefined time. Our solution is proposed in a low bit-rate communication context, where the<br />
complete transmission of any of the images for its lossless recovery would take too much time for interactive visualization.<br />
For this reason, quality scalability is used in order to minimize the decoding latency. Frequently, the user can also play a<br />
“video”, moving sequentially over the neighbouring (temporally consecutive, previous or following) images of the currently<br />
displayed one. With the objective of also hiding the link latency, the proposed data scheduler transmits in parallel data of the<br />
currently displayed image and data of the temporally adjacent images. This scheduler uses a model based<br />
on the quality progression of the image in order to estimate what percentage of the bandwidth is dedicated to prefetching data.<br />
Our experimental results show that a significant benefit can be achieved in terms of both subjective quality and responsiveness<br />
by means of prefetching.<br />
13:30-16:30, Paper WeBCT9.15<br />
Binarization of Color Characters in Scene Images using K-Means Clustering and Support Vector Machines<br />
Wakahara, Toru, Hosei Univ.<br />
Kita, Kohei, Hosei Univ.<br />
This paper proposes a new technique for binarizing multicolored characters subject to heavy degradation. The key ideas are<br />
threefold. The first is the generation of tentatively binarized images via every dichotomization of the k clusters obtained by k-means<br />
clustering in the HSI color space; the total number of tentatively binarized images equals 2^k - 2. The second is the use of support<br />
vector machines (SVM) to determine whether, and to what degree, each tentatively binarized image represents a character or<br />
non-character. We feed the SVM with mesh and weighted direction code histogram features to output the degree of character-likeness.<br />
The third is the selection of the single binarized image with the maximum degree of character-likeness as the optimal<br />
binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust<br />
OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.<br />
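The count of tentative binarizations follows from enumerating every split of the k color clusters into a non-empty foreground and a non-empty background; a sketch of the combinatorics, assuming ordered foreground/background splits:<br />

```python
from itertools import combinations

def dichotomies(k):
    """All ways to split k color clusters into a non-empty foreground
    set and a non-empty background set: 2**k - 2 ordered splits."""
    clusters = set(range(k))
    return [(set(fg), clusters - set(fg))
            for r in range(1, k)
            for fg in combinations(sorted(clusters), r)]

# k = 4 clusters yield 2**4 - 2 = 14 tentatively binarized images
splits = dichotomies(4)
```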
13:30-16:30, Paper WeBCT9.16<br />
A Self-Training Learning Document Binarization Framework<br />
Su, Bolan, National Univ. of Singapore<br />
Lu, Shijian, -<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Document image binarization techniques have been studied for many years, and many practical binarization techniques have<br />
been developed and applied successfully in commercial document analysis systems. However, the current state-of-the-art<br />
methods fail to produce good binarization results for many badly degraded document images. In this paper, we propose a<br />
self-training learning framework for document image binarization. Based on reported binarization methods, the proposed<br />
framework first divides document image pixels into three categories, namely foreground pixels, background pixels and uncertain<br />
pixels. A classifier is then trained by learning from the document image pixels in the foreground and background categories.<br />
Finally, the uncertain pixels are classified using the learned pixel classifier. Extensive experiments have been<br />
conducted on the dataset used in the recent Document Image Binarization Contest (DIBCO) 2009. Experimental results<br />
show that our proposed framework significantly improves the performance of reported document image binarization<br />
methods.<br />
13:30-16:30, Paper WeBCT9.17<br />
Novel Edge Features for Text Frame Classification in Video<br />
Palaiahnakote, Shivakumara, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Text frame classification is needed in many multimedia applications such as event identification, exact event boundary identification,<br />
navigation, and video surveillance. To the best of our knowledge, no methods dedicated solely to text frame<br />
classification have been reported so far. Hence, this paper presents a new approach to text frame classification in video based on<br />
capturing local observable edge properties of text frames, by virtue of the strong presence of sharp edges, the straight appearance<br />
of edges, and the consistent proximity between edges. The approach initially classifies the blocks of a frame into text blocks<br />
and non-text blocks. True text blocks are then identified among the classified text blocks by the proposed<br />
features. If a frame contains at least one true text block, it is considered a text frame; otherwise, it is a non-text frame. We<br />
evaluate the proposed approach on a large database containing both text and non-text frames, and on publicly available data, at<br />
two levels, i.e., estimating recall and precision at the block level and the frame level.<br />
13:30-16:30, Paper WeBCT9.18<br />
Image Matching and Retrieval by Repetitive Patterns<br />
Doubek, Petr, Czech Tech. Univ. in Prague<br />
Matas, Jiri, Czech Tech. Univ. in Prague<br />
Perdoch, Michal, Czech Tech. Univ. in Prague<br />
Chum, Ondrej, Czech Tech. Univ. in Prague<br />
Detection of repetitive patterns in images has been studied for a long time in computer vision. This paper discusses a<br />
method for representing a lattice or line pattern by a shift-invariant descriptor of the repeating element. The descriptor overcomes<br />
shift ambiguity and can be matched between different views. The pattern matching is then demonstrated in a retrieval<br />
experiment, where different images of the same buildings are retrieved solely by their repetitive patterns.<br />
13:30-16:30, Paper WeBCT9.19<br />
An Approach for Recognizing Text Labels in Raster Maps<br />
Chiang, Yao-Yi, USC ISI<br />
Knoblock, Craig, USC ISI<br />
Text labels in raster maps provide valuable geospatial information by associating geographical names with geospatial locations.<br />
Although present commercial optical character recognition (OCR) products can achieve a high recognition rate<br />
on documents containing text lines of the same orientation, text recognition on raster maps is challenging due to the varying<br />
text orientations and the overlap of text labels. This paper presents a text recognition approach that focuses on locating individual<br />
text labels in the map and detecting their orientations to then leverage the horizontal text recognition capability<br />
of commercial OCR software. We show that our approach detects accurate string orientations and achieves 96.2% precision<br />
and 94.7% recall on character recognition and 80.6% precision and 84.1% recall on word recognition.<br />
13:30-16:30, Paper WeBCT9.20<br />
Local Visual Pattern Indexing for Matching Screenshots with Videos<br />
Poullot, Sebastien, National Inst. of Informatics<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
In this paper, a particular issue is addressed: matching still images (screenshots) with videos. A content-based similarity<br />
search approach using image queries is proposed. A fast method based on local visual patterns both for matching and indexing<br />
is employed. However, we argue that using every frame may limit the scalability of the approach. Therefore, only<br />
keyframes are extracted and used. The main contribution of this paper is an investigation over the trade-off between accuracy<br />
and scalability using different keyframe rates for sampling the video database. This trade-off is evaluated on a<br />
ground truth using a large reference video database (1000 hours).<br />
13:30-16:30, Paper WeBCT9.21<br />
Suggesting Songs for Media Creation using Semantics<br />
Joshi, Dhiraj, Kodak Res. Lab.<br />
Wood, Mark, Eastman Kodak Company<br />
Luo, Jiebo, -<br />
In this paper, we describe a method for matching song lyrics with semantic annotations of picture collections in order to<br />
suggest songs that reflect picture content in lyrics or genre. Picture collections are first analyzed to extract a variety of semantic<br />
information including scene type, event type, and geospatial information. When aggregated over a picture collection,<br />
this semantic information forms a semantic signature of the collection. Typical picture collections in our scenario consist<br />
of photo subdirectories in which people store pictures of a place, activity, or event. Picture collections are expected to<br />
contain coherent semantic content describing in part or whole the event or activity they depict. The semantic signature of<br />
a picture collection is compared against song lyrics using WordNet-expansion-based text matching to find songs relevant<br />
to the collection. We present interesting song suggestions, compare and contrast scenarios with human versus machine labels,<br />
and perform a user study to validate the usefulness of the proposed method. The proposed method will be a useful<br />
tool to support user media creation.<br />
13:30-16:30, Paper WeBCT9.22<br />
Color Feature based Approach for Determining Ink Age in Printed Documents<br />
Halder, Biswajit, Mallabhum Inst. of Tech.<br />
Garain, Utpal, Indian Statistical Inst.<br />
Answering a query such as when a particular document was printed is quite helpful in practice, especially for forensic purposes.<br />
This study attempts to develop a general framework that makes use of image processing and pattern recognition principles<br />
for ink age determination in printed documents. The approach first computationally extracts a set of suitable color features<br />
and then analyzes them to properly associate them with ink age. Finally, a neural net is designed and trained to determine the<br />
ages of unknown samples. The dataset used for the present experiment consists of the cover pages of LIFE<br />
magazines published between the 1930s and 1970s (five decades). Test results show that the proposed framework is viable<br />
for involving machines in assisting human experts in determining the age of printed documents.<br />
13:30-16:30, Paper WeBCT9.23<br />
Automatic Detection and Localization of Natural Scene Text in Video<br />
Huang, Xiaodong, Beijing Univ. of Posts and Telecommunications<br />
Ma, Huadong, Beijing Univ. of Posts and Telecommunications<br />
Video scene text contains semantic information and thus can contribute significantly to video indexing and summarization.<br />
However, most previous approaches to detecting scene text in videos have difficulty handling text<br />
with varying character sizes and alignments. In this paper, we propose a novel algorithm for scene text detection and<br />
localization in video. Based on our observation that character strokes exhibit dense edge details in fixed orientations<br />
regardless of text alignment and size, a stroke map is first generated. For scene text detection, we extract<br />
the texture features of the stroke map to locate text lines. The detected scene text lines are then accurately localized using Harris<br />
corners in the stroke map. Experimental results show that this approach is robust and can be effectively applied to scene<br />
text detection and localization in video.<br />
13:30-16:30, Paper WeBCT9.24<br />
High-Level Feature Extraction using SIFT GMMs and Audio Models<br />
Inoue, Nakamasa, Tokyo Inst. of Tech.<br />
Saito, Tatsuhiko, Tokyo Inst. of Tech.<br />
Shinoda, Koichi, Tokyo Inst. of Tech.<br />
Furui, Sadaoki,<br />
We propose a statistical framework for high-level feature extraction that uses SIFT Gaussian mixture models (GMMs)<br />
and audio models. SIFT features were extracted from all the image frames and modeled by a GMM. In addition, we used<br />
mel-frequency cepstral coefficients and ergodic hidden Markov models to detect high-level features in audio streams. The<br />
best result obtained by using SIFT GMMs in terms of mean average precision on the TRECVID 2009 corpus was 0.150<br />
and was improved to 0.164 by using audio information.<br />
13:30-16:30, Paper WeBCT9.25<br />
Pairwise Features for Human Action Recognition<br />
Ta, Anh Phuong, Univ. de Lyon, CNRS, INSA-Lyon, LIRIS<br />
Wolf, Christian, INSA de Lyon<br />
Lavoue, Guillaume, Univ. de Lyon, CNRS<br />
Baskurt, Atilla, LIRIS, INSA Lyon<br />
Jolion, Jean-Michel, Univ. de Lyon<br />
Existing action recognition approaches mainly rely on the discriminative power of individual local descriptors extracted<br />
from spatio-temporal interest points (STIP), while the geometric relationships among the local features are ignored. This<br />
paper presents new features, called pairwise features (PWF), which encode both the appearance and the spatio-temporal<br />
relations of the local features for action recognition. First STIPs are extracted, then PWFs are constructed by grouping<br />
pairs of STIPs which are close in both space and time. We propose a combination of two codebooks for video<br />
representation. Experiments on two standard human action datasets: the KTH dataset and the Weizmann dataset show that<br />
the proposed approach outperforms most existing methods.<br />
13:30-16:30, Paper WeBCT9.26<br />
Group Activity Recognition by Gaussian Processes Estimation<br />
Cheng, Zhongwei, Chinese Acad. of Sciences<br />
Qin, Lei, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Human action recognition has been well studied recently, but recognizing the activities of more than three persons remains<br />
a challenging task. In this paper, we propose a motion-trajectory-based method to classify human group activities. Gaussian<br />
processes are introduced to represent human motion trajectories from a probabilistic perspective, handling the variability<br />
of people's activities in groups. With respect to the relationships among persons in group activities, three discriminative descriptors<br />
are designed: the Individual, Dual, and Unitized Group Activity Patterns. We adopt the bag-of-words approach<br />
to solve the problem of the unbalanced number of persons across different activities. Experiments are conducted on the<br />
human group-activity video database, and the results show that our approach outperforms the state-of-the-art.<br />
13:30-16:30, Paper WeBCT9.27<br />
Extracting Captions in Complex Background from Videos<br />
Liu, Xiaoqian, Chinese Acad. of Sciences<br />
Wang, Weiqiang, Chinese Acad. of Sciences<br />
Captions in videos play a significant role for automatically understanding and indexing video content, since much semantic<br />
information is associated with them. This paper presents an effective approach to extracting captions from videos, in which<br />
multiple different categories of features (edge, color, stroke etc.) are utilized, and the spatio-temporal characteristics of<br />
captions are considered. First, our method exploits the distribution of gradient directions to temporally decompose a video<br />
into a sequence of clips, so that each clip contains at most one caption, which makes the subsequent extraction computation<br />
more efficient and accurate. For each clip, edge and corner information is then utilized to locate text regions. Further,<br />
text pixels are extracted based on the assumption that text pixels in text regions always have homogeneous color, and their<br />
quantity dominates the region relative to non-text pixels with different colors. Finally, the segmentation results are further<br />
refined. The encouraging experimental results on 2565 characters have preliminarily validated our approach.<br />
13:30-16:30, Paper WeBCT9.28<br />
Keyframe-Guided Automatic Non-Linear Video Editing<br />
Rajgopalan, Vaishnavi, Concordia Univ.<br />
Ranganathan, Ananth, Honda Res. Inst. USA<br />
Rajagopalan, Ramgopal, Res. in Motion<br />
Mudur, Sudhir, Concordia Univ.<br />
We describe a system for generating coherent movies from a collection of unedited videos. The generation process is<br />
guided by one or more input keyframes, which determine the content of the generated video. The basic mechanism involves<br />
similarity analysis using the histogram intersection function. The function is applied to spatial pyramid histograms computed<br />
on the video frames in the collection using Dense SIFT features. A two-directional greedy path finding algorithm is<br />
used to select and arrange frames from the collection while maintaining visual similarity, coherence, and continuity. Our<br />
system demonstrates promising results on large video collections and is a first step towards increased automation in nonlinear<br />
video editing.<br />
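The histogram intersection function mentioned above has a simple closed form: the sum of bin-wise minima of two histograms. A minimal sketch (variable names are illustrative; spatial pyramid binning and Dense SIFT extraction are assumed to happen upstream):

```python
import numpy as np

def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Histogram intersection similarity: sum of bin-wise minima.
    For L1-normalized histograms the value lies in [0, 1]."""
    return float(np.minimum(h1, h2).sum())

a = np.array([0.2, 0.5, 0.3])  # L1-normalized toy histograms
b = np.array([0.3, 0.4, 0.3])
sim = histogram_intersection(a, b)
```

Identical histograms score 1.0; disjoint ones score 0.0, so the measure is directly usable for ranking frames.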
13:30-16:30, Paper WeBCT9.29<br />
Images in News<br />
Sankaranarayanan, Jagan, Univ. of Maryland<br />
Samet, Hanan, Univ. of Maryland<br />
A system, called NewsStand, is introduced that automatically extracts images from news articles. The system takes RSS feeds of news<br />
articles and applies an online clustering algorithm so that articles belonging to the same news topic are associated with the same cluster.<br />
Using the feature vector associated with the cluster, the images from news articles that form the cluster are extracted. First, the caption text<br />
associated with each of the images embedded in the news article is determined. This is done by analyzing the structure of the news article’s<br />
HTML page. If the caption and feature vector of the cluster are found to contain keywords in common, then the image is added to an image<br />
repository. Additional meta-information is then associated with each image, such as the caption, cluster features, and names of people in the news<br />
article, etc. A very large repository containing more than 983k images from 12 million news articles was built using this approach. This<br />
repository also contained more than 86.8 million keywords associated with the images. The key contribution of this work is that it combines<br />
clustering and natural language processing tasks to automatically create a large corpus of news images with good quality tags or meta-information<br />
so that interesting vision tasks can be performed on it.<br />
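The keyword test that decides whether an extracted image enters the repository can be as simple as checking term overlap between the caption and the cluster's feature terms. A hedged sketch (the tokenization and the keyword set are illustrative assumptions):

```python
def image_matches_cluster(caption: str, cluster_keywords: set) -> bool:
    """Admit an image to the repository only if its caption shares at
    least one keyword with the cluster's feature terms."""
    caption_terms = {w.strip(".,").lower() for w in caption.split()}
    return bool(caption_terms & cluster_keywords)

keywords = {"earthquake", "chile", "santiago"}
hit = image_matches_cluster("Rescue teams in Santiago, Chile", keywords)
miss = image_matches_cluster("Stock markets rally", keywords)
```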
13:30-16:30, Paper WeBCT9.30<br />
A Multimodal Approach to Violence Detection in Video Sharing Sites<br />
Giannakopoulos, Theodoros, Univ. of Athens<br />
Pikrakis, Aggelos, Univ. of Piraeus<br />
Theodoridis, Sergios, Univ. of Athens<br />
This paper presents a method for detecting violent content in video sharing sites. The proposed approach operates on a fusion<br />
of three modalities: audio, moving image and text data, the latter being collected from the accompanying user comments.<br />
The problem is treated as a binary classification task (violent vs non-violent content) on a 9-dimensional feature<br />
space, where 7 out of 9 features are extracted from the audio stream. The proposed method has been evaluated on 210<br />
YouTube videos and the overall accuracy has reached 82%.<br />
13:30-16:30, Paper WeBCT9.31<br />
Video Retrieval based on Tracked Features Quantization<br />
Kubo, Hiroaki, Keio Univ.<br />
Pilet, Julien, Keio Univ.<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
Saito, Hideo, Keio Univ.<br />
In this paper, we present an image retrieval method based on feature tracking. Feature tracks are summarized into a compact<br />
discrete value and used for video indexing purposes. As opposed to existing space-time features, we make no assumptions<br />
about the motion visible in the indexed videos. As a result, given an example query, our system is able to retrieve<br />
related videos from a large database. We evaluated our system with the copy detection benchmark MUSCLE-VCD-2007.<br />
We also ran a retrieval experiment on hours of TV broadcasts.<br />
13:30-16:30, Paper WeBCT9.32<br />
Interactive Web Video Advertising with Context Analysis and Search<br />
Wang, Bo, Chinese Acad. of Sciences<br />
Wang, Jinqiao, Chinese Acad. of Sciences<br />
Duan, Lingyu, Peking Univ.<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Lu, Hanqing, Chinese Acad. of Sciences<br />
Gao, Wen, PeKing Univ.<br />
Online media services and electronic commerce have been booming recently. Previous studies have been devoted to contextual<br />
advertising, but few works deal with interactive web advertising. In this paper, we propose to put users in the loop of collecting<br />
contextual ad information through an interaction process, establishing semantic ad links across media platforms. Given<br />
an ad video, the key frames with explicit product information are located, allowing users to click favorite key frames<br />
to search ads interactively. A three-stage contextual search is applied to find relevant products or services from web<br />
pages, i.e., searching visually similar product images on shopping websites, ranking product tags by text aggregation, and<br />
re-searching textual items consisting of semantically meaningful tags to make a recommendation. In addition, users can choose<br />
automatically suggested keywords to reflect their intentions. Subjective evaluation has demonstrated the effectiveness of<br />
the proposed approach to interactive video advertising over the Web.<br />
13:30-16:30, Paper WeBCT9.33<br />
Selection of Photos for Album Building Applications<br />
Egorova, Marta, National Nuclear Res. Univ.<br />
Safonov, Ilia, National Nuclear Res. Univ.<br />
In this work, we propose a new algorithm for selecting high-quality photos for album building applications. We describe<br />
how to select features for detecting well-exposed, sharp, and artifact-free photos. We considered two approaches: the first is the<br />
typical way, in which all features are used in a single AdaBoost classifier committee; the second uses a decision tree<br />
comprising three committees. Careful analysis of the features and decision tree construction allowed better outcomes to be reached.<br />
13:30-16:30, Paper WeBCT9.34<br />
Comparison of Multidimensional Data Access Methods for Feature-Based Image Retrieval<br />
Arslan, Serdar, Middle East Tech. Univ.<br />
Açar, Esra, Middle East Tech. Univ.<br />
Saçan, Ahmet, Middle East Tech. Univ.<br />
Toroslu, Ismail Hakkı , Middle East Tech. Univ.<br />
Yazıcı, Adnan, Middle East Tech. Univ.<br />
Within the scope of information retrieval, efficient similarity search in large document or multimedia collections is a<br />
critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem,<br />
including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy<br />
trade-offs for each of these methods are demonstrated on a large Corel image database. Similarity of images is obtained<br />
via a feature-based similarity measure using four MPEG-7 low-level descriptors. We show that an optimization of feature<br />
contributions to the distance measure can identify irrelevant features and is necessary to obtain the maximum accuracy.<br />
We further show that using multidimensional scaling can achieve comparable accuracy, while speeding-up the query times<br />
significantly by allowing the use of spatial access methods.<br />
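Multidimensional scaling, as used above to enable spatial access methods, maps items into a low-dimensional Euclidean space whose pairwise distances approximate the original feature-based distances. A minimal classical (Torgerson) MDS sketch, which is one standard variant and not necessarily the one used in the paper:

```python
import numpy as np

def classical_mds(D2: np.ndarray, k: int) -> np.ndarray:
    """Embed n items into k dimensions from an n x n matrix of squared
    distances D2, via classical (Torgerson) multidimensional scaling."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # double-centering matrix
    B = -0.5 * J @ D2 @ J                    # Gram matrix of the embedding
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # keep largest eigenvalues
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

# Three points on a line are recovered from their squared distances.
pts = np.array([[0.0], [1.0], [3.0]])
D2 = (pts - pts.T) ** 2
X = classical_mds(D2, 1)
```

The embedded coordinates can then be indexed with any spatial access method (e.g., an R-tree) to speed up queries.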
13:30-16:30, Paper WeBCT9.35<br />
A Pixel-Based Evaluation Method for Text Detection in Color Images<br />
Anthimopoulos, Marios, National Center for Scientific Res. “Demokritos”<br />
Vlissidis, Nikolaos, National Center for Scientific Res. “Demokritos”<br />
Gatos, B., National Center for Scientific Res. “Demokritos”<br />
This paper proposes a performance evaluation method for text detection in color images. The method, contrary to previous<br />
approaches, is not based on loosely defined text bounding boxes for evaluating the text detection result, but<br />
considers only the text pixels, detected by binarizing the image and applying a color inversion if needed. Moreover, in<br />
order to gain independence from the chosen binarization algorithm, the method uses the skeleton of the binarized image.<br />
The results produced by the proposed evaluation protocol proved to be quite representative and reasonable compared to<br />
the corresponding visual results.<br />
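The evaluation above reduces to pixel-set precision and recall: precision over the detected text pixels, recall over the ground-truth skeleton pixels. A minimal sketch (representing pixels as coordinate sets is an illustrative choice):

```python
def pixel_precision_recall(detected: set, gt_skeleton: set):
    """Pixel-level evaluation: precision over detected text pixels,
    recall over ground-truth skeleton pixels ((x, y) tuples)."""
    tp = len(detected & gt_skeleton)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(gt_skeleton) if gt_skeleton else 0.0
    return precision, recall

det = {(0, 0), (0, 1), (1, 1), (2, 2)}
gt = {(0, 1), (1, 1), (5, 5)}
p, r = pixel_precision_recall(det, gt)
```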
13:30-16:30, Paper WeBCT9.36<br />
Active Boosting for Interactive Object Retrieval<br />
Lechervy, Alexis, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />
Gosselin, Philippe Henri, CNRS<br />
Precioso, Frederic, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />
This paper presents a new algorithm based on boosting for interactive object retrieval in images. Recent works propose<br />
online boosting algorithms where weak classifier sets are iteratively trained from data. These algorithms are proposed for<br />
visual tracking in videos, and are not well adapted to online boosting for interactive retrieval. We propose in this paper to<br />
iteratively build weak classifiers from images, labeled as positive by the user during a retrieval session. A novel active<br />
learning strategy for selecting images for user annotation is also proposed. This strategy is used to enhance the<br />
strong classifier resulting from the boosting process, but also to build new weak classifiers. Experiments have been carried<br />
out on a generalist database in order to compare the proposed method to an SVM-based reference approach.<br />
13:30-16:30, Paper WeBCT9.37<br />
Geotagged Photo Recognition using Corresponding Aerial Photos with Multiple Kernel Learning<br />
Yaegashi, Keita, Univ. of Electro-Communications<br />
Yanai, Keiji, Univ. of Electro-Communications<br />
In this paper, we address generic object recognition for geotagged images. As a recognition method for geotagged photos,<br />
we have previously proposed exploiting aerial photos around geotagged places as additional image features for the visual recognition<br />
of geotagged photos. In the previous work, we simply concatenated the two kinds of features to fuse them. Instead, in this paper,<br />
we introduce Multiple Kernel Learning (MKL) to integrate the features of photos and aerial images. MKL can estimate<br />
the contribution weights for integrating both kinds of features. In the experiments, we confirmed the effectiveness of using<br />
aerial photos for the recognition of geotagged photos, and we evaluated the weights of both features estimated by MKL for<br />
eighteen concepts.<br />
13:30-16:30, Paper WeBCT9.38<br />
Efficient Semantic Indexing for Image Retrieval<br />
Pulla, Chandrika, International Inst. of Information Tech. Hyderabad<br />
Karthik, Suman, International Inst. of Information Tech. Hyderabad<br />
Jawahar, C. V., IIIT<br />
Semantic analysis of a document collection can be viewed as an unsupervised clustering of the constituent words and documents<br />
around hidden or latent concepts. This has been shown to improve the performance of visual bag-of-words in image retrieval.<br />
However, the enhancement in performance depends heavily on the right choice of the number of semantic concepts.<br />
Most semantic indexing schemes are also computationally costly. In this paper, we employ a bipartite graph model<br />
(BGM) for image retrieval. BGM is a scalable data structure that aids semantic indexing in an efficient manner, and it can<br />
be incrementally updated. BGM uses tf-idf values for building a semantic bipartite graph. We also introduce a<br />
graph partitioning algorithm that works on the BGM to retrieve semantically relevant images from a database. We demonstrate<br />
the properties as well as performance of our semantic indexing scheme through a series of experiments. We also<br />
compare our methods with incremental pLSA.<br />
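The tf-idf weights that BGM assigns to word-document edges follow the usual term-frequency times inverse-document-frequency form. A minimal sketch (the smoothed idf variant shown is one common choice, not necessarily the paper's exact formula):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document tf-idf weights, usable as edge weights in a
    word-document bipartite graph (smoothed idf variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # document frequency per word
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(1 + n / df[w])
                        for w, c in tf.items()})
    return weights

docs = [["sky", "tree", "sky"], ["tree", "car"], ["car", "road"]]
W = tf_idf(docs)
```

A word that is frequent in one document but rare across the collection ("sky" here) gets a higher weight than a widespread one ("tree").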
13:30-16:30, Paper WeBCT9.39<br />
Improving and Aligning Speech with Presentation Slides<br />
Swaminathan, Ranjini, Univ. of Arizona<br />
Thompson, Michael E., Univ. of Arizona<br />
Fong, Sandiway, Univ. of Arizona<br />
Efrat, Alon, Univ. of Arizona<br />
Amir, Arnon<br />
Barnard, Kobus, Univ. of Arizona<br />
We present a novel method to correct automatically generated speech transcripts of talks and lecture videos using text<br />
from accompanying presentation slides. The approach finesses the challenges of dealing with technical terms which are<br />
often outside the vocabulary of speech recognizers. Further, we align the transcript to the slide word sequence so that we<br />
can improve the organization of closed captioning for hearing impaired users, and improve automatic highlighting or magnification<br />
for visually impaired users. For each speech segment associated with a slide, we construct a sequential Hidden<br />
Markov Model for the observed phonemes that follows slide word order, interspersed with text not on the slide. Incongruence<br />
between slide words and mistaken transcript words is accounted for using phoneme confusion probabilities. Hence,<br />
transcript words different from aligned high probability slide words can be corrected. Experiments on six talks show improvement<br />
in transcript accuracy and alignment with slide words.<br />
13:30-16:30, Paper WeBCT9.40<br />
The ImageCLEF Medical Retrieval Task at ICPR 2010 - Information Fusion<br />
Kalpathy-Cramer, Jayashree, Oregon Health & Science Univ.<br />
Müller, Henning, Univ. of Applied Sciences<br />
An increasing number of clinicians, researchers, educators and patients routinely search for medical information on the<br />
Internet as well as in image archives. However, image retrieval is far less understood and developed than text-based search.<br />
The ImageCLEF medical image retrieval task is an international benchmark that enables researchers to assess and compare<br />
techniques for medical image retrieval using standard test collections. Although text retrieval is mature and well researched,<br />
it is limited by the quality and availability of the annotations associated with the images. Advances in computer vision<br />
have led to methods for using the image itself as the search entity. However, the success of purely content-based techniques<br />
has been limited, and these systems have not had much clinical success. On the other hand, a combination of text- and content-based<br />
retrieval can achieve improved retrieval performance if combined effectively. Based on experience in ImageCLEF, combining<br />
visual and textual runs is not trivial. The goal of the fusion challenge at ICPR is to encourage participants<br />
to combine visual and textual results to improve search performance. Participants were provided with textual and visual runs,<br />
as well as the results of the manual judgments from ImageCLEFmed 2008, as training data. The goal was to combine<br />
textual and visual runs from 2009. In this paper, we present the results from this ICPR contest.<br />
13:30-16:30, Paper WeBCT9.41<br />
Unified Approach to Detection and Identification of Commercial Films by Temporal Occurrence Pattern<br />
Putpuek, Narongsak, Chulalongkorn Univ.<br />
Cooharojananone, Nagul, Chulalongkorn Univ.<br />
Lursinsap, Chidchanok, Chulalongkorn Univ.<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
In this paper, we propose a method to detect and identify commercial films in broadcast videos by using the Temporal Occurrence<br />
Pattern (TOP). Our method uses the characteristic of broadcast videos in Japan that each individual commercial<br />
film appears multiple times in the broadcast stream and typically has the same duration (e.g., 15 seconds). Using this characteristic,<br />
the method can detect as well as identify individual commercial films within a given video archive. Based on a simple<br />
signature (global feature) for each frame image, the method first puts all frames into a number of buckets, where each bucket<br />
contains frames having the same signature and thus appearing the same. For each bucket, a TOP, a binary sequence<br />
representing the occurrence times within the video archive, is then generated. All buckets are then clustered using simple hierarchical<br />
clustering, with the similarity between TOPs allowing a possible temporal offset. This clustering stage can stitch together all<br />
frames of each commercial film and identify multiple occurrences of the same commercial film at the same time. We tested<br />
our method on an actual broadcast video archive and confirmed good performance in detecting and identifying commercial<br />
films.<br />
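A TOP is just a binary sequence marking when a bucket's signature occurs in the archive; comparing two TOPs while tolerating a temporal offset can be done by taking the best Jaccard score over small shifts. A hedged sketch (the offset-tolerant Jaccard form is an illustrative assumption, not the paper's exact similarity):

```python
def top_similarity(a, b, max_offset=2):
    """Best Jaccard similarity between two binary Temporal Occurrence
    Patterns over all temporal offsets in [-max_offset, max_offset]."""
    best, n = 0.0, len(a)
    for off in range(-max_offset, max_offset + 1):
        inter = union = 0
        for i in range(n):
            bj = b[i + off] if 0 <= i + off < n else 0
            inter += a[i] & bj
            union += a[i] | bj
        if union:
            best = max(best, inter / union)
    return best

a = [0, 1, 0, 0, 1, 0, 0, 1]
b = [1, 0, 0, 1, 0, 0, 1, 0]  # the same pattern, one step earlier
```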
Technical Program for Thursday<br />
August 26, 2010<br />
ThAT1 Marmara Hall<br />
Object Detection and Recognition - IV Regular Session<br />
Session chair: Lee, Kyoung Mu (Seoul National Univ.)<br />
09:00-09:20, Paper ThAT1.1<br />
Visual Recognition of Types of Structural Corridor Landmarks using Vanishing Points Detection and Hidden Markov<br />
Models<br />
Park, Young-Bin, Hanyang Univ.<br />
Kim, Sung-Su, Hanyang Univ.<br />
Suh, Il Hong, Hanyang Univ.<br />
In this paper, to provide a robot with information about the structure of its environment, we propose a method to recognize<br />
types of structural corridor landmarks, such as T-junctions, L-junctions, and corridor ends, using vanishing-point-based visual<br />
image features and hidden Markov models. Several experimental results are presented to demonstrate the validity of the proposed<br />
approach in a real environment.<br />
09:20-09:40, Paper ThAT1.2<br />
Multi-Object Segmentation in a Projection Plane using Subtraction Stereo<br />
Ubukata, Toru, Chuo University / CREST, JST<br />
Terabayashi, Kenji, Chuo Univ.<br />
Moro, Alessandro, Univ. of Trieste<br />
Umeda, Kazunori, Chuo Univ.<br />
We propose a method for multi-object segmentation in a projection plane. Our algorithm requires a stereo camera system<br />
called Subtraction Stereo, which extracts foreground information with a fixed stereo camera. The main contribution of this<br />
paper is how the image sequences that include partial occlusion of the foreground objects can be accurately segmented using<br />
mean shift clustering in real-time processing. The proposed method is suitable for medium-sized indoor environments, such<br />
as a room. Finally, we segment sequences that include occlusion and show the accuracy of the proposed method.<br />
09:40-10:00, Paper ThAT1.3<br />
Transitive Closure based Visual Words for Point Matching in Video Sequence<br />
Bhat, Srikrishna, INRIA<br />
Berger, Marie-Odile, INRIA<br />
Simon, Gilles, Nancy-Univ.<br />
Sur, Frédéric, INPL / INRIA Nancy Grand Est<br />
We present a transitive-closure-based visual word formation technique for obtaining robust object representations from<br />
smoothly varying multiple views. Each of our visual words is represented by a set of feature vectors, obtained<br />
by performing a transitive closure operation on SIFT features. We also present a range-reducing tree structure to speed up the<br />
transitive closure operation. The robustness of our visual word representation is demonstrated for Structure from Motion<br />
(SfM) and location identification in video images.<br />
10:00-10:20, Paper ThAT1.4<br />
Constrained Energy Minimization for Matching-Based Image Recognition<br />
Gass, Tobias, RWTH Aachen Univ.<br />
Dreuw, Philippe, RWTH Aachen Univ.<br />
Ney, Hermann, RWTH Aachen Univ.<br />
We propose to use energy minimization in MRFs for matching-based image recognition tasks. To this end, the Tree-<br />
Reweighted Message Passing algorithm is modified by geometric constraints and efficiently used by exploiting the guaranteed<br />
monotonicity of the lower bound within a nearest-neighbor based classification framework. The constraints allow for a<br />
speedup linear in the dimensionality of the reference image, and the lower bound allows optimal pruning of the nearest-neighbor<br />
search without losing accuracy, effectively allowing the number of optimization iterations to be increased without an<br />
effect on runtime. We evaluate our approach on well-known OCR and face recognition tasks, and on the latter we outperform<br />
the current state of the art.<br />
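The pruning idea described above — skip any reference whose lower bound on the matching energy already exceeds the best exact cost found so far — is the standard branch-and-bound trick for nearest-neighbor search. A minimal sketch (the callables stand in for TRW-S lower bounds and exact matching energies, which is an illustrative assumption):

```python
def prune_nn_search(references):
    """Nearest-neighbor search with lower-bound pruning: the exact
    matching cost is computed only for references whose lower bound
    can still beat the best exact cost found so far."""
    best_cost, best_idx = float("inf"), -1
    for i, (lower_bound, exact_cost) in enumerate(references):
        if lower_bound() >= best_cost:
            continue  # safely pruned: cannot improve on the best match
        cost = exact_cost()
        if cost < best_cost:
            best_cost, best_idx = cost, i
    return best_idx, best_cost

refs = [(lambda: 5.0, lambda: 7.0),
        (lambda: 1.0, lambda: 2.0),
        (lambda: 3.0, lambda: 3.5)]   # pruned: bound 3.0 >= best 2.0
idx, cost = prune_nn_search(refs)
```

Because the bound is a true lower bound, pruning never changes the returned nearest neighbor, only the amount of work done.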
10:20-10:40, Paper ThAT1.5<br />
A Re-Evaluation of Pedestrian Detection on Riemannian Manifolds<br />
Tosato, Diego, Univ. of Verona<br />
Farenzena, Michela, Univ. of Verona<br />
Cristani, Marco, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
Boosting covariance data on Riemannian manifolds has proven to be a convenient strategy in a pedestrian detection context.<br />
In this paper we show that the detection performances of the state-of-the-art approach of Tuzel et al. [7] can be greatly improved,<br />
from both a computational and a qualitative point of view, by considering practical and theoretical issues, and<br />
also allowing fine-grained estimation of occlusions. The resulting detection system reaches the best performance on the<br />
INRIA dataset, setting novel state-of-the-art results.<br />
ThAT2 Anadolu Auditorium<br />
Classification - I Regular Session<br />
Session chair: Duin, Robert (TU Delft)<br />
09:00-09:20, Paper ThAT2.1<br />
An Optimum Class-Rejective Decision Rule and its Evaluation<br />
Le Capitaine, Hoel, Univ. of La Rochelle<br />
Frelicot, Carl, Univ. of La Rochelle<br />
Decision-making systems aim to mimic human reasoning, which often consists of eliminating highly improbable situations<br />
(e.g. diseases, suspects) rather than selecting the most reliable ones. In this paper, we present the concept of class-rejective<br />
rules for pattern recognition. Contrary to usual reject-option schemes, where classes are selected when they may correspond<br />
to the true class of the input pattern, such a rule discards classes that cannot be the true one. Optimality of the rule is proven<br />
and an upper bound for the error probability is given. We also propose a criterion to evaluate such class-rejective rules. Classification<br />
results on artificial and real datasets are provided.<br />
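The elimination idea can be illustrated with a minimal sketch (a plain posterior-threshold rule, purely illustrative — not the authors' optimal rule, whose threshold is derived from the error-probability analysis):

```python
def reject_classes(posteriors, threshold=0.05):
    """Class-rejective decision: discard every class whose posterior
    probability is too low to correspond to the true class, and return
    the surviving candidate set (illustrative threshold rule)."""
    kept = {c for c, p in posteriors.items() if p >= threshold}
    # classes outside `kept` are eliminated, like ruling out suspects
    return kept
```

Unlike a classical reject option, the output is a set of still-plausible classes rather than a single accepted label.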
09:20-09:40, Paper ThAT2.2<br />
A Practical Heterogeneous Classifier for Relational Databases<br />
Manjunath, Geetha, Indian Inst. of Science<br />
M, Narasimha Murty, Indian Inst. of Science<br />
Sitaram, Dinkar, Hewlett Packard Company<br />
Most enterprise data is distributed in multiple relational databases with expert-designed schemas. Using traditional single-table<br />
machine learning techniques over such data not only incurs a computational penalty for converting to a flat form (mega-join),<br />
but also loses the human-specified semantic information present in the relations. In this paper, we present a two-phase<br />
hierarchical meta-classification algorithm for relational databases with a semantic divide and conquer approach. We propose<br />
a recursive, prediction aggregation technique over heterogeneous classifiers applied on individual database tables. A preliminary<br />
evaluation on TPCH and UCI benchmarks shows reduced training time without any loss of prediction accuracy.<br />
09:40-10:00, Paper ThAT2.3<br />
Spatial Representation for Efficient Sequence Classification<br />
Kuksa, Pavel, Rutgers Univ.<br />
Pavlovic, Vladimir, Rutgers Univ.<br />
We present a general, simple feature representation of sequences that allows efficient inexact matching, comparison and<br />
classification of sequential data. This approach, recently introduced for the problem of biological sequence classification,<br />
exploits a novel multi-scale representation of strings. The new representation leads to discovery of very efficient algorithms<br />
for string comparison, independent of the alphabet size. We show that these algorithms can be generalized to handle a wide<br />
gamut of sequence classification problems in diverse domains such as music and text sequence classification. The presented<br />
algorithms offer low computational cost and highly scalable implementations across different application domains.<br />
The new method demonstrates order-of-magnitude running-time improvements over existing state-of-the-art approaches<br />
while matching or exceeding their predictive accuracy.<br />
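A toy multi-scale string feature map gives the flavor of such a representation (counting substrings at several lengths and comparing by dot product — a simplified stand-in for the paper's representation, not its actual algorithm):

```python
from collections import Counter

def multiscale_features(seq, scales=(1, 2, 3)):
    """Count substrings of several lengths; the union of these counts
    forms a simple multi-scale representation of the sequence."""
    feats = Counter()
    for k in scales:
        for i in range(len(seq) - k + 1):
            feats[seq[i:i + k]] += 1
    return feats

def similarity(a, b):
    """Dot product of two sparse feature maps (a string-kernel value);
    the cost depends on the substrings present, not the alphabet size."""
    fa, fb = multiscale_features(a), multiscale_features(b)
    return sum(fa[s] * fb[s] for s in fa if s in fb)
```

The same code applies unchanged to music event sequences or text, since only substring identity matters.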
- 242 -
10:00-10:20, Paper ThAT2.4<br />
Rectifying Non-Euclidean Similarity Data using Ricci Flow Embedding<br />
Xu, Weiping, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
Wilson, Richard, Univ. of York<br />
Similarity based pattern recognition is concerned with the analysis of patterns that are specified in terms of object dissimilarity<br />
or proximity rather than ordinal values. For many types of data and measures, these dissimilarities are not Euclidean.<br />
This hinders the use of many machine-learning techniques. In this paper, we provide a means of correcting or rectifying<br />
the similarities so that the non-Euclidean artifacts are minimized. We consider the data to be embedded as points on a<br />
curved manifold and then evolve the manifold so as to increase its flatness. Our work uses the idea of Ricci flow on the<br />
constant curvature Riemannian manifold to modify the Gaussian curvatures on the edges of a graph representing the non-<br />
Euclidean data. We demonstrate the utility of our method on the standard "Chicken Pieces" dataset and show that we can<br />
transform the non-Euclidean distances into Euclidean space.<br />
10:20-10:40, Paper ThAT2.5<br />
One-Vs-All Training of Prototype Classifier for Pattern Classification and Retrieval<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
Prototype classifiers trained with a multi-class classification objective are inferior in pattern retrieval and outlier rejection.<br />
To improve the binary classification (detection, verification, retrieval, outlier rejection) performance of prototype classifiers,<br />
we propose a one-vs-all training method, which enriches each prototype as a binary discriminant function with a local<br />
threshold, and optimizes both the prototype vectors and the thresholds on training data using a binary classification objective,<br />
the cross-entropy (CE). Experimental results on two OCR datasets show that prototype classifiers trained by the one-vs-all<br />
method are superior in both multi-class classification and binary classification.<br />
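The idea of a prototype enriched with a local threshold can be sketched as follows (a simplified illustration in which only the threshold is optimized by the CE gradient and the prototype vector is kept fixed, unlike the full method):

```python
import math

def train_threshold(prototype, samples, labels, lr=0.1, epochs=200):
    """Learn a local threshold t so that sigmoid(t - ||x - p||^2) acts
    as a binary discriminant for the prototype's class, by minimizing
    the cross-entropy (simplified: the prototype itself is fixed)."""
    t = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):            # y in {0, 1}
            d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))
            p = 1.0 / (1.0 + math.exp(-(t - d2)))    # acceptance probability
            t -= lr * (p - y)                        # CE gradient w.r.t. t
    return t

def accepts(prototype, t, x):
    """Binary decision: accept x if it is closer than the learned threshold."""
    d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))
    return t - d2 > 0
```

The local threshold is what turns a multi-class prototype into a detector that can also reject outliers.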
ThAT3 Topkapı Hall A<br />
Computer Vision Applications - I Regular Session<br />
Session chair: Haindl, Michael (Institute of Information Theory)<br />
09:00-09:20, Paper ThAT3.1<br />
Probabilistic Modeling of Dynamic Traffic Flow across Non-Overlapping Camera Views<br />
Huang, Ching-Chun, National Chiao Tung University<br />
Chiu, Wei-Chen, Department of Computer Science<br />
Wang, Sheng-Jyh, National Chiao Tung Univ.<br />
Chuang, Jen-Hui, National Chiao Tung Univ.<br />
In this paper, we propose a probabilistic method to model the dynamic traffic flow across non-overlapping camera views.<br />
By assuming the transition time of object movement follows a certain global model, we may infer the time-varying traffic<br />
status in the unseen region without performing explicit object correspondence between camera views. In this paper, we<br />
model object correspondence and parameter estimation as a unified problem under the proposed Expectation-Maximization<br />
(EM) based framework. By treating object correspondence as a latent random variable, the proposed framework can iteratively<br />
search for the optimal model parameters with the implicit consideration of object correspondence.<br />
09:20-09:40, Paper ThAT3.2<br />
Vehicle Recognition as Changes in Satellite Imagery<br />
Ozcanli, Ozge Can, Brown Univ.<br />
Mundy, Joseph,<br />
Over the last several years, a new probabilistic representation for 3-d volumetric modeling has been developed. The main purpose of the<br />
model is to detect deviations from the normal appearance and geometry of the scene, i.e. change detection. In this paper, the model is<br />
utilized to characterize changes in the scene as vehicles. In the training stage, a compositional part hierarchy is learned to represent the<br />
geometry of Gaussian intensity extrema primitives exhibited by vehicles. In the test stage, the learned compositional model produces vehicle<br />
detections. Vehicle recognition performance is measured on low-resolution satellite imagery and detection accuracy is significantly improved<br />
over the initial change map given by the 3-d volumetric model. A PCA-based Bayesian recognition algorithm is implemented for comparison,<br />
which exhibits worse performance than the proposed method.<br />
- 243 -
09:40-10:00, Paper ThAT3.3<br />
Crowd Motion Analysis using Linear Cyclic Pursuit<br />
Viswanathan, Srikrishnan, I.I.T Bombay<br />
Chaudhuri, Subhasis, IIT<br />
Crowd motion analysis, where there is interdependence amongst the constituent elements, is a relatively unexplored application<br />
area in computer vision. In this work, we propose a fast method for short-term crowd motion prediction using a<br />
sparse set of particles. We study the dynamics of a crowd motion model and linear cyclic pursuit. We show that linear<br />
cyclic pursuit naturally captures the repulsive and attractive forces acting on the individual crowd member. The pursuit<br />
parameters are estimated from videos in an online manner using a feature tracker. Short-term trajectory prediction is done<br />
by numerical solution of the estimated cyclic pursuit equation. We demonstrate the suitability of the proposed technique<br />
through extensive experimentation.<br />
10:00-10:20, Paper ThAT3.4<br />
Integrating Object Detection with 3D Tracking towards a Better Driver Assistance System<br />
Prisacariu, Victor Adrian, Univ. of Oxford<br />
Timofte, Radu, Katholieke Univ. Leuven<br />
Zimmermann, Karel, Katholieke Univ. Leuven<br />
Reid, Ian,<br />
Van Gool, Luc<br />
Driver assistance helps save lives. Accurate 3D pose is required to establish whether a traffic sign is relevant to the driver. We<br />
propose a real-time system that integrates single-view detection with region-based 3D tracking of road signs. The optimal<br />
set of candidate detections is found using AdaBoost cascades followed by SVMs. The 2D detections are then employed in<br />
simultaneous 2D segmentation and 3D pose tracking, using the known 3D model of the recognised traffic sign. We demonstrate<br />
the abilities of our system by tracking multiple road signs in real-world scenarios.<br />
10:20-10:40, Paper ThAT3.5<br />
Real-Time Automatic Traffic Accident Recognition using HFG<br />
Bakheet, Samy, Otto-von-Guericke Univ. Magdeburg<br />
Al-Hamadi, Ayoub, Otto-von-Guericke Univ. Magdeburg<br />
Michaelis, Bernd, Otto-von-Guericke Univ. Magdeburg<br />
Sayed, Usama, Otto-von-Guericke Univ. Magdeburg<br />
Recently, the problem of automatic traffic accident recognition has attracted the machine vision community due to its<br />
implications for the development of autonomous Intelligent Transportation Systems (ITS). In this paper, a new framework<br />
for real-time automated traffic accident recognition using the Histogram of Flow Gradient (HFG) is proposed. This framework<br />
performs two major steps. First, HFG-based features are extracted from video shots. Second, logistic regression is employed<br />
to model the probability of occurrence of an accident by fitting the data to a logistic curve. If an accident occurs,<br />
the trajectory of the vehicle that caused the accident is determined. Preliminary results on real<br />
video sequences confirm the effectiveness and applicability of the proposed approach, which can offer delay guarantees<br />
for real-time surveillance and monitoring scenarios.<br />
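The second step — fitting a logistic curve to map features to an accident probability — can be sketched on a one-dimensional toy feature (the HFG extraction itself is omitted; feature values here are synthetic, not from the paper):

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit P(accident | feature) = sigmoid(w*x + b) by gradient descent
    on the log-loss (a 1-D stand-in for the paper's HFG feature vectors)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):          # y = 1 for accident shots
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x         # log-loss gradient step
            b -= lr * (p - y)
    return w, b

def accident_probability(w, b, x):
    """Evaluate the fitted logistic curve at a feature value x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

A video shot would then be flagged when its predicted probability crosses a chosen operating threshold.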
ThAT4 Dolmabahçe Hall A<br />
Semi-Supervised and Metric Learning Regular Session<br />
Session chair: Sanfeliu, Alberto (Universitat Politecnica de Catalunya)<br />
09:00-09:20, Paper ThAT4.1<br />
Semi-Supervised Distance Metric Learning by Quadratic Programming<br />
Cevikalp, Hakan, Eskisehir Osmangazi Univ.<br />
This paper introduces a semi-supervised distance metric learning algorithm which uses pair-wise equivalence (similarity<br />
and dissimilarity) constraints to improve the original distance metric in lower-dimensional input spaces. We restrict ourselves<br />
to pseudo-metrics that are in quadratic forms parameterized by positive semi-definite matrices. The proposed method<br />
works in both the input space and the kernel-induced feature space, and learning the distance metric is formulated as a quadratic<br />
optimization problem which returns a globally optimal solution. Experimental results on several databases show that the<br />
learned distance metric improves the performance of the subsequent classification and clustering algorithms.<br />
09:20-09:40, Paper ThAT4.2<br />
A Comparative Study on the Use of an Ensemble of Feature Extractors for the Automatic Design of Local Image Descriptors<br />
Carneiro, Gustavo, Tech. Univ. of Lisbon<br />
The use of an ensemble of feature spaces trained with distance metric learning methods has been empirically shown to be<br />
useful for the task of automatically designing local image descriptors. In this paper, we present a quantitative analysis<br />
which shows that in general, nonlinear distance metric learning methods provide better results than linear methods for automatically<br />
designing local image descriptors. In addition, we show that the learned feature spaces yield better results<br />
than state-of-the-art hand-designed features in benchmark quantitative comparisons. We discuss the results and suggest<br />
relevant problems for further investigation.<br />
09:40-10:00, Paper ThAT4.3<br />
A Study on Combining Sets of Differently Measured Dissimilarities<br />
Ibba, Alessandro, Delft Univ. of Tech.<br />
Duin, Robert, Delft Univ. of Tech.<br />
Lee, Wan-Jui, Delft Univ. of Tech.<br />
The ways distances are computed or measured enable us to have different representations of the same objects. In this paper<br />
we want to discuss possible ways of merging different sources of information given by differently measured dissimilarity<br />
representations. We compare here a simple averaging scheme [1] with dissimilarity forward selection and other techniques<br />
based on the learning of weights of linear and quadratic forms. Our general conclusion is that, although the more advanced<br />
forms of combination cannot always lead to better classification accuracies, combining given distance matrices prior to<br />
training is always worthwhile. We can thereby suggest which combination schemes are preferable with respect to the problem<br />
data.<br />
10:00-10:20, Paper ThAT4.4<br />
Efficient Kernel Learning from Constraints and Unlabeled Data<br />
Soleymani Baghshah, Mahdieh, Sharif Univ. of Tech.<br />
Bagheri Shouraki, Saeed, Sharif Univ. of Tech.<br />
Recently, distance metric learning has received increasing attention and has been found to be a powerful approach for semi-supervised<br />
learning tasks. In the last few years, several methods have been proposed for metric learning when must-link<br />
and/or cannot-link constraints are available as supervisory information. Although many of these methods learn global Mahalanobis<br />
metrics, some recently introduced methods have tried to learn more flexible distance metrics using a kernel-based<br />
approach. In this paper, we consider the problem of kernel learning from both pairwise constraints and unlabeled<br />
data. We propose a method that adapts a flexible distance metric via learning a nonparametric kernel matrix. We formulate<br />
our method as an optimization problem that can be solved efficiently. Experimental evaluations show the effectiveness of<br />
our method compared to some recently introduced methods on a variety of data sets.<br />
10:20-10:40, Paper ThAT4.5<br />
Semi-Supervised Graph Learning: Near Strangers or Distant Relatives<br />
Chen, Weifu, Sun Yat-sen Univ.<br />
Feng, Guocan, Sun Yat-Sen Univ.<br />
In this paper, an easily implemented semi-supervised graph learning method is presented for dimensionality reduction and<br />
clustering, making the most of the prior knowledge in limited pairwise constraints. We extend instance-level constraints to<br />
space-level constraints to construct a more meaningful graph. By decomposing the (normalized) Laplacian matrix of this<br />
graph, the bottom eigenvectors lead to new representations of the data, which are expected to capture the intrinsic<br />
structure. The proposed method improves upon previous constrained learning methods. Furthermore, to achieve a given<br />
clustering accuracy, fewer constraints are required in our method. Experimental results demonstrate the advantages of the<br />
proposed method.<br />
- 245 -
ThAT5 Dolmabahçe Hall B<br />
Image Segmentation - I Regular Session<br />
Session chair: Puig, Domenec (Univ. Rovira i Virgili)<br />
09:00-09:20, Paper ThAT5.1<br />
Robust Color Image Segmentation through Tensor Voting<br />
Moreno, Rodrigo, Rovira i Virgili Univ.<br />
Garcia Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />
Puig, Domenec, Univ. Rovira i Virgili<br />
This paper presents a new method for robust color image segmentation based on tensor voting, a robust perceptual grouping<br />
technique used to extract salient information from noisy data. First, an adaptation of tensor voting to both image denoising<br />
and robust edge detection is applied. Second, pixels in the filtered image are classified into likely-homogeneous and likely-inhomogeneous<br />
by means of the edginess maps generated in the first step. Third, the likely-homogeneous pixels are segmented<br />
through an efficient graph-based segmenter. Finally, a modified version of the same graph-based segmenter is<br />
applied to the likely-inhomogeneous pixels in order to obtain the final segmentation. Experiments show that the proposed<br />
algorithm performs better than the state of the art.<br />
09:20-09:40, Paper ThAT5.2<br />
An Improved Fluid Vector Flow for Cavity Segmentation in Chest Radiographs<br />
Xu, Tao, Univ. of Alberta<br />
Cheng, Irene, Univ. of Alberta<br />
Mandal, Mrinal, Univ. of Alberta<br />
Fluid vector flow (FVF) is a recently developed edge-based parametric active contour model for segmentation. While keeping<br />
its merits of a large capture range and the ability to handle acute concave shapes, we improve the model in two aspects:<br />
edge leakage and control point selection. Experimental results of cavity segmentation in chest radiographs show that the<br />
proposed method provides at least an 8% improvement over the original FVF method.<br />
09:40-10:00, Paper ThAT5.3<br />
Patchy Aurora Image Segmentation based on ALBP and Block Threshold<br />
Fu, Rong, Xidian Univ.<br />
Gao, Xinbo, Xidian Univ.<br />
Jian, Yongjun, Xidian Univ.<br />
The proportion of the aurora region to the field of view is an important index for measuring the range and scale of aurorae. A<br />
crucial step in obtaining the index is to segment the aurora region from the background. A simple and efficient aurora image segmentation<br />
algorithm is proposed, which is composed of feature representation based on adaptive local binary patterns<br />
(ALBP) and aurora region estimation through block thresholding. First, the ALBP features of the sky image are extracted and the<br />
threshold is determined. The aurora image to be segmented is then equally divided into detection blocks, from which ALBP<br />
features are also extracted. An aurora block is identified by comparing its ALBP features with the threshold. Despite its simplicity,<br />
the method makes processing of huge data sets possible. Experiments illustrate that the segmentation results of the proposed<br />
method are satisfying in terms of both human visual assessment and segmentation accuracy.<br />
10:00-10:20, Paper ThAT5.4<br />
Retinal Image Segmentation based on Mumford-Shah Model and Gabor Wavelet Filter<br />
Du, Xiaojun, Concordia Univ.<br />
Bui, Tien D., Concordia Univ.<br />
Automatic retinal image segmentation is desirable for the diagnosis of diseases such as diabetes. In this paper, we propose<br />
a new image segmentation method to segment retinal images. The new method is based on the Mumford-Shah (MS)<br />
model. As a region-based approach, the MS model is a good segmentation technique. However, due to non-uniform illumination,<br />
some traditional approximations of the MS model cannot deal with this type of problem. We present a new<br />
method that requires no approximations. Instead, a Gabor wavelet filter is used, and the method can segment objects with<br />
complicated image intensity distributions. The method is used to detect blood vessels in retinal images. The results are<br />
comparable with or better than the state of the art. Our method requires no training and is relatively fast.<br />
- 246 -
10:20-10:40, Paper ThAT5.5<br />
On Selecting an Optimal Number of Clusters for Color Image Segmentation<br />
Le Capitaine, Hoel, Univ. of La Rochelle<br />
Frelicot, Carl, Univ. of La Rochelle<br />
This paper addresses the problem of region-based color image segmentation using a fuzzy clustering algorithm, e.g. a<br />
spatial version of fuzzy c-means, in order to partition the image into clusters corresponding to homogeneous regions. We<br />
propose to determine the optimal number of clusters, and so the number of regions, by using a new cluster validity index<br />
computed on fuzzy partitions. Experimental results and comparison with other existing methods show the validity and the<br />
efficiency of the proposed method.<br />
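The selection loop described above — cluster for each candidate count and keep the count maximizing a validity index — can be sketched with a minimal fuzzy c-means and the classical partition coefficient (an illustrative stand-in for the paper's new index, and a plain rather than spatial FCM):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns the membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        centers = (U ** m).T @ X / (U ** m).sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        # standard FCM membership update: u_ik = 1 / sum_j (d_ik/d_ij)^p
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
    return U

def best_cluster_count(X, candidates=range(2, 6)):
    """Pick the number of clusters maximizing the partition coefficient
    (mean squared membership), a classical validity index used here as
    a stand-in for the paper's new one."""
    scores = {c: (fuzzy_cmeans(X, c) ** 2).sum() / len(X) for c in candidates}
    return max(scores, key=scores.get)
```

The chosen count then fixes the number of homogeneous regions in the final segmentation.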
ThAT6 Topkapı Hall B<br />
Face Ageing Regular Session<br />
Session chair: Yanikoglu, Berrin (Sabanci Univ.)<br />
09:00-09:20, Paper ThAT6.1<br />
Cross-Age Face Recognition on a Very Large Database: The Performance versus Age Intervals and Improvement<br />
using Soft Biometric Traits<br />
Guo, Guodong, West Virginia Univ.<br />
Mu, Guowang, North Carolina Central Univ.<br />
Ricanek, Karl, Univ. of North Carolina<br />
Facial aging can degrade face recognition performance dramatically. Traditional face recognition studies focus on<br />
dealing with pose, illumination, and expression (PIE) changes. Considering a large span of age difference, the influence<br />
of facial aging could be very significant compared to the PIE variations. How big could the aging influence be? What is<br />
the relation between recognition accuracy and age intervals? Can soft biometrics be used to improve face recognition<br />
performance under age variations? In this paper we address all these issues. First, we investigate the face recognition performance<br />
degradation with respect to age intervals between the probe and gallery images on a very large database which<br />
contains about 55,000 face images of more than 13,000 individuals. Second, we study if soft biometric traits, e.g., race,<br />
gender, height, and weight, could be used to improve the cross-age face recognition accuracies, and how useful each of<br />
them could be.<br />
09:20-09:40, Paper ThAT6.2<br />
A Ranking Approach for Human Age Estimation based on Face Images<br />
Chang, Kuang-Yu, Acad. Sinica<br />
Chen, Chu-Song, Acad. Sinica<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
In our daily life, it is much easier to judge which of two persons is older than to tell how old a person is. When<br />
inferring a person’s age, we may compare his or her face with many people whose ages are known, resulting in a series of<br />
comparative results, and then conjecture the age based on the comparisons. This process involves a large amount of pairwise<br />
preference information obtained by a series of queries, where each query compares the target person’s face to those faces<br />
in a database. In this paper, we propose a ranking-based framework consisting of a set of binary queries. Each query<br />
collects a binary-classification-based comparison result. All the query results are then fused to predict the age. Experimental<br />
results show that our approach performs better than traditional multi-class-based and regression-based approaches for age<br />
estimation.<br />
09:40-10:00, Paper ThAT6.3<br />
Perceived Age Estimation under Lighting Condition Change by Covariate Shift Adaptation<br />
Ueki, Kazuya, NEC Soft, Ltd.<br />
Sugiyama, Masashi, Tokyo Inst. of Tech.<br />
Ihara, Yasuyuki, NEC Soft, Ltd.<br />
Over recent years, a great deal of effort has been devoted to age estimation from face images. It has been reported that<br />
age can be accurately estimated under controlled environments such as frontal faces, no expression, and static lighting conditions.<br />
However, it is not straightforward to achieve the same accuracy level in real-world environments because of considerable<br />
variations in camera settings, facial poses, and illumination conditions. In this paper, we apply a recently proposed<br />
machine learning technique called covariate shift adaptation to alleviate lighting condition changes between laboratory<br />
and practical environments. Through real-world age estimation experiments, we demonstrate the usefulness of the proposed<br />
method.<br />
10:00-10:20, Paper ThAT6.4<br />
Ranking Model for Facial Age Estimation<br />
Yang, Peng, Rutgers Univ.<br />
Lin, Zhong, Rutgers Univ.<br />
Metaxas, Dimitris, Rutgers Univ.<br />
Feature design and feature selection are two key problems in facial-image-based age perception. In this paper, we propose<br />
to use a ranking model to perform feature selection on Haar-like features. In order to build the pairwise samples for the ranking<br />
model, age sequences are organized by personal aging pattern within each subject. The pairwise samples are extracted<br />
from the sequence of each subject; therefore, the order information is intuitively contained in the pairwise data. The ranking<br />
model is used to select the discriminative features based on the pairwise data. The combination of the ranking model and<br />
the personal aging pattern is powerful for selecting the discriminative features for age estimation. Based on the selected features,<br />
different kinds of regression models are used to build prediction models. The experimental results show that the performance of<br />
our method is comparable to state-of-the-art work.<br />
10:20-10:40, Paper ThAT6.5<br />
Development of Recognition Engine for Baby Faces<br />
Di, Wen, Tsinghua Univ.<br />
Zhang, Tong, Hewlett-Packard Lab.<br />
Fang, Chi, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
Existing face recognition approaches are mostly developed based on adult faces and may not work well in distinguishing<br />
the faces of kids. In particular, baby faces tend to have common features such as round cheeks and chins, so current face<br />
recognition engines often fail to differentiate them. In this paper, we present methods for discriminating baby faces from<br />
adult faces, and for training a special engine to recognize the faces of different babies. To achieve these, we collected a huge<br />
number of baby face images and developed a software system to annotate the image database. Experimental results show<br />
that the trained baby face recognizer achieves a dramatic improvement in differentiating baby faces, and that its fusion<br />
with the conventional adult face recognition engine also works well on the overall data set containing both baby and adult<br />
faces.<br />
ThAT7 Dolmabahçe Hall C<br />
Document Retrieval Regular Session<br />
Session chair: Faruquie, Tanveer (IBM Res. India)<br />
09:00-09:20, Paper ThAT7.1<br />
An Information Extraction Model for Unconstrained Handwritten Documents<br />
Thomas, Simon, LITIS<br />
Chatelain, Clement, LITIS Lab. INSA de Rouen<br />
Heutte, Laurent, Univ. de Rouen<br />
Paquet, Thierry, Univ. of Rouen<br />
In this paper, a new information extraction system based on statistical shallow parsing of unconstrained handwritten documents<br />
is introduced. Unlike classical approaches found in the literature, such as keyword spotting or full document recognition, our<br />
approach relies on a strong and powerful global handwriting model. An entire text line is considered as an indivisible entity<br />
and is modeled with Hidden Markov Models. In this way, text line shallow parsing allows fast extraction of the relevant<br />
information in any document while rejecting irrelevant information at the same time. First results are promising and show<br />
the interest of the approach.<br />
- 248 -
09:20-09:40, Paper ThAT7.2<br />
HMM-Based Word Spotting in Handwritten Documents using Subword Models<br />
Fischer, Andreas, Univ. of Bern<br />
Keller, Andreas, Univ. of Bern<br />
Frinken, Volkmar, Univ. of Bern<br />
Bunke, Horst, Univ. of Bern<br />
Handwritten word spotting aims at making document images amenable to browsing and searching by keyword retrieval.<br />
In this paper, we present a word spotting system based on Hidden Markov Models (HMM) that uses trained subword models<br />
to spot keywords. With the proposed method, arbitrary keywords can be spotted that do not need to be present in the<br />
training set. Also, no text line segmentation is required. On the modern IAM off-line database and the historical George<br />
Washington database we show that the proposed system outperforms a standard template matching approach based on dynamic<br />
time warping (DTW).<br />
09:40-10:00, Paper ThAT7.3<br />
A Content Spotting System for Line Drawing Graphic Document Images<br />
Luqman, Muhammad Muzzamil, Univ. François Rabelais de Tours, France; CVC Barcelona<br />
Brouard, Thierry, Univ. François Rabelais de Tours, France<br />
Ramel, Jean-Yves, Univ. François Rabelais de Tours<br />
Llados, Josep, Computer Vision Center<br />
We present a content spotting system for line drawing graphic document images. The proposed system is largely domain<br />
independent and takes keyword-based information retrieval for graphic documents one step forward, to Query<br />
By Example (QBE) and focused retrieval. During the offline learning mode, we vectorize the documents in the repository,<br />
represent them by attributed relational graphs, extract regions of interest (ROIs) from them, convert each ROI to a fuzzy<br />
structural signature, cluster similar signatures to form ROI classes, and build an index for the repository. During the online<br />
querying mode, a Bayesian network classifier recognizes the ROIs in the query image and the corresponding documents<br />
are fetched by looking up in the repository index. Experimental results are presented for synthetic images of architectural<br />
and electronic documents.<br />
10:00-10:20, Paper ThAT7.4<br />
Toward Massive Scalability in Image Matching<br />
Moraleda, Jorge, Ricoh Innovations Inc.<br />
Hull, Jonathan, Ricoh<br />
A method for image matching from partial blurry images is presented that leverages existing text retrieval algorithms to<br />
provide a solution that scales to hundreds of thousands of images. As an initial application, we present a document image<br />
matching system in which the user supplies a query image of a small patch of a paper document taken with a cell phone<br />
camera, and the system returns a label identifying the original electronic document if found in a previously indexed collection.<br />
Experimental results show that a retrieval rate of over 70% is achieved on a collection of nearly 500,000 document<br />
pages.<br />
10:20-10:40, Paper ThAT7.5<br />
Learning Image Anchor Templates for Document Classification and Data Extraction<br />
Sarkar, Prateek, Palo Alto Res. Center<br />
Image anchor templates are used in document image analysis for document classification, data localization, and other<br />
tasks. Current tools allow human operators to mark out small sub-images from documents to act as anchor templates.<br />
However, this requires time and expertise, because operators have to make informed decisions based on the behavior of the<br />
template matching algorithms and the expected degradation patterns in documents. We propose learning templates for a<br />
task automatically and quickly from a few training examples. Document classification or data localization can then be done<br />
more robustly by combining evidence from many more discriminating templates (e.g., hundreds) than would be practicable<br />
for operators to specify.<br />
- 249 -
ThAT8 Upper Foyer<br />
Image Analysis; Scene Understanding; Shape Modeling; Tracking and Surveillance; Vision Sensors<br />
Poster Session<br />
Session chair: Gimel’farb, Georgy (Univ. of Auckland)<br />
09:00-11:10, Paper ThAT8.2<br />
Sparse Embedding Visual Attention Systems Combined with Edge Information<br />
Zhao, Cairong, Nanjing Univ. of Science and Tech.<br />
Liu, ChuanCai, Nanjing Univ. of Science and Tech.<br />
Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />
Yang, Jingyu, Nanjing Univ. of Science and Tech.<br />
General computational models of visual attention obtain multi-scale feature maps in terms of visual properties<br />
such as intensity, color, and orientation, and then combine them into one saliency map. However, due to the lack of object edge information<br />
and a reasonable feature combination strategy, the resulting visual saliency map of the image is blurred. Aware<br />
of this, we propose a new scheme for saliency extraction. In this paper, we first put forward a sparse embedding feature<br />
combination strategy, inspired by sparse representation. The strategy is used to combine the salient regions from the individual<br />
feature maps based on a novel feature sparse indicator that measures the contribution of each map to saliency. Then<br />
we combine traditional visual attention with edge information. Results on different scene images show that our method<br />
outperforms other traditional feature combination strategies.<br />
09:00-11:10, Paper ThAT8.4<br />
LLN-Based Model-Driven Validation of Data Points for Random Sample Consensus Methods<br />
Zhang, Liang, Communications Res. Centre Canada<br />
Wang, Demin, Communications Res. Centre Canada<br />
This paper presents an on-the-fly model-driven validation of data points for random sample consensus (RANSAC) methods.<br />
The novelty resides in the idea that an analysis of the outcomes of previous random model samplings can benefit subsequent<br />
samplings. Given a sequence of successful model samplings, information from the inlier sets and the model errors is used<br />
to estimate the validity of each data point. This validity is used to guide subsequent model samplings, so that data points<br />
with higher validity have more chance of being selected. To evaluate the performance, the proposed method is applied to<br />
line model fitting and fundamental matrix estimation. Experimental results confirm that the<br />
proposed algorithm improves the performance of RANSAC in terms of estimation accuracy and the number of samplings.<br />
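As an illustration of validity-guided sampling, the sketch below applies the idea to line fitting. The weighting scheme, inlier threshold, and additive validity update are illustrative assumptions, not the authors' exact formulation.<br />

```python
import random
import math

def fit_line(p, q):
    """Return the normalized line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    n = math.hypot(a, b)
    return a / n, b / n, -(a * x1 + b * y1) / n

def weighted_ransac_line(points, iters=200, thresh=0.1, seed=0):
    """RANSAC line fitting with validity-guided sampling: each point's
    validity starts uniform and is increased whenever the point appears in
    the inlier set of a successful sampling, so later samplings favor it."""
    rng = random.Random(seed)
    validity = [1.0] * len(points)
    best_line, best_inliers = None, []
    for _ in range(iters):
        i, j = rng.choices(range(len(points)), weights=validity, k=2)
        if i == j or points[i] == points[j]:
            continue
        a, b, c = line = fit_line(points[i], points[j])
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) < thresh]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
            for k in inliers:  # reward points consistent with the best model so far
                validity[k] += 1.0
    return best_line, best_inliers
```

On a toy set of ten collinear points plus a few gross outliers, the guided sampler recovers the line while progressively concentrating samples on the consistent points.<br />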
09:00-11:10, Paper ThAT8.5<br />
Estimating 3D Human Pose from Single Images using Iterative Refinement of the Prior<br />
Daubney, Ben Christopher, Swansea Univ.<br />
Xie, Xianghua, Swansea Univ.<br />
This paper proposes a generative method to extract 3D human pose from just a single image. Unlike many existing approaches,<br />
we assume that accurate foreground/background segmentation is not possible, and we do not use binary silhouettes.<br />
A stochastic method is used to search the pose space, and the posterior distribution is maximized using Expectation Maximization<br />
(EM). We assume some a priori knowledge about the position, scale, and orientation of the person<br />
present, and we specifically develop an approach to exploit this. The result is that we can learn a more constrained prior<br />
without having to sacrifice its generality to a specific action type. A single prior is learnt using all actions in the HumanEva<br />
dataset [9], and we provide quantitative results for images selected across all action categories and subjects, captured<br />
from differing viewpoints.<br />
09:00-11:10, Paper ThAT8.6<br />
Human-Area Segmentation by Selecting Similar Silhouette Images based on Weak-Classifier Response<br />
Ando, Hiroaki, Chubu Univ.<br />
Fujiyoshi, Hironobu, Chubu Univ.<br />
Human-area segmentation is a major issue in video surveillance. Many existing methods estimate individual human areas<br />
from the foreground area obtained by background subtraction, but the effects of camera movement can make it difficult<br />
to obtain a background image. We have achieved human-area segmentation requiring no background image by using<br />
chamfer matching to match the results of human detection using Real AdaBoost with silhouette images. Although accuracy<br />
in chamfer matching drops as the number of templates increases, the proposed method enables segmentation accuracy to<br />
be improved by selecting silhouette images similar to the matching target beforehand based on response values from weak<br />
classifiers in Real AdaBoost.<br />
09:00-11:10, Paper ThAT8.7<br />
Local Optical Operators for Subpixel Scene Analysis<br />
Jean, Yves, City Univ. of NY<br />
In this paper we present a scene analysis technique with subpixel filtering based on dense coded light fields. Our technique<br />
computes alignment and optically projects analysis filters onto local surfaces within the extent of a camera pixel. The resolution<br />
gain depends on the local light field density rather than on the point spread function of the camera optics. An initial<br />
structured light sequence is used to establish each camera pixel’s footprint in the projector-generated light field. Then<br />
a sequence of basis functions embedded in the light field, with camera pixel support, combines with the local surface texture<br />
and is integrated by the camera sensor to produce a localized response at the subpixel scale. We address optical modeling<br />
and aliasing issues, since the dense light field is undersampled by the camera pixels. Results are provided for objects of<br />
planar and non-planar topology.<br />
09:00-11:10, Paper ThAT8.8<br />
Aesthetic Image Classification for Autonomous Agents<br />
Desnoyer, Mark, Carnegie Mellon Univ.<br />
Wettergreen, David, Carnegie Mellon Univ.<br />
Computational aesthetics is the study of applying machine learning techniques to identify aesthetically pleasing imagery.<br />
Prior work used online datasets scraped from large user communities like Flickr to obtain labeled data. However, online imagery<br />
represents results from late in the media generation process, as the photographer has already framed the shot and picked<br />
the best results to upload. Thus, this technique can only identify quality imagery once it has been taken. In contrast, automatically<br />
creating pleasing imagery requires understanding the imagery present earlier in the process. This paper applies<br />
computational aesthetics techniques to a novel dataset from earlier in that process in order to understand how the problem<br />
changes when an autonomous agent, like a robot or a real-time camera aid, creates pleasing imagery instead of simply<br />
identifying it.<br />
09:00-11:10, Paper ThAT8.9<br />
Removal of Moving Objects from a Street-view Image by Fusing Multiple Image Sequences<br />
Uchiyama, Hiroyuki, Nagoya Univ.<br />
Deguchi, Daisuke, Nagoya Univ.<br />
Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />
Ide, Ichiro, Nagoya Univ.<br />
Murase, Hiroshi, Nagoya Univ.<br />
We propose a method to remove moving objects from an in-vehicle camera image sequence by fusing multiple image sequences.<br />
Driver assistance systems and services such as Google Street View require images containing no moving object.<br />
The proposed scheme consists of three parts: (i) collection of many image sequences along the same route by using vehicles<br />
equipped with an omni-directional camera, (ii) temporal and spatial registration of image sequences, and (iii) mosaicing<br />
partial images containing no moving object. Experimental results show that 97.3% of the moving object area could be removed<br />
by the proposed method.<br />
09:00-11:10, Paper ThAT8.10<br />
Improving SIFT-Based Descriptors Stability to Rotations<br />
Bellavia, Fabio, Univ. of Palermo<br />
Tegolo, Domenico, Univ. of Palermo<br />
Trucco, Emanuele<br />
Image descriptors are widely adopted structures to match image features. SIFT-based descriptors are collections of gradient<br />
orientation histograms computed on different feature regions, commonly divided by using a regular Cartesian grid or a<br />
log-polar grid. In order to achieve rotation invariance, feature patches generally have to be rotated in the direction of the<br />
dominant gradient orientation. In this paper we present a modification of the GLOH descriptor, a SIFT-based descriptor<br />
based on a log-polar grid, which avoids rotating the feature patch before computing the descriptor, since predefined discrete<br />
orientations can easily be derived by shifting the descriptor vector. The proposed descriptors, called sGLOH and sGLOH+,<br />
have been compared with the SIFT descriptor on the Oxford image dataset, with good results that demonstrate their robustness<br />
and stability.<br />
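The shift-based matching idea can be sketched as follows, under a simplified descriptor layout (each of `n_rings` radial rings contributes `n_sectors` angular bins, and one rotation step shifts every ring by one bin); the actual sGLOH packing differs in detail.<br />

```python
import numpy as np

def shift_match_distance(d1, d2, n_rings, n_sectors):
    """Distance between two log-polar descriptors that is invariant to
    discrete patch rotations: rotating the patch by one sector step
    corresponds to cyclically shifting each ring's block of angular bins,
    so we take the minimum Euclidean distance over all n_sectors shifts
    instead of rotating the patch itself."""
    d1 = np.asarray(d1, float).reshape(n_rings, n_sectors)
    d2 = np.asarray(d2, float).reshape(n_rings, n_sectors)
    best = np.inf
    for s in range(n_sectors):
        shifted = np.roll(d2, s, axis=1)  # same angular shift applied to every ring
        best = min(best, float(np.linalg.norm(d1 - shifted)))
    return best
```

A descriptor compared against a rotated copy of itself then yields distance zero at the matching shift, with no patch rotation required.<br />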
09:00-11:10, Paper ThAT8.11<br />
Inpainting Large Missing Regions in Range Images<br />
Bhavsar, Arnav, Indian Inst. of Tech. Madras<br />
Ambasamudram, Rajagopalan, Indian Inst. of Tech. Madras<br />
We propose a technique to inpaint large missing regions in range images. Such a technique can be used to restore degraded/occluded<br />
range maps. It can also serve to reconstruct dense depth maps from sparse measurements, which can speed<br />
up acquisition. Our method uses the visual cue from segmentation of an intensity image registered to the range image.<br />
Our approach enforces that pixels in the same segment should have similar range. Our simple strategy involves plane fitting<br />
and local medians over segments to compute local energies for labeling unknown pixels. Our results exhibit high-quality<br />
inpainting with very low errors.<br />
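A minimal sketch of the segment-consistency cue: fill unknown range values with the median of the known values in the same intensity-image segment. The paper additionally uses plane fitting and local energies, which are omitted here.<br />

```python
import numpy as np

def inpaint_range_by_segments(range_map, segments, unknown=-1.0):
    """Fill unknown range values using the median of known range values in
    the same segment of the registered intensity image, enforcing that
    pixels in one segment share similar range. `segments` holds an integer
    segment label per pixel; `unknown` marks missing range values."""
    out = range_map.astype(float).copy()
    for label in np.unique(segments):
        mask = segments == label
        known = out[mask & (range_map != unknown)]
        if known.size:  # segment has at least one valid range measurement
            out[mask & (range_map == unknown)] = np.median(known)
    return out
```

For example, a hole inside a segment whose known depths cluster around 5 is filled with 5, while a hole in a nearer segment is filled from that segment's own measurements.<br />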
09:00-11:10, Paper ThAT8.12<br />
Angular Variation as a Monocular Cue for Spatial Perception<br />
Aranda, Joan, UPC<br />
Navarro, Agustin A., UPC<br />
Perspective projection presents objects as they are naturally seen by the eye. However, this type of mapping strongly<br />
distorts geometric properties such as angles, which are not preserved under perspective transformations. In this work, this<br />
angular variation serves to model the visual effect of perspective projection. Knowing that the angular distortion depends<br />
on the point of view of the observer, we demonstrate that it is possible to determine the pose of an object from<br />
its perspective distortion. This is a computational approach to direct perception in which spatial information about<br />
a scene is calculated directly from the optic array. Experimental results show the robustness provided by the use of angles<br />
and establish this 3D measurement technique as an emulation of a visual perception process.<br />
09:00-11:10, Paper ThAT8.13<br />
An Exploration Scheme for Large Images: Application to Breast Cancer Grading<br />
Veillard, Antoine, NUS<br />
Lomenie, Nicolas, CNRS<br />
Racoceanu, Daniel, CNRS - French National Res. Center<br />
Most research focuses on pattern recognition within small sample images, but strategies for running these<br />
algorithms efficiently over large images are rarely, if ever, specifically considered. In particular, the new generation of satellite and<br />
microscopic images is acquired at very high resolution and at a very high daily rate. We propose an efficient, generic<br />
strategy to explore large images by combining computational geometry tools with a local signal measure of relevance in<br />
a dynamic sampling framework. An application to breast cancer grading from huge histopathological images illustrates<br />
the benefit of such a general strategy for new major applications in the field of microscopy.<br />
09:00-11:10, Paper ThAT8.14<br />
3D Human Body Modeling using Range Data<br />
Yamauchi, Koichiro, Keio Univ.<br />
Bhanu, Bir, Univ. of California<br />
Saito, Hideo, Keio Univ.<br />
For the 3D modeling of walking humans the determination of body pose and extraction of body parts, from the sensed 3D<br />
range data, are challenging image processing problems. Real body data may have holes because of self-occlusions and<br />
grazing angle views. Most of the existing modeling methods rely on directly fitting a 3D model to the data without considering<br />
the fact that the parts in an image are indeed human body parts. In this paper, we present a method for 3D<br />
human body modeling using range data that attempts to overcome these problems. In our approach the entire human body<br />
is first decomposed into major body parts by a parts-based image segmentation method, and then a kinematics model is<br />
fitted to the segmented body parts in an optimized manner. The fitted model is adjusted by the iterative closest point (ICP)<br />
algorithm to resolve the gaps in the body data. Experimental results and comparisons demonstrate the effectiveness of our<br />
approach.<br />
09:00-11:10, Paper ThAT8.15<br />
Scale Matching of 3D Point Clouds by Finding Keyscales with Spin Images<br />
Tamaki, Toru, Hiroshima Univ.<br />
Tanigawa, Shunsuke, Hiroshima Univ.<br />
Ueno, Yuji, Hiroshima Univ.<br />
Raytchev, Bisser, Hiroshima Univ.<br />
Kaneda, Kazufumi, Hiroshima Univ.<br />
In this paper we propose a method for matching the scales of 3D point clouds. 3D point sets of the same scene obtained<br />
by 3D reconstruction techniques usually differ in scale. To match scales, we propose a keyscale that characterizes the<br />
scale of a given 3D point cloud. By performing PCA of spin images over different scales, the keyscale is defined as the<br />
scale that minimizes the cumulative contribution rate of the PCA at a specific dimension of the eigenspace. Simulations<br />
with the Stanford bunny and experimental results with 3D reconstructions of a real scene demonstrate that keyscales of<br />
any 3D point clouds can be uniquely found and effectively used for scale matching.<br />
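The keyscale criterion can be sketched as below: for each candidate scale, run PCA on the stack of spin images computed at that scale and evaluate the cumulative contribution rate (fraction of variance captured) at a fixed dimension, then pick the scale minimizing it. The spin-image computation itself is assumed given; here each scale simply maps to an (n_points, n_features) array.<br />

```python
import numpy as np

def keyscale(spin_images_by_scale, dim=10):
    """Select the keyscale of a point cloud: the candidate scale whose
    spin-image PCA has the minimum cumulative contribution rate at
    dimension `dim`. `spin_images_by_scale` maps each candidate scale to
    an (n_points, n_features) array of spin images at that scale."""
    def contribution(X, d):
        X = X - X.mean(axis=0)
        # eigenvalues of the covariance matrix via SVD of the centered data
        s = np.linalg.svd(X, compute_uv=False) ** 2
        return s[:d].sum() / s.sum()
    rates = {scale: contribution(X, dim) for scale, X in spin_images_by_scale.items()}
    return min(rates, key=rates.get)
```

Intuitively, a scale at which the spin images concentrate their variance in few components (contribution rate near 1) is less informative than one at which variance spreads over many components, which is the scale the minimum picks out.<br />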
09:00-11:10, Paper ThAT8.16<br />
Tracking Multiple People with Illumination Maps<br />
Zen, Gloria, Fondazione Bruno Kessler<br />
Lanz, Oswald, Fondazione Bruno Kessler<br />
Messelodi, Stefano, Fondazione Bruno Kessler<br />
Ricci, Elisa, Fondazione Bruno Kessler<br />
We address the problem of multiple people tracking under non-homogeneous and time-varying illumination conditions.<br />
We propose a unified framework for jointly estimating the position of the targets and their illumination conditions. For<br />
each target multiple templates are considered to model appearance variations due to lighting changes. The template choice<br />
is driven by an illumination map which describes the light conditions in different areas of the scene. This map is computed<br />
with a novel algorithm for efficient inference in a hierarchical Markov Random Field (MRF) and is updated online to<br />
adapt to slow lighting changes. Experimental results demonstrate the effectiveness of our approach.<br />
09:00-11:10, Paper ThAT8.17<br />
Combining Foreground / Background Feature Points and Anisotropic Mean Shift for Enhanced Visual Object<br />
Tracking<br />
Haner, Sebastian, Lund Univ. of Tech.<br />
Gu, Irene Yu-Hua, Chalmers Univ. of Tech.<br />
This paper proposes a novel visual object tracking scheme, exploiting both local point feature correspondences and global<br />
object appearance using the anisotropic mean shift tracker. Using a RANSAC cost function incorporating the mean shift<br />
motion estimate, motion smoothness and complexity terms, an optimal feature point set for motion estimation is found<br />
even when a high proportion of outliers is present. The tracker dynamically maintains sets of both foreground and background<br />
features, the latter providing information on object occlusions. The mean shift motion estimate is further used to<br />
guide the inclusion of new point features in the object model. Our experiments on videos containing long-term partial occlusions,<br />
object intersections, and backgrounds that are cluttered or have similar color distributions have shown more stable and robust tracking<br />
performance in comparison to three existing methods.<br />
09:00-11:10, Paper ThAT8.18<br />
Enhanced Measurement Model for Subspace-Based Tracking<br />
Yin, Shimin, Seoul National Univ.<br />
Yoo, Haan Ju, Seoul National Univ.<br />
Choi, Jin Young, Automation and System Res. Inst., Seoul National University<br />
We present an efficient and robust measurement model for visual tracking. This approach builds on and extends work on<br />
the measurement model of subspace representations. Subspace-based tracking algorithms have been part of the visual tracking<br />
literature for a decade and show considerable tracking performance due to their robustness in matching. However, the<br />
measures used in their measurement models are not robust enough in cluttered backgrounds. We propose a novel measure<br />
of object matching, referred to as WDIFS, which aims to improve the discriminability of matching within the subspace.<br />
Our measurement model can distinguish the target from similar background clutter, which often causes erroneous drift with the conventional<br />
DFFS-based measure. Experiments demonstrate the effectiveness of the proposed tracking algorithm under cluttered<br />
backgrounds.<br />
09:00-11:10, Paper ThAT8.19<br />
Person-Specific Face Shape Estimation under Varying Head Pose from Single Snapshots<br />
Dornaika, Fadi, Univ. of the Basque Country<br />
Raducanu, Bogdan, Computer Vision Center<br />
This paper presents a new method for person-specific face shape estimation under varying head pose of a previously<br />
unseen person from a single image. We describe a featureless approach based on a deformable 3D model and a learned<br />
face subspace. The proposed approach is based on maximizing a likelihood measure associated with a learned face subspace,<br />
which is carried out by a stochastic and genetic optimizer. We conducted experiments on a subset of the Honda<br />
Video Database, showing the feasibility and robustness of the proposed approach. As a result, our approach could lend<br />
itself nicely to complex frameworks involving 3D face tracking and face gesture recognition in monocular videos.<br />
09:00-11:10, Paper ThAT8.20<br />
Tracking Ships from Fast Moving Camera through Image Registration<br />
Fefilatyev, Sergiy, Univ. of South Florida<br />
Goldgof, Dmitry, Univ. of South Florida<br />
Lembke, Chad, Univ. of South Florida<br />
This paper presents an algorithm that detects and tracks marine vessels in video taken by a nonstationary camera installed<br />
on an untethered buoy. The video is characterized by large inter-frame motion of the camera, cluttered background, and<br />
presence of compression artifacts. Our approach performs segmentation of ships in individual frames processed with a<br />
color-gradient filter. The threshold selection is based on the histogram of the search region. Tracking of ships in a sequence<br />
is enabled by registering the horizon images in one coordinate system and by using a multi-hypothesis framework. The registration<br />
step uses an area-based technique to correlate a processed strip of the image about the found horizon line. The results<br />
of the evaluation of detection, localization, and tracking of the ships show a significant increase in performance in comparison<br />
to the previously used technique.<br />
09:00-11:10, Paper ThAT8.21<br />
Boosted Multiple Kernel Learning for Scene Category Recognition<br />
Jhuo, I-Hong, National Taiwan Univ.<br />
Lee, Der-Tsai, National Taiwan Univ.<br />
Scene images typically include diverse and distinctive properties. It is reasonable to consider different features in establishing<br />
a scene category recognition system with a promising performance. We propose an adaptive model to represent<br />
various features in a unified domain, i.e., a set of kernels, and transform the discriminant information contained in each<br />
kernel into a set of weak learners, called dyadic hypercuts. Based on this model, we present a novel approach to carrying<br />
out incremental multiple kernel learning for feature fusion by applying AdaBoost to the union of the sets of weak learners.<br />
We further evaluate the performance of this approach by a benchmark dataset for scene category recognition. Experimental<br />
results show a significantly improved performance in both accuracy and efficiency.<br />
09:00-11:10, Paper ThAT8.22<br />
Receding Horizon Estimation for Hybrid Particle Filters and Application for Robust Visual Tracking<br />
Kim, Du Yong, Gwangju Inst. of Science and Tech.<br />
Yang, Ehwa, Gwangju Inst. of Science and Tech.<br />
Jeon, Moongu, Gwangju Inst. of Science and Tech.<br />
Shin, Vladimir, Gwangju Inst. of Science and Tech.<br />
Receding horizon estimation is applied to design robust visual trackers. The most recent data within a fixed-size window<br />
is processed to obtain an estimate of the object state at the current time. In visual tracking such a<br />
scheme improves filter accuracy by avoiding accumulated approximation errors. A newly derived unscented Kalman filter<br />
(UKF) based on the receding horizon strategy is proposed for determining the importance density of the hybrid particle<br />
filter. The importance density derived by the receding horizon-based UKF (RHUKF) provides significantly improved accuracy<br />
and performance consistency compared to the unscented particle filter (UPF). Visual tracking examples are subsequently<br />
tested to demonstrate the advantages of the filter.<br />
09:00-11:10, Paper ThAT8.23<br />
Efficient Polygonal Approximation of Digital Curves via Monte Carlo Optimization<br />
Zhou, Xiuzhuang, Beijing Inst. of Tech.<br />
Lu, Yao, Beijing Inst. of Tech.<br />
A novel stochastic searching scheme based on Monte Carlo optimization is presented for the polygonal approximation<br />
(PA) problem. We propose to combine split-and-merge based local optimization with Monte Carlo sampling to<br />
give an efficient stochastic optimization scheme. Our approach is, in essence, a well-designed Basin-Hopping scheme,<br />
which performs stochastic hopping among the reduced energy peaks. Experimental results on various benchmarks show<br />
that our method achieves high-quality solutions at lower computational cost, and outperforms most state-of-the-art<br />
algorithms for the PA problem.<br />
09:00-11:10, Paper ThAT8.24<br />
Weakly Supervised Action Recognition using Implicit Shape Models<br />
Thi, Tuan Hue, Univ. of New South Wales and National ICT of Australia<br />
Cheng, Li, National ICT of Australia<br />
Zhang, Jian, National ICT of Australia<br />
Wang, Li, Nanjing Forest Univ.<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
In this paper, we present a robust framework for action recognition in video that is able to perform competitively against<br />
state-of-the-art methods, yet does not rely on sophisticated background subtraction preprocessing to remove background<br />
features. In particular, we extend the Implicit Shape Model (ISM) of [10] for object recognition to 3D to integrate local<br />
spatiotemporal features, which are produced by a weakly supervised Bayesian kernel filter. Experiments on benchmark<br />
datasets (including KTH and Weizmann) verify the effectiveness of our approach.<br />
09:00-11:10, Paper ThAT8.25<br />
Moments of Elliptic Fourier Descriptors<br />
Soldea, Octavian, Sabanci Univ.<br />
Unel, Mustafa, Sabanci Univ.<br />
Ercil, Aytul, Sabanci Univ.<br />
This paper develops a recursive method for computing moments of 2D objects described by elliptic Fourier descriptors<br />
(EFD). Green’s theorem is utilized to transform 2D surface integrals into 1D line integrals, and the EFD description is employed<br />
to derive recursions for moment computation. Experiments are performed to quantify the accuracy of our proposed<br />
method. Comparison with Bernstein-Bezier representations is also provided.<br />
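As background for the recursion, Green's theorem reduces each 2D moment integral over the region enclosed by the curve to a line integral along its boundary; the following is the standard identity (one of several equivalent forms), from which recursions over the EFD harmonics can be derived:<br />

```latex
m_{pq} = \iint_{\Omega} x^{p}\, y^{q} \, dx\, dy
       = \frac{1}{p+1} \oint_{\partial\Omega} x^{p+1}\, y^{q} \, dy ,
```

where $\partial\Omega$ is traversed counterclockwise and $x(t)$, $y(t)$ are given by the truncated elliptic Fourier series of the contour.<br />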
09:00-11:10, Paper ThAT8.26<br />
Semi-Supervised Trajectory Learning using a Multi-Scale Key Point based Trajectory Representation<br />
Liu, Yang, Chinese Acad. of Sciences<br />
Li, Xi, CNRS, TELECOM ParisTech<br />
Hu, Weiming, National Lab. of Pattern Recognition, Inst.<br />
Motion trajectories contain rich high-level semantic information, such as object behaviors and gestures, which can be effectively<br />
captured by supervised trajectory learning. However, it is usually a tough task to obtain a large number of high-quality<br />
manually labeled samples in real applications. Thus, how to perform trajectory learning in small-training-sample-size<br />
situations is an important research topic. In this paper, we propose a trajectory learning framework using graph-based<br />
semi-supervised transductive learning, which propagates training sample labels along a particular graph. Furthermore, a<br />
novel trajectory descriptor based on multi-scale key points is proposed to characterize the spatial structural information.<br />
Experimental results demonstrate the effectiveness of our framework.<br />
09:00-11:10, Paper ThAT8.27<br />
Detection based Low Frame Rate Human Tracking<br />
Wang, Lu, The Univ. of Hong Kong<br />
Yung, Nelson, the Univ. of Hong Kong<br />
Tracking by association of low-frame-rate detection responses is not trivial, as motion is less continuous and hence ambiguous.<br />
The problem becomes more challenging when occlusion occurs. To solve this problem, we first propose a<br />
robust data association method that explicitly differentiates ambiguous tracklets, which are likely to introduce incorrect<br />
linking, from other tracklets, and deals with them effectively. Secondly, we solve the long-term occlusion problem by detecting<br />
inter-track relationships and performing track splits and merges according to appearance similarity and occlusion<br />
order. Experiments on a challenging human surveillance dataset show the effectiveness of the proposed method.<br />
09:00-11:10, Paper ThAT8.28<br />
Detecting Dominant Motion Flows in Unstructured/Structured Crowd Scenes<br />
Ozturk, Ovgu, The Univ. of Tokyo<br />
Yamasaki, Toshihiko, The Univ. of Tokyo<br />
Aizawa, Kiyoharu, The Univ. of Tokyo<br />
Detecting dominant motion flows in crowd scenes is one of the major problems in video surveillance. This is particularly<br />
difficult in unstructured crowd scenes, where the participants move randomly in various directions. This paper presents a<br />
novel method which utilizes SIFT features’ flow vectors to calculate the dominant motion flows in both unstructured and<br />
structured crowd scenes. SIFT features can represent the characteristic parts of objects, allowing robust tracking under<br />
non-rigid motion. First, flow vectors of SIFT features are calculated at certain intervals to form a motion flow map of the<br />
video. Next, this map is divided into equally sized square regions, and in each region dominant motion flows are estimated<br />
by clustering the flow vectors. Then, local dominant motion flows are combined to obtain the global dominant motion<br />
flows. Experimental results demonstrate the successful application of the proposed method to challenging real-world<br />
scenes.<br />
09:00-11:10, Paper ThAT8.29<br />
Statistical Shape Modeling using Morphological Representations<br />
Velasco-Forero, Santiago, MINES ParisTech<br />
Angulo, Jesus, MINES ParisTech<br />
The aim of this paper is to propose tools for the statistical analysis of shape families using morphological operators. Given a<br />
series of shape families (or shape categories), the approach consists in empirically computing shape statistics (i.e., mean<br />
shape and variance of shape) and then using simple algorithms for random shape generation, for empirical shape confidence<br />
boundary computation, and for shape classification using Bayes rules. The main ingredients required for the present methods<br />
are well known in image processing, such as the watershed on distance functions or the log-polar transformation. Classification<br />
performance is presented on a well-known shape database.<br />
09:00-11:10, Paper ThAT8.30<br />
Recovering the Topology of Multiple Cameras by Finding Continuous Paths in a Trellis<br />
Cai, Yinghao, Univ. of Oulu<br />
Kaiqi, Huang, CAS Inst. of Automation<br />
Tan, Tieniu, CAS Inst. of Automation<br />
Pietikäinen, Matti, Univ. of Oulu<br />
In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping<br />
fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera while the connectivity between<br />
nodes is inferred through finding continuous paths in a trellis where appearance information and temporal information<br />
of moving objects are encoded. Unlike previous methods, which assume a single-mode transition distribution between<br />
nodes, our method is capable of dealing with multi-modal transition situations when both cars and pedestrians are in the<br />
scene. Results on simulated and real-life datasets demonstrate the effectiveness of the proposed method.<br />
09:00-11:10, Paper ThAT8.31<br />
On-Line Random Naive Bayes for Tracking<br />
Godec, Martin, Graz Univ. of Tech.<br />
Leistner, Christian, Graz Univ. of Tech.<br />
Saffari, Amir, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Randomized learning methods (i.e., Forests or Ferns) have shown excellent capabilities for various computer vision applications.<br />
However, it has been shown that the tree structure in Forests can be replaced by even simpler structures, e.g., Random<br />
Naive Bayes classifiers, yielding similar performance. The goal of this paper is to benefit from these findings to develop<br />
an efficient on-line learner. Based on the principles of on-line Random Forests, we adapt the Random Naive Bayes classifier<br />
to the on-line domain. For that purpose, we propose to use on-line histograms as weak learners, which yield much better<br />
performance than simple decision stumps. Experimentally, we show that the approach is applicable to incremental learning<br />
on machine learning datasets. Additionally, we propose to use an IIR-filter-like forgetting function for the weak learners<br />
to enable adaptivity, and evaluate our classifier on the task of tracking by detection.<br />
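The on-line histogram weak learner with forgetting can be sketched as below; the binning, binary-class setup, and multiplicative `decay` factor are illustrative assumptions rather than the authors' exact formulation.<br />

```python
import bisect

class OnlineHistogramLearner:
    """On-line histogram weak learner for two classes: one fixed-binned
    histogram per class accumulates feature values seen so far, and
    prediction returns the class with the higher count in the bin the
    sample falls into. A forgetting factor `decay` (< 1) down-weights old
    counts before each update, mimicking IIR-filter-like forgetting."""
    def __init__(self, edges, decay=1.0):
        self.edges = list(edges)                 # ascending bin boundaries
        n_bins = len(self.edges) + 1
        self.counts = {0: [0.0] * n_bins, 1: [0.0] * n_bins}
        self.decay = decay

    def _bin(self, x):
        return bisect.bisect_right(self.edges, x)

    def update(self, x, label):
        for c in self.counts:                    # forget old evidence, then count
            self.counts[c] = [v * self.decay for v in self.counts[c]]
        self.counts[label][self._bin(x)] += 1.0

    def predict(self, x):
        b = self._bin(x)
        return 1 if self.counts[1][b] > self.counts[0][b] else 0
```

Unlike a decision stump, which commits to a single threshold, the histogram captures an arbitrary per-bin class preference, which is why such learners can outperform stumps as weak learners.<br />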
09:00-11:10, Paper ThAT8.32<br />
Interest Point based Tracking<br />
Kloihofer, Werner, Center Communication Systems GmbH<br />
Kampel, Martin, Vienna Univ. of Tech.<br />
This paper presents a novel method for object tracking. In the first step, interest points are detected and feature descriptors<br />
around them are calculated. Sets of known points are created, allowing tracking based on point matching. The set representation<br />
is updated online at every tracking step. Our method uses one-shot learning with the first frame, so no offline<br />
or supervised learning is required. Following an object-recognition-based approach, there is no need for a background<br />
model or motion model, allowing tracking of abrupt motion and with non-stationary cameras. We compare our method to<br />
Mean Shift and Tracking via Online Boosting, showing the benefits of our approach.<br />
09:00-11:10, Paper ThAT8.33<br />
Stochastic Filtering of Level Sets for Curve Tracking<br />
Avenel, Christophe, Irisa<br />
Memin, Etienne<br />
Perez, Patrick<br />
This paper focuses on the tracking of free curves using non-linear stochastic filtering techniques. It relies on a particle<br />
filter that includes color measurements. The curve and its velocity are defined through two coupled implicit level set<br />
representations. The stochastic dynamics of the curve is expressed directly on the level set function associated with the curve<br />
representation and incorporates a velocity field captured from the second level set attached to the past curve's<br />
point locations. The curve's dynamics combines a low-dimensional noise model and a data-driven local force. We demonstrate<br />
how this approach allows the tracking of highly and rapidly deforming objects, such as convective cells in infra-red<br />
satellite images, while providing a location-dependent assessment of the estimation confidence.<br />
09:00-11:10, Paper ThAT8.34<br />
Scalable Cage-Driven Feature Detection and Shape Correspondence for 3D Point Sets<br />
Seversky, Lee, State Univ. of New York at Binghamton<br />
Yin, Lijun, State Univ. of New York at Binghamton<br />
We propose an automatic deformation-driven correspondence algorithm for 3D point sets of non-rigid articulated shapes.<br />
Our approach uses simple geometric cages to embed the point set data and extract and match a coarse set of prominent<br />
features. We seek feature correspondences which lead to low-distortion deformations of the cages while satisfying the feature<br />
pairing. Our approach operates on the simplified geometric domain of the cage instead of the more complex 3D point<br />
data. Thus, it is robust to noise and partial occlusions, and insensitive to non-regular sampling. We demonstrate the potential<br />
of our approach by finding pairwise correspondences for sequences of acquired time-varying 3D scan point data.<br />
09:00-11:10, Paper ThAT8.35<br />
Event Recognition based on Top-Down Motion Attention<br />
Li, Li, Chinese Acad. of Sci.<br />
Hu, Weiming, Chinese Acad. of Sci.<br />
Li, Bing, Chinese Acad. of Sci.<br />
Yuan, Chunfeng, Chinese Acad. of Sci.<br />
Zhu, Pengfei, Chinese Acad. of Sci.<br />
Li, Wanqing, Univ. of Wollongong<br />
How to fuse static and dynamic information is a key issue in event analysis. In this paper, a top-down motion-guided<br />
fusion method is proposed for recognizing events in unconstrained news video. In the method, static information is<br />
represented as a bag of SIFT features, and motion information is employed to generate an event-specific attention map to<br />
direct the sampling of interest points. We build class-specific motion histograms for each event so as to give more<br />
weight to the interest points that are discriminative for the corresponding event. Experimental results on the TRECVID 2005<br />
video corpus demonstrate that the proposed method can improve the mean average accuracy of recognition.<br />
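The attention-guided weighting can be illustrated with a toy weighted bag-of-words histogram, in which each interest point's vote is scaled by a motion-attention value; `attention_weighted_bow` and its arguments are hypothetical simplifications, not the paper's actual pipeline.<br />

```python
import numpy as np

def attention_weighted_bow(word_ids, attention, n_words):
    """Build a bag-of-words histogram in which each interest point's
    vote is scaled by its motion-attention weight, then L1-normalized.
    """
    hist = np.zeros(n_words)
    for w, a in zip(word_ids, attention):
        hist[w] += a
    s = hist.sum()
    return hist / s if s > 0 else hist
```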
09:00-11:10, Paper ThAT8.36<br />
Construction of Precise Local Affine Frames<br />
Mikulik, Andrej, CMP FEE, CTU Prague<br />
Matas, Jiri, CTU Prague<br />
Perdoch, Michal, CMP, FEE, CTU Prague<br />
Chum, Ondrej,<br />
We propose a novel method for the refinement of Maximally Stable Extremal Region (MSER) boundaries to sub-pixel<br />
precision by taking into account the intensity function in the 2x2 neighborhood of the contour points. The proposed method<br />
improves the repeatability and precision of Local Affine Frames (LAFs) constructed on extremal regions. Additionally,<br />
we propose a novel method for detection of local curvature extrema on the refined contour. Experimental evaluation on<br />
publicly available datasets shows that matching with the modified LAFs leads to a higher number of correspondences and<br />
a higher inlier ratio in more than 80% of the test image pairs. Since the processing time of the contour refinement is negligible,<br />
there is no reason not to include the algorithms as a standard part of the MSER detector and LAF construction.<br />
09:00-11:10, Paper ThAT8.37<br />
Foreground Segmentation via Background Modeling on Riemannian Manifolds<br />
Caseiro, Rui, Univ. of Coimbra<br />
Henriques, João F, Univ. of Coimbra<br />
Batista, Jorge, Univ. of Coimbra<br />
Statistical modeling in color space is a widely used approach to background modeling for foreground segmentation. Nevertheless,<br />
computing such statistics directly on image values is sometimes not enough to achieve good discrimination.<br />
Thus the image may be converted into a more information-rich form, such as a tensor field, in which color<br />
and gradients can be encoded. In this paper, we exploit the theoretically well-founded differential geometric properties of the Riemannian<br />
manifold where tensors lie. We propose a novel and efficient approach for foreground segmentation on tensor fields based<br />
on data modeling by means of Gaussian mixture models (GMM) directly in the tensor domain. We introduce an Expectation-<br />
Maximization (EM) algorithm to estimate the mixture parameters, and propose two algorithms based on an online<br />
K-means approximation of EM in order to speed up the process. Theoretical analysis and experimental evaluations demonstrate<br />
the promise and effectiveness of the proposed framework.<br />
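An online K-means approximation of EM can be sketched as a single incremental update per sample; the snippet below uses plain Euclidean distance, whereas the paper works with Riemannian distances in the tensor domain, so treat it as an illustrative simplification only.<br />

```python
import numpy as np

def online_kmeans_update(centers, counts, x):
    """One online K-means step: assign sample `x` to its nearest center
    and move that center toward `x` with step size 1/count, which makes
    each center the running mean of the samples assigned to it."""
    k = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    counts[k] += 1
    centers[k] = centers[k] + (x - centers[k]) / counts[k]
    return k
```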
09:00-11:10, Paper ThAT8.38<br />
Robust Human Behavior Modeling from Multiple Cameras<br />
Kosmopoulos, D., NCSR Demokritos<br />
Voulodimos, Athanasios, National Tech. Univ. of Athens<br />
Varvarigou, Theodora, National Tech. Univ. of Athens<br />
In this work, we propose a framework for classifying structured human behavior in complex real environments, where<br />
problems such as frequent illumination changes and heavy occlusions are expected. Since target recognition and tracking<br />
can be very challenging, we bypass these problems by employing an approach similar to Motion History Images for feature<br />
extraction. Furthermore, to tackle outliers residing within the training data, which might severely affect the training algorithm<br />
of models with Gaussian observation likelihoods, we scrutinize the effectiveness of the multivariate Student-t distribution<br />
as the observation likelihood of the employed Hidden Markov Models. Additionally, the problem of visibility<br />
and occlusions is addressed by providing various extensions of the framework for multiple cameras, both at the feature<br />
and at the state level. Finally, we evaluate the performance of the examined approaches under real-life visual behavior understanding<br />
scenarios and we compare and discuss the obtained results.<br />
09:00-11:10, Paper ThAT8.39<br />
Unsupervised Learning of Activities in Video using Scene Context<br />
Oh, Sangmin, Kitware Inc.<br />
Hoogs, Anthony, Kitware Inc.<br />
Unsupervised learning of semantic activities from video collected over time is an important problem for visual surveillance<br />
and video scene understanding. Our goal is to cluster tracks into semantically interpretable activity models that are independent<br />
of scene locations; most previous work in video scene understanding is focused on learning location-specific normalcy<br />
models. Location-independent models can be used to detect instances of the same activity anywhere in the scene,<br />
or even across multiple scenes. Our insight for this unsupervised activity learning problem is to incorporate scene context<br />
to characterize the behavior of every track. By scene context, we mean local scene structures, such as building entrances,<br />
parking spots and roads, that moving objects frequently interact with. Each track is attributed with a large number of potentially<br />
useful features that capture the relationships and interactions with a set of existing scene context elements. Once<br />
feature vectors are obtained, tracks are grouped in this feature space using state-of-the-art clustering techniques, without<br />
considering scene location. Experiments are conducted on webcam video of a complex scene, with many interacting<br />
objects and very noisy tracks resulting from low frame rates and poor image quality. Our results demonstrate that location-independent<br />
and semantically interpretable groupings can be successfully obtained using unsupervised clustering<br />
methods, and that the models are superior to standard location-dependent clustering.<br />
09:00-11:10, Paper ThAT8.40<br />
Multipath Interference Compensation in Time-of-Flight Camera Images<br />
Fuchs, Stefan, German Aerospace Center<br />
Multipath interference is inherent to the working principle of a Time-of-flight camera and can influence the measurements<br />
by several centimeters. Especially in applications that demand high accuracy, such as object localization for robotic<br />
manipulation or ego-motion estimation of mobile robots, multipath interference is not tolerable. In this paper we formulate<br />
a multipath model in order to estimate the interference and correct the measurements. The proposed approach incorporates<br />
the measured scene structure. All distracting surfaces are assumed to be Lambertian radiators and the directional interference<br />
is simulated for correction purposes. The positive impact of these corrections is experimentally demonstrated.<br />
09:00-11:10, Paper ThAT8.41<br />
Segment-Based Foreground Extraction Dedicated to 3D Reconstruction<br />
Kim, Jungwhan, Soongsil Univ.<br />
Park, Anjin, AIST<br />
Jung, Keechul, Soongsil Univ.<br />
Research on image-based 3D reconstruction has recently produced a number of good results, but it assumes that an accurate<br />
foreground to be reconstructed has already been extracted from each input image. This paper proposes a novel approach<br />
to extract more accurate foregrounds by iteratively performing foreground extraction and 3D reconstruction, in a manner<br />
similar to an EM algorithm, on regions segmented in an initial stage, called segments. After definitively extracting the<br />
foregrounds in multi-views based on simply selecting segments corresponding to the real foreground in only one image,<br />
further improved foregrounds are extracted by back-projecting 3D objects reconstructed based on the foreground extracted<br />
in the previous step into segments of each image in multi-views. These two steps are iteratively performed until the energy<br />
function is optimized. In the experiments, more accurate boundaries were obtained, although the proposed method used a<br />
simple 3D reconstruction method.<br />
09:00-11:10, Paper ThAT8.42<br />
Human Pose Estimation for Multiple Persons based on Volume Reconstruction<br />
Luo, Xinghan, Utrecht Univ.<br />
Berendsen, Berend<br />
Tan, Robby T., Utrecht Univ.<br />
Veltkamp, R. C., Utrecht Univ.<br />
Most of the development in pose recognition has focused on a single person. However, many applications of computer vision<br />
essentially require the estimation of multiple people. Hence, in this paper, we address the problem of estimating the poses of<br />
multiple persons using volumes estimated from multiple cameras. One of the main issues that makes the multiple-person,<br />
multiple-camera setting problematic is the presence of ghost volumes. This problem arises when the projections of<br />
two different silhouettes of two different persons onto the 3D world overlap in a place where in fact there is no person.<br />
To solve this problem, we first introduce a novel principal-axis-based framework to estimate the 3D ground plane positions<br />
of multiple people, and then use the position cues to label the multi-person volumes (voxels), while considering<br />
the voxel connectivity. Having labeled the voxels, we fit the volume of each person with a body model, and determine the<br />
pose of the person based on the model. The results on real videos demonstrate the accuracy and efficiency of our approach.<br />
09:00-11:10, Paper ThAT8.43<br />
3D Articulated Shape Segmentation using Motion Information<br />
Kalafatlar, Emre, Koç Univ.<br />
Yemez, Yucel, Koç Univ.<br />
We present a method for segmentation of articulated 3D shapes by incorporating the motion information obtained from<br />
time-varying models. We assume that the articulated shape is given in the form of a mesh sequence with fixed connectivity<br />
so that the inter-frame vertex correspondences, hence the vertex movements, are known a priori. We use different postures<br />
of an articulated shape in multiple frames to constitute an affinity matrix which encodes both temporal and spatial similarities<br />
between surface points. The shape is then decomposed into segments in the spectral domain based on the affinity<br />
matrix using a standard K-means clustering algorithm. The performance of the proposed segmentation method is demonstrated<br />
on the mesh sequence of a human actor.<br />
09:00-11:10, Paper ThAT8.44<br />
Online Learning with Self-Organizing Maps for Anomaly Detection in Crowd Scenes<br />
Feng, Jie, Peking Univ.<br />
Zhang, Chao, Peking Univ.<br />
Hao, Pengwei, Queen Mary Univ. of London<br />
Detecting abnormal behaviors in crowd scenes is quite important for public security and has received more and more attention.<br />
Most previous methods use an offline-trained model to perform detection, which cannot handle the constantly changing<br />
crowd environment. In this paper, we propose a novel unsupervised algorithm to detect abnormal behavior patterns in<br />
crowd scenes with online learning. The crowd behavior pattern is extracted from a local spatio-temporal volume which<br />
consists of multiple motion patterns in temporal order. An online self-organizing map (SOM) is used to model the large<br />
number of behavior patterns in the crowd. Each neuron can be updated by incrementally learning from new observations. To<br />
demonstrate the effectiveness of our proposed method, we have performed experiments on real-world crowd scenes. The<br />
online learning efficiently reduces false alarms while still detecting most of the anomalies.<br />
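The incremental SOM update underlying such a model can be sketched for a 1-D lattice of neurons as follows; the learning rate and Gaussian neighbourhood width are illustrative assumptions, not the paper's settings.<br />

```python
import numpy as np

def som_update(weights, x, sigma=1.0, lr=0.1):
    """One online SOM step: find the best matching unit (BMU) for the
    observation `x` and pull every neuron toward `x`, scaled by a
    Gaussian falloff over lattice distance to the BMU."""
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    idx = np.arange(len(weights))
    h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)  # in-place update
    return bmu
```

An observation far from every neuron (large BMU distance) would then be flagged as a candidate anomaly.<br />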
09:00-11:10, Paper ThAT8.45<br />
Scene Classification using Spatial Pyramid of Latent Topics<br />
Ergul, Emrah, Turkish Naval Academy<br />
Arica, Nafiz, Turkish Naval Academy<br />
We propose a scene classification method, which combines two popular methods in the literature: Spatial Pyramid Matching<br />
(SPM) and probabilistic Latent Semantic Analysis (pLSA) modeling. The proposed scheme, called Cascaded pLSA, performs<br />
pLSA in a hierarchical sense after a soft-weighted BoW representation based on dense local features is extracted.<br />
We associate spatial layout information by dividing each image into overlapping regions iteratively at different resolution<br />
levels and implementing a pLSA model for each region individually. Finally, an image is represented by concatenated<br />
topic distributions of each region. In performance evaluation, we compare the proposed method with the most successful<br />
methods in the literature, using the popular 15-class dataset. In the experiments, our method slightly outperforms<br />
the others on that particular dataset.<br />
09:00-11:10, Paper ThAT8.46<br />
Optimization of Target Objects for Natural Feature Tracking<br />
Gruber, Lukas, Graz Univ. of Tech.<br />
Zollmann, Stefanie, Graz Univ. of Tech.<br />
Wagner, Daniel, Graz Univ. of Tech.<br />
Schmalstieg, Dieter, Graz Univ. of Tech.<br />
Hollerer, Tobias, UCSB<br />
This paper investigates possible physical alterations of tracking targets to obtain improved 6DoF pose detection for a<br />
camera observing the known targets. We explore the influence of several texture characteristics on the pose detection, by<br />
simulating a large number of different target objects and camera poses. Based on statistical observations, we rank the importance<br />
of characteristics such as texturedness and feature distribution for a specific implementation of a 6DoF tracking<br />
technique. These findings allow informed modification strategies for improving the tracking target objects themselves, in<br />
the common case of man-made targets, as for example used in advertising. This fundamentally differs from and complements<br />
the traditional approach of leaving the targets unchanged while trying to optimize the tracking algorithms and parameters.<br />
09:00-11:10, Paper ThAT8.47<br />
View-Invariant Action Recognition using Rank Constraint<br />
Ashraf, Nazim, Univ. of Central Florida<br />
Shen, Yuping, Univ. of Central Florida<br />
Foroosh, Hassan, Univ. of Central Florida<br />
We propose a new method for view-invariant action recognition based on the rank constraint on the family of planar homographies<br />
associated with triplets of body points. We represent action as a sequence of poses and we use the fact that the<br />
family of homographies associated with two identical poses would have rank 4 to gauge similarity of the pose between<br />
two subjects, observed by different perspective cameras and from different viewpoints. Extensive experimental results<br />
show that our method can accurately identify action from video sequences when they are observed from totally different<br />
viewpoints with different camera parameters.<br />
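The rank test at the heart of this method can be sketched with a numerical rank computed from singular values; this is only an illustrative check of the rank constraint on a stacked matrix, not the full recognition pipeline.<br />

```python
import numpy as np

def numerical_rank(M, tol=1e-8):
    """Numerical rank via singular values: count those above a relative
    tolerance.  For the paper's constraint, a matrix stacking the family
    of homographies from two identical poses should test as rank <= 4."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))
```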
09:00-11:10, Paper ThAT8.48<br />
Coarse-To-Fine Particle Filter by Implicit Motion Estimation for 3D Head Tracking on Mobile Devices<br />
Sung, Hacheon, Yonsei Univ.<br />
Choi, Kwontaeg, Yonsei Univ.<br />
Byun, Hyeran, Yonsei Univ.<br />
Due to the wide spread of mobile devices over the years, a low-cost implementation of an efficient head tracking system<br />
is becoming more useful for a wide range of applications. In this paper, we make an attempt at solving the real-time 3D head<br />
tracking problem on mobile devices by enhancing the fitness of the dynamics. In our method, the particles are generated<br />
by implicit motion estimation between two particles rather than explicit motion estimation using corresponding-point<br />
matching between two consecutive frames. This generation is applied iteratively using a coarse-to-fine strategy in order to<br />
handle large motion using a small number of particles. This reduces the computational cost while preserving the performance.<br />
We evaluate the efficiency and effectiveness of the proposed algorithm by empirical experiments. Finally, we demonstrate<br />
our method on a recent mobile phone.<br />
09:00-11:10, Paper ThAT8.49<br />
Visibility of Multiple Cameras in a Scene with Unknown Geometry<br />
Zhang, Liuxin, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
In this paper, we investigate the problem of determining the visible regions of multiple cameras in a 3D scene without a<br />
priori knowledge of the scene geometry. Our approach is based on a variational energy functional where both the unresolved<br />
visibility information of multiple cameras and the unknown scene geometry are included. We cast visibility estimation<br />
and scene geometry reconstruction as an optimization of the variational energy functional amenable for minimization with<br />
the Euler-Lagrange driven evolution. Starting from any initial value, the accurate visibility of multiple cameras as well as<br />
the true scene geometry can be obtained at the end of the evolution. Experimental results show the validity of our approach.<br />
09:00-11:10, Paper ThAT8.50<br />
Low-Level Image Segmentation based Scene Classification<br />
Akbas, Emre, Univ. of Illinois<br />
Ahuja, Narendra, Univ. of Illinois<br />
This paper is aimed at evaluating the semantic information content of multiscale, low-level image segmentation. As a<br />
method of doing this, we use selected features of segmentation for semantic classification of real images. To estimate the<br />
relative measure of the information content of our features, we compare the results of classifications we obtain using them<br />
with those obtained by others using the commonly used patch/grid based features. To classify an image using segmentation<br />
based features, we model the image in terms of a probability density function, a Gaussian mixture model (GMM) to be<br />
specific, of its region features. This GMM is fit to the image by adapting a universal GMM which is estimated so it fits<br />
all images. Adaptation is done using a maximum a posteriori criterion. We use kernelized versions of the Bhattacharyya distance<br />
to measure the similarity between two GMMs and support vector machines to perform classification. We outperform previously<br />
reported results on a publicly available scene classification dataset. These results suggest further experimentation<br />
in evaluating the promise of low level segmentation in image classification.<br />
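The GMM comparison can be illustrated for a single component: the closed form below is the standard Bhattacharyya distance between two multivariate Gaussians, while the paper applies kernelized versions of the distance to whole adapted GMMs.<br />

```python
import numpy as np

def bhattacharyya_gauss(m1, S1, m2, S2):
    """Bhattacharyya distance between two multivariate Gaussians
    N(m1, S1) and N(m2, S2): a Mahalanobis-like mean term plus a
    covariance-mismatch log-determinant term."""
    S = 0.5 * (S1 + S2)
    d = m1 - m2
    term_mean = 0.125 * d @ np.linalg.solve(S, d)
    term_cov = 0.5 * np.log(np.linalg.det(S) /
                            np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term_mean + term_cov
```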
09:00-11:10, Paper ThAT8.51<br />
Learning Scene Semantics using Fiedler Embedding<br />
Liu, Jingen, Univ. of Michigan<br />
Ali, Saad, Carnegie Mellon Univ.<br />
We propose a framework to learn scene semantics from surveillance videos. Using the learnt scene semantics, a video analyst<br />
can efficiently and effectively retrieve the hidden semantic relationship between homogeneous and heterogeneous<br />
entities existing in the surveillance system. For learning scene semantics, the algorithm treats different entities as nodes<br />
in a graph, where weighted edges between the nodes represent the “initial” strength of the relationship between entities.<br />
The graph is then embedded into a k-dimensional space by Fiedler Embedding.<br />
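Fiedler Embedding can be sketched via the eigenvectors of the graph Laplacian; the snippet below is a minimal unweighted-scaling version under that assumption and omits any eigenvalue scaling the full method may apply.<br />

```python
import numpy as np

def fiedler_embedding(W, k):
    """Embed graph nodes into k dimensions using the Laplacian
    eigenvectors with the smallest non-zero eigenvalues (the Fiedler
    vector and its successors).  W is a symmetric affinity matrix."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)  # ascending eigenvalues
    # Skip the trivial constant eigenvector at eigenvalue ~ 0.
    return vecs[:, 1:k + 1]
```

Entities close in this embedded space are the semantically related ones the abstract says an analyst could retrieve.<br />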
09:00-11:10, Paper ThAT8.52<br />
Counting Vehicles in Highway Surveillance Videos<br />
Tamersoy, Birgi, The Univ. of Texas at Austin<br />
Aggarwal, J. K., The Univ. of Texas at Austin<br />
This paper presents a complete system for accurately and efficiently counting vehicles in a highway surveillance video.<br />
The proposed approach employs vehicle detection and tracking modules. In the detection module, an automatically trained<br />
binary classifier detects vehicles while providing robustness against viewpoint changes, poor video quality and clutter. Efficient<br />
tracking is then achieved by a simplified multi-hypothesis approach. First an over-complete set of tracks is created considering<br />
every observed detection within a time interval. As needed, hypothesized detections are generated to force continuous<br />
tracks. Finally, a scoring function is used to separate the valid tracks in the over-complete set. Our tracking system<br />
achieved accurate results in significantly challenging highway surveillance videos.<br />
09:00-11:10, Paper ThAT8.53<br />
Efficient 3D Upper Body Tracking with Self-Occlusions<br />
Chen, Jixu, RPI<br />
Ji, Qiang, RPI<br />
We propose an efficient 3D upper body tracking method, which recovers the positions and orientations of six upper-body<br />
parts from the video sequence. Our method is based on a probabilistic graphical model (PGM), which incorporates the<br />
spatial relationships among the body parts, and a robust multi-view image likelihood using probabilistic PCA (PPCA).<br />
For efficiency, we use a tree-structured graphical model and particle-based belief propagation to perform the<br />
inference. Since our image likelihood is based on multiple views, we address the self-occlusion by modeling the likelihood<br />
of the body part in each view, and automatically decrease the influence of the occluded view in the inference procedure.<br />
09:00-11:10, Paper ThAT8.54<br />
Track Initialization in Low Frame Rate and Low Resolution Videos<br />
Cuntoor, Naresh, Kitware Inc.<br />
Basharat, Arslan, Kitware Inc.<br />
Perera, A. G. Amitha, Kitware Inc.<br />
Hoogs, Anthony, Kitware Inc.<br />
The problem of object detection and tracking has received relatively little attention in low frame rate and low resolution<br />
videos. Here we focus on motion segmentation in videos where objects appear small (people less than 30 pixels tall) and<br />
the frame rate is low (less than 5 Hz). We study challenging cases where some otherwise successful approaches<br />
may break down. We investigate a number of popular techniques in computer vision that have been shown to be useful<br />
for discriminating various spatio-temporal signatures. These include Histograms of Oriented Gradients (HOG), Histograms<br />
of Oriented optical Flow (HOF) and Haar features (Viola and Jones). We use these features to classify the motion segmentations<br />
into person vs. other and vehicle vs. other. We rely on aligned motion history images to create a more consistent<br />
object representation across frames. We present results on these features using webcam data and wide-area aerial video<br />
sequences.<br />
09:00-11:10, Paper ThAT8.55<br />
On the Performance of Handoff and Tracking in a Camera Network<br />
Li, Yiming, Univ. of California Riverside<br />
Bhanu, Bir, Univ. of California Riverside<br />
Nguyen, Vincent, Univ. of California Riverside<br />
Camera handoff is an important problem when using multiple cameras to follow a number of objects in a video network.<br />
However, almost all the handoff techniques rely on a robust tracker. State-of-the-art techniques used to evaluate the performance<br />
of camera handoff use either annotated videos or simulated data, and the handoff performance is evaluated in<br />
conjunction with a tracker. This does not allow a deeper understanding into the performance of a tracker and a handoff<br />
technique separately in real-world settings. In this paper, we evaluate three camera handoff techniques and two different<br />
color-based trackers in seven real-life cases, with varying numbers of cameras and objects, and changing environmental<br />
conditions. We also perform experiments on annotated videos to provide the ground truth for all the scenarios.<br />
This evaluation of performance isolates the effect of tracking and handoff techniques and clarifies their role in a video<br />
network.<br />
09:00-11:10, Paper ThAT8.56<br />
Object Tracking with Ratio Cycles using Shape and Appearance Cues<br />
Sargin, Mehmet Emre, UC Santa Barbara<br />
Ghosh, Pratim, UC Santa Barbara<br />
Manjunath, B. S., UC Santa Barbara<br />
Rose, Kenneth, UC Santa Barbara<br />
We present a method for object tracking over time sequence imagery. The image plane is represented with a 4-connected<br />
planar graph where vertices are associated with pixels. On each image, the outer contour of the object is localized by<br />
finding the optimal cycle in the graph such that a cost function based on temporal, appearance and shape priors is minimized.<br />
Our contribution is the particle filtering-based framework to integrate the shape cue with the temporal and appearance<br />
cues. We demonstrate that incorporating the shape prior yields promising performance improvement over temporal<br />
and appearance priors on various object tracking scenarios.<br />
09:00-11:10, Paper ThAT8.57<br />
Real-Time Abnormal Event Detection in Complicated Scenes<br />
Shi, Yinghuan, Nanjing Univ.<br />
Gao, Yang, Nanjing Univ.<br />
Wang, Ruili, Massey Univ.<br />
In this paper, we propose a novel real-time abnormal event detection framework that requires a short training period<br />
and has a fast processing speed. Our approach is based on phase correlation and our newly developed spatial-temporal<br />
co-occurrence Gaussian mixture models (STCOG), with the following steps: (i) a frame is divided into non-overlapping<br />
local regions; (ii) phase correlation is used to estimate the motion vectors between two successive frames for all corresponding<br />
local regions; and (iii) STCOG is used to model normal events and detect abnormal events if any deviation<br />
from the trained STCOG is found. Our proposed approach is also able to update the parameters incrementally and can<br />
be applied in complicated scenes. The proposed approach outperforms previous ones in terms of shorter training periods<br />
and lower computational complexity.<br />
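Step (ii), phase correlation, can be sketched with the normalized cross-power spectrum; the function below recovers an integer translation between two equally sized patches and is a minimal illustration, not the paper's implementation.<br />

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer translation (dy, dx) such that
    b ~= np.roll(a, (dy, dx), axis=(0, 1)), via the peak of the
    inverse FFT of the normalized cross-power spectrum."""
    F = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    F /= np.abs(F) + 1e-12  # keep only phase information
    corr = np.fft.ifft2(F).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts beyond half the patch into negative displacements.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)
```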
ThAT9 Lower Foyer<br />
Human Computer Interaction and Biometrics Poster Session<br />
Session chair: Alba Castro, Jose Luis (Univ. of Vigo)<br />
09:00-11:10, Paper ThAT9.1<br />
Encoding Actions via Quantized Vocabulary of Averaged Silhouettes<br />
Wang, Liang, The Univ. of Melbourne<br />
Leckie, Christopher, The Univ. of Melbourne<br />
Human action recognition from video clips has received increasing attention in recent years. This paper proposes a simple<br />
yet effective method for the problem of action recognition. The method aims to encode human actions using the quantized<br />
vocabulary of averaged silhouettes that are derived from space-time windowed shapes and implicitly capture local temporal<br />
motion as well as global body shape. Experimental results on the publicly available Weizmann dataset have demonstrated<br />
that, despite its simplicity, our method is effective for recognizing actions, and is comparable to other state-of-the-art methods.<br />
09:00-11:10, Paper ThAT9.2<br />
Action Recognition using Space-Time Shape Difference Images<br />
Qu, Hao, The Univ. of Melbourne<br />
Wang, Liang, The Univ. of Melbourne<br />
Leckie, Christopher, The Univ. of Melbourne<br />
A common approach to human action recognition is to use 2-D silhouettes in the space-time volume as a basis for further<br />
extraction of useful features. In this paper, we present a novel motion representation based on difference images. We show<br />
that this representation exploits the dynamics of motion, and show its effectiveness in action recognition. Moreover, experimental<br />
results demonstrate that this method is highly accurate and is not sensitive to the resolution of the video.<br />
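A plausible reading of the core representation is the absolute frame-to-frame difference of binary silhouettes; the sketch below shows only that base step, on top of which the paper's shape-difference descriptor would be built.<br />

```python
import numpy as np

def difference_images(silhouettes):
    """Absolute differences between consecutive binary silhouettes in a
    (T, H, W) stack; nonzero pixels mark where the shape changed."""
    s = np.asarray(silhouettes, dtype=np.int8)
    return np.abs(s[1:] - s[:-1]).astype(np.uint8)
```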
09:00-11:10, Paper ThAT9.3<br />
A Brain Computer Interface for Communication using Real-Time fMRI<br />
Eklund, Anders, Linköping Univ.<br />
Andersson, Mats, Linköping Univ.<br />
Ohlsson, Henrik, Linköping Univ.<br />
Ynnerman, Anders, Linköping Univ.<br />
Knutsson, Hans,<br />
We present the first step towards a brain computer interface (BCI) for communication using real-time functional magnetic<br />
resonance imaging (fMRI). The subject in the MR scanner sees a virtual keyboard and steers a cursor to select different<br />
letters that can be combined to create words. The cursor is moved to the left by activating the left hand, to the right by activating<br />
the right hand, down by activating the left toes and up by activating the right toes. To select a letter, the subject<br />
simply rests for a number of seconds. We can thus communicate with the subject in the scanner by, for example, showing<br />
questions that the subject can answer. Similar BCIs for communication have been built with electroencephalography<br />
(EEG). In those implementations the subject, for example, focuses on a letter while different rows and columns of the virtual<br />
keyboard flash, and the system tries to detect whether the correct letter is flashing. In our setup we instead classify<br />
the brain activity. Our system is not limited to a communication interface, but can be used for any interface where five degrees<br />
of freedom are necessary.<br />
09:00-11:10, Paper ThAT9.4<br />
Combined Top-Down/Bottom-Up Human Articulated Pose Estimation using AdaBoost Learning<br />
Wang, Sheng, Tsinghua Univ.<br />
Ai, Haizhou, Tsinghua Univ.<br />
Yamashita, Takayoshi, OMRON Corp.<br />
Lao, Shihong, OMRON Corp.<br />
In this paper, a novel human articulated pose estimation method based on AdaBoost algorithm is presented. The human<br />
articulated pose is estimated by locating major human joint positions. We learn the classifiers on a normalized image for<br />
classifying each pixel position into a certain category. Two different kinds of classifiers, bottom-up joint position classifier<br />
and top-down skeleton classifier, are combined to achieve final results. HOG (Histogram of Oriented Gradient) feature is<br />
used for training both classifiers. Our human pose estimation system consists of three modules: human detection, view classification,<br />
and pose estimation. The implemented system can automatically estimate human pose from different views. Experimental<br />
results show that our proposed method can work on relatively small human images without using<br />
human silhouettes as a prerequisite, and is efficient, robust and accurate enough for potential applications in visual<br />
surveillance.<br />
09:00-11:10, Paper ThAT9.5<br />
The Human Action Image<br />
Sethi, Ricky, Univ. of California, Riverside<br />
Roy-Chowdhury, Amit, Univ. of California, Riverside<br />
Recognizing a person’s motion is intuitive for humans but represents a challenging problem in machine vision. In this<br />
paper, we present a multi-disciplinary framework for recognizing human actions. We develop a novel descriptor, the<br />
Human Action Image (HAI): a physically-significant, compact representation for the motion of a person, which we derive<br />
from first principles in physics using Hamilton’s Action. We embed the HAI as the Motion Energy Pathway of the latest<br />
neurobiological model of motion recognition. The Form Pathway is modelled using existing low-level feature descriptors<br />
based on shape and appearance. Experimental validation of the theory is provided on the well-known Weizmann and USF<br />
Gait datasets.<br />
09:00-11:10, Paper ThAT9.6<br />
Combining Spatial and Temporal Information for Gait based Gender Classification<br />
Hu, Maodi, Beihang Univ.<br />
Wang, Yunhong, Beihang Univ.<br />
Zhang, Zhaoxiang, Beihang Univ.<br />
Wang, Yiding, North China Univ. of Tech.<br />
In this paper, we address the problem of gait-based gender classification. The Gabor feature, a new attempt for<br />
gait analysis, not only improves robustness to segmentation noise but also provides a feasible way to purge additional<br />
influence factors, such as changes in clothing and carrying condition, before supervised learning. Furthermore, by means<br />
of Maximization of Mutual Information (MMI), a low-dimensional discriminative representation is obtained as<br />
the Gabor-MMI feature. Gender-related Gaussian Mixture Model-Hidden Markov Models (GMM-HMMs) are then<br />
constructed for classification. Here, supervised learning reduces the dimension of the parameter space and significantly<br />
increases the gap between the likelihoods of the gender models. To assess the performance of our proposed<br />
approach, we compare it with other methods on the standard CASIA Gait Database (Dataset B). Experimental results<br />
demonstrate that our approach achieves a better Correct Classification Rate (CCR) than state-of-the-art methods.<br />
09:00-11:10, Paper ThAT9.7<br />
A Vision-Based Taiwanese Sign Language Recognition System<br />
Huang, Chung-Lin, National Tsing-Hua Univ.<br />
Tsai, Bo-Lin, National Tsing-Hua Univ.<br />
This paper presents a vision-based continuous sign language recognition system to interpret the Taiwanese Sign Language<br />
(TSL). The continuous sign language, which consists of a sequence of hold and movement segments, can be decomposed<br />
into non-signs and signs. The signs can be either static signs or dynamic signs. The former can be found in the hold<br />
segment, whereas the latter can be identified in the combination of hold and movement segments. We use a Support Vector<br />
Machine (SVM) to recognize the static signs and apply Hidden Markov Models (HMMs) to identify the dynamic signs. Finally, we use a<br />
finite state machine to verify the grammatical correctness of the recognized TSL sentence and to correct misrecognized<br />
signs.<br />
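As a sketch of the grammar-checking idea (not the paper's actual TSL grammar), a finite state machine over recognized sign tokens can be implemented as:<br />

```python
def make_fsm(transitions, start, accepting):
    """Build a deterministic FSM acceptor.

    transitions maps (state, token) -> next state; a missing entry
    rejects the sequence.
    """
    def accepts(tokens):
        state = start
        for tok in tokens:
            state = transitions.get((state, tok))
            if state is None:
                return False
        return state in accepting
    return accepts

# Toy subject-verb-object grammar (hypothetical, for illustration only)
svo = make_fsm(
    {("S", "subj"): "V", ("V", "verb"): "O", ("O", "obj"): "F"},
    start="S",
    accepting={"F"},
)
```

Sequences that end in a non-accepting state, or take an undefined transition, are flagged as ungrammatical and become candidates for correction.<br />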
09:00-11:10, Paper ThAT9.8<br />
Fusing Audio-Visual Nonverbal Cues to Detect Dominant People in Group Conversations<br />
Aran, Oya, Idiap Res. Inst.<br />
Gatica-Perez, Daniel,<br />
This paper addresses the multimodal nature of social dominance and presents multimodal fusion techniques to combine<br />
audio and visual nonverbal cues for dominance estimation in small group conversations. We combine the two modalities<br />
both at the feature extraction level and at the classifier level via score and rank level fusion. The classification is done by<br />
a simple rule-based estimator. We perform experiments on a new 10-hour dataset derived from the popular AMI meeting<br />
corpus. We objectively evaluate the performance of each modality and each cue alone and in combination. Our results<br />
show that the combination of audio and visual cues is necessary to achieve the best performance.<br />
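Score- and rank-level fusion of the kind described above can be sketched as follows (min-max normalization and Borda-style rank summing are common choices; the paper's exact rules may differ):<br />

```python
def score_fusion(scores_a, scores_b, w=0.5):
    """Weighted-sum score-level fusion after min-max normalization."""
    def norm(s):
        lo, hi = min(s), max(s)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in s]
    na, nb = norm(scores_a), norm(scores_b)
    return [w * a + (1 - w) * b for a, b in zip(na, nb)]

def rank_fusion(scores_a, scores_b):
    """Borda-style rank-level fusion: lower summed rank = more dominant."""
    def ranks(s):
        order = sorted(range(len(s)), key=lambda i: -s[i])
        r = [0] * len(s)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(scores_a), ranks(scores_b)
    return [a + b for a, b in zip(ra, rb)]
```

For example, per-person dominance scores from the audio and visual cues can be fused and the person with the highest fused score (or lowest summed rank) declared most dominant.<br />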
09:00-11:10, Paper ThAT9.9<br />
Wavelet Domain Local Binary Pattern Features for Writer Identification<br />
Du, Liang, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Xu, Huihui, Huazhong Univ. of Science and Tech.<br />
Gao, Zhifan, Huazhong Univ. of Science and Tech.<br />
Tang, Yuanyan, Hongkong Baptist University<br />
The representation of writing styles is a crucial step of writer identification schemes. However, the large intra-writer variance<br />
makes it a challenging task. Thus, a good feature of writing style plays a key role in writer identification. In this<br />
paper, we present a simple and effective feature for off-line, text-independent writer identification, namely wavelet domain<br />
local binary patterns (WD-LBP). Based on WD-LBP, a writer identification algorithm is developed. WD-LBP is able to<br />
capture the essential characteristics of a writer while ignoring the variations intrinsic to each individual writer. Unlike other<br />
texture-based frameworks, we impose no statistical distribution assumption on the proposed method. This prevents<br />
us from making possibly erroneous assumptions about the feature distributions of handwritten images. The experimental<br />
results show that the proposed writer identification method achieves high identification accuracy and outperforms recent<br />
writer identification methods such as the wavelet-GGD model and the Gabor filtering method.<br />
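For readers unfamiliar with local binary patterns, here is a minimal sketch of the basic 8-neighbour LBP histogram, computed in the pixel domain (WD-LBP applies the pattern to wavelet subbands instead):<br />

```python
def lbp_code(img, r, c):
    """8-neighbour local binary pattern code at pixel (r, c)."""
    center = img[r][c]
    # neighbours enumerated clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin LBP histogram over the interior pixels of a 2-D list."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Such histograms, one per subband, can then be concatenated into a writer descriptor and compared with any histogram distance.<br />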
09:00-11:10, Paper ThAT9.10<br />
Audio-Visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space<br />
Nicolaou, Mihalis, Imperial Coll.<br />
Gunes, Hatice, Imperial Coll.<br />
Pantic, Maja, Imperial Coll.<br />
This paper focuses on audio-visual (facial expression, shoulder, and audio cues) classification of spontaneous affect,<br />
utilising generative models for classification: (i) Maximum Likelihood Classification, under the assumption that<br />
the generative model structure in the classifier is correct; and (ii) Likelihood Space Classification, under the assumption<br />
that the generative model structure in the classifier may be incorrect, so that classification performance can be<br />
improved by projecting the outputs of generative classifiers onto a likelihood space and then using discriminative classifiers.<br />
Experiments are conducted utilising Hidden Markov Models for single-cue classification, and 2- and 3-chain coupled<br />
Hidden Markov Models for fusing multiple cues and modalities. For discriminative classification, we utilise Support<br />
Vector Machines. Results show that Likelihood Space Classification (91.76%) improves upon Maximum<br />
Likelihood Classification (79.1%). We then introduce the concept of fusion in the likelihood space, which is shown<br />
to outperform the typically used model-level fusion, attaining a classification accuracy of 94.01% and further improving<br />
upon all previous results.<br />
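The likelihood-space idea can be sketched with single 1-D Gaussians standing in for the HMMs: each sample is mapped to its vector of per-class log-likelihoods, on which a discriminative classifier can then be trained (illustrative only):<br />

```python
import math

def gauss_loglik(x, mu, sigma):
    """Log-likelihood of x under a 1-D Gaussian N(mu, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def likelihood_space(x, models):
    """Project a sample onto its vector of per-class log-likelihoods."""
    return [gauss_loglik(x, mu, s) for mu, s in models]

def ml_classify(x, models):
    """Maximum Likelihood Classification: pick the most likely class."""
    ll = likelihood_space(x, models)
    return max(range(len(models)), key=lambda k: ll[k])
```

Maximum Likelihood Classification stops at the arg-max; Likelihood Space Classification instead feeds the whole likelihood vector to a discriminative classifier such as an SVM.<br />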
09:00-11:10, Paper ThAT9.12<br />
Improved Mandarin Keyword Spotting using Confusion Garbage Model<br />
Zhang, Shilei, IBM Res., China<br />
Shuang, Zhiwei, IBM Res., China<br />
Shi, Qin, IBM Res., China<br />
Qin, Yong, IBM Res., China<br />
This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model in<br />
Mandarin conversational speech. Examining the KWS corpus, we found many words whose pronunciation is similar to that of the<br />
predefined keywords, even though they have different Chinese characters and meanings; these words easily result in<br />
a high false alarm rate. We therefore developed an improved acoustic KWS method with confusion garbage models<br />
that absorb words whose pronunciation can be confused with the specific keywords of a given task. One obvious advantage of<br />
this method is that it provides a flexible framework to implement the selection procedure and effectively reduce the false alarm rate<br />
for a specific task. The efficiency of the proposed architecture was evaluated with HMM-based confidence<br />
measure (CM) methods and demonstrated on a conversational telephone dataset.<br />
09:00-11:10, Paper ThAT9.13<br />
Human Activity Recognition using Local Shape Descriptors<br />
Venkatesha, Sharath, Univ. of California, Santa Barbara<br />
Turk, Matthew, Univ. of California, Santa Barbara<br />
We propose a method for human activity recognition in videos, based on shape analysis. We define local shape descriptors<br />
for interest points on the detected contour of the human action and build an action descriptor using a Bag of Features<br />
method. We also use the temporal relation among matching interest points across successive video frames. Further, an<br />
SVM is trained on these action descriptors to classify the activity in the scene. The method is invariant to the length of the<br />
video sequence, and hence it is suitable for online activity recognition. We have demonstrated the results on an action database<br />
consisting of nine actions, such as walking, jumping, and bending, performed by twenty people in indoor and outdoor scenarios. The proposed<br />
method achieves an accuracy of 87%, which is comparable to other state-of-the-art methods.<br />
09:00-11:10, Paper ThAT9.14<br />
Use of Line Spectral Frequencies for Emotion Recognition from Speech<br />
Bozkurt, Elif, Koc Univ.<br />
Erzin, Engin, Koc Univ.<br />
Eroglu Erdem, Cigdem, Bahcesehir Univ.<br />
Erdem, Arif Tanju, Ozyegin Univ.<br />
We propose the use of line spectral frequency (LSF) features for emotion recognition from speech; to the best of our knowledge,<br />
they have not previously been employed for emotion recognition. Spectral features such as mel-scaled<br />
cepstral coefficients have already been used successfully to parameterize speech signals for emotion recognition.<br />
The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant<br />
structure, which is related to the emotional state of the speaker [4]. We use a Gaussian mixture model (GMM)<br />
classifier architecture, which captures the static color of the spectral features. Experimental studies performed on the Berlin<br />
Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF<br />
features bring a consistent improvement over the MFCC based emotion classification rates.<br />
09:00-11:10, Paper ThAT9.15<br />
Spatially Regularized Common Spatial Patterns for EEG Classification<br />
Lotte, Fabien, Inst. for Infocomm Res.<br />
Guan, Cuntai, Inst. for Infocomm Res.<br />
In this paper, we propose a new algorithm for Brain-Computer Interfaces (BCI): Spatially Regularized Common Spatial<br />
Patterns (SRCSP). SRCSP is an extension of the well-known CSP algorithm that incorporates spatial priors into the learning<br />
process by adding a regularization term that penalizes spatially non-smooth filters. We compared the SRCSP and CSP algorithms<br />
on data from 14 subjects from BCI competitions. Results suggest that SRCSP can improve classification accuracy,<br />
by around 10%, for subjects with poor CSP performance. They also suggest that SRCSP leads to<br />
more physiologically relevant filters than CSP.<br />
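One common way to write such a spatially regularized CSP objective (the precise penalty used by SRCSP is defined in the paper; the weight α and penalty matrix K below are assumptions of this sketch) is:<br />

```latex
% C_1, C_2: class covariance matrices of the band-passed EEG signals;
% K: a spatial penalty matrix built from electrode distances;
% alpha: regularization weight trading discrimination for smoothness.
w^{*} = \arg\max_{w} \;
  \frac{w^{\top} C_{1}\, w}{\; w^{\top} C_{2}\, w + \alpha\, w^{\top} K\, w \;}
```

With α = 0 this reduces to the standard CSP Rayleigh quotient; larger α favours spatially smoother filters.<br />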
09:00-11:10, Paper ThAT9.16<br />
Comparing Multiple Classifiers for Speech-Based Detection of Self-Confidence – A Pilot Study<br />
Krajewski, Jarek, Univ. of Wuppertal<br />
Batliner, Anton, Univ. of Erlangen-Nuremberg<br />
Kessel, Silke, Univ. of Wuppertal<br />
The aim of this study is to compare several classifiers commonly used within the field of speech emotion recognition<br />
(SER) for the speech-based detection of self-confidence. A standard acoustic feature set was computed, resulting in 170<br />
features per one-minute speech sample (e.g. fundamental frequency, intensity, formants, MFCCs). In order to identify<br />
speech correlates of self-confidence, the lectures of 14 female participants were recorded, resulting in 306 one-minute<br />
segments of speech. Five expert raters independently assessed the impression of self-confidence. Several classification models<br />
(e.g. Random Forest, Support Vector Machine, Naive Bayes, Multi-Layer Perceptron) and ensemble classifiers (AdaBoost,<br />
Bagging, Stacking) were trained. AdaBoost procedures achieved the best performance, both for single models<br />
(AdaBoost LR: 75.2% class-wise averaged recognition rate) and for average boosting (59.3%) in speaker-independent<br />
settings.<br />
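For reference, discrete AdaBoost of the kind compared in such studies can be sketched with one-dimensional threshold stumps (illustrative only; the study's classifiers operate on 170-dimensional acoustic feature vectors):<br />

```python
import math

def stump_predict(x, thr, sign):
    """Threshold stump: returns sign if x >= thr, else -sign."""
    return sign if x >= thr else -sign

def adaboost_train(xs, ys, rounds=10):
    """Discrete AdaBoost over 1-D threshold stumps (labels in {-1, +1})."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # pick the stump with the lowest weighted error
        best = None
        for thr in xs:
            for sign in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, thr, sign) != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = max(err, 1e-10)          # avoid log(0) on a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, sign))
        # reweight: boost the misclassified samples
        w = [wi * math.exp(-alpha * y * stump_predict(x, thr, sign))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def adaboost_predict(model, x):
    s = sum(a * stump_predict(x, thr, sign) for a, thr, sign in model)
    return 1 if s >= 0 else -1
```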
09:00-11:10, Paper ThAT9.17<br />
Hierarchical Human Action Recognition by Normalized-Polar Histogram<br />
Ziaeefard, Maryam, Sahand Univ. of Tech.<br />
Ebrahimnezhad, Hossein, Sahand Univ. of Tech.<br />
This paper proposes a novel human action recognition approach which represents each video sequence by a cumulative<br />
skeletonized image (CSI) over one action cycle. A normalized polar histogram corresponding to each CSI is computed,<br />
i.e., the number of CSI pixels located at given distances and angles within the normalized circle. Human action is recognized using<br />
hierarchical classification with two levels. At the first level, coarse classification is performed with<br />
all histogram bins. At the second level, the more similar actions are examined again using selected bins, and<br />
the fine classification is completed. We use a linear multi-class SVM as the classifier at both levels. The real human action dataset<br />
Weizmann is selected for evaluation. The resulting average recognition rate of the proposed method is 97.6%.<br />
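The normalized polar histogram can be sketched as follows (the paper's bin counts and normalization details may differ from this illustrative choice):<br />

```python
import math

def polar_histogram(points, center, radius, nr=3, na=8):
    """Count points over nr radial x na angular bins of a circle.

    Distances are normalized by radius, so all bins live inside the
    unit circle; points at or beyond the radius are ignored.
    """
    hist = [[0] * na for _ in range(nr)]
    cx, cy = center
    for x, y in points:
        d = math.hypot(x - cx, y - cy) / radius
        if d >= 1:
            continue
        a = math.atan2(y - cy, x - cx) % (2 * math.pi)
        hist[int(d * nr)][int(a / (2 * math.pi) * na)] += 1
    return hist
```

Applied to the foreground pixels of a CSI, the flattened bins form the feature vector for the SVM.<br />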
09:00-11:10, Paper ThAT9.18<br />
Automatic 3D Facial Expression Recognition based on a Bayesian Belief Net and a Statistical Facial Feature Model<br />
Zhao, Xi, Ec. Centrale de Lyon<br />
Huang, Di, Ec. Centrale de Lyon<br />
Dellandréa, Emmanuel, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Automatic facial expression recognition on 3D face data is still a challenging problem. In this paper we propose a novel<br />
approach to perform expression recognition automatically and flexibly by combining a Bayesian Belief Net (BBN) and<br />
a Statistical Facial Feature Model (SFAM). A novel BBN is designed for this specific problem with our proposed parameter<br />
computation method. By learning global variations in the face landmark configuration (morphology) and local ones in terms of<br />
texture and shape around landmarks, the morphable SFAM allows not only automatic<br />
landmarking but also computation of the beliefs that feed the BBN. Tested on the public 3D facial expression database<br />
BU-3DFE, our automatic approach recognizes expressions successfully, reaching an average recognition rate<br />
of over 82%.<br />
09:00-11:10, Paper ThAT9.19<br />
EEG-Based Personal Identification: From Proof-of-Concept to a Practical System<br />
Su, Fei, Beijing Univ. of Posts and Telecommunications<br />
Xia, Liwen, Beijing Univ. of Posts and Telecommunications<br />
Ma, Junshui, Merck Res. Lab. Merck & Co, Inc.<br />
Although the concept of using brain waves, e.g. the Electroencephalogram (EEG), for personal identification has been validated<br />
in several studies, some unanswered practical and theoretical questions prevent this technology from developing further<br />
toward commercialization. Based on a well-designed personal identification experiment using EEG recordings, this study addressed<br />
three of these questions: (1) the feasibility of using portable EEG equipment, (2) the necessity of controlling<br />
factors that influence the EEG, and (3) the optimal set of features. With our understanding of the answers to these questions, the<br />
EEG-based personal identification system we built achieved an average accuracy of 97.5% on a dataset with 40 subjects.<br />
The results of this study provide supporting evidence that taking EEG-based personal identification from proof of concept to system<br />
implementation is promising.<br />
09:00-11:10, Paper ThAT9.20<br />
Improved Facial Expression Recognition with Trainable 2-D Filters and Support Vector Machines<br />
Peiyao, Li, Univ. of Wollongong<br />
Phung, Son Lam, Univ. of Wollongong<br />
Bouzerdoum, Abdesselam, Univ. of Wollongong<br />
Tivive, Fok Hing Chi, Univ. of Wollongong<br />
Facial expression is one way humans convey their emotional states. Accurate recognition of facial expressions is essential<br />
in perceptual human-computer interfaces, robotics, and mimetic games. This paper presents a novel approach to facial expression<br />
recognition from static images that combines fixed and adaptive 2-D filters in a hierarchical structure. The fixed<br />
filters are used to extract primitive features. They are followed by the adaptive filters that are trained to extract more complex<br />
facial features. Both types of filters are non-linear and are based on the biological mechanism of shunting inhibition.<br />
The features are finally classified by a support vector machine. The proposed approach is evaluated on the JAFFE database<br />
with seven types of facial expressions: anger, disgust, fear, happiness, neutral, sadness and surprise. It achieves a classification<br />
rate of 96.7%, which compares favorably with several existing techniques for facial expression recognition tested<br />
on the same database.<br />
09:00-11:10, Paper ThAT9.21<br />
A Biologically-Inspired Top-Down Learning Model based on Visual Attention<br />
Sang, Nong, Huazhong Univ. of Science and Tech.<br />
Wei, Longsheng, Huazhong Univ. of Science and Tech.<br />
Wang, Yuehuan, Huazhong Univ. of Science and Tech.<br />
A biologically-inspired top-down learning model based on visual attention is proposed in this paper. Low-level visual features<br />
are extracted from the learning object itself and do not depend on background information. All the features are expressed<br />
as a feature vector, which is treated as a random variable following a normal distribution, so every learning object<br />
is represented by a mean and standard deviation. All the learning objects are combined into an object class, which is represented<br />
by the class mean and class standard deviation stored in long-term memory (LTM). The learned knowledge<br />
is then used to find similar locations in an attended image. Experimental results indicate that when the attended object<br />
does not appear against a background similar to that of the learning objects, or when their configurations change greatly between<br />
the learning images and the attended images, our model outperforms the top-down approach of VOCUS and Navalpakkam’s<br />
statistical model.<br />
09:00-11:10, Paper ThAT9.22<br />
Human Action Recognition using Segmented Skeletal Features<br />
Yoon, Sang Min, Tech. Univ. of Darmstadt<br />
Kuijper, Arjan, Fraunhofer IGD<br />
We present a novel human action recognition system based on segmented skeletal features which are separated into several<br />
human body parts such as face, torso and limbs. Our proposed human action recognition system consists of two steps: (i)<br />
automatic skeletal feature extraction and splitting by measuring the similarity in the space of diffusion tensor fields, and<br />
(ii) multiple-kernel Support Vector Machine based human action recognition. Experimental results on a set of test databases<br />
show that our proposed method is efficient and effective at recognizing human actions using few parameters, independently<br />
of dimensions, shadows, and viewpoints.<br />
09:00-11:10, Paper ThAT9.23<br />
Action Recognition by Multiple Features and Hyper-Sphere Multi-Class SVM<br />
Liu, Jia, Shanghai Jiao Tong Univ.<br />
Yang, Jie, Shanghai Jiao Tong Univ.<br />
Zhang, Yi, Shanghai Jiao Tong Univ.<br />
He, Xiangjian, University of Technology, Sydney<br />
In this paper we propose a novel framework for action recognition in videos based on multiple features. The fusion of<br />
multiple features is important for recognizing actions, as a single feature based representation is often not enough<br />
to capture the imaging variations (view-point, illumination, etc.) and attributes of individuals (size, age,<br />
gender, etc.). Hence, we use two kinds of features: (i) a quantized vocabulary of local spatio-temporal (ST) volumes (cuboids<br />
and 2-D SIFT), and (ii) higher-order statistical models of interest points, which aim to capture the global information<br />
about the actor. We construct video representations in terms of local space-time features and global features and integrate such<br />
representations with a hyper-sphere multi-class SVM. Experiments on publicly available datasets show that our proposed<br />
approach is effective. An additional experiment shows that using both local and global features provides a richer representation<br />
of human action than the use of a single feature type.<br />
09:00-11:10, Paper ThAT9.24<br />
Multimodal Recognition of Cognitive Workload for Multitasking in the Car<br />
Putze, Felix, Karlsruhe Inst. of Tech.<br />
Jarvis, Jan-Philip, Karlsruhe Inst. of Tech.<br />
Schultz, Tanja, Univ. Karlsruhe<br />
This work describes the development and evaluation of a recognizer for different levels of cognitive workload in the car.<br />
We collected multiple biosignal streams (skin conductance, pulse, respiration, EEG) during an experiment in a driving<br />
simulator in which the drivers performed a primary driving task and several secondary tasks of varying difficulty. From<br />
this data, an SVM based workload classifier was trained and evaluated, yielding recognition rates of up to for three levels<br />
of workload.<br />
09:00-11:10, Paper ThAT9.25<br />
Automatic Facial Action Detection using Histogram Variation between Emotional States<br />
Senechal, Thibaud, ISIR, UPMC<br />
Bailly, Kevin, Univ. PIERRE 1 MARIE CURIE - PARIS 6<br />
Prevost, Lionel, Univ. PIERRE 1 MARIE CURIE - PARIS 6<br />
This article presents an appearance-based method to automatically detect facial actions. Our approach focuses on reducing<br />
the sensitivity of the features to the identity of the subject. From an expressive image we compute a Local Gabor Binary Pattern (LGBP)<br />
histogram and synthesize an LGBP histogram approximating the one we would compute on a neutral face. The differences between<br />
these two histograms are used as inputs to Support Vector Machine (SVM) binary detectors with a new kernel:<br />
the Histogram Difference Intersection (HDI) kernel. Experimental results for 16 Action Units (AUs) on the<br />
benchmark Cohn-Kanade database compare favorably with two state-of-the-art methods.<br />
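The HDI kernel itself is defined in the paper; for orientation, the classical histogram intersection kernel it is named after is simply:<br />

```python
def hist_intersection(h1, h2):
    """Classical histogram intersection kernel: sum of bin-wise minima.

    A valid (positive-definite) SVM kernel for non-negative histograms;
    the paper's HDI kernel is its own variant, defined there.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))
```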
09:00-11:10, Paper ThAT9.27<br />
Decoding Finger Flexion from Electrocorticographic Signals using Sparse Gaussian Process<br />
Wang, Zuoguan, RPI<br />
Ji, Qiang, RPI<br />
Schalk, Gerwin, NYS Dept of Health<br />
Miller, Kai J., Univ. of Washington,<br />
A brain-computer interface (BCI) creates a direct communication pathway between the brain and an external device, and<br />
can thereby restore function in people with severe motor disabilities. A core component in a BCI system is the decoding<br />
algorithm that translates brain signals into action commands for an output device. Most current decoding algorithms are<br />
based on linear models (e.g., derived using linear regression), which may have important shortcomings. The use of nonlinear<br />
models (e.g., neural networks) could overcome some of these shortcomings, but has difficulties with high-dimensional<br />
feature spaces. Here we propose a decoding algorithm based on the sparse Gaussian process with pseudo-inputs<br />
(SPGP). As a nonparametric method, it can model more complex relationships than linear methods. As a<br />
kernel method, it can readily deal with high-dimensional feature spaces. The evaluations in this paper demonstrate<br />
that SPGP can decode the flexion of finger movements from electrocorticographic (ECoG) signals more accurately than<br />
a previously described algorithm that used a linear model. In addition, by formulating the problem in a Bayesian probabilistic<br />
framework, SPGP can provide an estimate of the prediction uncertainty. Furthermore, the trained SPGP offers a very effective<br />
way of identifying important features.<br />
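For context, these are the standard (full) Gaussian process regression predictions that SPGP approximates, where K is the train-train kernel matrix, k<sub>*</sub> the train-test kernel vector, and σ² the noise variance:<br />

```latex
\mu_{*} = k_{*}^{\top}\left(K + \sigma^{2} I\right)^{-1} y,
\qquad
\sigma_{*}^{2} = k(x_{*}, x_{*})
  - k_{*}^{\top}\left(K + \sigma^{2} I\right)^{-1} k_{*} + \sigma^{2}
% SPGP replaces K with a low-rank approximation induced by M << N
% pseudo-inputs, reducing training cost from O(N^3) to O(N M^2).
```

The predictive variance σ²<sub>*</sub> is what supplies the per-prediction uncertainty estimate mentioned in the abstract.<br />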
09:00-11:10, Paper ThAT9.28<br />
Hand Pointing Estimation for Human Computer Interaction based on Two Orthogonal-Views<br />
Hu, Kaoning, State Univ. of New York at Binghamton<br />
Canavan, Shaun, State Univ. of New York at Binghamton<br />
Yin, Lijun, State Univ. of New York at Binghamton<br />
Hand pointing has been an intuitive gesture for human interaction with computers. However, accurately<br />
estimating the finger pointing direction in 3D space remains a significant challenge. In this paper, we present a novel hand pointing estimation<br />
system based on two regular cameras, which includes hand region detection, hand finger estimation, feature<br />
detection in the two views, and 3D pointing direction estimation. Based on the idea of a binary-pattern face detector, we extend the work to<br />
hand detection, in which a polar coordinate system is proposed to represent the hand region, achieving good<br />
robustness to hand orientation variation. To estimate the pointing direction, we apply an AAM-based approach<br />
to detect and track 14 feature points along the hand contour from a top view and a side view. Combining the two views<br />
of the hand features, the 3D pointing direction is estimated. The experiments have demonstrated the feasibility of the system.<br />
09:00-11:10, Paper ThAT9.29<br />
A Brain-Computer Interface for Mental Arithmetic Task from Single-Trial Near-Infrared Spectroscopy Brain Signals<br />
Ang, Kai Keng, Inst. for Infocomm Res. A*STAR<br />
Guan, Cuntai, Inst. for Infocomm Res.<br />
Lee, Kerry, National Inst. of Education<br />
Lee, Jie Qi, National Inst. of Education<br />
Nioka, Shoko, Univ. of Pennsylvania<br />
Chance, Britton, Univ. of Pennsylvania<br />
Near-infrared spectroscopy (NIRS) enables non-invasive recording of cortical hemoglobin oxygenation in human subjects<br />
through the intact skull using light in the near-infrared range. Recently, NIRS-based brain-computer interfaces<br />
have been introduced for discriminating left- and right-hand motor imagery. A neuroimaging study has also revealed event-related<br />
hemodynamic responses associated with the performance of mental arithmetic tasks. This paper proposes a novel BCI for<br />
detecting changes resulting from increases in the magnitude of operands used in a mental arithmetic task, using data from<br />
single-trial NIRS brain signals. We measured hemoglobin responses from 20 healthy subjects as they solved mental arithmetic<br />
problems with three difficulty levels. The accuracy in recognizing one difficulty level from another is then presented<br />
using 5×5-fold cross-validation on the collected data. The results yielded an overall average accuracy of 71.2%,<br />
demonstrating the potential of the proposed NIRS-based BCI to recognize the difficulty of problems encountered by mental<br />
arithmetic problem solvers.<br />
09:00-11:10, Paper ThAT9.30<br />
Articulated Human Body: 3D Pose Estimation using a Single Camera<br />
Wang, Zibin, The Chinese Univ. of Hong Kong<br />
Chung, Chi-Kit Ronald, The Chinese Univ. of Hong Kong<br />
We address how human pose in 3D can be tracked from a monocular video using a probabilistic inference method. The human<br />
body is modeled as a number of cylinders in space, each with an appearance facet as well as a pose facet. The appearance<br />
facets are acquired in a learning phase from the beginning frames of the input video. For this, the visual hull description<br />
of the target human subject, constructed from multiple images, proves instrumental. In the operation phase, the 3D<br />
pose of the target subject in the subsequent frames of the input video is tracked. A bottom-up framework is used, which<br />
for each current image frame first extracts tentative candidates for each body part in the image space. The human<br />
model, with the appearance facets already learned and the pose entries initialized with those of the previous image<br />
frame, is then brought in under a belief propagation algorithm to establish correspondence with these 2D body part candidates<br />
while enforcing proper articulation between the body parts, thereby determining the 3D pose of the human<br />
body in the current frame. The tracking performance on a number of monocular videos is shown.<br />
09:00-11:10, Paper ThAT9.31<br />
Resampling Approach to Facial Expression Recognition using 3D Meshes<br />
Murthy, O. V. Ramana, NUS<br />
Venkatesh, Y. V., NUS<br />
Kassim, Ashraf, NUS<br />
We propose a novel strategy, based on resampling of 3D meshes, to recognize facial expressions. This entails conversion<br />
of the existing irregular 3D mesh structure in the database to a uniformly sampled 3D matrix structure. An important consequence<br />
of this operation is that the classical correspondence problem can be dispensed with. In the present paper, in<br />
order to demonstrate the feasibility of the proposed strategy, we employ only spectral flow matrices as features to recognize<br />
facial expressions. Experimental results are presented, along with suggestions for possible refinements to the strategy to<br />
improve classification accuracy.<br />
09:00-11:10, Paper ThAT9.33<br />
Facial Expression Mimicking System<br />
Fukui, Ryuichi, Toyohashi Univ. of Tech.<br />
Katsurada, Kouichi, Toyohashi Univ. of Tech.<br />
Iribe, Yurie, Toyohashi Univ. of Tech.<br />
Nitta, Tsuneo, Toyohashi Univ. of Tech.<br />
We propose a facial expression mimicking system that copies the facial expression of one person on the image of another.<br />
The system uses the active appearance model (AAM), a commonly used model in the field of facial expression processing.<br />
The AAM comprises parameters representing facial shape, brightness, and the illumination environment.<br />
Therefore, in addition to the facial expression elements, the model parameters express other elements, such as individuality<br />
and the direction of the face. In order to extract the facial expression elements from the compositional parameters of the AAM, we<br />
applied principal component analysis (PCA) to the AAM parameter values, collected over changes in facial expression.<br />
The obtained facial expression model is applied to the facial expression mimicking system, and experiments show its<br />
effectiveness for mimicking.<br />
09:00-11:10, Paper ThAT9.34<br />
A Framework for Hand Gesture Recognition and Spotting using Sub-Gesture Modeling<br />
Malgireddy, Manavender, Univ. at Buffalo, SUNY<br />
Corso, Jason, Univ. at Buffalo, SUNY<br />
Setlur, Srirangaraj, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Mandalapu, Dinesh, HP Lab.<br />
Hand gesture interpretation is an open research problem in Human Computer Interaction (HCI), which involves locating<br />
gesture boundaries (Gesture Spotting) in a continuous video sequence and recognizing the gesture. Existing techniques<br />
model each gesture as a temporal sequence of visual features extracted from individual frames, which is not efficient due<br />
to the large variability of frames at different timestamps. In this paper, we propose a new sub-gesture modeling approach<br />
which represents each gesture as a sequence of fixed sub-gestures (groups of consecutive frames with locally coherent<br />
context) and provides robust modeling of the visual features. We further extend this approach to the task of gesture spotting,<br />
where the gesture boundaries are identified using a filler model and a gesture completion model. Experimental results<br />
show that the proposed method outperforms state-of-the-art Hidden Conditional Random Fields (HCRF) based methods<br />
and baseline gesture spotting techniques.<br />
09:00-11:10, Paper ThAT9.35<br />
Off-Line Signature Verification using Graphical Model<br />
Lv, Hairong<br />
Bai, Xinxin, IBM Res. – China<br />
Yin, Wenjun, IBM Res. – China<br />
Dong, Jin, IBM Res. – China<br />
In this paper, we propose a novel probabilistic graphical model to address the off-line signature verification problem. Unlike<br />
previous work, our approach introduces the concept of feature roles according to the features’ distributions in genuine<br />
and forged signatures, with all these features represented by a single graphical model. We also propose several new techniques<br />
to improve the performance of the new signature verification system. Results based on 200 persons’ signatures<br />
(16,000 signature samples) indicate that the proposed method outperforms other popular techniques for off-line signature<br />
verification by a large margin.<br />
09:00-11:10, Paper ThAT9.36<br />
Linear Facial Expression Transfer with Active Appearance Models<br />
De La Hunty, Miles, Australian National Univ.<br />
Asthana, Akshay, Australian National Univ.<br />
Goecke, Roland, Univ. of Canberra<br />
The issue of transferring facial expressions from one person’s face to another’s has been an area of interest for the movie<br />
industry and the computer graphics community for quite some time. In recent years, with the proliferation of online image<br />
and video collections and web applications, such as Google Street View, the question of preserving privacy through face<br />
de-identification has gained interest in the computer vision community. In this paper, we focus on the problem of real-time<br />
dynamic facial expression transfer using an Active Appearance Model framework. We provide a theoretical foundation<br />
for a generalisation of two well-known expression transfer methods and demonstrate the improved visual quality of the<br />
proposed linear extrapolation transfer method on examples of face swapping and expression transfer using the AVOZES<br />
data corpus. Realistic talking faces can be generated in real-time at low computational cost.<br />
09:00-11:10, Paper ThAT9.37<br />
Fractal and Multi-Fractal for Arabic Offline Writer Identification<br />
Chaabouni, Aymen, Univ. of Sfax<br />
Boubaker, Houcine, Univ. of Sfax<br />
Kherallah, Monji, Univ. of Sfax<br />
El Abed, Haikal, Technische Universitat Braunschweig<br />
Alimi, Adel M., Univ. of Sfax<br />
In recent years, fractal and multi-fractal analysis has been widely applied in many domains, especially in the field of<br />
image processing. In this paper, we present a novel method for Arabic text-dependent writer identification<br />
based on fractal and multi-fractal features: from the images of Arabic words, we calculate their fractal dimensions<br />
using the box-counting method, and then their multi-fractal dimensions using the method of DLA (Diffusion<br />
Limited Aggregates). To evaluate our method, we used 50 writers from the ADAB database; each writer wrote 288 words<br />
(24 Tunisian city names repeated 12 times), with 2/3 of the words used for the learning phase and the rest for identification.<br />
The results, obtained using a k-nearest neighbor classifier, demonstrate the effectiveness of our proposed method.<br />
09:00-11:10, Paper ThAT9.38<br />
A Simulation Study on the Generative Neural Ensemble Decoding Algorithms<br />
Kim, Sung-Phil, Korea Univ.<br />
Kim, Min-Ki, Korea Univ.<br />
Park, Gwi-Tae, Korea Univ.<br />
Brain-computer interfaces rely on accurate decoding of cortical activity to understand intended action. Algorithms for<br />
neural decoding can be broadly categorized into two groups: direct versus generative methods. Two generative models,<br />
the population vector algorithm (PVA) and the Kalman filter (KF), have been widely used for many intracortical BCI studies,<br />
where the KF generally showed superior decoding to the PVA. However, little is known about the conditions under which each algorithm<br />
works properly or about how the KF translates the ensemble information. To address these questions, we performed a<br />
simulation study and demonstrated that KF and PVA worked congruently for uniformly distributed preferred directions<br />
(PDs) whereas KF outperformed PVA for non-uniform PDs. In addition, we showed that KF decoded better than PVA for<br />
low signal-to-noise ratio (SNR) or a small ensemble size. The results suggest that KF may decode direction better than<br />
PVA with non-uniform PDs or with low SNR and small ensemble size.<br />
09:00-11:10, Paper ThAT9.39<br />
3D Active Shape Model for Automatic Facial Landmark Location Trained with Automatically Generated Landmark<br />
Points<br />
Zhou, Dianle, TMSP<br />
Petrovska-Delacretaz, Dijana, Inst. Telecom SudParis (ex GET-INT)<br />
Dorizzi, Bernadette, TELECOM & Management SudParis<br />
In this paper, a 3D Active Shape Model (3DASM) algorithm is presented to automatically locate facial landmarks from different<br />
views. The 3DASM is trained by setting different shape and texture parameters of 3D Morphable Model (3DMM).<br />
Using the 3DMM to synthesize training data offers two advantages: first, few manual operations are needed, apart from labeling<br />
landmarks on the mean face of the 3DMM. Second, since the learning data come directly from the 3DMM, there is a one-to-one<br />
correspondence between the 2D points detected in the image and the 3D points on the 3DMM. This correspondence<br />
benefits subsequent 3D face reconstruction. During fitting, 3D rotation parameters are added compared to the 2D Active Shape<br />
Model (ASM), so we separate shape variations into intrinsic changes (caused by differences between persons) and extrinsic<br />
changes (caused by model projection). The experimental results show that our method is robust to pose variation.<br />
09:00-11:10, Paper ThAT9.40<br />
Using Moments on Spatiotemporal Plane for Facial Expression Recognition<br />
Ji, Yi, INSA de Lyon<br />
Idrissi, Khalid, INSA de Lyon<br />
In this paper, we propose a novel approach to capture the dynamic deformation caused by facial expressions. The proposed<br />
method concentrates on the spatiotemporal plane, which has not been well explored. It uses moments as features to describe<br />
the movements of essential components, such as the eyes and mouth, on the vertical time plane. The system we developed can automatically<br />
recognize expressions on single images as well as on image sequences. The experiments are performed on 348 sequences<br />
from 95 subjects in the Cohn-Kanade database and obtain good results, as high as 96.1% in 7-class recognition for<br />
frames and 98.5% in 6-class recognition for sequences.<br />
09:00-11:10, Paper ThAT9.41<br />
Towards a More Realistic Appearance-Based Gait Representation for Gender Recognition<br />
Martín-Félez, Raúl, Univ. Jaume I<br />
Mollineda, Ramón A., Univ. Jaume I<br />
Sanchez, J. Salvador, Univ. Jaume I<br />
A realistic appearance-based representation of side-view gait sequences is introduced here. It is based on a prior method<br />
in which a set of appearance-based features of a gait sample is used for gender recognition. These features are computed from<br />
the parameters of ellipses fitted to body parts enclosed by previously defined regions that ignore well-known facts about<br />
human body structure. This work presents an improved regionalization method, supported by adaptive heuristic<br />
rules, to better adjust the regions to the body parts. As a result, more realistic ellipses and a more meaningful feature space are obtained.<br />
Gender recognition experiments conducted on the CASIA Gait Database show better classification results when using<br />
the new features.<br />
09:00-11:10, Paper ThAT9.42<br />
A Calibration-Free Head Gesture Recognition System with Online Capability<br />
Wöhler, Nils-Christian, Bielefeld Univ.<br />
Großekathöfer, Ulf, Bielefeld Univ.<br />
Dierker, Angelika, Bielefeld Univ.<br />
Hanheide, Marc, Univ. of Birmingham<br />
Kopp, Stefan, Bielefeld Univ.<br />
Hermann, Thomas, Bielefeld Univ.<br />
In this paper, we present a calibration-free head gesture recognition system using a motion-sensor-based approach. For<br />
data acquisition we conducted a comprehensive study with 10 subjects. We analyzed the resulting head movement data<br />
with regard to separability and transferability to new subjects. Ordered means models (OMMs) were used for classification,<br />
since they provide an easy-to-use, fast, and stable approach to machine learning of time series. As a result, we achieved classification<br />
rates of 85-95% for nodding, head-shaking, and head-tilting gestures, with good transferability. Finally, we show<br />
first promising attempts towards online recognition.<br />
09:00-11:10, Paper ThAT9.43<br />
TrajAlign: A Method for Precise Matching of 3-D Trajectories<br />
Aung, Zeyar, Inst. for Infocomm Res. Singapore<br />
Sim, Kelvin, Inst. for Infocomm Res. Singapore<br />
Ng, Wee Siong, Inst. for Infocomm Res. Singapore<br />
Matching two 3-D trajectories is an important task in a number of applications. The trajectory matching problem can be<br />
solved by aligning the two trajectories and taking the alignment score as their similarity measurement. In this paper, we<br />
propose a new method called “TrajAlign” (Trajectory Alignment). It aligns two trajectories by means of aligning their<br />
representative distance matrices. Experimental results show that our method is significantly more precise than the existing<br />
state-of-the-art methods. While the existing methods can provide correct answers in only up to 67% of the test cases, TrajAlign<br />
can offer correct results in 79% (i.e., 12% more) of the test cases. TrajAlign is also computationally inexpensive<br />
and can be used practically in applications that demand efficiency.<br />
09:00-11:10, Paper ThAT9.44<br />
Real-Time 3D Model based Gesture Recognition for Multimedia Control<br />
Lin, Shih-Yao, National Taiwan Univ.<br />
Lai, Yun-Chien, National Taiwan Univ.<br />
Chan, Li-Wei, National Taiwan Univ.<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
This paper presents a new 3D model-based gesture tracking system for controlling a multimedia player in an intuitive way.<br />
The motivation of this work is to make home appliances aware of the user’s intentions. The 3D model-based gesture tracking<br />
system adopts a Bayesian framework to track the user’s 3D hand position and to recognize the meaning of hand postures for<br />
controlling the player interactively. To avoid the high dimensionality of a full 3D upper-body model, which may complicate<br />
the gesture tracking problem, our system applies a novel hierarchical tracking algorithm to improve system<br />
performance. Moreover, the system applies multiple cues to improve the accuracy of the tracking results. Based on the<br />
above ideas, we have implemented a 3D hand gesture interface for controlling multimedia players. Experimental results<br />
have shown that the proposed system robustly tracks the 3D position of the hand and has high potential for controlling the<br />
multimedia player.<br />
09:00-11:10, Paper ThAT9.45<br />
Motif Discovery and Feature Selection for CRF-Based Activity Recognition<br />
Zhao, Liyue, Univ. of Central Florida<br />
Wang, Xi, Univ. of Central Florida<br />
Sukthankar, Gita, Univ. of Central Florida<br />
Sukthankar, Rahul, Intel Labs Pittsburgh and Carnegie Mellon University<br />
Due to their ability to model sequential data without making unnecessary independence assumptions, conditional random<br />
fields (CRFs) have become an increasingly popular discriminative model for human activity recognition. However, how<br />
to represent sensor signal data to achieve the best classification performance within a CRF model is not obvious. This<br />
paper presents a framework for extracting motif features for CRF-based classification of IMU (inertial measurement unit)<br />
data. To do this, we convert the signal data into a set of motifs, approximately repeated symbolic subsequences, for each<br />
dimension of IMU data. These motifs leverage structure in the data and serve as the basis to generate a large candidate set<br />
of features from the multi-dimensional raw data. By measuring reductions in the conditional log-likelihood error of the<br />
training samples, we can select features and train a CRF classifier to recognize human activities. An evaluation of our<br />
classifier on the CMU Multi-Modal Activity Database reveals that it outperforms the CRF-classifier trained on the raw<br />
features as well as other standard classifiers used in prior work.<br />
09:00-11:10, Paper ThAT9.46<br />
On-Line Signature Verification using 1-D Velocity-Based Directional Analysis<br />
Ibrahim, Muhammad Talal, Ryerson Univ.<br />
Kyan, Matthew, Ryerson Univ.<br />
Khan, M. Aurangzeb, COMSATS Inst. of Information Tech.<br />
Guan, Ling, Ryerson Univ.<br />
In this paper, we propose a novel approach for identity verification based on the directional analysis of velocity-based<br />
partitions of an on-line signature. First, inter-feature dependencies in a signature are exploited by decomposing the shape<br />
(horizontal trajectory, vertical trajectory) into two partitions based on the velocity profile of the base-signature for each<br />
signer, which offers the flexibility of analyzing both low- and high-curvature portions of the trajectory independently. Further,<br />
these velocity-based shape partitions are analyzed directionally on the basis of relative angles. A Support Vector Machine<br />
(SVM) is then used to find the decision boundary between the genuine and forgery class. Experimental results demonstrate<br />
the superiority of our approach in on-line signature verification in comparison with other techniques.<br />
09:00-11:10, Paper ThAT9.47<br />
Age Classification based on Gait using HMM<br />
Zhang, De, Beihang Univ.<br />
Wang, Yunhong, Beihang Univ.<br />
Bhanu, Bir, Univ. of California<br />
In this paper we propose a new framework for age classification based on human gait using Hidden Markov Model (HMM).<br />
A gait database including young people and elderly people is built. To extract appropriate gait features, we consider a contour-related<br />
method in terms of shape variations during human walking. The image feature is then transformed to a lower-dimensional<br />
space by using the Frame to Exemplar (FED) distance. An HMM is trained on the FED vector sequences.<br />
Thus, the framework provides flexibility in the selection of gait feature representation. In addition, the framework is robust<br />
for classification due to the statistical nature of HMM. The experimental results show that video-based automatic age classification<br />
from human gait is feasible and reliable.<br />
09:00-11:10, Paper ThAT9.48<br />
Human Electrocardiogram for Biometrics using DTW and FLDA<br />
N, Venkatesh, Tata Consultancy Services Innovation Lab.<br />
Jayaraman, Srinivasan, Tata Consultancy Services, Bangalore<br />
This paper proposes a new approach to person identification and person authentication using the single-lead human<br />
electrocardiogram (ECG). Nine feature parameters were extracted from the ECG in the spatial domain for classification. For person<br />
identification, Dynamic Time Warping (DTW) and Fisher’s Linear Discriminant Analysis (FLDA) with a k-Nearest Neighbor<br />
classifier (k-NNC) as single-stage classifiers yielded recognition accuracies of 96% and 97%, respectively. To further<br />
improve the performance of the system, a two-stage classification technique was adopted, in which FLDA with k-NNC<br />
is used at the first stage, followed by a DTW classifier at the second stage, yielding 100% recognition<br />
accuracy. For person authentication, we adopted a QRS-complex-based threshold technique. The overall performance<br />
of the system, 96% for both legitimate and intruder cases, was verified on the MIT-BIH normal database of 375 recordings<br />
from 15 individuals’ ECGs.<br />
09:00-11:10, Paper ThAT9.49<br />
Recognizing Sign Language from Brain Imaging<br />
Mehta, Nishant, Georgia Inst. of Tech.<br />
Starner, Thad, Georgia Inst. of Tech.<br />
Moore Jackson, Melody, Georgia Inst. of Tech.<br />
Babalola, Karolyn, Georgia Inst. of Tech.<br />
James, George Andrew, Univ. of Arkansas<br />
Classification of complex motor activities from brain imaging is relatively new in the fields of neuroscience and brain-computer<br />
interfaces (BCIs). We report sign language classification results for a set of three contrasting pairs of signs. Executed<br />
sign accuracy was 93.3%, and imagined sign accuracy was 76.7%. For a full multiclass problem, we used a decision<br />
directed acyclic graph of pairwise support vector machines, resulting in 63.3% accuracy for executed sign and 31.4% accuracy<br />
for imagined sign. Pairwise comparison of phrases composed of these signs yielded a mean accuracy of 73.4%.<br />
These results suggest the possibility of BCIs based on sign language.<br />
09:00-11:10, Paper ThAT9.50<br />
American Sign Language Phrase Verification in an Educational Game for Deaf Children<br />
Zafrulla, Zahoor, Georgia Inst. of Tech.<br />
Brashear, Helene, Georgia Inst. of Tech.<br />
Yin, Pei, Georgia Inst. of Tech.<br />
Presti, Peter, Georgia Inst. of Tech.<br />
Starner, Thad, Georgia Inst. of Tech.<br />
Hamilton, Harley, Georgia Inst. of Tech.<br />
We perform real-time American Sign Language (ASL) phrase verification for an educational game, CopyCat, which is<br />
designed to improve deaf children’s signing skills. Taking advantage of context information in the game, we verify a phrase,<br />
using Hidden Markov Models (HMMs), by applying a rejection threshold on the probability of the observed sequence for<br />
each sign in the phrase. We tested this approach using 1204 signed phrase samples from 11 deaf children playing the game<br />
during the phase two deployment of CopyCat. The CopyCat data set is particularly challenging because sign samples are<br />
collected during live game play and contain many variations in signing and disfluencies. We achieved a phrase verification<br />
accuracy of 83% compared to 90% real-time performance by a sign linguist. We report on the techniques required to reach<br />
this level of performance.<br />
09:00-11:10, Paper ThAT9.51<br />
A Robust Method for Hand Gesture Segmentation and Recognition using Forward Spotting Scheme in Conditional<br />
Random Fields<br />
Elmezain, Mahmoud, Otto-von-Guericke-Univ. Magdeburg<br />
Al-Hamadi, Ayoub, Otto-von-Guericke-Univ. Magdeburg<br />
Michaelis, Bernd, Otto-von-Guericke-Univ. Magdeburg<br />
This paper proposes a forward spotting method that handles hand gesture segmentation and recognition simultaneously<br />
without time delay. To spot meaningful gestures of numbers (0-9) accurately, a stochastic method for designing a non-gesture<br />
model using Conditional Random Fields (CRFs), requiring no training data, is proposed. The non-gesture model provides<br />
confidence measures that are used as an adaptive threshold to find the start and end points of meaningful gestures.<br />
Experimental results show that the proposed method can successfully recognize isolated gestures with 96.51% and meaningful<br />
gestures with 90.49% reliability.<br />
09:00-11:10, Paper ThAT9.52<br />
Real-Time Upper-Limbs Posture Recognition based on Particle Filters and AdaBoost Algorithms<br />
Fahn, Chin-Shyurng, National Taiwan Univ. of Science and Tech.<br />
Chiang, Sheng-Lung, National Taiwan Univ. of Science and Tech.<br />
In this paper, we employ particle filters to dynamically locate the face and upper limbs. To prevent disturbance<br />
from skin-color regions, such as other naked parts of the human body or skin-color-like objects in the background,<br />
we further take the motion cue as a feature during tracking. Currently, we prescribe eight kinds of upper-limb postures<br />
with reference to the characteristics of flag semaphore. The advantage is that we can utilize the relative positions of the face<br />
and two hands to recognize the postures easily. To achieve posture recognition, we evaluate three different classifiers based on<br />
machine learning methods: multi-layer perceptrons, support vector machines, and AdaBoost algorithms. The experimental<br />
results reveal that the AdaBoost algorithm performs best, recognizing upper-limb postures with an accuracy<br />
above 95% while requiring much less training time than the other two.<br />
09:00-11:10, Paper ThAT9.53<br />
One-Lead ECG-Based Personal Identification using Ziv-Merhav Cross Parsing<br />
Pereira Coutinho, David, Inst. Superior de Engenharia de Lisboa<br />
Fred, Ana Luisa Nobre, Inst. Superior Técnico<br />
Figueiredo, Mario A. T., Inst. Superior Técnico<br />
The advance of falsification technology increases security concerns and gives biometrics an important role in security solutions.<br />
The electrocardiogram (ECG) is an emerging biometric that does not need liveness verification. There is strong<br />
evidence that ECG signals contain sufficient discriminative information to allow the identification of individuals from a<br />
large population. Most approaches rely on ECG data and the fiducial points of different parts of the heartbeat waveform. However,<br />
non-fiducial approaches have recently proved to be effective as well, and have the advantage of not relying critically on the<br />
accurate extraction of fiducial data. In this paper, we propose a new non-fiducial ECG biometric identification<br />
method based on data compression techniques, namely the Ziv-Merhav cross parsing algorithm for symbol sequences<br />
(strings). Our method relies on a string similarity measure derived from the algorithmic cross-complexity concept and its compression-based<br />
approximation. We present results on real data, one-lead ECG, acquired during a concentration<br />
task, from 19 healthy individuals. Our approach achieves a 100% subject recognition rate despite the existence of differentiated<br />
stress states.<br />
09:00-11:10, Paper ThAT9.54<br />
Multimodal Human Computer Interaction with MIDAS Intelligent Infokiosk<br />
Karpov, Alexey, Russian Acad. of Sciences<br />
Ronzhin, Andrey, Russian Acad. of Sciences<br />
Kipyatkova, Irina, Russian Acad. of Sciences<br />
Ronzhin, Alexander, Russian Acad. of Sciences<br />
Akarun, Lale, Bogazici Univ.<br />
In this paper, we present an intelligent information kiosk called MIDAS (Multimodal Interactive-Dialogue Automaton for<br />
Self-service), including its hardware and software architecture and the stages of deployment of its speech recognition and synthesis<br />
technologies. MIDAS uses the Wizard of Oz (WOZ) methodology, which allows an expert to correct speech recognition<br />
results and control the dialogue flow. User statistics of the multimodal human-computer interaction (HCI) have been analyzed<br />
for the operation of the kiosk in the automatic and automated modes. The infokiosk offers information about the<br />
structure and staff of laboratories, the location and phones of departments and employees of the institution. The multimodal<br />
user interface is provided with a touch screen, natural speech input and head and manual gestures, both for ordinary and<br />
physically handicapped users.<br />
09:00-11:10, Paper ThAT9.55<br />
View Invariant Body Pose Estimation based on Biased Manifold Learning<br />
Hur, Dongcheol, Korea Univ.<br />
Lee, Seong-Whan, Korea Univ.<br />
Wallraven, Christian, MPI for Biological Cybernetics<br />
In human body pose estimation, manifold learning is a popular technique for reducing the dimension of 2D images and<br />
3D body configuration data. This technique, however, is especially vulnerable to silhouette variation such as that caused by<br />
viewpoint changes. In this paper, we propose a novel approach that combines three separate manifolds for representing<br />
variations in viewpoint, pose, and 3D body configuration. We use biased manifold learning to learn these manifolds with<br />
appropriately weighted distances. A set of four mapping functions is then learned by a generalized regression neural network<br />
for added robustness. Despite using only three manifolds, we show that this method can reliably estimate 3D body<br />
poses from 2D images with all learned viewpoints.<br />
09:00-11:10, Paper ThAT9.56<br />
Visual Gaze Estimation by Joint Head and Eye Information<br />
Valenti, Roberto, Univ. of Amsterdam<br />
Lablack, Adel, UMR USTL/CNRS 8022<br />
Sebe, Nicu, Univ. of Trento<br />
Djeraba, Chabane, UMR USTL/CNRS 8022<br />
Gevers, Theo, Univ. of Amsterdam<br />
In this paper, we present an unconstrained visual gaze estimation system. The proposed method extracts the visual field<br />
of view of a person looking at a target scene in order to estimate the approximate location of interest (visual gaze). The<br />
novelty of the system is the joint use of head pose and eye location information to fine-tune the visual gaze estimated by<br />
the head pose only, so that the system can be used in multiple scenarios. The improvements obtained by the proposed approach<br />
are validated using the Boston University head pose dataset, on which the standard deviation of the joint visual<br />
gaze estimation improved by 61.06% horizontally and 52.23% vertically with respect to the gaze estimation obtained by<br />
the head pose only. A user study shows the potential of the proposed system.<br />
09:00-11:10, Paper ThAT9.57<br />
Discrimination of Moderate and Acute Drowsiness based on Spontaneous Facial Expressions<br />
Vural, Esra, Univ. of California San Diego<br />
Bartlett, Marian Stewart, Univ. of California San Diego<br />
Littlewort, Gwen, Univ. of California San Diego<br />
Cetin, Mujdat, Sabanci Univ.<br />
Ercil, Aytul, Sabanci Univ.<br />
Movellan, Javier, Univ. of California San Diego<br />
It is important for drowsiness detection systems to identify different levels of drowsiness and respond appropriately at<br />
each level. This study explores how to discriminate moderate from acute drowsiness by applying computer vision techniques<br />
to the human face. In our previous study, spontaneous facial expressions measured through computer vision techniques<br />
were used as an indicator to discriminate alert from acutely drowsy episodes. In this study, we explore which<br />
facial muscle movements are predictive of moderate and acute drowsiness. The effect of the temporal dynamics of action<br />
units on prediction performance is explored by capturing temporal dynamics using an overcomplete representation of<br />
temporal Gabor filters. In the final system, we perform feature selection to build a classifier that can discriminate moderately<br />
drowsy from acutely drowsy episodes. The system achieves a classification rate of 0.96 A’ in discriminating moderately<br />
drowsy versus acutely drowsy episodes. Moreover, the study reveals new information about facial behavior occurring during<br />
different stages of drowsiness.<br />
11:10-12:10, ThPL1 Anadolu Auditorium<br />
J.K. Aggarwal Prize Lecture:<br />
Scene and Object Recognition in Context<br />
Antonio Torralba Plenary Session<br />
Computer Science and Artificial Intelligence Laboratory<br />
Dept. of Electrical Engineering and Computer Science<br />
MIT, USA<br />
Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been<br />
much progress and there are already object recognition systems operating in commercial products. Most of the algorithms<br />
for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions<br />
with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem<br />
by brute force. However, in the real world, objects tend to co-vary with other objects, providing a rich collection of<br />
contextual associations. These contextual associations can be used to reduce the search space by looking only in places in<br />
which the object is expected to be; this also increases performance by rejecting image patterns that look like the<br />
target object but are in unlikely places.<br />
As the field moves toward integrated systems that try to recognize many object classes and learn about contextual relationships<br />
between objects, the lack of large annotated datasets hinders the fast development of robust solutions. In this talk I will<br />
describe recent work on visual scene understanding that tries to build integrated models for scene and object recognition,<br />
emphasizing the power of large databases of annotated images in computer vision.<br />
ThBT1 Marmara Hall<br />
Object Detection and Recognition - V Regular Session<br />
Session chair: Wang, Yunhong (Beihang Univ.)<br />
13:30-13:50, Paper ThBT1.1<br />
Finding Multiple Object Instances with Occlusion<br />
Guo, Ge, Chinese Acad. of Sciences<br />
Jiang, Tingting, Peking Univ.<br />
Wang, Yizhou, School of EECS, Peking<br />
Gao, Wen, Peking Univ.<br />
In this paper we provide a framework for the detection and localization of multiple similar shapes or object instances in an<br />
image based on shape matching. The problem poses three challenges: first, the basic shape matching<br />
problem of finding the correspondence and transformation between two shapes; second, matching shapes under<br />
occlusion; and last, recognizing and locating all the matched shapes in the image. We solve these problems by using<br />
both graph partition and shape matching in a global optimization framework. A Hough-like collaborative voting is adopted,<br />
which provides a good initialization and data-driven information, and plays an important role in solving the partial matching<br />
problem due to occlusion. Experiments demonstrate the efficiency of our method.<br />
13:50-14:10, Paper ThBT1.2<br />
Bag of Hierarchical Co-Occurrence Features for Image Classification<br />
Kobayashi, Takumi, National Inst. of Advanced Industrial Science and Tech.<br />
Otsu, Nobuyuki, National Inst. of Advanced Industrial Science and Tech.<br />
We propose a bag-of-hierarchical-co-occurrence features method incorporating hierarchical structures for image classification.<br />
Local co-occurrences of visual words effectively characterize the spatial alignment of objects‘ components. The<br />
visual words are hierarchically constructed in the feature space, which helps us to extract higher-level words and to avoid<br />
quantization error in assigning the words to descriptors. For extracting descriptors, we employ two types of features hierarchically:<br />
narrow (local) descriptors, like SIFT [1], and broad descriptors based on co-occurrence features. The proposed<br />
method thus captures the co-occurrences of both small and large components. We conduct an experiment on image classification<br />
by applying the method to the Caltech 101 dataset and show the favorable performance of the proposed method.<br />
14:10-14:30, Paper ThBT1.3<br />
Person Detection using Temporal and Geometric Context with a Pan Tilt Zoom Camera<br />
Del Bimbo, Alberto, Univ. of Florence<br />
Lisanti, Giuseppe, Univ. of Florence<br />
Masi, Iacopo, Univ. of Florence<br />
Pernici, Federico, Univ. of Florence<br />
In this paper we present a system that integrates automatic camera geometry estimation and object detection from a Pan<br />
Tilt Zoom camera. We estimate camera pose with respect to a world scene plane in real-time and perform human detection<br />
exploiting the relative space-time context. Using camera self-localization, 2D object detections are clustered in a 3D world<br />
coordinate frame. Target scale inference is further exploited to reduce the number of false alarms and also to increase the<br />
detection rate in the final non-maximum suppression stage. Our integrated system applied on real-world data shows superior<br />
performance with respect to the standard detector used.<br />
14:30-14:50, Paper ThBT1.4<br />
Disparity Map Refinement for Video based Scene Change Detection using a Mobile Stereo Camera Platform<br />
Haberdar, Hakan, Univ. of Houston<br />
Shah, Shishir, Univ. of Houston<br />
This paper presents a novel disparity map refinement method and vision based surveillance framework for the task of detecting<br />
objects of interest in dynamic outdoor environments from two stereo video sequences taken at different times and<br />
from different viewing angles by a mobile camera platform. The proposed framework includes several steps, the first of<br />
which computes disparity maps of the same scene in two video sequences. Preliminary disparity images are refined based<br />
on estimated disparities in neighboring frames. Segmentation is performed to estimate ground planes, which in turn are<br />
used for establishing spatial registration between the two video sequences. Finally, the regions of change are detected<br />
using the combination of texture and intensity gradient features. We present experiments on detection of objects of different<br />
sizes and textures in real videos.<br />
14:50-15:10, Paper ThBT1.5<br />
Using Symmetry to Select Fixation Points for Segmentation<br />
Kootstra, Gert, KTH<br />
Bergström, Niklas, Royal Inst. of Tech.<br />
Kragic, Danica, KTH<br />
For the interpretation of a visual scene, it is important for a robotic system to pay attention to the objects in the scene and<br />
segment them from their background. We focus on the segmentation of previously unseen objects in unknown scenes. The<br />
attention model therefore needs to be bottom-up and context-free. In this paper, we propose the use of symmetry, one of<br />
the Gestalt principles for figure-ground segregation, to guide the robot’s attention. We show that our symmetry-saliency<br />
model outperforms the contrast-saliency model proposed by Itti et al. (1998). The symmetry model performs better in finding<br />
the objects of interest and selects a fixation point closer to the center of the object. Moreover, the objects are better<br />
segmented from the background when the initial points are selected on the basis of symmetry.<br />
ThBT2 Anadolu Auditorium<br />
Classification - II Regular Session<br />
Session chair: Pelillo, Marcello (Ca’Foscari Univ.)<br />
13:30-13:50, Paper ThBT2.1<br />
Data Classification on Multiple Manifolds<br />
Xiao, Rui, Shanghai Jiao Tong Univ.<br />
Zhao, Qijun, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Shi, Pengfei, Shanghai Jiao Tong Univ.<br />
Unlike most previous manifold-based data classification algorithms, which assume that all the data points lie on a single manifold,<br />
we expect that data from different classes may reside on different manifolds of possibly different dimensions. Therefore,<br />
better classification accuracy would be achieved by modeling the data by multiple manifolds each corresponding to a<br />
class. To this end, a general framework for data classification on multiple manifolds is presented. The manifolds are firstly<br />
learned for each class separately, and a stochastic optimization algorithm is then employed to get the near optimal dimensionality<br />
of each manifold from the classification viewpoint. Then, classification is performed under a newly defined minimum<br />
reconstruction error based classifier. Our method could be easily extended by involving various manifold learning<br />
methods and searching strategies. Experiments on both synthetic data and databases of facial expression images show the<br />
effectiveness of the proposed multiple manifold based approach.<br />
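The classification rule described above (one manifold per class, decide by minimum reconstruction error) can be sketched with per-class PCA subspaces; this is a simplified linear stand-in for the paper's learned manifolds, and the fixed `dim` parameter replaces their stochastic dimensionality search:

```python
import numpy as np

def fit_class_subspaces(X_by_class, dim):
    """Fit one linear subspace per class via PCA (a linear stand-in for
    'one manifold per class'); `dim` is assumed fixed rather than
    searched for as in the paper."""
    models = {}
    for label, X in X_by_class.items():
        mean = X.mean(axis=0)
        # principal directions of the centred class data
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        models[label] = (mean, vt[:dim])
    return models

def classify(x, models):
    """Assign x to the class whose subspace reconstructs it best
    (minimum reconstruction error classifier)."""
    best, best_err = None, float("inf")
    for label, (mean, basis) in models.items():
        # project x onto the class subspace and measure the residual
        proj = mean + (x - mean) @ basis.T @ basis
        err = np.linalg.norm(x - proj)
        if err < best_err:
            best, best_err = label, err
    return best
```

With two classes lying along different lines in the plane, a query point near one line is assigned to that class even if it is far from the training points themselves.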
13:50-14:10, Paper ThBT2.2<br />
Unsupervised Ensemble Ranking: Application to Large-Scale Image Retrieval<br />
Lee, Jung-Eun, Michigan State Univ.<br />
Jin, Rong, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
The continued explosion in the growth of image and video databases makes automatic image search and retrieval an extremely<br />
important problem. Among the various approaches to Content-based Image Retrieval (CBIR), image similarity<br />
based on local point descriptors has shown promising performance. However, this approach suffers from the scalability<br />
problem. Although the bag-of-words model resolves the scalability problem, it suffers from a loss in retrieval accuracy. We circumvent<br />
this performance loss by an ensemble ranking approach in which rankings from multiple bag-of-words models<br />
are combined to obtain more accurate retrieval results. An unsupervised algorithm is developed to learn the weights for<br />
fusing the rankings from multiple bag-of-words models. Experimental results on a database of 100,000 images show that<br />
this approach is both efficient and effective in finding visually similar images.<br />
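The fusion step can be illustrated with a weighted Borda-style combination of the per-model rankings; the weights are simply assumed given here, whereas the paper's contribution is learning them with an unsupervised algorithm:

```python
def fuse_rankings(rankings, weights):
    """Weighted Borda-style fusion of several rankings.

    rankings: list of lists, rankings[m] holding model m's retrieved
    images ordered best-first. weights: one non-negative weight per
    model (assumed given here; the paper learns them unsupervised).
    """
    scores = {}
    for w, ranking in zip(weights, rankings):
        n = len(ranking)
        for pos, item in enumerate(ranking):
            # items near the top of a ranking earn more score
            scores[item] = scores.get(item, 0.0) + w * (n - pos)
    # final ranking: highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A model with a larger weight pulls the fused ranking toward its own ordering, which is the mechanism the learned weights exploit.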
14:10-14:30, Paper ThBT2.3<br />
Cross Entropy Optimization of the Random Set Framework for Multiple Instance Learning<br />
Bolton, Jeremy, Univ. of Florida<br />
Gader, Paul, Univ. of Florida<br />
Multiple instance learning (MIL) is a recently researched technique used for learning a target concept in the presence of<br />
noise. Previously, a random set framework for multiple instance learning (RSF-MIL) was proposed; however, the proposed<br />
optimization strategy did not permit the harmonious optimization of model parameters. A cross entropy based optimization<br />
strategy is proposed. Experimental results on synthetic examples, benchmark and landmine data sets illustrate the benefits<br />
of the proposed optimization strategy.<br />
14:30-14:50, Paper ThBT2.4<br />
A Constant Average Time Algorithm to Allow Insertions in the LAESA Fast Nearest Neighbour Search Index<br />
Oncina, Jose, Univ. de Alicante<br />
Micó, Luisa, Univ. de Alicante<br />
Nearest Neighbour search is a widely used technique in Pattern Recognition. In order to speed up the search many indexing<br />
techniques have been proposed. However, most of the proposed techniques are static, that is, once the index is built the<br />
incorporation of new data is not possible unless a costly rebuild of the index is performed. As a consequence, changes<br />
in the environment are very costly to take into account. In this work, we propose a technique to allow the insertion of<br />
elements in the LAESA index. The resulting index is exactly the same as the one that would be obtained by building it<br />
from scratch. In this paper we also obtain an upper bound for its expected running time. Surprisingly, this bound is independent<br />
of the database size.<br />
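The pivot-based pruning at the heart of LAESA, and the constant-cost bookkeeping an insertion needs, can be sketched as follows (an illustrative simplification, not the authors' exact algorithm; real LAESA additionally orders candidates by their lower bounds):

```python
import math

class LAESA:
    """Minimal LAESA-style index: distances from every element to a
    fixed set of pivots are precomputed, and at query time the
    triangle inequality gives lower bounds that prune distance
    computations."""

    def __init__(self, pivots):
        self.pivots = list(pivots)
        self.data = []          # stored elements
        self.pivot_dists = []   # pivot_dists[i][p] = d(data[i], pivots[p])

    @staticmethod
    def dist(a, b):
        return math.dist(a, b)

    def insert(self, x):
        # a constant number of distance computations per insertion
        # (one per pivot); the rest of the index is untouched
        self.data.append(x)
        self.pivot_dists.append([self.dist(x, p) for p in self.pivots])

    def nearest(self, q):
        dq = [self.dist(q, p) for p in self.pivots]
        best, best_d = None, float("inf")
        for x, pd in zip(self.data, self.pivot_dists):
            # triangle-inequality lower bound on d(q, x) over all pivots
            lb = max(abs(a - b) for a, b in zip(dq, pd))
            if lb >= best_d:
                continue  # pruned without computing d(q, x)
            d = self.dist(q, x)
            if d < best_d:
                best, best_d = x, d
        return best, best_d
```

Because an insertion touches only the new element's pivot distances, the index stays identical to one built from scratch, which mirrors the property claimed in the abstract.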
14:50-15:10, Paper ThBT2.5<br />
Feature Extraction from Discrete Attributes<br />
Yildiz, Olcay Taner, Isik Univ.<br />
In many pattern recognition applications, decision trees are a first choice due to their simplicity and easily interpretable nature.<br />
In this paper, we extract new features by combining k discrete attributes, where for each subset of size k of the attributes,<br />
we generate all orderings of values of those attributes exhaustively. We then apply the usual univariate decision tree classifier<br />
using these orderings as the new attributes. Our simulation results on 16 datasets from UCI repository show that the<br />
novel decision tree classifier performs better than the standard one in terms of error rate and tree complexity. The same idea can<br />
also be applied to other univariate rule learning algorithms such as C4.5 Rules and Ripper.<br />
ThBT3 Topkapı Hall A<br />
Computer Vision Applications - II Regular Session<br />
Session chair: Foggia, Pasquale (Univ. di Salerno)<br />
13:30-13:50, Paper ThBT3.1<br />
Fire-Flame Detection based on Fuzzy Finite Automata<br />
Ko, Byoungchul, Keimyung Univ.<br />
Ham, Seoun-Jae, Keimyung Univ.<br />
Nam, Jaeyeal, Keimyung Univ.<br />
This paper proposes a new fire-flame detection method using probabilistic membership function of visual features and<br />
Fuzzy Finite Automata (FFA). First, moving regions are detected by background subtraction, and candidate<br />
flame regions are then identified by applying flame color models. Since flame regions generally exhibit continuously irregular patterns,<br />
membership functions of the variance of intensity, wavelet energy and motion orientation are generated and applied<br />
to FFA. Since FFA combines the capabilities of automata with fuzzy logic, it not only provides a systemic approach to<br />
handle uncertainty in computational systems, but also can handle continuous spaces. The proposed algorithm is successfully<br />
applied to various fire videos and shows better detection performance when compared with other methods.<br />
13:50-14:10, Paper ThBT3.2<br />
Extrinsic Camera Parameter Estimation using Video Images and GPS Considering GPS Positioning Accuracy<br />
Kume, Hideyuki, Nara Inst. of Science and Tech.<br />
Taketomi, Takafumi, Nara Inst. of Science and Tech.<br />
Sato, Tomokazu, Nara Inst. of Science and Tech.<br />
Yokoya, Naokazu, Nara Inst. of Science and Tech.<br />
This paper proposes a method for estimating extrinsic camera parameters using video images and position data acquired<br />
by GPS. In conventional methods, the accuracy of the estimated camera position largely depends on the accuracy of GPS<br />
positioning data because they assume that GPS position error is very small or normally distributed. However, the actual<br />
error of GPS positioning easily grows to the 10 m level and the distribution of these errors changes depending on satellite<br />
positions and conditions of the environment. In order to achieve more accurate camera positioning in outdoor environments,<br />
in this study, we have employed a simple assumption that true GPS position exists within a certain range from the observed<br />
GPS position and the size of the range depends on the GPS positioning accuracy. Concretely, the proposed method estimates<br />
camera parameters by minimizing an energy function that is defined by using the reprojection error and the penalty term<br />
for GPS positioning.<br />
14:10-14:30, Paper ThBT3.3<br />
Combining Monocular and Stereo Cues for Mobile Robot Localization using Visual Words<br />
Fraundorfer, Friedrich, ETH Zurich<br />
Wu, Changchang, UNC-Chapel Hill<br />
Pollefeys, Marc,<br />
This paper describes an approach for mobile robot localization using a visual word based place recognition approach. In<br />
our approach we exploit the benefits of a stereo camera system for place recognition. Visual words computed from SIFT<br />
features are combined with VIP (viewpoint invariant patches) features that use depth information from the stereo setup.<br />
The approach was evaluated under the ImageCLEF@ICPR 2010 competition. The results achieved on the competition<br />
datasets are published in this paper.<br />
14:30-14:50, Paper ThBT3.4<br />
Fast Derivation of Soil Surface Roughness Parameters using Multi-Band SAR Imagery and the Integral Equation<br />
Model<br />
Seppke, Benjamin, Univ. of Hamburg<br />
Dreschler-Fischer, Leonie, Univ. of Hamburg<br />
Heiming, Jo-Ann, Univ. of Hamburg<br />
Wengenroth, Felix, Univ. of Hamburg<br />
The Integral Equation Model (IEM) predicts the normalized radar cross section (NRCS) of dielectric surfaces given surface<br />
and radar parameters. To derive the surface parameters from the NRCS using the IEM, the model needs to be inverted. We<br />
present a fast method of this model inversion to derive soil surface roughness parameters from synthetic aperture radar<br />
(SAR) remote sensing data. The model inversion is based on two different collocated SAR images of different bands, since the<br />
derivation of the parameters cannot be done using one band alone. The computation of the model and the model inversion<br />
are very time consuming tasks and therefore may be impractical for large remote sensing data. We present an approach<br />
that is based on a few model assumptions to speed up the computation of the surface parameters. We applied the algorithm<br />
to detect the correlation length of the surface for dry-fallen areas in the World Cultural Heritage Wadden Sea, a coastal<br />
tidal flat at the German Bight (North Sea). The results are very promising and may be used for a classification of the area<br />
in future steps.<br />
14:50-15:10, Paper ThBT3.5<br />
Social Network Approach to Analysis of Soccer Game<br />
Park, Kyoung-Jin, The Ohio State Univ.<br />
Yilmaz, Alper, The Ohio State Univ.<br />
Video understanding has been an active area of research, where many articles have been published on how to detect and<br />
track objects in videos, and how to analyze their trajectories. These methods, however, only provided heuristic low level<br />
information without providing a higher level understanding of global relations within the whole context. This paper presents<br />
a new way to provide such understanding using a social network approach in soccer videos. Our approach represents<br />
interactions between the objects in the video as a social network. This network is then analyzed by detecting small<br />
communities using modularity, which reflects social interaction. Additionally, we analyze the centrality of nodes, which<br />
provides the importance of the individuals composing the network. In particular, we introduce five centralities exploiting the directed<br />
and weighted social network. The partitions of the resulting social network are shown to correspond to clusters of soccer players<br />
with respect to their role in the game.<br />
ThBT4 Dolmabahçe Hall B<br />
Image Segmentation - II Regular Session<br />
Session chair: Farag, Aly A. (Univ. of Louisville)<br />
13:30-13:50, Paper ThBT4.1<br />
Robust Foreground Object Segmentation via Adaptive Region-Based Background Modelling<br />
Reddy, Vikas, NICTA, The Univ. of Queensland<br />
Sanderson, Conrad, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
We propose a region-based foreground object segmentation method capable of dealing with image sequences containing<br />
noise, illumination variations and dynamic backgrounds (as often present in outdoor environments). The method utilises<br />
contextual spatial information through analysing each frame on an overlapping block-by-block basis and obtaining a low-dimensional<br />
texture descriptor for each block. Each descriptor is passed through an adaptive multi-stage classifier, comprised<br />
of a likelihood evaluation, an illumination invariant measure, and a temporal correlation check. The overlapping of<br />
blocks not only ensures smooth contours of the foreground objects but also effectively minimises the number of false positives<br />
in the generated foreground masks. The parameter settings are robust against a wide variety of sequences, and postprocessing<br />
of foreground masks is not required. Experiments on the challenging I2R dataset show that the proposed method<br />
obtains considerably better results (both qualitatively and quantitatively) than methods based on Gaussian mixture models<br />
(GMMs), feature histograms, and normalised vector distances. On average, the proposed method achieves 36% more accurate<br />
foreground masks than the GMM based method.<br />
13:50-14:10, Paper ThBT4.2<br />
Flooding and MRF-Based Algorithms for Interactive Segmentation<br />
Grinias, Ilias, Univ. of Crete<br />
Komodakis, Nikos, Univ. of Crete<br />
Tziritas, G., Univ. of Crete<br />
We propose a method for interactive colour image segmentation. The goal is to detect an object from the background,<br />
when some markers on the object(s) and the background are given. As features, only probability distributions of the data are<br />
used. At first, all the labelled seeds are independently propagated for obtaining homogeneous connected components for<br />
each of them. Then the image is divided into blocks, which are classified according to their probabilistic distance from the<br />
classified regions. A topographic surface for each class is obtained, using Bayesian dissimilarities and a min-max criterion.<br />
Two algorithms are proposed: a regularized classification based on the topographic surface and incorporating an MRF<br />
model, and a priority multi-label flooding algorithm. Segmentation results on the LHI data set are presented.<br />
14:10-14:30, Paper ThBT4.3<br />
Steerable Filtering using Novel Circular Harmonic Functions with Application to Edge Detection<br />
Papari, Giuseppe, Univ. of Groningen<br />
Campisi, Patrizio, Univ. degli Studi Roma TRE<br />
Petkov, N, Univ. of Groningen<br />
In this paper, we perform approximate steering of the elongated 2D Hermite-Gauss functions with respect to rotations and<br />
provide compact analytical expressions for the related basis functions. A special notation introduced here considerably<br />
simplifies the derivation and unifies the cases of even and odd indices. The proposed filters are applied to edge detection.<br />
Quantitative analysis shows a performance increase of about 12.5% in terms of the Pratt’s figure of merit with respect to<br />
the well-established Gaussian gradient proposed by Canny.<br />
14:30-14:50, Paper ThBT4.4<br />
3D Vertebral Body Segmentation using Shape based Graph Cuts<br />
Aslan, Melih Seref, Univ. of Louisville<br />
Ali, Asem, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
Rara, Ham, Univ. of Louisville<br />
Arnold, Ben, Image Analysis, Inc.<br />
Ping, Xiang, Image Analysis, Inc.<br />
Bone mineral density (BMD) measurements and fracture analysis of the spine bones are restricted to the vertebral bodies<br />
(VBs). In this paper, we propose a novel 3D shape based method to segment VBs in clinical computed tomography (CT)<br />
images without any user intervention. The proposed method depends on both image appearance and shape information.<br />
3D shape information is obtained from a set of training data sets. Then, we estimate the shape variations using a distance<br />
probabilistic model which approximates the marginal densities of the VB and background in the variability region. To<br />
segment a VB, a matched filter is used to detect the VB region automatically. We align the detected volume with the 3D<br />
shape prior so that it can be used in the distance probabilistic model. Then, the graph cuts method which integrates the linear<br />
combination of Gaussians (LCG), Markov Gibbs Random Field (MGRF), and distance probabilistic model obtained from<br />
3D shape prior is used. Experiments on the data sets show that the proposed segmentation approach is more accurate than<br />
other known alternatives.<br />
14:50-15:10, Paper ThBT4.5<br />
Locally Deformable Shape Model to Improve 3D Level Set based Esophagus Segmentation<br />
Kurugol, Sila, Northeastern Univ.<br />
Ozay, Necmiye, Northeastern Univ.<br />
Dy, Jennifer G., Northeastern Univ.<br />
Sharp, Gregory C., Mass. General Hospital and Harvard Medical School<br />
Brooks, Dana H., Northeastern Univ.<br />
In this paper we propose a supervised 3D segmentation algorithm to locate the esophagus in thoracic CT scans using a<br />
variational framework. To address challenges due to low contrast, several priors are learned from a training set of segmented<br />
images. Our algorithm first estimates the centerline based on a spatial model learned at a few manually marked anatomical<br />
reference points. Then an implicit shape model is learned by subtracting the centerline and applying PCA to these shapes.<br />
To allow local variations in the shapes, we propose to use nonlinear smooth local deformations. Finally, the esophageal<br />
wall is located within a 3D level set framework by optimizing a cost function including terms for appearance, the shape<br />
model, smoothness constraints and an air/contrast model.<br />
ThBT5 Topkapı Hall B<br />
3D Face Recognition Regular Session<br />
Session chair: Li, Stan Z. (CASIA)<br />
13:30-13:50, Paper ThBT5.1<br />
3D Face Recognition by Deforming the Normal Face<br />
Li, Xiaoli, Southeast Univ.<br />
Da, Feipeng, Southeast Univ.<br />
3D face recognition is complicated by the presence of expression variation. In this paper, we present an automatic 3D face<br />
recognition method which can differentiate the expression deformations from the interpersonal differences and recognize<br />
faces with expressions removed. The deformations caused by expression and by interpersonal difference are first<br />
learned from a training set. Then the deformations are linearly combined to synthesize a new face with a certain expression.<br />
When a target face comes in, the synthesized face is used to match it by adjusting the coefficients in the linear<br />
combination. After the matching process, coefficients corresponding to the interpersonal differences are chosen as features<br />
for recognition. We perform experiments on the FRGC v2.0 database and good performance is obtained.<br />
13:50-14:10, Paper ThBT5.2<br />
Real-Time 3D Face and Facial Action Tracking using Extended 2D+3D AAMs<br />
Zhou, Mingcai, Chinese Acad. of Sciences<br />
Wang, Yangsheng, Chinese Acad. of Sciences<br />
Huang, Xiangsheng, Chinese Acad. of Sciences<br />
In this work, we address the problem of tracking three dimensional (3D) faces and facial actions in video sequences. The<br />
main contributions of the paper are as follows. First, we develop an extended 2D+3D Active Appearance Models (AAM)<br />
based 3D face and facial action tracking framework using 2D view-based AAMs and a modified 3D face model. Second,<br />
we develop a robust shape initialization method based on local feature matching to track fast face motion. Experiments<br />
evaluating the effectiveness of the proposed algorithm are reported.<br />
14:10-14:30, Paper ThBT5.3<br />
A Novel Face Recognition Approach using a 2D-3D Searching Strategy<br />
Dahm, Nicholas, NICTA<br />
Gao, Yongsheng, Griffith Univ.<br />
Many face recognition techniques focus on 2D-2D or 3D-3D comparison; however, few techniques explore<br />
the idea of cross-dimensional comparison. This paper presents a novel face recognition approach that implements cross-dimensional<br />
comparison to solve the issue of pose invariance. Our approach implements a Gabor representation during<br />
comparison to allow for variations in texture, illumination, expression and pose. Kernel scaling is used to reduce comparison<br />
time during the branching search, which determines the facial pose of input images. The conducted experiments prove<br />
the viability of this approach, with our larger kernel experiments returning 91.6% - 100% accuracy on a database comprised<br />
of both local data and data from the USF Human ID 3D database.<br />
14:30-14:50, Paper ThBT5.4<br />
Initialization and Pose Alignment in Active Shape Model<br />
Xiong, Pengfei, Chinese Acad. of Sciences<br />
Lei, Huang, Chinese Acad. of Sciences<br />
Liu, Changping, Chinese Acad. of Sciences<br />
In this paper, we propose a new algorithm for shape initialization and 3D pose alignment in Active Shape Model (ASM).<br />
Instead of initializing with the average shape as in previous works, we build a scattered data interpolation model from key points<br />
to obtain the initial shape, which ensures the shape is initialized around facial organs. These key points are chosen from the organs<br />
of the face shape and first located with a strong classifier. They are then used to build a Radial Basis Function (RBF)<br />
model that deforms the average shape into the initial shape. Besides, to cope with a variety of face poses, we define a 3D general shape<br />
to align face shapes in 3D instead of the 2D alignment of classic ASM. With accurate 3D rotation angles iteratively calculated<br />
by Levenberg-Marquardt (LM) algorithm, shapes can be aligned to standard shape more reliably. Experiments<br />
and comparisons on FERET show that both shape initialization and 3D pose alignment of our algorithm greatly improve<br />
the location accuracy.<br />
14:50-15:10, Paper ThBT5.5<br />
3D Face Reconstruction using a Single or Multiple Views<br />
Choi, Jongmoo, Univ. of Southern California<br />
Medioni, Gerard, Univ. of Southern California<br />
Lin, Yuping, Univ. of Southern California<br />
Silva, Luciano, Univ. Federal do Parana<br />
Bellon, Olga Regina Pereira, Univ. Federal do Parana<br />
Pamplona, Mauricio, Univ. Federal do Parana<br />
Faltemier, Timothy, Progeny Systems<br />
We present a 3D face reconstruction system that takes as input either a single view or several different views. Given a<br />
facial image, we first classify the facial pose into one of five predefined poses, then detect two anchor points that are then<br />
used to detect a set of predefined facial landmarks. Based on these initial steps, for a single view we apply a warping<br />
process using a generic 3D face model to build a 3D face. For multiple views, we apply sparse bundle adjustment to reconstruct<br />
3D landmarks which are used to deform the generic 3D face model. Experimental results on the Color FERET<br />
and CMU multi-PIE databases confirm our framework is effective in creating realistic 3D face models that can be used in<br />
many computer vision applications, such as 3D face recognition at a distance.<br />
ThBT6 Dolmabahçe Hall A<br />
Text Analysis and Detection Regular Session<br />
Session chair: Kholmatov, Alisher (TUBITAK UEKAE)<br />
13:30-13:50, Paper ThBT6.1<br />
Text Detection using Edge Gradient and Graph Spectrum<br />
Zhang, Jing, Univ. of South Florida<br />
Kasturi, Rangachar, Univ. of South Florida<br />
In this paper, we propose a new unsupervised text detection approach based on the Histogram of Oriented Gradients<br />
and the Graph Spectrum. By investigating the properties of text edges, the proposed approach first extracts text edges from<br />
an image and localizes candidate character blocks using Histograms of Oriented Gradients; then the Graph Spectrum is utilized<br />
to capture the global relationships among candidate blocks and cluster them into groups to generate the bounding boxes<br />
of text objects in the image. The proposed method is robust to the color and size of text. The ICDAR 2003 text locating dataset<br />
and video frames were used to evaluate the performance of the proposed approach. Experimental results demonstrated the<br />
validity of our approach.<br />
13:50-14:10, Paper ThBT6.2<br />
Scene Text Extraction with Edge Constraint and Text Collinearity<br />
Lee, Seonghun, KAIST<br />
Cho, MinSu, KAIST<br />
Jung, Kyomin, KAIST<br />
Kim, Jin Hyung, KAIST<br />
In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two<br />
functions: it generates text region candidates, and it verifies the labels of the candidates (text or non-text). The text region<br />
candidates are generated through a modified K-means clustering algorithm, which references texture features, edge information<br />
and color information. The candidate labels are then verified in a global sense by a Markov Random Field model<br />
in which a collinearity weight is added, since most texts are aligned. The proposed method achieves reasonable accuracy<br />
for text extraction from moderately difficult examples from the ICDAR 2003 database.<br />
14:10-14:30, Paper ThBT6.3<br />
Typographical Features for Scene Text Recognition<br />
Weinman, Jerod, Grinnell Coll.<br />
Scene text images feature an abundance of font style variety but a dearth of data in any given query. Recognition methods<br />
must be robust to this variety or adapt to the query data’s characteristics. To achieve this, we augment a semi-Markov<br />
model (integrating character segmentation and recognition) with a bigram model of character widths. Softly promoting<br />
segmentations that exhibit font metrics consistent with those learned from examples, we use the limited information available<br />
while avoiding error-prone direct estimates and hard constraints. Incorporating character width bigrams in this fashion<br />
improves recognition on low-resolution images of signs containing text in many fonts.<br />
14:30-14:50, Paper ThBT6.4<br />
A Visual Attention based Approach to Text Extraction<br />
Sun, Qiaoyu, Huaihai Institute of Tech.<br />
Lu, Yue, East China Normal Univ.<br />
Sun, Shiliang, East China Normal Univ.<br />
A visual attention based approach is proposed to extract texts from complicated background in camera-based images.<br />
First, it applies the simplified visual attention model to highlight the region of interest (ROI) in an input image and to<br />
yield a map, named the VA map, consisting of the ROIs. Second, an edge map of the image containing the edge information<br />
of four directions is obtained by Sobel operators. Character areas are detected by connected component analysis and<br />
merged into candidate text regions. Finally, the VA map is employed to confirm the candidate text regions. The experimental<br />
results demonstrate that the proposed method can effectively extract text information and locate text regions contained in<br />
camera-based images. It is robust not only for font, size, color, language, space, alignment and complexity of background,<br />
but also for perspective distortion and skewed texts embedded in images.<br />
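The four-direction edge map mentioned above can be sketched with Sobel-style convolutions; the two diagonal kernels below are a common choice assumed here for illustration, not taken from the paper:

```python
import numpy as np

def sobel_edge_maps(img):
    """Absolute edge responses in four directions (horizontal,
    vertical and the two diagonals) via 3x3 Sobel-style kernels."""
    k_h = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)    # horizontal edges
    k_v = k_h.T                                                    # vertical edges
    k_d1 = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], float)   # 45-degree diagonal
    k_d2 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], float)   # 135-degree diagonal

    def conv(im, k):
        # naive 3x3 correlation with edge replication at the border
        p = np.pad(im, 1, mode="edge")
        out = np.zeros_like(im, float)
        for i in range(im.shape[0]):
            for j in range(im.shape[1]):
                out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
        return out

    return [np.abs(conv(img, k)) for k in (k_h, k_v, k_d1, k_d2)]
```

On a vertical intensity step, only the vertical-edge map responds, which is the directional separation the text detector relies on before connected component analysis.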
14:50-15:10, Paper ThBT6.5<br />
New Wavelet and Color Features for Text Detection in Video<br />
Palaiahnakote, Shivakumara, National Univ. of Singapore<br />
Phan, Trung Quy, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Automatic text detection in video is an important task for efficient and accurate indexing and retrieval of multimedia data<br />
such as event identification and event boundary identification. This paper presents a new method comprising wavelet<br />
decomposition and color features, namely R, G and B. The wavelet decomposition is applied to the three color bands separately<br />
to obtain three high frequency sub-bands (LH, HL and HH), and the average of the three sub-bands for each color<br />
band is then computed to enhance the text pixels in the video frame. To take advantage of both wavelet and color information,<br />
we again take the average of the three average images (AoA) obtained in the former step to increase the gap between text<br />
and non-text pixels. Our previous Laplacian method is employed on the AoA for text detection. The proposed method is evaluated<br />
by testing on a large dataset which includes publicly available data, non-text data and ICDAR-03 data. A comparative<br />
study with existing methods shows that the results of the proposed method are encouraging and useful.<br />
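The per-band enhancement step (one-level wavelet decomposition, then averaging the three high-frequency sub-bands) can be sketched with a Haar transform; the Haar basis and the normalization are assumptions, since the abstract does not name the wavelet:

```python
import numpy as np

def high_freq_average(band):
    """One-level Haar-like decomposition of a single colour band
    (even dimensions assumed), returning the average of the absolute
    LH, HL and HH high-frequency sub-bands."""
    a = band[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = band[0::2, 1::2]   # top-right
    c = band[1::2, 0::2]   # bottom-left
    d = band[1::2, 1::2]   # bottom-right
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    # average of the three high-frequency sub-bands
    return (np.abs(lh) + np.abs(hl) + np.abs(hh)) / 3.0
```

Smooth regions produce near-zero responses while high-contrast text strokes survive, which is what widens the gap between text and non-text pixels before the averaging across colour bands.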
ThBT7 Dolmabahçe Hall C<br />
Quantitative Biological Image and Signal Analysis Regular Session<br />
Session chair: Tasdizen, Tolga (Univ. of Utah)<br />
13:30-13:50, Paper ThBT7.1
Improving Undersampled MRI Reconstruction using Non-Local Means
Adluru, Ganesh, Univ. of Utah
Tasdizen, Tolga, Univ. of Utah
Whitaker, Ross, Univ. of Utah
Dibella, Edward, Univ. of Utah
Obtaining high quality images in MR is desirable not only for accurate visual assessment but also for automatic processing
to extract clinically relevant parameters. Filtering-based techniques are extremely useful for reducing artifacts caused
by undersampling of k-space (to reduce scan time). The recently proposed Non-Local Means (NLM) filtering method
offers a promising means to denoise images. Compared to most previous approaches, NLM is based on a more realistic
model of images, which results in little loss of information while removing the noise. Here we extend the NLM method
for MR image reconstruction from undersampled k-space data. The method is applied on T1-weighted images of the
breast and T2-weighted anatomical brain images. Results show that NLM offers a promising method that can be used for
accelerating MR data acquisitions.
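For intuition, a minimal pixel-domain NLM denoiser is sketched below. This is the generic filter only, not the paper's k-space reconstruction; the patch size, search-window size and filtering parameter `h` are arbitrary choices.

```python
import numpy as np

def nlm_denoise(img, patch=3, search=7, h=0.1):
    """Minimal non-local means: each pixel is replaced by a weighted
    average of search-window pixels, weighted by how similar their
    surrounding patches are to the patch around the target pixel."""
    pad = patch // 2
    padded = np.pad(img, pad, mode="reflect")
    H, W = img.shape
    half = search // 2
    out = np.zeros_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            p = padded[i:i + patch, j:j + patch]      # reference patch
            i0, i1 = max(0, i - half), min(H, i + half + 1)
            j0, j1 = max(0, j - half), min(W, j + half + 1)
            weights, values = [], []
            for ii in range(i0, i1):
                for jj in range(j0, j1):
                    q = padded[ii:ii + patch, jj:jj + patch]
                    d2 = np.mean((p - q) ** 2)        # patch dissimilarity
                    weights.append(np.exp(-d2 / h ** 2))
                    values.append(img[ii, jj])
            out[i, j] = np.average(values, weights=weights)
    return out
```

Because the weights depend on patch similarity rather than spatial distance alone, repeated structures reinforce each other while noise averages out, which is the "more realistic model of images" the abstract mentions.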
13:50-14:10, Paper ThBT7.2
Towards an Intelligent Bed Sensor: Non-Intrusive Monitoring of Sleep Irregularities with Computer Vision Techniques
Branzan Albu, Alexandra, Univ. of Victoria
Malakuti, Kaveh, Univ. of Victoria
This paper proposes a novel approach for monitoring sleep using pressure data. The goal of sleep monitoring is to detect
and log events of normal breathing, sleep apnea and body motion. The proposed approach is based on translating the signal
data to the image domain by computing a sequence of inter-frame similarity matrices from pressure maps acquired with
a mattress of pressure sensors. Periodicity analysis was performed on similarity matrices via a new algorithm based on
segmentation of elementary patterns using the watershed transform, followed by aggregation of quasi-rectangular patterns
into breathing cycles. Once breathing events are detected, all remaining elementary patterns aligned on the main diagonal
are considered as belonging to either apnea or motion events. The discrimination between these two events is based on
detecting movement times from a statistical analysis of pressure data. Experimental results confirm the validity of our approach.
14:10-14:30, Paper ThBT7.3
Automatic Selection of Keyframes from Angiogram Videos
Syeda-Mahmood, Tanveer, IBM Almaden Res. Center
Wang, Fei, Almaden Res. Center
Beymer, David, IBM Almaden Res. Center
Mahmood, Aafreen, Monta Vista High School
Lundstrom, Robert, Kaiser Permanente SFO Medical Center
In this paper we address the problem of automatic selection of important vessel-depicting key frames within 2D angiography
videos. Two different methods of frame selection are described, one based on the Frangi filter, and the other based on
detecting parallel curves formed from edges in angiography images. Results are shown by comparison to physician annotations
of such key frames on 2D coronary angiograms.
14:30-14:50, Paper ThBT7.4
A Computer-Aided Method for Scoliosis Fusion Level Selection by a Topologically Ordered Self Organizing Kohonen Network
Mezghani, Neila, Centre de Recherche du CHUM
Phan, Philippe, Sainte-Justine University Hospital Center
Mitiche, Amar, and Labella, Hubert, École Polytechnique de Montréal
de Guise, Jacques, Centre de Recherche du CHUM
Surgical instrumentation for adolescent idiopathic scoliosis (AIS) is a complex procedure involving many difficult
decisions. Selection of the appropriate fusion level remains one of the most challenging decisions in scoliosis surgery.
Currently, the Lenke classification model is generally followed in surgical planning. The purpose of our study is to investigate
a computer-aided method for Lenke classification and scoliosis fusion level selection. The method uses a self-organizing
neural network trained on a large database of surgically treated AIS cases. The neural network produces two
maps, one of Lenke classes and the other of fusion levels. These two maps show that the Lenke classes are associated
with the proper fusion level categories everywhere in the map except at the Lenke class transitions. The topological
ordering of the Cobb angles in the neural network justifies determining a patient's scoliosis treatment instrumentation
directly from the fusion level map rather than via the Lenke classification.
14:50-15:10, Paper ThBT7.5
A Fast and Robust Graph-Based Approach for Boundary Estimation of Fiber Bundles Relying on Fractional Anisotropy Maps
Bauer, Miriam Helen Anna, Univ. of Marburg
Egger, Jan, Univ. of Marburg
O'Donnell, Thomas Patrick, Siemens Corp. Res.
Freisleben, Bernd, Univ. of Marburg
Barbieri, Sebastiano, Fraunhofer MEVIS
Klein, Jan, Fraunhofer MEVIS
Hahn, Horst Karl, Fraunhofer MEVIS
Nimsky, Christopher, Univ. of Marburg
In this paper, a fast and robust graph-based approach for boundary estimation of fiber bundles derived from Diffusion
Tensor Imaging (DTI) is presented. DTI is a non-invasive imaging technique that allows the estimation of the location of
white matter tracts based on measurements of water diffusion properties. Based on DTI data, the fiber bundle boundary
can be determined to gain information about eloquent structures, which is of major interest for neurosurgery. DTI in combination
with tracking algorithms allows the estimation of the position and course of fiber tracts in the human brain. The presented
method uses these tracking results as the starting point for a graph-based approach. The overall method starts by
computing the fiber bundle centerline between two user-defined regions of interest (ROIs). This centerline determines
the planes that are used for creating a directed graph. Then, the min-cut of the graph is calculated, creating an optimal
boundary of the fiber bundle.
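The final step relies on a standard s-t minimum cut. The snippet below shows a generic Edmonds-Karp max-flow/min-cut on an adjacency-matrix graph; the construction of the actual boundary graph from the centerline planes is specific to the paper and is not reproduced here.

```python
from collections import deque

def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp max-flow; returns (flow value, set of nodes on the
    source side of the minimum cut). `capacity` is an n x n matrix."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]

    def bfs():
        # breadth-first search in the residual graph, recording parents
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    total = 0
    while True:
        parent = bfs()
        if parent[t] == -1:          # no augmenting path left
            break
        path_flow, v = float("inf"), t
        while v != s:                # bottleneck along the augmenting path
            u = parent[v]
            path_flow = min(path_flow, capacity[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:                # push flow along the path
            u = parent[v]
            flow[u][v] += path_flow
            flow[v][u] -= path_flow
            v = u
        total += path_flow
    # nodes still reachable in the residual graph form the source side of the cut
    reachable = {v for v in range(n) if bfs()[v] != -1}
    return total, reachable
```

By max-flow/min-cut duality, the saturated edges separating the returned source side from the rest form the minimum-weight boundary, which is what makes the cut "optimal" in this kind of formulation.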
ThCT1 Marmara Hall
Object Detection and Recognition - VI Regular Session
Session chair: Denzler, Joachim (Friedrich-Schiller Univ. of Jena)
15:40-16:00, Paper ThCT1.1
Recognizing 3D Objects with 3D Information from Stereo Vision
Yoon, Kuk-Jin, GIST
Shin, Min-Gil, GIST
Lee, Ji-Hyo, Samsung Electronics
Conventional local feature-based object recognition methods try to recognize learned 3D objects by using unordered local
feature matching followed by verification. However, the matching between unordered feature sets can be ambiguous
and, moreover, it is difficult to deal with generally shaped 3D objects in the verification stage. In this paper, we present a
new framework for general 3D object recognition, which is based on invariant local features and their 3D information
obtained with stereo cameras. We extend the conventional object recognition framework to stereo cameras. Since the proposed
method is based on stereo vision, it is possible to utilize the 3D information of local features visible from two cameras.
16:00-16:20, Paper ThCT1.2
Combining Geometry and Local Appearance for Object Detection
Pascual García-Tubío, Manuel, Vienna Univ. of Tech.
Wildenauer, Horst, Vienna Univ. of Tech.
Szumilas, Lech, Ind. Research Inst. for Automation & Measurement
In this paper we address the problem of object detection in cluttered scenes. Local image features and their spatial configuration
act as a representation of object classes which are learned in a discriminative fashion. Recent contributions in the
area of object detection indicate the importance of using geometrical properties for representing object classes. Prompted
by this, we devised an approach tailored to control the importance of the features and their spatial alignment. We quantitatively
show that modeling the spatial distribution of local features and optimising the influence of both cues significantly
boosts object detection performance.
16:20-16:40, Paper ThCT1.3
Illumination and Expression Invariant Face Recognition using SSIM based Sparse Representation
Khwaja, Asim, The Australian National Univ.
Asthana, Akshay, Australian National Univ.
Goecke, Roland, Univ. of Canberra
The sparse representation technique has provided a new way of looking at object recognition. As we demonstrate in this
paper, however, the mean-squared error (MSE) measure, which is at the heart of this technique, is not a very robust measure
when it comes to comparing facial images that differ significantly in luminance values, as it only performs pixel-by-pixel
comparisons. This requires a significantly large training set with enough variation in it to offset the drawback of the
MSE measure. A large training set, however, is often not available. We propose the replacement of the MSE measure by
the structural similarity (SSIM) measure in the sparse representation algorithm, which performs a more robust comparison
using only one training sample per subject. In addition, since the off-the-shelf sparsifiers are also written using the MSE
measure, we developed our own sparsifier using genetic algorithms with the SSIM measure. We applied the modified
algorithm to the Extended Yale Face B database as well as to the Multi-PIE database with expression and illumination
variations. The improved performance demonstrates the effectiveness of the proposed modifications.
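The contrast between the two measures can be seen with a single-window SSIM. This is a simplification: the standard index averages the statistic over local windows, and the toy images below are assumptions, not the paper's data.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM for images scaled to [0, 1]: luminance,
    contrast and structure terms combined in one statistic."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mse(x, y):
    """Pixel-by-pixel mean-squared error, for comparison."""
    return ((x - y) ** 2).mean()
```

A brightness-shifted copy of an image keeps its structure, so SSIM stays high even though MSE is large, which is exactly the luminance-robustness argument the abstract makes.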
16:40-17:00, Paper ThCT1.4
Improving Classification Accuracy by Comparing Local Features through Canonical Correlations
Dikmen, Mert, Univ. of Illinois at Urbana-Champaign
Huang, Thomas, Univ. of Illinois at Urbana-Champaign
Classifying images using features extracted from densely sampled local patches has enjoyed significant success in many detection
and recognition tasks. It is also well known that generally more than one type of feature is needed to achieve robust
classification performance. Previous works using multiple features have addressed this issue either through simple concatenation
of feature vectors or through combining feature-specific kernels at the classifier level. In this work we introduce a
novel approach for combining features at the feature level by projecting two types of features onto two respective subspaces
in which they are maximally correlated. We use their correlation as an augmented feature and demonstrate improvement in
classification accuracy over simple combination through concatenation in a pedestrian detection framework.
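A compact version of this idea is classical canonical correlation analysis (CCA) computed from the whitened cross-covariance. The sketch below is generic and not necessarily the authors' exact formulation; the small regularizer `reg` is an added assumption for numerical stability.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Find projections of two feature sets that are maximally correlated
    (classical CCA via SVD of the whitened cross-covariance)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(K)
    # projection bases for X and Y, plus the canonical correlations
    return inv_sqrt(Cxx) @ U[:, :k], inv_sqrt(Cyy) @ Vt[:k].T, s[:k]
```

The projected feature pair (or its correlation) can then be appended to the representation as the augmented feature described in the abstract.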
17:00-17:20, Paper ThCT1.5
A Robust Approach for Person Localization in Multi-Camera Environment
Sun, Luo, Tsinghua Univ.
Di, Huijun, Tsinghua Univ.
Tao, Linmi, Tsinghua Univ.
Xu, Guangyou, Tsinghua Univ.
Person localization is fundamental in human-centered computing, since a person must be localized before services can be actively
provided. This paper proposes a robust approach to localizing persons based on geometric constraints in a multi-camera
environment. The proposed algorithm has several advantages: 1) no assumption on the positions and orientations of the cameras,
except that they should share a common field of view; 2) no assumption on the visibility of particular body parts (e.g., feet),
except that a portion of the person should be observed in at least two views; 3) reliability in terms of tolerating occlusion,
body posture change and inaccurate motion detection. It can also provide error control and be further extended to measure
person height. The efficacy of the approach is demonstrated on challenging real-world scenarios.
ThCT2 Anadolu Auditorium
Classification - III Regular Session
Session chair: Tortorella, Francesco (Univ. degli Studi di Cassino)
15:40-16:00, Paper ThCT2.1
Nearest Archetype Hull Methods for Large-Scale Data Classification
Thurau, Christian, Fraunhofer IAIS
This paper introduces an efficient geometric approach for data classification that can build class models from large amounts
of high dimensional data. We determine a convex model of the data as the outcome of convex hull non-negative matrix
factorization, a large-scale variant of Archetypal Analysis. The resulting convex regions or archetype hulls give an optimal
(in a least squares sense) bounding of the data region and can be efficiently computed. We classify based on the minimum
distance to the closest archetype hull. The proposed method offers (i) an intuitive geometric interpretation, (ii) single as
well as multi-class classification, and (iii) handling of large amounts of high dimensional data. Experimental evaluation
on common benchmark data sets shows promising results.
16:00-16:20, Paper ThCT2.2
A Bound on the Performance of LDA in Randomly Projected Data Spaces
Durrant, Robert John, Univ. of Birmingham
Kaban, Ata, Univ. of Birmingham
We consider the problem of classification in nonadaptive dimensionality reduction. Specifically, we bound the increase in
classification error of Fisher's Linear Discriminant classifier resulting from randomly projecting the high dimensional
data into a lower dimensional space and both learning the classifier and performing the classification in the projected
space. Our bound is reasonably tight, and unlike existing bounds on learning from randomly projected data, it becomes
tighter as the quantity of training data increases, without requiring any sparsity structure from the data.
16:20-16:40, Paper ThCT2.3
Adaptive Incremental Learning with an Ensemble of Support Vector Machines
Kapp, Marcelo N., École de Tech. Supérieure - Univ. of Quebec
Sabourin, R., École de Tech. Supérieure
Maupin, Patrick, Defence Res. and Development Canada
The incremental updating of classifiers implies that their internal parameter values can vary according to incoming data.
As a result, in order to achieve high performance, incremental learner systems should not only consider the integration of
knowledge from new data, but also maintain an optimum set of parameters. In this paper, we propose an approach for performing
incremental learning in an adaptive fashion with an ensemble of support vector machines. The key idea is to track,
evolve, and combine optimum hypotheses over time, based on dynamic optimization processes and ensemble selection.
From experimental results, we demonstrate that the proposed strategy is promising, since it outperforms a single classifier
variant of the proposed approach and other classification methods often used for incremental learning.
16:40-17:00, Paper ThCT2.4
Margin Preserved Approximate Convex Hulls for Classification
Takahashi, Tetsuji, Hokkaido Univ.
Kudo, Mineichi, Hokkaido Univ.
The usage of convex hulls for classification is discussed with a practical algorithm, in which a sample is classified according
to its distances to convex hulls. Sometimes the convex hulls of classes are too close to maintain a large margin. In this paper, we
discuss a way to keep the margin larger than a specified value. To do this, we introduce the concept of an "expanded convex
hull" and confirm its effectiveness.
17:00-17:20, Paper ThCT2.5
Evolving Fuzzy Classifiers: Application to Incremental Learning of Handwritten Gesture Recognition Systems
Almaksour, Abdullah, IRISA/INSA de Rennes
Anquetil, Eric, IRISA/INSA
Quiniou, Solen, École de Tech. Supérieure
Cheriet, Mohammed, École de Tech. Supérieure
In this paper, we present a new method to design customizable self-evolving fuzzy rule-based classifiers. The presented
approach combines an incremental clustering algorithm with a fuzzy adaptation method in order to learn and maintain the
model. We use this method to build an evolving handwritten gesture recognition system. The self-adaptive nature of this
system allows it to start its learning process with few learning data, to continuously adapt and evolve according to any
new data, and to remain robust when a new unseen class is introduced at any moment in the life-long learning process.
ThCT3 Topkapı Hall A
Computer Vision Applications - III Regular Session
Session chair: Yilmaz, Alper (Ohio State Univ.)
15:40-16:00, Paper ThCT3.1
Fast and Spatially-Smooth Terrain Classification using Monocular Camera
Jakkoju, Chetan, IIIT Hyderabad
Krishna, Madhava, IIIT Hyderabad
Jawahar, C. V., IIIT
In this paper, we present a monocular camera based terrain classification scheme. The uniqueness of the proposed scheme
is that it inherently incorporates spatial smoothness while segmenting an image, without requiring post-processing
smoothing methods. The algorithm is extremely fast because it is built on top of a Random Forest classifier. We present
comparisons across features and classifiers. The baseline algorithm uses color, texture and their combination with classifiers
such as SVM and Random Forests. We further enhance the algorithm through a label transfer method. The efficacy of the
proposed solution can be seen in the low error rates we reach on both our dataset and other publicly available datasets.
16:00-16:20, Paper ThCT3.2
Learning Major Pedestrian Flows in Crowded Scenes
Widhalm, Peter, Austrian Inst. of Tech.
Braendle, Norbert, Austrian Inst. of Tech.
We present a crowd analysis approach computing a representation of the major pedestrian flows in complex scenes. It
treats crowds as a set of moving particles and builds a spatio-temporal model of motion events. A Growing Neural Gas algorithm
encodes optical flow particle trajectories as sequences of local motion events and learns a topology which is the
basis for trajectory distance computations. Trajectory prototypes are aligned with a two-open-ends version of Dynamic
Time Warping to cope with fragmented trajectories. The trajectories are grouped into an automatically determined number
of clusters with self-tuning spectral clustering. The clusters are compactly represented with the help of Principal Component
Analysis, providing a technique for unusual motion detection based on residuals. We demonstrate results for a publicly
available crowded video and a scene with volunteers moving according to defined origin-destination flows.
16:20-16:40, Paper ThCT3.3
On-Line Video Recognition and Counting of Harmful Insects
Bechar, Ikhlef, INRIA
Moisan, Sabine, INRIA
Thonnat, Monique, INRIA
Bremond, Francois, INRIA
This article is concerned with the on-line counting of harmful insects of certain species in videos, in the framework of in situ
video-surveillance that aims at the early detection of prominent pest attacks in greenhouse crops. The video-processing
challenges to be coped with mainly concern the low spatial resolution and color contrast of the objects of interest
in the videos, outdoor imaging conditions, and the need for quasi-real-time processing. Thus, we propose an
approach which uses a pattern recognition algorithm to extract the locations of the harmful insects of interest in
a video, combined with video-processing algorithms in order to achieve an on-line video-surveillance solution.
The system has been validated off-line on the whitefly species (one potentially harmful insect) and has shown acceptable
performance in terms of accuracy versus computational time.
16:40-17:00, Paper ThCT3.4
Boosted Edge Orientation Histograms for Grasping Point Detection
Lefakis, Leonidas, TU Vienna
Wildenauer, Horst, Vienna Univ. of Tech.
Pascual García-Tubío, Manuel, Vienna Univ. of Tech.
Szumilas, Lech, Ind. Research Inst. for Automation & Measurement
In this paper, we describe a novel algorithm for the detection of grasping points in images of previously unseen objects.
A basic building block of our approach is the use of a newly devised descriptor, representing semi-local grasping point
shape by the use of edge orientation histograms. Combined with boosting, our method learns discriminative grasp point models
for new objects from a set of annotated real-world images. The method has been extensively evaluated on challenging
images of real scenes, exhibiting largely varying characteristics concerning illumination conditions, scene complexity,
and viewpoint. Our experiments show that the method works in a stable manner and that its performance compares favorably
with the state of the art.
17:00-17:20, Paper ThCT3.5
Automatic Refinement of Foreground Regions for Robot Trail Following
Kocamaz, Mehmet Kemal, Univ. of Delaware
Rasmussen, Christopher, Univ. of Delaware
Continuous trails are extended regions along the ground, such as roads, hiking paths, rivers, and pipelines, which can be
navigationally useful for ground-based or aerial robots. Finding trails in an image and determining possible obstacles on
them are important tasks for robot navigation systems. Assuming that a rough initial segmentation or outline of the region
of interest is available, our goal is to refine the initial guess to obtain a more accurate and detailed representation of the true
trail borders. In this paper, we compare the suitability of several previously published segmentation algorithms, both in
terms of agreement with ground truth and speed, on a range of trail images with diverse appearance characteristics. These
algorithms include generic graph cut, a shape-based version of graph cut which employs a distance penalty, GrabCut, and
an iterative superpixel grouping method.
ThCT4 Dolmabahçe Hall A
Image Representation and Analysis Regular Session
Session chair: Debled-Rennesson, Isabelle (LORIA-Nancy Univ.)
15:40-16:00, Paper ThCT4.1
Object Decomposition via Curvilinear Skeleton Partition
Serino, Luca, Istituto di Cibernetica
Sanniti Di Baja, Gabriella, CNR
Arcelli, Carlo, Istituto di Cibernetica
A method to decompose a complex 3D object into simpler parts is presented, based on a suitable partition of the curvilinear
skeleton of the object. The curvilinear skeleton is divided into subsets by taking into account the regions of influence that
can be associated with its branch points. The obtained subsets are then used to recover the parts into which the object can
be decomposed.
16:00-16:20, Paper ThCT4.2
Differential Area Profiles
Ouzounis, Georgios, Joint Res. Center - Ispra, European Commission
Soille, Pierre, Joint Res. Center - Ispra, European Commission
In this paper a new feature descriptor, the differential area profile (DAP), is presented. DAPs, like regular differential
morphological profiles (DMPs), are computed from a size distribution. The proposed method is based on the area metric given
by regular connected area filters. Using area instead of local width, i.e. the diameter of the structuring element in the corresponding
set of openings by reconstruction in classical DMPs, leads to a rather different multi-scale decomposition. This
is investigated here and an example on a very high resolution satellite image tile is given.
16:20-16:40, Paper ThCT4.3
Connected Component Trees for Multivariate Image Processing and Applications in Astronomy
Perret, Benjamin, Univ. of Strasbourg, LSIIT-CNRS
Lefèvre, Sébastien, Univ. of Strasbourg
Collet, Christophe, Univ. of Strasbourg, LSIIT-CNRS
Slezak, Eric Jean Marc, Univ. de Nice - Sophia Antipolis
In this paper, we investigate the possibilities offered by the extension of connected component trees (cc-trees) to multivariate
images. We propose a general framework for image processing using the cc-tree based on lattice theory and
we discuss the possible applications depending on the properties of the underlying ordered set. This theoretical reflection
is illustrated by two applications in multispectral astronomical imaging: source separation and object detection.
16:40-17:00, Paper ThCT4.4
Multiresolution Analysis of 3D Images based on Discrete Distortion
Weiss, Kenneth, Univ. of Maryland, Coll. Park
Mesmoudi, Mohammed Mostefa, Univ. of Genova
De Floriani, L., Univ. of Genova
We consider a model of a 3D image obtained by discretizing it into a multiresolution tetrahedral mesh known as a hierarchy
of diamonds. This model enables us to extract crack-free approximations of the 3D image at any uniform or variable resolution,
thus reducing the size of the data set without reducing the accuracy. A 3D intensity image is a scalar field (the intensity
field) defined at the vertices of a 3D regular grid, and thus the graph of the image is a hypersurface in $R^4$. We
measure the discrete distortion, a generalization of the notion of curvature, of the transformation which maps the tetrahedralized
3D grid onto its graph in $R^4$. We evaluate the use of a hierarchy of diamonds to analyze properties of a 3D
image, such as its discrete distortion, directly on lower resolution approximations. Our results indicate that distortion-guided
extractions focus the resolution of approximated images on the salient features of the intensity image.
17:00-17:20, Paper ThCT4.5
Multiscale Analysis of Digital Segments by Intersection of 2D Digital Lines
Said, Mouhammad, Univ. de Savoie, Univ. d'Auvergne
Lachaud, Jacques-Olivier, Univ. of Savoie
Feschet, Fabien, Univ. d'Auvergne Clermont-Ferrand 1
A theory for the multiscale analysis of digital shapes would be very interesting for the pattern recognition community,
giving a digital equivalent of the continuous scale-space theory. We focus here on providing analytical formulae of the
multiresolution of Digital Straight Segments (DSS), which is a fundamental tool for describing digital shape contours.
ThCT5 Dolmabahçe Hall B
Image/Video Processing Regular Session
Session chair: Hamzaoğlu, İlker (Sabancı Univ.)
15:40-16:00, Paper ThCT5.1
Stereoscopic Image Inpainting: Distinct Depth Maps and Images Inpainting
Hervieu, Alexandre, Barcelona Media, Univ. Pompeu Fabra of Barcelona
Papadakis, Nicolas, Barcelona Media
Bugeau, Aurélie, Barcelona Media
Gargallo, Pau, Barcelona Media
Caselles, Vicent, Univ. Pompeu Fabra
In this paper we propose an algorithm for the inpainting of stereo images. The issue is to reconstruct the holes in a pair of
stereo images as if they were the projection of a 3D scene. Hence, the reconstruction of the missing information has to produce
a consistent visual perception of depth. Thus, the first step of the algorithm consists in the computation and inpainting
of the disparity maps in the given holes. The second step of the algorithm is to fill in the missing regions using the complete disparity
maps in a way that avoids the creation of 3D artifacts. We present some experiments on several pairs of stereo images.
16:00-16:20, Paper ThCT5.2
Panoramic Video Generation by Multi View Data Synthesis
D’Orazio, Tiziana, Italian National Res. Council
Leo, Marco, Italian National Res. Council
Mosca, Nicola, Italian National Res. Council
This paper presents a mosaic-based approach for enlarged-view soccer video production that can be provided to the audience
as a complementary view for greater enjoyment of relevant events, such as offside, counter attack or goal, that spread out
all over the playing field. Firstly, an enlarged view of the whole field is produced by fusing the images of six cameras
placed on the two sides of the field. Then a color transformation is applied to obtain uniform colors on the parts of the
playing field acquired from different cameras. Finally, the players are segmented by each camera and projected onto the
enlarged view to produce videos of the most interesting events.
16:20-16:40, Paper ThCT5.3
An Adaptive True Motion Estimation Algorithm for Frame Rate Conversion of High Definition Video
Cetin, Mert, Sabanci Univ.
Hamzaoglu, Ilker, Sabanci Univ.
Frame Rate Up-Conversion (FRUC) is necessary for displaying low frame rate video signals on high frame rate flat panel
displays. This paper proposes an adaptive true Motion Estimation (ME) algorithm for FRUC of High Definition video
formats. The proposed ME algorithm produces similar quality results with fewer calculations, or better quality
results with a similar number of calculations, compared to the 3D Recursive Search true ME algorithm, by adaptively using optimized
sets of candidate search locations and several computational complexity reduction techniques.
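As a baseline reference for what an ME algorithm computes, here is a plain exhaustive-search block matcher using the sum of absolute differences (SAD). The paper's contribution, adaptive candidate sets and complexity reduction, is precisely about avoiding this exhaustive search; this sketch only illustrates the underlying matching problem.

```python
import numpy as np

def block_motion(prev, cur, block=8, search=4):
    """Exhaustive-search block matching: for each block of `cur`, find the
    displacement into `prev` minimizing the sum of absolute differences."""
    H, W = cur.shape
    vectors = {}
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            tgt = cur[by:by + block, bx:bx + block]
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= H - block and 0 <= x <= W - block:
                        sad = np.abs(prev[y:y + block, x:x + block] - tgt).sum()
                        if best is None or sad < best:
                            best, best_mv = sad, (dy, dx)
            vectors[(by, bx)] = best_mv
    return vectors
```

For a frame that is a pure translation of the previous one, interior blocks recover the global motion vector exactly, which FRUC then uses to interpolate intermediate frames.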
16:40-17:00, Paper ThCT5.4
Human Daily Activities Indexing in Videos from Wearable Cameras for Monitoring of Patients with Dementia Diseases
Karaman, Svebor, Lab.
Benois-Pineau, Jenny, Lab.
Megret, Remi, Univ. of Bordeaux
Dovgalecs, Vladislavs, IMS
Gaëstel, Yann, INSERM U.897
Dartigues, Jean-Francois, INSERM U.897
Our research focuses on analysing human activities according to a known behaviorist scenario, in case of noisy and high
dimensional collected data. The data come from the monitoring of patients with dementia diseases by wearable cameras.
We define a structural model of video recordings based on a Hidden Markov Model. New spatio-temporal features, color
features and localization features are proposed as observations. First results in recognition of activities are promising.
17:00-17:20, Paper ThCT5.5
Automatic Composition of an Informative Wide-View Image from Video
Habe, Hitoshi, NAIST
Makiyama, Shota, NAIST
Kidode, Masatsugu, NAIST
We describe a method for generating an informative wide-view image using images captured by a moving camera. The
generated image allows for events in the scene observed by the camera to be understood easily. Our method does not use
3D shape information explicitly. Instead, it employs the trajectory of feature points across multiple images and generates
a composite image by taking into account the distribution of the trajectories of the feature points.
ThCT6 Topkapı Hall B
Facial Expression Regular Session
Session chair: Akarun, Lale (Bogazici Univ.)
15:40-16:00, Paper ThCT6.1<br />
Regression-Based Multi-View Facial Expression Recognition<br />
Rudovic, Ognjen, Imperial Coll.<br />
Patras, Ioannis, Queen Mary Univ. of London<br />
Pantic, Maja, Imperial Coll.<br />
We present a regression-based scheme for multi-view facial expression recognition based on 2D geometric features. We<br />
address the problem by mapping facial points (e.g. mouth corners) from non-frontal to frontal view where further recognition<br />
of the expressions can be performed using a state-of-the-art facial expression recognition method. To learn the mapping<br />
functions we investigate four regression models: Linear Regression (LR), Support Vector Regression (SVR),<br />
Relevance Vector Regression (RVR) and Gaussian Process Regression (GPR). Our extensive experiments on the CMU<br />
Multi-PIE facial expression database show that the proposed scheme outperforms view-specific classifiers by utilizing<br />
considerably less training data.<br />
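Of the four regression models investigated, the Linear Regression variant is the simplest to sketch: a linear map from non-frontal to frontal 2D point coordinates fitted by least squares. The data below are synthetic stand-ins (the actual CMU Multi-PIE landmarks and the paper's training protocol are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each row is a flattened set of 20 (x, y) facial
# points. The "true" map is a random near-identity transform plus noise,
# standing in for the non-frontal -> frontal correspondence.
n_points = 20
X_nonfrontal = rng.normal(size=(200, 2 * n_points))
true_W = rng.normal(size=(2 * n_points, 2 * n_points)) * 0.1 + np.eye(2 * n_points)
X_frontal = X_nonfrontal @ true_W + rng.normal(scale=0.01, size=(200, 2 * n_points))

# Linear Regression (LR) mapping with a bias term, fitted by least squares.
X_aug = np.hstack([X_nonfrontal, np.ones((200, 1))])
W, *_ = np.linalg.lstsq(X_aug, X_frontal, rcond=None)

def map_to_frontal(points):
    """Apply the learned linear map to new non-frontal point sets."""
    return np.hstack([points, np.ones((len(points), 1))]) @ W

pred = map_to_frontal(X_nonfrontal)
rmse = float(np.sqrt(np.mean((pred - X_frontal) ** 2)))
```

The SVR, RVR and GPR variants replace the least-squares fit with their respective regressors while keeping the same input/output structure.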
16:00-16:20, Paper ThCT6.2<br />
A Set of Selected SIFT Features for 3D Facial Expression Recognition<br />
Berretti, Stefano, Univ. of Firenze<br />
Del Bimbo, Alberto, Univ. of Florence<br />
Pala, Pietro, Univ. of Firenze<br />
Ben Amor, Boulbaba, LIFL UMR 8022<br />
Daoudi, Mohamed, TELECOM Lille1<br />
In this paper, the problem of person-independent facial expression recognition is addressed on 3D shapes. To this end, an<br />
original approach is proposed that computes SIFT descriptors on a set of facial landmarks of depth images, and then selects<br />
the subset of most relevant features. Using SVM classification of the selected features, an average recognition rate of<br />
77.5% on the BU-3DFE database has been obtained. Comparative evaluation on a common experimental setup shows<br />
that our solution obtains state-of-the-art results.<br />
16:20-16:40, Paper ThCT6.3<br />
Local 3D Shape Analysis for Facial Expression Recognition<br />
Maalej, Ahmed, LIFL UMR 8022<br />
Ben Amor, Boulbaba, LIFL UMR 8022<br />
Daoudi, Mohamed, TELECOM Lille1<br />
Srivastava, Anuj, Florida State Univ.<br />
Berretti, Stefano, Univ. of Firenze<br />
We investigate the problem of facial expression recognition using 3D face data. Our approach is based on local shape<br />
analysis of several relevant regions of a given face scan. These regions or patches from facial surfaces are extracted and<br />
represented by sets of closed curves. A Riemannian framework is used to derive the shape analysis of the extracted patches.<br />
The framework permits the calculation of similarity (or dissimilarity) distances between patches and of the<br />
optimal deformation between them. Once calculated, these measures are employed as inputs to commonly used classification<br />
techniques such as AdaBoost and Support Vector Machines (SVM). A quantitative evaluation of our novel approach<br />
is conducted on a subset of the publicly available BU-3DFE database.<br />
16:40-17:00, Paper ThCT6.4 CANCELED<br />
Incorporating Action Unit Co-Movement in Classification of Dynamic Facial Expressions using Lasso<br />
Rastad, Mahdi, Univ. of Illinois<br />
Zhu, Lusha, Univ. of Illinois<br />
Koenker, Roger, Univ. of Illinois<br />
Spencer-Smith, Jesse, Univ. of Illinois<br />
Hsu, Ming, Univ. of California, Berkeley<br />
Current work on facial expression analysis is often restricted to static facial images and a small set of expressions.<br />
In this research we generate a novel dataset of facial action unit dynamics over several experiment sessions by means<br />
of an avatar controlled by participants using a joystick. Previous studies have shown that this generates highly realistic<br />
facial expressions, comparable to popular displays of facial expressions used in computer vision experiments. Here we<br />
extend this work by using functional data analysis (FDA) to classify facial movement functions into basic emotion categories.<br />
Several single and hybrid classification algorithms are tested. By incorporating action unit co-movement in a Lasso<br />
shrinkage method, we achieved a recognition rate of 89%, substantially outperforming competitor approaches. Application<br />
to real expressions, and introduction of intensity and other temporal features of expressions are discussed as examples of<br />
extensions of our method.<br />
17:00-17:20, Paper ThCT6.5<br />
Multi-Modal Emotion Recognition using Canonical Correlations and Acoustic Features<br />
Gajsek, Rok, Univ. of Ljubljana<br />
Struc, Vitomir, Univ. of Ljubljana<br />
Mihelic, France, Univ. of Ljubljana<br />
Information about the psycho-physical state of the subject is becoming a valuable addition to modern audio and video<br />
recognition systems. As well as enabling a better user experience, it can also improve the recognition accuracy of the<br />
base system. In this article, we present our approach to a multi-modal (audio-video) emotion recognition system. For the audio<br />
sub-system, a feature set comprising prosodic, spectral and cepstral features is selected, and a support vector classifier is<br />
used to produce the scores for each emotional category. For the video sub-system a novel approach is presented which does<br />
not rely on the tracking of specific facial landmarks and thus eliminates the problems caused when the tracking algorithm<br />
fails to detect the correct area. The system is evaluated on the eNTERFACE database, and the recognition accuracy<br />
of our audio-video fusion is compared to published results in the literature.<br />
ThCT7 Dolmabahçe Hall C<br />
Multimedia and Document Analysis Applications Regular Session<br />
Session chair: Duygulu Sahin, Pinar (Bilkent Univ.)<br />
15:40-16:00, Paper ThCT7.1<br />
Automatic Music Genre Classification using Bass Lines<br />
Simsekli, Umut, Bogazici Univ.<br />
A bass line is an instrumental melody that encapsulates rhythmic, melodic, and harmonic features and arguably contains<br />
sufficient information for accurate genre classification. In this paper a bass line based automatic music genre classification<br />
system is described. “Melodic Interval Histograms” are used as features, and k-nearest neighbor classifiers are<br />
utilized and compared with SVMs on a small standard MIDI database. Apart from standard distance metrics for k-nearest<br />
neighbor classification (Euclidean, symmetric Kullback-Leibler, earth mover’s, normalized compression distances), we propose<br />
a novel distance metric, the perceptually weighted Euclidean distance (PWED). The maximum classification accuracy (84%)<br />
is obtained with k-nearest neighbor classifiers, and the added utility of the novel metric is illustrated in our experiments.<br />
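A k-nearest neighbor classifier with a weighted Euclidean distance over interval histograms can be sketched as follows. The weight vector and the toy "genre" histograms are hypothetical; the paper's actual perceptual weighting scheme is not reproduced here:

```python
import numpy as np

def pwed(h1, h2, w):
    """Weighted Euclidean distance between two interval histograms.
    The per-bin weights stand in for the paper's perceptual weighting."""
    return float(np.sqrt(np.sum(w * (h1 - h2) ** 2)))

def knn_classify(query, train_X, train_y, w, k=3):
    # Vote among the k training histograms closest under the weighted metric.
    d = np.array([pwed(query, x, w) for x in train_X])
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return int(labels[counts.argmax()])

# Toy "melodic interval histograms" for two genres (entirely made up).
rng = np.random.default_rng(0)
proto = {0: np.array([.4, .2, .1, .1, .1, .1]),   # genre 0 favours small intervals
         1: np.array([.1, .1, .1, .1, .2, .4])}   # genre 1 favours large intervals
X = np.vstack([proto[c] + rng.normal(scale=0.02, size=6) for c in [0] * 10 + [1] * 10])
y = [0] * 10 + [1] * 10
w = np.linspace(1.0, 0.5, 6)  # hypothetical perceptual weights per interval bin

pred = knn_classify(proto[1] + rng.normal(scale=0.02, size=6), X, y, w, k=5)
```

Swapping `pwed` for a symmetric Kullback-Leibler or compression-based distance changes only the distance function, not the classifier.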
16:00-16:20, Paper ThCT7.2<br />
Exploiting Combined Multi-Level Model for Document Sentiment Analysis<br />
Li, Si, Beijing Univ. of Posts and Telecommunications<br />
Zhang, Hao, Beijing Univ. of Posts and Telecommunications<br />
Xu, Weiran, Beijing Univ. of Posts and Telecommunications<br />
Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />
This paper focuses on the task of text sentiment analysis in hybrid online articles and web pages. Traditional approaches<br />
to text sentiment analysis typically work at a particular level, such as the phrase, sentence or document level, which might<br />
not be suitable for documents with too few or too many words. Considering that analysis at each level has its own advantages,<br />
we expect that a combination model may achieve better performance. In this paper, a novel combined model based on<br />
phrase-level and sentence-level analyses and a discussion of how the different levels complement each other are presented.<br />
For the phrase-level sentiment analysis, a newly defined Left-Middle-Right template and Conditional Random Fields<br />
are used to extract the sentiment words. The Maximum Entropy model is used for the sentence-level sentiment analysis.<br />
The experimental results verify that the combination model with a specific combination of features is better than any<br />
single-level model.<br />
16:20-16:40, Paper ThCT7.3<br />
MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases<br />
Akune, Fernando, Univ. of Campinas<br />
Valle, Eduardo, Univ. of Campinas<br />
Torres, Ricardo, Univ. of Campinas<br />
We propose MONORAIL, an indexing scheme for very large multimedia descriptor databases. Our index is based on the<br />
Hilbert curve, which is able to map the high-dimensional space of those descriptors to a single dimension. Instead of using<br />
several curves to mitigate boundary effects, we use a single curve with several surrogate points for each descriptor. Thus,<br />
we are able to reduce the random accesses to the bare minimum. In a rigorous empirical comparison with another method<br />
based on multiple surrogates, ours shows a significant improvement, due to our careful choice of the surrogate points.<br />
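The idea of mapping high-dimensional descriptors onto one dimension with a space-filling curve, sorting by curve key, and answering queries with a short sequential scan can be sketched as follows. MONORAIL uses the Hilbert curve; the simpler Z-order (Morton) curve is used here as a stand-in, since both map nearby points to nearby keys, and the surrogate-point mechanism is omitted:

```python
import numpy as np

def morton_key(coords, bits=8):
    """Interleave the bits of quantized coordinates into a single integer
    (Z-order key); a Hilbert key would preserve locality somewhat better."""
    key = 0
    for b in range(bits):                 # bit position, least significant first
        for d, c in enumerate(coords):
            key |= ((int(c) >> b) & 1) << (b * len(coords) + d)
    return key

def build_index(descriptors, bits=8):
    """Quantize descriptors (assumed in [0, 1]) and sort them by curve key,
    so curve neighbours are adjacent on disk."""
    q = np.clip((descriptors * (2 ** bits - 1)).astype(int), 0, 2 ** bits - 1)
    keys = np.array([morton_key(row, bits) for row in q])
    order = np.argsort(keys)
    return keys[order], order

def query(keys, order, descriptors, probe, k=2, bits=8):
    """Scan a small window around the probe's curve position (sequential reads)."""
    qp = np.clip((probe * (2 ** bits - 1)).astype(int), 0, 2 ** bits - 1)
    pos = int(np.searchsorted(keys, morton_key(qp, bits)))
    lo, hi = max(0, pos - k), min(len(order), pos + k)
    cand = order[lo:hi]
    dist = np.linalg.norm(descriptors[cand] - probe, axis=1)
    return cand[np.argsort(dist)][:k]

descriptors = np.array([[0.10, 0.10, 0.10, 0.10],
                        [0.12, 0.10, 0.11, 0.10],
                        [0.90, 0.90, 0.90, 0.90],
                        [0.88, 0.90, 0.90, 0.91]])
keys, order = build_index(descriptors)
hits = query(keys, order, descriptors, np.array([0.11, 0.10, 0.10, 0.10]))
```

The boundary effects the abstract mentions arise exactly when near neighbours land far apart on the curve; the paper's surrogate points widen the scanned region to cover such cases.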
16:40-17:00, Paper ThCT7.4<br />
Localized Supervised Metric Learning on Temporal Physiological Data<br />
Sun, Jimeng, IBM T. J. Watson Res. Center<br />
Sow, Daby, IBM T.J. Watson Res. Center<br />
Hu, Jianying, IBM<br />
Ebadollahi, Shahram, IBM T.J. Watson Res. Center<br />
Effective patient similarity assessment is important for clinical decision support. It enables the capture of past experience<br />
as manifested in the collective longitudinal medical records of patients to help clinicians assess the likely outcomes resulting<br />
from their decisions and actions. However, it is challenging to devise a patient similarity metric that is clinically relevant<br />
and semantically sound. Patient similarity is highly context sensitive: it depends on factors such as the disease, the particular<br />
stage of the disease, and co-morbidities. One way to discern the semantics in a particular context is to take advantage of<br />
physicians’ expert knowledge as reflected in labels assigned to some patients. In this paper we present a method that leverages<br />
localized supervised metric learning to effectively incorporate such expert knowledge to arrive at semantically sound<br />
patient similarity measures. Experiments using data obtained from the MIMIC II database demonstrate the effectiveness<br />
of this approach.<br />
17:00-17:20, Paper ThCT7.5<br />
Automatic Detection of Phishing Target from Phishing Webpage<br />
Liu, Gang, City Univ. of Hong Kong<br />
Qiu, Bite, City Univ. of Hong Kong<br />
Liu, Wenyin, City Univ. of Hong Kong<br />
An approach to identifying the phishing target of a given (suspicious) webpage is proposed by clustering the webpage<br />
set consisting of all its associated webpages and the given webpage itself. We first find its associated webpages, and then<br />
explore their relationships to the given webpage as features for clustering. Such relationships include link relationship,<br />
ranking relationship, text similarity, and webpage layout similarity. A DBSCAN clustering method is employed<br />
to find whether there is a cluster around the given webpage. If such a cluster exists, we claim the given webpage is a phishing<br />
webpage and then find its phishing target (i.e., the legitimate webpage it is attacking) from this cluster. Otherwise, we<br />
identify it as a legitimate webpage. Our test dataset consists of 8745 phishing pages (targeting 76 well-known websites)<br />
selected from PhishTank, and preliminary experiments show that the approach can successfully identify 91.44% of their<br />
phishing targets. Another dataset of 1000 legitimate webpages is collected to test our method’s false alarm rate, which is<br />
3.40%.<br />
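The clustering step can be illustrated with a minimal DBSCAN on toy 2-D feature vectors. In the paper the features would be the link, ranking, text and layout similarities listed above; the vectors, `eps` and `min_pts` values here are arbitrary:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue
        labels[i] = cluster                  # new core point: start a cluster
        seeds = list(neighbors[i])
        while seeds:                         # expand the cluster transitively
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    seeds.extend(neighbors[j])
        cluster += 1
    return labels

# Toy vectors: a dense group around the suspicious page, a second group,
# and one isolated page. A cluster forming around the given page would
# suggest it is a phishing page with a target inside that cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [10.0, 0.0]])
labels = dbscan(X, eps=0.5, min_pts=2)
```

Points in the two dense groups receive cluster labels, while the isolated page is marked as noise, mirroring the "no cluster → legitimate" decision rule.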
ThBCT8 Upper Foyer<br />
Pattern Recognition Systems and Applications - III Poster Session<br />
Session chair: Radeva, Petia (CVC)<br />
13:30-16:30, Paper ThBCT8.1<br />
Underwater Mine Classification with Imperfect Labels<br />
Williams, David, NATO Undersea Res. Centre<br />
A new algorithm for performing classification with imperfectly labeled data is presented. The proposed approach is motivated<br />
by the insight that the average prediction of a group of sufficiently informed people is often more accurate than the<br />
prediction of any one supposed expert. This idea that the “wisdom of crowds” can outperform a single expert is implemented<br />
by drawing sets of labels as samples from a Bernoulli distribution with a specified labeling error rate. Additionally,<br />
ideas from multiple imputation are exploited to provide a principled way for determining an appropriate number of label<br />
sampling rounds to consider. The approach is demonstrated in the context of an underwater mine classification application<br />
on real synthetic aperture sonar data collected at sea, with promising results.<br />
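The label-sampling idea can be sketched directly: flip each given label with the assumed error rate (a Bernoulli draw) once per round, then aggregate across rounds. In the paper a classifier is trained per sampled label set; here only the sampling and the "wisdom of crowds" averaging are shown, with invented labels:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_label_sets(labels, error_rate, n_rounds):
    """Draw noisy binary label sets: each label is flipped independently
    with probability error_rate, once per sampling round."""
    labels = np.asarray(labels)
    flips = rng.random((n_rounds, len(labels))) < error_rate
    return np.where(flips, 1 - labels, labels)

def crowd_vote(label_sets):
    # Average the sampled label sets and threshold: the majority view.
    return (label_sets.mean(axis=0) >= 0.5).astype(int)

true = np.array([0, 1, 1, 0, 1, 0, 1, 1])        # hypothetical mine/not-mine labels
sets = sample_label_sets(true, error_rate=0.2, n_rounds=101)
consensus = crowd_vote(sets)
```

With enough rounds the per-label majority recovers the underlying labels despite the 20% flip rate, which is the intuition behind averaging over imperfect labelers; the multiple-imputation machinery the paper uses to choose the number of rounds is not reproduced.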
13:30-16:30, Paper ThBCT8.2<br />
Optimizing Optimum-Path Forest Classification for Huge Datasets<br />
Papa, Joao Paulo, Sao Paulo State Univ.<br />
Cappabianco, Fabio, Univ. of Campinas<br />
Falcao, Alexandre Xavier, State Univ. of Campinas<br />
Traditional pattern recognition techniques cannot handle the classification of large datasets with both efficiency and effectiveness.<br />
In this context, the Optimum-Path Forest (OPF) classifier was recently introduced, trying to achieve high<br />
recognition rates and low computational cost. Although OPF was much faster than Support Vector Machines for training,<br />
it was slightly slower for classification. In this paper, we present the Efficient OPF (EOPF), which is an enhanced and<br />
faster version of the traditional OPF, and validate it for the automatic recognition of white matter and gray matter in magnetic<br />
resonance images of the human brain.<br />
13:30-16:30, Paper ThBCT8.3<br />
Model-Based Detection of Acoustically Dense Objects in Ultrasound<br />
Banerjee, Jyotirmoy, General Electric<br />
Krishnan, Kajoli B., General Electric<br />
Traditional detection methods tend to underperform in the presence of the strong and variable background clutter that<br />
characterizes a medical ultrasound image. In this paper, we present a novel diffusion-based technique to localize acoustically<br />
dense objects in an ultrasound image. The approach is premised on the observation that the topology of noise in ultrasound<br />
images is more sensitive to diffusion than that of any such physical object. We show that our method when applied to the<br />
problem of fetal head detection and automatic measurement of head circumference in 59 obstetric scans compares remarkably<br />
well with manually assisted measurements. Based on fetal age estimates and their bounds specified in Standard<br />
OB Tables [6], the Gestational Age predictions from automated measurements are found to be within 2SD in 95% and 98%<br />
of cases when compared with manual measurements by two experts. The framework is general and can be extended to<br />
object localization in diverse applications of ultrasound imaging.<br />
13:30-16:30, Paper ThBCT8.4<br />
SubXPCA versus PCA: A Theoretical Investigation<br />
Negi, Atul, Univ. of Hyderabad<br />
Kadappagari, Vijaya Kumar, Vasavi Coll. of Engineering<br />
Principal Component Analysis (PCA) is a widely accepted dimensionality reduction technique that is optimal in an MSE<br />
sense. PCA extracts ‘global’ variations and is insensitive to ‘local’ variations in sub-patterns. Recently, we proposed<br />
a novel approach, SubXPCA, which is computationally more effective than PCA and also effective in computing principal<br />
components with both global and local information across sub-patterns. In this paper, we show the near-optimality<br />
of SubXPCA (in terms of summarization of variance) by proving analytically that SubXPCA approaches PCA as the number<br />
of local principal components of the sub-patterns increases. This is demonstrated empirically on the CMU Face Data.<br />
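The two-stage structure described above can be sketched as follows: split the feature vector into sub-patterns, run PCA within each, then run PCA across the concatenated local projections. This follows only the outline given in the abstract; the data, split and component counts are arbitrary:

```python
import numpy as np

def pca(X, k):
    """Project centred data onto its top-k principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def subxpca(X, n_sub, k_local, k_global):
    """Sketch of SubXPCA: local PCA per sub-pattern, then a global PCA
    across the concatenated local principal components."""
    parts = np.array_split(X, n_sub, axis=1)        # split features into sub-patterns
    local = np.hstack([pca(p, k_local) for p in parts])
    return pca(local, k_global)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 32))                      # hypothetical pattern matrix
Z = subxpca(X, n_sub=4, k_local=4, k_global=6)
```

Because each stage is an orthogonal projection, the variance summarized by `Z` can never exceed the total variance of `X`; the paper's result is that it approaches the PCA optimum as `k_local` grows.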
13:30-16:30, Paper ThBCT8.5<br />
Feature Extraction Based on Class Mean Embedding (CME)<br />
Wan, Minghua, Nanjing Univ. of Science and Tech.<br />
Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />
Jin, Zhong, Nanjing Univ. of Science and Tech.<br />
Recently, local discriminant embedding (LDE) was proposed for manifold learning and pattern classification. In the LDE<br />
framework, the neighborhood and class relations of data points are used to construct a graph embedding for classification problems.<br />
In the mapping from the high-dimensional space to a low-dimensional subspace, data points of the same class maintain their intrinsic neighbor<br />
relations, whereas neighboring data points of different classes no longer stick to one another. However, neighboring data points<br />
of different classes are not deemphasized efficiently by LDE, which may degrade classification performance. In this<br />
paper, we investigate an extension, called class mean embedding (CME), which uses the class means of the data points to enhance the<br />
discriminant power of the mapping into a low-dimensional space. Experimental results on the ORL and FERET face databases<br />
show the effectiveness of the proposed method.<br />
13:30-16:30, Paper ThBCT8.6<br />
Forest Species Recognition using Color-Based Features<br />
Paula, Pedro Luiz, UFPR<br />
Oliveira, Luiz, Federal Univ. of Parana<br />
Britto, Alceu, Pontificia Univ. Católica do Paraná<br />
Sabourin, R., École de Tech. supérieure<br />
In this work we address the problem of forest species recognition which is a very challenging task and has several potential<br />
applications in the wood industry. The first contribution of this work is a database composed of 22 different species of the<br />
Brazilian flora that has been carefully labeled by experts in wood anatomy. In addition, we demonstrate through<br />
a series of comprehensive experiments that color-based features are quite useful to increase the discrimination power for<br />
this kind of application. Last but not least, we propose a segmentation approach so that the wood can be processed locally<br />
to mitigate the intra-class variability featured in some classes. Such an approach also contributes significantly to improving<br />
the final classification performance.<br />
13:30-16:30, Paper ThBCT8.7<br />
An Information Theoretic Linear Discriminant Analysis Method<br />
Zhang, Haihong, Inst. for Infocomm Res.<br />
Guan, Cuntai, Inst. for Infocomm Res.<br />
We propose a novel linear discriminant analysis method and demonstrate its superiority over existing linear methods.<br />
Based on information theory, we introduce a non-parametric estimate of mutual information with variable kernel bandwidth.<br />
Furthermore, we derive a gradient-based optimization algorithm for learning the optimal linear reduction vectors which<br />
maximizes the mutual information estimate. We evaluate the proposed method by running cross-validation on 2 data sets<br />
from the UCI repository, together with linear and nonlinear SVMs as classifiers. The results attest to the superiority of the<br />
method over conventional LDA and its variant, aPAC.<br />
13:30-16:30, Paper ThBCT8.8<br />
Framewise Phone Classification using Weighted Fuzzy Classification Rules<br />
Dehzangi, Omid, Nanyang Tech. Univ.<br />
Ma, Bin, Inst. for Infocomm Res.<br />
Chng, Eng Siong, Nanyang Tech. Univ.<br />
Li, Haizhou, Inst. for Infocomm Res.<br />
Our aim in this paper is to propose a rule-weight learning algorithm in fuzzy rule-based classifiers. The proposed algorithm<br />
is presented in two modes: first, all training examples are assumed to be equally important and the algorithm attempts to<br />
minimize the error-rate of the classifier on the training data by adjusting the weight of each fuzzy rule in the rule-base,<br />
and second, a weight is assigned to each training example as the cost of misclassifying it, using the class distribution<br />
of its neighbors. Then, instead of minimizing the error-rate, the learning algorithm is modified to minimize the sum of<br />
costs for misclassified examples. Using six data sets from the UCI-ML repository and the TIMIT speech corpus for framewise<br />
phone classification, we show that our proposed algorithm considerably improves the prediction ability of the classifier.<br />
13:30-16:30, Paper ThBCT8.9<br />
Statistical Fourier Descriptors for Defect Image Classification<br />
Timm, Fabian, Univ. of Lübeck<br />
Martinetz, Thomas, Univ. of Lübeck<br />
In many industrial applications, Fourier descriptors are commonly used when the description of the object shape is an<br />
important characteristic of the image. However, these descriptors are limited to single objects. We propose a general<br />
Fourier-based approach, called statistical Fourier descriptor (SFD), which computes shape statistics in grey level images. The SFD<br />
is computationally efficient and can be used for defect image classification. In a first example, we deployed the SFD to<br />
the inspection of welding seams with promising results.<br />
13:30-16:30, Paper ThBCT8.10<br />
A Measure of Competence based on Randomized Reference Classifier for Dynamic Ensemble Selection<br />
Woloszynski, Tomasz, Wroclaw Univ. of Tech.<br />
Kurzynski, Marek, Wroclaw Univ. of Tech.<br />
This paper presents a measure of competence based on a randomized reference classifier (RRC) for classifier ensembles.<br />
The RRC can be used to model, in terms of class supports, any classifier in the ensemble. The competence of a modelled<br />
classifier is calculated as the probability of correct classification of the respective RRC. A multiple classifier system (MCS)<br />
was developed and its performance was compared against five MCSs using eight databases taken from the UCI Machine<br />
Learning Repository. The system developed achieved the highest overall classification accuracies for both homogeneous<br />
and heterogeneous ensembles.<br />
13:30-16:30, Paper ThBCT8.12<br />
Information Theory based WCE Video Summarization<br />
Granata, Eliana, Univ. of Catania<br />
Gallo, Giovanni, Univ. of Catania<br />
Torrisi, Alessandro, Univ. of Catania<br />
Wireless Capsule Endoscopy (WCE) is a technical breakthrough that allows a video of the entire intestine to be produced<br />
without surgery. It is reported that a medical clinician spends one to two hours assessing a WCE video. It is hence useful<br />
to help the physician in the diagnostic analysis using computerized methods. In this paper an algorithmic information-theoretic<br />
method is presented for the automatic summarization of meaningful changes in video sequences extracted from<br />
WCE videos. To segment a WCE video into anatomic parts (esophagus, stomach, small intestine, colon) we use a textons-based<br />
method. The local texton histogram sequence is used for image representation and the Normalized Compression<br />
Distance (NCD) measure is used to compute the similarity between images.<br />
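The NCD itself is simple to compute with any off-the-shelf compressor. The byte strings below are invented stand-ins for serialized texton-histogram sequences:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar strings,
    near 1 for unrelated ones."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical serialized texton-histogram sequences for three frames.
frame_a = b"texton:3,1,4,1,5,9,2,6;" * 40
frame_b = b"texton:3,1,4,1,5,9,2,6;" * 38 + b"texton:2,7,1,8,2,8,1,8;" * 2
frame_c = bytes(range(256)) * 4          # unrelated content

similar = ncd(frame_a, frame_b)
different = ncd(frame_a, frame_c)
```

Because `frame_b` shares most of its content with `frame_a`, compressing them together costs little extra, so their NCD is much smaller than the distance to the unrelated `frame_c`; thresholding such distances is what flags "meaningful changes" between frames.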
13:30-16:30, Paper ThBCT8.13<br />
An LDA-Based Relative Hysteresis Classifier with Application to Segmentation of Retinal Vessels<br />
Condurache, Alexandru Paul, Univ. of Luebeck<br />
Müller, Florian, Univ. of Luebeck<br />
Mertins, Alfred, Univ. of Luebeck<br />
In a pattern classification setup, image segmentation is achieved by assigning each pixel to one of two classes: object or<br />
background. The special case of vessel segmentation is characterized by a strong disproportion between the number of<br />
representatives of each class (i.e. class skew) and also by a strong overlap between classes. These difficulties can be solved<br />
using problem-specific knowledge. The proposed hysteresis classification makes use of such knowledge in an efficient<br />
way. We describe a novel, supervised, hysteresis-based classification method that we apply to the segmentation of retinal<br />
photographs. This procedure is fast and achieves results that are comparable or even superior to those of other hysteresis methods<br />
and, for the problem of retinal vessel segmentation, to known dedicated methods on similar data sets.<br />
13:30-16:30, Paper ThBCT8.14<br />
An Offline Map Matching via Integer Programming<br />
Yanagisawa, Hiroki, IBM<br />
The map matching problem is, given a spatial road network and a sequence of locations of an object moving on the network,<br />
to identify the path in the network that the moving object passed through. In this paper, an integer programming formulation<br />
for the offline map matching problem is presented. This is the first approach that gives the optimal solution with respect<br />
to a widely used objective function for map matching.<br />
13:30-16:30, Paper ThBCT8.15<br />
Invisible Calibration Pattern based on Human Visual Perception Characteristics<br />
Takimoto, Hironori, Okayama Prefectural Univ.<br />
Yoshimori, Seiki, Nippon Bunri Univ.<br />
Mitsukura, Yasue, Tokyo Univ. of Agriculture and Tech.<br />
Fukumi, Minoru, The Univ. of Tokushima<br />
In print-type steganographic and watermarking systems, a calibration pattern is arranged around the content where invisible<br />
data is embedded; it provides feature points that establish the correspondence between the original image and the scanned image, for normalization<br />
of the scanned image. However, conventional methods clearly interfere with the page layout and artwork of the<br />
content. In addition, visible calibration patterns are not suitable for security services. In this paper, we propose an arrangement<br />
and detection method for an invisible calibration pattern based on characteristics of human visual perception. The<br />
calibration pattern is embedded into the blue channel of the original image by adding a high-frequency component.<br />
13:30-16:30, Paper ThBCT8.16<br />
Boosting Gray Codes for Red Eyes Removal<br />
Battiato, Sebastiano, Univ. of Catania<br />
Farinella, Giovanni Maria, Univ. of Catania<br />
Guarnera, Mirko, ST Microelectronics<br />
Messina, Giuseppe, ST Microelectronics<br />
Ravì, Daniele, ST Microelectronics<br />
With the wide diffusion of digital cameras and mobile devices with embedded cameras and flashguns, red-eye artifacts<br />
have de facto become a critical problem. The technique described herein makes use of three main steps to identify and remove<br />
red eyes. First, red-eye candidates are extracted from the input image using an image filtering pipeline. A set of<br />
classifiers is then learned on gray code features extracted in the clustered patch space, and hence employed to distinguish<br />
between eye and non-eye patches. Once red eyes are detected, the artifacts are removed through desaturation and brightness<br />
reduction. The proposed method has been tested on a large dataset of images, achieving effective results in terms of hit rate<br />
maximization, false positive reduction and quality measures.<br />
13:30-16:30, Paper ThBCT8.17<br />
A New Rotation Feature for Single Tri-Axial Accelerometer based 3D Spatial Handwritten Digit Recognition<br />
Xue, Yang, South China Univ. of Tech.<br />
Jin, Lianwen, South China Univ. of Tech.<br />
A new rotation feature extracted from tri-axial acceleration signals for 3D spatial handwritten digit recognition is proposed.<br />
The feature can effectively express the clockwise and anti-clockwise direction changes of the users’ movement while writing<br />
in 3D space. Based on the rotation feature, an algorithm for 3D spatial handwritten digit recognition is presented.<br />
First, the rotation feature of the handwritten digit is extracted and coded. Then, the normalized edit distance between the<br />
digit and the class model is computed. Finally, classification is performed using a Support Vector Machine (SVM). The proposed<br />
approach outperforms time-domain features by 22.12%, peak-valley features by 12.03%, and FFT features by 3.24%<br />
in accuracy. Experimental results show that the proposed approach is effective.<br />
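The normalized edit distance between a coded rotation sequence and a class model can be sketched with the classic dynamic-programming Levenshtein distance. The direction-change codes below are hypothetical, and the normalization by the longer length is one common convention (the abstract does not specify which is used):

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming (two rolling rows)."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitution
        prev = cur
    return prev[n]

def normalized_edit_distance(s, t):
    """Edit distance divided by the longer sequence length."""
    return edit_distance(s, t) / max(len(s), len(t), 1)

# Hypothetical clockwise (R) / anti-clockwise (L) direction-change codes.
digit_sample = "RRLRRLLR"
class_model  = "RRLRLLLR"
```

Here the sample differs from the model by a single substitution, giving a normalized distance of 1/8; such distances would then feed the SVM classifier.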
13:30-16:30, Paper ThBCT8.18<br />
Improved Mean Shift Algorithm with Heterogeneous Node Weights<br />
Yoon, Ji Won, Trinity Coll. Dublin<br />
Wilson, Simon, Trinity Coll. Dublin<br />
The conventional mean shift algorithm is known to be sensitive to the choice of bandwidth. We present a robust mean<br />
shift algorithm with heterogeneous node weights that come from the geometric structure of a given data set. Before running<br />
the MS procedure, we reconstruct un-normalized weights (a rough surface over the data points) from the Delaunay triangulation.<br />
The un-normalized weights help MS avoid the problem of misled mean shift vectors. As a result, we<br />
obtain a more robust clustering result compared to the conventional mean shift algorithm. We also propose an alternative<br />
way to assign weights for large and noisy datasets.<br />
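Mean shift with per-point weights amounts to a small change in the update rule: each kernel value is multiplied by the point's weight before averaging. The sketch below uses a Gaussian kernel and uniform weights on toy data; the paper's Delaunay-based weight construction is not reproduced, so the weights are simply given as input:

```python
import numpy as np

def weighted_mean_shift(X, weights, bandwidth, n_iter=50):
    """Mean shift where each data point carries its own weight: every mode
    moves to the weight- and kernel-weighted mean of the data."""
    modes = X.astype(float).copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            k = np.exp(-np.sum((X - m) ** 2, axis=1) / (2 * bandwidth ** 2))
            wk = k * weights                        # heterogeneous node weights
            modes[i] = (wk[:, None] * X).sum(axis=0) / wk.sum()
    return modes

X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]])
w = np.ones(len(X))                                 # uniform weights for the demo
modes = weighted_mean_shift(X, w, bandwidth=0.5)
```

Points started in the same group converge to a common mode while the two groups stay apart; down-weighting noisy points (as the Delaunay-derived weights would) damps their pull on the mean shift vectors.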
13:30-16:30, Paper ThBCT8.19<br />
Word Clustering using PLSA Enhanced with Long Distance Bigrams<br />
Bassiou, Nikoletta, Aristotle Univ. of Thessaloniki<br />
Kotropoulos, Constantine, Aristotle Univ. of Thessaloniki<br />
Probabilistic latent semantic analysis is enhanced with long distance bigram models in order to improve word clustering.<br />
The long distance bigram probabilities and the interpolated long distance bigram probabilities at varying distances within<br />
a context capture different aspects of contextual information. In addition, the baseline bigram, which incorporates trigger-pairs<br />
for various histories, is tested in the same framework. The experimental results collected on publicly available corpora<br />
(CISI, Cranfield, Medline, and NPL) demonstrate the superiority of the long distance bigrams over the baseline bigrams,<br />
as well as the superiority of the interpolated long distance bigrams over the long distance bigrams and the baseline<br />
bigram with trigger-pairs, in yielding more compact clusters containing fewer outliers.<br />
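A long distance bigram at distance d simply pairs each word with the word d positions ahead; d = 1 recovers the ordinary bigram model. A minimal counting sketch (the toy sentence is invented, and the PLSA machinery built on top of such counts is not shown):

```python
from collections import Counter

def long_distance_bigrams(tokens, distance):
    """Count pairs (w_i, w_{i+d}) for a fixed distance d over a token list."""
    return Counter(zip(tokens, tokens[distance:]))

text = "the cat sat on the mat the cat lay on the rug".split()
b1 = long_distance_bigrams(text, 1)   # ordinary bigrams
b3 = long_distance_bigrams(text, 3)   # distance-3 bigrams
```

Counts at different distances capture different contextual regularities, which is what the interpolated model combines.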
13:30-16:30, Paper ThBCT8.20<br />
Scene Classification using Local Co-Occurrence Feature in Subspace Obtained by KPCA of Local Blob Visual<br />
Words<br />
Hotta, Kazuhiro, Meijo Univ.<br />
In recent years, scene classification based on the local correlation of binarized projection lengths in a subspace obtained by<br />
Kernel Principal Component Analysis (KPCA) of visual words was proposed and its effectiveness was shown. However,<br />
the local correlation of two binary features is 1 only when both features are 1; in all other cases it is 0,<br />
which discards information. In this paper, all kinds of co-occurrence of two binary features are used. This is the first device<br />
of our method. The second device is local Blob visual words. The conventional method builds visual words from an orientation<br />
histogram on each grid cell, but this information is too local. We use the orientation histograms in a local Blob on a grid as the<br />
basic feature and develop local Blob visual words. The third device is norm normalization of each orientation histogram<br />
in a local Blob. By normalizing the local norm, the similarity between corresponding orientation histograms is reflected in<br />
the subspace obtained by KPCA. With these three devices, an accuracy of more than 84% is achieved, which is higher than that of conventional methods.<br />
13:30-16:30, Paper ThBCT8.21<br />
Recognition and Prediction of Situations in Urban Traffic Scenarios<br />
Käfer, Eugen, Daimler AG<br />
Hermes, Christoph, Bielefeld Univ.<br />
Wöhler, Christian, Dortmund University of Technology<br />
Kummert, Franz, Bielefeld Univ.<br />
Ritter, Helge, Bielefeld Univ.<br />
The recognition and prediction of intersection situations and an accompanying threat assessment are an indispensable skill<br />
of future driver assistance systems. This study focuses on the recognition of situations involving two vehicles at intersections.<br />
For each vehicle, a set of possible future motion trajectories is estimated and rated based on a motion database for<br />
a time interval of 2-4 s ahead. Possible situations involving two vehicles are generated by a pairwise combination of these<br />
individual motion trajectories. An interaction model based on the mutual visibility of the vehicles and the assumption that<br />
a driver will attempt to avoid a collision is used to rate possible situations. The correspondingly favoured situations are<br />
classified with a probabilistic framework. The proposed method is evaluated on a real-world differential GPS data set acquired<br />
during a test drive of about 10 km, including three road intersections. Our method is typically able to recognise the<br />
situation correctly about 1.5-3 s before the last vehicle has passed its minimum distance to the centre of the intersection.<br />
13:30-16:30, Paper ThBCT8.22<br />
Employing Decoding of Specific Error Correcting Codes as a New Classification Criterion in Multiclass Learning<br />
Problems<br />
Luo, Yurong, Virginia Commonwealth Univ.<br />
Najarian, Kayvan, Virginia Commonwealth Univ.<br />
The Error Correcting Output Codes (ECOC) method solves multiclass learning problems by combining the outputs of several<br />
binary classifiers according to an error correcting output code matrix. Traditionally, the minimum Hamming distance is<br />
adopted as the classification criterion to “vote” among multiple hypotheses, and the focus is given to the choice of the error<br />
correcting output code matrix. In this paper, we apply a decoding methodology to multiclass learning problems in which<br />
the class labels of testing samples are unknown. In other words, without comparing the predicted and actual class labels, it<br />
can be known whether testing samples are classified correctly. Based on this property, a new cascade classifier is introduced.<br />
The classifier can improve the accuracy and will not result in overfitting. The analytical results show the feasibility, accuracy,<br />
and advantages of the proposed method.<br />
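The minimum-Hamming-distance decoding that the paper takes as its traditional baseline can be sketched as follows (the code matrix here is an illustrative toy, not the one used by the authors):<br />

```python
def hamming(a, b):
    """Number of positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, outputs):
    """Classic ECOC decoding: pick the class whose codeword is nearest,
    in Hamming distance, to the dichotomizers' binary outputs."""
    distances = [hamming(row, outputs) for row in code_matrix]
    return min(range(len(code_matrix)), key=distances.__getitem__)

# 3 classes encoded by 4 binary classifiers (toy matrix)
M = [(0, 0, 1, 1),
     (0, 1, 0, 1),
     (1, 1, 1, 0)]
print(ecoc_decode(M, (1, 1, 1, 1)))  # -> 2, the row at Hamming distance 1
```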
13:30-16:30, Paper ThBCT8.23<br />
EEG-Based Emotion Recognition using Self-Organizing Map for Boundary Detection<br />
Khosrowabadi, Reza, Nanyang Tech. Univ. Singapore<br />
Ang, Kai Keng, Inst. for Infocomm Res. A*STAR<br />
Quek, Hiok Chai, Nanyang Tech. Univ.<br />
Bin Abdul Rahman, Abdul Wahab, International Islamic Univ. Malaysia<br />
This paper presents an EEG-based emotion recognition system using a self-organizing map (SOM) for boundary detection. Features<br />
from EEG signals are classified by considering the subjects' emotional responses using scores from the SAM questionnaire.<br />
The selection of appropriate threshold levels for arousal and valence is critical to the performance of the recognition system.<br />
Therefore, this paper investigates the performance of a proposed EEG-based emotion recognition system that employs a self-organizing<br />
map to identify the boundaries between separable regions. A study was performed to collect 8 channels of EEG<br />
data from 26 healthy right-handed subjects experiencing 4 emotional states while exposed to audio-visual emotional<br />
stimuli. EEG features were extracted using the magnitude squared coherence of the EEG signals. The boundaries of the EEG<br />
features were then extracted using the SOM, and 5-fold cross-validation was performed using a k-NN classifier. The results<br />
showed that the proposed method improved the accuracy to 84.5%.<br />
13:30-16:30, Paper ThBCT8.24<br />
Vocabulary-Based Approaches for Multiple-Instance Data: A Comparative Study<br />
Amores, Jaume, Univ. Autònoma de Barcelona<br />
Multiple Instance Learning (MIL) has become a hot topic, and many different algorithms have been proposed in recent<br />
years. Despite this fact, there is a lack of comparative studies that shed light on the characteristics of the different methods<br />
and their behavior in different scenarios. In this paper we provide such an analysis. We include methods from different families,<br />
and pay special attention to vocabulary-based approaches, a new family of methods that has not received much attention<br />
in the MIL literature. The empirical comparison includes seven databases from four heterogeneous domains, implementations<br />
of eight popular MIL methods, and a study of the behavior under synthetic conditions. Based on this analysis, we show that,<br />
with an appropriate implementation, vocabulary-based approaches outperform other MIL methods in most cases,<br />
showing in general a more consistent performance.<br />
13:30-16:30, Paper ThBCT8.25<br />
A Multiple Classifier System Approach for Facial Expressions in Image Sequences Utilizing GMM Supervectors<br />
Schels, Martin, Univ. of Ulm<br />
Schwenker, Friedhelm, Univ. of Ulm<br />
The Gaussian mixture model (GMM) supervector approach is a well-known technique in the domain of speech processing,<br />
e.g. speaker verification and audio segmentation. In this paper we apply this approach to video data in order to recognize<br />
human facial expressions. Three different image feature types (optical flow histograms, orientation histograms and principal<br />
components) from four pre-selected regions of the human face image were extracted, and GMM supervectors of the feature<br />
channels per sequence were constructed. Support vector machines (SVMs) were trained using these supervectors for every<br />
channel separately, and their results were combined using classifier fusion techniques. Thus, the performance of the classifier<br />
could be improved compared to the best individual classifier.<br />
13:30-16:30, Paper ThBCT8.26<br />
Incremental Learning of Visual Landmarks for Mobile Robotics<br />
Bandera, Antonio, Univ. of Malaga<br />
Vázquez-Martín, Ricardo, Centro Andaluz de Innovación y Tecnologías de la Información y las Comunicaciones CITIC<br />
Marfil, Rebeca, Univ. of Malaga<br />
This paper proposes an incremental scheme for visual landmark learning and recognition. The feature selection stage characterises<br />
the landmark using Opponent SIFT, a color-based variant of the SIFT descriptor. To reduce the dimensionality<br />
of this descriptor, an incremental non-parametric discriminant analysis is conducted to seek directions for efficient discrimination<br />
(incremental eigenspace learning). The classification stage, in turn, uses the incremental evolving clustering<br />
method (ECM) to group feature vectors into a set of clusters (incremental prototype learning). The final classification<br />
is then conducted using the k-nearest neighbor approach, whose prototypes are updated by the ECM. This global scheme<br />
enables a classifier to learn incrementally, on-line, and in one pass. Moreover, the ECM reduces memory and computation<br />
expenses. Experimental results show that the proposed recognition system is well suited for use by an autonomous<br />
mobile robot.<br />
13:30-16:30, Paper ThBCT8.27<br />
Subspace Methods with Globally/Locally Weighted Correlation Matrix<br />
Yamashita, Yukihiko, Tokyo Inst. of Tech.<br />
Wakahara, Toru, Hosei Univ.<br />
The discriminant function of a subspace method is provided by using correlation matrices that reflect the averaged feature<br />
of a category. As a result, it will not work well on unknown input patterns that are far from the average. To address this problem,<br />
we propose two kinds of weighted correlation matrices for subspace methods. The globally weighted correlation matrix<br />
(GWCM) attaches importance to training patterns that are far from the average. Then, it can reflect the distribution of patterns<br />
around the category boundary more precisely. The computational cost of a subspace method using GWCMs is almost the<br />
same as that using ordinary correlation matrices. The locally weighted correlation matrix (LWCM) attaches importance to<br />
training patterns that are near to the input pattern to be classified. Then, it can reflect the distribution of training patterns<br />
around the input pattern in more detail. The computational cost of a subspace method with LWCM at the recognition stage<br />
does not depend on the number of training patterns, while those of the conventional adaptive local and the nonlinear subspace<br />
methods do. We show the advantages of the proposed methods by experiments made on the MNIST database of handwritten<br />
digits.<br />
13:30-16:30, Paper ThBCT8.28<br />
The Binormal Assumption on Precision-Recall Curves<br />
Brodersen, Kay Henning, ETH Zurich<br />
Ong, Cheng Soon, ETH Zurich<br />
Stephan, Klaas Enno, Univ. of Zurich<br />
Buhmann, Joachim M., Swiss Federal Inst. of Tech. Zurich<br />
The precision-recall curve (PRC) has become a widespread conceptual basis for assessing classification performance. The curve<br />
relates the positive predictive value of a classifier to its true positive rate and often provides a useful alternative to the well-known<br />
receiver operating characteristic (ROC). The empirical PRC, however, turns out to be a highly imprecise estimate of the true curve,<br />
especially in the case of a small sample size and class imbalance in favour of negative examples. Ironically, this situation tends to<br />
occur precisely in those applications where the curve would be most useful, e.g., in anomaly detection or information retrieval. Here,<br />
we propose to estimate the PRC on the basis of a simple distributional assumption about the decision values that generalizes the established<br />
binormal model for estimating smooth ROC curves. Using simulations, we show that our approach outperforms empirical<br />
estimates, and that an account of the class imbalance is crucial for obtaining unbiased PRC estimates.<br />
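A minimal sketch of the binormal idea, assuming equal-variance Gaussian decision values in each class (a simplification of the general model): precision and recall at a threshold follow directly from the two normal CDFs and the class prevalence, which makes the effect of class imbalance explicit.<br />

```python
from math import erf, sqrt

def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binormal_prc(mu_pos, mu_neg, sigma, prevalence, threshold):
    """Precision and recall at one threshold under a binormal model
    with equal class variances (an illustrative simplification)."""
    recall = 1.0 - _phi((threshold - mu_pos) / sigma)  # P(score > t | positive)
    fpr = 1.0 - _phi((threshold - mu_neg) / sigma)     # P(score > t | negative)
    tp = prevalence * recall
    fp = (1.0 - prevalence) * fpr
    precision = tp / (tp + fp)
    return precision, recall

# strong class imbalance: only 10% positives
p, r = binormal_prc(mu_pos=1.0, mu_neg=0.0, sigma=1.0, prevalence=0.1, threshold=0.5)
print(round(p, 3), round(r, 3))  # precision stays low despite a decent recall
```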
13:30-16:30, Paper ThBCT8.29<br />
Incremental Training of Multiclass Support Vector Machines<br />
Nikitidis, Symeon, Centre for Res. and Tech. Hellas<br />
Nikolaidis, Nikos, Aristotle Univ. of Thessaloniki<br />
Pitas, Ioannis, -<br />
We present a new method for the incremental training of multiclass Support Vector Machines that provides computational efficiency<br />
for training problems in which the training data collection is sequentially enriched and dynamic adaptation of the classifier<br />
is required. An auxiliary function with the desired characteristics is designed to provide an upper bound on the objective<br />
function that summarizes the multiclass classification task, and the global minimizer for the enriched dataset is<br />
found using a warm-start algorithm, since faster convergence is expected when starting from the previous global minimum. Experimental<br />
evidence on two data collections verified that our method is faster than retraining the classifier from scratch, while the<br />
achieved classification accuracy is maintained at the same level.<br />
13:30-16:30, Paper ThBCT8.30<br />
User Adaptive Clustering of a Large Image Database<br />
Saboorian, Mohammad Mehdi, Sharif Univ. of Tech.<br />
Jamzad, Mansour, Sharif Univ. of Tech.<br />
Rabiee, Hamid Reza, Sharif Univ. of Tech.<br />
Searching large image databases is a time-consuming process when done manually. Current CBIR methods mostly rely<br />
on training data in specific domains. When the source and domain of the images are unknown, unsupervised methods provide<br />
better solutions. In this work, we use a hierarchical clustering scheme to group images in an unknown and large image<br />
database. In addition, the user provides the current class assignment of a small number of images as feedback to<br />
the system. The proposed method uses this feedback to guess the number of required clusters, and optimizes the weight<br />
vector in an iterative manner. In each step, after modification of the weight vector, the images are reclustered. We compared<br />
our method with a similar approach (but without user feedback) named CLUE. Our experimental results show that by<br />
considering the user feedback, the accuracy of clustering is considerably improved.<br />
13:30-16:30, Paper ThBCT8.31<br />
Alignment-Based Similarity of People Trajectories using Semi-Directional Statistics<br />
Calderara, Simone, Univ. of Modena and Reggio Emilia<br />
Prati, Andrea, Univ. of Modena and Reggio Emilia<br />
Cucchiara, Rita, Univ. of Modena and Reggio Emilia<br />
This paper presents a method for comparing people trajectories for video surveillance applications, based on semi-directional<br />
statistics. Indeed, modelling a trajectory as a sequence of angles, speeds and time lags requires a<br />
statistical tool capable of jointly considering periodic and linear variables. Our statistical method is compared with two state-of-the-art<br />
methods.<br />
13:30-16:30, Paper ThBCT8.32<br />
Contact Lens Detection based on Weighted LBP<br />
Zhang, Hui, Shanghai Inst. of Tech.<br />
Sun, Zhenan, Chinese Acad. of Sciences<br />
Tan, Tieniu, Chinese Acad. of Sciences<br />
Spoof detection is a critical function for iris recognition because it reduces the risk of iris recognition systems being forged.<br />
Among the various counterfeit artifacts, cosmetic contact lenses are among the most common and the most difficult to detect. In this paper,<br />
we propose a novel fake iris detection algorithm based on improved LBP and statistical features. Firstly, a simplified<br />
SIFT descriptor is extracted at each pixel of the image. Secondly, the SIFT descriptor is used to rank the LBP encoding<br />
sequence. Then, statistical features are extracted from the weighted LBP map. Lastly, an SVM classifier is employed to<br />
classify genuine and counterfeit iris images. Extensive experiments are conducted on a database containing more than<br />
5000 fake iris images covering 70 kinds of contact lenses, captured by four iris devices. Experimental results show<br />
that the proposed method achieves state-of-the-art performance in contact lens spoof detection.<br />
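For reference, the basic (unweighted) LBP operator that such weighted variants build on can be sketched as follows; this is a toy illustration, not the authors' improved encoding:<br />

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch: threshold the 8 neighbours at the
    centre value and read them clockwise as a binary word."""
    c = patch[1][1]
    # neighbours read clockwise starting from the top-left corner
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(neighbours):
        if v >= c:           # neighbour at least as bright as the centre
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241
```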
13:30-16:30, Paper ThBCT8.33<br />
Integrating ILSR to Bag-of-Visual Words Model based on Sparse Codes of SIFT Features Representations<br />
Wu, Lina, Univ. Beijing<br />
Luo, Siwei, Univ. Beijing<br />
Sun, Wei, Beijing Jiaotong Univ.<br />
Zheng, Xiang, Beijing Jiaotong Univ.<br />
In computer vision, the bag-of-visual-words (BOV) approach has been shown to yield state-of-the-art results. To improve<br />
the BOV model, we use sparse codes of SIFT features instead of the previously used vector quantization (VQ), such as k-means, since<br />
VQ incurs larger quantization errors. Moreover, as local features in most categories exhibit spatial dependence in the real world, we use<br />
the neighboring features of a local feature as its implicit local spatial relationship (ILSR). This paper proposes an object categorization<br />
algorithm which integrates the implicit local spatial relationship with appearance features based on sparse codes<br />
of SIFT, forming two sources of information for categorization. The algorithm is applied to the Caltech-101 and Caltech-256<br />
datasets to validate its effectiveness. The experimental results show its good performance.<br />
13:30-16:30, Paper ThBCT8.34<br />
Heteroscedastic Multilinear Discriminant Analysis for Face Recognition<br />
Safayani, Mehran, Sharif Univ. of Tech.<br />
Manzuri Shalmani, Mohammad Taghi, Sharif Univ. of Tech.<br />
There is growing attention to subspace learning using tensor-based approaches in high-dimensional spaces. In this paper<br />
we first show that these methods suffer from the heteroscedastic problem and then propose a new approach called Heteroscedastic<br />
Multilinear Discriminant Analysis (HMDA). Our method solves this problem by utilizing the pairwise<br />
Chernoff distance between every pair of clusters with the same index in different classes. We also show that our method is<br />
a general form of the Multilinear Discriminant Analysis (MDA) approach. Experimental results on the CMU-PIE, AR and AT&T<br />
face databases demonstrate that the proposed method always performs better than MDA in terms of classification accuracy.<br />
13:30-16:30, Paper ThBCT8.35<br />
Applying Error Correcting Output Coding to Enhance the Convolutional Neural Network for Target Detection and<br />
Pattern Recognition<br />
Deng, Huiqun, Concordia Univ.<br />
Stathopoulos, George, Concordia Univ.<br />
Suen, Ching Y.<br />
This paper views target detection and pattern recognition as a kind of communications problem and applies error-correcting<br />
coding to the outputs of a convolutional neural network to improve the accuracy and reliability of detection and recognition<br />
of targets. The outputs of the convolutional neural network are designed according to codewords with maximum Hamming<br />
distances. The effects of the codewords on the performance of the convolutional neural network in target detection and<br />
recognition are then investigated. Images of hand-written digits and printed English letters and symbols are used in the<br />
experiments. Results show that error-correcting output coding provides the neural network with more reliable decision<br />
rules and enables it to perform more accurate and reliable detection and recognition of targets. Moreover, our error-correcting<br />
output coding can reduce the number of neurons required, which is highly desirable in efficient implementations.<br />
13:30-16:30, Paper ThBCT8.36<br />
Action Recognition using Direction Models of Motion<br />
Benabbas, Yassine, LIFL<br />
Lablack, Adel, UMR USTL/CNRS 8022<br />
Ihaddadene, Nacim, UMR USTL/CNRS 8022<br />
Djeraba, Chabane, UMR USTL/CNRS 8022<br />
In this paper, we present an effective method for human action recognition using statistical models based on optical flow<br />
orientations. We compute a distribution mixture over motion orientations at each spatial location of the video sequence.<br />
The set of estimated distributions constitutes the direction model, which is used as a mid-level feature for the video sequence.<br />
We recognize human actions using a distance metric to compare the direction model of a query sequence with the<br />
direction models of training sequences. Experiments have been performed on standard datasets and have shown<br />
promising results.<br />
13:30-16:30, Paper ThBCT8.37<br />
Boolean Combination of Classifiers in the ROC Space<br />
Khreich, Wael, École de Tech. Supérieure<br />
Granger, Eric, École de Tech. Supérieure<br />
Miri, Ali, Univ. of Ottawa<br />
Sabourin, R., École de Tech. Supérieure<br />
Using Boolean AND and OR functions to combine the responses of multiple one- or two-class classifiers in the ROC<br />
space may significantly improve the performance of a detection system over that of the single best classifier. However, techniques<br />
found in the literature assume that the classifiers are conditionally independent and that their ROC curves are convex. These<br />
assumptions are not valid in most real-world applications, where classifiers are designed using limited and imbalanced<br />
training data. A new Iterative Boolean Combination (IBC) technique applies all Boolean functions to combine the ROC<br />
curves produced by multiple classifiers without prior assumptions, and its time complexity is linear in the<br />
number of classifiers. The results of computer simulations conducted on synthetic and real-world host-based intrusion<br />
detection data indicate that combining the responses from multiple HMMs with IBC can achieve a significantly higher level<br />
of performance than the AND and OR combinations, especially when training data is limited and imbalanced.<br />
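The basic effect of AND/OR fusion in the ROC space can be seen in a small sketch (toy responses, not the IBC algorithm itself): AND lowers the false positive rate, while OR raises the true positive rate.<br />

```python
def rates(decisions, labels):
    """True and false positive rates of a vector of binary decisions."""
    tp = sum(d and y for d, y in zip(decisions, labels))
    fp = sum(d and not y for d, y in zip(decisions, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

labels = [1, 1, 1, 0, 0, 0, 0, 0]
c1 =     [1, 1, 0, 1, 0, 0, 0, 0]   # responses of classifier 1
c2 =     [1, 0, 1, 0, 1, 0, 0, 0]   # responses of classifier 2

c_and = [a and b for a, b in zip(c1, c2)]
c_or  = [a or b for a, b in zip(c1, c2)]
print(rates(c1, labels), rates(c2, labels))
print(rates(c_and, labels))  # fewer false alarms than either classifier
print(rates(c_or, labels))   # higher detection rate than either classifier
```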
13:30-16:30, Paper ThBCT8.38<br />
Stereo-Based Multi-Person Tracking using Overlapping Silhouette Templates<br />
Satake, Junji, Toyohashi Univ. of Tech.<br />
Miura, Jun, Toyohashi Univ. of Tech.<br />
This paper describes a stereo-based person tracking method for a person following robot. Many previous works on person<br />
tracking use laser range finders which can provide very accurate range measurements. Stereo-based systems have also<br />
been popular, but most of them are not used for controlling a real robot. We previously developed a tracking method which<br />
uses depth templates of person shape applied to a dense depth image. The method, however, sometimes failed when complex<br />
occlusions occurred. In this paper, we propose an accurate, stable tracking method using overlapping silhouette templates<br />
which consider how persons overlap in the image. Experimental results show the effectiveness of the proposed<br />
method.<br />
13:30-16:30, Paper ThBCT8.40<br />
Characterising Facial Gender Difference using Fisher-Rao Metric<br />
Ceolin, Simone Regina, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
The aim of this paper is to explore whether the Fisher-Rao metric can be used to measure different facets of facial shape<br />
estimated from fields of surface normals using the von Mises-Fisher distribution. In particular we aim to characterise the<br />
shape changes due to differences in gender. We make use of the von Mises-Fisher distribution since we are dealing with<br />
surface normal data over the unit sphere S^2. Finally, we show the results achieved using the EAR and Max Planck datasets.<br />
13:30-16:30, Paper ThBCT8.41<br />
On-Line FMRI Data Classification using Linear and Ensemble Classifiers<br />
Plumpton, Catrin Oliver, Bangor Univ.<br />
Kuncheva, Ludmila I., Bangor Univ.<br />
Linden, David E. J., Bangor Univ.<br />
Johnston, Stephen Jaye, Bangor Univ.<br />
The advent of real-time fMRI pattern classification opens many avenues for interactive self-regulation where the brain’s<br />
response is better modelled by multivariate, rather than univariate techniques. Here we test three on-line linear classifiers,<br />
applied to a real fMRI dataset, collected as part of an experiment on the cortical response to emotional stimuli. We propose<br />
a random subspace ensemble as a fast and more accurate alternative to component classifiers. The on-line linear discriminant<br />
classifier (O-LDC) was found to be a better base classifier than the on-line versions of the perceptron and the balanced<br />
winnow.<br />
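The random subspace idea can be sketched generically as follows; for brevity this toy uses a nearest-centroid base learner as a stand-in for the authors' on-line LDC, so the data and parameter choices here are purely illustrative:<br />

```python
import random

def nearest_centroid_fit(X, y, features):
    """Per-class mean of the training rows, restricted to a feature subset."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return centroids

def nearest_centroid_predict(centroids, x, features):
    """Label of the closest centroid in the member's subspace."""
    sub = [x[f] for f in features]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, centroids[c])))

def random_subspace_ensemble(X, y, n_members=11, subspace=2, seed=0):
    """Train one base classifier per random feature subset."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        feats = rng.sample(range(len(X[0])), subspace)
        members.append((feats, nearest_centroid_fit(X, y, feats)))
    return members

def ensemble_predict(members, x):
    """Majority vote over the subspace members."""
    votes = [nearest_centroid_predict(c, x, f) for f, c in members]
    return max(set(votes), key=votes.count)

# toy 4-dimensional data with two well-separated classes
X = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
     [5, 5, 5, 5], [4, 5, 4, 5], [5, 4, 5, 4]]
y = [0, 0, 0, 1, 1, 1]
members = random_subspace_ensemble(X, y)
print(ensemble_predict(members, [5, 5, 4, 5]))  # -> 1
```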
13:30-16:30, Paper ThBCT8.42<br />
Adaptive Feature and Score Level Fusion Strategy using Genetic Algorithms<br />
Ben Soltana, Wael, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Ben Amar, Chokri, Res. Group on Intelligent Machines<br />
Classifier fusion is considered one of the best strategies for improving the performance of general-purpose classification systems.<br />
On the other hand, the fusion strategy space strongly depends on the classifiers, features and data spaces. As the cardinality<br />
of this space is exponential, one needs to resort to a heuristic to find a sub-optimal fusion strategy. In this work, we present<br />
a new adaptive feature and score level fusion strategy (AFSFS) based on an adaptive genetic algorithm. AFSFS tunes itself between<br />
the feature and matching score levels, improving the final performance over the original performance at both levels. As a fusion<br />
method, it not only combines the most relevant features so as to achieve adequate and optimized<br />
results, but also has the ability to select the most discriminative features. Experiments on the FRGC<br />
database show that the proposed method produces significantly better results than the baseline fusion methods.<br />
13:30-16:30, Paper ThBCT8.43<br />
Local Binary Pattern-Based Features for Text Identification of Web Images<br />
Jung, Insook, Chonbuk National Univ.<br />
Oh, Il-Seok, Chonbuk National Univ.<br />
We present a method for robustly identifying text blocks in complex web images. The method is an MLP (multi-layer perceptron)<br />
classifier trained on LBP (local binary pattern), wavelet and shape feature spaces. In particular, we propose adaptive<br />
LBP masks which respond flexibly to various character sizes. Most previous works use a fixed mask size or<br />
multi-level scales via pyramid schemes, which may be weak in dealing with text of diverse sizes. Experiments carried<br />
out on 100 web images show promising results.<br />
13:30-16:30, Paper ThBCT8.44<br />
Classification of Polarimetric SAR Images using Evolutionary RBF Neural Networks<br />
Ince, Turker, Izmir Univ. of Ec.<br />
Kiranyaz, Serkan, Tampere Univ. of Tech.<br />
Gabbouj, Moncef, Tampere Univ. of Tech.<br />
This paper proposes an evolutionary RBF network classifier for polarimetric synthetic aperture radar (SAR) images. The<br />
proposed feature extraction process utilizes the full covariance matrix, gray level co-occurrence matrix (GLCM) based<br />
texture features, and the backscattering power (Span) combined with the H/α/A decomposition, which are projected<br />
onto a lower-dimensional feature space using principal component analysis. An experimental study is performed using<br />
the fully polarimetric San Francisco Bay data set acquired at L-band by the NASA/Jet Propulsion Laboratory Airborne<br />
SAR (AIRSAR) to evaluate the performance of the proposed classifier. Classification results (in terms of confusion matrix,<br />
overall accuracy and classification map) compared to the Wishart and recent NN-based classifiers demonstrate the effectiveness<br />
of the proposed algorithm.<br />
13:30-16:30, Paper ThBCT8.45<br />
On the Use of Median String for Multi-Source Translation<br />
González Rubio, Jesús, Univ. Pol. de Valencia<br />
Casacuberta, Francisco, Univ. Pol. de Valencia<br />
State-of-the-art approaches to multi-source translation involve a multimodal-like process which applies an individual<br />
translation system to each source language. Then, the translations of the individual systems are combined to obtain a consensus<br />
output. We propose to use the (generalised) median string as the consensus output of the individual translation systems.<br />
Different approximations to the median string are studied as well as different approaches to improve the median<br />
string performance when dealing with natural language strings. The proposed approaches were evaluated on the Europarl<br />
corpus, achieving significant improvements in translation quality.<br />
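A common first approximation to the generalised median is the set median: the hypothesis string minimising the sum of edit distances to all the others. A minimal sketch (the strings here are illustrative, not Europarl output):<br />

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def set_median(strings):
    """String of the set minimising the sum of edit distances to all
    others -- a standard starting point for the generalised median."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))

hyps = ["the house is green", "the house is grean", "a house is green"]
print(set_median(hyps))  # -> "the house is green"
```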
13:30-16:30, Paper ThBCT8.47<br />
A Lip Contour Extraction Method using Localized Active Contour Model with Automatic Parameter Selection<br />
Liu, Xin, Hong Kong Baptist Univ.<br />
Cheung, Yiu-Ming, Hong Kong Baptist Univ.<br />
Li, Meng, Hong Kong Baptist Univ.<br />
Liu, Hailin, Guangdong Univ. of Technology<br />
Lip contour extraction is crucial to the success of a lipreading system. This paper presents a lip contour extraction algorithm<br />
using a localized active contour model with automatic selection of the proper parameters. The proposed approach utilizes a<br />
minimum-bounding ellipse as the initial evolving curve to split the local neighborhoods into local interior and<br />
local exterior regions, and then computes the localized energy for evolution and extraction. The method is<br />
robust against uneven illumination, rotation, deformation, and the effects of teeth and tongue. Experiments show<br />
promising results in comparison with existing methods.<br />
13:30-16:30, Paper ThBCT8.48<br />
Multimodal Sleeping Posture Classification<br />
Huang, Weimin, I2R<br />
Phyo Wai, Aung Aung, Inst. for Infocomm Res.<br />
Foo, Siang Fook, Inst. for Infocomm Res.<br />
Biswas, Jit, Inst. for Infocomm Res.<br />
Liou, Kou Juch, Industrial Tech. Res. Inst.<br />
Hsia, C. C., ITRI<br />
Sleeping posture reveals important information for eldercare and patient care, especially for bedridden patients. Traditionally,<br />
works address the problem using either pressure sensors or video images. This paper presents a multimodal<br />
approach to sleeping posture classification. Features from the pressure sensor map and the video image are proposed in<br />
order to characterize the posture patterns. The spatiotemporal registration of the two modalities is considered in<br />
the design, and joint feature extraction and data fusion are presented. Using a multi-class SVM, experimental results demonstrate<br />
that the multimodal approach achieves better performance than approaches using single-modal sensing.<br />
13:30-16:30, Paper ThBCT8.49<br />
Exploiting System Knowledge to Improve ECOC Reject Rules<br />
Simeone, Paolo, Univ. of Cassino<br />
Marrocco, Claudio, Univ. of Cassino<br />
Tortorella, Francesco, Univ. of Cassino<br />
Error Correcting Output Coding is a common technique for multiclass classification tasks which decomposes the original<br />
problem into several two-class problems solved through dichotomizers. Such a classification system can be improved<br />
with a reject option, which can be defined according to the level of information available from the dichotomizers. This<br />
paper analyzes how this knowledge is useful when applying such reject rules. The nature of the outputs, the kind of<br />
classifiers employed and the knowledge of their loss function are influential details for improving the overall<br />
performance of the system. Experimental results on popular benchmark data sets are reported to show the behavior of the<br />
different schemes.<br />
13:30-16:30, Paper ThBCT8.50<br />
Human Smoking Event Detection using Visual Interaction Clues<br />
Wu, Pin, Yuan-Ze University<br />
Hsieh, Jun-Wei, Yuan-Ze University<br />
Cheng, Jiun-Cheng, National Taiwan Ocean Univ.<br />
Cheng, Shyi-Chyi, National Taiwan Ocean Univ.<br />
Tseng, Shau-Yin, Industry Tech. Res. Institute<br />
This paper presents a novel scheme to automatically and directly detect smoking events in video. In this scheme, a color-based<br />
ratio histogram analysis is introduced to extract visual clues from the appearance interactions between a lighted cigarette<br />
and its human holder. The techniques of color re-projection and Gaussian Mixture Models (GMMs) enable the tasks<br />
of cigarette segmentation and tracking over the background pixels. A key problem for event analysis is the non-regular<br />
form of smoking events; thus, we propose a self-determined mechanism to analyze this suspicious event using<br />
an HMM framework. Due to the uncertainties of cigarette size and color, no existing automatic system can reliably analyze<br />
human smoking events directly from videos. The proposed scheme can detect smoking events involving uncertain<br />
actions with various cigarette sizes, colors, and shapes, and has the capacity to extend visual analysis to human events with<br />
similar interaction relationships. Experimental results show the effectiveness and real-time performance of our scheme in<br />
smoking event analysis.<br />
13:30-16:30, Paper ThBCT8.51<br />
Malware Detection on Mobile Devices using Distributed Machine Learning<br />
Sharifi Shamili, Ashkan, RWTH Aachen Univ.<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Alpcan, Tansu, Tech. Univ. Berlin<br />
This paper presents a distributed Support Vector Machine (SVM) algorithm in order to detect malicious software (malware)<br />
on a network of mobile devices. The light-weight system monitors mobile user activity in a distributed and privacy-preserving<br />
way using a statistical classification model which is evolved by training with examples of both normal usage patterns<br />
and unusual behavior. The system is evaluated using the MIT reality mining data set. The results indicate that the<br />
distributed learning system trains quickly and performs reliably. Moreover, it is robust against failures of individual components.<br />
13:30-16:30, Paper ThBCT8.52<br />
Combining Single Class Features for Improving Performance of a Two Stage Classifier<br />
Cordella, Luigi P., Univ. di Napoli Federico II<br />
De Stefano, Claudio, Univ. of Cassino<br />
Fontanella, Francesco, Univ. of Cassino<br />
Marrocco, Cristina, Univ. of Cassino<br />
Scotto Di Freca, Alessandra, Univ. of Cassino<br />
We propose a feature-selection-based approach for improving the classification performance of a two-stage classification<br />
system in contexts where a high number of features is involved. A problem with a set of N classes is subdivided into a set<br />
of N two-class problems. In each problem, a GA-based feature selection algorithm is used for finding the best subset of<br />
features. These subsets are then used for training N classifiers. In the classification phase, unknown samples are given as<br />
input to each of the trained classifiers using the corresponding subspace. In case of conflicting responses, the sample<br />
is sent to a suitably trained supplementary classifier. The proposed approach has been tested on a real-world dataset containing<br />
hyper-spectral image data. The results compare favourably with those obtained by other methods on the same<br />
data.<br />
13:30-16:30, Paper ThBCT8.53<br />
The Rex Leopold II Model: Application of the Reduced Set Density Estimator to Human Categorization<br />
De Schryver, Maarten, Ghent Univ.<br />
Roelstraete, Bjorn, Ghent Univ.<br />
Reduction techniques are important tools in machine learning and pattern recognition. In this article, we demonstrate how<br />
a kernel-based density estimator can be used as a tool for understanding human category representation. Despite the dominance<br />
of exemplar models of categorization, there is still ambiguity about the number of exemplars stored in memory.<br />
Here, we illustrate that categorization performance is not affected by omitting exemplars.<br />
13:30-16:30, Paper ThBCT8.54<br />
A Hybrid Method for Feature Selection based on Mutual Information and Canonical Correlation Analysis<br />
Sakar, Cemal Okan, Bahcesehir Univ.<br />
Kursun, Olcay, Istanbul Univ.<br />
Mutual Information (MI) is a classical and widely used dependence measure that generally serves as a good basis for feature<br />
selection. However, under-sampled classes or rare but certain relations are overlooked by this measure, which can<br />
result in missing relevant features that could be very predictive of variables of interest, such as certain phenotypes or disorders<br />
in biomedical research, rare but dangerous factors in ecology, intrusions in network systems, etc. On the other hand,<br />
Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure effectively used to detect independence,<br />
but its use for feature selection or ranking is limited because its formulation is not intended to measure the<br />
amount of information (entropy) of the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid<br />
relevance measure that is not only based on MI but also accounts for the predictability of signals from one another, as in<br />
KCCA. We show that PMI has better feature detection capability than MI and KCCA, especially in catching<br />
suspicious coincidences that are rare but potentially important not only for subsequent experimental studies but also for<br />
building computational predictive models, as demonstrated on two toy datasets and a real intrusion detection system<br />
dataset.<br />
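For reference, the plain MI baseline that PMI extends can be estimated for discrete variables from the joint histogram; this sketch shows only that baseline, not the PMI measure itself:<br />

```python
import numpy as np

def mutual_information(x, y):
    """MI (in bits) between two discrete sequences, estimated from the joint histogram."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    for i, j in zip(xi, yi):
        joint[i, j] += 1
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)      # marginal of x
    py = p.sum(axis=0, keepdims=True)      # marginal of y
    nz = p > 0                             # avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

x = [0, 1, 0, 1, 0, 1, 0, 1]
mi_self = mutual_information(x, x)         # a fair binary signal with itself: 1 bit
mi_const = mutual_information(x, [0] * 8)  # anything with a constant: 0 bits
```

The failure mode the paper targets is visible here: a rare but perfectly certain relation contributes almost nothing to this histogram-based estimate.<br />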
13:30-16:30, Paper ThBCT8.55<br />
Speech Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments<br />
Nolazco-Flores, Juan A., Inst. Tecnológico y de Estudios Superiores de Monterrey<br />
Aceves-López, Roberto A., Inst. Tecnológico y de Estudios Superiores de Monterrey<br />
García-Perera, L. Paola, Inst. Tecnológico y de Estudios Superiores de Monterrey<br />
The Magnitude-Spectrum Information-Entropy (MSIE) of the speech signal is presented as an alternative representation<br />
of speech that can be used to mitigate the mismatch between training and testing conditions. The speech magnitude<br />
spectrum is considered as a random variable from which entropy coefficients can be calculated for each frame. These<br />
entropy coefficients are concatenated to the corresponding MFCC vector, the dynamic coefficients are then calculated, and<br />
the results show an improvement compared to a baseline. The MSIE effectiveness was tested on the Aurora 2 database<br />
audio files. When trained on clean speech, the experimental results obtained with MSIE concatenated to the MFCC outperform<br />
those obtained with the MFCC baseline system for selected types of noise at different SNRs. For this selected<br />
group of noises, the overall performance improvement in the range 0 dB to 20 dB for the Aurora 2 database is 15.06%.<br />
13:30-16:30, Paper ThBCT8.56<br />
Unsupervised Image Retrieval with Similar Lighting Conditions<br />
Serrano Talamantes, Jose Felix, Centro De Investigacion en Computacion<br />
Aviles, Carlos, Univ. Autónoma Metropolitana-Azcapotzalco México<br />
Sossa, Humberto, Center for Computing Res. CIC-IPN<br />
Villegas, Juan, Univ. Autónoma Metropolitana-Azcapotzalco México<br />
Olague, Gustavo, Centro de Investiación Científica y de Educación Superior<br />
In this work, a new method to retrieve images with similar lighting conditions is presented. It is based on automatic clustering<br />
and automatic indexing, and belongs to the Content Based Image Retrieval (CBIR) category. The goal is to<br />
retrieve images from a database (by their content) with similar lighting conditions. When we look at images taken from<br />
outdoor scenes, much of the information perceived depends on the lighting conditions. The proposal combines fixed and<br />
randomly extracted points for feature extraction. The describing features are the mean, the standard deviation, and the homogeneity<br />
(from the co-occurrence matrix) of a sub-image extracted from the three color channels (H, S, I). A K-MEANS<br />
algorithm and a 1-NN classifier are used to build an indexed database of 300 images in order to retrieve images with<br />
similar lighting conditions in sky regions: sunny, partially cloudy, and completely cloudy. One of the advantages<br />
of the proposal is that we do not need to manually label the images for their retrieval. The performance of our<br />
framework is demonstrated through several experimental results, including improved rates for retrieving images with<br />
similar lighting conditions. A comparison with another similar work is also presented.<br />
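The indexing-and-retrieval pipeline (K-MEANS clustering plus a 1-NN lookup) can be sketched as follows; the two-dimensional (mean, standard deviation) features and the data are invented for illustration and are far simpler than the descriptors used in the paper:<br />

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns final centers and labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# invented per-image features, e.g. (mean intensity, std) of a sky region:
# two "sunny" images (bright, low spread) and two "cloudy" ones
feats = np.array([[0.90, 0.05], [0.85, 0.06], [0.20, 0.30], [0.25, 0.28]])
centers, labels = kmeans(feats, k=2)

# retrieval: 1-NN assigns the query to the closest cluster center,
# and every image indexed under that cluster is returned
query = np.array([0.88, 0.05])
cluster = int(np.argmin(((centers - query) ** 2).sum(-1)))
retrieved = np.where(labels == cluster)[0]
```

Because the clusters are formed without labels, this is what lets the authors avoid manual annotation of the database.<br />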
13:30-16:30, Paper ThBCT8.57<br />
Lattice-Based Anomaly Rectification for Sport Video Annotation<br />
Khan, Aftab, Univ. of Surrey<br />
Windridge, David, Univ. of Surrey<br />
De Campos, Teofilo, Univ. of Surrey<br />
Anomaly detection has received much attention within the literature as a means of determining, in an unsupervised manner,<br />
whether a learning domain has changed in a fundamental way. This may require continuous adaptive learning to be abandoned<br />
and a new learning process initiated in the new domain. A related problem is that of anomaly rectification; the adaptation<br />
of the existing learning mechanism to the change of domain. As a concrete instantiation of this notion, the current<br />
paper investigates a novel lattice-based HMM induction strategy for arbitrary court-game environments. We test (in real<br />
and simulated domains) the ability of the method to adapt to a change of rule structures going from tennis singles to tennis<br />
doubles. Our long term aim is to build a generic system for transferring game-rule inferences.<br />
13:30-16:30, Paper ThBCT8.58<br />
An Ensemble of Classifiers Approach to Steganalysis<br />
Bayram, Sevinc, Pol. Inst. of NYU<br />
Dirik, Ahmet Emir, Pol. Inst. of NYU<br />
Sencar, Husrev Taha, TOBB Univ. of Ec. and Tech.<br />
Memon, Nasir, Pol. Inst. of New York Univ.<br />
Most work on steganalysis, with few exceptions, has primarily focused on providing features with high discrimination<br />
power without giving due consideration to issues concerning practical deployment of steganalysis methods. In this work,<br />
we focus on the machine learning aspect of steganalyzer design and utilize a hierarchical ensemble-of-classifiers approach<br />
to tackle two main issues. First, the proposed approach provides a workable and systematic procedure to incorporate several<br />
steganalyzers into a composite steganalyzer to improve detection performance in a scalable and cost-effective manner.<br />
Second, since the approach can be readily extended to multi-class classification, it can also be used to infer the<br />
steganographic technique deployed in the generation of a stego-object. We provide results to demonstrate the potential of the<br />
proposed approach.<br />
13:30-16:30, Paper ThBCT8.59<br />
Discriminating Intended Human Objects in Consumer Videos<br />
Uegaki, Hiroshi, Osaka Univ.<br />
Nakashima, Yuta, Osaka Univ.<br />
Babaguchi, Noboru, Osaka Univ.<br />
In a consumer video, there are not only intended objects, which are intentionally captured by the camcorder user, but also<br />
unintended objects, which are accidentally framed in. Since the intended objects are essential to presenting what the camcorder<br />
user wants to express in the video, discriminating the intended objects from the unintended objects is beneficial for many<br />
applications, e.g., video summarization, privacy protection, and so forth. In this paper, focusing on human objects, we<br />
propose a method for discriminating the intended human objects from the unintended human objects. We evaluated the<br />
proposed method using 10 videos captured by 3 camcorder users. The results demonstrate that the proposed method successfully<br />
discriminates the intended human objects with a recall of 0.45 and a precision of 0.80.<br />
13:30-16:30, Paper ThBCT8.60<br />
Detecting Human Activity Profiles with Dirichlet Enhanced Inhomogeneous Poisson Processes<br />
Shimosaka, Masamichi, The Univ. of Tokyo<br />
Ishino, Takahito, The Univ. of Tokyo<br />
Noguchi, Hiroshi, The Univ. of Tokyo<br />
Mori, Taketoshi, The Univ. of Tokyo<br />
Sato, Tomomasa, The Univ. of Tokyo<br />
This paper describes an activity pattern mining method based on inhomogeneous Poisson point processes (IPPPs) for time<br />
series of count data generated in behavior detection by pyroelectric sensors. The IPPP reflects the idea that typical human activity<br />
is rhythmic and periodic. We also focus on the idea that activity patterns are affected by exogenous phenomena,<br />
such as the day of the week and weather conditions. Because a single IPPP cannot capture this, Dirichlet process mixtures<br />
(DPM) are leveraged in order to discriminate and discover the different activity patterns caused by such factors. The use<br />
of DPM allows us to discover the appropriate number of typical daily patterns automatically. Experimental results using<br />
long-term count data show that our model successfully and efficiently discovers typical daily patterns.<br />
13:30-16:30, Paper ThBCT8.61<br />
I-FAC: Efficient Fuzzy Associative Classifier for Object Classes in Images<br />
Mangalampalli, Ashish, International Inst. of Information Tech. Hyderabad, India<br />
Chaoji, Vineet, Yahoo! Inc<br />
Sanyal, Subhajit, Yahoo! Lab. Bangalore, India<br />
We present I-FAC, a novel fuzzy associative classification algorithm for object class detection in images using interest<br />
points. In object class detection, the negative class CN is generally vague (CN = U \ CP, where U and CP are the universal<br />
and positive classes, respectively), yet image classification normally requires both positive and negative classes for<br />
training. I-FAC is a single-class image classifier that relies only on the positive class for training. Because of its fuzzy<br />
nature, I-FAC also handles polysemy and synonymy (common problems in most crisp, i.e., non-fuzzy, image classifiers) very<br />
well. As associative classification leverages frequent patterns mined from a given dataset, its performance as judged<br />
by its false-positive-rate (FPR) versus recall curve is very good, especially at lower FPRs, where its recall is even better.<br />
I-FAC has the added advantage that the rules used for classification have clear semantics and can be comprehended easily,<br />
unlike classifiers such as SVM, which act as black boxes. From an empirical perspective (on standard public<br />
datasets), the performance of I-FAC is much better, especially at lower FPRs, than that of either bag-of-words (BOW) or<br />
SVM (both using interest points).<br />
13:30-16:30, Paper ThBCT8.62<br />
Audio-Visual Data Fusion using a Particle Filter in the Application of Face Recognition<br />
Steer, Michael, Otto-von-guericke-Univ. Magdeburg<br />
This paper describes a methodology by which audio and visual data about a scene can be fused in a meaningful manner<br />
in order to locate a speaker in a scene. This fusion is implemented within a Particle Filter such that a single speaker can<br />
be identified in the presence of multiple visual observations. The advantages of this fusion are that weak sensory data<br />
from either modality can be reinforced and the presence of noise can be reduced.<br />
13:30-16:30, Paper ThBCT8.63<br />
The Problem of Fragile Feature Subset Preference in Feature Selection Methods and a Proposal of Algorithmic<br />
Workaround<br />
Somol, Petr, Inst. of Information Theory and Automation, Czech<br />
Grim, Jiří, Inst. of Information Theory and Automation<br />
Pudil, Pavel, Prague Univ. of Ec.<br />
We point out a problem inherent in the optimization scheme of many popular feature selection methods. It follows from<br />
the implicit assumption that a higher feature selection criterion value always indicates a more preferable subset, even if the<br />
value difference is marginal. This assumption ignores the reliability of particular feature preferences, over-fitting,<br />
and feature acquisition cost. We propose an algorithmic extension, applicable to many standard feature selection methods,<br />
that allows better control over feature subset preference. We show experimentally that the proposed mechanism is capable<br />
of reducing the size of selected subsets as well as improving classifier generalization.<br />
ThBCT9 Lower Foyer<br />
Signal, Speech, and Image Processing Poster Session<br />
Session chair: Ariki, Yasuo (Kobe Univ.)<br />
13:30-16:30, Paper ThBCT9.1<br />
Removing Partial Occlusion from Blurred Thin Occluders<br />
Mccloskey, Scott, McGill Univ. Honeywell<br />
Langer, Michael, McGill Univ.<br />
Siddiqi, Kaleem, McGill Univ.<br />
We present a method to remove partial occlusion that arises from out-of-focus thin foreground occluders such as wires,<br />
branches, or a fence. Such partial occlusion causes the irradiance at a pixel to be a weighted sum of the radiances of a<br />
blurred foreground occluder and that of the background. The result is that the background component has lower contrast<br />
than it would if seen without the occluder. In order to remove the contribution of the foreground in such regions, we characterize<br />
the position and size of the occluder in a narrow aperture image. In subsequent images with wider apertures, we<br />
use this characterization to remove the contribution of the foreground, thereby restoring contrast in the background. We<br />
demonstrate our method on real camera images without assuming that the background is static.<br />
13:30-16:30, Paper ThBCT9.2<br />
A New Approach to Aircraft Surface Inspection based on Directional Energies of Texture<br />
Mumtaz, Mustafa, National Univ. of Sciences and Tech.<br />
Bin Mansoor, Atif, National Univ. of Sciences and Tech.<br />
Masood, Hassan, National Univ. of Sciences and Tech.<br />
Non-Destructive Inspection (NDI) plays a vital role in the aircraft industry, as it determines the structural integrity of aircraft<br />
surfaces and material characterization. Since existing NDI methods are time-consuming, we propose a new NDI approach<br />
using digital image processing that has the potential to substantially decrease inspection time. The aircraft imagery is<br />
analyzed by two methods, the Contourlet Transform (CT) and the Discrete Cosine Transform (DCT). With the Contourlet<br />
Transform, the two-dimensional (2-D) spectrum is divided into fine slices using iterated directional filter banks. Next, directional<br />
energy components for each block of the decomposed subband outputs are computed. These energy values are<br />
used to distinguish between crack and scratch images using a dot product classifier. In the second approach, the aircraft<br />
imagery is decomposed into high and low frequency components using the DCT, and the first order moment is determined to<br />
form feature vectors. A correlation based approach is then used to distinguish between crack and scratch surfaces. A comparative<br />
examination of the two techniques on a database of crack and scratch images revealed that texture analysis<br />
using the combined transform based approach gave the best results, with an accuracy of 96.6% for the identification<br />
of crack surfaces and 98.3% for scratch surfaces.<br />
13:30-16:30, Paper ThBCT9.3<br />
A Generalized Anisotropic Diffusion for Defect Detection in Low-Contrast Surfaces<br />
Chao, Shin-Min, Utechzone Co. Ltd.<br />
Tsai, Du-Ming, Yuan-Ze Univ.<br />
Li, Wei-Chen, Yuan-Ze Univ.<br />
Chiu, Wei-Yao, Yuan-Ze Univ.<br />
In this paper, an anisotropic diffusion model with a generalized diffusion coefficient function is presented for defect detection<br />
in low-contrast surface images and, especially, aims at material surfaces found in liquid crystal display (LCD)<br />
manufacturing. A defect embedded in a low-contrast surface image is extremely difficult to detect because the intensity<br />
difference between the unevenly-illuminated background and defective regions is hardly observable. The proposed anisotropic<br />
diffusion model provides a generalized diffusion mechanism that can flexibly change the curve of the diffusion coefficient<br />
function. It adaptively carries out a smoothing process for faultless areas and performs a sharpening process for defect<br />
areas in an image. An entropy criterion is proposed as the performance measure of the diffused image and then a stochastic<br />
evolutionary computation algorithm, particle swarm optimization (PSO), is applied to automatically determine the best<br />
parameter values of the generalized diffusion coefficient function. Experimental results have shown that the proposed<br />
method can effectively and efficiently detect small defects in low-contrast surface images.<br />
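The classical Perona-Malik scheme that this model generalizes can be written in a few lines. Note that the paper's contribution is precisely to replace the fixed coefficient function g() below with a flexible, PSO-tuned one, which this sketch does not attempt:<br />

```python
import numpy as np

def diffuse(img, iters=10, kappa=0.1, dt=0.2):
    """Classical anisotropic (Perona-Malik) diffusion with g(d) = exp(-(d/kappa)^2)."""
    img = img.astype(float).copy()
    g = lambda d: np.exp(-((d / kappa) ** 2))
    for _ in range(iters):
        # differences to the four neighbours, with zero flux at the borders
        n = np.roll(img, -1, axis=0) - img; n[-1, :] = 0
        s = np.roll(img, 1, axis=0) - img;  s[0, :] = 0
        e = np.roll(img, -1, axis=1) - img; e[:, -1] = 0
        w = np.roll(img, 1, axis=1) - img;  w[:, 0] = 0
        img += dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
    return img

# a flat surface with mild noise: diffusion should smooth it
rng = np.random.default_rng(0)
noisy = 0.5 + 0.01 * rng.normal(size=(16, 16))
smoothed = diffuse(noisy)
```

Small intensity differences (flat background) are smoothed because g is near 1 there, while large differences (defect edges) are preserved because g decays toward 0; the generalized model additionally lets the coefficient sharpen defect areas.<br />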
13:30-16:30, Paper ThBCT9.4<br />
Impact of Vector Ordering Strategies on Morphological Unmixing of Remotely Sensed Hyperspectral Images<br />
Plaza, Antonio, Univ. of Extremadura<br />
Hyperspectral imaging is a new technique in remote sensing that generates hundreds of images, corresponding to different<br />
wavelength channels, for the same area on the surface of the Earth. In previous work, we have explored the application of<br />
morphological operations to integrate both spatial and spectral responses in hyperspectral data analysis. These operations<br />
rely on ordering pixel vectors in spectral space, but there is no unambiguous means of defining the minimum and maximum<br />
values between two vectors of more than one dimension. Our original contribution in this paper is to examine the impact<br />
of different vector ordering strategies on the definition of multi-channel morphological operations. Our focus is on morphological<br />
unmixing, which decomposes each pixel vector in the hyperspectral scene into a combination of pure spectral<br />
signatures (called endmembers) and their associated abundance fractions, allowing sub-pixel characterization. Experiments<br />
are conducted using real hyperspectral data sets collected by NASA/JPL’s Airborne Visible Infra-Red Imaging Spectrometer<br />
(AVIRIS) system.<br />
13:30-16:30, Paper ThBCT9.5<br />
A Recursive and Model-Constrained Region Splitting Algorithm for Cell Clump Decomposition<br />
Xiong, Wei, Inst. for Infocomm Res. A-STAR<br />
Ong, Sim Heng, National Univ. of Singapore<br />
Lim, Joo-Hwee, Inst. for Infocomm Res.<br />
Decomposition of cells in clumps is a difficult segmentation task requiring region splitting techniques. Techniques that do<br />
not employ prior shape constraints usually fail to achieve accurate segmentation. Those using shape constraints are unable<br />
to cope with large clumps and occlusions. In this work, we propose a model-constrained region splitting algorithm for cell<br />
clump decomposition. We build the cell model using joint probability distribution of invariant shape features. The shape<br />
model, the contour smoothness and the gradient information along the cut are used to optimize the splitting in a recursive<br />
manner. The short cut rule is also adopted as a strategy to speed up the process. The algorithm performs well in validation<br />
experiments using 60 images with 4516 cells and 520 clumps.<br />
13:30-16:30, Paper ThBCT9.6<br />
Bounding-Box based Segmentation with Single Min-Cut using Distant Pixel Similarity<br />
Pham, Viet-Quoc, The Univ. of Tokyo<br />
Takahashi, Keita, The Univ. of Tokyo<br />
Naemura, Takeshi, The Univ. of Tokyo<br />
This paper addresses the problem of interactive image segmentation with a user-supplied object bounding box. The underlying<br />
problem is the classification of pixels into foreground and background, where only background information is<br />
provided with sample pixels. Many approaches treat appearance models as an unknown variable and optimize the segmentation<br />
and appearance alternatively, in an expectation maximization manner. In this paper, we describe a novel approach<br />
to this problem: the objective function is expressed purely in terms of the unknown segmentation and can be optimized<br />
using only one minimum cut calculation. We aim to optimize the trade-off of making the foreground layer as large as possible<br />
while keeping the similarity between the foreground and background layers as small as possible. This similarity is<br />
formulated using the similarities of distant pixel pairs. We evaluated our algorithm on the GrabCut dataset and demonstrated<br />
that high-quality segmentations were attained at a fast calculation speed.<br />
13:30-16:30, Paper ThBCT9.7<br />
Image Retargeting in Compressed Domain<br />
Murthy, O.v. Ramana, Nanyang Tech. Univ.<br />
Muthuswamy, Karthik, Nanyang Tech. Univ.<br />
Rajan, Deepu, Nanyang Tech. Univ.<br />
Chia, Liang-Tien, Nanyang Tech. Univ.<br />
A simple algorithm for image retargeting in the compressed domain is proposed. Most existing retargeting algorithms<br />
work directly in the spatial domain of the raw image. Here, we work on the DCT coefficients of a JPEG-compressed image<br />
to generate a gradient map that serves as an importance map to help identify those parts in the image that need to be<br />
retained during the retargeting process. Each 8x8 block of DCT coefficients is scaled based on the least importance value.<br />
Retargeting can be done both in the horizontal and vertical directions with the same framework. We also illustrate image<br />
enlargement using the same method. Experimental results show that the proposed algorithm produces less distortion in<br />
the retargeted image compared to some other algorithms reported recently.<br />
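The key idea, an importance map computed directly from DCT block energy, can be illustrated as below. This is a loose sketch: it computes the DCT itself rather than reading coefficients from a JPEG stream, and it scores blocks by total AC magnitude rather than the paper's gradient map:<br />

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def block_importance(img):
    """Per-8x8-block importance: total magnitude of the AC DCT coefficients."""
    d = dct_matrix()
    h, w = img.shape
    imp = np.zeros((h // 8, w // 8))
    for bi in range(h // 8):
        for bj in range(w // 8):
            blk = img[bi * 8:(bi + 1) * 8, bj * 8:(bj + 1) * 8]
            coef = d @ blk @ d.T
            imp[bi, bj] = np.abs(coef).sum() - abs(coef[0, 0])  # drop the DC term
    return imp

# flat left half, textured (checkerboard) right half
img = np.zeros((8, 16))
img[:, 8:] = np.indices((8, 8)).sum(axis=0) % 2
imp = block_importance(img)   # textured block scores high, flat block near zero
```

Blocks with the lowest importance are the ones a retargeting step can scale down most aggressively with the least visible distortion.<br />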
13:30-16:30, Paper ThBCT9.8<br />
Progressive MAP-Based Deconvolution with Pixel-Dependent Gaussian Prior<br />
Tanaka, Masayuki, Tokyo Inst. of Tech.<br />
Kanda, Takafumi, Tokyo Inst. of Tech.<br />
Okutomi, Masatoshi,<br />
Deconvolution is a fundamental technique used in various vision applications, and maximum a posteriori (MAP) estimation<br />
is known to be a powerful tool for it. In this paper, we propose a progressive MAP-based deconvolution algorithm with a pixel-dependent<br />
Gaussian image prior. In the proposed algorithm, a mean and a variance for each pixel are adaptively estimated.<br />
Then, the mean and the variance are progressively updated. We experimentally show that the proposed algorithm is comparable<br />
to the state-of-the-art algorithms in the case that the true point spread function (PSF) is used for the deconvolution,<br />
and that the proposed algorithm outperforms them when the true PSF is not available.<br />
13:30-16:30, Paper ThBCT9.9<br />
A Fast Image Inpainting Method based on Hybrid Similarity-Distance<br />
Liu, Jie, Chinese Acad. of Sciences<br />
Zhang, Shuwu, Chinese Acad. of Sciences<br />
Yang, Wuyi, Chinese Acad. of Sciences<br />
Li, Heping, Chinese Acad. of Sciences<br />
A fast image inpainting method based on a hybrid similarity-distance is proposed in this paper. In Criminisi et al.’s work<br />
[1], the similarity distance is not reliable enough in many cases and the algorithm performs inefficiently. To solve these problems,<br />
we propose a new searching strategy to accelerate the algorithm. In addition, we modify the confidence-updating<br />
rule to make the distribution of confidences in the source region more reasonable. Furthermore, taking into account the stationarity<br />
of texture and the reliability of the source regions, we present a hybrid similarity-distance, which combines the<br />
distance in color space with the distance in spatial space using weight coefficients related to the confidence value. A more<br />
suitable patch is found by this hybrid similarity-distance. The experiments verify that the proposed method<br />
yields qualitative improvements compared to Criminisi et al.’s work [1].<br />
13:30-16:30, Paper ThBCT9.10<br />
Reversible Integer 2-D Discrete Fourier Transform by Control Bits<br />
Dursun, Serkan, Univ. of Texas at San Antonio<br />
Grigoryan, Artyom M., Univ. of Texas at San Antonio<br />
This paper describes the 2-D reversible integer discrete Fourier transform (RiDFT), which is based on the concept of the<br />
paired representation of the 2-D image, referred to as the unique 2-D frequency and 1-D time representation. The<br />
2-D DFT of the image is split into a minimum set of short transforms, and the image is represented as a set of 1-D signals.<br />
The paired 2-D DFT involves a few multiplications that can be approximated by integer transforms, such as<br />
one-point transforms with one control bit. 24 control bits are required to perform the 8x8-point RiDFT, and 264 control<br />
bits for the 16x16-point 2-D RiDFT of real inputs. The fast paired method of calculating the 1-D DFT is used. The computational<br />
complexity of the proposed 2-D RiDFTs is comparable with the complexity of the fast 2-D DFT.<br />
13:30-16:30, Paper ThBCT9.12<br />
Image Inpainting based on Local Optimisation<br />
Zhou, Jun, National ICT Australia<br />
Robles-Kelly, Antonio, National ICT Australia<br />
In this paper, we tackle the problem of image inpainting, which aims at removing objects from an image or repairing damaged<br />
pictures by replacing the missing regions using information from the rest of the scene. The image inpainting method<br />
proposed here builds on an exemplar-based perspective so as to improve the local consistency of the inpainted region.<br />
This is done by selecting the optimal patch which maximises the local consistency with respect to abutting candidate<br />
patches. The similarity computation generates weights based upon an edge prior and the structural differences between<br />
inpainting exemplar candidates. This treatment permits the generation of an inpainting sequence based on a list of factors.<br />
The experiments show that the proposed method delivers a margin of improvement as compared to alternative methods.<br />
13:30-16:30, Paper ThBCT9.13<br />
Image Processing based Approach for Retrieving Data from a Seismic Section in Bitmap Format<br />
Chevion, Dan, IBM Res. Lab. in Haifa<br />
Navon, Yaakov, IBM<br />
Ramm, Dov, former research staff member, IBM Israel Res. Lab.<br />
A new method for retrieving seismic data from a seismic section provided in a bitmap format is described. The method is<br />
based on image processing techniques and includes creating a grey level image of a seismic section, processing the grey<br />
level image (by integration, filtering, etc.) and then reconstructing digitized values of individual seismic traces from the<br />
resulting image, thus ending with the data in standard SEG-Y format.<br />
13:30-16:30, Paper ThBCT9.14<br />
Visible Entropy: A Measure for Image Visibility<br />
Hou, Zujun, Inst. for Infocomm Res.<br />
Yau, Wei-Yun, Inst. for Infocomm Res.<br />
Image visibility is a fundamental issue in the field of computer vision. This paper investigates the connection between<br />
the histogram and image visibility, where the concept of entropy is employed to depict the information content of the histogram.<br />
It turns out that image visibility depends mainly on the observed intensity levels with higher frequencies and on the distribution<br />
of their locations within the range of intensity levels. With this in mind, the concept of visible entropy is proposed. The<br />
usefulness of the proposed visibility measure has been evaluated on a number of real images.<br />
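The starting point, the Shannon entropy of the grey-level histogram, is easy to state; the visible-entropy measure itself additionally weights how the occupied intensity levels are distributed over the range, which this sketch omits:<br />

```python
import numpy as np

def histogram_entropy(img, bins=256):
    """Shannon entropy (in bits) of the grey-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                   # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

flat = np.full((4, 4), 7)                   # one grey level: zero entropy
two_level = np.array([[0, 128], [0, 128]])  # two equally likely levels: one bit
```

Plain histogram entropy ignores where the occupied levels sit in the intensity range, which is exactly the gap the proposed measure addresses.<br />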
13:30-16:30, Paper ThBCT9.15<br />
Research the Performance of a Recursive Algorithm of the Local Discrete Wavelet Transform<br />
Kopenkov, Vasiliy, RAS<br />
Myasnikov, Vladislav, RAS<br />
We experimentally compare the performance of two fast algorithms for computing the local discrete wavelet transform of<br />
one-dimensional signals: the Mallat algorithm and a recursive algorithm. For comparison purposes, we analyze Haar<br />
wavelet bases for one- and two-dimensional signals, an extension of the Haar basis with scale coefficient 3, and biorthogonal<br />
polynomial spline wavelets with finite support.<br />
13:30-16:30, Paper ThBCT9.16<br />
Auditory Features Revisited for Robust Speech Recognition<br />
Harte, Naomi, Trinity Coll. Dublin<br />
Kelly, Finnian, Trinity Coll. Dublin<br />
Auditory based front-ends for speech recognition have been compared before, but this paper focuses on two of the most<br />
promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings<br />
with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC).<br />
Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for reference. The performance of all features is<br />
reported on the TIMIT database using an HMM-based recogniser. It is found that the PNCC features outperform MFCC in<br />
clean conditions and are most robust to noise. ZCPA performance is shown to vary widely with filter bank configuration<br />
and frame length. The ZCPA performance is poor in clean conditions but is the least affected by white noise. PNCC is<br />
shown to be the most promising new feature set for robust ASR in recent years.<br />
13:30-16:30, Paper ThBCT9.17<br />
Sparse Representation for Speaker Identification<br />
Naseem, Imran, The Univ. of Western Australia<br />
Togneri, Roberto, The Univ. of Western Australia<br />
Bennamoun, Mohammed, The Univ. of Western Australia<br />
We address the closed-set problem of speaker identification by presenting a novel sparse representation classification algorithm.<br />
We propose to develop an over complete dictionary using the GMM mean super vector kernel for all the training<br />
utterances. A given test utterance corresponds to only a small fraction of the whole training database. We therefore propose<br />
to represent a given test utterance as a linear combination of all the training utterances, thereby generating a naturally<br />
sparse representation. Using this sparsity, the unknown vector of coefficients is computed via l1minimization which is<br />
also the sparsest solution [12]. Ideally, the vector of coefficients so obtained has nonzero entries representing the class<br />
index of the given test utterance. Experiments have been conducted on the standard TIMIT [14] database and a comparison<br />
with state-of-the-art speaker identification algorithms yields a favorable performance index for the proposed algorithm.<br />
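The sparse-representation step described above can be sketched in a few lines, assuming a plain ISTA solver for the l1 problem and an energy-based class decision; the dictionary layout and all names are illustrative, not the authors' implementation:<br />

```python
import numpy as np

def ista_l1(A, y, lam=0.05, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x

def identify_speaker(A, labels, y):
    """Pick the class whose training columns carry the most coefficient energy."""
    x = ista_l1(A, y)
    classes = sorted(set(labels))
    energy = [sum(x[i] ** 2 for i, l in enumerate(labels) if l == c) for c in classes]
    return classes[int(np.argmax(energy))]
```

Each column of `A` stands for one training utterance's supervector; a test utterance close to one speaker's training data concentrates the recovered coefficients on that speaker's columns.<br />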
13:30-16:30, Paper ThBCT9.18<br />
Latency in Speech Feature Analysis for Telepresence State Coding<br />
O’Gorman, Lawrence, Alcatel-Lucent Bell Lab.<br />
For video conferencing, there are network bandwidth and screen real-estate constraints that limit the number of user channels.<br />
We propose an intermediate transmission mode that transmits only at events, where these are detected by both audio<br />
and video changes from the short-term signal average. Our objective in this paper is to determine latency until the audio<br />
portion of a single telepresence channel stabilizes. It is this stable signal from which we detect events. We describe a recursive<br />
filter approach for feature determination and experiments on the Switchboard telephone call database. Results<br />
show a latency to stable signal of up to 10 seconds, although events can be detected much more quickly.<br />
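A minimal sketch of a first-order recursive averaging filter and a stabilization-latency measure of the kind discussed above; the filter coefficient and tolerance are illustrative values, not the paper's:<br />

```python
import numpy as np

def smooth(x, alpha=0.95):
    # first-order recursive (exponential) averaging filter
    y = np.empty(len(x), dtype=float)
    acc = 0.0
    for i, v in enumerate(x):
        acc = alpha * acc + (1.0 - alpha) * v
        y[i] = acc
    return y

def latency_to_stable(x, target, tol=0.05, alpha=0.95):
    # index of the first sample after which the smoothed signal
    # stays within tol of the target level
    y = smooth(x, alpha)
    outside = np.where(np.abs(y - target) > tol)[0]
    return 0 if outside.size == 0 else int(outside[-1]) + 1
```

A larger `alpha` gives a smoother estimate but a longer latency before the signal is considered stable, which mirrors the trade-off the abstract reports.<br />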
13:30-16:30, Paper ThBCT9.19<br />
Automatically Detecting Peaks in Terahertz Time-Domain Spectroscopy<br />
Stephani, Henrike, Fraunhofer ITWM<br />
Jonuscheit, Joachim, Fraunhofer IPM<br />
Robiné, Christoph, Fraunhofer IPM<br />
Heise, Bettina, JKU<br />
To classify spectroscopic measurements it is necessary to have comparable methods of evaluation. In Terahertz (THz)<br />
time-domain spectroscopy, a new technology, neither the presentation of the data nor the peak detection is standardized<br />
yet. We propose a procedure for automatic peak extraction in THz spectra of chemical compounds. After preprocessing in<br />
the time-domain, we use a variance based algorithm for determining the valid frequency region. We furthermore propose<br />
a baseline correction using simulated THz spectra. We illustrate how this procedure works on the example of hyperspectral<br />
THz measurements of six chemical compounds. Subsequently we propose to use unsupervised classification on the thus<br />
processed data to robustly detect the characteristic peaks of a compound.<br />
13:30-16:30, Paper ThBCT9.20<br />
Iwasawa Decomposition and Computational Riemannian Geometry<br />
Lenz, Reiner, Linköping Univ.<br />
Mochizuki, Rika, Nippon Telegraph and Telephone Corp.<br />
Chao, Jinhui, Chuo Univ.<br />
We investigate several topics related to manifold-techniques for signal processing. On the most general level we consider<br />
manifolds with a Riemannian Geometry. These manifolds are characterized by their inner products on the tangent spaces.<br />
We describe the connection between the symmetric positive-definite matrices defining these inner products and the Cartan<br />
and the Iwasawa decomposition of the general linear matrix groups. This decomposition gives rise to the decomposition<br />
of the inner product matrices into diagonal and orthonormal matrices, and into diagonal and upper triangular matrices.<br />
Next we describe the estimation of the inner product matrices from measured data as an optimization process on the homogeneous<br />
space of upper triangular matrices. We show that the decomposition leads to simple forms of partial derivatives<br />
that are commonly used in optimization algorithms. Using the group theoretical parametrization also ensures that all intermediate<br />
estimates of the inner product matrix are symmetric and positive definite. Finally we apply the method to a<br />
problem from psychophysics where the color perception properties of an observer are characterized with the help of color<br />
matching experiments. We will show that measurements from color weak observers require the enforcement of the positive-definiteness<br />
of the matrix with the help of the manifold optimization technique.<br />
13:30-16:30, Paper ThBCT9.21<br />
Rethinking Algorithm Design and Development in Speech Processing<br />
Stadelmann, Thilo, Univ. of Marburg<br />
Wang, Yinghui, Univ. of Marburg<br />
Smith, Matthew, Univ. of Hannover<br />
Ewerth, Ralph, Univ. of Marburg<br />
Freisleben, Bernd, Univ. of Marburg<br />
Speech processing is typically based on a set of complex algorithms requiring many parameters to be specified. When<br />
parts of the speech processing chain do not behave as expected, trial and error is often the only way to investigate the reasons.<br />
In this paper, we present a research methodology to analyze unexpected algorithmic behavior by making (intermediate)<br />
results of the speech processing chain perceivable and intuitively comprehensible by humans. The workflow of the<br />
process is explicated using a real-world example leading to considerable improvements in speaker clustering. The described<br />
methodology is supported by a software toolbox available for download.<br />
13:30-16:30, Paper ThBCT9.22<br />
Phone-Conditioned Suboptimal Wiener Filtering<br />
Gonzalez-Caravaca, Guillermo, Univ. Autonoma de Madrid<br />
Toledano, Doroteo, Univ. Autonoma de Madrid<br />
Puertas, Maria, Univ. Autonoma de Madrid<br />
A novel way of managing the compromise between noise reduction and speech distortion in Wiener filters is presented. It<br />
is based on adjusting the amount of noise reduced, and therefore the speech distortion introduced, on a phone-by-phone<br />
basis. We show empirically that optimal Wiener filters produce different amounts of speech distortion for different phones.<br />
Therefore we propose a phone-conditioned suboptimal Wiener filter that uses different amounts of noise reduction for<br />
each phone, based on a previous estimation of the amount of distortion introduced. Speech recognition results have shown<br />
that phone-conditioned suboptimal Wiener filtering can provide almost a 5% additional relative improvement in word<br />
accuracy over comparable optimal Wiener filtering.<br />
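The trade-off described above can be illustrated with a parametric Wiener gain; the phone-dependent factor `beta` is a hypothetical parameterization used here to show how noise reduction can be varied per phone, not the paper's actual filter:<br />

```python
import numpy as np

def wiener_gain(snr_prior, beta=1.0):
    # suboptimal Wiener gain: larger beta reduces more noise (more distortion),
    # smaller beta preserves speech at the cost of residual noise
    snr_prior = np.asarray(snr_prior, dtype=float)
    return snr_prior / (snr_prior + beta)

def denoise_frame(spectrum, noise_psd, beta=1.0):
    # apply the per-bin gain to one spectral frame, given a noise PSD estimate
    snr = np.maximum(np.abs(spectrum) ** 2 / noise_psd - 1.0, 1e-6)
    return wiener_gain(snr, beta) * spectrum
```

Choosing `beta` per phone, as motivated in the abstract, lets vowels (where distortion is more audible) use a gentler gain than, say, unvoiced fricatives.<br />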
13:30-16:30, Paper ThBCT9.23<br />
Geodesic Active Fields on the Sphere<br />
Zosso, Dominique, École Pol. Fédérale de Lausanne<br />
Thiran, Jean-Philippe, École Pol. Fédérale de Lausanne<br />
In this paper, we propose a novel method to register images defined on spherical meshes. Instances of such spherical<br />
images include inflated cortical feature maps in brain medical imaging or images from omnidirectional cameras. We apply<br />
the Geodesic Active Fields (GAF) framework locally at each vertex of the mesh. Therefore we define a dense deformation<br />
field, which is embedded in a higher dimensional manifold, and minimize the weighted Polyakov energy. While the<br />
Polyakov energy itself measures the hyper-area of the embedded deformation field, its weighting accounts for the<br />
quality of the current image alignment. Iteratively minimizing the energy drives the deformation field towards a smooth<br />
solution of the registration problem. Although the proposed approach does not necessarily outperform state-of-the-art<br />
methods that are tightly tailored to specific applications, it is of methodological interest due to its high degree of flexibility<br />
and versatility.<br />
13:30-16:30, Paper ThBCT9.24<br />
Emotional Speech Classification based on Multi View Characterization<br />
Mahdhaoui, Ammar, Univ. Pierre & Marie Curie<br />
Chetouani, M., Inst. des Systèmes Intelligents et Robotique<br />
Emotional speech classification is a key problem in social interaction analysis. Traditional emotional speech classification<br />
methods are completely supervised and require large amounts of labeled data. In addition, various feature sets are usually<br />
used to characterize the emotional speech signals. Therefore, we propose a new co-training algorithm based on multi-view<br />
features. More specifically, we adopt different features for the characterization of speech signals to form different<br />
views for classification, so as to extract as much discriminative information as possible. We then use the co-training algorithm<br />
to classify emotional speech with only a few annotations. In this article, a dynamic weighted co-training algorithm is<br />
developed to combine different features (views) to predict the common class variable. Experiments prove the validity and<br />
effectiveness of this method compared to a self-training algorithm.<br />
13:30-16:30, Paper ThBCT9.25<br />
Image Inpainting using Structure-Guided Priority Belief Propagation and Label Transformations<br />
Hsin, Heng-Feng, National Chung Cheng Univ.<br />
Leou, Jin-Jang, National Chung Cheng Univ.<br />
Lin, Cheng-Shian, National Chung Cheng Univ.<br />
Chen, Hsuan-Ying, National Chung Cheng Univ.<br />
In this study, an image inpainting approach using structure-guided priority belief propagation (BP) and label transformations<br />
is proposed. The proposed approach contains five stages, namely, Markov random field (MRF) node determination,<br />
structure map generation, label set enlargement by label transformations, image inpainting by priority-BP optimization,<br />
and overlapped region composition. Based on experimental results obtained in this study, as compared with three comparison<br />
approaches, the proposed approach provides better image inpainting results.<br />
13:30-16:30, Paper ThBCT9.26<br />
Comparison of Syllable/Phone HMM based Mandarin TTS<br />
Duan, Quansheng, Tsinghua Univ.<br />
Kang, Shiyin, Tsinghua Univ.<br />
Shuang, Zhiwei, IBM Res. - China<br />
Wu, Zhiyong, Tsinghua Univ.<br />
Cai, Lianhong, Tsinghua Univ.<br />
Qin, Yong, IBM Res. - China<br />
The performance of an HMM-based text-to-speech (TTS) system is affected by the basic modeling units and the size of<br />
training data. This paper compares two HMM-based Mandarin TTS systems using syllable and phone as basic units, respectively,<br />
with 1000, 3000 and 5000 sentences’ training data. Two female speakers’ corpora are used as training data for<br />
evaluation. For both corpora, the system using syllable as basic unit outperforms the system using phone as basic unit<br />
with 3000 and 5000 sentences’ training data.<br />
13:30-16:30, Paper ThBCT9.27<br />
QRS Complex Detection by Non Linear Thresholding of Modulus Maxima<br />
Jalil, Bushra, Univ. de Bourgogne<br />
Laligant, Olivier, Univ. de Bourgogne<br />
Fauvet, Eric, Univ. de Bourgogne<br />
Beya, Ouadi, Univ. de Bourgogne<br />
Electrocardiogram (ECG) signal is used to analyze the cardiovascular activity in the human body and has a primary role<br />
in the diagnosis of several heart diseases. The QRS complex is the most distinguishable component in the ECG. Therefore,<br />
the accuracy of the detection of QRS complex is crucial to the performance of subsequent machine learning algorithms<br />
for cardiac disease classification. The aim of the present work is to detect QRS wave from ECG signals. Wavelet transform<br />
filtering is applied to the signal in order to remove baseline drift, followed by QRS localization. Exploiting the property<br />
that the R peak has the highest and most prominent amplitude, we apply a thresholding technique based on the median absolute<br />
deviation (MAD) of the modulus maxima to detect the complex. In order to evaluate the algorithm, the analysis has been<br />
done on the MIT-BIH Arrhythmia database. The results have been examined and approved by medical doctors.<br />
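The MAD-based thresholding idea can be sketched as follows; the wavelet filtering and modulus-maxima computation are omitted, so this only shows the robust threshold applied to an already-filtered signal, with an illustrative multiplier `k`:<br />

```python
import numpy as np

def mad_threshold_peaks(x, k=8.0):
    # threshold derived from the median absolute deviation (MAD) of the signal,
    # then keep local maxima above it (candidate R peaks)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    thr = med + k * mad
    return [i for i in range(1, len(x) - 1)
            if x[i] > thr and x[i] >= x[i - 1] and x[i] > x[i + 1]]
```

Because the MAD is insensitive to the sparse, large R peaks themselves, the threshold tracks the baseline noise level rather than the peak amplitudes.<br />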
13:30-16:30, Paper ThBCT9.28<br />
Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-Overlapping Audio and Video<br />
Streams<br />
Roy, Anindya, Ec. Pol. Federale de Lausanne<br />
Marcel, Sebastien, Ec. Pol. Federale de Lausanne<br />
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently<br />
or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task: person identification<br />
in a cross-modal scenario, i.e., matching the speaker in an audio recording to the same speaker in a video recording,<br />
where the two recordings have been made during different sessions, using speaker specific information which is<br />
common to both the audio and video modalities. Several recent psychological studies have shown how humans can indeed<br />
perform this task with an accuracy significantly higher than chance. Here we propose two systems which can solve this<br />
task comparably well, using purely pattern recognition techniques. We hypothesize that such systems could be put to practical<br />
use in multimodal biometric and surveillance systems.<br />
13:30-16:30, Paper ThBCT9.29<br />
Image Parsing with a Three-State Series Neural Network Classifier<br />
Seyedhosseini Tarzjani, Seyed Mojtaba, Univ. of Utah<br />
Paiva, Antonio, Univ. of Utah<br />
Tasdizen, Tolga, Univ. of Utah<br />
We propose a three-state series neural network for effective propagation of context and uncertainty information for image<br />
parsing. The activation functions used in the proposed model have three states instead of the normal two states. This makes<br />
the neural network more flexible than the two-state neural network, and allows for uncertainty to be propagated through<br />
the stages. In other words, decisions about difficult pixels can be left for later stages which have access to more contextual<br />
information than earlier stages. We applied the proposed method to three different datasets and experimental results demonstrate<br />
higher performance of the three-state series neural network.<br />
13:30-16:30, Paper ThBCT9.30<br />
Pan-Sharpening using an Adaptive Linear Model<br />
Liu, Lining, Beihang Univ.<br />
Wang, Yiding, North China Univ. of Tech.<br />
Wang, Yunhong, Beihang Univ.<br />
Yu, Haiyan, Beihang Univ.<br />
In this paper, we propose an algorithm to synthesize high-resolution multispectral images by fusing panchromatic (Pan)<br />
images and multispectral (MS) images. The algorithm is based on an adaptive linear model, which is automatically estimated<br />
by least-squares fitting. In this model, a virtual difference band is appended to the MS to guarantee the correlation<br />
between the Pan and MS. Then, an iterative procedure is carried out to generate the fused images using the steepest descent<br />
method. The efficiency of the presented technique is tested by performing pan-sharpening of IKONOS, QuickBird, and<br />
Landsat-7 ETM+ datasets. Experimental results show that our method provides better fusion results than other methods.<br />
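The first ingredient above, fitting the Pan band as a linear combination of the MS bands by least squares, can be sketched as follows; the global per-image fit and the variable names are assumptions for illustration:<br />

```python
import numpy as np

def fit_linear_pan_model(ms_pixels, pan_pixels):
    """Least-squares weights w such that pan ≈ ms_pixels @ w.

    ms_pixels:  (n_pixels, n_bands) multispectral samples
    pan_pixels: (n_pixels,) co-registered panchromatic samples
    """
    w, *_ = np.linalg.lstsq(ms_pixels, pan_pixels, rcond=None)
    return w
```

The fitted weights adapt automatically to each sensor's spectral response, which is what makes the linear model "adaptive" across IKONOS, QuickBird and Landsat-7 data.<br />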
13:30-16:30, Paper ThBCT9.31<br />
A Study of Voice Source and Vocal Tract Filter based Features in Cognitive Load Classification<br />
Le, Phu, The Univ. of New South Wales<br />
Epps, Julien, The Univ. of New South Wales<br />
Choi, Eric, National ICT Australia<br />
Ambikairajah, Eliathamby, The Univ. of New South Wales<br />
Speech has been recognized as an attractive method for the measurement of cognitive load. Previous approaches have<br />
used mel frequency cepstral coefficients (MFCCs) as discriminative features to classify cognitive load. The MFCCs contain<br />
information from both the voice source and the vocal tract, so that the individual contributions of each to cognitive load<br />
variation are unclear. This paper aims to extract speech features related to either the voice source or the vocal tract and use<br />
them to discriminate between cognitive load levels in order to identify the individual contribution of each for cognitive<br />
load measurement. Voice source-related features are then used to improve the performance of current cognitive load classification<br />
systems, using adapted Gaussian mixture models. Our experimental results show that the use of voice source<br />
features could yield around a 12% reduction in relative error rate compared with the baseline system based on MFCCs, intensity,<br />
and pitch contour.<br />
13:30-16:30, Paper ThBCT9.32<br />
Adaptive Enhancement with Speckle Reduction for SAR Images using Mirror-Extended Curvelet and PSO<br />
Li, Ying, Northwestern Pol. Univ.<br />
Hongli, Gong, Northwestern Pol. Univ.<br />
Wang, Qing, Northwestern Pol. Univ.<br />
Speckle and low contrast can cause image degradation, which reduces the detectability of targets and impedes further investigation<br />
of synthetic aperture radar (SAR) images. This paper presents an adaptive enhancement method with speckle<br />
reduction for SAR images using the mirror-extended curvelet (ME-curvelet) transform and particle swarm optimization<br />
(PSO). First, an improved enhancement function is proposed to nonlinearly shrink and stretch the curvelet coefficients.<br />
Then, a novel objective evaluation criterion is introduced to adaptively obtain the optimal parameters in the enhancement<br />
function. Finally, a PSO algorithm with two improvements is used as a global search strategy for the best enhanced image.<br />
Experimental results indicate that the proposed method can reduce the speckle and enhance the edge features and the contrast<br />
of SAR images better in comparison with the wavelet-based and curvelet-based non-adaptive enhancement methods.<br />
13:30-16:30, Paper ThBCT9.33<br />
Recursive Video Matting and Denoising<br />
Prabhu, Sahana, Indian Inst. of Tech. Madras<br />
Ambasamudram, Rajagopalan, Indian Inst. of Tech. Madras<br />
In this paper, we propose a video matting method with simultaneous noise reduction based on the Unscented Kalman filter<br />
(UKF). This recursive approach extracts the alpha mattes and denoised foregrounds from noisy videos, in a unified framework.<br />
No assumptions are made about the type of motion of the camera or of the foreground object in the video. Moreover,<br />
user-specified trimaps are required only once every ten frames. In order to accurately extract information at the borders<br />
between the foreground and the background, we include a discontinuity-adaptive Markov random field (MRF) prior. It<br />
incorporates spatio-temporal information from the current and previous frame during estimation of the alpha matte as well<br />
as the foreground. Results are given on videos with real film-grain noise.<br />
13:30-16:30, Paper ThBCT9.35<br />
The Effects of Radiometry on the Accuracy of Intensity based Registration<br />
Selby, Boris Peter, Medcom GmbH<br />
Sakas, Georgios, Fraunhofer IGD<br />
Walter, Stefan, Medcom GmbH<br />
Groch, Wolf-Dieter, Univ. of Applied Sciences Darmstadt<br />
Stilla, Uwe, Tech. Univ. Muenchen<br />
Besides several other factors, radiometric differences between a reference and a floating image greatly influence the achievable<br />
accuracy of image registration. In this work we derive the magnitude of registration inaccuracy arising from changes<br />
in radiometric properties. This is done for the example of medical X-ray image registration. We therefore estimate the<br />
change of image intensity with respect to object shape, X-ray attenuation of the object material and the initial X-ray energy<br />
by modeling a simplified image formation process. The change in intensity is then used to determine a closed form estimation<br />
of the resulting registration error, independent from a specific registration algorithm. Finally the theoretical calculations<br />
are compared to the accuracy of intensity based registration performed on X-ray images with different radiometric<br />
properties. Results show that the herewith derived accuracy estimation is well suited to predict the achievable accuracy of<br />
a registration for images with radiometric differences.<br />
13:30-16:30, Paper ThBCT9.36<br />
Fence Removal from Multi-Focus Images<br />
Yamashita, Atsushi, Shizuoka Univ.<br />
Matsui, Akiyoshi, Shizuoka Univ.<br />
Kaneko, Toru, Shizuoka Univ.<br />
When an image of a scene is captured by a camera through a fence, a blurred fence image interrupts objects in the scene.<br />
In this paper, we propose a method for fence removal from such images using multiple focusing. Most previous methods<br />
interpolate the interrupted regions by using information of surrounding textures. However, these methods fail when information<br />
of surrounding textures is not rich. On the other hand, there are methods that acquire multiple images for image<br />
restoration and composite them to generate a new clear image. The latter approach is adopted because it is robust and accurate.<br />
Multi-focus images are acquired and "defocusing" information is utilized to generate a clear image. Experimental<br />
results show the effectiveness of the proposed method.<br />
13:30-16:30, Paper ThBCT9.37<br />
Information Theoretic Expectation Maximization based Gaussian Mixture Modeling for Speaker Verification<br />
Memon, Sheeraz, RMIT Univ.<br />
Lech, Margaret, RMIT Univ.<br />
Namunu, Maddage, RMIT Univ.<br />
The expectation maximization (EM) algorithm is widely used in the Gaussian mixture model (GMM) as the state-of-art<br />
statistical modeling technique. Like the classical EM method, the proposed EM-Information Theoretic (EM-IT) algorithm<br />
adapts means, covariances and weights; however, this process is not conducted directly on feature vectors but on a<br />
smaller set of centroids derived by the information theoretic procedure, which simultaneously minimizes the divergence<br />
between the Parzen estimates of the feature vector’s distribution within a given Gaussian component and the centroid’s<br />
distribution within the same Gaussian component. The EM-IT algorithm was applied to the speaker verification problem<br />
using the NIST 2004 speech corpus and MFCCs with dynamic features. The results showed an improvement of the equal<br />
error rate (EER) by 1.5% over the classical EM approach. The EM-IT also showed higher convergence rates compared to<br />
the EM method.<br />
13:30-16:30, Paper ThBCT9.38<br />
A Gaussian Process Regression Framework for Spatial Error Concealment with Adaptive Kernels<br />
Asheri, Hadi, Sharif Univ. of Tech.<br />
Rabiee, Hamid Reza, Sharif Univ. of Tech.<br />
Pourdamghani, Nima, Sharif Univ. of Tech.<br />
Rohban, Mohammad H., Sharif Univ. of Tech.<br />
We have developed a Gaussian Process Regression method with adaptive kernels for concealment of the missing macroblocks<br />
of block-based video compression schemes in a packet video system. In addition to producing promising results, the proposed algorithm<br />
provides a solid framework for further improvements. In this paper, the problem of estimating lost macro-blocks<br />
will be solved by estimating the proper covariance function of the Gaussian process defined over a region around the missing<br />
macro-blocks (i.e. its kernel function). In order to preserve block edges, the kernel is constructed adaptively by using<br />
the local edge related information. Moreover, we can achieve more improvement by local estimation of the kernel parameters.<br />
While restoring the prominent edges of the missing macro-blocks, the proposed method produces perceptually<br />
smooth concealed frames. Objective and subjective evaluations verify the effectiveness of the proposed method.<br />
13:30-16:30, Paper ThBCT9.39<br />
Colour Constant Image Sharpening<br />
Alsam, Ali, Sør-Trøndelag Univ. Coll.<br />
In this paper, we introduce a new sharpening method which guarantees colour constancy and resolves the problem of equiluminance<br />
colours. The algorithm is similar to unsharp masking in that the gradients are calculated at different scales by<br />
blurring the original with a variable size kernel. The main difference is in the blurring stage where we calculate the average<br />
of an n × n neighborhood by projecting each colour vector onto the space of the center pixel before averaging. Thus<br />
starting with the center pixel we define a projection matrix onto the space of that vector. Each neighboring colour is then<br />
projected onto the center and the result is summed up. The projection step results in an average vector which shares the<br />
direction of the original center pixel. The difference between the center pixel and the average is by definition a vector<br />
which is a scalar multiple of the center pixel. Thus adding the average to the center pixel is guaranteed not to result in colour<br />
shifts. This projection step is also shown to remedy the problem of equiluminance colours and can be used for m-dimensional<br />
data. Finally, the results indicate that the new sharpening method results in better sharpening than that achieved<br />
using unsharp masking, with noticeably fewer halos around strong edges. The latter aspect of the algorithm is believed to be<br />
due to the asymmetric nature of the projection step.<br />
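The projection-then-average step described above can be sketched for a single pixel; the 3-vector RGB layout and the sharpening gain are illustrative assumptions:<br />

```python
import numpy as np

def directional_unsharp(center, neighbours, gain=0.5):
    # project each neighbour colour onto the direction of the center pixel,
    # average the projections, and sharpen along that same direction
    u = center / np.linalg.norm(center)            # unit direction of center colour
    proj = np.array([(c @ u) * u for c in neighbours])
    avg = proj.mean(axis=0)
    # (center - avg) is parallel to u, so the result keeps the center's hue
    return center + gain * (center - avg)
```

Because every vector involved lies along the center pixel's direction, the sharpened pixel can only change in magnitude, which is the colour-constancy property the abstract claims.<br />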
13:30-16:30, Paper ThBCT9.40<br />
Maximally Stable Texture Regions<br />
Güney, Mesut, Turkish Naval Academy<br />
Arica, Nafiz, Turkish Naval Academy<br />
In this study, we propose to detect interest regions based on texture information of images. For this purpose, Maximally<br />
Stable Extremal Regions (MSER) approach is extended using the high dimensional texture features of image pixels. The<br />
regions with different textures from their vicinity are detected using agglomerative clustering successively. The proposed<br />
approach is evaluated in terms of repeatability and matching scores in an experimental setup used in the literature. It outperforms<br />
the intensity and color based detectors, especially in images containing textured regions. It performs better<br />
under transformations including viewpoint change, blurring, illumination change and JPEG compression, while producing comparable<br />
results in the other transformations tested in the experiments.<br />
13:30-16:30, Paper ThBCT9.41<br />
Combining the Likelihood and the Kullback-Leibler Distance in Estimating the Universal Background Model for<br />
Speaker Verification using SVM<br />
Lei, Zhenchun, Jiangxi Normal Univ.<br />
The state-of-the-art methods for speaker verification are based on the support vector machine. The Gaussian supervector<br />
SVM is a typical method which uses the Gaussian mixture model for creating feature vectors for the discriminative SVM.<br />
All GMMs are adapted from the same universal background model (UBM), which is obtained by maximum likelihood estimation<br />
on a large number of data sets, so the UBM should cover the feature space as widely as possible. We propose a new method<br />
to estimate the parameters of the UBM by combining the likelihood and Kullback-Leibler distances. Its<br />
aim is to find model parameters that achieve a high likelihood value while the Gaussian distributions are dispersed to<br />
cover the feature space broadly. Experiments on the NIST 2001 task show that our method can improve performance<br />
noticeably.<br />
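One ingredient of such a criterion, the Kullback-Leibler distance between Gaussian components, has a closed form for diagonal covariances; this standalone formula is shown for illustration and is not the authors' full objective:<br />

```python
import numpy as np

def kl_diag_gauss(m1, v1, m2, v2):
    """KL( N(m1, diag(v1)) || N(m2, diag(v2)) ) in closed form."""
    m1, v1, m2, v2 = map(np.asarray, (m1, v1, m2, v2))
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
```

Maximizing pairwise divergences of this kind pushes the mixture components apart, which is the "dispersed to cover the feature space" behaviour the abstract aims for.<br />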
13:30-16:30, Paper ThBCT9.42<br />
Asymmetric Generalized Gaussian Mixture Models and EM Algorithm for Image Segmentation<br />
Nacereddine, Nafaa, LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Ziou, Djemel, Sherbrooke Univ.<br />
Hamami, Latifa, Ec. Nationale Pol.<br />
In this paper, a parametric and unsupervised histogram-based image segmentation method is presented. The histogram is<br />
assumed to be a mixture of asymmetric generalized Gaussian distributions. The mixture parameters are estimated by using<br />
the Expectation Maximization algorithm. Histogram fitting and region uniformity measures on synthetic and real images<br />
reveal the effectiveness of the proposed model compared to the generalized Gaussian mixture model.<br />
13:30-16:30, Paper ThBCT9.43<br />
Color Connectedness Degree for Mean-Shift Tracking<br />
Gouiffès, Michèle, IEF Univ. Paris Sud 11<br />
Laguzet, Florence, LRI Univ. Paris Sud 11<br />
Lacassagne, Lionel, IEF Univ. Paris Sud 11<br />
This paper proposes an extension to the mean shift tracking. We introduce the color connectedness degrees (CCD) which,<br />
more than providing statistical information about the target to track, embeds information about the amount of connectedness<br />
of the color intervals which compose the target. With a low increase in complexity, this approach provides better robustness<br />
and tracking quality compared to the use of the RGB space. This is confirmed by experiments performed on<br />
several sequences showing vehicles and pedestrians in various contexts.<br />
13:30-16:30, Paper ThBCT9.44<br />
Signal-To-Signal Ratio Independent Speaker Identification for Co-Channel Speech Signals<br />
Saeidi, Rahim, Univ. of Eastern Finland<br />
Mowlaee, Pejman, Aalborg Univ.<br />
Kinnunen, Tomi, Univ. of Eastern Finland<br />
Tan, Zheng-Hua, Aalborg Univ.<br />
Christensen, Mads Græsbøll, Aalborg Univ.<br />
Jensen, Søren Holdt, Aalborg Univ.<br />
Fränti, Pasi, Univ. of Eastern Finland<br />
In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from two speakers is<br />
recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition<br />
accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper,<br />
we approach the problem without estimating SSR. We show that a simple method based on fusion of adapted Gaussian<br />
mixture models and the Kullback-Leibler divergence calculated between models achieves an accuracy of 97% and 93% when<br />
the two target speakers are listed among the three and two most probable speakers, respectively.<br />
13:30-16:30, Paper ThBCT9.45<br />
Selection of Training Instances for Music Genre Classification<br />
Lopes, Miguel, INESC Porto<br />
Gouyon, Fabien, INESC Porto<br />
Koerich, Alessandro, PUCPR<br />
Oliveira, Luiz, Federal Univ. of Parana<br />
In this paper we present a method for the selection of training instances based on the classification accuracy of a SVM<br />
classifier. The instances consist of feature vectors representing short-term, low-level characteristics of music audio signals.<br />
The objective is to build, from only a portion of the training data, a music genre classifier with at least similar performance<br />
as when the whole data is used. The particularity of our approach lies in a pre-classification of instances prior to the main<br />
classifier training: i.e. we select from the training data those instances that show better discrimination with respect to class<br />
memberships. On a very challenging dataset of 900 music pieces divided among 10 music genres, the instance selection<br />
method slightly improves the music genre classification by 2.4 percentage points. On the other hand, the resulting classification<br />
model is significantly reduced, permitting much faster classification over test data.<br />
13:30-16:30, Paper ThBCT9.46<br />
Semi-Blind Speech-Music Separation using Sparsity and Continuity Priors<br />
Erdogan, Hakan, Sabanci Univ.<br />
M. Grais, Emad, Sabanci Univ.<br />
In this paper we propose an approach for the problem of single channel source separation of speech and music signals.<br />
Our approach is based on representing each source’s power spectral density using dictionaries and nonlinearly projecting<br />
the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the<br />
dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose to use a novel coordinate<br />
descent technique for optimization, which nicely handles nonnegativity constraints and nonquadratic penalty terms.<br />
We use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture data after<br />
the corresponding power spectral densities (PSDs) are estimated for each source. Using conventional metrics, we measure<br />
the performance of the system on simulated mixtures of single-person speech and piano music sources. The results indicate<br />
that the proposed method is a promising technique for low speech-to-music ratio conditions and that sparsity and continuity<br />
priors help improve the performance of the proposed system.<br />
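The nonnegative projection with a sparsity penalty described above can be sketched with a minimal coordinate-descent loop. This is a simplified stand-in, not the paper's solver: it handles only an L1 (sparsity) penalty on the coefficients, omits the continuity prior, and all parameter values are illustrative assumptions.

```python
# Minimal coordinate descent for projecting a mixture spectrum x onto a
# combined dictionary (list of atoms) under nonnegativity and an L1
# sparsity penalty: minimize ||x - sum_j w_j a_j||^2 + lam * sum_j w_j,
# with w_j >= 0. Each coordinate has a closed-form clipped update.

def sparse_project(x, atoms, lam=0.1, iters=50):
    w = [0.0] * len(atoms)
    for _ in range(iters):
        for j, a in enumerate(atoms):
            # residual with atom j's contribution removed
            r = [x[i] - sum(w[k] * atoms[k][i]
                            for k in range(len(atoms)) if k != j)
                 for i in range(len(x))]
            num = sum(a[i] * r[i] for i in range(len(x))) - lam / 2.0
            den = sum(ai * ai for ai in a)
            w[j] = max(0.0, num / den)  # clip at zero: nonnegativity
    return w
```

The clipping in the coordinate update is what makes the nonnegativity constraint easy to handle, which is the appeal of coordinate descent the abstract points to.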
- 325 -
13:30-16:30, Paper ThBCT9.47<br />
Comparative Analysis for Detecting Objects under Cast Shadows in Video Images<br />
Villamizar Vergel, Michael, CSIC-UPC<br />
Scandaliaris, Jorge, CSIC-UPC<br />
Sanfeliu, Alberto, Univ. Pol. de Catalunya<br />
Cast shadows make object detection more difficult because they locally modify image intensity and color. Shadows<br />
may appear or disappear in an image when the object, the camera, or both are free to move through a scene. This<br />
work evaluates the performance of an object detection method based on boosted HOG paired with three different image<br />
representations in outdoor video sequences. We follow and extend the taxonomy of van de Sande with considerations<br />
of the constraints each descriptor assumes about the spatial variation of the illumination. We show that the intrinsic image<br />
representation consistently gives the best results. This demonstrates the usefulness of this representation for object detection under<br />
varying illumination conditions, and supports the idea that in practice the local assumptions in the descriptors can be violated.<br />
13:30-16:30, Paper ThBCT9.48<br />
Shape-Appearance Guided Level-Set Deformable Model for Image Segmentation<br />
Khalifa, Fahmi, Univ. of Louisville<br />
El-Baz, Ayman, Univ. of Louisville<br />
Gimel’Farb, Georgy, Univ. of Auckland<br />
Abou El-Ghar, Mohamed, Univ. of Mansoura<br />
A new speed function to guide evolution of a level-set based active contour is proposed for segmenting an object from its<br />
background in a given image. The guidance accounts for a learned spatially variant statistical shape prior, 1st-order visual<br />
appearance descriptors of the contour interior and exterior (associated with the object and background, respectively), and<br />
a spatially invariant 2nd-order homogeneity descriptor. The shape prior is learned from a subset of co-aligned training images.<br />
The visual appearances are described with marginal gray level distributions obtained by separating their mixture<br />
over the image. The evolving contour interior is modeled by a 2nd-order translation and rotation invariant Markov-Gibbs<br />
random field of object/background labels with analytically estimated potentials. Experiments with kidney CT images confirm<br />
the robustness and accuracy of the proposed approach.<br />
13:30-16:30, Paper ThBCT9.49<br />
Iterative Ramp Sharpening for Structure/Signature-Preserving Simplification of Images<br />
Grazzini, Jacopo, Los Alamos National Lab.<br />
Soille, Pierre, EC Joint Res. Centre<br />
In this paper, we present a simple heuristic ramp-sharpening algorithm that achieves local contrast enhancement of<br />
vector-valued images. The proposed algorithm performs pixel-wise comparisons of intensity values, gradient strength, and<br />
directional information in order to locate transition ramps around true edges in the image. The sharpening is then applied<br />
only to those pixels found on the ramps. In this way, the contrast between objects and regions separated by a ramp is enhanced<br />
correspondingly while avoiding ringing artifacts. Applying this technique iteratively to blurred<br />
imagery produces sharpening that preserves both the structure and the signature of the image. The final approach reaches a good<br />
compromise between complexity and effectiveness for image simplification, efficiently enhancing image<br />
details while maintaining the overall image appearance.<br />
13:30-16:30, Paper ThBCT9.50<br />
Learning Naive Bayes Classifiers for Music Classification and Retrieval<br />
Fu, Zhouyu, Monash Univ.<br />
Lu, Guojun, Monash Univ.<br />
Ting, Kai Ming, Monash Univ.<br />
Zhang, Dengsheng, Monash Univ.<br />
In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ<br />
all audio features extracted from local windows for classification instead of just using a single song-level feature<br />
vector produced by compressing the local features. Two variants of naive Bayes classifiers are studied based on the extensions<br />
of standard nearest neighbor and support vector machine classifiers. Experimental results demonstrate the superior<br />
performance achieved by the proposed naive Bayes classifiers for both music classification and retrieval as compared<br />
to the alternative methods.<br />
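The song-level idea above, scoring every local window under each class model and combining the scores rather than classifying one compressed song-level vector, can be sketched as follows. A diagonal Gaussian per class is used here as a hedged stand-in for the paper's nearest-neighbor and SVM-based variants; all function names are illustrative.

```python
# Naive Bayes over local windows: sum per-window log-likelihoods under each
# class model (naive independence assumption across windows) and pick the
# best-scoring class. A diagonal Gaussian stands in for the paper's models.
import math

def fit_gaussian(windows):
    """Per-dimension mean/variance over all local feature windows of a class."""
    d, n = len(windows[0]), len(windows)
    mu = [sum(w[i] for w in windows) / n for i in range(d)]
    var = [max(1e-6, sum((w[i] - mu[i]) ** 2 for w in windows) / n)
           for i in range(d)]
    return mu, var

def log_lik(window, model):
    mu, var = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(window, mu, var))

def classify_song(windows, models):
    """Combine window scores per class; no song-level feature compression."""
    scores = {lab: sum(log_lik(w, m) for w in windows)
              for lab, m in models.items()}
    return max(scores, key=scores.get)
```

Because every window contributes to the score, information that would be lost by compressing a song into one vector is retained at classification time.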
- 326 -
13:30-16:30, Paper ThBCT9.52<br />
An Empirical Study of Feature Extraction Methods for Audio Classification<br />
Parker, Charles, Eastman Kodak Company<br />
With the growing popularity of video sharing web sites and the increasing use of consumer-level video capture devices,<br />
new algorithms are needed for intelligent searching and indexing of such data. The audio from these video streams is particularly<br />
challenging due to its low quality and high variability. Here, we perform a broad empirical study of features used<br />
for intelligent audio processing. We perform experiments on a dataset of 200 consumer videos over which we attempt to<br />
detect 10 semantic audio concepts.<br />
13:30-16:30, Paper ThBCT9.53<br />
Geometric Total Variation for Texture Deformation<br />
Bespalov, Dmitriy, Drexel Univ.<br />
Dahl, Anders, Tech. Univ. of Denmark<br />
Shokoufandeh, Ali, Drexel Univ.<br />
In this work we propose a novel variational method that we intend to use for estimating non-rigid texture deformation.<br />
The method is able to capture variation in gray-scale images with respect to the geometry of their features. Accurate localization<br />
of features in the presence of unknown deformations is a crucial property for texture characterization. Our experimental<br />
evaluations demonstrate that accounting for the geometry of features in texture images leads to significant<br />
improvements in the localization of these features when textures undergo geometric transformations. In addition, feature<br />
descriptors using geometric total variation energies discriminate between various regular textures with accuracy comparable<br />
to SIFT descriptors, while the reduced dimensionality of the TVG descriptor yields significant improvements over SIFT<br />
in terms of retrieval time.<br />
13:30-16:30, Paper ThBCT9.54<br />
A Novel Approach to Detect Ship-Radiated Signal based on HMT<br />
Zhou, Yue, Shanghai Jiaotong Univ.<br />
Niu, Zhibin, Shanghai Jiaotong Univ.<br />
Wang, Chenhao, Shanghai Jiaotong Univ.<br />
We propose a method for the detection of underwater ship-radiated signals in the presence of non-Gaussian noise. The<br />
wavelet decomposition of the underwater signal yields a natural tree structure, which is further modeled by a Hidden<br />
Markov Tree (HMT). The signal is thus represented by the parameters of the corresponding HMT. We analyze the<br />
likelihood defined on these parameters and form a new detection criterion. Experimental results demonstrate that our<br />
method provides a reliable and robust solution.<br />
13:30-16:30, Paper ThBCT9.55<br />
Speech Emotion Analysis in Noisy Real-World Environment<br />
Tawari, Ashish, Univ. of California, San Diego<br />
Trivedi, Mohan, Univ. of California, San Diego<br />
Automatic recognition of emotional states via speech signal has attracted increasing attention in recent years. A number<br />
of techniques have been proposed which are capable of providing reasonably high accuracy for controlled studio settings.<br />
However, their performance degrades considerably when the speech signal is contaminated by noise. In this paper, we<br />
present a framework with adaptive noise cancellation as a front end to a speech emotion recognizer. We also introduce a new<br />
feature set based on cepstral analysis of pitch and energy contours. Experimental analysis shows promising results.<br />
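An adaptive noise cancellation front end of the kind mentioned above is classically built from an LMS adaptive filter; the sketch below shows that standard construction, not the authors' specific filter, and the tap count and step size are illustrative assumptions.

```python
# Toy LMS adaptive noise canceller: the primary channel carries speech plus
# noise, the reference channel carries correlated noise only. The filter
# learns to predict the noise from the reference; the prediction error is
# the noise-reduced speech estimate.

def lms_cancel(primary, reference, taps=4, mu=0.05):
    w = [0.0] * taps          # filter weights
    buf = [0.0] * taps        # sliding window of recent reference samples
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y                                   # error = cleaned sample
        w = [wi + 2 * mu * e * bi for wi, bi in zip(w, buf)]  # LMS update
        out.append(e)
    return out
```

When the primary channel contains only noise that is a filtered version of the reference, the error signal converges toward zero, which is the sense in which the front end "cancels" the noise before emotion features are extracted.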
13:30-16:30, Paper ThBCT9.56<br />
Sampling and Ideal Reconstruction on the 3D Diamond Grid<br />
Strand, Robin, Uppsala Univ.<br />
This paper presents basic, yet important, properties that can be used when developing methods for image acquisition, processing,<br />
and visualization on the diamond grid. The sampling density needed to reconstruct a band-limited signal and the<br />
ideal interpolation function on the diamond grid are derived.<br />
- 327 -
13:30-16:30, Paper ThBCT9.57<br />
Detecting Faint Compact Sources using Local Features and a Boosting Approach<br />
Torrent, Albert, Univ. of Girona<br />
Peracaula, Marta, Univ. of Girona<br />
Llado, Xavier, Univ. of Girona<br />
Freixenet, Jordi, Univ. of Girona<br />
Sanchez-Sutil, Juan Ramon, Univ. de Jaén<br />
Martí, Josep, Univ. de Jaén<br />
Paredes, Josep Maria, Univ. de Barcelona<br />
Several techniques have been proposed so far to perform faint compact source detection in wide-field interferometric<br />
radio images. However, all these methods can easily miss some detections or produce a high number of false positive<br />
detections due to the low intensity of the sources, the noise, and the interferometric patterns present in the images.<br />
In this paper we present a novel strategy to tackle this problem. Our approach is based on using local features extracted<br />
from a bank of filters in order to provide a description of different types of faint source structures. We then perform a<br />
training step to automatically learn and select the most salient features, which are used in a boosting classifier to<br />
perform the detection. The validity of our method is demonstrated using 19 real images that compose a radio mosaic. A<br />
comparison with two well-known state-of-the-art methods shows that our approach obtains more source detections<br />
while also reducing the number of false positives.<br />
13:30-16:30, Paper ThBCT9.58<br />
Automatic Hair Detection in the Wild<br />
Julian, Pauline, IRIT, FittingBox<br />
Dehais, Christophe, FittingBox<br />
Lauze, Francois, Univ. of Copenhagen<br />
Charvillat, Vincent, IRIT<br />
Bartoli, Adrien, UdA<br />
Choukroun, Ariel, FittingBox<br />
This paper presents an algorithm for segmenting the hair region in images taken in uncontrolled, real-life conditions. Our method<br />
is based on a simple statistical hair shape model representing the upper part of the hair. We detect this region by minimizing an<br />
energy that combines active shape and active contour models. The upper hair region then allows us to learn the hair appearance parameters<br />
(color and texture) for the image considered. Finally, those parameters drive a pixel-wise segmentation technique<br />
that yields the desired (complete) hair region. We demonstrate the applicability of our method on several real images.<br />
13:30-16:30, Paper ThBCT9.59<br />
De-Noising of SRμCT Fiber Images by Total Variation Minimization<br />
Lindblad, Joakim, Swedish Univ. of Agricultural Sciences<br />
Sladoje, Natasa, Univ. of Novi Sad<br />
Lukic, Tibor, Univ. of Novi Sad<br />
SRμCT images of paper and pulp fiber materials are characterized by a low signal-to-noise ratio. De-noising is therefore a<br />
common preprocessing step before segmentation into fiber and background components. We suggest a de-noising<br />
method based on total variation minimization using a modified Spectral Conjugate Gradient algorithm. Quantitative<br />
evaluation on synthetic 3D data and qualitative evaluation on real 3D paper fiber data confirm the appropriateness<br />
of the suggested method for the particular application.<br />
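The total-variation objective behind such de-noising can be illustrated on a 1-D signal. The sketch below minimizes a smoothed TV energy by plain gradient descent; it is a toy stand-in for intuition only, since the paper uses a modified Spectral Conjugate Gradient solver on 3-D volumes, and the regularization weight, smoothing epsilon, and step size are illustrative assumptions.

```python
# 1-D total-variation de-noising: minimize
#   sum_i (u_i - f_i)^2 + lam * sum_i sqrt((u_{i+1} - u_i)^2 + eps)
# by gradient descent. The sqrt(.^2 + eps) term is a smoothed |.| so the
# energy is differentiable everywhere.
import math

def tv_denoise(f, lam=1.0, eps=1e-2, step=0.02, iters=300):
    u = list(f)
    n = len(u)
    for _ in range(iters):
        grad = [2.0 * (u[i] - f[i]) for i in range(n)]  # data-fidelity term
        for i in range(n - 1):
            d = u[i + 1] - u[i]
            g = lam * d / math.sqrt(d * d + eps)  # d/du of smoothed |d|
            grad[i] -= g
            grad[i + 1] += g
        u = [u[i] - step * grad[i] for i in range(n)]
    return u
```

Each descent step trades data fidelity against total variation, flattening small oscillations (noise) while the fidelity term keeps large jumps (structure) in place, which is why TV minimization suits the fiber/background segmentation setting.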
- 328 -