ICPR 2010 Abstract Book

CONTENTS

Organizing Committees
Tracks and Co-Chairs
Message from the General Chair
Message from the Technical Program Chairs
Technical Program Overview
Technical Program for Monday
Technical Program for Tuesday
Technical Program for Wednesday
Technical Program for Thursday


Organizing Committees

Conference Chair
Aytül Erçil, Sabanci University, Turkey

Technical Co-Chairs
Kim Boyer, Rensselaer Polytechnic Institute, USA
Müjdat Çetin, Sabanci University, Turkey
Seong-Whan Lee, Korea University, Korea

Advisory Committee
Sergey Ablameyko, National Academy of Sciences, Belarus
Hüseyin Abut, San Diego State University, USA
Jake Aggarwal, University of Texas, USA
Horst Bunke, University of Bern, Switzerland
Rama Chellappa, University of Maryland, USA
Igor B. Gurevich, Russian Academy of Sciences, Russia
Anil K. Jain, Michigan State University, USA
Takeo Kanade, Carnegie Mellon University, USA
Rangachar Kasturi, University of South Florida, USA
Josef Kittler, University of Surrey, UK
Brian Lovell, University of Queensland, Australia
Theo Pavlidis, Stony Brook University, USA
Pietro Perona, California Institute of Technology, USA
Fatih Porikli, MERL, USA
Alberto Sanfeliu, Polytechnic University of Catalonia, Spain
Bülent Sankur, Bogazici University, Turkey
Bernhard Schölkopf, Max Planck Institutes, Germany
Mubarak Shah, University of Central Florida, USA
Tieniu Tan, National Laboratory of Pattern Recognition, China
Sergios Theodoridis, University of Athens, Greece

Plenary Speakers Committee
Anil K. Jain, Michigan State University, USA

Tutorials
Denis Laurendeau, Laval University, Canada
Arun Ross, West Virginia University, USA
Birsen Yazıcı, Rensselaer Polytechnic Institute, USA

Workshops
Selim Aksoy, Bilkent University, Turkey
Theo Gevers, University of Amsterdam, The Netherlands
Denis Laurendeau, Laval University, Canada
Bülent Sankur, Bogazici University, Turkey

Contest Organization
Selim Aksoy, Bilkent University, Turkey
Zehra Çataltepe, Istanbul Technical University, Turkey
Devrim Ünay, Bahcesehir University, Turkey

Publicity
Enis Çetin, Bilkent University, Turkey
Pınar Duygulu Şahin, Bilkent University, Turkey

Asian Liaisons
Karthik Nandakumar, Institute for Infocomm Research, Singapore
Yunhou Wang, Beihang University, China

European Liaisons
Javier Ortega-Garcia, Universidad Autonoma de Madrid, Spain
Fabio Roli, University of Cagliari, Italy

American Liaisons
Deniz Erdoğmuş, Northeastern University, USA

Publications
Nafiz Arıca, Naval Academy, Turkey
Cem Ünsalan, Yeditepe University, Turkey

Local Arrangements
Ayşın Baytan Ertüzün, Bogazici University, Turkey
Mustafa Ünel, Sabanci University, Turkey

Finance
Gülbin Akgün, Sabanci University, Turkey
Hakan Erdoğan, Sabanci University, Turkey

Sponsorship
Fatoş Yarman Vural, Middle East Technical University, Turkey

Exhibits
Olcay Kurşun, Istanbul University, Turkey


Tracks and Co-Chairs

Track I: Computer Vision
Joachim Buhmann, ETH Zurich, Switzerland
Xiaoyi Jiang, University of Münster, Germany
Jussi Parkkinen, University of Joensuu, Finland
Alper Yılmaz, Ohio State University, USA

Area Co-Chairs:
Ahmet Ekin, Philips Research Europe, The Netherlands
Georgy Gimel’farb, University of Auckland, New Zealand
Muhittin Gökmen, Istanbul Technical University, Turkey
Atsushi Imiya, Chiba University, Japan
Nikos Paragios, Ecole Centrale de Paris, France
Fatih Porikli, MERL, USA
Sudeep Sarkar, University of South Florida, USA
Bernt Schiele, TU Darmstadt, Germany
Yaser Ajmal Sheikh, Carnegie Mellon, USA
Dacheng Tao, Nanyang Technological University, Singapore

Track II: Pattern Recognition and Machine Learning
G. Sanniti di Baja, Istituto di Cibernetica Eduardo Caianiello, Italy
Mario Figueiredo, Instituto Superior Tecnico, Portugal
Bilge Günsel, Istanbul Technical University, Turkey
D. Y. Yeung, Hong Kong University of Science and Technology, China

Area Co-Chairs:
Ethem Alpaydın, Bogazici University, Turkey
Gunilla Borgefors, CBA Uppsala, Sweden
Yang Gao, Nanjing University, China
Simone Marinai, University of Florence, Italy
Aleix Martinez, The Ohio State University, USA
Petr Somol, UTIA, Czech Republic
Tolga Taşdizen, University of Utah, USA
Zhi-Hua Zhou, Nanjing University, China

Track III: Signal, Speech, Image and Video Processing
Maria Petrou, Imperial College, UK
Kazuya Takeda, Nagoya University, Japan
Murat Tekalp, Koc University, Turkey
Jean-Philippe Thiran, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland

Track IV: Biometrics and Human Computer Interaction
Lale Akarun, Bogazici University, Turkey
Patrick Flynn, University of Notre Dame, USA
B. Vijaya Kumar, Carnegie Mellon, USA
Stan Z. Li, Chinese Academy of Sciences, China

Track V: Multimedia and Document Analysis, Processing and Retrieval
Nozha Boujemaa, INRIA, France
David Doermann, University of Maryland, USA
B. S. Manjunath, University of California, USA
Nicu Sebe, University of Trento, Italy
Berrin Yanıkoğlu, Sabanci University, Turkey

Track VI: Bioinformatics and Biomedical Applications
Rachid Deriche, INRIA, France
Tianzi Jiang, Chinese Academy of Sciences, China
Elena Marchiori, Radboud University, Netherlands
Dimitris Metaxas, The State University of New Jersey, USA
Gözde Ünal, Sabanci University, Turkey


Message from the General Chair

It is my great honor and privilege to welcome all of you to the 20th International Conference on Pattern Recognition. Over the past 40 years, this conference has brought together the research communities of industry and academia from all over the world to discuss important issues, challenges, and solutions in pattern recognition related problems. The conference has established itself as a forum at which research as well as practical aspects of pattern recognition are enthusiastically addressed. We hope to continue this tradition by offering you another successful forum with an interesting program.

Once again we have a very strong technical program, with technical sessions on computer vision; pattern recognition and machine learning; signal, speech, image and video processing; biometrics and human computer interaction; multimedia and document analysis, processing and retrieval; and bioinformatics and biomedical applications. We are also fortunate to have distinguished invited speakers: Christopher Bishop from Microsoft Research Cambridge, Shree K. Nayar from Columbia University, and Prabhakar Raghavan from Yahoo! Research will share their experiences and vision with us. The conference also has an extremely varied program: there will be 7 interesting tutorials that are an integral part of the program, as well as 9 workshops that allow an even deeper focus on areas that are of interest to the conference participants. A new feature in the program this year is the organization of 9 contests, which will provide a setting where participants have the opportunity to evaluate their algorithms using publicly available datasets and discuss technical topics in an atmosphere that fosters an active exchange of ideas.

A number of organizations, namely Tüpraş (TR), Tübitak (TR), Havelsan (TR), Cybersoft (TR), Savronik (TR), Chryso (TR), Star Alliance (TR), Mitsubishi Electric Research Laboratories (USA), IBM Research (USA) and Elsevier (USA), kindly served as supporters of the conference. We are most grateful to these organizations for their financial support and encouragement. The conference is technically co-sponsored by the IEEE Computer Society, continuing our desire to seek closer collaboration between our two communities.

During this period, I have had the opportunity to work closely with some of the best people in our community. We are extremely grateful to Prof. Müjdat Çetin and Osman Rahmi Fıçıcı, who worked day and night, beyond their professional duties, to make the conference a success. The success of any conference depends heavily on the quality of the selected papers. For selecting the best out of many excellent submitted papers, we are indebted to the technical co-chairs Müjdat Çetin, Kim Boyer and Seong-Whan Lee, all the track chairs, and the external referees for their hard work, which has continued to uphold the high standard that is now customary for this conference series.

Special thanks also go to the conference organizing committee, in particular Ayşın Ertüzün and Mustafa Ünel (Local Arrangements chairs), Cem Ünsalan (Publications chair), Fatoş Yarman Vural (Sponsorship chair), Anil Jain (Plenary Speakers chair), Hakan Erdoğan and Gülbin Akgün (Finance chairs), Olcay Kurşun (Exhibits chair), Denis Laurendeau, Arun Ross and Birsen Yazıcı (Tutorials chairs), Selim Aksoy, Theo Gevers, Denis Laurendeau and Bülent Sankur (Workshops chairs), Selim Aksoy, Zehra Çataltepe and Devrim Ünay (Contest chairs), and Pınar Duygulu and Enis Çetin (Publicity chairs). There would be no conference without them. We also thank the IAPR ExCo members and past chairs of this event for their continued support and advice. We would also like to thank Sabancı University President Prof. Nihat Berker for his support and encouragement. Our special thanks go to the Teamcon staff members, who provided critical support overseeing all the logistics and making the smooth operation of the entire conference possible.

Finally, no conference can ever take place without the support of those individuals who submit their original research results, or without the participants, who honor the conference with their presence.

We hope that you will find the conference both enjoyable and valuable, and that you will also enjoy the architectural, cultural and natural beauty of Istanbul and Turkey.

Aytül Erçil
ICPR 2010 General Chair
Sabancı University, Faculty of Engineering and Natural Sciences


Message from the Technical Program Chairs

The full technical program committee joins the three of us in welcoming you to the 2010 International Conference on Pattern Recognition in beautiful, fascinating İstanbul! This is the 20th edition of ICPR, world-famous as the flagship conference of the International Association for Pattern Recognition. For nearly 40 years, ICPR has been the international forum for reporting the latest advances across a wide spectrum of fields including pattern recognition and machine learning, computer vision, image and signal understanding, medical image analysis, biometrics and human-computer interaction, multimedia and document analysis, bioinformatics and biomedical applications, and more.

The conference program is the work of many people, whose names you will find in the accompanying lists. Track Chairs, in some cases supported by Area Chairs, pored over thoughtful, well-written reviews provided by an extensive set of referees drawn from the broad IAPR community. Preliminary decisions were funneled to a set of Track Chairs and Müjdat Çetin, who met in İstanbul to finalize the program. Papers submitted by Track Chairs were processed by Kim Boyer. General Chair's and Technical Program Chairs' papers were handled by a senior researcher and a separate set of reviewers in a process completely external to the main paper management system. Seong-Whan Lee took the lead on awards.

In all, we received 2140 submissions and accepted 1147, for an acceptance rate of 54%. Of the accepted papers, we were able to accommodate 385 for oral presentation and 762 as posters. This submission number continues an upward trend for ICPR and underscores the health of our scientific community. A slight tightening of the acceptance rate ensures a high-quality meeting, and indeed was necessary to fit into the space and time constraints. It is, however, undoubtedly true that many quality submissions were left out. This is an unfortunate byproduct of the compressed time window in which such a large number of decisions need to be made.

We thank all of the authors who took the time to prepare and submit their work. We are also deeply grateful to all of the reviewers, and especially the Track and Area Chairs, who devoted so much time and expertise to bringing forth a quality meeting.

We are confident that ICPR 2010 will prove to be a rewarding experience, both scientifically as you interact with others at the meeting, and culturally as you enjoy the rich heritage, local cuisine, crafts, shopping, and so much more that İstanbul has to offer.

We look forward to seeing you during our time together, here where the continents meet.

Müjdat Çetin, Kim Boyer, and Seong-Whan Lee
Technical Program Chairs

Technical Program for Monday, August 23, 2010


09:00-09:30, MoOT10 Anadolu Auditorium
Opening Session

09:30-10:30, MoP1L1 Anadolu Auditorium
K.S. Fu Prize Lecture (Plenary Session): Towards the Unification of Structural and Statistical Pattern Recognition
Horst Bunke, Research Group on Computer Vision and Artificial Intelligence (IAM), University of Bern, Switzerland

Statistical pattern recognition is characterized by the use of feature vectors for pattern representation, while the structural approach is based on symbolic data structures, such as strings, trees, and graphs. Clearly, symbolic data structures have a higher representational power than feature vectors because they allow one to directly model relationships that may exist between the individual parts of a pattern. However, many operations that are needed in classification, clustering, and other pattern recognition tasks are not defined for graphs. Consequently, there has been a lack of algorithmic tools in the domain of structural pattern recognition since its beginning. This talk gives an overview of the development of the field of structural pattern recognition and shows various attempts to bridge the gap between statistical and structural pattern recognition, i.e. to make algorithmic tools originally developed for feature vectors applicable to symbolic data structures.

MoAT1 Anadolu Auditorium
Image Analysis - I (Regular Session)
Session chair: Aksoy, Selim (Bilkent Univ.)

11:00-11:20, Paper MoAT1.1
Minimizing Geometric Distance by Iterative Linear Optimization
Chen, Yisong, Peking Univ.
Sun, Jiewei, Peking Univ.
Wang, Guoping, Peking Univ.

This paper proposes an algorithm that solves planar homography by iterative linear optimization. We iteratively employ the direct linear transformation (DLT) algorithm to robustly estimate the homography induced by a given set of point correspondences under perspective transformation. By simple on-the-fly homogeneous coordinate adjustment we progressively minimize the difference between the algebraic error and the geometric error. When the difference is sufficiently close to zero, the geometric error is equivalently minimized and the homography is reliably solved. Backward covariance propagation is employed for error analysis. The experiments show that the algorithm is able to find the global minimum despite erroneous initialization. It gives a very precise estimate at low computational cost and greatly outperforms existing techniques.
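The direct linear transformation (DLT) step that this abstract builds on can be illustrated compactly. The sketch below is a generic, normalized DLT homography estimate in numpy; it is not the authors' iterative reweighting scheme, and the Hartley-style normalization is a standard assumption rather than something stated in the abstract.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H (3x3) mapping src -> dst from N >= 4 point pairs via DLT.

    src, dst: (N, 2) arrays of corresponding image points.
    """
    def normalize(pts):
        # Translate to the centroid and scale so the mean distance is sqrt(2).
        c = pts.mean(axis=0)
        s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
        T = np.array([[s, 0, -s * c[0]],
                      [0, s, -s * c[1]],
                      [0, 0, 1.0]])
        ph = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
        return ph, T

    sh, Ts = normalize(np.asarray(src, float))
    dh, Td = normalize(np.asarray(dst, float))

    # Each correspondence contributes two rows of the linear system A h = 0.
    rows = []
    for (x, y, w), (u, v, _) in zip(sh, dh):
        rows.append([0, 0, 0, -x, -y, -w, v * x, v * y, v * w])
        rows.append([x, y, w, 0, 0, 0, -u * x, -u * y, -u * w])
    A = np.array(rows)

    # h is the right singular vector with the smallest singular value
    # (it minimizes the algebraic error ||A h|| subject to ||h|| = 1).
    _, _, Vt = np.linalg.svd(A)
    Hn = Vt[-1].reshape(3, 3)

    # Undo the normalizing transforms.
    H = np.linalg.inv(Td) @ Hn @ Ts
    return H / H[2, 2]
```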

11:20-11:40, Paper MoAT1.2
Hyper Least Squares and its Applications
Rangarajan, Prasanna, Southern Methodist Univ.
Kanatani, Kenichi, Okayama Univ.
Niitsuma, Hirotaka, Okayama Univ.
Sugaya, Yasuyuki, Toyohashi Univ. of Tech.

We present a new form of least squares (LS), called "hyper LS", for geometric problems that frequently appear in computer vision applications. Doing rigorous error analysis, we maximize the accuracy by introducing a normalization that eliminates statistical bias up to second order noise terms. Our method yields a solution comparable to maximum likelihood (ML) without iterations, even in large noise situations where ML computation fails.

11:40-12:00, Paper MoAT1.3
Integrating a Discrete Motion Model into GMM based Background Subtraction
Wolf, Christian, INSA de Lyon
Jolion, Jean-Michel, Univ. de Lyon

GMM based algorithms have become the de facto standard for background subtraction in video sequences, mainly because of their ability to track multiple background distributions, which allows them to handle complex scenes including moving trees, flags moving in the wind, etc. However, it is not always easy to determine which distributions of the mixture belong to the background and which belong to the foreground, which disturbs the results of the labeling process for each pixel. In this work we tackle this problem by taking the labeling decision jointly for all pixels of several consecutive frames, minimizing a global energy function that takes into account spatial and temporal relationships. A discrete, approximate optical-flow-like motion model is integrated into the energy function and solved with Ishikawa's convex graph cuts algorithm.
12:00-12:20, Paper MoAT1.4<br />

Saliency based on Multi-Scale Ratio of Dissimilarity<br />

Huang, Rui, Huazhong Univ. of Science and Tech.<br />

Sang, Nong, Huazhong Univ. of Science and Tech.<br />

Liu, Leyuan, Huazhong Univ. of Science and Tech.<br />

Tang, Qiling, Huazhong Univ. of Science and Tech.<br />

Recently, many vision applications tend to utilize saliency maps derived from input images to guide them to focus on processing<br />

salient regions in images. In this paper, we propose a simple and effective method to quantify the saliency for<br />

each pixel in images. Specially, we define the saliency for a pixel in a ratio form, where the numerator measures the<br />

number of dissimilar pixels in its center-surround and the denominator measures the total number of pixels in its centersurround.<br />

The final saliency is obtained by combining these ratios of dissimilarity over multiple scales. For images, the<br />

saliency map generated by our method not only has a high quality in resolution also looks more reasonable. Finally, we<br />

apply our saliency map to extract the salient regions in images, and compare the performance with some state-of-the-art<br />

methods over an established ground-truth which contains 1000 images.<br />
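The ratio described in the abstract can be written directly. The sketch below is a naive (slow) grayscale version with made-up scales and a made-up dissimilarity threshold; it illustrates the center-surround ratio only, not the authors' exact formulation or color treatment.

```python
import numpy as np

def saliency_ratio_of_dissimilarity(gray, radii=(4, 8, 16), tau=25.0):
    """Per-pixel saliency as the fraction of dissimilar pixels in a
    centre-surround window, averaged over several scales.

    gray: 2-D float array.  radii and tau are illustrative settings.
    """
    h, w = gray.shape
    saliency = np.zeros_like(gray, dtype=float)
    for r in radii:
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - r), min(h, y + r + 1)
                x0, x1 = max(0, x - r), min(w, x + r + 1)
                window = gray[y0:y1, x0:x1]
                dissimilar = np.abs(window - gray[y, x]) > tau
                # numerator: dissimilar pixels; denominator: all pixels
                saliency[y, x] += dissimilar.sum() / dissimilar.size
    return saliency / len(radii)
```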

12:20-12:40, Paper MoAT1.5
Online Principal Background Selection for Video Synopsis
Feng, Shikun, Chinese Acad. of Sciences
Liao, Shengcai, Chinese Acad. of Sciences
Yuan, Zhiyong, Wuhan Univ.
Li, Stan Z., Chinese Acad. of Sciences

Video synopsis provides a means for fast browsing of activities in video. Principal background selection (PBS) is an important step in video synopsis. Existing methods perform PBS in an offline way and at a high memory cost. In this paper we propose a novel background selection method, "online principal background selection" (OPBS). OPBS selects n principal backgrounds from N backgrounds in an online fashion with a low memory cost, making it possible to build an efficient online video synopsis system. Another advantage is that, with OPBS, the selected backgrounds are related not only to background changes over time but also to video activities. Experimental results demonstrate the advantages of the proposed OPBS.

MoAT2 Marmara Hall
Support Vector Machines (Regular Session)
Session chair: Alpaydin, Ethem (Bogazici Univ.)

11:00-11:20, Paper MoAT2.1
Large Margin Classifier based on Affine Hulls
Cevikalp, Hakan, Eskisehir Osmangazi Univ.
Yavuz, Hasan Serhan, Eskisehir Osmangazi Univ.

This paper introduces a geometrically inspired large-margin classifier that can be a better alternative to Support Vector Machines (SVMs) for classification problems with a limited number of training samples. In contrast to the SVM classifier, we approximate classes with affine hulls of their class samples rather than convex hulls, which may be unrealistically tight in high-dimensional spaces. To find the best separating hyperplane between any pair of classes approximated with affine hulls, we first compute the closest points on the affine hulls and connect these two points with a line segment. The optimal separating hyperplane is chosen to be the hyperplane that is orthogonal to the line segment and bisects it. To allow soft margin solutions, we first reduce the affine hulls in order to alleviate the effects of outliers and then search for the best separating hyperplane between these reduced models. Multi-class classification problems are dealt with by constructing and combining several binary classifiers as in SVM. Experiments on several databases show that the proposed method compares favorably with the SVM classifier.
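The geometric construction described here (closest points on two affine hulls, hyperplane orthogonal to the connecting segment) reduces to a small least-squares problem. The sketch below covers only the hard-margin case and omits the hull-reduction step the paper uses for soft margins; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def affine_hull_classifier(X1, X2):
    """Hard-margin separating hyperplane between two affine hulls.

    X1, X2: (n_i, d) arrays of class samples.  Returns (w, bias) such
    that sign(w @ x + bias) is +1 on the class-1 side.  If the hulls
    intersect, w degenerates to (nearly) zero.
    """
    def basis(X):
        mu = X.mean(axis=0)
        # Orthonormal basis of the directions spanned by the samples.
        U, s, _ = np.linalg.svd((X - mu).T, full_matrices=False)
        return mu, U[:, s > 1e-10]

    mu1, U1 = basis(np.asarray(X1, float))
    mu2, U2 = basis(np.asarray(X2, float))

    # Closest points: min ||(mu1 + U1 a) - (mu2 + U2 b)|| is a linear
    # least-squares problem in the hull coordinates (a, b).
    A = np.hstack([U1, -U2])
    ab, *_ = np.linalg.lstsq(A, mu2 - mu1, rcond=None)
    a, b = ab[:U1.shape[1]], ab[U1.shape[1]:]
    p1, p2 = mu1 + U1 @ a, mu2 + U2 @ b

    w = p1 - p2                    # orthogonal to the separating hyperplane
    bias = -w @ (p1 + p2) / 2.0    # hyperplane bisects the closest segment
    return w, bias
```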


11:20-11:40, Paper MoAT2.2
2D Shape Recognition using Information Theoretic Kernels
Bicego, Manuele, Univ. of Verona
Torres Martins, André Filipe, Inst. Superior Técnico
Murino, Vittorio, Univ. of Verona
Aguiar, Pedro M. Q., Inst. for Systems and Robotics / Inst. Superior Tecnico
Figueiredo, Mario A. T., Inst. Superior Técnico

In this paper, a novel approach for contour-based 2D shape recognition is proposed, using a class of recently introduced information theoretic kernels. These kernels, based on a non-extensive generalization of classical Shannon information theory, are defined on probability measures. In the proposed approach, chain code representations are first extracted from the contours; then n-gram statistics are computed and used as input to the information theoretic kernels. We tested different versions of such kernels, using support vector machine and nearest neighbor classifiers. An experimental evaluation on the Chicken pieces dataset shows that the proposed approach significantly outperforms the current state-of-the-art methods.
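The feature-extraction step (chain code to n-gram statistics) is easy to make concrete; the kernels themselves are defined in the cited work and are not reproduced here. A minimal sketch, assuming an 8-direction chain code alphabet:

```python
from collections import Counter
from itertools import product

def ngram_distribution(chain_code, n=2, alphabet="01234567"):
    """n-gram relative frequencies of an 8-direction chain code string.

    Returns a dict over all possible n-grams (zero entries included),
    which can then be fed to a kernel defined on probability measures.
    """
    counts = Counter(chain_code[i:i + n]
                     for i in range(len(chain_code) - n + 1))
    total = max(sum(counts.values()), 1)
    return {"".join(g): counts["".join(g)] / total
            for g in product(alphabet, repeat=n)}

# Example: a toy chain code; "00" occurs in 2 of the 11 bigrams.
print(ngram_distribution("000222444666", n=2)["00"])
```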

11:40-12:00, Paper MoAT2.3
Time Series Classification using Support Vector Machine with Gaussian Elastic Metric Kernel
Zhang, Dongyu, Harbin Inst. of Tech.
Zuo, Wangmeng, Harbin Inst. of Tech.
Zhang, David, The Hong Kong Pol. Univ.
Zhang, Hongzhi, Harbin Inst. of Tech.

Motivated by the great success of dynamic time warping (DTW) in time series matching, the Gaussian DTW kernel was developed for support vector machine (SVM)-based time series classification. Counter-examples, however, were subsequently reported showing that the Gaussian DTW kernel usually cannot outperform the Gaussian RBF kernel in the SVM framework. In this paper, by extending the Gaussian RBF kernel, we propose a novel class of Gaussian elastic metric kernels (GEMK) and present two examples of GEMK: the Gaussian time warp edit distance (GTWED) kernel and the Gaussian edit distance with real penalty (GERP) kernel. Experimental results on the UCR time series data sets show that, in terms of classification accuracy, SVM with GEMK is much superior to SVM with the Gaussian RBF kernel and the Gaussian DTW kernel, and to the state-of-the-art similarity measure methods.
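The "Gaussian elastic metric kernel" construction replaces the Euclidean distance inside a Gaussian RBF kernel with an elastic distance. The sketch below uses plain DTW as the elastic distance purely to keep the example short; the paper's GTWED and GERP kernels use time warp edit distance and edit distance with real penalty instead, and the exact kernel form may differ.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gaussian_elastic_kernel(x, y, sigma=1.0):
    """Gaussian kernel with an elastic distance in place of the Euclidean
    one; plug the resulting Gram matrix into an SVM as a precomputed kernel."""
    return np.exp(-dtw_distance(x, y) ** 2 / (2.0 * sigma ** 2))
```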

12:00-12:20, Paper MoAT2.4
Multiplicative Update Rules for Multilinear Support Tensor Machines
Kotsia, Irene, Queen Mary Univ. of London
Patras, Ioannis, Queen Mary Univ. of London

In this paper, we formulate the Multilinear Support Tensor Machines (MSTMs) problem in a way similar to the Non-negative Matrix Factorization (NMF) algorithm. A novel set of simple and robust multiplicative update rules is proposed in order to find the multilinear classifier. Update rules are provided for both hard and soft margin MSTMs, and the inclusion of a bias term is also investigated. We present results on standard gait and action datasets and report faster convergence with equivalent classification performance in comparison to standard MSTMs.

12:20-12:40, Paper MoAT2.5
Support Vectors Selection for Supervised Learning using an Ensemble Approach
Guo, Li, Univ. of Bordeaux 3
Boukir, Samia, Univ. of Bordeaux 3
Chehata, Nesrine, Univ. of Bordeaux 3

Support Vector Machines (SVMs) are popular for pattern classification. However, training an SVM requires large memory and high processing time, especially for large datasets, which limits their applications. To speed up their training, we present a new, efficient support vector selection method based on the ensemble margin, a key concept in ensemble classifiers. The algorithm exploits a new version of the margin of an ensemble-based classification and selects the smallest-margin instances as support vectors. Our experimental results show that our method reduces the training set size significantly without degrading the performance of the resulting SVM classifiers.
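A rough sketch of the selection idea, using a scikit-learn bagging ensemble as a stand-in for the paper's ensemble and a plain vote-based margin in place of the paper's own margin variant; integer class labels and numpy inputs are assumed. The point is only to show "keep the smallest-margin samples, then fit the SVM on those".

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def select_small_margin_samples(X, y, keep=0.3, n_estimators=50):
    """Train an SVM only on the fraction of samples with the smallest
    ensemble margin (votes for true class minus best other class)."""
    ens = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=n_estimators,
                            random_state=0).fit(X, y)
    votes = np.stack([est.predict(X) for est in ens.estimators_])
    margins = []
    for i, label in enumerate(y):
        counts = np.bincount(votes[:, i].astype(int),
                             minlength=int(y.max()) + 1)
        true_votes = counts[int(label)]
        counts[int(label)] = -1            # exclude the true class
        margins.append((true_votes - counts.max()) / n_estimators)
    idx = np.argsort(margins)[:int(keep * len(y))]
    return SVC(kernel="rbf").fit(X[idx], y[idx])
```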



MoAT3 Topkapı Hall A
Motion and Multiple-View Vision – I (Regular Session)
Session chair: Hancock, Edwin (Univ. of York)

11:00-11:20, Paper MoAT3.1
Estimating Apparent Motion on Satellite Acquisitions with a Physical Dynamic Model
Huot, Etienne, INRIA and UVSQ
Herlin, Isabelle, INRIA
Mercier, Nicolas, INRIA
Plotnikov, Evgeny, National Academy of Sciences, Ukraine

The paper presents a motion estimation method based on data assimilation in a dynamic model, named the Image Model, expressing the physical evolution of a quantity observed on the images. The application concerns the retrieval of apparent surface velocity from a sequence of satellite data acquired over the ocean. The Image Model includes a shallow-water approximation for the dynamics of the velocity field (the evolution of the two components of motion is linked by the water layer thickness) and a transport equation for the image field. For retrieving the surface velocity, a sequence of Sea Surface Temperature (SST) acquisitions is assimilated in the Image Model with a 4D-Var method. This is based on the minimization of a cost function including the discrepancy between model outputs and SST data and a regularization term. Several types of regularization norms have been studied. Results are discussed to analyze the impact of the different components of the assimilation system.

11:20-11:40, Paper MoAT3.2
Multiple View Geometries for Mirrors and Cameras
Fujiyama, Shinji, Nagoya Inst. of Tech.
Sakaue, Fumihiko, Nagoya Inst. of Tech.
Sato, Jun, Nagoya Inst. of Tech.

In this paper, we analyze the multiple view geometry for a camera and mirrors, and propose a method for computing the geometry of the camera and mirrors accurately from fewer corresponding points than the existing methods. The geometry between a camera and mirrors can be described as the multiple view geometry for a real camera and virtual cameras. We show that very strong constraints on geometries can be obtained in addition to the ordinary multilinear constraints. By using these constraints, we can estimate multiple view geometry more accurately from fewer corresponding points than usual. The experimental results show the efficiency of the proposed method.

11:40-12:00, Paper MoAT3.3
Perspective Reconstruction and Camera Auto-Calibration as Rectangular Polynomial Eigenvalue Problem
Pernek, Ákos, MTA SZTAKI, BME
Hajder, Levente, MTA SZTAKI

Motion-based 3D reconstruction (SfM) with missing data has been a challenging computer vision task since the late 90s. Under the perspective camera model, one of the most difficult problems is camera auto-calibration, which means determining the intrinsic camera parameters without using any known calibration object or assuming special properties of the scene. This paper presents a novel algorithm to perform camera auto-calibration from multiple images while dealing with the missing data problem. The method assumes semi-calibrated cameras (every intrinsic camera parameter except for the focal length is considered to be known) and constant focal length over all the images. The solution requires at least one image pair having at least eight common measured points. Tests verified that the algorithm is numerically stable and produces accurate results both on synthetic and real test sequences.

12:00-12:20, Paper MoAT3.4
Multi-Camera Platform Calibration using Multi-Linear Constraints
Nyman, Patrik, Lund Univ.
Heyden, Anders, Lund Univ.
Astroem, Kalle, Lund Univ.

We present a novel calibration method for multi-camera platforms, based on multi-linear constraints. The calibration method can recover the relative orientation between the different cameras on the platform, even when there are no corresponding feature points between the cameras, i.e. there are no overlaps between the cameras. It is shown that two translational motions in different directions are sufficient to linearly recover the rotational part of the relative orientation. Then two general motions, including both translation and rotation, are sufficient to linearly recover the translational part of the relative orientation. However, as a consequence of the speed-scale ambiguity, the absolute scale of the translational part cannot be determined if no prior information about the motions is known, e.g. from dead reckoning. It is shown that in the case of planar motion, the vertical component of the translational part cannot be determined. However, if at least one feature point can be seen in two different cameras, this vertical component can also be estimated. Finally, the performance of the proposed method is shown in simulated experiments.

12:20-12:40, Paper MoAT3.5
A Game-Theoretic Approach to Robust Selection of Multi-View Point Correspondence
Rodolà, Emanuele, Univ. Ca’ Foscari Venezia
Albarelli, Andrea, Univ. Ca’ Foscari di Venezia
Torsello, Andrea, Univ. Ca’ Foscari

In this paper we introduce a robust matching technique that allows very accurate selection of corresponding feature points from multiple views. Robustness is achieved by enforcing global geometric consistency at an early stage of the matching process, without the need for subsequent verification through reprojection. The global consistency is reduced to a pairwise compatibility by making use of the size and orientation information provided by common feature descriptors, thus projecting what is a high-order compatibility problem into a pairwise setting. Then a game-theoretic approach is used to select a maximally consistent set of candidate matches, where highly compatible matches are enforced while incompatible correspondences are driven to extinction.
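Game-theoretic selection of this kind is typically run with the replicator dynamics on a payoff matrix of pairwise compatibilities. The sketch below shows that generic step under the assumption of non-negative payoffs and with an arbitrary survival cutoff; it is not the paper's specific payoff design.

```python
import numpy as np

def replicator_selection(payoff, iters=1000, tol=1e-8):
    """Evolve a population over candidate matches under the replicator
    dynamics; matches surviving with non-negligible mass form the
    mutually compatible set.

    payoff: symmetric, non-negative matrix of pairwise compatibilities
    between candidate correspondences."""
    A = np.asarray(payoff, float)
    x = np.full(len(A), 1.0 / len(A))          # start from the barycentre
    for _ in range(iters):
        Ax = A @ x
        new_x = x * Ax / (x @ Ax)               # replicator update
        if np.linalg.norm(new_x - x, 1) < tol:
            x = new_x
            break
        x = new_x
    return np.where(x > 1.0 / (10 * len(A)))[0]  # indices of surviving matches
```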

MoAT4 Dolmabahçe Hall A
Ensemble Learning (Regular Session)
Session chair: Roli, Fabio (Univ. of Cagliari)

11:00-11:20, Paper MoAT4.1
A Bias-Variance Analysis of Bootstrapped Class-Separability Weighting for Error-Correcting Output Code Ensemble
Smith, Raymond, Univ. of Surrey
Windeatt, Terry, Univ. of Surrey

We investigate the effects, in terms of a bias-variance decomposition of error, of applying class-separability weighting plus bootstrapping in the construction of error-correcting output code ensembles of binary classifiers. Evidence is presented to show that bias tends to be reduced at low training strength values whilst variance tends to be reduced across the full range. The relative importance of these effects, however, varies depending on the stability of the base classifier type.

11:20-11:40, Paper MoAT4.2
Multi-Class AdaBoost with Hypothesis Margin
Jin, Xiaobo, Chinese Acad. of Sciences
Hou, Xinwen, Chinese Acad. of Sciences
Liu, Cheng-Lin, Chinese Acad. of Sciences

Most AdaBoost algorithms for multi-class problems have to decompose the multi-class classification into multiple binary problems, like AdaBoost.MH and LogitBoost. This paper proposes a new multi-class AdaBoost algorithm based on the hypothesis margin, called AdaBoost.HM, which directly combines multi-class weak classifiers. The hypothesis margin maximizes the output for the positive class while minimizing the maximal output over the negative classes. We discuss the upper bound of the training error for AdaBoost.HM and for a previous multi-class learning algorithm, AdaBoost.M1. Our experiments using feed-forward neural networks as weak learners show that the proposed AdaBoost.HM yields higher classification accuracies than AdaBoost.M1 and AdaBoost.MH, while being computationally efficient in training.
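The hypothesis margin itself is a one-liner; a minimal numpy version is given below (the AdaBoost.HM weight updates and weak-learner training built on top of it are not reproduced).

```python
import numpy as np

def hypothesis_margins(scores, labels):
    """Multi-class (hypothesis) margin: score of the true class minus the
    best score among the other classes, for every sample.

    scores: (n_samples, n_classes) combined weak-learner outputs;
    labels: integer class indices."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    idx = np.arange(len(scores))
    true_scores = scores[idx, labels]
    rest = scores.copy()
    rest[idx, labels] = -np.inf          # mask out the true class
    return true_scores - rest.max(axis=1)
```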



11:40-12:00, Paper MoAT4.3
A Score Decidability Index for Dynamic Score Combination
Lobrano, Carlo, DIEE, Univ. of Cagliari
Tronci, Roberto, Univ. of Cagliari
Giacinto, Giorgio, Univ. of Cagliari
Roli, Fabio, Univ. of Cagliari

In two-class problems, the combination of the outputs (scores) of an ensemble of classifiers is widely used to attain high performance. Dynamic combination techniques, which estimate the combination parameters on a pattern-per-pattern basis, usually provide better performance than static combination techniques. In this paper, we propose an Index of Decidability derived from the Wilcoxon-Mann-Whitney statistic, which is used to estimate the combination parameters. Reported results on a multimodal biometric dataset show the effectiveness of the proposed dynamic combination mechanisms in terms of misclassification errors.

12:00-12:20, Paper MoAT4.4
AUC-Based Combination of Dichotomizers: Is Whole Maximization also Effective for Partial Maximization?
Ricamato, Maria Teresa, Univ. degli Studi di Cassino
Tortorella, Francesco, Univ. degli Studi di Cassino

The combination of classifiers is an established technique to improve classification performance. When dealing with two-class classification problems, a frequently used performance measure is the area under the ROC curve (AUC), since it is more effective than accuracy. However, in many applications, such as medical or biometric ones, tests with a false positive rate over a given value are of no practical use and thus irrelevant for evaluating the performance of the system. In these cases, the performance should be measured by looking only at the interesting part of the ROC curve. Consequently, the optimization goal is to maximize only a part of the AUC instead of the whole area. In this paper we propose a method tailored to these situations which builds a linear combination of two dichotomizers maximizing the partial AUC (pAUC). Another aim of the paper is to understand whether methods that maximize the AUC can also maximize the pAUC. An empirical comparison between algorithms maximizing the AUC and the proposed method shows that the latter is more effective for pAUC maximization than methods designed to globally optimize the AUC.
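The quantity being optimized, the partial AUC, can be estimated empirically as follows; this is a plain empirical estimate over thresholds placed at the negative scores, not the paper's combination algorithm.

```python
import numpy as np

def partial_auc(scores_pos, scores_neg, max_fpr=0.1):
    """Area under the ROC curve restricted to false positive rates in
    [0, max_fpr], normalised by max_fpr (so a perfect detector scores 1)."""
    neg = np.sort(np.asarray(scores_neg, float))[::-1]
    pos = np.asarray(scores_pos, float)
    n_steps = int(np.floor(max_fpr * len(neg)))
    area = 0.0
    for k in range(1, n_steps + 1):
        thr = neg[k - 1]                 # threshold allowing k false positives
        tpr = np.mean(pos > thr)
        area += tpr / len(neg)           # one FPR step of width 1/len(neg)
    return area / max_fpr if max_fpr > 0 else 0.0
```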

12:20-12:40, Paper MoAT4.5
Random Prototypes-Based Oracle for Selection-Fusion Ensembles
Armano, Giuliano, Univ. of Cagliari
Hatami, Nima, Univ. of Cagliari

Classifier ensembles based on the selection-fusion strategy have recently aroused enormous interest. The main idea underlying this strategy is to use mini-ensembles instead of monolithic base classifiers in an ensemble in order to improve the overall performance. This paper proposes a classifier selection method to be used in selection-fusion strategies. The method involves first splitting the original classification problem according to some prototypes randomly selected from the training data, and then building a classifier on each subset. The trained classifiers, together with an oracle used to switch between them, form a mini-ensemble of classifier selection. With respect to other methods used in the selection-fusion framework, the proposed method has proven to be more efficient in the decomposition process, with no limitation on the number of resulting partitions. Experimental results on datasets from the UCI repository show the validity of the proposed method.

MoAT5 Dolmabahçe Hall B
Detection and Segmentation of Audio Signals (Regular Session)
Session chair: Erdogan, Hakan (Sabanci Univ.)

11:00-11:20, Paper MoAT5.1
Noise-Robust Voice Activity Detector based on Hidden Semi-Markov Models
Liu, Xianglong, Beihang Univ.
Liang, Yuan, Beihang Univ.
Lou, Yihua, Beihang Univ.
Li, He, Beihang Univ.
Shan, Baosong, Beihang Univ.

This paper concentrates on speech duration distributions, which are usually invariant to noise, and proposes a noise-robust, real-time voice activity detector (VAD) using the hidden semi-Markov model (HSMM) to explicitly model state durations. Motivated by statistical observations and tests on TIMIT and the IEEE sentence database, we use Weibull distributions to model state durations approximately and estimate their parameters by maximum likelihood estimators. The final VAD decision is made according to the likelihood ratio test (LRT), incorporating state prior knowledge and modified forward variables. An efficient way to recursively calculate the modified forward variables is devised, and a dynamic adjustment scheme is used to update parameters. Experiments on noisy speech data show that the proposed method performs more robustly and accurately than the standard ITU-T G.729B VAD and AMR2.

11:20-11:40, Paper MoAT5.2
Simultaneous Segmentation and Modelling of Signals based on an Equipartition Principle
Panagiotakis, Costas, Univ. of Crete

We propose a general framework for simultaneous segmentation and modelling of signals based on an Equipartition Principle (EP). According to EP, the signal is divided into segments with equal reconstruction errors by selecting the most suitable model to describe each segment. In addition, by taking into account change detection on the signal model, an efficient signal reconstruction is also obtained. The model selection concerns both the kind and the order of the model. The proposed methodology is very flexible with respect to different error criteria and signal features.

11:40-12:00, Paper MoAT5.3
Voice Activity Detection based on Complex Exponential Atomic Decomposition and Likelihood Ratio Test
Deng, Shiwen, Harbin Inst. of Tech.
Han, Jiqing, Harbin Inst. of Tech.

Voice activity detection (VAD) algorithms using Discrete Fourier Transform (DFT) coefficients are widely found in the literature. However, some shortcomings of modeling a signal in the DFT domain can easily degrade the performance of a VAD in noisy environments. To overcome this problem, this paper presents a novel approach using the complex coefficients derived from a complex exponential atomic decomposition of the signal. Those coefficients are modeled by a complex Gaussian probability distribution, and a statistical model is employed to derive the decision rule from the likelihood ratio test. According to the experimental results, the proposed VAD method shows better performance than the VAD based on DFT coefficients in various noise environments.

12:00-12:20, Paper MoAT5.4
Speaker Change Detection based on the Pairwise Distance Matrix
Seo, Jin S., Gangneung-Wonju National Univ.

Speaker change detection is most commonly done by statistically determining whether two adjacent segments of a speech stream are significantly different or not. In this paper, we propose a novel method to detect speaker change points based on the minimum statistics of the pairwise distance matrix of feature vectors. The use of the minimum statistics makes it possible to compare between similar acoustic groups, which is effective in suppressing the phonetic variation. Experimental results show that the proposed method is promising for the speaker change detection problem.
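A schematic version of the minimum-statistics idea: build the pairwise distance matrix between the frames of two adjacent segments and keep only the closest matches, so that only the most similar acoustic groups are compared. The exact statistic and features in the paper may differ.

```python
import numpy as np

def change_score(left, right):
    """Dissimilarity between two adjacent speech segments based on the
    minimum statistics of their pairwise distance matrix.

    left, right: (n, d) and (m, d) arrays of frame features (e.g. MFCCs).
    Each frame keeps only its distance to the closest frame of the other
    segment; the average of these minima is the change score."""
    diff = left[:, None, :] - right[None, :, :]
    D = np.linalg.norm(diff, axis=2)          # pairwise distance matrix
    return 0.5 * (D.min(axis=1).mean() + D.min(axis=0).mean())
```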

12:20-12:40, Paper MoAT5.5
Real-Time User Position Estimation in Indoor Environments using Digital Watermarking for Audio Signals
Kaneto, Ryosuke, Osaka Univ.
Nakashima, Yuta, Osaka Univ.
Babaguchi, Noboru, Osaka Univ.

In this paper, we propose a method for estimating the position of a user holding a microphone in an indoor environment using digital watermarking for audio signals. The proposed method utilizes detection strengths, which are calculated while detecting spread-spectrum-based watermarks. Taking into account delays and attenuation of the watermarked signals emitted from multiple loudspeakers, among other factors, we construct a model of detection strengths. The user position is estimated in real time using the model. The experimental results indicate that the user position is estimated with 1.3 m of root mean squared error on average for the case where the user is static. We demonstrate that the proposed method successfully estimates the user position even when the user moves.


MoAT6 Topkapı Hall B
Human Computer Interaction (Regular Session)
Session chair: Drygajlo, Andrzej (EPFL)

11:00-11:20, Paper MoAT6.1
Gaze Probing: Event-Based Estimation of Objects being Focused On
Yonetani, Ryo, Kyoto Univ.
Kawashima, Hiroaki, Kyoto Univ.
Hirayama, Takatsugu, Kyoto Univ.
Matsuyama, Takashi, Kyoto Univ.

We propose a novel method to estimate the object that a user is focusing on by using the synchronization between the movements of objects and a user's eyes as a cue. We first design an event as a characteristic motion pattern, and we then embed it within the movement of each object. Since the user's ocular reactions to these events are easily detected using a passive camera-based eye tracker, we can successfully estimate the object that the user is focusing on as the one whose movement is most synchronized with the user's eye reaction. Experimental results obtained from the application of this system to dynamic content (consisting of scrolling images) demonstrate the effectiveness of the proposed method over existing methods.

11:20-11:40, Paper MoAT6.2
A Covariate Shift Minimisation Method to Alleviate Non-Stationarity Effects for an Adaptive Brain-Computer Interface
Satti, Abdul Rehman, Univ. of Ulster
Guan, Cuntai, Inst. for Infocomm Res.
Coyle, Damien, Univ. of Ulster
Prasad, Girijesh, Univ. of Ulster

The non-stationary nature of the electroencephalogram (EEG) poses a major challenge for the successful operation of a brain-computer interface (BCI) when deployed over multiple sessions. The changes between the early training measurements and the subsequent sessions can originate from alterations in the subject's brain processes, new cortical activities, changes of recording conditions and/or changes of operation strategies by the subject. These differences and alterations over multiple sessions cause deterioration in BCI system performance if periodic or continuous adaptation of the signal processing is not carried out. In this work, the covariate shift is analyzed over multiple sessions to determine the non-stationarity effects, and an unsupervised adaptation approach is employed to account for the degrading effects this might have on performance. To improve the system's online performance, we propose a covariate shift minimization (CSM) method, which takes into account the distribution shift in the feature-set domain to reduce the feature-set overlap and imbalance between different classes. The analysis and the results demonstrate the importance of CSM, as this method not only improves the accuracy of the system, but also reduces the classification imbalance between different classes by a significant amount.

11:40-12:00, Paper MoAT6.3
A Probabilistic Language Model for Hand Drawings
Akce, Abdullah, Univ. of Illinois at Urbana-Champaign
Bretl, Timothy, Univ. of Illinois at Urbana-Champaign

Probabilistic language models are critical to applications in natural language processing that include speech recognition, optical character recognition, and interfaces for text entry. In this paper, we present a systematic way to learn a similar type of probabilistic language model for hand drawings from a database of existing artwork, by representing each stroke as a sequence of symbols. First, we propose a language in which the symbols are circular arcs with length fixed by a scale parameter and with curvature chosen from a fixed low-cardinality set. Then, we apply an algorithm based on dynamic programming to represent each stroke of a drawing as a sequence of symbols from our alphabet. Finally, we learn the probabilistic language model by constructing a Markov model. We compute the entropy of our language on a test set, measured as the expected number of bits required per symbol. Our language model might be applied in future work to create a drawing interface for noisy and low-bandwidth input devices, for example an electroencephalograph (EEG) that admits one binary command per second. The results indicate that by leveraging our language model, the performance of such an interface would be enhanced by about 20 percent.
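The Markov-model and entropy steps are standard; a minimal sketch with add-alpha smoothing is shown below, treating the quantized arc symbols simply as tokens (the stroke-to-symbol dynamic programming step is not reproduced).

```python
import math
from collections import Counter, defaultdict

def train_bigram_model(stroke_sequences, alphabet, alpha=1.0):
    """Bigram (first-order Markov) model over stroke symbols with
    add-alpha smoothing.  Returns prob(cur, prev)."""
    counts = defaultdict(Counter)
    for seq in stroke_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1

    def prob(cur, prev):
        total = sum(counts[prev].values())
        return (counts[prev][cur] + alpha) / (total + alpha * len(alphabet))

    return prob

def bits_per_symbol(test_sequences, prob):
    """Average code length of held-out strokes in bits per symbol."""
    total_bits, n = 0.0, 0
    for seq in test_sequences:
        for prev, cur in zip(seq, seq[1:]):
            total_bits += -math.log2(prob(cur, prev))
            n += 1
    return total_bits / max(n, 1)
```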



12:00-12:20, Paper MoAT6.4
AR-PCA-HMM Approach for Sensorimotor Task Classification in EEG-Based Brain-Computer Interfaces
Argunsah, Ali Ozgur, Inst. Gulbenkian de Ciencia
Cetin, Mujdat, Sabanci Univ.

We propose an approach based on hidden Markov models (HMMs) combined with principal component analysis (PCA) for the classification of four-class single-trial motor imagery EEG data for brain-computer interfacing (BCI) purposes. We extract autoregressive (AR) parameters from the EEG data and use PCA to decrease the number of features for better training of the HMMs. We present experimental results demonstrating the improvements provided by our approach over an existing HMM-based EEG single-trial classification approach as well as over state-of-the-art classification methods.

12:20-12:40, Paper MoAT6.5
Design, Implementation and Evaluation of a Real-Time P300-Based Brain-Computer Interface System
Amcalar, Armagan, Sabanci Univ.
Cetin, Mujdat, Sabanci Univ.

We present a new end-to-end brain-computer interface system based on electroencephalography (EEG). Our system exploits the P300 signal in the brain, a positive deflection in event-related potentials caused by rare events. P300 can be used for various tasks, perhaps the most well known being a spelling device. We have designed a flexible visual stimulus mechanism that can be adapted to user preferences, and we have developed and implemented EEG signal processing, learning and classification algorithms. Our classifier is based on Bayesian linear discriminant analysis, in which we have explored various choices and improvements. We have designed data collection experiments for offline and online decision making and have proposed modifications to the stimulus and decision-making procedure to increase online efficiency. We have evaluated the performance of our system on 8 healthy subjects on a spelling task and have observed that our system achieves a higher average speed than state-of-the-art systems reported in the literature for a given classification accuracy.

MoAT7 Dolmabahçe Hall C
Video Classification and Retrieval (Regular Session)
Session chair: Sarkar, Sudeep (Univ. of South Florida)

11:00-11:20, Paper MoAT7.1
Motion-Sketch based Video Retrieval using a Trellis Levenshtein Distance
Hu, Rui, Univ. of Surrey
Collomosse, John Philip, Univ. of Surrey

We present a fast technique for retrieving video clips using free-hand sketched queries. Visual keypoints within each video are detected and tracked to form short trajectories, which are clustered to form a set of space-time tokens summarising video content. A Viterbi process matches a space-time graph of tokens to a description of colour and motion extracted from the query sketch. Inaccuracies in the sketched query are ameliorated by computing path cost using a Levenshtein (edit) distance. We evaluate over datasets of sports footage.
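The Levenshtein (edit) distance used for the path cost is the classic dynamic program; a plain-sequence version is sketched below (the paper evaluates it inside a Viterbi trellis over space-time tokens rather than on flat strings).

```python
def levenshtein(a, b):
    """Edit distance between two token sequences, with insertions,
    deletions and substitutions all costing 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Example: a sketched query token string against a candidate video's tokens.
print(levenshtein("ABCA", "ABDA"))   # -> 1
```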

11:20-11:40, Paper MoAT7.2
Extracting Key Sub-Trajectory Features for Supervised Tactic Detection in Sports Video
Zhang, Yi, Chinese Acad. of Sciences
Xu, Changsheng, Chinese Acad. of Sciences
Lu, Hanqing, Chinese Acad. of Sciences

Tactic analysis is receiving more attention in sports video analysis for its assistance to coaches and players. This paper proposes an efficient key sub-trajectory feature representation of ball trajectories for tactic analysis. Ball trajectories are modeled with a generalized suffix tree in which frequent sub-trajectory patterns are searched for. Key sub-trajectory patterns are extracted by further filtering these frequent sub-trajectory patterns. Instead of directly using individual sub-trajectories as features to train tactic detectors, we take key sub-trajectory patterns as a whole. The key sub-trajectory feature representation effectively removes noise, reduces the dimension of the features, and improves the performance of supervised learning for detecting tactics.


11:40-12:00, Paper MoAT7.3
A New Symmetry based on Proximity of Wavelet-Moments for Text Frame Classification in Video
Palaiahnakote, Shivakumara, National Univ. of Singapore
Dutta, Anjan, Univ. Autonoma de Barcelona
Tan, Chew-Lim, National Univ. of Singapore
Pal, Umapada, Indian Statistical Inst.

This paper proposes the use of a new symmetry property based on the proximity of median moments in the wavelet domain. The method divides a given frame into 16 equally sized blocks in order to classify true text frames. The average of the high-frequency subbands of a block is used for computing median moments to brighten the text pixels in a block of a video frame. Then K-means clustering with K=2 is applied to the median moments of the block to classify it as a probable text block. For the classified blocks, average wavelet median moments are computed over a sliding window. We introduce a Max-Min cluster to classify the probable text pixels in each probable text block. Four quadrants are formed from the centroid of the probable text pixels. The new concept, called symmetry, is introduced to identify the true text block based on the proximity between probable text pixels in each quadrant. If the frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. The method is tested on three datasets to evaluate its robustness in the classification of text frames in terms of recall and precision.

12:00-12:20, Paper MoAT7.4
Edge based Binarization for Video Text Images
Zhou, Zhiwei, National Univ. of Singapore
Li, Linlin, Univ. of Singapore
Tan, Chew-Lim, National Univ. of Singapore

This paper introduces an edge-based binarization method for video text images, especially for images with complex background or low contrast. The binarization method first detects the contour of the text, utilizes a local thresholding method to decide the inner side of the contour, and then fills up the contour to form characters that are recognizable to OCR software. Experimental results show that our method is especially effective on complex background and low contrast images.

12:20-12:40, Paper MoAT7.5
Detecting Group Turn Patterns in Conversations using Audio-Video Change Scale-Space
Krishnan, Ravikiran, Univ. of South Florida
Sarkar, Sudeep, Univ. of South Florida

Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. We consider the evolution of the detected change, using the Bayesian Information Criterion (BIC) at multiple temporal scales, to build an audio-visual change scale-space. Peaks detected in this representation yield group-turn based conversational changes at different temporal scales. Conversation overlaps, changes and their inferred models offer an intermediate-level description of meeting videos that can be useful in summarization and indexing of meetings. Results on the NIST meeting room dataset showed a true positive rate of 88%.
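The ΔBIC test underlying the change scale-space is the classic audio segmentation criterion; a single-split version is sketched below under a full-covariance Gaussian assumption, with the multi-scale scan and the video stream omitted.

```python
import numpy as np

def delta_bic(window, split, penalty=1.0):
    """BIC change score for splitting a block of feature vectors at `split`:
    positive values favour modelling the two halves with separate Gaussians,
    i.e. a change point.  `split` should leave several frames on each side."""
    X = np.asarray(window, float)
    n, d = X.shape

    def logdet(cov):
        _, val = np.linalg.slogdet(cov + 1e-6 * np.eye(d))  # regularised
        return val

    full = logdet(np.cov(X, rowvar=False))
    left = logdet(np.cov(X[:split], rowvar=False))
    right = logdet(np.cov(X[split:], rowvar=False))
    complexity = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * (n * full - split * left - (n - split) * right)
            - penalty * complexity)
```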

14:00-15:00, MoP2L1 Anadolu Auditorium<br />

Embracing Uncertainty: The New Machine Intelligence<br />

Christopher M. Bishop Plenary Session<br />

Microsoft Research Cambridge, UK<br />

Professor Chris Bishop is Chief Research Scientist at Microsoft Research Cambridge. He also has a Chair in computer<br />

science at the University of Edinburgh, and is a Fellow of Darwin College Cambridge. Chris is the author of the leading<br />

textbook “Pattern Recognition and Machine Learning” (Springer, 2006). His research interests include probabilistic approaches<br />

to machine learning, as well as their application to fields such as biomedical sciences and healthcare.<br />

The first successful applications of machine intelligence were based on expert systems constructed using rules elicited<br />

from human experts. Limitations in the applicability of this approach helped drive the second generation of machine intelligence<br />

methods, as typified by neural networks and support vector machines, which can be characterised as black-box<br />



statistical models fitted to large data sets. In this talk I will describe a new paradigm for machine intelligence, based on probabilistic<br />

graphical models, which has emerged over the last five years and which allows strong prior knowledge from<br />

domain experts to be combined with machine learning techniques to enable a new generation of large-scale applications.<br />

The talk will be illustrated with tutorial examples as well as real-world case studies.<br />

MoBT1 Marmara Hall<br />

Tracking and Surveillance – I Regular Session<br />

Session chair: Goldgof, Dmitry (Univ of South Florida)<br />

15:30-15:50, Paper MoBT1.1<br />

Improved Shadow Removal for Robust Person Tracking in Surveillance Scenarios<br />

Sanin, Andres, NICTA<br />

Sanderson, Conrad, NICTA<br />

Lovell, Brian Carrington, The Univ. of Queensland<br />

Shadow detection and removal is an important step employed after foreground detection, in order to improve the segmentation<br />

of objects for tracking. Methods reported in the literature typically have a significant trade-off between the shadow<br />

detection rate (classifying true shadow areas as shadows) and the shadow discrimination rate (discrimination between<br />

shadows and foreground). We propose a method that is able to achieve good performance in both cases, leading to improved<br />

tracking in surveillance scenarios. Chromaticity information is first used to create a mask of candidate shadow pixels, followed<br />

by employing gradient information to remove foreground pixels that were incorrectly included in the mask. Experiments<br />

on the CAVIAR dataset indicate that the proposed method leads to considerable improvements in multiple object<br />

tracking precision and accuracy.<br />
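A minimal sketch of the two-stage idea above, assuming a background model and a foreground mask are already available (thresholds and the gradient test are my own choices, not the authors'):

```python
# Sketch only: chromaticity-based shadow candidates refined by a simple
# gradient-consistency check (OpenCV/NumPy).
import cv2
import numpy as np

def shadow_candidates(frame_bgr, background_bgr, fg_mask,
                      v_lo=0.4, v_hi=0.95, s_tol=40, h_tol=10):
    """Mark foreground pixels that are darker than the background but keep a
    similar hue/saturation (hue wrap-around ignored for brevity)."""
    f = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    b = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    ratio = (f[..., 2] + 1) / (b[..., 2] + 1)
    return ((fg_mask > 0) & (ratio > v_lo) & (ratio < v_hi)
            & (np.abs(f[..., 1] - b[..., 1]) < s_tol)
            & (np.abs(f[..., 0] - b[..., 0]) < h_tol))

def gradient_filter(candidates, frame_gray, background_gray, grad_tol=25):
    """Keep only candidates whose local gradient stays close to the background's,
    since cast shadows largely preserve the underlying surface texture."""
    gf = cv2.Laplacian(frame_gray, cv2.CV_32F)
    gb = cv2.Laplacian(background_gray, cv2.CV_32F)
    return candidates & (np.abs(gf - gb) < grad_tol)
```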

15:50-16:10, Paper MoBT1.2<br />

Multi-Cue Integration for Multi-Camera Tracking<br />

Chen, Kuan-Wen, National Taiwan Univ.<br />

Hung, Yi-Ping, National Taiwan Univ.<br />

For target tracking across multiple cameras with disjoint views, previous works usually employed multiple cues and<br />

focused on learning a better matching model of each cue, separately. However, none of them had discussed how to integrate<br />

these cues to improve performance, to the best of our knowledge. In this paper, we look into the multi-cue integration problem<br />

and propose an unsupervised learning method since a complicated training phase is not always viable. In the experiments,<br />

we evaluate several types of score fusion methods and show that our approach learns well and can be applied to large<br />

camera networks more easily.<br />

16:10-16:30, Paper MoBT1.3<br />

Learning Pedestrian Trajectories with Kernels<br />

Ricci, Elisa, Fondazione Bruno Kessler<br />

Tobia, Francesco, Fondazione Bruno Kessler<br />

Zen, Gloria, Fondazione Bruno Kessler<br />

We present a novel method for learning pedestrian trajectories which is able to describe complex motion patterns such as<br />

multiple crossing paths. This approach adopts Kernel Canonical Correlation Analysis (KCCA) to build a mapping between<br />

the physical location space and the trajectory patterns space. To model crossing paths we rely on a clustering algorithm<br />

based on Kernel K-means with a Dynamic Time Warping (DTW) kernel. We demonstrate the effectiveness of our method<br />

incorporating the learned motion model into a multi-person tracking algorithm and testing it on several video surveillance<br />

sequences.<br />
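The Dynamic Time Warping distance used inside the trajectory kernel can be sketched briefly; the code below is my own simplified version (a Gaussian kernel on the DTW distance is a common choice, though such a kernel is only approximately positive definite and may differ from the authors' exact formulation):

```python
# Classical DTW distance between two 2-D trajectories, plus a Gaussian kernel on it.
import numpy as np

def dtw(a, b):
    """O(len(a)*len(b)) dynamic time warping distance between trajectories
    a and b, each an array of (x, y) points."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_kernel(a, b, sigma=1.0):
    return np.exp(-dtw(a, b) ** 2 / (2 * sigma ** 2))
```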

16:30-16:50, Paper MoBT1.4<br />

Bag of Features Tracking<br />

Yang, Fan, Dalian Univ. of Tech.<br />

Lu, Hu-Chuan, Dalian Univ. of Tech.<br />

Chen, Yen-Wei, Ritsumeikan Univ.<br />



In this paper, we propose a visual tracking approach based on “bag of features” (BoF) algorithm. We randomly sample<br />

image patches within the object region in training frames for constructing two codebooks using RGB and LBP features,<br />

instead of only one codebook in traditional BoF. Tracking is accomplished by searching for the highest similarity between<br />

candidates and codebooks. Besides, an updating mechanism and a result refinement scheme are included in BoF tracking. We<br />

fuse patch-based approach and global template-based approach into a unified framework. Experiments demonstrate that<br />

our approach is robust in handling occlusion, scaling and rotation.<br />

16:50-17:10, Paper MoBT1.5<br />

Gradient Constraints Can Improve Displacement Expert Performance<br />

Tresadern, Philip Andrew, Univ. of Manchester<br />

Cootes, Tim, The Univ. of Manchester<br />

The ‘displacement expert’ has recently proven popular for rapid tracking applications. In this paper, we note that experts<br />

are typically constrained only to produce approximately correct parameter updates at training locations. However, we<br />

show that incorporating constraints on the gradient of the displacement field within the learning framework results in an<br />

expert with better convergence and fewer local minima. We demonstrate this proposal for facial feature localization in<br />

static images and object tracking over a sequence.<br />

MoBT2 Topkapı Hall B<br />

Dimensionality Reduction Regular Session<br />

Session chair: Somol, Petr (Institute of Information Theory and Automation)<br />

15:30-15:50, Paper MoBT2.1<br />

Temporal Extension of Laplacian Eigenmaps for Unsupervised Dimensionality Reduction of Time Series<br />

Lewandowski, Michal, Kingston Univ.<br />

Martinez-Del-Rincon, Jesus, Kingston Univ.<br />

Makris, Dimitrios, Kingston Univ.<br />

Nebel, Jean-Christophe, Kingston Univ.<br />

A novel non-linear dimensionality reduction method, called Temporal Laplacian Eigenmaps, is introduced to process efficiently<br />

time series data. In this embedding-based approach, temporal information is intrinsic to the objective function,<br />

which produces description of low dimensional spaces with time coherence between data points. Since the proposed<br />

scheme also includes bidirectional mapping between data and embedded spaces and automatic tuning of key parameters,<br />

it offers the same benefits as mapping-based approaches. Experiments on a couple of computer vision applications demonstrate<br />

the superiority of the new approach over other dimensionality reduction methods in terms of accuracy. Moreover, its<br />

lower computational cost and generalisation abilities suggest it is scalable to larger datasets.<br />

15:50-16:10, Paper MoBT2.2<br />

Orthogonal Locality Sensitive Fuzzy Discriminant Analysis in Sleep-Stage Scoring<br />

Khushaba, Rami N., Univ. of Tech. Sydney<br />

Elliott, Rosalind, Univ. of Tech. Sydney<br />

Alsukker, Akram, Univ. of Tech. Sydney<br />

Al-Ani, Ahmed, Univ. of Tech. Sydney<br />

Mckinley, Sharon, Univ. of Tech. Sydney<br />

Sleep-stage scoring plays an important role in analyzing the sleep patterns of people. Studies have revealed that Intensive<br />

Care Unit (ICU) patients do not usually get enough quality sleep, and hence, analyzing their sleep patterns is of increased<br />

importance. Due to the fact that sleep data are usually collected from a number of Electroencephalogram (EEG), Electromyogram<br />

(EMG) and Electrooculography (EOG) channels, the feature set size can become large, which may affect the<br />

development of on-line scoring systems. Hence, a dimensionality reduction step is needed. One of the powerful dimensionality<br />

reduction approaches is based on the concept of Linear Discriminant Analysis (LDA). Unlike existing variants<br />

of LDA, this paper presents a new method that considers the fuzzy nature of input measurements while preserving their<br />

local structure. Practical results indicate the significance of preserving the local structure of sleep data, which is achieved<br />

by the proposed method, and hence attaining superior results to other dimensionality reduction methods.<br />



16:10-16:30, Paper MoBT2.3<br />

A Recursive Online Kernel PCA Algorithm<br />

Hasanbelliu, Erion, Univ. of Florida<br />

Sanchez-Giraldo, Luis Gonzalo, Univ. of Florida<br />

Principe, Jose, Univ. of Florida<br />

In this paper, we describe a new method for performing kernel principal component analysis which is online and also has<br />

a fast convergence rate. The method follows the Rayleigh quotient to obtain a fixed point update rule to extract the leading<br />

eigenvalue and eigenvector. Online deflation is used to estimate the remaining components. These operations are performed<br />

in reproducing kernel Hilbert space (RKHS) with linear order memory and computation complexity. The derivation of the<br />

method and several applications are presented.<br />
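For illustration only, the fixed-point idea on the leading eigenpair can be sketched in batch form (the paper's method is online; this simplification just runs a power-iteration style update on a precomputed Gram matrix and deflates each extracted component):

```python
# Batch sketch of fixed-point extraction of leading kernel principal components.
import numpy as np

def rbf_gram(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def center_gram(K):
    n = K.shape[0]
    one = np.ones((n, n)) / n
    return K - one @ K - K @ one + one @ K @ one

def kpca_fixed_point(K, n_components=2, n_iter=200, seed=0):
    """Extract leading eigenpairs of the centered Gram matrix one at a time."""
    K = K.copy()
    rng = np.random.default_rng(seed)
    eigvals, comps = [], []
    for _ in range(n_components):
        alpha = rng.standard_normal(K.shape[0])
        for _ in range(n_iter):
            alpha = K @ alpha
            alpha /= np.linalg.norm(alpha) + 1e-12
        lam = alpha @ K @ alpha                 # Rayleigh quotient = eigenvalue
        eigvals.append(lam)
        comps.append(alpha)
        K = K - lam * np.outer(alpha, alpha)    # deflation before the next component
    return np.array(eigvals), np.array(comps)
```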

16:30-16:50, Paper MoBT2.4<br />

Effective Dimensionality Reduction based on Support Vector Machine<br />

Moon, Sangwoo, Univ. of Tennessee<br />

Qi, Hairong, Univ. of Tennessee<br />

This paper presents an effective dimensionality reduction method based on support vector machine. By utilizing mapping<br />

vectors from support vector machine for dimensionality reduction purpose, we obtain features which are computationally<br />

efficient, providing high classification accuracy and robustness especially in noisy environments. These characteristics are<br />

acquired from the generalization capability of support vector machine by minimizing the structural risk. To further reduce<br />

dimensionality, this paper introduces a redundancy removal process based on an asymmetric relation measure with a<br />

kernel function. Experimental results show that the proposed dimensionality reduction method provides the most appropriate<br />

trade-off between classification accuracy and robustness in a relatively low-dimensional space.<br />

16:50-17:10, Paper MoBT2.5<br />

Prototype Selection for Dissimilarity Representation by a Genetic Algorithm<br />

Plasencia, Yenisel, CENATAV, Cuba<br />

Garcia, Edel, Advanced Tech. Application Center<br />

Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales, Colombia<br />

Duin, Robert, TU Delft<br />

Dissimilarities can be a powerful way to represent objects like strings, graphs and images for which it is difficult to find<br />

good features. The resulting dissimilarity space may be used to train any classifier appropriate for feature spaces. There is,<br />

however, a strong need for dimension reduction. Straightforward procedures for prototype selection as well as feature selection<br />

have been used for this in the past. Complicated sets of objects may need more advanced procedures to overcome local minima.<br />

In this paper it is shown that genetic algorithms, previously used for feature selection, may be used for building good<br />

dissimilarity spaces as well, especially when small sets of prototypes are needed for computational reasons.<br />

MoBT3 Topkapı Hall A<br />

Motion and Multiple-View Vision – II Regular Session<br />

Session chair: Torsello, Andrea (Univ. Ca’ Foscari)<br />

15:30-15:50, Paper MoBT3.1<br />

Multiple View Geometry for Non-Rigid Motions Viewed from Curvilinear Motion Projective Cameras<br />

Wan, Cheng, Nagoya Inst. of Tech.<br />

Sato, Jun, Nagoya Inst. of Tech.<br />

This paper presents a tensorial representation of multiple projective cameras with arbitrary curvilinear motions. It enables<br />

us to define multilinear relationship of image points derived from non-rigid object motions viewed from multiple cameras<br />

with arbitrary curvilinear motions. We show the new multilinear relationship is useful for generating images of non-rigid<br />

object motions viewed from cameras with arbitrary curvilinear motions. The method is tested in real image sequences.<br />



15:50-16:10, Paper MoBT3.2<br />

Estimating Nonrigid Shape Deformation using Moments<br />

Liu, Wei, Florida Inst. of Tech.<br />

Ribeiro, Eraldo, Florida Inst. of Tech.<br />

Image moments have been widely used for designing robust shape descriptors that are invariant to rigid transformations.<br />

In this work, we address the problem of estimating non-rigid deformation fields based on image moment variations. By<br />

using a single family of polynomials to both parameterize the deformation field and to define image moments, we can<br />

represent image moment variations as a system of quadratic functions and solve for the deformation parameters. As a<br />

result, we can recover the deformation field between two images without solving the correspondence problem. Additionally,<br />

our method is highly robust to image noise. The method was tested on both synthetically deformed MPEG-7 shapes and<br />

cardiac MRI sequences.<br />
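The raw geometric moments whose variations are related to the deformation parameters above can be computed directly; a short sketch (standard definition m_pq = sum_x sum_y x^p y^q I(x, y), not the authors' code):

```python
# Raw geometric image moments up to a given order.
import numpy as np

def raw_moment(img, p, q):
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    return float(np.sum((xs ** p) * (ys ** q) * img.astype(float)))

def moments_up_to(img, order=2):
    """All raw moments m_pq with p + q <= order."""
    return {(p, q): raw_moment(img, p, q)
            for p in range(order + 1) for q in range(order + 1) if p + q <= order}
```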

16:10-16:30, Paper MoBT3.3<br />

Optical Flow Estimation using Diffusion Distances<br />

Wartak, Szymon, Univ. of York<br />

Bors, Adrian, Univ. of York<br />

In this paper we apply the diffusion framework to dense optical flow estimation. Local image information is represented<br />

by matrices of gradients between paired locations. Diffusion distances are modelled as sums of eigenvectors weighted by<br />

their eigenvalues, extracted following the eigendecomposition of these matrices. Local optical flow is estimated by correlating<br />

diffusion distances characterizing features from different frames. A feature confidence factor is defined based on<br />

the local correlation efficiency when compared to that of its neighbourhood. High confidence optical flow estimates are<br />

propagated to areas of lower confidence.<br />

16:30-16:50, Paper MoBT3.4<br />

Novel Multi View Structure Estimation based on Barycentric Coordinates<br />

Ruether, Matthias, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

Traditionally, multi-view stereo algorithms estimate three-dimensional structure from corresponding points by linear triangulation<br />

or bundle-adjustment. This introduces systematic errors in case of inaccurate camera calibration and partial<br />

occlusion. The errors are not negligible in applications requiring high accuracy like micro-metrology or quality inspection.<br />

We show how accuracy of structure estimation can be significantly increased by using a barycentric coordinate representation<br />

for central perspective projection. Experiments show a reduction of geometric error by 50% compared with bundle<br />

adjustment. The error remains almost constantly low, even under partial occlusion.<br />

16:50-17:10, Paper MoBT3.5<br />

Estimation of Non-Rigid Surface Deformation using Developable Surface Model<br />

Watanabe, Yoshihiro, Univ. of Tokyo<br />

Nakashima, Takashi, Univ. of Tokyo<br />

Komuro, Takashi, Univ. of Tokyo<br />

Ishikawa, Masatoshi, Univ. of Tokyo<br />

There is a strong demand for a method of acquiring a non-rigid shape under deformation with high accuracy and high resolution.<br />

However, this is difficult to achieve because of performance limitations in measurement hardware. In this paper,<br />

we propose a model based method for estimating non-rigid deformation of a developable surface. The model is based on<br />

geometric characteristics of the surface, which are important in various applications. This method improves the accuracy<br />

of surface estimation and planar development from a low-resolution point cloud. Experiments using curved documents<br />

showed the effectiveness of the proposed method.<br />



MoBT4 Dolmabahçe Hall A<br />

Ocular Biometrics Regular Session<br />

Session chair: Zhang, David (The Hong Kong Polytechnic Univ.)<br />

15:30-15:50, Paper MoBT4.1<br />

On the Fusion of Periocular and Iris Biometrics in Non-Ideal Imagery<br />

Woodard, Damon, Clemson Univ.<br />

Pundlik, Shrinivas, Clemson Univ.<br />

Miller, Philip, Clemson Univ.<br />

Jillela, Raghavender, West Virginia Univ.<br />

Ross, Arun, West Virginia Univ.<br />

Human recognition based on the iris biometric is severely impacted when encountering non-ideal images of the eye characterized<br />

by occluded irises, motion and spatial blur, poor contrast, and illumination artifacts. This paper discusses the<br />

use of the periocular region surrounding the iris, along with the iris texture patterns, in order to improve the overall recognition<br />

performance in such images. Periocular texture is extracted from a small, fixed region of the skin surrounding the<br />

eye. Experiments on the images extracted from the Near Infra-Red (NIR) face videos of the Multi Biometric Grand Challenge<br />

(MBGC) dataset demonstrate that valuable information is contained in the periocular region and it can be fused with<br />

the iris texture to improve the overall identification accuracy in non-ideal situations.<br />

15:50-16:10, Paper MoBT4.2<br />

Genetic-Based Type II Feature Extraction for Periocular Biometric Recognition: Less is More<br />

Adams, Joshua, North Carolina A&T Univ.<br />

Woodard, Damon, Clemson Univ.<br />

Dozier, Gerry, North Carolina A&T State Univ.<br />

Miller, Philip, Clemson Univ.<br />

Bryant, Kelvin, North Carolina A&T State Univ.<br />

Glenn, George, North Carolina A&T State Univ.<br />

Given an image from a biometric sensor, it is important for the feature extraction module to extract an original set of<br />

features that can be used for identity recognition. This form of feature extraction has been referred to as Type I feature extraction.<br />

For some biometric systems, Type I feature extraction is used exclusively. However, a second form of feature extraction<br />

does exist and is concerned with optimizing/minimizing the original feature set given by a Type I feature extraction<br />

method. This second form of feature extraction has been referred to as Type II feature extraction (feature selection). In<br />

this paper, we present a genetic-based Type II feature extraction system, referred to as GEFE (Genetic & Evolutionary<br />

Feature Extraction), for optimizing the feature sets returned by Local Binary Pattern Type I feature extraction for periocular<br />

biometric recognition. Our results show that not only does GEFE dramatically reduce the number of features needed<br />

but the evolved feature sets also have higher recognition rates.<br />

16:10-16:30, Paper MoBT4.3<br />

Multispectral Eye Detection: A Preliminary Study<br />

Whitelam, Cameron, WVU<br />

Jafri, Zain, WVU<br />

Bourlai, Thirimachos, WVU<br />

In this paper the problem of eye detection across three different bands, i.e., the visible, multispectral, and short wave<br />

infrared (SWIR), is studied in order to illustrate the advantages and limitations of multi-band eye localization. The contributions<br />

of this work are two-fold. First, a multi-band database of 30 subjects is assembled and used to illustrate the<br />

challenges associated with the problem. Second, a set of experiments is performed in order to demonstrate the possibility<br />

for multi-band eye detection. Experiments show that the eyes on face images captured under different bands can be detected<br />

with promising results. Finally, we illustrate that recognition performance in all studied bands is favorably affected by the<br />

geometric normalization of raw face images that is based on our proposed detection methodology. To the best of our<br />

knowledge this is the first time that this problem is being investigated in the open literature in the context of human eye<br />

localization across different bands.<br />



16:30-16:50, Paper MoBT4.4<br />

Entropy of Feature Point-Based Retina Templates<br />

Jeffers, Jason, RMIT Univ.<br />

Arakala, Arathi, RMIT Univ.<br />

Horadam, Kathy, RMIT Univ.<br />

This paper studies the amount of distinctive information contained in a privacy protecting and compact template of a<br />

retinal image created from the locations of crossings and bifurcations in the choroidal vasculature, otherwise called feature<br />

points. Using a training set of 20 different retinas, we build a template generator that simulates one million imposter comparisons<br />

and computes the number of imposter retina comparisons that successfully matched at various thresholds. The<br />

template entropy thus computed was used to validate a theoretical model of imposter comparisons. The simulator and the<br />

model both estimate that 20 bits of entropy can be achieved by the feature point-based template. Our results reveal the<br />

distinctiveness of feature point-based retinal templates, hence establishing their potential as a biometric identifier for high<br />

security and memory intensive applications.<br />

16:50-17:10, Paper MoBT4.5<br />

Hierarchical Fusion of Face and Iris for Personal Identification<br />

Zhang, Xiaobo, Chinese Acad. of Sciences<br />

Sun, Zhenan, Chinese Acad. of Sciences<br />

Tan, Tieniu, Chinese Acad. of Sciences<br />

Most existing face and iris fusion schemes are concerned about improving performance on good quality images under<br />

controlled environments. In this paper, we propose a hierarchical fusion scheme for low quality images under uncontrolled<br />

situations. In the training stage, canonical correlation analysis (CCA) is adopted to construct a statistical mapping from<br />

face to iris in pixel level. In the testing stage, firstly the probe face image is used to obtain a subset of candidate gallery<br />

samples via regression between the probe face and gallery irises, then ordinal representation and sparse representation are<br />

performed on these candidate samples for iris recognition and face recognition respectively. Finally, score level fusion via<br />

min-max normalization is performed to make the final decision. Experimental results on our low-quality database show the<br />

superior performance of the proposed method.<br />
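The final score-level fusion step can be illustrated with a minimal sketch (equal weights are my own choice; the normalisation is the min-max rule named above):

```python
# Min-max normalisation of two matchers' scores followed by a weighted sum rule.
import numpy as np

def min_max_norm(scores):
    s = np.asarray(scores, dtype=float)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

def fuse(face_scores, iris_scores, w_face=0.5):
    """Fused score per gallery candidate; the highest fused score wins."""
    return w_face * min_max_norm(face_scores) + (1 - w_face) * min_max_norm(iris_scores)

# Example with three gallery candidates
best = int(np.argmax(fuse([0.62, 0.40, 0.55], [120.0, 95.0, 140.0])))
```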

MoBT5 Anadolu Auditorium<br />

Image Analysis – II Regular Session<br />

Session chair: Mirmehdi, Majid (Univ. of Bristol)<br />

15:30-15:50, Paper MoBT5.1<br />

Wavelet-Based Texture Retrieval using a Mixture of Generalized Gaussian Distributions<br />

Allili, Mohand Said, Univ. du Québec en Outaouais<br />

In this paper, we address the texture retrieval problem using wavelet distribution. We propose a new statistical scheme to<br />

represent the marginal distribution of the wavelet coefficients using a mixture of generalized Gaussian distributions<br />

(MoGG). The MoGG captures a wide range of histogram shapes, which provides a better description of texture<br />

and enhances texture discrimination. We propose a similarity measurement based on Kullback-Leibler distance (KLD),<br />

which is calculated using the Metropolis-Hastings MCMC sampling algorithm. We show that our approach yields better texture<br />

retrieval results than previous methods using only a single probability density function (pdf) for wavelet representation,<br />

or texture energy distribution.<br />

15:50-16:10, Paper MoBT5.2<br />

Adaptive Color Curve Models for Image Matting<br />

Cho, Sunyoung, Yonsei Univ.<br />

Byun, Hyeran, Yonsei Univ.<br />

Image matting is the process of extracting a foreground element from a single image with limited user input. To solve the<br />

inherently ill-posed problem, there exist various methods which use specific color model. One representative method assumes<br />

that the colors of the foreground and background elements satisfy the linear color model. The other recent method<br />

considers line-point color model and point-point color model. In this paper we present a new adaptive color curve model<br />

for image matting. We assume that the colors of a local region form a curve. Based on the pixels in the local region, we<br />

adaptively construct a curve model using quadratic Bezier curve model. This curve model enables us to derive a matting<br />



equation for estimating the alphas of pixels forming a curve using the quadratic formula. We show that our model estimates alpha<br />

mattes comparably to or more accurately than recent existing methods.<br />

16:10-16:30, Paper MoBT5.3<br />

Fast and Accurate Approximation of the Euclidean Opening Function in Arbitrary Dimension<br />

Coeurjolly, David, CNRS – Univ. Claude Bernard Lyon 1<br />

In this paper, we present a fast and accurate approximation of the Euclidean opening function, which is a widely used tool<br />

in mathematical morphology to analyze binary shapes, since it allows us to define a local thickness distribution. The proposed<br />

algorithm can be defined in arbitrary dimension thanks to the existing techniques to compute the discrete power diagram.<br />

16:30-16:50, Paper MoBT5.4<br />

Non-Ring Filters for Robust Detection of Linear Structures<br />

Läthén, Gunnar, Linkoping Univ.<br />

Cros, Olivier, Linköping Univ.<br />

Knutsson, Hans, Linköping Univ.<br />

Borga, Magnus, Linköping Univ.<br />

Many applications in image analysis include the problem of linear structure detection, e.g. segmentation of blood vessels<br />

in medical images, roads in satellite images, etc. A simple and efficient solution is to apply linear filters tuned to the structures<br />

of interest and extract line and edge positions from the filter output. However, if the filter is not carefully designed,<br />

artifacts such as ringing can distort the results and hinder a robust detection. In this paper, we study the ringing effects<br />

using a common Gabor filter for linear structure detection, and suggest a method for generating non-ring filters in 2D and<br />

3D. The benefits of the non-ring design are motivated by results on both synthetic and natural images.<br />

16:50-17:10, Paper MoBT5.5<br />

Incremental Distance Transforms (IDT)<br />

Schouten, Theo, Radboud Univ. Nijmegen<br />

Van Den Broek, Egon L., Univ. of Twente<br />

A new generic scheme for incremental implementations of distance transforms (DT) is presented: Incremental Distance<br />

Transforms (IDT). This scheme is applied on the city-block, Chamfer, and three recent exact Euclidean DT (E2DT). A<br />

benchmark shows that for all five DT, the incremental implementation results in a significant speedup of 3.4-10 times. However,<br />

significant differences (i.e., up to 12.5 times) among the DT remain present. The FEED transform, one of the recent<br />

E2DT, was even shown to be faster than both the city-block and Chamfer DT. So, through a very efficient incremental processing<br />

scheme for DT, the computational burden of E2DT is relieved.<br />
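For reference, the non-incremental distance transforms that the IDT scheme accelerates can be computed with SciPy on a toy binary image (this baseline is not the paper's incremental algorithm):

```python
# Baseline (non-incremental) distance transforms on a small example image.
import numpy as np
from scipy import ndimage

img = np.ones((64, 64), dtype=bool)
img[20, 30] = img[45, 10] = False           # two background (feature) pixels

cityblock  = ndimage.distance_transform_cdt(img, metric='taxicab')
chessboard = ndimage.distance_transform_cdt(img, metric='chessboard')
euclidean  = ndimage.distance_transform_edt(img)
```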

MoBT6 Dolmabahçe Hall B<br />

Document Segmentation Regular Session<br />

Session chair: Srihari, Sargur (Univ. at Buffalo)<br />

15:30-15:50, Paper MoBT6.1<br />

Text Separation from Mixed Documents using a Tree-Structured Classifier<br />

Peng, Xujun, State Univ. at Buffalo<br />

Setlur, Srirangaraj, Univ. at Buffalo<br />

Govindaraju, Venu, Univ. at Buffalo<br />

Sitaram, Ramachandrula, HP Lab. India<br />

In this paper, we propose a tree-structured multi-class classifier to identify annotations and overlapping text from machine<br />

printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT),<br />

which only considers a subset of training data at each node and is susceptible to over-fitting, we boost the tree using all<br />

training data at each node with different weights. The evaluation of the proposed method is presented on a set of machine<br />

printed documents which have been annotated by multiple writers in an office/collaborative environment.<br />



15:50-16:10, Paper MoBT6.2<br />

Document Segmentation using Pixel-Accurate Ground Truth<br />

An, Chang, Lehigh Univ.<br />

Yin, Dawei, Lehigh Univ.<br />

Baird, Henry, Lehigh Univ.<br />

We compare methodologies for trainable document image content extraction, using a variety of ground-truth policies:<br />

loose, tight, and pixel-accurate. The goal is to achieve pixel-accurate segmentation of document images. Which ground-truth<br />

policy is best has been debated. “Loose” truth is obtained by sweeping rectangles to enclose entire text blocks<br />

etc., and can be an efficient manual task. “Tight” truth requires more care, and more time, to enclose individual text lines.<br />

Pixel-accurate truth, in which only foreground pixels are labeled, can be obtained by applying the PARC PixLabeler tool;<br />

in our experience this tool was as quick to use as loose truthing. We have compared the accuracy of all three truthing policies,<br />

and report that tight truth supports higher accuracy than loose truth, and pixel-accurate truth yields the highest accuracy.<br />

We have also experimented on morphological expansions on pixel-accurate truth, by expanding sets of foreground<br />

pixels morphologically, and report that expanded pixel-accurate truth supports higher accuracy than pixel-accurate truth.<br />
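The morphological expansion of pixel-accurate truth mentioned above amounts to a small binary dilation; a brief sketch (the structuring-element size is an arbitrary choice, not taken from the paper):

```python
# Dilating a pixel-accurate ground-truth mask by a small square structuring element.
import numpy as np
from scipy import ndimage

def expand_truth(label_mask, radius=1):
    """Dilate a binary pixel-accurate ground-truth mask by a (2r+1) x (2r+1) square."""
    structure = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
    return ndimage.binary_dilation(np.asarray(label_mask, dtype=bool), structure=structure)
```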

16:10-16:30, Paper MoBT6.3<br />

An Adaptive Script-Independent Block-Based Text Line Extraction<br />

Ziaratban, Majid, Amirkabir Univ. of Technology<br />

Faez, Karim, Amirkabir Univ. of Technology<br />

In this paper, a novel script-independent block-based text line extraction technique is proposed for multi-skewed document<br />

images. Three parameters are defined to adapt the method to various writing styles. Extensive experiments on different<br />

datasets demonstrate that the proposed algorithm outperforms previous methods.<br />

16:30-16:50, Paper MoBT6.4<br />

Automated Quality Assurance for Document Logical Analysis<br />

Meunier, Jean-Luc, XRCE<br />

We consider here the general problem of converting documents available in print-ready or image format into a structured<br />

format that reflects the logical structure of the document. One aspect of the problem involves reconstructing conventional<br />

constructs such as titles, headings, captions, footnotes, etc. In practice, another important aspect involves putting in place<br />

some automated Quality Assessment (QA) method. We propose here a method to automate the QA in the case of a homogeneous<br />

collection by considering multiple documents at once instead of focusing only on the document being processed.<br />

16:50-17:10, Paper MoBT6.5<br />

The PAGE (Page Analysis and Ground-Truth Elements) Format Framework<br />

Pletschacher, Stefan, Univ. of Salford<br />

Antonacopoulos, Apostolos, Univ. of Salford<br />

There is a plethora of established and proposed document representation formats but none that can adequately support individual<br />

stages within an entire sequence of document image analysis methods (from document image enhancement to<br />

layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation<br />

framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections,<br />

binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation<br />

of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications<br />

such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition<br />

series.<br />

MoBT7 Dolmabahçe Hall C<br />

Computer Aided Detection and Diagnosis Regular Session<br />

Session chair: Unal, Gozde (Sabanci Univ.)<br />



15:30-15:50, Paper MoBT7.1<br />

Dyslexia Diagnostics by Centerline-Based Shape Analysis of the Corpus Callosum<br />

Elnakib, Ahmed, Univ. of Louisville<br />

El-Baz, Ayman, Univ. of Auckland<br />

Casanova, Manuel, Univ. of Louisville<br />

Switala, Andrew, Univ. of Louisville<br />

Dyslexia severely impairs learning abilities, so that improved diagnostic methods are called for. Neuropathological studies<br />

have revealed abnormal anatomy of the Corpus Callosum (CC) in dyslexic brains. We explore the possibility of distinguishing<br />

between dyslexic and normal (control) brains by quantitative CC shape analysis in 3D magnetic resonance images (MRI).<br />

Our approach consists of three steps: (i) segmenting the CC from a given 3D MRI using the learned CC shape and<br />

visual appearance; (ii) extracting the centerline of the CC; and (iii) classifying the subject as dyslexic or normal based on<br />

the estimated length of the CC centerline using a k-nearest neighbor classifier. Experiments revealed significant differences<br />

(at the 95% confidence level) between the CC centerlines for 14 normal and 16 dyslexic subjects. Our initial classification<br />

suggests the proposed centerline-based shape analysis of the CC is a promising supplement to the current dyslexia diagnostics.<br />

15:50-16:10, Paper MoBT7.2<br />

A Probabilistic Information Fusion Approach to MR-Based Automated Diagnosis of Dementia<br />

Akgul, Ceyhun Burak, Vistek Machine Vision and Automation<br />

Ekin, Ahmet, Philips Res. Europe<br />

In this work, we present a probabilistic information fusion approach for the diagnosis of dementia from cross-sectional<br />

magnetic resonance (MR) images. The approach relies on first mapping the outputs of a support vector classifier (SVM)<br />

trained on image features to probabilities and then on combining these probabilities with the class-conditional distributions<br />

of neuropsychiatric test scores, such as the mini-mental state examination (MMSE). The SVM classifier is trained and<br />

tested on 121 subjects drawn from the Open Access Series of Imaging Studies (OASIS) database. Two independent sets<br />

of MMSE related statistics are estimated from data, one from the training set in OASIS and the other from the Alzheimer’s<br />

Disease Neuroimaging Initiative (ADNI) database. The probabilistic fusion of image-based SVM decisions with no visual<br />

MMSE information exhibits very steep receiver operating characteristic curves on the test set, giving, at the equal error<br />

rate operating point, 92% accuracy.<br />

16:10-16:30, Paper MoBT7.3<br />

Two-Level Algorithm for MCs Detection in Mammograms using Diverse-Adaboost-SVM<br />

Harirchi, Farshad, K. N. Toosi Univ. of Tech.<br />

Radparvar, Parham, K. N. Toosi Univ. of Tech.<br />

Abrishami Moghaddam, Hamid, K. N. Toosi Univ. of Tech.<br />

Dehghan, Faramarz, K. N. Toosi Univ. of Tech.<br />

Giti, Masoumeh, Tehran Univ. of Medical Sciences<br />

Clustered microcalcifications (MCs) are one of the early signs of breast cancer. In this paper, we propose a new computer<br />

aided diagnosis (CAD) system for automatic detection of MCs in two steps. First, pixels corresponding to potential microcalcifications<br />

are found using a multilayer feed-forward neural network. The input of this network consists of 4 wavelet<br />

and 2 gray-level features. The output of the network is then transformed to potential microcalcification objects using<br />

spatial 4-point connectivity. Second, we extract 25 features from the potential MC objects and use Diverse Adaboost SVM<br />

(DA-SVM) and 3 other classifiers to detect individual MCs. A free-response operating characteristic (FROC) curve is used<br />

to evaluate the performance of the CAD system. The 90.44% mean TP detection rate achieved at the cost of 1.043 FPs<br />

per image using DA-SVM shows a quite satisfactory detection performance of the CAD system.<br />

16:30-16:50, Paper MoBT7.4<br />

An Image Analysis Approach for Detecting Malignant Cells in Digitized H&E-Stained Histology Images of Follicular<br />

Lymphoma<br />

Sertel, Olcay, The Ohio State Univ.<br />

Catalyurek, Umit, The Ohio State Univ.<br />

Lozanski, Gerard, The Ohio State Univ.<br />

Shana’Ah, Arwa, The Ohio State Univ.<br />



Gurcan, Metin, The Ohio State Univ.<br />

The gold standard in follicular lymphoma (FL) diagnosis and prognosis is histopathological examination of tumor tissue<br />

samples. However, the qualitative manual evaluation is tedious and subject to considerable inter- and intra-reader variations.<br />

In this study, we propose an image analysis system for quantitative evaluation of digitized FL tissue slides. The developed<br />

system uses a robust feature space analysis method, namely the mean shift algorithm followed by a hierarchical grouping<br />

to segment a given tissue image into basic cytological components. We then apply further morphological operations to<br />

achieve the segmentation of individual cells. Finally, we generate a likelihood measure to detect candidate cancer cells<br />

using a set of clinically driven features. The proposed approach has been evaluated on a dataset consisting of 100 region<br />

of interest (ROI) images and achieves a promising 89% average accuracy in detecting target malignant cells.<br />

16:50-17:10, Paper MoBT7.5<br />

Microaneurysm (MA) Detection via Sparse Representation Classifier with MA and Non-MA Dictionary Learning<br />

Zhang, Bob, Univ. of Waterloo<br />

Zhang, Lei, The Hong Kong Pol. Univ.<br />

You, Jane, The Hong Kong Pol. Univ.<br />

Karray, Fakhri, Univ. of Waterloo<br />

Diabetic retinopathy (DR) is a common complication of diabetes that damages the retina and leads to sight loss if treated<br />

late. In its earliest stage, DR can be diagnosed by microaneurysms (MA). Although some algorithms have been developed,<br />

the accurate detection of MA in color retinal images is still a challenging problem. In this paper we propose a new method<br />

to detect MA based on Sparse Representation Classifier (SRC). We first roughly locate MA candidates by using multiscale<br />

Gaussian correlation filtering, and then classify these candidates with SRC. Particularly, two dictionaries, one for<br />

MA and one for non-MA, are learned from example MA and non-MA structures, and are used in the SRC process. Experimental<br />

results on the ROC database show that the proposed method can well distinguish MA from non-MA objects.<br />
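The sparse-representation classification step over the two dictionaries can be sketched as follows; dictionary learning is omitted, the MA and non-MA dictionaries are assumed to be given as columns of (roughly unit-norm) atoms, and the coder used here is plain OMP rather than the authors' exact solver:

```python
# Residual-based sparse representation classification over MA / non-MA dictionaries.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def src_label(x, D_ma, D_non, n_nonzero=10):
    """Code candidate patch x over the concatenated dictionary and assign the class
    whose atoms yield the smaller reconstruction residual."""
    D = np.hstack([D_ma, D_non])                      # columns are dictionary atoms
    coef = orthogonal_mp(D, x, n_nonzero_coefs=n_nonzero)
    c_ma, c_non = coef[:D_ma.shape[1]], coef[D_ma.shape[1]:]
    r_ma = np.linalg.norm(x - D_ma @ c_ma)
    r_non = np.linalg.norm(x - D_non @ c_non)
    return 'MA' if r_ma < r_non else 'non-MA'
```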

MoBT8 Lower Foyer<br />

Object Detection and Recognition; Performance Evaluation of Computer Vision Algorithms; Computer Vision<br />

Applications Poster Session<br />

Session chair: Chen, Chu-Song (Academia Sinica)<br />

15:00-17:10, Paper MoBT8.1<br />

A Neurobiologically Motivated Stochastic Method for Analysis of Human Activities in Video<br />

Sethi, Ricky, Univ. of California, Riverside<br />

Roy-Chowdhury, Amit, Univ. of California, Riverside<br />

In this paper, we develop a neurobiologically-motivated statistical method for video analysis that simultaneously searches<br />

the combined motion and form space in a concerted and efficient manner using well-known Markov chain Monte Carlo<br />

(MCMC) techniques. Specifically, we leverage upon an MCMC variant called the Hamiltonian Monte Carlo (HMC),<br />

which we extend to utilize data-based proposals rather than the blind proposals in a traditional HMC, thus creating the<br />

Data-Driven HMC (DDHMC). We demonstrate the efficacy of our system on real-life video sequences.<br />

15:00-17:10, Paper MoBT8.2<br />

Arbitrary Stereoscopic View Generation using Multiple Omnidirectional Image Sequences<br />

Hori, Maiya, Nara Inst. of Science and Tech.<br />

Kanbara, Masayuki, Nara Inst. of Science and Tech.<br />

Yokoya, Naokazu, Nara Inst. of Science and Tech.<br />

This paper proposes a novel method for generating arbitrary stereoscopic views from multiple omnidirectional image sequences.<br />

Although conventional methods for arbitrary view generation with an image-based rendering approach can create<br />

binocular views, positions and directions of viewpoints for stereoscopic vision are limited to a small range. In this research,<br />

we attempt to generate arbitrary stereoscopic views from omnidirectional image sequences that are captured in various<br />

multiple paths. To generate a high-quality stereoscopic view from a number of images captured at various viewpoints, appropriate<br />

ray information needs to be selected. In this paper, appropriate ray information is selected from a number of<br />

omnidirectional images using a penalty function expressed as ray similarity. In experiments, we show the validity of this<br />

penalty function by generating stereoscopic views from multiple real image sequences.<br />



15:00-17:10, Paper MoBT8.3<br />

Fast Odometry Integration in Local Bundle Adjustment-Based Visual SLAM<br />

Eudes, Alexandre, CEA LIST<br />

Lhuillier, Maxime, LASMEA<br />

Naudet Collette, Sylvie, CEA LIST, LVIC<br />

Dhome, Michel, Blaise Pascal Univ.<br />

The Simultaneous Localisation And Mapping (SLAM) for a camera moving in a scene is a long term research problem.<br />

Here we improve a recent visual SLAM which applies Local Bundle Adjustments (LBA) on selected key-frames of a<br />

video: we show how to correct the scale drift observed in long monocular video sequences using an additional odometry<br />

sensor. Our method and results are interesting for several reasons: (1) the pose accuracy is improved on real examples (2)<br />

we do not sacrifice the consistency between the reconstructed 3D points and image features to fit odometry data (3) the<br />

modification of the original visual SLAM method is not difficult.<br />

15:00-17:10, Paper MoBT8.4<br />

Classifying Textile Designs using Bags of Shapes<br />

Jia, Wei, Univ. of Dundee<br />

Mckenna, Stephen James, Univ. of Dundee<br />

The use of region shape descriptors was investigated for categorisation of textile design images. Images were segmented<br />

using MRF pixel labelling and the shapes of regions obtained were described with generic Fourier descriptors. Each image<br />

was represented as a bag of shapes. A simple yet competitive classification scheme based on nearest neighbour class-based<br />

matching was used. Classification performance was compared to that obtained when using bags of SIFT features.<br />

15:00-17:10, Paper MoBT8.5<br />

Driver Body-Height Prediction for an Ergonomically Optimized Ingress using a Single Omnidirectional Camera<br />

Scharfenberger, Christian, TU-Munich<br />

Chakraborty, Samarjit, TU-Munich<br />

Faerber, Georg, TU-Munich<br />

Maximizing passenger comfort is an important research topic in the domain of automotive systems engineering. In particular,<br />

an automatic adjustment of seat position according to driver height significantly increases the level of comfort<br />

during ingress. In this paper, we present a new method to estimate the height of approaching car drivers based on a single<br />

omnidirectional camera integrated with the side-view mirror of a car. Towards this, we propose mathematical descriptions<br />

of standard parking scenarios, allowing for an accurate height estimation. First, approaching drivers are extracted from<br />

image frames captured by the camera. Second, the scenario and height are initially estimated based on gathered samples<br />

of angles to head and foot-points of an approaching driver. An iterative optimization process removes outliers and refines<br />

the initially estimated scenario and height. Finally, we present a number of experimental results based on image sequences<br />

captured from real-life ingress scenarios.<br />

15:00-17:10, Paper MoBT8.6<br />

Torchlight Navigation<br />

Felsberg, Michael, Linköping Univ.<br />

Larsson, Fredrik, Linköping Univ.<br />

Wang, Han, Nanyang Tech. Univ.<br />

Ynnerman, Anders, Linköping Univ.<br />

Schön, Thomas, Linköping Univ.<br />

A common computer vision task is navigation and mapping. Many indoor navigation tasks require depth knowledge of<br />

flat, unstructured surfaces (walls, floor, ceiling). With passive illumination only, this is an ill-posed problem. Inspired by<br />

small children using a torchlight, we use a spotlight for active illumination. Using our torchlight approach, depth and orientation<br />

estimation of unstructured, flat surfaces boils down to estimation of ellipse parameters. The extraction of ellipses<br />

is very robust and requires little computational effort.<br />
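The ellipse-parameter extraction step mentioned above can be prototyped quickly with OpenCV; the threshold and the bright-spot segmentation below are my own assumptions, not the paper's procedure:

```python
# Threshold the bright spotlight region and fit an ellipse to its largest contour.
import cv2

def spot_ellipse(gray, thresh=200):
    """Return ((cx, cy), (major, minor), angle) of the fitted ellipse, or None."""
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    c = max(contours, key=cv2.contourArea)
    if len(c) < 5:                                  # fitEllipse needs >= 5 points
        return None
    return cv2.fitEllipse(c)
```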



15:00-17:10, Paper MoBT8.7<br />

Adaptive Image Projection Onto Non-Planar Screen using Projector-Camera Systems<br />

Yamanaka, Takashi, Nagoya Inst. of Tech.<br />

Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />

Sato, Jun, Nagoya Inst. of Tech.<br />

In this paper, we propose a method for projecting images onto non-planar screens by using projector-camera systems, eliminating<br />

distortion in projected images. In this system, point-to-point correspondences in a projector image and a camera<br />

image should be extracted. For finding correspondences, the epipolar geometry between a projector and a camera is used.<br />

By using a dynamic programming method on epipolar lines, correspondences between the projector image and camera image<br />

are obtained. Furthermore, in order to achieve faster and more robust matching, the non-planar screen is approximately<br />

represented by a B-spline surface. The small number of parameters for the B-spline surface are estimated from corresponding<br />

pixels on epipolar lines rapidly. Experimental results show the proposed method works well for projecting images<br />

onto non-planar screens.<br />

15:00-17:10, Paper MoBT8.8<br />

Analysis and Adaptation of Integration Time in PMD Camera for Visual Servoing<br />

Gil, Pablo, Univ. of Alicante<br />

Pomares, Jorge, Univ. of Alicante<br />

Torres, Fernando, Univ. of Alicante<br />

The depth perception in the objects of a scene can be useful for tracking or applying visual servoing in mobile systems. 3D<br />

time-of-flight (ToF) cameras provide range images which give measurements in real time to improve these types of tasks.<br />

However, the distance computed from these range images varies considerably with the integration time parameter. This paper<br />

presents an analysis for the online adaptation of integration time of ToF cameras. This online adaptation is necessary in order<br />

to capture the images under the best conditions irrespective of the changes in distance (between camera and objects) caused by<br />

the camera's movement when it is mounted on a robotic arm.<br />

15:00-17:10, Paper MoBT8.9<br />

Detecting Paper Fibre Cross Sections in Microtomy Images<br />

Kontschieder, Peter, Graz Univ. of Tech.<br />

Donoser, Michael, Graz Univ. of Tech.<br />

Kritzinger, Johannes, Graz Univ. of Tech.<br />

Bauer, Wolfgang, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

The goal of this work is the fully-automated detection of cellulose fibre cross sections in microtomy images. A lack of<br />

significant appearance information makes edges the only reliable cue for detection. We present a novel and highly discriminative<br />

edge fragment descriptor that represents angular relations between fragment points. We train a Random Forest<br />

with a plurality of these descriptors including their respective center votes. In such a way, the Random Forest exploits the<br />

knowledge about the object centroid for detection using a generalized Hough voting scheme. In the experiments we found<br />

that our method is able to robustly detect fibre cross sections in microtomy images and can therefore serve as initialization<br />

for successive fibre segmentation or tracking algorithms.<br />

15:00-17:10, Paper MoBT8.10<br />

Active Calibration of Camera-Projector Systems based on Planar Homography<br />

Park, Soon-Yong, Kyungpook National Univ.<br />

Park, Go Gwang, Kyungpook National Univ.<br />

This paper presents a simple and active calibration technique of camera-projector systems based on planar homography.<br />

From the camera image of a planar calibration pattern, we generate a projector image of the pattern through the homography<br />

between the camera and the projector. To determine the coordinates of the pattern corners from the view of the projector,<br />

we actively project a corner marker from the projector to align the marker with the printed pattern corners. Calibration is<br />

done in two steps. First, four outer corners of the pattern are identified. Second, all other inner corners are identified. The<br />

pattern image from the projector is then used to calibrate the projector. Experimental results of two types of camera-projector<br />

systems show that the projection errors of both camera and projector are less than 1 pixel.<br />
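A minimal sketch of the homography step, assuming point correspondences between the camera view of the pattern and the projector image (e.g. from the aligned corner markers described above) are already available:

```python
# Camera-to-projector homography from planar correspondences, and corner mapping.
import cv2
import numpy as np

def camera_to_projector_homography(cam_pts, proj_pts):
    """Estimate the planar homography mapping camera pixels to projector pixels."""
    H, _ = cv2.findHomography(np.float32(cam_pts), np.float32(proj_pts), cv2.RANSAC, 2.0)
    return H

def map_to_projector(H, cam_pts):
    """Map pattern corners detected in the camera image into projector coordinates."""
    pts = np.float32(cam_pts).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```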



15:00-17:10, Paper MoBT8.11<br />

Abnormal Traffic Detection using Intelligent Driver Model<br />

Sultani, Waqas, Seoul National Univ.<br />

Choi, Jin Young, Seoul National Univ.<br />

We present a novel approach for detecting and localizing abnormal traffic using intelligent driver model. Specifically, we<br />

advect particles over video sequence. By treating each particle as a car, we compute driver behavior using intelligent driver<br />

model. The behaviors are learned using latent Dirichlet allocation, and frames are classified as abnormal using a likelihood<br />

threshold criterion. In order to localize the abnormality, we compute spatial gradients of behaviors and construct a Finite<br />

Time Lyapunov Field. Finally, the region of abnormality is segmented using the watershed algorithm. The effectiveness of<br />

proposed approach is validated using videos from stock footage websites.<br />

15:00-17:10, Paper MoBT8.12<br />

Detection of Moving Objects with Removal of Cast Shadows and Periodic Changes using Stereo Vision<br />

Moro, Alessandro, Univ. of Trieste<br />

Terabayashi, Kenji, Chuo Univ.<br />

Umeda, Kazunori, Chuo Univ.<br />

In this paper we present a method for the detection of moving objects for unknown and generic environments under cast<br />

shadow and periodic movements of non-relevant objects (like waving leaves), using a combination of non-parametric<br />

thresholding algorithms and local cast shadow analysis with stereo camera information. Good detection rates were achieved<br />

in several environments under different lighting conditions, and objects could be detected independently of scene illumination,<br />

shadow, and periodic changes.<br />

15:00-17:10, Paper MoBT8.13<br />

Localized Image Matte Evaluation by Gradient Correlation<br />

Yao, Guilin, Harbin Inst. of Tech.<br />

Yao, Hongxun, Harbin Inst. of Tech.<br />

In natural image matting, various kinds of algorithms have been recently proposed. Moreover, alpha matting results have<br />

also been generated for comparison and composition into new backgrounds. However, all these methods have to make an<br />

alpha matte comparison to the ground truth so that one can get the final pixel-wise evaluation of these results. Nevertheless,<br />

when the input datasets are just used for testing and there are no ground-truth mattes, it is not possible to perform comparisons<br />

and to generate the quantitative comparison results. In this paper we combine the two ideas above and propose a<br />

new pixel-wise alpha matte evaluation method. This approach is based on using local windows to measure gradient correlation<br />

between image and the matte. An optimal image channel minimizing the image variance is also selected at each<br />

window in order to perform the correlation more correctly. Experimental results show that our system can generate a precise<br />

evaluation result for each pixel of each matte without ground truth.<br />

15:00-17:10, Paper MoBT8.14<br />

Multiple Plane Detection in Image Pairs using J-Linkage<br />

Fouhey, David Ford, Middlebury Coll.<br />

Scharstein, Daniel, Middlebury Coll.<br />

Briggs, Amy, Middlebury Coll.<br />

We present a new method for the robust detection and matching of multiple planes in pairs of images. Such planes can<br />

serve as stable landmarks for vision-based urban navigation. Our approach starts from SIFT matches and generates multiple<br />

local homography hypotheses using the recent J-linkage technique by Toldo and Fusiello, a robust randomized multimodel<br />

estimation algorithm. These hypotheses are then globally merged, spatially analyzed, robustly fitted, and checked<br />

for stability. When tested on more than 30,000 image pairs taken from panoramic views of a college campus, our method<br />

yields no false positives and recovers 72% of the matchable building walls identified by a human, despite significant occlusions<br />

and viewpoint changes.<br />
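The multi-model idea (several homography hypotheses extracted from one set of SIFT matches) can be illustrated with a simpler stand-in; the sketch below is not J-linkage but sequential RANSAC, which repeatedly fits a homography and removes its inliers:

```python
# Sequential-RANSAC stand-in for extracting several planar models from matched points.
import cv2
import numpy as np

def sequential_homographies(pts1, pts2, min_inliers=20, reproj_thresh=3.0, max_planes=5):
    pts1, pts2 = np.float32(pts1), np.float32(pts2)
    planes, idx = [], np.arange(len(pts1))
    while len(idx) >= min_inliers and len(planes) < max_planes:
        H, mask = cv2.findHomography(pts1[idx], pts2[idx], cv2.RANSAC, reproj_thresh)
        if H is None:
            break
        inliers = mask.ravel().astype(bool)
        if inliers.sum() < min_inliers:
            break
        planes.append((H, idx[inliers]))
        idx = idx[~inliers]         # look for the next plane among the remaining matches
    return planes
```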



15:00-17:10, Paper MoBT8.15<br />

Contextual Features for Head Pose Estimation in Football Games<br />

Launila, Andreas, Royal Inst. of Tech. (KTH)<br />

Sullivan, Josephine, Royal Inst. of Tech. (KTH)<br />

We explore the benefits of using contextual features for head pose estimation in football games. Contextual features are<br />

derived from knowledge of the position of all players and combined with image based features derived from low-resolution<br />

footage. Using feature selection and combination techniques, we show that contextual features can aid head pose estimation<br />

in football games and potentially be an important complement to the image based features traditionally used.<br />

15:00-17:10, Paper MoBT8.16<br />

Coarse-To-Fine Multiclass Nested Cascades for Object Detection<br />

Verschae, Rodrigo, Univ. de Chile<br />

Ruiz-Del-Solar, Javier, Univ. de Chile<br />

Building robust and fast object detection systems is an important goal of computer vision. A problem arises when several<br />

object types are to be detected, because the computational burden of running several specific classifiers in parallel becomes<br />

significant. In addition, the accuracy and the training time can be greatly affected. Seeking to provide a solution to these<br />

problems, we extend cascade classifiers to the multiclass case by proposing the use of multiclass coarse-to-fine (CTF)<br />

nested cascades. The presented results show that the proposed system scales well with the number of classes, both at<br />

training and running time.<br />

15:00-17:10, Paper MoBT8.17<br />

Visual SLAM with an Omnidirectional Camera<br />

Rituerto, Alejandro, Univ. de Zaragoza<br />

Puig, Luis, Univ. de Zaragoza<br />

Guerrero, Jose J., Univ. de Zaragoza<br />

In this work we integrate the Spherical Camera Model for catadioptric systems in a Visual-SLAM application. The Spherical<br />

Camera Model is a projection model that unifies central catadioptric and conventional cameras. To integrate this model<br />

into the Extended Kalman Filter-based SLAM, we need to linearize the direct and the inverse projection. We have performed<br />

an initial experimentation with omnidirectional and conventional real sequences including challenging trajectories.<br />

The results confirm that the omnidirectional camera gives much better orientation accuracy, improving the estimated camera<br />

trajectory.<br />

15:00-17:10, Paper MoBT8.18<br />

Shape Index SIFT: Range Image Recognition using Local Features<br />

Bayramoglu, Neslihan, Middle East Tech. Univ.<br />

Alatan, A. Aydin, Middle East Tech. Univ.<br />

Range image recognition has gained importance in recent years due to the developments in acquiring, displaying, and storing<br />

such data. In this paper, we present a novel method for matching range surfaces. Our method utilizes local surface properties<br />

and represents the geometry of local regions efficiently. Integrating the Scale Invariant Feature Transform (SIFT) with the<br />

shape index (SI) representation of the range images allows matching of surfaces with different scales and orientations. We<br />

apply the method to scaled, rotated, and occluded range images and demonstrate its effectiveness by comparison with<br />

previous studies.<br />
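A minimal sketch of one commonly used shape-index mapping may clarify the SI representation mentioned above; the exact convention (range, sign) is an assumption here, and the principal curvature maps are taken as given.<br />

```python
# Hedged sketch: shape index from principal curvature maps of a range image.
# Assumes k1 >= k2 have already been estimated (e.g. from local surface fits);
# the [0, 1] convention below is one common choice, not necessarily the paper's.
import numpy as np

def shape_index(k1, k2):
    """Shape index in [0, 1]; 0.5 corresponds to saddle-like patches.
    Planar points (k1 == k2 == 0) are undefined and simply map to 0.5 here."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

# toy 4x4 patch of curvatures
si = shape_index(np.full((4, 4), 0.8), np.full((4, 4), 0.2))
print(si[0, 0])
```

SIFT-style keypoints and descriptors would then be computed on such an SI image rather than on the raw depth values.<br />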

15:00-17:10, Paper MoBT8.19<br />

Windows Detection using K-Means in CIE-Lab Color Space<br />

Recky, Michal, ICG TU Graz<br />

Leberl, Franz, ICG TU Graz<br />

In this paper, we present a method for window detection, robust enough to process complex facades of historical buildings.<br />

This method is able to provide results even for facades under severe perspective distortion. Our algorithm is able to detect<br />

many different window types and does not require a learning step. We achieve these features thanks to an extended gradient<br />

projection method and the introduction into the process of a color descriptor based on k-means clustering in the CIE-Lab<br />

color space. This method is an important step towards creating large 3D city models in an automated workflow from large online<br />

image databases or industrial systems. As such, it was designed to provide a high level of robustness for processing<br />

a large variety of façade types.<br />
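A minimal sketch of the k-means colour-clustering step suggested above (an illustration only, not the authors' implementation); the RGB-to-Lab conversion is assumed to be done beforehand, e.g. with skimage.color.rgb2lab.<br />

```python
# Hedged sketch: cluster facade pixels with k-means in CIE-Lab space so that
# perceptually similar colours (e.g. window panes vs. wall) share a cluster.
# `lab_pixels` is an N x 3 array of already-converted Lab values.
import numpy as np

def kmeans_lab(lab_pixels, k=4, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = lab_pixels[rng.choice(len(lab_pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each pixel to the nearest cluster centre (Euclidean in Lab)
        d = np.linalg.norm(lab_pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centres; keep the old centre if a cluster goes empty
        for j in range(k):
            if np.any(labels == j):
                centers[j] = lab_pixels[labels == j].mean(axis=0)
    return labels, centers

# toy example: 200 random "Lab" pixels
labels, centers = kmeans_lab(np.random.rand(200, 3) * 100.0, k=4)
```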

15:00-17:10, Paper MoBT8.20<br />

Robust Figure Extraction on Textured Background: A Game-Theoretic Approach<br />

Albarelli, Andrea, Univ. Ca’ Foscari Venezia<br />

Rodolà, Emanuele, Univ. Ca’ Foscari Venezia<br />

Cavallarin, Alberto, Univ. Ca’ Foscari Venezia<br />

Torsello, Andrea, Univ. Ca’ Foscari Venezia<br />

Feature-based image matching relies on the assumption that the features contained in the model are distinctive enough. When<br />

both model and data present a sizeable amount of clutter, the signal-to-noise ratio falls and the detection becomes more challenging.<br />

If such clutter exhibits a coherent structure, as it is the case for textured background, matching becomes even harder.<br />

In fact, the large amount of repeatable features extracted from the texture dims the strength of the relatively few interesting<br />

points of the object itself. In this paper we introduce a game-theoretic approach that allows foreground features to be distinguished<br />

from background ones. In addition, the same technique can be used to deal with the object matching itself. The whole procedure<br />

is validated by applying it to a practical scenario and by comparing it with a standard point-pattern matching technique.<br />

15:00-17:10, Paper MoBT8.21<br />

Image Retrieval of First-Person Vision for Pedestrian Navigation in Urban Area<br />

Kameda, Yoshinari, Univ. of Tsukuba<br />

Ohta, Yuich, Univ. of Tsukuba<br />

We propose a new computer vision approach to locate a walking pedestrian by a camera image of first-person vision in practical<br />

situations. We assume reference points have been registered with other first-person vision images. We utilize SURF and<br />

define seven matching criteria derived from the properties of first-person vision so that false matches are rejected. We have<br />

implemented a preliminary system that can respond to a query within half a second for a path approximately 1 km long<br />

in a downtown area of Tokyo where pedestrians and vehicles are always present in the images.<br />

15:00-17:10, Paper MoBT8.22<br />

Unexpected Human Behavior Recognition in Image Sequences using Multiple Features<br />

Zweng, Andreas, Vienna Univ. of Tech.<br />

Kampel, Martin, Vienna Univ. of Tech.<br />

This paper presents a novel approach for unexpected behavior recognition in image sequences with attention to high density<br />

crowd scenes. Due to occlusions, object-tracking in such scenes is challenging and in cases of low resolution or poor image<br />

quality it is not robust enough to efficiently detect abnormal behavior. The wide variety of possible actions performed by<br />

humans and the problem of occlusions make action recognition unsuitable for behavior recognition in high-density crowd<br />

scenes. The novel approach presented in this paper uses features based on motion information instead of detecting<br />

actions or events in order to detect abnormality. Experiments demonstrate the potential of the approach.<br />

15:00-17:10, Paper MoBT8.23<br />

Object Recognition based on N-Gram Expression of Human Actions<br />

Kojima, Atsuhiro, Osaka Prefecture Univ.<br />

Miki, Hiroshi, Osaka Prefecture Univ.<br />

Kise, Koichi, Osaka Prefecture Univ.<br />

In this paper, we propose a novel method for recognizing objects by observing human actions based on bag-of-features. The<br />

key contribution of our method is that human actions are represented as n-grams of symbols and used to identify specific<br />

object categories. First, features of human actions taken on an object are extracted from video images and encoded as symbols.<br />

Then, n-grams are generated from the sequence of symbols and registered for the corresponding object category. In the recognition<br />

phase, actions taken on the object are converted into a set of n-grams in the same way and compared with those representing<br />

object categories. We performed experiments to recognize objects in an office environment and confirmed the effectiveness<br />

of our method.<br />
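A hedged sketch of the n-gram idea described above; the symbol alphabet, the value of n and the overlap score are illustrative assumptions, not the authors' choices.<br />

```python
# Hedged sketch: an action sequence is encoded as symbols, split into n-grams,
# and object categories are compared by the overlap of their n-gram sets.
def ngrams(symbols, n=3):
    """All length-n subsequences of a symbol sequence, e.g. ABCD -> ABC, BCD."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]

def ngram_similarity(seq_a, seq_b, n=3):
    """Simple overlap score between two symbol sequences (Jaccard on n-grams)."""
    a, b = set(ngrams(seq_a, n)), set(ngrams(seq_b, n))
    return len(a & b) / max(len(a | b), 1)

# toy example: a query action sequence vs. one registered for a category
query = "GRASP LIFT TILT POUR PUT".split()
registered = "GRASP LIFT POUR TILT PUT".split()
print(ngram_similarity(query, registered, n=2))
```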



15:00-17:10, Paper MoBT8.24 CANCELED<br />

Image Feature Associations via Local Semantic Structure<br />

Parrish, Nicholas, Colorado State Univ.<br />

Draper, Bruce A., Colorado State Univ.<br />

Most research in object recognition suffers from two distinct weaknesses that limit its effectiveness in natural environments.<br />

First, it tends to rely on labeled training images to learn object models. Second, it tends to assume that the goal is<br />

to recognize a single, dominant foreground object. This paper presents a different method of object recognition that learns<br />

to recognize objects in natural scenes without supervision. The approach uses semantic co-occurrence information of local<br />

image features to form object models (called percepts) from groups of image features. These percepts are used to recognize<br />

objects in novel images. It will be shown that this approach is capable of learning object categories without supervision,<br />

and of recognizing objects in complex multi-object scenes. It will also be shown that it outperforms nearest-neighbor<br />

scene recognition.<br />

15:00-17:10, Paper MoBT8.25<br />

Unifying Approach for Fast License Plate Localization and Super-Resolution<br />

Nguyen, Chu Duc, Ec. Centrale de Lyon<br />

Ardabilian, Mohsen, Ec. Centrale de Lyon<br />

Chen, Liming, Ec. Centrale de Lyon<br />

This paper addresses the localization and super-resolution of license plates in a unifying approach. A higher-quality license<br />

plate can be obtained by applying super-resolution to successive lower-resolution plate images. All existing methods assume that<br />

plate zones are correctly extracted from every frame. However, accurate localization requires sufficient image quality,<br />

which is not always available in real video. Super-resolution on all pixels is a possible but much more time-consuming alternative.<br />

We propose a framework which successfully interlaces these two modules. First, coarse candidates are found by a weak<br />

but fast license plate detector based on edge-map sub-sampling. Then, an improved fast MAP-based super-resolution, using<br />

local phase-accurate registration and an edge-preserving prior, is applied to these regions of interest. Finally, our robust ICHT-based<br />

localizer rejects false alarms and localizes the high-resolution license plate more accurately. Experiments<br />

conducted on synthetic and real data demonstrate the robustness of our approach and its real-time potential.<br />

15:00-17:10, Paper MoBT8.26<br />

Dimensionality Reduction for Distributed Vision Systems using Random Projection<br />

Sulic, Vildana, Univ. of Ljubljana<br />

Pers, Janez, Univ. of Ljubljana<br />

Kristan, Matej, Univ. of Ljubljana<br />

Kovacic, Stanislav, Univ. of Ljubljana<br />

Dimensionality reduction is an important issue in the context of distributed vision systems. Processing of dimensionality<br />

reduced data requires far less network resources (e.g., storage space, network bandwidth) than processing of original data.<br />

In this paper we explore the performance of the random projection method for distributed smart cameras. In our tests, random<br />

projection is compared to principal component analysis in terms of recognition efficiency (i.e., object recognition).<br />

The results obtained on the COIL-20 image data set show good performance of the random projection in comparison to<br />

the principal component analysis, which requires distribution of a subspace and therefore consumes more resources of the<br />

network. This indicates that the random projection method can elegantly solve the problem of subspace distribution in embedded<br />

and distributed vision systems. Moreover, even without explicit orthogonalization or normalization of the random<br />

projection subspace, the method achieves good object recognition efficiency.<br />
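A minimal sketch of the random projection step discussed above (dimensions and seed are illustrative assumptions); because every camera can regenerate the same projection matrix from a shared seed, no subspace has to be distributed over the network.<br />

```python
# Hedged sketch of random projection for distributed smart cameras
# (not the authors' code).
import numpy as np

def random_projection_matrix(d, k, seed=42):
    # Gaussian entries; scaling by 1/sqrt(k) roughly preserves distances
    rng = np.random.default_rng(seed)
    return rng.standard_normal((k, d)) / np.sqrt(k)

d, k = 4096, 64                      # original and reduced dimensionality
R = random_projection_matrix(d, k)   # identical on every camera (same seed)
x = np.random.rand(d)                # e.g. a vectorised object appearance
y = R @ x                            # 64-dimensional descriptor to transmit
```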

15:00-17:10, Paper MoBT8.27<br />

Sensor Fusion for Cooperative Head Localization<br />

Del Bimbo, Alberto, Univ. of Florence<br />

Dini, Fabrizio, Univ. of Florence<br />

Lisanti, Giuseppe, Univ. of Florence<br />

Pernici, Federico, Univ. of Florence<br />

In modern video surveillance systems, pan-tilt-zoom (PTZ) cameras certainly have the potential to allow the coverage of<br />

wide areas with a much smaller number of sensors, compared to the common approach of fixed camera networks. This<br />

paper describes a general framework that aims at exploiting the capabilities of modern PTZ cameras in order to acquire<br />

high-resolution images of body parts, such as the head, from the observation of pedestrians moving in a wide outdoor<br />

area. The framework allows the sensors to be organized in a network with arbitrary topology, and pairwise<br />

master-slave relationships to be established between them. In this way a slave camera can be steered to acquire imagery of a target, taking<br />

into account both target and zooming uncertainties. Experiments show good performance in localizing targets' heads, independently<br />

of the zoom factor of the slave camera.<br />

15:00-17:10, Paper MoBT8.28<br />

Shared Random Ferns for Efficient Detection of Multiple Categories<br />

Villamizar Vergel, Michael, CSIC-UPC<br />

Moreno-Noguer, Francesc, CSIC-UPC<br />

Andrade Cetto, Juan, CSIC-UPC<br />

Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />

We propose a new algorithm for detecting multiple object categories that exploits the fact that different categories may<br />

share common features but with different geometric distributions. This yields an efficient detector which, in contrast to<br />

existing approaches, considerably reduces the computation cost at runtime, where the feature computation step is traditionally<br />

the most expensive. More specifically, at the learning stage we compute common features by applying the same<br />

Random Ferns over the Histograms of Oriented Gradients on the training images. We then apply a boosting step to build<br />

discriminative weak classifiers, and learn the specific geometric distribution of the Random Ferns for each class. At<br />

runtime, only a few Random Ferns have to be densely computed over each input image, and their geometric distribution<br />

allows performing the detection. The proposed method has been validated on public datasets, achieving competitive detection<br />

results, which are comparable with state-of-the-art methods that use specific features per class.<br />
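A hedged sketch of a single random fern evaluated over a HOG-like feature vector, to illustrate how cheap the shared feature computation is; the shared-fern boosting and per-class geometric distributions of the paper are not reproduced, and all names and sizes below are assumptions.<br />

```python
# Hedged sketch: one random fern = a fixed set of binary comparisons that
# turns a feature vector into an integer leaf index, which would address a
# per-class probability table learned from training data.
import numpy as np

def make_fern(dim, n_tests=8, seed=0):
    """Random pairs of feature indices; one fern = n_tests binary comparisons."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, dim, size=(n_tests, 2))

def fern_index(x, fern):
    """Turn a feature vector into an integer leaf index in [0, 2**n_tests)."""
    bits = (x[fern[:, 0]] > x[fern[:, 1]]).astype(np.uint64)
    return int((bits << np.arange(len(bits), dtype=np.uint64)).sum())

hog = np.random.rand(36)         # a toy HOG block (36 orientation-bin values)
fern = make_fern(dim=36)
leaf = fern_index(hog, fern)     # index into a per-class probability table
```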

15:00-17:10, Paper MoBT8.29<br />

Age Recognition in the Wild<br />

Bauckhage, Christian, Fraunhofer IAIS<br />

Jahanbekam, Amirhossein, Fraunhofer IAIS<br />

Thurau, Christian, Fraunhofer IAIS<br />

In this paper, we present a novel approach to age recognition from facial images. The method we propose combines<br />

several established features in order to characterize facial traits and aging patterns. Since we explicitly consider<br />

age recognition in the wild, i.e. vast amounts of unconstrained Internet images, the methods we employ are tailored towards<br />

speed and efficiency. For evaluation, we test different classifiers on common benchmark data and a new data set of unconstrained<br />

images harvested from the Internet. Extensive experimental evaluation shows state-of-the-art performance on<br />

the benchmarks, very high accuracy for the novel data set, and superior runtime performance; to our knowledge, this is<br />

the first time that automatic age recognition is carried out on a large Internet data set.<br />

15:00-17:10, Paper MoBT8.30<br />

EKF-SLAM and Machine Learning Techniques for Visual Robot Navigation<br />

Casarrubias-Vargas, Heriberto, CINVESTAV<br />

Petrilli-Barceló, Alberto E., CINVESTAV<br />

Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />

In this work we propose the use of machine learning techniques to improve Simultaneous Localization and Mapping<br />

(SLAM) using an extended Kalman filter (EKF) and visual information for robot navigation. We use the Viola and<br />

Jones approach to look for specific visual landmarks in the environment. The landmarks are used to improve the robot localization<br />

in the EKF-SLAM system. Our experiments validate the efficiency of our algorithm.<br />

15:00-17:10, Paper MoBT8.31<br />

Boosting Clusters of Samples for Sequence Matching in Camera Networks<br />

Takala, Valtteri, Univ. of Oulu<br />

Cai, Yinghao, Univ. of Oulu<br />

Pietikäinen, Matti, Univ. of Oulu<br />

This study introduces a novel classification algorithm for learning and matching sequences in view-independent object<br />

tracking. The proposed learning method uses adaptive boosting and classification trees on a wide collection (shape, pose,<br />

color, texture, etc.) of image features that constitute a model for tracked objects. The temporal dimension is taken into account<br />

by using k-means clusters of sequence samples. Most of the utilized object descriptors also have a temporal quality.<br />

We argue that with a proper boosting approach and a decent number of reasonably descriptive image features it is feasible<br />

to do view-independent sequence matching in sparse camera networks. The experiments on real-life surveillance data support<br />

this statement.<br />

15:00-17:10, Paper MoBT8.32<br />

Saliency Detection and Object Localization in Indoor Environments<br />

Rudinac, Maja, Delft Univ. of Tech.<br />

Jonker, Pieter, Delft Univ. of Tech.<br />

In this paper we present a scene exploration method for the identification of interest regions in unknown indoor environments<br />

and the position estimation of the objects located in those regions. Our method consists of two stages: First, we<br />

generate a saliency map of the scene based on the spectral residual of three color channels and interest points are detected<br />

in this map. Second, we propose and evaluate a method for the clustering of neighboring interest regions, the rejection of<br />

outliers and the estimation of the positions of potential objects. Once the location of objects in the scene is known, recognition<br />

of objects/object classes can be performed or the locations can be used for grasping the object. The main contribution<br />

of this paper lies in a computationally inexpensive method for the localization of multiple salient objects in a scene. The<br />

performance obtained on a dataset of indoor scenes shows that our method performs well, is very fast and hence highly<br />

suitable for real-world applications, such as mobile robots and surveillance.<br />

15:00-17:10, Paper MoBT8.33<br />

Bubble Tag Identification using an Invariant–Under–Perspective Signature<br />

Patraucean, Viorica, Univ. of Toulouse<br />

Gurdjos, Pierre, Univ. of Toulouse<br />

Conter, Jean, Univ. of Toulouse<br />

We have at our disposal a large database containing images of various configurations of coplanar circles, randomly laid out,<br />

called Bubble Tags. The images are taken from different viewpoints. Given a new image (query image), the goal is to<br />

find in the database the image containing the same bubble tag as the query image. We propose representing the images<br />

through projective invariant signatures which allow identifying the bubble tag without passing through a Euclidean reconstruction<br />

step. This is justified by the size of the database, which imposes the use of queries in 1D/vectorial form, i.e.<br />

not in 2D/matrix form. The experiments carried out confirm the efficiency of our approach, in terms of precision and complexity.<br />

15:00-17:10, Paper MoBT8.35<br />

The Role of Polarity in Haar-Like Features for Face Detection<br />

Landesa-Vázquez, Iago, Univ. de Vigo<br />

Alba Castro, Jose Luis, Univ. of Vigo<br />

Human vision is primarily based on local contrast perception and its polarity. Viola and Jones proposed, in their well-known<br />

face detector framework, a boosted cascade of weak classifiers based on Haar-like features which encode local<br />

contrast and polarity information. Nevertheless contrast polarity invariance, which is not directly modeled in their framework,<br />

has been shown to be perceptually relevant for the human capability of detecting faces. In this paper we study, from<br />

both algorithmical and perceptual points of view, the effect of enhancing Haar-like features with polarity invariance and<br />

how it may improve cascaded classifiers.<br />

15:00-17:10, Paper MoBT8.36<br />

A Human Detection Framework for Heavy Machinery<br />

Heimonen, Teuvo Antero, Univ. of Oulu<br />

Heikkilä, Janne, Univ. of Oulu<br />

A stereo-camera-based human detection framework for heavy machinery is proposed. The framework allows easy integration<br />

of different human detection and image segmentation methods. This integration is essential for diverse and challenging<br />

work machine environments, in which traditional human detection approaches based on a single detector have been found to be<br />

insufficient. The framework is based on the idea of pixel-wise human probabilities, which are obtained by several separate<br />

detection trials following a binomial distribution. The framework has been evaluated with extensive image sequences of<br />

authentic work machine environments, and it has proven to be feasible. Promising detection performance was achieved<br />

by utilizing publicly available human detectors.<br />

15:00-17:10, Paper MoBT8.37<br />

Building a Videorama with Shallow Depth of Field<br />

Bae, Soonmin, Boston Coll.<br />

Jiang, Hao, Boston Coll.<br />

This paper presents a new automatic approach to building a videorama with shallow depth of field. We stitch the static background<br />

of video frames and render the dynamic foreground onto the enlarged background after foreground/background segmentation.<br />

To this end, we extract the depth information from a two-view video stream. We show that the depth cues combined<br />

with color cues improve segmentation. Finally, we use the depth cues to synthesize the shallow depth of field effects in the<br />

final videorama. Our approach stabilizes the camera motion as if the video was captured from a static camera and improves<br />

the visual quality with the increased field of view and shallow depth of field effects.<br />

15:00-17:10, Paper MoBT8.38<br />

Fast Training of Object Detection using Stochastic Gradient Descent<br />

Wijnhoven, Rob, ViNotion BV<br />

De With, Peter H. N., Eindhoven Univ. of Tech. / CycloMedia<br />

Training datasets for object detection problems are typically very large and Support Vector Machine (SVM) implementations<br />

are computationally complex. As opposed to these complex techniques, we use Stochastic Gradient Descent (SGD) algorithms<br />

that use only a single new training sample in each iteration and process samples in a stream-like fashion. We have incorporated<br />

SGD optimization in an object detection framework. The object detection problem is typically highly asymmetric, because<br />

of the limited variation in object appearance, compared to the background. Incorporating SGD speeds up the optimization<br />

process significantly, requiring only a single iteration over the training set to obtain results comparable to state-of-the-art<br />

SVM techniques. SGD optimization is linearly scalable in time and the obtained speedup in computation time is two to three<br />

orders of magnitude. We show that by considering only part of the total training set, SGD converges quickly to the overall<br />

optimum.<br />
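A minimal sketch of stochastic gradient descent on a hinge-loss (linear SVM) objective, processing one sample at a time as described above; the learning-rate schedule and toy data are assumptions rather than the authors' setup.<br />

```python
# Hedged sketch: single-pass SGD on the regularised hinge loss.
import numpy as np

def sgd_linear_svm(X, y, lam=1e-4, epochs=1):
    """y must be in {-1, +1}; returns a weight vector w."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing learning rate
            margin = y[i] * (w @ X[i])
            w *= (1.0 - eta * lam)             # shrink (regularisation step)
            if margin < 1.0:                   # hinge loss is active
                w += eta * y[i] * X[i]
    return w

# toy example with 2-D separable data
X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])
w = sgd_linear_svm(X, y, epochs=1)
```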

15:00-17:10, Paper MoBT8.39<br />

Assessing Water Quality by Video Monitoring Fish Swimming Behavior<br />

Serra-Toro, Carlos, Univ. Jaume I<br />

Montoliu, Raúl, Univ. Jaume I<br />

Traver, V. Javier, Univ. Jaume I<br />

Hurtado-Melgar, Isabel M., Univ. Jaume I<br />

Núñez-Redó, Manuela, Univ. Jaume I<br />

Cascales, Pablo, Univ. Jaume I<br />

Animals are known to alter their behavior in response to changes in their environments. Therefore, automatic visual monitoring<br />

of animal behavior is currently of great interest because of its many applications. In this paper, a video-based system<br />

is proposed for analyzing the swimming patterns of fishes so that the presence of toxins in the water can be inferred. This<br />

problem is challenging, among other reasons, because how fishes react when swimming in contaminated water is neither<br />

really known nor well defined. A novel use of recurrence plots is proposed, and very compact and simple descriptors based<br />

on this recurrence representation are found to be highly discriminative between videos of fishes in clean and polluted water.<br />
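A minimal sketch of a recurrence plot computed from a fish trajectory, to illustrate the representation mentioned above; the threshold, the trajectory and the summary statistic are illustrative assumptions, not the descriptors used in the paper.<br />

```python
# Hedged sketch: R[i, j] = 1 when the positions at times i and j are closer
# than eps, i.e. the trajectory revisits the same region of the tank.
import numpy as np

def recurrence_plot(traj, eps):
    """traj: T x 2 array of (x, y) positions; returns a T x T binary matrix."""
    d = np.linalg.norm(traj[:, None, :] - traj[None, :, :], axis=2)
    return (d < eps).astype(np.uint8)

# toy trajectory: a fish circling the tank
t = np.linspace(0, 4 * np.pi, 200)
traj = np.column_stack([np.cos(t), np.sin(t)])
R = recurrence_plot(traj, eps=0.2)
recurrence_rate = R.mean()           # one very simple summary statistic
```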

15:00-17:10, Paper MoBT8.40<br />

Detecting Wires in Cluttered Urban Scenes using a Gaussian Model<br />

Candamo, Joshua, Univ. of South Florida<br />

Goldgof, Dmitry, Univ. of South Florida<br />

Kasturi, Rangachar, Univ. of South Florida<br />

Godavarthy, Sridhar, Univ. of South Florida<br />



A novel wire detection algorithm for use by unmanned aerial vehicles (UAV) in low altitude urban reconnaissance is presented.<br />

This is of interest to urban search and rescue and military reconnaissance operations. Detection of wires plays an<br />

important role, because thin wires are hard to discern by tele-operators and automated systems. Our algorithm is based on<br />

identification of linear patterns in images. Most existing methods that search for linear patterns use a simple model of a<br />

line, which does not take into account the line surroundings. We propose the use of a robust Gaussian model to approximate<br />

the intensity profile of a line and its surroundings which allows effective discrimination of wires from other visually similar<br />

linear patterns. The algorithm is able to cope with highly cluttered urban backgrounds, moderate rain, and mist. Experimental<br />

results show a 17.7% detection improvement over the baseline.<br />
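A hedged sketch of the underlying idea: model the intensity profile perpendicular to a candidate line as a Gaussian bump on a locally constant background and judge wire-likeness by the fit residual; the parameterisation, the width grid and the scoring are assumptions, not the authors' algorithm.<br />

```python
# Hedged sketch: fit "background + Gaussian bump" to a perpendicular
# intensity profile; a small residual suggests a wire-like structure.
import numpy as np

def gaussian_profile(x, amp, mu, sigma, background):
    return background + amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def wire_score(profile, sigmas=(0.8, 1.2, 1.8, 2.5)):
    """Smallest residual over a grid of widths; lower = more wire-like."""
    x = np.arange(len(profile), dtype=float)
    mu = len(profile) / 2.0              # candidate line passes through centre
    best = np.inf
    for sigma in sigmas:
        g = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
        # closed-form least squares for amplitude and background given sigma
        A = np.column_stack([g, np.ones_like(g)])
        coef, *_ = np.linalg.lstsq(A, profile, rcond=None)
        resid = profile - A @ coef
        best = min(best, float(np.mean(resid ** 2)))
    return best

profile = gaussian_profile(np.arange(15.0), amp=40, mu=7, sigma=1.5, background=90)
print(wire_score(profile + np.random.randn(15)))   # small residual -> wire-like
```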

15:00-17:10, Paper MoBT8.41<br />

Abandoned Objects Detection based on Radial Reach Correlation of Double Illumination Invariant Foreground Masks<br />

Li, Xunli, Peking Univ.<br />

Zhang, Chao, Peking Univ.<br />

Zhang, Duo,<br />

This paper proposes an automatic and robust method to detect and recognize abandoned objects for video surveillance<br />

systems. Two Gaussian mixture models (a long-term and a short-term model) in the RGB color space are constructed to<br />

obtain two binary foreground masks. By refining the foreground masks through the Radial Reach Filter (RRF) method, the influence<br />

of illumination changes is greatly reduced. The height/width ratio and a linear SVM classifier based on the HOG (Histogram<br />

of Oriented Gradients) descriptor are also used to recognize left baggage. Tests on the PETS2006 and<br />

PETS2007 datasets and our own videos show that the proposed method can detect very small abandoned objects<br />

within low-quality surveillance videos, and it is also robust to varying illumination and dynamic backgrounds.<br />

15:00-17:10, Paper MoBT8.42<br />

Unsupervised Visual Object Categorisation via Self-Organisation<br />

Kinnunen, Juha Teemu Ensio, Lappeenranta Univ. of Tech.<br />

Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />

Lensu, Lasse, Lappeenranta Univ. of Tech.<br />

Kalviainen, Heikki, Lappeenranta Univ. of Tech.<br />

Visual object categorisation (VOC) has become one of the most actively investigated topics in computer vision. In the<br />

mainstream studies, the topic is considered as a supervised problem, but recently, the ultimate challenge has been posed:<br />

Unsupervised visual object categorisation. Hitherto only a few methods have been published, all of them being computationally<br />

demanding successors of their supervised counterparts. In this study, we address this problem with a simple and<br />

effective method: competitive learning leading to self organisation (self-categorisation). The unsupervised competitive<br />

learning approach is implemented using the Kohonen self-organising map algorithm (SOM). The SOM is used to perform<br />

both unsupervised codebook generation and object categorisation. We present our method in detail and compare results<br />

to the supervised approach.<br />

15:00-17:10, Paper MoBT8.43<br />

A Novel Shape Feature for Fast Region-Based Pedestrian Recognition<br />

Shahrokni, Ali, Univ. of Reading<br />

Gawley, Darren, Univ. of Adelaide<br />

Ferryman, James, Univ. of Reading<br />

A new class of shape features for region classification and high-level recognition is introduced. The novel Randomised<br />

Region Ray (RRR) features can be used to train binary decision trees for object category classification using an abstract<br />

representation of the scene. In particular we address the problem of human detection using an over-segmented input image.<br />

We therefore do not rely on pixel values for training; instead, we design and train specialised classifiers on the sparse set<br />

of semantic regions which compose the image. Thanks to the abstract nature of the input, the trained classifier has the potential<br />

to be fast and applicable to extreme imagery conditions. We demonstrate and evaluate its performance in people<br />

detection using a pedestrian dataset.<br />



15:00-17:10, Paper MoBT8.44<br />

Road Change Detection from Multi-Spectral Aerial Data<br />

Mancini, Adriano, Univ. Pol. Delle Marche<br />

Frontoni, Emanuele, Univ. Pol. Delle Marche<br />

Zingaretti, Primo, Univ. Pol. Delle Marche<br />

The paper presents a novel approach to automate the Change Detection (CD) problem for the specific task of road extraction.<br />

Manual approaches to CD fail in terms of the time needed to release updated maps; on the contrary, automatic approaches,<br />

based on machine learning and image processing techniques, allow large areas to be updated in a short time with an accuracy<br />

and precision comparable to those obtained by human operators. This work is focused on the road-graph update starting<br />

from aerial, multi-spectral data. Georeferenced ground data, acquired by a GPS and an inertial sensor, are integrated with<br />

aerial data to speed up the change detector. After road extraction by means of a binary AdaBoost classifier, the old road graph<br />

is updated by exploiting a particle filter. In particular, this filter proves very useful for linking (tracking) parts of roads not extracted<br />

by the classifier due to the presence of occlusions (e.g., shadows, trees).<br />

15:00-17:10, Paper MoBT8.45<br />

Object Recognition and Localization via Spatial Instance Embedding<br />

Ikizler Cinbis, Nazli, Boston Univ.<br />

Sclaroff, Stan, Boston Univ.<br />

We propose an approach for improving object recognition and localization using spatial kernels together with instance embedding.<br />

Our approach treats each image as a bag of instances (image features) within a multiple instance learning framework,<br />

where the relative locations of the instances are considered as well as the appearance similarity of the localized image features.<br />

The introduced spatial kernel augments the recognition power of the instance embedding in an intuitive and effective way,<br />

providing increased localization performance. We test our approach over two object datasets and present promising results.<br />

15:00-17:10, Paper MoBT8.46<br />

Co-Recognition of Actions in Video Pairs<br />

Shin, Young Min, Seoul National Univ.<br />

Cho, Minsu, Seoul National Univ.<br />

Lee, Kyoung Mu, Seoul National Univ.<br />

In this paper, we present a method that recognizes single or multiple common actions between a pair of video sequences.<br />

We establish an energy function that evaluates geometric and photometric consistency, and solve the action recognition<br />

problem by optimizing the energy function. The proposed stochastic inference algorithm based on the Monte Carlo method<br />

explores the video pair from the local spatio-temporal interest point matches to find the common actions. Our algorithm<br />

works in an unsupervised way without prior knowledge about the type and the number of common actions. Experiments<br />

show that our algorithm produces promising results on single and multiple action recognition.<br />

15:00-17:10, Paper MoBT8.47<br />

Detecting Moving Objects using a Camera on a Moving Platform<br />

Lin, Chung-Ching, Georgia Inst. of Tech.<br />

Wolf, Marilyn, Georgia Inst. of Tech.<br />

This paper proposes a new ego-motion estimation and background/foreground classification method to effectively segment<br />

moving objects from videos captured by a moving camera on a moving platform. Existing methods for moving-camera<br />

detection impose serious constraints. In our approach, an ellipsoid scene shape is applied in the motion model and a complicated<br />

ego-motion estimation formula is derived. A genetic algorithm is introduced to accurately solve for the ego-motion parameters.<br />

After motion recovery, the noisy result is refined by motion vector correlation and the foreground is classified by a pixel-level probability<br />

model. Experimental results show that the method achieves significant detection performance without further<br />

restrictions and performs effectively in complex detection environments.<br />



15:00-17:10, Paper MoBT8.48<br />

A Unified Probabilistic Approach to Feature Matching and Object Segmentation<br />

Kim, Tae Hoon, Seoul National Univ.<br />

Lee, Kyoung Mu, Seoul National Univ.<br />

Lee, Sang Uk, Seoul National Univ.<br />

This paper deals with feature matching and segmentation of common objects in a pair of images, simultaneously. For the<br />

feature matching problem, the matching likelihoods of all feature correspondences are obtained by combining their discriminative<br />

power with the spatial coherence constraint that favors their spatial aggregation via object segmentation. At<br />

the same time, for the object segmentation problem, our algorithm estimates the object likelihood that each subregion is<br />

a commonly existing part in two images by the affinity propagation of the resulting matching likelihoods. Since these two<br />

problems are related to each other, our main idea for solving them is to integrate all the priors about them into a unified framework<br />

that consists of several correlated quadratic cost functions. Eventually, all matching and object likelihoods are estimated<br />

simultaneously as the solution of a linear system of equations. Based on these likelihoods, we finally recover the optimal<br />

feature matches and the common object parts by imposing simple sequential mapping and thresholding techniques, respectively.<br />

The experiments demonstrate the superiority of our algorithm compared with the conventional methods.<br />

15:00-17:10, Paper MoBT8.49<br />

Automatic Restoration of Scratch in Old Archive<br />

Kim, Kyung-Tai, Konkuk Univ.<br />

Kim, Byunggeun, Konkuk Univ.<br />

Kim, Eun Yi, Konkuk Univ.<br />

This paper presents a scratch restoration method that can deal with scratches of various lengths and widths in old film. The<br />

proposed method consists of detection and reconstruction. The detection is performed using texture and shape properties<br />

of the scratches: first, each pixel is classified as scratch or non-scratch using a neural network (NN)-based texture<br />

classifier, and then some false alarms are removed by shape filtering. Thereafter, the detected region is reconstructed.<br />

Here, the reconstruction is formulated as an energy minimization problem, and a genetic algorithm is used for optimization.<br />

The experimental results with well-known old films show the effectiveness of the proposed method.<br />

15:00-17:10, Paper MoBT8.50<br />

Automatic Building Detection in Aerial Images using a Hierarchical Feature based Image Segmentation<br />

Izadi, Mohammad, Simon Fraser Univ.<br />

Saeedi, Parvaneh, Simon Fraser Univ.<br />

This paper introduces a novel automatic building detection method for aerial images. The proposed method incorporates<br />

a hierarchical multilayer feature based image segmentation technique using color. A number of geometrical/regional attributes<br />

are defined to identify potential regions in multiple layers of segmented images. A tree-based mechanism is utilized<br />

to inspect segmented regions using their spatial relationships with each other and their regional/geometrical characteristics.<br />

This process allows the creation of a set of candidate regions that are validated as rooftops based on the overlap between<br />

existing and predicted shadows of each region according to the image acquisition information. Experimental results show<br />

an overall shape accuracy and completeness of 96%.<br />

15:00-17:10, Paper MoBT8.51<br />

Making Visual Object Categorization More Challenging: Randomized Caltech-101 Data Set<br />

Kinnunen, Juha Teemu Ensio, Lappeenranta Univ. of Tech.<br />

Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />

Lensu, Lasse, Lappeenranta Univ. of Tech.<br />

Lankinen, Jukka, Lappeenranta Univ. of Tech.<br />

Kalviainen, Heikki, Lappeenranta Univ. of Tech.<br />

Visual object categorization is one of the most active research topics in computer vision, and Caltech-101 data set is one<br />

of the standard benchmarks for evaluating method performance. Despite its wide use, the data set has certain weaknesses:<br />

i) the objects are practically in a standard pose and scale in the middle of the images, and ii) the background varies too<br />

little in certain categories, making it more discriminative than the foreground objects. In this work, we demonstrate how<br />

these weaknesses bias the evaluation results in an undesired manner. In addition, we reduce the bias effect by replacing<br />

the backgrounds with random landscape images from Google and by applying random Euclidean transformations to the<br />

foreground objects. We demonstrate how the proposed randomization process makes visual object categorization more<br />

challenging, improving the relative results of methods which categorize objects by their visual appearance and are invariant<br />

to pose changes. The new data set is made publicly available for other researchers.<br />

15:00-17:10, Paper MoBT8.52<br />

A Reliability Assessment Paradigm for Automated Video Tracking Systems<br />

Chen, Chung-Hao, North Carolina Central Univ.<br />

Yao, Yi, GE Global Res.<br />

Koschan, Andreas, The Univ. of Tennessee<br />

Abidi, Mongi, The Univ. of Tennessee<br />

Most existing performance evaluation methods concentrate on defining separate metrics over a wide range of conditions<br />

and generating standard benchmarking video sequences for examining the effectiveness of video tracking systems. In<br />

other words, these methods attempt to design a robustness margin or factor for the system. These methods are deterministic,<br />

in that a robustness factor, for example 2 or 3 times the expected number of subjects to track or the expected strength of illumination,<br />

would be required in the design. This often results in over-design, thus increasing costs, or under-design, causing<br />

failure by unanticipated factors. In order to overcome these limitations, we propose in this paper an alternative framework<br />

to analyze the physics of the failure process and, through the concept of reliability, determine the time to failure in automated<br />

video tracking systems. The benefit of our proposed framework is that we can provide a unified and statistical index<br />

to evaluate the performance of an automated video tracking system for a task to be performed. At the same time, the uncertainty<br />

problem about a failure process, which may be caused by the system's complexity, imprecise measurements of the<br />

relevant physical constants and variables, or the indeterminate nature of future events, can be addressed accordingly based<br />

on our proposed framework.<br />

15:00-17:10, Paper MoBT8.53<br />

Road Sign Detection in Images: A Case Study<br />

Belaroussi, Rachid, Univ. Paris Est,INRETS-LCPC<br />

Foucher, Philippe, Lab. Des Ponts et Chaussées<br />

Tarel, Jean-Philippe, LCPC<br />

Soheilian, Bahman, Ins. Géographique National,<br />

Charbonnier, Pierre, ERA27 LCPC – LRPC<br />

Paparoditis, Nicolas, Inst. Geographique National<br />

Road sign identification in images is an important issue, in particular for vehicle safety applications. It is usually tackled<br />

in three stages: detection, recognition and tracking, and evaluated as a whole. To progress towards better algorithms, we<br />

focus in this paper on the first stage of the process, namely road sign detection. More specifically, we compare, on the<br />

same ground-truth image database, results obtained by three algorithms that sample different state-of-the-art approaches.<br />

The three tested algorithms (Contour Fitting, Radial Symmetry Transform, and a pair-wise voting scheme) all use color and<br />

edge information and are based on geometrical models of road signs. The test dataset is made of 847 images (960x1080) of<br />

complex urban scenes (available at www.itowns.fr/benchmarking.html). They feature 251 road signs of different shapes<br />

(circular, rectangular, triangular), sizes and types. The pros and cons of the three algorithms are discussed, allowing<br />

new research perspectives to be drawn.<br />

15:00-17:10, Paper MoBT8.54<br />

ImageCLEF@ICPR Contest: Challenges, Methodologies and Results of the Photo Annotation Task<br />

Nowak, Stefanie, Fraunhofer Inst. For Digital Media Tech.<br />

The Photo Annotation Task is performed as one task in the ImageCLEF@ICPR contest and poses the challenge of annotating<br />

53 visual concepts in Flickr photos. Altogether 12 research teams met the multilabel classification challenge and submitted<br />

solutions. The participants were provided with a training and a validation set consisting of 5,000 and 3,000 annotated images,<br />

respectively. The test was performed on 10,000 images. Two evaluation paradigms have been applied, the evaluation per<br />

concept and the evaluation per example. The evaluation per concept was performed by calculating the Equal Error Rate and<br />

the Area Under Curve (AUC). The evaluation per example utilizes a recently proposed Ontology Score. For the concepts, an<br />

average AUC of 86.5% could be achieved, including concepts with an AUC of 96%. The classification performance for each<br />

image ranged between 59% and 100% with an average score of 85%.<br />



15:00-17:10, Paper MoBT8.55<br />

Task-Oriented Evaluation of Super-Resolution Techniques<br />

Tian, Li, NTT Corp.<br />

Suzuki, Akira, NTT Cyber Space Lab.<br />

Koike, Hideki, NTT Corp.<br />

The goal of super-resolution (SR) techniques is to enhance the resolution of low-resolution (LR) images. How to evaluate<br />

the performance of an SR algorithm is often neglected while researchers keep producing new algorithms. This paper presents<br />

a task-oriented method for evaluating SR techniques. Our method includes both objective and subjective measures and is<br />

designed from the viewpoint of how SR impacts many essential image processing and vision tasks. We evaluate some<br />

state-of-the-art SR algorithms and the results suggest that different SR algorithms should be utilized for different applications.<br />

In general, the results reflect the consistency and conflict between objective and subjective measures, as well as between computer<br />

vision systems and human vision systems.<br />

15:00-17:10, Paper MoBT8.56<br />

FeEval – a Dataset for Evaluation of Spatio-Temporal Local Features<br />

Stoettinger, Julian, TU Vienna<br />

Zambanini, Sebastian, TU Vienna<br />

Khan, Rehanullah, TU Vienna<br />

Hanbury, Allan, Information Retrieval Facility<br />

The most successful approaches to video understanding and video matching use local spatio-temporal features as a sparse<br />

representation for video content. Until now, no principled evaluation of these features has been done. We present FeEval,<br />

a dataset for the evaluation of such features. For the first time, this dataset allows for a systematic measurement of the stability<br />

and the invariance of local features in videos. FeEval consists of 30 original videos from a great variety of different<br />

sources, including HDTV shows, 1080p HD movies and surveillance cameras. The videos are iteratively varied by increasing<br />

blur, noise, increasing or decreasing light, median filtering, compression quality, scale and rotation, leading to a total<br />

of 1710 video clips. Homography matrices are provided for geometric transformations. The surveillance videos are taken<br />

from 4 different angles in a calibrated environment. Similar to prior work on 2D images, this leads to a repeatability and<br />

matching measurement in videos for spatio-temporal features estimating the overlap of features under increasing changes<br />

in the data.<br />

15:00-17:10, Paper MoBT8.57<br />

Performance Evaluation Tools for Zone Segmentation and Classification (PETS)<br />

Seo, Wontaek, Univ. of Maryland<br />

Agrawal, Mudit, Univ. of Maryland<br />

Doermann, David, Univ. of Maryland<br />

This paper describes a set of Performance Evaluation Tools (PETS) for document image zone segmentation and classification.<br />

The tools allow researchers and developers to evaluate, optimize and compare their algorithms by providing a<br />

variety of quantitative performance metrics. The evaluation of segmentation quality is based on the pixel-based overlaps<br />

between two sets of zones proposed by Randriamasy and Vincent. PETS extends the approach by providing a set of metrics<br />

for overlap analysis, RLE and polygonal representation of zones and introduces type-matching to evaluate zone classification.<br />

The software is available for research use.<br />

MoBT9 Upper Foyer<br />

Feature Extraction; Classification; Clustering; Bayesian Methods Poster Session<br />

Session chair: Pietikäinen, Matti (Univ of Oulu)<br />

15:00-17:10, Paper MoBT9.1<br />

Shape Filling Rate for Silhouette Representation and Recognition<br />

An, Guocheng, Chinese Acad. of Sciences<br />

Zhang, Fengjun, Chinese Acad. of Sciences<br />

Wang, Hong’An, Chinese Acad. of Sciences<br />

Dai, Guozhong, Chinese Acad. of Sciences<br />



Research on complex shape recognition has shown that the shape context algorithm is sensitive to the relative position variation<br />

of articulation. To address this problem, a shape recognition method is proposed based on the local shape filling rate of<br />

object silhouettes. We take each landmark point as a circle center and consider circles of varying radius around it. Then, for a particular radius, the<br />

ratio between the covered silhouette pixels and the total pixels within the circle is defined as the local shape filling rate. Thus, different radii<br />

may yield different local shape filling rates. All landmark points with their different radii constitute a characteristic matrix<br />

which effectively reflects the overall statistical properties of the object shape. Experiments on a variety of shape databases<br />

show that the novel method is insensitive to articulation and less influenced by the number of landmark points, so our algorithm<br />

has strong descriptive power for object details.<br />
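A minimal sketch of the local shape filling rate described above (landmark sampling and the radius set are illustrative assumptions): for each landmark and radius, the feature is the fraction of pixels inside the disc that belong to the silhouette.<br />

```python
# Hedged sketch: build the landmarks x radii characteristic matrix of
# local shape filling rates from a binary silhouette mask.
import numpy as np

def shape_filling_rates(mask, landmarks, radii):
    """mask: HxW binary silhouette; landmarks: list of (row, col); radii: list.
    Returns an N x len(radii) characteristic matrix."""
    H, W = mask.shape
    rows, cols = np.mgrid[0:H, 0:W]
    feats = np.zeros((len(landmarks), len(radii)))
    for i, (r0, c0) in enumerate(landmarks):
        d2 = (rows - r0) ** 2 + (cols - c0) ** 2
        for j, r in enumerate(radii):
            disc = d2 <= r * r
            feats[i, j] = mask[disc].sum() / disc.sum()   # covered / total pixels
    return feats

# toy example: a filled square silhouette with two landmark points
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
F = shape_filling_rates(mask, landmarks=[(16, 16), (32, 32)], radii=[4, 8, 16])
```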

15:00-17:10, Paper MoBT9.2<br />

Learning GMM using Elliptically Contoured Distributions<br />

Li, Bo, Beijing Inst. of Tech.<br />

Liu, Wenju, Chinese Acad. of Sciences<br />

Dou, Lihua, Beijing Inst. of Tech.<br />

Model order selection and parameter estimation for Gaussian mixture model (GMM) are important issues for clustering<br />

analysis and density estimation. Most methods for model selection usually add a penalty term in the objective function<br />

that can penalize the models and choose an optimal one from a set of candidate models. This paper presents a simple and<br />

novel approach to determine the number of components and simultaneously estimate the parameters for GMM. By introducing<br />

the degenerating model, the proposed approach overcomes the drawback of the likelihood estimate, which is a non-decreasing<br />

function and cannot be used to select the number of components. The degenerating model is a more general form<br />

of mixture component density and it can degenerate into the component density or a crater-like density when its parameter<br />

K varies from 1 to a bigger value. The likelihood of the crater-like density evaluated for the training data approximates to<br />

zero. This characteristic of the degenerating model forms the foundation of the proposed approach. The experimental<br />

results show a robust and evident performance improvement of the approach.<br />

15:00-17:10, Paper MoBT9.3<br />

FIND: A Neat Flip Invariant Descriptor<br />

Guo, Xiaojie, Tianjin Univ.<br />

Cao, Xiaochun, Tianjin Univ.<br />

In this paper, we introduce a novel Flip Invariant Descriptor (FIND). FIND remedies the degraded performance resulting<br />

from image flips and reduces both space and time costs. The flip invariance of FIND enables the otherwise intractable flip detection to<br />

be achieved easily, instead of implementing the procedure in duplicate. To alleviate the pressure brought by the increasing<br />

scale of image and video data, FIND utilizes a concise structure with less storage space. Compared to SIFT, FIND reduces<br />

the descriptor length by 35.94%. We compare FIND against SIFT with respect to accuracy, speed and space cost. An application<br />

to image search over a database of 3.27 million descriptors is also shown.<br />

15:00-17:10, Paper MoBT9.4<br />

Matching Image with Multiple Local Features<br />

Cao, Yudong, Beijing Univ. of Posts and Telecommunications/ Liaoning Univ. of Tech<br />

Zhang, Honggang, Beijing Univ. of Posts and Telecommunications<br />

Gao, Yanyan, Beijing Univ. of Posts and Telecommunications<br />

Xu, Xiaojun, Beijing Univ. of Posts and Telecommunications<br />

Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />

In this paper, we present a fusional feature composed of Affine-SIFT, MSER and color moment invariants. The fusional<br />

feature is more robust and distinctive than a single local feature. Instead of simply adding the three local features together, an<br />

efficient two-level matching strategy is devised with the fusional feature, which speeds up the establishment of the local<br />

correspondences. To remove some false positives, an affine transformation is estimated with a weighted RANSAC,<br />

which decreases the number of iterations. The experimental results show that our approach can achieve more accurate correspondences.<br />

In the end, we plan to apply the fusional feature and matching strategy to image retrieval.<br />



15:00-17:10, Paper MoBT9.5<br />

Lipreading: A Graph Embedding Approach<br />

Zhou, Ziheng, Univ. of Oulu<br />

Zhao, Guoying, Univ. of Oulu<br />

Pietikäinen, Matti, Univ. of Oulu<br />

In this paper, we propose a novel graph embedding method for the problem of lipreading. To characterize the temporal<br />

connections among video frames of the same utterance, a new distance metric is defined on a pair of frames and graphs<br />

are constructed to represent the video dynamics based on the distances between frames. Audio information is used to assist<br />

in calculating such distances. For each utterance, a subspace of the visual feature space is learned from a well-defined intrinsic<br />

and penalty graph within a graph-embedding framework. Video dynamics are found to be well preserved along<br />

some dimensions of the subspace. Discriminatory cues are then decoded from curves of the projected visual features to<br />

classify different utterances.<br />

15:00-17:10, Paper MoBT9.6<br />

Face Recognition using a Multi-Manifold Discriminant Analysis Method<br />

Yang, Wankou, Southeast Univ. Nanjing<br />

Sun, Changyin, Southeast Univ. Nanjing<br />

Zhang, Lei, The Hong Kong Pol. Univ.<br />

In this paper, we propose a Multi-Manifold Discriminant Analysis (MMDA) method for face feature extraction and face<br />

recognition, which is based on graph embedding learning under the Fisher discriminant analysis framework. In MMDA,<br />

the within-class graph and between-class graph are designed to characterize the within-class compactness and the between-class<br />

separability, respectively, seeking the discriminant matrix that simultaneously maximizes the between-class<br />

scatter and minimizes the within-class scatter. In addition, the within-class graph can also represent the sub-manifold<br />

information and the between-class graph can also represent the multi-manifold information. The proposed MMDA is examined<br />

by using the FERET face database, and the experimental results demonstrate that MMDA works well in feature<br />

extraction and leads to good recognition performance.<br />

15:00-17:10, Paper MoBT9.7<br />

Globally-Preserving based Locally Linear Embedding<br />

Hui, Kanghua, Chinese Acad. of Sciences<br />

Wang, Chunheng, Chinese Acad. of Sciences<br />

Xiao, Baihua, Chinese Acad. of Sciences<br />

The locally linear embedding (LLE) algorithm is considered a powerful method for the problem of nonlinear dimensionality<br />

reduction. In this paper, a new method called globally-preserving based LLE (GPLLE) is proposed. It not only<br />

preserves the local neighborhood, but also keeps distant samples far apart, which solves a problem that LLE<br />

may encounter, i.e. LLE only preserves local neighborhoods and cannot prevent distant samples from drawing near.<br />

Moreover, GPLLE can estimate the intrinsic dimensionality d of the manifold structure. The experimental results show that<br />

GPLLE always achieves better classification performance than LLE based on the estimated d.<br />

15:00-17:10, Paper MoBT9.8<br />

3D Human Pose Estimation by an Annealed Two-Stage Inference Method<br />

Wang, Yuan-Kai, Fu Jen Univ.<br />

Cheng, Kuang-You, Fu Jen Univ.<br />

This paper proposes a novel human motion capture method that locates human body joint position and reconstructs the<br />

human pose in 3D space from monocular images. We propose a two-stage framework including 2D and 3D probabilistic<br />

graphical models which can solve the occlusion problem for the estimation of human joint positions. The 2D and 3D<br />

models adopt directed acyclic structure to avoid error propagation of inference in the models. Both the 2D and 3D models<br />

utilize the Expectation Maximization algorithm to learn prior distributions of the models. An annealed Gibbs sampling<br />

method is proposed for the two-stage framework to infer the maximum a posteriori distributions of joint positions. The annealing<br />

process can efficiently explore the modes of the distributions and find solutions in high-dimensional space. Experiments<br />

are conducted on the HumanEva dataset to show the effectiveness of the proposed method. The experimental data are<br />

image sequences of walking motion with a full 180-degree turn around a region, which causes occlusion of poses and loss of<br />

image observations. Experimental results show that the proposed two-stage approach can efficiently estimate more accurate<br />

human poses from monocular images.<br />

15:00-17:10, Paper MoBT9.9<br />

Extended Locality Preserving Discriminant Analysis for Face Recognition<br />

Yang, Liping, Chongqing Univ.<br />

Gong, Weiguo, Chongqing Univ.<br />

Gu, Xiaohua, Chongqing Univ.<br />

In this paper, an extended locality preserving discriminant analysis (ELPDA) method is proposed. To address the disadvantages<br />

of the original locality preserving discriminant analysis (LPDA), a new locality preserving between-class scatter,<br />

which is characterized by the samples and their k out-of-class nearest neighbors, is defined. Moreover, the small<br />

sample size problem is also avoided by solving a new optimization function. Experimental results on AR and FERET subsets<br />

illustrate the effectiveness of the proposed method for face recognition.<br />

15:00-17:10, Paper MoBT9.10<br />

Beyond “Near-Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval<br />

Baluja, Shumeet, Google, Inc.<br />

Covell, Michele, Google, Inc.<br />

Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we<br />

present a two-tier similar-image retrieval system with the efficiency characteristics found in simpler systems designed to<br />

recognize near-duplicates. We compare the efficiency of lookups based on random projections and learned hashes to 100-times-more-frequent<br />

exemplar sampling. Both approaches significantly improve on the results from exemplar sampling,<br />

despite having significantly lower computational costs. Learned-hash keys provide the best result, in terms of both recall<br />

and efficiency.<br />
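A hedged sketch of the random-projection baseline mentioned above, using sign bits as a binary hash key; the learned-hash variant of the paper is not reproduced, and the sizes and seed are assumptions.<br />

```python
# Hedged sketch: hash a feature vector to n_bits by the signs of random
# projections; near-duplicate images tend to agree on most bits.
import numpy as np

def random_hash_bits(x, n_bits=32, seed=7):
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n_bits, len(x)))    # shared across the database
    return (P @ x > 0).astype(np.uint8)          # e.g. bucket key for lookup

a = np.random.rand(128)
b = a + 0.01 * np.random.randn(128)              # a near-duplicate of a
print((random_hash_bits(a) == random_hash_bits(b)).mean())  # mostly equal bits
```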

15:00-17:10, Paper MoBT9.11<br />

Rare Class Classification on SVM<br />

He, He, The Hong Kong Pol. Univ.<br />

Ghodsi, Ali, University of Waterloo<br />

The problem of classification on highly imbalanced datasets has been studied extensively in the literature. Most classifiers<br />

show significant deterioration in performance when dealing with skewed datasets. In this paper, we first examine the underlying<br />

reasons for SVM’s deterioration on imbalanced datasets. We then propose two modifications for the soft margin<br />

SVM, where we change or add constraints to the optimization problem. The proposed methods are compared with regular<br />

SVM, cost-sensitive SVM and two re-sampling methods. Our experimental results demonstrate that this constrained SVM<br />

can consistently outperform the other associated methods.<br />

15:00-17:10, Paper MoBT9.12<br />

Package Boosting for Readaption of Cascaded Classifiers<br />

Szczot, Magdalena, Daimler AG<br />

Löhlein, Otto, Daimler AG<br />

Forster, Julian, Daimler AG<br />

Palm, Günther, Univ. of Ulm<br />

This contribution presents an efficient and useful way to readapt a cascaded classifier. We introduce Package Boosting<br />

which combines the advantages of Real Adaboost and Online Boosting for the realization of the strong learners in each<br />

cascade layer. We also examine the conditions which need to be fulfilled by a cascade in order to meet the requirements<br />

of an online algorithm and present the evaluation results of the system.<br />



15:00-17:10, Paper MoBT9.13
Baby-Posture Classification from Pressure-Sensor Data
Boughorbel, Sabri, Philips Res. Lab.
Bruekers, Fons, Philips Res. Lab.
Breebaart, Jeroen, Philips Res. Lab.

The activity of babies, and more specifically their posture, is an important aspect of their safety and development. In this paper, we study the automatic classification of baby posture using a pressure-sensitive mat. The posture classification problem is formulated as the design of features that describe the pressure patterns induced by the child, in combination with generic classifiers. Novel rotation-invariant features are constructed from high-order statistics obtained over concentric rings around the center of gravity. Non-constant ring radii are used in order to ensure uniform cell areas and therefore equal importance of the features. A vote fusion of various generic classifiers is used for classification. Temporal information was shown to improve the classification performance. The obtained results are promising and open new opportunities for applications and further research in the area of baby safety and development.
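The equal-area constraint on the ring radii can be made explicit: with K rings inside a maximum radius R, the k-th outer radius must be R*sqrt(k/K) so that every annulus covers the same area. A minimal sketch (variable names are illustrative, not taken from the paper):

import numpy as np

def equal_area_ring_radii(max_radius, n_rings):
    """Outer radii r_k = R * sqrt(k / K); every annulus then has area pi * R^2 / K."""
    k = np.arange(1, n_rings + 1)
    return max_radius * np.sqrt(k / n_rings)

def ring_index(points, center, max_radius, n_rings):
    """Assign each 2-D point to its concentric ring around `center` (index K = outside)."""
    distances = np.linalg.norm(points - center, axis=1)
    return np.searchsorted(equal_area_ring_radii(max_radius, n_rings), distances)

Per-ring statistics of the pressure values (the high-order moments mentioned above) are invariant to rotations about the center of gravity because each ring is rotationally symmetric.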

15:00-17:10, Paper MoBT9.14
Vector Quantization Mappings for Speaker Verification
Brew, Anthony, Univ. Coll. Dublin
Cunningham, Pádraig, Univ. Coll. Dublin

In speaker verification, several techniques have emerged to map variable-length utterances into a fixed-dimensional space for classification. One popular approach uses Maximum A-Posteriori (MAP) adaptation of a Gaussian Mixture Model (GMM) to create a super-vector. This paper investigates using Vector Quantisation (VQ) as the global model to provide a similar mapping. This less computationally complex mapping gives comparable results to its GMM counterpart while also providing the ability for an efficient iterative update, enabling media files to be scanned with a fixed-length window.

15:00-17:10, Paper MoBT9.15
Maximum Entropy Model based Classification with Feature Selection
Dukkipati, Ambedkar, Indian Inst. of Science
Yadav, Abhay Kumar, Indian Inst. of Science
Murty, M. Narasimha, Indian Inst. of Science

In this paper, we propose a classification algorithm based on the maximum entropy principle. The algorithm finds the most appropriate class-conditional maximum entropy distributions for classification. No prior knowledge about the form of the density function used to estimate the class-conditional density is assumed, except that the information is given in the form of expected values of features. The algorithm also incorporates a method to select relevant features for classification. The proposed algorithm is suitable for large data sets, and its effectiveness is demonstrated by simulation results on several real-world benchmark data sets.
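As background for the principle invoked here (a generic statement of the maximum entropy result, not the paper's specific derivation): when only expected values of features f_i are constrained, the maximum entropy class-conditional density takes the exponential-family form

p(x \mid c) \;=\; \frac{1}{Z_c(\boldsymbol{\lambda})}\, \exp\!\Big(\sum_i \lambda_{c,i}\, f_i(x)\Big),
\qquad
Z_c(\boldsymbol{\lambda}) \;=\; \int \exp\!\Big(\sum_i \lambda_{c,i}\, f_i(x)\Big)\, dx,

with the multipliers lambda_{c,i} chosen so that the model's expectations of the features match the empirical class-wise values; classification then follows the Bayes rule arg max_c p(c) p(x | c).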

15:00-17:10, Paper MoBT9.16
Dimensionality Reduction by Minimal Distance Maximization
Xu, Bo, Chinese Acad. of Sciences
Huang, Kaizhu, Chinese Acad. of Sciences
Liu, Cheng-Lin, Chinese Acad. of Sciences

In this paper, we propose a novel discriminant analysis method, called Minimal Distance Maximization (MDM). In contrast to traditional LDA, which in effect maximizes the average divergence among classes, MDM attempts to find a low-dimensional subspace that maximizes the minimal (worst-case) divergence among classes. This "minimal" setting solves the problem caused by the "average" setting of LDA, which tends to merge similar classes with smaller divergence when used for multi-class data. Furthermore, we formulate the worst-case problem as a convex problem, making the algorithm solvable for larger data sets. Experimental results demonstrate the advantages of the proposed method against five other competitive approaches on one synthetic and six real-life data sets.
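The contrast drawn with LDA can be written schematically (a formulation assumed from the abstract, with d_W(i, j) denoting the divergence between classes i and j after projection by W):

\text{LDA (average)}:\quad \max_{W}\ \sum_{i<j} d_W(i,j)
\qquad\text{vs.}\qquad
\text{MDM (worst case)}:\quad \max_{W}\ \min_{i<j} d_W(i,j)
\;=\; \max_{W,\,t}\ t \quad \text{s.t.}\ d_W(i,j) \ge t \ \ \forall\, i<j.

Introducing the slack variable t turns the inner minimum into a set of constraints, which is the usual route to the convex reformulation the abstract refers to.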

15:00-17:10, Paper MoBT9.17
Possibilistic Clustering based on Robust Modeling of Finite Generalized Dirichlet Mixture
Ben Ismail, Maher, Univ. of Louisville
Frigui, Hichem, Univ. of Louisville

We propose a novel possibilistic clustering algorithm based on robust modelling of the finite Generalized Dirichlet (GD) mixture. The algorithm generates two types of membership degrees. The first is a posterior probability that indicates the degree to which a point fits the estimated distribution. The second represents the degree of typicality and is used to identify and discard noise points. The algorithm minimizes a single objective function to optimize the GD mixture parameters and the possibilistic membership values. This optimization is carried out iteratively by dynamically updating the Dirichlet mixture parameters and the membership values in each iteration. We compare the performance of the proposed algorithm with an EM-based approach and show that the possibilistic approach is more robust.

15:00-17:10, Paper MoBT9.18
Cluster-Pairwise Discriminant Analysis
Makihara, Yasushi, The Inst. of Scientific and Industrial Res., Osaka Univ.
Yagi, Yasushi, Osaka Univ.

Pattern recognition problems often suffer from large intra-class variation due to situation changes such as pose, walking speed, and clothing variations in gait recognition. This paper describes a method of discriminant subspace analysis focused on situation cluster pairs. In the training phase, both a situation cluster discriminant subspace and class discriminant subspaces for each situation cluster pair are constructed using training samples of non-recognition-target classes. In the testing phase, given a matching pair of patterns from recognition-target classes, the posterior of the situation cluster pairs is estimated first, and the distance is then calculated in the corresponding cluster-pairwise class discriminant subspace. Experiments with both simulated and real data show the effectiveness of the proposed method.

15:00-17:10, Paper MoBT9.19
Online Discriminative Kernel Density Estimation
Kristan, Matej, Univ. of Ljubljana
Leonardis, Ales, Univ. of Ljubljana

We propose a new method for online estimation of probabilistic discriminative models. The method is based on the recently proposed online Kernel Density Estimation (oKDE) framework, which produces Gaussian mixture models and allows adaptation using only a single data point at a time. The oKDE builds reconstructive models from the data, and we extend it to take into account the interclass discrimination through a new distance function between the classifiers. We arrive at an online discriminative Kernel Density Estimator (odKDE). We compare the odKDE to the oKDE, batch state-of-the-art KDEs and a support vector machine (SVM) on a standard database. The odKDE achieves classification performance comparable to that of the best batch KDEs and the SVM, while allowing online adaptation, and produces models of lower complexity than the oKDE.

15:00-17:10, Paper MoBT9.20
Local Outlier Detection based on Kernel Regression
Gao, Jun, Chinese Acad. of Sciences
Hu, Weiming, Chinese Acad. of Sciences
Li, Wei, Chinese Acad. of Sciences
Zhang, Zhongfei, State Univ. of New York, Binghamton
Wu, Ou, Chinese Acad. of Sciences

Outlier detection remains an important and attractive task in knowledge discovery in databases. In this paper, a novel approach named Multi-scale Local Kernel Regression is proposed. It transfers the unsupervised problem of outlier detection to classic non-parametric regression learning. After preprocessing the original data with a basic local density-based method, it applies a local kernel regression estimator over multiple-scale neighborhoods to determine outliers. Experiments on several real-life data sets demonstrate that this approach gives promising detection performance.

15:00-17:10, Paper MoBT9.21
Verification under Increasing Dimensionality
Hendrikse, Anne, Univ. of Twente
Veldhuis, Raymond, Univ. of Twente
Spreeuwers, Luuk, Univ. of Twente

Verification decisions are often based on second-order statistics estimated from a set of samples. The ongoing growth of computational resources allows more and more features to be considered, increasing the dimensionality of the samples. If the dimensionality is of the same order as the number of samples used in the estimation, or even higher, then the accuracy of the estimate decreases significantly. In particular, the eigenvalues of the covariance matrix are estimated with a bias, and the estimates of the eigenvectors differ considerably from the real eigenvectors. We show how a classical approach to verification in high dimensions is severely affected by these problems, and how bias correction methods can reduce them.
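The eigenvalue bias referred to here is easy to reproduce numerically: for identity-covariance data whose dimensionality is of the same order as the sample size, the sample covariance eigenvalues spread far away from their true value of 1 (a small illustrative experiment, not the paper's setup):

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dims = 200, 150                       # dimensionality comparable to the sample size
X = rng.standard_normal((n_samples, n_dims))       # true covariance is the identity
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(eigvals.min(), eigvals.max())                # spread far below and far above 1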

15:00-17:10, Paper MoBT9.22
Discriminant Feature Manifold for Facial Aging Estimation
Fang, Hui, Swansea Univ.
Grant, Phil, Swansea Univ.
Min, Chen, Swansea Univ.

Computerised facial aging estimation, which has potential for many applications in human-computer interaction, has been investigated by many computer vision researchers in recent years. In this paper, a feature-based discriminant subspace is proposed to extract more discriminating and robust representations for aging estimation. After aligning all the faces by a piece-wise affine transform, orthogonal locality preserving projection (OLPP) is employed to project local binary patterns (LBP) from the faces into an age-discriminant subspace. The feature extracted from this manifold is more distinctive for age estimation than the features used in state-of-the-art methods. On the public FG-NET database, the performance of the proposed feature is evaluated using two different regression techniques, a quadratic function and neural-network regression. The proposed feature subspace achieves the best performance with both types of regression.

15:00-17:10, Paper MoBT9.23
Tensor Voting based Color Clustering
Nguyen Dinh, Toan, Chonnam National Univ.
Park, Jonghyun, Chonnam National Univ.
Lee, Chilwoo, Chonnam National Univ.
Lee, Gueesang, Chonnam National Univ.

A novel color clustering algorithm based on tensor voting is proposed. Each color feature vector is encoded by a second-order tensor. Tensor voting is then applied to estimate the number of dominant colors and to perform color clustering by exploiting the shape and data density of the color clusters. The experimental results show that the proposed method generates good results in image segmentation, especially for images containing multi-colored text.

15:00-17:10, Paper MoBT9.24
An Improved Structural EM to Learn Dynamic Bayesian Nets
De Campos, Cassio, Dalle Molle Inst. for Artificial Intelligence
Zeng, Zhi, Rensselaer Pol. Inst.
Ji, Qiang, RPI

This paper addresses the problem of learning the structure of Bayesian and Dynamic Bayesian networks from incomplete data based on the Bayesian Information Criterion. We describe a procedure to map the dynamic case into a corresponding augmented Bayesian network through the use of structural constraints. Because the algorithm is exact and anytime, it is well suited for a structural Expectation-Maximization (EM) method in which the only source of approximation is the EM itself. We show empirically that the use of a global maximizer inside the structural EM is computationally feasible and leads to more accurate models.

15:00-17:10, Paper MoBT9.25
Gaussian Process Learning from Order Relationships using Expectation Propagation
Wang, Ruixuan, Univ. of Dundee
Mckenna, Stephen James, Univ. of Dundee

A method for Gaussian process learning of a scalar function from a set of pair-wise order relationships is presented. Expectation propagation is used to obtain an approximation to the log marginal likelihood, which is optimised using an analytical expression for its gradient. Experimental results show that the proposed method performs well compared with a previous method for Gaussian process preference learning.

15:00-17:10, Paper MoBT9.26
Feature Ranking based on Decision Border
Diamantini, Claudia, Univ. Pol. Delle Marche
Gemelli, Alberto, Univ. Pol. Delle Marche
Potena, Domenico, Univ. Pol. Delle Marche

In this paper a feature ranking algorithm for classification is proposed, based on the notion of the Bayes decision border. The method elaborates upon the results of the Decision Border Feature Extraction approach, exploiting properties of the eigenvalues and eigenvectors of the orthogonal transformation to calculate discriminative importance weights for the original features. Non-parametric classification is also considered by resorting to Labeled Vector Quantizer neural networks trained by the BVQ algorithm. The choice of this architecture leads to a cheap implementation of the ranking algorithm, which we call BVQ-FR. The effectiveness of BVQ-FR is tested on real datasets. The novelty of the method is to use a feature extraction technique to assess the weight of the original features, as opposed to the heuristic methods commonly used.

15:00-17:10, Paper MoBT9.27
Three-Layer Spatial Sparse Coding for Image Classification
Dai, Dengxin, Wuhan Univ.
Yang, Wen, Wuhan Univ.
Wu, Tianfu, Lotus Hill Res. Inst.

In this paper, we propose a three-layer spatial sparse coding (TSSC) method for image classification, aiming at three objectives: recognizing image categories without a learning phase, naturally incorporating the spatial configuration of images, and counteracting intra-class variance. The method begins by representing the test images in a spatial pyramid as the to-be-recovered signals, and taking all image patches sampled at multiple scales from the labeled images as the bases. Then, three sets of coefficients are incorporated into the cardinal sparse coding to obtain the TSSC: one to penalize spatial inconsistencies between the pyramid cells and the corresponding selected bases, one to guarantee the sparsity of the selected images, and one to guarantee the sparsity of the selected categories. Finally, the test images are classified according to a simple image-to-category similarity defined on the coding coefficients. In experiments, we test our method on two publicly available datasets and achieve significantly more accurate results than conventional sparse coding, with only a modest increase in computational complexity.

15:00-17:10, Paper MoBT9.28
Theoretical Analysis of a Performance Measure for Imbalanced Data
Garcia, Vicente, Univ. Jaume I
Mollineda, Ramón A., Univ. Jaume I
Sanchez, J. Salvador, Univ. Jaume I

This paper analyzes a generalization of a new metric for evaluating classification performance in imbalanced domains, which combines an estimate of the overall accuracy with a plain index of how dominant the class with the highest individual accuracy is. A theoretical analysis shows the merits of this metric when compared to other well-known measures.

15:00-17:10, Paper MoBT9.29
Cluster Preserving Embedding
Zhan, Yubin, National Univ. of Defense Tech.
Yin, Jianping, National Univ. of Defense Tech.

Most existing dimensionality reduction methods obtain the low-dimensional embedding by preserving a certain property of the data, such as locality or neighborhood relationships. However, the intrinsic cluster structure of the data, which plays a key role in analyzing and utilizing the data, has been ignored by state-of-the-art dimensionality reduction methods. Hence, in this paper we propose a novel dimensionality reduction method called Cluster Preserving Embedding (CPE), in which the cluster structure of the original data is preserved via preserving the robust path-based similarity between pairs of points. We present two different methods to preserve this similarity. One is the Multidimensional Scaling (MDS) way, which tries to preserve the similarity matrix accurately; the other is a Laplacian-style way, which preserves the topological partial order of the similarities rather than the similarities themselves. Encouraging experimental results on a toy data set and handwritten digits from the MNIST database demonstrate the effectiveness of our Cluster Preserving Embedding method.

15:00-17:10, Paper MoBT9.30
Color Image Analysis by Quaternion Zernike Moments
Chen, Beijing, Southeast Univ.
Shu, Huazhong, Southeast Univ.
Zhang, Hui, Southeast Univ.
Chen, Gang, Southeast Univ.
Luo, Limin, Southeast Univ.

Moments and moment invariants are useful tools in pattern recognition and image analysis. Conventional methods for dealing with color images are based on RGB decomposition or graying. In this paper, using the theory of quaternions, we introduce a set of quaternion Zernike moments (QZMs) that treat color images in a holistic manner. It is shown that the QZMs can be obtained from the conventional Zernike moments of each channel. We also construct a set of combined invariants to rotation and translation (RT) using the modulus of the central QZMs. Experimental results show that the proposed descriptors are more efficient than existing ones.

15:00-17:10, Paper MoBT9.32
Topic-Sensitive Tag Ranking
Jin, Yan’An, Huazhong Univ. of Science and Tech.
Li, Ruixuan, Huazhong Univ. of Science and Tech.
Lu, Zhengding, Huazhong Univ. of Science and Tech.
Wen, Kunmei, Huazhong Univ. of Science and Tech.
Gu, Xiwu, Huazhong Univ. of Science and Tech.

Social tagging is an increasingly popular way to describe and classify documents on the web. However, the quality of the tags varies considerably since the tags are authored freely. How to rate the tags becomes an important issue. In this paper, we propose a topic-sensitive tag ranking (TSTR) approach to rate the tags on the web. We employ a generative probabilistic model to associate each tag with a distribution over topics. Then we construct a tag graph according to the co-tag relationships and perform a topic-level random walk over the graph to suggest a ranking score for each tag at different topics. Experimental results validate the effectiveness of the proposed tag ranking approach.

15:00-17:10, Paper MoBT9.33
Water Reflection Detection using a Flip Invariant Shape Detector
Zhang, Hua, Tianjin Univ.
Guo, Xiaojie, Tianjin Univ.
Cao, Xiaochun, Tianjin Univ.

Water reflection detection is a tough task in computer vision, since the reflection is distorted irregularly by ripples. This paper proposes an effective method to detect water reflections. We introduce a descriptor that is not only invariant to scale, rotation and affine transformations, but also tolerant to the flip transformation and even to non-rigid distortions such as ripple effects. We analyze the structure of our descriptor and show how it outperforms existing mirror feature descriptors in the context of water reflection. The experimental results demonstrate that our method is able to detect water reflections.

15:00-17:10, Paper MoBT9.34
CDP Mixture Models for Data Clustering
Ji, Yangfeng, Peking Univ.
Lin, Tong, Peking Univ.
Zha, Hongbin, Peking Univ.

In Dirichlet process (DP) mixture models, the number of components is implicitly determined by the sampling parameters of the Dirichlet process. However, this kind of model usually produces many small mixture components when modeling real-world data, especially high-dimensional data. In this paper, we propose a new class of Dirichlet process mixture models with constrained principles, named constrained Dirichlet process (CDP) mixture models. Based on general DP mixture models, we add a resampling step to obtain the latent parameters. In this way, CDP mixture models can suppress noise and generate compact patterns of the data. Experimental results on data clustering show the remarkable performance of the CDP mixture models.

15:00-17:10, Paper MoBT9.35
A Simple Approach to Find the Best Wavelet Basis in Classification Problems
Faradji, Farhad, Univ. of British Columbia
Ward, Rabab K., Univ. of British Columbia
Birch, Gary E., Neil Squire Society

In this paper, we address the problem of finding the best wavelet basis in wavelet packet analysis for classification-based applications. We implement and evaluate the proposed method in the design of a self-paced 2-state mental task-based brain-computer interface (BCI), as one possible type of classification-based application. The autoregressive coefficients of the best wavelet basis are concatenated to form the feature vector. The 2-stage classification process is based on quadratic discriminant analysis and majority voting. Seventeen wavelets from 2 different families are tested. A cross-validation process is performed twice, once for model selection and once for system performance evaluation. The results show that the proposed method can be applied well to BCI systems.

15:00-17:10, Paper MoBT9.36
Learning Probabilistic Models of Contours
Amate, Laure, Univ. of Nice-Sophia Antipolis, CNRS
Rendas, Maria João, Univ. of Nice-Sophia Antipolis, CNRS

We present a methodology for learning spline-based probabilistic models for sets of contours, proposing a new Monte Carlo variant of the EM algorithm to estimate the parameters of a family of distributions defined over the set of spline functions (with fixed complexity). The proposed model effectively captures the major morphological properties of the observed set of contours as well as its variability, as the simulation results presented demonstrate.

15:00-17:10, Paper MoBT9.37
Local Sparse Representation based Classification
Li, Chun-Guang, Beijing Univ. of Posts and Telecommunications
Guo, Jun, Beijing Univ. of Posts and Telecommunications
Zhang, Honggang, Beijing Univ. of Posts and Telecommunications

In this paper, we address the computational complexity issue in Sparse Representation based Classification (SRC). In SRC, it is time-consuming to find a global sparse representation. To remedy this deficiency, we propose a Local Sparse Representation based Classification (LSRC) scheme, which performs the sparse decomposition in a local neighborhood. In LSRC, instead of solving the l1-norm constrained least-squares problem over all training samples, we solve a similar problem in a local neighborhood of each test sample. Experiments on the ORL and Extended Yale B face recognition data sets demonstrate that the proposed LSRC algorithm reduces the computational complexity while retaining comparable classification accuracy and robustness.
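A minimal sketch of the local scheme, assuming Euclidean nearest neighbours and an off-the-shelf l1-regularised least-squares solver (the paper's exact solver and parameter values are not given in the abstract):

import numpy as np
from sklearn.linear_model import Lasso

def lsrc_predict(x, X_train, y_train, k=50, alpha=0.01):
    """Classify x by sparse-coding it over its k nearest training samples only."""
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]   # local neighbourhood
    D, labels = X_train[idx].T, y_train[idx]                    # local dictionary (d x k)
    coef = Lasso(alpha=alpha, fit_intercept=False).fit(D, x).coef_
    classes = np.unique(labels)
    # assign x to the class whose atoms reconstruct it with the smallest residual
    residuals = [np.linalg.norm(x - D[:, labels == c] @ coef[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]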

15:00-17:10, Paper MoBT9.38
Manifold Modeling with Learned Distance in Random Projection Space for Face Recognition
Tsagkatakis, Grigorios, Rochester Inst. of Tech.
Savakis, Andreas, Rochester Inst. of Tech.

In this paper, we propose the combination of manifold learning and distance metric learning for the generation of a representation that is both discriminative and informative, and we demonstrate that this approach is effective for face recognition. Initial dimensionality reduction is achieved using random projections, a computationally efficient and data-independent linear transformation. Distance metric learning is then applied to increase the separation between classes and improve the accuracy of nearest neighbor classification. Finally, a manifold learning method is used to generate a mapping between the randomly projected data and a low-dimensional manifold. Face recognition results suggest that the combination of distance metric learning and manifold learning can increase performance. Furthermore, random projections can be applied as an initial step without significantly affecting the classification accuracy.

15:00-17:10, Paper MoBT9.39
Part Detection, Description and Selection based on Hidden Conditional Random Fields
Lu, Wenhao, Tsinghua Univ.
Wang, Shengjin, Tsinghua Univ.
Ding, Xiaoqing, Tsinghua Univ.

In this paper, the problem of part detection, description and selection is discussed. This problem is crucial in the learning algorithms of part-based models, but cannot be solved well when some candidate parts are extracted from the background. This paper studies the problem and introduces a new algorithm, HCRF-PS (Hidden Conditional Random Fields for Part Selection), for part detection, description and, especially, selection. Our algorithm is distinguished by its ability to optimize multiple kinds of information at the same time, including texture, color, location and part label. Finally, we present experiments with the HCRF-PS algorithm that give good results on both virtual and real data.

15:00-17:10, Paper MoBT9.40
Boosting Bayesian MAP Classification
Piro, Paolo, CNRS/Univ. of Nice-Sophia Antipolis
Nock, Richard, Univ. des Antilles et de la Guyane
Nielsen, Frank, Ec. Pol.
Barlaud, Michel, CNRS/Univ. of Nice-Sophia Antipolis

In this paper we redefine and generalize the classic k-nearest neighbors (k-NN) voting rule in a Bayesian maximum-a-posteriori (MAP) framework. Annotated examples are used to estimate pointwise class probabilities in the feature space, giving rise to a new instance-based classification rule. Namely, we propose to "boost" the classic k-NN rule by inducing a strong classifier from a combination of sparse training data, called "prototypes". In order to learn these prototypes, our MapBoost algorithm globally minimizes a multiclass exponential risk defined over the training data, which depends on the class probabilities estimated at the sample points themselves. We tested our method for image categorization on three benchmark databases. Experimental results show that MapBoost significantly outperforms classic k-NN (by up to 8%). Interestingly, due to the supervised selection of sparse prototypes and the multiclass classification framework, the accuracy improvement is obtained with a considerable reduction in computational cost.

15:00-17:10, Paper MoBT9.41
Weighting of the K-Nearest-Neighbors
Chernoff, Konstantin, Univ. of Copenhagen
Nielsen, Mads

This paper presents two distribution-independent weighting schemes for k-Nearest-Neighbors (kNN). Applying the first scheme in a Leave-One-Out (LOO) setting corresponds to performing complete b-fold cross validation (b-CCV), while applying the second scheme corresponds to performing bootstrapping in the limit of infinite iterations. We demonstrate that the soft kNN errors obtained through b-CCV can be obtained by applying the weighted kNN in a LOO setting, and that the proposed weighting schemes can decrease the variance and improve the generalization of kNN in a CV setting.

15:00-17:10, Paper MoBT9.42
Learning Sparse Face Features: Application to Face Verification
Buyssens, Pierre, GREYC UMR 6072
Revenu, Marinette, GREYC UMR 6072

We present a low-resolution face recognition technique based on a Convolutional Neural Network approach. The network is trained to reconstruct a reference image per subject. In classical feature-based approaches, a first stage of feature extraction is followed by a classification stage to perform the recognition. In classical Convolutional Neural Network approaches, feature extraction stages are stacked (interlaced with pooling layers), with classical neural layers on top, to form the complete architecture of the network. This paper addresses two questions: 1. Does pretraining the filters in an unsupervised manner improve the recognition rate compared to filters learned in a purely supervised scheme? 2. Is there an advantage in pretraining more than one feature extraction stage? We show in particular that refining the filters during supervised training improves the results.

15:00-17:10, Paper MoBT9.43
Image Feature Extraction using 2D Mel-Cepstrum
Cakir, Serdar, Bilkent Univ.
Cetin, E., Bilkent Univ.

In this paper, a feature extraction method based on the two-dimensional (2D) mel-cepstrum is introduced. Feature matrices resulting from the 2D mel-cepstrum, the Fourier LDA approach, and the original image matrices are individually fed into a Common Matrix Approach (CMA) based face recognition system. For each of these feature extraction methods, recognition rates are obtained on the AR face database, the ORL database and the Yale database. Experimental results indicate that the recognition rates obtained with the 2D mel-cepstrum method are superior to those obtained using the Fourier LDA approach and raw image matrices. This indicates that 2D mel-cepstral analysis can be used in image feature extraction problems.

15:00-17:10, Paper MoBT9.44
Entropy Estimation and Multi-Dimensional Scale Saliency
Suau, Pablo, Univ. of Alicante
Escolano, Francisco, Univ. of Alicante

In this paper we survey two multi-dimensional Scale Saliency approaches based on graphs and the k-d partition algorithm. In the latter case we introduce a new divergence metric and show its suitability experimentally. We also show an application of multi-dimensional Scale Saliency to texture discrimination, and demonstrate that the use of multi-dimensional data can improve the performance of texture retrieval based on feature extraction.

15:00-17:10, Paper MoBT9.45
A Novel Facial Localization for Three-Dimensional Face using Multi-Level Partition of Unity Implicits
Hu, Yuan, Shanghai Jiao Tong Univ.
Yan, Jingqi, Shanghai Jiao Tong Univ.
Li, Wei, Shanghai Jiao Tong Univ.
Shi, Pengfei, Shanghai Jiao Tong Univ.

This paper presents a novel facial localization method for 3D faces in the presence of facial pose and expression variation. The idea of using Multi-level Partition of Unity (MPU) Implicits in a hierarchical way is proposed for the reconstruction of the face surface. Based on the analysis of curvature features, the nose and eye-hole regions can be uniquely detected on the lower-level reconstructed face surface. Experimental results show that this method is invariant to pose, holes, noise and expression. An overall performance of 99.18% is achieved.

15:00-17:10, Paper MoBT9.46
Automated Feature Weighting in Fuzzy Declustering-Based Vector Quantization
Ng, Theam Foo, Univ. of New South Wales@ADFA
Pham, Tuan D., Univ. of New South Wales@ADFA
Sun, Changming, CSIRO

Feature weighting plays an important role in improving the performance of clustering techniques. We propose an automated feature weighting scheme for fuzzy declustering-based vector quantization (FDVQ), namely the AFDVQ algorithm, for enhancing effectiveness and efficiency in classification. The proposed AFDVQ imposes weights on the modified fuzzy c-means (FCM) so that it can automatically calculate feature weights based on their degrees of importance rather than treating them equally. Moreover, extensions of the FDVQ and AFDVQ algorithms based on generalized improved fuzzy partitions (GIFP), known as GIFP-FDVQ and GIFP-AFDVQ respectively, are proposed. The experimental results on real data (original and noisy) and modified data (biased and noisy-biased) demonstrate that the proposed algorithms outperform standard algorithms in classifying clusters, especially for biased data.

15:00-17:10, Paper MoBT9.47
A Discriminative and Heteroscedastic Linear Feature Transformation for Multiclass Classification
Lee, Hung-Shin, National Taiwan Univ.
Wang, Hsin-Min, Acad. Sinica
Chen, Berlin, National Taiwan Normal Univ.

This paper presents a novel discriminative feature transformation, named full-rank generalized likelihood ratio discriminant analysis (fGLRDA), on the grounds of the likelihood ratio test (LRT). fGLRDA seeks a feature space that is linearly isomorphic to the original n-dimensional feature space and is characterized by a full-rank transformation matrix, under the assumption that all the class-discrimination information resides in a d-dimensional subspace, by making the most confusing situation, described by the null hypothesis, as unlikely as possible to happen, without the homoscedasticity assumption on the class distributions. Our experimental results demonstrate that fGLRDA can yield moderate performance improvements over other existing methods, such as linear discriminant analysis (LDA), on the speaker identification task.

15:00-17:10, Paper MoBT9.48
Sparse Representation Classifier Steered Discriminative Projection
Yang, Jian, Nanjing Univ. of Science and Tech.
Chu, Delin, National Univ. of Singapore

The sparse representation-based classifier (SRC) has been developed and shows great potential for pattern classification. This paper aims to find a discriminative projection such that SRC achieves optimum performance in the projected pattern space. We use the decision rule of SRC to steer the design of a dimensionality reduction method, which is coined the sparse representation classifier steered discriminative projection (SRC-DP). SRC-DP matches SRC optimally in theory. Experiments are performed on the AR and Extended Yale B face image databases, and the results show that the proposed method is more effective than other dimensionality reduction methods with respect to the sparse representation-based classifier.

15:00-17:10, Paper MoBT9.49
Designing a Pattern Stabilization Method using Scleral Blood Vessels for Laser Eye Surgery
Kaya, Aydin, Hacettepe Univ.
Can, Ahmet Burak, Hacettepe Univ.
Çakmak, Hasan Basri, Ataturk Research Hospital

In laser eye surgery, the accuracy of the operation depends on coherent eye tracking and registration techniques. The main approach used in image-processing-based eye trackers is the extraction and tracking of the pupil and limbus regions. In the eye registration step, iris region features extracted from infrared images are generally used. The registration step determines the angular shift of the eye origin by comparing the eye position on the operating table with the eye topology obtained before the operation. Registration is only applied at the beginning, but the patient's movements do not stop during the operation. Hence, we present a method for pattern stabilization that can be repeated during the operation at regular intervals. We use scleral blood vessels as features due to their rich texture and their resistance to errors caused by pupil center shift and ablation of the cornea region.

15:00-17:10, Paper MoBT9.51
Aggregation of Probabilistic PCA Mixtures with a Variational-Bayes Technique over Parameters
Bruneau, Pierrick, Nantes Univ.
Gelgon, Marc, Nantes Univ.
Picarougne, Fabien, Nantes Univ.

This paper proposes a solution to the problem of aggregating versatile probabilistic models, namely mixtures of probabilistic principal component analyzers. These models are a powerful generative form for capturing high-dimensional, non-Gaussian data, and they simultaneously perform mixture adjustment and dimensionality reduction. We demonstrate how such models may be advantageously aggregated by accessing the mixture parameters only, rather than the original data. Aggregation is carried out through Bayesian estimation with a specific prior and an original variational scheme. Experimental results illustrate the effectiveness of the proposal.

15:00-17:10, Paper MoBT9.52
Kernel Uncorrelated Adjacent-Class Discriminant Analysis
Jing, Xiaoyuan, Nanjing Univ. of Posts and Telecommunications
Li, Sheng, Nanjing Univ. of Posts and Telecommunications
Yao, Yongfang, Nanjing Univ. of Posts and Telecommunications
Bian, Lusha, Nanjing Univ. of Posts and Telecommunications
Yang, Jingyu, Nanjing Univ. of Science and Tech.

In this paper, a kernel uncorrelated adjacent-class discriminant analysis (KUADA) approach is proposed for image recognition. The optimal nonlinear discriminant vector obtained by this approach can differentiate one class from its adjacent classes, i.e., its nearest neighbor classes, by constructing specific between-class and within-class scatter matrices in kernel space using the Fisher criterion. In this manner, KUADA acquires all discriminant vectors class by class. Furthermore, KUADA makes every discriminant vector satisfy locally statistically uncorrelated constraints by using the corresponding class and part of its most adjacent classes. Experimental results on the public AR and CAS-PEAL face databases demonstrate that the proposed approach outperforms several representative nonlinear discriminant methods.

15:00-17:10, Paper MoBT9.53
A Meta-Learning Approach to Conditional Random Fields using Error-Correcting Output Codes
Ciompi, Francesco, Univ. de Barcelona
Pujol, Oriol, UB
Radeva, Petia, CVC

We present a meta-learning framework for the design of potential functions for Conditional Random Fields. The design of both the node potential and the edge potential is formulated as a classification problem where margin classifiers are used. The set of state transitions for the edge potential is treated as a set of different classes, thus defining a multi-class learning problem. The Error-Correcting Output Codes (ECOC) technique is used to deal with the multi-class problem. Furthermore, the point defined by the combination of margin classifiers in the ECOC space is interpreted in a probabilistic manner, and the obtained distance values are then converted into potential values. The proposed model exhibits very promising results when applied to two real detection problems.

15:00-17:10, Paper MoBT9.54
Statistical Modeling of Image Degradation based on Quality Metrics
Chetouani, Aladine, Inst. Galilée – Univ. Paris 13
Beghdadi, Azeddine, Univ. Paris 13
Deriche, Mohamed, KFUPM

A plethora of Image Quality Metrics (IQM) has been proposed during the last two decades. However, at present there is no accepted IQM able to predict the perceptual level of image degradation across different types of visual distortion. Some measures are well adapted to one set of degradations but inefficient for others; indeed, the efficiency of any IQM has been shown to depend on the type of degradation. We therefore propose a new approach that predicts the type of degradation before applying an IQM. The basic idea is first to identify the type of distortion using a Bayesian approach, and then to select the most appropriate IQM for estimating image quality for that specific type of distortion. The performance of the proposed method is evaluated in terms of classification accuracy across different types of degradation.

15:00-17:10, Paper MoBT9.55
Performance Evaluation of Automatic Feature Discovery Focused within Error Clusters
Wang, Sui-Yu, Lehigh Univ.
Baird, Henry, Lehigh Univ.

We report a performance evaluation of our automatic feature discovery method on the publicly available Gisette dataset: a set of 29 features discovered by our method ranks 129th among all 411 current entries on the validation set. Our approach is a greedy forward selection algorithm guided by error clusters. The algorithm finds error clusters in the current feature space, then projects one tight cluster into the null space of the feature mapping, where a new feature that helps to classify these errors can be discovered. This method assumes a "data-rich" problem domain and works well when a large amount of labeled data is available. The result on the Gisette dataset shows that our method is competitive with many current feature selection algorithms. We also provide analytical results showing that our method is guaranteed to lower the error rate on Gaussian distributions, and that our approach may outperform the standard Linear Discriminant Analysis (LDA) method in some cases.

15:00-17:10, Paper MoBT9.56
Optimized Entropy-Constrained Vector Quantization of Lossy Vector Map Compression
Chen, Minjie, Univ. of Eastern Finland
Xu, Mantao, Carestream Health Corp., Shanghai, China
Fränti, Pasi, Univ. of Eastern Finland

Quantization plays an important part in lossy vector map compression, for which the existing solutions are based on either a fixed-size open-loop codebook or simple uniform quantization. In this paper, we propose an entropy-constrained vector quantization that optimizes both the structure and the size of the codebook at the same time using a closed-loop approach. In order to lower the distortion to a desirable level, we exploit a two-level design strategy, where the vector quantization codebook is designed only for the most common vectors and the remaining (outlier) vectors are coded by uniform quantization.
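For reference, entropy-constrained vector quantization trades distortion against code length through a Lagrangian cost (the standard ECVQ formulation, stated here as background rather than taken from the paper):

J(\lambda) \;=\; \sum_{n}\Big[\, d\big(x_n,\, c_{q(n)}\big) \;+\; \lambda\, \ell_{q(n)} \Big],
\qquad \ell_i \approx -\log_2 p_i,

where q(n) is the cell assigned to vector x_n, c_i and p_i are the codevector and usage probability of cell i, and sweeping the multiplier lambda traces out the rate-distortion trade-off; both the assignments and the codebook are updated in the closed loop to decrease J.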

15:00-17:10, Paper MoBT9.57
Nonnegative Embeddings and Projections for Dimensionality Reduction and Information Visualization
Zafeiriou, Stefanos, Imperial Coll. London
Laskaris, Nikolaos, AiiA-Lab., AUTH

In this paper, we propose novel algorithms for low-dimensional nonnegative embedding of vectorial and/or relational data, as well as nonnegative projections for dimensionality reduction. We start by introducing a novel algorithm for Metric Multidimensional Scaling (MMS). We then propose algorithms for Nonnegative Locally Linear Embedding (NLLE) and Nonnegative Laplacian Eigenmaps (NLE). By reformulating the MMS, NLLE and NLE problems for finding projections, we propose algorithms for Nonnegative Principal Component Analysis (NPCA), Nonnegative Orthogonal Neighbourhood Preserving Projections (NONPP) and Nonnegative Orthogonal Locality Preserving Projections (NOLPP). We present some first preliminary results of the proposed methods in data visualization.

15:00-17:10, Paper MoBT9.58
Unsupervised Learning from Linked Documents
Guo, Zhen, SUNY at Binghamton
Zhu, Shenghuo, NEC Lab.
Chi, Yun, NEC Lab.
Zhang, Zhongfei, State Univ. of New York, Binghamton
Gong, Yihong, NEC Lab. America, Inc.

Documents in many corpora, such as digital libraries and webpages, contain both content and link information. In a traditional topic model, which plays an important role in unsupervised learning, the link information is either totally ignored or treated as a feature similar to content. We believe that neither approach is capable of accurately capturing the relations represented by links. To address this limitation of traditional topic models, in this paper we propose a citation-topic (CT) model that explicitly considers the document relations represented by links. In the CT model, instead of being treated as yet another feature, links are used to form the structure of the generative model. As a result, in the CT model a given document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is related to the given document. We apply the CT model to several document collections, and the experimental comparisons against state-of-the-art approaches demonstrate very promising performance.

15:00-17:10, Paper MoBT9.59
Tensor Power Method for Efficient MAP Inference in Higher-Order MRFs
Semenovich, Dimitri, Univ. of New South Wales
Sowmya, Arcot, Univ. of New South Wales

We present a new efficient algorithm for maximizing energy functions with higher-order potentials, suitable for MAP inference in discrete MRFs. Initially we relax the integer constraints on the problem and obtain potential label assignments using the higher-order (tensor) power method. We then utilise an ascent procedure similar to the classic ICM algorithm to converge to a solution meeting the original integer constraints.
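For a symmetric third-order tensor T, the tensor power iteration referred to here repeatedly applies v <- T(I, v, v) / ||T(I, v, v)||. A generic sketch (not the paper's full MAP pipeline, whose energy construction is not described in the abstract):

import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Dominant 'eigenvector' of a symmetric 3rd-order tensor T of shape (n, n, n)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum("ijk,j,k->i", T, v, v)   # contract T with v twice: T(I, v, v)
        v /= np.linalg.norm(v)
    return v

The relaxed assignment v is then driven back to integer labels, e.g. by the ICM-style ascent the abstract mentions.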

15:00-17:10, Paper MoBT9.60
Detection and Characterization of Anomalous Entities in Social Communication Networks
Gupta, Nithi, Tata Consultancy Services
Dey, Lipika, Tata Consultancy Services

Social networks generated from emails or calls provide enormous geospatial and interaction information about subscribers. These have served as important inputs to intelligence analysts. In this paper, we propose an efficient algorithm for anomaly detection from social networks. Anomalous users are detected based on their behavioral dissimilarity from others. A rich feature set is proposed for outlier detection. A method for providing a visual explanation of the results is also proposed.

15:00-17:10, Paper MoBT9.61
Mahalanobis-Based Adaptive Nonlinear Dimension Reduction
Aouada, Djamila, Univ. of Luxembourg, SnT
Baryshnikov, Yuliy, Bell Lab.
Krim, Hamid, NCSU

We define a new adaptive embedding approach for data dimension reduction applications. Our technique entails local learning of the manifold of the initial data, with the objective of defining local distance metrics that take into account the different correlations between the data points. We choose to illustrate the properties of our work on the isomap algorithm. We show through multiple simulations that the new adaptive version of isomap is more robust to noise than the original non-adaptive one.

15:00-17:10, Paper MoBT9.62
Maximum Likelihood Estimation of Gaussian Mixture Models using Particle Swarm Optimization
Ari, Caglar, Bilkent Univ.
Aksoy, Selim, Bilkent Univ.

We present solutions to two problems that prevent the effective use of population-based algorithms in clustering problems. The first solution is a new representation for arbitrary covariance matrices that allows independent updating of individual parameters while retaining the validity of the matrix. The second solution involves an optimization formulation for finding correspondences between the different parameter orderings of candidate solutions. The effectiveness of the proposed solutions is demonstrated on a novel clustering algorithm based on particle swarm optimization for the estimation of Gaussian mixture models.
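One common way to obtain such a representation (any unconstrained parameter vector maps to a valid covariance matrix, so each coordinate can be perturbed independently) is a Cholesky-factor parameterization; whether this matches the authors' exact construction is not stated in the abstract, so the sketch below is purely illustrative:

import numpy as np

def params_to_covariance(theta, d):
    """Map an unconstrained vector of length d*(d+1)//2 to a symmetric positive-definite matrix."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = theta                 # fill the lower triangle
    L[np.diag_indices(d)] = np.exp(np.diag(L))    # keep the diagonal strictly positive
    return L @ L.T                                # Sigma = L L^T is always a valid covariance

A particle can then carry (mixture weights, means, theta vectors) and update every coordinate freely without ever producing an invalid covariance matrix.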

15:00-17:10, Paper MoBT9.63
Object Discovery by Clustering Correlated Visual Word Sets
Fuentes Pineda, Gibran, The Univ. of Electro-Communications
Koga, Hisashi, Univ. of Electro-Communications
Watanabe, Toshinori, Univ. of Electro-Communications

This paper presents a novel approach to discovering particular objects from a set of unannotated images. We aim to find discriminative feature sets that can effectively represent particular object classes (as opposed to object categories). We achieve this by mining correlated visual word sets from the bag-of-features model. Specifically, we consider that a visual word set belongs to the same object class if all its visual words consistently occur together in the same images. To find such sets efficiently, we apply Min-LSH to the occurrence vector of each visual word. An agglomerative hierarchical clustering is then performed to eliminate redundancy and obtain more representative sets. We also propose a simple and efficient strategy for quantizing the feature descriptors based on locality-sensitive hashing. Experiments show that our approach can efficiently discover objects despite clutter and slight viewpoint variations.
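The min-hashing step can be sketched as follows: each visual word is represented by the set of images in which it occurs, and words with many identical min-hash values (i.e. high Jaccard similarity of their occurrence sets) become candidates for the same object (an illustrative sketch; the number of hashes and the subsequent clustering are assumptions):

import numpy as np

def minhash_signatures(occurrence, n_hashes=20, seed=0):
    """occurrence: boolean (n_words, n_images) matrix; returns (n_words, n_hashes) signatures.
    Assumes every visual word occurs in at least one image."""
    rng = np.random.default_rng(seed)
    n_words, n_images = occurrence.shape
    sigs = np.empty((n_words, n_hashes), dtype=int)
    for h in range(n_hashes):
        perm = rng.permutation(n_images)
        sigs[:, h] = np.argmax(occurrence[:, perm], axis=1)   # first occupied image under the permutation
    return sigs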

Technical Program for Tuesday
August 24, 2010

TuAT1 Marmara Hall
Object Detection and Recognition – I Regular Session
Session chair: Jiang, Xiaoyi (Univ. of Münster)

09:00-09:20, Paper TuAT1.1
Learning an Efficient and Robust Graph Matching Procedure for Specific Object Recognition
Revaud, Jerome, Univ. de Lyon, CNRS
Lavoue, Guillaume, Univ. de Lyon, CNRS
Ariki, Yasuo, Kobe Univ.
Baskurt, Atilla, LIRIS, INSA Lyon

We present a fast and robust graph matching approach for 2D specific object recognition in images. From a small number of training images, a model graph of the object to be learned is built automatically. It contains the object's local key points as well as their spatial proximity relationships. Training is based on a selection of the most efficient subgraphs using mutual information. Detection uses dynamic programming with a lattice and is therefore very fast. Experiments demonstrate that the proposed method outperforms state-of-the-art specific object detectors under realistic noise conditions.

09:20-09:40, Paper TuAT1.2
A New Biologically Inspired Feature for Scene Image Classification
Jiang, Aiwen, Chinese Acad. of Sciences
Wang, Chunheng, Chinese Acad. of Sciences
Xiao, Baihua, Chinese Acad. of Sciences
Dai, Ruwei, Chinese Acad. of Sciences

Scene classification is a hot topic in the pattern recognition and computer vision area. In this paper, based on past research in vision neuroscience, we propose a new biologically inspired feature method for scene image classification. The new feature accounts for the visual processing from simple cells to complex cells in the V1 area, and also for the spatial layout of the scene gist signature. It provides a different line of modeling and a revision that considers some nonlinearities in the V1 area. We compare it with the traditional HMAX model and the recently proposed ScSPM model, and experiment on the popular 15-scenes dataset. We show that our proposed method has many important differences and merits. The experimental results also show that our method outperforms state-of-the-art models such as ScSPM and KSPM.

09:40-10:00, Paper TuAT1.3
On a Quest for Image Descriptors based on Unsupervised Segmentation Maps
Koniusz, Piotr, Univ. of Surrey
Mikolajczyk, Krystian, Univ. of Surrey

This paper investigates segmentation-based image descriptors for object category recognition. In contrast to commonly used interest points, the proposed descriptors are extracted from pairs of adjacent regions given by a segmentation method. In this way we exploit semi-local structural information from the image. We propose to use the segments as spatial bins for descriptors of various image statistics based on gradient, colour and region shape. The proposed descriptors are validated on standard recognition benchmarks. Results show that they outperform state-of-the-art reference descriptors with 5.6x less data, and achieve results comparable to them with 8.6x less data. The proposed descriptors are complementary to SIFT and achieve state-of-the-art results when combined with it within a kernel-based classifier.

10:00-10:20, Paper TuAT1.4
An RST-Tolerant Shape Descriptor for Object Detection
Su, Chih-Wen, Acad. Sinica
Liao, Mark, Acad. Sinica, Taiwan
Liang, Yu-Ming, Acad. Sinica
Tyan, Hsiao-Rong, Chung Yuan Christian Univ.

In this paper, we propose a new object detection method that does not need a learning mechanism. Given a hand-drawn model as a query, we can detect and locate objects that are similar to the query model in cluttered images. To ensure invariance with respect to rotation, scaling, and translation (RST), high curvature points (HCPs) on edges are detected first. Each pair of HCPs is then used to determine a circular region, and all edge pixels covered by the circular region are transformed into a polar histogram. Finally, we use these local descriptors to detect and locate similar objects within any image. The experimental results show that the proposed method outperforms the existing state-of-the-art work.

10:20-10:40, Paper TuAT1.5
Inverse Multiple Instance Learning for Classifier Grids
Sternig, Sabine, Graz Univ. of Tech.
Roth, Peter M., Graz Univ. of Tech.
Bischof, Horst, Graz Univ. of Tech.

Recently, classifier grids have been shown to be a considerable alternative for object detection from static cameras. However, one drawback of such approaches is drifting if an object is not moving over a long period of time. Thus, the goal of this work is to increase the recall of such classifiers while preserving their accuracy and speed. In particular, this is realized by adapting ideas from Multiple Instance Learning within a boosting framework. Since the set of positive samples is well defined, we apply this concept to the negative samples extracted from the scene: Inverse Multiple Instance Learning. By introducing temporal bags, we can ensure that each bag contains at least one sample having a negative label, providing the required stability. The experimental results demonstrate that the proposed approach obtains state-of-the-art detection results while showing superior classification results in the presence of non-moving objects.

TuAT2 Topkapı Hall B
Clustering Regular Session
Session chair: Tasdizen, Tolga (Univ. of Utah)

09:00-09:20, Paper TuAT2.1
On Dynamic Weighting of Data in Clustering with K-Alpha Means
Chen, Si-Bao, Anhui Univ.
Wang, Hai-Xian, Southeast Univ.
Luo, Bin, Anhui Univ.

Although many methods for refining initialization have appeared, the sensitivity of K-Means to the initial centers is still an obstacle in applications. In this paper, we investigate a new class of clustering algorithm, K-Alpha Means (KAM), which is insensitive to the initial centers. With K-Harmonic Means as a special case, KAM dynamically weights data points while iteratively updating the centers, de-emphasizing data points that are close to centers and emphasizing data points that are not close to any center. By replacing the minimum operator in K-Means with the alpha-mean operator, KAM significantly improves clustering performance.

09:20-09:40, Paper TuAT2.2<br />

ARImp: A Generalized Adjusted Rand Index for Cluster Ensembles<br />

Zhang, Shaohong, City Univ. of Hong Kong<br />

Wong, Hau-San, City Univ. of Hong Kong<br />

The Adjusted Rand Index (ARI) is one of the most popular measures for evaluating the consistency between two partitions of a data set in pattern recognition. In this paper, ARI is generalized to a new measure, the Adjusted Rand Index between a similarity matrix and a cluster partition (ARImp), to evaluate the consistency between a set of clustering solutions (or cluster partitions) and their associated consensus matrix in a cluster ensemble. We prove that ARImp generalizes ARI and illustrate with simulated experiments that it preserves the desirable properties of ARI. We also show, with application experiments on several real data sets, that ARImp can serve as a filter to identify the less effective cluster ensemble methods.
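
For reference, the classical ARI that ARImp generalizes can be computed from the contingency table of the two partitions; the sketch below implements that standard formula and is not the authors' ARImp code.

```python
import numpy as np

def adjusted_rand_index(labels_a, labels_b):
    """Classical Adjusted Rand Index between two flat partitions."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    n = len(a)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    cont = np.zeros((ia.max() + 1, ib.max() + 1), dtype=np.int64)
    np.add.at(cont, (ia, ib), 1)                    # contingency table
    comb2 = lambda x: x * (x - 1) / 2.0
    sum_ij = comb2(cont).sum()
    sum_a = comb2(cont.sum(axis=1)).sum()
    sum_b = comb2(cont.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n)
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0: same partition up to relabelling
```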

09:40-10:00, Paper TuAT2.3<br />

On the Scalability of Evidence Accumulation Clustering<br />

Lourenço, André, Inst. Superior de Engenharia de Lisboa (ISEL), Inst. Superior Técnico (IST), IT<br />

Fred, Ana Luisa Nobre, Inst. Superior Técnico<br />

Jain, Anil, Michigan State Univ.<br />

This work focuses on the scalability of the Evidence Accumulation Clustering (EAC) method. We first address the space complexity of the co-association matrix. The sparseness of the matrix is related to the construction of the clustering ensemble. Using a split-and-merge strategy combined with a sparse matrix representation, we empirically show that a linear space complexity is achievable in this framework, leading to the scalability of the EAC method to the clustering of large data sets.
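
A minimal sketch of the co-association accumulation at the core of EAC, stored sparsely so that only pairs that are actually co-clustered occupy memory; the K-means base clusterer, the number of partitions and the range of k are illustrative assumptions and do not reproduce the split-and-merge strategy of the paper.

```python
import numpy as np
from itertools import combinations
from scipy.sparse import coo_matrix
from sklearn.cluster import KMeans

def sparse_coassociation(X, n_partitions=30, k_range=(10, 30), seed=0):
    """Count how often each pair of points is co-clustered across an
    ensemble of K-means partitions, using sparse storage."""
    rng = np.random.default_rng(seed)
    n = len(X)
    rows, cols = [], []
    for _ in range(n_partitions):
        k = int(rng.integers(*k_range))
        labels = KMeans(n_clusters=k, n_init=1,
                        random_state=int(rng.integers(1_000_000))).fit_predict(X)
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            for i, j in combinations(idx, 2):
                rows.append(i); cols.append(j)
    C = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()  # duplicates summed
    return (C + C.T) / n_partitions          # symmetric co-association values in [0, 1]

X = np.random.randn(200, 2)
C = sparse_coassociation(X)
print(C.nnz, "non-zero pairwise entries")    # a final clustering would link on 1 - C
```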

10:00-10:20, Paper TuAT2.4<br />

A Hierarchical Clustering Method for Color Quantization<br />

Zhang, Jun, Waseda Univ.<br />

Hu, Jinglu, Waseda Univ.<br />

In this paper, we propose a hierarchical frequency-sensitive competitive learning (HFSCL) method to achieve color quantization (CQ). In HFSCL, the appropriate number of quantized colors and the palette can be obtained by an adaptive procedure following a binary tree structure with nodes and layers. Starting from the root node, which contains all colors in the image, a binary tree is generated until all nodes have been examined by the split conditions. In each node of the tree, a frequency-sensitive competitive learning (FSCL) network is used to achieve a two-way division. To avoid over-splitting, a merging condition is defined that merges clusters that are close to each other at each layer. Experimental results show that HFSCL has the desired ability for CQ.
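
The two-way division at each tree node relies on frequency-sensitive competitive learning; below is a minimal sketch of an FSCL split with two units, where the learning rate, the number of sweeps and the conscience-style winner selection are illustrative assumptions rather than the authors' exact network.

```python
import numpy as np

def fscl_two_way_split(colors, iters=5, lr=0.05, seed=0):
    """Frequency-sensitive competitive learning with two units: the winner is
    chosen by distance scaled by its win count, so neither unit goes unused."""
    rng = np.random.default_rng(seed)
    units = colors[rng.choice(len(colors), 2, replace=False)].astype(float)
    wins = np.ones(2)
    for _ in range(iters):
        for x in colors[rng.permutation(len(colors))]:
            d = np.linalg.norm(x - units, axis=1) * wins   # frequency-sensitive distance
            w = int(np.argmin(d))
            units[w] += lr * (x - units[w])                # move winner toward the sample
            wins[w] += 1
    labels = np.argmin(np.linalg.norm(colors[:, None, :] - units[None], axis=2), axis=1)
    return units, labels       # two palette colors and the per-pixel assignment

pixels = np.random.randint(0, 256, size=(1000, 3)).astype(float)
palette, labels = fscl_two_way_split(pixels)
print(palette)
```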

10:20-10:40, Paper TuAT2.5<br />

Combining Real and Virtual Graphs to Enhance Data Clustering<br />

Wang, Liang, The Univ. of Melbourne<br />

Leckie, Christopher, The Univ. of Melbourne<br />

Kotagiri, Rao, Univ. of Melbourne<br />

Fusion of multiple information sources can yield significant benefits for certain learning tasks. This paper exploits the sparse representation of signals for the problem of data clustering. The method is built within the framework of spectral clustering and convexly combines a real graph constructed from the given physical features with a virtual graph constructed from sparse reconstruction coefficients. The experimental results on several real-world data sets show that fusing the real and virtual graphs obtains better (or at least comparable) results than using either graph alone.
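
A minimal sketch of the fusion idea under stated assumptions: the real graph is an RBF affinity on the raw features, the virtual graph is built from per-sample sparse reconstruction coefficients (obtained here with Lasso, an assumption), and the two are convexly combined before spectral clustering.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def cluster_real_plus_virtual(X, n_clusters, alpha=0.5, lam=0.05):
    n = len(X)
    W_real = rbf_kernel(X)                       # real graph from physical features
    C = np.zeros((n, n))
    for i in range(n):                           # sparse reconstruction of each sample
        D = np.delete(X, i, axis=0).T            # dictionary: all other samples as columns
        coef = Lasso(alpha=lam, max_iter=5000).fit(D, X[i]).coef_
        C[i, np.arange(n) != i] = np.abs(coef)
    W_virtual = 0.5 * (C + C.T)                  # symmetric virtual graph
    W = alpha * W_real + (1 - alpha) * W_virtual # convex combination of the two graphs
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)

X = np.vstack([np.random.randn(50, 5) + c for c in (0, 4)])
print(cluster_real_plus_virtual(X, n_clusters=2)[:10])
```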

TuAT3 Topkapı Hall A<br />

3D Shape Recovery Regular Session<br />

Session chair: Sato, Jun (Nagoya Institute of Technology)<br />

09:00-09:20, Paper TuAT3.1<br />

Calibration Method for Line Structured Light Vision Sensor based on Vanish Points and Lines<br />

Wei, Zhenzhong, Beihang Univ.<br />

Xie, Meng, Beihang Univ. Ministry of Education<br />

Zhang, Guangjun, Beihang Univ.<br />

Line structured light vision sensor (LSLVS) calibration establishes the location relationship between the camera and the light plane projector. This paper proposes a geometrical calibration method for LSLVS based on the properties of vanishing points and lines, obtained by randomly moving a planar target. The method consists of two steps: (1) the vanishing point of the light stripe projected by the light plane is found in each target image, and the obtained vanishing points form the vanishing line of the light plane, which helps to determine the normal of the light plane; (2) one 3D feature point on the light plane is acquired (one is sufficient, though more can be used) to determine the d parameter of the light plane. The equation of the light plane in the camera coordinate system can then be solved. Computer simulations and real experiments have been carried out to validate our method, and the real calibration reaches an accuracy of 0.141 mm within a field of view of about 300 mm × 200 mm.

09:20-09:40, Paper TuAT3.2<br />

A Color Invariant based Binary Coded Structured Light Range Scanner for Shiny Objects<br />

Benveniste, Rifat, Yeditepe Univ.<br />

Unsalan, Cem, Yeditepe Univ.<br />

Object range data provide valuable information in recognition and modeling applications. Therefore, it is extremely important to reliably extract the range data of a given object. There are various range scanners based on different principles. Among these, structured light based range scanners deserve special attention. In these systems, coded light stripes are projected onto the object. Using the bending of these light stripes on the object and the triangulation principle, range information can be obtained. Since this method is simple and fast, it is used in most industrial range scanners. Unfortunately, these range scanners cannot scan shiny objects reliably. The main reasons are highlights on the shiny object and ambient light in the environment, which disturb the illumination coding. As the code is changed, the range data extracted from it will also be disturbed. In this study, we propose a color invariant based binary coded structured light range scanner to solve this problem. The color invariant used can eliminate the effects of highlights on the object and of ambient light from the environment. This way, we can extract the range data of shiny objects in a robust manner. To test our method, we developed a prototype range scanner. We provide the range data of various test objects obtained with our range scanner.

09:40-10:00, Paper TuAT3.3<br />

Improving Shape-From-Focus by Compensating for Image Magnification Shift<br />

Pertuz, Said, Rovira I Virgili Univ.

Puig, Domenec, Rovira I Virgili Univ.<br />

Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />

Images taken with different focus settings are used in shape-from-focus to reconstruct the depth map of a scene. A problem when acquiring images with different focus settings is the shift of image features due to changes in magnification. This paper shows that those changes affect shape-from-focus performance and that the final reconstruction can be improved by compensating for that shift. The proposed scheme takes into account the effects of magnification changes between near- and far-focused images and is able to determine the depth of the scene points with higher accuracy than traditional techniques. Experimental results of the application of the proposed method are shown.
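
For context, the baseline shape-from-focus pipeline that the compensation improves on can be sketched as follows: a per-pixel focus measure (here the absolute Laplacian aggregated over a window, an assumed choice) is computed for every frame of the focus stack, and the depth index is the argmax over frames; the magnification-shift compensation proposed in the paper is deliberately omitted.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def depth_from_focus(stack, window=9):
    """stack: (n_frames, H, W) images taken with different focus settings.
    Returns, per pixel, the index of the frame where focus is maximal."""
    focus = np.empty_like(stack, dtype=float)
    for i, frame in enumerate(stack):
        ml = np.abs(laplace(frame.astype(float)))    # Laplacian-based focus measure
        focus[i] = uniform_filter(ml, size=window)   # aggregate over a local window
    return np.argmax(focus, axis=0)                  # depth map as frame indices

stack = np.random.rand(15, 64, 64)        # stand-in for a real focus sequence
print(depth_from_focus(stack).shape)      # (64, 64)
```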

10:00-10:20, Paper TuAT3.4<br />

Quasi-Dense Wide Baseline Matching for Three Views<br />

Koskenkorva, Pekka, Univ. of Oulu<br />

Kannala, Juho, Univ. of Oulu<br />

Brandt, Sami Sebastian, Univ. of Oulu<br />

This paper proposes a method for computing a quasi-dense set of matching points between three views of a scene. The method takes a sparse set of seed matches between pairs of views as input and then propagates the seeds to neighboring regions. The proposed method is based on the best-first match propagation strategy, which is here extended from two-view matching to the case of three views. The results show that utilizing the three-view constraint during correspondence growing improves the accuracy of matching and reduces the occurrence of outliers. In particular, compared with two-view stereo, our method is more robust to repeated texture. Since the proposed approach is able to produce high-quality depth maps from only three images, it could be used in multi-view stereo systems that fuse depth maps from multiple views.

10:20-10:40, Paper TuAT3.5<br />

Robust Shape from Polarisation and Shading<br />

Huynh, Cong Phuoc, Australian National Univ.<br />

Robles-Kelly, Antonio, National ICT Australia<br />

Hancock, Edwin, Univ. of York<br />

In this paper, we present an approach to the robust estimation of shape from single-view multi-spectral polarisation images. The developed technique tackles the problem of recovering the azimuth angle of surface normals robustly to image noise and a low degree of polarisation. We note that linear least-squares estimation results in a considerable phase shift from the ground truth in the presence of noise and weak polarisation in multispectral and hyperspectral imaging. This paper discusses the utility of robust statistics to discount the large error attributed to outliers and noise. Combining this approach with Shape from Shading, we fully recover the surface shape. We demonstrate the effectiveness of the robust estimator compared to the linear least-squares estimator through shape recovery experiments on both synthetic and real images.

TuAT4 Dolmabahçe Hall A<br />

Signal Separation and Classification Regular Session<br />

Session chair: Erzin, Engin (Koc Univ.)<br />

09:00-09:20, Paper TuAT4.1<br />

Classifying Three-Way Seismic Volcanic Data by Dissimilarity Representation<br />

Porro, Diana, Advanced Tech. Application Center<br />

Duin, Robert, TU Delft<br />

Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales<br />

Talavera, Isneri, Advanced Tech. Application Center<br />

Londoño-Bonilla, John Makario, Inst. Colombiano de Geología y Minería<br />

Multi-way data analysis is a multivariate data analysis technique with wide application in several fields. Nevertheless, the development of classification tools for this type of representation is still incipient. In this paper we study the dissimilarity representation for the classification of three-way data, as dissimilarities allow the representation of multi-dimensional objects in a natural way. As an example, the classification of seismic volcanic events is used. It is shown that, in this application, dissimilarity-based classification on 2D spectrograms performs better than on 1D spectral features.

09:20-09:40, Paper TuAT4.2<br />

Improved Blur Insensitivity for Decorrelated Local Phase Quantization<br />

Heikkilä, Janne, Univ. of Oulu<br />

Ojansivu, Ville, Univ. of Oulu<br />

Rahtu, Esa, Univ. of Oulu<br />

This paper presents a novel blur-tolerant decorrelation scheme for the local phase quantization (LPQ) texture descriptor. As opposed to previous methods, the introduced model can be applied with virtually any kind of blur, regardless of the point spread function. The new technique also takes into account the changes in the image characteristics originating from the blur itself. The implementation does not suffer from multiple solutions like the decorrelation in the original LPQ, but still retains the same run-time computational complexity. The texture classification experiments illustrate considerable improvements in the performance of LPQ descriptors in the case of blurred images and show only negligible loss of accuracy with sharp images.

09:40-10:00, Paper TuAT4.3<br />

Ensemble Discriminant Sparse Projections Applied to Music Genre Classification<br />

Kotropoulos, Constantine, Aristotle Univ. of Thessaloniki<br />

Arce, Gonzalo, Univ. of Delaware<br />

Panagakis, Yannis, Aristotle Univ. of Thessaloniki<br />

Resorting to the rich, psycho-physiologically grounded properties of the slow temporal modulations of music recordings, a novel classifier ensemble is built, which applies discriminant sparse projections. More specifically, overcomplete dictionaries are learned and sparse coefficient vectors are extracted to optimally approximate the slow temporal modulations of the training music recordings. The sparse coefficient vectors are then projected onto the principal subspaces of their within-class and between-class covariance matrices. Decisions are taken with respect to the minimum Euclidean distance from the class-mean sparse coefficient vectors, which undergo the aforementioned projections. The application of majority voting to the decisions taken by 10 individual classifiers, which are trained on the 10 training folds defined by stratified 10-fold cross-validation on the GTZAN dataset, yields a music genre classification accuracy of 84.96% on average. The latter exceeds by 2.46% the highest accuracy previously reported without employing any sparse representations.

10:00-10:20, Paper TuAT4.4<br />

Single Channel Speech Separation using Source-Filter Representation<br />

Stark, Michael, Graz Univ. of Tech.<br />

Wohlmayr, Michael, Graz Univ. of Tech.<br />

Pernkopf, Franz, Graz Univ. of Tech.<br />

We propose a fully probabilistic model for source-filter based single channel source separation. In particular, we perform separation in a sequential manner, where we estimate the source-driven aspects by a factorial HMM used for multi-pitch estimation. Afterwards, these pitch tracks are combined with the vocal tract filter model to form an utterance-dependent model. Additionally, we introduce a gain estimation approach to enable adaptation to arbitrary mixing levels in the speech mixtures. We thoroughly evaluate this system and finally arrive at a speaker-independent model.

10:20-10:40, Paper TuAT4.5<br />

Nonlinear Blind Source Separation using Slow Feature Analysis with Random Features<br />

Ma, Kuijun, Chinese Acad. of Sciences<br />

Tao, Qing, Chinese Acad. of Sciences<br />

Wang, Jue, Chinese Acad. of Sciences<br />

We develop an algorithm, RSFA, to perform nonlinear blind source separation with temporal constraints. The algorithm is based on slow feature analysis using random Fourier features for shift-invariant kernels, followed by a selection procedure to obtain the sought-after signals. This method not only obtains remarkable results in a short computing time, but also handles situations with multiple types of mixtures well. In kernel methods, since the problem is unsupervised, the need for multiple kernels is ubiquitous. Experiments on music excerpts illustrate the strong performance of our method.
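
A minimal sketch of linear slow feature analysis applied on top of random Fourier features for a Gaussian (shift-invariant) kernel; the feature dimension, the bandwidth and the eigenvalue-based extraction of the slow directions are illustrative assumptions and do not include the selection procedure of RSFA.

```python
import numpy as np
from scipy.linalg import eigh

def random_fourier_features(X, dim=300, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], dim))
    b = rng.uniform(0, 2 * np.pi, dim)
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)      # approximate RBF feature map

def slow_features(X, n_out=2, **rff_kwargs):
    Z = random_fourier_features(X, **rff_kwargs)
    Z = Z - Z.mean(axis=0)
    C = np.cov(Z, rowvar=False)                         # feature covariance
    Cdot = np.cov(np.diff(Z, axis=0), rowvar=False)     # covariance of temporal differences
    # slowest directions: smallest generalized eigenvalues of  Cdot w = lambda C w
    _, vecs = eigh(Cdot, C + 1e-6 * np.eye(C.shape[0]))
    return Z @ vecs[:, :n_out]

t = np.linspace(0, 20, 2000)
mix = np.c_[np.sin(t) + 0.3 * np.sin(17 * t), np.cos(0.5 * t) * np.sin(t)]
print(slow_features(mix).shape)   # (2000, 2): the two slowest latent signals
```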

TuAT5 Anadolu Auditorium<br />

Image Analysis – III Regular Session<br />

Session chair: Kittler, Josef (Univ. of Surrey)<br />

09:00-09:20, Paper TuAT5.1<br />

Canonical Image Selection by Visual Context Learning<br />

Zhou, Wengang, Univ. of Science and Tech. of China<br />

Lu, Yijuan, Texas State Univ. at San Marcos<br />

Li, Houqiang, Univ. of Science and Tech. of China<br />

Tian, Qi, Univ. of Texas at San Antonio<br />

Canonical image selection aims to select a subset of photos that best summarizes a photo collection. In this paper, we define canonical images as those that contain the most important and distinctive visual words. We propose to use visual context learning to discover visual word significance and develop a Weighted Set Coverage algorithm to select canonical images containing distinctive visual words. Experiments with web image datasets demonstrate that the canonical images selected by our approach are not only representative of the collected photos, but also exhibit a diverse set of views with minimal redundancy.
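
A minimal greedy sketch of the weighted set coverage idea: each image is a set of visual words with significance weights, and images are picked to maximise newly covered weight; the word weights below are placeholders for what visual context learning would provide.

```python
def greedy_weighted_set_cover(image_words, word_weight, budget):
    """image_words: dict image_id -> set of visual word ids.
    word_weight:   dict word id  -> significance weight.
    Greedily picks up to `budget` images maximising newly covered word weight."""
    covered, selected = set(), []
    for _ in range(budget):
        best, best_gain = None, 0.0
        for img, words in image_words.items():
            if img in selected:
                continue
            gain = sum(word_weight.get(w, 0.0) for w in words - covered)
            if gain > best_gain:
                best, best_gain = img, gain
        if best is None:               # nothing new can be covered
            break
        selected.append(best)
        covered |= image_words[best]
    return selected

imgs = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {5}, 'd': {1, 5}}
weights = {1: 2.0, 2: 1.0, 3: 1.0, 4: 0.5, 5: 3.0}
print(greedy_weighted_set_cover(imgs, weights, budget=2))   # -> ['d', 'a']
```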

09:20-09:40, Paper TuAT5.2<br />

Exposing Digital Image Forgeries by using Canonical Correlation Analysis<br />

Zhang, Chi, Beijing Univ. of Tech.<br />

Zhang, Hongbin, Beijing Univ. of Tech.<br />

In this paper, we propose a new method to detect forgeries in digital images by using photo-response non-uniformity (PRNU) noise features. The method utilizes canonical correlation analysis (CCA) to measure the linear correlation between two sets of PRNU noise estimates from images taken by the same camera. The linear correlation maximizes the correlation between the noise reference pattern (or PRNU noise estimate) and PRNU noise features from the same camera. To further improve the detection accuracy, the difference of variance between an image region and its smoothed version is used to categorize the image region into a heavily textured or a non-heavily textured class. For each class, a Neyman-Pearson decision rule is used to calculate the corresponding threshold and obtain the final detection result.

09:40-10:00, Paper TuAT5.3<br />

Adding Affine Invariant Geometric Constraint for Partial-Duplicate Image Retrieval<br />

Wu, Zhipeng, Chinese Acad. of Sciences<br />

Xu, Qianqian, Chinese Acad. of Sciences<br />

Jiang, Shuqiang, Chinese Acad. of Sciences<br />

Huang, Qingming, Chinese Acad. of Sciences<br />

Cui, Peng, Chinese Acad. of Sciences<br />

Li, Liang, Chinese Acad. of Sciences<br />

The emergence of large numbers of partial-duplicate images on the internet brings a new challenge to image retrieval systems. Rather than taking the image as a whole, researchers bundle the local visual words detected by the MSER detector into groups and add a simple relative-ordering geometric constraint to the bundles. Experiments show that bundled features become much more discriminative than single features. However, the weak geometric constraint is only applicable when there is no significant rotation between duplicate images, and it cannot handle image flips or large rotations. In this paper, we improve the bundled features with an affine invariant geometric constraint. It employs the area-ratio invariance of affine transformations to build an affine invariant matrix for the bundled visual words. Such an affine invariant geometric constraint copes well with flips, rotations and other transformations. Experimental results on an internet partial-duplicate image database verify the improvement it brings over the original bundled-features approach. Since there is currently no publicly available corpus for partial-duplicate image retrieval, we also publish our dataset for future studies.

10:00-10:20, Paper TuAT5.4<br />

Outlier-Resistant Dissimilarity Measure for Feature-Based Image Matching<br />

Palenichka, Roman, Univ. of Quebec<br />

Lakhssassi, Ahmed, Univ. of Quebec<br />

Zaremba, Marek, Univ. of Quebec<br />

A novel dissimilarity measure is proposed to perform correspondence image matching for object recognition, image registration and content-based image retrieval. This is a feature-based matching that assumes an image representation (object description) in the form of a set of multi-location descriptor vectors. The proposed measure, called the intersection matching distance, eliminates outliers (false or missing feature points) while matching two sets of descriptor vectors in a transformation-invariant way. A block-subdivision algorithm for time-efficient image matching is also described.

10:20-10:40, Paper TuAT5.5<br />

The University of Surrey Visual Concept Detection System at ImageCLEF@ICPR: Working Notes

Tahir, Muhammad Atif, Univ. of Surrey<br />

Fei, Yan, Univ. of Surrey<br />

Barnard, Mark, Univ. of Surrey<br />

Awais, Muhammad, Univ. of Surrey<br />

Mikolajczyk, Krystian, Univ. of Surrey<br />

Kittler, Josef, Univ. of Surrey<br />

Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system in the ImageCLEF@ICPR Visual Concept Detection Task, which ranked first for large-scale visual concept detection in terms of Equal Error Rate (EER) and Area Under Curve (AUC) and ranked third in terms of the hierarchical measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering, structured scene or object representation via localised histograms of vector codes, a similarity measure for kernel construction, and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we obtain the best performance of all 12 submissions to this task.

TuAT6 Dolmabahçe Hall B<br />

Texture Regular Session<br />

Session chair: Theodoridis, Sergios (Univ. of Athens)<br />

09:00-09:20, Paper TuAT6.1<br />

On Adapting Pixel-Based Classification to Unsupervised Texture Segmentation<br />

Melendez, Jaime, Rovira I Virgili Univ.<br />

Puig, Domenec, Univ. Rovira I Virgili<br />

Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />

An inherent problem of unsupervised texture segmentation is the absence of prior knowledge regarding the texture patterns present in the images to be segmented. A new, efficient methodology for unsupervised texture-based image segmentation is proposed. It takes advantage of a supervised pixel-based texture classifier trained with feature vectors associated with a set of texture patterns initially extracted through a clustering algorithm. The final segmentation is then achieved by classifying each image pixel into one of the patterns obtained from the previous clustering process. Multi-sized evaluation windows following a top-down approach are applied during pixel classification in order to improve accuracy. The proposed technique has been experimentally validated on MeasTex, VisTex and Brodatz compositions, as well as on complex ground and aerial outdoor images. Comparisons with state-of-the-art unsupervised texture segmenters are also provided.

09:20-09:40, Paper TuAT6.2<br />

Natural Material Recognition with Illumination Invariant Textural Features<br />

Vacha, Pavel, Inst. of Information Theory and Automation<br />

Haindl, Michael, Inst. of Information Theory and Automation<br />

The visual appearance of natural materials fundamentally depends on the illumination conditions, which significantly complicates real scene analysis. We propose textural features based on fast Markovian statistics, which are simultaneously invariant to illumination colour and robust to illumination direction. No knowledge of the illumination conditions is required, and recognition is possible from a single training image per material. Material recognition is tested on the currently most realistic visual representation, the Bidirectional Texture Function (BTF), using the Amsterdam Library of Textures, which contains 250 natural materials acquired under different illumination conditions. Our proposed features significantly outperform several leading alternatives, including Local Binary Patterns (LBP, LBP-HF) and Gabor features.

09:40-10:00, Paper TuAT6.3<br />

Gaze-Motivated Compression of Illumination and View Dependent Textures<br />

Filip, Jiri, Inst. of Information Theory and Automation of the AS CR<br />

Haindl, Michael, Inst. of Information Theory and Automation<br />

Chantler, Michael J., Heriot-Watt Univ.<br />

Illumination- and view-dependent textures provide ample information on the appearance of real materials, at the cost of enormous data storage requirements. Hence, past research focused mainly on compression and modelling of these data; however, few papers have explicitly addressed the way in which humans perceive these compressed data. We analyzed human gaze information to determine appropriate texture statistics. These statistics were then exploited in a pilot illumination- and view-direction-dependent data compression algorithm. Our results showed that taking local texture variance into account can increase the compression of current methods more than twofold, while preserving the original realistic appearance and allowing fast data reconstruction.

10:00-10:20, Paper TuAT6.4<br />

Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets

Alvarez, Susana, Univ. Rovira I Virgili<br />

Salvetella, Anna, Univ. Autònoma de Barcelona<br />

Vanrell, Maria, Univ. Autònoma de Barcelona<br />

Otazu, Xavier, Univ. Autònoma de Barcelona<br />

Color and texture are visual cues of a different nature, and their integration into a useful visual descriptor is not an obvious step. One way to combine both features is to compute texture descriptors independently on each color channel. A second way is to integrate the features at the descriptor level, in which case the problem of normalizing both cues arises. Significant progress in object recognition in recent years has provided the bag-of-words framework, which again deals with the problem of feature combination through the definition of vocabularies of visual words. Inspired by this framework, here we present perceptual textons that allow color and texture to be fused at the level of p-blobs, which is our feature detection step. Feature representation is based on two uniform spaces representing the attributes of the p-blobs. The low dimensionality of these texton spaces allows the usual problems of previous approaches to be bypassed: firstly, there is no need for normalization between cues; and secondly, vocabularies are obtained directly from the perceptual properties of the texton spaces without any learning step. Our proposal improves on current state-of-the-art color-texture descriptors in an image retrieval experiment on a highly diverse texture dataset from Corel.

10:20-10:40, Paper TuAT6.5<br />

Illumination Estimation of 3D Surface Texture based on Active Basis<br />

Dong, Junyu, Ocean Univ. of China<br />

Su, Liyuan, Ocean Univ. of China<br />

Duan, Yuanxu, Alcatel-Lucent R&D<br />

This paper describes an approach to estimating the illumination directions of 3D surface textures based on the Active Basis model. Instead of applying the Gabor wavelet transform to extract texture features, we represent our texture features with simple Haar features to improve efficiency. The Active Basis model can be learned from training image patches by the shared pursuit algorithm. The base histogram can then be obtained for each model. We estimate the illumination directions by minimizing the Euclidean distance and the entropy difference of the base histograms between the test image and the training sets. Experimental results demonstrate the effectiveness and accuracy of the proposed approach.

TuAT7 Dolmabahçe Hall C<br />

Security and Privacy Regular Session<br />

Session chair: Veldhuis, Raymond (Univ of Twente)<br />

09:00-09:20, Paper TuAT7.1<br />

Binary Discriminant Analysis for Face Template Protection<br />

Feng, Y C, Hong Kong Baptist Univ.<br />

Yuen, Pong C, Hong Kong Baptist Univ.<br />

The biometric cryptosystem (BC) is a very secure approach to template protection because the stored template is encrypted. The key issues in the BC approach are (i) limited capability in handling intra-class variations and (ii) the requirement of binary input. To overcome these problems, this paper adopts the concept of discriminant analysis and develops a new binary discriminant analysis (BDA) method to convert a real-valued template into a binary template. Experimental results on the CMU-PIE and FRGC face databases show that the proposed BDA method outperforms existing template binarization schemes.

09:20-09:40, Paper TuAT7.2<br />

Renewable Minutiae Templates with Tunable Size and Security<br />

Yang, Bian, Gjovik Univ. Coll.<br />

Busch, Christoph, Gjovik Univ. Coll.<br />

Gafurov, Davrondzhon, Gjovik Univ. Coll.<br />

Bours, Patrick, Gjovik Univ. Coll.<br />

A renewable fingerprint minutiae template generation scheme is proposed that utilizes random projection for template diversification in a security-enhanced way. The scheme first achieves absolute pre-alignment over local minutiae quadruplets in the original template, resulting in a fixed-length feature vector; it then encrypts the feature vector by projecting it onto multiple random matrices and quantizing the projected result; and it finally post-processes the resulting binary vector in a size- and security-tunable way to obtain the final protected minutia vicinity. Experiments on the fingerprint database FVC2002DB2_A demonstrate the desirable biometric performance of the proposed scheme.

09:40-10:00, Paper TuAT7.3<br />

Tokenless Cancelable Biometrics Scheme for Protecting IrisCodes<br />

Ouda, Osama, Chiba Univ.<br />

Tsumura, Norimichi, Chiba Univ.<br />

Nakaguchi, Toshiya, Chiba Univ.<br />

In order to satisfy the requirements of the cancelable biometrics construct, cancelable biometrics techniques rely on other authentication factors, such as password keys and/or user-specific tokens, in the transformation process. However, such multi-factor authentication techniques suffer from the same issues associated with traditional knowledge-based and token-based authentication systems. This paper presents a new one-factor cancelable biometrics scheme for protecting IrisCodes. The proposed method is based solely on IrisCodes; however, it satisfies the requirements of revocability, diversity and noninvertibility without deteriorating the recognition performance. Moreover, the transformation process is easy to implement and can be integrated simply with current iris matching systems. The impact of the proposed transformation process on the recognition accuracy is discussed and its noninvertibility is analyzed. The effectiveness of the proposed method is confirmed experimentally using the CASIA-IrisV3-Interval dataset.

10:00-10:20, Paper TuAT7.4<br />

A Novel Fingerprint Template Protection Scheme based on Distance Projection Coding<br />

Wang, Ruifang, Chinese Acad. of Sciences<br />

Yang, Xin, Chinese Acad. of Sciences<br />

Liu, Xia, Harbin University of Science and Technology<br />

Zhou, Sujing, Chinese Acad. of Sciences<br />

Li, Peng, Chinese Acad. of Sciences<br />

Cao, Kai, Chinese Acad. of Sciences<br />

Tian, Jie, Chinese Acad. of Sciences<br />

The biometric template, which is stored in the form of raw data, has become the greatest potential threat to the security of a biometric authentication system. As the compromise of biometric data is permanent, the protection of biometric data is particularly important. Consequently, biometric template protection technologies have recently attracted considerable research attention. One of the most popular template protection methods is the biometric cryptosystem. In this paper, we design a codebook, named distance projection, for biometric coding to generate a secured biometric template, and we propose a novel fingerprint biometric cryptosystem scheme based on this codebook. Experimental results on FVC2002 DB2 show that the proposed scheme obtains positive results on both security and authentication accuracy.

10:20-10:40, Paper TuAT7.5<br />

Combination of Symmetric Hash Functions for Secure Fingerprint Matching<br />

Kumar, Gaurav, State Univ. of New York at Buffalo<br />

Tulyakov, Sergey, Univ. at Buffalo<br />

Govindaraju, Venu, Univ. at Buffalo<br />

Fingerprint-based secure biometric authentication systems have received considerable research attention lately, where the major goal is to provide an anonymous, multipliable and easily revocable methodology for fingerprint verification. In our previous work, we have shown that symmetric hash functions are very effective in providing such secure fingerprint representation and matching, since they are independent of the order of minutiae triplets as well as of the location of singular points (e.g., core and delta). In this paper, we extend our prior work by generating a combination of symmetric hash functions, which increases the security of fingerprint matching by an exponential factor. Firstly, we extract k-plets from each fingerprint image and generate a unique key for combining multiple hash functions up to an order of (k-1). Each of these keys is generated using features extracted from the minutiae k-plets, such as the bin index of the smallest angles in each k-plet. This combination provides extra security in the face of brute-force attacks, where the compromise of a few hash functions does not compromise the overall matching. Our experimental results suggest that the EER obtained using the combination of hash functions (4.98%) is comparable with the baseline system (3.0%), with the added advantage of being more secure.

TuAT8 Lower Foyer<br />

Structural Methods and Speech/Image Analysis Poster Session<br />

Session chair: Aguiar, Pedro M. Q. (Institute for Systems and Robotics / Instituto Superior Tecnico)<br />

09:00-11:10, Paper TuAT8.1<br />

Face Recognition based on Illumination Adaptive LDA<br />

Liu, Zhonghua, Nanjing Univ. of Science and Tech.<br />

Zhou, Jingbo, Nanjing Univ. of Science and Tech.<br />

Jin, Zhong, Nanjing Univ. of Science and Tech.<br />

The variation of facial appearance due to illumination degrades face recognition systems considerably and is well known as one of the bottlenecks in face recognition. However, the variations of each subject that are due to changes in illumination are extremely similar to each other. By collecting offline many face classes, each of which has many images under different lighting conditions, a common within-class scatter matrix describing the within-class illumination variations of all the face classes can be obtained. Based on this, illumination adaptive linear discriminant analysis (IALDA) is proposed to solve illumination variation problems in face recognition when each face class has only one training sample under standard lighting conditions. In the IALDA method, the illumination direction of an input face image is first estimated. Then the corresponding LDA feature, which is robust to the variations between images under the estimated lighting conditions and the standard lighting conditions, is extracted. Experiments on face databases demonstrate the effectiveness of the proposed method.

09:00-11:10, Paper TuAT8.2<br />

Topological Dynamic Bayesian Networks<br />

Bouchaffra, Djamel, Grambling State Univ.<br />

The objective of this research is to embed topology within the dynamic Bayesian network (DBN) formalism. This extension of a DBN (which encodes statistical or causal relationships) to a topological DBN (TDBN) allows continuous mappings (e.g., topological homeomorphisms), topological relations (e.g., homotopy equivalences) and invariance properties (e.g., surface genus, compactness) to be exploited. The mission of the TDBN is not limited to classifying objects but also includes revealing how these objects are topologically related. Because the TDBN formalism uses geometric constructors that project a discrete space onto a continuous space, it is well suited to identifying objects that undergo smooth deformation. Experimental results in face identification across ages provide evidence that the fusion of statistics and topology embodied by the TDBN concept holds promise. The TDBN formalism outperformed the DBN approach in facial identification across ages.

09:00-11:10, Paper TuAT8.3<br />

Vector Space Embedding of Undirected Graphs with Fixed-Cardinality Vertex Sequences for Classification<br />

Richiardi, Jonas, Ec. Pol. Fédérale de Lausanne<br />

Van De Ville, Dimitri, Ec. Pol. Fédérale de Lausanne<br />

Riesen, Kaspar, Univ. of Bern<br />

Bunke, Horst, Univ. of Bern<br />

Simple weighted undirected graphs with a fixed number of vertices and fixed vertex orderings can be used to represent data and patterns in a wide variety of scientific and engineering domains. Classification of such graphs by existing graph matching methods performs rather poorly because these methods do not exploit their specificity. As an alternative, methods relying on vector-space embedding hold promising potential. We propose two such techniques that can be deployed as a front-end for any pattern recognition classifier: one has low computational cost but generates high-dimensional spaces, while the other is more computationally demanding but can yield relatively low-dimensional vector space representations. We show experimental results on an fMRI brain state decoding task and discuss the shortfalls of graph edit distance for the type of graph under consideration.

09:00-11:10, Paper TuAT8.4<br />

Hierarchical Large Margin Nearest Neighbor Classification<br />

Chen, Qiaona, East China Normal Univ.<br />

Sun, Shiliang, East China Normal Univ.<br />

Distance metric learning has exhibited its great power to enhance performance in metric-related pattern recognition tasks. The recent large margin nearest neighbor classification (LMNN) improves the performance of k-nearest neighbor classification by learning a global distance metric. However, it does not consider the locality of data distributions, which is crucial in determining a proper metric. In this paper, we propose a novel local distance metric learning method called hierarchical LMNN (HLMNN), which first builds a hierarchical structure by grouping data points according to overlapping ratios that we define and then learns distance metrics sequentially. Experimental results on real-world data sets, including comparisons with traditional k-nearest neighbor classification and the state-of-the-art LMNN, show the effectiveness of the proposed HLMNN.

09:00-11:10, Paper TuAT8.5<br />

Adapting Information Theoretic Clustering to Binary Images<br />

Bauckhage, Christian, Fraunhofer IAIS<br />

Thurau, Christian, Fraunhofer IAIS<br />

We consider the problem of finding points of interest along local curves of binary images. Information theoretic vector quantization is a clustering algorithm that shifts cluster centers towards the modes of principal curves of a data set. Its runtime characteristics, however, do not allow for efficient processing of many data points. In this paper, we show how to solve this problem when dealing with data on a 2D lattice. Borrowing concepts from signal processing, we adapt information theoretic clustering to the quantization of binary images and gain a significant speedup.

09:00-11:10, Paper TuAT8.6<br />

Nearest-Manifold Classification with Gaussian Processes<br />

Jun, Goo, Univ. of Texas at Austin<br />

Ghosh, Joydeep, Univ. of Texas<br />

Manifold models for nonlinear dimensionality reduction provide useful low-dimensional representations of high-dimensional data. Most manifold models are unsupervised algorithms and map the entire data onto a single manifold. Heterogeneous data with multiple classes are often better modeled by multiple manifolds rather than by a single global manifold, but there is no explicit way to compare instances embedded in different subspaces. We propose a novel low-to-high dimensional mapping using Gaussian processes that offers comparisons in the original space. Based on the mapping, we propose a nearest-manifold classification algorithm for high-dimensional data. Experimental results show that the proposed algorithm provides good classification accuracies for problems well-modeled by multiple manifolds.

09:00-11:10, Paper TuAT8.7<br />

Mining Exemplars for Object Modelling using Affinity Propagation<br />

Xia, Shengping, Univ. of York<br />

Liu, Jianjun, Univ. of York<br />

Hancock, Edwin, Univ. of York<br />

This paper focuses on the problem of locating object class exemplars in a large corpus of images using affinity propagation. We use attributed relational graphs to represent groups of local invariant features together with their spatial arrangement. Rather than mining exemplars from the entire graph corpus, we prefer to cluster object-specific exemplars. Firstly, we obtain object-specific clusters of graphs using similarity propagation. The popular affinity propagation method is then applied individually to each object-specific cluster. Using this clustering method, we can obtain object-specific exemplars together with high precision for the data associated with each exemplar. Experiments are performed on over 80K images spanning 500 objects, and demonstrate the performance of the method in terms of efficiency and scalability.

09:00-11:10, Paper TuAT8.8<br />

Background Filtering for Improving of Object Detection in Images<br />

Qin, Ge, Univ. of Surrey<br />

Vrusias, Bogdan, Univ. of Surrey<br />

Gillam, Lee, Univ. of Surrey<br />

We propose a method for improving object recognition in street scene images by identifying and filtering out background aspects. We analyse the semantic relationships between foreground and background objects and use the information obtained to remove areas of the image that are misclassified as foreground objects. We show that such background filtering improves the performance of four traditional object recognition methods by over 40%. Our method is independent of the recognition algorithms used for individual objects, and can be extended to generic object recognition in other environments by adapting other object models.

09:00-11:10, Paper TuAT8.9<br />

Sparse Local Discriminant Projections for Feature Extraction<br />

Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />

Jin, Zhong, Nanjing Univ. of Science and Tech.<br />

Yang, Jian, Nanjing Univ. of Science and Tech.<br />

Wong, W.K., The Hong Kong Pol. Univ.<br />

One of the major disadvantages of linear dimensionality reduction algorithms, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), is that the projections are linear combinations of all the original features or variables, and all weights in the linear combination, known as loadings, are typically non-zero. Thus, they lack physical interpretation in many applications. In this paper, we propose a novel supervised learning method called Sparse Local Discriminant Projections (SLDP) for linear dimensionality reduction. SLDP introduces a sparse constraint into the objective function and obtains a set of sparse projective axes with a direct physical interpretation. The sparse projections can be efficiently computed by the Elastic Net combined with spectral analysis. The experimental results show that SLDP gives an explicit interpretation of its projections and achieves competitive performance compared with other dimensionality reduction techniques.

09:00-11:10, Paper TuAT8.10<br />

Information-Theoretic Feature Selection from Unattributed Graphs<br />

Bonev, Boyan, Univ. of Alicante<br />

Escolano, Francisco, Univ. of Alicante<br />

Giorgi, Daniela, National Res. Council<br />

Biasotti, Silvia, CNR – IMATI<br />

In this work we evaluate purely structural graph measures for 3D object classification. We extract spectral features from different Reeb graph representations. Information-theoretic feature selection gives insight into which features are the most relevant.

09:00-11:10, Paper TuAT8.11<br />

Head Pose Estimation based on Random Forests for Multiclass Classification<br />

Huang, Chen, Tsinghua Univ.<br />

Ding, Xiaoqing, Tsinghua Univ.<br />

Fang, Chi, Tsinghua Univ.<br />

Head pose estimation remains a unique challenge for computer vision systems due to identity variation, illumination changes, noise, etc. Previous statistical approaches such as PCA and linear discriminant analysis (LDA), and machine learning methods, including SVM and Adaboost, cannot achieve both accuracy and robustness satisfactorily. In this paper, we propose to use Gabor-feature-based random forests as the classification technique, since they naturally handle such multi-class classification problems and are accurate and fast. The two sources of randomness, random inputs and random features, make random forests robust and able to deal with large feature spaces. In addition, we implement LDA as the node test to improve the discriminative power of individual trees in the forest, with each node generating either a constant or a varying number of child nodes. Experiments carried out on two public databases show that the proposed algorithm outperforms other approaches in both accuracy and computational efficiency.

09:00-11:10, Paper TuAT8.12<br />

Differential Morphological Decomposition Segmentation: A Multi-Scale Object based Image Description<br />

Gueguen, Lionel, JRC – European Commission<br />

Soille, Pierre, Ec. Joint Res. Centre<br />

Pesaresi, Martino, Ec. Joint Res. Centre<br />

In order to describe and extract image information content, segmentation is a well-known approach for representing the information in terms of objects. Image segmentation is a common image processing technique aiming at disintegrating an image into a partition of its support. Hierarchical or fuzzy segmentations are extensions of this definition that provide a covering of the image support with overlapping segments. In this paper, we propose a novel approach for breaking up an image into multi-scale overlapping objects. The image is decomposed by a granulometry or a differential morphological pyramid, resulting in a discrete scale-space representation. Then, the scale-space transform is segmented by a region-based method. Projecting the obtained scale-space partition into space constitutes the disintegrated image representation, which enables a multi-scale object-based image description.

09:00-11:10, Paper TuAT8.13<br />

Efficient Learning to Label Images<br />

Jia, Ke, Australian National Univ. National ICT Australia<br />

Cheng, Li, NICTA<br />

Liu, Nianjun, NICTA<br />

Wang, Lei, The Australian National Univ.<br />

Conditional random field (CRF) methods have gained popularity for image labeling tasks in recent years. In this paper, we describe an alternative discriminative approach that extends the large margin principle to incorporate spatial correlations among neighboring pixels. In particular, by explicitly enforcing the submodularity condition, graph cuts are conveniently integrated as the inference engine to attain the optimal label assignment efficiently. Our approach allows learning a model with thousands of parameters, and is shown to be capable of readily incorporating higher-order scene context. Empirical studies on a variety of image datasets suggest that our approach performs competitively compared to state-of-the-art scene labeling methods.

09:00-11:10, Paper TuAT8.14<br />

NAVIDOMASS: Structural-Based Approaches towards Handling Historical Documents<br />

Jouili, Salim, LORIA<br />

Coustaty, Mickaël, Univ. of La Rochelle<br />

Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />

Ogier, Jean-Marc, Univ. de la Rochelle<br />

In the context of the NAVIDOMASS project, this paper addresses the clustering of historical document images. We propose a structural framework to handle data sets of ancient ornamental letters. The contribution consists, firstly, of examining the structural (i.e., graph) representation of the ornamental letters; secondly, graph matching is applied to the resulting graph-based representations. In addition, a comparison between the structural (graph) and statistical (generic Fourier descriptor) techniques is drawn.

09:00-11:10, Paper TuAT8.15<br />

Median Graph Shift: A New Clustering Algorithm for Graph Domain<br />

Jouili, Salim, LORIA<br />

Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />

Lacroix, Vinciane, Royal Military Acad. Belgium<br />

In the context of unsupervised clustering, a new algorithm for the domain of graphs is introduced. The key idea is to adapt mean-shift clustering and its variants, proposed for the domain of feature vectors, to graph clustering. These algorithms have been applied successfully in the image analysis and computer vision domains. The proposed algorithm works in an iterative manner by shifting each graph towards the median graph in a neighborhood. Both the set median graph and the generalized median graph are tested for the shifting procedure. In the experimental part, a set of cluster validation indices is used to evaluate our clustering algorithm, and a comparison with the well-known K-means algorithm is provided.

09:00-11:10, Paper TuAT8.16<br />

A Discrete Labelling Approach to Attributed Graph Matching using SIFT Features<br />

Sanroma, Gerard, Univ. Rovira I Virgili<br />

Alquezar, Rene, Univ. Pol. De Catalunya<br />

Serratosa, Francesc, Univ. Rovira I Virgili<br />

Local invariant feature extraction methods are widely used for image feature matching. A number of approaches exist that aim at refining the matches between image features. It is a common strategy among these approaches to use geometrical criteria to reject a subset of outliers. One limitation of the outlier-rejection design is that it is unable to add new useful matches. We present a new model that integrates the local information of the SIFT descriptors with global geometrical information to estimate a new robust set of feature matches. Our approach encodes the geometrical information by means of graph structures while posing the estimation of the feature matches as a graph matching problem. Some comparative experimental results are presented.

09:00-11:10, Paper TuAT8.17<br />

A Conductance Electrical Model for Representing and Matching Weighted Undirected Graphs<br />

Igelmo, Manuel, Univ. Pol. De Catalunya<br />

Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />

Ferrer, Miquel, Univ. Pol. De Catalunya<br />

In this paper we propose a conductance electrical model to represent weighted undirected graphs that allows us to efficiently compute approximate graph isomorphisms in large graphs. The model is built by transforming a graph into an electrical circuit: edges in the graph become conductances in the electrical circuit. This model follows the laws of electrical circuit theory, and we can potentially use all the existing theory and tools of this field to derive other approximate techniques for graph matching. In the present work, we use the proposed circuit model to derive approximate graph isomorphism solutions.

09:00-11:10, Paper TuAT8.18<br />

Computing the Barycenter Graph by Means of the Graph Edit Distance<br />

Bardaji, Itziar, Univ. Pol. De Catalunya<br />

Ferrer, Miquel, Univ. Pol. De Catalunya<br />

Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />

The barycenter graph has been shown to be an alternative for obtaining the representative of a given set of graphs. In this paper we propose an extension of the original algorithm that makes use of the graph edit distance in conjunction with the weighted mean of a pair of graphs. Our main contribution is that we can apply the method to attributed graphs with any kind of labels on both the nodes and the edges, equipped with a distance function less constrained than in previous approaches. Experiments on four different datasets support the validity of the method, giving good approximations of the barycenter graph.

09:00-11:10, Paper TuAT8.19<br />

Refined Morphological Methods of Moment Computation<br />

Suk, Tomas, Inst. of Information Theory and Automation<br />

Flusser, Jan, Inst. of Information Theory and Automation<br />

A new method of moment computation based on the decomposition of the object into rectangular blocks is presented. The decomposition is accomplished by means of the distance transform. The method is compared with earlier morphological methods, namely erosion decomposition into squares. All the methods are also compared with direct computation from the definition.
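
For context, the sketch below shows the "direct computation by definition" baseline and the moment of a single filled rectangular block, the quantity a block decomposition sums over; this is standard moment algebra, not the authors' distance-transform decomposition.

```python
import numpy as np

def moment_by_definition(img, p, q):
    """m_pq = sum over object pixels of x^p * y^q for a binary image."""
    y, x = np.nonzero(img)                      # row index = y, column index = x
    return np.sum((x ** p) * (y ** q))

def rectangle_moment(x0, x1, y0, y1, p, q):
    """Moment of one filled rectangular block: the double sum factorises."""
    return np.sum(np.arange(x0, x1 + 1) ** p) * np.sum(np.arange(y0, y1 + 1) ** q)

img = np.zeros((50, 50), dtype=np.uint8)
img[10:20, 5:30] = 1                            # a single rectangular block
print(moment_by_definition(img, 1, 1))
print(rectangle_moment(5, 29, 10, 19, 1, 1))    # same value, computed block-wise
```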

09:00-11:10, Paper TuAT8.20<br />

Robust Computation of the Polarisation Image<br />

Saman, Gule, Univ. of York<br />

Hancock, Edwin, Univ. of York<br />

In this paper we show how to make the computation of polarisation information from multiple polariser-angle images robust. We make two contributions. First, we show how to use M-estimators to make robust moment estimates of the mean intensity, polarisation and phase. Second, we show how directional statistics can be used to smooth the phase angle and to improve its estimation when the polarisation is small. We apply the resulting techniques to polariser images and perform surface quality inspection. Compared to the polarisation information delivered by the three-point method, our estimates reveal finer surface detail.
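
For reference, the non-robust least-squares baseline that such robust estimators improve on fits the transmitted-radiance sinusoid I(theta) = a + b cos(2 theta) + c sin(2 theta) at every pixel, after which mean intensity, degree of polarisation and phase follow from the fitted coefficients; the M-estimation and directional-statistics smoothing of the paper are not shown.

```python
import numpy as np

def polarisation_least_squares(images, angles_deg):
    """images: (n_angles, H, W) captures through a rotating polariser.
    Per-pixel linear least-squares fit of I(theta) = a + b*cos(2t) + c*sin(2t)."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    A = np.c_[np.ones_like(theta), np.cos(2 * theta), np.sin(2 * theta)]
    n, h, w = images.shape
    coeffs, *_ = np.linalg.lstsq(A, images.reshape(n, -1), rcond=None)
    a, b, c = coeffs.reshape(3, h, w)
    mean_intensity = a
    degree = np.sqrt(b ** 2 + c ** 2) / np.maximum(a, 1e-8)   # degree of polarisation
    phase = 0.5 * np.arctan2(c, b)                            # phase angle (mod pi)
    return mean_intensity, degree, phase

imgs = np.random.rand(6, 32, 32)                 # stand-in for 6 polariser angles
print([x.shape for x in polarisation_least_squares(imgs, [0, 30, 60, 90, 120, 150])])
```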

09:00-11:10, Paper TuAT8.21<br />

Fast Polar and Spherical Fourier Descriptors for Feature Extraction<br />

Yang, Zhuo, Waseda Univ.<br />

Kamata, Sei-Ichiro, Waseda Univ.<br />

The Polar Fourier Descriptor (PFD) and the Spherical Fourier Descriptor (SFD) are rotation-invariant feature descriptors for two-dimensional (2D) and three-dimensional (3D) image retrieval and pattern recognition tasks. They have been shown to be superior to other methods at describing rotation-invariant features of 2D and 3D images. However, a fast computation method is needed to increase the computation speed, especially for applications such as real-time systems and large image databases. This paper presents fast computation methods for PFD and SFD based on mathematical properties of trigonometric functions and associated Legendre polynomials. The proposed fast PFD and SFD are 8 and 16 times faster than the traditional ones, which significantly accelerates the computation.
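
A generic sketch of the underlying principle rather than the authors' fast PFD algorithm: sampling the image on a polar grid and taking the magnitude of the Fourier transform along the angular axis yields a rotation-invariant feature, because an in-plane rotation becomes a circular shift in the angle coordinate.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def polar_fourier_magnitude(img, n_r=16, n_theta=64):
    """Rotation-invariant feature: |FFT over the angular axis| of polar samples."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii = np.linspace(1, min(cx, cy) - 1, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing='ij')
    rows = cy + rr * np.sin(tt)
    cols = cx + rr * np.cos(tt)
    polar = map_coordinates(img.astype(float), [rows, cols], order=1)  # (n_r, n_theta)
    return np.abs(np.fft.fft(polar, axis=1))     # magnitude removes the rotation-induced shift

img = np.zeros((65, 65)); img[20:45, 28:37] = 1.0
rot = np.rot90(img)                              # 90-degree rotation of the same shape
f1, f2 = polar_fourier_magnitude(img), polar_fourier_magnitude(rot)
print(np.allclose(f1, f2, atol=1e-6))            # True here; approximate for arbitrary rotations
```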

09:00-11:10, Paper TuAT8.22<br />

RBM-Based Silhouette Encoding for Human Action Modelling<br />

Marin-Jimenez, Manuel Jesus, Univ. of Cordoba<br />

Perez De La Blanca, Nicolas, UGR<br />

Mendoza Perez, Maria Angeles, Univ. de Granada<br />

In this paper we evaluate the use of Restricted Boltzmann Machines (RBM) in the context of learning and recognizing human actions. The features used as a basis are binary silhouettes of persons. We test the proposed approach on two datasets of human actions where binary silhouettes are available: ViHASi (synthetic data) and Weizmann (real data). In addition, on the Weizmann dataset, we combine features based on optical flow with the associated binary silhouettes. The results show that, thanks to the use of RBM-based models, shorter and more informative feature vectors can be obtained for the classification tasks, improving classification performance.

09:00-11:10, Paper TuAT8.23<br />

Shape Classification using Tree-Unions<br />

Wang, Bo, Huazhong Univ. of Science and Tech.<br />

Shen, Wei, Huazhong Univ. of Science and Tech.<br />

Liu, Wenyu, Huazhong Univ. of Science and Tech.<br />

You, Xinge, Huazhong Univ. of Science and Tech.<br />

Bai, Xiang, Huazhong Univ. of Science and Tech.<br />

In this paper, we propose a novel approach to shape classification. A new shape tree based on junction nodes can represent the global structure in a simple way. The statistical distribution of junctions can be learned by merging the shape trees. In the learning process, the context of a junction node is obtained to improve the classification rate. We illustrate the utility of the proposed method on the problem of 2D shape classification using the new shape tree representation.

09:00-11:10, Paper TuAT8.24

Sparse Coding of Linear Dynamical Systems with an Application to Dynamic Texture Recognition

Ghanem, Bernard, Univ. of Illinois at Urbana-Champaign

Ahuja, Narendra

Given a sequence of observable features of a linear dynamical system (LDS), we propose the problem of finding a representation of the LDS which is sparse in terms of a given dictionary of LDSs. Since LDSs do not belong to Euclidean space, traditional sparse coding techniques do not apply. We propose a probabilistic framework and an efficient MAP algorithm to learn this sparse code. Since dynamic textures (DTs) can be modeled as LDSs, we validate our framework and algorithm by applying them to the problems of DT representation and DT recognition. In the case of occlusion, we show that this sparse coding scheme outperforms conventional DT recognition methods.

09:00-11:10, Paper TuAT8.25

Background Modeling by Combining Joint Intensity Histogram with Time-Sequential Data

Kita, Yasuyo, National Inst. of Advanced Industrial Science and Technology

In this paper, a method is proposed for detecting changes in time-sequential images of outdoor scenes taken at intervals of several minutes. Recently, per-pixel statistical background intensity models based on Gaussian mixture models (GMMs) have shown their effectiveness for detecting changes in video streams. However, when the time interval between consecutive images is long, not enough frames can be sampled to build a useful GMM. To robustly build a pixel-wise background model at time t0 from a small number of preceding and following frames, we propose to use the joint intensity histogram of the images at times t0 and t0+1, H(I_t0, I_t0+1). Under the background-dominance condition, the background probability distribution for each intensity level at t0 can be estimated from H(I_t0, I_t0+1). By taking this per-intensity background probability distribution as a prior, the GMM that models the variation at each pixel can be robustly estimated even from only a few frames. Experimental results on actual field-monitoring images show the advantage of the proposed method.
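
One plausible reading of the prior construction (the exact normalization used in the paper is not given in the abstract) is that, under background dominance, each row of the joint histogram, once normalized, approximates the background distribution associated with intensity level i at time t0:

    $$ p_{\mathrm{bg}}(j \mid I_{t_0} = i) \;\approx\; \frac{H(i, j)}{\sum_{j'} H(i, j')}, $$

where H(i, j) counts pixels whose intensities are i at time t0 and j at time t0+1.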

09:00-11:10, Paper TuAT8.26

2LDA: Segmentation for Recognition

Perina, Alessandro, Univ. of Verona

Cristani, Marco, Univ. of Verona

Murino, Vittorio, Univ. of Verona

Following the trend of segmentation for recognition, we present 2LDA, a novel generative model that automatically segments an image into two segments, background and foreground, while inferring a latent Dirichlet allocation (LDA) topic distribution on both segments. The idea is to merge two separate modules, LDA and the segmentation module, explicitly considering (and exchanging) the uncertainty between them. The resulting model adds spatial relationships to LDA, which in turn helps in using the topics to segment an image. The experimental results show that, unlike LDA, our model can be used to recognize objects, and it also outperforms state-of-the-art algorithms.

09:00-11:10, Paper TuAT8.27

Modeling and Generalization of Discrete Morse Terrain Decompositions

De Floriani, L.

Magillo, Paola, Univ. of Genova

Vitali, Maria, DISI, Univ. of Genova

We address the problem of morphological analysis of real terrains. We describe a morphological model for a terrain by considering extensions of Morse theory to the discrete case. We propose a two-level model of the morphology of a terrain based on a graph joining the critical points of the terrain through integral lines. We present a new set of generalization operators specific to discrete piecewise-linear terrain models, which are used to reduce noise and the size of the morphological representation. We show results of our approach on real terrains.

09:00-11:10, Paper TuAT8.28

Region Description using Extended Local Ternary Patterns

Liao, Wen-Hung, National Chengchi Univ.

The local binary pattern (LBP) operator is a computationally efficient local texture descriptor and has found many useful applications. However, its sensitivity to noise and the high dimensionality of the histogram associated with a moderately sized neighborhood have raised some concerns. In this paper, we attempt to improve the original LBP by proposing a novel extension named the extended local ternary pattern (ELTP). We investigate the characteristics of ELTP in terms of noise sensitivity, discriminability and computational efficiency. Preliminary experimental results show the better efficacy of ELTP over the original LBP.
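
For reference, a minimal sketch of the standard 3x3 LBP code and of the plain local ternary pattern it builds on (the ELTP-specific encoding is not detailed in the abstract); the tolerance t is an assumed parameter.

    import numpy as np

    def lbp_code(patch):
        # Standard 3x3 local binary pattern code of the centre pixel (illustrative).
        p = np.asarray(patch, dtype=int)
        c = p[1, 1]
        nbrs = [p[0, 0], p[0, 1], p[0, 2], p[1, 2],
                p[2, 2], p[2, 1], p[2, 0], p[1, 0]]      # clockwise neighbours
        return sum(1 << i for i, n in enumerate(nbrs) if n >= c)

    def ltp_codes(patch, t=5):
        # Local ternary pattern split into upper/lower binary codes; t is an
        # assumed noise tolerance, not a value taken from the paper.
        p = np.asarray(patch, dtype=int)
        c = p[1, 1]
        nbrs = [p[0, 0], p[0, 1], p[0, 2], p[1, 2],
                p[2, 2], p[2, 1], p[2, 0], p[1, 0]]
        upper = sum(1 << i for i, n in enumerate(nbrs) if n >= c + t)
        lower = sum(1 << i for i, n in enumerate(nbrs) if n <= c - t)
        return upper, lower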

09:00-11:10, Paper TuAT8.29

A Novel Multi-View Agglomerative Clustering Algorithm based on Ensemble of Partitions on Different Views

Mirzaei, Hamidreza, SFU

In this paper, we propose a new algorithm that extends hierarchical clustering methods, introducing a Multi-View Agglomerative Clustering approach to handle objects represented by multiple views. Experiments on real-world datasets indicate that, by considering the relationships among multiple views, our algorithm can provide solutions of improved quality in the multi-view setting. We find empirically that the multi-view version of our agglomerative clustering, independent of the linkage method and given any number of views, greatly improves on its single-view counterparts.

09:00-11:10, Paper TuAT8.30

Hydroacoustic Signal Classification using Kernel Functions for Variable Feature Sets

Tuma, Matthias, Ruhr-Univ. Bochum

Igel, Christian, Ruhr-Univ. Bochum

Prior, Mark, Preparatory Commission for the CTBTO

Large-scale geophysical monitoring systems raise the need for real-time feature extraction and signal classification. We study support vector machine (SVM) classification of hydroacoustic signals recorded by the Comprehensive Nuclear-Test-Ban Treaty’s verification network. Due to constraints in the early signal processing, most samples have incomplete feature sets with values missing not at random. We propose kernel functions that explicitly incorporate Boolean representations of the missingness pattern through dedicated sub-kernels. For kernels with more than a few parameters, gradient-based model selection algorithms were employed. In the binary classification case, an increase in classification accuracy compared to baseline SVM and linear classifiers was observed. In the multi-class case we evaluated four different formulations of multi-class SVMs; here, neither SVMs with standard kernels nor those with problem-specific kernels outperformed a baseline linear discriminant analysis.

09:00-11:10, Paper TuAT8.31

Large Margin Discriminant Hashing for Fast K-Nearest Neighbor Classification

Shibata, Tomoyuki, Toshiba Corp.

Kubota, Susumu, Toshiba Corp.

Ito, Satoshi, Toshiba Corp.

Since k-nearest neighbor (k-NN) classification is computationally demanding in terms of time and memory, approximate nearest neighbor (ANN) algorithms that utilize dimensionality reduction and hashing are gathering interest. Dimensionality reduction saves memory for storing training patterns, and hashing techniques significantly reduce the computation required for distance calculation. Several ANN methods have been proposed that make k-NN classification applicable to tasks with a large number of training patterns and very high-dimensional features. Although conventional ANN methods approximate the Euclidean distance of the original high-dimensional feature space in a much lower-dimensional subspace, the Euclidean distance in the original feature space is not necessarily optimal for classification. According to recent studies, metric learning is effective in improving the accuracy of k-NN classification. In this paper, the Large Margin Discriminant Hashing (LMDH) method is proposed, which projects input patterns into a low-dimensional subspace with a metric optimized for k-NN classification.

09:00-11:10, Paper TuAT8.32

Robust Frame-To-Frame Hybrid Matching

Chen, Lei, Beijing Inst. of Tech.

Jia, Yunde, Beijing Inst. of Tech.

Wang, Zhongli, Beijing Inst. of Tech.

In this paper, we propose a hybrid approach to the feature-based matching problem. We aim to obtain robust and accurate correspondences between features in image frames captured in unknown and unstructured environments. The approach incorporates image texture analysis, 2-D analytic signal theory and color modeling. It takes advantage of the geometric invariance of texture and monogenic signal information, as well as the photometric invariance of HSV color information. The detected features are well localized with high accuracy, and the selected matches are robust to changes in scale, blur, viewpoint, and illumination. Experiments conducted on a standard benchmark dataset demonstrate the effectiveness and reliability of our approach.

09:00-11:10, Paper TuAT8.33

A Fast Extension for Sparse Representation on Robust Face Recognition

Qiu, Hui-Ning, Sun Yat-sen Univ.

Pham, Duc-Son, Curtin Univ. of Tech.

Venkatesh, Svetha, Curtin Univ. of Tech.

Liu, Wanquan, Curtin Univ. of Tech.

Lai, Jian-Huang, Sun Yat-sen Univ.

We extend a recent Sparse Representation-based Classification (SRC) algorithm for face recognition to work on 2D images directly, aiming to reduce the computational complexity whilst still maintaining performance. Our contributions include: (1) a new 2D extension of the SRC algorithm; (2) an incremental computing procedure which reduces the eigendecomposition expense of each 2D-SRC for sequential input data; and (3) extensive numerical studies to validate the proposed methods.
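
For context, the generic (1D) SRC rule that the paper extends can be summarised as follows: a probe y is coded sparsely over the gallery matrix A and assigned to the class with the smallest class-restricted reconstruction residual (this is the standard SRC formulation, not the paper's 2D variant):

    $$ \hat{x} = \arg\min_{x} \|x\|_{1} \;\text{ subject to }\; \|A x - y\|_{2} \le \varepsilon,
       \qquad \mathrm{class}(y) = \arg\min_{c} \|A\,\delta_{c}(\hat{x}) - y\|_{2}, $$

where the operator delta_c keeps only the coefficients associated with class c.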

09:00-11:10, Paper TuAT8.34

A MANOVA of Major Factors of RIU-LBP Feature for Face Recognition

Luo, Jie, Shanghai Univ.

Fang, Yuchun, Shanghai Univ.

Cai, Qiyun, Shanghai Univ.

The Local Binary Pattern (LBP) feature is one of the most popular representation schemes for face recognition. The four factors that determine its effectiveness are the blocking number, the image resolution, and the sampling radius and sampling density of the LBP operator. Numerous previous studies have chosen various combinations of values for these factors based on experimental comparisons. However, which of these factors contributes the most? Numerous revisions have been made to the LBP operator because the LBP coding is believed to be the most essential factor. Is this true? In this paper, using the simple and classical Multivariate Analysis of Variance (MANOVA), we find that the blocking number contributes the most, although all four factors have a significant effect on the recognition rate. In addition, with the same analysis, we detail the effect of each factor and their interactions on the precision of LBP features.

09:00-11:10, Paper TuAT8.35

Consistent Estimators of Median and Mean Graph

Jain, Brijnesh J., Berlin Univ. of Tech.

Obermayer, Klaus, Berlin Univ. of Tech.

The median and mean graph are basic building blocks for statistical graph analysis and unsupervised pattern recognition methods such as central clustering and graph quantization. This contribution provides sufficient conditions for consistent estimators of the true but unknown central points of a distribution on graphs.

09:00-11:10, Paper TuAT8.36

Efficient Encoding of N-D Combinatorial Pyramids

Fourey, Sébastien, GREYC Ensicaen & Univ. of Caen

Brun, Luc, ENSICAEN

Combinatorial maps define a general framework for encoding any subdivision of an n-D orientable quasi-manifold with or without boundaries. Combinatorial pyramids are defined as stacks of successively reduced combinatorial maps. Such pyramids provide a rich framework for encoding fine properties of objects (either shapes or partitions). Combinatorial pyramids were first defined in 2D and then extended using n-D generalized combinatorial maps. We motivate and present here an implicit and efficient way to encode pyramids of n-D combinatorial maps.

09:00-11:10, Paper TuAT8.37

View-Invariant Object Recognition with Visibility Maps

Raytchev, Bisser, Hiroshima Univ.

Mino, Tetsuya, Hiroshima Univ.

Tamaki, Toru, Hiroshima Univ.

Kaneda, Kazufumi, Hiroshima Univ.

In this paper we propose a new framework for view-invariant 3D object recognition, based on what we call Visibility Maps. A Visibility Map (VM) encodes a compact model of an arbitrary 3D object for which a set of images taken from different views is available. Representative local invariant features extracted from each image are selectively combined to form a visibility basis, in terms of which an arbitrary view of the modeled object can be represented. A metric which incorporates geometric information is also provided for comparing test images to the model, and can be used for recognition.

09:00-11:10, Paper TuAT8.38

Normalized Sum-Over-Paths Edit Distances

García, Silvia, Univ. Catholique de Louvain

Fouss, François, Facultés Univ. Catholiques de Mons

Shimbo, Masashi, Graduate School of Information Science

Saerens, Marco, Univ. Catholique de Louvain

In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances, which compute the expected cost over all sequence alignments while favoring low-cost ones, and therefore favoring good alignments. Such distances consider two sequences tied by many optimal or nearly optimal alignments as more similar than two sequences sharing only one, optimal, alignment. They depend on a parameter and reduce to the standard distances (the edit distance or the longest common subsequence) as this parameter tends to 0, while having the same time complexity. This paper puts the emphasis on applying a normalization to the expectation of the cost. Experimental results for clustering and classification tasks performed on four OCR data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit distances, the normalized SoP edit distances clearly outperform the non-randomized measures, i.e. the standard edit distance and the longest common subsequence.

09:00-11:10, Paper TuAT8.39

Effective Multi-Level Image Representation for Image Categorization

Li, Hao, Peking Univ.

Peng, Yuxin, Peking Univ.

This paper proposes a novel approach for image categorization based on an effective multi-level image representation (MLIR). On one hand, to fully exploit the information of segmented regions at different levels in the image, we recursively segment the image into a hierarchical structure. On the other hand, to represent the information at different levels in a uniform manner, we construct a visual vocabulary from the image regions of the hierarchical structure by a random sampling strategy. An intermediate feature mapping is then adopted to form a multi-level image representation, which encodes the information of the image at different levels and is very useful for distinguishing images from different categories. Experimental results on the widely used COREL data set show that our proposed approach achieves significant improvement compared with state-of-the-art methods.

09:00-11:10, Paper TuAT8.40

Classification of Volcano Events Observed by Multiple Seismic Stations

Duin, Robert, TU Delft

Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales, Colombia

Londoño-Bonilla, John Makario, Inst. Colombiano de Geología y Minería (INGEOMINAS), Colombia

Seismic events in and around volcanoes, such as tremors, earthquakes, icequakes and lightning strokes, are usually observed by multiple stations. The question arises whether classifiers trained for one seismic station can be used for classifying observations from other stations and, moreover, whether a combination of station signals improves the classification performance for a single station. We study this for seismic time signals represented by spectra and spectrograms obtained from 5 seismic stations on the Nevado del Ruiz in Colombia.

09:00-11:10, Paper TuAT8.41

A Variational Bayesian EM Algorithm for Tree Similarity

Takasu, Atsuhiro, National Inst. of Informatics

Fukagawa, Daiji, National Inst. of Informatics

Akutsu, Tatsuya, Kyoto Univ.

In recent times, a vast amount of tree-structured data has been generated. For mining, retrieving, and integrating such data, we need a fine-grained tree similarity measure that can be adapted to the target data. To achieve this goal, this paper (1) proposes a probabilistic generative model that generates pairs of similar trees, and (2) derives a learning algorithm for estimating the parameters of the model based on the variational Bayesian expectation maximization (VBEM) method. This method can handle rooted, ordered, and labeled trees. We show that, by tuning the hyperparameters, the tree similarity model obtained via the VBEM technique performs better than that obtained via maximum likelihood estimation.

09:00-11:10, Paper TuAT8.42

Enhancing Image Classification with Class-Wise Clustered Vocabularies

Wojcikiewicz, Wojciech, Fraunhofer Inst. FIRST

Kawanabe, Motoaki, Fraunhofer FIRST and TU Berlin

Binder, Alexander, Fraunhofer Inst. FIRST, Berlin

In recent years, bag-of-visual-words representations have gained increasing popularity in the field of image classification. Their performance relies heavily on creating a good visual vocabulary from a set of image features (e.g. SIFT). For real-world photo archives such as Flickr, codebooks with more than a few thousand words are desirable, which is infeasible with standard k-means clustering. In this paper, we propose a two-step procedure which can generate more informative codebooks efficiently by class-wise k-means and a novel procedure for word selection. Our approach compared favorably to the standard k-means procedure on the PASCAL VOC data sets.
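
A minimal sketch of the class-wise clustering idea (the paper's word-selection step is not detailed in the abstract): cluster the descriptors of each class separately and pool the resulting centres into one codebook; n_words_per_class is an assumed parameter.

    import numpy as np
    from sklearn.cluster import KMeans

    def classwise_codebook(descriptors_by_class, n_words_per_class=200, seed=0):
        """Build a visual codebook by running k-means per class and pooling centres.

        descriptors_by_class: dict mapping class label -> array (n_i, d) of descriptors.
        Returns an array (n_classes * n_words_per_class, d) of visual words.
        """
        centres = []
        for label, descs in descriptors_by_class.items():
            km = KMeans(n_clusters=n_words_per_class, random_state=seed).fit(descs)
            centres.append(km.cluster_centers_)
        return np.vstack(centres)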

09:00-11:10, Paper TuAT8.43

Efficiently Computing Optimal Consensus of Digital Line Fitting

Kenmochi, Yukiko, Univ. Paris-Est

Buzer, Lilian, ESIEE

Talbot, Hugues, ESIEE

Given a set of discrete points in a 2D digital image containing noise, we formulate our problem as robust digital line fitting. More precisely, we seek the maximum subset whose points are included in a digital line, called the optimal consensus. The paper presents an efficient method for exactly computing the optimal consensus by using a topological sweep, which yields quadratic time complexity and linear space complexity with respect to the number of input points.

09:00-11:10, Paper TuAT8.44

Learning a Joint Manifold Representation from Multiple Data Sets

Torki, Marwan, Rutgers Univ.

Elgammal, Ahmed, Rutgers Univ.

Lee, Chan-Su, Yeungnam Univ.

The problem we address in this paper is how to learn a joint representation from data lying on multiple manifolds. We are given multiple data sets, and there is an underlying common manifold among the different data sets. We propose a framework to learn an embedding of all the points on all the manifolds in a way that preserves the local structure on each manifold and, at the same time, collapses all the different manifolds into one manifold in the embedding space, while preserving the implicit correspondences between the points across different data sets. The proposed solution works as an extension of current state-of-the-art spectral embedding approaches to handle multiple manifolds.

09:00-11:10, Paper TuAT8.45

A Multi-Scale Approach to Decompose a Digital Curve into Meaningful Parts

Nguyen, Thanh Phuong, LORIA

Debled-Rennesson, Isabelle, LORIA – Nancy Univ.

A multi-scale approach is proposed for the polygonal representation of a digital curve, using the notion of blurred segments and a split-and-merge strategy. Its main idea is to decompose the curve into meaningful parts that are represented by dominant points detected at the appropriate scale. The method uses no threshold and can automatically decompose the curve into meaningful parts.

09:00-11:10, Paper TuAT8.46

A Memetic Algorithm for Selection of 3D Clustered Features with Applications in Neuroscience

Björnsdotter, Malin, Univ. of Gothenburg

Wessberg, Johan, Univ. of Gothenburg

We propose a memetic algorithm for feature selection in volumetric data containing spatially distributed clusters of informative features, typically encountered in neuroscience applications. The proposed method complements a conventional genetic algorithm with a local search utilizing inherent spatial relationships to efficiently identify informative feature clusters across multiple regions of the search volume. First, we demonstrate the utility of the algorithm on simulated data containing informative feature clusters of varying contrast-to-noise ratios. The memetic algorithm identified a majority of the relevant features, whereas a conventional genetic algorithm detected only a subset sufficient for fitness maximization. Second, we applied the algorithm to authentic functional magnetic resonance imaging (fMRI) brain activity data from a motor task study, where the memetic algorithm identified the expected brain regions, and subsequent brain activity prediction in new individuals was accurate at an average of 76% correct classification. The proposed algorithm constitutes a novel method for efficient volumetric feature selection and is applicable in any 3D data scenario. In particular, the algorithm is a promising alternative for sensitive brain activity mapping and decoding.

09:00-11:10, Paper TuAT8.47

Pose Estimation of Known Objects by Efficient Silhouette Matching

Reinbacher, Christian, Graz Tech. Univ.

Ruether, Matthias, Graz Univ. of Tech.

Bischof, Horst, Graz Univ. of Tech.

Pose estimation is essential for automated handling of objects. In many computer vision applications only the object silhouettes can be acquired reliably, because untextured or slightly transparent objects do not allow for other features. We propose a pose estimation method for known objects, based on hierarchical silhouette matching and unsupervised clustering. The search hierarchy is created by an unsupervised clustering scheme, which makes the method less sensitive to parametrization and still exploits spatial neighborhood for efficient hierarchy generation. Our evaluation shows a decrease in matching time of 80% compared to exhaustive matching, and scalability to large models.

09:00-11:10, Paper TuAT8.48

Learning Non-Linear Dynamical Systems by Alignment of Local Linear Models

Joko, Masao, The Univ. of Tokyo

Kawahara, Yoshinobu, Osaka Univ.

Yairi, Takehisa, Univ. of Tokyo

Learning dynamical systems is an important problem in many fields. In this paper, we present an algorithm for learning non-linear dynamical systems which works by aligning local linear models, based on a probabilistic formulation of subspace identification. Because the procedure for constructing a state sequence in subspace identification can be interpreted as canonical correlation analysis (CCA) between past and future observation sequences, we can derive a latent variable representation for this problem. Therefore, in a similar manner to recent work on learning mixtures of probabilistic models, we obtain a framework for constructing a state space by aligning local linear coordinates. This leads to a practical algorithm for learning non-linear dynamical systems. Finally, we apply our method to motion capture data and show that our algorithm works well.

09:00-11:10, Paper TuAT8.49

A Column Generation Approach for the Graph Matching Problem

Silva, Freire, Alexandre, Univ. of Sao Paulo

Jr., R. M. Cesar, Univ. of Sao Paulo

Ferreira, C.E., Univ. of Sao Paulo

Graph matching plays a central role in different problems of structural pattern recognition. Examples of applications include matching 3D CAD models, shape matching and medical imaging, to name but a few. In this paper, we present a new integer linear formulation for the problem and employ a combinatorial optimization technique, called column generation, in order to solve instances of the problem. We also present computational experiments with generated instances.

09:00-11:10, Paper TuAT8.50

Pattern Recognition using Functions of Multiple Instances

Zare, Alina, Univ. of Florida

Gader, Paul, Univ. of Florida

The Functions of Multiple Instances (FUMI) method for learning a target prototype from data points that are functions of target and non-target prototypes is introduced. In this paper, a specific case is considered where, given data points which are convex combinations of a target prototype and several non-target prototypes, the Convex-FUMI (C-FUMI) method learns the target and non-target patterns and the number of non-target patterns, and determines the weights (or proportions) of all the prototypes for each data point. For this method, training data need only binary labels indicating whether or not the data contain some proportion of the target prototype; the specific target weights for the training data are not needed. After learning the target prototype using the binary-labeled training data, target detection is performed on test data. Results showing skin detection in hyperspectral imagery and sub-pixel target detection in simulated data are presented.
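
Under the convex-combination assumption described above, each data point x_i can be written as follows (a restatement of the abstract's model, with notation chosen here for illustration):

    $$ x_i \;=\; p_{i0}\, e_T \;+\; \sum_{k=1}^{K} p_{ik}\, e_k \;+\; \varepsilon_i,
       \qquad p_{ik} \ge 0, \quad \sum_{k=0}^{K} p_{ik} = 1, $$

where e_T is the target prototype, e_1, ..., e_K are the non-target prototypes, p_ik are the proportions estimated by C-FUMI, and epsilon_i is noise.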

09:00-11:10, Paper TuAT8.51

Linear Decomposition of Planar Shapes

Faure, Alexandre, LAIC Univ. d’Auvergne

Feschet, Fabien, Univ. d’Auvergne Clermont-Ferrand 1

The issue of decomposing digital shapes into sets of digital primitives has been widely studied over the years. Practically all existing approaches require perfect or cleaned shapes, obtained using various pre-processing techniques such as thinning or skeletonization. The aim of this paper is to bypass such pre-processing, in order to obtain decompositions of shapes directly from connected components. This method has the advantage of taking into account the intrinsic thickness of digital shapes, and provides a decomposition which is also robust to

09:00-11:10, Paper TuAT8.52

Sketched Symbol Recognition with a Latent-Dynamic Conditional Model

Deufemia, Vincenzo, Univ. di Salerno

Risi, Michele, Univ. of Salerno

Tortora, Genoveffa, Univ. di Salerno

In this paper we present a recognizer of sketched symbols based on Latent-Dynamic Conditional Random Fields (LDCRF), a discriminative model for sequence classification. The LDCRF model classifies unsegmented sequences of strokes into domain symbols by taking into account contextual and temporal information. In particular, LDCRFs learn the extrinsic dynamics among strokes by modeling a continuous stream of symbol labels, and learn internal stroke sub-structure by using intermediate hidden states. The performance of our approach is evaluated in the electric circuit domain.

09:00-11:10, Paper TuAT8.53

Canonical Patterns of Oriented Topologies

Mankowski, Walter, Drexel Univ.

Shokoufandeh, Ali, Drexel Univ.

Salvucci, Dario, Drexel Univ.

A common problem in many areas of behavioral research is the analysis of the large volume of data recorded during the execution of the tasks being studied. Recent work has proposed the use of an automated method based on canonical sets to identify the most representative patterns in a large data set, and described an initial experiment in identifying canonical web-browsing patterns. However, there is a significant limitation to the method: it requires the similarity matrix to be symmetric, and thus it can only be used for problems that can be modeled as unoriented topologies. In this paper we propose a novel enhancement to the method that supports oriented topologies by allowing the similarity matrix to be non-symmetric. We demonstrate the power of this new technique by applying it to find canonical lane changes in a driving simulator experiment.

09:00-11:10, Paper TuAT8.54

Hierarchical Anomality Detection based on Situation

Nishio, Shuichi, Advanced Telecommunication Res. Inst. International

Okamoto, Hiromi, Nara Women’s Univ.

Babaguchi, Noboru, Osaka Univ.

In this paper, we propose a novel anomality detection method based on external situational information and hierarchical analysis of behaviors. Past studies model normal behaviors and detect anomalities as outliers. However, normal behaviors tend to differ by situation. Our method combines a set of simple classifiers with pedestrian trajectories as inputs. As mere path information is not sufficient for detecting anomalities, trajectories are first decomposed into hierarchical features at different levels of abstraction and then fed to the appropriate classifiers corresponding to the situation they belong to. The effectiveness of the method is tested using real environment data.

09:00-11:10, Paper TuAT8.55

Image Classification using Subgraph Histogram Representation

Ozdemir, Bahadir, Bilkent Univ.

Aksoy, Selim, Bilkent Univ.

We describe an image representation that combines the representational power of graphs with the efficiency of the bag-of-words model. For each image in a data set, first, a graph is constructed from local patches of interest regions and their spatial arrangements. Then, each graph is represented with a histogram of subgraphs selected using a frequent subgraph mining algorithm over the whole data set. Using the subgraphs as the visual words of the bag-of-words model and transforming the graphs into a vector space using this model enables statistical classification of images using support vector machines. Experiments using images cut from a large satellite scene show the effectiveness of the proposed representation in classifying complex types of scenes into eight high-level semantic classes.

09:00-11:10, Paper TuAT8.56

Oriented Boundary Graph: A Framework to Design and Implement 3D Segmentation Algorithms

Baldacci, Fabien, Univ. de Bordeaux

Braquelaire, Achille, Univ. de Bordeaux

Domenger, Jean Philippe, Univ. de Bordeaux

In this paper we show the interest of a topological model for representing 3D segmented images which is a good compromise between complete but time-consuming representations and partial but insufficiently expressive ones. We show that this model, called the Oriented Boundary Graph, provides an effective framework for both volumetric image analysis and segmentation. The Oriented Boundary Graph provides an efficient implementation of a set of primitives suitable for designing complex segmentation algorithms and for computing the characteristics of the segmented image needed by such algorithms. We first present the framework and give the time complexity of its main primitives. Then, we give some examples of the use of this framework to efficiently design non-trivial image analysis operations and image segmentation algorithms. These examples are applied to 3D CT-scan data.

09:00-11:10, Paper TuAT8.57

Hierarchical Segmentation of Complex Structures

Akcay, Huseyin Gokhan, Bilkent Univ.

Aksoy, Selim, Bilkent Univ.

Soille, Pierre, Ec. Joint Res. Centre

We present an unsupervised hierarchical segmentation algorithm for detection of complex heterogeneous image structures that are comprised of simpler homogeneous primitive objects. An initial segmentation step produces regions corresponding to primitive objects with uniform spectral content. Next, the transitions between neighboring regions are modeled and clustered. We assume that the clusters that are dense and large enough in this transition space can be considered as significant. Then, the neighboring regions belonging to the significant clusters are merged to obtain the next level in the hierarchy. The experiments show that the algorithm that iteratively clusters and merges region groups is able to segment high-level complex structures in a hierarchical manner.

TuAT9 Upper Foyer

Biometrics Poster Session

Session chair: Dobrišek, Simon (University of Ljubljana)

09:00-11:10, Paper TuAT9.1

Image Specific Error Rate: A Biometric Performance Metric

Tabassi, Elham, NIST

Image-specific false match and false non-match error rates are defined by inheriting concepts from the biometric zoo. These metrics support failure mode analyses by allowing association of a covariate (e.g., dilation for iris recognition) with a matching error rate without having to consider the covariate of a comparison image. Image-specific error rates are also useful in detecting ground truth errors in test datasets. Images with higher image-specific error rates are more "difficult" to recognize, so these metrics can be used to assess the level of difficulty of test corpora or to partition a corpus into sets with varying levels of difficulty. Results on the use of image-specific error rates for ground-truth error detection, covariate analysis and corpus partitioning are presented.

09:00-11:10, Paper TuAT9.2

Low Cost and Usable Multimodal Biometric System based on Keystroke Dynamics and 2D Face Recognition

Giot, Romain, Univ. de Caen, Basse-Normandie – CNRS

Hemery, Baptiste, Univ. de CAEN

Rosenberger, Christophe, Lab. GREYC

We propose in this paper a low-cost multimodal biometric system combining keystroke dynamics and 2D face recognition. The objective of the proposed system is to be usable while keeping in mind good performance, acceptability, and respect of privacy. Different fusion methods have been applied (min, max, mul, SVM, weighted sum configured with genetic algorithms, and genetic programming) to the scores of three keystroke dynamics algorithms and two 2D face recognition ones. This multimodal biometric system improves the recognition rate in comparison with each individual method. On a chimeric database composed of 100 individuals, the best keystroke dynamics method obtains an EER of 8.77%, the best face recognition one has an EER of 6.38%, while the best proposed fusion system provides an EER of 2.22%.
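
For illustration, the simple score-level fusion rules mentioned above can be written as below; scores are assumed to be already normalized to a common range, and the weights in the weighted sum stand in for those the paper tunes with genetic algorithms.

    import numpy as np

    def fuse_scores(scores, rule="sum", weights=None):
        """Fuse an array of normalized matcher scores for one comparison.

        rule: 'min', 'max', 'mul', or 'sum' (weighted sum).
        """
        s = np.asarray(scores, dtype=float)
        if rule == "min":
            return float(s.min())
        if rule == "max":
            return float(s.max())
        if rule == "mul":
            return float(s.prod())
        w = np.ones_like(s) if weights is None else np.asarray(weights, dtype=float)
        return float(np.dot(w, s) / w.sum())        # weighted sum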

09:00-11:10, Paper TuAT9.3

Parallel versus Hierarchical Fusion of Extended Fingerprint Features

Zhao, Qijun, The Hong Kong Pol. Univ.

Liu, Feng, The Hong Kong Pol. Univ.

Zhang, Lei, The Hong Kong Pol. Univ.

Zhang, David, The Hong Kong Pol. Univ.

Extended fingerprint features such as pores, dots and incipient ridges have been attracting increasing attention from researchers and engineers working on automatic fingerprint recognition systems. A variety of methods have been proposed to combine these features with the traditional minutiae features. This paper comparatively analyses parallel and hierarchical fusion approaches on a high-resolution fingerprint image dataset. Based on the results, a novel and more effective hierarchical approach is presented for combining minutiae, pores, dots and incipient ridges.

09:00-11:10, Paper TuAT9.4

Feature Band Selection for Multispectral Palmprint Recognition

Guo, Zhenhua, The Hong Kong Pol. Univ.

Zhang, Lei, The Hong Kong Pol. Univ.

Zhang, David, The Hong Kong Pol. Univ.

The palm print is a unique and reliable biometric characteristic with high usability. Many palm print recognition algorithms and systems have been successfully developed in the past decades. Most previous works use white light sources for illumination. Recently, developing new biometric systems with both high accuracy and high anti-spoofing capability has been attracting much research attention. Multispectral palm print imaging and recognition can be a potential solution for such systems, because it acquires more discriminative information for personal identity recognition. One crucial step in developing such systems is determining the minimal number of spectral bands and selecting the most representative bands to build the multispectral imaging system. This paper presents preliminary studies on feature band selection by analyzing hyperspectral palm print data (420 nm to 1100 nm). Our experiments showed that two spectral bands, at 700 nm and 960 nm, could provide the most discriminative palm print information. This finding could be used as guidance for designing multispectral palm print systems in the future.

09:00-11:10, Paper TuAT9.5

Automatic Gender Recognition using Fusion of Facial Strips

Lee, Ping-Han, National Taiwan Univ.

Hung, Jui-Yu, National Taiwan Univ.

Hung, Yi-Ping, National Taiwan Univ.

We propose a fully automatic system that detects and normalizes faces in images and recognizes their genders. To boost the recognition accuracy, we correct the in-plane and out-of-plane rotations of faces, and align faces based on estimated eye positions. To perform gender recognition, a face is first decomposed into several horizontal and vertical strips. Then, a regression function for each strip gives an estimate of the likelihood that the strip sample belongs to a specific gender. The likelihoods from all strips are concatenated to form a new feature, based on which a gender classifier gives the final decision. The proposed approach achieved an accuracy of 88.1% in recognizing the genders of faces in images collected from the World Wide Web. For faces in the FERET dataset, our system achieved an accuracy of 98.8%, outperforming all six state-of-the-art algorithms compared in this paper.

09:00-11:10, Paper TuAT9.6

Benchmarking Local Orientation Extraction in Fingerprint Recognition

Cappelli, Raffaele, Univ. of Bologna

Maltoni, Davide, Univ. of Bologna

Turroni, Francesco, Univ. of Bologna

The computation of local orientations is a fundamental step in fingerprint recognition. Although a large number of approaches have been proposed in the literature, no systematic quantitative evaluations have been done yet, mainly due to the lack of proper datasets with associated ground truth information. In this paper we propose a new benchmark (which includes two datasets and an accuracy metric) and report preliminary results obtained by testing four well-known local orientation extraction algorithms.

09:00-11:10, Paper TuAT9.7

Efficient Finger Vein Localization and Recognition

Li, Xu, Civil Aviation Univ. of China

Yang, Jinfeng, Civil Aviation Univ. of China

In order to achieve accurate recognition of the human finger vein (FV), this paper addresses the problems of finger vein localization and vein feature extraction. An inherent physical property of human fingers, the inter-phalangeal joint prior, is used to localize the region of interest (ROI) of vein images as well as to remove uninformative vein imagery. In addition, vein images are characterized as a series of energy features through steerable filters. Experimental results show the promising performance of the proposed algorithm for human vein identification.

09:00-11:10, Paper TuAT9.8

Learning the Relationship between High and Low Resolution Images in Kernel Space for Face Super Resolution

Zou, Wilman, W W, Hong Kong Baptist Univ.

Yuen, Pong C, Hong Kong Baptist Univ.

This paper proposes a new nonlinear face super-resolution algorithm to address an important issue in face recognition from surveillance video, namely the recognition of low-resolution face images with nonlinear variations. The proposed method learns the nonlinear relationship between low-resolution and high-resolution face images in a (nonlinear) kernel feature space. Moreover, a discriminative term can easily be included in the proposed framework. Experimental results on the CMU-PIE and FRGC v2.0 databases show that the proposed method outperforms existing methods as well as recognition based on high-resolution images.

09:00-11:10, Paper TuAT9.9

Robust Regression for Face Recognition

Naseem, Imran, The Univ. of Western Australia

Togneri, Roberto, The Univ. of Western Australia

Bennamoun, Mohammed, The Univ. of Western Australia

In this paper we address the problem of illumination-invariant face recognition. Using the fundamental concept that, in general, patterns from a single object class lie on a linear subspace [2], we develop a linear model representing a probe image as a linear combination of class-specific galleries. In the presence of noise, the well-conditioned inverse problem is solved using robust Huber estimation, and the decision is ruled in favor of the class with the minimum reconstruction error. The proposed Robust Linear Regression Classification (RLRC) algorithm is extensively evaluated on two standard databases and shows a good performance index compared to state-of-the-art robust approaches.

09:00-11:10, Paper TuAT9.10

Recognition of Blurred Faces via Facial Deblurring Combined with Blur-Tolerant Descriptors

Hadid, Abdenour, Univ. of Oulu

Nishiyama, Masashi, Toshiba Corp.

Sato, Yoichi, Univ. of Tokyo

Blur is often present in real-world images and significantly affects the performance of face recognition systems. To improve the recognition of blurred faces, we propose a new approach which inherits the advantages of two recent methods. The idea consists of first reducing the amount of blur in the images via deblurring and then extracting blur-tolerant descriptors for recognition. We assess our analysis on real blurred face images (FRGC 1.0 database) and also on face images artificially degraded by focus blur (FERET database), demonstrating significant performance enhancement compared to the state-of-the-art.

09:00-11:10, Paper TuAT9.11

Diffusion-Based Face Selective Smoothing in DCT Domain to Illumination Invariant Face Recognition

Ezoji, Mehdi, Amirkabir Univ. of Tech.

Faez, Karim, Amirkabir Univ. of Tech.

In this paper, a diffusion-based iterative algorithm is proposed for illumination-invariant face representation using selective image smoothing in the DCT domain. In fact, we split the image I into three parts, (R + w) + L: an illumination-invariant component, an oscillating component and a smooth component. At each iteration, the influence of different frequency sub-bands of the image is determined and the additive oscillating component is reduced. The experimental results confirm that our approach provides a suitable representation for overcoming illumination variations.

09:00-11:10, Paper TuAT9.12

BioHashing for Securing Fingerprint Minutiae Templates

Belguechi, Rima, National School of Computer Science

Rosenberger, Christophe, Lab. GREYC

Ait Aoudia, Samy, National School of Computer Science

The storage of fingerprints is an important issue as this biometric modality is more and more deployed in real applications. The a priori impossibility of revoking a biometric template (as one would a password) in case of theft is a major concern for privacy reasons. We propose in this paper a new method to secure fingerprint minutiae templates by storing a biocode while keeping good recognition results. We show the efficiency of the method in comparison with some published methods for different scenarios.

09:00-11:10, Paper TuAT9.13

Fusion of an Isometric Deformation Modeling Approach using Spectral Decomposition and a Region-Based Approach using ICP for Expression-Invariant 3D Face Recognition

Smeets, Dirk, K.U.Leuven

Fabry, Thomas, K.U.Leuven

Hermans, Jeroen, K.U.Leuven

Vandermeulen, Dirk

Suetens, Paul, K.U.Leuven

The recognition of faces under varying expressions is one of the current challenges in the face recognition community. In this paper, we propose a method fusing different complementary approaches, each dealing with expression variations. The first approach uses an isometric deformation model and is based on the largest singular values of the geodesic distance matrix as an expression-invariant shape descriptor. The second approach performs recognition on the more rigid parts of the face that are less affected by expression variations. Several fusion techniques are examined for combining the approaches. The presented method is validated on a subset of 900 faces of the BU-3DFE face database, resulting in an equal error rate of 5.85% for the verification scenario and a rank-1 recognition rate of 94.48% for the identification scenario using the sum rule as the fusion technique. This result outperforms other 3D expression-invariant face recognition methods on the same database.

09:00-11:10, Paper TuAT9.14

Towards a Best Linear Combination for Multimodal Biometric Fusion

Chia, Chaw, Nottingham Trent Univ.

Sherkat, Nasser, Nottingham Trent Univ.

Nolle, Lars, Nottingham Trent Univ.

Owing to its effectiveness and ease of implementation, the Sum rule has been widely applied in the biometric research field. Different matcher information has been used as weighting parameters in the weighted Sum rule. In this work, a new parameter has been devised for reducing the genuine/imposter distribution overlap. It is shown that the overlap region width gives the best generalization performance as a weighting parameter amongst other commonly used matcher information. Furthermore, it is illustrated that the equally weighted Sum rule can generally perform better than the Equal Error Rate and d-prime weighted Sum rules. The publicly available databases, the NIST-BSSR1 multimodal biometric and Xm2vts score sets, have been used.

09:00-11:10, Paper TuAT9.15

Slap Fingerprint Segmentation for Live-Scan Devices and Ten-Print Cards

Zhang, Yongliang, Zhejiang Univ. of Technology

Xiao, Gang, Zhejiang Univ. of Technology

Li, Yanmiao, Jiaotong Univ. Dalian

Wu, Hongtao, Hebei Univ. of Tech.

Huang, Yaping, Zhejiang Univ. of Technology

Presented here is a highly accurate and computationally efficient algorithm suitable for slap fingerprint segmentation. The main advantages of this algorithm are as follows: 1) the third-order cumulant is used to roughly segment the foreground; 2) frequency-domain analysis is carried out in local areas for binarization and fine segmentation; 3) cumulative sum analysis is applied to extract the knuckle lines; 4) two shape features of the ellipse are adapted to calculate the confidence of each fingertip candidate. Experimental results show that the algorithm is more robust against noise and has superior precision, not only for live-scan four-finger slaps but also for ten-print-card five-finger slaps.

09:00-11:10, Paper TuAT9.16

A Metric of Information Gained through Biometric Systems

Takahashi, Kenta, Hitachi Ltd.

Murakami, Takao, Hitachi Ltd.

We propose a metric of the information gained through biometric matching systems. Firstly, we discuss how information about the identity of a person is derived from biometric samples through a biometric system, and define the “biometric system entropy” or BSE. Then we prove that the BSE can be approximated asymptotically by the Kullback-Leibler divergence D(f_G(x) || f_I(x)), where f_G(x) and f_I(x) are the PDFs of matching scores between samples from the same individual and among the population, respectively. We also discuss how to evaluate D(f_G || f_I) for a biometric system and show a numerical example for face and fingerprint matching systems.
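
For reference, the Kullback-Leibler divergence used above is, in its standard continuous form (the textbook definition, not a result specific to the paper):

    $$ D\bigl(f_G \,\|\, f_I\bigr) \;=\; \int f_G(x)\,\log \frac{f_G(x)}{f_I(x)}\, dx, $$

so the BSE grows with the separation between the genuine and impostor score distributions.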

09:00-11:10, Paper TuAT9.17

Probabilistic Measure for Signature Verification based on Bayesian Learning

Pu, Danjun, State Univ. of New York at Buffalo

Srihari, Sargur

Signature verification is a common task in forensic document analysis. The goal is to decide whether or not a questioned signature belongs to a set of known signatures of an individual. In a typical forgery case a very limited number of known signatures may be available, with as few as four or five knowns [Stev95]. Here we describe a fully Bayesian approach which overcomes the limitation of having too few genuine samples. The algorithm has three steps. Step 1: learn prior distributions of parameters from a population of known signatures; Step 2: determine the posterior distributions of parameters using the genuine samples of a particular person; Step 3: determine the probabilities of the query under both the genuine and forgery classes and the Log Likelihood Ratio (LLR) of the query. Rather than giving a hard decision, this method provides a probabilistic measure, the LLR, of the decision, and the performance of the Bayesian learning is improved especially in the case of limited known samples.
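
The log-likelihood ratio reported in Step 3 has the usual form (standard definition; the likelihood models themselves are those learned in Steps 1 and 2):

    $$ \mathrm{LLR}(q) \;=\; \log p(q \mid \text{genuine}) \;-\; \log p(q \mid \text{forgery}), $$

with positive values favouring the genuine class and negative values favouring forgery.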

09:00-11:10, Paper TuAT9.18

Gender Classification using a Single Frontal Image Per Person: Combination of Appearance and Geometric based Features

Mozaffari, Saeed, Semnan Univ.

Behravan, Hamid, Semnan Univ.

Akbari, Rohollah, Qazvin Azad Univ.

Today, many social interactions and services depend on gender. In this paper, we introduce a single-image gender classification algorithm using a combination of appearance-based and geometric-based features. These include the Discrete Cosine Transform (DCT), Local Binary Patterns (LBP), and a geometrical distance feature (GDF). The novel feature proposed in this paper, the GDF, is inspired by physiological differences between male and female faces. Combining the appearance-based features (DCT and LBP) with the geometric-based feature (GDF) leads to higher gender classification accuracy. Our system estimates the gender of the input image based on a majority rule; if the results of the DCT and LBP features are not identical, gender classification is based on the GDF feature. The proposed method was evaluated on two databases: AR and Ethnic. Experimental results show that the novel geometric feature improves the gender classification accuracy by 13%.

09:00-11:10, Paper TuAT9.19

Residual Analysis for Fingerprint Orientation Modeling

Jirachaweng, Suksan, Kasetsart Univ.

Hou, Zujun, Inst. For Infocomm Res.

Li, Jun, Inst. For Infocomm Res.

Yau, Wei-Yun, Inst. For Infocomm Res.

Areekul, Vutipong, Kasetsart Univ.

This paper presents a novel method for fingerprint orientation modeling, which executes in two phases. First, the orientation field is reconstructed by fitting to a lower-order Legendre polynomial basis to capture the global orientation pattern. Then the preliminary model around the singular region is dynamically refined by fitting to a higher-order Legendre polynomial basis. The singular region is automatically detected through analysis of the orientation residual field between the original orientation field and the orientation model. The method has been evaluated using the FVC 2004 data sets and compared with the state of the art. Experiments show that the proposed method attains higher accuracy in fingerprint matching and singularity preservation.

09:00-11:10, Paper TuAT9.20

Dynamic Amelioration of Resolution Mismatches for Local Feature based Identity Inference

Wong, Yongkang, NICTA

Sanderson, Conrad, NICTA

Mau, Sandra, NICTA

Lovell, Brian Carrington, The Univ. of Queensland

While existing face recognition systems based on local features are robust to issues such as misalignment, they can exhibit accuracy degradation when comparing images of differing resolutions. This is common in surveillance environments where a gallery of high-resolution mugshots is compared to low-resolution CCTV probe images, or where the size of a given image is not a reliable indicator of the underlying resolution (e.g. poor optics). To alleviate this degradation, we propose a compensation framework which dynamically chooses the most appropriate face recognition system for a given pair of image resolutions. This framework applies a novel resolution detection method which does not rely on the size of the input images, but instead exploits the sensitivity of local features to resolution using a probabilistic multi-region histogram approach. Experiments on a resolution-modified version of the “Labeled Faces in the Wild” dataset show that the proposed resolution detector front-end obtains a 99% average accuracy in selecting the most appropriate face recognition system, resulting in higher overall face discrimination accuracy (across several resolutions) compared to the individual baseline face recognition systems.

09:00-11:10, Paper TuAT9.21

Patch-Based Similarity HMMs for Face Recognition with a Single Reference Image

Vu, Ngoc-Son, Gipsa-Lab.

Caplier, Alice, GIPSA-Lab. Grenoble Univ.

In this paper we present a new architecture for face recognition with a single reference image, which completely separates the training process from the recognition process. In the training stage, by using a database containing various individuals, the spatial relations between face components are represented by two Hidden Markov Models (HMMs), one modeling within-subject similarities and the other modeling inter-subject differences. This allows us, during the recognition stage, to take a pair of face images, neither of which has been seen before, and to determine whether or not they come from the same individual. Whilst other face recognition HMMs use the Maximum Likelihood criterion, we test our approach using both the Maximum Likelihood and the Maximum a Posteriori (MAP) criteria, and find that MAP provides better results. Importantly, the training database can be entirely separated from the gallery and test images: this means that new individuals can be added to the system without re-training. We present results based upon models trained on the FERET training dataset, and demonstrate that these give satisfactory recognition rates on both the FERET database itself and, more impressively, the unseen AR database. When compared to other HMM-based face recognition techniques, our algorithm is of much lower complexity due to the small size of our observation sequence.

09:00-11:10, Paper TuAT9.22
How to Control Acceptance Threshold for Biometric Signatures with Different Confidence Values?
Makihara, Yasushi, The Inst. of Scientific and Industrial Res. Univ.
Hossain, Md. Altab, Osaka Univ.
Yagi, Yasushi, Osaka Univ.

In biometric verification, authentication is granted when the distance between the biometric signatures of the enrollment and test phases is less than an acceptance threshold, and performance is usually evaluated by a so-called Receiver Operating Characteristic (ROC) curve expressing the trade-off between the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). On the other hand, it is also well known that performance is significantly affected by situation differences between the enrollment and test phases. This paper describes a method to adaptively control the acceptance threshold with quality measures derived from situation differences so as to optimize the ROC curve. We show that the optimal evolution of the adaptive threshold in the domain of the distance and quality measure is equivalent to a constant evolution in the domain of the error gradient, defined as the ratio of a total error rate to a total acceptance rate. An experiment with simulation data demonstrates that the proposed method outperforms previous methods, particularly under a lower FAR or FRR tolerance condition.
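For readers unfamiliar with the FAR/FRR trade-off underlying the ROC curve mentioned above, the following is a minimal sketch of sweeping a fixed acceptance threshold over genuine and impostor distance scores; the paper's quality-adaptive threshold rule itself is not reproduced here.

```python
# Minimal sketch: FAR/FRR curves from genuine and impostor distances under a swept threshold.
import numpy as np

def far_frr_curves(genuine_dist, impostor_dist, num_thresholds=200):
    """Distances: smaller means more similar; a sample is accepted when distance < threshold."""
    t_max = max(genuine_dist.max(), impostor_dist.max())
    thresholds = np.linspace(0.0, t_max, num_thresholds)
    frr = np.array([(genuine_dist >= t).mean() for t in thresholds])  # false rejections
    far = np.array([(impostor_dist < t).mean() for t in thresholds])  # false acceptances
    return thresholds, far, frr

def equal_error_rate(far, frr):
    """Approximate EER: the operating point where the FAR and FRR curves cross."""
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])
```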

09:00-11:10, Paper TuAT9.23
Binary Representations of Fingerprint Spectral Minutiae Features
Xu, Haiyun, Univ. of Twente
Veldhuis, Raymond, Univ. of Twente

A fixed-length binary representation of a fingerprint has the advantages of fast operation and small template storage. For many biometric template protection schemes, a binary string is also required as input. The spectral minutiae representation is a method to represent a minutiae set as a fixed-length real-valued feature vector. In order to be able to apply the spectral minutiae representation with a template protection scheme, we introduce two novel methods to quantize the spectral minutiae features into binary strings: Spectral Bits and Phase Bits. The experiments on the FVC2002 database show that the binary representations can even outperform the spectral minutiae real-valued features.
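As a purely illustrative placeholder for turning a fixed-length real-valued feature vector into a binary template, the sketch below thresholds each component and compares templates by Hamming distance; the authors' Spectral Bits and Phase Bits constructions are more elaborate and are not reproduced here.

```python
# Illustrative sketch: naive binarization of a real-valued feature vector and Hamming matching.
import numpy as np

def binarize_features(v):
    """Threshold each component against the vector mean to obtain one bit per component."""
    return (v > v.mean()).astype(np.uint8)

def hamming_distance(a, b):
    """Fractional Hamming distance between two equal-length binary templates."""
    return np.count_nonzero(a != b) / a.size
```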



09:00-11:10, Paper TuAT9.24
Attacking Iris Recognition: An Efficient Hill-Climbing Technique
Rathgeb, Christian, Univ. of Salzburg
Uhl, Andreas, Univ. of Salzburg

In this paper we propose a modified hill-climbing attack on iris biometric systems. Applying our technique, we are able to gain access to iris biometric systems effectively and with very low effort. Furthermore, we demonstrate that reconstructing approximations of original iris images is highly non-trivial.

09:00-11:10, Paper TuAT9.25
Face Recognition at-a-Distance using Texture, Dense- and Sparse-Stereo Reconstruction
Rara, Ham, CVIP Lab. Univ. of Louisville
Ali, Asem, Univ. of Louisville
Elhabian, Shireen, Univ. of Louisville
Starr, Thomas, Univ. of Louisville
Farag, Aly A., Univ. of Louisville

This paper introduces a framework for long-distance face recognition using dense and sparse stereo reconstruction, together with the texture of the facial region. Two methods to determine correspondences of the stereo pair are used in this paper: (a) dense global stereo matching using maximum-a-posteriori Markov Random Field (MAP-MRF) algorithms, and (b) Active Appearance Model (AAM) fitting of both images of the stereo pair, using the fitted AAM mesh as the sparse correspondences. Experiments are performed using combinations of different features extracted from the dense and sparse reconstructions, as well as facial texture. The cumulative rank curves (CMC) generated using the proposed framework confirm the feasibility of the proposed work for long-distance recognition of human faces.

09:00-11:10, Paper TuAT9.26
Automatic Asymmetric 3D-2D Face Recognition
Huang, Di, Ec. Centrale de Lyon
Ardabilian, Mohsen, Ec. Centrale de Lyon
Wang, Yunhong, Beihang Univ.
Chen, Liming, Ec. Centrale de Lyon

3D face recognition has been considered in recent years as a major solution to the unsolved issues of reliable 2D face recognition, i.e. lighting and pose variations. However, 3D techniques are currently limited by their high registration and computation cost. In this paper, an asymmetric 3D-2D face recognition method is presented, enrolling subjects in textured 3D while performing automatic identification using only 2D facial images. The goal is to limit the use of 3D data to where it really helps to improve face recognition accuracy. The proposed approach contains two separate matching steps: a Sparse Representation Classifier (SRC) is applied to 2D-2D matching, while Canonical Correlation Analysis (CCA) is exploited to learn the mapping between range LBP faces (3D) and texture LBP faces (2D). Both matching scores are combined for the final decision. Moreover, we propose a new preprocessing pipeline to enhance robustness to lighting and pose effects. The proposed method achieves better experimental results on the FRGC v2.0 dataset than 2D methods do, while avoiding the cost and inconvenience of the data acquisition and computation of 3D approaches.

09:00-11:10, Paper TuAT9.27
Model and Score Adaptation for Biometric Systems: Coping with Device Interoperability and Changing Acquisition Conditions
Poh, Norman, Univ. of Surrey
Kittler, Josef, Univ. of Surrey
Marcel, Sebastien, IDIAP Res. Inst. EPFL
Matrouf, Driss, Univ. d’Avignon et des Pays de Vaucluse
Bonastre, Jean-Francois, Univ. d’Avignon et des Pays de Vaucluse

The performance of biometric systems can be significantly affected by changes in signal quality. In this paper, two types of changes are considered: changes in the acquisition environment and in the sensing devices. We investigated three solutions: (i) model-level adaptation, (ii) score-level adaptation (normalisation), and (iii) the combination of the two, called compound adaptation. In order to cope with the above changing conditions, model-level adaptation attempts to update the parameters of the expert systems (classifiers). This approach requires that the authenticity of the candidate samples used for adaptation be known (corresponding to supervised adaptation) or can be estimated (unsupervised adaptation). In comparison, score-level adaptation merely involves post-processing the expert output, with the objective of rendering the associated decision threshold dependent only on the class priors despite the changing acquisition conditions. Since the above adaptation strategies treat the underlying biometric experts/classifiers as a black box, they can be applied to any unimodal or multimodal biometric system, thus facilitating system-level integration and performance optimisation. Our contributions are: (i) the proposal of compound adaptation; (ii) the investigation and comparison of two different quality-dependent score normalisation strategies; and (iii) an empirical comparison of the merit of the above three solutions on the BANCA face (video) and speech database.

09:00-11:10, Paper TuAT9.28
Online Boosting OC for Face Recognition in Continuous Video Stream
Huo, Hongwen, Peking Univ.
Feng, Jufu, Peking Univ.

In this paper, we present a novel online face recognition approach for video streams called online boosting OC (output code). Recently, boosting has been used successfully in many fields of study such as object detection and tracking. It is a kind of large-margin classifier for binary classification problems and is also efficient for online learning. However, face recognition is a typical multi-class problem. Hence, it is difficult to use boosting in face recognition, especially in an online version. In our work, we combine online boosting and the OC algorithm to solve real-time online multi-class classification problems. We evaluate online boosting OC on a real-world task, face recognition in continuous video streams, and the results show that our algorithm is accurate and robust.

09:00-11:10, Paper TuAT9.29
On the Dimensionality Reduction for Sparse Representation based Face Recognition
Zhang, Lei, The Hong Kong Pol. Univ.
Yang, Meng, The Hong Kong Pol. Univ.
Feng, Zhizhao, The Hong Kong Pol. Univ.
Zhang, David, The Hong Kong Pol. Univ.

Face recognition (FR) is an active yet challenging topic in computer vision applications. As a powerful tool to represent high dimensional data, sparse representation based classification (SRC) has recently been used successfully for FR. This paper discusses the dimensionality reduction (DR) of face images under the framework of SRC. Although one important merit of SRC is that it is insensitive to DR or feature extraction, a well-trained projection matrix can lead to a higher FR rate at a lower dimensionality. An SRC-oriented unsupervised DR algorithm is proposed in this paper, and the experimental results on benchmark face databases demonstrate the improvements brought by the proposed DR algorithm over PCA- or random-projection-based DR under the SRC framework.

09:00-11:10, Paper TuAT9.30
Improved Fingerprint Image Segmentation and Reconstruction of Low Quality Areas
Mieloch, Krzysztof, Univ. of Goettingen
Munk, Axel, Univ. of Goettingen
Mihailescu, Preda, Univ. of Goettingen

One of the main reasons for false recognition is noise added to fingerprint images during the acquisition step. Hence, improvement of the enhancement step affects the overall accuracy of automatic recognition systems. In one of our previous publications we introduced hierarchically linked extended features, a new set of features which not only includes additional fingerprint features individually but also contains information about their relationships, such as line adjacency information at minutiae points or links between neighbouring fingerprint lines. In this work we present the application of the extended features to preprocessing and enhancement. We use structural information for improving the segmentation step, as well as for connecting disrupted fingerprint lines and recovering missing minutiae. Experiments show a decrease in the matching error rate.



09:00-11:10, Paper TuAT9.31
An Efficient Method for Offline Text Independent Writer Identification
Ghiasi, Golnaz, Amirkabir Univ. of Tech.
Safabakhsh, Reza, Amirkabir Univ. of Tech.

This paper proposes an efficient method for text-independent writer identification using a codebook. The occurrence histogram of the shapes in the codebook is used to create a feature vector for the handwriting. There is a wide variety of different shapes in the connected components obtained from handwriting. Small fragments of connected components should be used to avoid complex patterns. A new and more efficient method is introduced for this purpose. To evaluate the methods, writer identification is conducted on three varieties of a Farsi database. These varieties include texts of short, medium and large lengths. Experimental results show the efficiency of the method, especially for short texts.

09:00-11:10, Paper TuAT9.32
Study on Color Spaces for Single Image Enrolment Face Authentication
Hemery, Baptiste, Univ. de CAEN
Schwartzmann, Jean-Jacques, Orange Lab.
Rosenberger, Christophe, Lab. GREYC

We propose in this paper to study different color spaces for representing an image for the face authentication application. We used a generic algorithm based on the matching of keypoints using SIFT descriptors computed on one color component. Ten color spaces have been studied on four large and significant benchmark databases (ENSIB, FACES94, AR and FERET). We show that not all color spaces provide the same efficiency and that the use of color information allows an interesting improvement of verification results.

09:00-11:10, Paper TuAT9.33
Estimation of Fingerprint Orientation Field by Weighted 2D Fourier Expansion Model
Tao, Xunqiang, Chinese Acad. of Sciences
Yang, Xin, Chinese Acad. of Sciences
Cao, Kai, Chinese Acad. of Sciences
Wang, Ruifang, Chinese Acad. of Sciences
Li, Peng, Chinese Acad. of Sciences
Tian, Jie

Accurate estimation of the fingerprint orientation field is an essential module in fingerprint recognition. This paper proposes a novel technique for improving fingerprint orientation field estimation with a fingerprint orientation model based on a weighted 2D Fourier expansion (W-FOMFE). The motivation for the proposed method is twofold: 1) the original FOMFE is sensitive to abrupt changes in the orientation field; 2) blocks of different quality should have different impacts on FOMFE. Thus, we take the Harris-corner strength (HCS) into account for orientation field estimation. In our method, we first calculate the fingerprint's HCS; then we use the HCS to remove abrupt changes in the orientation field; finally, we incorporate the normalized HCS as a weight into the original FOMFE. We test our method on FVC2004 DB1. Experimental results show that our method (W-FOMFE) yields better orientation field estimation than FOMFE.

09:00-11:10, Paper TuAT9.34
Iterative Fingerprint Enhancement with Matched Filtering and Quality Diffusion in Spatial-Frequency Domain
Sutthiwichaiporn, Prawit, Kasetsart Univ.
Areekul, Vutipong, Kasetsart Univ.
Jirachaweng, Suksan, Kasetsart Univ.

The proposed fingerprint enhancement algorithm utilizes the power spectrum in the spatial-frequency domain. The input fingerprint is partitioned and assessed as high- or low-quality zones using a signal-to-noise ratio (SNR) approach. For a high-quality zone, the signal spectrum with noise suppression is used to shape an enhancement filter in the frequency domain. Then, the algorithm feeds neighboring enhanced zones back in order to repair unreliable low-quality regions. The proposed algorithm outperforms Gabor and STFT approaches in fingerprint matching experiments on FVC2004 Db2 and Db3.



09:00-11:10, Paper TuAT9.35
Cancelable Face Recognition using Random Multiplicative Transform
Wang, Yongjin, Univ. of Toronto
Hatzinakos, Dimitrios, Univ. of Toronto

The generation of cancelable and privacy-preserving biometric templates is important for the pervasive deployment of biometric technology in a wide variety of applications. This paper presents a novel approach for cancelable biometric authentication using a random multiplicative transform. The proposed method transforms the original biometric feature vector through element-wise multiplication with a random vector, and the sorted index numbers of the resulting vector in the transformed domain are stored as the biometric template. The changeability and privacy-protecting properties of the generated biometric template are analyzed in detail. The effectiveness of the proposed method is well supported by extensive experimentation on a face verification problem.
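A minimal sketch of the transform as described in the abstract is given below: the feature vector is multiplied element-wise by a user-specific random vector and only the sorted index order of the result is stored. Matching by rank correlation and the key-handling details are illustrative assumptions, not the authors' protocol.

```python
# Minimal sketch: random multiplicative transform with an index-order template.
import numpy as np

def make_key(dim, seed):
    """User/application-specific random vector; reissuing a new seed cancels the template."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def cancelable_template(features, key):
    """Store only the index permutation that sorts the element-wise transformed features."""
    return np.argsort(features * key)

def match_score(template_a, template_b):
    """Similarity between two index-order templates via rank (Spearman-style) correlation."""
    rank_a = np.empty_like(template_a); rank_a[template_a] = np.arange(template_a.size)
    rank_b = np.empty_like(template_b); rank_b[template_b] = np.arange(template_b.size)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```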

09:00-11:10, Paper TuAT9.36
Evaluation of Multi-Frame Fusion based Face Classification under Shadow
Canavan, Shaun, SUNY Binghamton
Johnson, Benjamin, Youngstown State Univ.
Reale, Michael, Binghamton Univ.
Zhang, Yong, Youngstown State Univ.
Yin, Lijun, SUNY Binghamton
Sullins, John, Youngstown State Univ.

A video sequence of a head moving across a large pose angle contains much richer information than a single-view image, and hence has greater potential for identification purposes. This paper explores and evaluates the use of a multi-frame fusion method to improve face recognition in the presence of strong shadow. The dataset includes videos of 257 subjects who rotated their heads from 0 to 90 degrees. Experiments were carried out using ten video frames per subject that were fused at the score level. The primary findings are: (i) a significant performance increase was observed, with the recognition rate doubling from 40% using a single frame to 80% using ten frames; (ii) the performance of multi-frame fusion is strongly related to the inter-frame variation that measures its information diversity.

09:00-11:10, Paper TuAT9.37
Finger-Vein Authentication based on Wide Line Detector and Pattern Normalization
Huang, Beining, Peking Univ.
Dai, Yanggang, Peking Univ.
Li, Rongfeng, Peking Univ.
Tang, Darun, Peking Univ.
Li, Wenxin, Peking Univ.

In finger-vein authentication, there are two problems in practice. One is that the quality of the vein image is reduced under bad environmental conditions; the other is the irregular distortion of the image caused by variance in finger pose. Both problems raise the error rates. In this paper, we introduce a wide line detector for feature extraction, which can obtain precise width information of the vein and increase the information of the feature extracted from low-quality images. We also develop a new pattern normalization model based on the hypothesis that the finger's cross-sections are approximately ellipses and that the vein that can be imaged is close to the finger surface. It can effectively reduce the distortion caused by pose. In our experiment, based on a database containing 50,700 images, our method shows advantages in dealing with the low-quality data collected from a practical personal authentication system.

09:00-11:10, Paper TuAT9.38
Performance Evaluation of Micropattern Representation on Gabor Features for Face Recognition
Zhao, Sanqiang, Griffith Univ. / National ICT Australia
Gao, Yongsheng, Griffith Univ.
Zhang, Baochang, Beihang Univ.

Face recognition using micropattern representation has recently received much attention in the computer vision and pattern recognition community. Previous research demonstrated that micropattern representation based on Gabor features achieves better performance than its direct usage on gray-level images. This paper conducts a comparative performance evaluation of micropattern representations on four forms of Gabor features for face recognition. Three evaluation rules are proposed and observed for a fair comparison. To reduce the high feature dimensionality, uniform quantization is used to partition the spatial histograms. The experimental results reveal that: 1) micropattern representation based on Gabor magnitude features outperforms the other three representations, and the performances of the other three are comparable; and 2) micropattern representation based on the combination of Gabor magnitude and phase features performs the best.

09:00-11:10, Paper TuAT9.39
Block Pyramid based Adaptive Quantization Watermarking for Multimodal Biometric Authentication
Ma, Bin, Beihang Univ.
Li, Chunlei, Beihang Univ.
Wang, Yunhong, Beihang Univ.
Zhang, Zhaoxiang, Beihang Univ.
Wang, Yiding, North China Univ. of Tech.

This paper proposes a novel robust watermarking scheme to embed fingerprint minutiae into face images for multimodal biometric authentication. First, a block pyramid is layered according to the block-wise face region distinctiveness estimated by Adaboost; upper levels indicate informative spatial regions. Then, we adopt a first-order statistics QIM method to perform watermark embedding in each pyramid level. Numeric watermark bits with higher priority are embedded into upper pyramid levels with a larger embedding strength. By jointly differentiating host image regions and watermark bit priority, our scheme achieves a trade-off among watermarking robustness, capacity and fidelity. Experimental results demonstrate that our approach guarantees the robustness of the hidden biometric data, while preserving the distinctiveness of the host biometric images.

09:00-11:10, Paper TuAT9.40
A Topologic Approach to User-Dependent Key Extraction from Fingerprints
Gudkov, Vladimir, Sonda
Ushmaev, Oleg, Russian Acad. of Sciences

The paper briefly describes an approach to key extraction from fingerprint images based on topological descriptors of minutiae point neighborhood. The approach allows designing biometric encryption procedures with variable key length and successful decryption rate.

09:00-11:10, Paper TuAT9.41
Robust Face Recognition using Block-Based Bag of Words
Li, Zisheng, The Univ. of Electro-Communications
Imai, Jun-Ichi, The Univ. of Electro-Communications
Kaneko, Masahide, The Univ. of Electro-Communications

A novel block-based bag of words (BboW) method is proposed for robust face recognition. In our approach, a face image is partitioned into multiple blocks; dense SIFT features are then calculated and vector-quantized into different codewords on each block respectively. Finally, histograms of the codeword distribution on each local block are concatenated to represent the face image. Experimental results on the AR database show that, using only one neutral-expression frame per person for training, our method can obtain excellent face recognition results on face images with extreme expressions, varying illumination, and partial occlusions. Our method also achieves an average recognition rate of 100% on the XM2VTS database.
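A minimal sketch of the block-based bag-of-words signature described above follows: each block's dense local descriptors are quantized against a codebook and the per-block codeword histograms are concatenated. The `dense_descriptors` callable is a placeholder for a dense SIFT extractor, and learning the codebook (e.g. by k-means) is an assumption for illustration.

```python
# Minimal sketch: block-based bag-of-words signature from dense local descriptors.
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor (N x D) to its nearest codeword (K x D)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bbow_signature(image, dense_descriptors, codebook, grid=(4, 4)):
    """Concatenate codeword histograms computed over a grid of image blocks."""
    h, w = image.shape[:2]
    k = codebook.shape[0]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = image[i * h // grid[0]:(i + 1) * h // grid[0],
                          j * w // grid[1]:(j + 1) * w // grid[1]]
            words = quantize(dense_descriptors(block), codebook)
            hists.append(np.bincount(words, minlength=k) / max(len(words), 1))
    return np.concatenate(hists)
```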

09:00-11:10, Paper TuAT9.42
Analysis of Fingerprint Pores for Vitality Detection
Marcialis, Gian Luca, Univ. of Cagliari
Roli, Fabio, Univ. of Cagliari
Tidu, Alessandra, Univ. of Cagliari

Spoofing is an open issue for fingerprint recognition systems. It consists in submitting an artificial fingerprint replica of a genuine user. Current sensors provide an image which is then processed as a true fingerprint. Recently, the so-called 3rd-level features, namely pores, which are visible in high-definition fingerprint images, have been used for matching. In this paper, we propose to analyse pore locations for characterizing the liveness of fingerprints. Experimental results on a large dataset of spoofed and live fingerprints show the benefits of the proposed approach.

09:00-11:10, Paper TuAT9.43
Applying Dissimilarity Representation to Off-Line Signature Verification
Batista, Luana, École de Tech. Supérieure
Granger, Eric, École de Tech. Supérieure
Sabourin, R., École de Tech. Supérieure

In this paper, a two-stage off-line signature verification system based on dissimilarity representation is proposed. In the first stage, a set of discrete left-to-right HMMs trained with different numbers of states and codebook sizes is used to measure similarity values that populate new feature vectors. These vectors are then input to the second stage, which provides the final classification. Experiments were performed using two different classification techniques – AdaBoost, and Random Subspaces with SVMs – and a real-world signature verification database. Results indicate that the performance of the proposed system is significantly better than that of other reference signature verification systems from the literature.

09:00-11:10, Paper TuAT9.44
3D Face Decomposition and Region Selection against Expression Variations
Günlü, Göksel, Gazi Univ.
Bilge, Hasan Sakir, Gazi Univ.

3D face recognition exploits shape information in addition to the texture information used in 2D systems. The use of the whole 3D face is sensitive to some undesired situations like expression variations. To overcome this problem, we investigate a new approach that decomposes the whole 3D face into sub-regions and independently extracts features from each sub-region. A 3D DCT is applied to each sub-region and the most discriminating DCT coefficients are selected. The nose region contributes the most to the list of discriminating coefficients. Furthermore, a better recognition rate is achieved by using only the nose region. The highest recognition score in our experiments is 98.97%, where rank-one recognition rates are considered. The results of the proposed approach are compared to other methods that use the FRGC v2 database.

09:00-11:10, Paper TuAT9.45
Fusion of Qualities for Frame Selection in Video Face Verification
Villegas, Mauricio, Univ. Pol. De Valencia
Paredes, Roberto, Univ. Pol. De Valencia

It is known that the use of video can help improve the performance of face verification systems. However, processing video on resource-constrained devices is prohibitive. In order to reduce the load of the algorithms, a quality-based selection of frames can be applied. Generally, several quality measures are available, and thus a good fusion scheme is required. This paper addresses the problem of fusing quality measures such that the resulting quality improves the performance of frame selection. A comparison of different methods for fusing qualities is presented. Also, some new quality measures based on time derivatives are proposed, which are shown to be beneficial for estimating the overall quality. Finally, a curve is proposed which shows that the qualities used for frame selection effectively improve verification performance, independently of the number of frames selected or the method employed for obtaining the overall biometric score.

09:00-11:10, Paper TuAT9.46
A Person Retrieval Solution using Finger Vein Patterns
Tang, Darun, Peking Univ.
Huang, Beining, Peking Univ.
Li, Rongfeng, Peking Univ.
Li, Wenxin, Peking Univ.
Dai, Yanggang, Peking Univ.

Personal identification based on finger vein patterns is a newly developed biometric technique, and several practical systems have been deployed in recent years. We developed a finger vein verification system for checking attendance and have collected a database of 0.8 million finger vein samples. Based on this database, we propose a person retrieval solution for searching for an image in the database that can return a response in an acceptable time. To fit the retrieval solution, we designed a new encoding method. The experimental results show that our solution can return a result in about 10 seconds when working on a database of 50,700 samples. At the same time, the error rate is nearly the same as that of linear search.

09:00-11:10, Paper TuAT9.47
Multi-Classifier Q-Stack Aging Model for Adult Face Verification
Li, Weifeng, Swiss Federal Inst. of Tech. Lausanne (EPFL)
Drygajlo, Andrzej, Swiss Federal Inst. of Tech. Lausanne (EPFL)

The influence of age progression on the performance of multi-classifier face verification systems is a challenging and largely open research problem that deserves increasing attention. In this paper, we propose to manage the influence of aging on an adult face verification system using a multi-classifier Q-stack age modeling technique, which uses age as a class-independent metadata quality measure together with scores from baseline classifiers, combining global and local patterns, in order to obtain better recognition rates. This allows for improved long-term class separation by introducing a 2D parameterized decision boundary in the scores-age space using a short-term enrollment model. This new method, based on the concept of classifier stacking and an age-dependent decision boundary, compares favorably with the conventional face verification approach, which uses an age-independent decision threshold calculated only in the score space at the time of enrollment. The proposed approach is evaluated on the MORPH database.

09:00-11:10, Paper TuAT9.48
Quality-Based Fusion for Multichannel Iris Recognition
Vatsa, Mayank, IIIT Delhi
Singh, Richa, IIIT Delhi
Ross, Arun, West Virginia Univ.
Noore, Afzel, West Virginia Univ.

We propose a quality-based fusion scheme for improving recognition accuracy using color iris images characterized by three spectral channels – Red, Green and Blue. In the proposed method, quality scores are employed to select two channels of a color iris image, which are fused at the image level using a Redundant Discrete Wavelet Transform (RDWT). The fused image is then used in a score-level fusion framework along with the remaining channel to improve recognition accuracy. Experimental results on a heterogeneous color iris database demonstrate the efficacy of the technique when compared against other score-level and image-level fusion methods. The proposed method can potentially benefit the use of color iris images in conjunction with their NIR counterparts.

09:00-11:10, Paper TuAT9.49
Iris Image Retrieval based on Macro-Features
Sam Sunder, Manisha, West Virginia Univ.
Ross, Arun, West Virginia Univ.

Most iris recognition systems use the global and local texture information of the iris in order to recognize individuals. In this work, we investigate the use of macro-features that are visible on the anterior surface of RGB images of the iris for matching and retrieval. These macro-features correspond to structures such as moles, freckles, nevi, melanoma, etc. and may not be present in all iris images. Given an image of a macro-feature, the goal is to determine if it can be used to successfully retrieve the associated iris from the database. To address this problem, we use features extracted by the Scale-Invariant Feature Transform (SIFT) to represent and match macro-features. Experiments using a subset of 770 distinct irides from the Miles Research Iris Database suggest the possibility of using macro-features for iris characterization and retrieval.

09:00-11:10, Paper TuAT9.50
A Gradient Descent Approach for Multi-Modal Biometric Identification
Basak, Jayanta, IBM Res.
Kate, Kiran, IBM Res. – India
Tyagi, Vivek, IBM Res. - India
Ratha, Nalini, IBM Res.

While biometrics-based identification is a key technology in many critical applications, such as searching for an identity in a watch list or checking for duplicates in a citizen ID card system, there are many technical challenges in building a solution because the database can be very large (often hundreds of millions of entries) and because of the intrinsic errors of the underlying biometric engines. Multi-modal biometrics is often proposed as a way to improve the underlying biometric accuracy. In this paper, we propose a score-based fusion scheme tailored for identification applications. The proposed algorithm uses a gradient descent method to learn weights for each modality such that the weighted sum of genuine scores is larger than the weighted sum of all the impostor scores. During the identification phase, the top K candidates from each modality are retrieved and a super-set of identities is constructed. Using the learnt weights, we compute the weighted score for all the candidates in the super-set. The highest-scoring candidate is declared the top candidate for identification. The proposed algorithm has been tested on the NIST BSSR-1 dataset, and the results in terms of accuracy as well as speed (execution time) are shown to be far superior to the published results on this dataset.
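A minimal sketch of learning per-modality fusion weights by gradient descent follows, enforcing that fused genuine scores exceed fused impostor scores. The hinge-style pairwise loss, margin, and learning-rate choices are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: gradient descent on a pairwise hinge loss to learn fusion weights.
import numpy as np

def learn_fusion_weights(genuine, impostor, lr=0.01, margin=0.1, epochs=200):
    """genuine: (Ng x M) per-modality scores of genuine comparisons,
       impostor: (Ni x M) per-modality scores of impostor comparisons."""
    m = genuine.shape[1]
    w = np.full(m, 1.0 / m)                       # start from uniform weights
    for _ in range(epochs):
        fg = genuine @ w                          # fused genuine scores
        fi = impostor @ w                         # fused impostor scores
        viol = (fg[:, None] - fi[None, :]) < margin   # genuine/impostor pairs violating the margin
        # gradient of the summed hinge loss: impostor features minus genuine features over violations
        grad = (viol.sum(axis=0) @ impostor - viol.sum(axis=1) @ genuine) / viol.size
        w -= lr * grad
        w = np.clip(w, 0.0, None)
        w /= max(w.sum(), 1e-12)                  # keep weights non-negative and normalized
    return w
```

At identification time, the learnt `w` would simply be applied to the per-modality scores of every candidate in the super-set, and the highest fused score wins, as described in the abstract.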

09:00-11:10, Paper TuAT9.51
Robust ECG Biometrics by Fusing Temporal and Cepstral Information
Li, Ming, Univ. of Southern California
Narayanan, Shrikanth, Univ. of Southern California

The use of vital signs as a biometric is a potentially viable approach in a variety of application scenarios such as security and personalized health care. In this paper, a novel robust Electrocardiogram (ECG) biometric algorithm based on both temporal and cepstral information is proposed. First, in the time domain, after pre-processing and normalization, each heartbeat of the ECG signal is modeled by a Hermite polynomial expansion (HPE) and a support vector machine (SVM). Second, in the homomorphic domain, cepstral features are extracted from the ECG signals and modeled by Gaussian mixture modeling (GMM). In the GMM framework, heteroscedastic linear discriminant analysis and a GMM supervector kernel are used to perform feature dimension reduction and discriminative modeling, respectively. Finally, fusion of the temporal and cepstral system outputs at the score level is used to improve overall performance. Experimental results show that the proposed hybrid approach achieves 98.3% accuracy and a 0.5% equal error rate on the MIT-BIH Normal Sinus Rhythm Database.

09:00-11:10, Paper TuAT9.52
A Comparative Study of Facial Landmark Localization Methods for Face Recognition using HOG Descriptors
Monzo, David, Univ. Pol. Valencia
Albiol, Alberto, Univ. Pol. Valencia
Albiol, Antonio, Univ. Pol. Valencia
Mossi, Jose M., Univ. Pol. Valencia

This paper compares several approaches to extracting facial landmarks and studies their influence on face recognition. In order to obtain fair comparisons, we use the same number of facial landmarks and the same type of descriptor (HOG descriptors) for each approach. The comparative results are obtained using the FERET and FRGC datasets and show that better recognition rates are obtained when landmarks are located at real facial fiducial points. However, if the automatic detection of these points is compromised by the difficulty of the images, better results are obtained using fixed landmark grids.

09:00-11:10, Paper TuAT9.53
Confidence Weighted Subspace Projection Techniques for Robust Face Recognition in the Presence of Partial Occlusion
Struc, Vitomir, Univ. of Ljubljana
Dobrišek, Simon, Univ. of Ljubljana
Pavesic, Nikola, Univ. of Ljubljana

Subspace projection techniques are known to be susceptible to the presence of partial occlusions in the image data. To overcome this susceptibility, we present in this paper a confidence weighting scheme that assigns weights to pixels according to a measure which quantifies the confidence that the pixel in question represents an outlier. With this procedure, the impact of the occluded pixels on the subspace representation is reduced and robustness to partial occlusions is obtained. Next, the confidence weighting concept is improved by a local procedure for the estimation of the subspace representation. Both the global weighting approach and the local estimation procedure are assessed in face recognition experiments on the AR database, where encouraging results are obtained with partially occluded facial images.
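A minimal sketch of the weighting idea follows: pixels with low confidence (likely occluded) receive small weights, so they barely influence the recovered subspace coefficients. The confidence measure itself is the paper's contribution and is passed in here as a given vector; the weighted least-squares projection is a standard formulation, assumed for illustration.

```python
# Minimal sketch: confidence-weighted projection of an image onto a linear subspace.
import numpy as np

def weighted_projection(x, basis, weights):
    """x: d-vector image, basis: d x k subspace matrix, weights: d-vector in [0, 1].
       Solves argmin_a || W^(1/2) (x - basis @ a) ||, i.e. a = (B^T W B)^-1 B^T W x."""
    wb = basis * weights[:, None]                 # W @ basis without forming the diagonal W
    a = np.linalg.solve(basis.T @ wb, wb.T @ x)   # weighted least-squares coefficients
    return a, basis @ a                           # coefficients and the reconstruction
```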



09:00-11:10, Paper TuAT9.54
Face Recognition across Pose with Automatic Estimation of Pose Parameters through AAM-Based Landmarking
Teijeiro-Mosquera, Lucía, Univ. de Vigo
Alba Castro, Jose Luis, Univ. of Vigo
Gonzalez-Jimenez, Daniel, Univ. of Vigo

In this paper we present a fully automatic system for face recognition across pose where no frontal view is needed at enrollment or test time. The system uses three Active Appearance Models (AAMs): the first one is a generic multi-resolution AAM, while the remaining ones are trained to cope with left/right variations (i.e. pose-dependent AAMs). During the fitting stage, pose is automatically estimated using eigenvector analysis, and a synthetic face is generated through texture warping. Experiments on the CMU PIE database show promising results compared to the performance achieved with manually landmarked faces.

09:00-11:10, Paper TuAT9.55
Cross-Spectral Face Verification in the Short Wave Infrared (SWIR) Band
Bourlai, Thirimachos, WVU
Kalka, Nathan, WVU
Ross, Arun, West Virginia Univ.
Cukic, Bojan, WVU
Hornak, Lawrence, WVU

The problem of face verification across the short wave infrared (SWIR) spectrum is studied in order to illustrate the advantages and limitations of SWIR face verification. The contributions of this work are two-fold. First, a database of 50 subjects is assembled and used to illustrate the challenges associated with the problem. Second, a set of experiments is performed in order to demonstrate the possibility of SWIR cross-spectral matching. Experiments also show that images captured under different SWIR wavelengths can be matched to visible images with promising results. The role of multispectral fusion in improving recognition performance in SWIR images is finally illustrated. To the best of our knowledge, this is the first time cross-spectral SWIR face recognition has been investigated in the open literature.

09:00-11:10, Paper TuAT9.56
Decision Fusion for Patch-Based Face Recognition
Topçu, Berkay, Sabancı Univ.
Erdogan, Hakan, Sabanci Univ.

Patch-based face recognition is a recent method which uses the idea of analyzing face images locally in order to reduce the effects of illumination changes and partial occlusions. Feature fusion and decision fusion are two distinct ways to make use of the extracted local features. Apart from the well-known decision fusion methods, a novel approach for calculating weights for the weighted sum rule is proposed in this paper. Improvements in recognition accuracy are shown and the superiority of decision fusion over feature fusion is advocated. On the challenging AR database, we obtain significantly better results using decision fusion as compared to conventional methods and feature fusion methods, by using a validation-accuracy weighting scheme and nearest-neighbor discriminant analysis for dimension reduction.

09:00-11:10, Paper TuAT9.57
Video based Palmprint Recognition
Methani, Chhaya, IIIT-H
Namboodiri, Anoop, International Inst. of Information Tech.

The use of a camera as a biometric sensor is desirable due to its ubiquity and low cost, especially for mobile devices. The palmprint is an effective modality in such cases due to its discrimination power, ease of presentation, and the scale and size of its texture for capture by commodity cameras. However, the unconstrained nature of pose and lighting introduces several challenges in the recognition process. Even minor changes in the pose of the palm can induce significant changes in the visibility of the lines. We turn this property to our advantage by capturing a short video, where the natural palm motion induces minor pose variations, providing additional texture information. We propose a method to register multiple frames of the video without requiring correspondences, while remaining efficient. Experimental results on a set of 100 different palms show that the use of multiple frames reduces the error rate from 12.75% to 4.7%. We also propose a method for detecting poor-quality samples due to specularities and motion blur, which further reduces the EER to 1.8%.



09:00-11:10, Paper TuAT9.58
Profile Lip Reading for Vowel and Word Recognition
Saitoh, Takeshi, Kyushu Inst. of Tech.
Konishi, Ryosuke, Tottori Univ.

This paper focuses on the profile view, which is the second most typical angle after the frontal face, and proposes a profile view lip reading method. We applied the normalized cost method to detect the profile contour. Five feature points, the tip of the nose, upper lip, lip corner, lower lip, and chin, were detected from the contour, and eight features obtained from the five feature points were defined. We gathered two types of utterance scenes, five Japanese vowels and 20 Japanese words. We selected 20 combinations based on the eight features and carried out recognition experiments. Recognition rates of 99% for vowel recognition and 86% for word recognition were obtained with five features: two lip heights, two protrusion lengths, and one lip angle.

11:10-12:10, TuPL1 Anadolu Auditorium
Computational Cameras: Redefining the Image
Shree Nayar (Plenary Session)
Columbia University, USA

Shree K. Nayar received his PhD degree in Electrical and Computer Engineering from the Robotics Institute at Carnegie Mellon University in 1990. He is currently the T. C. Chang Professor of Computer Science at Columbia University. He co-directs the Columbia Vision and Graphics Center. He also heads the Columbia Computer Vision Laboratory (CAVE), which is dedicated to the development of advanced computer vision systems. His research is focused on three areas: the creation of novel cameras, the design of physics-based models for vision, and the development of algorithms for scene understanding. His work is motivated by applications in the fields of digital imaging, computer graphics, and robotics. He has received best paper awards at ICCV 1990, ICPR 1994, CVPR 1994, ICCV 1995, CVPR 2000 and CVPR 2004. He is the recipient of the David Marr Prize (1990 and 1995), the David and Lucile Packard Fellowship (1992), the National Young Investigator Award (1993), the NTT Distinguished Scientific Achievement Award (1994), the Keck Foundation Award for Excellence in Teaching (1995) and the Columbia Great Teacher Award (2006). In February 2008, he was elected to the National Academy of Engineering.

The computational camera embodies the convergence of the camera and the computer. It uses new optics to select rays from the scene in unusual ways, and an appropriate algorithm to process the selected rays. This ability to manipulate images before they are recorded and to process the recorded images before they are presented is a powerful one. It enables us to experience our visual world in rich and compelling ways.

TuBT1 Anadolu Auditorium
Image Analysis – IV Regular Session
Session chair: Hlavac, Vaclav (Czech Technical Univ.)

13:30-13:50, Paper TuBT1.1
Joint Image GMM and Shading MAP Estimation
Shekhovtsov, Alexander, Czech Tech. Univ. in Prague
Hlavac, Vaclav, Czech Tech. Univ.

We consider a simple statistical model of the image, in which the image is represented as a sum of two parts: one part is explained by an i.i.d. color Gaussian mixture, and the other by a (piecewise) smooth gray-scale shading function. The smoothness is ensured by a quadratic (Tikhonov) or total variation regularization. We derive an EM algorithm to estimate the parameters of the mixture model and the shading simultaneously. Our algorithms for both kinds of regularization solve for the shading and the mean parameters of the mixture model jointly.

13:50-14:10, Paper TuBT1.2
Continuous Markov Random Field Optimization using Fusion Move Driven Markov Chain Monte Carlo Technique
Kim, Wonsik, Seoul National Univ.
Lee, Kyoung Mu, Seoul National Univ.

Many vision applications have been formulated as Markov Random Field (MRF) problems. Although many of them are discrete labeling problems, a continuous formulation often achieves great improvement in the quality of the solutions in some applications such as stereo matching and optical flow. In the continuous formulation, however, it is much more difficult to optimize the target functions. In this paper, we propose a new method called the fusion move driven Markov Chain Monte Carlo method (MCMC-F), which combines the Markov Chain Monte Carlo method and the fusion move to solve continuous MRF problems effectively. The algorithm exploits the powerful fusion move while fully exploring the whole solution space. We evaluate it on the stereo matching problem. We empirically demonstrate that the proposed algorithm is more stable and always finds lower energy states than state-of-the-art optimization techniques.

14:10-14:30, Paper TuBT1.3
Approximate Belief Propagation by Hierarchical Averaging of Outgoing Messages
Ogawara, Koichi, Kyushu Univ.

This paper presents an approximate belief propagation algorithm that replaces the outgoing messages from a node with the averaged outgoing message and propagates messages from a low-resolution graph to the original graph hierarchically. The proposed method reduces the computational time by half or two-thirds and reduces the required amount of memory by 60% compared with the standard belief propagation algorithm when applied to an image. The proposed method was implemented on CPU and GPU, and was evaluated on the Middlebury stereo benchmark dataset in comparison with the standard belief propagation algorithm. It is shown that the proposed method outperforms the standard algorithm in terms of both computational time and the required amount of memory, with only a minor loss of accuracy.

14:30-14:50, Paper TuBT1.4
Cascaded Background Subtraction using Block-Based and Pixel-Based Codebooks
Guo, Jing-Ming, National Taiwan Univ. of Science and Tech.
Chih-Sheng Hsu, Sheng, National Taiwan Univ. of Science and Tech.

This paper presents a cascaded scheme with block-based and pixel-based codebooks for background subtraction. The codebook is mainly used to compress information to achieve a highly efficient processing speed. In the block-based stage, 12 intensity values are employed to represent a block. The algorithm extends the concept of Block Truncation Coding (BTC), and thus can further improve processing efficiency by enjoying its low-complexity advantage. In detail, the block-based stage can remove most of the noise without reducing the True Positive (TP) rate, yet it has low precision. To overcome this problem, the pixel-based stage is adopted to enhance the precision, which also reduces the False Positive (FP) rate. Moreover, this study also presents a color model and a match function which can classify an input pixel as shadow, highlight, background, or foreground. As documented in the experimental results, the proposed algorithm provides superior performance to that of former approaches.
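For context on the block-based stage, the sketch below shows classical Block Truncation Coding, the idea the abstract says is extended: a block is reduced to a bit plane plus two intensity levels chosen to preserve the block's mean and standard deviation. The paper's 12-value block code and cascaded codebooks are not reproduced here.

```python
# Minimal sketch: classical two-level, moment-preserving Block Truncation Coding (BTC).
import numpy as np

def btc_encode(block):
    """block: 2-D array of intensities. Returns (bitplane, low_level, high_level)."""
    mean, std = block.mean(), block.std()
    bitplane = block >= mean
    q = int(bitplane.sum())                 # number of pixels at or above the mean
    p = block.size - q
    if q == 0 or p == 0:                    # flat block: both levels equal the mean
        return bitplane, mean, mean
    low = mean - std * np.sqrt(q / p)       # moment-preserving low level
    high = mean + std * np.sqrt(p / q)      # moment-preserving high level
    return bitplane, low, high

def btc_decode(bitplane, low, high):
    """Reconstruct the block from its bit plane and the two quantization levels."""
    return np.where(bitplane, high, low)
```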

14:50-15:10, Paper TuBT1.5
Moving Cast Shadow Removal based on Local Descriptors
Qin, Rui, Chinese Acad. of Sciences
Liao, Shengcai, Chinese Acad. of Sciences
Lei, Zhen, Chinese Acad. of Sciences
Li, Stan Z., Chinese Acad. of Sciences

Moving cast shadow removal is an important yet difficult problem in video analysis and applications. This paper presents a novel algorithm for the detection of moving cast shadows that is based on a local texture descriptor called the Scale Invariant Local Ternary Pattern (SILTP). An assumption is made that the texture properties of cast shadows bear patterns similar to those of the background beneath them. The likelihood of cast shadows is derived using information in both color and texture. An online learning scheme is employed to update the shadow model adaptively. Finally, the posterior probability of the cast shadow region is formulated by further incorporating prior contextual constraints using a Markov Random Field (MRF) model. The optimal solution is found using graph cuts. Experimental results on various scenes demonstrate the robustness of the algorithm.

TuBT2 Topkapı Hall A
Feature Extraction – I Regular Session
Session chair: Franke, Katrin (Gjøvik Univ. College)



13:30-13:50, Paper TuBT2.1
Local Rotation Invariant Patch Descriptors for 3D Vector Fields
Fehr, Janis, Univ. Freiburg

In this paper, we present two novel methods for the fast computation of local rotation invariant patch descriptors for 3D vectorial data. Patch-based algorithms have recently become a very popular approach for a wide range of 2D computer vision problems. Our local rotation invariant patch descriptors allow an extension of these methods to 3D vector fields. Our approaches are based on a harmonic representation of local spherical 3D vector field patches, which enables us to derive fast algorithms for the computation of rotation invariant power spectrum and bispectrum feature descriptors of such patches.

13:50-14:10, Paper TuBT2.2
Anomaly Detection for Longwave FLIR Imagery using Kernel Wavelet-RX
Mehmood, Asif, US Army Res. Lab.
Nasrabadi, Nasser, US Army Res. Lab.

This paper describes a new kernel wavelet-based anomaly detection technique for long-wave (LW) Forward Looking Infrared (FLIR) imagery. The proposed approach, called the kernel wavelet-RX algorithm, is essentially an extension of the wavelet-RX algorithm (a combination of the wavelet transform and the RX anomaly detector) to a high-dimensional (possibly infinite) feature space via a nonlinear mapping of the input data. The wavelet-RX algorithm in this high-dimensional feature space can easily be implemented in terms of kernels that implicitly compute dot products in the feature space (kernelizing the wavelet-RX algorithm). In our kernel wavelet-RX algorithm, a 2D wavelet transform is first applied to decompose the input image into uniform subbands. A number of significant (high-energy) subbands are concatenated together to form a subband-image cube. The kernel RX algorithm is then applied to these subband-image cubes obtained from the wavelet decomposition of the LW database images. Experimental results are presented for the proposed kernel wavelet-RX, wavelet-RX and the classical CFAR algorithm for detecting anomalies (targets) in a large database of LW imagery. The ROC plots show that the proposed kernel wavelet-RX algorithm outperforms the wavelet-RX as well as the classical CFAR detector.
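The building block of all three variants is the classical RX detector, which scores each pixel's feature vector by its Mahalanobis distance to the background statistics. A minimal sketch follows; the wavelet subband-cube construction and the kernelized version described in the abstract are not reproduced here.

```python
# Minimal sketch: classical RX anomaly detector over a multi-band (or subband) image cube.
import numpy as np

def rx_detector(cube):
    """cube: H x W x B array (bands or wavelet subbands). Returns an H x W anomaly map."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(np.float64)
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(b)   # small ridge keeps the covariance invertible
    inv = np.linalg.inv(cov)
    d = x - mu
    scores = np.einsum('ij,jk,ik->i', d, inv, d)       # per-pixel Mahalanobis distance
    return scores.reshape(h, w)
```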

14:10-14:30, Paper TuBT2.3
Detection of Salient Image Points using Principal Subspace Manifold Structure
Paiva, Antonio, Univ. of Utah
Tasdizen, Tolga, Univ. of Utah

This paper presents a method to find salient image points in images with regular patterns based on deviations from the overall manifold structure. The two main contributions are that: (i) the features used to extract salient points are derived directly and in an unsupervised manner from image neighborhoods, and (ii) the manifold structure is utilized, thus avoiding the assumption that the data lies in clusters and the need to perform density estimation. We illustrate the concept for the detection of fingerprint minutiae, fabric defects, and interesting regions of seismic data.

14:30-14:50, Paper TuBT2.4
Triangle-Constraint for Finding More Good Features
Guo, Xiaojie, Tianjin Univ.
Cao, Xiaochun, Tianjin Univ.

We present a novel method for finding more good feature pairs between two sets of features. We first select matched features by a bi-matching method as seed points, then organize these seed points using the Delaunay triangulation algorithm. Finally, we use the Triangle-Constraint (T-C) to increase both the number of correct matches and the matching score (the ratio between the number of correct matches and the total number of matches). The experimental evaluation shows that our method is robust to most geometric and photometric transformations, including rotation, scale change, blur, viewpoint change, JPEG compression and illumination change, and significantly improves both the number of correct matches and the matching score.
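A minimal sketch of the first two steps described above follows: seed matches are selected by mutual (bi-directional) nearest-neighbour matching and then organized with a Delaunay triangulation. The triangle-constraint expansion itself is specific to the paper and is only indicated by the returned simplices.

```python
# Minimal sketch: mutual nearest-neighbour seed matches organized by Delaunay triangulation.
import numpy as np
from scipy.spatial import Delaunay

def mutual_matches(desc_a, desc_b):
    """Return index pairs (i, j) where i's nearest neighbour is j and vice versa."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    ab = d.argmin(axis=1)
    ba = d.argmin(axis=0)
    return [(i, j) for i, j in enumerate(ab) if ba[j] == i]

def seed_triangulation(pts_a, seeds):
    """Delaunay triangulation over the matched keypoint locations in the first image."""
    idx = np.array([i for i, _ in seeds])
    return idx, Delaunay(pts_a[idx]).simplices
```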



14:50-15:10, Paper TuBT2.5
Compressing Sparse Feature Vectors using Random Ortho-Projections
Rahtu, Esa, Univ. of Oulu
Salo, Mikko, Univ. of Helsinki
Heikkilä, Janne, Univ. of Oulu

In this paper we investigate the use of random ortho-projections for the compression of sparse feature vectors. The study is carried out by evaluating the compressed features in classification tasks instead of concentrating on reconstruction accuracy. In the random ortho-projection method, the mapping for the compression can be obtained without any further knowledge of the original features. This makes the approach favorable if training data is costly or impossible to obtain. The independence from the data also enables one to embed the compression scheme directly into the computation of the original features. Our study is inspired by results in compressive sensing, which state that up to a certain compression ratio and with high probability, such projections result in no loss of information. In comparison to learning-based compression, namely principal component analysis (PCA), the random projections resulted in comparable performance already at high compression ratios, depending on the sparsity of the original features.
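A minimal sketch of a data-independent random ortho-projection follows: a random Gaussian matrix is orthonormalized (here via a QR decomposition, one standard choice) and the features are projected onto its rows. This is an illustrative construction rather than the authors' specific projection.

```python
# Minimal sketch: compressing feature vectors with a random ortho-projection.
import numpy as np

def random_ortho_projection(dim_in, dim_out, seed=0):
    """Return a dim_out x dim_in matrix with orthonormal rows (requires dim_out <= dim_in)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((dim_in, dim_out))
    q, _ = np.linalg.qr(g)          # columns of q are orthonormal
    return q.T                      # use them as projection rows

def compress(features, proj):
    """features: N x dim_in vectors; returns N x dim_out compressed vectors."""
    return features @ proj.T
```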

TuBT3 Marmara Hall
Object Detection and Recognition – II Regular Session
Session chair: Porikli, Fatih (MERL)

13:30-13:50, Paper TuBT3.1
Learning Discriminative Features based on Distribution
Shen, Jifeng, Southeast Univ.
Yang, Wankou, Southeast Univ.
Sun, Changyin, Southeast Univ.

In this paper, a novel feature named adaptive projection LBP (APLBP) is proposed for face detection. To promote discriminative power, the distribution information of the training samples is embedded into the proposed feature. APLBP is generated by LDA, which maximizes the margin between positive and negative samples adaptively by exploiting the approximately Gaussian distribution of the training samples. Asymmetric Gentle Adaboost is utilized to train the strong classifier, and a nested cascade is applied to construct the final detector. Experimental results on the MIT+CMU database demonstrate that the APLBP feature outperforms several existing features due to its excellent discriminative power with a smaller number of features.

13:50-14:10, Paper TuBT3.2<br />

Sub-Category Optimization for Multi-View Multi-Pose Object Detection<br />

Das, Dipankar, Saitama Univ.<br />

Kobayashi, Yoshinori, Saitama Univ.<br />

Kuno, Yoshinori, Saitama Univ.<br />

Object category detection with large appearance variation is a fundamental problem in computer vision. The appearance<br />

of object categories can change due to intra-class variability, viewpoint, and illumination. For object categories with large<br />

appearance change, a sub-categorization-based approach is necessary. This paper proposes a sub-category optimization<br />

approach that automatically divides an object category into an appropriate number of sub-categories based on appearance<br />

variation. Instead of using a predefined intra-category sub-categorization based on domain knowledge or validation<br />

datasets, we divide the sample space by unsupervised clustering based on discriminative image features. Then the clustering<br />

performance is verified using a sub-category discriminant analysis. Based on the clustering performance of the unsupervised<br />

approach and the sub-category discriminant analysis results, we determine an optimal number of sub-categories per object category.<br />

Extensive experimental results are shown using two standard and the authors’ own databases. The comparison<br />

results show that our approach outperforms the state-of-the-art methods.<br />

14:10-14:30, Paper TuBT3.3<br />

Learning and Detection of Object Landmarks in Canonical Object Space<br />

Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />

Ilonen, Jarmo, Lappeenranta Univ. of Tech.<br />



This work contributes to part-based object detection and recognition by introducing an enhanced method for local part<br />

detection. The method is based on complex-valued multiresolution Gabor features and their ranking using multiple hypothesis<br />

testing. In the present work, our main contribution is the introduction of a canonical object space, where objects<br />

are represented in their "expected pose and visual appearance". The canonical space circumvents the problem of geometric<br />

image normalisation prior to feature extraction. In addition, we define a compact set of Gabor filter parameters, from<br />

which the optimal values can be easily devised. These enhancements make our method an attractive landmark detector<br />

for part-based object detection and recognition methods.<br />

14:30-14:50, Paper TuBT3.4<br />

Multiple-Shot Person Re-Identification by HPE Signature<br />

Bazzani, Loris, Univ. of Verona<br />

Cristani, Marco, Univ. of Verona<br />

Perina, Alessandro, Univ. of Verona<br />

Farenzena, Michela, Univ. of Verona<br />

Murino, Vittorio, Univ. of Verona<br />

In this paper, we propose a novel appearance-based method for person re-identification, that condenses a set of frames of<br />

the same individual into a highly informative signature, called Histogram Plus Epitome, HPE. It incorporates complementary<br />

global and local statistical descriptions of the human appearance, focusing on the overall chromatic content, via<br />

histogram representations, and on the presence of recurrent local patches, via epitome estimation. The matching of HPEs<br />

provides excellent performance against low resolution, occlusions, and pose and illumination variations, defining new state-of-the-art<br />

results on all the datasets considered.<br />

14:50-15:10, Paper TuBT3.5<br />

Building Detection in a Single Remotely Sensed Image with a Point Process of Rectangles<br />

Benedek, Csaba, Computer and Automation Res. Inst. Hungarian<br />

Descombes, Xavier, INRIA<br />

Zerubia, Josiane, INRIA<br />

In this paper we introduce a probabilistic approach of building extraction in remotely sensed images. To cope with data<br />

heterogeneity we construct a flexible hierarchical framework which can create various building appearance models from<br />

different elementary feature based modules. A global optimization process attempts to find the optimal configuration of<br />

buildings, considering simultaneously the observed data, prior knowledge, and interactions between the neighboring building<br />

parts. The proposed method is evaluated on various aerial image sets containing more than 500 buildings, and the<br />

results are matched against two state-of-the-art techniques.<br />

TuBT4 Dolmabahçe Hall A<br />

Model Selection and Clustering Regular Session<br />

Session chair: Shapiro, Linda (Univ. of Washington)<br />

13:30-13:50, Paper TuBT4.1<br />

A Relationship between Generalization Error and Training Samples in Kernel Regressors<br />

Tanaka, Akira, Hokkaido Univ.<br />

Imai, Hideyuki, Hokkaido Univ.<br />

Kudo, Mineichi, Hokkaido Univ.<br />

Miyakoshi, Masaaki, Hokkaido Univ.<br />

A relationship between generalization error and training samples in kernel regressors is discussed in this paper. The generalization<br />

error can be decomposed into two components. One is a distance between an unknown true function and an<br />

adopted model space. The other is a distance between an estimated function and the orthogonal projection of the unknown<br />

true function onto the model space. In our previous work, we gave a framework to evaluate the first component. In this<br />

paper, we theoretically analyze the second one and show that a larger set of training samples usually causes a larger generalization<br />

error.<br />
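
Schematically, and in our own notation (assuming the estimated function lies in the adopted model space), the decomposition discussed above reads

    \| f - \hat{f} \|^2 \;=\; \| f - P_{\mathcal{H}} f \|^2 \;+\; \| P_{\mathcal{H}} f - \hat{f} \|^2 ,

where $f$ is the unknown true function, $\mathcal{H}$ the adopted model space, $P_{\mathcal{H}}$ the orthogonal projection onto $\mathcal{H}$, and $\hat{f}$ the kernel regressor estimated from the training samples; the paper analyzes the second term.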



13:50-14:10, Paper TuBT4.2<br />

Localized Multiple Kernel Regression<br />

Gönen, Mehmet, Bogazici Univ.<br />

Alpaydin, Ethem, Bogazici Univ.<br />

Multiple kernel learning (MKL) uses a weighted combination of kernels where the weight of each kernel is optimized<br />

during training. However, MKL assigns the same weight to a kernel over the whole input space. Our main objective is the<br />

formulation of the localized multiple kernel learning (LMKL) framework that allows kernels to be combined with different<br />

weights in different regions of the input space by using a gating model. In this paper, we apply the LMKL framework to<br />

regression estimation and derive a learning algorithm for this extension. Canonical support vector regression may overfit<br />

unless the kernel parameters are selected appropriately; we see that even if we provide more kernels than necessary, LMKL<br />

uses only as many as needed and does not overfit due to its inherent regularization.<br />
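
As a rough illustration of the gating idea, the sketch below combines RBF kernels as K(x, y) = sum_m eta_m(x) k_m(x, y) eta_m(y) with a softmax gating model. The kernel family, the fixed (untrained) gating matrix V and all values are our own assumptions; the actual LMKL training procedure is not reproduced here.

    import numpy as np

    def rbf(X, Y, gamma):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def localized_kernel(X, Y, gammas, V):
        """Locally combined kernel with softmax gating eta(x) = softmax(V x)."""
        def gate(Z):
            s = Z @ V.T
            s -= s.max(axis=1, keepdims=True)
            e = np.exp(s)
            return e / e.sum(axis=1, keepdims=True)
        Gx, Gy = gate(X), gate(Y)
        K = np.zeros((len(X), len(Y)))
        for m, g in enumerate(gammas):
            K += np.outer(Gx[:, m], Gy[:, m]) * rbf(X, Y, g)
        return K

    X = np.random.default_rng(1).standard_normal((5, 2))
    V = np.random.default_rng(2).standard_normal((3, 2))   # gating parameters for 3 kernels
    print(localized_kernel(X, X, gammas=[0.1, 1.0, 10.0], V=V).shape)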

14:10-14:30, Paper TuBT4.3<br />

Probabilistic Clustering using the Baum-Eagon Inequality<br />

Rota Bulo’, Samuel, Univ. Ca’ Foscari di Venezia<br />

Pelillo, Marcello, Ca’ Foscari Univ.<br />

The paper introduces a framework for clustering data objects in a similarity-based context. The aim is to cluster objects<br />

into a given number of classes without imposing a hard partition, but allowing for a soft assignment of objects to clusters.<br />

Our approach uses the assumption that similarities reflect the likelihood of the objects to be in a same class in order to<br />

derive a probabilistic model for estimating the unknown cluster assignments. This leads to a polynomial optimization in<br />

probability domain, which is tackled by means of a result due to Baum and Eagon. Experiments on both synthetic and real<br />

standard datasets show the effectiveness of our approach.<br />
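
For concreteness, the sketch below applies the Baum-Eagon growth transform to one simple similarity-based polynomial, P(X) = sum_c sum_{i != j} s_ij x_ic x_jc, maximized over row-stochastic soft assignments. The objective and the toy similarity matrix are our own illustrative choices, not necessarily the model used in the paper.

    import numpy as np

    def baum_eagon_clustering(S, k, iters=200, seed=0):
        """Soft clustering by repeatedly applying the Baum-Eagon growth transform."""
        n = S.shape[0]
        S = S.copy()
        np.fill_diagonal(S, 0.0)                 # nonnegative similarities, zero diagonal
        rng = np.random.default_rng(seed)
        X = rng.random((n, k))
        X /= X.sum(axis=1, keepdims=True)        # each row lies on the simplex
        for _ in range(iters):
            G = S @ X                            # proportional to the partial derivatives of P
            X = X * G                            # growth transform: multiply by derivatives...
            X /= X.sum(axis=1, keepdims=True)    # ...and renormalize each row
        return X

    # A similarity matrix with two obvious blocks.
    S = np.array([[0.0, 1.0, 1.0, 0.1, 0.1],
                  [1.0, 0.0, 1.0, 0.1, 0.1],
                  [1.0, 1.0, 0.0, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.0, 1.0],
                  [0.1, 0.1, 0.1, 1.0, 0.0]])
    print(baum_eagon_clustering(S, k=2).round(2))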

14:30-14:50, Paper TuBT4.4<br />

Ensemble Clustering via Random Walker Consensus Strategy<br />

Abdala, Daniel Duarte, Univ. of Münster<br />

Wattuya, Pakaket, Univ. of Münster<br />

Jiang, Xiaoyi, Univ. of Münster<br />

In this paper we present the adaptation of a random walker algorithm for combination of image segmentations to work<br />

with clustering problems. In order to achieve it, we pre-process the ensemble of clusterings to generate its graph representation.<br />

We show experimentally that a very small neighborhood will produce similar results if compared with larger choices.<br />

This fact alone improves the computational time needed to produce the final consensual clustering. We also present an experimental<br />

comparison between our results against other graph based and well known combination clustering methods in<br />

order to assess the quality of this approach.<br />

14:50-15:10, Paper TuBT4.5<br />

Bhattacharyya Clustering with Applications to Mixture Simplifications<br />

Nielsen, Frank, Ecole Polytechnique/SONY CLS<br />

Boltz, Sylvain, Ecole Polytechnique/SONY CLS<br />

Schwander, Olivier, Ecole Polytechnique/SONY CLS<br />

Bhattacharyya distance (BD) is a widely used distance in statistics to compare probability density functions (PDFs). It has<br />

shown strong statistical properties (in terms of Bayes error) and it relates to Fisher information. It also has practical advantages,<br />

since it strongly relates to measuring the overlap of the supports of the PDFs. Unfortunately, even with common<br />

parametric models on PDFs, few closed-form formulas are known. Moreover, the BD centroid estimation was limited to<br />

univariate Gaussian PDFs in the literature and no convergence guarantees were provided. In this paper, we propose a<br />

closed-form formula for BD on a general class of parametric distributions named exponential families. We show that the<br />

BD is a Burbea-Rao divergence for the log normalizer of the exponential family. We propose an efficient iterative scheme<br />

to compute a BD centroid on exponential families. Finally, these results allow us to define a Bhattacharyya hierarchical<br />

clustering algorithm (BHC). It can be viewed as a generalization of k-means on BD. Results on image segmentation<br />

show the stability of the method.<br />
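
As background, the well-known closed form for univariate Gaussians is shown below; it is a standard special case, whereas the paper's contributions (the general exponential-family formula and the centroid scheme) are not reproduced here.

    import math

    def bhattacharyya_gaussian(mu1, var1, mu2, var2):
        """Closed-form Bhattacharyya distance between two univariate Gaussians."""
        return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
                + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))

    print(bhattacharyya_gaussian(0.0, 1.0, 0.0, 1.0))   # 0.0 for identical PDFs
    print(bhattacharyya_gaussian(0.0, 1.0, 3.0, 2.0))   # > 0 otherwise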



TuBT5 Dolmabahçe Hall B<br />

Watermarking and Authentication Regular Session<br />

Session chair: Bülent Sankur (Boğaziçi Univ.)<br />

13:30-13:50, Paper TuBT5.1<br />

High Capacity Data Hiding for Binary Image Authentication<br />

Guo, Meng, Beijing Univ. of Tech.<br />

Zhang, Hongbin, Beijing Univ. of Tech.<br />

This paper proposes a novel data hiding scheme with high capacity for binary images, including document images, halftone<br />

images, scanned figures, text and signatures. In our scheme, the embedding efficiency and the placement of embedding<br />

changes are considered simultaneously. Given an MxN image block, the upper bound on the number of bits that can be<br />

embedded by the scheme is n*log2((MxN)/n + 1), achieved by changing at most n pixels. Experimental results show that the proposed<br />

scheme can embed more data while maintaining better quality, and has wider applicability than existing schemes.<br />
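
A worked instance of the quoted bound, with an illustrative block size and number of changed pixels:

    import math

    def capacity_bound(M, N, n):
        """Upper bound on the embeddable bits for an M x N block with at most n changed pixels."""
        return n * math.log2(M * N / n + 1)

    print(round(capacity_bound(16, 16, 4), 1), "bits for a 16x16 block with n = 4")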

13:50-14:10, Paper TuBT5.2<br />

Secure Self-Recovery Image Authentication using Randomly-Sized Blocks<br />

Hassan, Ammar M., Otto-von-Guericke Univ.<br />

Al-Hamadi, Ayoub, IESK<br />

Michaelis, Bernd, IESK<br />

Hasan, Yassin M. Y., Assiut Univ.<br />

Wahab, Mohamed A. A., Minia Univ.<br />

In this paper, a secure variable-size block-based image authentication technique is proposed that can not only localize the<br />

alteration detection but also recover the missing data. An image undergoes recursive arbitrarily-asymmetric binary tree<br />

partitioning to obtain randomly-sized blocks spanning the entire image. To enhance reliability of altered block recovery,<br />

multiple description coding (MDC) is utilized to generate two block descriptions. Block signature copies and the two<br />

block descriptions are embedded into two relatively-distant blocks making a doubly linked chain. The experimental results<br />

demonstrate that the proposed technique successfully both localizes and compensates for the alterations. Furthermore, it is robust<br />

against the vector quantization (VQ) attack.<br />

14:10-14:30, Paper TuBT5.3<br />

Blind Wavelet based Logo Watermarking Resisting to Cropping<br />

Soheili, Mohammadreza, Tarbiat Moallem Univ.<br />

In this paper we propose a blind wavelet-based logo watermarking scheme focusing on resisting to cropping. The binary<br />

logo is embedded in the LL2 sub-band of host image, using quantization technique. For increasing robustness of proposed<br />

algorithm two dimensional parity bits are added to the binary logo. Experimental results show that the proposed watermarking<br />

method can resist not only cropping attack, but also some common signal processing attacks, such as JPEG compression,<br />

average and median filtering, rotation and scaling.<br />

14:30-14:50, Paper TuBT5.4<br />

The New Blockwise Algorithm for Large-Scale Images Robust Watermarking<br />

Mitekin, Vitaly, Russian Acad. of Sciences<br />

Glumov, Nikolay, Russian Acad. of Sciences<br />

A new algorithm for digital watermarking of large-scale digital images is proposed in the article. The proposed algorithm<br />

provides watermark robustness to a wide range of host image distortions and has a number of advantages compared to<br />

existing algorithms for robust watermarking.<br />

14:50-15:10, Paper TuBT5.5<br />

Lossless ROI Medical Image Watermarking Technique with Enhanced Security and High Payload Embedding<br />

Kundu, Malay Kumar, Indian Statistical Inst.<br />

Das, Sudeb, Indian Statistical Inst.<br />



In this article, a new fragile, blind, high payload capacity, ROI (Region of Interest) preserving Medical image watermarking<br />

(MIW) technique in the spatial domain for gray scale medical images is proposed. We present a watermarking scheme<br />

that combines lossless data compression and encryption technique in application to medical images. The effectiveness of<br />

the proposed scheme, proven through experiments on various medical images using various image quality metrics<br />

such as PSNR, MSE and MSSIM, enables us to argue that the method will help to maintain Electronic Patient Record (EPR)/DICOM<br />

data privacy and medical image integrity.<br />

TuBT6 Topkapı Hall B<br />

Face Recognition – I Regular Session<br />

Session chair: Ross, Arun (West Virginia Univ.)<br />

13:30-13:50, Paper TuBT6.1<br />

Efficient Facial Attribute Recognition with a Spatial Codebook<br />

Ijiri, Yoshihisa, OMRON Corp.<br />

Lao, Shihong, OMRON Corp.<br />

Han, Tony X., Univ. of Missouri<br />

Murase, Hiroshi, Nagoya Univ.<br />

There is a large number of possible facial attributes such as hairstyle, with/without glasses, with/without mustache, etc.<br />

Considering the large number of facial attributes and their combinations, it is difficult to build attribute classifiers for all possible<br />

combinations needed in various applications, especially at the design stage. To tackle this important and challenging<br />

problem, we propose a novel, efficient facial attribute recognition algorithm using a learned spatial codebook.<br />

The Maximum Entropy and Maximum Orthogonality (MEMO) criterion is followed to learn the spatial codebook. With<br />

a spatial codebook constructed at the design stage, attribute classifiers can be trained on demand with a small number<br />

of exemplars with high accuracy on the testing data. Meanwhile, a speedup of up to 600 times is achieved in the on-demand<br />

training process, compared to the current state-of-the-art method. The effectiveness of the proposed method is supported by<br />

convincing experimental results.<br />

13:50-14:10, Paper TuBT6.2<br />

Feature Space Hausdorff Distance for Face Recognition<br />

Chen, Shaokang, NICTA<br />

Lovell, Brian Carrington, The Univ. of Queensland<br />

We propose a novel face image similarity measure based on Hausdorff distance (HD). In contrast to conventional HD-based<br />

measures, which are generally applied in the image space (such as edge maps or gradient images), the proposed<br />

HD-based similarity measure is applied in the feature space. By extending the concept of HD using a variable radius and<br />

reference set, we can generate a neighbourhood set for HD measures in feature space and then apply this concept for classification.<br />

Experiments on the Labeled Faces in the Wild and FRGC datasets show that the proposed measure improves<br />

the overall classification performance quite dramatically, especially under the highly desirable low false acceptance rate<br />

conditions.<br />
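
As background, the classical symmetric Hausdorff distance between two point sets is sketched below; the paper's feature-space extension with a variable radius and reference set is not reproduced here.

    import numpy as np

    def hausdorff(A, B):
        """Symmetric Hausdorff distance between point sets A and B (rows are points)."""
        D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return max(D.min(axis=1).max(), D.min(axis=0).max())

    A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    B = np.array([[0.1, 0.0], [1.0, 0.2], [0.0, 1.5]])
    print(hausdorff(A, B))   # 0.5 for these toy sets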

14:10-14:30, Paper TuBT6.3<br />

How to Measure Biometric Information?<br />

Sutcu, Yagiz, Pol. Inst. of New York Univ.<br />

Sencar, Husrev Taha, TOBB Univ. of Ec. and Tech.<br />

Memon, Nasir, Pol. Inst. of New York Univ.<br />

Being able to measure the actual information content of biometrics is very important but also a challenging problem. The main<br />

difficulty here is related not only to the selected feature representation of the biometric data, but also to the matching<br />

algorithm employed in biometric systems. In this paper, we propose a new measure for measuring biometric information<br />

using relative entropy between intra-user and inter-user distance distributions. As an example, we evaluated the proposed<br />

measure on a face image dataset.<br />
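
A minimal sketch of the proposed quantity (our own illustration with synthetic Gaussian distance scores and simple histogram estimates, not the authors' estimator): the relative entropy between intra-user and inter-user distance distributions, reported in bits.

    import numpy as np

    def relative_entropy_bits(intra, inter, bins=20):
        lo = min(intra.min(), inter.min())
        hi = max(intra.max(), inter.max())
        p, edges = np.histogram(intra, bins=bins, range=(lo, hi))
        q, _ = np.histogram(inter, bins=edges)
        p = (p + 1e-12) / (p + 1e-12).sum()      # smoothed, normalized histograms
        q = (q + 1e-12) / (q + 1e-12).sum()
        return float(np.sum(p * np.log2(p / q)))

    rng = np.random.default_rng(0)
    intra = rng.normal(0.3, 0.05, 1000)          # genuine (same-user) distances
    inter = rng.normal(0.6, 0.10, 1000)          # impostor (different-user) distances
    print(round(relative_entropy_bits(intra, inter), 2), "bits")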



14:30-14:50, Paper TuBT6.4<br />

Intensity-Based Congealing for Unsupervised Joint Image Alignment<br />

Storer, Markus, Graz Univ. of Tech.<br />

Urschler, Martin, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

We present an approach for unsupervised alignment of an ensemble of images called congealing. Our algorithm is based<br />

on image registration using the mutual information measure as a cost function. The cost function is optimized by a standard<br />

gradient descent method in a multiresolution scheme. As opposed to other congealing methods, which use the SSD measure,<br />

the mutual information measure is better suited as a similarity measure for registering images since no prior assumptions<br />

on the relation of intensities between images are required. We present alignment results on the MNIST handwritten digit<br />

database and on facial images obtained from the CVL database.<br />
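
The cost function named above can be estimated from a joint grey-level histogram; the short sketch below shows only this similarity measure (the congealing optimization, gradient descent and multiresolution scheme are not reproduced).

    import numpy as np

    def mutual_information(img1, img2, bins=32):
        """Mutual information of two images from their joint intensity histogram."""
        hist, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
        pxy = hist / hist.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

    rng = np.random.default_rng(0)
    a = rng.random((64, 64))
    print(mutual_information(a, a))                      # high: identical images
    print(mutual_information(a, rng.random((64, 64))))   # low: unrelated images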

14:50-15:10, Paper TuBT6.5<br />

An Illumination Quality Measure for Face Recognition<br />

Rizo-Rodriguez, Dayron, Advanced Tech. Application Center<br />

Mendez-Vazquez, Heydi, Advanced Tech. Application Center<br />

Garcia, Edel, Advanced Tech. Application Center<br />

A method to determine whether face images are affected or not by lighting problems is proposed. The method is the result<br />

of combining the analysis of lighting effect on face regions with the analysis of special areas which have a weight on the<br />

decision. Good results were obtained in classifying well- and badly-illuminated images. The proposed method was inserted<br />

into a face recognition framework in order to apply the preprocessing step only to those images affected by illumination<br />

variations. The good performance achieved in verification and identification experiments confirms that it is better to apply<br />

the proposed methodology than to preprocess all images when the lighting conditions are variable.<br />

TuBT7 Dolmabahçe Hall C<br />

Biomedical Image Segmentation Regular Session<br />

Session chair: Kato, Zoltan (Univ. of Szeged)<br />

13:30-13:50, Paper TuBT7.1<br />

Cascaded Segmentation of Grained Cell Tissue with Active Contour Models<br />

Moeller, Birgit, Martin-Luther-Univ. Halle-Wittenberg<br />

Stöhr, Nadine, ZAMED, Martin Luther Univ. Halle-Wittenberg<br />

Hüttelmaier, Stefan, ZAMED, Martin Luther Univ. Halle-Wittenberg<br />

Posch, Stefan, Martin-Luther-Univ. Halle-Wittenberg<br />

Cell tissue in microscope images is often grained and its intensities do not agree well with the Gaussian distribution assumptions<br />

widely used in many segmentation approaches. We present a new cascaded segmentation scheme for inhomogeneous<br />

cell tissue based on active contour models. Cell regions are iteratively expanded from initial nuclei regions applying a<br />

data-dependent number of optimization levels. Experimental results on a set of microscope images from a human hepatoma<br />

cell line prove high quality of the results with regard to the cell segmentation task and biomedical investigations.<br />

13:50-14:10, Paper TuBT7.2<br />

Live Cell Segmentation in Fluorescence Microscopy via Graph Cut<br />

Lesko, Milan, Univ. of Szeged<br />

Kato, Zoltan, Univ. of Szeged<br />

Nagy, Antal, Univ. of Szeged<br />

Gombos, Imre, Hungarian Acad. of Sciences<br />

Torok, Zsolt, Hungarian Acad. of Sciences<br />

Vigh Jr, Laszlo, Univ. of Szeged<br />

Vigh, Laszlo, Hungarian Acad. of Sciences<br />

We propose a novel Markovian segmentation model which takes into account edge information. By construction, the<br />

model uses only pairwise interactions and its energy is submodular. Thus the exact energy minimum is obtained via a max-flow/min-cut<br />

algorithm. The method has been quantitatively evaluated on synthetic images as well as on fluorescence microscopic<br />

images of live cells.<br />
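
A toy illustration of the max-flow/min-cut step on a four-pixel chain; the unary and pairwise capacities below are invented, and the paper's Markovian energy with edge information is not reproduced.

    import networkx as nx

    G = nx.DiGraph()
    # Unary terms: the s->p capacity is paid if p ends up on the sink (background)
    # side of the cut, the p->t capacity if p ends up on the source (foreground) side.
    cost_if_bg = {"p0": 5.0, "p1": 4.0, "p2": 1.0, "p3": 0.5}
    cost_if_fg = {"p0": 0.5, "p1": 1.0, "p2": 4.0, "p3": 5.0}
    for p in cost_if_bg:
        G.add_edge("s", p, capacity=cost_if_bg[p])
        G.add_edge(p, "t", capacity=cost_if_fg[p])
    # Pairwise (submodular) smoothness terms between neighbouring pixels.
    for a, b in [("p0", "p1"), ("p1", "p2"), ("p2", "p3")]:
        G.add_edge(a, b, capacity=2.0)
        G.add_edge(b, a, capacity=2.0)

    energy, (fg_side, bg_side) = nx.minimum_cut(G, "s", "t")
    print("minimum energy:", energy, "foreground pixels:", sorted(fg_side - {"s"}))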



14:10-14:30, Paper TuBT7.3<br />

Retinal Blood Vessels Segmentation using the Radial Projection and Supervised Classification<br />

Peng, Qinmu, Huazhong Univ. of Science and Tech.<br />

You, Xinge, Huazhong Univ. of Science and Tech.<br />

Zhou, Long, Wuhan Pol. Univ.<br />

Cheung, Yiu-Ming, Hong Kong Baptist Univ.<br />

The low-contrast and narrow blood vessels in retinal images are difficult to extract but useful in revealing certain<br />

systemic diseases. Motivated by the goal of improving detection of such vessels, we propose the radial projection method<br />

to locate the vessel centerlines. Then the supervised classification is used for extracting the major structures of vessels.<br />

The final segmentation is obtained by the union of the two types of vessels after removal schemes. Our approach is tested<br />

on the STARE database, and the results demonstrate that our algorithm can yield better segmentation.<br />

14:30-14:50, Paper TuBT7.4<br />

Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During Speech<br />

Fasel, Ian, Univ. of Arizona<br />

Berry, Jeff, Univ. of Arizona<br />

Ultrasound has become a useful tool for speech scientists studying mechanisms of language sound production. State-of-the-art<br />

methods for extracting tongue contours from ultrasound images of the mouth, typically based on active contour<br />

snakes, require considerable manual interaction by an expert linguist. In this paper we describe a novel method for fully<br />

automatic extraction of tongue contours based on a hierarchy of restricted Boltzmann machines (RBMs), i.e. deep belief<br />

networks (DBNs). Usually, DBNs are first trained generatively on sensor data, then discriminatively to predict human-provided<br />

labels of the data. In this paper we introduce the translational RBM (tRBM), which allows the DBN to make use<br />

of both human labels and raw sensor data at all stages of learning. This method yields performance in contour extraction<br />

comparable to human labelers, without any temporal smoothing or human intervention, and runs in real-time.<br />

14:50-15:10, Paper TuBT7.5<br />

Automated Gland Segmentation and Classification for Gleason Grading of Prostate Tissue Images<br />

Nguyen, Kien, Michigan State Univ.<br />

Jain, Anil, Michigan State Univ.<br />

Allen, Ronald, BioImagene<br />

The well-known Gleason grading method for an H&E prostatic carcinoma tissue image uses morphological features of histology<br />

patterns within a tissue slide to classify it into 5 grades. We have developed an automated gland segmentation and<br />

classification method that will be used for automated Gleason grading of a prostatic carcinoma tissue image. We demonstrate<br />

the performance of the proposed classification system for a three-class classification problem (benign, grade 3 carcinoma<br />

and grade 4 carcinoma) on a dataset containing 78 tissue images and achieve a classification accuracy of 88.84%. In comparison<br />

to other segmentation-based methods, our approach combines the similarity of morphological patterns associated<br />

with a grade with the domain knowledge such as the appearance of nuclei and blue mucin for the grading task.<br />

TuCT1 Topkapı Hall B<br />

Face Recognition – II Regular Session<br />

Session chair: Tistarelli, Massimo (Univ. of Sassari)<br />

15:40-16:00, Paper TuCT1.1<br />

Multi-Resolution Local Appearance-Based Face Verification<br />

Gao, Hua, Karlsruhe Inst. of Tech.<br />

Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />

Fischer, Mika, Karlsruhe Inst. of Tech.<br />

Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />

Facial analysis based on local regions/blocks usually outperforms holistic approaches because it is less sensitive to local<br />

deformations and occlusions. Moreover, modeling local features enables us to avoid the problem of high dimensionality<br />

of feature space. In this paper, we model the local face blocks with Gabor features and project them into a discriminant<br />

identity space. The similarity score of a face pair is determined by fusion of the local classifiers. To acquire complementary<br />



information in different scales of face images, we integrate the local decisions from various image resolutions. The proposed<br />

multi-resolution block-based face verification system is evaluated on experiment 4 of the Face Recognition Grand Challenge<br />

(FRGC) version 2.0. We obtained a 92.5% verification rate at 0.1% FAR, which is the highest performance reported<br />

on this experiment so far in the literature.<br />

16:00-16:20, Paper TuCT1.2<br />

Partial Face Biometry using Shape Decomposition on 2D Conformal Maps of Faces<br />

Szeptycki, Przemyslaw, Ec. Centrale de Lyon<br />

Ardabilian, Mohsen, Ec. Centrale de Lyon<br />

Chen, Liming, Ec. Centrale de Lyon<br />

Zeng, Wei, Wayne State Univ.<br />

Gu, Xianfeng, State Univ. of New York at Stony Brook<br />

Samaras, Dimitris, Stony Brook Univ.<br />

In this paper, we introduce a new approach for partial 3D face recognition, which makes use of shape decomposition over<br />

the rigid part of a face. To explore the descriptiveness of shape dissimilarity over an isometric part of a face, which has<br />

lower probability to be influenced by expression, we transform a 3D shape to a 2D domain using conformal mapping and<br />

use shape decomposition as a similarity measurement. In our work we investigate several classifiers as well as several<br />

shape descriptors for recognition purposes. Recognition tests on a subset of the FRGC data set show approximately 80%<br />

rank-one recognition rate using only the eyes and nose part of the face.<br />

16:20-16:40, Paper TuCT1.3<br />

Gender Classification using Interlaced Derivative Patterns<br />

Shobeirinejad, Ameneh, Griffith Univ.<br />

Gao, Yongsheng, Griffith Univ.<br />

Automated gender recognition has become an interesting and challenging research problem in recent years with its potential<br />

applications in the security industry and human-computer interaction systems. In this paper we present a novel feature representation,<br />

namely Interlaced Derivative Patterns (IDP), which is a derivative-based technique to extract discriminative<br />

facial features for gender classification. The proposed technique operates on a neighborhood around a pixel and concatenates<br />

the extracted regional feature distributions to form a feature vector. The experimental results demonstrate the effectiveness<br />

of the IDP method for gender classification, showing that the proposed approach achieves 29.6% relative error<br />

reduction compared to Local Binary Patterns (LBP), while it performs over four times faster than Local Derivative Patterns<br />

(LDP).<br />

16:40-17:00, Paper TuCT1.4<br />

Heterogeneous Face Recognition: Matching NIR to Visible Light Images<br />

Klare, Brendan, Michigan State Univ.<br />

Jain, Anil, Michigan State Univ.<br />

Matching near-infrared (NIR) face images to visible light (VIS) face images offers a robust approach to face recognition<br />

with unconstrained illumination. In this paper we propose a novel method of heterogeneous face recognition that uses a<br />

common feature-based representation for both NIR images as well as VIS images. Linear discriminant analysis is performed<br />

on a collection of random subspaces to learn discriminative projections. NIR and VIS images are matched (i) directly<br />

using the random subspace projections, and (ii) using sparse representation classification. Experimental results demonstrate<br />

the effectiveness of the proposed approach for matching NIR and VIS face images.<br />

17:00-17:20, Paper TuCT1.5<br />

Clustering Face Carvings: Application to Devatas of Angkor Wat<br />

Klare, Brendan, Michigan State Univ.<br />

Mallapragada, Pavan Kumar, Michigan State Univ.<br />

Jain, Anil, Michigan State Univ.<br />

Davis, Kent, DatAsia Inc.<br />

We propose a framework for clustering and visualization of images of face carvings at archaeological sites. The pairwise<br />



similarities among face carvings are computed by performing Procrustes analysis on local facial features (eyes, nose,<br />

mouth, etc.). The distance between corresponding face features is computed using point distribution models; the final pairwise<br />

similarity is the weighted sum of feature similarities. A web-based interface is provided to allow domain experts to<br />

interactively assign different weights to each face feature, and display hierarchical clustering results in 2D or 3D projections<br />

obtained by multidimensional scaling. The proposed framework has been successfully applied to the devata goddesses<br />

depicted in the ancient Angkor Wat temple. The resulting clusterings and visualization will enable a systematic anthropological,<br />

ethnological and artistic analysis of nearly 1,800 stone portraits of devatas of Angkor Wat.<br />

TuCT2 Topkapı Hall A<br />

Feature Extraction – II Regular Session<br />

Session chair: Covell, Michele (Google, Inc.)<br />

15:40-16:00, Paper TuCT2.1<br />

Action Recognition using Spatial-Temporal Context<br />

Hu, Qiong, Chinese Acad. of Sciences<br />

Qin, Lei, Chinese Acad. of Sciences<br />

Huang, Qingming, Chinese Acad. of Sciences<br />

Jiang, Shuqiang, Chinese Acad. of Sciences<br />

Tian, Qi, Univ. of Texas at San Antonio<br />

The spatial-temporal local features and the bag of words representation have been widely used in the action recognition<br />

field. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in<br />

ambiguity in action recognition task, especially for videos in the wild. In this paper, we solve this problem by utilizing the<br />

volumetric context around a video-word. Here, a local histogram of the video-word distribution is calculated, which is referred to<br />

as the context and further clustered into contextual words. To effectively use the contextual information, the descriptive<br />

video-phrases (ST-DVPs) and the descriptive video-cliques (ST-DVCs) are proposed. A general framework for ST-DVP<br />

and ST-DVC generation is described, and then action recognition can be done based on all these representations and their<br />

combinations. The proposed method is evaluated on two challenging human action datasets: the KTH dataset and the<br />

YouTube dataset. Experiment results confirm the validity of our approach.<br />

16:00-16:20, Paper TuCT2.2<br />

Feature Extraction for Simple Classification<br />

Stuhlsatz, André, Univ. of Applied Sciences Duesseldorf<br />

Lippel, Jens, Univ. of Applied Sciences Duesseldorf<br />

Zielke, Thomas, Univ. of Applied Sciences Duesseldorf<br />

Constructing a recognition system based on raw measurements for different objects usually requires expert knowledge of<br />

domain specific data preprocessing, feature extraction, and classifier design. We seek to simplify this process in a way<br />

that can be applied without any knowledge about the data domain and the specific properties of different classification algorithms.<br />

That is, a recognition system should be simple to construct and simple to operate in practical applications. For<br />

this, we have developed a nonlinear feature extractor for high-dimensional complex patterns, using Deep Neural Networks<br />

(DNN). Trained partly supervised and unsupervised, the DNN effectively implements a nonlinear discriminant analysis<br />

based on a Fisher criterion in a feature space of very low dimensions. Our experiments show that the automatically extracted<br />

features work very well with simple linear discriminants, while the recognition rates improve only minimally if more sophisticated<br />

classification algorithms like Support Vector Machines (SVM) are used instead.<br />

16:20-16:40, Paper TuCT2.3<br />

Towards a Generic Feature-Selection Measure for Intrusion Detection<br />

Nguyen, Hai Thanh, Gjøvik Univ. Coll.<br />

Franke, Katrin, Gjøvik Univ. Coll.<br />

Petrovic, Slobodan, Gjøvik Univ. Coll.<br />

Performance of a pattern recognition system depends strongly on the employed feature-selection method. We perform an<br />

in-depth analysis of two main measures used in the filter model: the correlation-feature-selection (CFS) measure and the<br />

minimal-redundancy-maximal-relevance (mRMR) measure. We show that these measures can be fused and generalized<br />

into a generic feature-selection (GeFS) measure. Further on, we propose a new feature-selection method that ensures globally<br />



optimal feature sets. The new approach is based on solving a mixed 0-1 linear programming problem (M01LP) by<br />

using the branch-and-bound algorithm. In this M01LP problem, the number of constraints and variables is linear ($O(n)$)<br />

in the number $n$ of features in the full set. In order to evaluate the quality of our GeFS measure, we chose the design of an intrusion<br />

detection system (IDS) as a possible application. Experimental results obtained over the KDD Cup’99 test data set<br />

for IDS show that the GeFS measure removes 93% of irrelevant and redundant features from the original data set, while<br />

keeping or yielding an even better classification accuracy.<br />

16:40-17:00, Paper TuCT2.4<br />

Discriminative Basis Selection using Non-Negative Matrix Factorization<br />

Jammalamadaka, Aruna, Univ. of California, Santa Barbara<br />

Joshi, Swapna, Univ. of California, Santa Barbara<br />

Shanmuga Vadivel, Karthikeyan, Univ. of California, Santa Barbara<br />

Manjunath, B. S., Univ. of California, Santa Barbara<br />

Non-negative matrix factorization (NMF) has proven to be useful in image classification applications such as face recognition.<br />

We propose a novel discriminative basis selection method for classification of image categories based on the popular<br />

term frequency-inverse document frequency (TF-IDF) weight used in information retrieval. We extend the algorithm to<br />

incorporate color, and overcome the drawbacks of using unaligned images. Our method is able to choose visually significant<br />

bases which best discriminate between categories and thus prune the classification space to increase correct classifications.<br />

We apply our technique to ETH-80, a standard image classification benchmark dataset. Our results show that our algorithm<br />

outperforms other state-of-the-art techniques.<br />
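
For reference, the standard TF-IDF weight that the selection criterion builds on, computed here on a toy count matrix; how the paper maps NMF basis activations onto such counts is not reproduced.

    import numpy as np

    def tf_idf(counts):
        """counts: documents x terms matrix of raw counts."""
        tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
        df = (counts > 0).sum(axis=0)                       # document frequency per term
        idf = np.log(counts.shape[0] / np.maximum(df, 1))
        return tf * idf

    counts = np.array([[3, 0, 1],
                       [0, 2, 1],
                       [1, 1, 1]], dtype=float)
    print(tf_idf(counts).round(3))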

17:00-17:20, Paper TuCT2.5<br />

Recognizing Dance Motions with Segmental SVD<br />

Deng, Liqun, Univ. of Science & Tech. of China<br />

Leung, Howard, City Univ. of Hong Kong<br />

Gu, Naijie, Univ. of Science & Tech. of China<br />

Yang, Yang, Univ. of Science & Tech. of China<br />

In this paper, a novel concept of segmental singular value decomposition (SegSVD) is proposed to represent a motion<br />

pattern with a hierarchical structure. The similarity measure based on the SegSVD representation is also proposed. SegSVD<br />

is capable of capturing the temporal information of the time series. It is effective in matching patterns in a time series in<br />

which the start and end points of the patterns are not known in advance. We evaluate the performance of our method on<br />

both isolated motion classification and continuous motion recognition for dance movements. Experiments show that our<br />

method outperforms existing work in terms of higher recognition accuracy.<br />

TuCT3 Marmara Hall<br />

Object Detection and Recognition – III Regular Session<br />

Session chair: Nixon, Mark (Univ. of Southampton)<br />

15:40-16:00, Paper TuCT3.1<br />

Multi-Class Graph Boosting with Subgraph Sharing for Object Recognition<br />

Zhang, Bang, Univ. of New South Wales, National ICT Australia<br />

Ye, Getian, Univ. of New South Wales<br />

Wang, Yang, National ICT Australia, Univ. of New South Wales<br />

Wang, Wei, Univ. of New South Wales<br />

Xu, Jie, National ICT Australia, Univ. of New South Wales<br />

Herman, Gunawan, National ICT Australia, Univ. of New South Wales<br />

Yang, Jun, National ICT Australia, Univ. of New South Wales<br />

In this paper, we propose a novel multi-class graph boosting algorithm to recognize different visual objects. The proposed<br />

method treats subgraphs as features to construct base classifiers, and utilizes the popular error-correcting output code scheme to<br />

solve the multi-class problem. Both factors, the base classifiers and the error-correcting coding matrix, are considered simultaneously.<br />

Subgraphs that are shareable by different classes are used to improve the classification performance. The<br />

experimental results on multi-class object recognition show the effectiveness of the proposed algorithm.<br />



16:00-16:20, Paper TuCT3.2<br />

Level-Set Segmentation of Brain Tumors using a New Hybrid Speed Function<br />

Cho, Wanhyun, Chonnam National Univ.<br />

Park, Jonghyun, Chonnam National Univ.<br />

Park, Soonyoung, Mokpo National Univ.<br />

Kim, Soohyung, Chonnam National Univ.<br />

Kim, Sunworl, Chonnam National Univ.<br />

Ahn, Gukdong, Chonnam National Univ.<br />

Lee, Myungeun, Chonnam National Univ.<br />

Lee, Gueesang, Chonnam National Univ.<br />

This paper presents a new hybrid speed function needed to perform image segmentation within the level-set framework.<br />

This speed function provides a general form that incorporates the alignment term as a part of the driving force for the<br />

proper edge direction of an active contour by using the probability term derived from the region partition scheme and, for<br />

regularization, the geodesic contour term. First, as an external force for active contours we use the Gradient Vector Flow<br />

field. This is computed as the diffusion of gradient vectors of a gray level edge map derived from an image. Second, we<br />

partition the image domain by progressively fitting statistical models to the intensity of each region. Here we adopt two<br />

Gaussian distributions to model the intensity distribution of the inside and outside of the evolving curve partitioning the<br />

image domain. Third, we use the active contour model that has the computation of geodesics or minimal distance curves,<br />

which allows stable boundary detection when the model’s gradients suffer from large variations including gaps or noise.<br />

Finally, we test the accuracy and robustness of the proposed method for various medical images. Experimental results<br />

show that our method can properly segment low contrast, complex images.<br />

16:20-16:40, Paper TuCT3.3<br />

The Impact of Color on Bag-of-Words based Object Recognition<br />

Rojas Vigo, David Augusto, Computer Vision Center Barcelona<br />

Shahbaz Khan, Fahad, Computer Vision Center Barcelona<br />

Van De Weijer, Joost, Computer Vision Center Barcelona<br />

Gevers, Theo, Univ. of Amsterdam<br />

In recent years several works have aimed at exploiting color information in order to improve the bag-of-words based<br />

image representation. There are two stages in which color information can be applied in the bag-of-words framework.<br />

Firstly, feature detection can be improved by choosing highly informative color-based regions. Secondly, feature description,<br />

typically focusing on shape, can be improved with a color description of the local patches. Although both approaches<br />

have been shown to improve results, the combined merits have not yet been analyzed. Therefore, in this paper we investigate<br />

the combined contribution of color to both the feature detection and extraction stages. Experiments performed on two<br />

challenging data sets, namely Flower and Pascal VOC 2009, clearly demonstrate that incorporating color in both feature<br />

detection and extraction significantly improves the overall performance.<br />

16:40-17:00, Paper TuCT3.4<br />

Pyramidal Model for Image Semantic Segmentation<br />

Passino, Giuseppe, Queen Mary, Univ. of London<br />

Patras, Ioannis, Queen Mary, Univ. of London<br />

Izquierdo, Ebroul, Queen Mary, Univ. of London<br />

We present a new hierarchical model applied to the problem of image semantic segmentation, that is, the association of<br />

each pixel in an image with a category label (e.g. tree, cow, building, ...). This problem is usually addressed with a combination<br />

of an appearance-based pixel classification and a pixel context model. In our proposal, the images are initially<br />

over-segmented in dense patches. The proposed pyramidal model naturally embeds the compositional nature of a scene to<br />

achieve a multi-scale contextualisation of patches. This is obtained by imposing an order on the patches aggregation operations<br />

towards the final scene. The nodes of the pyramid (that is, a dendrogram) thus represent patch clusters, or superpatches.<br />

The probabilistic model favours the homogeneous labelling of super-patches that are likely to contain a single<br />

object instance, modelling the uncertainty in identifying such super-patches. The proposed model has several advantages,<br />

including the computational efficiency, as well as the expandability. Initial results place the model in line with other works<br />

in the recent literature.<br />



17:00-17:20, Paper TuCT3.5<br />

Multi-View based Estimation of Human Upper-Body Orientation<br />

Rybok, Lukas, Karlsruhe Inst. of Tech.<br />

Voit, Michael, Fraunhofer Inst. of Optronics<br />

Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />

Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />

The knowledge about the body orientation of humans can improve speed and performance of many service components<br />

of a smart-room. Since many of such components run in parallel, an estimator to acquire this knowledge needs a very low<br />

computational complexity. In this paper we address these two points with a fast and efficient algorithm using the smart-room's<br />

multiple camera output. The estimation is based on silhouette information only and is performed for each camera<br />

view separately. The single view results are fused within a Bayesian filter framework. We evaluate our system on a subset<br />

of videos from the CLEAR 2007 dataset and achieve an average correct classification rate of 87.8%, while the estimation<br />

itself just takes 12 ms when four cameras are used.<br />

TuCT4 Dolmabahçe Hall A<br />

Structural Methods Regular Session<br />

Session chair: Ghosh, Joydeep (Univ. of Texas)<br />

15:40-16:00, Paper TuCT4.1<br />

An Iterative Algorithm for Approximate Median Graph Computation<br />

Ferrer, Miquel, Univ. Pol. De Catalunuya<br />

Bunke, Horst, Univ. of Bern<br />

Recently, the median graph has been shown to be a good choice to obtain a representative of a given set of graphs. It has<br />

been successfully applied to graph-based classification and clustering. In this paper we exploit a theoretical property of<br />

the median, which has not yet been utilized in the past, to derive a new iterative algorithm for approximate median graph<br />

computation. Experiments done using five different graph databases show that the proposed approach yields, in four out<br />

of these five datasets, better medians than two previously existing methods.<br />

16:00-16:20, Paper TuCT4.2<br />

A Supergraph-Based Generative Model<br />

Han, Lin, Univ. of York<br />

Wilson, Richard, Univ. of York<br />

Hancock, Edwin, Univ. of York<br />

This paper describes a method for constructing a generative model for sets of graphs. The method is posed in terms of<br />

learning a supergraph from which the samples can be obtained by edit operations. We construct a probability distribution<br />

for the occurrence of nodes and edges over the supergraph. We use the EM algorithm to learn both the structure of the supergraph<br />

and the correspondences between the nodes of the sample graphs and those of the supergraph, which are treated<br />

as missing data. In the experimental evaluation of the method, we a) prove that our supergraph learning method can lead<br />

to an optimal or suboptimal supergraph, and b) show that our proposed generative model gives good graph classification<br />

results.<br />

16:20-16:40, Paper TuCT4.3<br />

Levelings and Flatzone Morphology<br />

Meyer, Fernand, Mines-ParisTech<br />

Successive levelings are applied on document images. The residues of successive levelings are made of flat zones for<br />

which morphological transforms are described.<br />

16:40-17:00, Paper TuCT4.4<br />

Combining Force Histogram and Discrete Lines to Extract Dashed Lines<br />

Debled-Rennesson, Isabelle, LORIA – Nancy Univ.<br />

Wendling, Laurent, Univ. Paris Descartes<br />



A new method to extract dashed lines in technical documents is proposed in this paper by combining the force histogram and<br />

discrete lines. The aim is to study the spatial location of pairs of connected components using the force histogram and to<br />

refine the recognition by considering surrounding discrete lines. This new model is fast and it allows good extraction of<br />

occluded patterns in the presence of noise. Efficient common methods require several thresholds to process technical<br />

documents. The proposed method requires only a few thresholds, which can be set automatically from the data.<br />

17:00-17:20, Paper TuCT4.5<br />

Heat Flow-Thermodynamic Depth Complexity in Networks<br />

Escolano, Francisco, Univ. of Alicante<br />

Lozano, Miguel Angel, Univ. of Alicante<br />

Hancock, Edwin, Univ. of York<br />

In this paper we establish a formal link between network complexity in terms of Birkhoff-von Neumann decompositions<br />

and heat flow complexity (in terms of quantifying the heat flowing through the network at a given inverse temperature).<br />

Furthermore, we also define heat flow complexity in terms of thermodynamic depth, which results in a novel approach<br />

for characterizing networks and quantify their complexity. In our experiments we characterize several protein-protein interaction<br />

(PPI) networks and then highlight their evolutive differences.<br />

TuCT5 Anadolu Auditorium<br />

Image Analysis – V Regular Session<br />

Session chair: Kasturi, Rangachar (Univ. of South Florida)<br />

15:40-16:00, Paper TuCT5.1<br />

Content Adaptive Hash Lookups for Near-Duplicate Image Search by Full or Partial Image Queries<br />

Harmanci, Oztan, Anvato Inc.<br />

Haritaoglu, Ismail, Pol. Rain Inc.<br />

In this paper we present a scalable and high performance near-duplicate image search method. The proposed algorithm<br />

follows the common paradigm of computing local features around repeatable scale invariant interest points. Unlike existing<br />

methods, much shorter hashes are used (40 bits). By leveraging the shortness of the hashes, a novel high-performance<br />

search algorithm is introduced which analyzes the reliability of each bit of a hash and performs content adaptive hash<br />

lookups by adaptively adjusting the “range” of each hash bit based on reliability. Matched features are post-processed to<br />

determine the final match results. We experimentally show that the algorithm can detect cropped, resized, print-scanned<br />

and re-encoded images and pieces from images among thousands of images. The proposed algorithm can search for a<br />

200x200 image piece in a database of 2,250 images of size 2400x4000 in 0.020 seconds on a 2.5 GHz Intel Core 2.<br />

16:00-16:20, Paper TuCT5.2<br />

The Good, the Bad, and the Ugly: Predicting Aesthetic Image Labels<br />

Wu, Yaowen, RWTH Aachen Univ. Fraunhofer Inst. IAIS<br />

Bauckhage, Christian, Fraunhofer IAIS<br />

Thurau, Christian, Fraunhofer IAIS<br />

Automatic classification of the aesthetic content of a picture is one of the challenges in the emerging discipline of computational<br />

aesthetics. Any suitable solution must cope with the facts that aesthetic experiences are highly subjective and<br />

that a commonly agreed upon theory of their psychological constituents is still missing. In this paper, we present results<br />

obtained from an empirical basis of several thousand images. We train SVM based classifiers to predict aesthetic adjectives<br />

rather than aesthetic scores and we introduce a probabilistic post processing step that alleviates effects due to misleadingly<br />

labeled training data. Extensive experimentation indicates that aesthetics classification is possible to a large extent. In particular,<br />

we find that previously established low-level features are well suited to recognize beauty. Robust recognition of<br />

unseemliness, on the other hand, appears to require more high-level analysis.<br />

16:20-16:40, Paper TuCT5.3<br />

Information Fusion for Combining Visual and Textual Image Retrieval<br />

Zhou, Xin, Geneva Univ. Hospitals and Univ. of Geneva<br />

Depeursinge, Adrien, Geneva Univ. Hospitals and Univ. of Geneva<br />



Müller, Henning, Univ. of Applied Sciences Sierre, Switzerland<br />

In this paper, classical approaches such as the maximum combination (combMAX), the sum combination (combSUM) and<br />

the sum multiplied by the number of nonzero scores (combMNZ) were employed, and the trade-off between two fusion effects<br />

(chorus and dark horse effects) was studied based on the sum of n maximums. Various normalization strategies were<br />

tried out. The fusion algorithms are evaluated using the best four visual and textual runs of the ImageCLEF medical image<br />

retrieval task 2008 and 2009. The results show that fused runs outperform the best original runs and multi-modality fusion<br />

statistically outperforms single modality fusion. The logarithmic rank penalization proves to be the most stable normalization.<br />

The dark horse effect is in competition with the chorus effect and each of them can produce best fusion performance<br />

depending on the nature of the input data.<br />
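
The classical fusion rules named above fit in a few lines; the sketch below uses their usual definitions with toy score dictionaries, and omits the normalization strategies studied in the paper.

    def comb_max(runs):
        docs = set().union(*runs)
        return {d: max(r.get(d, 0.0) for r in runs) for d in docs}

    def comb_sum(runs):
        docs = set().union(*runs)
        return {d: sum(r.get(d, 0.0) for r in runs) for d in docs}

    def comb_mnz(runs):
        # combSUM multiplied by the number of runs that returned a nonzero score.
        docs = set().union(*runs)
        return {d: sum(r.get(d, 0.0) for r in runs)
                   * sum(1 for r in runs if r.get(d, 0.0) > 0) for d in docs}

    runs = [{"img1": 0.9, "img2": 0.2}, {"img1": 0.4, "img3": 0.8}]
    print(comb_mnz(runs))   # img1 benefits from appearing in both runs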

16:40-17:00, Paper TuCT5.4<br />

Perceptual Image Retrieval by Adding Color Information to the Shape Context Descriptor<br />

Rusiñol, Marçal, Univ. Autònoma de Barcelona<br />

Nourbakhsh, Farshad, Computer Vision Center / Univ. Autònoma de Barcelona<br />

Karatzas, Dimosthenis, Univ. Autonoma de Barcelona<br />

Valveny, Ernest, Computer Vision Center / Univ. Autònoma de Barcelona<br />

Llados, Josep, Computer Vision Center<br />

In this paper we present a method for the retrieval of images in terms of perceptual similarity. Local color information is<br />

added to the shape context descriptor in order to obtain an object description integrating both shape and color as visual cues.<br />

We use a color naming algorithm in order to represent the color information from a perceptual point of view. The proposed<br />

method has been tested in two different applications, an object retrieval scenario based on color sketch queries and a color<br />

trademark retrieval problem. Experimental results show that the addition of the color information significantly outperforms<br />

the sole use of the shape context descriptor.<br />

17:00-17:20, Paper TuCT5.5<br />

Weighted Boundary Points for Shape Analysis<br />

Zhang, Jing, Univ. of South Florida<br />

Kasturi, Rangachar, Univ. of South Florida<br />

Shape analysis is an active and important branch of computer vision research. In recent years, many geometrical, topological,<br />

and statistical features have been proposed and widely used for shape-related applications. In this paper, based on<br />

the properties of the Distance Transform, we present a new shape feature, the weight of a boundary point. By computing the shortest<br />

distances between boundary points and distance contours of a transformed shape, every boundary point is assigned a weight,<br />

which contains the interior structure information of the shape. To evaluate the proposed new shape feature, we tested the<br />

weighted boundary points on shape matching and shape decomposition. The experimental results demonstrated the validity of the proposed feature.<br />

TuCT6 Dolmabahçe Hall B<br />

Speech and Speaker Recognition Regular Session<br />

Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />

15:40-16:00, Paper TuCT6.1<br />

Dimension-Decoupled Gaussian Mixture Model for Short Utterance Speaker Recognition<br />

Stadelmann, Thilo, Univ. of Marburg<br />

Freisleben, Bernd, Univ. of Marburg<br />

The Gaussian Mixture Model (GMM) is often used in conjunction with Mel-frequency cepstral coefficient (MFCC) feature<br />

vectors for speaker recognition. A great challenge is to use these techniques in situations where only small sets of training<br />

and evaluation data are available, which typically results in poor statistical estimates and, finally, recognition scores. Based<br />

on the observation of marginal MFCC probability densities, we suggest to greatly reduce the number of free parameters in<br />

the GMM by modeling the single dimensions separately after proper preprocessing. Saving about 90% of the free parameters<br />

as compared to an already optimized GMM and thus making the estimates more stable, this approach considerably improves<br />

recognition accuracy over the baseline as the utterances get shorter and saves a huge amount of computing time both in training<br />

and evaluation, enabling real-time performance. The approach is easy to implement and to combine with other short-utterance<br />

approaches, and applicable to other features as well.<br />
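
A minimal sketch of the dimension-decoupled idea, using scikit-learn and random data in place of real MFCC vectors (the paper's preprocessing and exact parameterization are not reproduced): an independent one-dimensional mixture is fitted per feature dimension and the per-dimension log-likelihoods are summed.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_decoupled(X, components=4):
        """One 1-D mixture per dimension instead of a single joint GMM."""
        return [GaussianMixture(components, covariance_type="diag").fit(X[:, [d]])
                for d in range(X.shape[1])]

    def loglik_decoupled(models, X):
        return sum(m.score_samples(X[:, [d]]) for d, m in enumerate(models))

    rng = np.random.default_rng(0)
    train = rng.standard_normal((500, 13))        # stand-in for a speaker's MFCC frames
    test = rng.standard_normal((50, 13))
    models = fit_decoupled(train)
    print(loglik_decoupled(models, test).shape)   # one log-likelihood per test frame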



16:00-16:20, Paper TuCT6.2<br />

Modeling Syllable-Based Pronunciation Variation for Accented Mandarin Speech Recognition<br />

Zhang, Shilei, IBM Res.<br />

Shi, Qin, IBM Res. – China<br />

Qin, Yong, IBM Res. – China<br />

Pronunciation variation is a natural and inevitable phenomenon in an accented Mandarin speech recognition application.<br />

In this paper, we integrate knowledge-based and data-driven approaches together for syllable-based pronunciation variation<br />

modeling to improve the performance of Mandarin speech recognition system for speakers with Southern accent. First,<br />

we generate the syllable-based pronunciation variation rules of the Southern accent observed in the training corpus, with the help of a Chinese<br />

linguistic expert. Second, the dictionary is augmented with multiple pronunciation variants and pronunciation probabilities<br />

derived from forced-alignment statistics of the training data. The acoustic models are then retrained based on the new expanded<br />

dictionary. Finally, pronunciation variation adaptation is performed to further fit the data at the decoding stage by<br />

taking distribution of variation rules clusters of testing set into account. The experimental results show that the proposed<br />

method provides a flexible framework to improve the recognition performance for accented speech effectively.<br />

16:20-16:40, Paper TuCT6.3<br />

Automatic Pronunciation Transliteration for Chinese-English Mixed Language Keyword Spotting<br />

Zhang, Shilei, IBM Res.<br />

Shuang, Zhiwei, IBM Res. – China<br />

Qin, Yong, IBM Res. – China<br />

This paper presents an automatic pronunciation transliteration method with acoustic and contextual analysis for Chinese-<br />

English mixed-language keyword spotting (KWS) systems. Often, we need to develop robust Chinese-English mixed-<br />

language spoken language technology without Chinese-accented English acoustic data. In this paper, we exploit a pronunciation<br />

conversion method based on syllable-based characteristic analysis of pronunciation and data-driven phoneme pairs<br />

mappings to solve the mixed-language problem by using only well-trained Chinese models. One obvious advantage of such<br />

method is that it provides a flexible framework to implement the pronunciation conversion of English keywords to Chinese<br />

automatically. The efficiency of the proposed method was demonstrated under KWS task on mixed language database.<br />

16:40-17:00, Paper TuCT6.4<br />

Learning Virtual HD Model for Bi-Model Emotional Speaker Recognition<br />

Huang, Ting, Zhejiang Univ.<br />

Yang, Yingchun, Zhejiang Univ.<br />

Pitch mismatch between training and testing is one of the important factors causing performance degradation in speaker recognition systems. In this paper, we adopt the missing feature theory and specify the Unreliable Region (UR) as the parts of the utterance with high emotion-induced pitch variation. To model these regions, a virtual HD (High Different from neutral, with large pitch offset) model for each target speaker is built from virtual speech, which is converted from neutral speech by the Pitch Transformation Algorithm (PTA). In the PTA, a polynomial transformation function is learned to model the relationship between the average pitch of neutral and high-pitched utterances. Compared with the traditional GMM-UBM and our previous method, the new method obtains identification rate (IR) increases of 1.88% and 0.84%, respectively, on MASC, which are promising results.
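The pitch transformation step can be pictured with a minimal sketch along these lines (the degree, data layout, and shifting rule are assumptions, not values from the paper): a polynomial is fitted between neutral and high-pitched average F0 values and then used to shift a neutral contour.

    import numpy as np

    def fit_pitch_transform(neutral_f0_means, high_f0_means, degree=2):
        # both inputs: 1-D arrays of per-utterance average pitch values (Hz)
        return np.polyfit(neutral_f0_means, high_f0_means, deg=degree)

    def transform_pitch(coeffs, neutral_contour):
        # shift a neutral pitch contour so its mean matches the predicted high-pitch mean
        predicted_mean = np.polyval(coeffs, neutral_contour.mean())
        return neutral_contour + (predicted_mean - neutral_contour.mean())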

17:00-17:20, Paper TuCT6.5<br />

Role of Synthetically Generated Samples on Speech Recognition in a Resource-Scarce Language<br />

Chakraborty, Rupayan, St. Thomas’ Coll. of Eng. & Tech.<br />

Garain, Utpal, Indian Statistical Inst.<br />

Speech recognition systems that make use of statistical classifiers require a large number of training samples. However, collecting real samples has always been difficult because of the substantial amount of human intervention and cost involved. Considering this problem, this paper presents a novel method for generating synthetic samples from a handful of real samples and investigates the role of these samples in designing a speech recognition system. Speaker-dependent, limited-vocabulary, isolated-word recognition in an Indian language (Bengali) is taken as a reference to demonstrate the potential of the proposed framework. The role of synthetic samples is demonstrated by a significant improvement in recognition accuracy; a maximum improvement of 10% is achieved using the proposed approach.



TuCT7 Dolmabahçe Hall C<br />

Fingerprint Regular Session<br />

Session chair: Sankur, Bülent (Bogazici Univ.)<br />

15:40-16:00, Paper TuCT7.1<br />

Detecting Altered Fingerprints<br />

Feng, Jianjiang, Tsinghua Univ.<br />

Jain, Anil, Michigan State Univ.<br />

Ross, Arun, West Virginia Univ.<br />

The widespread deployment of Automated Fingerprint Identification Systems (AFIS) in law enforcement and border control applications has prompted some individuals with criminal backgrounds to evade identification by purposely altering their fingerprints. Available fingerprint quality assessment software cannot detect most altered fingerprints, since the implicit image quality does not always degrade due to alteration. In this paper, we classify the alterations observed in an operational database into three categories and propose an algorithm to detect altered fingerprints. Experiments were conducted on both real-world altered fingerprints and synthetically generated altered fingerprints. At a false alarm rate of 7%, the proposed algorithm detected 92% of the altered fingerprints, while well-known fingerprint quality software, NFIQ, detected only 20% of them.

16:00-16:20, Paper TuCT7.2<br />

A Variational Formulation for Fingerprint Orientation Modeling<br />

Hou, Zujun, Inst. For Infocomm Res.<br />

Yau, Wei-Yun, Inst. For Infocomm Res.<br />

Fingerprint orientation plays an important role in fingerprint recognition. This paper proposes a framework for modeling the fingerprint orientation field based on the variational principle. The proposed method does not require any prior information about the structure of the acquired fingerprints. Comparisons have been made with the state of the art in fingerprint orientation modeling.

16:20-16:40, Paper TuCT7.3<br />

Fingerprint Pore Matching based on Sparse Representation<br />

Liu, Feng, The Hong Kong Pol. Univ.<br />

Zhao, Qijun, The Hong Kong Pol. Univ.<br />

Zhang, Lei, The Hong Kong Pol. Univ.<br />

Zhang, David, The Hong Kong Pol. Univ.<br />

This paper proposes an improved direct fingerprint pore matching method. It measures the differences between pores by<br />

using the sparse representation technique. The coarse pore correspondences are then established and weighted based on<br />

the obtained differences. The false correspondences among them are finally removed by using the weighted RANSAC algorithm.<br />

Experimental results have shown that the proposed method can greatly improve the accuracy of existing methods.<br />
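The dissimilarity measure can be sketched, under assumptions, as sparse coding of each query pore descriptor over the descriptors of the other print, with the reconstruction residual taken as the pore-to-pore difference. Descriptor extraction and the weighted RANSAC step are not shown, and scikit-learn's orthogonal matching pursuit stands in for the paper's sparse solver.

    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    def pore_differences(desc_query, desc_dict, n_nonzero=5):
        # desc_query: (d, m) query pore descriptors; desc_dict: (d, n) dictionary descriptors
        codes = orthogonal_mp(desc_dict, desc_query, n_nonzero_coefs=n_nonzero)  # (n, m)
        residuals = desc_query - desc_dict @ codes
        return np.linalg.norm(residuals, axis=0)  # one difference value per query pore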

16:40-17:00, Paper TuCT7.4<br />

Latent Fingerprint Core Point Prediction based on Gaussian Processes<br />

Su, Chang, Univ. at Buffalo, State Univ. of New York<br />

Srihari, Sargur, Univ. at Buffalo, State Univ. of New York<br />

Core point prediction is of critical importance to latent fingerprint individuality assessment. While tremendous effort has been made in core point detection, locating core points in latent fingerprints continues to be a difficult problem, because latent prints usually contain only partial images, with the core points left outside the print. A novel method is proposed that predicts the locations and orientations of core points for latent fingerprints. The method is based on Gaussian processes and provides predictions in terms of probabilities rather than binary decisions. The accuracy of the method is illustrated by experiments on a real-life latent fingerprint data set.
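A hedged sketch of the general idea (feature extraction is a placeholder and scikit-learn is assumed): a Gaussian-process regressor maps features of a partial print to the core-point location and returns an uncertainty alongside the prediction rather than a hard decision.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    def train_core_point_gp(train_features, train_core_xy):
        # train_features: (n, d) print features; train_core_xy: (n, 2) ground-truth core points
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
        return GaussianProcessRegressor(kernel=kernel).fit(train_features, train_core_xy)

    def predict_core_point(gp, latent_features):
        mean, std = gp.predict(latent_features.reshape(1, -1), return_std=True)
        return mean[0], std[0]  # predicted (x, y) and its predictive uncertainty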

17:00-17:20, Paper TuCT7.5<br />

Towards a Better Understanding of the Performance of Latent Fingerprint Recognition in Realistic Forensic Conditions<br />

Puertas, Maria, Univ. Autonoma de Madrid<br />



Ramos, Daniel, Univ. Autonoma de Madrid<br />

Fierrez, Julian, Univ. Autonoma de Madrid<br />

Ortega-Garcia, Javier, Univ. Autonoma de Madrid<br />

Exposito-Marquez, Nicomedes, Departamento de Identificacion, Servicio de Criminalistica de la Guardia Civil, Ministerio del Interior, Spain<br />

This work studies the performance of a state-of-the-art fingerprint recognition technology, in several practical scenarios<br />

of interest in forensic casework. First, the differences in performance between manual and automatic minutiae extraction<br />

for latent fingerprints are presented. Then, automatic minutiae extraction is analyzed using three different types of fingerprints:<br />

latent, rolled and plain. The experiments are carried out using a database of latent finger marks and fingerprint impressions<br />

from real forensic cases. The results show high performance degradation in automatic minutiae extraction<br />

compared to manual extraction by human experts. Moreover, high degradation in performance on latent finger marks can<br />

be observed in comparison to fingerprint impressions.<br />

TuBCT8 Upper Foyer<br />

3D Shape Recovery; Image and Physics-Based Modeling; Motion and Multi-View Vision; Tracking and Surveillance<br />

Poster Session<br />

Session chair: Jiang, Xiaoyi (Univ. of Münster)<br />

13:30-16:30, Paper TuBCT8.1<br />

Online Next-Best-View Planning for Accuracy Optimization using an Extended E-Criterion<br />

Trummer, Michael, Friedrich-Schiller Univ. of Jena<br />

Munkelt, Christoph, Fraunhofer Society<br />

Denzler, Joachim, Friedrich-Schiller Univ. of Jena<br />

Next-best-view (NBV) planning is an important aspect for three-dimensional (3D) reconstruction within controlled environments,<br />

such as a camera mounted on a robotic arm. NBV methods aim at a purposive 3D reconstruction sustaining<br />

predefined goals and limitations. Up to now, literature mainly presents NBV methods for range sensors, model-based approaches<br />

or algorithms that address the reconstruction of a finite set of primitives. For this work, we use an intensity<br />

camera without active illumination. We present a novel combined online approach comprising feature tracking, 3D reconstruction,<br />

and NBV planning that addresses arbitrary unknown objects. In particular we focus on accuracy optimization<br />

based on the reconstruction uncertainty. To this end we introduce an extension of the statistical E-criterion to model directional<br />

uncertainty, and we present a closed-form, optimal solution to this NBV planning problem. Our experimental<br />

evaluation demonstrates the effectivity of our approach using an absolute error measure.<br />

13:30-16:30, Paper TuBCT8.2<br />

Non Contact 3D Measurement Scheme for Transparent Objects using UV Structured Light<br />

Rantoson, Rindra, LE2I<br />

Fofi, David, Le2i UMR CNRS 5158<br />

Stolz, Christophe, LE2I<br />

Meriaudeau, Fabrice, LE2I<br />

This paper introduces a novel 3D measurement scheme based on UV laser triangulation to ascertain the shape of transparent objects. Transparent objects are extremely difficult to scan with traditional 3D scanners because of the refraction observed in the visible range; the object surface therefore normally needs to be powdered before being digitized with commercial scanners. Our approach is a non-contact measurement scheme that deals with the refraction problem in a visible-light environment. The object shape is computed by a classical triangulation method based on the stereovision constraint. The proposed acquisition system is composed of two classical visible-range cameras and a UV laser source. The use of a UV laser in the triangulation system is the novelty of the proposed approach: the fluorescence generated by the UV radiation makes it possible to acquire 3D data of a transparent surface with a classical stereovision scheme.

13:30-16:30, Paper TuBCT8.3<br />

Extending Fast Marching Method under Point Light Source Illumination and Perspective Projection<br />

Iwahori, Yuji, Chubu Univ.<br />

Iwai, Kazuki, Chubu Univ.<br />

Woodham, Robert J., Univ. of British Columbia<br />



Kawanaka, Haruki, Aichi Prefectural Univ.<br />

Fukui, Shinji, Aichi Univ. of Education<br />

Kasugai, Kunio, Aichi Medical Univ.<br />

An endoscope is a medical instrument that acquires images inside the human body. An endoscope carries its own light<br />

source. Classic shape-from-shading can be used to recover the 3-D shape of objects in view. Recent implementations have<br />

used the Fast Marching Method (FMM). Previous FMM approaches recover 3-D shape under assumptions of parallel light<br />

source illumination and orthographic projection. This paper extends the FMM approach to recover the 3-D shape under<br />

more realistic conditions of endoscopy, namely nearby point light source illumination and perspective projection. The new<br />

approach is demonstrated through experiment and is seen to improve performance.<br />

13:30-16:30, Paper TuBCT8.4<br />

Effective Structure-From-Motion for Hybrid Camera Systems<br />

Bastanlar, Yalin, Middle East Tech. Univ.<br />

Temizel, Alptekin, Middle East Tech. Univ.<br />

Yardimci, Yasemin, Middle East Tech. Univ.<br />

Sturm, Peter, INRIA<br />

We describe a pipeline for structure-from-motion with mixed camera types, namely omnidirectional and perspective cameras.<br />

The steps of the pipeline can be summarized as calibration, point matching, pose estimation, triangulation and bundle<br />

adjustment. For these steps, we either propose improved methods or modify existing perspective camera methods to make<br />

the pipeline more effective and automatic when employed for hybrid camera systems.<br />

13:30-16:30, Paper TuBCT8.5<br />

Single View Metrology Along Orthogonal Directions<br />

Peng, Kun, Peking Univ.<br />

Hou, Lulu, Peking Univ.<br />

Ren, Ren, Peking Univ.<br />

Ying, Xianghua, Peking Univ.<br />

Zha, Hongbin, Peking Univ.<br />

In this paper, we describe how 3D metric measurements can be determined from a single uncalibrated image when only minimal geometric information is available in the image, namely the orthogonal vanishing points. Given such limited information, we show that length ratios along the different orthogonal directions can be computed directly. This finding seems to run against common sense: usually, in the calibration process, all edge lengths of a cuboid are assumed known, whereas here the cuboid edge lengths are unknown, yet their ratios can be recovered from the image. 3D metric measurements can then be computed directly from the image using our linear method.

13:30-16:30, Paper TuBCT8.6<br />

Depth Perception Model based on Fixational Eye Movements using Bayesian Statistical Inference<br />

Tagawa, Norio, Tokyo Metropolitan Univ.<br />

Small vibrations of the eyeball, which occur when we fix our gaze on an object, are called "fixational eye movements." It has been reported that such involuntary eye movements also contribute to monocular depth perception. In this study, we focus on "tremor," the smallest type of fixational eye movement, and construct a depth perception model based on tremor using a MAP-EM algorithm. Its effectiveness is confirmed through numerical evaluations using artificial images.

13:30-16:30, Paper TuBCT8.7<br />

One-Shot Scanning using a Color Stripe Pattern<br />

Li, Renju, Peking Univ.<br />

Zha, Hongbin, Peking Univ.<br />

Structured light 3D scanning has many applications, such as 3D modeling, animation, motion analysis and deformation measurement. Traditional structured light methods make use of a sequence of patterns to obtain dense 3D data of objects; however, few methods achieve pixel-wise reconstruction using one pattern only. In this paper, we propose a one-shot scanning system based on a novel stripe pattern. The pattern uses color stripes with a quadratic intensity distribution within each stripe, and the color distribution is based on a De Bruijn sequence with six colors and order three. Graph cut is utilized to decode the color information, and the resulting code is calculated using local intensity. Compared with traditional methods, the proposed method uses one pattern only and achieves pixel-wise reconstruction. Experimental results show that our one-shot scanning system can robustly capture 3D data with high accuracy.
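The stripe coding can be made concrete with a standard De Bruijn construction: with six colors and order three, every window of three consecutive stripes is unique, so a decoded color triple identifies its stripe. The color names below are placeholders.

    def de_bruijn(k, n):
        # standard recursive construction of a De Bruijn sequence B(k, n)
        a = [0] * k * n
        sequence = []

        def db(t, p):
            if t > n:
                if n % p == 0:
                    sequence.extend(a[1:p + 1])
            else:
                a[t] = a[t - p]
                db(t + 1, p)
                for j in range(a[t - p] + 1, k):
                    a[t] = j
                    db(t + 1, t)

        db(1, 1)
        return sequence

    COLORS = ["red", "green", "blue", "cyan", "magenta", "yellow"]
    stripe_colors = [COLORS[c] for c in de_bruijn(6, 3)]  # 6**3 = 216 stripes
    # any three consecutive stripe colors occur exactly once along the pattern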

13:30-16:30, Paper TuBCT8.8<br />

Face Appearance Reconstruction based on a Regional Statistical Craniofacial Model (RSCM)<br />

Yan-Fei, Zhang, Northwest Univ.<br />

Ming-Quan, Zhou, Northwest Univ.<br />

Geng, Guohua, Northwest Univ.<br />

Feng, Jun, Northwest Univ.<br />

The reconstruction of facial soft tissue is an essential processing phase in a number of fields. In this paper, we propose a face appearance reconstruction algorithm based on a Regional Statistical Craniofacial Model (RSCM). Specifically, the shape of the craniofacial model is decomposed into several segments, such as the eye, nose and mouth regions, and the joint statistical models of the different regions are constructed independently to address the small-sample-size problem. The face reconstruction task is formulated as a missing-data problem and is likewise carried out region by region. Finally, the recovered regions are assembled to obtain a complete face model. The experimental results show that the proposed reconstruction scheme achieves a lower error rate than a state-of-the-art method.

13:30-16:30, Paper TuBCT8.9<br />

3D Human Pose Reconstruction using Millions of Exemplars<br />

Jiang, Hao, Boston Coll.<br />

We propose a novel exemplar-based method to estimate 3D human poses from single images by using only the joint correspondences. Due to the inherent depth ambiguity, estimating 3D poses from a monocular view is a challenging problem. We solve it by searching through millions of exemplars for optimal poses. Compared with traditional parametric schemes, our method is able to handle very large pose databases, relieves parameter tweaking, is easier to train and is more effective for reconstructing complex 3D poses. The proposed method estimates upper-body poses and lower-body poses sequentially, which implicitly squares the size of the exemplar database and enables us to reconstruct unconstrained poses efficiently. Our implementation, based on the kd-tree, achieves real-time performance. Experiments on a variety of images show that the proposed method is efficient and effective.
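A minimal sketch of the exemplar lookup only (the paper's sequential upper/lower-body estimation and normalization details are not reproduced; SciPy is assumed): 2D joint configurations are indexed in a kd-tree and the stored 3D pose of the nearest exemplar is returned.

    import numpy as np
    from scipy.spatial import cKDTree

    class ExemplarPoseLookup:
        def __init__(self, joints_2d, poses_3d):
            # joints_2d: (n, j, 2) exemplar joint positions (assumed normalized);
            # poses_3d: (n, j, 3) corresponding 3D poses
            self.tree = cKDTree(joints_2d.reshape(len(joints_2d), -1))
            self.poses_3d = poses_3d

        def query(self, joints_2d_query):
            # return the 3D pose of the exemplar nearest to the query joints
            _, idx = self.tree.query(joints_2d_query.reshape(-1))
            return self.poses_3d[idx]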

13:30-16:30, Paper TuBCT8.10<br />

Recovering 3D Shape using an Improved Fast Marching Method<br />

Zou, Chengming, Wuhan Univ. of Tech.<br />

Hancock, Edwin, Univ. of York<br />

In this paper we present a shape-from-shading method that uses an improved fast marching method. We commence by showing how to recover 3D shape from a single image using the improved fast marching method to solve the SFS problem. Then we use the level set method, constrained by energy minimization, to evolve the 3D shape. Finally, we show that the method can recover stable surface estimates from both synthetic and real-world images of complex objects. The experimental results show that the resulting method is both robust and accurate.

13:30-16:30, Paper TuBCT8.11<br />

The Motion Dynamics Approach to the PnP Problem<br />

Wang, Bo, Chinese Acad. of Sciences<br />

Sun, Fengmei, North China University of Technology<br />

We propose a new motion dynamics approach to the PnP problem, in which a dynamic simulation system is constituted by springs and balls. The equivalence between minimizing the energy of the dynamic system and solving the PnP problem is proved. Under the assumption that resistances exist, the original PnP problem can be solved by simulating the movement of the balls.



13:30-16:30, Paper TuBCT8.12<br />

Eigenbubbles: An Enhanced Apparent BRDF Representation<br />

Kumar, Ritwik, Harvard Univ.<br />

Vemuri, Baba, Univ. of Florida<br />

Banerjee, Arunava, Univ. of Florida<br />

In this paper we address the problem of relighting faces in the presence of cast shadows and specularities. We present a solution that captures the spatially varying Apparent Bidirectional Reflectance Distribution Function (ABRDF) fields of human faces using spline-modulated spherical harmonics and represents them using a few salient spherical functions called eigenbubbles. Through extensive experiments on the Extended Yale B and CMU PIE benchmark datasets, we demonstrate that the proposed method clearly outperforms state-of-the-art techniques in synthesized image quality. Furthermore, we show that our framework allows for ABRDF field compression and can also be used to enhance the performance of face recognition algorithms.

13:30-16:30, Paper TuBCT8.13<br />

Reactive Object Tracking with a Single PTZ Camera<br />

Al Haj, Murad, Univ. Autonoma de Barcelona<br />

Bagdanov, Andrew D., Univ. Autonoma de Barcelona<br />

Gonzalez, Jordi, Centre de Visio per Computador<br />

Roca, F. Xavier, Univ. Autonoma de Barcelona<br />

In this paper we describe a novel approach to reactive tracking of moving targets with a pan-tilt-zoom camera. The approach<br />

uses an extended Kalman filter to jointly track the object position in the real world, its velocity in 3D and the camera intrinsics,<br />

in addition to the rate of change of these parameters. The filter outputs are used as inputs to PID controllers which<br />

continuously adjust the camera motion in order to reactively track the object at a constant image velocity while simultaneously<br />

maintaining a desirable target scale in the image plane. We provide experimental results on simulated and real<br />

tracking sequences to show how our tracker is able to accurately estimate both 3D object position and camera intrinsics<br />

with very high precision over a wide range of focal lengths.<br />
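The control side can be illustrated with a textbook PID loop that converts the horizontal image error of the tracked target into a pan-speed command; the gains and units below are placeholders, not values from the paper.

    class PID:
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = None

        def update(self, error, dt):
            # error: target offset from the image centre; dt: time step in seconds
            self.integral += error * dt
            derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    # pan_pid = PID(kp=0.5, ki=0.05, kd=0.1)
    # pan_speed = pan_pid.update(target_x - image_center_x, dt=1 / 25.0)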

13:30-16:30, Paper TuBCT8.14<br />

An Experimental Study of Image Components and Data Metrics for Illumination-Robust Variational Optical Flow<br />

Chetverikov, Dmitry, MTA SZTAKI<br />

Molnar, Jozsef, ELTE<br />

Illumination-robust optical flow algorithms are needed in numerous machine vision applications such as vision-based intelligent<br />

vehicles, surveillance and traffic monitoring. Recently, we have proposed an implicit nonlinear scheme for variational<br />

optical flow that assumes no particular analytical form of energy functional and can accommodate various image<br />

components and data metrics. Using test data with brightness and colour illumination changes, we study different features<br />

and metrics and demonstrate that cross-correlation is superior to the L1 metric for all combinations of the features.<br />
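The two data metrics being compared can be illustrated at the patch level (a stand-in for, not a reproduction of, the variational formulation): the L1 difference reacts to brightness changes, while zero-mean normalized cross-correlation largely cancels them.

    import numpy as np

    def l1_metric(patch_a, patch_b):
        return np.abs(patch_a - patch_b).mean()

    def ncc_metric(patch_a, patch_b):
        a = patch_a - patch_a.mean()
        b = patch_b - patch_b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
        return 1.0 - (a * b).sum() / denom  # 0 when patches match up to gain and offset

    # A brightened copy of a patch keeps a near-zero NCC cost but a large L1 cost:
    # patch = np.random.rand(9, 9); bright = 1.5 * patch + 0.2
    # print(l1_metric(patch, bright), ncc_metric(patch, bright))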

13:30-16:30, Paper TuBCT8.15<br />

Multiple Human Tracking based on Multi-View Upper-Body Detection and Discriminative Learning<br />

Xing, Junliang, Tsinghua Univ.<br />

Ai, Haizhou, Tsinghua Univ. China<br />

Lao, Shihong, OMRON Corp.<br />

This paper focuses on the problem of tracking multiple humans in dense environments which is very challenging due to<br />

recurring occlusions between different humans. To cope with the difficulties it presents, an offline boosted multi-view<br />

upper-body detector is used to automatically initialize a new human trajectory and is capable of dealing with partial human<br />

occlusions. What is more, an online learning process is proposed to learn discriminative human observations, including<br />

discriminative interest points and color patches, to effectively track each human when even more occlusions occur. The<br />

offline and online observation models are neatly integrated into the particle filter framework to robustly track multiple<br />

highly interactive humans. Experimental results on the CAVIAR dataset as well as many other challenging real-world cases<br />

demonstrate the effectiveness of the proposed method.<br />



13:30-16:30, Paper TuBCT8.16<br />

Visual Tracking using Sparsity Induced Similarity<br />

Liu, Huaping, Tsinghua Univ.<br />

Sun, Fuchun, Tsinghua Univ.<br />

Sparse signal reconstruction has recently gained considerable interest and is applied in many fields. In this paper, we propose a new approach that utilizes the sparsity-induced similarity to construct a tracking algorithm. Compared with the state of the art, the advantage of this approach is that the sparse representation needs to be calculated only once, so the time cost is dramatically decreased. In addition, extensive experimental comparisons show that the proposed approach is more robust than some existing approaches.

13:30-16:30, Paper TuBCT8.17<br />

An Information Fusion Approach for Multiview Feature Tracking<br />

Ataer-Cansizoglu, Esra, Boston Univ.<br />

Betke, Margrit, Boston Univ.<br />

We propose an information fusion approach to tracking objects from different viewpoints that can detect and recover from<br />

tracking failures. We introduce a reliability measure that is a combination of terms associated with correlation-based template<br />

matching and the epipolar geometry of the cameras. The measure is computed to evaluate the performance of 2D<br />

trackers in each camera view and detect tracking failures. The 3D object trajectory is constructed using stereoscopy and<br />

evaluated to predict the next 3D position of the object. In case of track loss in one camera view, the projection of the predicted<br />

3D position onto the image plane of this view is used to reinitialize the lost 2D tracker. We conducted experiments<br />

with 34 subjects to evaluate our proposed system on videos of facial feature movements during human-computer interaction.<br />

The system successfully detected feature loss and gave promising results on accurate re-initialization of the feature.<br />

13:30-16:30, Paper TuBCT8.18<br />

Monocular 3D Tracking of Deformable Surfaces using Linear Programming<br />

Chenhao, Wang, Shanghai Jiao Tong Univ.<br />

Li, Xiong, Shanghai Jiao Tong Univ.<br />

Liu, Yuncai, Shanghai Jiao Tong Univ.<br />

We present a method for 3D shape reconstruction of inextensible deformable surfaces from monocular image sequences. The key to our approach is to represent the surface as a 3D triangulated mesh and to formulate the reconstruction problem as a sequence of Linear Programming (LP) problems that can be solved effectively. Each LP problem consists of data constraints, which are 3D-to-2D keypoint correspondences, and shape constraints, which prevent large changes of edge orientation between consecutive frames. Furthermore, we use a refined bisection algorithm to accelerate the computation. The robustness and efficiency of our approach are validated on both synthetic and real data.

13:30-16:30, Paper TuBCT8.19<br />

Exploiting Visual Quasi-Periodicity for Automated Chewing Event Detection using Active Appearance Models and<br />

Support Vector Machines<br />

Cadavid, Steven, Univ. of Miami<br />

Abdel-Mottaleb, Mohamed, Univ. of Miami<br />

We present a method that automatically detects chewing events in surveillance video of a subject. Firstly, an Active Appearance<br />

Model (AAM) is used to track a subject’s face across the video sequence. It is observed that the variations in the<br />

AAM parameters across chewing events demonstrate a distinct periodicity. We utilize this property to discriminate between<br />

chewing and non-chewing facial actions such as talking. A feature representation is constructed by applying spectral analysis<br />

to a temporal window of model parameter values. The estimated power spectra subsequently undergo non-linear dimensionality<br />

reduction via spectral regression. The low-dimensional representations of the power spectra are employed<br />

to train a Support Vector Machine (SVM) binary classifier to detect chewing events. Experimental results yielded a cross<br />

validated percentage agreement of 93.4%, indicating that the proposed system provides an efficient approach to automated<br />

chewing detection.<br />
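A simplified sketch of the feature pipeline (the spectral-regression dimensionality reduction is omitted, and NumPy/scikit-learn are assumed): a temporal window of an AAM parameter is turned into a normalized power spectrum, and the spectra are fed to an SVM.

    import numpy as np
    from sklearn.svm import SVC

    def power_spectrum(window):
        # window: 1-D array of one AAM parameter over consecutive frames
        spectrum = np.abs(np.fft.rfft(window - window.mean())) ** 2
        return spectrum / (spectrum.sum() + 1e-12)  # normalized power spectrum

    # Hypothetical training data: windows labeled 1 = chewing, 0 = other.
    # X = np.array([power_spectrum(w) for w in training_windows]); y = np.array(labels)
    # clf = SVC(kernel="rbf").fit(X, y)
    # is_chewing = clf.predict(power_spectrum(test_window).reshape(1, -1))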



13:30-16:30, Paper TuBCT8.20<br />

Slip and Fall Events Detection by Analyzing the Integrated Spatiotemporal Energy Map<br />

Huang, Chung-Lin, National Tsing-Hua Univ.<br />

Liao, Tim, National Tsing-Hua Univ.<br />

This paper presents a new method to detect slip and fall events by analyzing the integrated spatiotemporal energy (ISTE) map. The ISTE map encodes both motion and the time of motion occurrence as our motion feature. The extracted human shape is represented by an ellipse, which provides crucial information about human motion activities. We use these features to detect the events in video with a non-fixed frame rate. This work assumes that the person lies on the ground with very little motion after the fall. Experimental results show that our method is effective for fall and slip detection.

13:30-16:30, Paper TuBCT8.21<br />

Color Constancy using Standard Deviation of Color Channels<br />

Choudhury, Anustup, Univ. of Southern California<br />

Medioni, Gerard, Univ. of Southern California<br />

We address the problem of color constancy and propose a new method to achieve it based on the statistics of images with a color cast. In such images, the standard deviation of one color channel differs significantly from that of the other channels. This observation also applies to local patches, and the ratio of the maximum and minimum standard deviations of the color channels within local patches is used as a prior to select a pixel color as the illumination color. We provide extensive validation of our method on commonly used datasets containing images under varying illumination conditions, and show that it is robust to the choice of dataset and at least as good as current state-of-the-art color constancy approaches.
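One rough, hedged reading of the cue described above (the selection rule here, taking the brightest pixel of the most color-cast patch, is an assumption): patches whose channel standard deviations are most unbalanced suggest a color cast and supply the illuminant estimate, followed by a simple von Kries style correction.

    import numpy as np

    def estimate_illuminant(image, patch=32):
        # image: (h, w, 3) uint8; scan patches and keep the one with the largest std ratio
        h, w, _ = image.shape
        best_ratio, best_pixel = -1.0, None
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                p = image[y:y + patch, x:x + patch].reshape(-1, 3).astype(float)
                stds = p.std(axis=0) + 1e-6
                if stds.max() / stds.min() > best_ratio:
                    best_ratio = stds.max() / stds.min()
                    best_pixel = p[p.sum(axis=1).argmax()]  # brightest pixel of that patch
        return best_pixel

    def correct(image, illuminant):
        gains = illuminant.mean() / illuminant  # von Kries-style per-channel gains
        return np.clip(image.astype(float) * gains, 0, 255).astype(np.uint8)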

13:30-16:30, Paper TuBCT8.22<br />

Recognizing Human Actions using Key Poses<br />

Baysal, Sermetcan, Bilkent Univ.<br />

Kurt, Mehmet Can, Bilkent Univ.<br />

Duygulu, Pinar, Bilkent Univ.<br />

In this paper, we explore the idea of using only pose, without utilizing any temporal information, for human action recognition.<br />

In contrast to the other studies using complex action representations, we propose a simple method, which relies on<br />

extracting key poses from action sequences. Our contribution is two-fold. Firstly, representing the pose in a frame as a<br />

collection of line-pairs, we propose a matching scheme between two frames to compute their similarity. Secondly, to<br />

extract key poses for each action, we present an algorithm, which selects the most representative and discriminative poses<br />

from a set of candidates. Our experimental results on KTH and Weizmann datasets have shown that pose information by<br />

itself is quite effective in grasping the nature of an action and sufficient to distinguish one from others.<br />

13:30-16:30, Paper TuBCT8.23<br />

Action Recognition using Three-Way Cross Correlations Feature of Local Motion Attributes<br />

Matsukawa, Tetsu, Univ. of Tsukuba<br />

Kurita, Takio, National Inst. of Advanced Industrial Science and Technology<br />

This paper proposes a spatio-temporal feature using three-way cross-correlations of local motion attributes for action recognition. Recently, the cubic higher-order local auto-correlation (CHLAC) feature has shown high classification performance for action recognition. In previous research, the CHLAC feature was applied to binary motion image sequences that indicate moving or static points. However, a binary motion image loses information about the type of motion, such as the timing of change or the motion direction. We can therefore further improve the classification accuracy by extending CHLAC to multivalued motion image sequences that take several types of local motion attributes into account. The proposed method can also be viewed as an extension of the popular bag-of-features approach. Experimental results using two datasets show that the proposed method outperforms the CHLAC feature and the bag-of-features approach.



13:30-16:30, Paper TuBCT8.24<br />

Discriminative Level Set for Contour Tracking<br />

Li, Wei, Chinese Acad. of Sciences<br />

Conventional contour tracking algorithms with level set often use generative models to construct the energy function. For<br />

tracking through cluttered and noisy background, however, a generative model may not be discriminative enough. In this<br />

paper we integrate the discriminative methods into a level set framework when constructing the level set energy function.<br />

We train a set of weak classifiers to distinguish the object from the background. Each weak classifier is designed to select<br />

the most discriminative feature space and integrated via AdaBoost according to their training errors. We also introduce a<br />

novel interaction term to explore the correlation between pixels near the object edge. This term together with the discriminative<br />

model both enhance the discriminative power of the level set. The experimental results show that the contour<br />

tracked by our approach is more accurate than the conventional algorithms with the generative model. Our algorithm successfully<br />

tracks the object contour even in a cluttered environment.<br />

13:30-16:30, Paper TuBCT8.25<br />

Tracking Objects with Adaptive Feature Patches for PTZ Camera Visual Surveillance<br />

Xie, Yi, Beijing Inst. of Tech.<br />

Lin, Liang, Lotushill Inc<br />

Jia, Yunde, Beijing Inst. of Tech.<br />

Compared to traditional tracking with fixed cameras, PTZ-camera-based tracking is more challenging due to (i) the lack of reliable background modeling and subtraction, and (ii) sudden and drastic changes in the appearance and scale of the target. To tackle these problems, this paper proposes a novel tracking algorithm using patch-based object models and demonstrates its advantages with a PTZ camera in a visual surveillance application. In our method, the target model is learned and represented by a set of feature patches whose discriminative power is higher than others. The target model is matched and evaluated by both appearance and motion consistency measurements. The homography between frames is also calculated for scale adaptation. Experiments on several surveillance videos show that our method outperforms state-of-the-art approaches.

13:30-16:30, Paper TuBCT8.26<br />

Counting Moving People in Videos by Salient Points Detection<br />

Conte, Donatello, Univ. di Salerno<br />

Foggia, Pasquale, Univ. di Salerno<br />

Percannella, Gennaro, Univ. di Salerno<br />

Tufano, Francesco, Univ. degli Studi di Salerno<br />

Vento, Mario, Univ. degli Studi di Salerno<br />

This paper presents a novel method to count people for video surveillance applications. The problem is faced by establishing<br />

a mapping between some scene features and the number of people. Moreover, the proposed technique takes specifically<br />

into account problems due to perspective. In the experimental evaluation, the method has been compared with respect to<br />

the algorithm by Albiol et al., which provided the highest performance at the PETS 2009 contest on people counting, using<br />

the same datasets. The results confirm that the proposed method improves the accuracy, while retaining the robustness of<br />

Albiol’s algorithm.<br />

13:30-16:30, Paper TuBCT8.27<br />

Visualization of Customer Flow in an Office Complex over a Long Period<br />

Onishi, Masaki, National Inst. of Advanced Industrial Science and Technology<br />

Yoda, Ikushi, National Inst. of Advanced Industrial Science and Technology<br />

In facility management, analysis of customer trajectories in office complexes is considered critical. In this paper, we<br />

propose a novel approach for the visualization of customer flow in an office complex over a long period of time. We expressed<br />

the variation in the trajectories with respect to time by using a mixture model; this was used for the visualization<br />

of the trajectory flows. The effectiveness of our approach was evaluated from the results of the customer flow analysis experiments<br />

that were conducted in an office complex.<br />



13:30-16:30, Paper TuBCT8.28<br />

Incremental MPCA for Color Object Tracking<br />

Wang, Dong, Department of Electronic Engineering<br />

Lu, Hu-Chuan, Dalian Univ. of Tech.<br />

Chen, Yen-Wei, Ritsumeikan Univ.<br />

The task of visual tracking is to deal with dynamic image streams that change over time. For color object tracking, although a color object is in essence a 3rd-order tensor, little attention has been paid to this attribute. In this paper, we propose a novel Incremental Multiple Principal Component Analysis (IMPCA) method for online learning of dynamic tensor streams. When a newly added tensor set arrives, the mean tensor and the covariance matrices of the different modes can be updated easily, and the projection matrices can then be effectively calculated from the covariance matrices. Finally, we apply our IMPCA method to color object tracking within a Bayesian inference framework. Experiments are performed on several challenging public video sequences as well as our own. The experimental results demonstrate that the proposed method achieves good performance.

13:30-16:30, Paper TuBCT8.29<br />

Epipolar-Based Stereo Tracking without Explicit 3D Reconstruction<br />

Gaschler, Andre Karlheinz, Tech. Univ. München<br />

Burschka, Darius, Tech. Univ. München<br />

Hager, Gregory<br />

We present a general framework for tracking image regions in two views simultaneously based on sum-of-squared differences<br />

(SSD) minimization. Our method allows for motion models up to affine transformations. Contrary to earlier approaches, we<br />

incorporate the well-known epipolar constraints directly into the SSD optimization process. Since the epipolar geometry can<br />

be computed from the image directly, no prior calibration is necessary. Our algorithm has been tested in different applications<br />

including camera localization, wide-baseline stereo, object tracking and medical imaging. We show experimental results on<br />

robustness and accuracy compared to the known ground truth given by a conventional tracking device.<br />

13:30-16:30, Paper TuBCT8.30<br />

Human Body Parts Tracking using Sequential Markov Random Fields<br />

Cao, Xiao-Qin, City Univ. of Hong Kong<br />

Zeng, Jia, Soochow University<br />

Liu, Zhi-Qiang, City Univ. of Hong Kong<br />

Automatically tracking human body parts is a difficult problem because of background clutter, missing body parts, and the high degrees of freedom and complex kinematics of the articulated human body. This paper presents sequential Markov random fields (SMRFs) for automatically tracking and labeling moving human body parts by learning the spatio-temporal structure of human motions in the presence of occlusions and clutter. We employ a hybrid strategy, where the temporal dependencies between two successive human poses are described by the sequential Monte Carlo method, and the spatial relationships between body parts within a pose are described by Markov random fields. Efficient inference and learning algorithms are developed based on relaxation labeling. Experimental results show that the SMRF can effectively track human body parts in natural scenes.

13:30-16:30, Paper TuBCT8.31<br />

Action Recognition in Videos using Nonnegative Tensor Factorization<br />

Krausz, Barbara, Fraunhofer IAIS<br />

Bauckhage, Christian, Fraunhofer IAIS<br />

Recognizing human actions is of vital interest in video surveillance and ambient assisted living. We consider an action as a sequence of body poses, which are themselves linear combinations of body parts. In an offline procedure, nonnegative tensor factorization is used to extract basis images that represent body parts. The weighting coefficients are obtained by filtering a frame with the set of basis images. Since the basis images are obtained from nonnegative tensor factorization, they are separable and filtering can be implemented efficiently. The weighting coefficients encode dynamics and are used for action recognition. In the proposed action recognition framework, neither explicit detection and tracking of humans nor background subtraction is needed. Furthermore, for recognizing location-specific actions, we implicitly take scene objects into account.



13:30-16:30, Paper TuBCT8.32<br />

Action Detection in Crowded Videos using Masks<br />

Guo, Ping, Beijing Jiaotong Univ.<br />

Miao, Zhenjiang, Beijing Jiaotong Univ.<br />

In this paper, we investigate the task of human action detection in crowded videos. Different from action analysis in clean<br />

scenes, action detection in crowded environments is difficult due to the cluttered backgrounds, high densities of people<br />

and partial occlusions. This paper proposes a method for action detection based on masks. No human segmentation or<br />

tracking technique is required. To cope with the cluttered and crowded backgrounds, shape and motion templates are built<br />

and the shape templates are used as masks for feature refining. In order to handle the partial occlusion problem, only the<br />

moving body parts in each motion are involved in action training. Experiments using our approach are conducted on the<br />

CMU dataset with encouraging results.<br />

13:30-16:30, Paper TuBCT8.33<br />

3D Model based Vehicle Tracking using Gradient based Fitness Evaluation under Particle Filter Framework<br />

Zhang, Zhaoxiang, Beihang Univ.<br />

Huang, Kaiqi, Chinese Academy of Sciences<br />

Tan, Tieniu, Chinese Academy of Sciences<br />

Wang, Yunhong, Beihang Univ.<br />

We address the problem of 3D model based vehicle tracking from monocular videos of calibrated traffic scenes. A 3D<br />

wire-frame model is set up as prior information and an efficient fitness evaluation method based on image gradients is introduced<br />

to estimate the fitness score between the projection of vehicle model and image data, which is then combined<br />

into a particle filter based framework for robust vehicle tracking. Numerous experiments are conducted and experimental<br />

results demonstrate the effectiveness of our approach for accurate vehicle tracking and robustness to noise and occlusions.<br />

13:30-16:30, Paper TuBCT8.34<br />

Recovering 3D Shape and Light Source Positions from Non-Planar Shadows<br />

Yamashita, Yukihiro, Nagoya Inst. of Tech.<br />

Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />

Sato, Jun, Nagoya Inst. of Tech.<br />

Recently, Shadow Graph has been proposed for recovering 3D shapes from shadows projected on curved surfaces. Unfortunately,<br />

this method requires a large computational cost. In this paper, we introduce 1D Shadow Graph which can be<br />

used for recovering 3D shapes with much smaller computational costs. We also extend our method, so that we can estimate<br />

both 3D shapes and light source positions simultaneously under a condition where 3D shapes and light sources are unknown.<br />

13:30-16:30, Paper TuBCT8.35<br />

3D Contour Model Creation for Stereo-Vision Systems<br />

Maruyama, Kenichi, National Inst. of Advanced Industrial Science and Tech.<br />

Kawai, Yoshihiro, National Inst. of Advanced Industrial Science and Tech.<br />

Tomita, Fumiaki, National Inst. of Advanced Industrial Science and Tech.<br />

The present paper describes a method for automatic 3D contour model creation for stereo-vision systems. The object<br />

model is a triangular surface mesh and a set of aspect models, which consists of model features and model points. Model<br />

features and model points are generated using 3D contours, which are estimated by the projected images of the triangular<br />

surface mesh from multiple discrete viewing directions. Using a non-photorealistic rendering approach, we extract not<br />

only the outer contours but also the inner contours of the projected images. Using both the inner and outer contours of the<br />

projected images, we create the object model which has 3D inner contour features and 3D contour generator features. Experimental<br />

results obtained using the 3D localization algorithm demonstrate the effectiveness of the proposed model.<br />



13:30-16:30, Paper TuBCT8.36<br />

Multibody Motion Classification using the Geometry of 6 Points in 2D Images<br />

Nordberg, Klas, Linköping Univ.<br />

Zografos, Vasileios, Linkoping Univ.<br />

We propose a method for segmenting an arbitrary number of moving objects using the geometry of 6 points in 2D images to<br />

infer motion consistency. This geometry allows us to determine whether or not observations of 6 points over several frames<br />

are consistent with a rigid 3D motion. The matching between observations of the 6 points and an estimated model of their<br />

configuration in 3D space is quantified in terms of a geometric error derived from distances between the points and 6 corresponding<br />

lines in the image. This leads to a simple motion inconsistency score that is derived from the geometric errors of<br />

6 points, that in the ideal case should be zero when the motion of the points can be explained by a rigid 3D motion. Initial<br />

clusters are determined in the spatial domain and merged in motion trajectory domain based on the score. Each point is then<br />

assigned to a cluster by assigning the point to the segment of the lowest score. Our algorithm has been tested with real image<br />

sequences from the Hopkins155 database with very good results, competing with state-of-the-art methods, particularly for degenerate motion sequences. In contrast to motion segmentation methods based on multi-body factorization, which assume an affine camera model, the proposed method allows the mapping from 3D space to the 2D image to be fully projective.

13:30-16:30, Paper TuBCT8.37<br />

Reflection Removal in Colour Videos<br />

Conte, Donatello, Univ. di Salerno<br />

Foggia, Pasquale, Univ. di Salerno<br />

Percannella, Gennaro, Univ. di Salerno<br />

Tufano, Francesco, Univ. degli Studi di Salerno<br />

Vento, Mario, Univ. degli Studi di Salerno<br />

This paper presents a novel method for reflection removal in the context of an object detection system. The method is based<br />

on chromatic properties of the reflections and does not require a geometric model of the objects. An experimental evaluation<br />

of the proposed method has been performed on a large database, showing its effectiveness.<br />

13:30-16:30, Paper TuBCT8.38<br />

A Compound MRF Texture Model<br />

Haindl, Michael, Inst. of Information Theory and Aut.<br />

Havlicek, Vojtech, Inst. of Information Theory and Aut.<br />

This paper describes a novel compound Markov random field model capable of realistic modelling of multispectral bidirectional<br />

texture function, which is currently the most advanced representation of visual properties of surface materials. The<br />

proposed compound Markov random field model combines a non-parametric control random field with an analytically solvable wide-sense Markov representation for the single regions, and thus makes it possible to avoid demanding Markov chain Monte Carlo methods for both parameter estimation and compound random field synthesis.

13:30-16:30, Paper TuBCT8.39<br />

Shape Prototype Signatures for Action Recognition<br />

Donoser, Michael, Graz Univ. of Tech.<br />

Riemenschneider, Hayko, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

Recognizing human actions in video sequences is frequently based on analyzing the shape of the human silhouette as the<br />

main feature. In this paper we introduce a method for recognizing different actions by comparing signatures of similarities<br />

to pre-defined shape prototypes. In training, we build a vocabulary of shape prototypes by clustering a training set of human<br />

silhouettes and calculate prototype similarity signatures for all training videos. During testing a prototype signature is calculated<br />

for the test video and is aligned to each training signature by dynamic time warping. A simple voting scheme over<br />

the similarities to the training videos provides action classification results and temporal alignments to the training videos.<br />

Experimental evaluation on a reference data set demonstrates that state-of-the-art results are achieved.<br />
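The alignment step can be sketched with a compact dynamic time warping routine over two prototype-similarity signatures (each frame is a vector of similarities to the shape prototypes); the nearest-neighbour classification mentioned in the closing comment is a simplification of the paper's voting scheme.

    import numpy as np

    def dtw_distance(sig_a, sig_b):
        # sig_a: (m, k), sig_b: (n, k) prototype-similarity signatures
        m, n = len(sig_a), len(sig_b)
        cost = np.full((m + 1, n + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                d = np.linalg.norm(sig_a[i - 1] - sig_b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[m, n]

    # A test video could then be labeled with the action of the training signature
    # having the smallest DTW distance (a nearest-neighbour stand-in for voting).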



13:30-16:30, Paper TuBCT8.40<br />

Shape Guided Maximally Stable Extremal Region (MSER) Tracking<br />

Donoser, Michael, Graz Univ. of Tech.<br />

Riemenschneider, Hayko, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

Maximally Stable Extremal Regions (MSERs) are one of the most prominent interest region detectors in computer vision<br />

due to their powerful properties and low computational demands. In general MSERs are detected in single images, but given<br />

image sequences as input, the repeatability of MSER detection can be improved by exploiting correspondences between<br />

subsequent frames by feature based analysis. Such an approach fails during fast movements, in heavily cluttered scenes and<br />

in images containing several similar sized regions because of the simple feature based analysis. In this paper we propose an<br />

extension of MSER tracking by considering shape similarity as strong cue for defining the frame-to-frame correspondences.<br />

Efficient calculation of shape similarity scores ensures that real-time capability is maintained. Experimental evaluation<br />

demonstrates improved repeatability and an application for tracking weakly textured, planar objects.<br />

13:30-16:30, Paper TuBCT8.41<br />

Locating People in Images by Optimal Cue Integration<br />

Atienza-Vanacloig, Vicente, Pol. Univ. of Valencia<br />

Rosell Ortega, Juan, Pol. Univ. of Valencia<br />

Andreu-Garcia, Gabriela, Pol. Univ. of Valencia<br />

Valiente, Jose Miguel, Pol. Univ. of Valencia<br />

This paper describes an approach to segmenting and locating people in crowded scenarios, with application to a surveillance system for airport dependencies. To obtain robust operation, the system analyzes a variety of visual cues (color, motion and shape) and integrates them optimally. A general method for automatic inference of optimal cue integration rules is presented. This schema, based on supervised training on video sequences, avoids the need to explicitly formulate combination rules based on a-priori constraints. The performance of the system is at least as good as that of classical fusion strategies such as voting, because the optimized decision engine implicitly includes these and other strategies.

13:30-16:30, Paper TuBCT8.42<br />

Visual Tracking Algorithm using Pixel-Pair Feature<br />

Nishida, Kenji, National Inst. of Advanced Industrial Science and Tech.<br />

Kurita, Takio, National Inst. of Advanced Industrial Science and Tech.<br />

Ogiuchi, Yasuo, Sumitomo Electric Industries Ltd.<br />

Higashikubo, Masakatsu, Sumitomo Electric Industries Ltd.<br />

A novel visual tracking algorithm is proposed in this paper. The algorithm uses pixel-pair features to discriminate between an image patch with the object in the correct position and image patches with the object in an incorrect position. The pixel-pair feature is robust to illumination change, and it is also robust to partial occlusion when appropriate features are selected in every video frame. The tracking precision for a deforming object (a skier) is examined, and an occlusion detection method is also described.
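A minimal illustration of a pixel-pair feature, with the sampling and scoring details assumed rather than taken from the paper: each feature compares the intensities at two fixed offsets inside the candidate patch, so it depends only on their ordering and is insensitive to global brightness changes.

    import numpy as np

    def sample_pairs(patch_h, patch_w, n_pairs, rng=np.random.default_rng(0)):
        # random (row, col) locations for both pixels of every pair
        return rng.integers(0, [patch_h, patch_w], size=(n_pairs, 2, 2))

    def pixel_pair_features(patch, pairs):
        a = patch[pairs[:, 0, 0], pairs[:, 0, 1]].astype(int)
        b = patch[pairs[:, 1, 0], pairs[:, 1, 1]].astype(int)
        return (a > b).astype(np.uint8)  # one binary value per pair

    # A candidate patch can be scored by how many pair comparisons agree with those
    # of the reference (correct-position) patch from the previous frame.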

13:30-16:30, Paper TuBCT8.43<br />

Self-Calibration of Radially Symmetric Distortion by Model Selection<br />

Fujiki, Jun, National Inst. of Advanced Industrial Science and Tech.<br />

Hino, Hideitsu, Waseda Univ.<br />

Usami, Yumi, Waseda Univ.<br />

Akaho, Shotaro, National Inst. of Advanced Industrial Science and Tech.<br />

Murata, Noboru, Waseda Univ.<br />

For self-calibration of general radially symmetric distortion (RSD) in omnidirectional cameras such as fish-eye lenses, the calibration parameters are usually estimated so that curved lines, which are supposed to be straight in the real world, are mapped to straight lines in the calibrated image, which is assumed to be taken by an ideal pin-hole camera. In this paper, a method for calibrating RSD is introduced based on the notion of principal component analysis (PCA). In the proposed method, the distortion function, which maps a distorted image to an ideal pin-hole camera image, is assumed to be a linear combination of a certain class of basis functions, and an algorithm for solving for its coefficients using line patterns is given. A method for selecting good basis functions is then proposed, which aims to realize appropriate calibration in practice. Experimental results on synthetic data and real images are presented to demonstrate the performance of our calibration method.

13:30-16:30, Paper TuBCT8.44<br />

A Global Spatio-Temporal Representation for Action Recognition<br />

Deng, Chao, Tianjin Univ.<br />

Cao, Xiaochun, Tianjin Univ.<br />

Liu, Hanyu, Univ. of Southern Mississippi<br />

Chen, Jian, Univ. of Southern Mississippi<br />

In this paper we introduce an effective method to construct a global spatio-temporal representation for action recognition.<br />

This representation is inspired by the fact that human actions can be treated as 3D shapes induced by the silhouettes in the<br />

space-time volume. We estimate the silhouettes which contain detailed shape information of the action, and present an<br />

efficient sampling method to extract interest points along the silhouettes. The local interest point is represented by a spatiotemporal<br />

descriptor based on 2D DAISY. Our global space-time representation is the integration of these local descriptors<br />

in an order along the silhouette. In this manner, we not only utilize the static shape information, but also the spatial-temporal<br />

cue. We have obtained impressive results on publicly available action datasets.<br />

13:30-16:30, Paper TuBCT8.45<br />

Super-Resolution Texture Mapping from Multiple View Images<br />

Iiyama, Masaaki, Kyoto Univ.<br />

Kakusho, Koh, Kwansei Gakuin Univ.<br />

Minoh, Michihiko, Kyoto Univ.<br />

This paper presents artifact-free super-resolution texture mapping from multiple-view images. The multiple-view images are upscaled with a learning-based super-resolution technique and are mapped onto a 3D mesh model. However, mapping multiple-view images onto a 3D model is not an easy task, because artifacts may appear when different upscaled images are mapped onto neighboring meshes. We define a cost function that becomes large when artifacts appear on neighboring meshes, and our method seeks the image-and-mesh assignment that minimizes the cost function. Experimental results with real images demonstrate the effectiveness of our method.

13:30-16:30, Paper TuBCT8.46<br />

Automatic Weak Calibration of Master-Slave Surveillance System based on Mosaic Image<br />

Li, You, Shanghai Jiao Tong University<br />

Song, Li, Shanghai Jiao Tong University<br />

Wang, Jia, Shanghai Jiao Tong University<br />

A master-slave camera surveillance system is composed of one (or more) wide-FOV (field of view) static camera and one (or more) dynamic PTZ (pan-tilt-zoom) camera. In such a system, the master camera monitors a wide field and provides positional information about objects of interest to the slave camera so that it can track them dynamically. This paper describes a novel method for the calibration of master-slave surveillance: a mosaic image created from snapshots of the slave camera is used to estimate the relationship between the static master camera plane and the pan-tilt controls of the slave camera. Compared with other approaches, this solution provides an efficient and automatic way to calibrate a master-slave system.

13:30-16:30, Paper TuBCT8.47<br />

Reconstruction-Free Parallel Planes Identification from Uncalibrated Images<br />

Habed, Adlane, Univ. de Bourgogne<br />

Amintabar, Amirhasan, Univ. of Windsor<br />

Boufama, Boubakeur, Univ. of Windsor<br />

This paper proposes a new method for identifying parallel planes in a scene from three or more uncalibrated images. By<br />

using the fact that parallel planes intersect at infinity, we were able to devise a linear relationship between the inter-image<br />

homographies of the parallel planes and the plane at infinity. This relationship is combined with the so-called modulus constraint<br />

for identifying pairs of parallel planes solely from point correspondences. Experiments with both synthetic and real<br />

images have validated our method.<br />



13:30-16:30, Paper TuBCT8.48
Accurate Dense Stereo by Constraining Local Consistency on Superpixels
Mattoccia, Stefano, Univ. of Bologna

Segmentation is a low-level vision cue often deployed by stereo algorithms under the assumption that disparity varies smoothly within superpixels. In this paper, we show that constraining, on a superpixel basis, the cues provided by a recently proposed technique that explicitly models local consistency among neighboring points yields accurate and dense disparity fields. Our proposal, starting from the initial disparity hypotheses of a fast dense stereo algorithm based on scan-line optimization, demonstrates its effectiveness by enabling us to obtain results comparable to top-ranked algorithms based on iterative disparity optimization methods.

13:30-16:30, Paper TuBCT8.49
On-Line Structure and Motion Estimation based on a Novel Parameterized Extended Kalman Filter
Haner, Sebastian, Lund Univ. of Tech.
Heyden, Anders, Lund Univ.

Estimation of structure and motion in computer vision systems can be performed using a dynamic systems approach, where states and parameters in a perspective system are estimated. We present a novel on-line method for structure and motion estimation in densely sampled image sequences. The proposed method is based on an extended Kalman filter and a novel parameterization. We assume calibrated cameras and derive a dynamic system describing the motion of the camera and the image formation. By a change of coordinates, we represent this system by normalized image coordinates and the inverse depths. Then we apply an extended Kalman filter for the estimation of both structure and motion. The performance of the proposed method is demonstrated in both simulated and real experiments. We furthermore compare our method to the unified inverse depth parameterization and show that we achieve superior results.

13:30-16:30, Paper TuBCT8.51
Discriminant and Invariant Color Model for Tracking under Abrupt Illumination Changes
Scandaliaris, Jorge, CSIC-UPC
Sanfeliu, Alberto, Univ. Pol. De Catalunya

The output from a color imaging sensor, or apparent color, can change considerably due to changes in illumination conditions and scene geometry. In this work we take into account the dependence of apparent color on illumination in an attempt to find appropriate color models for the typical conditions found in outdoor settings. We evaluate three color-based trackers: one based on hue, another based on an intrinsic image representation, and a third based on a proposed combination of a chromaticity model with a physically reasoned adaptation of the target model. The evaluation is done on outdoor sequences with challenging illumination conditions, and shows that the proposed method improves the average track completeness by over 22% over the hue-based tracker and the closeness of track by over 7% over the tracker based on the intrinsic image representation.

13:30-16:30, Paper TuBCT8.52
Using Local Affine Invariants to Improve Image Matching
Fleck, Daniel, George Mason Univ.
Duric, Zoran, George Mason Univ.

A method to classify tentative feature matches as inliers or outliers to a transformation model is presented. It is well known that ratios of areas of corresponding shapes are affine invariants [6]. Our algorithm uses the consistency of ratios of areas in pairs of images to classify matches as inliers or outliers. The method selects four matches within a region and generates all possible corresponding triangles. All matches are classified as inliers or outliers based on the variance among the ratios of areas of the triangles. The selected inliers are used to compute a homography transformation. We present experimental results showing significant improvements over the baseline RANSAC algorithm for pairs of images from the Zurich Building Database.

13:30-16:30, Paper TuBCT8.53
Segmenting Video Foreground using a Multi-Class MRF
Dickinson, Patrick, Univ. of Lincoln
Hunter, Andrew, Univ. of Lincoln
Appiah, Kofi, Univ. of Lincoln

Methods of segmenting objects of interest from video data typically use a background model to represent an empty, static scene. However, dynamic processes in the background, such as moving foliage and water, can act to undermine the robustness of such methods and result in false positive object detections. Techniques for reducing errors have been proposed, including Markov Random Field (MRF) based pixel classification schemes, and also the use of region-based models. The work we present here combines these two approaches, using a region-based background model to provide robust likelihoods for multi-class MRF pixel labelling. Our initial results show the effectiveness of our method by comparing performance with an analogous per-pixel likelihood model.

13:30-16:30, Paper TuBCT8.54
Real-Time Pose Regression with Fast Volume Descriptor Computation
Hirai, Michiro, NAIST
Ukita, Norimichi, Nara Inst. of Science and Tech.
Kidode, Masatsugu, NAIST

We present a real-time method for estimating the pose of a human body using its 3D volume obtained from synchronized videos. The method achieves pose estimation by pose regression from the 3D volume. While the 3D volume allows us to estimate the pose robustly against self-occlusions, 3D volume analysis requires a large computational cost. We propose fast and stable volume tracking with an efficient volume representation in a low-dimensional dynamical model. Experimental results demonstrate that pose estimation of a body with significantly deformable clothing can run at around 60 fps.

TuBCT9 Lower Foyer
Document Analysis Poster Session
Session chair: Arica, Nafiz (Turkish Naval Academy)

13:30-16:30, Paper TuBCT9.1
Robust Staffline Thickness and Distance Estimation in Binary and Gray-Level Music Scores
Cardoso S., Jaime, Univ. do Porto
Silva, Rebelo, Ana Maria, Univ. do Porto

The optical recognition of handwritten musical scores by computers remains far from ideal. Most OMR algorithms rely on an estimation of the staff line thickness and the vertical line distance within the same staff. Subsequent operations can use these values as references, dismissing the need for predetermined threshold values. In this work we improve on previous conventional estimates for these two reference lengths. We start by proposing a new method for binarized music scores and then extend the approach to gray-level music scores. An experimental study with 50 images is used to assess the merit of the novel method.
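
As a concrete illustration of the reference lengths involved, the following Python sketch estimates staffline thickness and staff-space height from vertical run-lengths in a binarized score. This is the common baseline heuristic (most frequent black run = line thickness, most frequent white run = line distance), not the improved estimator proposed in the paper.

import numpy as np

def staff_reference_lengths(binary):
    # binary: 2D boolean array, True where there is ink.
    # Collect vertical run-lengths of ink and background in every column.
    black_runs, white_runs = [], []
    for col in binary.T:
        run_val, run_len = bool(col[0]), 1
        for v in col[1:]:
            if bool(v) == run_val:
                run_len += 1
            else:
                (black_runs if run_val else white_runs).append(run_len)
                run_val, run_len = bool(v), 1
        (black_runs if run_val else white_runs).append(run_len)
    # The most frequent run-lengths serve as thickness and distance estimates.
    thickness = np.bincount(black_runs).argmax()
    distance = np.bincount(white_runs).argmax()
    return int(thickness), int(distance)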

13:30-16:30, Paper TuBCT9.2
Hierarchical Decomposition of Handwriting Deformation Vector Field for Improving Recognition Accuracy
Wakahara, Toru, Hosei Univ.
Uchida, Seiichi, Kyushu Univ.

This paper addresses the problem of how to extract, describe, and evaluate handwriting deformation from a deterministic viewpoint for improving recognition accuracy. The key ideas are threefold. The first is to extract the handwriting deformation vector field (DVF) between a pair of input and target images by 2D warping. The second is to hierarchically decompose the DVF by a parametric deformation model of global/local affine transformation, where local affine transformation is iteratively applied to the DVF with decreasing window sizes. The third is to accept only low-order deformation components as natural, within-class handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B show that correlation-based matching absorbing components of global affine transformation and local affine transformation up to the 3rd order achieved a recognition rate of 92.1%, higher than the 87.0% obtained by the original 2D warping.

13:30-16:30, Paper TuBCT9.3
Prototype-Based Methodology for the Statistical Analysis of Local Features in Stereotypical Handwriting Tasks
O’Reilly, Christian, École Pol. De Montreal
Plamondon, Réjean, École Pol. De Montréal

A three-step methodology is proposed to derive consistent sets of local features which may be easily compared between different samples of a stereotypical human handwriting movement, allowing the statistical analysis of its local variability. This technique is illustrated using the Sigma-Lognormal modeling of on-line triangular trajectory patterns obtained from a standardized neuromuscular task. The overall approach can be adapted and generalized to the analysis of the end-effector kinematics of many planar upper limb movements.

13:30-16:30, Paper TuBCT9.4
The Snippet Statistics of Font Recognition
Lidke, Jakub, Fraunhofer IAIS
Thurau, Christian, Fraunhofer IAIS
Bauckhage, Christian, Fraunhofer IAIS

This paper considers the topic of automatic font recognition. The task is to recognize a specific font from a text snippet. Unlike previous contributions, we evaluate how the frequencies of certain letters or words influence automatic recognition systems. The evaluation provides estimates of the general feasibility of font recognition under various changing conditions. Results on a data set containing 747 different fonts show that precision can vary between 16% and 94%, depending on (i) which letters are provided, (ii) how many letters are provided, and (iii) which language is used, as these factors considerably influence the text snippet statistics. As a second contribution, we introduce a novel bag-of-features based approach to font recognition.

13:30-16:30, Paper TuBCT9.5
A Study of Designing Compact Recognizers of Handwritten Chinese Characters using Multiple-Prototype based Classifiers
Wang, Yongqiang, The Univ. of Hong Kong
Huo, Qiang, Microsoft Res. Asia

We present a study of designing compact recognizers of handwritten Chinese characters using multiple-prototype based classifiers. A modified Quickprop algorithm is proposed to optimize a sample-separation-margin based minimum classification error objective function. A split vector quantization technique is used to compress classifier parameters. Benchmark results are reported for classifiers with different footprints trained from about 10 million samples on a recognition task with a vocabulary of 9282 character classes, which include 9119 Chinese characters, 62 alphanumeric characters, and 101 punctuation marks and symbols.

13:30-16:30, Paper TuBCT9.6
Membership Functions for Zoning-Based Recognition of Handwritten Digits
Impedovo, Sebastiano, Univ. degli Studi di Bari
Impedovo, Donato, Pol. Di Bari
Pirlo, Giuseppe, Univ. degli Studi di Bari
Modugno, Raffaele, Univ. of Bari “Aldo Moro”

This paper focuses on the role of membership functions in zoning-based classification. In fact, the effectiveness of a zoning method depends not only on the way in which the pattern image is partitioned by the zoning, but also on the criteria adopted to define the way in which a feature influences the diverse zones. For this purpose, an experimental investigation is presented that focuses on the most valuable way in which a feature spreads its influence over the zones of the pattern image. The experimental tests have been carried out in the field of handwritten digit recognition, using the numeral digits of the CEDAR database. The results point out that the membership function has a paramount relevance to the classification performance and demonstrate that the exponential model outperforms other membership functions.
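
To make the notion of a membership function concrete, the sketch below (parameter names and the decay constant are illustrative, not the paper's exact model) shows one possible exponential form: a feature detected at a given position contributes to every zone with a weight that decays exponentially with the distance to the zone centre, instead of voting only for the zone that contains it.

import numpy as np

def exponential_membership(feature_xy, zone_centres, alpha=0.5):
    # feature_xy: (2,) position of a detected feature in the pattern image.
    # zone_centres: (Z, 2) centres of the zones of the zoning scheme.
    d = np.linalg.norm(zone_centres - np.asarray(feature_xy, dtype=float), axis=1)
    w = np.exp(-alpha * d)          # influence decays with distance to each zone
    return w / w.sum()              # normalized membership values over the Z zones

# Example: a 3x3 zoning of a 30x30 image and one feature near the top-left zone.
centres = np.array([[5 + 10 * i, 5 + 10 * j] for i in range(3) for j in range(3)], float)
print(exponential_membership((6.0, 4.0), centres).round(3))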

13:30-16:30, Paper TuBCT9.7
Scribe Identification in Medieval English Manuscripts
Gilliam, Tara, Univ. of York
Wilson, Richard, Univ. of York
Clark, John A., Univ. of York

In this paper we present work on automated scribe identification on a new Middle-English manuscript dataset from around the 14th–15th century. We discuss the image and textual problems encountered in processing historical documents, and demonstrate the effect of accounting for manuscript style on the writer identification rate. The grapheme codebook method is used to achieve a Top-1 classification accuracy of up to 77% with a modification to the distance measure. The performance of the Sparse Multinomial Logistic Regression classifier is compared against five k-NN classifiers. We also consider classification against the principal components and propose a method for visualising the principal component vectors in terms of the original grapheme features.

13:30-16:30, Paper TuBCT9.8
Recognition of Handwritten Arabic (Indian) Numerals using Freeman’s Chain Codes and Abductive Network Classifier
Lawal, Isah Abdullahi, King Fahd Univ. of Petroleum & Minerals
Abdel-Aal, Radwan E., King Fahd Univ. of Petroleum & Minerals
Mahmoud, Sabri A., King Fahd Univ. of Petroleum & Minerals

Accurate automatic recognition of handwritten Arabic numerals has several important applications, e.g. in banking transactions, automation of postal services, and other data entry related applications. A number of modelling and machine learning techniques have been used for handwritten Arabic numeral recognition, including neural networks, support vector machines, and hidden Markov models. This paper proposes the use of abductive networks for this problem. We studied the performance of an abductive network architecture on a dataset of 21120 samples of handwritten 0-9 digits produced by 44 writers. We developed a new feature set using histograms of contour-point chain codes. Recognition rates as high as 99.03% were achieved, which surpass the performance reported in the literature for other recognition techniques on the same data set. Moreover, the technique achieves a significant reduction in the number of features required.

13:30-16:30, Paper TuBCT9.9
A SVM-HMM based Online Classifier for Handwritten Chemical Symbols
Zhang, Yang, Nankai Univ.
Shi, Guangshun, Nankai Univ.
Wang, Kai, Nankai Univ.

This paper presents a novel double-stage classifier for the handwritten chemical symbol recognition task. The first stage is rough classification, where an SVM is used to distinguish non-ring structure (NRS) from organic ring structure (ORS) symbols, while an HMM is used for fine recognition at the second stage. A point-sequence-reordering algorithm is proposed to improve the recognition accuracy of ORS symbols. Our test data set contains 101 chemical symbols, with 9090 training samples and 3232 test samples. We obtained a top-1 accuracy of 93.10% and a top-3 accuracy of 98.08% on this test data set.

13:30-16:30, Paper TuBCT9.10
Symbol Recognition Combining Vectorial and Pixel-Level Features for Line Drawings
Su, Feng, Nanjing Univ.
Lu, Tong, Nanjing Univ.
Yang, Ruoyu, Nanjing Univ.

In this paper, we present an approach for symbol representation and recognition in line drawings, integrating both the vector-based structural description and pixel-level statistical features of the symbol. For the former, a vectorial template is defined on the basis of the vectorization model and exploited in segmenting symbols from the line network. For the latter, a Radon-transform-based signature is employed to characterize shapes at the symbol and component levels. Experimental results on real technical drawings are presented to show the promising aspects of our approach.

13:30-16:30, Paper TuBCT9.11
Writing Order Recovery from Off-Line Handwriting by Graph Traversal
Cordella, Luigi P., Univ. di Napoli Federico II
De Stefano, Claudio, Univ. di Napoli Federico II
Marcelli, Angelo, Univ. of Salerno
Santoro, Adolfo, Univ. of Salerno

We present a method to recover the dynamic writing order from static images of handwriting. The static handwriting is initially represented by its skeleton, which is then converted into a graph whose arcs correspond to the skeleton branches and whose nodes correspond to the end points or branch points of the skeleton. Criteria derived from handwriting generation are then applied to transform the graph in such a way that all its nodes, except the first and the last, have an even degree, so that it can be traversed from the first to the last using Fleury’s algorithm. The experimental results show that combining criteria derived from handwriting generation models with graph traversal makes it possible to reconstruct the original sequence produced by a writer even in the case of complex handwriting, i.e., handwriting with retracing, crossings and pen-ups.

13:30-16:30, Paper TuBCT9.12
Holistic Urdu Handwritten Word Recognition using Support Vector Machine
Sagheer, Malik Waqas, CENPARMI, Concordia Univ.
He, Chun Lei, Concordia Univ.
Nobile, Nicola, Concordia Univ. CENPARMI
Suen, Ching Y.

Since the Urdu language has more isolated letters than Arabic and Farsi, research on Urdu handwritten words is desirable. This is a novel approach that uses compound features and a Support Vector Machine (SVM) for offline Urdu word recognition. Due to the cursive style of Urdu, a classification using a holistic approach is adopted. Compound feature sets, which involve structural and gradient (directional) features, are extracted from each Urdu word. Experiments have been conducted on the CENPARMI Urdu Words Database, and a high recognition accuracy of 97.00% has been achieved.

13:30-16:30, Paper TuBCT9.13
A Framework for the Combination of Different Arabic Handwritten Word Recognition Systems
El Abed, Haikal, Braunschweig Tech. Univ.
Märgner, Volker, Braunschweig Tech. Univ.

In this paper we present a framework for the combination of different Arabic handwritten word recognition systems to achieve a decision with higher performance. This performance can be expressed by lower rejection rates and higher recognition rates. The methods used range from voting schemes based on the results of different recognizers to a neural network decision based on normalized confidences. This work presents an extension of well-known combination methods to a large lexicon, from a maximum of 30 classes (e.g., 10 classes for digit classification) to 937 classes for the IfN/ENIT-database. In addition, different reject rules based on the evaluation and analysis of individual and combined system outputs are discussed. Different threshold functions for reject levels are tested and evaluated. Tests with a set of recognizers that participated in the ICDAR 2007 competition, on a set coming from the IfN/ENIT-database, show a word error rate (WER) of 5.29% without rejection and, with a rejection rate of less than 25%, a word error rate of less than 1%.

13:30-16:30, Paper TuBCT9.15
Degraded Character Recognition by Image Quality Evaluation
Liu, Chunmei, Tongji Univ.

Character image quality plays an important role in degraded character recognition, as it indicates the recognition difficulty. This paper proposes a novel approach to degraded character recognition driven by three kinds of independent degradation sources. It is composed of two stages: character image quality evaluation and character recognition. Firstly, a dual evaluation is presented to assess the image quality of the input character. Secondly, according to the evaluation result, the appropriate character recognition sub-system is applied. These sub-systems are trained on character sets whose image quality is similar to the input's quality, and have dedicated features and classifiers respectively. Experimental results demonstrate that the proposed approach considerably improves the performance of a degraded character recognition system.

13:30-16:30, Paper TuBCT9.16
Offline Arabic Handwriting Identification using Language Diacritics
Lutf, Mohammed, Huazhong Univ. of Science and Tech.
You, Xinge, Huazhong Univ. of Science and Tech.
Li, Hong, Huazhong Univ. of Science and Tech.

In this paper, we present an approach for writer identification using off-line Arabic handwriting. The proposed method introduces Arabic writing in a new form, by representing it in terms of its basic components rather than its alphabet. We split the input document into two parts, one for the letters and the other for the diacritics: we extract all diacritics from the input image, calculate the LBP histogram for each diacritic, and then concatenate these histograms to use them as handwriting features. We use the IFN/ENIT database in the experiments reported here, and our tests involve 287 writers. The results show that our method is very effective and makes the handling of Arabic handwriting easier than before.
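
A minimal sketch of the diacritic-based feature, assuming the diacritics have already been segmented into grayscale patches and using scikit-image's uniform LBP (the patch list, parameter values and function name are illustrative):

import numpy as np
from skimage.feature import local_binary_pattern

def diacritic_lbp_feature(diacritic_patches, n_points=8, radius=1):
    # diacritic_patches: list of 2D grayscale arrays, one per extracted diacritic.
    # Compute an LBP histogram per patch and concatenate them into one feature vector.
    hists = []
    for patch in diacritic_patches:
        lbp = local_binary_pattern(patch, n_points, radius, method="uniform")
        n_bins = n_points + 2                       # uniform patterns + "other"
        h, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
        hists.append(h)
    return np.concatenate(hists) if hists else np.zeros(0)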

13:30-16:30, Paper TuBCT9.17
Removing Rule-Lines from Binary Handwritten Arabic Document Images using Directional Local Profile
Shi, Zhixin, SUNY at Buffalo
Setlur, Srirangaraj, Univ. at Buffalo
Govindaraju, Venu, Univ. at Buffalo

In this paper, we present a novel approach for detecting and removing pre-printed rule-lines from binary handwritten Arabic document images. The proposed technique is based on a directional local profiling approach for the detection of the rule-line locations. A refined adaptive vertical run-length search is then designed to remove the rule-line pixels without much damage to the text. The method is also tolerant to variations in the rule-lines such as broken lines, orientation changes and variation in the thickness of the rule-lines. Analysis of experimental results on the DARPA MADCAT Arabic handwritten document data indicates that the method is robust and is capable of correctly removing rule-lines.

13:30-16:30, Paper TuBCT9.18
A Bag-of-Pages Approach to Unordered Multi-Page Document Classification
Gordo, Albert, Univ. Autònoma de Barcelona
Perronnin, Florent, Xerox Res. Centre Europe

We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements of this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system.
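
A minimal sketch of the bag-of-pages idea, assuming per-page feature vectors are already available and using k-means to build the page codebook (all names and sizes are illustrative; the paper's refinements are omitted):

import numpy as np
from sklearn.cluster import KMeans

def build_page_codebook(all_page_features, n_prototypes=32):
    # all_page_features: (P, D) array of features for all training pages.
    return KMeans(n_clusters=n_prototypes, n_init=10).fit(all_page_features)

def bag_of_pages(codebook, page_features):
    # page_features: (p, D) features of the pages of one document.
    # Assign each page to its nearest prototype and build a normalized histogram.
    labels = codebook.predict(page_features)
    hist = np.bincount(labels, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# The resulting histograms can be fed to any discriminative classifier, e.g. a linear SVM.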

13:30-16:30, Paper TuBCT9.19
Fast Seamless Skew and Orientation Detection in Document Images
Konya, Iuliu Vasile, Fraunhofer IAIS
Eickeler, Stefan, Fraunhofer IAIS
Seibert, Christoph, Fraunhofer IAIS

Reliable and generic methods for skew detection are a necessity for any large-scale digitization project. As one of the first processing steps, skew detection and correction has a heavy influence on all further document analysis modules, such as geometric and logical layout analysis. This paper introduces a generic, scale-independent algorithm capable of accurately detecting the global skew angle of document images within the range [-90, 90] degrees. Using the same framework, the algorithm is then extended for Roman script documents so as to cope with the full range [-180, 180) degrees of possible skew angles. Despite its generality, the improved algorithm is very fast and requires no explicit parameters. Experiments on a combined test set comprising around 110000 real-life images show the accuracy and robustness of the proposed method.

13:30-16:30, Paper TuBCT9.20
Unsupervised Block Covering Analysis for Text-Line Segmentation of Arabic Ancient Handwritten Document Images
Boussellaa, Wafa, Univ. of Sfax
Zahour, Abderrazek, Havre Univ.
El Abed, Haikal, Braunschweig Tech. Univ.
Benabdelhafid, Abdellatif, Havre Univ.
Alimi, Adel M., Univ. of Sfax

This paper presents a new method for automatic text-line extraction from Arabic historical handwritten documents presenting overlapping and multi-touching character problems. Our approach is based on block covering analysis using an unsupervised technique. The algorithm first performs a statistical block analysis which computes the optimal number of vertical strips into which the document is decomposed. Then, it achieves fuzzy baseline detection using the fuzzy C-means algorithm. Finally, blocks are assigned to their corresponding lines. Experimental results show that the proposed method achieves a high accuracy of about 95% for detecting text lines in Arabic historical handwritten document images written with different scripts.

13:30-16:30, Paper TuBCT9.21
A Bi-Modal Handwritten Text Corpus: Baseline Results
Pastor, Moises, Univ. Pol. De Valencia
Toselli, Alejandro Héctor, Univ. Pol. De Valencia
Casacuberta, Francisco, Univ. Pol. De Valencia
Vidal, Enrique, Univ. Pol. De Valencia

Handwritten text is generally captured through two main modalities: off-line and on-line. Smart approaches to handwritten text recognition (HTR) may take advantage of both modalities if they are available. This is for instance the case in computer-assisted transcription of text images, where on-line text can be used to interactively correct errors made by a main off-line HTR system. We present here baseline results on the biMod-IAM-PRHLT corpus, which was recently compiled for experimentation with techniques aimed at solving the proposed multi-modal HTR problem, and is being used in one of the official ICPR 2010 contests.

13:30-16:30, Paper TuBCT9.22
Feature Selection using Multiobjective Optimization for Named Entity Recognition
Ekbal, Asif, Univ. of Heidelberg
Saha, Sriparna, Univ. of Heidelberg
Garbe, Christoph S., Univ. of Heidelberg

Appropriate feature selection is a very crucial issue in any machine learning framework, especially in Maximum Entropy (ME) models. In this paper, the selection of appropriate features for constructing an ME-based Named Entity Recognition (NER) system is posed as a multiobjective optimization (MOO) problem. Two classification quality measures, namely recall and precision, are simultaneously optimized using the search capability of a popular evolutionary MOO technique, NSGA-II. The proposed technique is evaluated to determine suitable feature combinations for NER in two languages, namely Bengali and English, which have significantly different characteristics. Evaluation results yield recall, precision and F-measure values of 70.76%, 81.88% and 75.91%, respectively, for Bengali, and 78.38%, 81.27% and 79.80%, respectively, for English. Comparison with an existing ME-based NER system shows that our proposed feature selection technique is more efficient than heuristic-based feature selection.

13:30-16:30, Paper TuBCT9.23
Redif Extraction in Handwritten Ottoman Literary Texts
Can, Ethem Fatih, Bilkent Univ.
Duygulu, Pinar, Bilkent Univ.
Can, Fazli, Bilkent Univ.
Kalpakli, Mehmet, Bilkent Univ.

Repeated patterns, rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They provide integrity to a poem by connecting its parts and bring a melody to its voice. In Ottoman literature, poets wrote their works by making use of the rhymes and redifs of previous poems according to the nazire (creative imitation) tradition, either to prove their expertise or to show respect towards old masters. Automatic recognition of redifs would provide important data mining opportunities in literary analyses of Ottoman poetry, the majority of which is in handwritten form. In this study, we propose a matching criterion and a method using it, Redif Extraction using Contour Segments (RECS), that detects redifs in handwritten Ottoman literary texts using only visual analysis. Our method provides a success rate of 0.682 on a test collection of 100 poems.

13:30-16:30, Paper TuBCT9.24
Analysis of Local Features for Handwritten Character Recognition
Uchida, Seiichi, Kyushu Univ.
Liwicki, Marcus, DFKI

This paper investigates a part-based recognition method for handwritten digits. In the proposed method, the global structure of digit patterns is discarded by representing each pattern as just a set of local feature vectors. The method then comprises two steps. First, each of the J local feature vectors of a target pattern is recognized as one of ten categories ("0"-"9") by nearest neighbor discrimination with a large database of reference vectors. Second, the category of the target pattern is determined by majority voting on the J local recognition results. Despite a pessimistic expectation, we have reached recognition rates much higher than 90% for the task of digit recognition.
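
The two-step part-based scheme can be illustrated directly; the sketch below assumes the local feature vectors have already been extracted, and the function names are hypothetical.

import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def train_local_nn(local_vectors, local_labels):
    # local_vectors: (N, D) local feature vectors pooled from all training digits.
    # local_labels: (N,) the digit class of the image each vector came from.
    return KNeighborsClassifier(n_neighbors=1).fit(local_vectors, local_labels)

def classify_by_voting(nn, target_local_vectors):
    # Classify a target digit from just its set of local feature vectors:
    # each local vector votes for a class and the majority wins (global structure ignored).
    votes = nn.predict(target_local_vectors)
    return Counter(votes).most_common(1)[0][0]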

13:30-16:30, Paper TuBCT9.25
Detect Visual Spoofing in Unicode-Based Text
Qiu, Bite, City Univ. of Hong Kong
Fang, Ning, City Univ. of Hong Kong
Liu, Wenyin, City U of HK

Visual spoofing in Unicode-based text is anticipated to become a severe web security problem in the near future as more and more Unicode-based web documents are used. In this paper, to detect whether a suspicious Unicode character in a word is visual spoofing or not, the context of the suspicious character is utilized by employing a Bayesian framework. Specifically, two contexts are taken into consideration: simple context and general context. The simple context of a suspicious character is the word in which the character occurs, while the general context consists of all homoglyphs of the character within the Universal Character Set (UCS). Three decision rules are designed and used jointly for convicting a suspicious character. Preliminary evaluations and a user study show that the proposed approach can detect Unicode-based visual spoofing with high effectiveness and efficiency.

13:30-16:30, Paper TuBCT9.26
Comparing Several Techniques for Offline Recognition of Printed Mathematical Symbols
Álvaro, Francisco, Inst. Tecnológico de Informática
Sánchez, Joan Andreu, Univ. Pol. De Valencia

Automatic recognition of printed mathematical symbols is a fundamental problem in the recognition of mathematical expressions. Several classification techniques have been previously used, but there are very few works that compare different classification techniques on the same database and under the same experimental conditions. In this work we have tested classical and novel classification techniques for mathematical symbol recognition on two databases.

13:30-16:30, Paper TuBCT9.27
Symbol Classification using Dynamic Aligned Shape Descriptor
Fornés, Alicia, Computer Vision Center
Escalera, Sergio, UB
Llados, Josep, Computer Vision Center
Valveny, Ernest, Univ. Autònoma de Barcelona

Shape representation is a difficult task because of several symbol distortions, such as occlusions, elastic deformations, gaps or noise. In this paper, we propose a new descriptor and distance computation for coping with the problem of symbol recognition in the domain of Graphical Document Image Analysis. The proposed D-Shape descriptor encodes the arrangement information of object parts in a circular structure, allowing different levels of distortion. The classification is performed using a cyclic Dynamic Time Warping based method, allowing distortions and rotation. The methodology has been validated on different data sets, showing very high recognition rates.

13:30-16:30, Paper TuBCT9.28
Document Logo Detection and Recognition using Bayesian Model
Wang, Hongye, Tsinghua Univ.
Chen, Youbin, Tsinghua Univ.

This paper presents a simple, dynamic approach to logo detection and recognition in document images. Although there is literature on both the logo detection and the logo recognition problems, current methods lack the adaptability to deal with variable real-world documents. In this paper we first examine this deficiency from a different point of view and reveal its inherent cause. Then we reorganize the structure of the logo detection and recognition procedures and integrate them into a unified framework. By applying feedback and selecting proper features, we make our framework dynamic and interactive. Experiments show that the proposed method outperforms existing methods in the document processing domain.

13:30-16:30, Paper TuBCT9.29
An Efficient Staff Removal Approach from Printed Musical Documents
Dutta, Anjan, Univ. Autonoma de Barcelona
Pal, Umapada, Indian Statistical Inst.
Fornés, Alicia, Computer Vision Center
Llados, Josep, Computer Vision Center

Staff removal is an important preprocessing step in Optical Music Recognition (OMR). The process aims to remove the stafflines from a musical document and retain only the musical symbols; these symbols are later used to identify the music information. This paper proposes a simple but robust method to remove stafflines from printed musical scores. In the proposed methodology we consider a staffline segment as a horizontal linkage of vertical black runs with uniform height. We use the neighbouring properties of a staffline segment to validate it as a true segment. We consider the dataset along with the deformations described in \cite{ex8} for evaluation purposes. Our experiments yield encouraging results.
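
A bare-bones version of the run-length idea reads as follows. This is a sketch only: the neighbouring-property validation described above is omitted, and the staffline thickness is assumed to come from an estimator such as the run-length statistics sketched for paper TuBCT9.1.

import numpy as np

def remove_stafflines(binary, thickness, tol=1):
    # binary: 2D boolean array of the score, True where there is ink.
    # Delete vertical black runs whose height is within tol of the staffline
    # thickness, leaving taller runs (musical symbols) untouched.
    out = binary.copy()
    h = binary.shape[0]
    for x in range(binary.shape[1]):
        y = 0
        while y < h:
            if binary[y, x]:
                y0 = y
                while y < h and binary[y, x]:
                    y += 1
                if (y - y0) <= thickness + tol:     # short run: treat as staffline pixels
                    out[y0:y, x] = False
            else:
                y += 1
    return out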

13:30-16:30, Paper TuBCT9.30
Combining Spectral and Spatial Features for Robust Foreground-Background Separation
Lettner, Martin, Vienna Univ. of Tech.
Sablatnig, Robert, Vienna Univ. of Tech.

Foreground-background separation in multispectral images of damaged manuscripts can benefit from both spectral and spatial information. Therefore, we incorporate a Markov Random Field, which provides a powerful tool to combine both features simultaneously. Higher-order models enable the inclusion of spatial constraints based on stroke characteristics. We apply belief propagation for inference and include the higher-order potentials by upgrading the message update. The proposed segmentation method requires no training and is independent of the script, size, and style of characters. We demonstrate the robust performance on a set of degraded documents and on synthetic images.

13:30-16:30, Paper TuBCT9.31
Unsupervised Learning of Stroke Tagger for Online Kanji Handwriting Recognition
Blondel, Mathieu, Kobe Univ.
Seki, Kazuhiro, Kobe Univ.
Uehara, Kuniaki, Kobe Univ.

Traditionally, HMM-based approaches to online Kanji handwriting recognition have relied on a hand-made dictionary mapping characters to primitives such as strokes or substrokes. We present an unsupervised way to learn a stroke tagger from data, which we eventually use to automatically generate such a dictionary. In addition to not requiring a prior hand-made dictionary, our approach can improve recognition accuracy by exploiting unlabeled data when the amount of labeled data is limited.

13:30-16:30, Paper TuBCT9.32
A Baseline Dependent Approach for Persian Handwritten Character Segmentation
Alaei, Alireza, Univ. of Mysore
Nagabhushan, P., Univ. of Mysore
Pal, Umapada, Indian Statistical Inst.

In this paper, an efficient approach to segment Persian off-line handwritten text-lines into characters is presented. The proposed algorithm first traces the baseline of the input text-line image and straightens it. Subsequently, it over-segments each word/subword using features extracted from histogram analysis and then removes extra segmentation points using baseline-dependent as well as language-dependent rules. We tested the proposed character segmentation scheme on two different datasets. On a test set of 899 Persian words/subwords created by us, 90.26% of the characters were segmented correctly. On another dataset of 200 handwritten Arabic word images [11], we obtained 93.49% correct segmentation accuracy.

13:30-16:30, Paper TuBCT9.33
Bayesian Networks Learning Algorithms for Online Form Classification
Philippot, Emilie, Univ. Nancy 2, Loria
Belaid, Yolande, Univ. Nancy 2, Loria
Belaid, Abdel, Univ. Nancy 2, Loria

In this paper a new method is presented for the recognition of online forms filled in manually with a digital-type clip. This writing system transmits only the written fields, without the pre-printed form. Form recognition consists in retrieving the original form directly from the filled fields without any context, which is a very challenging problem. We propose a method based on Bayesian networks. The networks use the conditional probabilities between fields in order to infer the real form. Two learning algorithms for form structures are employed to test their suitability for the case studied. The tests were conducted on a set of 3200 forms provided by the Act image company, a specialist in interactive writing processes. The first experiments show a recognition rate reaching more than 97%.

13:30-16:30, Paper TuBCT9.34
Bangla and English City Name Recognition for Indian Postal Automation
Pal, Umapada, Indian Statistical Inst.
Roy, R.K., Indian Statistical Inst.
Kimura, Fumitaka, Mie Univ.

Because of multi-lingual behavior, the destination address block of a postal document in an Indian state may be written in two or more scripts. From a statistical analysis of Indian postal documents, we noted that about 22.04% of Indian postal documents are written in two scripts. Because of the inter-mixing of these scripts in postal address writing, it is very difficult to identify the script in which a city name is written. To avoid such identification difficulties, in this paper we propose a lexicon-driven bi-lingual (English and Bangla) city name recognition scheme for Indian postal automation. We obtained 93.19% accuracy when testing on 11875 city name samples.

13:30-16:30, Paper TuBCT9.35
Shape Code based Word-Image Matching for Retrieval of Indian Multi-Lingual Documents
Tarafdar, Arundhati, Indian Statistical Inst.
Mandal, Ranju, Indian Statistical Inst.
Pal, Srikanta, Indian Statistical Inst.
Pal, Umapada, Indian Statistical Inst.
Kimura, Fumitaka, Mie Univ.

In the current scenario, retrieving information from document images is a challenging problem. In this paper we propose a shape code based word-image matching (word-spotting) technique for the retrieval of multilingual documents written in Indian languages. Here, each query word image to be searched is represented by a primitive shape code using (i) zonal information of extreme points, (ii) vertical shape based features, (iii) crossing count (with respect to vertical bar position), (iv) loop shape and position, (v) background information, etc. Each candidate word of the document (a word having aspect ratio and topological features similar to the query word) is also coded accordingly. Then, an inexact string matching technique is used to measure the similarity between the primitive codes generated from the query word image and each candidate word of the document in which the query image is to be searched. Based on the similarity score, we retrieve the documents where the query image is found. Experimental results on Bangla, Devnagari and Gurumukhi script document image databases confirm the feasibility and efficiency of our proposed approach.

13:30-16:30, Paper TuBCT9.36
Stochastic Segment Model Adaptation for Offline Handwriting Recognition
Prasad, Rohit, Raytheon BBN Tech.
Bhardwaj, Anurag, SUNY Buffalo
Subramanian, Krishna, Raytheon BBN Tech.
Cao, Huaigu, Raytheon BBN Tech.
Natarajan, P., BBN Tech.

In this paper, we present techniques for unsupervised adaptation of stochastic segment models to improve accuracy on large vocabulary offline handwriting recognition (OHR) tasks. We build upon our previous work on stochastic segment modeling for Arabic OHR. In our previous work, stochastic character segments for each n-best hypothesis were generated by a hidden Markov model (HMM) recognizer, and then a segmental model was used as an additional knowledge source for re-ranking the n-best list. Here, we describe a novel framework for unsupervised adaptation. It integrates both HMM and segment model adaptation to achieve significant gains over un-adapted recognition. Experimental results demonstrate the efficacy of our proposed method on a large corpus of handwritten Arabic documents.

13:30-16:30, Paper TuBCT9.37
Shape-Based Image Retrieval using a New Descriptor based on the Radon and Wavelet Transforms
Nacereddine, Nafaa, LORIA
Tabbone, Salvatore, Univ. Nancy 2-LORIA
Ziou, Djemel, Sherbrooke Univ.
Hamami, Latifa, Ec. Nationale Pol.

In this paper, the Radon transform is used to design a new descriptor called the Phi-signature, invariant to the usual geometric transformations. Experiments show the effectiveness of the multilevel representation of the descriptor built from the Phi-signature and R.

13:30-16:30, Paper TuBCT9.38
CUDA Implementation of Deformable Pattern Recognition and its Application to MNIST Handwritten Digit Database
Mizukami, Yoshiki, Yamaguchi Univ.
Tadamura, Katsumi, Yamaguchi Univ.
Warrell, Jonathan, Oxford Brookes University
Li, Peng, Univ. Coll. London
Prince, Simon, Univ. Coll. London

In this study we propose a deformable pattern recognition method with a CUDA implementation. In order to achieve a proper correspondence between the foreground pixels of input and prototype images, a pair of distance maps is generated from the input and prototype images, whose pixel values are given by the distance to the nearest foreground pixel. A regularization technique then computes the horizontal and vertical displacements based on these distance maps. The dissimilarity is measured based on the eight-directional derivatives of the input and prototype images in order to leverage characteristic information on the curvature of line segments that might be lost after the deformation. Prototype-parallel displacement computation on CUDA and a gradual prototype elimination technique are employed to reduce the computational time without sacrificing accuracy. A simulation shows that the proposed method with the k-nearest neighbor classifier gives an error rate of 0.57% on the MNIST handwritten digit database.

13:30-16:30, Paper TuBCT9.39
Text Independent Writer Identification for Bengali Script
Chanda, Sukalpa, GJØVIK Univ. Coll.
Franke, Katrin, Gjøvik Univ. Coll.
Pal, Umapada, Indian Statistical Inst.
Wakabayashi, Tetsushi, Mie Univ.

Automatic identification of an individual based on his/her handwriting characteristics is an important forensic tool. In a computational forensic scenario, the presence of a large amount of text/information in a questioned document cannot always be ensured. At the same time, compromising on system reliability in such a situation is not desirable. We propose here a system to handle such adverse situations in the context of Bengali script. Experiments with a discrete directional feature and a gradient feature are reported here, with a Support Vector Machine (SVM) as the classifier. We obtained promising results of 95.19% writer identification accuracy for the first top choice and 99.03% when considering the first three top choices.

13:30-16:30, Paper TuBCT9.40
Document Image Retrieval using Feature Combination in Kernel Space
Hassan, Ehtesham, Indian Inst. of Tech. Delhi
Chaudhury, Santanu, Indian Inst. of Tech. Delhi
Gopal, M, Indian Inst. of Tech. Delhi

This paper presents the application of multiple features to word-based document image indexing and retrieval. A novel framework to perform Multiple Kernel Learning (MKL) for indexing using Kernel-based Distance Based Hashing is proposed. A Genetic Algorithm based framework is used for optimization. Two different features representing the structural organization of word shape are defined. The optimal combination of both features for indexing is learned by performing MKL. Retrieval results for a document collection in Devanagari script are presented.

13:30-16:30, Paper TuBCT9.41
A Novel Handwritten Urdu Word Spotting based on Connected Components Analysis
Sagheer, Malik Waqas, CENPARMI, Concordia Univ.
Nobile, Nicola, Concordia Univ. CENPARMI
He, Chun Lei, Concordia Univ.
Suen, Ching Y.

We propose a novel word spotting system for Urdu words within handwritten text lines. Spatial information about diacritics is integrated into the detection of the main connected components during candidate word generation. An Urdu word recognition system is designed and applied to classify the candidate words. In this word recognition system, compound features and an SVM were adopted. The verification/rejection process was based on the outputs of the Urdu word recognition system, and the image's global features were applied to achieve a promising result. As a result, a high correct segmentation rate of 92.11% and a word spotting precision of 50.75% were achieved while maintaining a recall of 70.1% on CENPARMI's Urdu Database.

13:30-16:30, Paper TuBCT9.42
Computer Assisted Transcription of Text Images: Results on the GERMANA Corpus and Analysis of Improvements Needed for Practical Use
Romero Gomez, Verónica, Univ. Pol. De Valencia
Toselli, Alejandro Héctor, Univ. Pol. De Valencia
Vidal, Enrique, Univ. Pol. De Valencia

We present a study of the application of Computer Assisted Transcription of Text Images (CATTI) to a task which is much closer to real applications than other tasks previously studied. The new task consists in the transcription of a newly publicly available historic handwritten document, called GERMANA. A detailed analysis of the main factors influencing the system performance is presented and some strategies to circumvent them are proposed.

13:30-16:30, Paper TuBCT9.43
OCR Post-Processing using Weighted Finite-State Transducers
Llobet, Rafael, Univ. Pol. De Valencia
Navarro Cerdán, José Ramón, Univ. Pol. De Valencia
Perez-Cortes, Juan-Carlos, Univ. Pol. De Valencia
Arlandis, Joaquim, Univ. Pol. De Valencia

A new approach for Stochastic Error-Correcting Language Modeling based on Weighted Finite-State Transducers (WFSTs) is proposed as a method to post-process the results of an optical character recognizer (OCR). Instead of using the recognized string as an input to the transducer, in our approach the complete set of OCR hypotheses, a sequence of vectors of a posteriori class probabilities, is used to build a WFST that is then composed with independent WFSTs for the error and language models. This combines the practical advantages of a de-coupled (OCR + post-processor) model with the full power of an integrated model.

13:30-16:30, Paper TuBCT9.44
Top down Analysis of Line Structure in Handwritten Documents
Kasiviswanathan, Harish, Univ. at Buffalo
Ball, Gregory R., Univ. at Buffalo
Srihari, Sargur, Univ. at Buffalo

One of the most challenging tasks in analyzing handwritten documents is to tackle the inherent skew introduced by the writer's handwriting, segment the handwritten lines, and estimate the skew angle and its direction. Complexities such as variable spacing between words and lines, variable line skew, variable line width and height, and overlapping words and lines arise in handwritten documents, unlike printed documents. This paper explores the application of the Radon transform to processing handwritten documents and compares its performance with the Hough transform for segmenting lines and detecting skew. The computational advantage of the Radon transform over the Hough transform, with equally good results, makes it an ideal choice for processing handwritten documents.
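
A minimal example of using the Radon transform for skew estimation, assuming scikit-image is available; the search range, step and variance criterion are illustrative choices, not the paper's exact procedure.

import numpy as np
from skimage.transform import radon

def estimate_skew(binary, angle_range=10.0, step=0.25):
    # binary: 2D array with ink = 1. Take the Radon projection angle whose
    # profile has maximum variance as the text-line orientation; the returned
    # value is the skew in degrees (0 = perfectly horizontal lines).
    angles = np.arange(90.0 - angle_range, 90.0 + angle_range, step)
    sinogram = radon(binary.astype(float), theta=angles, circle=False)
    variances = sinogram.var(axis=0)          # one projection profile per angle
    return angles[variances.argmax()] - 90.0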

13:30-16:30, Paper TuBCT9.45
Unsupervised Evaluation Methods based on Local Gray-Intensity Variances for Binarization of Historical Documents
Ramírez-Ortegón, Marte Alejandro, Freie Univ. Berlin
Rojas, Raul, Freie Univ. Berlin

We evaluate the efficacy of six unsupervised evaluation methods for tuning Sauvola's threshold in optical character recognition (OCR) applications. We propose local implementations of well-known measures based on gray-intensity variances. Additionally, we derive four new measures from them using the unbiased variance estimator and gray-intensity logarithms. In our experiment, we selected the well-binarized images according to each measure and computed the accuracy of the recognized text of each. The results show that the weighted and uniform variance (using logarithms) are suitable measures for OCR applications.
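
To make the tuning loop concrete, here is a simplified sketch: a global rather than local variance measure, a hypothetical set of k values, and scikit-image's threshold_sauvola are used to select Sauvola's k with an unsupervised gray-intensity-variance criterion.

import numpy as np
from skimage.filters import threshold_sauvola

def weighted_variance(gray, binary):
    # Weighted intra-class gray-intensity variance: smaller usually means a
    # cleaner foreground/background split (a simplified, global version of the
    # measure families discussed above).
    fg, bg = gray[binary], gray[~binary]
    if fg.size == 0 or bg.size == 0:
        return np.inf
    w_fg = fg.size / gray.size
    return w_fg * fg.var() + (1 - w_fg) * bg.var()

def tune_sauvola(gray, window_size=25, k_values=(0.1, 0.2, 0.3, 0.4, 0.5)):
    # Pick the Sauvola k that minimizes the unsupervised variance measure.
    best_k, best_score, best_bin = None, np.inf, None
    for k in k_values:
        t = threshold_sauvola(gray, window_size=window_size, k=k)
        binary = gray < t                       # ink is darker than the threshold
        score = weighted_variance(gray, binary)
        if score < best_score:
            best_k, best_score, best_bin = k, score, binary
    return best_k, best_bin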

13:30-16:30, Paper TuBCT9.46
On the Significance of Stroke Size and Position for Online Handwritten Devanagari Word Recognition: An Empirical Study
Bharath, A, Hewlett-Packard Lab. India
Madhvanath, Sriganesh, Hewlett-Packard Lab. India

Stroke size and position are considered important information for the online recognition of handwritten characters and words in the Oriental and Indic families of scripts, especially because of their multi-stroke and two-dimensional nature. In an Indic script such as Devanagari, the vowel diacritics (matras) can occur at any position around the base consonant, and there are even pairs of matras which have similar shapes and differ only in their position with respect to the base consonant. In this paper, we study the relevance of stroke size and position information for the recognition of online handwritten Devanagari words by comparing three different preprocessing schemes. Our experimental results indicate that the word recognition accuracy achieved using a preprocessing scheme that completely disregards the original sizes and positions of the strokes (and symbols) is comparable with a scheme that retains them, when the input is in discrete style and contextual knowledge in the form of a lexicon is available.

13:30-16:30, Paper TuBCT9.47
Noise Tolerant Script Identification of Oriental and English Documents using a Downgraded Pixel Density Feature
Wang, Ning, Concordia Univ.
Lam, Louisa, Concordia Univ.
Suen, Ching Y.

Document Script Identification (DSI) is a very useful application in document processing. This paper presents a method for this application that uses a new noise-tolerant feature, the Downgraded Pixel Density feature. Compared to other features widely used in existing DSI solutions, this new feature is much more robust to variations in slant, font and style of printed documents. Experimental results show that the method achieves promising identification performance.

13:30-16:30, Paper TuBCT9.48
Using Spatial Relations for Graphical Symbol Description
K. C., Santosh, INRIA – LORIA and INPL
Wendling, Laurent, Univ. Paris Descartes
Lamiroy, Bart, LORIA – INPL

In this paper, we address the use of unified spatial relations for symbol description. We present a topologically guided directional relation signature. It references a unique point set instead of one entity in a pair, thus avoiding problems related to erroneous choices of reference entities, and it preserves symmetry. We experimentally validate our method by showing its ability to serve in a symbol retrieval application, based only on a spatial relational descriptor that represents the links between the decomposed structural patterns called “vocabulary” in a spatial relational graph.

13:30-16:30, Paper TuBCT9.49
Automatic Discrimination between Confusing Classes with Writing Styles Verification in Arabic Handwritten Numeral Recognition
He, Chun Lei, Concordia Univ.
Lam, Louisa, Concordia Univ.
Suen, Ching Y.

In handwriting recognition, confusing/conflicting writing styles can result in irreducible errors, so the study of writing style consistencies is important for applications. In Arabic handwritten numeral recognition, most errors occur between samples of the classes two and three due to their very similar shapes in some writing styles. In this paper, an automated writing style detection process is implemented in the pair-wise verification of samples from these two classes. As a result, the recognition results have improved significantly, with a reduction of 25% of the previous errors. With rejection, when the LDA (Linear Discriminant Analysis) measurement rejection threshold is adjusted to maintain the same error rate, the recognition rate increases from 96.87% to 97.81%.
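
The pair-wise verification with rejection can be sketched as follows, assuming feature vectors for the two confusing classes and scikit-learn's LDA; the threshold value and function names are illustrative, not the paper's calibrated settings.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pairwise_verifier(X_two, X_three):
    # Train an LDA verifier for the confusing pair (digit classes 'two' vs 'three').
    X = np.vstack([X_two, X_three])
    y = np.array([2] * len(X_two) + [3] * len(X_three))
    return LinearDiscriminantAnalysis().fit(X, y)

def verify_with_reject(lda, x, reject_threshold=1.0):
    # Relabel a sample recognized as 2 or 3; reject when the LDA score falls
    # too close to the decision boundary (ambiguous writing style).
    score = lda.decision_function(x.reshape(1, -1))[0]
    if abs(score) < reject_threshold:
        return None                     # reject
    return 3 if score > 0 else 2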

13:30-16:30, Paper TuBCT9.50
Random Subspace Method in Text Categorization
Gangeh, Mehrdad, Univ. of Waterloo
Kamel, Mohamed S, Univ. of Waterloo
Duin, Robert, TU Delft

In text categorization (TC), which is a supervised technique, a feature vector of terms or phrases is usually used to represent the documents. Due to the huge number of terms in even a moderate-size text corpus, the high-dimensional feature space is an intrinsic problem in TC. The random subspace method (RSM), a technique that divides the feature space into smaller subspaces, each submitted to a (base) classifier (BC) in an ensemble, can be an effective approach to reduce the dimensionality of the feature space. Inspired by similar research on functional magnetic resonance imaging (fMRI) of the brain, here we address the estimation of the ensemble parameters, i.e., the ensemble size (L) and the dimensionality of the feature subsets (M), by defining three criteria: usability, coverage, and diversity of the ensemble. We show that a relatively medium M and a small L yield an ensemble that improves the performance of a single support vector machine, which is considered the state of the art in TC.
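
A compact way to realize a random subspace ensemble of this kind is scikit-learn's bagging with feature subsampling. The sketch below uses illustrative parameter values (and the keyword is base_estimator in scikit-learn versions before 1.2): each of the L base SVMs sees a random subset of M term features, and the ensemble aggregates their predictions by voting.

from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

def random_subspace_text_classifier(L=10, M=500):
    # L base classifiers, each trained on a random subset of M term features;
    # all documents are used (no sample bootstrap), only features are subsampled.
    return BaggingClassifier(
        estimator=LinearSVC(),
        n_estimators=L,
        max_features=M,
        bootstrap=False,
        bootstrap_features=False,
    )

# Usage: clf = random_subspace_text_classifier().fit(X_train, y_train)
# where X_train is the (documents x terms) matrix, e.g. from a TF-IDF vectorizer.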

13:30-16:30,Paper TuBCT9.51<br />

Shape-DNA: Effective Character Restoration and Enhancement for Arabic Text Documents<br />

Caner, Gulcin, Pol. Rain, Inc.<br />

Haritaoglu, Ismail, Pol. Rain, Inc.<br />

We present a novel learning-based image restoration and enhancement technique for improving the character recognition performance of OCR products on degraded documents or documents/text captured with mobile devices such as camera phones. The proposed technique is language independent and can simultaneously increase the effective resolution and restore broken characters, whether the artifacts are due to the image capturing device (such as a low-quality, low-resolution camera) or to previous pre-processing (such as extracting the text region from the document image). The technique develops a predictive relationship between high-resolution training images and their low-resolution/degraded counterparts, and exploits this relationship in a probabilistic scheme to generate a high-resolution image from a low-quality, low-resolution text image. We present a fast and scalable implementation of the proposed character restoration algorithm to improve text recognition for document/text images captured by mobile phones. Experimental results demonstrate that the system effectively increases OCR performance for documents captured by mobile imaging devices, from around 50% to over 80% for non-Latin document/scene text images at 120 dpi.<br />

13:30-16:30, Paper TuBCT9.52<br />

Linguistic Adaptation for Whole-Book Recognition<br />

Xiu, Pingping, Lehigh Univ.<br />

Baird, Henry, Lehigh Univ.<br />

Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images, using automatic adaptation to improve accuracy. Our algorithm expects to be given approximate iconic and linguistic models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries, and then, guided entirely by evidence internal to the test set, corrects the models, yielding improved accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. In previous work, we reported that adapting the iconic model alone (with a perfect linguistic model) was able to automatically reduce the word error rate on a 180-page book by a large factor. In this paper, we also adapt the linguistic model: we propose an algorithm that adapts both the iconic model and the linguistic model alternately, improving both models on the fly. The linguistic model adaptation method, which we report here, identifies new words and adds them to the dictionary. With 64.6% of the words missing from the dictionary, our previous algorithm reduced the word error rate from 40.2% to 23.2%. The new algorithm drives the word error rate down further, from 23.2% to 16.0%.<br />

13:30-16:30, Paper TuBCT9.53<br />

Online Arabic Handwriting Modeling System based on the Grapheme Segmentation<br />

Boubaker, Houcine, Univ. of Sfax<br />

El Baati, Abed El Karim, Univ. of Sfax<br />

Kherallah, Monji, Univ. of Sfax<br />

Alimi, Adel. M., Univ. of Sfax<br />

El Abed, Haikal, Braunschweig Tech. Univ.<br />

We present in this paper a new approach to online Arabic handwriting modeling based on grapheme segmentation. This segmentation rests on prior detection of the baseline. It involves the detection of two types of topologically meaningful points: the backs of the valleys adjoining the baseline, and the angular points. The feature extraction stage models the shapes of the segmented graphemes with relevant geometric parameters and estimates their diacritic fuzzy assignment rates. The test results show a significant improvement in recognition rate with the introduction of the new pertinent parameters.<br />

- 155 -


- 156 -


Technical Program for Wednesday<br />

August 25, 2010<br />

- 157 -


- 158 -


WeAT1 Marmara Hall<br />

Tracking and Surveillance - II Regular Session<br />

Session chair: Yilmaz, Alper (The Ohio State Univ.)<br />

09:00-09:20, Paper WeAT1.1<br />

The Fusion of Deep Learning Architectures and Particle Filtering Applied to Lip Tracking<br />

Carneiro, Gustavo, Tech. Univ. of Lisbon<br />

Nascimento, Jacinto, Inst. de Sistemas e Robótica<br />

This work introduces a new pattern recognition model for segmenting and tracking lip contours in video sequences. We<br />

formulate the problem as a general nonrigid object tracking method, where the computation of the expected segmentation<br />

is based on a filtering distribution. This is a difficult task because one has to compute the expected value using the whole<br />

parameter space of segmentation. As a result, we compute the expected segmentation using sequential Monte Carlo sampling<br />

methods, where the filtering distribution is approximated with a proposal distribution to be used for sampling. The<br />

key contribution of this paper is the formulation of this proposal distribution using a new observation model based on<br />

deep belief networks and a new transition model. The efficacy of the model is demonstrated in publicly available databases<br />

of video sequences of people talking and singing. Our method produces results comparable to those of state-of-the-art models, while showing potential to be more robust to imaging conditions.<br />

09:20-09:40, Paper WeAT1.2<br />

Robust Head-Shoulder Detection by PCA-Based Multilevel HOG-LBP Detector for People Counting<br />

Zeng, Chengbin, Beijing Univ. of Posts and Telecommunications<br />

Ma, Huadong, Beijing Univ. of Posts and Telecommunications<br />

Robustly counting the number of people for surveillance systems has widespread applications. In this paper, we propose a robust and rapid head-shoulder detector for people counting. By combining multilevel HOG (Histograms of Oriented Gradients) with multilevel LBP (Local Binary Pattern) as the feature set, we can detect the head-shoulders of people robustly, even when partial occlusions occur. To further improve the detection performance, Principal Component Analysis (PCA) is used to reduce the dimension of the multilevel HOG-LBP feature set. Our experiments show that the PCA-based multilevel HOG-LBP descriptors are more discriminative and more robust than state-of-the-art algorithms. For the application of real-time people-flow estimation, we also incorporate our detector into a particle filter tracker and achieve convincing accuracy.<br />
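
As a rough illustration of the feature pipeline described above (a single-level sketch only, not the authors' multilevel configuration), a HOG plus LBP-histogram descriptor followed by PCA and a linear SVM could look like this, assuming scikit-image and scikit-learn and hypothetical grayscale training windows:<br />

import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def hog_lbp(window):
    """Concatenate a HOG descriptor with a uniform-LBP histogram for one window."""
    h = hog(window, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(window, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])

# Hypothetical 64x64 grayscale windows with head-shoulder / background labels.
rng = np.random.RandomState(0)
windows = rng.rand(50, 64, 64)
labels = rng.randint(0, 2, size=50)

X = np.array([hog_lbp(w) for w in windows])
detector = make_pipeline(PCA(n_components=30), LinearSVC()).fit(X, labels)

The multilevel variant would repeat the same extraction at several cell and block sizes before the PCA step.<br />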

09:40-10:00, Paper WeAT1.3<br />

Adaptive Motion Model for Human Tracking using Particle Filter<br />

Ghaeminia, Mohammad Hossein, Iran Univ. of Science and Tech.<br />

Shabani, Amir-Hossein, Univ. of Waterloo<br />

Baradaran Shokouhi, Shahriar, Iran Univ. of Science and Tech.<br />

This paper presents a novel approach to modeling the complex motion of humans using a probabilistic autoregressive moving average model. The parameters of the model are adaptively tuned during the course of tracking by utilizing the main varying components of the pdf of the target’s acceleration and velocity. This motion model, along with a color histogram as the measurement model, has been incorporated into the particle filtering framework for human tracking. The proposed method is evaluated on the PETS benchmark, in which the targets have non-smooth motion and suddenly change their motion direction. Our method competes with state-of-the-art techniques for human tracking in real-world scenarios.<br />

10:00-10:20, Paper WeAT1.4<br />

Bayesian GOETHE Tracking<br />

Wirkert, Sebastian, Ec. Centrale de Lyon<br />

Dellandréa, Emmanuel, Ec. Centrale de Lyon<br />

Chen, Liming, Ec. Centrale de Lyon<br />

Occlusions pose serious challenges when tracking multiple targets. By severely changing the measurement, they imply strong inter-target dependencies. Exact computation of these dependencies is not feasible. The GOETHE approximations preserve much of the information while staying computationally affordable.<br />

- 159 -


10:20-10:40, Paper WeAT1.5<br />

A Combined Self-Configuring Method for Object Tracking in Colour Video<br />

Rosell Ortega, Juan, Pol. Univ. of Valencia<br />

Andreu-Garcia, Gabriela, Pol. Univ. of Valencia<br />

Rodas-Jordà, Angel, Pol. Univ. of Valencia<br />

Atienza-Vanacloig, Vicente, Pol. Univ. of Valencia<br />

This paper introduces a novel approach to background modelling. We first use a method to extract scene parameters from a sequence of frames. These parameters, together with an initial background model, are used as a starting point for a background subtraction method based on fuzzy logic. Our method permits modelling the background and detecting moving objects in a video sequence without user intervention. The algorithm is designed to work with CIEL*a*b* coordinates with multi-modal support, and avoids the user-defined parameters and fixed or probabilistic thresholds usually found in traditional background subtraction methods. Quantitative and qualitative results obtained on a well-known benchmark, and comparisons with other approaches, justify the model.<br />

WeAT2 Dolmabahçe Hall A<br />

Shape Modeling - I Regular Session<br />

Session chair: De Floriani, L.<br />

09:00-09:20, Paper WeAT2.1<br />

A Geometric Invariant Shape Descriptor based on the Radon, Fourier, and Mellin Transforms<br />

Hoang, Thai V., Univ. Nancy 2-LORIA<br />

Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />

A new shape descriptor invariant to geometric transformations, based on the Radon, Fourier, and Mellin transforms, is proposed. The Radon transform converts a geometric transformation applied to a shape image into transformations of the columns and rows of the Radon image. Invariance to translation, rotation, and scaling is obtained by applying 1D Fourier-Mellin and Fourier transforms to the columns and rows of the shape’s Radon image, respectively. Experimental results on different datasets show the usefulness of the proposed shape descriptor.<br />

09:20-09:40, Paper WeAT2.2<br />

Fundamental Geodesic Deformations in Spaces of Treelike Shapes<br />

Feragen, Aasa, Univ. of Copenhagen<br />

Lauze, Francois, Univ. of Copenhagen<br />

Nielsen, Mads<br />

This paper presents a new geometric framework for analysis of planar treelike shapes for applications such as shape matching,<br />

recognition and morphology, using the geometry of the space of treelike shapes. Mathematically, the shape space is<br />

given the structure of a stratified set which is a quotient of a normed vector space with a metric inherited from the vector<br />

space norm. We give examples of geodesic paths in tree-space corresponding to fundamental deformations of small trees,<br />

and discuss how these deformations are key building blocks for understanding deformations between larger trees.<br />

09:40-10:00, Paper WeAT2.3<br />

Shape Interpolation with Flattenings<br />

Meyer, Fernand, Mines-ParisTech<br />

This paper presents the binary flattenings of shapes, first as a connected operator suppressing particles or holes, and second as an erosion in a particular lattice of shapes. Using this erosion, it is then possible to construct a distance from one shape to another and to derive from it an interpolation function between shapes.<br />

10:00-10:20, Paper WeAT2.4<br />

Circularity Measuring in Linear Time<br />

Nguyen, Thanh Phuong, LORIA<br />

Debled-Rennesson, Isabelle, LORIA - Nancy Univ.<br />

We propose a new circularity measure, inspired by the shape matching tools of Arkin [Arkin91] and Latecki [Latecki00], that is constructed in a tangent space. We then introduce a linear-time algorithm that uses this measure to assess circularity. The method can also be regarded as a method for circular object recognition. Experimental results show the robustness of this simple method.<br />

10:20-10:40, Paper WeAT2.5<br />

Multiscale Analysis from 1D Parametric Geometric Decomposition of Shapes<br />

Feschet, Fabien, Univ. d’Auvergne Clermont-Ferrand 1<br />

This paper deals with the construction of a non-parametric multiscale analysis from a 1D parametric decomposition of shapes in which the elements of the decomposition are geometric primitives. We focus on the case of linear structures in shapes, but our construction readily extends to any geometric primitives. One key point of the construction is that it is truly multiscale, in the sense that a higher level is a sublevel of a lower one, and that it preserves the symmetries of shapes. We present experiments showing the simplification it provides on classical shapes; the results are promising.<br />

WeAT3 Dolmabahçe Hall B<br />

Image and Physics-Based Modeling Regular Session<br />

Session chair: Heyden, Anders (Lund Univ.)<br />

09:00-09:20, Paper WeAT3.1<br />

Region-Based Image Transform for Transition between Object Appearances<br />

Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />

Kono, Yuki, Nagoya Univ.<br />

Ide, Ichiro, Nagoya Univ.<br />

Murase, Hiroshi, Nagoya Univ.<br />

We propose a method of region-based image transformation to achieve accurate transitions between object appearances. The view-transition model (VTM) is a statistical method that learns appearance transitions from a sample image dataset of a large number of objects with various appearances. However, the VTM method has a practical problem: the appearance transition cannot be performed accurately if a sufficient number of learning samples is not available in the dataset. To cope with this problem, the proposed method first determines the regions of the input and output images whose pixel values mutually affect each other during the appearance transition, and then transforms iteratively between partial images in those regions. We conducted experiments using actual image datasets. The results show that the proposed method transforms appearances more accurately than the VTM method.<br />

09:20-09:40, Paper WeAT3.2<br />

Extended Multiple View Geometry for Lights and Cameras from Photometric and Geometric Constraints<br />

Kato, Kazuki, Nagoya Inst. of Tech.<br />

Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />

Sato, Jun, Nagoya Inst. of Tech.<br />

In this paper, we derive a novel multilinear relationship for close light sources and cameras. In this multilinear relationship,<br />

image intensities and image point coordinates can be handled in a single framework. We first derive a linear representation<br />

of image intensity taken under a general close light source. We next analyze multiple view geometry among close light<br />

sources and cameras, and derive novel multilinear constraints among image intensity and image coordinates. In particular,<br />

we study in detail the multilinear relationship among seven lights and a camera. Finally, we show some experimental<br />

results, and show that the new multilinear relationship can be used for linearly generating images illuminated by arbitrary<br />

close light sources.<br />

09:40-10:00, Paper WeAT3.3<br />

Near-Regular BTF Texture Model<br />

Haindl, Michael, Inst. of Information Theory and Automation<br />

Hatka, Martin, Inst. of Information Theory and Automation<br />

In this paper we present a method for the seamless enlargement and editing of the intricate near-regular type of bidirectional texture function (BTF), which simultaneously contains both regular periodic and stochastic components. Such BTF textures cannot be convincingly synthesised using either simple tiling or purely stochastic models. However, these textures are ubiquitous in many man-made environments and also in some natural scenes, so realistic visualisation of their appearance is required. The principle of the presented BTF-NR synthesis and editing method is to automatically separate the periodic and random components from one or more input textures. Each of these components is subsequently modelled independently using its corresponding optimal method. The regular texture part is modelled using our roller method, while the random part is synthesised from its estimated, exceptionally efficient Markov random field based representation. Both independently enlarged texture components, derived from the original measured textures representing one (enlargement) or several (editing) materials, are combined in the resulting synthetic near-regular texture.<br />

10:00-10:20, Paper WeAT3.4<br />

Detecting Vorticity in Optical Flows of Fluids<br />

Doshi, Ashish, Univ. of Surrey<br />

Bors, Adrian, Univ. of York<br />

In this paper we apply the diffusion framework to dense optical flow estimation. Local image information is represented by matrices of gradients between paired locations. Diffusion distances are modelled as sums of eigenvectors weighted by their eigenvalues, extracted following the eigendecomposition of these matrices. Local optical flow is estimated by correlating diffusion distances characterizing features from different frames. A feature confidence factor is defined based on the local correlation efficiency compared to that of its neighbourhood. High-confidence optical flow estimates are propagated to areas of lower confidence.<br />

10:20-10:40, Paper WeAT3.5<br />

Modeling Facial Skin Motion Properties in Video and its Application to Matching Faces across Expressions<br />

Manohar, Vasant, Raytheon BBN Tech.<br />

Shreve, Matthew, Univ. of South Florida<br />

Goldgof, Dmitry, Univ. of South Florida<br />

Sarkar, Sudeep, Univ. of South Florida<br />

In this paper, we propose a method to model the material constants (Young’s modulus) of the skin in subregions of the<br />

face from the motion observed in multiple facial expressions and present its relevance to an image analysis task such as<br />

face verification. On a public database consisting of 40 subjects undergoing some set of facial motions associated with<br />

anger, disgust, fear, happiness, sadness, and surprise expressions, we present an expression-invariant strategy for matching faces<br />

using the Young’s modulus of the skin. Results show that it is indeed possible to match faces across expressions using the<br />

material properties of their skin.<br />

WeAT4 Topkapı Hall A<br />

Kernel Methods Regular Session<br />

Session chair: Aksoy, Selim (Bilkent Univ.)<br />

09:00-09:20, Paper WeAT4.1<br />

AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach<br />

Zhang, Ziming, Simon Fraser Univ.<br />

Li, Ze-Nian, Simon Fraser Univ.<br />

Drew, Mark S.<br />

In this paper, we propose a novel large-margin based approach for multiple kernel learning (MKL) using biconvex optimization,<br />

called Adaptive Multiple Kernel Learning (AdaMKL). To learn the weights for support vectors and the kernel<br />

coefficients, AdaMKL minimizes the objective function alternately by learning one component while fixing the other at a<br />

time, and in this way only one convex formulation needs to be solved. We also propose a family of biconvex objective<br />

functions with an arbitrary Lp-norm (p>=1) of kernel coefficients. As our experiments show, AdaMKL performs comparably<br />

with state-of-the-art convex optimization based MKL approaches, but its learning is much simpler and faster.<br />

- 162 -


09:20-09:40, Paper WeAT4.2<br />

Von Mises-Fisher Mean Shift for Clustering on a Hypersphere<br />

Kobayashi, Takumi, Nat. Inst. of Advanced Industrial Science<br />

Otsu, Nobuyuki, Nat. Inst. of Advanced Industrial Science<br />

We propose a method of clustering sample vectors on a hypersphere. Sample vectors are normalized in many cases, especially<br />

when applying kernel functions, and thus lie on a (unit) hypersphere. Considering the constraint of the hypersphere,<br />

the proposed method utilizes the von Mises-Fisher distribution in the framework of mean shift. It is also extended to the<br />

kernel-based clustering method via kernel tricks to cope with complex distributions. The algorithms of the proposed methods<br />

are based on simple matrix calculations. In the experiments, including a practical motion clustering task, the proposed<br />

methods produce favorable clustering results.<br />
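
For intuition, the core update of a von Mises-Fisher mean shift, i.e., a mean shift whose kernel weights depend on the cosine similarity between points on the unit hypersphere and whose iterates are re-projected onto the sphere, can be sketched as follows (a simplified illustration under an assumed concentration parameter kappa, not the authors' exact algorithm):<br />

import numpy as np

def vmf_mean_shift(X, kappa=20.0, n_iter=100, tol=1e-8):
    """Mean shift on the unit hypersphere with a von Mises-Fisher kernel.

    X: (n, d) array of sample vectors; each row is normalized to unit length.
    Returns one converged mode per sample.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    modes = X.copy()
    for _ in range(n_iter):
        weights = np.exp(kappa * modes @ X.T)        # vMF kernel responses, shape (n, n)
        shifted = weights @ X                        # weighted sums of the samples
        shifted /= np.linalg.norm(shifted, axis=1, keepdims=True)  # project back onto the sphere
        if np.max(np.abs(shifted - modes)) < tol:
            return shifted
        modes = shifted
    return modes

Samples whose converged modes are nearly identical (e.g. cosine similarity above a threshold) would then be grouped into the same cluster.<br />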

09:40-10:00, Paper WeAT4.3<br />

Nonlinear Mappings for Generative Kernels on Latent Variable Models<br />

Carli, Anna, Univ. of Verona<br />

Bicego, Manuele, Univ. of Verona<br />

Baldo, Sisto, Univ. of Verona<br />

Murino, Vittorio, Univ. of Verona<br />

Generative kernels have emerged in recent years as an effective method for mixing discriminative and generative approaches. In particular, in this paper we focus on kernels defined on generative models with latent variables (e.g. the states of a Hidden Markov Model). The basic idea underlying these kernels is to compare objects, via an inner product, in a feature space whose dimensions are related to the latent variables of the model. Here we propose to enhance these kernels via a nonlinear normalization of the space, namely a nonlinear mapping of the space dimensions able to exploit their discriminative characteristics. We investigate three possible nonlinear mappings for two HMM-based generative kernels, testing them on different sequence classification problems, with very promising results.<br />

10:00-10:20, Paper WeAT4.4<br />

Multiple Kernel Learning with High Order Kernels<br />

Wang, Shuhui, Chinese Acad. of Sciences<br />

Jiang, Shuqiang, Chinese Acad. of Sciences<br />

Huang, Qingming, Chinese Acad. of Sciences<br />

Tian, Qi, Univ. of Texas at San Antonio<br />

Previous Multiple Kernel Learning approaches (MKL) employ different kernels by their linear combination. Though some<br />

improvements have been achieved over methods using single kernel, the advantages of employing multiple kernels for<br />

machine learning are far from being fully developed. In this paper, we propose to use high order kernels to enhance the<br />

learning of MKL when a set of original kernels is given. High-order kernels are generated as products of real powers<br />

of the original kernels. We incorporate the original kernels and high order kernels into a unified localized kernel logistic<br />

regression model. To avoid over-fitting, we apply group LASSO regularization to the kernel coefficients of each training<br />

sample. Experiments on image classification prove that our approach outperforms many of the existing MKL approaches.<br />

10:20-10:40, Paper WeAT4.5<br />

Kernel-Based Implicit Regularization of Structured Objects<br />

Dupé, François-Xavier, GREYC<br />

Bougleux, Sébastien, Univ. de Caen<br />

Brun, Luc, ENSICAEN<br />

Lezoray, Olivier, Univ. de Caen<br />

Elmoataz, Abderrahim, Univ. de Caen<br />

Weighted graph regularization provides a rich framework for regularizing functions defined over the vertices of a weighted graph. Until now, such a framework has only been defined for real or multivalued functions, thereby restricting the regularization framework to numerical data. On the other hand, several kernels have been defined on structured objects such as strings or graphs. Using positive definite kernels, each original object is associated via the “kernel trick” with one element of a Hilbert space. Consequently, this paper proposes to extend the weighted graph regularization framework to objects implicitly defined by their kernel, thereby performing the regularization within the Hilbert space associated with the kernel. This work opens the door to the regularization of structured objects.<br />

- 163 -


WeAT5 Topkapı Hall B<br />

Face Analysis Regular Session<br />

Session chair: Lovell, Brian Carrington (The Univ. of Queensland)<br />

09:00-09:20, Paper WeAT5.1<br />

Face Sketch Synthesis via Sparse Representation<br />

Chang, Liang, Beijing Normal Univ.<br />

Zhou, Mingquan, Beijing Normal Univ.<br />

Han, Yanjun, Chinese Acad. of Sciences<br />

Deng, Xiaoming, Chinese Acad. of Sciences<br />

Face sketch synthesis from a photo is challenging because the psychological mechanism of sketch generation is difficult to express precisely with rules. Current learning-based sketch synthesis methods concentrate on learning the rules by optimizing cost functions with low-level image features. In this paper, a new face sketch synthesis method is presented, inspired by recent advances in sparse signal representation and by evidence from neuroscience that the human brain probably perceives images using sparse high-level features. Sparse representations are desirable in sketch synthesis because sparseness adaptively selects the most relevant samples, which give the best representation of the input photo. We assume that a face photo patch and its corresponding sketch patch share the same sparse representation. In the feature extraction, we select succinct high-level features using the sparse coding technique, and in the sketch synthesis process each sketch patch is synthesized with respect to high-level features by solving an l1-norm optimization. Experiments on the CUHK database show that our method can resemble the true sketch fairly well.<br />
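
Under the shared-sparse-code assumption stated in the abstract, a patch-level synthesis step can be sketched as below (an illustrative reconstruction only; the coupled dictionaries D_photo and D_sketch, whose columns are corresponding training photo and sketch patches, and the penalty alpha are hypothetical):<br />

import numpy as np
from sklearn.linear_model import Lasso

def synthesize_sketch_patch(photo_patch, D_photo, D_sketch, alpha=0.05):
    """Code the photo patch over the photo dictionary (l1-regularized least squares),
    then rebuild the sketch patch from the sketch dictionary with the same sparse code."""
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    lasso.fit(D_photo, photo_patch)   # columns of D_photo act as dictionary atoms
    code = lasso.coef_                # shared sparse representation
    return D_sketch @ code            # corresponding sketch reconstruction

# Hypothetical 8x8 patches flattened to length-64 vectors, 200 atoms per dictionary.
rng = np.random.RandomState(0)
D_photo, D_sketch = rng.rand(64, 200), rng.rand(64, 200)
print(synthesize_sketch_patch(rng.rand(64), D_photo, D_sketch).shape)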

09:20-09:40, Paper WeAT5.2<br />

Restoration of a Frontal Illuminated Face Image based on KPCA<br />

Xie, Xiaohua, Sun Yat-sen Univ.<br />

Zheng, Wei-Shi, Queen Mary Univ. of London<br />

Lai, Jian-Huang, Sun Yat-sen Univ.<br />

Suen, Ching Y.<br />

In this paper, we propose a novel illumination-normalization method. By using the combination of the Kernel Principal<br />

Component Analysis (KPCA) and Pre-image technology, this method can restore the frontal-illuminated face image from<br />

a single non-frontal-illuminated face image. In this method, a frontal-illumination subspace is first learned by KPCA. For<br />

each input face image, we project its large-scale features, which are affected by illumination variations, onto this subspace<br />

to normalize the illumination. Then the frontal-illuminated face image is reconstructed by combining the small- and the<br />

normalized large-scale features. Unlike most existing techniques, the proposed method does not require any shape modeling<br />

or lighting estimation. As a holistic reconstruction, KPCA+Pre-image technology incurs less local distortion. Compared<br />

to directly applying KPCA+Pre-image technology on the original image, our proposed method can be better at<br />

processing an image of a face that is outside the training set. Experiments on CMU-PIE and Extended Yale B face databases<br />

show that the proposed method outperforms state-of-the-art algorithms.<br />

09:40-10:00, Paper WeAT5.3<br />

A Bayesian Approach to Face Hallucination using DLPP and KRR<br />

Tanveer, Muhammad, National Univ. of Science and Tech.<br />

Rao, Naveed Iqbal, National Univ. of Sciences and Tech.<br />

Low-resolution faces are the main barrier to efficient face recognition and identification in several problems, primarily surveillance systems. To mitigate this problem we propose a novel learning-based two-step approach using Direct Locality Preserving Projections (DLPP), maximum a posteriori (MAP) estimation, and Kernel Ridge Regression (KRR) for super-resolution of face images, in other words face hallucination. First, using DLPP for manifold learning together with MAP estimation, a smooth global high-resolution image is obtained. In the second step, to introduce high-frequency components, KRR is used to model the residue high-resolution image, which is then added to the global image to obtain the final, detailed hallucinated face image. As shown in the experimental results, the proposed system is robust and efficient in synthesizing, from low-resolution faces, images similar to the original high-resolution faces.<br />

- 164 -


10:00-10:20, Paper WeAT5.4<br />

Face Hallucination under an Image Decomposition Perspective<br />

Liang, Yan, Sun Yat-sen Univ.<br />

Lai, Jian-Huang, Sun Yat-sen Univ.<br />

Xie, Xiaohua, Sun Yat-sen Univ.<br />

Liu, Wanquan, Curtin Univ. of Tech.<br />

In this paper we propose to convert the task of face hallucination into an image decomposition problem, and then use the<br />

morphological component analysis (MCA) for hallucinating a single face image, based on a novel three-step framework.<br />

Firstly, a low-resolution input image is up-sampled by interpolation. Then, the MCA is employed to decompose the interpolated<br />

image into a high-resolution image and an unsharp masking, as MCA can properly decompose a signal into special<br />

parts according to typical dictionaries. Finally, a residue compensation, which is based on the neighbor reconstruction of<br />

patches, is performed to enhance the facial details. The proposed method can effectively exploit the facial properties for<br />

face hallucination under the image decomposition perspective. Experimental results demonstrate the effectiveness of our<br />

method, in terms of the visual quality of the hallucinated face images.<br />

10:20-10:40, Paper WeAT5.5<br />

Gender Classification using Local Directional Pattern (LDP)<br />

Jabid, Taskeed, Kyung Hee Univ.<br />

Kabir, Md. Hasanul, Kyung Hee Univ.<br />

Chae, Oksam, Kyung Hee Univ.<br />

In this paper, we present a novel texture descriptor, the Local Directional Pattern (LDP), to represent facial images for gender classification. The face area is divided into small regions, from which LDP histograms are extracted and concatenated into a single vector to efficiently represent the face image. The classification is performed using support vector machines (SVMs), which have been shown to be superior to traditional pattern classifiers for the gender classification problem. Experimental results show the superiority of the proposed method on images collected from the FERET face database, achieving 95.05% accuracy.<br />
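
A compact way to see how an LDP code is formed, per pixel, is to convolve with the eight Kirsch compass masks and set bits for the k strongest absolute responses. The sketch below assumes the common k = 3 formulation and is illustrative rather than the authors' exact implementation; region-wise histograms of these codes, concatenated, would then feed the SVM.<br />

import numpy as np
from scipy.ndimage import convolve

# Eight Kirsch compass masks (rotations of the same 5/-3 edge template).
KIRSCH = [np.array(m, dtype=float) for m in (
    [[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]],
    [[-3, 5, 5], [-3, 0, 5], [-3, -3, -3]],
    [[5, 5, 5], [-3, 0, -3], [-3, -3, -3]],
    [[5, 5, -3], [5, 0, -3], [-3, -3, -3]],
    [[5, -3, -3], [5, 0, -3], [5, -3, -3]],
    [[-3, -3, -3], [5, 0, -3], [5, 5, -3]],
    [[-3, -3, -3], [-3, 0, -3], [5, 5, 5]],
    [[-3, -3, -3], [-3, 0, 5], [-3, 5, 5]],
)]

def ldp_image(gray, k=3):
    """Per-pixel LDP code: bits mark the k largest absolute Kirsch edge responses."""
    resp = np.stack([np.abs(convolve(gray.astype(float), m)) for m in KIRSCH], axis=-1)
    kth = np.sort(resp, axis=-1)[..., -k][..., None]      # k-th largest response per pixel
    bits = (resp >= kth).astype(np.uint16)
    return (bits * (1 << np.arange(8))).sum(axis=-1)      # 8-bit directional code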

WeAT6 Anadolu Auditorium<br />

Document Analysis - I Regular Session<br />

Session chair: Baird, Henry (Lehigh Univ.)<br />

09:00-09:20, Paper WeAT6.1<br />

Generating Sets of Classifiers for the Evaluation of Multi-Expert Systems<br />

Impedovo, Donato, Pol. di Bari<br />

Pirlo, Giuseppe, Univ. degli Studi di Bari<br />

This paper addresses the problem of multi-classifier system evaluation via artificially generated classifiers. For this purpose,<br />

a new technique is presented for the generation of sets of artificial abstract-level classifiers with different characteristics<br />

at the individual-level (i.e. recognition performance) and at the collective-level (i.e. degree of similarity). The technique<br />

has been used to generate sets of classifiers simulating different working conditions in which the performance of combination<br />

methods can be estimated. The experimental tests demonstrate the effectiveness of the approach in generating simulated<br />

data useful to investigate the performance of combination methods for abstract-level classifiers.<br />

09:20-09:40, Paper WeAT6.2<br />

Imbalance and Concentration in K-NN Classification<br />

Yin, Dawei, Lehigh Univ.<br />

An, Chang, Lehigh Univ.<br />

Baird, Henry, Lehigh Univ.<br />

We propose algorithms for ameliorating difficulties in fast approximate k-nearest-neighbor (kNN) classifiers that arise from imbalances among classes in numbers of samples, and from concentrations of samples in small regions of feature space. These problems can occur with a wide range of binning kNN algorithms such as k-D trees and our variant, hashed k-D trees. The principal method we discuss automatically rebalances the training data and estimates the concentration in each k-D hash bin separately, which then controls how many samples should be kept in each bin. We report an experiment on 86.7M training samples which shows a 7-times speedup and higher minimum per-class recall compared to previously reported methods. The context of these experiments is the need for image classifiers able to handle an unbounded variety of inputs: in our case, highly versatile document classifiers which require training sets as large as a billion training samples.<br />

09:40-10:00, Paper WeAT6.3<br />

Gaussian Mixture Models for Arabic Font Recognition<br />

Slimane, Fouad, Univ. of Fribourg<br />

Kanoun, Slim, ENIS<br />

Alimi, Adel M., Univ. of Sfax<br />

Ingold, Rolf, Univ. of Fribourg<br />

Hennebert, Jean, Univ. of Applied Sciences<br />

We present in this paper a new approach to Arabic font recognition. Our proposal is to use a fixed-length sliding window for the feature extraction and to model the feature distributions with Gaussian Mixture Models (GMMs). This approach presents a double advantage. First, we do not need to perform an a priori segmentation into characters, which is a difficult task for Arabic text. Second, we use versatile and powerful GMMs able to finely model feature distributions in large multidimensional input spaces. We report on the evaluation of our system on the APTI (Arabic Printed Text Image) database using 10 different fonts and 10 font sizes. Considering the variability of the different font shapes and the fact that our system is independent of the font size, the obtained results are convincing and compare well with competing systems.<br />
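
Schematically, this kind of classification rule amounts to fitting one GMM per font on sliding-window feature vectors and picking the font whose model gives the highest total log-likelihood for a test line. A minimal sketch with scikit-learn, assuming a hypothetical feature extractor that turns each text-line image into an (n_windows, n_features) array, is:<br />

import numpy as np
from sklearn.mixture import GaussianMixture

def train_font_models(features_by_font, n_components=16):
    """Fit one diagonal-covariance GMM per font on its sliding-window feature vectors."""
    return {
        font: GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=0).fit(feats)
        for font, feats in features_by_font.items()
    }

def recognize_font(models, line_windows):
    """Pick the font whose GMM assigns the highest summed log-likelihood to the line."""
    return max(models, key=lambda f: models[f].score_samples(line_windows).sum())

# Hypothetical toy data: two fonts, 30-dimensional window features.
rng = np.random.RandomState(0)
models = train_font_models({"FontA": rng.rand(400, 30), "FontB": rng.rand(400, 30) + 0.5})
print(recognize_font(models, rng.rand(50, 30) + 0.5))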

10:00-10:20, Paper WeAT6.4<br />

Transfer of Supervision for Improved Address Standardization<br />

Kothari, Govind, IBM<br />

Faruquie, Tanveer, IBM Res. India<br />

Subramaniam, L. Venkata, IBM Res. India<br />

K, Hima Prasad, IBM Res. India<br />

Mohania, Mukesh, IBM Res. India<br />

Address cleansing is very challenging, particularly for geographies with high variability in how addresses are written. Supervised learners can easily be trained for different data sources; however, training requires labeling a large corpus for each data source, which is time consuming and labor intensive to create. We propose a method to automatically transfer supervision from a given labeled source to an unlabeled target source using a hierarchical Dirichlet process. Each Dirichlet process models data from one source. The component distribution shared across these Dirichlet processes captures the semantic relation between data sources. A feature projection on the component distributions from multiple sources is used to transfer supervision.<br />

10:20-10:40, Paper WeAT6.5<br />

Bag of Characters and SOM Clustering for Script Recognition and Writer Identification<br />

Marinai, Simone, Univ. of Florence<br />

Miotti, Beatrice, Univ. of Florence<br />

Soda, Giovanni, Univ. di Firenze<br />

In this paper, we describe a general approach for script (and language) recognition in printed documents and for writer identification in handwritten documents. The method is based on a bag-of-visual-words strategy where the visual words correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words, in the case of script recognition) are classified by comparing their vector representations with those of a training set using cosine similarity. The comparison is improved using a similarity score that takes into account the SOM organization of the cluster centroids. Promising results are presented for both printed documents and handwritten musical scores.<br />

WeAT7 Dolmabahçe Hall C<br />

Gait and Gesture Regular Session<br />

Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />

- 166 -


09:00-09:20, Paper WeAT7.1<br />

Multi-View Gait Recognition based on Motion Regression using Multilayer Perceptron<br />

Kusakunniran, Worapan, Univ. of New South Wales<br />

Wu, Qiang, Univ. of Tech. Sydney<br />

Zhang, Jian, National ICT Australia<br />

Li, Hongdong, Australian National Univ.<br />

It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, obtaining reliable gait features when the viewing angle changes is a challenging problem, because the body appearance can differ under various viewing angles. In this paper, the problem above is formulated as a regression problem in which a novel View Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates the gait feature under an unknown viewing angle based on motion information in a well-selected Region of Interest (ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have been obtained on a widely adopted benchmark database.<br />
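
The regression view of the VTM can be pictured as learning a mapping from gait features observed at one viewing angle to the same subject's features at a reference angle. A hedged sketch with scikit-learn's MLPRegressor, using hypothetical feature arrays paired across views, would be:<br />

import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical paired training data: row i holds the same subject's gait feature
# vector observed at a side view (X_side) and at the reference view (X_ref).
rng = np.random.RandomState(0)
X_side = rng.rand(300, 128)
X_ref = rng.rand(300, 128)

# View Transformation Model as a multilayer-perceptron regression.
vtm = MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000, random_state=0)
vtm.fit(X_side, X_ref)

# A probe captured from the side view is mapped to the reference view
# before gait similarity (e.g. distance to gallery features) is computed.
probe_in_ref_view = vtm.predict(rng.rand(1, 128))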

09:20-09:40, Paper WeAT7.2<br />

Robust Gait Recognition against Speed Variation<br />

Aqmar, Muhammad Rasyid, Tokyo Inst. of Tech.<br />

Shinoda, Koichi, Tokyo Inst. of Tech.<br />

Furui, Sadaoki<br />

Variations in walking speed have a strong impact on gait recognition. We propose a gait recognition method that is robust against walking-speed variations. It is built on a combination of Fisher discriminant analysis (FDA)-based cubic higher-order local auto-correlation (CHLAC) and the statistical framework provided by hidden Markov models (HMMs). The HMMs in this method identify the phase of each gait even when the walking speed changes nonlinearly, and the CHLAC features capture the within-phase spatio-temporal characteristics of each individual. We compared the performance of our method with other conventional methods using three different databases, i.e., USH, USF-NIST, and Tokyo Tech DB. Ours was equal to or better than the others when the speed did not change too much, and was significantly better when the speed varied across and within a gait sequence.<br />

09:40-10:00, Paper WeAT7.3<br />

Gait Recognition using Period-Based Phase Synchronization for Low Frame-Rate Videos<br />

Mori, Atsushi, Osaka Univ.<br />

Makihara, Yasushi, The Inst. of Scientific and Industrial Res. Univ.<br />

Yagi, Yasushi, Osaka Univ.<br />

This paper proposes a method for period-based gait trajectory matching in the eigenspace, using phase synchronization, for low frame-rate videos. First, the gait period is detected by maximizing the normalized autocorrelation of the gait silhouette sequence along the temporal axis. Next, the gait silhouette sequence is expressed as a trajectory in the eigenspace, and the gait phase is synchronized by time stretching and time shifting of the trajectory based on the detected period. In addition, multiple period-based matching results are integrated via statistical procedures for more robust matching in the presence of fluctuations among gait sequences. Results of experiments conducted with 185 subjects to evaluate the performance of gait verification at various spatial and temporal resolutions demonstrate the effectiveness of the proposed method.<br />
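
The first step, detecting the gait period by maximizing the normalized autocorrelation of the silhouette sequence along the temporal axis, can be sketched as follows (an illustrative implementation with an assumed search range of candidate periods, not the authors' exact code):<br />

import numpy as np

def gait_period(silhouettes, min_period=15, max_period=50):
    """Return the frame shift that maximizes the normalized temporal autocorrelation.

    silhouettes: (T, H, W) binary silhouette sequence, with T > max_period.
    """
    S = silhouettes.reshape(len(silhouettes), -1).astype(float)
    S -= S.mean(axis=0)                       # remove the static component per pixel
    best_p, best_score = min_period, -np.inf
    for p in range(min_period, max_period + 1):
        a, b = S[:-p], S[p:]                  # sequence vs. itself shifted by p frames
        score = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if score > best_score:
            best_p, best_score = p, score
    return best_p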

10:00-10:20, Paper WeAT7.4<br />

Body Motion Analysis for Multi-Modal Identity Verification<br />

Williams, George, NYU<br />

Taylor, Graham, NYU<br />

Smolskiy, Kirill, NYU<br />

Bregler, Christoph, NYU<br />

This paper shows how Body Motion Signature Analysis, a new soft-biometrics technique, can be used for identity verification. It extracts motion features from the upper body of people and estimates so-called super-features for input to a classifier. We demonstrate how this new technique can be used to identify people based on their motion alone, or to significantly improve hard-biometrics techniques. For example, face verification achieves a 6.45% Equal Error Rate (EER) on this domain, and the combined verification performance of motion features and face reduces the error to 4.96% using an adaptive score-level integration method. The more ambiguous motion-only performance is 17.1% EER.<br />

10:20-10:40, Paper WeAT7.5<br />

Robust Sign Language Recognition with Hierarchical Conditional Random Fields<br />

Yang, Hee-Deok, Chosun Univ.<br />

Lee, Seong-Whan, Korea Univ.<br />

Sign language spotting is the task of detecting and recognizing signs (words in a predefined vocabulary) and fingerspellings (combinations of continuous alphabets that are not found in signs) in a signed utterance. The internal structures of signs and fingerspellings differ significantly; therefore, it is difficult to spot signs and fingerspellings simultaneously. In this paper, a novel method for spotting signs and fingerspellings is proposed, which can distinguish signs, fingerspellings, and non-sign patterns. This is achieved through a hierarchical framework consisting of three steps: (1) candidate segments of signs and fingerspellings are discriminated with a two-layer conditional random field (CRF); (2) hand shapes of detected signs and fingerspellings are verified by BoostMap embeddings; (3) the motions of fingerspellings are verified in order to distinguish those which have similar hand shapes and differ only in hand trajectories. Experiments demonstrate that the proposed method can spot signs and fingerspellings from utterance data at rates of 83% and 78%, respectively.<br />

WeAT8 Upper Foyer<br />

Image and Video Processing Poster Session<br />

Session chair: Koch, Reinhard (Univ. of Kiel)<br />

09:00-11:10, Paper WeAT8.1<br />

Compressive Sampling Recovery for Natural Images<br />

Shang, Fei, Beijing Inst. of Tech.<br />

Du, Huiqian, Beijing Inst. of Tech.<br />

Jia, Yunde, Beijing Inst. of Tech.<br />

Compressive sampling (CS) is a novel data collection and coding theory which allows us to recover sparse or compressible signals from a small set of measurements. This paper presents a new model for natural image recovery, in which the smoothed l0 norm and the approximate total-variation (TV) norm are adopted simultaneously. By using first-order gradient descent, the speed of the algorithm for this new model can be guaranteed. Experimental results demonstrate that the principle of the model is correct and the performance is as good as that of the TV-based model. The computing speed of the proposed method is two orders of magnitude faster than that of the interior point method and two times faster than that of the NESTA optimization based on the TV model.<br />

09:00-11:10, Paper WeAT8.3<br />

De-Ghosting for Image Stitching with Automatic Content-Awareness<br />

Tang, Yu, The Univ. of Aizu<br />

Shin, Jungpil, The Univ. of Aizu<br />

Ghosting artifacts are a common problem in image stitching, and eliminating them is not an easy task. In this paper, we propose an intuitive technique that chooses a stitching line based on a novel energy map, which is essentially a combination of a gradient map, indicating the presence of structures, and a prominence map, determining the attractiveness of a region. We consider a region significant only if it is both structural and attractive. Using this improved energy map, the stitching line can easily skirt around moving objects or salient parts, based on the observation that human eyes mostly notice only the salient features of an image. We compare the results of our method with those of four state-of-the-art image stitching methods; our method outperforms all four in removing ghosting artifacts.<br />

09:00-11:10, Paper WeAT8.4<br />

Content-Adaptive Automatic Image Sharpening<br />

Kobayashi, Tatsuya, Nagoya City Univ.<br />

Tajima, Johji, Nagoya City Univ.<br />

Optimal sharpness differs from image to image, depending on the content. In general, human observers prefer images of artificial objects to be sharper and images of natural objects to be less sharp. We have developed a content-adaptive automatic image sharpening algorithm that relies on the length of lines extracted from the image. It is applicable to images with various regions, such as those containing both natural and artificial objects. The proposed algorithm is expected to be used in the image processing modules of image input/output devices, e.g. digital cameras and printers.<br />

09:00-11:10, Paper WeAT8.5<br />

Irradiance Preserving Image Interpolation<br />

Giachetti, Andrea, Univ. di Verona<br />

In this paper we present a new image up-scaling (single-image super-resolution) algorithm. It is based on the refinement<br />

of a simple pixel decimation followed by an optimization step maximizing the smoothness of the second order derivatives<br />

of the image intensity while keeping the sum of the brightness values of each subdivided pixel (i.e. the estimated irradiance<br />

on the area) constant. The method is physically grounded and creates images that appear very sharp and with reduced artifacts.<br />

Subjective and objective tests demonstrate the high quality of the results obtained.<br />

09:00-11:10, Paper WeAT8.7<br />

Interpolation and Sampling on a Honeycomb Lattice<br />

Strand, Robin, Uppsala Univ.<br />

In this paper, we focus on the three-dimensional honeycomb point-lattice in which the Voronoi regions are hexagonal<br />

prisms. The ideal interpolation function is derived by using a Fourier transform of the sampling lattice. From these results,<br />

the sampling efficiency of the lattice follows.<br />

09:00-11:10, Paper WeAT8.8<br />

Optimization of Topological Active Models with Multiobjective Evolutionary Algorithms<br />

Novo Buján, Jorge, Varpa group, Univ. of A Coruña<br />

Santos, Jose, Univ. of A Coruña<br />

Gonzalez Penedo, Manuel Francisco, Univ. of A Coruña<br />

Fernández Arias, Alba, VARPA Group, Univ. of A Coruña<br />

In this work we use the evolutionary multiobjective methodology for the optimization of topological active models, a deformable<br />

model that integrates features of region-based and boundary-based segmentation techniques. The model deformation<br />

is controlled by energy functions that must be minimized. As in other deformable models, a correct segmentation<br />

is achieved through the optimization of the model, governed by energy parameters that must be experimentally tuned.<br />

Evolutionary multiobjective optimization gives a solution to this problem by considering the optimization of several objectives<br />

in parallel. Concretely, we use the SPEA2 algorithm, adapted to our application: the search for Pareto-optimal<br />

individuals. The proposed method was tested on several representative images from different domains yielding highly accurate<br />

results.<br />

09:00-11:10, Paper WeAT8.9<br />

Fast Super-Resolution using Weighted Median Filtering<br />

Nasonov, Andrey, Lomonosov Moscow State Univ.<br />

Krylov, Andrey S., Lomonosov Moscow State Univ.<br />

A non-iterative method for image super-resolution based on weighted median filtering with Gaussian weights is proposed. Visual tests and basic edge metrics were used to examine the method. It was shown that the weighted median filtering reduces the errors caused by inaccurate motion vectors.<br />
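
The core operation, a weighted median with Gaussian weights, is easy to state: the output is the smallest candidate value whose cumulative weight reaches half of the total weight, where each weight is a Gaussian of the candidate's distance. A small illustrative helper follows; it assumes the candidate low-resolution values and their distances to the high-resolution grid point have already been gathered by motion-compensated registration (a sketch, not the authors' implementation).<br />

import numpy as np

def gaussian_weighted_median(values, distances, sigma=1.0):
    """Weighted median of `values`, weighted by a Gaussian of their `distances`."""
    values = np.asarray(values, dtype=float)
    weights = np.exp(-0.5 * (np.asarray(distances, dtype=float) / sigma) ** 2)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]   # first value reaching half the total weight

# The distant outlier (200) is down-weighted and rejected; the result is 11.
print(gaussian_weighted_median([10, 200, 12, 11], distances=[0.2, 2.5, 0.4, 0.6]))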

- 169 -


09:00-11:10, Paper WeAT8.10<br />

Geodesic Thin Plate Splines for Image Segmentation<br />

Lombaert, Herve, Ec. Pol. de Montreal<br />

Cheriet, Farida, Ec. Pol. de Montreal<br />

Thin plate splines are often used in image registration to model deformations. The physical analogy involves a thin sheet of metal that is deformed and forced to pass through a set of control points; the thin plate spline equation minimizes the plate's bending energy. Rather than using Euclidean distances between control points for image deformation, we use geodesic distances for image segmentation. Control points become seed points and force the thin plate to pass through given heights. Intuitively, the thin plate surface in the vicinity of a seed point within a region should have similar heights. The minimally bent thin plate then gives a “confidence” map telling which seed point is closest for every surface point. The thin plate spline has a closed-form solution which is fast to compute and globally optimal. The method shows results comparable to those of the Graph Cuts method.<br />
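
For reference, the closed-form fit mentioned above amounts to solving one small linear system per label map. A standard 2D thin plate spline fit is sketched below with Euclidean distances; the paper's variant would substitute geodesic distances between seed points (an illustrative sketch, not the authors' code).<br />

import numpy as np

def tps_fit(points, heights, reg=1e-8):
    """Closed-form 2D thin plate spline fit through (x, y) seed points with given heights.

    Solves [[K + reg*I, P], [P^T, 0]] [w; a] = [heights; 0], where K_ij = U(|p_i - p_j|)
    with the TPS kernel U(r) = r^2 log r and P = [1, x, y].
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    K = np.where(d > 0, d ** 2 * np.log(d + 1e-12), 0.0)
    P = np.hstack([np.ones((n, 1)), points])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K + reg * np.eye(n)
    A[:n, n:] = P
    A[n:, :n] = P.T
    rhs = np.concatenate([np.asarray(heights, dtype=float), np.zeros(3)])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]   # radial weights w and affine coefficients a

w, a = tps_fit([[0, 0], [1, 0], [0, 1], [1, 1]], heights=[0, 1, 1, 0])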

09:00-11:10, Paper WeAT8.11<br />

Gestures and Lip Shape Integration for Cued Speech Recognition<br />

Heracleous, Panikos, Advanced Telecommunications Res. Inst. International<br />

Beautemps, Denis, Gipsa-Lab.<br />

Hagita, Norihiro, Advanced Telecommunications Res. Inst. International<br />

In this article, automatic recognition of Cued Speech in French based on hidden Markov models (HMMs) is presented. Cued Speech is a visual mode which uses hand shapes in different positions; in combination with the lip patterns of speech, it makes all the sounds of spoken language clearly understandable to deaf and hearing-impaired people. The aim of Cued Speech is to overcome the problems of lipreading and thus enable deaf children and adults to understand full spoken language. In this study, the lip shape component is fused with the hand component using multistream HMM decision fusion to realize Cued Speech recognition, and continuous phoneme recognition experiments using data from a normal-hearing and a deaf cuer were conducted. For the normal-hearing cuer, the obtained phoneme accuracy was 83.5%; for the deaf cuer, it was 82.1%.<br />

09:00-11:10, Paper WeAT8.12<br />

IFLT based Real-Time Framework for Image-Matching<br />

Janney, Pranam, Univ. of New South Wales<br />

Geers, Glenn, National ICT Australia<br />

In this paper we show that the features generated by the recently presented Invariant Features of Local Textures (IFLT) technique can be used in a SIFT-like framework to deliver real-time point-wise image matching with performance comparable to existing state-of-the-art image matching systems. The proposed framework also saves a considerable amount of computation time.<br />

09:00-11:10, Paper WeAT8.13<br />

High-Order Circular Derivative Pattern for Image Representation and Recognition<br />

Zhao, Sanqiang, Griffith Univ. / National ICT Australia<br />

Gao, Yongsheng, Griffith Univ.<br />

Caelli, Terry, National ICT Australia<br />

Micropattern-based image representation and recognition, e.g. the Local Binary Pattern (LBP), has proved successful over the past few years due to its advantages of illumination tolerance and computational efficiency. However, LBP only encodes the first-order radial-directional derivatives of spatial images and is inadequate to completely describe the discriminative features needed for classification. This paper proposes a new Circular Derivative Pattern (CDP) which extracts high-order derivative information of images along circular directions. We argue that the high-order circular derivatives contain more detailed and more discriminative information than the first-order LBP in terms of recognition accuracy. Experimental evaluation through face recognition on the FERET database and insect classification on the NICTA Biosecurity Dataset demonstrates the effectiveness of the proposed method.<br />

- 170 -


09:00-11:10, Paper WeAT8.14<br />

Automatic Face Replacement in Video based on 2D Morphable Model<br />

Min, Feng, WuHan Inst. of Tech.<br />

Sang, Nong, Huazhong Univ. of Science and Tech.<br />

Wang, Zhefu, Wuhan Inst. of Tech.<br />

This paper presents an automatic face replacement approach for video based on a 2D morphable model. Our approach includes three main modules: face alignment, face morphing, and face fusion. Given a source image and a target video, Active Shape Models (ASM) are applied to the source image and the target frames for face alignment. Then the source face shape is warped to match the target face shape by a 2D morphable model. The color and lighting of the source face are adjusted to keep them consistent with those of the target face, and seamlessly blended into the target face. Our approach is fully automatic, without user intervention, and generates natural and realistic results.<br />

09:00-11:10, Paper WeAT8.15<br />

3D Deformable Surfaces with Locally Self-Adjusting Parameters – a Robust Method to Determine Cell Nucleus Shapes<br />

Keuper, Margret, Univ. of Freiburg<br />

Schmidt, Thorsten, Univ. of Freiburg<br />

Padeken, Jan, Max-Planck-Inst. of Immunobiology<br />

Heun, Patrick, Max-Planck-Inst. of Immunobiology<br />

Palme, Klaus, Univ. of Freiburg<br />

Burkhardt, Hans, Univ. of Freiburg<br />

Ronneberger, Olaf, Univ. of Freiburg<br />

When using deformable models for the segmentation of biological data, the choice of the best weighting parameters for the internal and external forces is crucial. Especially when dealing with 3D fluorescence microscopy data and cells within dense tissue, object boundaries are sometimes not visible. In these cases, a single weighting parameter set for the whole contour is not desirable. We present a method for the dynamic adjustment of the weighting parameters that depends only on the underlying data and does not need any prior information. The method is especially apt at handling blurred, noisy, and deficient data, as is often the case in biological microscopy.<br />

09:00-11:10, Paper WeAT8.16<br />

Decomposition of Dynamic Textures using Morphological Component Analysis: A New Adaptative Strategy<br />

Dubois, Sloven, Univ. de La Rochelle<br />

Péteri, Renaud, Univ. of La Rochelle<br />

Ménard, Michel, Univ. de La Rochelle<br />

The research context of this work is dynamic texture analysis and characterization. Many dynamic textures can be modeled<br />

as a large scale propagating wave and local oscillating phenomena. The Morphological Component Analysis algorithm is<br />

used to retrieve these components using a well chosen dictionary. We define a new strategy for adaptive thresholding in<br />

the Morphological Component Analysis framework, which greatly reduces the computation time when applied on videos.<br />

Tests on synthetic and real image sequences illustrate the efficiency of the proposed method, and future prospects are finally discussed.<br />

09:00-11:10, Paper WeAT8.17<br />

Anisotropic Contour Completion for Cell Microinjection Targeting<br />

Becattini, Gabriele, Italian Inst. of Tech.<br />

Mattos, Leonardo, Italian Inst. of Tech.<br />

Caldwell, Darwin G., Italian Inst. of Tech.<br />

This paper shows a novel application of the diffusion tensor to anisotropic image processing. The designed system aims at spotting and localizing injection points on a population of adherent cells lying in a Petri dish. The overall procedure is described, including pre-filtering, ridge enhancement, cell segmentation, shape analysis, and injection point detection. The anisotropic contour completion (ACC) employed is equivalent to a dilation with a continuous elliptic structural element that takes into account the local orientation of the contours to be closed, preventing extension in the normal direction. Experiments carried out on real images from an optical microscope revealed remarkable reliability, with up to 86% of the cells in the field of view correctly segmented and targeted for microinjection.<br />

- 171 -


09:00-11:10, Paper WeAT8.18<br />

Active Contours with Thresholding Value for Image Segmentation<br />

Chen, Gang, Chinese Acad. of Sciences<br />

Zhang, Haiying, Chinese Acad. of Sciences<br />

Chen, Iron, Chinese Acad. of Sciences<br />

Yang, Wen, Wuhan Univ.<br />

In this paper, we propose an active contour with threshold value to detect objects and at the same time get rid of unimportant<br />

parts rather than extract all information. The basic idea of our model is to introduce a weight matrix into region-based<br />

active contours, which enhances the weight of the main parts while filtering out weak intensities such as shadows, illumination,<br />

and so on. Moreover, the threshold value used to set the weight matrix can be chosen manually for accurate image segmentation.<br />

Thus, the proposed method can extract objects of interest in practice. Coupled partial differential equations are used to<br />

implement this method with level set algorithms. Experimental results show the advantages of our method in terms of accuracy<br />

for image segmentation.<br />

09:00-11:10, Paper WeAT8.19<br />

An Iterative Method for Superresolution of Optical Flow Derived by Energy Minimisation<br />

Mochizuki, Yoshihiko, Chiba Univ.<br />

Kameda, Yusuke, Chiba Univ.<br />

Imiya, Atsushi, IMIT, Chiba Univ.<br />

Sakai, Tomoya, Chiba Univ.<br />

Super resolution is a technique to recover a high resolution image from a low resolution image. We develop a variational<br />

super resolution method for the subpixel accurate optical flow computation using variational optimisation. We combine<br />

variational super resolution and the variational optical flow computation for the super resolution optical flow computation.<br />

09:00-11:10, Paper WeAT8.20<br />

Non-Rigid Image Registration for Historical Manuscript Restoration<br />

Wang, Jie, National Univ. of Singapore<br />

Tan, Chew-Lim, National Univ. of Singapore<br />

This paper presents a non-rigid registration method for the restoration of double-sided historical manuscripts. Firstly, the<br />

gradient direction maps of the two images of a manuscript are examined to identify candidate control points. Then the<br />

correspondences of these points are established by minimizing a dissimilarity measure consisting of intensity, gradient and<br />

displacement. To fully capture the spatial relationship between the two images, a mapping function is defined as the combination<br />

of a global affine and a local B-spline transformation. The cost function for optimization consists of two parts:<br />

normalized mutual information for the goal of similarity and space integral of the square of the second order derivatives<br />

for smoothness. To evaluate the proposed method, a wavelet based restoration procedure is applied to registered images.<br />

Real documents from the National Archives of Singapore are used for testing and the experimental results are impressive.<br />

09:00-11:10, Paper WeAT8.21<br />

An Effective Decentralized Nonparametric Quickest Detection Approach<br />

Yang, Dayu, Univ. of Tennessee<br />

Qi, Hairong, Univ. of Tennessee<br />

This paper studies decentralized quickest detection schemes that can be deployed in a sensing environment where data<br />

streams are simultaneously collected from multiple channels located distributively to jointly support the detection. Existing<br />

decentralized detection approaches are largely parametric and require knowledge of the pre-change and post-change distributions.<br />

In this paper, we first present an effective nonparametric detection procedure based on a Q-Q distance measure.<br />

We then describe two implementation schemes, binary quickest detection and local decision fusion by majority voting,<br />

that realize decentralized nonparametric detection. Experimental results show that the proposed method has a comparable<br />

performance to the parametric CUSUM test in binary detection. Its decision fusion-based implementation also outperforms<br />

the other three popular fusion rules under the parametric framework.<br />
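
The following is a rough, self-contained sketch of the nonparametric idea: a change is flagged once a Q-Q distance between a sliding window of the stream and a pre-change reference sample exceeds a threshold. The specific distance (mean absolute difference of empirical quantiles), the window size, and the threshold are illustrative assumptions, not the paper's exact procedure or its decentralized fusion rules.

```python
import numpy as np

def qq_distance(sample_a, sample_b):
    """Q-Q distance between two samples: mean absolute difference of their
    empirical quantiles (one plausible reading of the abstract's measure)."""
    q = np.linspace(0.05, 0.95, 19)
    return np.abs(np.quantile(sample_a, q) - np.quantile(sample_b, q)).mean()

def quickest_detect(stream, reference, win=50, threshold=0.5):
    """Flag the first time index at which the Q-Q distance between the latest
    window of the stream and the pre-change reference exceeds the threshold.
    Window size and threshold are assumptions for illustration."""
    for t in range(win, len(stream) + 1):
        if qq_distance(stream[t - win:t], reference) > threshold:
            return t
    return None

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 500)                       # pre-change sample
stream = np.concatenate([rng.normal(0.0, 1.0, 300),         # pre-change data
                         rng.normal(1.5, 1.0, 300)])        # post-change data
print(quickest_detect(stream, reference))                   # detection time index
```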

09:00-11:10, Paper WeAT8.22<br />

On the Design of a Class of Odd-Length Biorthogonal Wavelet Filter Banks for Signal and Image Processing<br />

Baradarani, Aryaz, Univ. of Windsor<br />

Mendapara, Pankajkumar, Univ. of Windsor<br />

Wu, Q. M. Jonathan, Univ. of Windsor<br />

In this paper, we introduce an approach to the design of odd-length biorthogonal wavelet filter banks based on semidefinite<br />

programming employing Bernstein polynomials. The method is systematic and renders a simple optimization problem,<br />

yet it offers wavelet filters ranging from maximally flat to maximal passband/stopband width. The odd-length biorthogonal<br />

filter pairs are then used in multi-focus imaging to obtain a fully-focused image from a set of registered semi-focused<br />

input images at varying focus, employing the distance transform and an exponentially decaying function on the subbands in the<br />

wavelet domain. Various images are tested and the experimental results compare favorably to recent results in the literature.<br />

09:00-11:10, Paper WeAT8.23<br />

Implicit Feature-Based Alignment System for Radiotherapy<br />

Yamakoshi, Ryoichi, Mitsubishi Electric Corp.<br />

Hirasawa, Kousuke, Mitsubishi Electric Corp.<br />

Okuda, Haruhisa, Mitsubishi Electric Corp.<br />

Kage, Hiroshi, Mitsubishi Electric Corp.<br />

Sumi, Kazuhiko, Mitsubishi Electric Corp.<br />

Ivanov, Yuri, MERL, USA<br />

Sakamoto, Hidenobu, Mitsubishi Electric Corp.<br />

Yanou, Toshihiro, Hyogo Ion Beam Medical Center, Tokyo<br />

Suga, Daisaku, Hyogo Ion Beam Medical Center, Tokyo<br />

Murakami, Masao, Hyogo Ion Beam Medical Center, Tokyo<br />

In this paper we present a robust alignment algorithm for correcting the effects of out-of-plane rotation to be used for automatic<br />

alignment of the Computed Tomography (CT) volumes and the generally low quality fluoroscopic images for radiotherapy<br />

applications. Analyzing not only in-plane but also out-of-plane rotation effects on the Digitally Reconstructed<br />

Radiograph (DRR) images, we develop a simple alignment algorithm that extracts a set of implicit features from the DRR.<br />

Using these SIFT-based features, we align DRRs with the fluoroscopic images of the patient and evaluate the alignment<br />

accuracy. We compare our approach with traditional techniques based on gradient-based operators and show that our algorithm<br />

performs faster while in most cases delivering higher accuracy.<br />

09:00-11:10, Paper WeAT8.24<br />

3D Vertebrae Segmentation in CT Images with Random Noises<br />

Aslan, Melih Seref, Univ. of Louisville<br />

Ali, Asem, Univ. of Louisville<br />

Farag, Aly A., Univ. of Louisville<br />

Arnold, Ben, Image Analysis, Inc<br />

Chen, Dongqing, Univ. of Louisville<br />

Ping, Xiang, Image Analysis, Inc.<br />

Exposure levels (X-ray tube amperage and peak kilovoltage) are associated with various noise levels and radiation dose.<br />

When higher exposure levels are applied, the CT images have a higher signal-to-noise ratio (SNR). However,<br />

the patient receives higher radiation dose in this case. In this paper, we use our robust 3D framework to segment vertebral<br />

bodies (VBs) in clinical computed tomography (CT) images with different noise levels. A matched filter is employed<br />

to detect the VB region automatically. In the graph cuts method, a VB (object) and surrounding organs (background) are<br />

represented using gray level distribution models which are approximated by a linear combination of Gaussians (LCG).<br />

Initial segmentation based on the LCG models is then iteratively refined by using a Markov-Gibbs random field (MGRF)<br />

with analytically estimated potentials. Experiments on the data sets show that the proposed segmentation approach is more<br />

accurate and robust than other known alternatives.<br />

09:00-11:10, Paper WeAT8.25<br />

An Improved Method for Cirrhosis Detection using Liver’s Ultrasound Images<br />

Fujita, Yusuke, Yamaguchi Univ.<br />

Hamamoto, Yoshihiko, Yamaguchi Univ.<br />

Segawa, Makoto, Yamaguchi Univ.<br />

Terai, Shuji, Yamaguchi Univ.<br />

Sakaida, Isao, Yamaguchi Univ.<br />

This paper describes an improved method for cirrhosis detection in the liver using Gabor features from ultrasound images.<br />

There are three main contributions of our cirrhosis detection method. The first contribution of this method is to combine<br />

weak classifiers using the AdaBoost algorithm. The second one is to use an artificial dataset to avoid the problem of<br />

overfitting the limited training dataset. The third one is to apply voting classification using multiple regions of interest<br />

(ROIs). Although the accuracy rate of a single classifier designed with only the original dataset was 56%, that of the proposed<br />

method was 80% in cross-validation.<br />
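
Two of the three ideas, boosting of weak classifiers and majority voting over multiple ROIs, translate directly into a small sketch; the snippet below uses scikit-learn with placeholder random features standing in for the Gabor features and liver ROIs, so the data, feature dimensionality, and number of ROIs are assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
# Placeholder training data: 200 ROIs x 16 Gabor-like features, binary labels.
X_train = rng.normal(size=(200, 16))
y_train = rng.integers(0, 2, size=200)

# AdaBoost combines weak classifiers (decision stumps by default in scikit-learn).
clf = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)

def classify_case(roi_features):
    """Label one image by majority vote over the per-ROI predictions."""
    votes = clf.predict(np.asarray(roi_features))
    return int(votes.sum() * 2 > len(votes))   # 1 if more than half the ROIs vote 1

print(classify_case(rng.normal(size=(7, 16))))  # 7 hypothetical ROIs from one image
```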

09:00-11:10, Paper WeAT8.26<br />

A Dual Pass Video Stabilization System using Iterative Motion Estimation and Adaptive Motion Smoothing<br />

Pan, Pan, Fujitsu R&D Center Co., Ltd.<br />

Minagawa, Akihiro, Fujitsu Lab. LTD<br />

Sun, Jun, Fujitsu R&D Center Co., LTD<br />

Hotta, Yoshinobu, Fujitsu Lab. LTD.<br />

Naoi, Satoshi, Fujitsu R&D Center Co., LTD<br />

In this paper, we propose a novel dual pass video stabilization system using iterative motion estimation and adaptive<br />

motion smoothing. In the first pass, the transformation matrix to stabilize each frame is returned. The global motion estimation<br />

is carried out by a novel iterative method. The intentional motion is estimated using adaptive window smoothing.<br />

Before the beginning of the second pass, we obtain the optimal trim size for a specific video based on the statistics of the<br />

transformation parameters. In the second pass, the stabilized video is composed according to the optimal trim size. Experimental<br />

results show the superior performance of the proposed method in comparison to other existing methods.<br />

09:00-11:10, Paper WeAT8.27<br />

A Modified Particle Swarm Optimization Applied in Image Registration<br />

Niazi, Muhammad Khalid Khan, Uppsala Univ.<br />

Nystrom, Ingela, Uppsala Univ.<br />

We report a modified version of the particle swarm optimization (PSO) algorithm and its application to image registration.<br />

The modified version utilizes benefits from the Gaussian and the uniform distribution, when updating the velocity equation<br />

in the PSO algorithm. Which one of the distributions is selected depends on the direction of the cognitive and social components<br />

in the velocity equation. This direction checking and selection of the appropriate distribution provide the particles<br />

with an ability to jump out of local minima. The registration results achieved by this new version prove its robustness<br />

and its ability to find a global minimum.<br />
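
A minimal sketch of one velocity update under the abstract's idea that the random factors come from a Gaussian or a uniform distribution depending on the direction of the cognitive and social components; the specific switching rule and distribution parameters below are assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def update_velocity(v, x, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO velocity update. Where the cognitive (pbest - x) and social
    (gbest - x) pulls agree in sign, the random factors are drawn from a
    Gaussian; otherwise from a uniform distribution (illustrative rule)."""
    cognitive, social = pbest - x, gbest - x
    agree = np.sign(cognitive) == np.sign(social)
    r1 = np.where(agree, np.abs(rng.normal(0.5, 0.15, x.shape)), rng.uniform(size=x.shape))
    r2 = np.where(agree, np.abs(rng.normal(0.5, 0.15, x.shape)), rng.uniform(size=x.shape))
    return w * v + c1 * r1 * cognitive + c2 * r2 * social

x = rng.uniform(-5, 5, size=4)
v = update_velocity(np.zeros(4), x, pbest=x + 0.1, gbest=np.zeros(4))
print(v)
```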

09:00-11:10, Paper WeAT8.28<br />

Image Segmentation based on Adaptive Fuzzy-C-Means Clustering<br />

Ayech, Mohamed Walid, Pol. de Recherche Informatique Du CEntre<br />

El Kalti, Karim, Faculty of Science of Monastir Tunisia<br />

El Ayeb, Bechir, Pol. de Recherche Informatique Du CEntre<br />

The clustering method Fuzzy-C-Means (FCM) is widely used in image segmentation. However, the major drawback of<br />

this method is its sensitivity to noise. In this paper, we propose a variant of this method which aims at resolving this<br />

problem. Our approach is based on an adaptive distance which is calculated according to the spatial position of the pixel<br />

in the image. The obtained results show a significant improvement in the performance of our approach compared to the<br />

standard version of FCM, especially regarding robustness to noise and the accuracy of the edges between regions.<br />
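
The abstract does not give the exact form of the adaptive distance, so the sketch below only illustrates the general mechanism: a standard fuzzy C-means loop in which each pixel's distance to a cluster centre is scaled by a spatial factor derived from its neighbourhood (here, deviation from the local mean). The scaling function, neighbourhood size, and iteration count are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_fcm(image, n_clusters=3, m=2.0, n_iter=30):
    """Toy fuzzy C-means on a grayscale image where the pixel-to-centre distance
    is scaled by a spatially adaptive factor (illustrative, not the paper's)."""
    x = image.astype(float).ravel()
    local_mean = uniform_filter(image.astype(float), size=3).ravel()
    spatial = 1.0 + np.abs(x - local_mean) / (np.abs(x).max() + 1e-12)
    centres = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        d = spatial[:, None] * (x[:, None] - centres[None, :]) ** 2 + 1e-12
        u = 1.0 / d ** (1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)            # fuzzy membership update
        centres = (u ** m * x[:, None]).sum(0) / (u ** m).sum(0)
    return u.argmax(axis=1).reshape(image.shape), centres

labels, centres = adaptive_fcm(np.random.default_rng(2).integers(0, 256, (32, 32)))
print(centres)
```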

09:00-11:10, Paper WeAT8.29<br />

Multi-Spectral Satellite Image Registration using Scale-Restricted SURF<br />

Teke, Mustafa, Middle East Tech. Univ.<br />

Temizel, Alptekin, Middle East Tech. Univ.<br />

Satellites generally have arrays of sensors having different resolution and wavelength parameters. For some applications,<br />

images acquired from different viewpoints and positions are required to be aligned. This alignment process could be<br />

achieved by matching image features followed by image registration. In this paper, registration of multispectral satellite<br />

images using the Speeded Up Robust Features (SURF) method is examined. The performance of SURF for registration of<br />

high resolution satellite images captured at different bands is evaluated. Scale restriction (SR) method, which has recently<br />

been proposed for SIFT, is adapted to SURF to improve multispectral image registration performance. Matching performance<br />

between different bands using SURF, U-SURF, SURF with SR and U-SURF with SR is tested and robustness of<br />

these with respect to orientation and scale is evaluated.<br />
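
One way to read "scale restriction" is as a post-filter on descriptor matches that rejects pairs whose keypoint scales disagree too much. The sketch below applies Lowe's ratio test plus such a scale check to generic descriptor arrays; the tolerance value and the exact form of the restriction are assumptions rather than the rule originally proposed for SIFT and adapted here.

```python
import numpy as np

def match_with_scale_restriction(desc1, scales1, desc2, scales2,
                                 ratio=0.8, scale_tol=1.5):
    """Nearest-neighbour matching with a ratio test and a scale-restriction
    check: keep a match only if the two keypoints' scales differ by at most
    a factor of scale_tol (an assumed tolerance)."""
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dist)[:2]                  # best and second-best match
        if dist[j] < ratio * dist[k]:
            s = scales1[i] / scales2[j]
            if 1.0 / scale_tol <= s <= scale_tol:
                matches.append((i, j))
    return matches

rng = np.random.default_rng(3)
d1, d2 = rng.random((10, 64)), rng.random((10, 64))
print(match_with_scale_restriction(d1, np.ones(10), d2, np.ones(10)))
```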

09:00-11:10, Paper WeAT8.30<br />

Automatic Attribute Threshold Selection for Blood Vessel Enhancement<br />

Kiwanuka, Fred Noah, Univ. of Groningen<br />

Wilkinson, Michael H.f., Univ. of Groningen<br />

Attribute filters allow enhancement and extraction of features without distorting their borders, and never introduce new<br />

image features. These are highly desirable properties in biomedical imaging, where accurate shape analysis is paramount.<br />

However, setting the attribute-threshold parameters has to date only been done manually. This paper explores simple, fast<br />

and automated methods of computing attribute threshold parameters based on image segmentation, thresholding and data<br />

clustering techniques. Though several techniques perform well on blood-vessel filtering, the choice of technique appears<br />

to depend on the imaging mode.<br />

09:00-11:10, Paper WeAT8.31<br />

Initialisation-Free Active Contour Segmentation<br />

Xie, Xianghua, Swansea Univ.<br />

Mirmehdi, Majid, Univ. of Bristol<br />

We present a region based active contour model which does not require any initialisation and is capable of modelling<br />

multi-modal image regions. Its external force is based on statistical learning and grouping of image primitives at multiple scales,<br />

and its numerical solution is carried out using radial basis function interpolation and time dependent expansion<br />

coefficient updating. The initialisation-free property makes it attractive for applications such as detecting an unknown number<br />

of objects with unknown topologies.<br />

09:00-11:10, Paper WeAT8.32<br />

On Clock Offset Estimation in Wireless Sensor Networks with Weibull Distributed Network Delays<br />

Ahmad, Aitzaz, Texas A&M Univ. Coll. Station<br />

Noor, Amina, Texas A&M Univ. Coll. Station<br />

Serpedin, Erchin, Texas A&M Univ. Coll. Station<br />

Nounou, Hazem, Texas A&M Univ.<br />

Nounou, Mohamed, Texas A&M Univ.<br />

We consider the problem of Maximum Likelihood (ML) estimation of clock parameters in a two-way timing exchange<br />

scenario where the random delays assume a Weibull distribution, which represents a more generalized model. The ML estimate<br />

of the clock offset for the case of exponential distribution was obtained earlier. Moreover, it was reported that when<br />

the fixed delay is known, MLE is not unique. We determine the uniformly minimum variance unbiased (UMVU) estimators<br />

for exponential distribution under such a scenario and produce biased estimators having lower MSE than UMVU for all<br />

values of clock offset. We then consider the case when shape parameter is greater than one and reduce the corresponding<br />

optimization problems to their equivalent convex forms, thus guaranteeing convergence to a global minimum.<br />

09:00-11:10, Paper WeAT8.33<br />

Parallel Algorithm of Two-Dimensional Discrete Cosine Transform based on Special Data Representation<br />

Chicheva, Marina, Image Processing System Inst. of RAS<br />

The paper investigates the efficiency of a parallel approach to the two-dimensional discrete cosine transform. An algorithm<br />

based on data representation in hypercomplex algebra is proposed.<br />

09:00-11:10, Paper WeAT8.34<br />

Parallel Scales for More Accurate Displacement Estimation in Phase-Based Image Registration<br />

Forsberg, Daniel, Linköping Univ.<br />

Andersson, Mats, Linköping Univ.<br />

Knutsson, Hans<br />

Phase-based methods are commonly applied in image registration. When working with phase-difference methods only a<br />

single scale is employed, although the algorithms are normally iterated over multiple scales, whereas phase-congruency<br />

methods utilize the phase from multiple scales simultaneously. This paper presents an extension to phase-difference<br />

methods employing parallel scales to achieve more accurate displacements. Results are also presented, clearly favouring<br />

the use of parallel scales over a single scale in more than 95% of the 120 tested cases.<br />

09:00-11:10, Paper WeAT8.35<br />

A Comprehensive Evaluation on Non-Deterministic Motion Estimation<br />

Wu, Changzhu, Northwestern Pol. Univ.<br />

Wang, Qing, Northwestern Pol. Univ.<br />

When computing optical flow with region-based matching, very few flow vectors can be reliably obtained, especially in<br />

high-contrast areas or those with little texture. Instead of using a single pixel from the reference frame, non-deterministic<br />

motion utilizes multiple pixels within a neighborhood to represent the corresponding pixel in the current frame. Although<br />

remarkable improvement has been made with this method, the weight associated with each reference pixel is quite sensitive<br />

to the selection of its standard deviation. To address this issue, a dual probability is presented in this paper. Intuitively, it<br />

enhances the weights of pixels that are more similar to their counterparts in the current frame, while suppressing the rest<br />

of them. Experimental results show that the proposed method is effective in dealing with intense motion and occlusion, especially<br />

in the case of reducing the adverse impact of noise.<br />

09:00-11:10, Paper WeAT8.36<br />

A Full-View Spherical Image Format<br />

Li, Shigang, Faculty of Engineering<br />

Hai, Ying, Tottori Univ.<br />

This paper proposes a full-view spherical image format which is based on the geodesic division of a sphere. In comparison<br />

with the conventional 3D array representation which consists of five parallelograms, the proposed spherical image format<br />

is a simple 2D array representation. The algorithms for finding the neighboring pixels of a given pixel of a spherical image<br />

and for mapping between spherical coordinates and spherical image pixels are also given.<br />

09:00-11:10, Paper WeAT8.37<br />

Shift-Map Image Registration<br />

Svärm, Linus, Lund Univ.<br />

Strandmark, Petter, Lund Univ.<br />

Shift-map image processing is a new framework based on energy minimization over a large space of labels. The optimization<br />

utilizes alpha-expansion moves and iterative refinement over a Gaussian pyramid. In this paper we extend the range<br />

of applications to image registration. To do this, new data and smoothness terms have to be constructed. We note a great<br />

improvement when we measure pixel similarities with the dense DAISY descriptor. The main contributions of this paper<br />

are: * The extension of the shift-map framework to include image registration. We register images for which SIFT only<br />

provides 3 correct matches. * A publicly available implementation of shift-map image processing (e.g. inpainting, registration).<br />

We conclude by comparing shift-map registration to a recent method for optical flow with favorable results.<br />
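
As an illustration of the kind of data term described above, the sketch below scores a small set of candidate shift labels per pixel by the distance between dense DAISY descriptors of the two images (scikit-image's daisy is used as a stand-in; the label set, descriptor parameters, and the omission of the smoothness term and alpha-expansion solver are all simplifications of the actual framework).

```python
import numpy as np
from skimage.feature import daisy

def shift_data_cost(img1, img2, shifts):
    """Per-pixel data cost for each candidate shift label, measured as the L2
    distance between dense DAISY descriptors of the two images. Returns an
    array of shape (n_labels, H', W') on the DAISY sampling grid."""
    d1 = daisy(img1, step=1)
    d2 = daisy(img2, step=1)
    h, w = d1.shape[:2]
    costs = np.full((len(shifts), h, w), np.inf)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    for k, (dy, dx) in enumerate(shifts):
        y2, x2 = yy + dy, xx + dx
        valid = (y2 >= 0) & (y2 < h) & (x2 >= 0) & (x2 < w)
        diff = d1[yy[valid], xx[valid]] - d2[y2[valid], x2[valid]]
        costs[k][valid] = np.linalg.norm(diff, axis=1)
    return costs

rng = np.random.default_rng(4)
img = rng.random((64, 64))
print(shift_data_cost(img, np.roll(img, 2, axis=1), [(0, 0), (0, 2)]).shape)
```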

09:00-11:10, Paper WeAT8.38<br />

An Adaptive Method for Efficient Detection of Salient Visual Object from Color Images<br />

Brezovan, Marius, Univ. of Craiova<br />

Burdescu, Dumitru Dan, Univ. of Craiova<br />

Ganea, Eugen, Univ. of Craiova<br />

Stanescu, Liana, Univ. of Craiova<br />

Stoica, Cosmin, Univ. of Craiova<br />

This paper presents an efficient graph-based method to detect salient objects from color images and to extract their color<br />

and geometric features. Unlike the majority of segmentation methods, our method is fully adaptive and does not<br />

require any parameter to be chosen in order to produce a good segmentation. The proposed segmentation method uses a<br />

hexagonal structure defined on the set of image pixels and it performs two different steps: a pre-segmentation step that<br />

will produce a maximum spanning tree of the connected components of the visual graph constructed on the hexagonal<br />

structure of an image, and the final segmentation step that will produce a minimum spanning tree of the connected components,<br />

representing the visual objects, by using dynamic weights based on the geometric features of the regions. Experimental<br />

results are presented indicating a good performance of our method.<br />

09:00-11:10, Paper WeAT8.39<br />

Robust Matching in an Uncertain World<br />

Sur, Frédéric, INPL / INRIA Nancy Grand Est<br />

Finding point correspondences which are consistent with a geometric constraint is one of the cornerstones of many computer<br />

vision problems. This is a difficult task because of spurious measurements leading to ambiguously matched points<br />

and because of uncertainty in point location. In this article we address these problems and propose a new robust algorithm<br />

that explicitly takes account of location uncertainty. We propose applications to SIFT matching and 3D data fusion.<br />

09:00-11:10, Paper WeAT8.41<br />

Recursive Dynamically Variable Step Search Motion Estimation Algorithm for High Definition Video<br />

Tasdizen, Ozgur, Sabanci Univ.<br />

Hamzaoglu, Ilker, Sabanci Univ.<br />

For High Definition (HD) video formats, computational complexity of Full Search (FS) Motion Estimation (ME) algorithm<br />

is prohibitively high, whereas the Peak Signal-to-Noise Ratio obtained by fast search ME algorithms is low. Therefore, in<br />

this paper, we propose Recursive Dynamically Variable Step Search (RDVSS) ME algorithm for real-time processing of<br />

HD video formats. The RDVSS algorithm dynamically determines the search patterns that will be used for each Macroblock<br />

(MB) based on the motion vectors of its spatial and temporal neighboring MBs. RDVSS performs very close to FS while<br />

searching far fewer locations than FS, and it outperforms successful fast search ME algorithms by searching<br />

more search locations than these algorithms. In addition, RDVSS algorithm can be efficiently implemented by a reconfigurable<br />

systolic array based ME hardware.<br />
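
The sketch below illustrates the general predictor-driven idea in software terms: candidate motion vectors taken from neighbouring blocks seed the search, followed by a small local refinement. It is a generic predictive block search under an assumed block size and SAD cost, not the RDVSS search patterns or the hardware mapping described above.

```python
import numpy as np

def sad(block, ref, y, x):
    """Sum of absolute differences of a block against the reference frame at (y, x)."""
    h, w = block.shape
    if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
        return np.inf
    return np.abs(block - ref[y:y + h, x:x + w]).sum()

def predictive_search(cur, ref, by, bx, bsize, neighbour_mvs, step=1):
    """Seed the search with motion vectors predicted from spatial/temporal
    neighbours, then refine with a small local pattern until no improvement."""
    block = cur[by:by + bsize, bx:bx + bsize].astype(float)
    best = min(list(neighbour_mvs) + [(0, 0)],
               key=lambda mv: sad(block, ref, by + mv[0], bx + mv[1]))
    improved = True
    while improved:
        improved = False
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand = (best[0] + dy, best[1] + dx)
                if sad(block, ref, by + cand[0], bx + cand[1]) < sad(block, ref, by + best[0], bx + best[1]):
                    best, improved = cand, True
    return best
```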

09:00-11:10, Paper WeAT8.42<br />

Spatial and Temporal Enhancement of Depth Images Captured by a Time-of-Flight Depth Sensor<br />

Kim, Sung-Yeol, The University of Tennessee<br />

Cho, Ji-Ho, Gwangju Institute of Science and Tech.<br />

Koschan, Andreas, The University of Tennessee<br />

Abidi, Mongi, The University of Tennessee<br />

In this paper, we present a new method to enhance depth images captured by a time-of-flight (TOF) depth sensor spatially<br />

and temporally. In practice, depth images obtained from TOF depth sensors have critical problems, such as optical noise<br />

existence, unmatched boundaries, and temporal inconsistency. In this work, we improve depth quality by performing a<br />

newly-designed joint bilateral filtering, color segmentation-based boundary refinement, and motion estimation-based temporal<br />

consistency. Experimental results show that the proposed method significantly minimizes the inherent problems of<br />

the depth images so that we can use them to generate a dynamic and realistic 3D scene.<br />

09:00-11:10, Paper WeAT8.43<br />

Transition Thresholds for Binarization of Historical Documents<br />

Ramírez-Ortegón, Marte Alejandro, Free Univ. of Berlin<br />

Rojas, Raul, Freie Univ. Berlin<br />

This paper extends the transition method for binarization based on transition pixels, a generalization of edge pixels. This<br />

method originally computes transition thresholds using the quantile thresholding algorithm, which has a critical parameter.<br />

We achieved an automatic version of the transition method by computing the transition thresholds with Rosin’s algorithm.<br />

We experimentally tested four variants of the transition method combining the density and cumulative distribution<br />

functions of transition values, with gray-intensity thresholds based on the normal and lognormal density functions. The<br />

results of our experiments show that these unsupervised methods yield superior binarization compared with top-ranked<br />

algorithms.<br />
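
Rosin's unimodal thresholding, which the automatic variant relies on, is simple enough to sketch: the threshold is placed at the histogram bin farthest from the straight line joining the histogram peak to the last non-empty bin. The bin count and the toy data below are assumptions; the transition-value computation itself is not reproduced.

```python
import numpy as np

def rosin_threshold(values, bins=256):
    """Rosin's unimodal threshold: the bin with maximum perpendicular distance
    from the chord joining the histogram peak to the last non-empty bin."""
    hist, edges = np.histogram(values, bins=bins)
    p = int(np.argmax(hist))                 # peak bin
    q = int(np.max(np.nonzero(hist)))        # last non-empty bin
    if q <= p:
        return edges[p]
    xs = np.arange(p, q + 1, dtype=float)
    ys = hist[p:q + 1].astype(float)
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    # Perpendicular distance of each histogram point to the peak-to-tail chord.
    dist = np.abs((y1 - y0) * xs - (x1 - x0) * ys + x1 * y0 - y1 * x0) / np.hypot(y1 - y0, x1 - x0)
    return edges[p + int(np.argmax(dist))]

rng = np.random.default_rng(5)
transition_values = np.concatenate([rng.exponential(0.05, 5000), rng.uniform(0.4, 1.0, 200)])
print(rosin_threshold(transition_values))
```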

09:00-11:10, Paper WeAT8.44<br />

Image Quality Metrics: PSNR vs. SSIM<br />

Horé, Alain, Sherbrooke Univ.<br />

Ziou, Djemel, Sherbrooke Univ.<br />

In this paper, we analyse two well-known objective image quality metrics, the peak-signal-to-noise ratio (PSNR) as well<br />

as the structural similarity index measure (SSIM), and we derive a simple mathematical relationship between them which<br />

works for various kinds of image degradation such as Gaussian blur, additive Gaussian white noise, and JPEG and JPEG2000<br />

compression. A series of tests realized on images extracted from the Kodak database gives a better understanding of the<br />

similarity and difference between the SSIM and the PSNR.<br />
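
The derived analytical relationship is not reproduced here, but the two metrics are easy to compute side by side; the sketch below does so with scikit-image for a synthetic image under increasing Gaussian noise, which is one of the degradation types mentioned (the image and noise levels are arbitrary placeholders).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(6)
img = rng.random((64, 64))                       # placeholder "reference" image
for sigma in (0.01, 0.05, 0.10):                 # increasing additive Gaussian noise
    noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
    psnr = peak_signal_noise_ratio(img, noisy, data_range=1.0)
    ssim = structural_similarity(img, noisy, data_range=1.0)
    print(f"sigma={sigma:.2f}  PSNR={psnr:5.2f} dB  SSIM={ssim:.3f}")
```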

09:00-11:10, Paper WeAT8.45<br />

Coarse Scale Feature Extraction using the Spiral Architecture Structure<br />

Coleman, Sonya, Univ. of Ulster<br />

Scotney, Bryan, Univ. of Ulster<br />

Gardiner, Bryan, Univ. of Ulster<br />

The Spiral Architecture has been developed as a fast way of indexing a hexagonal pixel-based image. In combination with<br />

spiral addition and spiral multiplication, methods have been developed for hexagonal image processing operations such<br />

as translation and rotation. Using the Spiral Architecture as the basis for our operator structure, we present a general approach<br />

to the computation of adaptive coarse scale Laplacian operators for use on hexagonal pixel-based images. We evaluate<br />

the proposed operators using simulated hexagonal images and demonstrate improved performance when compared<br />

with rectangular Laplacian operators such as the Marr-Hildreth operator.<br />

09:00-11:10, Paper WeAT8.46<br />

Visual Perception Driven Registration of Mammograms<br />

Boucher, Arnaud, Univ. Paris Descartes<br />

Cloppet, Florence, Paris Descartes Univ.<br />

Vincent, Nicole, Paris Descartes Univ.<br />

Jouve, Pierre Emmanuel, Fenics Company<br />

This paper aims to develop a methodology to register pairs of temporal mammograms. Control points based on anatomical<br />

features are detected in an automated way. Thereby, image semantics are used to extract landmarks based on these control<br />

points. A reference frame is generated from these control points; based on this reference frame, the studied images are realigned using<br />

different levels of observation, leading to both rigid and pseudo non-rigid transforms according to expert mammogram<br />

reading.<br />

09:00-11:10, Paper WeAT8.47<br />

Robust Fourier-Based Image Alignment with Gradient Complex Image<br />

Su, Hong-Ren, National Tsing Hua Univ.<br />

Lai, Shang-Hong, National Tsing Hua Univ.<br />

Tsai, Ya-Hui, Industrial Tech. Res. Inst.<br />

The paper proposes a robust image alignment framework based on Fourier transform of a gradient complex image. The<br />

proposed Fourier-based algorithm can handle translation, rotation, and scaling, and it is robust against noise and non-uniform<br />

illumination. The proposed alignment algorithm is further extended to work under occlusion by partitioning the template<br />

and performing the Fourier-based alignment for all partitioned sub-templates in a voting framework. Our experiments<br />

show superior alignment results by using the proposed robust Fourier-based alignment over the previous related methods.<br />

09:00-11:10, Paper WeAT8.48<br />

Rate Control of H.264 Encoded Sequences by Dropping Frames in the Compressed Domain<br />

Kapotas, Spyridon, Hellenic Open Univ.<br />

Skodras, Athanassios N., Hellenic Open Univ.<br />

A new technique for controlling the bitrate of H.264 encoded sequences is presented. Bitrate control is achieved by dropping<br />

frames directly in the compressed domain. The dropped frames are carefully selected so as to either eliminate or cause<br />

non-perceptible drift errors in the decoder. The technique is well suited to H.264 encoded sequences such as movies and TV news,<br />

which are transmitted over wireless networks.<br />

09:00-11:10, Paper WeAT8.49<br />

Statistical Analysis of Kalman Filters by Conversion to Gauss Helmert Models with Applications to Process Noise Estimation<br />

Petersen, Arne, Christian-Albrechts-Univ. of Kiel<br />

Koch, Reinhard, Univ. of Kiel<br />

This paper introduces a reformulation of the extended Kalman Filter using the Gauss-Helmert model for least squares estimation.<br />

By proving the equivalence of both estimators it is shown how the methods of statistical analysis in least squares<br />

estimation can be applied to the prediction and update process in Kalman Filtering. In particular, the efficient computation<br />

of the reliability (or redundancy) matrix allows the implementation of self-supervising systems. As an application, an unparameterized<br />

method for estimating the variances of the filter’s process noise is presented.<br />

09:00-11:10, Paper WeAT8.50<br />

Color Adjacency Modeling for Improved Image and Video Segmentation<br />

Price, Brian, Brigham Young Univ.<br />

Morse, Bryan, Brigham Young Univ.<br />

Cohen, Scott, Adobe Systems<br />

Color models are often used for representing object appearance for foreground segmentation applications. The relationships<br />

between colors can be just as useful for object selection. In this paper, we present a method of modeling color adjacency<br />

relationships. By using color adjacency models, the importance of an edge in a given application can be determined and<br />

scaled accordingly. We apply our model to foreground segmentation of similar images and video. We show that given one<br />

previously-segmented image, we can greatly reduce the error when automatically segmenting other images by using our<br />

color adjacency model to weight the likelihood that an edge is part of the desired object boundary.<br />

09:00-11:10, Paper WeAT8.51<br />

Paired Transform Slice Theorem of 2-D Image Reconstruction from Projections<br />

Dursun, Serkan, Univ. of Texas at San Antonio<br />

Du, Nan, Univ. of Texas at San Antonio<br />

Grigoryan, Artyom M., Univ. of Texas at San Antonio<br />

This paper discusses the paired transform-based method of reconstruction of 2-D images from their projections. The complete<br />

set of basic functions of the 2-D discrete paired transform are defined by specific directions, i.e., the transform is directional<br />

and can be calculated from the projection data. A simple formula is presented for image reconstruction without<br />

calculating the 2-D discrete Fourier transform in the case when the size of the image is L^r x L^r, where L is prime. The image<br />

reconstruction is described by the discrete model that is used in the series expansion methods of image reconstruction.<br />

The proposed method of reconstruction has been implemented and successfully applied for modeled images on Cartesian<br />

grid of sizes up to 256x256.<br />

09:00-11:10, Paper WeAT8.52<br />

Segmentation of Cervical Cell Images<br />

Kale, Asli, Bilkent Univ.<br />

Aksoy, Selim, Bilkent Univ.<br />

The key step of a computer-assisted screening system that aims at early diagnosis of cervical cancer is the accurate segmentation<br />

of cells. In this paper, we propose a two-phase approach to cell segmentation in Pap smear test images with the<br />

challenges of inconsistent staining, poor contrast, and overlapping cells. The first phase consists of segmenting an image<br />

by a non-parametric hierarchical segmentation algorithm that uses spectral and shape information as well as the gradient<br />

information. The second phase aims to obtain nucleus regions and cytoplasm areas by classifying the segments resulting<br />

from the first phase based on their spectral and shape features. Experiments using two data sets show that our method performs<br />

well for images containing both a single cell and many overlapping cells.<br />

09:00-11:10, Paper WeAT8.53<br />

Principal Contour Extraction and Contour Classification to Detect Coronal Loops from the Solar Images<br />

Durak, Nurcan, Univ. of Louisville<br />

Nasraoui, Olfa, Univ. of Louisville<br />

In this paper, we describe a system that determines coronal loop existence from a given Solar image region in two stages:<br />

1) extracting principal contours from the solar image regions, 2) deciding whether the extracted contours are in a loop<br />

shape. In the first stage, we propose a principal contour extraction method that achieves 88% accuracy in extracting the<br />

desired contours from the cluttered regions. In the second stage, we analyze the extracted contours in terms of their geometric<br />

features such as linearity, elliptical features, curvature, proximity, smoothness, and corner points. To distinguish<br />

loop contours from other forms, we train an AdaBoost classifier based on C4.5 decision trees by using geometric features<br />

of 150 loop contours and 250 non-loop contours. Our system achieves 85% F1-Score from 10-fold cross validation experiments.<br />

09:00-11:10, Paper WeAT8.54<br />

Human Shadow Removal with Unknown Light Source<br />

Chen, Chia-Chih, The Univ. of Texas at Austin<br />

Aggarwal, J. K., The Univ. of Texas at Austin<br />

In this paper, we present a shadow removal technique which effectively eliminates a human shadow cast from an unknown<br />

direction of light source. A multi-cue shadow descriptor is proposed to characterize the distinctive properties of shadows.<br />

We employ a 3-stage process to detect then remove shadows. Our algorithm improves the shadow detection accuracy by<br />

imposing the spatial constraint between the foreground subregions of human and shadow. We collect a dataset containing<br />

81 human-shadow images for evaluation. Both descriptor ROC curves and qualitative results demonstrate the superior<br />

performance of our method.<br />

09:00-11:10, Paper WeAT8.55<br />

Generalizing Tableau to Any Color of Teaching Boards<br />

Oliveira, Daniel Marques, Univ. Federal de Pernambuco<br />

Lins, Rafael Dueire, Univ. Federal de Pernambuco<br />

Teaching boards are omnipresent in classrooms throughout the world. Tableau is a software environment for processing<br />

images from teaching-boards acquired using portable digital cameras and cell-phones. The previous versions of Tableau<br />

were restricted to white-board processing. This paper generalizes the enhancement algorithm to work with boards of any<br />

color, making it the first software environment able to process non-white boards.<br />

09:00-11:10, Paper WeAT8.56<br />

Enhancing the Filtering-out of the Back-to-Front Interference in Color Documents with a Neural Classifier<br />

Silva, Gabriel De França Pereira E, Univ. Federal de Pernambuco<br />

Lins, Rafael Dueire, Univ. Federal de Pernambuco<br />

Silva, João Marcelo Monte Da, Univ. Federal de Pernambuco<br />

Banergee, Serene, Hewlett-Packard Labs - India<br />

Kuchibhotla, Anjaneyulu, Hewlett-Packard Labs - India<br />

Thielo, Marcelo, Hewlett-Packard Labs - Brazil<br />

Back-to-front, show-through, or bleeding are the names given to the interference that appears whenever one writes or<br />

prints on both sides of translucent paper. Such interference degrades image binarization and document transcription via<br />

OCR. The technical literature presents several algorithms to remove the back-to-front noise, but no algorithm is good<br />

enough in all cases. This article presents a new technique to remove such noise in color documents which makes use of<br />

neural classifiers to evaluate the degree of intensity of the interference and, in addition, to indicate the existence of blur.<br />

Such a classifier allows tuning the parameters of an algorithm for back-to-front interference removal and document enhancement.<br />

09:00-11:10, Paper WeAT8.57<br />

A Scale Estimation Algorithm using Phase-Based Correspondence Matching for Electron Microscope Images<br />

Suzuki, Ayako, Tohoku Univ.<br />

Ito, Koichi, Tohoku Univ.<br />

Aoki, Takafumi, Tohoku Univ.<br />

Tsuneta, Ruriko, Hitachi, Ltd., Central Res. Lab.<br />

This paper proposes a multi-stage scale estimation algorithm using phase-based correspondence matching for electron<br />

microscope images. Consider a sequence of microscope images of the same target object, where the image magnification<br />

is gradually increased so that the final image has a very large scale factor S (e.g., S=1,000) with respect to the initial image.<br />

The problem considered in this paper is to estimate the overall scale factor S of the given image sequence. The proposed<br />

scale estimation technique provides a new methodology for high-accuracy magnification calibration of electron microscopes.<br />

Experimental evaluation using Mandelbrot images as precisely scale-controlled image sequence shows that the<br />

proposed method can estimate the scale factor S=1,000 with approximately 0.1%-scale error. This paper also describes an<br />

application of the proposed algorithm to the magnification calibration of an actual STEM (Scanning Transmission Electron<br />

Microscope).<br />

09:00-11:10, Paper WeAT8.58<br />

Edge Drawing: An Heuristic Approach to Robust Real-Time Edge Detection<br />

Topal, Cihan, Anadolu Univ.<br />

Akinlar, Cuneyt, Anadolu Univ.<br />

Genc, Yakup, Siemens Corp. Res.<br />

We propose a new edge detection algorithm that works by computing a set of anchor edge points in an image and then<br />

linking these anchor points by drawing edges between them. The resulting edge map consists of perfectly contiguous, one-<br />

pixel-wide edges. The performance tests show that our algorithm is up to 16% faster than the fastest known edge detection<br />

algorithm, i.e., OpenCV implementation of the Canny edge detector. We believe that our edge detector is a novel step in<br />

edge detection and would be very suitable for the next generation real-time image processing and computer vision applications.<br />

09:00-11:10, Paper WeAT8.59<br />

MPEG-2 Video Watermarking using Pattern Consideration<br />

Mansouri, Azadeh, Shahid Beheshti Univ.<br />

Mahmoudi Aznaveh, Ahmad, Shahid Beheshti Univ.<br />

Torkamani-Azar, Farah, Shahid Beheshti Univ.<br />

This paper proposes a new method for digital video watermarking in compressed domain. Both the embedding and extracting<br />

phases are performed after entropy decoding. Consequently, fully decompressing the compressed video is not<br />

necessary, making this scheme an appropriate choice for real-time applications. Furthermore, taking the structural information<br />

into account leads to presenting a robust watermarking scheme along with less quality degradation. To select suitable<br />

coefficients for embedding the watermark, three different aspects, imperceptibility, security, and bit rate increase, have<br />

been considered. These performance factors are adjusted by defining three priority matrices. In addition, a content based<br />

key is proposed in order to overcome collusion attacks. The flexibility of our method to provide the desired characteristics<br />

is another advantage.<br />

09:00-11:10, Paper WeAT8.60<br />

Lip Segmentation using Level Set Method: Fusing Landmark Edge Distance and Image Information<br />

Banimahd, Seyed Reza, Sahand Univ. of Tech.<br />

Ebrahimnezhad, Hossein, Sahand Univ. of Tech.<br />

Lip segmentation is an essential step in audio-visual processing systems. In this paper, we incorporate the color and edge<br />

information in a level set formulation for extraction of the lip contour. We build two initiative auxiliary images by mixing<br />

different color spaces to extract the landmark edges for the upper and lower parts of the lip. The performance of this approach on<br />

the VidTIMIT database is tested and an accuracy of 91.2% is reached.<br />

09:00-11:10, Paper WeAT8.61<br />

Adaptive Color Independent Components based SIFT Descriptors for Image Classification<br />

Ai, Danni, Ritsumeikan Univ.<br />

Han, Xian-Hua, Ritsumeikan Univ.<br />

Ruan, Xiang, Omron Corporation<br />

Chen, Yen-Wei, Ritsumeikan Univ.<br />

This paper proposes an adaptive color independent components based SIFT descriptor (termed CIC-SIFT) for image classification.<br />

Our motivation is to seek an adaptive and efficient color space for color SIFT feature extraction. Our work has<br />

two key contributions. First, based on independent component analysis (ICA), an adaptive and efficient color space is<br />

proposed for color image representation. Second, in this ICA-based color space, a discriminative CIC-SIFT descriptor is<br />

calculated for image classification. The experiment results indicate that (1) contrast between objects and background can<br />

be enhanced on the ICA-based color space and (2) the CIC-SIFT descriptor outperforms other conventional color SIFT<br />

descriptors on image classification.<br />

WeAT9 Lower Foyer<br />

Bioinformatics and Biomedical Applications Poster Session<br />

Session chair: Unay, Devrim (Bahcesehir Univ.)<br />

09:00-11:10, Paper WeAT9.1<br />

Joint Registration and Segmentation of Histological Volume Data by Diffusion-Based Label Adaption<br />

Bollenbeck, Felix, Fraunhofer Inst. for Factory Operation and Automation<br />

Seiffert, Udo, Fraunhofer IFF Magdeburg<br />

Three-dimensional serial section imaging delivers high spatial resolution and histological detail, which facilitates analysis<br />

of differentiation and development by exact labelling of tissues and cells, unknown to other 3-D imaging modalities. We<br />

propose an algorithm for interleaved reconstruction and segmentation of tissues in serial section volumes by diffusion-based<br />

registration and adaption of two-dimensional reference labellings. Iterative refinement of the global image congruence<br />

and local deformation of labellings delivers an efficient algorithm for processing of large volume data-sets. The<br />

benefits of the approach are shown by means of reconstruction and segmentation of giga-voxel serial section volumes of<br />

plant specimen.<br />

09:00-11:10, Paper WeAT9.2<br />

The Use of Genetic Programming for Learning 3D Craniofacial Shape Quantification<br />

Atmosukarto, Indriyati, Univ. of Washington<br />

Shapiro, Linda,<br />

Heike, Carrie, Seattle Children’s Hospital, Craniofacial Center<br />

Craniofacial disorders commonly result in various head shape dysmorphologies. The goal of this work is to quantify the<br />

various 3D shape variations that manifest in the different facial abnormalities in individuals with a craniofacial disorder<br />

called 22q11.2 Deletion Syndrome. Genetic programming (GP) is used to learn the different 3D shape quantifications.<br />

Experimental results show that the GP method achieves a higher classification rate than those of human experts and existing<br />

computer algorithms.<br />

09:00-11:10, Paper WeAT9.3<br />

Identification of Ancestry Informative Markers from Chromosome-Wide Single Nucleotide Polymorphisms using Symmetrical<br />

Uncertainty Ranking<br />

Piroonratana, Theera, King Mongkut’s Univ. of Tech.<br />

Wongseree, Waranyu, King Mongkut’s Univ. of Tech.<br />

Usavanarong, Touchpong, King Mongkut’s Univ. of Tech.<br />

Assawamakin, Anunchai, Mahidol Univ.<br />

Limwongse, Chanin, Mahidol Univ.<br />

Chaiyaratana, Nachol, King Mongkut’s Univ. of Tech.<br />

Ancestry informative markers (AIMs) have been proven to contain necessary information for population classification. In<br />

this article, round robin symmetrical uncertainty ranking for preliminary AIM screening is proposed. Each single nucleotide<br />

polymorphism (SNP) is assigned a rank based on its ability to separate two populations from each other. In a multi-population<br />

scenario, all possible population pairs are considered and the screened SNP set incorporates top-ranked SNPs from<br />

every pair-wise comparison. After the preliminary screening, SNPs are further screened by a wrapper which is embedded<br />

with a naive Bayes classifier. A classification model is subsequently constructed from the finally screened SNPs via a<br />

naive Bayes classifier. The application of the proposed procedure to the HapMap data indicates that AIM panels can be<br />

found on all chromosomes. Each panel consists of 11 to 24 SNPs and can be used to completely classify the CEU, CHB,<br />

JPT and YRI populations. Moreover, all panels are smaller than the AIM panels reported in previous studies.<br />
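
The core ranking statistic, symmetrical uncertainty between a SNP column and the population label, can be sketched directly; the round-robin pairing, wrapper stage, and naive Bayes model are omitted, and the toy genotype data and shapes below are assumptions.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two discrete variables,
    e.g. a SNP genotype column and a population label."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy([f"{a}|{b}" for a, b in zip(x, y)])
    return 0.0 if hx + hy == 0 else 2.0 * (hx + hy - hxy) / (hx + hy)

rng = np.random.default_rng(7)
genotypes = rng.integers(0, 3, size=(100, 20))   # 100 individuals x 20 SNPs (toy data)
population = rng.integers(0, 2, size=100)        # labels for one population pair
ranking = sorted(range(genotypes.shape[1]),
                 key=lambda j: -symmetrical_uncertainty(genotypes[:, j], population))
print(ranking[:5])                               # top-ranked SNP indices
```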

09:00-11:10, Paper WeAT9.4<br />

Evaluation of a New Point Clouds Registration Method based on Group Averaging Features<br />

Temerinac-Ott, Maja, Univ. of Freiburg<br />

Keuper, Margret, Univ. of Freiburg<br />

Burkhardt, Hans, Univ. of Freiburg<br />

Registration of point clouds is required in the processing of large biological data sets. The trade off between computation<br />

time and accuracy of the registration is the main challenge in this task. We present a novel method for registering point<br />

clouds in two and three dimensional space based on Group Averaging on the Euclidean transformation group. It is applied<br />

on a set of neighboring points whose size directly controls computing time and accuracy. The method is evaluated regarding<br />

dependencies of the computing time and the registration accuracy versus the point density assuming their random distribution.<br />

Results are verified in two biological applications on 2D and 3D images.<br />

09:00-11:10, Paper WeAT9.5<br />

Cell Tracking in Video Microscopy using Bipartite Graph Matching<br />

Chowdhury, Ananda, Jadavpur Univ.<br />

Chatterjee, Rohit, Jadavpur Univ.<br />

Ghosh, Mayukh, Jadavpur Univ.<br />

Ray, Nilanjan, Univ. of Alberta<br />

Automated visual tracking of cells from video microscopy has many important biomedical applications. In this paper, we<br />

model the problem of cell tracking over pairs of video microscopy image frames as a minimum weight matching problem in<br />

bipartite graphs. The bipartite matching essentially establishes one-to-one correspondences between the cells in different<br />

frames. A key advantage of using bipartite matching is the inherent scalability, which arises from its polynomial time-complexity.<br />

We propose two different tracking methods based on bipartite graph matching and properties of Gaussian distributions.<br />

In both the methods, i) the centers of the cells appearing in two frames are treated as vertices of a bipartite graph and ii) the<br />

weight matrix contains information about distance between the cells (in two frames) and cell velocity. In the first method,<br />

we identify fast-moving cells based on distance and filter them out using Gaussian distributions before the matching is<br />

applied. In the second method, we remove false matches using Gaussian distributions after the bipartite graph matching is<br />

employed. Experimental results indicate that both methods are promising, while the second method has higher accuracy.<br />
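
A minimal sketch of the shared core of both methods, one-to-one correspondence by minimum-weight bipartite matching on a cost that mixes distance and velocity, using SciPy's Hungarian solver; the mixing weight and the Gaussian-based filtering of fast-moving cells or false matches are not reproduced, so treat the cost definition as an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cells(centres_prev, centres_cur, velocities_prev=None, alpha=0.5):
    """Match cell centres across two frames by minimum-weight bipartite matching.
    The cost mixes Euclidean distance and disagreement with each cell's predicted
    position from its previous velocity (alpha is an illustrative weight)."""
    c1 = np.asarray(centres_prev, dtype=float)
    c2 = np.asarray(centres_cur, dtype=float)
    cost = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    if velocities_prev is not None:
        pred = c1 + np.asarray(velocities_prev, dtype=float)
        cost = cost + alpha * np.linalg.norm(pred[:, None, :] - c2[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)         # one-to-one correspondences
    return list(zip(rows.tolist(), cols.tolist()))

print(match_cells([[0.0, 0.0], [5.0, 5.0]], [[5.2, 4.9], [0.3, 0.1]]))
```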

09:00-11:10, Paper WeAT9.6<br />

Human State Classification and Prediction for Critical Care Monitoring by Real-Time Bio-Signal Analysis<br />

Li, Xiaokun, DCM Res. Res. LLC<br />

Porikli, Fatih, MERL<br />

To address the challenges in critical care monitoring, we present a multi-modality bio-signal modeling and analysis<br />

framework for real-time human state classification and prediction. The novel bioinformatic framework is developed<br />

to solve the human state classification and prediction issues from two aspects: a) achieve 1:1 mapping between the bio-signal<br />

and the human state via discriminant feature analysis and selection by using probabilistic principal component<br />

analysis (PPCA); b) avoid time-consuming data analysis and extensive integration resources by using Dynamic Bayesian<br />

Network (DBN). In addition, intelligent and automatic selection of the most suitable sensors from the bio-sensor array is<br />

also integrated in the proposed DBN.<br />

09:00-11:10, Paper WeAT9.7<br />

Automated Cephalometric Landmark Identification using Shape and Local Appearance Models<br />

Keustermans, Johannes, K.U. Leuven<br />

Mollemans, Wouter, Medicim nv.<br />

Vandermeulen, Dirk<br />

Suetens, Paul, K.U.Leuven<br />

In this paper a method is presented for the automated identification of cephalometric anatomical landmarks in craniofacial<br />

cone-beam CT images. This method makes use of statistical models, incorporating both local appearance and shape knowledge<br />

obtained from training data. Firstly, the local appearance model captures the local intensity pattern around each<br />

anatomical landmark in the image. Secondly, the shape model contains a local and a global component. The former improves<br />

the flexibility, whereas the latter improves the robustness of the algorithm. Using a leave-one-out approach to the<br />

training data, we assess the overall accuracy of the method. The mean and median error values for all landmarks are equal<br />

to 2.55mm and 1.72mm, respectively.<br />

09:00-11:10, Paper WeAT9.8<br />

Color Analysis for Segmenting Digestive Organs in VCE<br />

Vu, Hai, The Inst. of Scientific and Industrial Res. Osaka<br />

Echigo, Tomio, Osaka Electro-Communication Univ.<br />

Yagi, Yasushi, Osaka Univ.<br />

Yagi, Keiko, Kobe Pharmaceutical Univ.<br />

Shiba, Masatsugu, Osaka City Univ.<br />

Higuchi, Kazuhide, Osaka City Univ.<br />

Arakawa, Tetsuo, Osaka City Univ.<br />

This paper presents an efficient method for automatically segmenting the digestive organs in a Video Capsule Endoscopy<br />

(VCE) sequence. The method is based on unique characteristics of color tones of the digestive organs. We first introduce<br />

a color model of the gastrointestinal (GI) tract containing the color components of GI wall and non-wall regions. Based<br />

on the wall regions extracted from images, the distribution along the time dimension for each color component is exploited<br />

to learn the dominant colors that are candidates for discriminating digestive organs. The strongest candidates are then<br />

combined to construct a representative signal to detect the boundary of two adjacent regions. The results of experiments<br />

are comparable with previous works, but the computational cost is lower.<br />

09:00-11:10, Paper WeAT9.9<br />

A New Application of MEG and DTI on Word Recognition<br />

Meng, Lu, Northeastern Univ.<br />

Xiang, Jing, CCHMC<br />

Zhao, Hong, Northeastern Univ.<br />

Zhao, Dazhe, Northeastern Univ.<br />

This paper presents a novel application of magnetoencephalography (MEG) and diffusion tensor imaging (DTI) to word<br />

recognition, in which the spatiotemporal signature and the neural network of brain activation associated with word recognition<br />

were investigated. The word stimuli consisted of matched and mismatched words, which were visually and acoustically<br />

presented simultaneously. Twenty participants were recruited to distinguish and give different reactions to these two<br />

types of stimuli. The neural activations caused by their reactions were recorded by an MEG system and a 3T DTI<br />

scanner. The virtual sensor technique and wavelet beamformer source analysis, which are state-of-the-art methods, were<br />

used to study the MEG and DTI data. Three responses were evoked in the MEG waveform and M160 was identified in<br />

the left temporal-occipital junction. All the results coincided with the previous studies’ conclusions, which indicated that<br />

the integration of virtual sensors and wavelet beamformers is an effective technique for analyzing the MEG and DTI data.<br />

09:00-11:10, Paper WeAT9.10<br />

A Hypothesis Testing Approach for Fluorescent Blob Identification<br />

Wu, Le-Shin, Indiana Univ.<br />

Shaw, Sidney, Indiana Univ.<br />

Template matching is a common approach for identifying fluorescent objects within a biological image. However, deciding<br />

on a threshold value for judging the goodness of the matching score is a rather difficult task. In this paper, we<br />

propose a framework that dynamically chooses appropriate threshold values for correct object identification at a non-arbitrary<br />

statistical power based on the local measure of signal and noise. We validate the feasibility of our proposed framework<br />

by presenting simulation experiments conducted with both synthetic and live-cell data sets. The experimental results<br />

suggest that our auto-thresholding algorithm and local signal-to-noise ratio estimation can provide a solid means for effective<br />

spot identification in place of an ad hoc threshold value or minimization method.<br />

09:00-11:10, Paper WeAT9.11<br />

Automated Detection of Nucleoplasmic Bridges for DNA Damage Scoring in Binucleated Cells<br />

Sun, Changming, CSIRO<br />

Vallotton, Pascal, CSIRO<br />

Fenech, Michael, CSIRO<br />

Thomas, Phil, CSIRO<br />

Quantification of DNA damage, which may be caused by radiation or exposure to chemicals, is very important and can be<br />

very time consuming and subject to variability if carried out visually. The scoring of DNA damage includes<br />

biomarkers such as micronuclei, nucleoplasmic bridges, and nuclear buds as scored in cytokinesis-blocked binucleated<br />

cells. In this paper, we present a new algorithm based on a shortest path technique that enables us to detect the nucleoplasmic<br />

bridges joining two nuclei in cell images of binucleated cells. The effectiveness of our algorithm is illustrated using<br />

a set of cell images. We believe that this is the first time that a feasible automated nucleoplasmic bridge detection system<br />

has been reported.<br />

09:00-11:10, Paper WeAT9.12<br />

Multiple Model Estimation for the Detection of Curvilinear Segments in Medical X-Ray Images using Sparse-Plus-<br />

Dense-RANSAC<br />

Papalazarou, Chrysi, Eindhoven Univ. of Tech.<br />

De With, Peter H. N., Eindhoven Univ. of Tech. / CycloMedia<br />

Rongen, Peter, Philips Healthcare<br />

In this paper, we build on the RANSAC method to detect multiple instances of objects in an image, where the objects are<br />

modeled as curvilinear segments with distinct endpoints. Our approach differs from previously presented work in that it<br />

incorporates soft constraints, based on a dense image representation, that guide the estimation process in every step. This<br />

enables (1) better correspondence with image content, (2) explicit endpoint detection and (3) a reduction in the number of<br />

iterations required for accurate estimation. In the case of curvilinear objects examined in this paper, these constraints are<br />

formulated as binary image labels, where the estimation proved to be robust to mislabeling, e.g. in case of intersections.<br />

Results for both synthetic and real data from medical X-ray images show the improvement from incorporating soft image-based<br />

constraints.<br />

09:00-11:10, Paper WeAT9.13<br />

Statistical Texture Modeling for Medical Volume using Generalized N-Dimensional Principal Component Analysis<br />

Method and 3D Volume Morphing<br />

Qiao, Xu, Ritsumeikan Univ.<br />

Chen, Yen-Wei, Ritsumeikan Univ.<br />

In this paper, a statistical texture modeling method is proposed for medical volumes. As the shapes of human organs<br />

are very different from one case to another, 3D volume morphing is applied to normalize all the volume datasets to the same<br />

shape for removing shape variations. In order to deal with the problems of high dimensionality and the small number of medical<br />

samples, we propose an effective image compression method named Generalized N-dimensional Principal Component<br />

Analysis (GND-PCA) to construct a statistical model. Experiments applied to liver volumes show the good generalization<br />

performance of our method. A simple experiment is employed to show that the features extracted by the statistical<br />

texture model have the capability to discriminate between different types of data, such as normal and abnormal.<br />

09:00-11:10, Paper WeAT9.14 CANCELED<br />

Distinguishing Patients with Gastritis and Cholecystitis from the Healthy by Analyzing Wrist Radial Arterial Doppler<br />

Blood Flow Signals<br />

Jiang, Xiaorui, Harbin Inst. of Tech.<br />

Zhang, Dongyu, Harbin Inst. of Tech.<br />

Wang, Kuanquan, Harbin Inst. of Tech.<br />

Zuo, Wangmeng, Harbin Inst. of Tech.<br />

This paper tries to fill the gap between Traditional Chinese Pulse Diagnosis (TCPD) and Doppler diagnosis by applying<br />

digital signal analysis and pattern classification techniques to wrist radial arterial Doppler blood flow signals. Doppler<br />

blood flow signals (DBFS) of patients with cholecystitis, gastritis and healthy people are classified by an L2-soft margin<br>

SVM and 5 linear classifiers using the proposed feature - piecewise axially integrated bispectra (PAIB). A 5-fold cross<br />

validation is used for performance evaluation. The classification accuracies between any two groups of subjects are<br>

greater than 93%. Gastritis can be recognized with higher accuracy than cholecystitis. Cholecystitis can be recognized<br />

with higher accuracy on left hand data than right. The findings in this paper partly conform to the theory of TCPD. Though<br />

the sample size is relatively small, we could still argue that the methods proposed here are effective and could serve as an<br />

assistive tool for TCPD.<br />

09:00-11:10, Paper WeAT9.15<br />

Pelvic Organs Dynamic Features Analysis for MRI Sequences Discrimination<br />

Rahim, Mehdi, Univ. Paul Cézanne<br />

Bellemare, Marc-Emmanuel, Univ. Paul Cézanne<br />

Pirro, Nicolas, Hôpital La Timone<br />

Bulot, Rémy, Univ. Paul Cézanne<br />

Dynamic magnetic resonance imaging (MRI) acquisitions are used in the clinical assessment of the pelvic organs&#8217; behaviour<br>

during an abdominal strain. The main organs (bladder, uterus-vagina, rectum) undergo deformations and intrinsic movements<br />

along a sequence. Anatomical references and measurements are generally used by clinicians to evaluate pathology<br />

grades. In this context, we have established quantitative elements, which consist of deformation and movement features,<br />

for the pelvic dynamic characterization, by using shape descriptors computed from organ contours. Moreover, the relevance<br>

of the deformation and movement features has been assessed for efficient sequence discrimination and pathology<br>

detection.<br />

09:00-11:10, Paper WeAT9.16<br />

Multiple Atlas Inference and Population Analysis with Spectral Clustering<br />

Sfikas, Giorgos, Univ. of Ioannina<br />

Heinrich, Christian, Univ. de Strasbourg<br />

Nikou, Christophoros, Univ. of Ioannina<br />

In medical imaging, constructing an atlas and bringing an image set in a single common reference frame may easily lead<br />

the analysis to erroneous conclusions, especially when the population under study is heterogeneous. In this paper, we propose<br />

a framework based on spectral clustering that is capable of partitioning an image population into sets that require a<br />



separate atlas, and identifying the most suitable templates to be used as coordinate reference frames. The spectral analysis<br />

step relies on pairwise distances that express anatomical differences between subjects as a function of the diffeomorphic<br />

warp required to match one subject onto another, plus residual information. The methodology is validated numerically<br>

on artificial and medical imaging data.<br />

09:00-11:10, Paper WeAT9.17<br />

Automatic Pathology Annotation on Medical Images: A Statistical Machine Translation Framework<br />

Gong, Tianxia, National Univ. of Singapore<br />

Li, Shimiao, National Univ. of Singapore<br />

Tan, Chew-Lim, National Univ. of Singapore<br />

Pang, Boon Chuan, National Neuroscience Inst. Tan Tock Seng Hospital<br />

Lim, Tchoyoson, National Neuroscience Inst. Tan Tock Seng Hospital<br />

Lee, Cheng Kiang, National Neuroscience Inst. Tan Tock Seng Hospital<br />

Tian, Qi, Inst. for Infocomm Res.<br>

Zhang, Zhuo, Inst. for Infocomm Res.<br>

A large number of medical images are produced daily in hospitals and medical institutions, and the need to efficiently process,<br>

index, search and retrieve these images is great. In this paper, we propose a pathology-based medical image annotation<br>

framework using a statistical machine translation approach. After pathology terms and regions of interest (ROIs) are extracted<br />

from training text and images respectively, we use machine translation model IBM Model 1 to iteratively learn the<br />

alignment between the ROIs and the pathology terms and generate an ROI-to-pathology translation table. In the testing phase,<br>

we annotate the ROI in the image with the pathology label of the highest probability in the translation table. The overall<br />

annotation results and the retrieval performance are promising to doctors and medical professionals.<br />
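
As a rough illustration of the alignment step (a minimal sketch on hypothetical toy data, not the authors' system), IBM Model 1 translation probabilities between ROI labels and pathology terms can be estimated with a few EM iterations:<br>

```python
# Minimal IBM Model 1 sketch (toy data, hypothetical labels): EM estimation of
# t(term | roi), the probability that an ROI label translates to a pathology term.
from collections import defaultdict

corpus = [            # each pair: (ROI labels in an image, pathology terms in its report)
    (["roi_dark", "roi_ring"], ["hemorrhage", "edema"]),
    (["roi_dark"], ["hemorrhage"]),
    (["roi_ring"], ["edema"]),
]

terms = {t for _, ts in corpus for t in ts}
t_prob = defaultdict(lambda: 1.0 / len(terms))       # uniform initialisation of t(term | roi)

for _ in range(10):                                  # EM iterations
    count = defaultdict(float)                       # expected counts c(term, roi)
    total = defaultdict(float)                       # expected counts c(roi)
    for rois, ts in corpus:
        for term in ts:
            z = sum(t_prob[(term, r)] for r in rois)     # normalisation over alignments
            for r in rois:
                delta = t_prob[(term, r)] / z
                count[(term, r)] += delta
                total[r] += delta
    for (term, r), c in count.items():               # M-step: re-normalise per ROI label
        t_prob[(term, r)] = c / total[r]

# Annotation step: label an ROI with its most probable pathology term.
print(max(terms, key=lambda term: t_prob[(term, "roi_dark")]))   # -> "hemorrhage"
```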

09:00-11:10, Paper WeAT9.18<br />

3D Cell Nuclei Fluorescence Quantification using Sliding Band Filter<br />

Quelhas, Pedro, INEB- Inst. de Engenharia Biomedica<br />

Mendonça, Ana Maria, INEB - Inst. de Engenharia Biomédica<br />

Aurélio, Campilho, Faculdade de Engenharia da Univ. do Porto<br />

Plant development is orchestrated by transcription factors whose expression has become observable in living plants through<br />

the use of fluorescence microscopy. However, the exact quantification of expression levels is still not solved and most<br />

analysis is only performed through visual inspection. With the objective of automating the quantification of cell nuclei<br />

fluorescence we present a new approach to detect cell nuclei in 3D fluorescence confocal microscopy, based on the use of<br />

the sliding band convergence filter (SBF). The SBF filter detects cell nuclei and estimates their shape with high accuracy<br>

in each 2D image plane. For 3D detection, individual 2D shapes are joined into 3D estimates and then corrected based on<br />

the analysis of the fluorescence profile. The final nuclei detection achieves a precision/recall of 0.779/0.803, respectively, and<br>

an average Dice coefficient of 0.773.<br>

09:00-11:10, Paper WeAT9.19<br />

AP-Based Consensus Clustering for Gene Expression Time Series<br />

Chiu, Tai-Yu, National Tsing Hua Univ.<br />

Hsu, Ting-Chieh, National Tsing Hua Univ.<br />

Wang, Jia-Shung, National Tsing Hua Univ.<br />

We propose an unsupervised approach for analyzing gene time-series datasets. Our method combines Affinity Propagation<br />

(AP) with the spirit of consensus clustering: extracting multiple partitions from different time intervals. Without a priori<br>

knowledge of the total number of clusters and exemplars, this method preserves the relationships between genes across different<br>

time intervals and reduces the influence of noise and outliers. We demonstrate our method with both synthetic and<br>

real gene expression datasets showing significant improvement in accuracy and efficiency.<br />
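
The general idea (not the authors' exact procedure) can be sketched as clustering each time interval with Affinity Propagation and then combining the partitions through a co-association matrix; the toy data and window boundaries below are hypothetical:<br>

```python
# Sketch of AP-based consensus clustering on a toy gene-expression matrix
# (genes x time points). Windows and data are hypothetical stand-ins.
import numpy as np
from sklearn.cluster import AffinityPropagation
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
expr = np.vstack([rng.normal(m, 0.3, size=(5, 12)) for m in (0.0, 2.0)])  # 10 genes, 12 time points

windows = [(0, 4), (4, 8), (8, 12)]                    # time intervals clustered separately
coassoc = np.zeros((expr.shape[0], expr.shape[0]))
for lo, hi in windows:
    labels = AffinityPropagation(random_state=0).fit_predict(expr[:, lo:hi])
    coassoc += (labels[:, None] == labels[None, :]).astype(float)
coassoc /= len(windows)                                # fraction of intervals where genes co-cluster

# Consensus partition: hierarchical clustering on 1 - co-association.
dist = squareform(1.0 - coassoc, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")
print(consensus)
```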



09:00-11:10, Paper WeAT9.21<br />

Unsupervised Tissue Image Segmentation through Object-Oriented Texture<br />

Tosun, Akif Burak, Bilkent Univ.<br />

Sokmensuer, Cenk, Hacettepe Univ.<br />

Gunduz-Demir, Cigdem, Bilkent Univ.<br />

This paper presents a new algorithm for the unsupervised segmentation of tissue images. It relies on using the spatial information<br />

of cytological tissue components. As opposed to the previous study, it uses this information not only in<br>

defining its homogeneity measures but also in its region growing process. This algorithm has been implemented<br>

and tested. Its visual and quantitative results are compared with the previous study. The results show that the proposed<br />

segmentation algorithm is more robust, giving better accuracies with a smaller number of segmented regions.<br>

09:00-11:10, Paper WeAT9.22<br />

Automated Tracking of Vesicles in Phase Contrast Microscopy Images<br />

Usenik, Peter, Univ. of Ljubljana<br />

Vrtovec, Tomaž, Univ. of Ljubljana<br />

Pernus, Franjo, Univ. of Ljubljana<br />

Likar, Bostjan, Univ. of Ljubljana<br />

We propose an algorithm for automated tracking of the contours of phospholipid vesicles, which can be used to evaluate<br />

the power, magnitude and frequency distribution of vesicle contour movements induced by thermal fluctuations. The algorithm<br />

was tested on vesicles with different structure composition that were exposed to variable temperature. The results<br />

show that the proposed algorithm is fast, robust and reliable, and that the resulting description of vesicle contours enables<br />

straightforward spectral analysis of their fluctuations, which can also be used for the determination of other vesicle properties,<br>

e.g. the bending rigidity or spontaneous curvature.<br />

09:00-11:10, Paper WeAT9.23<br />

Automatic Detection and Segmentation of Focal Liver Lesions in Contrast Enhanced CT Images<br />

Militzer, Arne, Friedrich-Alexander-Univ. Erlangen-Nuremberg<br />

Hager, Tobias, Friedrich-Alexander-Univ. Erlangen-Nuremberg<br />

Jäger, Florian, Pattern Recognition Lab. Univ. of Erlangen<br />

Tietjen, Christian, Siemens Healthcare<br />

Hornegger, Joachim, Friedrich-Alexander-Univ.<br />

In this paper a novel system for automatic detection and segmentation of focal liver lesions in CT images is presented. It<br />

utilizes a probabilistic boosting tree to classify points in the liver as either lesion or parenchyma, thus providing both detection<br />

and segmentation of the lesions at the same time and fully automatically. To make the segmentation more robust,<br />

an iterative classification scheme is integrated that incorporates knowledge gained from earlier iterations into later decisions.<br>

Finally, a comprehensive evaluation of both the segmentation and the detection performance for the most common<br />

hypodense lesions is given. Detection rates of 77% could be achieved with a sensitivity of 0.95 and a specificity of 0.93<br>

for lesion segmentation at the same settings.<br />

09:00-11:10, Paper WeAT9.24<br />

Automatic Diagnosis of Masses by using Level Set Segmentation and Shape Description<br />

Oliver, Arnau, Univ. of Girona<br />

Torrent, Albert, Univ. of Girona<br />

Llado, Xavier, Univ. of Girona<br />

Martí, Joan, Univ. of Girona<br />

We present here an approach for automatic mass diagnosis in mammographic images. Our strategy contains three main<br />

steps. Firstly, regions of interest containing mass and background are segmented using a level set algorithm based on<br>

region information. Secondly, the characterisation of each segmented mass is obtained using the Zernike moments for<br />

modelling its shape. The final step is the diagnosis of masses as benign or malignant lesions, which is done using the Gentleboost<br />

algorithm that also assigns a likelihood value to the final result. The experimental evaluation, performed using<br />

two different digitised databases and Receiver Operating Characteristics (ROC) analysis, proves the feasibility of our proposal,<br />

showing the benefits of a correct shape description for improving automatic mass diagnosis.<br />



09:00-11:10, Paper WeAT9.25<br />

3D Reconstruction of Tumors for Applications in Laparoscopy using Conformal Geometric Algebra<br />

Machucho, Rubén, CINVESTAV, Unidad Guadalajara<br />

Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />

This paper presents a method for 3D reconstruction of tumors for applications in laparoscopy. This uses stereo endoscopic<br />

ultrasound images, which are simultaneously recorded. To do this, the ultrasound probe is tracked throughout the stereo<br />

endoscopic images using a particle filter and an auxiliary method based on thresholding in the HSV-space is used in order<br />

to improve the tracking. Then, the 3D pose of the ultrasound probe is calculated using conformal geometric algebra. The<br />

2D ultrasound images have been segmented using two methods: the level sets method and morphological operators, and<br />

a comparison between their performances has been done. Finally, the processed ultrasound images are compounded into<br />

a 3D volume, using the calculated ultrasound pose.<br />

09:00-11:10, Paper WeAT9.26<br />

Vessel Bend-Based Cup Segmentation in Retinal Images<br />

Joshi, Gopal Datt, IIIT Hyderabad<br />

Sivaswamy, Jayanthi, IIIT Hyderabad<br />

Karan, Kundan, AECS, Madurai<br />

Ranganath, Prashanth, AECS, Madurai<br>

Krishnadas, S. R., AECS, Madurai<br>

In this paper, we present a method for cup boundary detection from monocular colour fundus image to help quantify cup<br />

changes. The method is based on anatomical evidence such as vessel bends at cup boundary, considered relevant by glaucoma<br />

experts. Vessels are modeled and detected in a curvature space to better handle inter-image variations. Bends in a<br />

vessel are robustly detected using a region of support concept, which automatically selects the right scale for analysis. A<br />

reliable subset called r-bends is derived using a multi-stage strategy and a local spline fitting is used to obtain the desired<br>

cup boundary. The method has been successfully tested on 133 images comprising 32 normal and 101 glaucomatous<br />

images against three glaucoma experts. The proposed method shows high sensitivity in cup to disk ratio-based glaucoma<br />

detection and local assessment of the detected cup boundary shows good consensus with the expert markings.<br />

09:00-11:10, Paper WeAT9.27<br />

A Spot Segmentation Approach for 2D Gel Electrophoresis Images based on 2D Histograms<br />

Zacharia, Eleni, Univ. of Athens<br />

Kostopoulou, Eirini, Univ. of Athens<br />

Maroulis, Dimitris, Univ. of Athens<br />

Kossida, Sophia, Foundation of Biomedical Res. of the Acad. of Athens<br />

Spot-Segmentation, an essential stage of processing 2D gel electrophoresis images, remains a challenging process. The<br />

available software programs and techniques fail to separate overlapping protein spots correctly and cannot detect low intensity<br />

spots without human intervention. This paper presents an original approach to spot segmentation in 2D gel electrophoresis<br />

images. The proposed approach is based on 2D-histograms of the aforementioned images. The conducted<br />

experiments in a set of 16-bit 2D gel electrophoresis images demonstrate that the proposed method is very effective and<br />

it outperforms existing techniques even when it is applied to images containing several overlapping spots as well as to<br />

images containing spots of various intensities, sizes and shapes.<br />

09:00-11:10, Paper WeAT9.28<br />

Automated Tracking of the Carotid Artery in Ultrasound Image Sequences using a Self Organizing Neural Network<br />

Hamid Muhammed, Hamed, Royal Inst. of Tech. (KTH)<br />

Azar, Jimmy C., STH, KTH<br />

An automated method for the segmentation and tracking of moving vessel walls in 2D ultrasound image sequences is introduced.<br />

The method was tested on simulated and real ultrasound image sequences of the carotid artery. Tracking was<br />

achieved via a self organizing neural network known as Growing Neural Gas. This topology-preserving algorithm assigns<br />

a net of nodes connected by edges that distributes itself within the vessel walls and adapts to changes in topology with<br />

time. The movement of the nodes was analyzed to uncover the dynamics of the vessel wall. In this way, radial and longitudinal<br>

strain and strain rates have been estimated. Finally, wave intensity signals were computed from these measurements.<br>

The proposed method improves upon wave intensity wall analysis (WIWA) and opens up a possibility for easy and<br>

efficient analysis and diagnosis of vascular disease through noninvasive ultrasonic examination.<br />

09:00-11:10, Paper WeAT9.29<br />

Quantification of Subcellular Molecules in Tissue MicroArray<br />

Can, Ali, General Electric<br />

Gerdes, Michael, General Electric<br />

Bello, Musodiq, General Electric<br />

Quantifying expression levels of proteins with subcellular resolution is critical to many applications ranging from biomarker<br>

discovery to treatment planning. In this paper, we present a fully automated method and a new metric that quantifies<br />

the expression of target proteins in immunohistochemically stained tissue microarray (TMA) samples. The proposed<br>

metric is superior to existing intensity or ratio-based methods. We compared performance with the majority decision of a<br />

group of 19 observers scoring estrogen receptor (ER) status, achieving a detection rate of 96% with 90% specificity. The<br />

presented methods will accelerate the processes of biomarker discovery and transitioning of biomarkers from research<br />

bench to clinical utility.<br />

09:00-11:10, Paper WeAT9.30<br />

Actual Midline Estimation from Brain CT Scan using Multiple Regions Shape Matching<br />

Chen, Wenan, Virginia Commonwealth Univ.<br />

Ward, Kevin, Virginia Commonwealth Univ.<br />

Kayvan, Najarian, Virginia Commonwealth Univ.<br />

Computer assisted medical image processing can extract vital information that may be elusive to human eyes. In this paper,<br />

an algorithm is proposed to automatically estimate the position of the actual midline from the brain CT scans using multiple<br />

regions shape matching. The method matches feature points identified from a set of ventricle templates, extracted from<br />

MRI, with the corresponding feature points in the segmented ventricles from CT images. Then based on the matched<br />

feature points, the position of the actual midline is estimated. The proposed multiple regions shape matching algorithm<br />

addresses the deformation problem arising from the intrinsic multiple regions nature of the brain ventricles. Experiments<br />

on the CT scans from patients with traumatic brain injuries (TBI) show promising results; in particular, the proposed algorithm<br>

proves to be quite robust.<br />

09:00-11:10, Paper WeAT9.31<br />

Boosting Alzheimer Disease Diagnosis using PET Images<br />

Silveira, Margarida, Inst. Superior Técnico / Inst. de Sistema e Robótica<br />

Marques, Jorge S., Inst. Superior Técnico<br />

Alzheimer’s disease (AD) is one of the most frequent types of dementia. Currently there is no cure for AD and early diagnosis<br>

is crucial to the development of treatments that can delay the disease progression. Brain imaging can be a biomarker<br />

for Alzheimer’s disease. This has been shown in several works with MR Images, but in the case of functional imaging<br />

such as PET, further investigation is still needed to determine their ability to diagnose AD, especially at the early stage of<br />

Mild Cognitive Impairment (MCI). In this paper we study the use of PET images of the ADNI database for the diagnosis<br />

of AD and MCI. We adopt a Boosting classification method, a technique based on a mixture of simple classifiers, which<br />

performs feature selection concurrently with the segmentation and is thus well suited to high-dimensional problems. The Boosting<br>

classifier achieved an accuracy of 90.97% in the detection of AD and 79.63% in the detection of MCI.<br />
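
A generic boosting pipeline of this kind (a sketch on synthetic stand-in data, not the ADNI images or the authors' exact classifier) could look as follows, with decision stumps as the simple base learners so that each round effectively selects one feature:<br>

```python
# Generic boosting sketch (synthetic stand-in data, not ADNI): the default base
# learner of AdaBoostClassifier is a depth-1 decision stump, so every boosting
# round effectively selects one feature.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_voxels = 120, 500                 # hypothetical sizes
X = rng.normal(size=(n_subjects, n_voxels))     # stand-in for PET voxel intensities
y = (X[:, :5].mean(axis=1) > 0).astype(int)     # labels driven by a few "informative" voxels

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)
selected = np.nonzero(clf.feature_importances_)[0]   # voxels actually used by the stumps
print("voxels picked by boosting:", selected[:10])
```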

09:00-11:10, Paper WeAT9.32<br />

Efficient Quantitative Information Extraction from PCR-RFLP Gel Electrophoresis Images<br />

Maramis, Christos, Aristotle Univ. of Thessaloniki<br />

Delopoulos, Anastasios, Aristotle Univ. of Thessaloniki<br />

For the purpose of PCR-RFLP analysis, as in the case of human papillomavirus (HPV) typing, quantitative information<br />

needs to be extracted from images resulting from one-dimensional gel electrophoresis by associating the image intensity<br />

with the concentration of biological material at the corresponding position on a gel matrix. However, the background intensity<br />

of the image stands in the way of quantifying this association. We propose a novel, efficient methodology for modeling<br>

the image background with a polynomial function and prove that this can benefit the extraction of accurate information<br>

from the lane intensity profile when modeled by a superposition of properly shaped parametric functions.<br />

09:00-11:10, Paper WeAT9.33<br />

Heart Murmur Classification using Complexity Signatures<br />

Kumar, Dinesh, Univ. of Coimbra<br />

Carvalho, Paulo, Univ. of Coimbra<br />

Couceiro, Ricardo, Univ. of Coimbra<br />

Antunes, Manuel, Univ. Hospital of Coimbra<br />

Paiva, Rui Pedro, Univ. of Coimbra<br />

Henriques, Jorge, Univ. of Coimbra<br />

In this work, we propose a two-stage classifier based on the analysis of the heart sound’s complexity for murmur identification<br />

and classification. The first stage of the classifier verifies if the heart sound (HS) exhibits murmurs. To this end,<br />

the chaotic nature of the signal is assessed using the Lyapunov exponents (LEs). The second stage of the method is devoted<br />

to the classification of the type of murmur. In contrast to current state-of-the-art methods for murmur classification, a<br>

reduced set of features is proposed. This set includes both well-known and new features designed to capture the<br>

morphological and the chaotic nature of murmurs. The classification scheme is evaluated with three classification methods:<br />

Learning Vector Quantization, Gaussian Mixture Models and Support Vector Machines. The achieved results are comparable<br>

to results reported in the literature, while relying on a significantly smaller set of features.<br>

09:00-11:10, Paper WeAT9.34<br />

3D Filtering for Injury Detection in Brain MRI<br />

Sun, Yu, Univ. of California, Riverside<br />

Bhanu, Bir, Univ. of California Riverside<br />

This paper introduces a brain injury detection approach, using a 3D filtering technique, for images acquired by magnetic<br>

resonance imaging (MRI). The proposed method uses the symmetry property of brain MRI on both 2D<br>

images and 3D volumetric information of the MRI sequences. The approach consists of two key steps: (1) each slice of a<br />

brain image is segmented into different parts using a region growing algorithm, and a symmetry affinity matrix is computed,<br />

(2) non-symmetric regions are extracted, and they are further used to detect brain injury. The Kalman filter is explicitly<br />

used in step (2) to filter out the non-injury regions in 3D. Experiments are carried out to demonstrate the effectiveness of the<br>

method in detecting brain injuries.<br>

09:00-11:10, Paper WeAT9.35<br />

Prediction of Protein Sub-Nuclear Location by Clustering mRMR Ensemble Feature Selection<br />

Sakar, Cemal Okan, Bahcesehir Univ.<br />

Kursun, Olcay, Istanbul Univ.<br />

Seker, Huseyin, De Montfort Univ.<br />

Gürgen, Fikret, Boğaziçi Univ.<br>

In many applications of pattern recognition in the bioinformatics and biomedical fields, input variables are organized into<br />

natural partitions that are called views in the literature. Mutual information can be used in selecting a minimal yet capable<br />

subset of views. Ignoring the presence of views, dismantling them, and treating their variables intermixed along with those<br />

of others at best results in a complex, uninterpretable predictive system for researchers in these fields. Moreover, it would<br>

require measuring or computing the majority of the views. We use the clustering indices of the views and rank the views according<br>

to the unique information they have with the target using minimum redundancy-maximum relevance (mRMR)<br />

approach. We also propose an ensemble approach to reduce the random variations in clusterings.<br />
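
A greedy mRMR-style ranking over views can be sketched as below; the toy data and the simple per-view summary variable are hypothetical, and the clustering-indices construction from the paper is not reproduced:<br>

```python
# Greedy mRMR sketch over "views" (groups of variables). Toy data and the simple
# per-view summary used here are hypothetical; the paper ranks views via their
# clustering indices, which this sketch does not reproduce.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
views = {                                     # hypothetical views: name -> (n, d_view) matrix
    "view_a": y[:, None] + rng.normal(0, 0.5, size=(n, 3)),   # informative
    "view_b": y[:, None] + rng.normal(0, 0.5, size=(n, 3)),   # informative but redundant with a
    "view_c": rng.normal(size=(n, 3)),                        # noise
}
summary = {name: X.mean(axis=1) for name, X in views.items()}  # one summary variable per view

def relevance(name):
    return mutual_info_classif(summary[name][:, None], y, random_state=0)[0]

def redundancy(name, chosen):
    if not chosen:
        return 0.0
    return np.mean([mutual_info_regression(summary[c][:, None], summary[name],
                                            random_state=0)[0] for c in chosen])

selected, remaining = [], set(views)
while remaining:                               # greedy: maximise relevance minus redundancy
    best = max(remaining, key=lambda v: relevance(v) - redundancy(v, selected))
    selected.append(best)
    remaining.remove(best)
print("mRMR view ranking:", selected)
```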

09:00-11:10, Paper WeAT9.36<br />

Multivariate Brain Mapping by Random Subspaces<br />

Sona, Diego, Fondazione Bruno Kessler<br />

Avesani, Paolo, Fondazione Bruno Kessler<br />



Functional neuroimaging consists of the use of imaging technologies that allow recording functional brain activity in<br>

real time. Among these techniques, functional magnetic resonance imaging produces data encoded as sequences of 3D images<br>

of thousands of voxels. The main investigation performed on this data, termed brain mapping, aims at producing functional<br>

maps of the brain, i.e., at detecting the portion of voxels involved in specific perceptual or cognitive<br>

brain activities. This challenge can be shaped as a problem of feature selection. The excessive features-to-instances ratio<br>

characterizing this data is a major issue for the computation of statistically robust maps. We propose a solution based on<br />

a Random Subspace Method that extends the reference approach (Search Light) adopted by the neuroscientific community.<br />

A comparison of the two methods is supported by the results of an empirical evaluation.<br />

09:00-11:10, Paper WeAT9.37<br />

Dual Channel Colocalization for Cell Cycle Analysis using 3D Confocal Microscopy<br />

Jaeger, Stefan, Chinese Academy of Sciences<br />

Casas-Delucchi, Corella S., Tech. Univ. Darmstadt<br />

Cardoso, M. Cristina, Tech. Univ. Darmstadt<br />

Palaniappan, Kannappan, Univ. of Missouri<br />

We present a cell cycle analysis that aims to improve our previous work by adding another channel and using one<br>

more dimension. The data we use is a set of 3D images of mouse cells captured with a spinning disk confocal microscope.<br />

All images are available in two channels showing the chromocenters and the fluorescently marked protein PCNA, respectively.<br />

In the present paper, we will describe our recent colocalization study in which we use Hessian-based blob detectors<br />

in combination with radial features to measure the degree of overlap between both channels. We show that colocalization<br />

performed in such a way provides additional discriminative power and allows us to distinguish between phases that we<br />

were not able to distinguish with a single 2D channel.<br />

09:00-11:10, Paper WeAT9.38<br />

Automated Cell Phase Classification for Zebrafish Fluorescence Microscope Images<br />

Lu, Yanting, Nanjing Univ. of Science and Tech.<br />

Lu, Jianfeng, Nanjing Univ. of Science and Tech.<br />

Liu, Tianming, Univ. of Georgia<br />

Yang, Jingyu, Univ. of Georgia<br>

Automated cell phenotype image classification is an interesting bioinformatics problem. In this paper, an automated cell<br />

phase classification framework is investigated for zebrafish presomitic mesoderm (PSM) images. Low image resolution,<br>

gradual transitions between adjacent categories and irregularity of real cell images make this classification task tough but<br />

intriguing. The proposed framework first segments the zebrafish image into cell patches by a two-stage segmentation procedure,<br>

then extracts the feature set NF9, which is designed especially for this low-resolution image set, from each cell patch, and finally<br>

employs support vector machine (SVM) as cell classifier. At present, the total accuracy by NF9 is 75%.<br />

09:00-11:10, Paper WeAT9.39<br />

Data-Driven Lung Nodule Models for Robust Nodule Detection in Chest CT<br />

Farag, Amal, Univ. of Louisville<br />

Graham, James, Univ. of Louisville<br />

Farag, Aly A., Univ. of Louisville<br />

The quality of the lung nodule models determines the success of lung nodule detection. This paper describes aspects of<br />

our data-driven approach for modeling lung nodules using the texture and shape properties of real nodules to form an average<br />

model template per nodule type. The ELCAP low dose CT (LDCT) scans database is used to create the required statistics<br />

for the models based on modern computer vision techniques. These models suit various machine learning approaches<br />

for nodule detection including Bayesian methods, SVM and Neural Networks, and computations may be enhanced through<br />

genetic algorithms and AdaBoost. The merit of the new nodule models is studied with respect to parametric models,<br>

showing significant improvements in both sensitivity and specificity.<br>



09:00-11:10, Paper WeAT9.41<br />

Segmentation of Anatomical Structures in Brain MR Images using Atlases in FSL - a Quantitative Approach<br />

Soldea, Octavian, Sabanci Univ.<br />

Ekin, Ahmet, Philips Res. Europe<br />

Soldea, Diana Florentina, Sabanci Univ.<br />

Unay, Devrim, Bahcesehir Univ.<br />

Cetin, Mujdat, Sabanci Univ.<br />

Ercil, Aytul, Sabanci Univ.<br />

Uzunbas, Gokhan Mustafa, Rutgers State University<br />

Firat, Zeynep, Yeditepe University Hospital<br />

Cihangiroglu, Mutlu, Yeditepe University Hospital<br />

Segmentation of brain structures from MR images is crucial in understanding the disease progress, diagnosis, and treatment<br />

monitoring. Atlases, showing the expected locations of the structures, are commonly used to start and guide the segmentation<br />

process. In many cases, the quality of the atlas may have a significant effect in the final result. In the literature,<br />

commonly used atlases may be obtained from one subject’s data or only from healthy subjects, or may depict only certain structures,<br>

which limits their accuracy. Anatomical variations, pathologies, and imaging artifacts could all aggravate the problems related to<br>

the application of atlases. In this paper, we propose to use multiple atlases that are as different from each other as<br>

possible to handle such problems. To this end, we have built a library of atlases and computed their similarity<br>

values to each other. Our study showed that the existing atlases have varying levels of similarity for different structures.<br />

09:00-11:10, Paper WeAT9.42<br />

Graphical Model-Based Tracking of Curvilinear Structures in Bio-Image Sequences<br />

Koulgi, Pradeep, Univ. of California, Santa Barbara<br />

Sargin, Mehmet Emre, Univ. of California, Santa Barbara<br />

Rose, Kenneth, Univ. of California, Santa Barbara<br />

Manjunath, B. S., Univ. of California, Santa Barbara<br />

Tracking of curvilinear structures is a task of fundamental importance in the quantitative analysis of biological structures<br />

such as neurons, blood vessels, retinal interconnects, microtubules, etc. The state of the art HMM-based contour tracking<br />

scheme for tracking microtubules, while performing well in most scenarios, can miss the track if, during its growth, it intersects<br />

another microtubule in its neighbourhood. In this paper we present a graphical model-based tracking algorithm<br />

which propagates across frames information about the dynamics of all the microtubules. This allows the algorithm to faithfully<br />

differentiate the contour of interest from others that contribute to the clutter, and maintain tracking accuracy. We<br />

present results of experiments on real microtubule images captured using fluorescence microscopy, and show that our proposed<br />

scheme outperforms the existing HMM-based scheme.<br />

11:10-12:10, WePL1 Anadolu Auditorium<br />

The Quantitative Analysis of User Behavior Online: Data, Models and Algorithms<br>

Prabhakar Raghavan Plenary Session<br />

Yahoo! Research, USA<br />

Prabhakar Raghavan has been the head of Yahoo! Research since 2005. His research interests include text and web mining,<br />

and algorithm design. He is a consulting professor of Computer Science at Stanford University and editor-in-chief of the<br />

Journal of the ACM. Prior to joining Yahoo!, he was the chief technology officer at Verity and has held a number of technical<br />

and managerial positions at IBM Research. Prabhakar received his PhD from Berkeley and is a fellow of the ACM<br />

and of the IEEE.<br />

By blending principles from mechanism design, algorithms, machine learning and massive distributed computing, the search<br />

industry has become good at optimizing monetization on sound scientific principles. This represents a successful and<br />

growing partnership between computer science and microeconomics. When it comes to understanding how online users<br />

respond to the content and experiences presented to them, we have more of a lacuna in the collaboration between computer<br />

science and certain social sciences. We will use a concrete technical example from image search results presentation, developing<br />

in the process some algorithmic and machine learning problems of interest in their own right. We then use this<br />

example to motivate the kinds of studies that need to grow between computer science and the social sciences; a critical<br />

element of this is the need to blend large-scale data analysis with smaller-scale eye-tracking and “individualized” lab studies.<br />



WeBT1 Marmara Hall<br />

Tracking and Surveillance - III Regular Session<br />

Session chair: Liao, Mark (Univ. of Southampton)<br />

13:30-13:50, Paper WeBT1.1<br />

Object Tracking by Structure Tensor Analysis<br />

Donoser, Michael, Graz Univ. of Tech.<br />

Kluckner, Stefan, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

Covariance matrices have recently been a popular choice for versatile tasks like recognition and tracking due to their powerful<br />

properties as local descriptors and their low computational demands. This paper outlines similarities of covariance matrices<br>

to the well-known structure tensor. We show that the generalized version of the structure tensor is a powerful descriptor and<br />

that it can be calculated in constant time by exploiting the properties of integral images. To measure the similarities between<br />

several structure tensors, we describe an approximation scheme which allows comparison in a Euclidean space. Such an approach<br />

is also much more efficient than the common, computationally demanding Riemannian Manifold distances. Experimental<br />

evaluation proves the applicability for the task of object tracking demonstrating improved performance compared to<br />

covariance tracking.<br />

13:50-14:10, Paper WeBT1.2<br />

Prototype Learning using Metric Learning based Behavior Recognition<br />

Zhu, Pengfei, Chinese Acad. of Sciences<br />

Hu, Weiming, Chinese Acad. of Sciences<br />

Yuan, Chunfeng, Chinese Acad. of Sciences<br />

Li, Li, Chinese Acad. of Sciences<br />

Behavior recognition is an attractive direction in the computer vision domain. In this paper, we propose a novel behavior<br />

recognition method based on prototype learning using metric learning. The prototype learning algorithm can improve the classification<br>

performance of the nearest-neighbor classifier and reduce the storage and computation requirements. The metric learning<br>

algorithm is used to improve the performance of the prototype learning. In this paper, we use a kind of compound feature,<br>

including local and motion features, to recognize human behaviors. The experimental results show the effectiveness<br>

of our method.<br />

14:10-14:30, Paper WeBT1.3<br />

Are Correlation Filters Useful for Human Action Recognition?<br />

Ali, Saad, Carnegie Mellon Univ.<br />

Lucey, Simon, CSIRO<br />

It has been argued in recent work that correlation filters are attractive for human action recognition from videos. Motivation<br />

for their employment in this classification task lies in their ability to: (i) specify where the filter should peak in contrast to<br />

all other shifts in space and time, (ii) have some degree of tolerance to noise and intra-class variation (allowing learning<br />

from multiple examples), and (iii) can be computed deterministically with low computational overhead. Specifically, Maximum<br />

Average Correlation Height (MACH) filters have exhibited encouraging results [Mikel] on a variety of human<br>

action datasets. Here, we challenge the utility of correlation filters, like the MACH filter, in these circumstances. First, we<br />

demonstrate empirically that performance identical to the MACH filter can be attained by simply taking the average<br>

of the same action specific training examples. Second, we characterize theoretically and empirically under what circumstances<br />

a MACH filter would become equivalent to the average of the action specific training examples. Based on this characterization,<br />

we offer an alternative type of filter, based on a discriminative paradigm, that circumvents the inherent limitations of<br>

correlation filters for action recognition and demonstrate improved action recognition performance.<br />
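
The first observation can be illustrated with a brief sketch (hypothetical shapes and data, not the paper's datasets): build the template by averaging the action-specific training examples and correlate it with a test volume in the Fourier domain:<br>

```python
# Sketch of the "average template as correlation filter" observation: average the
# action-specific training volumes, then correlate with a test volume via the FFT.
# Shapes and data are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(8, 16, 32, 32))        # 8 training clips: (frames, height, width)
test = train[0] + 0.1 * rng.normal(size=(16, 32, 32))

template = train.mean(axis=0)                   # the simple average-of-examples filter

# Circular cross-correlation via FFT; the response peak locates the best alignment.
F_test = np.fft.fftn(test)
F_tmpl = np.fft.fftn(template)
response = np.real(np.fft.ifftn(F_test * np.conj(F_tmpl)))
peak = np.unravel_index(np.argmax(response), response.shape)
print("peak response", response[peak], "at shift", peak)
```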

14:30-14:50, Paper WeBT1.4<br />

Tracking Hand Rotation and Grasping from an IR Camera using Cylindrical Manifold Embedding<br />

Lee, Chan-Su, Yeungnam Univ.<br />

Park, Shin Won, Yeungnam Univ.<br />



This paper presents a new approach for tracking hand rotation and grasping from a single IR camera. Due to the complexity and<br>

ambiguity of hand pose, it is difficult to track hand pose and view variations simultaneously from a single camera. We propose<br />

a cylindrical manifold embedding for one dimensional hand pose variation and cyclic viewpoint variation. A hand pose shape<br />

from a specific viewpoint can be generated from an embedding point on the cylindrical manifold after learning nonlinear<br />

generative models from the embedding space to the corresponding observed shape. Hand grasping with simultaneous hand<br />

rotation is tracked using a particle filter on the manifold space. Experimental results for synthetic and real data show accurate<br>

tracking of a grasping hand with rotation. The proposed approach shows potential for advanced user interfaces in dark environments.<br>

14:50-15:10, Paper WeBT1.5<br />

Particle Filter Tracking with Online Multiple Instance Learning<br />

Ni, Zefeng, Univ. of California, Santa Barbara<br />

Sunderrajan, Santhoshkumar, Univ. of California, Santa Barbara<br />

Rahimi, Amir, Univ. of California, Santa Barbara<br />

Manjunath, B. S., Univ. of California, Santa Barbara<br />

This paper addresses the problem of object tracking by learning a discriminative classifier to separate the object from its<br />

background. The online-learned classifier is used to adaptively model the object’s appearance and its background. To solve the<br>

typical problem of erroneous training examples generated during tracking, an online multiple instance learning (MIL) algorithm<br />

is used by allowing false positive examples. In addition, a particle filter is applied to make the best use of the learned classifier<br>

and help to generate a better representative set of training examples for the online MIL learning. The effectiveness of the<br />

proposed algorithm is demonstrated in some challenging environments for human tracking.<br />

WeBT2 Topkapı Hall A<br />

Pattern Recognition Systems and Applications - I Regular Session<br />

Session chair: Fred, Ana Luisa Nobre (Instituto Superior Técnico)<br />

13:30-13:50, Paper WeBT2.1<br />

A Test of Granger Non-Causality based on Nonparametric Conditional Independence<br />

Seth, Sohan, Univ. of Florida<br />

Principe, Jose, Univ. of Florida<br />

In this paper we describe a test of Granger non-causality from the perspective of a new measure of nonparametric conditional<br />

independence. We apply the proposed test to two synthetic nonlinear problems where linear Granger causality fails and<br>

show that the proposed method is able to derive the true causal connectivity effectively.<br />
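
For reference, the linear Granger test that serves as the baseline can be sketched as fitting restricted and unrestricted autoregressions and comparing them with an F statistic; the lag order and the toy series below are hypothetical and the paper's nonparametric test is not reproduced:<br>

```python
# Sketch of the standard linear Granger test (the baseline the abstract contrasts
# against): does adding lags of x help predict y beyond y's own lags?
import numpy as np
from scipy import stats

def linear_granger_f(x, y, p=2):
    """F statistic and p-value for 'x Granger-causes y' with p lags (OLS)."""
    n = len(y) - p
    Y = y[p:]
    lags_y = np.column_stack([y[p - k:-k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:-k] for k in range(1, p + 1)])
    ones = np.ones((n, 1))
    X_r = np.hstack([ones, lags_y])                 # restricted: y's own past only
    X_u = np.hstack([ones, lags_y, lags_x])         # unrestricted: plus x's past
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(X_r), rss(X_u)
    f = ((rss_r - rss_u) / p) / (rss_u / (n - X_u.shape[1]))
    return f, 1.0 - stats.f.cdf(f, p, n - X_u.shape[1])

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):                             # y linearly driven by past x
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
print(linear_granger_f(x, y, p=2))                  # small p-value: causality detected
print(linear_granger_f(y, x, p=2))                  # large p-value: no reverse causality
```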

13:50-14:10, Paper WeBT2.2<br />

Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification<br />

Larios, Natalia, Univ. of Washington<br />

Soran, Bilge, Univ. of Washington<br />

Shapiro, Linda, Univ. of Washington<br>

Martinez-Muñoz, Gonzalo, Univ. Autonoma de Madrid<br />

Lin, Junyuan, Oregon State Univ.<br />

Dietterich, Thomas G., Oregon State Univ.<br />

This paper proposes an image classification method based on extracting image features using Haar random forests and combining<br />

them with a spatial matching kernel SVM. The method works by combining multiple efficient, yet powerful, learning<br />

algorithms at every stage of the recognition process. On the task of identifying aquatic stonefly larvae, the method has state-of-the-art<br>

or better performance, but with much higher efficiency.<br />

14:10-14:30, Paper WeBT2.3<br />

Incorporating Lane Estimation as Context Source in Pedestrian Recognition Task<br />

Szczot, Magdalena, Daimler AG<br />

Dannenmann, Iris, Daimler AG<br />

Löhlein, Otto, Daimler AG<br />



This contribution presents a method for incorporating information given by a lane estimation system into the pedestrian<br />

recognition task. The lane in front of the vehicle is represented by a three dimensional set of points belonging to the middle<br />

of the road. A cascaded classifier solves the first stage of the pedestrian recognition task, delivering a list of detections in a camera<br>

image. We present a fusion system which combines the information provided by the cascaded classifier and the lane estimation.<br />

The fusion system delivers a probability map of the environment in front of the vehicle. The map indicates regions in<br />

front of the vehicle which with a certain probability contain a relevant detected pedestrian.<br />

14:30-14:50, Paper WeBT2.4<br />

PILL-ID: Matching and Retrieval of Drug Pill Imprint Images<br />

Lee, Young-Beom, Korea Univ.<br />

Park, Unsang, Michigan State Univ.<br />

Jain, Anil, Michigan State Univ.<br />

Automatic illicit drug pill matching and retrieval is becoming an important problem due to an increase in the number of<br />

tablet type illicit drugs being circulated in our society. We propose an automatic method to match drug pill images based on<br />

the imprints appearing on the tablet. This will help identify the source and manufacturer of the illicit drugs. The feature<br />

vector extracted from tablet images is based on edge localization and invariant moments. Instead of storing a single template<br />

for each pill type, we generate multiple templates during the edge detection process. This circumvents the difficulties during<br />

matching due to variations in illumination and viewpoint. Experimental results using a set of real drug pill images (822 illicit<br />

drug pill images and 1,294 legal drug pill images) showed 76.74% rank-1 (93.02% rank-20) matching accuracy.<br>

14:50-15:10, Paper WeBT2.5<br />

Identifying Gender from Unaligned Facial Images by Set Classification<br />

Chu, Wen-Sheng, Acad. Sinica<br />

Huang, Chun-Rong, Acad. Sinica<br />

Chen, Chu-Song, Acad. Sinica<br />

Rough face alignments lead to suboptimal performance of face identification systems. In this study, we present a novel approach<br />

for identifying genders from facial images without proper face alignments. Instead of using only one input for testing,<br>

we generate an image set by randomly cropping out a set of image patches from a neighborhood of the face detection region.<br />

Each image set is represented as a subspace and compared with other image sets by measuring the canonical correlation between<br />

two associated subspaces. By finding an optimal discriminative transformation for all training subspaces, the proposed<br />

approach with unaligned facial images is shown to outperform the state-of-the-art methods with face alignment.<br />
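
The subspace comparison step can be sketched as computing canonical correlations, i.e. cosines of the principal angles between the two sets' bases; the patch sampling and the learned discriminative transformation from the paper are not reproduced, and the toy data below are hypothetical:<br>

```python
# Sketch of the subspace comparison step: canonical correlations between two image
# sets, computed as cosines of principal angles between their linear subspaces.
import numpy as np

def subspace_basis(patches, dim=5):
    """Orthonormal basis of the span of vectorised patches (rows = samples)."""
    X = patches - patches.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:dim].T                               # (n_pixels, dim)

def canonical_correlations(basis_a, basis_b):
    """Singular values of basis_a^T basis_b = cosines of the principal angles."""
    return np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)

rng = np.random.default_rng(0)
set_a = rng.normal(size=(30, 64))                        # 30 cropped patches, 8x8 flattened
set_b = set_a[:20] + 0.05 * rng.normal(size=(20, 64))    # jittered crops of the same face
set_c = rng.normal(size=(30, 64))                        # an unrelated image set

sim = lambda a, b: canonical_correlations(subspace_basis(a), subspace_basis(b)).max()
print("same face:", sim(set_a, set_b), "different:", sim(set_a, set_c))
```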

WeBT3 Dolmabahçe Hall A<br />

Shape Modeling - II Regular Session<br />

Session chair: Imiya, Atsushi (Chiba Univ.)<br />

13:30-13:50, Paper WeBT3.1<br />

Detection of Shapes in 2D Point Clouds Generated from Images<br />

Su, Jingyong, Florida State Univ.<br />

Zhu, Zhiqiang, Florida State Univ.<br />

Srivastava, Anuj, Florida State Univ.<br />

Huffer, Fred W., Florida State Univ.<br />

We present a novel statistical framework for detecting pre-determined shape classes in 2D cluttered point clouds, which are<br />

in turn extracted from images. In this model based approach, we use a 1D Poisson process for sampling points on shapes, a<br />

2D Poisson process for points from background clutter, and an additive Gaussian model for noise. Combining these with a<br />

past stochastic model on shapes of continuous 2D contours, and optimization over unknown pose and scale, we develop a<br />

generalized likelihood ratio test for shape detection. We demonstrate the efficiency of this method and its robustness to clutter<br />

using both simulated and real data.<br />



13:50-14:10, Paper WeBT3.2<br />

Gait Learning-Based Regenerative Model: A Level Set Approach<br />

Al-Huseiny, Muayed Sattar, Univ. of Southampton<br />

Mahmoodi, Sasan, Univ. of Southampton<br />

Nixon, Mark, Univ. of Southampton<br />

We propose a learning method for gait synthesis from a sequence of shapes (frames) with the ability to extrapolate to novel<br>

data. It involves the application of PCA, first to reduce the data dimensionality to certain features, and second to model corresponding<br />

features derived from the training gait cycles as a Gaussian distribution. This approach transforms a non-Gaussian<br>

shape deformation problem into a Gaussian one by considering features of entire gait cycles as vectors in a Gaussian space.<br />

We show that these features which we formulate as continuous functions can be modeled by PCA. We also use this model<br />

to in-between (generate intermediate unknown) shapes in the training cycle. Furthermore, this paper demonstrates that the<br />

derived features can be used in the identification of pedestrians.<br />

14:10-14:30, Paper WeBT3.3<br />

Scale-Space Spectral Representation of Shape<br />

Bates, Jonathan, Florida State Univ.<br />

Liu, Xiuwen, Florida State Univ.<br />

Mio, Washington, Florida State Univ.<br />

We construct a scale space of shape of closed Riemannian manifolds, equipped with metrics derived from spectral representations<br />

and the Hausdorff distance. The representation depends only on the intrinsic geometry of the manifolds, making it<br />

robust to pose and articulation. The computation of shape distance involves an optimization problem over the 2^p-element<br />

group of all p-bit strings, which is approached with Markov chain Monte Carlo techniques. The methods are applied to cluster<br />

surfaces in 3D space.<br />

14:30-14:50, Paper WeBT3.4<br />

Learning Metrics for Shape Classification and Discrimination<br />

Fan, Yu, Florida State Univ.<br />

Houle, David, Florida State University<br>

Mio, Washington, Florida State Univ.<br />

We propose a family of shape metrics that generalize the classical Procrustes distance by attributing weights to general linear<br />

combinations of landmarks. We develop an algorithm to learn a metric that is optimally suited to a given shape classification<br />

problem. Shape discrimination experiments are carried out with phantom data, as well as landmark data representing the<br />

shape of the wing of different species of fruit flies.<br />
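
A landmark-weighted, Procrustes-style distance of this general kind can be sketched as below; the alignment follows the classical recipe (translation, scale, optimal rotation), while the paper's learned optimal weights are not reproduced and the toy landmarks are hypothetical:<br>

```python
# Sketch of a landmark-weighted, Procrustes-style shape distance: align two 2D
# landmark configurations, then take a weighted residual. The optimal weight
# learning from the paper is not reproduced; toy data only.
import numpy as np

def weighted_procrustes(A, B, w):
    """A, B: (k, 2) landmark arrays; w: (k,) non-negative landmark weights."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)                                   # remove translation
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)                                # remove scale
    U, _, Vt = np.linalg.svd(B.T @ A)                        # optimal rotation (Kabsch)
    R = U @ Vt
    diff = A - B @ R
    return np.sqrt(np.sum(w[:, None] * diff ** 2) / w.sum())

wing = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 1.5]], float)   # toy landmarks
rot = np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])
other = wing @ rot * 2 + 3                                   # rotated, scaled, translated copy
other[4] += [0.2, -0.1]                                      # local deformation at landmark 5

print(weighted_procrustes(wing, other, np.ones(5)))                    # equal weights
print(weighted_procrustes(wing, other, np.array([1, 1, 1, 1, 5.0])))   # emphasise the deformed landmark
```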

14:50-15:10, Paper WeBT3.5<br />

Non-Parametric 3D Shape Warping<br />

Hillenbrand, Ulrich, German Aerospace Center (DLR)<br />

A method is presented for non-rigid alignment of a source shape to a target shape through estimating and interpolating pointwise<br />

correspondences between their surfaces given as point clouds. The resulting mapping can be non-smooth and non-isometric,<br />

relate shapes across large variations, and find partial matches. It does not require a parametric model or a prior of<br />

deformations. Results are shown for some objects from the Princeton Shape Benchmark and a range scan.<br />

WeBT4 Dolmabahçe Hall B<br />

Image Denoising Regular Session<br />

Session chair: Skodras, A. (Hellenic Open Univ.)<br />

13:30-13:50, Paper WeBT4.1<br />

Edge Preserving Image Denoising in Reproducing Kernel Hilbert Spaces<br />

Bouboulis, Pantelis, Univ. of Athens<br />

Slavakis, Konstantinos, Univ. of Peloponnese<br />

Theodoridis, Sergios, Univ. of Athens<br />



The goal of this paper is the development of a novel approach to the problem of noise removal, based on the theory of<br>

Reproducing Kernel Hilbert Spaces (RKHS). The problem is cast as an optimization task in an RKHS, by taking advantage<br>

of the celebrated semiparametric Representer Theorem. Examples verify that in the presence of Gaussian noise the proposed<br>

method performs relatively well compared to wavelet based techniques and outperforms them significantly in the presence<br />

of impulse or mixed noise.<br />

13:50-14:10, Paper WeBT4.2<br />

Multichannel Image Regularisation using Anisotropic Geodesic Filtering<br />

Grazzini, Jacopo, Los Alamos National Lab.<br />

Soille, Pierre, Ec. Joint Res. Centre<br />

Dillard, Scott, Los Alamos National Lab.<br />

This paper extends a recent image-dependent regularisation approach introduced in [Grazzini and Soille, PR09&CCIS09]<br />

aiming at edge-preserving smoothing. For that purpose, geodesic distances equipped with a Riemannian metric need to be<br />

estimated in local neighbourhoods. By deriving an appropriate metric from the gradient structure tensor, the associated geodesic<br />

paths are constrained to follow salient features in images. Following this, we design a generalised anisotropic geodesic<br>

filter, incorporating not only a measure of the edge strength, like in the original method, but also further directional information<br />

about the image structures. The proposed filter is particularly efficient at smoothing heterogeneous areas while preserving<br />

relevant structures in multichannel images.<br />

14:10-14:30, Paper WeBT4.3<br />

Local Jet based Similarity for NL-Means Filtering<br />

Manzanera, Antoine, ENSTA-ParisTech<br />

Reducing the dimension of local descriptors in images is useful to perform pixels comparison faster. We show here that, for<br />

computing the NL-means denoising filter, image patches can be favourably replaced by a vector of spatial derivatives (local<br />

jet), to calculate the similarity between pixels. First, we present the basic, limited range implementation, and compare it with<br />

the original NL-means. We use a fast estimation of the noise variance to automatically adjust the decay parameter of the<br />

filter. Next, we present the unlimited range implementation using nearest neighbours search in the local jet space, based on<br />

a binary search tree representation.<br />
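
The core idea of replacing patch comparison by a comparison of local-jet descriptors can be sketched as below; the derivative orders, filter parameters and decay value are illustrative only, not the paper's settings:<br>

```python
# Sketch of NL-means weights computed from a local-jet descriptor (Gaussian
# derivatives up to order 2) instead of full image patches. Parameters (sigma,
# search radius, decay h) are illustrative, not the paper's settings.
import numpy as np
from scipy.ndimage import gaussian_filter

def local_jet(img, sigma=1.0):
    orders = [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]   # smoothed image + derivatives
    return np.stack([gaussian_filter(img, sigma, order=o) for o in orders], axis=-1)

def nlmeans_localjet(img, radius=5, h=0.3, sigma=1.0):
    jet = local_jet(img, sigma)
    out = np.zeros_like(img)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - radius), min(H, i + radius + 1)
            j0, j1 = max(0, j - radius), min(W, j + radius + 1)
            d2 = np.sum((jet[i0:i1, j0:j1] - jet[i, j]) ** 2, axis=-1)   # jet-space distances
            w = np.exp(-d2 / (h * h))
            out[i, j] = np.sum(w * img[i0:i1, j0:j1]) / np.sum(w)
    return out

rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[:, 32:] = 1.0                 # step-edge test image
noisy = clean + 0.2 * rng.normal(size=clean.shape)
denoised = nlmeans_localjet(noisy)
print("MAE before:", np.abs(noisy - clean).mean(), "after:", np.abs(denoised - clean).mean())
```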

14:30-14:50, Paper WeBT4.4<br />

Image Denoising based on Fuzzy and Intra-Scale Dependency in Wavelet Transform Domain<br />

Saeedi, Jamal, Amirkabir Univ. of Tech.<br />

Moradi, Mohammad Hassan, Amirkabir Univ. of Tech.<br />

Abedi, Ali, Amirkabir Univ. of Tech.<br />

In this paper, we propose a new wavelet shrinkage algorithm based on fuzzy logic. Fuzzy logic is used for taking neighbor dependency and<br>

the uncorrelated nature of noise into account in wavelet-based image denoising. For this reason, we use a fuzzy feature for enhancing wavelet coefficient<br>

information in the shrinkage step. Then a fuzzy membership function shrinks wavelet coefficients based on the fuzzy feature. We<br />

examine our image denoising algorithm in the dual-tree discrete wavelet transform, which is the new shiftable and modified version of discrete<br />

wavelet transform. Extensive comparisons with state-of-the-art image denoising algorithms indicate that our image denoising algorithm<br>

has a better performance in noise suppression and edge preservation.<br />

14:50-15:10, Paper WeBT4.5<br />

Noise-Insensitive Contrast Enhancement for Rendering High-Dynamic-Range Images<br />

Lin, Hsueh-Yi Sean, Lunghwa Univ. of Science and Tech.<br />

The process of compressing the high luminance values into the displayable range inevitably incurs the loss of image contrasts. Although the<br />

local adaptation process, such as the two-scale contrast reduction scheme, is capable of preserving details during the HDR compression<br />

process, it cannot be used to enhance the local contrasts of image contents. Moreover, the effect of noise artifacts cannot be eliminated when<br />

the detail manipulation is subsequently performed. We propose a new tone reproduction scheme, which incorporates the local contrast enhancement<br />

and the noise suppression processes, for the display of HDR images. Our experimental results show that the proposed scheme is<br />

indeed effective in enhancing local contrasts of image contents and suppressing noise artifacts during the increase of the visibility of HDR<br />

scenes.<br />



WeBT5 Topkapı Hall B<br />

Feature Extraction for Face Recognition Regular Session<br />

Session chair: Govindaraju, Venu (Univ. at Buffalo)<br />

13:30-13:50, Paper WeBT5.1<br />

Monogenic Binary Pattern (MBP): A Novel Feature Extraction and Representation Model for Face Recognition<br />

Yang, Meng, The Hong Kong Pol. Univ.<br />

Zhang, Lei, The Hong Kong Pol. Univ.<br />

Zhang, Lin, The Hong Kong Pol. Univ.<br />

Zhang, David, The Hong Kong Pol. Univ.<br />

A novel feature extraction method, namely monogenic binary pattern (MBP), is proposed in this paper based on the theory<br />

of monogenic signal analysis, and the histogram of MBP (HMBP) is subsequently presented for robust face representation<br />

and recognition. MBP consists of two parts: one is monogenic magnitude encoded via uniform LBP, and the other is monogenic<br />

orientation encoded as quadrant-bit codes. The HMBP is established by concatenating the histograms of MBP of all<br />

sub-regions. Compared with the well-known and powerful Gabor filtering based LBP schemes, one clear advantage of<br />

HMBP is its lower time and space complexity because monogenic signal analysis needs fewer convolutions and generates<br />

more compact feature vectors. The experimental results on the AR and FERET face databases validate that the proposed<br />

MBP algorithm has better performance than or comparable performance with state-of-the-art local feature based methods<br />

but with significantly lower time and space complexity.<br />
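
The block-histogram representation that HMBP builds on can be sketched as below with scikit-image; the monogenic signal analysis itself (the magnitude and orientation maps) is not reproduced, and the grid size and parameters are illustrative:<br>

```python
# Sketch of the sub-region histogram representation used by HMBP-style descriptors:
# encode a response map with uniform LBP and concatenate per-block histograms.
# A random image stands in for the monogenic magnitude map.
import numpy as np
from skimage.feature import local_binary_pattern

def block_lbp_histogram(img, grid=(4, 4), P=8, R=1):
    codes = local_binary_pattern(img, P, R, method="uniform")     # P+2 uniform codes
    n_bins = P + 2
    bh, bw = img.shape[0] // grid[0], img.shape[1] // grid[1]
    feats = []
    for by in range(grid[0]):
        for bx in range(grid[1]):
            block = codes[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)                                  # grid[0]*grid[1]*(P+2) dims

face = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
print(block_lbp_histogram(face).shape)                            # (160,) for a 4x4 grid, P=8
```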

13:50-14:10, Paper WeBT5.2<br />

Automatic Frequency Band Selection for Illumination Robust Face Recognition<br />

Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />

Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />

Varying illumination conditions cause a dramatic change in facial appearance that leads to a significant drop in face recognition<br />

algorithms’ performance. In this paper, to overcome this problem, we utilize an automatic frequency band selection<br />

scheme. The proposed approach is incorporated to a local appearance-based face recognition algorithm, which employs<br />

discrete cosine transform (DCT) for processing local facial regions. From the extracted DCT coefficients, the approach<br />

determines the ones that should be used for classification. Extensive experiments conducted on the extended Yale face<br>

database B have shown that benefiting from frequency information provides robust face recognition under changing illumination<br />

conditions.<br />

14:10-14:30, Paper WeBT5.3<br />

Directed Random Subspace Method for Face Recognition<br />

Harandi, Mehrtash, NICTA<br />

Nili Ahmadabadi, Majid, Univ. of Tehran<br />

Nadjar Araabi, Babak, Univ. of Tehran<br />

Bigdeli, Abbas, NICTA<br />

Lovell, Brian Carrington, The Univ. of Queensland<br />

With growing attention to ensemble learning, various ensemble methods for face recognition have been proposed in recent years that show promising results. Among diverse ensemble construction approaches, the random subspace method has received considerable attention in face recognition. Although random feature selection in the random subspace method improves accuracy in general, it is not free of serious difficulties and drawbacks. In this paper we present a learning scheme to overcome some of the drawbacks of random feature selection in the random subspace method. The proposed learning method derives a feature discrimination map based on a measure of accuracy and uses it in a probabilistic recall mode to construct an ensemble of subspaces. Experiments on different face databases revealed that the proposed method gives superior performance over well-known benchmarks and state-of-the-art ensemble methods.

14:30-14:50, Paper WeBT5.4<br />

Raw vs. Processed: How to Use the Raw and Processed Images for Robust Face Recognition under Varying Illumination

Xu, Li, Chinese Acad. of Sciences<br />



Lei, Huang, Chinese Acad. of Sciences<br />

Liu, Changping, Chinese Acad. of Sciences<br />

Many previous image processing methods discard low-frequency components of images to extract illumination-invariant features for face recognition. However, such methods may cause distortion of the processed images and perform poorly under normal lighting. In this paper, a new method is proposed to deal with the illumination problem in face recognition. First, we define a score that denotes the relative difference between the first and second largest similarities between the query input and the individuals in the gallery classes. Then, according to the score, we choose the appropriate images, raw or processed, to use in recognition. Experiments on the ORL, CMU-PIE and Extended Yale B face databases show that our adaptive method gives more robust results after combination and performs better than the traditional fusion operators, the sum and the maximum of similarities.

14:50-15:10, Paper WeBT5.5<br />

Discriminative Prototype Learning in Open Set Face Recognition<br />

Han, Zhongkai, Tsinghua Univ.<br />

Fang, Chi, Tsinghua Univ.<br />

Ding, Xiaoqing, Tsinghua Univ.<br />

We address the problem of prototype design for open set face recognition (OSFR) using a single sample image per subject. Normalized Correlation (NC), also known as Cosine Distance, offers many benefits in accuracy and robustness compared to other distance measures in the OSFR problem. Inspired by classical Learning Vector Quantization (LVQ), a novel discriminative learning method is proposed to design a discriminative prototype used by the NC classifier. Specifically, we develop an objective function that keeps the NC score between the prototype and within-class samples at a high level and minimizes the similarity between the prototype and between-class samples. Several experiments conducted on benchmark databases demonstrate the superior performance of the designed prototype compared to the original one.

WeBT6 Anadolu Auditorium<br />

Document Analysis - II Regular Session<br />

Session chair: Lopresti, Daniel (Lehigh Univ.)<br />

13:30-13:50, Paper WeBT6.1<br />

On-Line Handwriting Word Recognition using a Bi-Character Model<br />

Prum, Sophea, Univ. of La Rochelle<br />

Visani, Muriel, Univ. of La Rochelle<br />

Ogier, Jean-Marc, Univ. de la Rochelle<br />

This paper deals with on-line handwriting recognition. Analytic approaches have attracted increasing interest during the last ten years. These approaches rely on a preliminary segmentation stage, which remains one of the most difficult problems and may strongly affect the quality of the global recognition process. In order to circumvent this problem, this paper introduces a bi-character model, where each character is recognized jointly with its neighboring characters. This model yields two main advantages. First, it reduces the number of confusions due to connections between characters during the character recognition step. Second, it avoids some possible confusions at the character recognition level during the word recognition stage. Our experiments on significant databases show clear improvements, as the recognition rate increases from 65% to 83% with this bi-character strategy.

13:50-14:10, Paper WeBT6.2<br />

Ruling Line Removal in Handwritten Page Images<br />

Lopresti, Daniel, Lehigh Univ.<br />

Kavallieratou, Ergina, Univ. of the Aegean<br />

In this paper we present a procedure for removing ruling lines from a handwritten document image that does not break existing<br />

characters. We take advantage of common ruling line properties such as uniform width, predictable spacing, position<br />

vs. text, etc. The proposed process has no effect on document images without ruling lines, hence no a priori discrimination<br />

is required. The system is evaluated on synthetic page images in five different languages.<br />



14:10-14:30, Paper WeBT6.3<br />

Script Identification – a Han & Roman Script Perspective<br />

Chanda, Sukalpa, GJØVIK Univ. Coll.<br />

Pal, Umapada, Indian Statistical Inst.<br />

Franke, Katrin, Gjøvik Univ. Coll.<br />

Kimura, Fumitaka, Mie Univ.<br />

All Han-based scripts (Chinese, Japanese, and Korean) possess similar visual characteristics. Hence, developing a system to identify Chinese, Japanese and Korean scripts on a single document page is quite challenging. It is noted that a Han-based document page might also contain Roman script. A multi-script OCR system dealing with Chinese, Japanese, Korean, and Roman scripts demands identification of the scripts before execution of the respective OCR modules. We propose a system to address this problem using directional features along with a Gaussian-kernel-based Support Vector Machine. We obtained promising results of 98.39% script identification accuracy at the character level and 99.85% at the block level when no rejection was considered.
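The pipeline described above can be pictured with a small sketch: gradient-direction histograms as directional features, fed to a Gaussian-kernel SVM. The bin count and SVM hyper-parameters below are illustrative assumptions, not the authors' exact design.

import numpy as np
from sklearn.svm import SVC

def directional_features(patch, n_bins=16):
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx)                 # gradient directions in [-pi, pi]
    weights = np.hypot(gx, gy)                  # weight each pixel by gradient magnitude
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi), weights=weights)
    return hist / (hist.sum() + 1e-9)

def train_script_classifier(images, labels):
    # images: list of character or block images; labels: script label per image
    X = np.array([directional_features(im) for im in images])
    return SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, labels)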

14:30-14:50, Paper WeBT6.4<br />

Robust 1D Barcode Recognition on Mobile Devices<br />

Rocholl, Johann, Stuttgart Univ.<br />

Klenk, Sebastian, Stuttgart Univ.<br />

Heidemann, Gunther, Stuttgart Univ.<br />

In the following we describe a novel method for decoding linear barcodes from blurry camera images. Our goal was to develop an algorithm that can be used on mobile devices to recognize product numbers from EAN or UPC barcodes.

14:50-15:10, Paper WeBT6.5<br />

Fast Logo Detection and Recognition in Document Images<br />

Li, Zhe, Siemens AG<br />

Schulte-Austum, Matthias, Siemens AG<br />

Neschen, Martin, Recosys GmbH<br />

The scientific significance of automatic logo detection and recognition keeps growing because of the increasing requirements of intelligent document image analysis and retrieval. In this paper, we introduce a system architecture aimed at segmentation-free and layout-independent logo detection and recognition. Together with the dedicated logo feature design, a novel way of enforcing the geometrical relationships among the features, and several optimizations in the recognition process, this system achieves improvements in both recognition performance and running time. The experimental results on several sets of real-world documents demonstrate the effectiveness of our approach.

WeBT7 Dolmabahçe Hall C<br />

Classification in Biomedicine Regular Session<br />

Session chair: Gurcan, Metin (Ohio State Univ.)<br />

13:30-13:50, Paper WeBT7.1<br />

Joint Independent Component Analysis of Brain Perfusion and Structural Magnetic Resonance Images in Dementia<br />

Tosun, Duygu, Center for Imaging Neurodegenerative Diseases<br />

Rosen, Howard, UCSF<br />

Miller, Bruce L., UCSF<br />

Weiner, Michael W., UCSF<br />

Schuff, Norbert, UCSF<br />

Magnetic Resonance Imaging (MRI) provides various imaging modes to study the brain. We tested the benefits of joint analysis of multimodality MRI data using joint independent component analysis (jICA) in comparison to unimodality analyses. Specifically, we designed a jICA to decompose the joint distributions of multimodality MRI data across image voxels and subjects into independent components that explain joint variations between image modalities across subjects. We applied jICA to structural and perfusion-weighted MRI data from 12 patients diagnosed with behavioral variant frontotemporal dementia (bvFTD), a type of dementia, and 12 healthy elderly individuals. While unimodality analyses showed widespread brain atrophy and hypoperfusion in the patients, jICA further revealed links between atrophy and hypoperfusion in specific brain regions. Moreover, significant links were confined to the right brain hemisphere in FTLD, consistent with the clinical symptoms. Considering the multimodality effect size between bvFTD patients and controls, the brain atrophy and hypoperfusion regions identified by multimodality jICA yielded a large effect size, whereas the regions identified by unimodality analysis of atrophy and hypoperfusion differences revealed only a medium multimodality effect size. The findings demonstrate the power of jICA to effectively evaluate multimodality brain imaging data.

13:50-14:10, Paper WeBT7.2<br />

Endoscopic Image Classification using Edge-Based Features<br />

Häfner, Michael, St. Elisabeth Hospital<br />

Gangl, Alfred, Medical Univ. of Vienna<br />

Liedlgruber, Michael, Univ. of Salzburg<br />

Uhl, Andreas, Univ. of Salzburg<br />

Vécsei, Andreas, St. Anna Children’s Hospital<br />

Wrba, Friedrich, Medical Univ. of Vienna<br />

We present a system for automated colon cancer detection based on pit pattern classification. In contrast to previous work, we exploit the visual nature of the underlying classification scheme by extracting features based on detected edges. To focus on the most discriminative subset of features, we use greedy forward feature subset selection. The classification is then carried out using the k-nearest neighbors (k-NN) classifier. The results obtained are very promising and show that an automated classification of the given imagery is feasible with the proposed method.

14:10-14:30, Paper WeBT7.3<br />

Biclustering of Expression Microarray Data with Topic Models<br />

Bicego, Manuele, Univ. of Verona<br />

Lovato, Pietro, Univ. of Verona<br />

Ferrarini, Alberto, Univ. of Verona<br />

Delledonne, Massimo, Univ. of Verona<br />

This paper presents an approach to extract biclusters from expression microarray data using topic models, a class of probabilistic models which allows the detection of interpretable groups of highly correlated genes and samples. Starting from a topic model learned from the expression matrix, some automatic rules to extract biclusters are presented, which overcome the drawbacks of previous approaches. The methodology has been positively tested on synthetic benchmarks, as well as on a real experiment involving two different species of grape plants (Vitis vinifera and Vitis riparia).

14:30-14:50, Paper WeBT7.4<br />

A Multiple Instance Learning Approach Toward Optimal Classification of Pathology Slides<br />

Dundar, Murat, IUPUI<br />

Badve, Sunil, Indiana Univ.<br />

Raykar, Vikas, Siemens Medical<br />

Jain, Rohit, IUPUI<br />

Sertel, Olcay, The Ohio State Univ.<br />

Gurcan, Metin, The Ohio State Univ.<br />

Pathology slides are diagnosed based on the histological descriptors extracted from regions of interest (ROIs) identified<br />

on each slide by the pathologists. A slide usually contains multiple regions of interest and a positive (cancer) diagnosis is<br />

confirmed when at least one of the ROIs in the slide is identified as positive. For a negative diagnosis the pathologist has<br />

to rule out cancer for each and every ROI available. Our research is motivated toward computer-assisted classification of<br />

digitized slides. The objective in this study is to develop a classifier to optimize classification accuracy at the slide level.<br />

Traditional supervised training techniques which are trained to optimize classifier performance at the ROI level yield suboptimal<br />

performance in this problem. We propose a multiple instance learning approach based on the implementation of<br />

the large margin principle with different loss functions defined for positive and negative samples. We consider the classification<br />

of intraductal breast lesions as a case study, and perform experimental studies comparing our approach against<br />

the state-of-the-art.<br />



14:50-15:10, Paper WeBT7.5<br />

Gaussian ERP Kernel Classifier for Pulse Waveforms Classification<br />

Zuo, Wangmeng, Harbin Inst. of Tech.<br />

Zhang, Dongyu, Harbin Inst. of Tech.<br />

Zhang, David, The Hong Kong Pol. Univ.<br />

Wang, Kuanquan, Harbin Inst. of Tech.<br />

Li, Naimin, Harbin Inst. of Tech.<br />

While advances in sensor and signal processing techniques have provided effective tools for quantitative research on traditional Chinese pulse diagnosis (TCPD), the automatic classification of pulse waveforms remains a difficult problem. To address this issue, this paper proposes a novel edit distance with real penalty (ERP)-based k-nearest neighbor (KNN) classifier, drawing on recent progress in time series matching and KNN classification. Taking advantage of the metric property of ERP, we first develop a Gaussian ERP kernel, and then embed it into a kernel difference-weighted KNN classifier. The proposed Gaussian ERP kernel classifier is evaluated on a dataset of 2470 pulse waveforms. Experimental results show that the proposed classifier is much more accurate than several other pulse waveform classification approaches.
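For readers unfamiliar with ERP, the sketch below shows the standard edit distance with real penalty between two 1-D sequences and a Gaussian kernel built on it; the gap value g and kernel width sigma are illustrative assumptions, and the kernel difference-weighted KNN step is omitted.

import numpy as np

def erp_distance(x, y, g=0.0):
    n, m = len(x), len(y)
    d = np.zeros((n + 1, m + 1))
    d[1:, 0] = np.cumsum(np.abs(np.asarray(x, dtype=float) - g))   # delete all of x against gaps
    d[0, 1:] = np.cumsum(np.abs(np.asarray(y, dtype=float) - g))   # delete all of y against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = min(d[i - 1, j - 1] + abs(x[i - 1] - y[j - 1]),  # match
                          d[i - 1, j] + abs(x[i - 1] - g),             # gap in y
                          d[i, j - 1] + abs(y[j - 1] - g))             # gap in x
    return d[n, m]

def gaussian_erp_kernel(x, y, sigma=1.0, g=0.0):
    # ERP is a metric, so a Gaussian kernel on it keeps the usual
    # "closer sequences -> larger similarity" behaviour used by the KNN classifier
    return np.exp(-erp_distance(x, y, g) ** 2 / (2.0 * sigma ** 2))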

WeCT1 Marmara Hall<br />

Tracking and Surveillance - IV Regular Session<br />

Session chair: Carneiro, Gustavo (Technical Univ. of Lisbon)<br />

15:40-16:00, Paper WeCT1.1<br />

Human 3D Motion Recognition based on Spatial-Temporal Context of Joints<br />

Zhao, Qiong, Univ. of Science and Tech. of China<br />

Wang, Lihua, City Univ. of Hong Kong<br />

Ip, Horace,<br />

Zhou, Xuehai, Univ. of Science and Tech. of China<br />

The paper presents a novel human motion recognition method based on a new form of hidden Markov model, called the spatial-temporal hidden Markov model (ST-HMM), which can be learnt from a sequence of joint positions. To cope with the high dimensionality of the pose space, we exploit the spatial dependency between each pair of spatially connected joints in the articulated skeletal structure, as well as the temporal dependency due to the continuous movement of each of the joints. The spatial-temporal contexts of these joints are learnt from the sequences of joint movements and captured by our ST-HMM. Results of recognizing 11 different action classes on a large number of motion capture sequences as well as synthetic tracking data show that our approach outperforms the traditional HMM approach in terms of robustness and recognition rates.

16:00-16:20, Paper WeCT1.2<br />

Matching Groups of People by Covariance Descriptor<br />

Cai, Yinghao, Univ. of Oulu<br />

Takala, Valtteri, Univ. of Oulu<br />

Pietikäinen, Matti, Univ. of Oulu<br />

In this paper, we present a new solution to the problem of matching groups of people across multiple non-overlapping cameras. Similar to the problem of matching individuals across cameras, matching groups of people also faces challenges such as variations in illumination conditions, poses and camera parameters. Moreover, people often swap their positions while walking in a group. In this paper, we propose to use the covariance descriptor for appearance matching of group images. The covariance descriptor is shown to be a discriminative descriptor which captures both appearance and statistical properties of image regions. Furthermore, it presents a natural way of combining multiple heterogeneous features with a relatively low dimensionality. Experimental results on two different datasets demonstrate the effectiveness of the proposed method.
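A minimal sketch of a region covariance descriptor follows: per-pixel features are stacked and the region is summarized by their covariance matrix. The particular feature set and the log-Euclidean comparison are common choices used here as assumptions, not necessarily those of the paper.

import numpy as np
from scipy.linalg import logm

def region_covariance(gray):
    h, w = gray.shape
    y, x = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(gray.astype(float))
    feats = np.stack([gray.ravel(), x.ravel(), y.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()], axis=0)
    return np.cov(feats)                      # 5x5 covariance matrix describing the region

def covariance_distance(c1, c2):
    # log-Euclidean distance between covariance matrices (one simple metric choice)
    return np.linalg.norm(logm(c1) - logm(c2), ord="fro")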

16:20-16:40, Paper WeCT1.3<br />

Boosting Incremental Semi-Supervised Discriminant Analysis for Tracking<br />

Wang, Heng, Chinese Acad. of Sciences<br />

Hou, Xinwen, Chinese Acad. of Sciences<br />



Liu, Cheng-Lin, Chinese Acad. of Sciences<br />

Tracking has recently been formulated as a problem of discriminating the object from its nearby background, where the classifier is updated by new samples successively arriving during tracking. Depending on whether the samples are labeled or not, the tracker can be designed in a supervised or semi-supervised manner. This paper proposes a novel semi-supervised algorithm for tracking by combining Semi-supervised Discriminant Analysis (SDA) with an online boosting framework. Using the local geometric structure information from the samples, the SDA-based weak classifier is made more robust to outliers. Meanwhile, we design an incremental updating mechanism for SDA so that it can adapt to appearance changes. We further propose an Extended SDA (ESDA) algorithm, which gives better discrimination ability. Results on several challenging video sequences demonstrate the effectiveness of the method.

16:40-17:00, Paper WeCT1.4<br />

Optical Rails: View-Based Track Following with Hemispherical Environment Model and Orientation View<br />

Descriptors<br />

Dederscheck, David, Goethe Univ. Frankfurt<br />

Zahn, Martin, Goethe Univ. Frankfurt<br />

Friedrich, Holger, Goethe Univ. Frankfurt<br />

Mester, Rudolf, Goethe Univ. Frankfurt<br />

We present a purely view-based method for robot navigation along a prerecorded track using compact omnidirectional view descriptors. This paper focuses on a new model for the navigation environment to determine the steering direction by efficient holistic comparison of views. The concept of view descriptors based on a low-order expansion of local orientation vectors into spherical harmonic basis functions is augmented by a linear illumination model, providing discriminative view matching also under illumination changes.

17:00-17:20, Paper WeCT1.5<br />

Forward-Backward Error: Automatic Detection of Tracking Failures<br />

Kalal, Zdenek, Univ. of Surrey<br />

Mikolajczyk, Krystian, Univ. of Surrey<br />

Matas, Jiri, CTU Prague<br />

This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error,<br />

i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured.<br />

We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories<br />

in video sequences. We demonstrate that the approach is complementary to commonly used normalized<br />

cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance<br />

is achieved on challenging benchmark video sequences which include non-rigid objects.<br />
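The forward-backward error itself is simple to sketch: track points forward over a window of frames, track the results back to the first frame, and score each point by the distance between its original and back-tracked positions. The track function below is a placeholder for any point tracker (e.g. pyramidal Lucas-Kanade); the median-based rejection rule mentioned in the comment is one common choice, stated here as an assumption.

import numpy as np

def forward_backward_error(frames, points, track):
    """frames: list of images; points: (N, 2) array; track(img_a, img_b, pts) -> pts."""
    fwd = points
    for a, b in zip(frames[:-1], frames[1:]):               # track forward in time
        fwd = track(a, b, fwd)
    bwd = fwd
    for a, b in zip(frames[::-1][:-1], frames[::-1][1:]):   # track backward to the start
        bwd = track(a, b, bwd)
    return np.linalg.norm(points - bwd, axis=1)             # per-point FB error

# points whose FB error exceeds, e.g., the median error are typically discarded as unreliable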

WeCT2 Topkapı Hall A<br />

Pattern Recognition Systems and Applications - II Regular Session<br />

Session chair: Marinai, Simone (Univ. of Florence)<br />

15:40-16:00, Paper WeCT2.1<br />

Scene-Adaptive Human Detection with Incremental Active Learning<br />

Joshi, Ajay, Univ. of Minnesota, Twin Cities<br />

Porikli, Fatih, MERL<br />

In many computer vision tasks, scene changes hinder the generalization ability of trained classifiers. For instance, a human detector trained with one set of images is unlikely to perform well in different scene conditions. In this paper, we propose an incremental learning method for human detection that can take generic training data and build a new classifier adapted to the new deployment scene. Two operation modes are proposed: i) a completely autonomous mode wherein the first few empty frames of the video are used for adaptation, and ii) an active learning approach with the user in the loop, for more challenging scenarios including situations where empty initialization frames may not exist. Results show the strength of the proposed methods for quick adaptation.



16:00-16:20, Paper WeCT2.2<br />

Direct Printability Prediction in VLSI using Features from Orthogonal Transforms<br />

Kryszczuk, Krzysztof, IBM Zurich Res. Lab.<br />

Hurley, Paul, IBM Zurich Res. Lab.<br />

Sayah, Robert, IBM Systems and Tech. Group<br />

Full-chip printability simulations for VLSI layouts use analytical and heuristic physical process models, and require an explicit creation of a mask and image. This is a computationally expensive task, often prohibitively so, especially when prototyping new designs. In this paper we show that using orthogonal transform-based, fixed-length feature vector representations of 22nm VLSI layouts to perform classification-based rapid printability prediction can help in avoiding or reducing the number of simulations. Furthermore, in order to overcome the scarcity of training data, we show how re-scaled, abundant 45nm designs can train error prediction models for new, native 22nm designs. Our experiments, run on M1 layer data and line width errors, demonstrate the viability of the proposed approach.

16:20-16:40, Paper WeCT2.3<br />

Improving Performance of Network Traffic Classification Systems by Cleaning Training Data<br />

Gargiulo, Francesco, Univ. of Naples Federico II<br />

Sansone, Carlo, Univ. of Naples Federico II<br />

In this paper we propose to apply an algorithm for finding and cleaning mislabeled training samples in an adversarial learning context, in which a malicious user tries to camouflage training patterns in order to limit the performance of the classification system. In particular, we describe how this algorithm can be effectively applied to the problem of identifying HTTP traffic flowing through port TCP 80, where mislabeled samples can be forced by using port-spoofing attacks.

16:40-17:00, Paper WeCT2.4<br />

Bayesian Networks for Predicting IVF Blastocyst Development<br />

Uyar, Asli, Bogazici Univ.<br />

Bener, Ayse, Bogazici Univ.<br />

Ciray, H. Nadir, Bahceci Woman Healthcare Centre<br />

Bahceci, Mustafa, Bahceci Woman Healthcare Centre<br />

In in-vitro fertilization (IVF) treatment, blastocyst-stage embryo transfers at day 5 result in higher pregnancy rates. However, there is a risk of transfer cancelation due to embryonic developmental failure. Clinicians need reliable models for predicting blastocyst development. In this study, we apply Bayesian networks in order to investigate cause-effect relationships among the variables of interest in the embryo growth process and to predict blastocyst development. We have analyzed 7745 embryo records including embryo morphological characteristics and patient-related data. Experimental results revealed that Bayesian networks can predict blastocyst development with a 63.5% true positive rate and a 33.8% false positive rate.

17:00-17:20, Paper WeCT2.5<br />

Spectral Invariant Representation for Spectral Reflectance Image<br />

Ibrahim, Abdelhameed, Chiba Univ.<br />

Tominaga, Shoji, Chiba Univ.<br />

Horiuchi, Takahiko, Chiba Univ.<br />

Although spectral images contain a large amount of information compared with color images, the image acquisition is affected by several factors such as shading and specular highlights. Many researchers have introduced color-invariant and spectral-invariant representations for these factors using the standard dichromatic reflection model of inhomogeneous dielectric materials. However, these representations are inadequate for other materials like metal. This paper proposes a more general spectral invariant representation for obtaining reliable spectral reflectance images. Our invariant representation is derived from the standard dichromatic reflection model for dielectric materials and the extended dichromatic reflection model for metals. We prove that the invariant formulas for spectral images of most natural objects preserve spectral information and are invariant to highlights, shading, surface geometry, and illumination intensity. The method is applied to the problem of material classification and image segmentation of a raw circuit board. Experiments are conducted with real spectral images to examine the performance of the proposed method.



WeCT3 Dolmabahçe Hall A<br />

Active Contours and Related Methods Regular Session<br />

Session chair: Burkhardt, Hans (Univ. of Freiburg)<br />

15:40-16:00, Paper WeCT3.1<br />

Level Set based Segmentation using Local Feature Distribution<br />

Xie, Xianghua, Swansea Univ.<br />

We propose a level set based framework to segment textured images. The snake deforms in the image domain in searching<br />

for object boundaries by minimizing an energy functional, which is defined based on dynamically selected local distribution<br />

of orientation invariant features. We also explore the user initialization to simplify the segmentation and improve accuracy.<br />

Experimental results on both synthetic and real data show significant improvements compared to direct modeling of filtering<br />

responses or piecewise constant modeling.<br />

16:00-16:20, Paper WeCT3.2<br />

Mean Shift Gradient Vector Flow: A Robust External Force Field for 3D Active Surfaces<br />

Keuper, Margret, Univ. of Freiburg<br />

Padeken, Jan, Max-Planck-Inst. of Immunobiology<br />

Heun, Patrick, Max-Planck-Inst. of Immunobiology<br />

Burkhardt, Hans, Univ. of Freiburg<br />

Ronneberger, Olaf, Univ. of Freiburg<br />

Gradient vector flow snakes are a very common method in biomedical image segmentation. The use of gradient vector flow brings some major advantages, such as a large capture range and a good adaptation of the snakes in concave regions. In some cases, though, the application of gradient vector flow can also have undesired effects; e.g., if only parts of an image are strongly blurred, the remaining weak gradients will be smoothed away. Also, large gradients resulting from small but bright image structures usually have a strong impact on the overall result. To tackle this problem, we present an improvement of the gradient vector flow, using the mean shift procedure, and show its advantages on the segmentation of 3D cell nuclei.

16:20-16:40, Paper WeCT3.3<br />

Adaptive Diffusion Flow for Parametric Active Contours<br />

Wu, Yuwei, Beijing Inst. of Tech.<br />

Wang, Yuanquan, Beijing Inst. of Tech.<br />

Jia, Yunde, Beijing Inst. of Tech.<br />

This paper proposes a novel external force for active contours, called adaptive diffusion flow (ADF). We reconsider the generative mechanism of the gradient vector flow (GVF) diffusion process from the perspective of image restoration, and exploit a harmonic hypersurface minimal functional to substitute the smoothness energy term of GVF in order to alleviate the possible leakage problem. Meanwhile, a Laplacian functional is incorporated in the ADF framework to ensure that the vector flow diffuses mainly along the normal direction in homogeneous regions of an image. Experiments on synthetic and real images demonstrate the good properties of the ADF snake, including noise robustness, weak edge preservation, and concavity convergence.

16:40-17:00, Paper WeCT3.4<br />

Using Snakes with Asymmetric Energy Terms for the Detection of Varying-Contrast Edges in SAR Images<br />

Seppke, Benjamin, Univ. of Hamburg<br />

Dreschler-Fischer, Leonie, Univ. of Hamburg<br />

Hübbe, Nathanael, Univ. of Hamburg<br />

Active contour methods like snakes have become a basic tool in computer vision and image analysis over the last years. They have proven to be adequate for the task of finding boundary features like broken edges in an image. However, when applying the basic snake technique to synthetic aperture radar (SAR) remote sensing images, the detection of varying-contrast edges may not be satisfying. This is caused by the special imaging technique of SAR and the commonly known speckle noise. In this paper we propose the use of asymmetric external energy terms to cope with this problem. We show first results of the method for the detection of edges of tidal creeks using an ENVISAT ASAR image. These creeks can be found in the World Heritage Site Wadden Sea located at the German Bight (North Sea).



17:00-17:20, Paper WeCT3.5<br />

Length Increasing Active Contour for the Segmentation of Small Blood Vessels<br />

Rivest-Hénault, David, École de Tech. Supérieure<br />

Deschênes, Sylvain, Sainte-Justine Hospital<br />

Lapierre, Chantale, Hospital Sainte-Justine<br />

Cheriet, Mohammed, École de Tech. Supérieure<br />

A new level-set based active contour method for the segmentation of small blood vessels and other elongated structures is presented. Its main particularity is the presence of a length-increasing force in the contour driving equation. The effect of this force is to push the active contour in the direction of thin elongated shapes. Although the proposed force is not stable in general, our experiments show that with a few precautions it can successfully be integrated in a practical segmentation scheme and that it helps to segment a longer part of the structures of interest. For the segmentation of blood vessels, this may reduce the amount of user interaction needed: only a small region inside the structure of interest needs to be specified.

WeCT4 Anadolu Auditorium<br />

Graphical Models and Bayesian Methods Regular Session<br />

Session chair: Murino, Vittorio (Univ. of Verona)<br />

15:40-16:00, Paper WeCT4.1<br />

Using Sequential Context for Image Analysis<br />

Paiva, Antonio, Univ. of Utah<br />

Jurrus, Elizabeth, Univ. of Utah<br />

Tasdizen, Tolga, Univ. of Utah<br />

This paper proposes the sequential context inference (SCI) algorithm for Markov random field (MRF) image analysis. This algorithm is designed primarily for fast inference on an MRF model, but its application also requires a specific modeling architecture. The architecture is composed of a sequence of stages, each modeling the conditional probability of the labels, conditioned on a neighborhood of the input image and the output of the previous stage. By learning the model at each stage sequentially with regard to the true output labels, the stages learn different models which can cope with errors in the previous stage.

16:00-16:20, Paper WeCT4.2<br />

Recovery Video Stabilization using MRF-MAP Optimization<br />

Kim, Soo Wan, Seoul National Univ.<br />

Yi, Kwang Moo, Automation and System Res. Inst. Univ.<br />

Oh, Songhwai, Seoul National Univ.<br />

Choi, Jin Young, Seoul National University<br />

In this paper, we propose a novel approach for video stabilization using Markov random field (MRF) modeling and maximum<br />

a posteriori (MAP) optimization. We build an MRF model describing a sequence of unstable images and find joint<br />

pixel matchings over all image sequences with MAP optimization via Gibbs sampling. The resulting displacements of<br />

matched pixels in consecutive frames indicate the camera motion between frames and can be used to remove the camera<br />

motion to stabilize image sequences. The proposed method shows robust performance even when a scene has moving<br />

foreground objects and brings more accurate stabilization results. The performance of our algorithm is evaluated on outdoor<br />

scenes.<br />

16:20-16:40, Paper WeCT4.3<br />

Annealed SMC Samplers for Dirichlet Process Mixture Models<br />

Ülker, Yener, Istanbul Tech. Univ.<br />

Gunsel, Bilge, Istanbul Tech. Univ.<br />

Cemgil, Ali Taylan, Bogazici Univ.<br />

In this work we propose a novel algorithm that sequentially approximates the Dirichlet Process Mixture (DPM) model posterior. The proposed method takes advantage of the Sequential Monte Carlo (SMC) samplers framework to design an effective annealing procedure that prevents the algorithm from getting trapped in a local mode. We evaluate the performance in a Bayesian density estimation problem with an unknown number of components. The simulation results suggest that the proposed algorithm represents the target posterior much more accurately and provides a significantly smaller Monte Carlo error when compared to particle filtering.

16:40-17:00, Paper WeCT4.4<br />

Bayesian Inference for Nonnegative Matrix Factor Deconvolution Models<br />

Kirbiz, Serap, Istanbul Tech. Univ.<br />

Cemgil, Ali Taylan, Bogazici Univ.<br />

Gunsel, Bilge, Istanbul Tech. Univ.<br />

In this paper we develop a probabilistic interpretation and full Bayesian inference for the non-negative matrix factor deconvolution (NMFD) model. Our ultimate goal is the unsupervised extraction of multiple sound objects from a single-channel auditory scene. The proposed method facilitates automatic model selection and determination of the sparsity criteria. Our approach retains attractive features of standard NMFD-based methods such as fast convergence and easy implementation. We demonstrate the use of this algorithm in the log-frequency magnitude spectrum domain, where we employ it to perform model order selection and control sparseness directly.

17:00-17:20, Paper WeCT4.5<br />

A Graph Matching Algorithm using Data-Driven Markov Chain Monte Carlo Sampling<br />

Lee, Jungmin, Seoul National Univ.<br />

Cho, Minsu, Seoul National Univ.<br />

Lee, Kyoung Mu, Seoul National Univ.<br />

We propose a novel stochastic graph matching algorithm based on the data-driven Markov Chain Monte Carlo (DDMCMC) sampling technique. The algorithm explores the solution space efficiently and avoids local minima by taking advantage of spectral properties of the given graphs in data-driven proposals. Thus, it enables the graph matching to be robust to deformation and outliers arising in practical correspondence problems. Our comparative experiments using synthetic and real data demonstrate that the algorithm outperforms state-of-the-art graph matching algorithms.

WeCT5 Topkapı Hall B<br />

Image Processing Applications Regular Session<br />

Session chair: Zafeiriou, Stefanos (Imperial College of London)<br />

15:40-16:00, Paper WeCT5.1<br />

Tensor-Driven Hyperspectral Denoising: A Strong Link for Classification Chains?<br />

Martín-Herrero, Julio, Univ. de Vigo<br />

Ferreiro-Armán, Marcos, Univ. de Vigo<br />

We show how a tensor-driven anisotropic diffusion denoising method affects the performance of a classifier trained to discriminate among vine varieties in noisy hyperspectral images. We compare the classification statistics on the original and denoised images and discuss the convenience of this kind of preprocessing for classification in hyperspectral images.

16:00-16:20, Paper WeCT5.2<br />

Search Strategies for Image Multi-Distortion Estimation<br />

Caron, Andre Louis, Univ. of Sherbrooke<br />

Jodoin, Pierre-Marc, Univ. of Sherbrooke<br />

Charrier, Christophe, Univ. de Caen<br />

In this paper, we present a method for estimating the amount of Gaussian noise and Gaussian blur in a distorted image.<br />

Our method is based on the MS-SSIM framework which, although designed to measure image quality, is used to estimate<br />

the amount of blur and noise in a degraded image given a reference image. Various search strategies such as Newton, Simplex,<br />

and brute force search are presented and rigorously compared. Based on quantitative results, we show that the amount<br />

of blur and noise in a distorted image can be recovered with an accuracy up to 0.95% and 5.40%, respectively. To our<br />

knowledge, such precision has never been achieved before.<br />
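A hedged sketch of the brute-force variant of such a search is shown below: candidate (blur, noise) pairs are applied to the reference image and the pair whose result is most similar to the distorted image is kept. Single-scale SSIM stands in for MS-SSIM here, and the search grids and the assumption of float images in [0, 1] are illustrative choices.

import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.metrics import structural_similarity

def estimate_blur_and_noise(reference, distorted,
                            blur_grid=np.linspace(0.0, 3.0, 16),
                            noise_grid=np.linspace(0.0, 0.1, 11),
                            seed=0):
    # reference and distorted are assumed to be float arrays in [0, 1]
    rng = np.random.default_rng(seed)
    best, best_score = (0.0, 0.0), -1.0
    for sigma_b in blur_grid:
        blurred = gaussian_filter(reference, sigma=sigma_b)
        for sigma_n in noise_grid:
            candidate = blurred + rng.normal(0.0, sigma_n, reference.shape)
            score = structural_similarity(candidate, distorted, data_range=1.0)
            if score > best_score:
                best, best_score = (sigma_b, sigma_n), score
    return best  # estimated (Gaussian blur sigma, Gaussian noise sigma)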



16:20-16:40, Paper WeCT5.3<br />

Development of a High-Definition and Multispectral Image Capturing System for Digital Archiving of Early Modern<br />

Tapestries of the Kyoto Gion Festival<br />

Tsuchida, Masaru, NTT Corp.<br />

Tanaka, Hiromi, Ritsumei Univ.<br />

Yano, Keiji, Ritsumeikan Univ.<br />

We developed a two-shot 6-band image capturing system consisting of a large-format camera, a customized interference filter, and a scanning digital back to capture 185-M-pixel images. The interference filter is set in front of the camera lens to obtain a 6-band image from two 3-band images, one taken with the filter and the other without it. After correction of the optical aberrations caused by the interference filter as well as system arrangement errors, the two images are combined into a 6-band image. The 6-band image was converted into a color-managed RGB image with an embedded ICC profile. In the experiments, object images were captured as several divided parts and synthesized into an almost 500-M-pixel image by using an image stitching technique. The resolution of the captured images is 0.02 mm/pixel. This paper discusses the camera system with a focus on some early modern tapestries used in the Kyoto Gion Festival. After the experiments, we interviewed a craftsman to assess the images' importance for archiving and analyzing fabric structures.

16:40-17:00, Paper WeCT5.4<br />

Appearance Control using Projection with Model Predictive Control<br />

Amano, Toshiyuki, Nara Inst. of Science and Tech.<br />

Kato, Hirokazu, Nara Inst. of Science and Tech.<br />

A unified technique for irradiance correction and appearance enhancement of real scenes is proposed in this paper. The proposed method employs a model predictive control (MPC) algorithm for the projector-camera system and enables arbitrary appearance control, much like photo retouching software, in the real world. In the experiments, appearance control of the real scene is demonstrated for saturation enhancement, color removal, phase control, edge enhancement, image blur, unique brightness effects and other enhancements.
for the real scene are shown.<br />

17:00-17:20, Paper WeCT5.5<br />

Decision Trees for Fast Thinning Algorithms<br />

Grana, Costantino, Univ. degli Studi di Modena e Reggio Emilia<br />

Borghesani, Daniele, Univ. degli Studi di Modena e Reggio Emilia<br />

Cucchiara, Rita, Univ. degli Studi di Modena e Reggio Emilia<br />

We propose a new efficient approach for neighborhood exploration, optimized with decision tables and decision trees,<br />

suitable for local algorithms in image processing. In this work, it is employed to speed up two widely used thinning techniques.<br />

The performance gain is shown over a large freely available dataset of scanned document images.<br />

WeCT6 Dolmabahçe Hall B<br />

Iris Regular Session<br />

Session chair: Kittler, Josef (Univ. of Surrey)<br />

15:40-16:00, Paper WeCT6.1<br />

Personal Identification from Iris Images using Localized Radon Transform<br />

Zhou, Yingbo, The Hong Kong Pol. Univ.<br />

Kumar, Ajay, The Hong Kong Pol. Univ.<br />

Personal identification using iris images has attracted a lot of attention in the literature and offers high accuracy. However, the computational complexity of feature extraction from normalized iris images is still a key concern, and further efforts are required to develop efficient feature extraction approaches. In this paper, we investigate a new approach for the efficient and effective extraction of iris features using localized Radon transforms. The feature extraction process exploits the orientation information of the local iris texture features using the finite Radon transform. The dominant orientation from these Radon transform features is used to generate a binarized, compact feature representation. The similarity between two feature vectors is computed from the minimum matching distance, which can account for the variations resulting from translation and rotation of the images. The feasibility of this approach is rigorously evaluated on two publicly available iris image databases, i.e. the IITD iris image database v1 and the CASIA v3 iris image database. We also investigate the multi-scale analysis of iris images to enhance the performance. The experimental results presented in this paper are highly promising and suggest a computationally attractive alternative for online iris identification.

16:00-16:20, Paper WeCT6.2<br />

Segmentation of Unideal Iris Images using Game Theory<br />

Roy, Kaushik, Concordia Univ.<br />

Bhattacharya, Prabir, Concordia Univ.<br />

Suen, Ching Y.<br />

Robust localization of the inner and outer boundaries in an iris image plays an important role in iris recognition. However, conventional iris/pupil localization methods using region-based segmentation or gradient-based boundary finding are often hampered by non-linear deformations, pupil dilation, head rotation, motion blur, reflections, non-uniform intensities, low image contrast, camera angles and diffusion, and the presence of eyelids and eyelashes. The novelty of this research effort is that we apply a parallel game-theoretic decision making procedure using the modified Chakraborty and Duncan algorithm, which integrates the region-based segmentation and gradient-based boundary finding methods and fuses the complementary strengths of each of these individual methods. This integrated scheme forms a unified approach, which is robust to noise and poor localization.

16:20-16:40, Paper WeCT6.3<br />

Iris-Biometric Hash Generation for Biometric Database Indexing<br />

Rathgeb, Christian, Univ. of Salzburg<br />

Uhl, Andreas, Univ. of Salzburg<br />

Performing identification on large-scale biometric databases requires an exhaustive linear search. Since biometric data<br />

does not have any natural sorting order, indexing databases, in order to minimize the response time of the system, represents<br />

a great challenge. In this work we propose a biometric hash generation technique for the purpose of biometric database<br />

indexing, applied to iris biometrics. Experimental results demonstrate that the presented approach highly accelerates biometric<br />

identification.<br />

16:40-17:00, Paper WeCT6.4<br />

A Robust Iris Localization Method using an Active Contour Model and Hough Transform<br />

Koh, Jaehan, SUNY Buffalo<br />

Govindaraju, Venu, Univ. at Buffalo<br />

Chaudhary, Vipin, SUNY Buffalo<br />

Iris segmentation is one of the crucial steps in building an iris recognition system since it significantly affects the accuracy of iris matching. This segmentation should accurately extract the iris region despite the presence of disturbances such as varying pupil sizes, shadows, specular reflections and highlights. Considering these obstacles, several attempts have been made at robust iris localization and segmentation. In this paper, we propose a robust iris localization method that uses an active contour model and a circular Hough transform. Experimental results on 100 images from the CASIA iris image database show that our method achieves 99% accuracy and is about 2.5 times faster than Daugman's method in locating the pupillary and the limbic boundaries.
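As a rough illustration of the circular Hough part of such a pipeline (the active contour refinement is omitted), the sketch below locates pupillary and limbic circle candidates with OpenCV; all radius ranges and accumulator parameters are assumptions that would need tuning per database.

import cv2
import numpy as np

def locate_iris_circles(gray):
    # gray: 8-bit grayscale eye image
    blurred = cv2.medianBlur(gray, 5)                     # suppress eyelash and specular noise
    pupil = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                             param1=100, param2=30, minRadius=20, maxRadius=60)
    limbus = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                              param1=100, param2=30, minRadius=60, maxRadius=140)
    if pupil is None or limbus is None:
        return None
    # each result is shaped (1, N, 3) with rows (x, y, radius); take the strongest candidate
    return np.round(pupil[0, 0]).astype(int), np.round(limbus[0, 0]).astype(int)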

17:00-17:20, Paper WeCT6.5<br />

Isis: Iris Segmentation for Identification Systems<br />

Nappi, Michele, Univ. of Salerno<br />

Riccio, Daniel, Univ. of Salerno<br />

De Marsico, Maria, Sapienza Univ. of Rome<br />

Advances in processing procedures make the iris a realistic candidate for the role of the biometry of the future. Precise detection and segmentation for this biometry are a crucial ongoing research area. We propose an iris segmentation technique and show that it is more reliable than existing ones.



WeCT7 Dolmabahçe Hall C<br />

Handwriting Recognition Regular Session<br />

Session chair: Doermann, David (Univ. of Maryland)<br />

15:40-16:00, Paper WeCT7.1<br />

Consensus Network based Hypotheses Combination for Arabic Offline Handwriting Recognition<br />

Prasad, Rohit, Raytheon BBN Tech.<br />

Kamali, Matin, BBN Tech.<br />

Belanger, David, Raytheon BBN Tech.<br />

Rosti, Antti-Veikko, Raytheon BBN Tech.<br />

Matsoukas, Spyros, Raytheon BBN Tech.<br />

Natarajan, P., BBN Tech.<br />

Offline handwriting recognition (OHR) is an extremely challenging task because of many factors, including variations in writing style, writing device and material, and noise in the scanning and collection process. Due to the diverse nature of the above challenges, it is highly unlikely that a single recognition technique can address all the characteristics of real-world handwritten documents. Therefore, one must consider designing different systems, each addressing specific challenges in the handwritten corpus, and then combining the hypotheses from these diverse systems. To that end, we present an innovative approach for combining hypotheses from multiple handwriting recognition systems. Our approach is based on generating a consensus network using hypotheses from a diverse set of handwriting recognition systems. Next, we decode the consensus network to produce the best possible hypothesis given an error criterion. Experimental results on an Arabic OHR task show that our combination algorithm outperforms the NIST ROVER technique and results in a 7% relative reduction in the word error rate over the single best OHR system.

16:00-16:20, Paper WeCT7.2<br />

A Novel Lexicon Reduction Method for Arabic Handwriting Recognition<br />

Wshah, Safwan, SUNY Buffalo<br />

Govindaraju, Venu, Univ. at Buffalo<br />

Li, Huiping, Applied Media Analysis Inc.<br />

Cheng, Yanfen, Wuhan Univ. of Tech.<br />

In this paper, we present a method for lexicon size reduction which can be used as an important pre-processing step for offline Arabic word recognition. The method involves extraction of dot descriptors and PAWs (Pieces of Arabic Words). The number and position of dots and the number of PAWs are then used to eliminate unlikely candidates. The extraction of the dot descriptors is based on defined rules followed by a convolutional neural network for verification. The reduction algorithm makes use of the combination of the two features with a dynamic matching scheme. On the IFN/ENIT database of 26459 Arabic handwritten word images we achieved a reduction rate of 87% with accuracy above 93%.

16:20-16:40, Paper WeCT7.3<br />

A Novel Verification System for Handwritten Words Recognition<br />

Guichard, Laurent, IRISA - INRIA<br />

Toselli, Alejandro Héctor, Univ. Pol. de Valencia<br />

Couasnon, Bertrand, Irisa / Insa<br />

In the field of isolated handwritten word recognition, the development of highly effective verification systems to reject words presenting ambiguities is still an active research topic. In this paper, a novel verification system based on support vector machine scoring and multiple reject class-dependent thresholds is presented. In essence, a set of support vector machines appended to a standard HMM-based recognition system provides class-dependent confidence measures employed by the verification mechanism to accept or reject the recognized hypotheses. Experimental results on the RIMES database show that this approach outperforms other state-of-the-art approaches.

16:40-17:00, Paper WeCT7.4<br />

Multi-Template GAT/PAT Correlation for Character Recognition with a Limited Quantity of Data<br />

Wakahara, Toru, Hosei Univ.<br />

Yamashita, Yukihiko, Tokyo Inst. of Tech.<br />



This paper addresses the problem of improving the accuracy of character recognition with a limited quantity of data. The key ideas are twofold. One is distortion-tolerant template matching via hierarchical global/partial affine transformation (GAT/PAT) correlation to absorb both linear and nonlinear distortions in a parametric manner. The other is the use of multiple templates per category, obtained by k-means clustering in a gradient feature space, to deal with topological distortion. Recognition experiments using the handwritten numeral database IPTP CDROM1B show that the proposed method achieves a recognition rate of 97.9%, much higher than the 85.8% obtained by conventional, simple correlation matching with a single template per category. Furthermore, comparative experiments show that k-NN classification using the tangent distance and the GAT correlation technique achieves recognition rates of 97.5% and 98.7%, respectively.

17:00-17:20, Paper WeCT7.5<br />

Structure Adaptation of HMM Applied to OCR<br />

Ait Mohand, Kamel, Univ. of Rouen<br />

Paquet, Thierry, Univ. of Rouen<br />

Ragot, Nicolas, Univ. François Rabelais Tours<br />

Heutte, Laurent, Univ. of Rouen<br />

In this paper we present a new algorithm for the adaptation of hidden Markov models (HMMs). The principle of our iterative adaptation algorithm is to alternate an HMM structure adaptation stage with a Gaussian MAP adaptation stage of the HMM parameters. This algorithm is applied to the recognition of printed characters to adapt the character models of a polyfont, general-purpose character recognizer to new character fonts never seen during training. A comparison of the results with those of the classical MAP adaptation scheme shows a slight increase in recognition performance.

WeBCT8 Upper Foyer<br />

SVM, NN, Kernel and Learning; Object Detection and Recognition Poster Session<br />

Session chair: Ross, Arun (West Virginia Univ.)<br />

13:30-16:30, Paper WeBCT8.1<br />

Multi-Class Pattern Classification in Imbalanced Data<br />

Ghanem, Amal Saleh, Univ. of Bahrain<br />

Venkatesh, Svetha, Curtin Univ. of Tech.<br />

West, Geoff, Curtin Univ. of Tech.<br />

The majority of multi-class pattern classification techniques are proposed for learning from balanced datasets. However, in several real-world domains, the datasets have an imbalanced data distribution, where some classes of data may have few training examples compared to other classes. In this paper we present our research on learning from imbalanced multi-class data and propose a new approach, named Multi-IM, to deal with this problem. Multi-IM derives its fundamentals from the probabilistic relational technique (PRMs-IM), designed for learning from imbalanced relational data for the two-class problem. Multi-IM extends PRMs-IM to a generalized framework for multi-class imbalanced learning for both relational and non-relational domains.

13:30-16:30, Paper WeBCT8.2<br />

Deep Quantum Networks for Classification<br />

Zhou, Shusen, Harbin Inst. of Tech.<br />

Chen, Qingcai, Harbin Inst. of Tech.<br />

Wang, Xiaolong, Harbin Inst. of Tech.<br />

This paper introduces a new type of deep learning method named Deep Quantum Network (DQN) for classification. DQN inherits the capability of modeling the structure of a feature space by fuzzy sets. First, we propose the architecture of DQN, which consists of quantum neurons and sigmoid neurons and can guide the embedding of samples so that they are divisible in a new Euclidean space. The parameters of DQN are initialized through greedy layer-wise unsupervised learning. Then, the parameter space of the deep architecture and the quantum representation are refined by supervised learning based on a global gradient-descent procedure. An exponential loss function is introduced in this paper to guide the supervised learning procedure. Experiments conducted on standard datasets show that DQN outperforms other feed-forward neural networks and neuro-fuzzy classifiers.



13:30-16:30, Paper WeBCT8.3<br />

Nonlinear Combination of Multiple Kernels for Support Vector Machines<br />

Li, Jinbo, East China Normal Univ.<br />

Sun, Shiliang, East China Normal Univ.<br />

Support vector machines (SVMs) are effective kernel methods for solving pattern recognition problems. Traditionally, they adopt a single kernel chosen beforehand, which makes them lack flexibility. The recent multiple kernel learning (MKL) overcomes this issue by optimizing over a linear combination of kernels. Despite its success, MKL neglects useful information generated by the nonlinear interaction of different kernels. In this paper, we propose SVMs based on the nonlinear combination of multiple kernels (NCMK), which surmounts this drawback of previous MKL through its potential to exploit more information. We show that our method can be formulated as a semi-definite programming (SDP) problem and then solved by interior-point algorithms. Empirical studies on several data sets indicate that the presented approach is very effective.

13:30-16:30, Paper WeBCT8.4<br />

Data Transformation of the Histogram Feature in Object Detection<br />

Zhang, Rongguo, Chinese Acad. of Sciences<br />

Xiao, Baihua, Chinese Acad. of Sciences<br />

Wang, Chunheng, Chinese Acad. of Sciences<br />

Detecting objects in images is very important for several application domains in computer vision. This paper presents an experimental study on data transformation of the feature vector in object detection. We use the modified Pyramid of Histograms of Orientation Gradients descriptor and an SVM classifier to form an object detection model. We apply a simple transformation to the histogram features before training and testing. This transformation is equivalent to a small change in the kernel function of the Support Vector Machine; applying it to the features is much quicker than changing the kernel, yet obtains better results. Experimental evaluations on the UIUC Image Database and the TU Darmstadt Database show that the transformed features perform better than the raw features, and that this transformation improves the linear separability of the histogram feature.

13:30-16:30, Paper WeBCT8.5<br />

A New Learning Formulation for Kernel Classifier Design<br />

Sato, Atsushi, NEC<br />

This paper presents a new learning formulation for classifier design called ``General Loss Minimization''. The formulation is based on Bayes decision theory, which can handle various losses as well as prior probabilities. A learning method for RBF kernel classifiers is derived based on the formulation. Experimental results reveal that the classification accuracy of the proposed method is almost the same as or better than that of the Support Vector Machine (SVM), while the number of reference vectors obtained by the proposed method is much smaller than the number of support vectors of the SVM.

13:30-16:30, Paper WeBCT8.6<br />

Variable Selection for Five-Minute Ahead Electricity Load Forecasting<br />

Koprinska, Irena, Univ. of Sydney<br />

Sood, Rohen, Univ. of Sydney<br />

Agelidis, Vassilios, Univ. of Sydney<br />

We use autocorrelation analysis to extract six nested feature sets of previous electricity loads for 5-minute-ahead electricity load forecasting, and evaluate their predictive power using Australian electricity data. Our results show that the most important variables for accurate prediction are previous loads from the forecast day and from 1, 2 and 7 days ago. Also using load variables from 3 and 6 days ago yielded small further improvements. The three larger feature sets (37-51 features), when used with linear regression and support vector regression algorithms, were more accurate than the benchmarks. The overall best prediction model in terms of accuracy and training time was linear regression using the set of 51 features.
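
A minimal sketch of the kind of lag-based feature construction the abstract describes, on a synthetic load series: the lags chosen here (a few recent 5-minute loads plus the loads 1, 2 and 7 days earlier) are hypothetical placeholders for the autocorrelation-selected nested feature sets, and the model is ordinary linear regression.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    STEPS_PER_DAY = 288                      # number of 5-minute intervals per day
    rng = np.random.default_rng(0)
    t = np.arange(STEPS_PER_DAY * 30)        # 30 days of synthetic load data
    load = 100 + 20 * np.sin(2 * np.pi * t / STEPS_PER_DAY) + rng.normal(0, 2, t.size)

    # Hypothetical lag set: a few recent loads from the forecast day plus the
    # loads at the same time 1, 2 and 7 days earlier (the paper's nested feature
    # sets are chosen by autocorrelation analysis instead).
    lags = [1, 2, 3, STEPS_PER_DAY, 2 * STEPS_PER_DAY, 7 * STEPS_PER_DAY]
    max_lag = max(lags)
    X = np.column_stack([load[max_lag - l:-l] for l in lags])
    y = load[max_lag:]

    # Train on all but the last day, report fit on the held-out final day.
    model = LinearRegression().fit(X[:-STEPS_PER_DAY], y[:-STEPS_PER_DAY])
    print("held-out R^2:", model.score(X[-STEPS_PER_DAY:], y[-STEPS_PER_DAY:]))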

13:30-16:30, Paper WeBCT8.7<br />

Enhancing Web Page Classification via Local Co-Training<br />

Du, Youtian, Xi’an Jiaotong Univ.<br />

Guan, Xiaohong, Xi’an Jiaotong Univ., Tsinghua University<br />

Cai, Zhongmin, Xi’an Jiaotong Univ.<br />

In this paper we propose a new multi-view semi-supervised learning algorithm called Local Co-Training (LCT). The proposed algorithm employs a set of local models with vector outputs to model the relations among examples in a local region on each view, and iteratively refines the dominant local models (i.e., the local models related to the unlabeled examples chosen for enriching the training set) using unlabeled examples through the co-training process. Compared with previous co-training-style algorithms, local co-training has two advantages: first, it achieves higher classification precision by introducing local learning; second, only the dominant local models need to be updated, which significantly decreases the computational load. Experiments on the WebKB and Cora datasets demonstrate that the LCT algorithm can effectively exploit unlabeled data to improve the performance of web page classification.

13:30-16:30, Paper WeBCT8.8<br />

Robust Face Recognition using Multiple Self-Organized Gabor Features and Local Similarity Matching<br />

Aly, Saleh, Kyushu Univ.<br />

Shimada, Atsushi, Kyushu Univ.<br />

Tsuruta, Naoyuki, Fukuoka Univ.<br />

Taniguchi, Rin-Ichiro, Kyushu Univ.<br />

Gabor-based face representation has achieved enormous success in face recognition. However, one drawback of Gabor-based face representation is the huge amount of data that must be stored. Due to the nonlinear structure of the data obtained from the Gabor responses, classical linear projection methods such as principal component analysis fail to learn the distribution of the data. A nonlinear projection method based on a set of self-organizing maps is therefore employed to capture this nonlinearity and to represent faces in a new, reduced feature space. The Multiple Self-Organized Gabor Features (MSOGF) algorithm represents the input image using all winner indices from each SOM map. A new local matching algorithm based on the similarity between local features is also proposed to classify unlabeled data. Experimental results on the FERET database show that the proposed method is robust to expression variations.

13:30-16:30, Paper WeBCT8.9<br />

Exploring Pattern Selection Strategies for Fast Neural Network Training<br />

Vajda, Szilard, Tech. Univ. of Dortmund<br />

Fink, Gernot, TU Dortmund Univ.<br />

Neural networks are nowadays a widely used solution in pattern recognition. In this paper we propose three different strategies to select training patterns more efficiently, enabling fast learning in such a neural framework by reducing the number of training patterns actually used. All the strategies rely on the idea of dealing only with samples close to the decision boundaries of the classifiers. The effectiveness (accuracy, speed) of these methods is confirmed through experiments on the MNIST handwritten digit data [1], Bangla handwritten numerals [2] and the Shuttle data from the UCI machine learning repository [3].
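
The three selection strategies themselves are not detailed in the abstract; the sketch below shows one generic way to realise the underlying idea of keeping only samples near the decision boundary, by ranking samples with the margin of a cheap preliminary classifier. The probe model and the 25% retention rate are illustrative assumptions, not the paper's strategies.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Rank samples by the margin of a cheap preliminary classifier and keep the
    # least confident quarter; those are the samples closest to the boundary.
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    margin = np.abs(probe.decision_function(X))
    keep = np.argsort(margin)[: len(X) // 4]

    X_reduced, y_reduced = X[keep], y[keep]   # reduced set fed to the network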

13:30-16:30, Paper WeBCT8.10<br />

The Detection of Concept Frames using Clustering Multi-Instance Learning<br />

Tax, David, Delft Univ. of Tech.<br />

Hendriks, E. , Delft Univ. of Tech.<br />

Valstar, Michel, Imperial Coll.<br />

Pantic, M., Imperial Coll.<br />

The classification of sequences requires the combination of information from different time points. In this paper the detection<br />

of facial expressions is considered. Experiments on the detection of certain facial muscle activations in videos<br />

show that it is not always required to model the sequences fully, but that the presence of specific frames (the concept<br />

frame) can be sufficient for a reliable detection of certain facial expression classes. For the detection of these concept<br />

frames a standard classifier is often sufficient, although a more advanced clustering approach performs better in some<br />

cases.<br />

13:30-16:30, Paper WeBCT8.11<br />

Kernel Domain Description with Incomplete Data: Using Instance-Specific Margins to Avoid Imputation<br />

Gripton, Adam, Heriot-Watt Univ.<br />

Lu, Weiping, Heriot-Watt Univ.<br />

We present a method of performing kernel-space domain description of a dataset with incomplete entries without the need for imputation, allowing the kernel features of a class of data with missing features to be rigorously described. This addresses the problem that missing-data completion is usually required before kernel classifiers, such as support vector domain description (SVDD), can be applied; equally, few existing techniques for incomplete data adequately address the issue of kernel spaces. Our method, which we call instance-specific domain description (ISDD), uses a parametrisation framework to compute minimal kernelised distances between data points with missing features through a series of optimisation runs, allowing evaluation of the kernel distance while avoiding subjective completions of missing data. We compare the results of our method against those achieved by SVDD applied to an imputed dataset, using synthetic and experimental datasets where feature absence has a non-trivial structure. We show that our method can achieve tighter sphere bounds when applied to linear and quadratic kernels.

13:30-16:30, Paper WeBCT8.12<br />

Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts<br />

Fausser, Stefan, Univ. of Ulm<br />

Schwenker, Friedhelm, Univ. of Ulm<br />

Having a large game-tree complexity and being EXPTIME-complete, English Draughts, only recently weakly solved after almost two decades of effort, is still hard for intelligent computer agents to learn. In this paper we present a Temporal-Difference method with a nonlinear neural approximation given by a 4-layer multi-layer perceptron. We have built multiple English Draughts playing agents, each starting with a randomly initialized strategy, which use this method during self-play to improve their strategies. We show that the agents are learning by comparing their winning rates relative to their parameters. Our best agent wins against the computer draughts programs Neuro Draughts, KCheckers and CheckerBoard with the easych engine, and loses to Chinook, GuiCheckers and CheckerBoard with the strong cake engine. Overall, our best agent has reached an amateur league level.

13:30-16:30, Paper WeBCT8.13<br />

Learning the Kernel Combination for Object Categorization<br />

Zhang, Deyuan, Harbin Inst. of Tech.<br />

Wang, Xiaolong, Harbin Inst. of Tech.<br />

Liu, Bingquan, Harbin Inst. of Tech.<br />

Although Support Vector Machines (SVMs) succeed in classifying several image databases using image descriptors proposed in the literature, no single descriptor can be optimal for general object categorization. This paper describes a novel framework to learn the optimal combination of kernels corresponding to multiple image descriptors before SVM training, which leads to a quadratic programming problem that can be solved efficiently. Our framework takes into account the variation of the kernel matrices and imbalanced datasets, which are common in real-world image categorization tasks. Experimental results on the Graz-01 and Caltech-101 image databases show the effectiveness and robustness of our algorithm.

13:30-16:30, Paper WeBCT8.14<br />

SemiCCA: Efficient Semi-Supervised Learning of Canonical Correlations<br />

Kimura, Akisato, NTT Corp.<br />

Kameoka, Hirokazu, NTT Corp.<br />

Sugiyama, Masashi, Tokyo Inst. of Tech.<br />

Nakano, Takuho, University of Tokyo<br />

Maeda, Eisaku, Communication Science Lab.<br />

Sakano, Hitoshi, NTT<br />

Ishiguro, Katsuhiko, NTT<br />

Canonical correlation analysis (CCA) is a powerful tool for analyzing multi-dimensional paired data. However, CCA tends to perform poorly when the number of paired samples is limited, which is often the case in practice. To cope with this problem, we propose a semi-supervised variant of CCA named SemiCCA that allows us to incorporate additional unpaired samples to mitigate overfitting. The proposed method smoothly bridges the eigenvalue problems of CCA and principal component analysis (PCA), and thus its solution can be computed efficiently by solving a single (generalized) eigenvalue problem, just as in the original CCA. Preliminary experiments with artificially generated samples and the PASCAL VOC data sets demonstrate the effectiveness of the proposed method.
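
As a rough illustration of how a single generalized eigenvalue problem can bridge CCA and PCA, the sketch below interpolates between the two eigenproblems with a trade-off parameter beta. This is one plausible reading of the abstract, not necessarily the exact weighting used in the paper; the data are synthetic and only the x view has unpaired samples here.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    n_pair, n_extra, dx, dy = 30, 300, 5, 4
    Z = rng.normal(size=(n_pair, 3))                      # shared latent signal
    Xp = Z @ rng.normal(size=(3, dx)) + 0.1 * rng.normal(size=(n_pair, dx))
    Yp = Z @ rng.normal(size=(3, dy)) + 0.1 * rng.normal(size=(n_pair, dy))
    Xu = rng.normal(size=(n_extra, dx))                   # additional unpaired x
    beta = 0.5                                            # CCA (1.0) vs PCA (0.0)

    # Covariances (data assumed centred); unpaired samples enter the PCA part only.
    Sxy = Xp.T @ Yp / n_pair
    Sxx_p, Syy_p = Xp.T @ Xp / n_pair, Yp.T @ Yp / n_pair
    X_all = np.vstack([Xp, Xu])
    Sxx_all = X_all.T @ X_all / (n_pair + n_extra)

    # One generalized eigenproblem A w = lambda B w interpolating CCA and PCA.
    A = beta * np.block([[np.zeros((dx, dx)), Sxy],
                         [Sxy.T, np.zeros((dy, dy))]]) \
        + (1 - beta) * np.block([[Sxx_all, np.zeros((dx, dy))],
                                 [np.zeros((dy, dx)), Syy_p]])
    B = beta * np.block([[Sxx_p, np.zeros((dx, dy))],
                         [np.zeros((dy, dx)), Syy_p]]) \
        + (1 - beta) * np.eye(dx + dy)
    eigvals, eigvecs = eigh(A, B)        # projection directions for both views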

13:30-16:30, Paper WeBCT8.15<br />

Spatial String Matching for Image Classification<br />

Liu, Yunqiang, Barcelona Media - Innovation Center<br />

Caselles, Vicent, Univ. Pompeu Fabra<br />

This paper presents a spatial string matching method to incorporate spatial information into the bag-of-words model, which<br />

represents an image as an unordered distribution of local features. Spatial constraints among neighboring features are explored<br />

in order to achieve better discrimination power for image classification. The features from neighboring points are<br />

combined together and taken as a spatial string, and then our method matches the images according to the similarity of<br />

string pairs. The categorization problem can be formulated using a KNN or SVM classifier based on the spatial string matching kernel. The proposed method is able to capture spatial dependencies across neighboring features. Experimental results show promising performance on image classification tasks.

13:30-16:30, Paper WeBCT8.16<br />

A Semi-Supervised Gaussian Mixture Model for Image Segmentation<br />

Martínez-Usó, Adolfo, Univ. Jaume I<br />

Pla, F., Univ. Jaume I<br />

Martínez Sotoca, Jose, Univ. Jaume I<br />

In this paper, the results of a semi-supervised approach based on the Expectation-Maximisation algorithm for model-based clustering are presented. We show that, if an appropriate generative model is chosen, the classification accuracy of clustering for image segmentation can be significantly improved by combining a reduced set of labelled data with a large set of unlabelled data. The technique has been tested on real images as well as on medical images from a dermatology application, and the preliminary results are quite promising. Not only have the unsupervised accuracies been improved as expected, but the segmentation results obtained are also considerably better than those produced by other powerful and well-known unsupervised image segmentation techniques.

13:30-16:30, Paper WeBCT8.17<br />

Adding Classes Online in Error Correcting Output Codes Framework<br />

Escalera, Sergio, UB<br />

Masip, David, CVC, UOC<br />

Puertas, Eloi, Univ. de Barcelona<br />

Radeva, Petia, CVC<br />

Pujol, Oriol, UB<br />

This article proposes a general extension of the Error Correcting Output Codes (ECOC) framework to the online learning scenario. As a result, the final classifier handles the addition of new classes independently of the base classifier used. Validation on the UCI database and two real machine vision applications shows that the online problem-dependent ECOC proposal provides a feasible and robust way of handling new classes with any base classifier.

13:30-16:30, Paper WeBCT8.18<br />

Training Multi-Level Features for the RobotVision@ICPR 2010 Challenge

Paris, Sebastien, Univ. de la Méditerranée

Glotin, Herve, LSIS

This paper combines and proposes two novel multi-level spatial pyramidal (sp) features, spELBP (Extended Local Binary Pattern) and spELBOP (Extended Local Binary Orientation Pattern), together with spHOEE (Histogram of Oriented Edge Energy). These features feed state-of-the-art SVM algorithms for the localization of a robot in indoor environments. Two tasks are associated with the RobotVision@ICPR 2010 Challenge: the first uses only a single frame of stereoscopic images, while the second takes the dynamics of the robot into account to improve the results. Our scores are ranked 3rd for Task 1 and 1st for Task 2.

13:30-16:30, Paper WeBCT8.19<br />

Subclass Error Correcting Output Codes using Fisher’s Linear Discriminant Ratio<br />

Arvanitopoulos, Nikolaos, Aristotle Univ. of Thessaloniki<br />

Bouzas, Dimitrios, Aristotle Univ. of Thessaloniki<br />

Tefas, Anastasios, Aristotle Univ. of Thessaloniki<br />

Error-Correcting Output Codes (ECOC) with sub-classes provide a common way to solve multi-class classification problems. In this approach, a multi-class problem is decomposed into several binary ones based on the maximization of the mutual information (MI) between the classes and their respective labels. The MI is modelled through the fast quadratic mutual information (FQMI) procedure. However, FQMI is not applicable to large datasets due to its high algorithmic complexity. In this paper we propose Fisher's Linear Discriminant Ratio (FLDR) as an alternative decomposition criterion, which has much lower computational complexity and achieves better classification performance in most of the experiments conducted. Furthermore, we compare FLDR against FQMI for facial expression recognition on the Cohn-Kanade database.

13:30-16:30, Paper WeBCT8.20<br />

Pattern Recognition Method using Ensembles of Regularities Found by Optimal Partitioning<br />

Senko, Oleg, Inst. of Russian Acad. of Sciences<br />

Kuznetsova, Anna, Inst. of Russian Acad. of Sciences<br />

A new pattern recognition method based on ensembles of syndromes is considered. The developed method, referred to as Multi-model Statistically Weighted Syndromes (MSWS), is a further development of the earlier Statistically Weighted Syndromes (SWS) method. Syndromes are subregions of the space of prognostic features in which the content of objects from one of the classes differs significantly from the content of that class in neighboring subregions. Syndromes are treated as simple base classifiers that are combined through a weighted voting procedure. A method of optimal partitioning of the input feature space is used to search for syndromes; syndromes are selected according to the quality of data separation and the complexity of the partitioning model (partition family) used. The performance of MSWS is compared with that of SWS and alternative techniques on several applied tasks, and the influence of the syndrome selection characteristics on recognition ability is studied.

13:30-16:30, Paper WeBCT8.21<br />

A Geometric Radial Basis Function Network for Robot Perception and Action<br />

Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />

Vázquez Santacruz, Eduardo, CINVESTAV, Unidad Guadalajara<br />

This paper presents a new hypercomplex-valued Radial Basis Function network, designed in the geometric algebra framework, which constitutes a generalization of the standard real-valued RBF network. The proposed Geometric RBF Network (GRBF-N) can be used in real time to estimate changes in linear transformations between sets of geometric entities. Experiments using stereo image sequences for robot perception and action validate the proposal.

13:30-16:30, Paper WeBCT8.22<br />

Kernel on Graphs based on Dictionary of Paths for Image Retrieval<br />

Haugeard, Jean-Emmanuel, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />

Philipp-Foliguet, Sylvie, ENSEA/UCP/CNRS<br />

Gosselin, Philippe Henri, CNRS<br />

Recent approaches to graph comparison consider graphs as sets of paths, and kernels on graphs are then computed from kernels on paths. A common strategy for graph retrieval is to perform pairwise comparisons. In this paper, we follow a different strategy: we collect a set of paths into a dictionary and then project each graph onto this dictionary. Graphs can then be classified using powerful classification methods such as SVM. Furthermore, we collect the paths through interaction with a user. This strategy is ten times faster than a straight pairwise comparison of paths. Experiments have been carried out on a database of city windows.

13:30-16:30, Paper WeBCT8.23<br />

An Efficient Active Constraint Selection Algorithm for Clustering<br />

Vu, Viet-Vu, Univ. Pierre et Marie Curie - Paris 6<br />

Labroche, Nicolas, Univ. Pierre et Marie Curie - Paris 6<br />

Bouchon-Meunier, Bernadette, Univ. Pierre et Marie Curie - Paris 6<br />

In this paper, we address the problem of active query selection for clustering with constraints. The objective is to automatically determine a set of queries and their associated must-link and cannot-link constraints to help constraint-based clustering algorithms converge. Some work on active constraint learning has already been proposed, but it applies only to K-Means-like clustering algorithms, which are known to be limited to spherical clusters, whereas we are interested in constraint-based clustering algorithms that deal with clusters of arbitrary shapes and sizes (such as Constrained-DBSCAN and Constrained-Hierarchical Clustering). Our novel approach relies on a k-nearest-neighbour graph to estimate the dense regions of the data space and generates queries at the frontier between clusters, where cluster membership is most uncertain. Experiments show that our framework improves the performance of constraint-based clustering algorithms.

13:30-16:30, Paper WeBCT8.24<br />

Fuzzy Support Vector Machines for ECG Arrhythmia Detection<br />

Özcan, N. Özlem, Boğaziçi Univ.<br />

Gürgen, Fikret, Boğaziçi Univ.<br />

Cardiovascular diseases, and heart attacks in particular, are among the main causes of death around the world. Pre-monitoring and pre-diagnosis help to prevent heart attacks and strokes, and the ECG plays a key role in this regard. In recent studies, SVMs with different kernel functions and parameter values have been applied to the classification of ECG data. The SVM classification model can be improved by assigning membership values to the inputs. An SVM combined with fuzzy theory, the FSVM, is exercised on the UCI Arrhythmia Database, with five different membership functions. It is shown that the classification accuracy can be improved by defining appropriate membership functions. ANFIS is used to interpret the resulting classification model; the ANFIS model of the ECG data is compared to, and found consistent with, the medical knowledge.

13:30-16:30, Paper WeBCT8.25<br />

ROC Analysis and Cost-Sensitive Optimization for Hierarchical Classifiers<br />

Paclik, Pavel, PR Sys Design<br />

Lai, Carmen, TU Delft<br />

Landgrebe, Thomas, De Beers<br />

Duin, Robert, TU Delft<br />

Instead of solving complex pattern recognition problems using a single complicated classifier, it is often beneficial to<br />

leverage our prior knowledge and decompose the problem into parts. These may be tackled using specific feature subsets<br />

and simpler classifiers resulting in a hierarchical system. In this paper, we propose an efficient and scalable approach for<br />

cost-sensitive optimization of a general hierarchical classifier using ROC analysis. This allows the designer to view the<br />

hierarchy of trained classifiers as a system, and tune it according to the application needs.<br />

13:30-16:30, Paper WeBCT8.26<br />

Variational Mixture of Experts for Classification with Applications to Landmine Detection<br />

Yuksel, Seniha Esen, Univ. of Florida<br />

Gader, Paul, Univ. of Florida<br />

In this paper, we (1) provide a complete framework for classification using a Variational Mixture of Experts (VME); (2) derive the variational lower bound; and (3) apply the method to landmine (or simply mine) detection and compare the results to Mixtures of Experts trained with Expectation Maximization (EMME). VME has previously been used for regression, and Waterhouse explained how to apply VME to classification (which we call VMEC). However, the steps to train the model were not made clear, since the equations were applicable to vector-valued parameters as opposed to the matrices required for each expert; moreover, a variational lower bound was not provided. The variational lower bound provides an excellent stopping criterion that resists over-training. We demonstrate the efficacy of the method on real-world mine classification, in which training robust classification algorithms is difficult because of the small number of samples per class. In our experiments, VMEC consistently improved performance over EMME.

13:30-16:30, Paper WeBCT8.27<br />

A Unifying Framework for Learning the Linear Combiners for Classifier Ensembles<br />

Erdogan, Hakan, Sabanci Univ.<br />

Sen, Mehmet Umut, Sabanci Univ.<br />

For classifier ensembles, an effective combination method is to combine the outputs of the individual classifiers using a linearly weighted combination rule. There are multiple ways to linearly combine classifier outputs, and it is beneficial to analyze them as a whole. We present a unifying framework for multiple linear combination types in this paper. This unification enables the same learning algorithms to be used for different types of linear combiners. We present various ways to train the weights using regularized empirical loss minimization, and propose using the hinge loss for better performance as compared to the conventional least-squares loss. We analyze the effects of using the hinge loss for various types of linear weight training by running experiments on three different databases. We show that, in certain problems, linear combiners with fewer parameters may perform as well as ones with a much larger number of parameters, even in the presence of regularization.
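
To make the combiner-training idea concrete, here is a hedged sketch (not the paper's unified framework): the probability outputs of a few base classifiers on a held-out split are stacked into a feature matrix, and the linear combining weights are learned once with a hinge-type loss (LinearSVC) and once with a least-squares loss (RidgeClassifier). The base models, splits and regularization are illustrative choices.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression, RidgeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    X_tr, X_val, X_te = X[:300], X[300:450], X[450:]
    y_tr, y_val, y_te = y[:300], y[300:450], y[450:]

    # Base classifiers; their probability outputs become the combiner's inputs.
    bases = [GaussianNB(), DecisionTreeClassifier(max_depth=4),
             LogisticRegression(max_iter=1000)]
    for b in bases:
        b.fit(X_tr, y_tr)

    def stack(data):
        return np.column_stack([b.predict_proba(data)[:, 1] for b in bases])

    # Linear combining weights learned with a hinge loss versus least squares.
    hinge_comb = LinearSVC(loss="hinge", dual=True, max_iter=10000).fit(stack(X_val), y_val)
    ls_comb = RidgeClassifier().fit(stack(X_val), y_val)
    print(hinge_comb.score(stack(X_te), y_te), ls_comb.score(stack(X_te), y_te))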

13:30-16:30, Paper WeBCT8.28<br />

Reinforcement Learning for Robust and Efficient Real-World Tracking<br />

Cohen, Andre, Rutgers Univ.<br />

Pavlovic, Vladimir, Rutgers Univ.<br />

In this paper we present a new approach for combining several independent trackers into one robust real-time tracker. Unlike previous work that employs multiple tracking objectives in unison, our tracker determines an optimal sequence of individual trackers given the characteristics present in the video and the desire to achieve maximally efficient tracking. This allows fast, less robust trackers to be selected when little movement is sensed, while more robust but computationally intensive trackers are used in more dynamic scenes. We test this approach on the problem of real-world face tracking. Results show that it is a viable method for combining several independent trackers into one robust real-time tracker capable of tracking faces under varied lighting conditions, video resolutions, and occlusions.

13:30-16:30, Paper WeBCT8.29<br />

An Efficient and Stable Algorithm for Learning Rotations<br />

Arora, Raman, Univ. of Washington<br />

Sethares, William A., Univ. of Wisconsin-Madison<br />

This paper analyses the computational complexity and stability of an online algorithm recently proposed for learning rotations.<br />

The proposed algorithm involves multiplicative updates that are matrix exponentials of skew-symmetric matrices comprising<br />

the Lie algebra of the rotation group. The rank-deficiency of the skew-symmetric matrices involved in the updates is exploited<br />

to reduce the updates to a simple quadratic form. The Lyapunov stability of the algorithm is established and the application<br />

of the algorithm to registration of point-clouds in n-dimensional Euclidean space is discussed.<br />
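
A toy sketch of the general idea of multiplicative, rotation-group-preserving updates (the paper additionally exploits the rank deficiency of the skew-symmetric updates to reduce each step to a quadratic form, which this sketch does not): each step multiplies the current estimate by the matrix exponential of a skew-symmetric matrix built from the prediction error, so the estimate always remains a rotation. The error term and step size here are illustrative, not the paper's exact update.

    import numpy as np
    from scipy.linalg import expm

    def rotation_update(R, x, y, eta=0.1):
        # Build a skew-symmetric matrix from the current prediction error and
        # step through its matrix exponential, so R stays in the rotation group.
        y_hat = R @ x
        S = np.outer(y, y_hat) - np.outer(y_hat, y)
        return expm(eta * S) @ R

    # Toy usage: align an estimate with an unknown rotation from point pairs.
    rng = np.random.default_rng(0)
    R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(R_true) < 0:          # make sure it is a proper rotation
        R_true[:, 0] *= -1
    R_est = np.eye(3)
    for _ in range(500):
        x = rng.normal(size=3)
        R_est = rotation_update(R_est, x, R_true @ x)
    print("alignment error:", np.linalg.norm(R_est - R_true))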

13:30-16:30, Paper WeBCT8.30<br />

An Incremental Learning Algorithm for Nonstationary Environments and Class Imbalance<br />

Ditzler, Greg, Rowan Univ.<br />

Chawla, Nitesh, Univ. of Notre Dame<br />

Polikar, Robi, Rowan Univ.<br />

Learning in a non-stationary environment and in the presence of class imbalance has been receiving more recognition from the computational intelligence community, but little work has been done to create an algorithm or framework that can handle both issues simultaneously. We have recently introduced a new member of the Learn++ family of algorithms, Learn++.NSE, which is designed to track non-stationary environments; however, this algorithm does not work well under class imbalance, as it was not designed to handle this problem. On the other hand, SMOTE, a popular algorithm that can handle class imbalance by oversampling the data, is not designed to learn in non-stationary environments. In this work we describe, and present preliminary results for, the integration of SMOTE and Learn++.NSE into an algorithm that is robust to learning in a non-stationary environment under class imbalance.
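
A rough sketch of the integration idea under stated assumptions: each incoming batch of an imbalanced, drifting stream is rebalanced with SMOTE before being passed to an incremental learner. An SGD classifier with partial_fit stands in for the Learn++.NSE ensemble, which is not available as a standard library component; the synthetic stream is a placeholder.

    import numpy as np
    from imblearn.over_sampling import SMOTE
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier(loss="log_loss")     # stand-in incremental learner, not
                                               # the Learn++.NSE ensemble itself

    for t in range(5):                         # drifting, imbalanced batch stream
        n_min = 15 + 5 * t
        X_maj = rng.normal(loc=t * 0.2, size=(200, 2))
        X_min = rng.normal(loc=2.0 + t * 0.2, size=(n_min, 2))
        X = np.vstack([X_maj, X_min])
        y = np.r_[np.zeros(200, dtype=int), np.ones(n_min, dtype=int)]

        # Rebalance the current batch with SMOTE before the incremental update.
        smote = SMOTE(k_neighbors=min(5, n_min - 1), random_state=0)
        X_bal, y_bal = smote.fit_resample(X, y)
        model.partial_fit(X_bal, y_bal, classes=[0, 1])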

13:30-16:30, Paper WeBCT8.31<br />

Feature-Based Partially Occluded Object Recognition<br />

Fan, Na, East China Normal Univ.<br />

We propose a framework that combines geometry, color and texture information among pairwise feature points into a graph and finds the correct assignments from all candidates using graph matching techniques. Thanks to our informative similarity matrix, objects can still be recognized under severe occlusion, and matching errors are greatly reduced when images are taken from very different viewing angles or are partially occluded.

13:30-16:30, Paper WeBCT8.32<br />

A Sample Pre-Mapping Method Enhancing Boosting for Object Detection<br />

Ren, Haoyu, Chinese Acad. of Sciences<br />

Hong, Xiaopeng, Harbin Inst. of Tech.<br />

Heng, Cher Keng, Panasonic Singapore Lab. Pte Ltd<br />

Liang, Luhong, Chinese Acad. of Sciences<br />

Chen, Xilin, Chinese Acad. of Sciences<br />

We propose a novel method to improve the training efficiency and accuracy of boosted classifiers for object detection. The key step of the proposed method is a sample pre-mapping in the original space, performed with respect to a selected reference sample before the samples are fed into the weak classifiers. The reference sample corresponds to an approximation of the optimal separating hyper-plane in an implicit high-dimensional space, so the resulting classifier can achieve performance similar to a kernel method while incurring only the computational cost of a linear classifier in both training and detection. We employ two different non-linear mappings to verify the proposed method within the boosting framework. Experimental results show that the proposed approach achieves performance comparable with commonly used methods on public datasets for both pedestrian detection and car detection.

13:30-16:30, Paper WeBCT8.33<br />

Context Inspired Pedestrian Detection in Far-field Videos<br />

Ma, Wenhua, Chinese Acad. of Sciences<br />

He, Peng, Chinese Acad. of Sciences<br />

Lei, Huang, Chinese Acad. of Sciences<br />

Liu, Changping, Chinese Acad. of Sciences<br />

A novel pedestrian detection method that integrates context information with sliding-window search is proposed. The method applies cues such as corners, motion, and appearance to localize pedestrians in far-field videos without performing a brute-force search. Corners direct attention to a set of conspicuous locations that serve as starting points for the search, and motion detection restricts the search area to the foreground mask. Based on these two cues, sliding-window search is applied to confirm the exact locations of pedestrians. Experiments demonstrate that the proposed method is efficient in detecting pedestrians in far-field videos.

13:30-16:30, Paper WeBCT8.34<br />

Theme-Based Multi-Class Object Recognition and Segmentation<br />

Wu, Shilin, Chinese Acad. of Sciences<br />

Geng, Jiajia, Chinese Acad. of Sciences<br />

Zhu, Feng, Chinese Acad. of Sciences<br />

In this paper, we propose a new theme-based CRF model and investigate its performance on class-based pixel-wise segmentation of images. By including the theme of an image, we also propose a new texture-environment potential that represents the texture environment of a pixel, which alone gives satisfactory recognition results, and the pixel-wise segmentation accuracy is further improved by introducing this texture potential. We compare our results to recently published results on the MSRC 21-class database and show that our theme-based CRF model significantly outperforms the current state of the art. In particular, by assigning a theme to each image, our model greatly improves the accuracy for structured classes with high visual variability and few training examples, for which the accuracy is very low in most related work.

13:30-16:30, Paper WeBCT8.35<br />

Boosted Sigma Set for Pedestrian Detection<br />

Hong, Xiaopeng, Harbin Inst. of Tech.<br />

Chang, Hong, Chinese Acad. of Sciences<br />

Chen, Xilin, Chinese Acad. of Sciences<br />

Gao, Wen, Peking Univ.

This paper presents a new method to detect pedestrians in still images using Sigma sets as image region descriptors within a boosting framework. A Sigma set encodes the second-order statistics of an image region implicitly in the form of a point set. Compared with the covariance matrix, the traditional second-order-statistics-based region descriptor, which requires computationally demanding operations on a Riemannian manifold, the Sigma set preserves similar robustness and discriminative power more efficiently, because classification on Sigma sets can be performed directly in vector space. Experimental results on the INRIA and DaimlerChrysler pedestrian datasets show the effectiveness and efficiency of the proposed method.

13:30-16:30, Paper WeBCT8.36<br />

Reverse Indexing for Reading Graffiti Tags<br />

Thurau, Christian, Fraunhofer IAIS<br />

Bauckhage, Christian, Fraunhofer IAIS<br />

In this paper, we consider the problem of automatically reading graffiti tags. As a preparatory step, we create a large set of synthetic graffiti-like characters generated from publicly available TrueType fonts. For each character in the database, we extract a number of scale-independent local binary descriptors. Then, using binary non-negative matrix factorization, a sufficient number of basis functions are learned. The basis-function coefficients of novel images can then be used directly for hashing characters from the database of prototypes. Finally, graffiti tags are recognized by means of a localized, spatial voting scheme.

13:30-16:30, Paper WeBCT8.37<br />

Generic Object Recognition by Tree Conditional Random Field based on Hierarchical Segmentation<br />

Okumura, Takeshi, Kobe Univ.<br />

Takiguchi, Tetsuya, Kobe Univ.<br />

Ariki, Yasuo, Kobe Univ.<br />

Generic object recognition by computer has been strongly required in recent years in various fields such as robot vision and image retrieval. Conventional methods use a Conditional Random Field (CRF) that recognizes the class of each region using features extracted from local regions and the class co-occurrence between adjoining regions. However, the discriminative ability of features extracted from local regions is insufficient, and these methods are not robust to scale variance. To solve this problem, we propose a method that integrates the recognition results at multiple scales through a tree conditional random field based on hierarchical segmentation. On an image dataset of 7 classes, the proposed method improves the recognition rate by 2.2%.

13:30-16:30, Paper WeBCT8.38<br />

A Fast Approach for Pixelwise Labeling of Facade Images<br />

Fröhlich, Björn, Friedrich-Schiller Univ. of Jena<br />

Rodner, Erik, Friedrich-Schiller Univ. of Jena<br />

Denzler, Joachim, Friedrich-Schiller Univ. of Jena<br />

Facade classification is an important subtask in the automatic construction of large 3D city models. In the following we present an approach for pixel-wise labeling of facade images using an efficient Randomized Decision Forest classifier and robust local color features. Experiments are performed with a popular facade dataset and a new, demanding dataset of pixel-wise labeled images from the LabelMe project. Our method achieves high recognition rates and is significantly faster in training and testing than other methods based on expensive feature transformation techniques.

13:30-16:30, Paper WeBCT8.39<br />

Real-Time Traffic Sign Detection: An Evaluation Study<br />

Li, Ying, IBM T. J. Watson Res. Center<br />

Guan, Weiguang, IBM<br />

Pankanti, Sharath<br />

This paper presents an experimental evaluation of three different traffic sign detection approaches, which detect or localize<br />

various types of traffic signs from real-time videos. Specifically, the first approach exploits geometric features to identify<br />

traffic signs, while the other two are developed based on SVM (Support Vector Machine) and AdaBoost learning mechanisms.<br />

We describe each of the three approaches, conduct a detailed comparison among them, and examine their pros and<br />

cons. Our conclusions should lead to useful guidelines for developing a real-time traffic sign detector.<br />

13:30-16:30, Paper WeBCT8.40<br />

Image Categorization by Learned Nonlinear Subspace of Combined Visual-Words and Low-Level Features<br />

Han, Xian-Hua, Ritsumeikan Univ.<br />

Chen, Yen-Wei, Ritsumeikan Univ.<br />

Ruan, Xiang, Omron Corporation

Image category recognition is important for accessing visual information at the level of objects and scene types. This paper presents a new algorithm for the automatic recognition of object and scene classes. Compact and yet discriminative object-class subspaces over visual words and low-level features are automatically learned from a set of training images by a Supervised Nonlinear Neighborhood Embedding (SNNE) algorithm, which can learn an adaptive nonlinear subspace by preserving the neighborhood structure of the visual feature space. The main contribution of this paper is twofold: (i) an optimally compact and discriminative feature subspace is learned by the proposed SNNE algorithm for each feature space (visual words and low-level features); (ii) the different feature subspaces can be merged effectively and simply. High classification accuracy is demonstrated on different databases, including a scene database (Simplicity) and an object recognition database (Caltech). We confirm that the proposed strategy performs much better than state-of-the-art methods on these databases.

13:30-16:30, Paper WeBCT8.41<br />

Can Motion Segmentation Improve Patch-Based Object Recognition?<br />

Ulges, Adrian, DFKI<br />

Breuel, Thomas -<br />

Patch-based methods, which constitute the state of the art in object recognition, are often applied to video data, where<br />

motion information provides a valuable clue for separating objects of interest from the background. We show that such<br />

motion-based segmentation improves the robustness of patch-based recognition with respect to clutter. Our approach,<br />

which employs segmentation information to rule out incorrect correspondences between training and test views, is demonstrated<br />

empirically to distinctly outperform baselines operating on unsegmented images. Relative improvements reach<br />

50% for the recognition of specific objects, and 33% for object category retrieval.<br />

13:30-16:30, Paper WeBCT8.42<br />

Semi-Supervised and Interactive Semantic Concept Learning for Scene Recognition<br />

Han, Xian-Hua, Ritsumeikan Univ.<br />

Chen, Yen-Wei, Ritsumeikan Univ.<br />

Ruan, Xiang, Omron Corporation

In this paper, we present a novel semi-supervised and interactive concept learning algorithm for scene recognition by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and model the semantic content of images. The basic idea of semantic modeling is to classify local image regions into semantic concept classes such as water, sunset, or sky [1]. However, manually labeling concept samples for training the semantic model is fairly expensive, and the labeling results are, to some extent, subjective to the operators. By using the proposed semi-supervised and interactive learning algorithm, training samples and new concepts can be obtained accurately and efficiently. Through extensive experiments, we demonstrate that the image concept representation is well suited to modeling the semantic content of heterogeneous scene categories, and thus to recognition and retrieval. Furthermore, higher recognition accuracy can be achieved by adding new training samples and concepts obtained with the proposed algorithm.

13:30-16:30, Paper WeBCT8.43<br />

Dense Structure Inference for Object Classification in Aerial LIDAR Dataset<br />

Kim, Eunyoung, Univ. of Southern California<br />

Medioni, Gerard, Univ. of Southern California<br />

We present a framework to classify small freeform objects in 3D aerial scans of a large urban area. The system first identifies large structures, such as the ground surface and the roofs of densely built buildings, by fitting planar patches and grouping adjacent patches with similar pose. It then segments initial object candidates, which represent the visible surface of an object, using the identified structures. To deal with the sparse density of the points representing each candidate, we also propose a novel method to infer a dense 3D structure from the given sparse and noisy points without any meshes or iterations. To label object candidates, we build a tree-structured database of object classes, which captures latent patterns in the shape of 3D objects in a hierarchical manner. We demonstrate our system on an aerial LIDAR dataset acquired over a few square kilometers of Ottawa.

13:30-16:30, Paper WeBCT8.44<br />

Data-Driven Foreground Object Detection from a Non-Stationary Camera<br />

Sun, Shih-Wei, Acad. Sinica, Taiwan<br />

Huang, Fay, National Ilan Univ. Taiwan<br />

Liao, Mark, Acad. Sinica, Taiwan<br />

In this paper, we propose a data-driven foreground object detection technique which can detect foreground objects from a moving camera. We build a data-driven consensus foreground object template (CFOT) and then detect the foreground object region in each frame. The proposed technique provides the following capabilities: (1) detecting a foreground object captured by a fast-moving camera; (2) detecting a foreground object of low (spatial or temporal) contrast; and (3) detecting a foreground object against a dynamic background. Our method makes three contributions: (1) the newly proposed data-driven foreground region decision process for generating the CFOT is shown to be robust and efficient; (2) a foreground object probability is proposed to properly deal with imperfect initial foreground region estimates; and (3) the CFOT enables precise foreground object detection.

13:30-16:30, Paper WeBCT8.45<br />

Efficient Shape Retrieval under Partial Matching<br />

Demirci, Fatih, TOBB Univ. of Ec. and Tech.<br />

Indexing into large database systems is essential for a number of applications. This paper presents a new indexing structure which overcomes an important restriction of a previous indexing technique by using a recently developed theorem from the domain of matrix analysis. Specifically, given a set of distance values computed by a distance function that does not necessarily satisfy the triangle inequality, this paper shows that computing the nearest distance values that obey the properties of a metric enables us to overcome the limitations of the previous indexing algorithm. We demonstrate the proposed framework in the context of a recognition task.

13:30-16:30, Paper WeBCT8.46<br />

Component Identification in the 3D Model of a Building<br />

Xu, Mai, Imperial Coll.<br />

Petrou, Maria, Imperial Coll.<br />

Jahangiri, Mohammad, Imperial Coll.<br />

This paper addresses the problem of identifying the components (such as balconies and windows) of the 3D model of a building. A novel method, based on a voting scheme, is presented for solving this problem. Intuitively, interference (such as shadows and occlusions) rarely appears at the same place when a scene is viewed at different times or from different directions. In the spirit of this intuition, the voting-based method combines the information from various images to identify and segment the components of a building.

13:30-16:30, Paper WeBCT8.48<br />

Multi-Scale Color Local Binary Patterns for Visual Object Classes Recognition<br />

Zhu, Chao, Ec. Centrale de Lyon<br />

Bichot, Charles-Edmond, Ec. Centrale de Lyon<br />

Chen, Liming, Ec. Centrale de Lyon<br />

The Local Binary Pattern (LBP) operator is a computationally efficient yet powerful feature for analyzing local texture structures. While the LBP operator has been successfully applied to tasks as diverse as texture classification, texture segmentation, face recognition and facial expression recognition, it has rarely been used in the domain of Visual Object Classes (VOC) recognition, mainly because it lacks the power to deal with the various changes in lighting and viewing conditions of real-world scenes. In this paper, we propose six novel multi-scale color LBP operators in order to increase the photometric invariance and discriminative power of the original LBP operator. Experimental results on the PASCAL VOC 2007 image benchmark show a significant accuracy improvement by the proposed operators compared with both the original LBP and other popular texture descriptors such as the Gabor filter.
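
For orientation, the sketch below computes a generic multi-scale colour LBP feature: uniform LBP histograms per colour channel at several radii, concatenated into one descriptor. It does not reproduce the six specific operators or the photometric-invariance constructions proposed in the paper; the channel choice, radii and bin counts are illustrative.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def multiscale_color_lbp(image, radii=(1, 2, 3), points=8):
        # Concatenate uniform-LBP histograms per colour channel and radius.
        features = []
        for channel in range(image.shape[2]):
            for radius in radii:
                codes = local_binary_pattern(image[:, :, channel], points, radius,
                                             method="uniform")
                hist, _ = np.histogram(codes, bins=points + 2,
                                       range=(0, points + 2), density=True)
                features.append(hist)
        return np.concatenate(features)

    # Example on a random RGB image standing in for a PASCAL VOC image.
    rng = np.random.default_rng(0)
    descriptor = multiscale_color_lbp(rng.random((64, 64, 3)))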

13:30-16:30, Paper WeBCT8.49<br />

Object Localization by Propagating Connectivity via Superfeatures<br />

Chakraborty, Ishani, Rutgers Univ.<br />

Elgammal, Ahmed, Rutgers Univ.<br />

In this paper, we propose a part-based approach to localize objects in cluttered images. We represent object parts as boundary segments and image patches. A semi-local grouping of parts, named superfeatures, encodes appearance and connectivity within a neighborhood. To match parts, we integrate inter-feature similarities and intra-feature connectivity via a relaxation labeling framework. Additionally, we use a global elliptical shape prior to match the shape of the solution space to that of the object. We demonstrate the efficacy of the method for detecting various objects in cluttered images by comparing against simple object models.

13:30-16:30, Paper WeBCT8.50<br />

Efficient Object Detection and Matching using Feature Classification<br />

Dornaika, Fadi, Univ. of the Basque Country<br />

Chakik, Fadi, Lebanese Univ.<br />

This paper presents a new approach for efficient object detection and matching in images and videos. We propose a stage based on a classification scheme that sorts the features extracted from new images into object features and non-object features. This binary classification has turned out to be an efficient tool for object detection and matching: it not only makes the matching process more robust and faster, but also makes robust object registration fast. We provide quantitative evaluations showing the advantages of using the classification stage for object matching and registration. Our approach could lend itself nicely to real-time object tracking and detection.

13:30-16:30, Paper WeBCT8.51<br />

A Discriminative Model for Object Representation and Detection via Sparse Features<br />

Song, Xi, Beijing Inst. of Tech.<br />

Luo, Ping, Sun Yat-Sen Univ.<br />

Lin, Liang, Sun Yat-Sen Univ.<br />

Jia, Yunde, Beijing Inst. of Tech.<br />

This paper proposes a discriminative model that represents an object category with a batch of boosted image patches, motivated by detecting and localizing objects with sparse features. Instead of designing features carefully and category-specifically as in previous work, we extract a massive number of local image patches from the positive object instances and quantize them as weak classifiers. We then extend the AdaBoost algorithm to learn the patch-based model, integrating object appearance and structure information. With the learned model, only a few features are activated to localize instances in the test images. In the experiments, we apply the proposed method to several public datasets and achieve strong performance.

13:30-16:30, Paper WeBCT8.52<br />

A Robust Recognition Technique for Dense Checkerboard Patterns<br />

Dao, Vinh Ninh, The Univ. of Tokyo<br />

Sugimoto, Masanori, The Univ. of Tokyo<br />

The checkerboard pattern is widely used in computer vision techniques for camera calibration and simple geometry acquisition,<br />

both in practical use and research. However, most of the current techniques fail to recognize the checkerboard<br />

pattern under distorted, occluded or discontinuous conditions, especially when the checkerboard pattern is dense. This<br />

paper proposes a novel checkerboard recognition technique that is robust to noise, surface distortion or discontinuity, supporting<br />

checkerboard recognition in dynamic conditions for a wider range of applications. When the checkerboard pattern<br />

is used in a projector camera system for geometry reconstruction, by using epipolar geometry, this technique can recognize<br />

the corresponding positions of the crossing points, even if the checkerboard pattern is only partly detected.<br />

13:30-16:30, Paper WeBCT8.53<br />

Spike-Based Convolutional Network for Real-Time Processing<br />

Pérez-Carrasco, Jose-Antonio, Univ. de Sevilla<br />

Serrano-Gotarredona, Carmen, Univ. de Sevilla<br />

Acha-Piñero, Begoña, Univ. de Sevilla<br />

Serrano-Gotarredona, Teresa, Univ. de Sevilla<br />

Linares-Barranco, Bernabe, Univ. de Sevilla<br />

In this paper we propose the first bio-inspired, non-frame-based, six-layer convolutional network (ConvNet) that can be implemented with already physically available spike-based electronic devices. The system was designed to recognize people in three different positions: standing, lying, or upside down. The inputs were spikes obtained with a motion retina chip. We provide simulation results showing recognition delays of 16 milliseconds from stimulus onset (time to first spike) with a recognition rate of 94%. The weight-sharing property of ConvNets and the use of the AER protocol allow a great reduction in the number of both trainable parameters and connections: only 748 trainable parameters and 123 connections in our AER system, out of the 506998 connections that would be required in a frame-based implementation.

13:30-16:30, Paper WeBCT8.54<br />

Learning Affordances for Categorizing Objects and Their Properties<br />

Dag, Nilgun, Middle East Tech. Univ.<br />

Atil, Ilkay, Middle East Tech. Univ.<br />

Kalkan, Sinan, Middle East Tech. Univ.<br />

Sahin, Erol, Middle East Tech. Univ.<br />

In this paper, we demonstrate that simple interactions with objects in the environment lead to a manifestation of the perceptual properties of objects. This is achieved by deriving a condensed representation of the effects of actions (called effect prototypes in the paper) and investigating the relevance between the perceptual features extracted from the objects and the actions that can be applied to them. With this at hand, we show that the agent can categorize (i.e., partition) its raw sensory perceptual feature vector, extracted from the environment, which is an important step towards the development of concepts and language. Moreover, after learning to predict the effect prototypes of objects, the agent can categorize objects based on the predicted effects of the actions that can be applied to them.

13:30-16:30, Paper WeBCT8.55<br />

Feature Pairs Connected by Lines for Object Recognition<br />

Awais, Muhammad, Univ. of Surrey<br />

Mikolajczyk, Krystian, Univ. of Surrey<br />

In this paper we exploit image edges and segmentation maps to build features for object category recognition. We build a parametric, line-based image approximation to identify the dominant edge structures. Line ends are used as features described by histograms of gradient orientations. We then form descriptors based on connected line ends to incorporate weak topological constraints, which improves their discriminative power. Using point pairs connected by an edge assures higher repeatability than a random pair of points or edges. The results are compared with the state of the art and show significant improvement on the challenging Pascal VOC 2007 recognition benchmark. Kernel-based fusion is performed to emphasize the complementary nature of our descriptors with respect to the state-of-the-art features.

13:30-16:30, Paper WeBCT8.56<br />

Using Gait Features for Improving Walking People Detection<br />

Bouchrika, Imed, Univ. of Southampton<br />

Carter, John, Univ. of Southampton<br />

Nixon, Mark, Univ. of Southampton<br />

Morzinger, Roland, Joanneum Res.<br />

Thallinger, Georg, Joanneum Res.<br />

In this paper, we explore a new approach for enriching the HoG method for pedestrian detection in an unconstrained outdoor environment. The proposed algorithm is based on gait motion, since the rhythmic footprint pattern of walking people is a stable and characteristic feature for their detection. The novelty of our approach is motivated by the latest research on people identification using gait. The experimental results confirm the ability of our method to enhance HoG-based detection of walking people, as well as to discriminate between single walking subjects, groups of people and vehicles, with a detection rate of 100%. Furthermore, the results reveal the potential of our method for use in visual surveillance systems for identity tracking across different camera views.

13:30-16:30, Paper WeBCT8.57<br />

Learning-Based Vehicle Detection using Up-Scaling Schemes and Predictive Frame Pipeline Structures<br />

Tsai, Yi-Min, National Taiwan Univ.<br />

Huang, Keng-Yen, National Taiwan Univ.<br />

Tsai, Chih-Chung, National Taiwan Univ.<br />

Chen, Liang-Gee, National Taiwan Univ.<br />

This paper aims at detecting preceding vehicles over a variety of distances. A sub-region up-scaling scheme significantly improves far-distance detection capability, and three frame-pipeline structures involving object predictors are explored to further enhance accuracy and efficiency. The proposed methodology achieves a detection distance of 140 meters, with a 97.1% detection rate at a 4.2% false alarm rate. Finally, a benchmark of several learning-based vehicle detection approaches is provided.

13:30-16:30, Paper WeBCT8.58<br />

Dynamic Hand Pose Recognition using Depth Data<br />

Suryanarayan, Poonam, The Pennsylvania State Univ.<br />

Subramanian, Anbumani, HP Lab.<br />

Mandalapu, Dinesh, HP Lab.<br />

Hand pose recognition has been a problem of great interest to the Computer Vision and Human Computer Interaction community<br />

for many years and the current solutions either require additional accessories at the user end or enormous computation<br />

time. These limitations arise mainly due to the high dexterity of human hand and occlusions created in the limited view of<br />

the camera. This work utilizes the depth information and a novel algorithm to recognize scale and rotation invariant hand<br />

poses dynamically. We have designed a volumetric shape descriptor enfolding the hand to generate a 3D cylindrical histogram<br />

and achieved robust pose recognition in real time.<br />

13:30-16:30, Paper WeBCT8.59<br />

A Hierarchical GIST Model Embedding Multiple Biological Feasibilities for Scene Classification<br />

Han, Yina, Xi’an Jiaotong Univ.<br />

Liu, Guizhong, Xi’an Jiaotong Univ.<br />

We propose a hierarchical GIST model embedding multiple biological feasibilities for scene classification. In the perceptual layer, the spatial layout of Gabor features is extracted in a bio-vision-guided way: diagnostic color information is introduced, and the orientations and scales of the Gabor filters, as well as the spatial pooling size, are tuned to biologically feasible values. In the conceptual layer, for the first time, we attempt to build a computational model of the biological conceptual GIST through kernel-PCA-based prototype representation, which is task-oriented, as biological GIST is, and also in accordance with the unsupervised learning assumption in the primary visual cortex and prototype-similarity-based categorization in human cognition. Using around 200 dimensions, our model is shown to outperform existing GIST models and to achieve state-of-the-art performance on four scene datasets.

13:30-16:30, Paper WeBCT8.60<br />

Road Network Extraction using Edge Detection and Spatial Voting<br />

Sirmacek, Beril, Deutsches Zentrum fur Luft und Raumfahrt<br />

Unsalan, Cem, Yeditepe Univ.<br />

Road network detection from very high resolution satellite images is important for two main reasons. First, the detection result can be used in automated map making. Second, the detected network can be used in trajectory planning for unmanned aerial vehicles. Although an expert can label road pixels in a given satellite image, this operation is prone to errors; therefore, an automated system is needed to detect the road network in a given satellite image in a robust manner. In this study, we propose a novel approach to detect the road network from a given panchromatic Ikonos satellite image. Our method has five main steps. First, we apply nonlinear bilateral filtering to smooth the given image. Then, we extract Canny edges and gradient information as local features. Using these local features, we generate a spatial voting matrix, which indicates the possible locations of road network pixels. By processing this voting matrix in an iterative manner, we detect initial road pixels. Finally, we apply a tracking algorithm to the voting matrix to detect the missing road pixels. We tested our method on various satellite images and provide the extracted road networks in the experiments section.

13:30-16:30, Paper WeBCT8.61<br />

Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration<br />

Soda, Paolo, Univ. Campus Bio-Medico di Roma<br />

Iannello, Giulio, Univ. Campus Bio-Medico di Roma<br />

Decomposition methods are multiclass classification schemes in which the polychotomy is reduced to several dichotomies. Each dichotomy is addressed by a classifier trained on a training set derived from the original one on the basis of the decomposition rule adopted. These new training sets may present a disproportion between the classes, harming the global recognition accuracy. Indeed, traditional learning algorithms are biased towards the majority class, resulting in poor predictive accuracy over the minority one. This paper investigates whether the application of learning methods specifically tailored to imbalanced training sets introduces any performance improvement when used by the dichotomizers of decomposition methods. The results on five public datasets show that the application of these learning methods improves the global performance of decomposition schemes.
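
A small illustration, under our own assumptions, of pairing a one-vs-one decomposition with an imbalance-aware learning step (random undersampling of each dichotomy's majority class); the helper names are hypothetical, and sklearn's LogisticRegression merely stands in for an arbitrary dichotomizer:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def undersample(X, y):
    """Rebalance a binary training set by undersampling the majority class."""
    rng = np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    keep = np.concatenate([rng.choice(np.where(y == c)[0], n, replace=False)
                           for c in classes])
    return X[keep], y[keep]

def train_one_vs_one(X, y):
    """Reduce the polychotomy to dichotomies; each dichotomizer is trained
    on a rebalanced training set derived from the original one."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = np.isin(y, [a, b])
        Xd, yd = undersample(X[mask], y[mask])
        models[(a, b)] = LogisticRegression(max_iter=1000).fit(Xd, yd)
    return models

def predict(models, x):
    """Majority vote over the dichotomizers' predictions."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models.values()]
    return max(set(votes), key=votes.count)
```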

13:30-16:30, Paper WeBCT8.62<br />

The Balanced Accuracy and its Posterior Distribution<br />

Brodersen, Kay Henning, ETH Zurich<br />

Ong, Cheng Soon, ETH Zurich<br />

Stephan, Klaas Enno, Univ. of Zurich<br />

Buhmann, Joachim M., Swiss Federal Inst. of Tech. Zurich<br />

Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples<br />

have been identified with their correct class labels. In practice, generalizability is frequently estimated by averaging the accuracies<br />

obtained on individual cross-validation folds. This procedure, however, is problematic in two ways. First, it does<br />

not allow for the derivation of meaningful confidence intervals. Second, it leads to an optimistic estimate when a biased<br />

classifier is tested on an imbalanced dataset. We show that both problems can be overcome by replacing the conventional<br />

point estimate of accuracy by an estimate of the posterior distribution of the balanced accuracy.<br />
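
A minimal sketch of the idea as we read it: give each class-wise accuracy an independent Beta posterior and average the samples to obtain a posterior over the balanced accuracy (illustrative only; the paper's exact derivation may differ):

```python
import numpy as np
from scipy.stats import beta

def balanced_accuracy_posterior(conf_matrix, n_samples=100_000, seed=0):
    """Draw samples from a posterior over the balanced accuracy.
    conf_matrix[i, j] = number of class-i examples predicted as class j.
    Each class-wise accuracy gets a Beta(1 + correct, 1 + wrong) posterior
    (flat prior); the balanced accuracy is their average."""
    rng = np.random.default_rng(seed)
    cm = np.asarray(conf_matrix, dtype=float)
    correct = np.diag(cm)
    wrong = cm.sum(axis=1) - correct
    draws = np.stack([beta.rvs(1 + c, 1 + w, size=n_samples, random_state=rng)
                      for c, w in zip(correct, wrong)])
    return draws.mean(axis=0)   # samples of the balanced accuracy

# toy example: a biased classifier tested on an imbalanced two-class set
samples = balanced_accuracy_posterior([[90, 10], [8, 2]])
print(samples.mean(), np.percentile(samples, [2.5, 97.5]))
```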

WeBCT9, Lower Foyer<br />

Multimedia Analysis and Retrieval, Poster Session<br />

Session chair: Cetin, E. (Bilkent Univ.)<br />

13:30-16:30, Paper WeBCT9.1<br />

A Study on Detecting Patterns in Twitter Intra-Topic User and Message Clustering<br />

Cheong, Marc, Monash Univ.<br />

Lee, Vincent C S, Monash Univ.<br />

Timely detection of hidden patterns is key to the analysis and estimation of driving determinants for mission-critical decision making. This study applies Cheong and Lee's context-aware content analysis framework to extract latent properties from Twitter messages (tweets). In addition, we incorporate an unsupervised Self-Organizing Feature Map (SOM) as a machine learning based clustering tool that has not been investigated in the context of opinion mining and sentiment analysis of microblogging data. Our experimental results reveal the detection of interesting patterns for topics of interest which are latent and cannot be easily detected from the observed tweets without the aid of machine learning tools.

13:30-16:30, Paper WeBCT9.2<br />

Classification of Near-Duplicate Video Segments based on Their Appearance Patterns<br />

Ide, Ichiro, Nagoya Univ.<br />

Shamoto, Yuji, Nagoya Univ.<br />

Deguchi, Daisuke, Nagoya Univ.<br />

Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />

Murase, Hiroshi, Nagoya Univ.<br />



We propose a method that analyzes the structure of a large volume of general broadcast video data by the appearance patterns<br />

of near-duplicate video segments. We define six classification rules based on the appearance patterns of near-duplicate video<br />

segments according to their roles, and evaluated them over more than 1,000 hours of actual broadcast video data.<br />

13:30-16:30, Paper WeBCT9.3<br />

Motion Vector based Features for Content based Video Copy Detection<br />

Tasdemir, Kasim, Bilkent Univ.<br />

Cetin, E., Bilkent Univ.<br />

In this article, we propose a motion vector based feature set for Content Based Copy Detection (CBCD) of video clips. The motion vectors of image frames are one of the signatures of a given video. However, they are not descriptive enough when consecutive image frames are used, because most vectors are too small. To overcome this problem, we calculate motion vectors at a lower frame rate than the actual frame rate of the video. As a result, we obtain longer vectors which form a robust parameter set representing a given video. Experimental results are presented.
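
One hedged way to realize "motion vectors at a reduced frame rate" with OpenCV dense optical flow between frames spaced several frames apart (the parameters and the block-averaging step are our assumptions, not the paper's feature set):

```python
import cv2
import numpy as np

def motion_vector_signature(video_path, frame_step=10, grid=8):
    """Compute dense motion vectors between frames that are `frame_step`
    apart (i.e. at a reduced frame rate), then average them on a coarse
    grid to build a per-pair feature vector."""
    cap = cv2.VideoCapture(video_path)
    features, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                flow = cv2.calcOpticalFlowFarneback(
                    prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
                h, w = flow.shape[:2]
                blocks = [flow[i * h // grid:(i + 1) * h // grid,
                               j * w // grid:(j + 1) * w // grid].mean(axis=(0, 1))
                          for i in range(grid) for j in range(grid)]
                features.append(np.concatenate(blocks))
            prev = gray
        idx += 1
    cap.release()
    return np.array(features)
```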

13:30-16:30, Paper WeBCT9.4<br />

A Statistical Learning Approach to Spatial Context Exploitation for Semantic Image Analysis<br />

Papadopoulos, Georgios Th., Centre for Res. and Tech. Hellas<br />

Mezaris, Vasileios, Centre for Res. and Tech. Hellas<br />

Kompatsiaris, Yiannis, Centre for Res. and Tech. Hellas<br />

Strintzis, Michael-Gerasimos,<br />

In this paper, a statistical learning approach to spatial context exploitation for semantic image analysis is presented. The proposed<br />

method constitutes an extension of the key parts of the authors’ previous work on spatial context utilization, where a Genetic<br />

Algorithm (GA) was introduced for exploiting fuzzy directional relations after performing an initial classification of image regions<br />

to semantic concepts using solely visual information. In the extensions reported in this work, a more elaborate approach<br />

is followed during the spatial knowledge acquisition and modeling process. Additionally, the impact of every resulting spatial<br />

constraint on the final outcome is adaptively adjusted. Experimental results as well as comparative evaluation on three datasets<br />

of varying complexity in terms of the total number of supported semantic concepts demonstrate the efficiency of the proposed<br />

method.<br />

13:30-16:30, Paper WeBCT9.5<br />

Wavelet-Based Texture Retrieval Modeling the Magnitudes of Wavelet Detail Coefficients with a Generalized Gamma Distribution<br />

De Ves Cuenca, Esther, Univ. of Valencia<br />

Benavent, Xaro, Univ. of Valencia<br />

Ruedin, Ana María Clara, Univ. de Buenos Aires<br />

Acevedo, Daniel Germán, Univ. de Buenos Aires<br />

Seijas, Leticia María, Univ. de Buenos Aires<br />

This paper presents a texture descriptor based on the fine detail coefficients at three resolution levels of a translation-invariant undecimated wavelet transform. First, we consider the vertical and horizontal wavelet detail coefficients at the same position as the components of a bivariate random vector, and the magnitude and angle of these vectors are computed. The magnitudes are modeled by a Generalized Gamma distribution. Their parameters, together with the circular histograms of angles, are used to characterize each texture image of the database. The Kullback-Leibler divergence is used as the similarity measure. Retrieval experiments, in which we compare two wavelet transforms, are carried out on the Brodatz texture collection. Results reveal the good performance of this wavelet-based texture descriptor obtained via the Generalized Gamma distribution.
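
An illustrative Python fragment for the magnitude-modeling part: fit a generalized Gamma distribution with SciPy and compare two textures by a numerically integrated Kullback-Leibler divergence (the circular angle histograms are omitted; function names are hypothetical):

```python
import numpy as np
from scipy.stats import gengamma
from scipy.integrate import quad

def fit_magnitudes(magnitudes):
    """Fit a generalized Gamma distribution to wavelet-coefficient magnitudes
    (location fixed at 0, since magnitudes are non-negative)."""
    a, c, loc, scale = gengamma.fit(magnitudes, floc=0)
    return a, c, scale

def kl_gengamma(p, q):
    """Numerical KL divergence D(p||q) between two fitted models."""
    def integrand(x):
        px = gengamma.pdf(x, p[0], p[1], scale=p[2])
        qx = gengamma.pdf(x, q[0], q[1], scale=q[2])
        return px * (np.log(px + 1e-300) - np.log(qx + 1e-300)) if px > 0 else 0.0
    val, _ = quad(integrand, 0, np.inf, limit=200)
    return val

# toy usage: two synthetic magnitude samples standing in for two textures
m1 = np.abs(np.random.randn(5000)); m2 = np.abs(1.5 * np.random.randn(5000))
print(kl_gengamma(fit_magnitudes(m1), fit_magnitudes(m2)))
```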

13:30-16:30, Paper WeBCT9.6<br />

3D-Shape Retrieval using Curves and HMM<br />

Tabia, Hedi, Lagis Univ. Lille 1<br />

Daoudi, Mohamed, TELECOM Lille1<br />

Vandeborre, Jean-Philippe, Univ. of Lille 1<br />

Colot, Olivier, Univ. Lille 1<br />

In this paper, we propose a new approach for 3D-shape matching. This approach comprises an off-line step and an on-line step. In the off-line step, an alphabet, of which any shape can be composed, is constructed. First, 3D-objects are subdivided into a set of 3D-parts. The subdivision consists of extracting from each object a set of feature points with associated curves. Then the whole set of 3D-parts is clustered into different classes from a semantic point of view. After that, each class is modeled by a Hidden Markov Model (HMM). The HMM, which represents a character in the alphabet, is trained using the set of curves corresponding to the class parts. Hence, any 3D-object can be represented by a set of characters. The on-line step consists of comparing the set of characters representing the 3D-object query with that of each object in the given dataset. The experimental results obtained on the TOSCA dataset show that the system performs efficiently in retrieving similar 3D-models.

13:30-16:30, Paper WeBCT9.7<br />

Fast Fingerprint Retrieval with Line Detection<br />

Lian, Hui-Cheng, Shanghai University<br />

In this paper, a retrieval method is proposed for audio and video fingerprinting systems by adopting a line detection technique. To achieve fast retrieval, lines are generated from the sub-fingerprints of the query and the database, and the non-candidate lines are filtered out, so the distance between the query and the references can be calculated quickly. To demonstrate the superiority of this method, audio fingerprints and video fingerprints are generated for comparison. The experimental results indicate that the proposed method outperforms the direct hashing method.

13:30-16:30, Paper WeBCT9.8<br />

A High-Dimensional Access Method for Approximated Similarity Search in Text Mining<br />

Artigas-Fuentes, Fernando José, Univ. de Oriente, CERPAMID<br />

Badía-Contelles, José Manuel, Univ. Jaume I, Castellón<br />

Gil-García, Reynaldo, Univ. de Oriente, CERPAMID<br />

In this paper, a new access method for very high-dimensional data space is proposed. The method uses a graph structure and<br />

pivots for indexing objects, such as documents in text mining. It also applies a simple search algorithm that uses distance or<br />

similarity based functions in order to obtain the k-nearest neighbors for novel query objects. This method shows a good selectivity<br />

over very-high dimensional data spaces, and a better performance than other state-of-the-art methods. Although it is a probabilistic<br />

method, it shows a low error rate. The method is evaluated on data sets from the well-known collection Reuters corpus<br />

version 1 (RCV1-v2) and dealing with thousands of dimensions.<br />

13:30-16:30, Paper WeBCT9.9<br />

3D Model Comparison through Kernel Density Matching<br />

Wang, Yiming, Nanjing Univ.<br />

Lu, Tong, Nanjing Univ.<br />

Gao, Rongjun, Nanjing Univ.<br />

Liu, Wenyin, City U of HK<br />

A novel 3D shape matching method is proposed in this paper. We first extract angular and distance feature pairs from preprocessed<br />

3D models, then estimate their kernel densities after quantifying the feature pairs into a fixed number of bins. During<br />

3D matching, we adopt the KL-divergence as a distance of 3D comparison. Experimental results show that our method is effective<br />

to match similar 3D shapes, and robust to model deformations or rotation transformations.<br />

13:30-16:30, Paper WeBCT9.10<br />

Improving the Efficiency of Content-Based Multimedia Exploration<br />

Beecks, Christian, RWTH Aachen Univ.<br />

Wiedenfeld, Sascha, RWTH Aachen Univ.<br />

Seidl, Thomas, RWTH Aachen Univ.<br />

Visual exploration systems enable users to search, browse, and explore voluminous multimedia databases in an interactive and<br />

playful manner. Whether users know the database’s contents in advance or not, these systems guide the user’s exploration<br />

process by visualizing the database contents and allowing him or her to issue queries intuitively. In order to improve the efficiency<br />

of content-based visual exploration systems, we propose an efficient query evaluation scheme which aims at reducing the total<br />

number of costly similarity computations. We evaluate our approach on different state-of-the-art image databases.<br />



13:30-16:30, Paper WeBCT9.11<br />

Tertiary Hash Tree: Indexing Structure for Content-Based Image Retrieval<br />

Tak, Yoon-Sik, Korea Univ.<br />

Hwang, Eenjun, Korea Univ.<br />

Dominant features for content-based image retrieval usually consist of high-dimensional values. So far, much research has been done on indexing such values for fast retrieval. Still, many existing indexing schemes suffer from performance degradation due to the curse of dimensionality. As an alternative, heuristic algorithms have been proposed that calculate the result with high probability at the cost of accuracy. In this paper, we propose a new hash tree-based indexing structure called the tertiary hash tree for indexing high-dimensional feature values. The tertiary hash tree provides several advantages over the traditional extendible hash structure in terms of resource usage and search performance. Through extensive experiments, we show that our proposed index structure achieves outstanding performance.

13:30-16:30, Paper WeBCT9.12<br />

An Augmented Reality Setup with an Omnidirectional Camera based on Multiple Object Detection<br />

Hayashi, Tomoki, Keio Univ.<br />

Uchiyama, Hideaki, Keio Univ.<br />

Pilet, Julien, Keio Univ.<br />

Saito, Hideo, Keio Univ.<br />

We propose a novel augmented reality (AR) setup with an omnidirectional camera on a tabletop display. The table acts as a mirror on which real playing cards appear augmented with virtual elements. The omnidirectional camera captures and recognizes its surroundings based on a feature-based image retrieval approach which achieves fast and scalable registration. This allows our system to superimpose virtual visual effects onto the omnidirectional camera image. In our AR card game, users sit around a tabletop display and show a card to the other players. The system recognizes it and augments it with virtual elements in the omnidirectional image acting as a mirror. While playing the game, the users can interact with each other directly and through the display. Our setup is a new, simple, and natural approach to augmented reality. It opens new doors to traditional card games.

13:30-16:30, Paper WeBCT9.13<br />

Enhancing SVM Active Learning for Image Retrieval using Semi-Supervised Bias-Ensemble<br />

Wu, Jun, Dalian Maritime Univ.<br />

Lu, Ming-Yu, Dalian Maritime Univ.<br />

Wang, Chun-Li, Dalian Maritime Univ.<br />

Support vector machine (SVM) based active learning plays a key role in alleviating the burden of labeling in relevance feedback. However, most SVM-based active learning algorithms are challenged by the small example problem and the asymmetric distribution problem. This paper proposes a novel active learning scheme that uses an SVM ensemble under the semi-supervised setting to address the first problem. For the second problem, a bias-ensemble mechanism is developed to guide the classification model to pay more attention to the positive examples than the negative ones. An empirical study shows that the proposed scheme is significantly more effective than some existing approaches.

13:30-16:30, Paper WeBCT9.14<br />

Interactive Browsing of Remote JPEG 2000 Image Sequences<br />

Garcia Ortiz, Juan Pablo, Univ. of Almeria<br />

Ruiz, Gonzalez V., Univ. of Almeria<br />

Garcia, I., Univ. of Almeria<br />

Müller, D., European Space Agency/NASA<br />

Dimitoglou, G., European Space Agency/NASA<br />

This paper studies a novel prefetching scheme for the remote browsing of sequences of high-resolution JPEG 2000 images. Using this scheme, a user is able to randomly select any of the remote images for analysis, repeating this process with other images after some undefined time. Our solution is proposed for a low bit-rate communication context where the complete transmission of any of the images for its lossless recovery would take too much time for an interactive visualization. For this reason, quality scalability is used in order to minimize the decoding latency. Frequently, the user can also play a ``video'', moving sequentially over the neighbouring (temporally consecutive, previous or following) images of the currently displayed one. With the objective of also hiding the link latency, the proposed data scheduler transmits in parallel data of the image that is currently displayed and data of the rest of the temporally adjacent images. This scheduler uses a model based on the quality progression of the image in order to estimate what percentage of the bandwidth is dedicated to prefetching data. Our experimental results show that a significant benefit can be achieved in terms of both subjective quality and responsiveness by means of prefetching.

13:30-16:30, Paper WeBCT9.15<br />

Binarization of Color Characters in Scene Images using K-Means Clustering and Support Vector Machines<br />

Wakahara, Toru, Hosei Univ.<br />

Kita, Kohei, Hosei Univ.<br />

This paper proposes a new technique for binarizing multicolored characters subject to heavy degradation. The key ideas are threefold. The first is the generation of tentatively binarized images via every dichotomization of the k clusters obtained by k-means clustering in the HSI color space; the total number of tentatively binarized images equals 2^k-2. The second is the use of support vector machines (SVM) to determine whether, and to what degree, each tentatively binarized image represents a character or non-character. We feed the SVM with mesh and weighted direction code histogram features to output the degree of character-likeness. The third is the selection of the single binarized image with the maximum degree of character-likeness as the optimal binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.
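
A compact sketch of the first key idea, assuming k-means has already been run on HSI pixel values: enumerate every non-trivial cluster dichotomization, yielding 2^k - 2 tentative binarizations to be scored by the SVM (the scoring step is omitted; names and parameters are hypothetical):

```python
import numpy as np
from itertools import product
from sklearn.cluster import KMeans

def tentative_binarizations(pixels_hsi, image_shape, k=4):
    """Cluster pixels with k-means, then enumerate every non-trivial
    assignment of clusters to foreground/background (2**k - 2 images)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels_hsi)
    labels = labels.reshape(image_shape)
    binarized = []
    for assignment in product([0, 1], repeat=k):
        if sum(assignment) in (0, k):      # skip all-background / all-foreground
            continue
        fg = np.isin(labels, [c for c, a in enumerate(assignment) if a == 1])
        binarized.append(fg.astype(np.uint8) * 255)
    return binarized                        # each image would be scored by the SVM
```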

13:30-16:30, Paper WeBCT9.16<br />

A Self-Training Learning Document Binarization Framework<br />

Su, Bolan, National Univ. of Singapore<br />

Lu, Shijian, -<br />

Tan, Chew-Lim, National Univ. of Singapore<br />

Document image binarization techniques have been studied for many years, and many practical binarization techniques have been developed and applied successfully in commercial document analysis systems. However, the current state-of-the-art methods fail to produce good binarization results for many badly degraded document images. In this paper, we propose a self-training learning framework for document image binarization. Based on reported binarization methods, the proposed framework first divides document image pixels into three categories, namely, foreground pixels, background pixels and uncertain pixels. A classifier is then trained by learning from the document image pixels in the foreground and background categories. Finally, the uncertain pixels are classified using the learned pixel classifier. Extensive experiments have been conducted over the dataset used in the recent Document Image Binarization Contest (DIBCO) 2009. Experimental results show that our proposed framework significantly improves the performance of reported document image binarization methods.

13:30-16:30, Paper WeBCT9.17<br />

Novel Edge Features for Text Frame Classification in Video<br />

Palaiahnakote, Shivakumara, National Univ. of Singapore<br />

Tan, Chew-Lim, National Univ. of Singapore<br />

Text frame classification is needed in many applications such as event identification, exact event boundary identification, navigation, and video surveillance in multimedia. To the best of our knowledge, no methods solely dedicated to text frame classification have been reported so far. Hence this paper presents a new approach to text frame classification in video based on capturing local observable edge properties of text frames, by virtue of the strong presence of sharp edges, the straight appearance of edges and the consistent proximity between edges. The approach initially classifies the blocks of the frame into text blocks and non-text blocks. True text blocks are then identified among the classified text blocks using the proposed features. If the frame contains at least one true text block, it is considered a text frame; otherwise it is a non-text frame. We evaluate the proposed approach on a large database containing both text and non-text frames and on publicly available data at two levels, i.e., estimating recall and precision at the block level and the frame level.

13:30-16:30, Paper WeBCT9.18<br />

Image Matching and Retrieval by Repetitive Patterns<br />

Doubek, Petr, Czech Tech. Univ. in Prague

Matas, Jiri, Czech Tech. Univ. in Prague

Perdoch, Michal, Czech Tech. Univ. in Prague

Chum, Ondrej, Czech Tech. Univ. in Prague

Detection of repetitive patterns in images has been studied for a long time in computer vision. This paper discusses a method for representing a lattice or line pattern by a shift-invariant descriptor of the repeating element. The descriptor overcomes the shift ambiguity and can be matched between different views. The pattern matching is then demonstrated in a retrieval experiment, where different images of the same buildings are retrieved solely by their repetitive patterns.

13:30-16:30, Paper WeBCT9.19<br />

An Approach for Recognizing Text Labels in Raster Maps<br />

Chiang, Yao-Yi, USC ISI<br />

Knoblock, Craig, USC ISI<br />

Text labels in raster maps provide valuable geospatial information by associating geographical names with geospatial locations.<br />

Although present commercial optical character recognition (OCR) products can achieve a high recognition rate<br />

on documents containing text lines of the same orientation, text recognition on raster maps is challenging due to the varying<br />

text orientations and the overlap of text labels. This paper presents a text recognition approach that focuses on locating individual<br />

text labels in the map and detecting their orientations to then leverage the horizontal text recognition capability<br />

of commercial OCR software. We show that our approach detects accurate string orientations and achieves 96.2% precision<br />

and 94.7% recall on character recognition and 80.6% precision and 84.1% recall on word recognition.<br />

13:30-16:30, Paper WeBCT9.20<br />

Local Visual Pattern Indexing for Matching Screenshots with Videos<br />

Poullot, Sebastien, National Inst. of Informatics<br />

Satoh, Shin’Ichi, National Inst. of Informatics<br />

In this paper a particular issue is addressed: matching still images (screenshots) with videos. A content-based similarity search approach using image queries is proposed. A fast method based on local visual patterns is employed for both matching and indexing. However, we argue that using every frame may limit the scalability of the approach; therefore only keyframes are extracted and used. The main contribution of this paper is an investigation of the trade-off between accuracy and scalability using different keyframe rates for sampling the video database. This trade-off is evaluated on a ground truth using a large reference video database (1000 hours).

13:30-16:30, Paper WeBCT9.21<br />

Suggesting Songs for Media Creation using Semantics<br />

Joshi, Dhiraj, Kodak Res. Lab.<br />

Wood, Mark, Eastman Kodak Company<br />

Luo, Jiebo, -<br />

In this paper, we describe a method for matching song lyrics with semantic annotations of picture collections in order to<br />

suggest songs that reflect picture content in lyrics or genre. Picture collections are first analyzed to extract a variety of semantic<br />

information including scene type, event type, and geospatial information. When aggregated over a picture collection,<br />

this semantic information forms a semantic signature of the collection. Typical picture collections in our scenario consist<br />

of photo subdirectories in which people store pictures of a place, activity, or event. Picture collections are expected to<br />

contain coherent semantic content describing in part or whole the event or activity they depict. The semantic signature of<br />

a picture collection is compared against song lyrics using a WordNet expansion based text matching to find songs relevant<br />

to the collection. We present interesting song suggestions, compare and contrast scenarios with human versus machine labels,<br />

and perform a user study to validate the usefulness of the proposed method. The proposed method will be a useful<br />

tool to support user media creation.<br />

13:30-16:30, Paper WeBCT9.22<br />

Color Feature based Approach for Determining Ink Age in Printed Documents<br />

Halder, Biswajit, Mallabhum Inst. of Tech.<br />

Garain, Utpal, Indian Statistical Inst.<br />



Answering a query such as when a particular document was printed is quite helpful in practice, especially for forensic purposes. This study attempts to develop a general framework that makes use of image processing and pattern recognition principles for ink age determination in printed documents. The approach first computationally extracts a set of suitable color features and then analyzes them to properly associate them with ink age. Finally, a neural net is designed and trained to determine the ages of unknown samples. The dataset used for the present experiment consists of the cover pages of LIFE magazines published between the 1930s and the 70s (five decades). Test results show that the framework is a viable means of involving machines in assisting human experts in determining the age of printed documents.

13:30-16:30, Paper WeBCT9.23<br />

Automatic Detection and Localization of Natural Scene Text in Video<br />

Huang, Xiaodong, Beijing Univ. of Posts and Telecommunications<br />

Ma, Huadong, Beijing Univ. of Posts and Telecommunications<br />

Video scene text contains semantic information and thus can contribute significantly to video indexing and summarization. However, most previous approaches to detecting scene text in videos have difficulties in handling text with various character sizes and alignments. In this paper, we propose a novel algorithm for scene text detection and localization in video. Based on our observation that text character strokes show intensive edge details in fixed orientations regardless of text alignment and size, a stroke map is first generated. For scene text detection, we extract the texture feature of the stroke map to locate text lines. The detected scene text lines are then accurately localized by using Harris corners in the stroke map. Experimental results show that this approach is robust and can be effectively applied to scene text detection and localization in video.

13:30-16:30, Paper WeBCT9.24<br />

High-Level Feature Extraction using SIFT GMMs and Audio Models<br />

Inoue, Nakamasa, Tokyo Inst. of Tech.<br />

Saito, Tatsuhiko, Tokyo Inst. of Tech.<br />

Shinoda, Koichi, Tokyo Inst. of Tech.<br />

Furui, Sadaoki,<br />

We propose a statistical framework for high-level feature extraction that uses SIFT Gaussian mixture models (GMMs)<br />

and audio models. SIFT features were extracted from all the image frames and modeled by a GMM. In addition, we used<br />

mel-frequency cepstral coefficients and ergodic hidden Markov models to detect high-level features in audio streams. The<br />

best result obtained by using SIFT GMMs in terms of mean average precision on the TRECVID 2009 corpus was 0.150<br />

and was improved to 0.164 by using audio information.<br />

13:30-16:30, Paper WeBCT9.25<br />

Pairwise Features for Human Action Recognition<br />

Ta, Anh Phuong, Univ. de Lyon, CNRS, INSA-Lyon, LIRIS<br />

Wolf, Christian, INSA de Lyon<br />

Lavoue, Guillaume, Univ. de Lyon, CNRS<br />

Baskurt, Atilla, LIRIS, INSA Lyon<br />

Jolion, Jolion, Univ. de Lyon<br />

Existing action recognition approaches mainly rely on the discriminative power of individual local descriptors extracted from spatio-temporal interest points (STIPs), while the geometric relationships among the local features are ignored. This paper presents new features, called pairwise features (PWF), which encode both the appearance and the spatio-temporal relations of the local features for action recognition. First STIPs are extracted, then PWFs are constructed by grouping pairs of STIPs which are both close in space and close in time. We propose a combination of two codebooks for video representation. Experiments on two standard human action datasets, the KTH dataset and the Weizmann dataset, show that the proposed approach outperforms most existing methods.

13:30-16:30, Paper WeBCT9.26<br />

Group Activity Recognition by Gaussian Processes Estimation<br />

Cheng, Zhongwei, Chinese Acad. of Sciences<br />

Qin, Lei, Chinese Acad. of Sciences<br />



Huang, Qingming, Chinese Acad. of Sciences<br />

Jiang, Shuqiang, Chinese Acad. of Sciences<br />

Tian, Qi, Univ. of Texas at San Antonio<br />

Human action recognition has been well studied recently, but recognizing the activities of more than three persons remains<br />

a challenging task. In this paper, we propose a motion trajectory based method to classify human group activities. Gaussian<br />

Processes are introduced to represent human motion trajectories from a probabilistic perspective to handle the variability<br />

of people’s activities in group. With respect to the relationships of persons in group activities, three discriminative descriptors<br />

are designed, which are Individual, Dual and Unitized Group Activity Pattern. We adopt the Bag of Words approach<br />

to solve the problem of unbalanced number of persons in different activities. Experiments are conducted on the<br />

human group-activity video database, and the results show that our approach outperforms the state-of-the-art.<br />

13:30-16:30, Paper WeBCT9.27<br />

Extracting Captions in Complex Background from Videos<br />

Liu, Xiaoqian, Chinese Acad. of Sciences<br />

Wang, Weiqiang, Chinese Acad. of Sciences<br />

Captions in videos play a significant role for automatically understanding and indexing video content, since much semantic<br />

information is associated with them. This paper presents an effective approach to extracting captions from videos, in which<br />

multiple different categories of features (edge, color, stroke etc.) are utilized, and the spatio-temporal characteristics of<br />

captions are considered. First, our method exploits the distribution of gradient directions to decompose a video into a sequence<br />

of clips temporally, so that each clip contains a caption at most, which makes the successive extraction computation<br />

more efficient and accurate. For each clip, the edge and corner information are then utilized to locate text regions. Further,<br />

text pixels are extracted based on the assumption that text pixels in text regions always have homogeneous color, and their<br />

quantity dominates the region relative to non-text pixels with different colors. Finally, the segmentation results are further<br />

refined. The encouraging experimental results on 2565 characters have preliminarily validated our approach.<br />

13:30-16:30, Paper WeBCT9.28<br />

Keyframe-Guided Automatic Non-Linear Video Editing<br />

Rajgopalan, Vaishnavi, Concordia Univ.<br />

Ranganathan, Ananth, Honda Res. Inst. USA<br />

Rajagopalan, Ramgopal, Res. in Motion<br />

Mudur, Sudhir, Concordia Univ.<br />

We describe a system for generating coherent movies from a collection of unedited videos. The generation process is<br />

guided by one or more input keyframes, which determine the content of the generated video. The basic mechanism involves<br />

similarity analysis using the histogram intersection function. The function is applied to spatial pyramid histograms computed<br />

on the video frames in the collection using Dense SIFT features. A two-directional greedy path finding algorithm is<br />

used to select and arrange frames from the collection while maintaining visual similarity, coherence, and continuity. Our<br />

system demonstrates promising results on large video collections and is a first step towards increased automation in nonlinear<br />

video editing.<br />
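
A small illustration of the similarity machinery mentioned above: a spatial pyramid histogram built from quantized dense-SIFT word indices and the histogram intersection function (bin counts, pyramid levels and names are our assumptions, not the system's exact configuration):

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity between two (normalised) histograms: sum of bin-wise minima."""
    return np.minimum(h1, h2).sum()

def spatial_pyramid_histogram(word_map, vocab_size, levels=2):
    """Concatenate visual-word histograms over a pyramid of grid cells.
    `word_map` is an HxW array of quantised dense-SIFT word indices."""
    h, w = word_map.shape
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level
        for i in range(cells):
            for j in range(cells):
                cell = word_map[i * h // cells:(i + 1) * h // cells,
                                j * w // cells:(j + 1) * w // cells]
                hist = np.bincount(cell.ravel(), minlength=vocab_size).astype(float)
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

# similarity between a keyframe and a candidate frame from the collection:
# sim = histogram_intersection(spatial_pyramid_histogram(wm_a, 200),
#                              spatial_pyramid_histogram(wm_b, 200))
```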

13:30-16:30, Paper WeBCT9.29<br />

Images in News<br />

Sankaranarayanan, Jagan, Univ. of Maryland<br />

Samet, Hanan, Univ. of Maryland<br />

A system, called NewsStand, is introduced that automatically extracts images from news articles. The system takes RSS feeds of news articles and applies an online clustering algorithm so that articles belonging to the same news topic can be associated with the same cluster. Using the feature vector associated with the cluster, the images from the news articles that form the cluster are extracted. First, the caption text associated with each of the images embedded in the news article is determined. This is done by analyzing the structure of the news article's HTML page. If the caption and the feature vector of the cluster are found to contain keywords in common, then the image is added to an image repository. Additional meta-information is then associated with each image, such as the caption, cluster features, names of people in the news article, etc. A very large repository containing more than 983k images from 12 million news articles was built using this approach. This repository also contains more than 86.8 million keywords associated with the images. The key contribution of this work is that it combines clustering and natural language processing tasks to automatically create a large corpus of news images with good-quality tags or meta-information so that interesting vision tasks can be performed on it.



13:30-16:30, Paper WeBCT9.30<br />

A Multimodal Approach to Violence Detection in Video Sharing Sites<br />

Giannakopoulos, Theodoros, Univ. of Athens<br />

Pikrakis, Aggelos, Univ. of Piraeus<br />

Theodoridis, Sergios, Univ. of Athens<br />

This paper presents a method for detecting violent content in video sharing sites. The proposed approach operates on a fusion<br />

of three modalities: audio, moving image and text data, the latter being collected from the accompanying user comments.<br />

The problem is treated as a binary classification task (violent vs non-violent content) on a 9-dimensional feature<br />

space, where 7 out of 9 features are extracted from the audio stream. The proposed method has been evaluated on 210<br />

YouTube videos and the overall accuracy has reached 82%.<br />

13:30-16:30, Paper WeBCT9.31<br />

Video Retrieval based on Tracked Features Quantization<br />

Kubo, Hiroaki, Keio Univ.

Pilet, Julien, Keio Univ.

Satoh, Shin’Ichi, National Inst. of Informatics

Saito, Hideo, Keio Univ.

In this paper, we present an image retrieval method based on feature tracking. Feature tracks are summarized into a compact discrete value and used for video indexing purposes. As opposed to existing space-time features, we do not make any assumption about the motion visible in the indexed videos. As a result, given an example query, our system is able to retrieve related videos from a large database. We evaluated our system with the copy detection benchmark MUSCLE-VCD-2007. We also ran retrieval experiments on hours of TV broadcast.

13:30-16:30, Paper WeBCT9.32<br />

Interactive Web Video Advertising with Context Analysis and Search<br />

Wang, Bo, Chinese Acad. of Sciences<br />

Wang, Jinqiao, Chinese Acad. of Sciences<br />

Duan, Lingyu, Peking Univ.<br />

Tian, Qi, Univ. of Texas at San Antonio<br />

Lu, Hanqing, Chinese Acad. of Sciences<br />

Gao, Wen, Peking Univ.

Online media services and electronic commerce have been booming recently. Previous studies have been devoted to contextual advertising, but few works deal with interactive web advertising. In this paper, we propose to put users in the loop of collecting contextual ad information through an interaction process, establishing semantic ad links across media platforms. Given an ad video, the key frames with explicit product information are located, which allows users to click favorite key frames to search for ads interactively. A three-stage contextual search is applied to find relevant products or services from web pages, i.e., searching for visually similar product images on shopping websites, ranking product tags by text aggregation, and re-searching textual items consisting of semantically meaningful tags to make a recommendation. In addition, users can choose automatically suggested keywords to reflect their intentions. A subjective evaluation has demonstrated the effectiveness of the proposed approach to interactive video advertising over the Web.

13:30-16:30, Paper WeBCT9.33<br />

Selection of Photos for Album Building Applications<br />

Egorova, Marta, National Nuclear Res. Univ.<br />

Safonov, Ilia, National Nuclear Res. Univ.<br />

In this work we propose a new algorithm for the selection of high-quality photos for album building applications. We describe how to select features for detecting well-exposed, sharp and artifact-free photos. We considered two approaches: the first, a typical approach in which all features are used in a single AdaBoost classifier committee, and the second, a decision tree including three committees. Careful analysis of the features and of the decision tree construction allowed better outcomes to be reached.



13:30-16:30, Paper WeBCT9.34<br />

Comparison of Multidimensional Data Access Methods for Feature-Based Image Retrieval<br />

Arslan, Serdar, Middle East Tech. Univ.<br />

Açar, Esra, Middle East Tech. Univ.<br />

Saçan, Ahmet, Middle East Tech. Univ.<br />

Toroslu, Ismail Hakkı , Middle East Tech. Univ.<br />

Yazıcı, Adnan, Middle East Tech. Univ.<br />

Within the scope of information retrieval, efficient similarity search in large document or multimedia collections is a<br />

critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem,<br />

including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy<br />

trade-offs for each of these methods are demonstrated on a large Corel image database. Similarity of images is obtained<br />

via a feature-based similarity measure using four MPEG-7 low-level descriptors. We show that an optimization of feature<br />

contributions to the distance measure can identify irrelevant features and is necessary to obtain the maximum accuracy.<br />

We further show that using multidimensional scaling can achieve comparable accuracy, while speeding-up the query times<br />

significantly by allowing the use of spatial access methods.<br />

13:30-16:30, Paper WeBCT9.35<br />

A Pixel-Based Evaluation Method for Text Detection in Color Images<br />

Anthimopoulos, Marios, National Center for Scientific Res. “Demokritos”<br />

Vlissidis, Nikolaos, National Center for Scientific Res. “Demokritos”<br />

Gatos, B., National Center for Scientific Res. “Demokritos”<br />

This paper proposes a performance evaluation method for text detection in color images. The method, contrary to previous<br />

approaches, is not based on the inexplicitly defined text bounding boxes for the evaluation of the text detection result but<br />

considers only the text pixels detected by binarizing the image and applying a color inversion if needed. Moreover, in<br />

order to gain independence from the chosen binarization algorithm, the method uses the skeleton of the binarized image.<br />

The results produced by the proposed evaluation protocol proved to be quite representative and reasonable compared to<br />

the corresponding optical result.<br />

13:30-16:30, Paper WeBCT9.36<br />

Active Boosting for Interactive Object Retrieval<br />

Lechervy, Alexis, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />

Gosselin, Philippe Henri, CNRS<br />

Precioso, Frederic, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />

This paper presents a new algorithm based on boosting for interactive object retrieval in images. Recent works propose<br />

online boosting algorithms where weak classifier sets are iteratively trained from data. These algorithms are proposed for<br />

visual tracking in videos, and are not well adapted to online boosting for interactive retrieval. We propose in this paper to<br />

iteratively build weak classifiers from images, labeled as positive by the user during a retrieval session. A novel active<br />

learning strategy for the selection of images for user annotation is also proposed. This strategy is used to enhance the<br />

strong classifier resulting from boosting process, but also to build new weak classifiers. Experiments have been carried<br />

out on a generalist database in order to compare the proposed method to an SVM-based reference approach.

13:30-16:30, Paper WeBCT9.37<br />

Geotagged Photo Recognition using Corresponding Aerial Photos with Multiple Kernel Learning<br />

Yaegashi, Keita, Univ. of Electro-Communications

Yanai, Keiji, Univ. of Electro-Communications

In this paper, we address generic object recognition for geotagged images. As a recognition method for geotagged photos, we have previously proposed exploiting aerial photos around the geotagged places as additional image features for the visual recognition of geotagged photos. In the previous work, to fuse the two kinds of features, we simply concatenated them. Instead, in this paper, we introduce Multiple Kernel Learning (MKL) to integrate features of both photos and aerial images. MKL can estimate the contribution weights for integrating both kinds of features. In the experiments, we confirmed the effectiveness of using aerial photos for the recognition of geotagged photos, and we evaluated the weights of both features estimated by MKL for eighteen concepts.



13:30-16:30, Paper WeBCT9.38<br />

Efficient Semantic Indexing for Image Retrieval<br />

Pulla, Chandrika, International Inst. of Information Tech. Hyderabad<br />

Karthik, Suman, International Inst. of Information Tech. Hyderabad<br />

Jawahar, C. V., IIIT<br />

Semantic analysis of a document collection can be viewed as an unsupervised clustering of the constituent words and documents around hidden or latent concepts. This has been shown to improve the performance of visual bag-of-words in image retrieval. However, the enhancement in performance depends heavily on the right choice of the number of semantic concepts. Most semantic indexing schemes are also computationally costly. In this paper, we employ a bipartite graph model (BGM) for image retrieval. BGM is a scalable data structure that aids semantic indexing in an efficient manner, and it can also be incrementally updated. BGM uses tf-idf values for building a semantic bipartite graph. We also introduce a graph partitioning algorithm that works on the BGM to retrieve semantically relevant images from a database. We demonstrate the properties as well as the performance of our semantic indexing scheme through a series of experiments, and we also compare our method with incremental pLSA.
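
A toy sketch of building a tf-idf-weighted bipartite graph between images and visual words (networkx is used purely for illustration; the BGM's actual data structure and the partitioning algorithm are not reproduced here):

```python
import numpy as np
import networkx as nx

def build_tfidf_bipartite_graph(doc_word_counts):
    """Bipartite graph between images ("documents") and visual words,
    with tf-idf weights on the edges.
    `doc_word_counts` is an (n_docs, vocab_size) count matrix."""
    counts = np.asarray(doc_word_counts, dtype=float)
    n_docs, vocab = counts.shape
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))
    tfidf = tf * idf

    g = nx.Graph()
    g.add_nodes_from((f"doc{i}" for i in range(n_docs)), bipartite=0)
    g.add_nodes_from((f"word{j}" for j in range(vocab)), bipartite=1)
    for i, j in zip(*np.nonzero(tfidf)):
        g.add_edge(f"doc{i}", f"word{j}", weight=tfidf[i, j])
    return g
```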

13:30-16:30, Paper WeBCT9.39<br />

Improving and Aligning Speech with Presentation Slides<br />

Swaminathan, Ranjini, Univ. of Arizona<br />

Thompson, Michael E., Univ. of Arizona<br />

Fong, Sandiway, Univ. of Arizona<br />

Efrat, Alon, Univ. of Arizona<br />

Amir, Arnon<br />

Barnard, Kobus, Univ. of Arizona<br />

We present a novel method to correct automatically generated speech transcripts of talks and lecture videos using text<br />

from accompanying presentation slides. The approach finesses the challenges of dealing with technical terms which are<br />

often outside the vocabulary of speech recognizers. Further, we align the transcript to the slide word sequence so that we<br />

can improve the organization of closed captioning for hearing impaired users, and improve automatic highlighting or magnification<br />

for visually impaired users. For each speech segment associated with a slide, we construct a sequential Hidden<br />

Markov Model for the observed phonemes that follows slide word order, interspersed with text not on the slide. Incongruence<br />

between slide words and mistaken transcript words is accounted for using phoneme confusion probabilities. Hence,<br />

transcript words different from aligned high probability slide words can be corrected. Experiments on six talks show improvement<br />

in transcript accuracy and alignment with slide words.<br />

13:30-16:30, Paper WeBCT9.40<br />

The ImageCLEF Medical Retrieval Task at ICPR 2010 - Information Fusion

Kalpathy-Cramer, Jayashree, Oregon Health & Science Univ.<br />

Müller, Henning, Univ. of Applied Sciences<br />

An increasing number of clinicians, researchers, educators and patients routinely search for medical information on the Internet as well as in image archives. However, image retrieval is far less understood and developed than text-based search. The ImageCLEF medical image retrieval task is an international benchmark that enables researchers to assess and compare techniques for medical image retrieval using standard test collections. Although text retrieval is mature and well researched, it is limited by the quality and availability of the annotations associated with the images. Advances in computer vision have led to methods for using the image itself as a search entity. However, the success of purely content-based techniques has been limited, and these systems have not had much clinical success. On the other hand, a combination of text- and content-based retrieval can achieve improved retrieval performance if combined effectively. Based on experience in ImageCLEF, combining visual and textual runs is not trivial. The goal of the fusion challenge at ICPR is to encourage participants to combine visual and textual results to improve search performance. Participants were provided with textual and visual runs, as well as the results of the manual judgments from ImageCLEFmed 2008, as training data. The goal was to combine textual and visual runs from 2009. In this paper, we present the results from this ICPR contest.
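
For readers new to run fusion, a generic late-fusion baseline of the kind participants might start from (a weighted sum of min-max normalized scores; entirely illustrative, not a method provided by the contest):

```python
import numpy as np

def fuse_runs(text_scores, visual_scores, alpha=0.7):
    """Late fusion of a textual and a visual retrieval run by a weighted sum
    of min-max normalised scores. Each `*_scores` maps image id -> score."""
    def normalise(scores):
        vals = np.array(list(scores.values()), dtype=float)
        lo, hi = vals.min(), vals.max()
        return {k: (v - lo) / (hi - lo + 1e-12) for k, v in scores.items()}

    t, v = normalise(text_scores), normalise(visual_scores)
    ids = set(t) | set(v)
    fused = {i: alpha * t.get(i, 0.0) + (1 - alpha) * v.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)   # ranked image ids

# ranking = fuse_runs({"img1": 3.2, "img2": 1.1}, {"img1": 0.4, "img3": 0.9})
```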

13:30-16:30, Paper WeBCT9.41<br />

Unified Approach to Detection and Identification of Commercial Films by Temporal Occurrence Pattern<br />

Putpuek, Narongsak, Chulalongkorn Univ.<br />



Cooharojananone, Nagul, Chulalongkorn Univ.<br />

Lursinsap, Chidchanok, Chulalongkorn Univ.<br />

Satoh, Shin’Ichi, National Inst. of Informatics<br />

In this paper, we propose a method to detect and identify commercial films in broadcast videos by using the Temporal Occurrence Pattern (TOP). Our method uses the characteristic of broadcast videos in Japan that each individual commercial film appears multiple times in the broadcast stream and typically has the same duration (e.g., 15 seconds). Using this characteristic, the method can detect as well as identify individual commercial films within a given video archive. Based on a simple signature (global feature) for each frame image, the method first puts all frames into a number of buckets, where each bucket contains frames having the same signature and thus appearing the same. For each bucket, a TOP, a binary sequence representing the occurrence times within the video archive, is then generated. All buckets are then clustered using simple hierarchical clustering with a similarity between TOPs that allows a possible temporal offset. This clustering stage can stitch up all frames for each commercial film and identify multiple occurrences of the same commercial film at the same time. We tested our method using an actual broadcast video archive and confirmed good performance in detecting and identifying commercial films.
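
A simplified sketch of the bucketing and TOP construction described above (signatures are assumed hashable, e.g. byte strings; the offset-tolerant similarity is a rough stand-in for the paper's measure):

```python
import numpy as np
from collections import defaultdict

def build_temporal_occurrence_patterns(frame_signatures, n_frames):
    """Group frames by an identical global signature and encode each bucket
    as a binary occurrence sequence over the archive timeline."""
    buckets = defaultdict(list)
    for t, sig in enumerate(frame_signatures):
        buckets[sig].append(t)
    tops = {}
    for sig, times in buckets.items():
        top = np.zeros(n_frames, dtype=np.uint8)
        top[times] = 1
        tops[sig] = top
    return tops

def top_similarity(a, b, max_offset=30):
    """Similarity between two TOPs allowing a small temporal offset."""
    best = 0
    for off in range(-max_offset, max_offset + 1):
        shifted = np.roll(b, off)
        best = max(best, int(np.logical_and(a, shifted).sum()))
    return best / max(int(a.sum()), 1)
```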



Technical Program for Thursday<br />

August 26, 2010





ThAT1 Marmara Hall<br />

Object Detection and Recognition - IV Regular Session<br />

Session chair: Lee, Kyoung Mu (Seoul National Univ.)<br />

09:00-09:20, Paper ThAT1.1<br />

Visual Recognition of Types of Structural Corridor Landmarks using Vanishing Points Detection and Hidden Markov Models

Park, Young-Bin, Hanyang Univ.<br />

Kim, Sung-Su, Hanyang Univ.<br />

Suh, Il Hong, Hanyang Univ.<br />

In this paper, to provide a robot with information about the structure of its environment, we propose a method to recognize types of structural corridor landmarks, such as T-junctions, L-junctions, and the end of the corridor, using vanishing-point-based visual image features and hidden Markov models. Several experimental results are presented to demonstrate the validity of the proposed approach in a real environment.

09:20-09:40, Paper ThAT1.2<br />

Multi-Object Segmentation in a Projection Plane using Subtraction Stereo<br />

Ubukata, Toru, Chuo University / CREST, JST<br />

Terabayashi, Kenji, Chuo Univ.<br />

Moro, Alessandro, Univ. of Trieste<br />

Umeda, Kazunori, Chuo Univ.<br />

We propose a method for multi-object segmentation in a projection plane. Our algorithm requires a stereo camera system<br />

called Subtraction Stereo, which extracts foreground information with a fixed stereo camera. The main contribution of this<br />

paper is how the image sequences that include partial occlusion of the foreground objects can be accurately segmented using<br />

mean shift clustering in real-time processing. The proposed method is suitable for use inside a medium-sized environment, such as a room. Finally, we segment sequences that include occlusion and show the accuracy of the proposed method.

09:40-10:00, Paper ThAT1.3<br />

Transitive Closure based Visual Words for Point Matching in Video Sequence<br />

Bhat, Srikrishna, INRIA<br />

Berger, Marie-Odile, INRIA<br />

Simon, Gilles, Nancy-Univ.<br />

Sur, Frédéric, INPL / INRIA Nancy Grand Est<br />

We present Transitive Closure based visual word formation technique for obtaining robust object representations from<br />

smoothly varying multiple views. Each one of our visual words is represented by a set of feature vectors which is obtained<br />

by performing transitive closure operation on SIFT features. We also present range-reducing tree structure to speed up the<br />

transitive closure operation. The robustness of our visual word representation is demonstrated for Structure from Motion<br />

(SfM) and location identification in video images.<br />

10:00-10:20, Paper ThAT1.4<br />

Constrained Energy Minimization for Matching-Based Image Recognition<br />

Gass, Tobias, RWTH Aachen Univ.<br />

Dreuw, Philippe, RWTH Aachen Univ.<br />

Ney, Hermann, RWTH Aachen Univ.<br />

We propose to use energy minimization in MRFs for matching-based image recognition tasks. To this end, the Tree-Reweighted Message Passing algorithm is modified by geometric constraints and used efficiently by exploiting the guaranteed monotonicity of the lower bound within a nearest-neighbor based classification framework. The constraints allow for a speedup linear in the dimensionality of the reference image, and the lower bound allows the nearest-neighbor search to be pruned optimally without losing accuracy, effectively allowing the number of optimization iterations to be increased without an effect on runtime. We evaluate our approach on well-known OCR and face recognition tasks, and on the latter we outperform the current state of the art.



10:20-10:40, Paper ThAT1.5<br />

A Re-Evaluation of Pedestrian Detection on Riemannian Manifolds<br />

Tosato, Diego, Univ. of Verona<br />

Farenzena, Michela, Univ. of Verona<br />

Cristani, Marco, Univ. of Verona<br />

Murino, Vittorio, Univ. of Verona<br />

Boosting covariance data on Riemannian manifolds has proven to be a convenient strategy in a pedestrian detection context.<br />

In this paper we show that the detection performances of the state-of-the-art approach of Tuzel et al. [7] can be greatly improved,<br />

from both a computational and a qualitative point of view, by considering practical and theoretical issues, and<br />

also allowing the estimation of occlusions in a fine-grained way. The resulting detection system reaches the best performance on the INRIA dataset, setting novel state-of-the-art results.

ThAT2 Anadolu Auditorium<br />

Classification - I Regular Session<br />

Session chair: Duin, Robert (TU Delft)<br />

09:00-09:20, Paper ThAT2.1<br />

An Optimum Class-Rejective Decision Rule and its Evaluation<br />

Le Capitaine, Hoel, Univ. of La Rochelle<br />

Frelicot, Carl, Univ. of La Rochelle<br />

Decision-making systems intend to copy human reasoning, which often consists in eliminating highly improbable situations (e.g. diseases, suspects) rather than selecting the most reliable ones. In this paper, we present the concept of class-rejective rules for pattern recognition. Contrary to usual reject option schemes, where classes are selected when they may correspond to the true class of the input pattern, such a rule discards classes that cannot be the true one. Optimality of the rule is proven and an upper bound for the error probability is given. We also propose a criterion to evaluate such class-rejective rules. Classification results on artificial and real datasets are provided.

09:20-09:40, Paper ThAT2.2<br />

A Practical Heterogeneous Classifier for Relational Databases<br />

Manjunath, Geetha, Indian Inst. of Science<br />

M, Narasimha Murty, Indian Inst. of Science<br />

Sitaram, Dinkar, Hewlett Packard Company<br />

Most enterprise data is distributed across multiple relational databases with expert-designed schemas. Using traditional single-table machine learning techniques on such data not only incurs a computational penalty for converting it to a flat form (megajoin), but also loses the human-specified semantic information present in the relations. In this paper, we present a two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied to individual database tables. A preliminary evaluation on TPCH and UCI benchmarks shows reduced training time without any loss of prediction accuracy.

09:40-10:00, Paper ThAT2.3<br />

Spatial Representation for Efficient Sequence Classification<br />

Kuksa, Pavel, Rutgers Univ.<br />

Pavlovic, Vladimir, Rutgers Univ.<br />

We present a general, simple feature representation of sequences that allows efficient inexact matching, comparison and<br />

classification of sequential data. This approach, recently introduced for the problem of biological sequence classification,<br />

exploits a novel multi-scale representation of strings. The new representation leads to discovery of very efficient algorithms<br />

for string comparison, independent of the alphabet size. We show that these algorithms can be generalized to handle a wide<br />

gamut of sequence classification problems in diverse domains such as the music and text sequence classification. The presented<br />

algorithms offer low computational cost and highly scalable implementations across different application domains.<br />

The new method demonstrates order-of-magnitude running time improvements over existing state-of-the-art approaches

while matching or exceeding their predictive accuracy.
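
For illustration only, a basic k-mer (spectrum) feature map shows the kind of inexact-matching-friendly sequence representation involved; the paper's multi-scale representation and its alphabet-independent algorithms are more elaborate than this sketch:

from collections import Counter

def kmer_features(sequence, k=3):
    """Count the k-length substrings (k-mers) of a sequence.

    A simple spectrum-style representation: sequences become sparse
    count vectors that can be compared with any standard kernel or
    classifier. This is only a baseline illustration.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_similarity(seq_a, seq_b, k=3):
    """Dot product of two k-mer count vectors (a spectrum-kernel value)."""
    fa, fb = kmer_features(seq_a, k), kmer_features(seq_b, k)
    return sum(count * fb.get(kmer, 0) for kmer, count in fa.items())

print(spectrum_similarity("abcabcab", "bcabcabc"))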

10:00-10:20, Paper ThAT2.4<br />

Rectifying Non-Euclidean Similarity Data using Ricci Flow Embedding<br />

Xu, Weiping, Univ. of York<br />

Hancock, Edwin, Univ. of York<br />

Wilson, Richard, Univ. of York<br />

Similarity based pattern recognition is concerned with the analysis of patterns that are specified in terms of object dissimilarity<br />

or proximity rather than ordinal values. For many types of data and measures, these dissimilarities are not Euclidean.<br />

This hinders the use of many machine-learning techniques. In this paper, we provide a means of correcting or rectifying<br />

the similarities so that the non-Euclidean artifacts are minimized. We consider the data to be embedded as points on a<br />

curved manifold and then evolve the manifold so as to increase its flatness. Our work uses the idea of Ricci flow on the<br />

constant curvature Riemannian manifold to modify the Gaussian curvatures on the edges of a graph representing the non-<br />

Euclidean data. We demonstrate the utility of our method on the standard "Chicken pieces" dataset and show that we can

transform the non-Euclidean distances into Euclidean space.<br />
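
A hedged baseline, not the paper's Ricci-flow method: the classical-MDS view below measures how non-Euclidean a dissimilarity matrix is (negative eigenvalues of the centred Gram matrix) and rectifies it by simply clipping them:

import numpy as np

def euclidean_embedding_by_clipping(D):
    """Check how non-Euclidean a dissimilarity matrix is and embed it.

    Classical MDS view: double-centre -0.5 * D**2 to obtain a Gram
    matrix; negative eigenvalues measure the non-Euclidean part.
    Clipping them to zero is a simple rectification baseline.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix
    eigvals, eigvecs = np.linalg.eigh(G)
    negative_mass = -eigvals[eigvals < 0].sum() / np.abs(eigvals).sum()
    X = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0, None)))  # embedding
    return X, negative_mass

D = np.array([[0, 2, 5], [2, 0, 2], [5, 2, 0]], dtype=float)  # clearly non-Euclidean distances
X, frac = euclidean_embedding_by_clipping(D)
print("fraction of negative eigen-mass:", round(frac, 3))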

10:20-10:40, Paper ThAT2.5<br />

One-Vs-All Training of Prototype Classifier for Pattern Classification and Retrieval<br />

Liu, Cheng-Lin, Chinese Acad. of Sciences<br />

Prototype classifiers trained with a multi-class classification objective are inferior in pattern retrieval and outlier rejection.

To improve the binary classification (detection, verification, retrieval, outlier rejection) performance of prototype classifiers,<br />

we propose a one-vs-all training method, which enriches each prototype as a binary discriminant function with a local<br />

threshold, and optimizes both the prototype vectors and the thresholds on training data using a binary classification objective,<br />

the cross-entropy (CE). Experimental results on two OCR datasets show that prototype classifiers trained by the one-vs-all

method are superior in both multi-class classification and binary classification.
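
A minimal single-prototype sketch of the idea, assuming a discriminant of the form t - ||x - p||^2 trained with cross-entropy by gradient descent; the paper's multi-prototype formulation is not reproduced:

import numpy as np

def train_prototype_ova(X, y, lr=0.1, epochs=200):
    """One-vs-all training of a single prototype with a local threshold.

    Binary discriminant f(x) = t - ||x - p||^2 passed through a sigmoid,
    optimised with the cross-entropy loss by plain gradient descent.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)              # 1 = target class, 0 = others
    p = X[y == 1].mean(axis=0)                  # initialise prototype at class mean
    t = np.mean(np.sum((X[y == 1] - p) ** 2, axis=1))  # initial local threshold
    for _ in range(epochs):
        d2 = np.sum((X - p) ** 2, axis=1)
        prob = 1.0 / (1.0 + np.exp(-(t - d2)))  # sigmoid of the discriminant
        err = prob - y                          # gradient of cross-entropy wrt logit
        p -= lr * (2 * (err[:, None] * (X - p))).mean(axis=0)
        t -= lr * err.mean()
    return p, t

X = np.array([[0, 0], [0.2, 0.1], [3, 3], [3.1, 2.9]])
y = np.array([1, 1, 0, 0])
print(train_prototype_ova(X, y))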

ThAT3 Topkapı Hall A<br />

Computer Vision Applications - I Regular Session<br />

Session chair: Haindl, Michael (Institute of Information Theory)<br />

09:00-09:20, Paper ThAT3.1<br />

Probabilistic Modeling of Dynamic Traffic Flow across Non-Overlapping Camera Views<br />

Huang, Ching-Chun, National Chiao Tung University<br />

Chiu, Wei-Chen, Department of Computer Science<br />

Wang, Sheng-Jyh, National Chiao Tung Univ.<br />

Chuang, Jen-Hui, National Chiao Tung Univ.<br />

In this paper, we propose a probabilistic method to model the dynamic traffic flow across non-overlapping camera views.<br />

By assuming the transition time of object movement follows a certain global model, we may infer the time-varying traffic<br />

status in the unseen region without performing explicit object correspondence between camera views. In this paper, we<br />

model object correspondence and parameter estimation as a unified problem under the proposed Expectation-Maximization<br />

(EM) based framework. By treating object correspondence as a latent random variable, the proposed framework can iteratively<br />

search for the optimal model parameters with the implicit consideration of object correspondence.<br />

09:20-09:40, Paper ThAT3.2<br />

Vehicle Recognition as Changes in Satellite Imagery<br />

Ozcanli, Ozge Can, Brown Univ.<br />

Mundy, Joseph,<br />

Over the last several years, a new probabilistic representation for 3-d volumetric modeling has been developed. The main purpose of the<br />

model is to detect deviations from the normal appearance and geometry of the scene, i.e. change detection. In this paper, the model is<br />

utilized to characterize changes in the scene as vehicles. In the training stage, a compositional part hierarchy is learned to represent the<br />

geometry of Gaussian intensity extrema primitives exhibited by vehicles. In the test stage, the learned compositional model produces vehicle<br />

detections. Vehicle recognition performance is measured on low-resolution satellite imagery and detection accuracy is significantly improved<br />

over the initial change map given by the 3-d volumetric model. A PCA-based Bayesian recognition algorithm is implemented for comparison,<br />

which exhibits worse performance than the proposed method.<br />

09:40-10:00, Paper ThAT3.3<br />

Crowd Motion Analysis using Linear Cyclic Pursuit<br />

Viswanathan, Srikrishnan, I.I.T Bombay<br />

Chaudhuri, Subhasis, IIT<br />

Crowd motion analysis, where there is interdependence amongst the constituent elements, is a relatively unexplored application<br />

area in computer vision. In this work, we propose a fast method for short-term crowd motion prediction using a<br />

sparse set of particles. We study the dynamics of a crowd motion model and linear cyclic pursuit. We show that linear<br />

cyclic pursuit naturally captures the repulsive and attractive forces acting on the individual crowd member. The pursuit<br />

parameters are estimated from videos in an online manner using a feature tracker. Short-term trajectory prediction is done

by numerical solution of the estimated cyclic pursuit equation. We demonstrate the suitability of the proposed technique

through extensive experiments.

10:00-10:20, Paper ThAT3.4<br />

Integrating Object Detection with 3D Tracking towards a Better Driver Assistance System<br />

Prisacariu, Victor Adrian, Univ. of Oxford<br />

Timofte, Radu, Katholieke Univ. Leuven<br />

Zimmermann, Karel, Katholieke Univ. Leuven<br />

Reid, Ian,<br />

Van Gool, Luc<br />

Driver assistance helps save lives. Accurate 3D pose is required to establish if a traffic sign is relevant to the driver. We<br />

propose a real-time system that integrates single view detection with region-based 3D tracking of road signs. The optimal<br />

set of candidate detections is found, followed by AdaBoost cascades and SVMs. The 2D detections are then employed in<br />

simultaneous 2D segmentation and 3D pose tracking, using the known 3D model of the recognised traffic sign. We demonstrate<br />

the abilities of our system by tracking multiple road signs in real world scenarios.<br />

10:20-10:40, Paper ThAT3.5<br />

Real-Time Automatic Traffic Accident Recognition using HFG<br />

Bakheet, Samy, Otto-von-Guericke Univ. Magdeburg<br />

Al-Hamadi, Ayoub, Otto-von-Guericke Univ. Magdeburg<br />

Michaelis, Bernd, Otto-von-Guericke Univ. Magdeburg<br />

Sayed, Usama, Otto-von-Guericke Univ. Magdeburg<br />

Recently, the problem of automatic traffic accident recognition has appealed to the machine vision community due to its<br />

implications on the development of autonomous Intelligent Transportation Systems (ITS). In this paper, a new framework<br />

for real-time automated traffic accidents recognition using Histogram of Flow Gradient (HFG) is proposed. This framework<br />

performs two major steps. First, HFG-based features are extracted from video shots. Second, logistic regression is employed<br />

to develop a model for the probability of occurrence of an accident by fitting data to a logistic curve. In case of occurrence<br />

of an accident, the trajectory of the vehicle by which the accident was occasioned is determined. Preliminary results on real

video sequences confirm the effectiveness and the applicability of the proposed approach, and it can offer delay guarantees<br />

for real-time surveillance and monitoring scenarios.<br />
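
Only the second step (logistic regression on precomputed histogram features) is sketched below; the HFG extraction itself is not shown and the data are purely illustrative:

import numpy as np

def fit_logistic(H, y, lr=0.05, epochs=500):
    """Fit P(accident | HFG histogram) with logistic regression.

    H: (n_shots, n_bins) matrix of precomputed Histogram of Flow
    Gradient features; y: 1 if the shot contains an accident, else 0.
    Plain batch gradient descent on the cross-entropy loss.
    """
    H = np.hstack([H, np.ones((H.shape[0], 1))])   # append bias term
    w = np.zeros(H.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-H @ w))
        w -= lr * H.T @ (p - y) / len(y)
    return w

def accident_probability(h, w):
    """Probability of an accident for a single HFG feature vector."""
    return 1.0 / (1.0 + np.exp(-(np.append(h, 1.0) @ w)))

# Toy example with 4-bin histograms (illustrative data only).
H = np.array([[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1],
              [0.0, 0.1, 0.4, 0.5], [0.5, 0.4, 0.1, 0.0]])
y = np.array([1, 0, 1, 0])
w = fit_logistic(H, y)
print(accident_probability(np.array([0.05, 0.15, 0.35, 0.45]), w))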

ThAT4 Dolmabahçe Hall A<br />

Semi-Supervised and Metric Learning Regular Session<br />

Session chair: Sanfeliu, Alberto (Universitat Politecnica de Catalunya)<br />

09:00-09:20, Paper ThAT4.1<br />

Semi-Supervised Distance Metric Learning by Quadratic Programming<br />

Cevikalp, Hakan, Eskisehir Osmangazi Univ.<br />

This paper introduces a semi-supervised distance metric learning algorithm which uses pair-wise equivalence (similarity<br />

and dissimilarity) constraints to improve the original distance metric in lower-dimensional input spaces. We restrict ourselves<br />

to pseudo-metrics that are in quadratic forms parameterized by positive semi-definite matrices. The proposed method<br />

works in both the input space and the kernel-induced feature space, and learning the distance metric is formulated as a quadratic

optimization problem which returns a globally optimal solution. Experimental results on several databases show that the

learned distance metric improves the performance of the subsequent classification and clustering algorithms.
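
A small sketch of the building blocks such quadratic-form metrics rest on: a Mahalanobis-style pseudo-metric parameterised by a PSD matrix and a projection onto the PSD cone. The paper's actual QP over equivalence constraints is not reproduced here:

import numpy as np

def mahalanobis(x, y, M):
    """Pseudo-metric d_M(x, y) = sqrt((x - y)^T M (x - y)) for PSD M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(max(d @ M @ d, 0.0)))

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues.

    A standard ingredient of quadratic-programming metric learners.
    """
    M = 0.5 * (M + M.T)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs @ np.diag(np.clip(eigvals, 0.0, None)) @ eigvecs.T

M = project_psd(np.array([[2.0, 0.5], [0.5, -0.3]]))
print(mahalanobis([0, 0], [1, 1], M))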

09:20-09:40, Paper ThAT4.2<br />

A Comparative Study on the Use of an Ensemble of Feature Extractors for the Automatic Design of Local Image Descriptors

Carneiro, Gustavo, Tech. Univ. of Lisbon<br />

The use of an ensemble of feature spaces trained with distance metric learning methods has been empirically shown to be<br />

useful for the task of automatically designing local image descriptors. In this paper, we present a quantitative analysis<br />

which shows that in general, nonlinear distance metric learning methods provide better results than linear methods for automatically<br />

designing local image descriptors. In addition, we show that the learned feature spaces present better results<br />

than state-of-the-art hand-designed features in benchmark quantitative comparisons. We discuss the results and suggest

relevant problems for further investigation.<br />

09:40-10:00, Paper ThAT4.3<br />

A Study on Combining Sets of Differently Measured Dissimilarities<br />

Ibba, Alessandro, Delft Univ. of Tech.<br />

Duin, Robert, Delft Univ. of Tech.<br />

Lee, Wan-Jui, Delft Univ. of Tech.<br />

The ways distances are computed or measured enable us to have different representations of the same objects. In this paper<br />

we want to discuss possible ways of merging different sources of information given by differently measured dissimilarity<br />

representations. We compare here a simple averaging scheme [1] with dissimilarity forward selection and other techniques<br />

based on the learning of weights of linear and quadratic forms. Our general conclusion is that, although the more advanced<br />

forms of combination do not always lead to better classification accuracies, combining given distance matrices prior to

training is always worthwhile. We can thereby suggest which combination schemes are preferable with respect to the problem<br />

data.<br />

10:00-10:20, Paper ThAT4.4<br />

Efficient Kernel Learning from Constraints and Unlabeled Data<br />

Soleymani Baghshah, Mahdieh, Sharif Univ. of Tech.<br />

Bagheri Shouraki, Saeed, Sharif Univ. of Tech.<br />

Recently, distance metric learning has received increasing attention and has been found to be a powerful approach for semi-supervised

learning tasks. In the last few years, several methods have been proposed for metric learning when must-link

and/or cannot-link constraints as supervisory information are available. Although many of these methods learn global Mahalanobis<br />

metrics, some recently introduced methods have tried to learn more flexible distance metrics using a kernel-based

approach. In this paper, we consider the problem of kernel learning from both pairwise constraints and unlabeled<br />

data. We propose a method that adapts a flexible distance metric via learning a nonparametric kernel matrix. We formulate<br />

our method as an optimization problem that can be solved efficiently. Experimental evaluations show the effectiveness of<br />

our method compared to some recently introduced methods on a variety of data sets.<br />

10:20-10:40, Paper ThAT4.5<br />

Semi-Supervised Graph Learning: Near Strangers or Distant Relatives<br />

Chen, Weifu, Sun Yat-sen Univ.<br />

Feng, Guocan, Sun Yat-Sen Univ.<br />

In this paper, an easily implemented semi-supervised graph learning method is presented for dimensionality reduction and<br />

clustering, making the most of prior knowledge from limited pairwise constraints. We extend instance-level constraints to

space-level constraints to construct a more meaningful graph. By decomposing the (normalized) Laplacian matrix of this<br />

graph, using the bottom eigenvectors leads to new representations of the data, which are expected to capture the intrinsic

structure. The proposed method improves the previous constrained learning methods. Furthermore, to achieve a given<br />

clustering accuracy, fewer constraints are required in our method. Experimental results demonstrate the advantages of the<br />

proposed method.<br />

ThAT5 Dolmabahçe Hall B<br />

Image Segmentation - I Regular Session<br />

Session chair: Puig, Domenec (Univ. Rovira i Virgili)<br />

09:00-09:20, Paper ThAT5.1<br />

Robust Color Image Segmentation through Tensor Voting<br />

Moreno, Rodrigo, Rovira i Virgili Univ.<br />

Garcia Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />

Puig, Domenec, Univ. Rovira i Virgili<br />

This paper presents a new method for robust color image segmentation based on tensor voting, a robust perceptual grouping<br />

technique used to extract salient information from noisy data. First, an adaptation of tensor voting to both image denoising<br />

and robust edge detection is applied. Second, pixels in the filtered image are classified into likely-homogeneous and likely-inhomogeneous

by means of the edginess maps generated in the first step. Third, the likely-homogeneous pixels are segmented

through an efficient graph-based segmenter. Finally, a modified version of the same graph-based segmenter is<br />

applied to the likely-inhomogeneous pixels in order to obtain the final segmentation. Experiments show that the proposed<br />

algorithm has a better performance than the state-of-the-art.<br />

09:20-09:40, Paper ThAT5.2<br />

An Improved Fluid Vector Flow for Cavity Segmentation in Chest Radiographs<br />

Xu, Tao, Univ. of Alberta<br />

Cheng, Irene, Univ. of Alberta<br />

Mandal, Mrinal, Univ. of Alberta<br />

Fluid vector flow (FVF) is a recently developed edge-based parametric active contour model for segmentation. By keeping<br />

its merits of large capture range and ability to handle acute concave shapes, we improved the model from two aspects:<br />

edge leakage and control point selection. Experimental results of cavity segmentation in chest radiographs show that the<br />

proposed method provides at least 8% improvement over the original FVF method.<br />

09:40-10:00, Paper ThAT5.3<br />

Patchy Aurora Image Segmentation based on ALBP and Block Threshold<br />

Fu, Rong, Xidian Univ.<br />

Gao, Xinbo, Xidian Univ.<br />

Jian, Yongjun, Xidian Univ.<br />

The proportion of aurora region to the field of view is an important index to measure the range and scale of aurora. A<br />

crucial step to obtain the index is to segment aurora region from the background. A simple and efficient aurora image segmentation<br />

algorithm is proposed, which is composed of feature representation based on adaptive local binary patterns<br />

(ALBP) and aurora region estimation through block threshold. First the ALBP features of sky image are extracted and the<br />

threshold is determined. The aurora image to be segmented is then equally divided into detection blocks from which ALBP<br />

features are also extracted. Each aurora block is estimated by comparing its ALBP features with the threshold. Simple as

it is, processing of huge data sets is possible. The experiments illustrate that the segmentation results of the proposed method are

satisfactory in terms of both human visual assessment and segmentation accuracy.

10:00-10:20, Paper ThAT5.4<br />

Retinal Image Segmentation based on Mumford-Shah Model and Gabor Wavelet Filter<br />

Du, Xiaojun, Concordia Univ.<br />

Bui, Tien D., Concordia Univ.<br />

Automatic retinal image segmentation is desirable for the diagnosis of diseases such as diabetes. In this paper, we propose

a new image segmentation method to segment retinal images. The new method is based on the Mumford-Shah (MS)<br />

model. As a region-based approach, the MS model is a good segmentation technique. However, due to non-uniform illumination,<br />

some traditional approximations of the MS model cannot deal with this type of problem. We present a new

method that requires no approximations. Instead, a Gabor wavelet filter is used, and the method can segment objects with

complicated image intensity distribution. The method is used to detect blood vessels in retinal images. The results are<br />

comparable with or better than the state-of-the-art. Our method requires no training and is relatively fast.

10:20-10:40, Paper ThAT5.5<br />

On Selecting an Optimal Number of Clusters for Color Image Segmentation<br />

Le Capitaine, Hoel, Univ. of La Rochelle<br />

Frelicot, Carl, Univ. of La Rochelle<br />

This paper addresses the problem of region-based color image segmentation using a fuzzy clustering algorithm, e.g. a<br />

spatial version of fuzzy c-means, in order to partition the image into clusters corresponding to homogeneous regions. We<br />

propose to determine the optimal number of clusters, and so the number of regions, by using a new cluster validity index<br />

computed on fuzzy partitions. Experimental results and comparison with other existing methods show the validity and the<br />

efficiency of the proposed method.<br />
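
As an illustration of selecting the number of clusters with a validity index, the sketch below runs plain fuzzy c-means (without the spatial term) over candidate values of c and scores each fuzzy partition with Bezdek's partition coefficient; the paper's spatial FCM and its new index, introduced precisely because standard indices have known biases, are not reproduced:

import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means (no spatial term) returning the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return U

def partition_coefficient(U):
    """Bezdek's partition coefficient: higher means a crisper partition."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

def best_number_of_clusters(X, c_range=range(2, 7)):
    """Select c maximising the validity index over candidate cluster counts."""
    scores = {c: partition_coefficient(fuzzy_cmeans(X, c)) for c in c_range}
    return max(scores, key=scores.get), scores

X = np.vstack([np.random.default_rng(1).normal(mu, 0.1, (30, 3)) for mu in (0.2, 0.5, 0.8)])
print(best_number_of_clusters(X))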

ThAT6 Topkapı Hall B<br />

Face Ageing Regular Session<br />

Session chair: Yanikoglu, Berrin (Sabanci Univ.)<br />

09:00-09:20, Paper ThAT6.1<br />

Cross-Age Face Recognition on a Very Large Database: The Performance versus Age Intervals and Improvement<br />

using Soft Biometric Traits<br />

Guo, Guodong, West Virginia Univ.<br />

Mu, Guowang, North Carolina Central Univ.<br />

Ricanek, Karl, Univ. of North Carolina<br />

Facial aging can degrade the face recognition performance dramatically. Traditional face recognition studies focus on<br />

dealing with pose, illumination, and expression (PIE) changes. Considering a large span of age difference, the influence<br />

of facial aging could be very significant compared to the PIE variations. How big could the aging influence be? What is

the relation between recognition accuracy and age intervals? Can soft biometrics be used to improve the face recognition<br />

performance under age variations? In this paper we address all these issues. First, we investigate the face recognition performance<br />

degradation with respect to age intervals between the probe and gallery images on a very large database which<br />

contains about 55,000 face images of more than 13,000 individuals. Second, we study if soft biometric traits, e.g., race,<br />

gender, height, and weight, could be used to improve the cross-age face recognition accuracies, and how useful each of<br />

them could be.<br />

09:20-09:40, Paper ThAT6.2<br />

A Ranking Approach for Human Age Estimation based on Face Images<br />

Chang, Kuang-Yu, Acad. Sinica<br />

Chen, Chu-Song, Acad. Sinica<br />

Hung, Yi-Ping, National Taiwan Univ.<br />

In our daily life, it is much easier to tell which of two persons is older than to tell how old a person is. When

inferring a person’s age, we may compare his or her face with many people whose ages are known, resulting in a series of<br />

comparative results, and then we conjecture the age based on the comparisons. This process involves numerous pairwise<br />

preference information obtained by a series of queries, where each query compares the target person’s face to those faces

in a database. In this paper, we propose a ranking-based framework consisting of a set of binary queries. Each query<br />

collects a binary-classification-based comparison result. All the query results are then fused to predict the age. Experimental<br />

results show that our approach performs better than traditional multi-class-based and regression-based approaches for age<br />

estimation.<br />
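
A toy sketch of fusing binary "is older than" queries into an age estimate; the comparator below is a stand-in callable, whereas the paper learns it as a binary classifier on face features:

import numpy as np

def estimate_age(query, references, reference_ages, is_older):
    """Fuse binary 'does the query look older than this reference?' queries.

    is_older(query, reference) is a stand-in for a learned binary
    comparator. The fused estimate is placed between the ages of the
    references the query is judged older / younger than.
    """
    ages = np.sort(np.asarray(reference_ages, dtype=float))
    older_count = sum(bool(is_older(query, r)) for r in references)
    if older_count == 0:
        return float(ages[0])
    if older_count == len(ages):
        return float(ages[-1])
    return float(0.5 * (ages[older_count - 1] + ages[older_count]))

# Toy stand-in: "features" are scalar apparent-age scores.
refs = [5.0, 12.0, 20.0, 35.0, 50.0]
ref_ages = [6, 13, 21, 36, 52]
print(estimate_age(27.0, refs, ref_ages, is_older=lambda q, r: q > r))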

09:40-10:00, Paper ThAT6.3<br />

Perceived Age Estimation under Lighting Condition Change by Covariate Shift Adaptation<br />

Ueki, Kazuya, NEC Soft, Ltd.<br />

Sugiyama, Masashi, Tokyo Inst. of Tech.<br />

Ihara, Yasuyuki, NEC Soft, Ltd.<br />

Over recent years, a great deal of effort has been devoted to age estimation from face images. It has been reported that

age can be accurately estimated under controlled environments such as frontal faces, no expression, and static lighting conditions.

However, it is not straightforward to achieve the same accuracy level in a real-world environment because of considerable

variations in camera settings, facial poses, and illumination conditions. In this paper, we apply a recently proposed

machine learning technique called covariate shift adaptation to alleviate the lighting condition change between laboratory

and practical environments. Through real-world age estimation experiments, we demonstrate the usefulness of our proposed

method.<br />
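
A minimal sketch of the covariate-shift idea: reweight training samples by an estimated density ratio p_test(x)/p_train(x) and fit an importance-weighted model. Parzen densities and ridge regression are used here only to keep the example short; they are assumptions, not the paper's estimator:

import numpy as np

def importance_weights(X_train, X_test, bandwidth=1.0):
    """Estimate w(x) = p_test(x) / p_train(x) with simple kernel density estimates.

    Covariate shift adaptation reweights the training loss by w(x) so
    that a model fitted on training data behaves well under the test
    input distribution.
    """
    def kde(points, queries):
        d2 = np.sum((queries[:, None, :] - points[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)

    return kde(X_test, X_train) / (kde(X_train, X_train) + 1e-12)

def weighted_ridge(X, y, w, lam=1e-3):
    """Importance-weighted ridge regression (e.g. age as the target)."""
    W = np.diag(w)
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ W @ y)

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(0, 1, (100, 2)), rng.normal(1, 1, (50, 2))
y_train = X_train @ np.array([3.0, -1.0]) + rng.normal(0, 0.1, 100)
w = importance_weights(X_train, X_test)
print(weighted_ridge(X_train, y_train, w))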

10:00-10:20, Paper ThAT6.4<br />

Ranking Model for Facial Age Estimation<br />

Yang, Peng, Rutgers Univ.<br />

Lin, Zhong, Rutgers Univ.<br />

Metaxas, Dimitris, Rutgers Univ.<br />

Feature design and feature selection are two key problems in facial-image-based age perception. In this paper, we propose

to use a ranking model to perform feature selection on Haar-like features. In order to build the pairwise samples for the ranking

model, age sequences are organized by personal aging pattern within each subject. The pairwise samples are extracted<br />

from the sequence of each subject. Therefore, the order information is intuitively contained in the pairwise data. The ranking

model is used to select the discriminative features based on the pairwise data. The combination of the ranking model and

personal aging pattern is powerful for selecting the discriminative features for age estimation. Based on the selected features,

different kinds of regression models are used to build prediction models. The experimental results show that the performance of

our method is comparable to state-of-the-art works.

10:20-10:40, Paper ThAT6.5<br />

Development of Recognition Engine for Baby Faces<br />

Di, Wen, Tsinghua Univ.<br />

Zhang, Tong, Hewlett-Packard Lab.<br />

Fang, Chi, Tsinghua Univ.<br />

Ding, Xiaoqing, Tsinghua Univ.<br />

Existing face recognition approaches are mostly developed based on adult faces which may not work well in distinguishing<br />

faces of kids. Especially, baby faces tend to have common features such as round cheeks and chins, so that current face<br />

recognition engines often fail to differentiate them. In this paper, we present methods for discriminating baby faces from<br />

adult faces, and for training a special engine to recognize faces of different babies. To achieve these, we collected a huge<br />

number of baby face images and developed a software system to annotate the image database. Experimental results prove<br />

that the trained baby face recognizer achieves a dramatic improvement in differentiating baby faces, and that its fusion

with the conventional adult face recognition engine also works well on the overall data set containing both baby and adult<br />

faces.<br />

ThAT7 Dolmabahçe Hall C<br />

Document Retrieval Regular Session<br />

Session chair: Faruquie, Tanveer (IBM Res. India)<br />

09:00-09:20, Paper ThAT7.1<br />

An Information Extraction Model for Unconstrained Handwritten Documents<br />

Thomas, Simon, LITIS<br />

Chatelain, Clement, LITIS Lab. INSA de Rouen<br />

Heutte, Laurent, Univ. de Rouen<br />

Paquet, Thierry, Univ. of Rouen<br />

In this paper, a new information extraction system by statistical shallow parsing in unconstrained handwritten documents<br />

is introduced. Unlike classical approaches found in the literature, such as keyword spotting or full document recognition, our

approach relies on a strong and powerful global handwriting model. An entire text line is considered as an indivisible entity

and is modeled with Hidden Markov Models. In this way, text line shallow parsing allows fast extraction of the relevant<br />

information in any document while rejecting at the same time irrelevant information. First results are promising and show<br />

the interest of the approach.<br />

09:20-09:40, Paper ThAT7.2<br />

HMM-Based Word Spotting in Handwritten Documents using Subword Models<br />

Fischer, Andreas, Univ. of Bern<br />

Keller, Andreas, Univ. of Bern<br />

Frinken, Volkmar, Univ. of Bern<br />

Bunke, Horst, Univ. of Bern<br />

Handwritten word spotting aims at making document images amenable to browsing and searching by keyword retrieval.<br />

In this paper, we present a word spotting system based on Hidden Markov Models (HMM) that uses trained subword models<br />

to spot keywords. With the proposed method, arbitrary keywords can be spotted that do not need to be present in the<br />

training set. Also, no text line segmentation is required. On the modern IAM off-line database and the historical George<br />

Washington database we show that the proposed system outperforms a standard template matching approach based on dynamic<br />

time warping (DTW).<br />

09:40-10:00, Paper ThAT7.3<br />

A Content Spotting System for Line Drawing Graphic Document Images<br />

Luqman, Muhammad Muzzamil, Univ. François Rabelais de Tours, France; CVC Barcelona

Brouard, Thierry, Univ. François Rabelais de Tours, France

Ramel, Jean-Yves, Univ. François Rabelais de Tours<br />

Llados, Josep, Computer Vision Center<br />

We present a content spotting system for line drawing graphic document images. The proposed system is sufficiently domain<br />

independent and takes keyword-based information retrieval for graphic documents one step forward, to Query

By Example (QBE) and focused retrieval. During offline learning mode: we vectorize the documents in the repository,<br />

represent them by attributed relational graphs, extract regions of interest (ROIs) from them, convert each ROI to a fuzzy<br />

structural signature, cluster similar signatures to form ROI classes and build an index for the repository. During online<br />

querying mode: a Bayesian network classifier recognizes the ROIs in the query image and the corresponding documents<br />

are fetched by looking up in the repository index. Experimental results are presented for synthetic images of architectural<br />

and electronic documents.<br />

10:00-10:20, Paper ThAT7.4<br />

Toward Massive Scalability in Image Matching<br />

Moraleda, Jorge, Ricoh Innovations Inc.<br />

Hull, Jonathan, Ricoh<br />

A method for image matching from partial blurry images is presented that leverages existing text retrieval algorithms to<br />

provide a solution that scales to hundreds of thousands of images. As an initial application, we present a document image<br />

matching system in which the user supplies a query image of a small patch of a paper document taken with a cell phone<br />

camera, and the system returns a label identifying the original electronic document if found in a previously indexed collection.<br />

Experimental results show that a retrieval rate of over 70% is achieved on a collection of nearly 500,000 document<br />

pages.<br />

10:20-10:40, Paper ThAT7.5<br />

Learning Image Anchor Templates for Document Classification and Data Extraction<br />

Sarkar, Prateek, Palo Alto Res. Center<br />

Image anchor templates are used in document image analysis for document classification, data localization, and other<br />

tasks. Current tools allow human operators to mark out small sub-images from documents to act as anchor templates.<br />

However, this requires time and expertise, because operators have to make informed decisions based on the behavior of the

template matching algorithms and the expected degradation patterns in documents. We propose learning templates for a

task automatically and quickly from a few training examples. Document classification or data localization can be done<br />

more robustly by combining evidence from many more discriminating templates (e.g., hundreds) than would be practicable<br />

for operators to specify.<br />

ThAT8 Upper Foyer<br />

Image Analysis; Scene Understanding; Shape Modeling; Tracking and Surveillance; Vision Sensors<br />

Poster Session<br />

Session chair: Gimel’farb, Georgy (Univ. of Auckland)<br />

09:00-11:10, Paper ThAT8.2<br />

Sparse Embedding Visual Attention Systems Combined with Edge Information<br />

Zhao, Cairong, Nanjing Univ. of Science and Tech.<br />

Liu, ChuanCai, Nanjing Univ. of Science and Tech.<br />

Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />

Yang, Jingyu, Nanjing Univ. of Science and Tech.<br />

General computational models of visual attention obtain multi-scale feature maps in terms of visual properties

like intensity, color and orientation, and then combine them to get one saliency map. But due to the lack of object edge information

and of a reasonable feature combination strategy, the visual saliency map of the image is blurred. Being aware

of these, we propose a new scheme for saliency extraction. In this paper, we firstly put forward a sparse embedding feature<br />

combination strategy, inspired by sparse representation. The strategy is used to combine the salient regions from the individual<br />

feature maps based on a novel feature sparse indicator that measures the contribution of each map to saliency. Then<br />

we combine traditional visual attention with edge information. Results on different scene images show that our method<br />

outperforms other traditional feature combination strategies.<br />

09:00-11:10, Paper ThAT8.4<br />

LLN-Based Model-Driven Validation of Data Points for Random Sample Consensus Methods<br />

Zhang, Liang, Communications Res. Centre Canada<br />

Wang, Demin, Communications Res. Center Canada<br />

This paper presents an on-the-fly model-driven validation of data points for random sample consensus methods (RANSAC).<br />

The novelty resides in the idea that an analysis of the outcomes of previous random model samplings can benefit subsequent<br />

samplings. Given a sequence of successful model samplings, information from the inlier sets and the model errors is used<br />

to provide a validness score for each data point. This validness is used to guide subsequent model samplings, so that data points

with a higher validness have more chance of being selected. To evaluate the performance, the proposed method is applied to

the problem of line model fitting and the estimation of the fundamental matrix. Experimental results confirm that the

proposed algorithm improves the performance of RANSAC in terms of the estimate accuracy and the number of samplings.<br />
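
A hedged sketch of guided sampling for the line-fitting case: minimal samples are drawn with probability proportional to a per-point validness score (here simply given, rather than updated on the fly from previous samplings as the paper proposes):

import numpy as np

def weighted_ransac_line(points, validness, iters=200, inlier_tol=0.05, seed=0):
    """RANSAC line fit where minimal samples are drawn with probability
    proportional to a per-point validness score."""
    rng = np.random.default_rng(seed)
    p = np.asarray(validness, dtype=float)
    p = p / p.sum()
    best_inliers, best_model = np.array([], dtype=int), None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False, p=p)
        (x1, y1), (x2, y2) = points[i], points[j]
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2   # line ax + by + c = 0
        norm = np.hypot(a, b)
        if norm < 1e-12:
            continue
        dist = np.abs(points @ np.array([a, b]) + c) / norm
        inliers = np.flatnonzero(dist < inlier_tol)
        if inliers.size > best_inliers.size:
            best_inliers, best_model = inliers, (a / norm, b / norm, c / norm)
    return best_model, best_inliers

pts = np.vstack([np.column_stack([np.linspace(0, 1, 50), np.linspace(0, 1, 50)]),
                 np.random.default_rng(1).random((20, 2))])
scores = np.r_[np.full(50, 1.0), np.full(20, 0.2)]   # pretend inliers already look more valid
model, inl = weighted_ransac_line(pts, scores)
print(len(inl), model)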

09:00-11:10, Paper ThAT8.5<br />

Estimating 3D Human Pose from Single Images using Iterative Refinement of the Prior<br />

Daubney, Ben Christopher, Swansea Univ.<br />

Xie, Xianghua, Swansea Univ.<br />

This paper proposes a generative method to extract 3D human pose using just a single image. Unlike many existing approaches<br />

we assume that accurate foreground background segmentation is not possible and do not use binary silhouettes.<br />

A stochastic method is used to search the pose space and the posterior distribution is maximized using Expectation Maximization<br />

(EM). It is assumed that some knowledge is known a priori about the position, scale and orientation of the person<br />

present and we specifically develop an approach to exploit this. The result is that we can learn a more constrained prior<br />

without having to sacrifice its generality to a specific action type. A single prior is learnt using all actions in the HumanEva

dataset [9] and we provide quantitative results for images selected across all action categories and subjects, captured

from differing viewpoints.<br />

09:00-11:10, Paper ThAT8.6<br />

Human-Area Segmentation by Selecting Similar Silhouette Images based on Weak-Classifier Response<br />

Ando, Hiroaki, Chubu Univ.<br />

Fujiyoshi, Hironobu, Chubu Univ.<br />

Human-area segmentation is a major issue in video surveillance. Many existing methods estimate individual human areas<br />

from the foreground area obtained by background subtraction, but the effects of camera movement can make it difficult<br />

to obtain a background image. We have achieved human-area segmentation requiring no background image by using<br />

chamfer matching to match the results of human detection using Real AdaBoost with silhouette images. Although accuracy<br />

in chamfer matching drops as the number of templates increases, the proposed method enables segmentation accuracy to<br />

be improved by selecting silhouette images similar to the matching target beforehand based on response values from weak<br />

classifiers in Real AdaBoost.<br />

09:00-11:10, Paper ThAT8.7<br />

Local Optical Operators for Subpixel Scene Analysis<br />

Jean, Yves, City Univ. of NY<br />

In this paper we present a scene analysis technique with subpixel filtering based on dense coded light fields. Our technique<br />

computes alignment and optically projects analysis filters to local surfaces within the extent of a camera pixel. The resolution<br />

gain depends on the local light field density, not on the point spread function of the camera optics. An initial

structured light sequence is used in establishing each camera pixel’s footprint in the projector generated light field. Then<br />

a sequence of basis functions embedded in the light field, with camera pixel support, combine with the local surface texture<br />

and are integrated by a camera sensor to produce a localized response at the subpixel scale. We address optical modeling<br />

and aliasing issues since the dense light field is undersampled by the camera pixels. Results are provided with objects of

planar and non-planar topology.<br />

09:00-11:10, Paper ThAT8.8<br />

Aesthetic Image Classification for Autonomous Agents<br />

Desnoyer, Mark, Carnegie Mellon Univ.<br />

Wettergreen, David, Carnegie Mellon Univ.<br />

Computational aesthetics is the study of applying machine learning techniques to identify aesthetically pleasing imagery.<br />

Prior work used online datasets scraped from large user communities like Flickr to get labeled data. However, online imagery

represents results late in the media generation process, as the photographer has already framed the shot and then picked<br />

the best results to upload. Thus, this technique can only identify quality imagery once it has been taken. In contrast, automatically<br />

creating pleasing imagery requires understanding the imagery present earlier in the process. This paper applies<br />

computational aesthetics techniques to a novel dataset from earlier in that process in order to understand how the problem<br />

changes when an autonomous agent, like a robot or a real-time camera aid, creates pleasing imagery instead of simply<br />

identifying it.<br />

09:00-11:10, Paper ThAT8.9<br />

Removal of Moving Objects from a Street-view Image by Fusing Multiple Image Sequences<br />

Uchiyama, Hiroyuki, Nagoya Univ.<br />

Deguchi, Daisuke, Nagoya Univ.<br />

Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />

Ide, Ichiro, Nagoya Univ.<br />

Murase, Hiroshi, Nagoya Univ.<br />

We propose a method to remove moving objects from an in-vehicle camera image sequence by fusing multiple image sequences.<br />

Driver assistance systems and services such as Google Street View require images containing no moving object.<br />

The proposed scheme consists of three parts: (i) collection of many image sequences along the same route by using vehicles<br />

equipped with an omni-directional camera, (ii) temporal and spatial registration of image sequences, and (iii) mosaicing<br />

partial images containing no moving object. Experimental results show that 97.3% of the moving object area could be removed<br />

by the proposed method.<br />

09:00-11:10, Paper ThAT8.10<br />

Improving SIFT-Based Descriptors Stability to Rotations<br />

Bellavia, Fabio, Univ. of Palermo<br />

Tegolo, Domenico, Univ. of Palermo<br />

Trucco, Emanuele<br />

Image descriptors are widely adopted structures to match image features. SIFT-based descriptors are collections of gradient<br />

orientation histograms computed on different feature regions, commonly divided by using a regular Cartesian grid or a<br />

log-polar grid. In order to achieve rotation invariance, feature patches have to be generally rotated in the direction of the<br />

dominant gradient orientation. In this paper we present a modification of the GLOH descriptor, a SIFT-based descriptor<br />

based on a log-polar grid, which avoids rotating the feature patch before computing the descriptor, since predefined discrete

orientations can be easily derived by shifting the descriptor vector. The proposed descriptors, called sGLOH and sGLOH+,<br />

have been compared with the SIFT descriptor on the Oxford image dataset, with good results which point out their robustness

and stability.<br />

09:00-11:10, Paper ThAT8.11<br />

Inpainting Large Missing Regions in Range Images<br />

Bhavsar, Arnav, Indian Inst. of Tech. Madras<br />

Ambasamudram, Rajagopalan, Indian Inst. of Tech. Madras<br />

We propose a technique to inpaint large missing regions in range images. Such a technique can be used to restore degraded/occluded

range maps. It can also serve to reconstruct dense depth maps from sparse measurements which can speed<br />

up the acquisition. Our method uses the visual cue from segmentation of an intensity image registered to the range image.<br />

Our approach enforces that pixels in the same segment should have similar range. Our simple strategy involves plane-fitting

and local medians over segments to compute local energies for labeling unknown pixels. Our results exhibit high-quality

inpainting with very low errors.

09:00-11:10, Paper ThAT8.12<br />

Angular Variation as a Monocular Cue for Spatial Perception<br />

Aranda, Joan, UPC<br />

Navarro, Agustin A., UPC<br />

Perspective projection presents objects as they are naturally seen by the eye. However, this type of mapping strongly<br />

distorts their geometric properties, such as angles, which are not preserved under perspective transformations. In this work, this

angular variation serves to model the visual effect of perspective projection. Thus, knowing that the angular distortion depends<br />

on the point of view of the observer, it is demonstrated that it is possible to determine the pose of an object as a consequence<br />

of its perspective distortion. It is a computational approach to direct perception in which spatial information of<br />

a scene is calculated directly from the optic array. Experimental results show the robustness provided by the use of angles<br />

and establish this 3D measurement technique as an emulation of a visual perception process.

09:00-11:10, Paper ThAT8.13<br />

An Exploration Scheme for Large Images: Application to Breast Cancer Grading<br />

Veillard, Antoine, NUS<br />

Lomenie, Nicolas, CNRS<br />

Racoceanu, Daniel, CNRS - French National Res. Center<br />

Most research works focus on pattern recognition within small sample images, but strategies for running these algorithms

efficiently over large images are rarely, if ever, specifically considered. In particular, the new generation of satellite and

microscopic images are acquired at a very high resolution and a very high daily rate. We propose an efficient, generic<br />

strategy to explore large images by combining computational geometry tools with a local signal measure of relevance in<br />

a dynamic sampling framework. An application to breast cancer grading from huge histopathological images illustrates<br />

the benefit of such a general strategy for new major applications in the field of microscopy.<br />

09:00-11:10, Paper ThAT8.14<br />

3D Human Body Modeling using Range Data<br />

Yamauchi, Koichiro, Keio Univ.<br />

Bhanu, Bir, Univ. of California<br />

Saito, Hideo, Keio Univ.<br />

For the 3D modeling of walking humans the determination of body pose and extraction of body parts, from the sensed 3D<br />

range data, are challenging image processing problems. Real body data may have holes because of self-occlusions and<br />

grazing angle views. Most of the existing modeling methods rely on directly fitting a 3D model to the data without considering

the fact that the parts in an image are indeed human body parts. In this paper, we present a method for 3D

human body modeling using range data that attempts to overcome these problems. In our approach the entire human body<br />

is first decomposed into major body parts by a parts-based image segmentation method, and then a kinematics model is<br />

fitted to the segmented body parts in an optimized manner. The fitted model is adjusted by the iterative closest point (ICP)<br />

algorithm to resolve the gaps in the body data. Experimental results and comparisons demonstrate the effectiveness of our<br />

approach.<br />

09:00-11:10, Paper ThAT8.15<br />

Scale Matching of 3D Point Clouds by Finding Keyscales with Spin Images<br />

Tamaki, Toru, Hiroshima Univ.<br />

Tanigawa, Shunsuke, Hiroshima Univ.<br />

Ueno, Yuji, Hiroshima Univ.<br />

Raytchev, Bisser, Hiroshima Univ.<br />

Kaneda, Kazufumi, Hiroshima Univ.<br />

In this paper we propose a method for matching the scales of 3D point clouds. 3D point sets of the same scene obtained<br />

by 3D reconstruction techniques usually differ in scale. To match scales, we propose a keyscale that characterizes the

scale of a given 3D point cloud. By performing PCA of spin images over different scales, a keyscale is defined as the

scale that gives the minimum of the cumulative contribution rate of PCA at a specific dimension of the eigenspace. Simulations

with the Stanford bunny and experimental results with 3D reconstructions of a real scene demonstrate that keyscales of<br />

any 3D point clouds can be uniquely found and effectively used for scale matching.<br />
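
A rough sketch of the keyscale selection loop under stated assumptions: a per-point descriptor is computed at several scales, the cumulative PCA contribution rate at a fixed dimension is evaluated, and the scale minimising it is returned. The toy descriptor below (local covariance eigenvalues) is only a stand-in for spin images:

import numpy as np

def cumulative_contribution(features, d=2):
    """Cumulative contribution rate of the first d PCA components."""
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(features.T)))[::-1]
    return float(eigvals[:d].sum() / (eigvals.sum() + 1e-12))

def toy_descriptor(cloud, scale):
    """Per-point eigenvalues of the local covariance inside a ball of radius
    `scale` -- only a crude stand-in for spin images."""
    dist = np.linalg.norm(cloud[:, None, :] - cloud[None, :, :], axis=2)
    feats = []
    for i in range(len(cloud)):
        nbrs = cloud[dist[i] < scale]
        cov = np.cov(nbrs.T) if len(nbrs) > 3 else np.eye(3)
        feats.append(np.sort(np.linalg.eigvalsh(cov))[::-1])
    return np.array(feats)

def keyscale(cloud, scales, descriptor, d=2):
    """Return the scale minimising the cumulative PCA contribution at dimension d."""
    scores = {s: cumulative_contribution(descriptor(cloud, s), d) for s in scales}
    return min(scores, key=scores.get), scores

cloud = np.random.default_rng(0).random((300, 3))
print(keyscale(cloud, scales=[0.2, 0.3, 0.5], descriptor=toy_descriptor))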

09:00-11:10, Paper ThAT8.16<br />

Tracking Multiple People with Illumination Maps<br />

Zen, Gloria, Fondazione Bruno Kessler<br />

Lanz, Oswald, Fondazione Bruno Kessler<br />

Messelodi, Stefano, Fondazione Bruno Kessler<br />

Ricci, Elisa, Fondazione Bruno Kessler<br />

We address the problem of multiple people tracking under non-homogeneous and time-varying illumination conditions.

We propose a unified framework for jointly estimating the position of the targets and their illumination conditions. For<br />

each target multiple templates are considered to model appearance variations due to lighting changes. The template choice<br />

is driven by an illumination map which describes the light conditions in different areas of the scene. This map is computed<br />

with a novel algorithm for efficient inference in a hierarchical Markov Random Field (MRF) and is updated online to<br />

adapt to slow lighting changes. Experimental results demonstrate the effectiveness of our approach.<br />

09:00-11:10, Paper ThAT8.17<br />

Combining Foreground / Background Feature Points and Anisotropic Mean Shift for Enhanced Visual Object<br />

Tracking<br />

Haner, Sebastian, Lund Univ. of Tech.<br />

Gu, Irene Yu-Hua, Chalmers Univ. of Tech.<br />

This paper proposes a novel visual object tracking scheme, exploiting both local point feature correspondences and global<br />

object appearance using the anisotropic mean shift tracker. Using a RANSAC cost function incorporating the mean shift<br />

motion estimate, motion smoothness and complexity terms, an optimal feature point set for motion estimation is found<br />

even when a high proportion of outliers is present. The tracker dynamically maintains sets of both foreground and background

features, the latter providing information on object occlusions. The mean shift motion estimate is further used to<br />

guide the inclusion of new point features in the object model. Our experiments on videos containing long-term partial occlusions,

object intersections and cluttered backgrounds or backgrounds with similar color distributions have shown more stable and robust tracking

performance in comparison to three existing methods.<br />

09:00-11:10, Paper ThAT8.18<br />

Enhanced Measurement Model for Subspace-Based Tracking<br />

Yin, Shimin, Seoul National Univ.<br />

Yoo, Haan Ju, Seoul National Univ.<br />

Choi, Jin Young, Automation and System Res. Inst., Seoul National University

We present an efficient and robust measurement model for visual tracking. This approach builds on and extends work on<br />

the measurement model of subspace representation. Subspace-based tracking algorithms have been part of the visual tracking

literature for a decade and show considerable tracking performance due to their robustness in matching. However, the

measures used in their measurement models are not robust enough in cluttered backgrounds. We propose a novel measure<br />

of object matching referred to as WDIFS, which aims to improve the discriminability of matching within the subspace.<br />

Our measurement model can distinguish the target from similar background clutter which often causes erroneous drift with the conventional

DFFS-based measure. Experiments demonstrate the effectiveness of the proposed tracking algorithm under cluttered

background.<br />

09:00-11:10, Paper ThAT8.19<br />

Person-Specific Face Shape Estimation under Varying Head Pose from Single Snapshots<br />

Dornaika, Fadi, Univ. of the Basque Country<br />

Raducanu, Bogdan, Computer Vision Center<br />

This paper presents a new method for person-specific face shape estimation under varying head pose of a previously<br />

unseen person from a single image. We describe a featureless approach based on a deformable 3D model and a learned<br />

face subspace. The proposed approach is based on maximizing a likelihood measure associated with a learned face subspace,<br />

which is carried out by a stochastic and genetic optimizer. We conducted experiments on a subset of the Honda

Video Database, showing the feasibility and robustness of the proposed approach. Consequently, our approach could lend

itself nicely to complex frameworks involving 3D face tracking and face gesture recognition in monocular videos.<br />

09:00-11:10, Paper ThAT8.20<br />

Tracking Ships from Fast Moving Camera through Image Registration<br />

Fefilatyev, Sergiy, Univ. of South Florida<br />

Goldgof, Dmitry, Univ. of South Florida<br />

Lembke, Chad, Univ. of South Florida<br />

This paper presents an algorithm that detects and tracks marine vessels in video taken by a nonstationary camera installed<br />

on an untethered buoy. The video is characterized by large inter-frame motion of the camera, cluttered background, and<br />

presence of compression artifacts. Our approach performs segmentation of ships in individual frames processed with a<br />

color-gradient filter. The threshold selection is based on the histogram of the search region. Tracking of ships in a sequence<br />

is enabled by registering the horizon images in one coordinate system and by using a multi-hypothesis framework. The registration

step uses an area-based technique to correlate a processed strip of the image over the found horizon line. The results

of evaluation of detection, localization, and tracking of the ships show a significant increase in performance in comparison

to the previously used technique.<br />

09:00-11:10, Paper ThAT8.21<br />

Boosted Multiple Kernel Learning for Scene Category Recognition<br />

Jhuo, I-Hong, National Taiwan Univ.<br />

Lee, Der-Tsai, National Taiwan Univ.<br />

Scene images typically include diverse and distinctive properties. It is reasonable to consider different features in establishing<br />

a scene category recognition system with a promising performance. We propose an adaptive model to represent<br />

various features in a unified domain, i.e., a set of kernels, and transform the discriminant information contained in each<br />

kernel into a set of weak learners, called dyadic hypercuts. Based on this model, we present a novel approach to carrying

out incremental multiple kernel learning for feature fusion by applying AdaBoost to the union of the sets of weak learners.<br />

We further evaluate the performance of this approach by a benchmark dataset for scene category recognition. Experimental<br />

results show a significantly improved performance in both accuracy and efficiency.<br />

09:00-11:10, Paper ThAT8.22<br />

Receding Horizon Estimation for Hybrid Particle Filters and Application for Robust Visual Tracking<br />

Kim, Du Yong, Gwangju Inst. of Science and Tech.<br />

Yang, Ehwa, Gwangju Inst. of Science and Tech.<br />

Jeon, Moongu, Gwangju Inst. of Science and Tech.<br />

Shin, Vladimir, Gwangju Inst. of Science and Tech.<br />

Receding horizon estimation is applied to design robust visual trackers. The most recent data within a fixed-size receding window

is processed to obtain an estimate of the object state at the current time. In visual tracking such a

scheme improves filter accuracy by avoiding accumulated approximation errors. A newly derived unscented Kalman filter<br />

(UKF) based on the receding horizon strategy is proposed for determining the importance density of the hybrid particle<br />

filter. The importance density derived by the receding horizon-based UKF (RHUKF) provides significantly improved accuracy<br />

and performance consistency compared to the unscented particle filter (UPF). Visual tracking examples are subsequently<br />

tested to demonstrate the advantages of the filter.<br />

09:00-11:10, Paper ThAT8.23<br />

Efficient Polygonal Approximation of Digital Curves via Monte Carlo Optimization<br />

Zhou, Xiuzhuang, Beijing Inst. of Tech.<br />

Lu, Yao, Beijing Inst. of Tech.<br />

A novel stochastic searching scheme based on Monte Carlo optimization is presented for the polygonal approximation

(PA) problem. We propose to combine the split-and-merge based local optimization and the Monte Carlo sampling, to<br />

give an efficient stochastic optimization scheme. Our approach, in essence, is a well-designed Basin-Hopping scheme,<br />

which performs stochastic hopping among the reduced energy peaks. Experimental results on various benchmarks show

that our method achieves high-quality solutions with lower computational costs, and outperforms most state-of-the-art

algorithms for the PA problem.

09:00-11:10, Paper ThAT8.24<br />

Weakly Supervised Action Recognition using Implicit Shape Models<br />

Thi, Tuan Hue, Univ. of New South Wales and National ICT of Australia<br />

Cheng, Li, National ICT of Australia<br />

Zhang, Jian, National ICT of Australia<br />

Wang, Li, Nanjing Forest Univ.<br />

Satoh, Shin’Ichi, National Inst. of Informatics<br />

In this paper, we present a robust framework for action recognition in video that is able to perform competitively against

state-of-the-art methods, yet does not rely on a sophisticated background subtraction preprocessing step to remove background

features. In particular, we extend the Implicit Shape Modeling (ISM) of [10] for object recognition to 3D to integrate local<br />

spatiotemporal features, which are produced by a weakly supervised Bayesian kernel filter. Experiments on benchmark<br />

datasets (including KTH and Weizmann) verify the effectiveness of our approach.

09:00-11:10, Paper ThAT8.25<br />

Moments of Elliptic Fourier Descriptors<br />

Soldea, Octavian, Sabanci Univ.<br />

Unel, Mustafa, Sabanci Univ.<br />

Ercil, Aytul, Sabanci Univ.<br />

This paper develops a recursive method for computing moments of 2D objects described by elliptic Fourier descriptors<br />

(EFD). Green’s theorem is utilized to transform 2D surface integrals into 1D line integrals, and the EFD description is employed

to derive recursions for moment computation. Experiments are performed to quantify the accuracy of our proposed

method. Comparison with Bernstein-Bezier representations is also provided.<br />
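
As a brief illustration of the Green's-theorem step (the paper's full recursions are not reproduced), the geometric moment of a region D bounded by the closed EFD curve (x(t), y(t)), t in [0, T], reduces to a line integral:

m_{pq} = \iint_D x^p y^q \,dx\,dy = \frac{1}{p+1}\oint_{\partial D} x^{p+1} y^q \,dy = \frac{1}{p+1}\int_0^T x(t)^{p+1}\, y(t)^q\, y'(t)\,dt .

Substituting the truncated Fourier expansions of x(t) and y(t) turns the integrand into products of sines and cosines, which is what enables the recursive evaluation described above.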

09:00-11:10, Paper ThAT8.26<br />

Semi-Supervised Trajectory Learning using a Multi-Scale Key Point based Trajectory Representation<br />

Liu, Yang, Chinese Acad. of Sciences<br />

Li, Xi, CNRS, TELECOM ParisTech<br />

Hu, Weiming, National Lab. of Pattern Recognition, Inst.<br />

Motion trajectories contain rich high-level semantic information such as object behaviors and gestures, which can be effectively

captured by supervised trajectory learning. However, it is usually a tough task to obtain a large number of high-quality

manually labeled samples in real applications. Thus, how to perform trajectory learning in small training sample<br />

size situations is an important research topic. In this paper, we propose a trajectory learning framework using graph-based<br />

semi-supervised transductive learning, which propagates training sample labels along a particular graph. Furthermore, a<br />

novel trajectory descriptor based on multi-scale key points is proposed to characterize the spatial structural information.<br />

Experimental results demonstrate effectiveness of our framework.<br />

09:00-11:10, Paper ThAT8.27<br />

Detection based Low Frame Rate Human Tracking<br />

Wang, Lu, The Univ. of Hong Kong<br />

Yung, Nelson, the Univ. of Hong Kong<br />

Tracking by association of low frame rate detection responses is not trivial, as motion is less continuous and hence ambiguous.<br />

The problem becomes more challenging when occlusion occurs. To solve this problem, we firstly propose a<br />

robust data association method that explicitly differentiates ambiguous tracklets that are likely to introduce incorrect<br />

linking from other tracklets, and deals with them effectively. Secondly, we solve the long-term occlusion problem by detecting

inter-track relationship and performing track split and merge according to appearance similarity and occlusion<br />

order. Experiments on a challenging human surveillance dataset show the effectiveness of the proposed method.

09:00-11:10, Paper ThAT8.28<br />

Detecting Dominant Motion Flows in Unstructured/Structured Crowd Scenes<br />

Ozturk, Ovgu, The Univ. of Tokyo<br />

Yamasaki, Toshihiko, The Univ. of Tokyo<br />

Aizawa, Kiyoharu, The Univ. of Tokyo<br />

Detecting dominant motion flows in crowd scenes is one of the major problems in video surveillance. This is particularly<br />

difficult in unstructured crowd scenes, where the participants move randomly in various directions. This paper presents a<br />

novel method which utilizes SIFT features’ flow vectors to calculate the dominant motion flows in both unstructured and<br />

structured crowd scenes. SIFT features can represent the characteristic parts of objects, allowing robust tracking under<br />

non-rigid motion. First, flow vectors of SIFT features are calculated at certain intervals to form a motion flow map of the<br />

video. Next, this map is divided into equally sized square regions and in each region dominant motion flows are estimated

by clustering the flow vectors. Then, local dominant motion flows are combined to obtain the global dominant motion<br />

flows. Experimental results demonstrate the successful application of the proposed method to challenging real-world<br />

scenes.<br />

09:00-11:10, Paper ThAT8.29<br />

Statistical Shape Modeling using Morphological Representations<br />

Velasco-Forero, Santiago, MINES ParisTech<br />

Angulo, Jesus, MINES ParisTech<br />

The aim of this paper is to propose tools for the statistical analysis of shape families using morphological operators. Given a series of shape families (or shape categories), the approach consists in empirically computing shape statistics (i.e., mean shape and variance of shape) and then using simple algorithms for random shape generation, for the computation of empirical shape confidence boundaries, and for shape classification using Bayes rules. The main ingredients required for the present methods are well known in image processing, such as the watershed on distance functions or the log-polar transformation. Classification performance is presented on a well-known shape database.<br />
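
One simple way to obtain an empirical mean shape from aligned binary masks, in the spirit of the distance-function tools mentioned above (a sketch under that assumption, not the authors' exact morphological operator), is to average signed distance functions and threshold at zero:<br />

    import numpy as np
    from scipy import ndimage

    def empirical_mean_shape(binary_masks):
        # binary_masks: list of aligned 2-D arrays with values in {0, 1}
        sdfs = []
        for mask in binary_masks:
            inside = ndimage.distance_transform_edt(mask)
            outside = ndimage.distance_transform_edt(1 - mask)
            sdfs.append(inside - outside)          # signed distance, positive inside
        return (np.mean(sdfs, axis=0) > 0).astype(np.uint8)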

09:00-11:10, Paper ThAT8.30<br />

Recovering the Topology of Multiple Cameras by Finding Continuous Paths in a Trellis<br />

Cai, Yinghao, Univ. of Oulu<br />

Kaiqi, Huang, CAS Inst. of Automation<br />

Tan, Tieniu, CAS Inst. of Automation<br />

Pietikäinen, Matti, Univ. of Oulu<br />

In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera, while the connectivity between nodes is inferred by finding continuous paths in a trellis in which the appearance information and temporal information of moving objects are encoded. Unlike previous methods, which assume a single-mode transition distribution between nodes, our method is capable of dealing with multi-modal transition situations when both cars and pedestrians are in the scene. Results on simulated and real-life datasets demonstrate the effectiveness of the proposed method.<br />

09:00-11:10, Paper ThAT8.31<br />

On-Line Random Naive Bayes for Tracking<br />

Godec, Martin, Graz Univ. of Tech.<br />

Leistner, Christian, Graz Univ. of Tech.<br />

Saffari, Amir, Graz Univ. of Tech.<br />

Bischof, Horst, Graz Univ. of Tech.<br />

Randomized learning methods (i.e., Forests or Ferns) have shown excellent capabilities for various computer vision applications. However, it has been shown that the tree structure in Forests can be replaced by even simpler structures, e.g., Random Naive Bayes classifiers, yielding similar performance. The goal of this paper is to benefit from these findings to develop an efficient on-line learner. Based on the principles of on-line Random Forests, we adapt the Random Naive Bayes classifier to the on-line domain. For that purpose, we propose to use on-line histograms as weak learners, which yield much better performance than simple decision stumps. Experimentally, we show that the approach is applicable to incremental learning on machine learning datasets. Additionally, we propose an IIR filtering-like forgetting function for the weak learners to enable adaptivity, and we evaluate our classifier on the task of tracking by detection.<br />
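
A minimal sketch of an on-line histogram weak learner with IIR-like forgetting of the kind described above; the bin count, forgetting factor and the assumption that features are scaled to [0, 1) are illustrative choices, not the authors' settings:<br />

    import numpy as np

    class OnlineHistogramNB:
        # Features are assumed scaled to [0, 1); label 1 = object, 0 = background.
        def __init__(self, n_features, n_bins=16, forget=0.95):
            self.hist = np.ones((2, n_features, n_bins))     # Laplace-style start
            self.n_bins, self.forget = n_bins, forget

        def _bins(self, x):
            return np.clip((x * self.n_bins).astype(int), 0, self.n_bins - 1)

        def update(self, x, label):
            self.hist[label] *= self.forget                  # IIR-like forgetting
            self.hist[label, np.arange(len(x)), self._bins(x)] += 1.0

        def confidence(self, x):
            p = self.hist / self.hist.sum(axis=2, keepdims=True)
            idx, b = np.arange(len(x)), self._bins(x)
            # naive Bayes log-likelihood ratio of object vs. background
            return float(np.sum(np.log(p[1, idx, b]) - np.log(p[0, idx, b])))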

09:00-11:10, Paper ThAT8.32<br />

Interest Point based Tracking<br />

Kloihofer, Werner, Center Communication Systems GmbH<br />

Kampel, Martin, Vienna Univ. of Tech.<br />

This paper deals with a novel method for object tracking. In the first step, interest points are detected and feature descriptors around them are calculated. Sets of known points are created, allowing tracking based on point matching. The set representation is updated online at every tracking step. Our method uses one-shot learning with the first frame, so no offline and no supervised learning is required. Following an object-recognition-based approach, there is no need for a background model or motion model, which allows the tracking of abrupt motion and the use of non-stationary cameras. We compare our method to Mean Shift and Tracking via Online Boosting, showing the benefits of our approach.<br />

09:00-11:10, Paper ThAT8.33<br />

Stochastic Filtering of Level Sets for Curve Tracking<br />

Avenel, Christophe, Irisa<br />

Memin, Etienne<br />

Perez, Patrick<br />

This paper focuses on the tracking of free curves using non-linear stochastic filtering techniques. It relies on a particle filter which includes color measurements. The curve and its velocity are defined through two coupled implicit level set representations. The stochastic dynamics of the curve is expressed directly on the level set function associated with the curve representation and combines a velocity field captured from the additional second level set attached to the locations of the past curve's points. The curve's dynamics combines a low-dimensional noise model and a data-driven local force. We demonstrate how this approach allows the tracking of highly and rapidly deforming objects, such as convective cells in infra-red satellite images, while providing a location-dependent assessment of the estimation confidence.<br />

09:00-11:10, Paper ThAT8.34<br />

Scalable Cage-Driven Feature Detection and Shape Correspondence for 3D Point Sets<br />

Seversky, Lee, State Univ. of New York at Binghamton<br />

Yin, Lijun, State Univ. of New York at Binghamton<br />

We propose an automatic deformation-driven correspondence algorithm for 3D point sets of non-rigid articulated shapes. Our approach uses simple geometric cages to embed the point set data and to extract and match a coarse set of prominent features. We seek feature correspondences which lead to low-distortion deformations of the cages while satisfying the feature pairing. Our approach operates on the simplified geometric domain of the cage instead of the more complex 3D point data. Thus, it is robust to noise and partial occlusions, and insensitive to non-regular sampling. We demonstrate the potential of our approach by finding pairwise correspondences for sequences of acquired time-varying 3D scan point data.<br />

09:00-11:10, Paper ThAT8.35<br />

Event Recognition based on Top-Down Motion Attention<br />

Li, Li, Chinese Acad. of Sci.<br />

Hu, Weiming, Chinese Acad. of Sci.<br />

Li, Bing, Chinese Acad. of Sci.<br />

Yuan, Chunfeng, Chinese Acad. of Sci.<br />

Zhu, Pengfei, Chinese Acad. of Sci.<br />

Li, Wanqing, Univ. of Wollongong<br />

How to fuse static and dynamic information is a key issue in event analysis. In this paper, a top-down motion-guided fusing method is proposed for recognizing events in unconstrained news video. In the method, the static information is represented as a Bag-of-SIFT-features, and motion information is employed to generate an event-specific attention map to direct the sampling of the interest points. We build class-specific motion histograms for each event so as to give more weight to the interest points that are discriminative for the corresponding event. Experimental results on the TRECVID 2005 video corpus demonstrate that the proposed method can improve the mean average accuracy of recognition.<br />

09:00-11:10, Paper ThAT8.36<br />

Construction of Precise Local Affine Frames<br />

Mikulik, Andrej, CMP FEE, CTU Prague<br />

Matas, Jiri, CTU Prague<br />

Perdoch, Michal, CMP, FEE, CTU Prague<br />

Chum, Ondrej,<br />

We propose a novel method for the refinement of Maximally Stable Extremal Region (MSER) boundaries to sub-pixel<br />

precision by taking into account the intensity function in the 2x2 neighborhood of the contour points. The proposed method<br />

improves the repeatability and precision of Local Affine Frames (LAFs) constructed on extremal regions. Additionally,<br />

we propose a novel method for detection of local curvature extrema on the refined contour. Experimental evaluation on<br />

publicly available datasets shows that matching with the modified LAFs leads to a higher number of correspondences and<br />

a higher inlier ratio in more than 80% of the test image pairs. Since the processing time of the contour refinement is negligible,<br />

there is no reason not to include the algorithms as a standard part of the MSER detector and LAF constructions.<br />

09:00-11:10, Paper ThAT8.37<br />

Foreground Segmentation via Background Modeling on Riemannian Manifolds<br />

Caseiro, Rui, Univ. of Coimbra<br />

Henriques, João F, Univ. of Coimbra<br />

Batista, Jorge, Univ. of Coimbra<br />

Statistical modeling in color space is a widely used approach to background modeling for foreground segmentation. Nevertheless, computing such statistics directly on image values is sometimes not enough to achieve a good discrimination. Thus the image may be converted into a more information-rich form, such as a tensor field, in which color and gradients can be encoded. In this paper, we exploit the theoretically well-founded differential geometric properties of the Riemannian manifold on which tensors lie. We propose a novel and efficient approach for foreground segmentation on tensor fields based on data modeling by means of Gaussian mixtures (GMM) directly in the tensor domain. We introduce an Expectation Maximization (EM) algorithm to estimate the mixture parameters, and we propose two algorithms based on an online K-means approximation of EM in order to speed up the process. Theoretical analysis and experimental evaluations demonstrate the promise and effectiveness of the proposed framework.<br />



09:00-11:10, Paper ThAT8.38<br />

Robust Human Behavior Modeling from Multiple Cameras<br />

Kosmopoulos, D., NCSR Demokritos<br />

Voulodimos, Athanasios, National Tech. Univ. of Athens<br />

Varvarigou, Theodora, National Tech. Univ. of Athens<br />

In this work, we propose a framework for classifying structured human behavior in complex real environments, where problems such as frequent illumination changes and heavy occlusions are expected. Since target recognition and tracking can be very challenging, we bypass these problems by employing an approach similar to Motion History Images for feature extraction. Furthermore, to tackle outliers residing within the training data, which might severely affect the training algorithm of models with Gaussian observation likelihoods, we scrutinize the effectiveness of the multivariate Student-t distribution as the observation likelihood of the employed Hidden Markov Models. Additionally, the problem of visibility and occlusions is addressed by providing various extensions of the framework for multiple cameras, both at the feature and at the state level. Finally, we evaluate the performance of the examined approaches under real-life visual behavior understanding scenarios, and we compare and discuss the obtained results.<br />

09:00-11:10, Paper ThAT8.39<br />

Unsupervised Learning of Activities in Video using Scene Context<br />

Oh, Sangmin, Kitware Inc.<br />

Hoogs, Anthony, Kitware Inc.<br />

Unsupervised learning of semantic activities from video collected over time is an important problem for visual surveillance and video scene understanding. Our goal is to cluster tracks into semantically interpretable activity models that are independent of scene locations; most previous work in video scene understanding has focused on learning location-specific normalcy models. Location-independent models can be used to detect instances of the same activity anywhere in the scene, or even across multiple scenes. Our insight for this unsupervised activity learning problem is to incorporate scene context to characterize the behavior of every track. By scene context, we mean local scene structures, such as building entrances, parking spots and roads, that moving objects frequently interact with. Each track is attributed with a large number of potentially useful features that capture its relationships and interactions with a set of existing scene context elements. Once feature vectors are obtained, tracks are grouped in this feature space using state-of-the-art clustering techniques, without considering scene location. Experiments are conducted on webcam video of a complex scene, with many interacting objects and very noisy tracks resulting from low frame rates and poor image quality. Our results demonstrate that location-independent and semantically interpretable groupings can be successfully obtained using unsupervised clustering methods, and that the models are superior to standard location-dependent clustering.<br />

09:00-11:10, Paper ThAT8.40<br />

Multipath Interference Compensation in Time-of-Flight Camera Images<br />

Fuchs, Stefan, German Aerospace Center<br />

Multipath interference is inherent to the working principle of a Time-of-Flight camera and can influence the measurements by several centimeters. Especially in applications that demand high accuracy, such as object localization for robotic manipulation or ego-motion estimation of mobile robots, multipath interference is not tolerable. In this paper we formulate a multipath model in order to estimate the interference and correct the measurements. The proposed approach incorporates the measured scene structure: all distracting surfaces are assumed to be Lambertian radiators, and the directional interference is simulated for correction purposes. The positive impact of these corrections is demonstrated experimentally.<br />

09:00-11:10, Paper ThAT8.41<br />

Segment-Based Foreground Extraction Dedicated to 3D Reconstruction<br />

Kim, Jungwhan, Soongsil Univ.<br />

Park, Anjin, AIST<br />

Jung, Keechul, Soongsil Univ.<br />

Research on image-based 3D reconstruction has recently produced a number of good results, but it assumes that the accurate foreground to be reconstructed is already extracted from each input image. This paper proposes a novel approach to extract more accurate foregrounds by iteratively performing foreground extraction and 3D reconstruction, in a manner similar to an EM algorithm, on regions segmented in an initial stage, called segments. After definitively extracting the foregrounds in the multiple views by simply selecting the segments corresponding to the real foreground in only one image, further improved foregrounds are extracted by back-projecting the 3D objects reconstructed from the foregrounds extracted in the previous step onto the segments of each image in the multiple views. These two steps are iterated until the energy function is optimized. In the experiments, more accurate boundaries were obtained, although the proposed method used a simple 3D reconstruction method.<br />

09:00-11:10, Paper ThAT8.42<br />

Human Pose Estimation for Multiple Persons based on Volume Reconstruction<br />

Luo, Xinghan, Utrecht Univ.<br />

Berendsen, Berend<br />

Tan, Robby T., Utrecht Univ.<br />

Veltkamp, R. C., Utrecht Univ.<br />

Most of the development of pose recognition has focused on a single person. However, many applications of computer vision essentially require the pose estimation of multiple people. Hence, in this paper, we address the problem of estimating the poses of multiple persons using volumes estimated from multiple cameras. One of the main issues that makes the multiple-person, multiple-camera setting problematic is the presence of ghost volumes. This problem arises when the projections of two different silhouettes of two different persons onto the 3D world overlap in a place where in fact there is no person. To solve this problem, we first introduce a novel principal-axis-based framework to estimate the 3D ground plane positions of multiple people, and then use the position cues to label the multi-person volumes (voxels) while considering the voxel connectivity. Having labeled the voxels, we fit the volume of each person with a body model and determine the pose of the person based on the model. The results on real videos demonstrate the accuracy and efficiency of our approach.<br />

09:00-11:10, Paper ThAT8.43<br />

3D Articulated Shape Segmentation using Motion Information<br />

Kalafatlar, Emre, Koç Univ.<br />

Yemez, Yucel, Koç Univ.<br />

We present a method for the segmentation of articulated 3D shapes which incorporates the motion information obtained from time-varying models. We assume that the articulated shape is given in the form of a mesh sequence with fixed connectivity, so that the inter-frame vertex correspondences, and hence the vertex movements, are known a priori. We use different postures of an articulated shape in multiple frames to constitute an affinity matrix which encodes both temporal and spatial similarities between surface points. The shape is then decomposed into segments in the spectral domain based on the affinity matrix, using a standard K-means clustering algorithm. The performance of the proposed segmentation method is demonstrated on the mesh sequence of a human actor.<br />
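
The spectral decomposition plus K-means step can be sketched roughly as follows; the affinity definition, Laplacian normalization and cluster count are assumptions, not the paper's exact formulation:<br />

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_segments(affinity, n_segments):
        # affinity: (n, n) symmetric matrix of temporal+spatial vertex similarities
        d = affinity.sum(axis=1) + 1e-12
        d_inv_sqrt = 1.0 / np.sqrt(d)
        lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
        _, eigvecs = np.linalg.eigh(lap)            # normalized graph Laplacian
        embedding = eigvecs[:, :n_segments]         # smoothest eigenvectors
        return KMeans(n_clusters=n_segments, n_init=10).fit_predict(embedding)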

09:00-11:10, Paper ThAT8.44<br />

Online Learning with Self-Organizing Maps for Anomaly Detection in Crowd Scenes<br />

Feng, Jie, Peking Univ.<br />

Zhang, Chao, Peking Univ.<br />

Hao, Pengwei, Queen Mary Univ. of London<br />

Detecting abnormal behaviors in crowd scenes is quite important for public security and has received more and more attention. Most previous methods use an offline-trained model to perform detection, which cannot handle the constantly changing crowd environment. In this paper, we propose a novel unsupervised algorithm to detect abnormal behavior patterns in crowd scenes with online learning. The crowd behavior pattern is extracted from the local spatio-temporal volume, which consists of multiple motion patterns in temporal order. An online self-organizing map (SOM) is used to model the large number of behavior patterns in the crowd. Each neuron can be updated by incrementally learning the new observations. To demonstrate the effectiveness of our proposed method, we have performed experiments on real-world crowd scenes. The online learning efficiently reduces false alarms while still being able to detect most of the anomalies.<br />
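
As a rough illustration of an on-line SOM over behavior-pattern vectors (grid size, learning rate, neighborhood width and the distance-based anomaly score are illustrative assumptions):<br />

    import numpy as np

    class OnlineSOM:
        def __init__(self, grid=(10, 10), dim=32, lr=0.1, sigma=1.5, seed=0):
            rng = np.random.default_rng(seed)
            self.w = rng.random((grid[0], grid[1], dim))       # neuron prototypes
            gy, gx = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
            self.coords = np.stack([gy, gx], axis=-1)
            self.lr, self.sigma = lr, sigma

        def update(self, x):
            dist = np.linalg.norm(self.w - x, axis=-1)
            bmu = np.unravel_index(dist.argmin(), dist.shape)  # best-matching unit
            g = np.exp(-np.sum((self.coords - np.array(bmu)) ** 2, axis=-1)
                       / (2.0 * self.sigma ** 2))
            self.w += self.lr * g[..., None] * (x - self.w)    # incremental update
            return float(dist.min())   # large distance to the map suggests an anomaly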

09:00-11:10, Paper ThAT8.45<br />

Scene Classification using Spatial Pyramid of Latent Topics<br />

Ergul, Emrah, Turkish Naval Academy<br />

Arica, Nafiz, Turkish Naval Academy<br />



We propose a scene classification method which combines two popular methods from the literature: Spatial Pyramid Matching (SPM) and probabilistic Latent Semantic Analysis (pLSA) modeling. The proposed scheme, called Cascaded pLSA, performs pLSA in a hierarchical sense after a soft-weighted BoW representation based on dense local features is extracted. We associate spatial layout information by dividing each image into overlapping regions iteratively at different resolution levels and fitting a pLSA model for each region individually. Finally, an image is represented by the concatenated topic distributions of its regions. In the performance evaluation, we compare the proposed method with the most successful methods in the literature using the popular 15-class dataset. The experiments show that our method slightly outperforms the others on that particular dataset.<br />

09:00-11:10, Paper ThAT8.46<br />

Optimization of Target Objects for Natural Feature Tracking<br />

Gruber, Lukas, Graz Univ. of Tech.<br />

Zollmann, Stefanie, Graz Univ. of Tech.<br />

Wagner, Daniel, Graz Univ. of Tech.<br />

Schmalstieg, Dieter, Graz Univ. of Tech.<br />

Hollerer, Tobias, UCSB<br />

This paper investigates possible physical alterations of tracking targets to obtain improved 6DoF pose detection for a<br />

camera observing the known targets. We explore the influence of several texture characteristics on the pose detection, by<br />

simulating a large number of different target objects and camera poses. Based on statistical observations, we rank the importance<br />

of characteristics such as texturedness and feature distribution for a specific implementation of a 6DoF tracking<br />

technique. These findings allow informed modification strategies for improving the tracking target objects themselves, in<br />

the common case of man-made targets, as for example used in advertising. This fundamentally differs from and complements<br />

the traditional approach of leaving the targets unchanged while trying to optimize the tracking algorithms and parameters.<br />

09:00-11:10, Paper ThAT8.47<br />

View-Invariant Action Recognition using Rank Constraint<br />

Ashraf, Nazim, Univ. of Central Florida<br />

Shen, Yuping, Univ. of Central Florida<br />

Foroosh, Hassan, Univ. of Central Florida<br />

We propose a new method for view-invariant action recognition based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent an action as a sequence of poses, and we use the fact that the family of homographies associated with two identical poses would have rank 4 to gauge the similarity of the poses of two subjects observed by different perspective cameras and from different viewpoints. Extensive experimental results show that our method can accurately identify actions from video sequences even when they are observed from totally different viewpoints with different camera parameters.<br />
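
The rank-4 constraint itself is easy to probe numerically. A hedged sketch follows; how the homographies are estimated and how the score feeds the recognizer are not specified here and are left out:<br />

    import numpy as np

    def homography_rank4_score(homographies):
        # homographies: iterable of 3x3 matrices, one per body-point triplet
        M = np.stack([H.ravel() / np.linalg.norm(H) for H in homographies])
        s = np.linalg.svd(M, compute_uv=False)
        if len(s) <= 4:
            return 0.0
        # residual energy beyond the first four singular values; 0 means exact rank 4
        return float(np.sum(s[4:] ** 2) / np.sum(s ** 2))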

09:00-11:10, Paper ThAT8.48<br />

Coarse-To-Fine Particle Filter by Implicit Motion Estimation for 3D Head Tracking on Mobile Devices<br />

Sung, Hacheon, Yonsei Univ.<br />

Choi, Kwontaeg, Yonsei Univ.<br />

Byun, Hyeran, Yonsei Univ.<br />

Because mobile devices have become so widespread in recent years, a low-cost implementation of an efficient head tracking system is becoming more useful for a wide range of applications. In this paper, we make an attempt to solve the real-time 3D head tracking problem on mobile devices by enhancing the fitness of the dynamics. In our method, the particles are generated by implicit motion estimation between two particles rather than by explicit motion estimation using corresponding-point matching between two consecutive frames. This generation is applied iteratively using a coarse-to-fine strategy in order to handle large motion with a small number of particles. This reduces the computational cost while preserving the performance. We evaluate the efficiency and effectiveness of the proposed algorithm by empirical experiments. Finally, we demonstrate our method on a recent mobile phone.<br />



09:00-11:10, Paper ThAT8.49<br />

Visibility of Multiple Cameras in a Scene with Unknown Geometry<br />

Zhang, Liuxin, Beijing Inst. of Tech.<br />

Jia, Yunde, Beijing Inst. of Tech.<br />

In this paper, we investigate the problem of determining the visible regions of multiple cameras in a 3D scene without a priori knowledge of the scene geometry. Our approach is based on a variational energy functional in which both the unresolved visibility information of the multiple cameras and the unknown scene geometry are included. We cast visibility estimation and scene geometry reconstruction as an optimization of the variational energy functional amenable to minimization with an Euler-Lagrange driven evolution. Starting from any initial value, the accurate visibility of the multiple cameras as well as the true scene geometry can be obtained at the end of the evolution. Experimental results show the validity of our approach.<br />

09:00-11:10, Paper ThAT8.50<br />

Low-Level Image Segmentation based Scene Classification<br />

Akbas, Emre, Univ. of Illinois<br />

Ahuja, Narendra, Univ. of Illinois<br />

This paper is aimed at evaluating the semantic information content of multiscale, low-level image segmentation. As a method of doing this, we use selected features of the segmentation for the semantic classification of real images. To estimate the relative information content of our features, we compare the classification results we obtain using them with those obtained by others using the commonly used patch/grid-based features. To classify an image using segmentation-based features, we model the image in terms of a probability density function, specifically a Gaussian mixture model (GMM), of its region features. This GMM is fit to the image by adapting a universal GMM which is estimated so that it fits all images. Adaptation is done using a maximum a posteriori criterion. We use kernelized versions of the Bhattacharyya distance to measure the similarity between two GMMs, and support vector machines to perform classification. We outperform previously reported results on a publicly available scene classification dataset. These results suggest further experimentation in evaluating the promise of low-level segmentation in image classification.<br />

09:00-11:10, Paper ThAT8.51<br />

Learning Scene Semantics using Fiedler Embedding<br />

Liu, Jingen, Univ. of Michigan<br />

Ali, Saad, Carnegie Mellon Univ.<br />

We propose a framework to learn scene semantics from surveillance videos. Using the learnt scene semantics, a video analyst<br />

can efficiently and effectively retrieve the hidden semantic relationship between homogeneous and heterogeneous<br />

entities existing in the surveillance system. For learning scene semantics, the algorithm treats different entities as nodes<br />

in a graph, where weighted edges between the nodes represent the “initial” strength of the relationship between entities.<br />

The graph is then embedded into a k-dimensional space by Fiedler Embedding.<br />

09:00-11:10, Paper ThAT8.52<br />

Counting Vehicles in Highway Surveillance Videos<br />

Tamersoy, Birgi, The Univ. of Texas at Austin<br />

Aggarwal, J. K., The Univ. of Texas at Austin<br />

This paper presents a complete system for accurately and efficiently counting vehicles in highway surveillance video. The proposed approach employs vehicle detection and tracking modules. In the detection module, an automatically trained binary classifier detects vehicles while providing robustness against view-point changes, poor-quality video and clutter. Efficient tracking is then achieved by a simplified multi-hypothesis approach. First, an over-complete set of tracks is created by considering every observed detection within a time interval. As needed, hypothesized detections are generated to force continuous tracks. Finally, a scoring function is used to separate the valid tracks in the over-complete set. Our tracking system achieved accurate results on significantly challenging highway surveillance videos.<br />



09:00-11:10, Paper ThAT8.53<br />

Efficient 3D Upper Body Tracking with Self-Occlusions<br />

Chen, Jixu, RPI<br />

Ji, Qiang, RPI<br />

We propose an efficient 3D upper body tracking method which recovers the positions and orientations of six upper-body parts from a video sequence. Our method is based on a probabilistic graphical model (PGM), which incorporates the spatial relationships among the body parts, and a robust multi-view image likelihood using probabilistic PCA (PPCA). For efficiency, we use a tree-structured graphical model and particle-based belief propagation to perform the inference. Since our image likelihood is based on multiple views, we address self-occlusion by modeling the likelihood of each body part in each view and automatically decreasing the influence of the occluded view in the inference procedure.<br />
09:00-11:10, Paper ThAT8.54<br />

Track Initialization in Low Frame Rate and Low Resolution Videos<br />

Cuntoor, Naresh, Kitware Inc.<br />

Basharat, Arslan, Kitware Inc.<br />

Perera, A. G. Amitha, Kitware Inc.<br />

Hoogs, Anthony, Kitware Inc.<br />

The problem of object detection and tracking has received relatively little attention in low frame rate and low resolution videos. Here we focus on motion segmentation in videos where objects appear small (people less than 30 pixels tall) and the frame rate is low (less than 5 Hz). We study challenging cases where some otherwise successful approaches may break down. We investigate a number of popular techniques in computer vision that have been shown to be useful for discriminating various spatio-temporal signatures. These include the Histogram of Oriented Gradients (HOG), the Histogram of Oriented Optical Flow (HOF) and Haar features (Viola and Jones). We use these features to classify the motion segmentations into person vs. other and vehicle vs. other. We rely on aligned motion history images to create a more consistent object representation across frames. We present results with these features on webcam data and wide-area aerial video sequences.<br />

09:00-11:10, Paper ThAT8.55<br />

On the Performance of Handoff and Tracking in a Camera Network<br />

Li, Yiming, Univ. of California Riverside<br />

Bhanu, Bir, Univ. of California Riverside<br />

Nguyen, Vincent, Univ. of California Riverside<br />

Camera handoff is an important problem when using multiple cameras to follow a number of objects in a video network. However, almost all handoff techniques rely on a robust tracker. State-of-the-art techniques used to evaluate the performance of camera handoff use either annotated videos or simulated data, and the handoff performance is evaluated in conjunction with a tracker. This does not allow a deeper understanding of the performance of a tracker and of a handoff technique separately in real-world settings. In this paper, we evaluate three camera handoff techniques and two different color-based trackers in seven real-life cases, with varying numbers of cameras and objects and changing environmental conditions. We also perform experiments on annotated videos to provide the ground truth for all the scenarios. This evaluation of performance isolates the effects of the tracking and handoff techniques and clarifies their roles in a video network.<br />

09:00-11:10, Paper ThAT8.56<br />

Object Tracking with Ratio Cycles using Shape and Appearance Cues<br />

Sargin, Mehmet Emre, UC Santa Barbara<br />

Ghosh, Pratim, UC Santa Barbara<br />

Manjunath, B. S., UC Santa Barbara<br />

Rose, Kenneth, UC Santa Barbara<br />

We present a method for object tracking over time-sequence imagery. The image plane is represented with a 4-connected planar graph where vertices are associated with pixels. In each image, the outer contour of the object is localized by finding the optimal cycle in the graph such that a cost function based on temporal, appearance and shape priors is minimized. Our contribution is the particle-filtering-based framework that integrates the shape cue with the temporal and appearance cues. We demonstrate that incorporating the shape prior yields promising performance improvements over temporal and appearance priors in various object tracking scenarios.<br />

09:00-11:10, Paper ThAT8.57<br />

Real-Time Abnormal Event Detection in Complicated Scenes<br />

Shi, Yinghuan, Nanjing Univ.<br />

Gao, Yang, Nanjing Univ.<br />

Wang, Ruili, Massey Univ.<br />

In this paper, we propose a novel real-time abnormal event detection framework that requires a short training period and has a fast processing speed. Our approach is based on phase correlation and our newly developed spatial-temporal co-occurrence Gaussian mixture models (STCOG), with the following steps: (i) a frame is divided into non-overlapping local regions; (ii) phase correlation is used to estimate the motion vectors between two successive frames for all corresponding local regions; and (iii) the STCOG is used to model normal events and to detect abnormal events if any deviation from the trained STCOG is found. Our proposed approach is also able to update the parameters incrementally and can be applied in complicated scenes. The proposed approach outperforms previous ones in terms of shorter training periods and lower computational complexity.<br />
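
Step (ii), estimating a translation per local region by phase correlation, can be sketched directly (window size, pre-filtering and sub-pixel refinement are omitted):<br />

    import numpy as np

    def phase_correlation_shift(patch_a, patch_b):
        # patch_a, patch_b: same-sized grayscale regions from two successive frames
        A, B = np.fft.fft2(patch_a), np.fft.fft2(patch_b)
        R = A * np.conj(B)
        R /= np.abs(R) + 1e-12                      # normalized cross-power spectrum
        corr = np.fft.ifft2(R).real
        dy, dx = np.unravel_index(corr.argmax(), corr.shape)
        if dy > patch_a.shape[0] // 2:              # wrap into signed displacements
            dy -= patch_a.shape[0]
        if dx > patch_a.shape[1] // 2:
            dx -= patch_a.shape[1]
        return dx, dy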

ThAT9 Lower Foyer<br />

Human Computer Interaction and Biometrics Poster Session<br />

Session chair: Alba Castro, Jose Luis (Univ. of Vigo)<br />

09:00-11:10, Paper ThAT9.1<br />

Encoding Actions via Quantized Vocabulary of Averaged Silhouettes<br />

Wang, Liang, The Univ. of Melbourne<br />

Leckie, Christopher, The Univ. of Melbourne<br />

Human action recognition from video clips has received increasing attention in recent years. This paper proposes a simple<br />

yet effective method for the problem of action recognition. The method aims to encode human actions using the quantized<br />

vocabulary of averaged silhouettes that are derived from space-time windowed shapes and implicitly capture local temporal<br />

motion as well as global body shape. Experimental results on the publicly available Weizmann dataset have demonstrated<br />

that, despite its simplicity, our method is effective for recognizing actions, and is comparable to other state-of-the-art methods.<br />

09:00-11:10, Paper ThAT9.2<br />

Action Recognition using Space-Time Shape Difference Images<br />

Qu, Hao, The Univ. of Melbourne<br />

Wang, Liang, The Univ. of Melbourne<br />

Leckie, Christopher, The Univ. of Melbourne<br />

A common approach to human action recognition is to use 2-D silhouettes in the space-time volume as a basis for further<br />

extraction of useful features. In this paper, we present a novel motion representation based on difference images. We show<br />

that this representation exploits the dynamics of motion, and show its effectiveness in action recognition. Moreover, experimental<br />

results demonstrate that this method is highly accurate and is not sensitive to the resolution of the video.<br />

09:00-11:10, Paper ThAT9.3<br />

A Brain Computer Interface for Communication using Real-Time fMRI<br />

Eklund, Anders, Linköping Univ.<br />

Andersson, Mats, Linköping Univ.<br />

Ohlsson, Henrik, Linköping Univ.<br />

Ynnerman, Anders, Linköping Univ.<br />

Knutsson, Hans,<br />

We present the first step towards a brain computer interface (BCI) for communication using real-time functional magnetic resonance imaging (fMRI). The subject in the MR scanner sees a virtual keyboard and steers a cursor to select different letters that can be combined to create words. The cursor is moved to the left by activating the left hand, to the right by activating the right hand, down by activating the left toes, and up by activating the right toes. To select a letter, the subject simply rests for a number of seconds. We can thus communicate with the subject in the scanner by, for example, showing questions that the subject can answer. Similar BCIs for communication have been built with electroencephalography (EEG). In those implementations the subject, for example, focuses on a letter while different rows and columns of the virtual keyboard are flashing, and the system then tries to detect whether the correct letter is flashing or not. In our setup we instead classify the brain activity. Our system is not limited to a communication interface, but can be used for any interface where five degrees of freedom are necessary.<br />

09:00-11:10, Paper ThAT9.4<br />

Combined Top-Down/Bottom-Up Human Articulated Pose Estimation using AdaBoost Learning<br />

Wang, Sheng, Tsinghua Univ.<br />

Ai, Haizhou, Tsinghua Univ.<br />

Yamashita, Takayoshi, OMRON Corp.<br />

Lao, Shihong, OMRON Corp.<br />

In this paper, a novel human articulated pose estimation method based on the AdaBoost algorithm is presented. The articulated human pose is estimated by locating the major human joint positions. We learn classifiers on a normalized image for classifying each pixel position into a certain category. Two different kinds of classifiers, a bottom-up joint position classifier and a top-down skeleton classifier, are combined to achieve the final results. The HOG (Histogram of Oriented Gradients) feature is used for training both classifiers. Our human pose estimation system consists of three models: human detection, view classification, and pose estimation. The implemented system can automatically estimate the human pose for different views. Experimental results show that our proposed method can work on relatively small human images without using human silhouettes as a prerequisite, and that it is efficient, robust and accurate enough for potential applications in visual surveillance.<br />

09:00-11:10, Paper ThAT9.5<br />

The Human Action Image<br />

Sethi, Ricky, Univ. of California, Riverside<br />

Roy-Chowdhury, Amit, Univ. of California, Riverside<br />

Recognizing a person’s motion is intuitive for humans but represents a challenging problem in machine vision. In this<br />

paper, we present a multi-disciplinary framework for recognizing human actions. We develop a novel descriptor, the<br />

Human Action Image (HAI): a physically-significant, compact representation for the motion of a person, which we derive<br />

from first principles in physics using Hamilton’s Action. We embed the HAI as the Motion Energy Pathway of the latest<br />

Neurobiological model of motion recognition. The Form Pathway is modelled using existing low-level feature descriptors<br />

based on shape and appearance. Experimental validation of the theory is provided on the well-known Weizmann and USF<br />

Gait datasets.<br />

09:00-11:10, Paper ThAT9.6<br />

Combining Spatial and Temporal Information for Gait based Gender Classification<br />

Hu, Maodi, Beihang Univ.<br />

Wang, Yunhong, Beihang Univ.<br />

Zhang, Zhaoxiang, Beihang Univ.<br />

Wang, Yiding, North China Univ. of Tech.<br />

In this paper, we address the problem of gait-based gender classification. The Gabor feature, which is a new attempt for gait analysis, not only improves robustness to segmentation noise, but also provides a feasible way to purge additional influencing factors, such as clothing and carrying-condition changes, before supervised learning. Furthermore, by means of Maximization of Mutual Information (MMI), a low-dimensional discriminative representation is obtained as the Gabor-MMI feature. After that, gender-related Gaussian Mixture Model-Hidden Markov Models (GMM-HMMs) are constructed for classification. In this case, supervised learning reduces the dimension of the parameter space and significantly increases the gap between the likelihoods of the gender models. In order to assess the performance of our proposed approach, we compare it with other methods on the standard CASIA Gait Database (Dataset B). Experimental results demonstrate that our approach achieves a better Correct Classification Rate (CCR) than the state-of-the-art methods.<br />



09:00-11:10, Paper ThAT9.7<br />

A Vision-Based Taiwanese Sign Language Recognition System<br />

Huang, Chung-Lin, National Tsing-Hua Univ.<br />

Tsai, Bo-Lin, National Tsing-Hua Univ.<br />

This paper presents a vision-based continuous sign language recognition system to interpret Taiwanese Sign Language (TSL). Continuous sign language, which consists of a sequence of hold and movement segments, can be decomposed into non-signs and signs. The signs can be either static or dynamic: the former are found in hold segments, whereas the latter are identified in combinations of hold and movement segments. We use a Support Vector Machine (SVM) to recognize the static signs and apply an HMM to identify the dynamic signs. Finally, we use a finite state machine to verify the grammatical correctness of the recognized TSL sentence and to correct mis-recognized signs.<br />

09:00-11:10, Paper ThAT9.8<br />

Fusing Audio-Visual Nonverbal Cues to Detect Dominant People in Group Conversations<br />

Aran, Oya, Idiap Res. Inst.<br />

Gatica-Perez, Daniel,<br />

This paper addresses the multimodal nature of social dominance and presents multimodal fusion techniques to combine<br />

audio and visual nonverbal cues for dominance estimation in small group conversations. We combine the two modalities<br />

both at the feature extraction level and at the classifier level via score and rank level fusion. The classification is done by<br />

a simple rule-based estimator. We perform experiments on a new 10-hour dataset derived from the popular AMI meeting<br />

corpus. We objectively evaluate the performance of each modality and each cue alone and in combination. Our results<br />

show that the combination of audio and visual cues is necessary to achieve the best performance.<br />

09:00-11:10, Paper ThAT9.9<br />

Wavelet Domain Local Binary Pattern Features for Writer Identification<br />

Du, Liang, Huazhong Univ. of Science and Tech.<br />

You, Xinge, Huazhong Univ. of Science and Tech.<br />

Xu, Huihui, Huazhong Univ. of Science and Tech.<br />

Gao, Zhifan, Huazhong Univ. of Science and Tech.<br />

Tang, Yuanyan, Hong Kong Baptist Univ.<br />

The representation of writing styles is a crucial step in writer identification schemes. However, the large intra-writer variance makes it a challenging task; thus, a good feature of writing style plays a key role in writer identification. In this paper, we present a simple and effective feature for off-line, text-independent writer identification, namely wavelet-domain local binary patterns (WD-LBP). Based on WD-LBP, a writer identification algorithm is developed. WD-LBP is able to capture the essential characteristics of a writer while ignoring the variations intrinsic to every single writer. Unlike other texture-based frameworks, we do not impose any statistical distribution assumption in the proposed method. This prevents us from making possibly erroneous assumptions about the feature distributions of handwritten images. The experimental results show that the proposed writer identification method achieves high identification accuracy and outperforms recent writer identification methods such as the wavelet-GGD model and the Gabor filtering method.<br />
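
A rough sketch of a wavelet-domain LBP descriptor of the kind named above; the wavelet, decomposition level and LBP parameters are illustrative assumptions, and the authors' exact construction may differ:<br />

    import numpy as np
    import pywt
    from skimage.feature import local_binary_pattern

    def wd_lbp_histogram(image, wavelet="db1", level=2, P=8, R=1.0):
        # Decompose the handwriting image, then build uniform-LBP histograms per subband.
        coeffs = pywt.wavedec2(image.astype(float), wavelet, level=level)
        subbands = [coeffs[0]] + [band for detail in coeffs[1:] for band in detail]
        hists = []
        for sb in subbands:
            lbp = local_binary_pattern(sb, P, R, method="uniform")
            h, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
            hists.append(h)
        return np.concatenate(hists)                 # writer descriptor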

09:00-11:10, Paper ThAT9.10<br />

Audio-Visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space<br />

Nicolaou, Mihalis, Imperial Coll.<br />

Gunes, Hatice, Imperial Coll.<br />

Pantic, Maja, Imperial Coll.<br />

This paper focuses on the audio-visual (using facial expression, shoulder and audio cues) classification of spontaneous affect, utilising generative models for classification (i) in terms of Maximum Likelihood Classification, with the assumption that the generative model structure in the classifier is correct, and (ii) in terms of Likelihood Space Classification, with the assumption that the generative model structure in the classifier may be incorrect, so that classification performance can be improved by projecting the results of the generative classifiers onto a likelihood space and then using discriminative classifiers. Experiments are conducted by utilising Hidden Markov Models for single-cue classification, and 2- and 3-chain coupled Hidden Markov Models for fusing multiple cues and modalities. For discriminative classification, we utilise Support Vector Machines. Results show that Likelihood Space Classification (91.76%) improves on the performance of Maximum Likelihood Classification (79.1%). Thereafter, we introduce the concept of fusion in the likelihood space, which is shown to outperform the typically used model-level fusion, attaining a classification accuracy of 94.01% and further improving on all previous results.<br />

09:00-11:10, Paper ThAT9.12<br />

Improved Mandarin Keyword Spotting using Confusion Garbage Model<br />

Zhang, Shilei, IBM Res., China<br />

Shuang, Zhiwei, IBM Res., China<br />

Shi, Qin, IBM Res., China<br />

Qin, Yong, IBM Res., China<br />

This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model for Mandarin conversational speech. Observing the KWS corpus, we found that there are many words whose pronunciations are similar to those of the predefined keywords, even though they have different Chinese characters and different meanings; these easily result in a high false alarm rate. In this paper, an improved acoustic KWS method with confusion garbage models is developed that absorbs similarly pronounced words that are confused with specific keywords for a given task. One obvious advantage of such a method is that it provides a flexible framework to implement the selection procedure and reduce the false alarm rate effectively for a specific task. The efficiency of the proposed architecture was evaluated with HMM-based confidence measure (CM) methods and demonstrated on a conversational telephone dataset.<br />

09:00-11:10, Paper ThAT9.13<br />

Human Activity Recognition using Local Shape Descriptors<br />

Venkatesha, Sharath, Univ. of California, Santa Barbara<br />

Turk, Matthew, Univ. of California, Santa Barbara<br />

We propose a method for human activity recognition in videos based on shape analysis. We define local shape descriptors for interest points on the detected contour of the human action and build an action descriptor using a Bag of Features method. We also use the temporal relations among matching interest points across successive video frames. Further, an SVM is trained on these action descriptors to classify the activity in the scene. The method is invariant to the length of the video sequence, and hence it is suitable for online activity recognition. We have demonstrated the results on an action database consisting of nine actions, such as walk, jump and bend, performed by twenty people in indoor and outdoor scenarios. The proposed method achieves an accuracy of 87% and is comparable to other state-of-the-art methods.<br />

09:00-11:10, Paper ThAT9.14<br />

Use of Line Spectral Frequencies for Emotion Recognition from Speech<br />

Bozkurt, Elif, Koc Univ.<br />

Erzin, Engin, Koc Univ.<br />

Eroglu Erdem, Cigdem, Bahcesehir Univ.<br />

Erdem, Arif Tanju, Ozyegin Univ.<br />

We propose the use of line spectral frequency (LSF) features for emotion recognition from speech; to the best of our knowledge, these have not been previously employed for emotion recognition. Spectral features such as mel-scaled cepstral coefficients have already been used successfully for the parameterization of speech signals for emotion recognition. The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant structure as well, which is related to the emotional state of the speaker [4]. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed on the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates.<br />

09:00-11:10, Paper ThAT9.15<br />

Spatially Regularized Common Spatial Patterns for EEG Classification<br />

Lotte, Fabien, Inst. for Infocomm Res.<br />

Guan, Cuntai, Inst. for Infocomm Res.<br />



In this paper, we propose a new algorithm for Brain-Computer Interfaces (BCI): Spatially Regularized Common Spatial Patterns (SRCSP). SRCSP is an extension of the well-known CSP algorithm which includes spatial a priori information in the learning process by adding a regularization term that penalizes spatially non-smooth filters. We compared the SRCSP and CSP algorithms on data from 14 subjects from BCI competitions. The results suggest that SRCSP can improve performance, by around 10% in classification accuracy, for subjects with poor CSP performance. They also suggest that SRCSP leads to more physiologically relevant filters than CSP.<br />
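
The spatially regularized CSP idea can be sketched as a penalized generalized eigenvalue problem; the construction of the roughness penalty K and the weight alpha below are assumptions, not the paper's exact formulation:<br />

    import numpy as np
    from scipy.linalg import eigh

    def spatially_regularized_csp(cov_a, cov_b, K, alpha=0.1, n_pairs=3):
        # cov_a, cov_b: class-wise spatial covariance matrices of band-passed EEG;
        # K: spatial roughness penalty (e.g. built from inter-electrode distances).
        eigvals, eigvecs = eigh(cov_a, cov_a + cov_b + alpha * K)
        order = np.argsort(eigvals)
        picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])  # both extremes
        return eigvecs[:, picks]                     # spatial filters, one per column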

09:00-11:10, Paper ThAT9.16<br />

Comparing Multiple Classifiers for Speech-Based Detection of Self-Confidence – A Pilot Study<br />

Krajewski, Jarek, Univ. of Wuppertal<br />

Batliner, Anton, Univ. of Erlangen-Nuremberg<br />

Kessel, Silke, Univ. of Wuppertal<br />

The aim of this study is to compare several classifiers commonly used in the field of speech emotion recognition (SER) on the speech-based detection of self-confidence. A standard acoustic feature set was computed, resulting in 170 features per one-minute speech sample (e.g. fundamental frequency, intensity, formants, MFCCs). In order to identify speech correlates of self-confidence, the lectures of 14 female participants were recorded, resulting in 306 one-minute segments of speech. Five expert raters independently assessed the impression of self-confidence. Several classification models (e.g. Random Forest, Support Vector Machine, Naive Bayes, Multi-Layer Perceptron) and ensemble classifiers (AdaBoost, Bagging, Stacking) were trained. AdaBoost procedures turned out to achieve the best performance, both for single models (AdaBoost LR: 75.2% class-wise averaged recognition rate) and for average boosting (59.3%), within speaker-independent settings.<br />

09:00-11:10, Paper ThAT9.17<br />

Hierarchical Human Action Recognition by Normalized-Polar Histogram<br />

Ziaeefard, Maryam, Sahand Univ. of Tech.<br />

Ebrahimnezhad, Hossein, Sahand Univ. of Tech.<br />

This paper proposes a novel human action recognition approach which represents each video sequence by a cumulative skeletonized image (CSI) over one action cycle. A normalized polar histogram corresponding to each CSI is computed; that is, each bin counts the number of CSI pixels located at a certain distance and angle within the normalized circle. Human actions are then recognized using hierarchical classification in two levels. In the first level, coarse classification is performed with all bins of the histogram. In the second level, the more similar actions are examined again, employing selected bins, and the fine classification is completed. We use a linear multi-class SVM as the classifier in both levels. The real human action dataset Weizmann is selected for evaluation. The resulting average recognition rate of the proposed method is 97.6%.<br />
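
A rough sketch of such a normalized polar histogram over a binary CSI; the centroid-based center, normalization and bin counts are illustrative assumptions:<br />

    import numpy as np

    def normalized_polar_histogram(csi, n_radial=5, n_angular=12):
        # csi: binary cumulative skeletonized image (non-zero pixels = skeleton)
        ys, xs = np.nonzero(csi)
        cy, cx = ys.mean(), xs.mean()                           # shape centroid
        r = np.hypot(ys - cy, xs - cx)
        theta = np.arctan2(ys - cy, xs - cx) + np.pi            # angles in [0, 2*pi]
        r = r / (r.max() + 1e-12)                               # normalize the circle
        hist, _, _ = np.histogram2d(r, theta, bins=[n_radial, n_angular],
                                    range=[[0, 1], [0, 2 * np.pi]])
        return (hist / hist.sum()).ravel()                      # bin = pixel fraction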

09:00-11:10, Paper ThAT9.18<br />

Automatic 3D Facial Expression Recognition based on a Bayesian Belief Net and a Statistical Facial Feature Model<br />

Zhao, Xi, Ec. Centrale de Lyon<br />

Huang, Di, Ec. Centrale de Lyon<br />

Dellandréa, Emmanuel, Ec. Centrale de Lyon<br />

Chen, Liming, Ec. Centrale de Lyon<br />

Automatic facial expression recognition on 3D face data is still a challenging problem. In this paper we propose a novel approach to perform expression recognition automatically and flexibly by combining a Bayesian Belief Net (BBN) and statistical facial feature models (SFAM). A novel BBN is designed for this specific problem with our proposed parameter computation method. By learning global variations in face landmark configuration (morphology) and local ones in terms of texture and shape around landmarks, the morphable Statistical Facial feature Model (SFAM) allows us not only to perform automatic landmarking but also to compute the beliefs that feed the BBN. Tested on the public 3D facial expression database BU-3DFE, our automatic approach recognizes expressions successfully, reaching an average recognition rate of over 82%.<br />



09:00-11:10, Paper ThAT9.19<br />

EEG-Based Personal Identification: From Proof-of-Concept to a Practical System<br />

Su, Fei, Beijing Univ. of Posts and Telecommunications<br />

Xia, Liwen, Beijing Univ. of Posts and Telecommunications<br />

Ma, Junshui, Merck Res. Lab. Merck & Co, Inc.<br />

Although the concept of using brain waves, e.g. the electroencephalogram (EEG), for personal identification has been validated in several studies, some unanswered practical and theoretical questions prevent this technology from being developed further towards commercialization. Based on a well-designed personal identification experiment using EEG recordings, this study addressed three of these questions: (1) the feasibility of using portable EEG equipment, (2) the necessity of controlling factors that influence the EEG, and (3) the optimal set of features. With our understanding of the answers to these questions, the EEG-based personal identification system we built achieved an average accuracy of 97.5% on a dataset with 40 subjects. The results of this study provide supporting evidence that taking EEG-based personal identification from proof-of-concept to system implementation is promising.<br />

09:00-11:10, Paper ThAT9.20<br />

Improved Facial Expression Recognition with Trainable 2-D Filters and Support Vector Machines<br />

Peiyao, Li, Univ. of Wollongong<br />

Phung, Son Lam, Univ. of Wollongong<br />

Bouzerdoum, Abdesselam, Univ. of Wollongong<br />

Tivive, Fok Hing Chi, Univ. of Wollongong<br />

Facial expression is one way humans convey their emotional states. Accurate recognition of facial expressions is essential<br />

in perceptual human-computer interface, robotics and mimetic games. This paper presents a novel approach to facial expression<br />

recognition from static images that combines fixed and adaptive 2-D filters in a hierarchical structure. The fixed<br />

filters are used to extract primitive features. They are followed by the adaptive filters that are trained to extract more complex<br />

facial features. Both types of filters are non-linear and are based on the biological mechanism of shunting inhibition.<br />

The features are finally classified by a support vector machine. The proposed approach is evaluated on the JAFFE database<br />

with seven types of facial expressions: anger, disgust, fear, happiness, neutral, sadness and surprise. It achieves a classification<br />

rate of 96.7%, which compares favorably with several existing techniques for facial expression recognition tested<br />

on the same database.<br />

09:00-11:10, Paper ThAT9.21<br />

A Biologically-Inspired Top-Down Learning Model based on Visual Attention<br />

Sang, Nong, Huazhong Univ. of Science and Tech.<br />

Wei, Longsheng, Huazhong Univ. of Science and Tech.<br />

Wang, Yuehuan, Huazhong Univ. of Science and Tech.<br />

A biologically inspired top-down learning model based on visual attention is proposed in this paper. Low-level visual features are extracted from the learning object itself and do not depend on the background information. All the features are expressed as a feature vector, which is regarded as a random variable following a normal distribution, so every learning object is represented by its mean and standard deviation. All the learning objects are combined into an object class, which is represented by a class mean and a class standard deviation stored in long-term memory (LTM). The learned knowledge is then used to find similar locations in an attended image. Experimental results indicate that when the attended object does not appear against a background similar to that of the learning objects, or when the object combinations change greatly between the learning images and the attended images, our model performs better than the top-down approach of VOCUS and Navalpakkam's statistical model.<br />

09:00-11:10, Paper ThAT9.22<br />

Human Action Recognition using Segmented Skeletal Features<br />

Yoon, Sang Min, Tech. Univ. of Darmstadt<br />

Kuijper, Arjan, Fraunhofer IGD<br />

We present a novel human action recognition system based on segmented skeletal features which are separated into several<br />

human body parts such as face, torso and limbs. Our proposed human action recognition system consists of two steps: (i)<br />

automatic skeletal feature extraction and splitting by measuring the similarity in the space of diffusion tensor fields, and<br />

(ii) multiple kernel Support Vector Machine based human action recognition. Experimental results on a set of test databases<br />

show that our proposed method is efficient and effective at recognizing human actions using few parameters, independent<br />

of dimensions, shadows, and viewpoints.<br />

09:00-11:10, Paper ThAT9.23<br />

Action Recognition by Multiple Features and Hyper-Spheremulti-Class SVM<br />

Liu, Jia, Shanghai Jiao Tong Univ.<br />

Yang, Jie, Shanghai Jiao Tong Univ.<br />

Zhang, Yi, Shanghai Jiao Tong Univ.<br />

He, Xiangjian, University of Technology, Sydney<br />

In this paper we propose a novel framework based on multiple features to improve action recognition<br />

in videos. The fusion of multiple features is important for recognizing actions as often a single feature based representation<br />

is not enough to capture the imaging variations (view-point, illumination etc.) and attributes of individuals (size, age,<br />

gender etc.). Hence, we use two kinds of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (cuboids<br />

and 2-D SIFT), and ii) the higher-order statistical models of interest points, which aim to capture the global information<br />

of the actor. We construct video representation in terms of local space-time features and global features and integrate such<br />

representations with hyper-sphere multi-class SVM. Experiments on publicly available datasets show that our proposed<br />

approach is effective. An additional experiment shows that using both local and global features provides a richer representation<br />

of human action when compared to the use of a single feature type.<br />

09:00-11:10, Paper ThAT9.24<br />

Multimodal Recognition of Cognitive Workload for Multitasking in the Car<br />

Putze, Felix, Karlsruhe Inst. of Tech.<br />

Jarvis, Jan-Philip, Karlsruhe Inst. of Tech.<br />

Schultz, Tanja, Univ. Karlsruhe<br />

This work describes the development and evaluation of a recognizer for different levels of cognitive workload in the car.<br />

We collected multiple biosignal streams (skin conductance, pulse, respiration, EEG) during an experiment in a driving<br />

simulator in which the drivers performed a primary driving task and several secondary tasks of varying difficulty. From<br />

this data, an SVM based workload classifier was trained and evaluated, yielding recognition rates of up to for three levels<br />

of workload.<br />

09:00-11:10, Paper ThAT9.25<br />

Automatic Facial Action Detection using Histogram Variation between Emotional States<br />

Senechal, Thibaud, ISIR, UPMC<br />

Bailly, Kevin, Univ. PIERRE 1 MARIE CURIE - PARIS 6<br />

Prevost, Lionel, Univ. PIERRE 1 MARIE CURIE - PARIS 6<br />

This article presents an appearance-based method to automatically detect facial actions. Our approach focuses on reducing<br />

the features’ sensitivity to the identity of the subject. We compute from an expressive image a Local Gabor Binary Pattern (LGBP)<br />

histogram and synthesize an LGBP histogram approaching the one we would compute on a neutral face. Differences between<br />

these two histograms are used as inputs to Support Vector Machine (SVM) binary detectors associated with a new kernel:<br />

the Histogram Difference Intersection (HDI) kernel. Experimental results carried out for 16 Action Units (AUs) on the<br />

benchmark Cohn-Kanade database compare favorably with two state-of-the-art methods.<br />
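
The abstract does not spell out the HDI kernel, so the sketch below assumes a plain histogram-intersection kernel applied to hypothetical histogram-difference vectors, mainly to show how a custom kernel plugs into an SVM detector.

```python
# Sketch of an SVM with a histogram-intersection style kernel on LGBP-histogram
# differences; the exact HDI kernel of the paper is not reproduced here.
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(A, B):
    # K[i, j] = sum_k min(A[i, k], B[j, k])  (classic histogram intersection)
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

# Each row: difference between an expressive-face histogram and a synthesized
# neutral-face histogram (hypothetical data below).
rng = np.random.default_rng(0)
X = np.abs(rng.random((40, 256)))
y = rng.integers(0, 2, size=40)          # AU present / absent

clf = SVC(kernel=intersection_kernel).fit(X, y)
print(clf.predict(X[:5]))
```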

09:00-11:10, Paper ThAT9.27<br />

Decoding Finger Flexion from Electrocorticographic Signals using Sparse Gaussian Process<br />

Wang, Zuoguan, RPI<br />

Ji, Qiang, RPI<br />

Schalk, Gerwin, NYS Dept of Health<br />

Miller, Kai J., Univ. of Washington,<br />

A brain-computer interface (BCI) creates a direct communication pathway between the brain and an external device, and<br />

can thereby restore function in people with severe motor disabilities. A core component in a BCI system is the decoding<br />

algorithm that translates brain signals into action commands for an output device. Most current decoding algorithms are<br />

based on linear models (e.g., derived using linear regression) that may have important shortcomings. The use of nonlinear<br />

models (e.g., neural networks) could overcome some of these shortcomings, but has difficulties with high dimensional<br />

feature spaces. Here we propose another decoding algorithm that is based on the sparse Gaussian process with pseudo-inputs<br />

(SPGP). As a nonparametric method, it can model more complex relationships compared to linear methods. As a<br />

kernel method, it can readily deal with high dimensional feature space. The evaluations shown in this paper demonstrate<br />

that SPGP can decode the flexion of finger movements from electrocorticographic (ECoG) signals more accurately than<br />

a previously described algorithm that used a linear model. In addition, by formulating problems in the Bayesian probabilistic<br />

framework, SPGP can provide estimation of the prediction uncertainty. Furthermore, the trained SPGP offers a very effective<br />

way for identifying important features.<br />
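
As a loose stand-in for SPGP, the sketch below uses ordinary Gaussian-process regression on synthetic data to illustrate the decoding idea and the predictive uncertainty; the sparse pseudo-input approximation itself is not implemented.

```python
# Plain Gaussian-process regression as a stand-in for the paper's sparse
# pseudo-input GP (SPGP); it shows the regression-based decoding and the
# predictive uncertainty, not the sparse approximation. Data are synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))                    # ECoG-derived features (hypothetical)
w = rng.normal(size=16)
y = np.tanh(X @ w) + 0.1 * rng.normal(size=200)   # finger flexion trace (synthetic)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X[:150], y[:150])
mean, std = gp.predict(X[150:], return_std=True)  # prediction plus uncertainty
print(mean[:3], std[:3])
```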

09:00-11:10, Paper ThAT9.28<br />

Hand Pointing Estimation for Human Computer Interaction based on Two Orthogonal-Views<br />

Hu, Kaoning, State Univ. of New York at Binghamton<br />

Canavan, Shaun, State Univ. of New York at Binghamton<br />

Yin, Lijun, State Univ. of New York at Binghamton<br />

Hand pointing has been an intuitive gesture for human interaction with computers. Big challenges are still posed for accurate<br />

estimation of finger pointing direction in 3D space. In this paper, we present a novel hand pointing estimation<br />

system based on two regular cameras, which includes hand region detection, hand finger estimation, two views’ feature<br />

detection, and 3D pointing direction estimation. Based on the idea of a binary-pattern face detector, we extend the work to<br />

hand detection, in which a polar coordinate system is proposed to represent the hand region, and achieve a good result<br />

in terms of robustness to hand orientation variation. To estimate the pointing direction, we applied an AAM-based approach<br />

to detect and track 14 feature points along the hand contour from a top view and a side view. Combining two views<br />

of the hand features, the 3D pointing direction is estimated. The experiments have demonstrated the feasibility of the system.<br />

09:00-11:10, Paper ThAT9.29<br />

A Brain-Computer Interface for Mental Arithmetic Task from Single-Trial Near-Infrared Spectroscopy Brain Signals<br />

Ang, Kai Keng, Inst. for Infocomm Res. A*STAR<br />

Guan, Cuntai, Inst. for Infocomm Res.<br />

Lee, Kerry, National Inst. of Education<br />

Lee, Jie Qi, National Inst. of Education<br />

Nioka, Shoko, Univ. of Pennsylvania<br />

Chance, Britton, Univ. of Pennsylvania<br />

Near-infrared spectroscopy (NIRS) enables non-invasive recording of cortical hemoglobin oxygenation in human subjects<br />

through the intact skull using light in the near-infrared range. Recently, NIRS-based brain-computer interfaces<br />

have been introduced for discriminating left- and right-hand motor imagery. A neuroimaging study has also revealed event-related<br />

hemodynamic responses associated with the performance of mental arithmetic tasks. This paper proposes a novel BCI for<br />

detecting changes resulting from increases in the magnitude of operands used in a mental arithmetic task, using data from<br />

single-trial NIRS brain signals. We measured hemoglobin responses from 20 healthy subjects as they solved mental arithmetic<br />

problems with three difficulty levels. Accuracy in recognizing one difficulty level from another is then presented<br />

using 5 × 5-fold cross-validation on the data collected. The results yielded an overall average accuracy of 71.2%, thus<br />

demonstrating the potential of the proposed NIRS-based BCI in recognizing the difficulty of problems encountered by mental<br />

arithmetic problem solvers.<br />

09:00-11:10, Paper ThAT9.30<br />

Articulated Human Body: 3D Pose Estimation using a Single Camera<br />

Wang, Zibin, The Chinese Univ. of Hong Kong<br />

Chung, Chi-Kit Ronald, The Chinese Univ. of Hong Kong<br />

We address how human pose in 3D can be tracked from a monocular video using a probabilistic inference method. The human<br />

body is modeled as a number of cylinders in space, each with an appearance facet as well as a pose facet. The appearance<br />

facets are acquired in a learning phase from some beginning frames of the input video. On this the visual hull description<br />

of the target human subject constructed from multiple images is found to be instrumental. In the operation phase, the 3D<br />

pose of the target subject in the subsequent frames of the input video is tracked. A bottom-up framework is used, which<br />

for any current image frame extracts firstly the tentative candidates of each body part in the image space. The human<br />

model, with the appearance facets already learned, and with the pose entries initialized with those for the previous image<br />

frame, is then brought in under a belief propagation algorithm, to establish correlation with the above 2D body part candidates<br />

while enforcing the proper articulation between the body parts, thereby determining the 3D pose of the human<br />

body in the current frame. The tracking performance on a number of monocular videos is shown.<br />

09:00-11:10, Paper ThAT9.31<br />

Resampling Approach to Facial Expression Recognition using 3D Meshes<br />

Murthy, O. V. Ramana, NUS<br />

Venkatesh, Y. V., NUS<br />

Kassim, Ashraf, NUS<br />

We propose a novel strategy, based on resampling of 3D meshes, to recognize facial expressions. This entails conversion<br />

of the existing irregular 3D mesh structure in the database to a uniformly sampled 3D matrix structure. An important consequence<br />

of this operation is that the classical correspondence problem can be dispensed with. In the present paper, in<br />

order to demonstrate the feasibility of the proposed strategy, we employ only spectral flow matrices as features to recognize<br />

facial expressions. Experimental results are presented, along with suggestions for possible refinements to the strategy to<br />

improve classification accuracy.<br />

09:00-11:10, Paper ThAT9.33<br />

Facial Expression Mimicking System<br />

Fukui, Ryuichi, Toyohashi Univ. of Tech.<br />

Katsurada, Kouichi, Toyohashi Univ. of Tech.<br />

Iribe, Yurie, Toyohashi Univ. of Tech.<br />

Nitta, Tsuneo, Toyohashi Univ. of Tech.<br />

We propose a facial expression mimicking system that copies the facial expression of one person on the image of another.<br />

The system uses the active appearance model (AAM), a commonly used model in the field of facial expression processing.<br />

AAM compositionally comprises some parameters representing facial shape, brightness, and illumination environment.<br />

Therefore, in addition to the facial expression elements, the model parameters express other elements, such as individuality<br />

and direction of the face. In order to extract the facial expression elements from compositional parameters of AAM, we<br />

applied principal component analysis (PCA) to the AAM parameter values, collected with each change in facial expression.<br />

The obtained facial expression model is applied to the facial expression mimicking system and the experiment shows its<br />

effectiveness for mimicking.<br />

09:00-11:10, Paper ThAT9.34<br />

A Framework for Hand Gesture Recognition and Spotting using Sub-Gesture Modeling<br />

Malgireddy, Manavender, Univ. at Buffalo, SUNY<br />

Corso, Jason, Univ. at Buffalo, SUNY<br />

Setlur, Srirangaraj, Univ. at Buffalo<br />

Govindaraju, Venu, Univ. at Buffalo<br />

Mandalapu, Dinesh, HP Lab.<br />

Hand gesture interpretation is an open research problem in Human Computer Interaction (HCI), which involves locating<br />

gesture boundaries (Gesture Spotting) in a continuous video sequence and recognizing the gesture. Existing techniques<br />

model each gesture as a temporal sequence of visual features extracted from individual frames, which is not efficient due<br />

to the large variability of frames at different timestamps. In this paper, we propose a new sub-gesture modeling approach<br />

which represents each gesture as a sequence of fixed sub-gestures (a group of consecutive frames with locally coherent<br />

context) and provides a robust modeling of the visual features. We further extend this approach to the task of gesture spotting<br />

where the gesture boundaries are identified using a filler model and gesture completion model. Experimental results<br />

show that the proposed method outperforms state-of-the-art Hidden Conditional Random Fields (HCRF) based methods<br />

and baseline gesture spotting techniques.<br />

09:00-11:10, Paper ThAT9.35<br />

Off-Line Signature Verification using Graphical Model<br />

Lv, Hairong<br />

Bai, Xinxin, IBM Res. – China<br />

Yin, Wenjun, IBM Res. – China<br />

Dong, Jin, IBM Res. – China<br />

In this paper, we propose a novel probabilistic graphical model to address the off-line signature verification problem. Different<br />

from previous work, our approach introduces the concept of feature roles according to their distribution in genuine<br />

and forgery signatures, with all these features represented by a unique graphical model. We also propose several new techniques<br />

to improve the performance of the new signature verification system. Results based on 200 persons’ signatures<br />

(16000 signature samples) indicate that the proposed method outperforms other popular techniques for off-line signature<br />

verification with a great improvement.<br />

09:00-11:10, Paper ThAT9.36<br />

Linear Facial Expression Transfer with Active Appearance Models<br />

De La Hunty, Miles, Australian National Univ.<br />

Asthana, Akshay, Australian National Univ.<br />

Goecke, Roland, Univ. of Canberra<br />

The issue of transferring facial expressions from one person’s face to another’s has been an area of interest for the movie<br />

industry and the computer graphics community for quite some time. In recent years, with the proliferation of online image<br />

and video collections and web applications, such as Google Street View, the question of preserving privacy through face<br />

de-identification has gained interest in the computer vision community. In this paper, we focus on the problem of realtime<br />

dynamic facial expression transfer using an Active Appearance Model framework. We provide a theoretical foundation<br />

for a generalisation of two well-known expression transfer methods and demonstrate the improved visual quality of the<br />

proposed linear extrapolation transfer method on examples of face swapping and expression transfer using the AVOZES<br />

data corpus. Realistic talking faces can be generated in real-time at low computational cost.<br />

09:00-11:10, Paper ThAT9.37<br />

Fractal and Multi-Fractal for Arabic Offline Writer Identification<br />

Chaabouni, Aymen, Univ. of Sfax<br />

Boubaker, Houcine, Univ. of Sfax<br />

Kherallah, Monji, Univ. of Sfax<br />

El Abed, Haikal, Technische Universitat Braunschweig<br />

Alimi, Adel M., Univ. of Sfax<br />

In recent years, fractal and multi-fractal analysis have been widely applied in many domains, especially in the field of<br />

image processing. In this direction we present in this paper a novel method for Arabic text-dependent writer identification<br />

based on fractal and multi-fractal features; thus, from the images of Arabic words, we calculate their fractal dimensions<br />

by using the Box-counting method, then we calculate their multi-fractal dimensions by using the method of DLA (Diffusion<br />

Limited Aggregates). To evaluate our method, we used 50 writers from the ADAB database, each of whom wrote 288 words<br />

(24 Tunisian cities repeated 12 times), with 2/3 of the words used for the learning phase and the rest used for identification.<br />

The results obtained using a k-nearest neighbor classifier demonstrate the effectiveness of our proposed method.<br />
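
A minimal box-counting estimate of the fractal dimension of a binary word image might look like the sketch below; the multi-fractal DLA analysis of the paper is not shown and the toy image is hypothetical.

```python
# Minimal box-counting fractal dimension for a binary word image.
import numpy as np

def box_counting_dimension(binary_img, sizes=(2, 4, 8, 16, 32)):
    counts = []
    for s in sizes:
        # Number of s x s boxes containing at least one foreground pixel.
        h, w = binary_img.shape
        boxes = 0
        for i in range(0, h, s):
            for j in range(0, w, s):
                if binary_img[i:i + s, j:j + s].any():
                    boxes += 1
        counts.append(boxes)
    # Slope of log(count) against log(1/size) estimates the fractal dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

img = np.zeros((64, 64), dtype=bool)
img[::3, :] = True                      # toy "stroke" pattern
print(box_counting_dimension(img))
```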

09:00-11:10, Paper ThAT9.38<br />

A Simulation Study on the Generative Neural Ensemble Decoding Algorithms<br />

Kim, Sung-Phil, Korea Univ.<br />

Kim, Min-Ki, Korea Univ.<br />

Park, Gwi-Tae, Korea Univ.<br />

Brain-computer interfaces rely on accurate decoding of cortical activity to understand intended action. Algorithms for<br />

neural decoding can be broadly categorized into two groups: direct versus generative methods. Two generative models,<br />

the population vector algorithm (PVA) and the Kalman filter (KF), have been widely used for many intracortical BCI studies,<br />

where KF generally showed superior decoding to PVA. However, little has been known for which conditions each algorithm<br />

works properly and how KF translates the ensemble information. To address these questions, we performed a<br />

simulation study and demonstrated that KF and PVA worked congruently for uniformly distributed preferred directions<br />

(PDs) whereas KF outperformed PVA for non-uniform PDs. In addition, we showed that KF decoded better than PVA for<br />

low signal-to-noise ratio (SNR) or a small ensemble size. The results suggest that KF may decode direction better than<br />

PVA with non-uniform PDs or with low SNR and small ensemble size.<br />
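
For readers unfamiliar with the population vector algorithm, a toy decode under a cosine-tuning assumption could look like the sketch below; the Kalman-filter comparison of the paper is not reproduced.

```python
# Toy population-vector decode: each neuron's rate modulation weights its
# preferred direction (PD); the resultant vector gives the decoded direction.
import numpy as np

rng = np.random.default_rng(2)
n_neurons = 50
pds = rng.uniform(0, 2 * np.pi, n_neurons)        # preferred directions
true_dir = np.pi / 4

# Cosine tuning around a baseline rate, plus noise (a common simulation choice).
rates = 1.0 + np.cos(true_dir - pds) + 0.2 * rng.normal(size=n_neurons)
weights = rates - rates.mean()                     # rate modulation

pop_vec = (weights[:, None] * np.c_[np.cos(pds), np.sin(pds)]).sum(axis=0)
decoded = np.arctan2(pop_vec[1], pop_vec[0])
print(np.degrees(true_dir), np.degrees(decoded) % 360)
```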

09:00-11:10, Paper ThAT9.39<br />

3D Active Shape Model for Automatic Facial Landmark Location Trained with Automatically Generated Landmark<br />

Points<br />

Zhou, Dianle, TMSP<br />

Petrovska-Delacretaz, Dijana, Inst. Telecom SudParis (ex GET-INT)<br />

Dorizzi, Bernadette, TELECOM & Management SudParis<br />

In this paper, a 3D Active Shape Model (3DASM) algorithm is presented to automatically locate facial landmarks from different<br />

views. The 3DASM is trained by setting different shape and texture parameters of 3D Morphable Model (3DMM).<br />

Using 3DMM to synthesize training data offers us two advantages: first, few manual operations are needed, except labeling<br />

landmarks on the mean face of 3DMM. Second, since the learning data are directly from 3DMM, landmarks have one to one<br />

correspondence between the 2D points detected from the image and 3D points on 3DMM. This kind of correspondence will<br />

benefit 3D face reconstruction processing. During fitting, 3D rotation parameters are added compared to the 2D Active Shape<br />

Model (ASM), so we separate shape variations into intrinsic change (caused by the characteristics of different persons) and extrinsic<br />

change (caused by model projection). The experimental results show that our method is robust to pose variation.<br />

09:00-11:10, Paper ThAT9.40<br />

Using Moments on Spatiotemporal Plane for Facial Expression Recognition<br />

Ji, Yi, INSA de Lyon<br />

Idrissi, Khalid, INSA de Lyon<br />

In this paper, we propose a novel approach to capture the dynamic deformation caused by facial expressions. The proposed<br />

method focuses on the spatiotemporal plane, which is not well explored. It uses moments as features to describe<br />

the movements of essential components such as eyes and mouth on the vertical time plane. The system we developed can automatically<br />

recognize the expression on images as well as on image sequences. The experiments are performed on 348 sequences<br />

from 95 subjects in the Cohn-Kanade database and obtain good results as high as 96.1% in 7-class recognition for<br />

frames and 98.5% in 6-class for sequences.<br />

09:00-11:10, Paper ThAT9.41<br />

Towards a More Realistic Appearance-Based Gait Representation for Gender Recognition<br />

Martín-Félez, Raúl, Univ. Jaume I<br />

Mollineda, Ramón A., Univ. Jaume I<br />

Sanchez, J. Salvador, Univ. Jaume I<br />

A realistic appearance-based representation of side-view gait sequences is here introduced. It is based on a prior method<br />

where a set of appearance-based features of a gait sample is used for gender recognition. These features are computed from<br />

parameter values of ellipses that fit body parts enclosed by regions previously defined while ignoring well-known facts of<br />

the human body structure. This work presents an improved regionalization method supported by some adaptive heuristic<br />

rules to better adjust regions to body parts. As a result, more realistic ellipses and a more meaningful feature space are obtained.<br />

Gender recognition experiments conducted on the CASIA Gait Database show better classification results when using<br />

the new features.<br />

09:00-11:10, Paper ThAT9.42<br />

A Calibration-Free Head Gesture Recognition System with Online Capability<br />

Wöhler, Nils-Christian, Bielefeld Univ.<br />

Großekathöfer, Ulf, Bielefeld Univ.<br />

Dierker, Angelika, Bielefeld Univ.<br />

Hanheide, Marc, Univ. of Birmingham<br />

Kopp, Stefan, Bielefeld Univ.<br />

Hermann, Thomas, Bielefeld Univ.<br />

In this paper, we present a calibration-free head gesture recognition system using a motion-sensor-based approach. For<br />

data acquisition we conducted a comprehensive study with 10 subjects. We analyzed the resulting head movement data<br />

with regard to separability and transferability to new subjects. Ordered means models (OMMs) were used for classification,<br />

since they provide an easy-to-use, fast, and stable approach to machine learning of time series. As a result, we achieved classification<br />

rates of 85-95% for nodding, head shaking and tilting head gestures and good transferability. Finally, we show<br />

first promising attempts towards online recognition.<br />

09:00-11:10, Paper ThAT9.43<br />

TrajAlign: A Method for Precise Matching of 3-D Trajectories<br />

Aung, Zeyar, Inst. for Infocomm Res. Singapore<br />

Sim, Kelvin, Inst. for Infocomm Res. Singapore<br />

Ng, Wee Siong, Inst. for Infocomm Res. Singapore<br />

Matching two 3-D trajectories is an important task in a number of applications. The trajectory matching problem can be<br />

solved by aligning the two trajectories and taking the alignment score as their similarity measurement. In this paper, we<br />

propose a new method called “TrajAlign” (Trajectory Alignment). It aligns two trajectories by means of aligning their<br />

representative distance matrices. Experimental results show that our method is significantly more precise than the existing<br />

state-of-the-art methods. While the existing methods can provide correct answers in only up to 67% of the test cases, TrajAlign<br />

can offer correct results in 79% (i.e. 12% more) of the test cases. TrajAlign is also computationally inexpensive,<br />

and can be used practically for applications that demand efficiency.<br />
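
One plausible reading of the "representative distance matrix" idea is sketched below: a 3-D trajectory summarized by its pairwise point-distance matrix, which is invariant to rotation and translation. The alignment step of TrajAlign itself is not shown and the trajectory is synthetic.

```python
# Summarise a 3-D trajectory by its matrix of pairwise point distances.
import numpy as np
from scipy.spatial.distance import cdist

trajectory = np.cumsum(np.random.rand(100, 3), axis=0)   # toy 3-D trajectory
D = cdist(trajectory, trajectory)                         # 100 x 100 distance matrix
print(D.shape, D[0, :5])
```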

09:00-11:10, Paper ThAT9.44<br />

Real-Time 3D Model based Gesture Recognition for Multimedia Control<br />

Lin, Shih-Yao, National Taiwan Univ.<br />

Lai, Yun-Chien, National Taiwan Univ.<br />

Chan, Li-Wei, National Taiwan Univ.<br />

Hung, Yi-Ping, National Taiwan Univ.<br />

This paper presents a new 3D model-based gesture tracking system for controlling a multimedia player in an intuitive way.<br />

The motivation of this paper is to make home appliances aware of the user’s intention. This 3D model-based gesture tracking<br />

system adopts a Bayesian framework to track the user’s 3D hand position and to recognize the meaning of these postures for<br />

controlling the multimedia player interactively. To avoid the high dimensionality of the whole 3D upper body model, which may complicate<br />

the gesture tracking problem, our system applies a novel hierarchical tracking algorithm to improve the system<br />

performance. Moreover, this system applies multiple cues for improving the accuracy of tracking results. Based on the<br />

above idea, we have implemented a 3D hand gesture interface for controlling multimedia players. Experimental results<br />

have shown that the proposed system robustly tracks the 3D position of the hand and has high potential for controlling the<br />

multimedia player.<br />

09:00-11:10, Paper ThAT9.45<br />

Motif Discovery and Feature Selection for CRF-Based Activity Recognition<br />

Zhao, Liyue, Univ. of Central Florida<br />

Wang, Xi, Univ. of Central Florida<br />

Sukthankar, Gita, Univ. of Central Florida<br />

Sukthankar, Rahul, Intel Labs Pittsburgh and Carnegie Mellon University<br />

Due to their ability to model sequential data without making unnecessary independence assumptions, conditional random<br />

fields (CRFs) have become an increasingly popular discriminative model for human activity recognition. However, how<br />

to represent signal sensor data to achieve the best classification performance within a CRF model is not obvious. This<br />

paper presents a framework for extracting motif features for CRF-based classification of IMU (inertial measurement unit)<br />

data. To do this, we convert the signal data into a set of motifs, approximately repeated symbolic subsequences, for each<br />

dimension of IMU data. These motifs leverage structure in the data and serve as the basis to generate a large candidate set<br />

of features from the multi-dimensional raw data. By measuring reductions in the conditional log-likelihood error of the<br />

training samples, we can select features and train a CRF classifier to recognize human activities. An evaluation of our<br />

classifier on the CMU Multi-Modal Activity Database reveals that it outperforms the CRF-classifier trained on the raw<br />

features as well as other standard classifiers used in prior work.<br />
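
A simplified version of the symbolization step that typically precedes motif discovery is sketched below (quantile discretization and counting of repeated substrings); it is not the exact motif-extraction procedure of the paper, and the signal is synthetic.

```python
# Discretise an IMU channel into letters and count repeated fixed-length
# substrings (approximate motifs) as candidate features.
import numpy as np
from collections import Counter

def symbolise(signal, n_bins=4):
    edges = np.quantile(signal, np.linspace(0, 1, n_bins + 1)[1:-1])
    return "".join("abcd"[i] for i in np.digitize(signal, edges))

def motif_counts(symbols, length=4):
    return Counter(symbols[i:i + length] for i in range(len(symbols) - length + 1))

signal = np.sin(np.linspace(0, 20, 300)) + 0.1 * np.random.randn(300)
print(motif_counts(symbolise(signal)).most_common(3))
```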

09:00-11:10, Paper ThAT9.46<br />

On-Line Signature Verification using 1-D Velocity-Based Directional Analysis<br />

Muhammad Talal Ibrahim, Ryerson Unviersity<br />

Matthew, Kyan, Ryerson Unviersity<br />

M. Aurangzeb, Khan, COMSATS Inst. of Information Tech.<br />

Ling, Guan, Ryerson Unviersity<br />

In this paper, we propose a novel approach for identity verification based on the directional analysis of velocity-based<br />

partitions of an on-line signature. First, inter-feature dependencies in a signature are exploited by decomposing the shape<br />

(horizontal trajectory, vertical trajectory) into two partitions based on the velocity profile of the base-signature for each<br />

signer, which offers the flexibility of analyzing both low and high-curvature portions of the trajectory independently. Further,<br />

these velocity-based shape partitions are analyzed directionally on the basis of relative angles. Support Vector Machine<br />

(SVM) is then used to find the decision boundary between the genuine and forgery class. Experimental results demonstrate<br />

the superiority of our approach in on-line signature verification in comparison with other techniques.<br />

09:00-11:10, Paper ThAT9.47<br />

Age Classification based on Gait using HMM<br />

Zhang, De, Beihang Univ.<br />

Wang, Yunhong, Beihang Univ.<br />

Bhanu, Bir, Univ. of California<br />

In this paper we propose a new framework for age classification based on human gait using Hidden Markov Model (HMM).<br />

A gait database including young people and elderly people is built. To extract appropriate gait features, we consider a contour<br />

related method in terms of shape variations during human walking. Then the image feature is transformed to a lower-dimensional<br />

space by using the Frame to Exemplar (FED) distance. An HMM is trained on the FED vector sequences.<br />

Thus, the framework provides flexibility in the selection of gait feature representation. In addition, the framework is robust<br />

for classification due to the statistical nature of HMM. The experimental results show that video-based automatic age classification<br />

from human gait is feasible and reliable.<br />
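
The classification scheme (one HMM per age class, decision by maximum log-likelihood) could be prototyped roughly as below, assuming the third-party hmmlearn package and random data in place of the FED feature sequences.

```python
# One GaussianHMM per age class; classify a test sequence by the model with
# the highest log-likelihood. Feature data here are random placeholders.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(3)

def train(sequences, n_states=4):
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    return model.fit(X, lengths)

young = [rng.normal(0, 1, size=(30, 5)) for _ in range(10)]
old = [rng.normal(1, 1, size=(30, 5)) for _ in range(10)]
models = {"young": train(young), "old": train(old)}

test = rng.normal(1, 1, size=(30, 5))
print(max(models, key=lambda k: models[k].score(test)))
```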

09:00-11:10, Paper ThAT9.48<br />

Human Electrocardiogram for Biometrics using DTW and FLDA<br />

N, Venkatesh, Tata Consultancy Services Innovation Lab.<br />

Jayaraman, Srinivasan, Tata Consultancy Services, Bangalore<br />

This paper proposes a new approach for person identification and authentication using single-lead human<br />

Electrocardiogram. Nine feature parameters were extracted from the ECG in the spatial domain for classification. For person<br />

identification, Dynamic Time Warping (DTW) and Fisher’s Linear Discriminant Analysis (FLDA) with a K-Nearest Neighbor<br />

Classifier (k-NNC) as single-stage classification yielded recognition accuracies of 96% and 97%, respectively. To further<br />

improve the performance of the system, a two-stage classification technique has been adopted. In the two-stage classification,<br />

FLDA is used with k-NNC at the first stage, followed by a DTW classifier at the second stage, which yielded 100% recognition<br />

accuracy. For person authentication we adopted a QRS-complex-based threshold technique. The overall performance<br />

of the system, 96% for both legitimate and intruder cases, was verified on the MIT-BIH normal database of 375 recordings<br />

from 15 individuals’ ECGs.<br />
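
The DTW stage of such a system can be illustrated with a minimal dynamic-programming implementation; the FLDA and k-NN stages are omitted and the beats below are synthetic.

```python
# Minimal dynamic time warping distance between two 1-D feature sequences.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

beat_a = np.sin(np.linspace(0, 2 * np.pi, 60))       # toy ECG-like beats
beat_b = np.sin(np.linspace(0, 2 * np.pi, 75))
print(dtw_distance(beat_a, beat_b))
```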

09:00-11:10, Paper ThAT9.49<br />

Recognizing Sign Language from Brain Imaging<br />

Mehta, Nishant, Georgia Inst. of Tech.<br />

Starner, Thad, Georgia Inst. of Tech.<br />

Moore Jackson, Melody, Georgia Inst. of Tech.<br />

Babalola, Karolyn, Georgia Inst. of Tech.<br />

James, George Andrew, Univ. of Arkansas<br />

Classification of complex motor activities from brain imaging is relatively new in the fields of neuroscience and brain-computer<br />

interfaces (BCIs). We report sign language classification results for a set of three contrasting pairs of signs. Executed<br />

sign accuracy was 93.3%, and imagined sign accuracy was 76.7%. For a full multiclass problem, we used a decision<br />

directed acyclic graph of pairwise support vector machines, resulting in 63.3% accuracy for executed sign and 31.4% accuracy<br />

for imagined sign. Pairwise comparison of phrases composed of these signs yielded a mean accuracy of 73.4%.<br />

These results suggest the possibility of BCIs based on sign language.<br />

09:00-11:10, Paper ThAT9.50<br />

American Sign Language Phrase Verification in an Educational Game for Deaf Children<br />

Zafrulla, Zahoor, Georgia Inst. of Tech.<br />

Brashear, Helene, Georgia Inst. of Tech.<br />

Yin, Pei, Georgia Inst. of Tech.<br />

Presti, Peter, Georgia Inst. of Tech.<br />

Starner, Thad, Georgia Inst. of Tech.<br />

Hamilton, Harley, Georgia Inst. of Tech.<br />

We perform real-time American Sign Language (ASL) phrase verification for an educational game, CopyCat, which is<br />

designed to improve deaf children’s signing skills. Taking advantage of context information in the game we verify a phrase,<br />

using Hidden Markov Models (HMMs), by applying a rejection threshold on the probability of the observed sequence for<br />

each sign in the phrase. We tested this approach using 1204 signed phrase samples from 11 deaf children playing the game<br />

during the phase two deployment of CopyCat. The CopyCat data set is particularly challenging because sign samples are<br />

collected during live game play and contain many variations in signing and disfluencies. We achieved a phrase verification<br />

accuracy of 83% compared to 90% real-time performance by a sign linguist. We report on the techniques required to reach<br />

this level of performance.<br />

09:00-11:10, Paper ThAT9.51<br />

A Robust Method for Hand Gesture Segmentation and Recognition using Forward Spotting Scheme in Conditional<br />

Random Fields<br />

Elmezain, Mahmoud, Otto-von-Guericke-Univ. Magdeburg<br />

Al-Hamadi, Ayoub, Otto-von-Guericke-Univ. Magdeburg<br />

Michaelis, Bernd, Otto-von-Guericke-Univ. Magdeburg<br />

This paper proposes a forward spotting method that handles hand gesture segmentation and recognition simultaneously<br />

without time delay. To spot meaningful gestures of numbers (0-9) accurately, a stochastic method for designing a non-gesture<br />

model using Conditional Random Fields (CRFs) is proposed without training data. The non-gesture model provides<br />

confidence measures that are used as an adaptive threshold to find the start and end points of meaningful gestures.<br />

Experimental results show that the proposed method can successfully recognize isolated gestures with 96.51% and meaningful<br />

gestures with 90.49% reliability.<br />

09:00-11:10, Paper ThAT9.52<br />

Real-Time Upper-Limbs Posture Recognition based on Particle Filters and AdaBoost Algorithms<br />

Fahn, Chin-Shyurng, National Taiwan Univ. of Science and Tech.<br />

Chiang, Sheng-Lung, National Taiwan Univ. of Science and Tech.<br />

In this paper, we employ particle filters to dynamically locate a face and upper-limbs. To prevent the disturbance<br />

caused by skin color regions, such as other naked parts of a human body, or some skin color-like objects in the background,<br />

we further take the motion cue as a feature during the tracking. Currently, we prescribe eight kinds of upper-limbs postures<br />

with reference to the characteristic of flag semaphore. The advantage is that we can utilize the relative positions of a face<br />

and two hands to recognize the postures easily. To achieve posture recognition, we evaluate three different classifiers using<br />

the machine learning methods: multi-layer perceptrons, support vector machines, and AdaBoost algorithms. The experimental<br />

results reveal that the AdaBoost algorithm performs best, recognizing upper-limbs<br />

postures with an accuracy above 95% while requiring much less training time than the other two.<br />
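
The posture-classification stage alone might be prototyped as below, with hypothetical 6-D feature vectors standing in for the relative face/hand positions and scikit-learn's AdaBoost classifier.

```python
# Toy feature vectors standing in for relative face/hand positions, classified
# into 8 posture classes with AdaBoost.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 6))                              # hypothetical features
centroids = rng.normal(size=(8, 6))                        # 8 toy posture classes
y = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)

clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))
```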

09:00-11:10, Paper ThAT9.53<br />

One-Lead ECG-Based Personal Identification using Ziv-Merhav Cross Parsing<br />

Pereira Coutinho, David, Inst. Superior de Engenharia de Lisboa<br />

Fred, Ana Luisa Nobre, Inst. Superior Técnico<br />

Figueiredo, Mario A. T., Inst. Superior Técnico<br />

The advance of falsification technology increases security concerns and gives biometrics an important role in security solutions.<br />

The electrocardiogram (ECG) is an emerging biometric that does not need liveliness verification. There is strong<br />

evidence that ECG signals contain sufficient discriminative information to allow the identification of individuals from a<br />

large population. Most approaches rely on ECG data and the fiducia of different parts of the heartbeat waveform. However<br />

non-fiducial approaches have proved recently to be also effective, and have the advantage of not relying critically on the<br />

accurate extraction of fiducia data. In this paper, we propose a new non-fiducial ECG biometric identification<br />

method based on data compression techniques, namely the Ziv-Merhav cross parsing algorithm for symbol sequences<br />

(strings). Our method relies on a string similarity measure derived from the algorithmic cross-complexity concept and its compression-based<br />

approximation. We present results on real data, one-lead ECG, acquired during a concentration<br />

task, from 19 healthy individuals. Our approach achieves 100% subject recognition rate despite the existence of differentiated<br />

stress states.<br />
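
A minimal sketch of Ziv-Merhav cross parsing on symbolized strings is given below: the fewer phrases needed to parse a test string using substrings of a reference string, the more similar the two recordings. The symbolized sequences here are hypothetical.

```python
# Count the phrases needed to parse string z using only substrings of x.
def cross_parse_count(z, x):
    phrases, i = 0, 0
    while i < len(z):
        # longest prefix of z[i:] that appears somewhere in x
        length = 0
        while i + length < len(z) and z[i:i + length + 1] in x:
            length += 1
        i += max(length, 1)   # unseen symbol: consume it as a 1-symbol phrase
        phrases += 1
    return phrases

ref = "abacabadabacaba"       # hypothetical symbolised heartbeat sequences
same = "abacababacaba"
other = "zzqzzqzq"
print(cross_parse_count(same, ref), cross_parse_count(other, ref))
```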

09:00-11:10, Paper ThAT9.54<br />

Multimodal Human Computer Interaction with MIDAS Intelligent Infokiosk<br />

Karpov, Alexey, Russian Acad. of Sciences<br />

Ronzhin, Andrey, Russian Acad. of Sciences<br />

Kipyatkova, Irina, Russian Acad. of Sciences<br />

Ronzhin, Alexander, Russian Acad. of Sciences<br />

Akarun, Lale, Bogazici Univ.<br />

In this paper, we present an intelligent information kiosk called MIDAS (Multimodal Interactive-Dialogue Automaton for<br />

Self-service), including its hardware and software architecture, stages of deployment of speech recognition and synthesis<br />

technologies. MIDAS uses the Wizard of Oz (WOZ) methodology, which allows an expert to correct speech recognition<br />

results and control the dialogue flow. User statistics of the multimodal human computer interaction (HCI) have been analyzed<br />

for the operation of the kiosk in the automatic and automated modes. The infokiosk offers information about the<br />

structure and staff of laboratories, the location and phones of departments and employees of the institution. The multimodal<br />

user interface is provided with a touch screen, natural speech input and head and manual gestures, both for ordinary and<br />

physically handicapped users.<br />

09:00-11:10, Paper ThAT9.55<br />

View Invariant Body Pose Estimation based on Biased Manifold Learning<br />

Hur, Dongcheol, Korea Univ.<br />

Lee, Seong-Whan, Korea Univ.<br />

Wallraven, Christian, MPI for Biological Cybernetics<br />

In human body pose estimation, manifold learning is a popular technique for reducing the dimension of 2D images and<br />

3D body configuration data. This technique, however, is especially vulnerable to silhouette variation such as that caused by<br />

viewpoint changes. In this paper, we propose a novel approach that combines three separate manifolds for representing<br />

variations in viewpoint, pose and 3D body configuration. We use biased manifold learning to learn these manifolds with<br />

appropriately weighted distances. A set of four mapping functions is then learned by a generalized regression neural network<br />

for added robustness. Despite using only three manifolds, we show that this method can reliably estimate 3D body<br />

poses from 2D images with all learned viewpoints.<br />

09:00-11:10, Paper ThAT9.56<br />

Visual Gaze Estimation by Joint Head and Eye Information<br />

Valenti, Roberto, Univ. of Amsterdam<br />

Lablack, Adel, UMR USTL/CNRS 8022<br />

Sebe, Nicu, Univ. of Trento<br />

Djeraba, Chabane, UMR USTL/CNRS 8022<br />

Gevers, Theo, Univ. of Amsterdam<br />

In this paper, we present an unconstrained visual gaze estimation system. The proposed method extracts the visual field<br />

of view of a person looking at a target scene in order to estimate the approximate location of interest (visual gaze). The<br />

novelty of the system is the joint use of head pose and eye location information to fine tune the visual gaze estimated by<br />

the head pose only, so that the system can be used in multiple scenarios. The improvements obtained by the proposed approach<br />

are validated using the Boston University head pose dataset, on which the standard deviation of the joint visual<br />

gaze estimation improved by 61.06% horizontally and 52.23% vertically with respect to the gaze estimation obtained by<br />

the head pose only. A user study shows the potential of the proposed system.<br />

09:00-11:10, Paper ThAT9.57<br />

Discrimination of Moderate and Acute Drowsiness based on Spontaneous Facial Expressions<br />

Vural, Esra, Univ. of California San Diego<br />

Bartlett, Marian Stewart, Univ. of California San Diego<br />

Littlewort, Gwen, Univ. of California San Diego<br />

Cetin, Mujdat, Sabanci Univ.<br />

Ercil, Aytul, Sabanci Univ.<br />

Movellan, Javier, Univ. of California San Diego<br />

It is important for drowsiness detection systems to identify different levels of drowsiness and respond appropriately at<br />

each level. This study explores how to discriminate moderate from acute drowsiness by applying computer vision techniques<br />

to the human face. In our previous study, spontaneous facial expressions measured through computer vision techniques<br />

were used as an indicator to discriminate alert from acutely drowsy episodes. In this study we are exploring which<br />

facial muscle movements are predictive of moderate and acute drowsiness. The effect of temporal dynamics of action<br />

units on prediction performance is explored by capturing temporal dynamics using an overcomplete representation of<br />

temporal Gabor filters. In the final system we perform feature selection to build a classifier that can discriminate moderately<br />

drowsy from acutely drowsy episodes. The system achieves a classification rate of 0.96 (A′) in discriminating moderately<br />

drowsy versus acutely drowsy episodes. Moreover the study reveals new information in facial behavior occurring during<br />

different stages of drowsiness.<br />

11:10-12:10, ThPL1 Anadolu Auditorium<br />

J.K. Aggarwal Prize Lecture:<br />

Scene and Object Recognition in Context<br />

Antonio Torralba Plenary Session<br />

Computer Science and Artificial Intelligence Laboratory<br />

Dept. of Electrical Engineering and Computer Science<br />

MIT, USA<br />

Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been<br />

much progress and there are already object recognition systems operating in commercial products. Most of the algorithms<br />

for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions<br />

with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem<br />

by brute force. However, in the real world, objects tend to co-vary with other objects, providing a rich collection of<br />

contextual associations. These contextual associations can be used to reduce the search space by looking only in places in<br />

which the object is expected to be; this also increases performance by rejecting image patterns that appear to look like the<br />

target object but that are in unlikely places.<br />

As the field moves into integrated systems that try to recognize many object classes and learn about contextual relationships<br />

between objects, the lack of large annotated datasets hinders the fast development of robust solutions. In this talk I will<br />

describe recent work on visual scene understanding that tries to build integrated models for scene and object recognition,<br />

emphasizing the power of large databases of annotated images in computer vision.<br />

ThBT1 Marmara Hall<br />

Object Detection and Recognition - V Regular Session<br />

Session chair: Wang, Yunhong (Beihang Univ.)<br />

13:30-13:50, Paper ThBT1.1<br />

Finding Multiple Object Instances with Occlusion<br />

Guo, Ge, Chinese Acad. of Sciences<br />

Jiang, Tingting, Peking Univ.<br />

Wang, Yizhou, School of EECS, Peking<br />

Gao, Wen, Peking Univ.<br />

In this paper we provide a framework of detection and localization of multiple similar shapes or object instances from an<br />

image based on shape matching. There are three challenges in this problem. The first is the basic shape matching<br />

problem of how to find the correspondence and transformation between two shapes; the second, how to match shapes under<br />

occlusion; and the last, how to recognize and locate all the matched shapes in the image. We solve these problems by using<br />

both graph partition and shape matching in a global optimization framework. A Hough-like collaborative voting is adopted,<br />

which provides a good initialization, data-driven information, and plays an important role in solving the partial matching<br />

problem due to occlusion. Experiments demonstrate the efficiency of our method.<br />

13:50-14:10, Paper ThBT1.2<br />

Bag of Hierarchical Co-Occurrence Features for Image Classification<br />

Kobayashi, Takumi, National Inst. of Advanced Industrial Science and<br />

Otsu, Nobuyuki, National Inst. of Advanced Industrial Science and<br />

We propose a bag-of-hierarchical-co-occurrence features method incorporating hierarchical structures for image classification.<br />

Local co-occurrences of visual words effectively characterize the spatial alignment of objects’ components. The<br />

visual words are hierarchically constructed in the feature space, which helps us to extract higher-level words and to avoid<br />

quantization error in assigning the words to descriptors. For extracting descriptors, we employ two types of features hierarchically:<br />

narrow (local) descriptors, like SIFT [1], and broad descriptors based on co-occurrence features. The proposed<br />

method thus captures the co-occurrences of both small and large components. We conduct an experiment on image classification<br />

by applying the method to the Caltech 101 dataset and show the favorable performance of the proposed method.<br />

14:10-14:30, Paper ThBT1.3<br />

Person Detection using Temporal and Geometric Context with a Pan Tilt Zoom Camera<br />

Del Bimbo, Alberto, Univ. of Florence<br />

Lisanti, Giuseppe, Univ. of Florence<br />

Masi, Iacopo, Univ. of Florence<br />

Pernici, Federico, Univ. of Florence<br />

In this paper we present a system that integrates automatic camera geometry estimation and object detection from a Pan<br />

Tilt Zoom camera. We estimate camera pose with respect to a world scene plane in real-time and perform human detection<br />

exploiting the relative space-time context. Using camera self-localization, 2D object detections are clustered in a 3D world<br />

coordinate frame. Target scale inference is further exploited to reduce the number of false alarms and also to increase the<br />

detection rate in the final non-maximum suppression stage. Our integrated system applied on real-world data shows superior<br />

performance with respect to the standard detector used.<br />

14:30-14:50, Paper ThBT1.4<br />

Disparity Map Refinement for Video based Scene Change Detection using a Mobile Stereo Camera Platform<br />

Haberdar, Hakan, Univ. of Houston<br />

Shah, Shishir, Univ. of Houston<br />

This paper presents a novel disparity map refinement method and vision based surveillance framework for the task of detecting<br />

objects of interest in dynamic outdoor environments from two stereo video sequences taken at different times and<br />

from different viewing angles by a mobile camera platform. The proposed framework includes several steps, the first of<br />

which computes disparity maps of the same scene in two video sequences. Preliminary disparity images are refined based<br />

on estimated disparities in neighboring frames. Segmentation is performed to estimate ground planes, which in turn are<br />

used for establishing spatial registration between the two video sequences. Finally, the regions of change are detected<br />

using the combination of texture and intensity gradient features. We present experiments on detection of objects of different<br />

sizes and textures in real videos.<br />

14:50-15:10, Paper ThBT1.5<br />

Using Symmetry to Select Fixation Points for Segmentation<br />

Kootstra, Gert, KTH<br />

Bergström, Niklas, Royal Inst. of Tech.<br />

Kragic, Danica, KTH<br />

For the interpretation of a visual scene, it is important for a robotic system to pay attention to the objects in the scene and<br />

segment them from their background. We focus on the segmentation of previously unseen objects in unknown scenes. The<br />

attention model therefore needs to be bottom-up and context-free. In this paper, we propose the use of symmetry, one of<br />

the Gestalt principles for figure-ground segregation, to guide the robot’s attention. We show that our symmetry-saliency<br />

model outperforms the contrast-saliency model, proposed in (Itti et al 1998). The symmetry model performs better in finding<br />

the objects of interest and selects a fixation point closer to the center of the object. Moreover, the objects are better<br />

segmented from the background when the initial points are selected on the basis of symmetry.<br />

ThBT2 Anadolu Auditorium<br />

Classification - II Regular Session<br />

Session chair: Pelillo, Marcello (Ca’Foscari Univ.)<br />

13:30-13:50, Paper ThBT2.1<br />

Data Classification on Multiple Manifolds<br />

Xiao, Rui, Shanghai Jiao Tong Univ.<br />

Zhao, Qijun, The Hong Kong Pol. Univ.<br />

Zhang, David, The Hong Kong Pol. Univ.<br />

Shi, Pengfei, Shanghai Jiao Tong Univ.<br />

Unlike most previous manifold-based data classification algorithms, which assume that all the data points lie on a single manifold,<br />

we expect that data from different classes may reside on different manifolds of possibly different dimensions. Therefore,<br />

better classification accuracy would be achieved by modeling the data by multiple manifolds each corresponding to a<br />

class. To this end, a general framework for data classification on multiple manifolds is presented. The manifolds are firstly<br />

learned for each class separately, and a stochastic optimization algorithm is then employed to get the near optimal dimensionality<br />

of each manifold from the classification viewpoint. Then, classification is performed under a newly defined minimum<br />

reconstruction error based classifier. Our method could be easily extended by involving various manifold learning<br />

methods and searching strategies. Experiments on both synthetic data and databases of facial expression images show the<br />

effectiveness of the proposed multiple manifold based approach.<br />
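
A linear stand-in for the multiple-manifold idea is sketched below: one PCA subspace per class, with classification by minimum reconstruction error. The paper's nonlinear manifold learning and dimensionality search are not reproduced, and the data are synthetic.

```python
# One PCA subspace per class; classify by the smallest reconstruction error.
import numpy as np
from sklearn.decomposition import PCA

class MinReconstructionClassifier:
    def fit(self, X, y, n_components=2):
        self.models_ = {c: PCA(n_components).fit(X[y == c]) for c in np.unique(y)}
        return self

    def predict(self, X):
        errors = {c: np.linalg.norm(X - m.inverse_transform(m.transform(X)), axis=1)
                  for c, m in self.models_.items()}
        classes = list(errors)
        return np.array(classes)[np.argmin(np.stack([errors[c] for c in classes]), axis=0)]

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(3, 1, (50, 10))])
y = np.repeat([0, 1], 50)
print((MinReconstructionClassifier().fit(X, y).predict(X) == y).mean())
```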

13:50-14:10, Paper ThBT2.2<br />

Unsupervised Ensemble Ranking: Application to Large-Scale Image Retrieval<br />

Lee, Jung-Eun, Michigan State Univ.<br />

Jin, Rong, Michigan State Univ.<br />

Jain, Anil, Michigan State Univ.<br />

The continued explosion in the growth of image and video databases makes automatic image search and retrieval an extremely<br />

important problem. Among the various approaches to Content-based Image Retrieval (CBIR), image similarity<br />

based on local point descriptors has shown promising performance. However, this approach suffers from the scalability<br />

problem. Although the bag-of-words model resolves the scalability problem, it suffers from a loss in retrieval accuracy. We circumvent<br />

this performance loss by an ensemble ranking approach in which rankings from multiple bag-of-words models<br />

are combined to obtain more accurate retrieval results. An unsupervised algorithm is developed to learn the weights for<br />

fusing the rankings from multiple bag-of-words models. Experimental results on a database of 100,000 images show that<br />

this approach is both efficient and effective in finding visually similar images.<br />

14:10-14:30, Paper ThBT2.3<br />

Cross Entropy Optimization of the Random Set Framework for Multiple Instance Learning<br />

Bolton, Jeremy, Univ. of Florida<br />

Gader, Paul, Univ. of Florida<br />

Multiple instance learning (MIL) is a recently researched technique used for learning a target concept in the presence of<br />

noise. Previously, a random set framework for multiple instance learning (RSF-MIL) was proposed; however, the proposed<br />

optimization strategy did not permit the harmonious optimization of model parameters. A cross-entropy-based optimization<br />

strategy is proposed. Experimental results on synthetic examples, benchmark and landmine data sets illustrate the benefits<br />

of the proposed optimization strategy.<br />

14:30-14:50, Paper ThBT2.4<br />

A Constant Average Time Algorithm to Allow Insertions in the LAESA Fast Nearest Neighbour Search Index<br />

Oncina, Jose, Univ. de Alicante<br />

Micó, Luisa, Univ. de Alicante<br />

Nearest Neighbour search is a widely used technique in Pattern Recognition. In order to speed up the search many indexing<br />

techniques have been proposed. However, most of the proposed techniques are static, that is, once the index is built the<br />

incorporation of new data is not possible unless a costly rebuild of the index is performed. The main effect is that changes<br />

in the environment are very costly to be taken into account. In this work, we propose a technique to allow the insertion of<br />

elements in the LAESA index. The resulting index is exactly the same as the one that would be obtained by building it<br />

from scratch. In this paper we also obtain an upper bound for its expected running time. Surprisingly, this bound is independent<br />

of the database size.<br />

14:50-15:10, Paper ThBT2.5<br />

Feature Extraction from Discrete Attributes<br />

Yildiz, Olcay Taner, Isik Univ.<br />

In many pattern recognition applications, first decision trees are used due to their simplicity and easily interpretable nature.<br />

In this paper, we extract new features by combining k discrete attributes, where for each subset of size k of the attributes,<br />

we generate all orderings of values of those attributes exhaustively. We then apply the usual univariate decision tree classifier<br />

using these orderings as the new attributes. Our simulation results on 16 datasets from UCI repository show that the<br />

novel decision tree classifier performs better than the original univariate decision tree in terms of error rate and tree complexity. The same idea can<br />

also be applied to other univariate rule learning algorithms such as C4.5 Rules and Ripper.<br />

ThBT3 Topkapı Hall A<br />

Computer Vision Applications - II Regular Session<br />

Session chair: Foggia, Pasquale (Univ. di Salerno)<br />

13:30-13:50, Paper ThBT3.1<br />

Fire-Flame Detection based on Fuzzy Finite Automata<br />

Ko, Byoungchul, Keimyung Univ.<br />

Ham, Seoun-Jae, Keimyung Univ.<br />

Nam, Jaeyeal, Keimyung Univ.<br />

This paper proposes a new fire-flame detection method using probabilistic membership function of visual features and<br />

Fuzzy Finite Automata (FFA). First, moving regions are detected by analyzing the background subtraction and candidate<br />

flame regions then identified by applying flame color models. Since flame regions generally exhibit continuously irregular patterns,<br />

membership functions of the variance of intensity, wavelet energy and motion orientation are generated and applied<br />

to FFA. Since FFA combines the capabilities of automata with fuzzy logic, it not only provides a systemic approach to<br />

handle uncertainty in computational systems, but also can handle continuous spaces. The proposed algorithm is successfully<br />

applied to various fire videos and shows a better detection performance when compared with other methods.<br />

13:50-14:10, Paper ThBT3.2<br />

Extrinsic Camera Parameter Estimation using Video Images and GPS Considering GPS Positioning Accuracy<br />

Kume, Hideyuki, Nara Inst. of Science and Tech.<br />

Taketomi, Takafumi, Nara Inst. of Science and Tech.<br />

Sato, Tomokazu, Nara Inst. of Science and Tech.<br />

Yokoya, Naokazu, Nara Inst. of Science and Tech.<br />

This paper proposes a method for estimating extrinsic camera parameters using video images and position data acquired<br />

by GPS. In conventional methods, the accuracy of the estimated camera position largely depends on the accuracy of GPS<br />

positioning data because they assume that GPS position error is very small or normally distributed. However, the actual<br />

error of GPS positioning easily grows to the 10m level and the distribution of these errors is changed depending on satellite<br />

positions and conditions of the environment. In order to achieve more accurate camera positioning in outdoor environments,<br />

in this study, we have employed a simple assumption that the true GPS position exists within a certain range from the observed<br />

GPS position and that the size of the range depends on the GPS positioning accuracy. Concretely, the proposed method estimates<br />

camera parameters by minimizing an energy function that is defined by using the reprojection error and the penalty term<br />

for GPS positioning.<br />

14:10-14:30, Paper ThBT3.3
Combining Monocular and Stereo Cues for Mobile Robot Localization using Visual Words
Fraundorfer, Friedrich, ETH Zurich
Wu, Changchang, UNC-Chapel Hill
Pollefeys, Marc

This paper describes an approach to mobile robot localization using a visual-word-based place recognition approach. In our approach we exploit the benefits of a stereo camera system for place recognition. Visual words computed from SIFT features are combined with VIP (viewpoint invariant patches) features that use depth information from the stereo setup. The approach was evaluated in the ImageCLEF@ICPR 2010 competition, and the results achieved on the competition datasets are reported in this paper.

14:30-14:50, Paper ThBT3.4
Fast Derivation of Soil Surface Roughness Parameters using Multi-Band SAR Imagery and the Integral Equation Model
Seppke, Benjamin, Univ. of Hamburg
Dreschler-Fischer, Leonie, Univ. of Hamburg
Heiming, Jo-Ann, Univ. of Hamburg
Wengenroth, Felix, Univ. of Hamburg

The Integral Equation Model (IEM) predicts the normalized radar cross section (NRCS) of dielectric surfaces given surface and radar parameters. To derive the surface parameters from the NRCS using the IEM, the model needs to be inverted. We present a fast method for this model inversion to derive soil surface roughness parameters from synthetic aperture radar (SAR) remote sensing data. The model inversion is based on two collocated SAR images of different bands, since the parameters cannot be derived using one band alone. The computation of the model and of the model inversion are very time-consuming tasks and may therefore be impractical for large remote sensing data. We present an approach based on a few model assumptions to speed up the computation of the surface parameters. We applied the algorithm to derive the correlation length of the surface for dry-fallen areas in the World Cultural Heritage Wadden Sea, a coastal tidal flat at the German Bight (North Sea). The results are very promising and may be used for a classification of the area in future steps.

14:50-15:10, Paper ThBT3.5
Social Network Approach to Analysis of Soccer Game
Park, Kyoung-Jin, The Ohio State Univ.
Yilmaz, Alper, The Ohio State Univ.

Video understanding has been an active area of research, where many articles have been published on how to detect and track objects in videos and how to analyze their trajectories. These methods, however, only provide heuristic low-level information without a higher-level understanding of the global relations within the whole context. This paper presents a new way to provide such understanding by using a social network approach in soccer videos. Our approach represents the interactions between the objects in the video as a social network. This network is then analyzed by detecting small communities using modularity, which measures the quality of the community structure. Additionally, we analyze the centrality of nodes, which indicates the importance of the individuals composing the network. In particular, we introduce five centrality measures exploiting the directed and weighted social network. The partitions of the resulting social network are shown to relate to clusters of soccer players with respect to their role in the game.
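As an illustration of this kind of analysis (not the authors' implementation), a weighted, directed interaction network can be mined for communities and centralities with off-the-shelf graph tools; the players and edge weights below are hypothetical interaction counts:

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# toy directed, weighted interaction graph: edge weight = number of
# observed interactions (e.g. passes) from one player to another
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("p1", "p2", 5), ("p2", "p3", 3), ("p3", "p1", 4),
    ("p4", "p5", 6), ("p5", "p4", 2), ("p2", "p4", 1),
])

# modularity-based communities (computed on the undirected projection)
communities = greedy_modularity_communities(G.to_undirected(), weight="weight")
# two of the standard centrality measures; the paper introduces five
in_centrality = nx.in_degree_centrality(G)
betweenness = nx.betweenness_centrality(G, weight="weight")
print(communities, in_centrality, betweenness)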

ThBT4 Dolmabahçe Hall B
Image Segmentation - II, Regular Session
Session chair: Farag, Aly A. (Univ. of Louisville)

13:30-13:50, Paper ThBT4.1
Robust Foreground Object Segmentation via Adaptive Region-Based Background Modelling
Reddy, Vikas, NICTA, The Univ. of Queensland
Sanderson, Conrad, NICTA
Lovell, Brian Carrington, The Univ. of Queensland

We propose a region-based foreground object segmentation method capable of dealing with image sequences containing noise, illumination variations and dynamic backgrounds (as often present in outdoor environments). The method utilises contextual spatial information by analysing each frame on an overlapping block-by-block basis and obtaining a low-dimensional texture descriptor for each block. Each descriptor is passed through an adaptive multi-stage classifier, comprising a likelihood evaluation, an illumination-invariant measure, and a temporal correlation check. The overlapping of blocks not only ensures smooth contours of the foreground objects but also effectively minimises the number of false positives in the generated foreground masks. The parameter settings are robust against a wide variety of sequences, and post-processing of the foreground masks is not required. Experiments on the challenging I2R dataset show that the proposed method obtains considerably better results (both qualitatively and quantitatively) than methods based on Gaussian mixture models (GMMs), feature histograms, and normalised vector distances. On average, the proposed method achieves 36% more accurate foreground masks than the GMM-based method.

13:50-14:10, Paper ThBT4.2
Flooding and MRF-Based Algorithms for Interactive Segmentation
Grinias, Ilias, Univ. of Crete
Komodakis, Nikos, Univ. of Crete
Tziritas, G., Univ. of Crete

We propose a method for interactive colour image segmentation. The goal is to detect an object from the background when some markers on the object(s) and the background are given. Only probability distributions of the data are used as features. First, all the labelled seeds are independently propagated to obtain homogeneous connected components for each of them. Then the image is divided into blocks, which are classified according to their probabilistic distance from the classified regions. A topographic surface for each class is obtained using Bayesian dissimilarities and a min-max criterion. Two algorithms are proposed: a regularized classification based on the topographic surface and incorporating an MRF model, and a priority multi-label flooding algorithm. Segmentation results on the LHI data set are presented.

14:10-14:30, Paper ThBT4.3
Steerable Filtering using Novel Circular Harmonic Functions with Application to Edge Detection
Papari, Giuseppe, Univ. of Groningen
Campisi, Patrizio, Univ. degli Studi Roma TRE
Petkov, N, Univ. of Groningen

In this paper, we perform approximate steering of the elongated 2D Hermite-Gauss functions with respect to rotations and provide compact analytical expressions for the related basis functions. A special notation introduced here considerably simplifies the derivation and unifies the cases of even and odd indices. The proposed filters are applied to edge detection. Quantitative analysis shows a performance increase of about 12.5% in terms of Pratt's figure of merit with respect to the well-established Gaussian gradient proposed by Canny.

14:30-14:50, Paper ThBT4.4
3D Vertebral Body Segmentation using Shape based Graph Cuts
Aslan, Melih Seref, Univ. of Louisville
Ali, Asem, Univ. of Louisville
Farag, Aly A., Univ. of Louisville
Rara, Ham, Univ. of Louisville
Arnold, Ben, Image Analysis, Inc.
Ping, Xiang, Image Analysis, Inc.

Bone mineral density (BMD) measurements and fracture analysis of the spine bones are restricted to the vertebral bodies (VBs). In this paper, we propose a novel 3D shape-based method to segment VBs in clinical computed tomography (CT) images without any user intervention. The proposed method depends on both image appearance and shape information. 3D shape information is obtained from a set of training data sets. We then estimate the shape variations using a distance probabilistic model which approximates the marginal densities of the VB and background in the variability region. To segment a VB, a matched filter is used to detect the VB region automatically. We align the detected volume with the 3D shape prior so that it can be used in the distance probabilistic model. Then, a graph cuts method is used which integrates the linear combination of Gaussians (LCG), the Markov-Gibbs random field (MGRF), and the distance probabilistic model obtained from the 3D shape prior. Experiments on the data sets show that the proposed segmentation approach is more accurate than other known alternatives.

14:50-15:10, Paper ThBT4.5
Locally Deformable Shape Model to Improve 3D Level Set based Esophagus Segmentation
Kurugol, Sila, Northeastern Univ.
Ozay, Necmiye, Northeastern Univ.
Dy, Jennifer G., Northeastern Univ.
Sharp, Gregory C., Mass. General Hospital and Harvard Medical School
Brooks, Dana H., Northeastern Univ.

In this paper we propose a supervised 3D segmentation algorithm to locate the esophagus in thoracic CT scans using a variational framework. To address challenges due to low contrast, several priors are learned from a training set of segmented images. Our algorithm first estimates the centerline based on a spatial model learned at a few manually marked anatomical reference points. An implicit shape model is then learned by subtracting the centerline and applying PCA to these shapes. To allow local variations in the shapes, we propose to use nonlinear smooth local deformations. Finally, the esophageal wall is located within a 3D level set framework by optimizing a cost function that includes terms for appearance, the shape model, smoothness constraints and an air/contrast model.

ThBT5 Topkapı Hall B
3D Face Recognition, Regular Session
Session chair: Li, Stan Z. (CASIA)

13:30-13:50, Paper ThBT5.1
3D Face Recognition by Deforming the Normal Face
Li, Xiaoli, Southeast Univ.
Da, Feipeng, Southeast Univ.

3D face recognition is complicated by the presence of expression variation. In this paper, we present an automatic 3D face recognition method which can differentiate expression deformations from interpersonal differences and recognize faces with expressions removed. The deformations caused by expressions and by interpersonal differences are first learnt separately from a training set. The deformations are then linearly combined to synthesize a new face with a certain expression. When a target face comes in, the synthesized face is used to match it by adjusting the coefficients in the linear combination. After the matching process, the coefficients corresponding to the interpersonal differences are chosen as features for recognition. We perform experiments on the FRGC v2.0 database and good performance is obtained.

13:50-14:10, Paper ThBT5.2
Real-Time 3D Face and Facial Action Tracking using Extended 2D+3D AAMs
Zhou, Mingcai, Chinese Acad. of Sciences
Wang, Yangsheng, Chinese Acad. of Sciences
Huang, Xiangsheng, Chinese Acad. of Sciences

In this work, we address the problem of tracking three-dimensional (3D) faces and facial actions in video sequences. The main contributions of the paper are as follows. First, we develop an extended 2D+3D Active Appearance Model (AAM) based framework for 3D face and facial action tracking, using 2D view-based AAMs and a modified 3D face model. Second, we develop a robust shape initialization method based on local feature matching to track fast face motion. Experiments evaluating the effectiveness of the proposed algorithm are reported.

14:10-14:30, Paper ThBT5.3
A Novel Face Recognition Approach using a 2D-3D Searching Strategy
Dahm, Nicholas, NICTA
Gao, Yongsheng, Griffith Univ.

Many face recognition techniques focus on 2D-2D comparison or 3D-3D comparison; however, few techniques explore the idea of cross-dimensional comparison. This paper presents a novel face recognition approach that implements cross-dimensional comparison to solve the issue of pose invariance. Our approach uses a Gabor representation during comparison to allow for variations in texture, illumination, expression and pose. Kernel scaling is used to reduce comparison time during the branching search, which determines the facial pose of input images. The conducted experiments demonstrate the viability of this approach, with our larger kernel experiments returning 91.6%-100% accuracy on a database comprising both local data and data from the USF Human ID 3D database.

14:30-14:50, Paper ThBT5.4
Initialization and Pose Alignment in Active Shape Model
Xiong, Pengfei, Chinese Acad. of Sciences
Lei, Huang, Chinese Acad. of Sciences
Liu, Changping, Chinese Acad. of Sciences

In this paper, we propose a new algorithm for shape initialization and 3D pose alignment in the Active Shape Model (ASM). Instead of initializing with the average shape, as in previous work, we build a scattered data interpolation model from key points to obtain the initial shape, which ensures that the shape is initialized around the facial organs. These key points are chosen from the organs of the face shape and are first located with a strong classifier. They are then used to build a Radial Basis Function (RBF) model that deforms the average shape into the initial shape. Moreover, to cope with varying face poses, we define a general 3D shape to align face shapes in 3D, instead of the 2D alignment used in classic ASM. With accurate 3D rotation angles iteratively calculated by the Levenberg-Marquardt (LM) algorithm, shapes can be aligned to the standard shape more reliably. Experiments and comparisons on FERET show that both the shape initialization and the 3D pose alignment of our algorithm greatly improve the location accuracy.

14:50-15:10, Paper ThBT5.5
3D Face Reconstruction using a Single or Multiple Views
Choi, Jongmoo, Univ. of Southern California
Medioni, Gerard, Univ. of Southern California
Lin, Yuping, Univ. of Southern California
Silva, Luciano, Univ. Federal do Parana
Bellon, Olga Regina Pereira, Univ. Federal do Parana
Pamplona, Mauricio, Univ. Federal do Parana
Faltemier, Timothy, Progeny Systems

We present a 3D face reconstruction system that takes as input either one single view or several different views. Given a facial image, we first classify the facial pose into one of five predefined poses, then detect two anchor points that are in turn used to detect a set of predefined facial landmarks. Based on these initial steps, for a single view we apply a warping process using a generic 3D face model to build a 3D face. For multiple views, we apply sparse bundle adjustment to reconstruct 3D landmarks which are used to deform the generic 3D face model. Experimental results on the Color FERET and CMU Multi-PIE databases confirm that our framework is effective in creating realistic 3D face models that can be used in many computer vision applications, such as 3D face recognition at a distance.

ThBT6 Dolmabahçe Hall A
Text Analysis and Detection, Regular Session
Session chair: Kholmatov, Alisher (TUBITAK UEKAE)

13:30-13:50, Paper ThBT6.1
Text Detection using Edge Gradient and Graph Spectrum
Zhang, Jing, Univ. of South Florida
Kasturi, Rangachar, Univ. of South Florida

In this paper, we propose a new unsupervised text detection approach based on the Histogram of Oriented Gradients and the graph spectrum. By investigating the properties of text edges, the proposed approach first extracts text edges from an image and localizes candidate character blocks using the Histogram of Oriented Gradients; the graph spectrum is then used to capture the global relationship among candidate blocks and to cluster them into groups, generating bounding boxes of the text objects in the image. The proposed method is robust to the color and size of text. The ICDAR 2003 text locating dataset and video frames were used to evaluate the performance of the proposed approach. Experimental results demonstrate the validity of our approach.

13:50-14:10, Paper ThBT6.2
Scene Text Extraction with Edge Constraint and Text Collinearity
Lee, Seonghun, KAIST
Cho, MinSu, KAIST
Jung, Kyomin, KAIST
Kim, Jin Hyung, KAIST

In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two functions: it generates text region candidates, and it verifies the labels of the candidates (text or non-text). The text region candidates are generated through a modified K-means clustering algorithm which uses texture features, edge information and color information. The candidate labels are then verified in a global sense by a Markov Random Field model, where a collinearity weight is added because most text is aligned. The proposed method achieves reasonable accuracy for text extraction on moderately difficult examples from the ICDAR 2003 database.

14:10-14:30, Paper ThBT6.3
Typographical Features for Scene Text Recognition
Weinman, Jerod, Grinnell Coll.

Scene text images feature an abundance of font style variety but a dearth of data in any given query. Recognition methods must be robust to this variety or adapt to the query data's characteristics. To achieve this, we augment a semi-Markov model, which integrates character segmentation and recognition, with a bigram model of character widths. By softly promoting segmentations that exhibit font metrics consistent with those learned from examples, we use the limited information available while avoiding error-prone direct estimates and hard constraints. Incorporating character width bigrams in this fashion improves recognition on low-resolution images of signs containing text in many fonts.

14:30-14:50, Paper ThBT6.4
A Visual Attention based Approach to Text Extraction
Sun, Qiaoyu, Huaihai Institute of Tech.
Lu, Yue, East China Normal Univ.
Sun, Shiliang, East China Normal Univ.

A visual attention based approach is proposed to extract text from complicated backgrounds in camera-based images. First, it applies a simplified visual attention model to highlight regions of interest (ROIs) in an input image and to yield a map, named the VA map, consisting of the ROIs. Second, an edge map of the image containing edge information in four directions is obtained using Sobel operators. Character areas are detected by connected component analysis and merged into candidate text regions. Finally, the VA map is employed to confirm the candidate text regions. The experimental results demonstrate that the proposed method can effectively extract text information and locate text regions contained in camera-based images. It is robust not only to font, size, color, language, spacing, alignment and complexity of background, but also to perspective distortion and skewed text embedded in images.

14:50-15:10, Paper ThBT6.5
New Wavelet and Color Features for Text Detection in Video
Palaiahnakote, Shivakumara, National Univ. of Singapore
Phan, Trung Quy, National Univ. of Singapore
Tan, Chew-Lim, National Univ. of Singapore

Automatic text detection in video is an important task for efficient and accurate indexing and retrieval of multimedia data, for applications such as event identification and event boundary identification. This paper presents a new method comprising wavelet decomposition and color features, namely R, G and B. The wavelet decomposition is applied to the three color bands separately to obtain three high-frequency sub-bands (LH, HL and HH), and the average of the three sub-bands for each color band is then computed to further enhance the text pixels in the video frame. To take advantage of both wavelet and color information, we then take the average of the three average images (AoA) obtained in the previous step to increase the gap between text and non-text pixels. Our previous Laplacian method is employed on the AoA for text detection. The proposed method is evaluated on a large dataset which includes publicly available data, non-text data and ICDAR-03 data. A comparative study with existing methods shows that the results of the proposed method are encouraging and useful.
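A sketch of the enhancement step described above (per-band wavelet decomposition, averaging of the high-frequency sub-bands, then averaging across the three color bands); the wavelet choice, the use of coefficient magnitudes and the stand-in frame are assumptions for illustration:

import numpy as np
import pywt

def average_of_averages(frame_rgb):
    """Average the LH, HL, HH sub-bands per color band, then average the
    three per-band images (the 'AoA' enhancement described above)."""
    band_averages = []
    for c in range(3):                        # R, G, B bands
        _, (lh, hl, hh) = pywt.dwt2(frame_rgb[..., c].astype(float), "haar")
        # magnitudes of the detail coefficients, averaged per color band
        band_averages.append((np.abs(lh) + np.abs(hl) + np.abs(hh)) / 3.0)
    return sum(band_averages) / 3.0            # enhanced map at half resolution

frame = np.random.randint(0, 256, size=(64, 64, 3))   # stand-in video frame
aoa = average_of_averages(frame)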

ThBT7 Dolmabahçe Hall C
Quantitative Biological Image and Signal Analysis, Regular Session
Session chair: Tasdizen, Tolga (Univ. of Utah)

13:30-13:50, Paper ThBT7.1
Improving Undersampled MRI Reconstruction using Non-Local Means
Adluru, Ganesh, Univ. of Utah
Tasdizen, Tolga, Univ. of Utah
Whitaker, Ross, Univ. of Utah
Dibella, Edward, Univ. of Utah

Obtaining high-quality images in MR is desirable not only for accurate visual assessment but also for automatic processing to extract clinically relevant parameters. Filtering-based techniques are extremely useful for reducing artifacts caused by undersampling of k-space (done to reduce scan time). The recently proposed Non-Local Means (NLM) filtering method offers a promising means of denoising images. Compared to most previous approaches, NLM is based on a more realistic model of images, which results in little loss of information while removing the noise. Here we extend the NLM method to MR image reconstruction from undersampled k-space data. The method is applied to T1-weighted images of the breast and T2-weighted anatomical brain images. Results show that NLM offers a promising method that can be used for accelerating MR data acquisitions.
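For reference, the filtering component (not the authors' full reconstruction pipeline) is available off the shelf; the image and parameters below are illustrative:

import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

image = np.random.rand(128, 128)          # stand-in for a magnitude MR slice
sigma = np.mean(estimate_sigma(image))    # rough noise level estimate
# patch-based averaging: each pixel becomes a weighted mean of pixels with
# similar surrounding patches, which preserves structure while denoising
denoised = denoise_nl_means(image, patch_size=5, patch_distance=6,
                            h=1.15 * sigma, fast_mode=True)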

13:50-14:10, Paper ThBT7.2
Towards an Intelligent Bed Sensor: Non-Intrusive Monitoring of Sleep Irregularities with Computer Vision Techniques
Branzan Albu, Alexandra, Univ. of Victoria
Malakuti, Kaveh, Univ. of Victoria

This paper proposes a novel approach to monitoring sleep using pressure data. The goal of sleep monitoring is to detect and log events of normal breathing, sleep apnea and body motion. The proposed approach is based on translating the signal data into the image domain by computing a sequence of inter-frame similarity matrices from pressure maps acquired with a mattress of pressure sensors. Periodicity analysis is performed on the similarity matrices via a new algorithm based on segmentation of elementary patterns using the watershed transform, followed by aggregation of quasi-rectangular patterns into breathing cycles. Once breathing events are detected, all remaining elementary patterns aligned on the main diagonal are considered as belonging to either apnea or motion events. The discrimination between these two event types is based on detecting movement times from a statistical analysis of the pressure data. Experimental results confirm the validity of our approach.
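The signal-to-image translation rests on an inter-frame similarity matrix; a minimal sketch with a hypothetical sequence of flattened pressure maps and a simple negative-distance similarity (the paper's similarity measure may differ):

import numpy as np

def similarity_matrix(frames):
    """frames: array of shape (T, H*W), one flattened pressure map per instant.
    Entry (i, j) is high when frames i and j look alike, so periodic
    breathing shows up as a regular block pattern in the matrix."""
    diffs = frames[:, None, :] - frames[None, :, :]
    return -np.linalg.norm(diffs, axis=2)

frames = np.random.rand(200, 16 * 16)     # stand-in pressure sequence
S = similarity_matrix(frames)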

14:10-14:30, Paper ThBT7.3
Automatic Selection of Keyframes from Angiogram Videos
Syeda-Mahmood, Tanveer, IBM Almaden Res. Center
Wang, Fei, IBM Almaden Res. Center
Beymer, David, IBM Almaden Res. Center
Mahmood, Aafreen, Monta Vista High School
Lundstrom, Robert, Kaiser Permanente SFO Medical Center

In this paper we address the problem of automatically selecting important vessel-depicting key frames within 2D angiography videos. Two different methods of frame selection are described, one based on the Frangi filter, and the other based on detecting parallel curves formed from edges in angiography images. Results are shown by comparison with physician annotations of such key frames on 2D coronary angiograms.

14:30-14:50, Paper ThBT7.4
A Computer-Aided Method for Scoliosis Fusion Level Selection by a Topologically Ordered Self Organizing Kohonen Network
Mezghani, Neila, Centre de Recherche du CHUM
Phan, Philippe, Sainte-Justine University Hospital Center
Mitiche, Amar, École Polytechnique de Montreal
Labella, Hubert, École Polytechnique de Montreal
de Guise, Jacques, Centre de Recherche du CHUM

Surgical instrumentation for adolescent idiopathic scoliosis (AIS) is a complex procedure involving many difficult decisions. Selection of the appropriate fusion level remains one of the most challenging decisions in scoliosis surgery. Currently, the Lenke classification model is generally followed in surgical planning. The purpose of our study is to investigate a computer-aided method for Lenke classification and scoliosis fusion level selection. The method uses a self-organizing neural network trained on a large database of surgically treated AIS cases. The neural network produces two maps, one of Lenke classes and the other of fusion levels. These two maps show that the Lenke classes are associated with the proper fusion level categories everywhere in the map except at the Lenke class transitions. The topological ordering of the Cobb angles in the neural network justifies determining a patient's scoliosis treatment instrumentation directly from the fusion level map rather than via the Lenke classification.

14:50-15:10, Paper ThBT7.5
A Fast and Robust Graph-Based Approach for Boundary Estimation of Fiber Bundles Relying on Fractional Anisotropy Maps
Bauer, Miriam Helen Anna, Univ. of Marburg
Egger, Jan, Univ. of Marburg
Odonnell, Thomas Patrick, Siemens Corp. Res.
Freisleben, Bernd, Univ. of Marburg
Barbieri, Sebastiano, Fraunhofer MEVIS
Klein, Jan, Fraunhofer MEVIS
Hahn, Horst Karl, Fraunhofer MEVIS
Nimsky, Christopher, Univ. of Marburg

In this paper, a fast and robust graph-based approach for boundary estimation of fiber bundles derived from Diffusion Tensor Imaging (DTI) is presented. DTI is a non-invasive imaging technique that allows the estimation of the location of white matter tracts based on measurements of water diffusion properties. Based on DTI data, the fiber bundle boundary can be determined to gain information about eloquent structures, which is of major interest for neurosurgery. DTI in combination with tracking algorithms allows the estimation of the position and course of fiber tracts in the human brain. The presented method uses these tracking results as the starting point for a graph-based approach. The overall method starts by computing the fiber bundle centerline between two user-defined regions of interest (ROIs). This centerline determines the planes that are used for creating a directed graph. The min-cut of the graph is then calculated, yielding an optimal boundary of the fiber bundle.

ThCT1 Marmara Hall
Object Detection and Recognition - VI, Regular Session
Session chair: Denzler, Joachim (Friedrich-Schiller Univ. of Jena)

15:40-16:00, Paper ThCT1.1
Recognizing 3D Objects with 3D Information from Stereo Vision
Yoon, Kuk-Jin, GIST
Shin, Min-Gil, GIST
Lee, Ji-Hyo, Samsung Electronics

Conventional local feature-based object recognition methods try to recognize learned 3D objects by using unordered local feature matching followed by verification. However, the matching between unordered feature sets can be ambiguous and, moreover, it is difficult to deal with 3D objects of general shape in the verification stage. In this paper, we present a new framework for general 3D object recognition based on invariant local features and their 3D information obtained with stereo cameras. We extend the conventional object recognition framework to stereo cameras. Since the proposed method is based on stereo vision, it is possible to utilize the 3D information of local features visible from both cameras.

16:00-16:20, Paper ThCT1.2
Combining Geometry and Local Appearance for Object Detection
Pascual García-Tubío, Manuel, Vienna Univ. of Tech.
Wildenauer, Horst, Vienna Univ. of Tech.
Szumilas, Lech, Ind. Research Inst. for Automation & Measurement

In this paper we address the problem of object detection in cluttered scenes. Local image features and their spatial configuration act as a representation of object classes, which are learned in a discriminative fashion. Recent contributions in the area of object detection indicate the importance of using geometrical properties for representing object classes. Prompted by this, we devised an approach tailored to control the importance of the features and of their spatial alignment. We quantitatively show that modeling the spatial distribution of local features and optimising the influence of both cues significantly boosts object detection performance.

16:20-16:40, Paper ThCT1.3
Illumination and Expression Invariant Face Recognition using SSIM based Sparse Representation
Khwaja, Asim, The Australian National Univ.
Asthana, Akshay, Australian National Univ.
Goecke, Roland, Univ. of Canberra

The sparse representation technique has provided a new way of looking at object recognition. As we demonstrate in this paper, however, the mean-squared error (MSE) measure, which is at the heart of this technique, is not a very robust measure when it comes to comparing facial images that differ significantly in luminance values, as it only performs pixel-by-pixel comparisons. This requires a significantly large training set with enough variations in it to offset the drawback of the MSE measure. A large training set, however, is often not available. We propose replacing the MSE measure with the structural similarity (SSIM) measure in the sparse representation algorithm, which performs a more robust comparison using only one training sample per subject. In addition, since off-the-shelf sparsifiers are also written using the MSE measure, we developed our own sparsifier using genetic algorithms that use the SSIM measure. We applied the modified algorithm to the Extended Yale Face B database as well as to the Multi-PIE database with expression and illumination variations. The improved performance demonstrates the effectiveness of the proposed modifications.

16:40-17:00, Paper ThCT1.4
Improving Classification Accuracy by Comparing Local Features through Canonical Correlations
Dikmen, Mert, Univ. of Illinois at Urbana-Champaign
Huang, Thomas, Univ. of Illinois at Urbana-Champaign

Classifying images using features extracted from densely sampled local patches has enjoyed significant success in many detection and recognition tasks. It is also well known that generally more than one type of feature is needed to achieve robust classification performance. Previous work using multiple features has addressed this issue either through simple concatenation of feature vectors or through combining feature-specific kernels at the classifier level. In this work we introduce a novel approach for combining features at the feature level by projecting two types of features onto two respective subspaces in which they are maximally correlated. We use their correlation as an augmented feature and demonstrate improvement in classification accuracy over simple combination through concatenation in a pedestrian detection framework.
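A loose sketch of the projection step using scikit-learn's CCA; the two feature matrices are hypothetical stand-ins for two descriptor types computed on the same patches, and the augmentation shown is only one plausible reading of the paper's scheme:

import numpy as np
from sklearn.cross_decomposition import CCA

feat_a = np.random.rand(500, 64)   # feature type A, one row per patch
feat_b = np.random.rand(500, 32)   # feature type B for the same patches

cca = CCA(n_components=8)
# project both feature types onto subspaces where they are maximally correlated
proj_a, proj_b = cca.fit_transform(feat_a, feat_b)
# use the correlated projections as an augmented feature for the classifier
augmented = np.hstack([feat_a, feat_b, proj_a * proj_b])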

17:00-17:20, Paper ThCT1.5
A Robust Approach for Person Localization in Multi-Camera Environment
Sun, Luo, Tsinghua Univ.
Di, Huijun, Tsinghua Univ.
Tao, Linmi, Tsinghua Univ.
Xu, Guangyou, Tsinghua Univ.

Person localization is fundamental in human-centered computing, since a person must be localized before being actively serviced. This paper proposes a robust approach to localizing a person based on geometric constraints in a multi-camera environment. The proposed algorithm has several advantages: 1) no assumption on the positions and orientations of the cameras, except that they should share a certain common field of view; 2) no assumption on the visibility of particular body parts (e.g., feet), except that a portion of the person should be observed in at least two views; 3) reliability in terms of tolerating occlusion, body posture changes and inaccurate motion detection. It can also provide error control and can be further extended to measure person height. The efficacy of the approach is demonstrated on challenging real-world scenarios.


ThCT2 Anadolu Auditorium
Classification - III, Regular Session
Session chair: Tortorella, Francesco (Univ. degli Studi di Cassino)

15:40-16:00, Paper ThCT2.1
Nearest Archetype Hull Methods for Large-Scale Data Classification
Thurau, Christian, Fraunhofer IAIS

This paper introduces an efficient geometric approach to data classification that can build class models from large amounts of high-dimensional data. We determine a convex model of the data as the outcome of convex-hull non-negative matrix factorization, a large-scale variant of Archetypal Analysis. The resulting convex regions, or archetype hulls, give an optimal (in a least-squares sense) bounding of the data region and can be computed efficiently. We classify based on the minimum distance to the closest archetype hull. The proposed method offers (i) an intuitive geometric interpretation, (ii) single- as well as multi-class classification, and (iii) handling of large amounts of high-dimensional data. Experimental evaluation on common benchmark data sets shows promising results.
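Read as a classification rule, the method assigns a sample to the class whose archetype hull is nearest, i.e. it solves a small constrained least-squares problem per class. A sketch under that reading, with hypothetical archetype matrices and a generic constrained solver (the paper's large-scale machinery is not reproduced here):

import numpy as np
from scipy.optimize import minimize

def dist_to_hull(x, archetypes):
    """Distance from x to the convex hull of the rows of `archetypes`."""
    k = archetypes.shape[0]
    objective = lambda w: np.sum((w @ archetypes - x) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(objective, np.full(k, 1.0 / k), bounds=[(0, 1)] * k,
                   constraints=cons, method="SLSQP")
    return np.sqrt(res.fun)

def classify(x, class_archetypes):
    """Assign x to the class whose archetype hull is nearest."""
    return min(class_archetypes, key=lambda c: dist_to_hull(x, class_archetypes[c]))

class_archetypes = {"a": np.random.rand(5, 10), "b": np.random.rand(5, 10)}
label = classify(np.random.rand(10), class_archetypes)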

16:00-16:20, Paper ThCT2.2
A Bound on the Performance of LDA in Randomly Projected Data Spaces
Durrant, Robert John, Univ. of Birmingham
Kaban, Ata, Univ. of Birmingham

We consider the problem of classification under nonadaptive dimensionality reduction. Specifically, we bound the increase in classification error of Fisher's Linear Discriminant classifier resulting from randomly projecting the high-dimensional data into a lower-dimensional space and both learning the classifier and performing the classification in the projected space. Our bound is reasonably tight and, unlike existing bounds on learning from randomly projected data, it becomes tighter as the quantity of training data increases, without requiring any sparsity structure from the data.
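The setting analysed here, learning and classifying Fisher's Linear Discriminant entirely in a randomly projected space, is easy to reproduce empirically; a minimal sketch on synthetic data, with illustrative dimensions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.random_projection import GaussianRandomProjection

X, y = make_classification(n_samples=2000, n_features=500, n_informative=50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

proj = GaussianRandomProjection(n_components=40, random_state=0).fit(X_tr)
lda = LinearDiscriminantAnalysis().fit(proj.transform(X_tr), y_tr)
# empirical error in the projected space, to compare against the bound
error = 1.0 - lda.score(proj.transform(X_te), y_te)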

16:20-16:40, Paper ThCT2.3
Adaptive Incremental Learning with an Ensemble of Support Vector Machines
Kapp, Marcelo N., École de Tech. Supérieure - Univ. of Quebec
Sabourin, R., École de Tech. Supérieure
Maupin, Patrick, Defence Res. and Development Canada

The incremental updating of classifiers implies that their internal parameter values can vary according to incoming data. As a result, in order to achieve high performance, incremental learning systems should not only consider the integration of knowledge from new data, but also maintain an optimum set of parameters. In this paper, we propose an approach for performing incremental learning in an adaptive fashion with an ensemble of support vector machines. The key idea is to track, evolve, and combine optimum hypotheses over time, based on dynamic optimization processes and ensemble selection. The experimental results demonstrate that the proposed strategy is promising, since it outperforms a single-classifier variant of the proposed approach and other classification methods often used for incremental learning.

16:40-17:00, Paper ThCT2.4
Margin Preserved Approximate Convex Hulls for Classification
Takahashi, Tetsuji, Hokkaido Univ.
Kudo, Mineichi, Hokkaido Univ.

The use of convex hulls for classification is discussed along with a practical algorithm in which a sample is classified according to its distances to the convex hulls. Sometimes the convex hulls of classes are too close to keep a large margin. In this paper, we discuss a way to keep a margin larger than a specified value. To do this, we introduce the concept of an "expanded convex hull" and confirm its effectiveness.

17:00-17:20, Paper ThCT2.5
Evolving Fuzzy Classifiers: Application to Incremental Learning of Handwritten Gesture Recognition Systems
Almaksour, Abdullah, IRISA/INSA de Rennes
Anquetil, Eric, IRISA/INSA
Quiniou, Solen, École de Tech. Supérieure
Cheriet, Mohammed, École de Tech. Supérieure

In this paper, we present a new method to design customizable, self-evolving fuzzy rule-based classifiers. The presented approach combines an incremental clustering algorithm with a fuzzy adaptation method in order to learn and maintain the model. We use this method to build an evolving handwritten gesture recognition system. The self-adaptive nature of this system allows it to start its learning process with little learning data, to continuously adapt and evolve according to any new data, and to remain robust when a new, unseen class is introduced at any moment in the life-long learning process.

ThCT3 Topkapı Hall A
Computer Vision Applications - III, Regular Session
Session chair: Yilmaz, Alper (Ohio State Univ.)

15:40-16:00, Paper ThCT3.1
Fast and Spatially-Smooth Terrain Classification using Monocular Camera
Jakkoju, Chetan, IIIT Hyderabad
Krishna, Madhava, IIIT Hyderabad
Jawahar, C. V., IIIT

In this paper, we present a monocular camera based terrain classification scheme. The uniqueness of the proposed scheme is that it inherently incorporates spatial smoothness while segmenting an image, without requiring post-processing smoothing methods. The algorithm is extremely fast because it is built on top of a Random Forest classifier. We present comparisons across features and classifiers. The baseline algorithm uses color, texture and their combination with classifiers such as SVM and Random Forests. We further enhance the algorithm through a label transfer method. The efficacy of the proposed solution can be seen in the low error rates we reach on both our dataset and other publicly available datasets.
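A baseline of the kind described, per-patch color/texture features fed to a Random Forest, can be assembled in a few lines; the feature extraction below is a deliberately simple stand-in, not the paper's feature set:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def patch_features(image, size=16):
    """Mean color plus a crude texture proxy (gray-level std) per patch."""
    h, w, _ = image.shape
    feats = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            patch = image[y:y + size, x:x + size]
            gray = patch.mean(axis=2)
            feats.append(np.concatenate([patch.mean(axis=(0, 1)), [gray.std()]]))
    return np.array(feats)

train_img = np.random.rand(256, 256, 3)                            # stand-in image
labels = np.random.randint(0, 3, len(patch_features(train_img)))   # toy terrain labels
clf = RandomForestClassifier(n_estimators=100).fit(patch_features(train_img), labels)
pred = clf.predict(patch_features(np.random.rand(256, 256, 3)))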

16:00-16:20, Paper ThCT3.2
Learning Major Pedestrian Flows in Crowded Scenes
Widhalm, Peter, Austrian Inst. of Tech.
Braendle, Norbert, Austrian Inst. of Tech.

We present a crowd analysis approach that computes a representation of the major pedestrian flows in complex scenes. It treats crowds as a set of moving particles and builds a spatio-temporal model of motion events. A Growing Neural Gas algorithm encodes optical flow particle trajectories as sequences of local motion events and learns a topology which is the basis for trajectory distance computations. Trajectory prototypes are aligned with a two-open-ends version of Dynamic Time Warping to cope with fragmented trajectories. The trajectories are grouped into an automatically determined number of clusters with self-tuning spectral clustering. The clusters are compactly represented with the help of Principal Component Analysis, providing a technique for unusual motion detection based on residuals. We demonstrate results on a publicly available crowded video and on a scene with volunteers moving according to defined origin-destination flows.

16:20-16:40, Paper ThCT3.3
On-Line Video Recognition and Counting of Harmful Insects
Bechar, Ikhlef, INRIA
Moisan, Sabine, INRIA
Thonnat, Monique, INRIA
Bremond, Francois, INRIA

This article is concerned with the on-line counting of harmful insects of certain species in videos, in the framework of in situ video-surveillance aimed at the early detection of prominent pest attacks in greenhouse crops. The main video-processing challenges are the low spatial resolution and color contrast of the objects of interest in the videos, the outdoor conditions, and the need for quasi-real-time processing. We therefore propose an approach which uses a pattern recognition algorithm to extract the locations of the harmful insects of interest in a video, combined with video-processing algorithms to achieve an on-line video-surveillance solution. The system has been validated off-line on the whitefly species (one potentially harmful insect) and has shown acceptable performance in terms of accuracy versus computational time.

16:40-17:00, Paper ThCT3.4
Boosted Edge Orientation Histograms for Grasping Point Detection
Lefakis, Leonidas, TU Vienna
Wildenauer, Horst, Vienna Univ. of Tech.
Pascual García-Tubío, Manuel, Vienna Univ. of Tech.
Szumilas, Lech, Ind. Research Inst. for Automation & Measurement

In this paper, we describe a novel algorithm for the detection of grasping points in images of previously unseen objects. A basic building block of our approach is a newly devised descriptor representing semi-local grasping point shape by means of edge orientation histograms. Combined with boosting, our method learns discriminative grasp point models for new objects from a set of annotated real-world images. The method has been extensively evaluated on challenging images of real scenes exhibiting largely varying characteristics concerning illumination conditions, scene complexity, and viewpoint. Our experiments show that the method works in a stable manner and that its performance compares favorably to the state of the art.

17:00-17:20, Paper ThCT3.5
Automatic Refinement of Foreground Regions for Robot Trail Following
Kocamaz, Mehmet Kemal, Univ. of Delaware
Rasmussen, Christopher, Univ. of Delaware

Continuous trails are extended regions along the ground, such as roads, hiking paths, rivers, and pipelines, which can be navigationally useful for ground-based or aerial robots. Finding trails in an image and determining possible obstacles on them are important tasks for robot navigation systems. Assuming that a rough initial segmentation or outline of the region of interest is available, our goal is to refine the initial guess to obtain a more accurate and detailed representation of the true trail borders. In this paper, we compare the suitability of several previously published segmentation algorithms, both in terms of agreement with ground truth and in terms of speed, on a range of trail images with diverse appearance characteristics. These algorithms include generic graph cut, a shape-based version of graph cut which employs a distance penalty, GrabCut, and an iterative superpixel grouping method.

ThCT4 Dolmabahçe Hall A
Image Representation and Analysis, Regular Session
Session chair: Debled-Rennesson, Isabelle (LORIA-Nancy Univ.)

15:40-16:00, Paper ThCT4.1
Object Decomposition via Curvilinear Skeleton Partition
Serino, Luca, Istituto di Cibernetica
Sanniti Di Baja, Gabriella, CNR
Arcelli, Carlo, Istituto di Cibernetica

A method to decompose a complex 3D object into simpler parts is presented, based on a suitable partition of the curvilinear skeleton of the object. The curvilinear skeleton is divided into subsets by taking into account the regions of influence that can be associated with its branch points. The obtained subsets are then used to recover the parts into which the object can be decomposed.

16:00-16:20, Paper ThCT4.2
Differential Area Profiles
Ouzounis, Georgios, Joint Res. Center - Ispra, European Commission
Soille, Pierre, EC Joint Res. Centre

In this paper a new feature descriptor, the differential area profile (DAP), is presented. DAPs, like the regular differential morphological profiles (DMPs), are computed from a size distribution. The proposed method is based on the area metric given by regular connected area filters. Area, compared to local width (i.e., the diameter of the structuring element in the corresponding set of openings by reconstruction in classical DMPs), leads to a rather different multi-scale decomposition. This is investigated here, and an example on a very high resolution satellite image tile is given.

16:20-16:40, Paper ThCT4.3
Connected Component Trees for Multivariate Image Processing and Applications in Astronomy
Perret, Benjamin, Univ. of Strasbourg, LSIIT-CNRS
Lefèvre, Sébastien, Univ. of Strasbourg
Collet, Christophe, Univ. of Strasbourg, LSIIT-CNRS
Slezak, Eric Jean Marc, Univ. de Nice - Sophia Antipolis

In this paper, we investigate the possibilities offered by the extension of connected component trees (cc-trees) to multivariate images. We propose a general framework for image processing using the cc-tree based on lattice theory, and we discuss the possible applications depending on the properties of the underlying ordered set. This theoretical reflection is illustrated by two applications in multispectral astronomical imaging: source separation and object detection.

16:40-17:00, Paper ThCT4.4
Multiresolution Analysis of 3D Images based on Discrete Distortion
Weiss, Kenneth, Univ. of Maryland, Coll. Park
Mesmoudi, Mohammed Mostefa, Univ. of Genova
De Floriani, L., Univ. of Genova

We consider a model of a 3D image obtained by discretizing it into a multiresolution tetrahedral mesh known as a hierarchy of diamonds. This model enables us to extract crack-free approximations of the 3D image at any uniform or variable resolution, thus reducing the size of the data set without reducing the accuracy. A 3D intensity image is a scalar field (the intensity field) defined at the vertices of a 3D regular grid, and thus the graph of the image is a hypersurface in $R^4$. We measure the discrete distortion, a generalization of the notion of curvature, of the transformation which maps the tetrahedralized 3D grid onto its graph in $R^4$. We evaluate the use of a hierarchy of diamonds to analyze properties of a 3D image, such as its discrete distortion, directly on lower-resolution approximations. Our results indicate that distortion-guided extractions focus the resolution of the approximated images on the salient features of the intensity image.

17:00-17:20, Paper ThCT4.5
Multiscale Analysis of Digital Segments by Intersection of 2D Digital Lines
Said, Mouhammad, Univ. de Savoie, Univ. d'Auvergne
Lachaud, Jacques-Olivier, Univ. of Savoie
Feschet, Fabien, Univ. d'Auvergne Clermont-Ferrand 1

A theory for the multiscale analysis of digital shapes would be very interesting for the pattern recognition community, giving a digital equivalent of the continuous scale-space theory. We focus here on providing analytical formulae for the multiresolution of Digital Straight Segments (DSS), which are a fundamental tool for describing digital shape contours.

ThCT5 Dolmabahçe Hall B
Image/Video Processing, Regular Session
Session chair: Hamzaoğlu, İlker (Sabancı Univ.)

15:40-16:00, Paper ThCT5.1
Stereoscopic Image Inpainting: Distinct Depth Maps and Images Inpainting
Hervieu, Alexandre, Barcelona Media, Univ. Pompeu Fabra of Barcelona
Papadakis, Nicolas, Barcelona Media
Bugeau, Aurélie, Barcelona Media
Gargallo, Pau, Barcelona Media
Caselles, Vicent, Univ. Pompeu Fabra

In this paper we propose an algorithm for the inpainting of stereo images. The issue is to reconstruct the holes in a pair of stereo images as if they were the projection of a 3D scene; hence, the reconstruction of the missing information has to produce a consistent visual perception of depth. The first step of the algorithm thus consists in the computation and inpainting of the disparity maps in the given holes. The second step is to fill in the missing regions using the completed disparity maps in a way that avoids the creation of 3D artifacts. We present experiments on several pairs of stereo images.

16:00-16:20, Paper ThCT5.2
Panoramic Video Generation by Multi View Data Synthesis
D'Orazio, Tiziana, Italian National Res. Council
Leo, Marco, Italian National Res. Council
Mosca, Nicola, Italian National Res. Council

This paper presents a mosaic-based approach for enlarged-view soccer video production, which can be provided to the audience as a complementary view for greater enjoyment of relevant events, such as offsides, counter-attacks or goals, that spread out all over the playing field. First, an enlarged view of the whole field is produced by fusing the images of six cameras placed on the two sides of the field. Then a color transformation is applied to obtain uniform colors on the parts of the playing field acquired from different cameras. Finally, the players are segmented in each camera and projected onto the enlarged view to produce videos of the most interesting events.

16:20-16:40, Paper ThCT5.3
An Adaptive True Motion Estimation Algorithm for Frame Rate Conversion of High Definition Video
Cetin, Mert, Sabanci Univ.
Hamzaoglu, Ilker, Sabanci Univ.

Frame Rate Up-Conversion (FRUC) is necessary for displaying low frame rate video signals on high frame rate flat panel displays. This paper proposes an adaptive true Motion Estimation (ME) algorithm for FRUC of High Definition video formats. By adaptively using optimized sets of candidate search locations and several computational complexity reduction techniques, the proposed ME algorithm produces results of similar quality with fewer calculations, or better quality results with a similar number of calculations, compared to the 3D Recursive Search true ME algorithm.
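At its core, this family of motion estimators scores a small set of candidate displacement vectors per block with a matching cost such as the sum of absolute differences (SAD). A generic sketch of that step (the candidate set, block size and data are illustrative, not the proposed adaptive algorithm):

import numpy as np

CANDIDATES = ((0, 0), (0, 1), (1, 0), (0, -1), (-1, 0))   # illustrative set

def best_vector(prev, curr, y, x, block=16):
    """Pick the candidate displacement with the lowest SAD for one block."""
    ref = curr[y:y + block, x:x + block].astype(int)
    best, best_cost = (0, 0), np.inf
    for dy, dx in CANDIDATES:
        yy, xx = y + dy, x + dx
        if 0 <= yy and 0 <= xx and yy + block <= prev.shape[0] and xx + block <= prev.shape[1]:
            cost = np.abs(ref - prev[yy:yy + block, xx:xx + block].astype(int)).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

prev = np.random.randint(0, 256, (64, 64))
curr = np.roll(prev, 1, axis=1)              # toy horizontal motion by one pixel
print(best_vector(prev, curr, 16, 16))       # expected to pick (0, -1)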

16:40-17:00, Paper ThCT5.4
Human Daily Activities Indexing in Videos from Wearable Cameras for Monitoring of Patients with Dementia Diseases
Karaman, Svebor, Lab.
Benois-Pineau, Jenny, Lab.
Megret, Remi, Univ. of Bordeaux
Dovgalecs, Vladislavs, IMS
Gaëstel, Yann, INSERM U.897
Dartigues, Jean-Francois, INSERM U.897

Our research focuses on analysing human activities according to a known behaviorist scenario, in the case of noisy and high-dimensional collected data. The data come from the monitoring of patients with dementia diseases by wearable cameras. We define a structural model of video recordings based on a Hidden Markov Model. New spatio-temporal features, color features and localization features are proposed as observations. First results in the recognition of activities are promising.

17:00-17:20, Paper ThCT5.5
Automatic Composition of an Informative Wide-View Image from Video
Habe, Hitoshi, NAIST
Makiyama, Shota, NAIST
Kidode, Masatsugu, NAIST

We describe a method for generating an informative wide-view image using images captured by a moving camera. The generated image allows events in the scene observed by the camera to be understood easily. Our method does not use 3D shape information explicitly. Instead, it employs the trajectories of feature points across multiple images and generates a composite image by taking into account the distribution of these trajectories.

ThCT6 Topkapı Hall B
Facial Expression, Regular Session
Session chair: Akarun, Lale (Bogazici Univ.)

15:40-16:00, Paper ThCT6.1
Regression-Based Multi-View Facial Expression Recognition
Rudovic, Ognjen, Imperial Coll.
Patras, Ioannis, Queen Mary Univ. of London
Pantic, Maja, Imperial Coll.

We present a regression-based scheme for multi-view facial expression recognition based on 2D geometric features. We address the problem by mapping facial points (e.g. mouth corners) from non-frontal to frontal views, where further recognition of the expressions can be performed using a state-of-the-art facial expression recognition method. To learn the mapping functions, we investigate four regression models: Linear Regression (LR), Support Vector Regression (SVR), Relevance Vector Regression (RVR) and Gaussian Process Regression (GPR). Our extensive experiments on the CMU Multi-PIE facial expression database show that the proposed scheme outperforms view-specific classifiers while using considerably less training data.
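One of the four mapping models can be illustrated directly: fit a Support Vector Regression per output coordinate to map non-frontal landmark positions to their frontal-view counterparts. The data and the per-coordinate wrapper below are illustrative, not the authors' setup:

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# toy data: 2D coordinates of 20 facial points, flattened to 40 values
nonfrontal = np.random.rand(300, 40)                 # observed (rotated) points
frontal = nonfrontal + 0.1 * np.random.rand(300, 40) # their frontal counterparts

# one SVR per output coordinate learns the non-frontal -> frontal mapping
mapper = MultiOutputRegressor(SVR(kernel="rbf", C=10.0)).fit(nonfrontal, frontal)
frontalized = mapper.predict(np.random.rand(5, 40))  # map new non-frontal shapes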

16:00-16:20, Paper ThCT6.2<br />

A Set of Selected SIFT Features for 3D Facial Expression Recognition<br />

Berretti, Stefano, Univ. of Firenze<br />

Del Bimbo, Alberto, Univ. of Florence<br />

Pala, Pietro, Univ. of Firenze<br />

Ben Amor, Boulbaba, LIFL UMR 8022<br />

Daoudi, Mohamed, TELECOM Lille1<br />

In this paper, the problem of person-independent facial expression recognition is addressed on 3D shapes. To this end, an<br />

original approach is proposed that computes SIFT descriptors on a set of facial landmarks of depth images, and then selects<br />

the subset of most relevant features. Using SVM classification of the selected features, an average recognition rate of<br />

77.5% on the BU-3DFE database has been obtained. Comparative evaluation on a common experimental setup, shows<br />

that our solution is able to obtain state of the art results.<br />

16:20-16:40, Paper ThCT6.3<br />

Local 3D Shape Analysis for Facial Expression Recognition<br />

Maalej, Ahmed, LIFL UMR 8022<br />

Ben Amor, Boulbaba, LIFL UMR 8022<br />

Daoudi, Mohamed, TELECOM Lille1<br />

Srivastava, Anuj, Florida State Univ.<br />

Berretti, Stefano, Univ. of Firenze<br />

We investigate the problem of facial expression recognition using 3D face data. Our approach is based on local shape<br />

analysis of several relevant regions of a given face scan. These regions or patches from facial surfaces are extracted and<br />

represented by sets of closed curves. A Riemannian framework is used to derive the shape analysis of the extracted patches.<br />

The applied framework permits to calculate a similarity (or dissimilarity) distances between patches, and to compute the<br />

optimal deformation between them. Once calculated, these measures are employed as inputs to a commonly used classification<br />

techniques such as AdaBoost and Support Vector Machines (SVM). A quantitative evaluation of our novel approach<br />

is conducted on a subset of the publicly available BU-3DFE database.<br />

16:40-17:00, Paper ThCT6.4 CANCELED<br />

Incorporating Action Unit Co-Movement in Classification of Dynamic Facial Expressions using Lasso<br />

Rastad, Mahdi, Univ. of Illinois<br />

Zhu, Lusha, Univ. of Illinois<br />

Koenker, Roger, Univ. of Illinois<br />

Spencer-Smith, Jesse, Univ. of Illinois<br />

Hsu, Ming, Univ. of California, Berkeley<br />

Current literature on facial expression analysis are often applied to static facial images along with a small set of expressions.<br />

In this research we generate a novel dataset of facial action unit dynamics during several experiment sessions by the means<br />

of an avatar controlled by participants using a joystick. Previous studies have shown that this generates highly realistic<br />

facial expressions, comparable to popular displays of facial expressions used in computer vision experiments. Here we<br />

- 296 -


extend this work by using functional data analysis (FDA) to classify facial movement functions into basic emotion categories.<br />

Several single and hybrid classification algorithms are tested. By incorporating action unit co-movement in a Lasso<br />

shrinkage method, we achieved a recognition rate of 89%, substantially outperforming competitor approaches. Application<br />

to real expressions, and introduction of intensity and other temporal features of expressions are discussed as examples of<br />

extensions of our method.<br />

17:00-17:20, Paper ThCT6.5<br />

Multi-Modal Emotion Recognition using Canonical Correlations and Acoustic Features<br />

Gajsek, Rok, Univ. of Ljubljana<br />

Struc, Vitomir, Univ. of Ljubljana<br />

Mihelic, France, Univ. of Ljubljana<br />

Information about the psycho-physical state of the subject is becoming a valuable addition to modern audio and video<br />

recognition systems. As well as enabling a better user experience, it can also improve the recognition accuracy of the<br />

base system. In this article, we present our approach to a multi-modal (audio-video) emotion recognition system. For the audio<br />

sub-system, a feature set comprised of prosodic, spectral and cepstral features is selected and a support vector classifier is<br />

used to produce the scores for each emotional category. For the video sub-system a novel approach is presented, which does<br />

not rely on the tracking of specific facial landmarks and thus eliminates the problems usually caused when the tracking algorithm<br />

fails to detect the correct area. The system is evaluated on the interface database and the recognition accuracy<br />

of our audio-video fusion is compared to the published results in the literature.<br />

ThCT7 Dolmabahçe Hall C<br />

Multimedia and Document Analysis Applications Regular Session<br />

Session chair: Duygulu Sahin, Pinar (Bilkent Univ.)<br />

15:40-16:00, Paper ThCT7.1<br />

Automatic Music Genre Classification using Bass Lines<br />

Simsekli, Umut, Bogazici Univ.<br />

A bass line is an instrumental melody that encapsulates rhythmic, melodic, and harmonic features and arguably contains<br />

sufficient information for accurate genre classification. In this paper a bass-line-based automatic music genre classification<br />

system is described. “Melodic Interval Histograms” are used as features and k-nearest neighbor classifiers are<br />

utilized and compared with SVMs on a small standard MIDI database. Apart from standard distance metrics for k-nearest<br />

neighbor (Euclidean, symmetric Kullback-Leibler, earth mover’s, normalized compression distances) we propose<br />

a novel distance metric, perceptually weighted Euclidean distance (PWED). The maximum classification accuracy (84%)<br />

is obtained with k-nearest neighbor classifiers and the added utility of the novel metric is illustrated in our experiments.<br />
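A minimal sketch of the feature-and-distance pipeline described above, assuming a bass line is available as a MIDI pitch sequence; the weight vector stands in for the paper's perceptually weighted Euclidean distance and is purely hypothetical.

```python
# Hedged sketch: melodic interval histogram features compared by a weighted
# Euclidean distance inside a simple k-nearest-neighbour classifier.
import numpy as np

def interval_histogram(pitches, max_interval=12):
    """Histogram of successive melodic intervals, clipped to +/- one octave."""
    intervals = np.clip(np.diff(pitches), -max_interval, max_interval)
    hist, _ = np.histogram(intervals, bins=np.arange(-max_interval, max_interval + 2))
    return hist / max(hist.sum(), 1)

def weighted_euclidean(h1, h2, w):
    return np.sqrt(np.sum(w * (h1 - h2) ** 2))

def knn_classify(query, train_hists, train_labels, w, k=3):
    d = [weighted_euclidean(query, h, w) for h in train_hists]
    nearest = np.argsort(d)[:k]
    labels = [train_labels[i] for i in nearest]
    return max(set(labels), key=labels.count)

bass_a = [40, 43, 45, 40, 43, 45, 47]       # hypothetical MIDI pitches
bass_b = [40, 42, 43, 45, 47, 45, 43]
w = np.ones(25)                              # hypothetical (uniform) weights
print(weighted_euclidean(interval_histogram(bass_a), interval_histogram(bass_b), w))
```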

16:00-16:20, Paper ThCT7.2<br />

Exploiting Combined Multi-Level Model for Document Sentiment Analysis<br />

Li, Si, Beijing Univ. of Posts and Telecommunications<br />

Zhang, Hao, Beijing Univ. of Posts and Telecommunications<br />

Xu, Weiran, Beijing Univ. of Posts and Telecommunications<br />

Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />

This paper focuses on the task of text sentiment analysis in hybrid online articles and web pages. Traditional approaches<br />

of text sentiment analysis typically work at a particular level, such as phrase, sentence or document level, which might<br />

not be suitable for documents with too few or too many words. Considering that analysis at each level has its own advantages,<br />

we expect that a combination model may achieve better performance. In this paper, a novel combined model based on<br />

phrase- and sentence-level analyses is presented, together with a discussion of how the analyses at different levels complement each other.<br />

For the phrase-level sentiment analysis, a newly defined Left-Middle-Right template and Conditional Random Fields<br />

are used to extract the sentiment words. The Maximum Entropy model is used in the sentence-level sentiment analysis.<br />

The experimental results verify that the combined model with a specific combination of features is better than a single-level<br />

model.<br />



16:20-16:40, Paper ThCT7.3<br />

MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases<br />

Akune, Fernando, Univ. of Campinas<br />

Valle, Eduardo, Univ. of Campinas<br />

Torres, Ricardo, Univ. of Campinas<br />

We propose MONORAIL, an indexing scheme for very large multimedia descriptor databases. Our index is based on the<br />

Hilbert curve, which is able to map the high-dimensional space of those descriptors to a single dimension. Instead of using<br />

several curves to mitigate boundary effects, we use a single curve with several surrogate points for each descriptor. Thus,<br />

we are able to reduce the random accesses to the bare minimum. In a rigorous empirical comparison with another method<br />

based on multiple surrogates, ours shows a significant improvement, due to our careful choice of the surrogate points.<br />
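For intuition, the sketch below shows the standard mapping of a 2D point to its position along a Hilbert curve (the descriptors in the paper are high-dimensional, so this is only an illustration of the idea): sorting descriptors by this key lets nearby points be stored near each other on disk.

```python
# Classic xy-to-distance conversion for a Hilbert curve on a 2**order x 2**order
# grid; nearby cells tend to receive nearby one-dimensional keys.
def hilbert_index(order, x, y):
    """Return the distance of cell (x, y) along the Hilbert curve."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate the quadrant so the curve stays continuous
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# e.g. sort points by their Hilbert key and store them sequentially
print(sorted((hilbert_index(3, x, y), (x, y)) for x in range(4) for y in range(4))[:4])
```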

16:40-17:00, Paper ThCT7.4<br />

Localized Supervised Metric Learning on Temporal Physiological Data<br />

Sun, Jimeng, IBM T. J. Watson Res. Center<br />

Sow, Daby, IBM T.J. Watson Res. Center<br />

Hu, Jianying, IBM<br />

Ebadollahi, Shahram, IBM T.J. Watson Res. Center<br />

Effective patient similarity assessment is important for clinical decision support. It enables the capture of past experience<br />

as manifested in the collective longitudinal medical records of patients to help clinicians assess the likely outcomes resulting<br />

from their decisions and actions. However, it is challenging to devise a patient similarity metric that is clinically relevant<br />

and semantically sound. Patient similarity is highly context sensitive: it depends on factors such as the disease, the particular<br />

stage of the disease, and co-morbidities. One way to discern the semantics in a particular context is to take advantage of<br />

physicians’ expert knowledge as reflected in labels assigned to some patients. In this paper we present a method that leverages<br />

localized supervised metric learning to effectively incorporate such expert knowledge to arrive at semantically sound<br />

patient similarity measures. Experiments using data obtained from the MIMIC II database demonstrate the effectiveness<br />

of this approach.<br />

17:00-17:20, Paper ThCT7.5<br />

Automatic Detection of Phishing Target from Phishing Webpage<br />

Liu, Gang, City Univ. of Hong Kong<br />

Qiu, Bite, City Univ. of Hong Kong<br />

Liu, Wenyin, City Univ. of Hong Kong<br />

An approach to identification of the phishing target of a given (suspicious) webpage is proposed by clustering the webpage<br />

set consisting of all its associated webpages and the given webpage itself. We first find its associated webpages, and then<br />

explore their relationships to the given webpage as their features for clustering. Such relationships include link relationship,<br />

ranking relationship, text similarity, and webpage layout similarity. A DBSCAN clustering method is employed<br />

to find if there is a cluster around the given webpage. If such a cluster exists, we claim the given webpage is a phishing<br />

webpage and then find its phishing target (i.e., the legitimate webpage it is attacking) from this cluster. Otherwise, we<br />

identify it as a legitimate webpage. Our test dataset consists of 8745 phishing pages (targeting 76 well-known websites)<br />

selected from PhishTank, and preliminary experiments show that the approach can successfully identify 91.44% of their<br />

phishing targets. Another dataset of 1000 legitimate webpages is collected to test our method's false alarm rate, which is<br />

3.40%.<br />
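A minimal sketch of the clustering step with scikit-learn's DBSCAN; the four relationship features and their values are hypothetical placeholders for the link, ranking, text-similarity and layout-similarity relationships named above.

```python
# Hedged sketch: check whether a dense cluster forms among the pages associated
# with a suspicious webpage, described by their relationships to it.
import numpy as np
from sklearn.cluster import DBSCAN

# rows: [link_rel, ranking_rel, text_sim, layout_sim] for each associated page
features = np.array([
    [0.90, 0.80, 0.95, 0.90],   # pages that closely resemble the suspicious page
    [0.85, 0.90, 0.90, 0.92],
    [0.88, 0.82, 0.93, 0.89],
    [0.10, 0.20, 0.15, 0.10],   # unrelated page
])
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(features)
# label -1 marks noise; a non-negative label is a cluster that may contain the
# phishing target (the legitimate page being imitated)
print(labels)
```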

ThBCT8 Upper Foyer<br />

Pattern Recognition Systems and Applications - III Poster Session<br />

Session chair: Radeva, Petia (CVC)<br />

13:30-16:30, Paper ThBCT8.1<br />

Underwater Mine Classification with Imperfect Labels<br />

Williams, David, NATO Undersea Res. Centre<br />

A new algorithm for performing classification with imperfectly labeled data is presented. The proposed approach is motivated<br />

by the insight that the average prediction of a group of sufficiently informed people is often more accurate than the<br />



prediction of any one supposed expert. This idea that the “wisdom of crowds” can outperform a single expert is implemented<br />

by drawing sets of labels as samples from a Bernoulli distribution with a specified labeling error rate. Additionally,<br />

ideas from multiple imputation are exploited to provide a principled way for determining an appropriate number of label<br />

sampling rounds to consider. The approach is demonstrated in the context of an underwater mine classification application<br />

on real synthetic aperture sonar data collected at sea, with promising results.<br />
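A minimal sketch of the label-sampling idea, under the assumption that drawing label sets from a Bernoulli distribution amounts to flipping each given label with the specified error rate, retraining, and averaging the resulting predictions; the base classifier and the number of rounds are illustrative, not the paper's choices.

```python
# Hedged sketch: "wisdom of crowds" over sampled label realizations when the
# training labels may be imperfect.
import numpy as np
from sklearn.linear_model import LogisticRegression

def noisy_label_ensemble(X, y, X_test, error_rate=0.1, rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(rounds):
        flips = rng.random(len(y)) < error_rate   # Bernoulli(error_rate) per label
        y_sampled = np.where(flips, 1 - y, y)     # flip the selected binary labels
        clf = LogisticRegression().fit(X, y_sampled)
        votes += clf.predict_proba(X_test)[:, 1]
    return votes / rounds                          # averaged prediction over rounds

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = (X[:, 0] > 0).astype(int)
print(noisy_label_ensemble(X, y, X[:5]))
```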

13:30-16:30, Paper ThBCT8.2<br />

Optimizing Optimum-Path Forest Classification for Huge Datasets<br />

Papa, Joao Paulo, Sao Paulo State Univ<br />

Cappabianco, Fabio, Univ. of Campinas<br />

Falcao, Alexandre Xavier, State Univ. of Campinas<br />

Traditional pattern recognition techniques cannot handle the classification of large datasets with both efficiency and effectiveness.<br />

In this context, the Optimum-Path Forest (OPF) classifier was recently introduced, trying to achieve high<br />

recognition rates and low computational cost. Although OPF was much faster than Support Vector Machines for training,<br />

it was slightly slower for classification. In this paper, we present the Efficient OPF (EOPF), which is an enhanced and<br />

faster version of the traditional OPF, and validate it for the automatic recognition of white matter and gray matter in magnetic<br />

resonance images of the human brain.<br />

13:30-16:30, Paper ThBCT8.3<br />

Model-Based Detection of Acoustically Dense Objects in Ultrasound<br />

Banerjee, Jyotirmoy, General Electric<br />

Krishnan, Kajoli B., General Electric<br />

Traditional methods of detection tend to underperform in the presence of the strong and variable background clutter that<br />

characterizes a medical ultrasound image. In this paper, we present a novel diffusion-based technique to localize acoustically<br />

dense objects in an ultrasound image. The approach is premised on the observation that the topology of noise in ultrasound<br />

images is more sensitive to diffusion than that of any such physical object. We show that our method when applied to the<br />

problem of fetal head detection and automatic measurement of head circumference in 59 obstetric scans compares remarkably<br />

well with manually assisted measurements. Based on fetal age estimates and their bounds specified in Standard<br />

OB Tables [6], the Gestational Age predictions from automated measurements are found to be within 2 SD in 95% and 98%<br />

of cases when compared with manual measurements by two experts. The framework is general and can be extended to<br />

object localization in diverse applications of ultrasound imaging.<br />

13:30-16:30, Paper ThBCT8.4<br />

SubXPCA versus PCA: A Theoretical Investigation<br />

Negi, Atul, Univ. of Hyderabad<br />

Kadappagari, Vijaya Kumar, Vasavi Coll. of Engineering<br />

Principal Component Analysis (PCA) is a widely accepted dimensionality reduction technique that is optimal in an MSE<br />

sense. PCA extracts `global’ variations and is insensitive to `local’ variations in sub patterns. Recently, we have proposed<br />

a novel approach, SubXPCA, which was more effective computationally than PCA and also effective in computing principal<br />

components with both global and local information across sub patterns. In this paper, we show the near-optimality<br />

of SubXPCA (in terms of summarization of variance) by proving analytically that `SubXPCA approaches PCA with increase<br />

in number of local principal components of sub patterns.’ This is demonstrated empirically upon CMU Face Data.<br />

13:30-16:30, Paper ThBCT8.5<br />

Feature Extraction Based on Class Mean Embedding (CME)<br />

Wan, Minghua, Nanjing Univ. of Science and Tech.<br />

Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />

Jin, Zhong, Nanjing Univ. of Science and Tech.<br />

Recently, local discriminant embedding (LDE) was proposed for manifold learning and pattern classification. In the LDE<br />

framework, the neighbor and class relations of data points are used to construct the graph embedding for classification problems.<br />

When mapping from a high-dimensional to a low-dimensional subspace, data points of the same class maintain their intrinsic neighbor<br />

relations, whereas neighboring data points of different classes no longer stick to one another. However, neighboring data points<br />

of different classes are not de-emphasized efficiently by LDE, which may degrade classification performance. In this<br />

paper, we investigate an extension, called class mean embedding (CME), which uses the class means of data points to enhance<br />

discriminant power in the mapping into a low-dimensional space. Experimental results on the ORL and FERET face databases<br />

show the effectiveness of the proposed method.<br />

13:30-16:30, Paper ThBCT8.6<br />

Forest Species Recognition using Color-Based Features<br />

Paula, Pedro Luiz, UFPR<br />

Oliveira, Luiz, Federal Univ. of Parana<br />

Britto, Alceu, Pontificia Univ. Católica do Paraná<br />

Sabourin, R., École de Tech. supérieure<br />

In this work we address the problem of forest species recognition which is a very challenging task and has several potential<br />

applications in the wood industry. The first contribution of this work is a database composed of 22 different species of the<br />

Brazilian flora that has been carefully labeled by experts in wood anatomy. In addition, in this work we demonstrate through<br />

a series of comprehensive experiments that color-based features are quite useful to increase the discrimination power for<br />

this kind of application. Last but not least, we propose a segmentation approach so that a wood sample can be locally processed<br />

to mitigate the intra-class variability featured in some classes. Such an approach also brings an important contribution to improving<br />

the final classification performance.<br />

13:30-16:30, Paper ThBCT8.7<br />

An Information Theoretic Linear Discriminant Analysis Method<br />

Zhang, Haihong, Inst. for Infocomm Res.<br />

Guan, Cuntai, Inst. for Infocomm Res.<br />

We propose a novel linear discriminant analysis method and demonstrate its superiority over existing linear methods.<br />

Based on information theory, we introduce a non-parametric estimate of mutual information with variable kernel bandwidth.<br />

Furthermore, we derive a gradient-based optimization algorithm for learning the optimal linear reduction vectors which<br />

maximize the mutual information estimate. We evaluate the proposed method by running cross-validation on 2 data sets<br />

from the UCI repository, together with linear and nonlinear SVMs as classifiers. The result attests to the superiority of the<br />

method over conventional LDA and its variant, aPAC.<br />

13:30-16:30, Paper ThBCT8.8<br />

Framewise Phone Classification using Weighted Fuzzy Classification Rules<br />

Dehzangi, Omid, Nanyang Tech. Univ.<br />

Ma, Bin, Inst. for Infocomm Res.<br />

Chng, Eng Siong, Nanyang Tech. Univ.<br />

Li, Haizhou, Inst. for Infocomm Res.<br />

Our aim in this paper is to propose a rule-weight learning algorithm in fuzzy rule-based classifiers. The proposed algorithm<br />

is presented in two modes: first, all training examples are assumed to be equally important and the algorithm attempts to<br />

minimize the error-rate of the classifier on the training data by adjusting the weight of each fuzzy rule in the rule-base,<br />

and second, a weight is assigned to each training example as the cost of misclassifying it, using the class distribution<br />

of its neighbors. Then, instead of minimizing the error-rate, the learning algorithm is modified to minimize the sum of<br />

costs for misclassified examples. Using six data sets from the UCI-ML repository and the TIMIT speech corpus for framewise<br />

phone classification, we show that our proposed algorithm considerably improves the prediction ability of the classifier.<br />

13:30-16:30, Paper ThBCT8.9<br />

Statistical Fourier Descriptors for Defect Image Classification<br />

Timm, Fabian, Univ. of Lübeck<br />

Martinetz, Thomas, Univ. of Lübeck<br />

In many industrial applications, Fourier descriptors are commonly used when the description of the object shape is an<br />

important characteristic of the image. However, these descriptors are limited to single objects. We propose a general Fourier-based<br />

approach, called statistical Fourier descriptor (SFD), which computes shape statistics in grey-level images. The SFD<br />

is computationally efficient and can be used for defect image classification. In a first example, we deployed the SFD for<br />

the inspection of welding seams, with promising results.<br />

13:30-16:30, Paper ThBCT8.10<br />

A Measure of Competence based on Randomized Reference Classifier for Dynamic Ensemble Selection<br />

Woloszynski, Tomasz, Wroclaw Univ. of Tech.<br />

Kurzynski, Marek, Wroclaw Univ. of Tech.<br />

This paper presents a measure of competence based on a randomized reference classifier (RRC) for classifier ensembles.<br />

The RRC can be used to model, in terms of class supports, any classifier in the ensemble. The competence of a modelled<br />

classifier is calculated as the probability of correct classification of the respective RRC. A multiple classifier system (MCS)<br />

was developed and its performance was compared against five MCSs using eight databases taken from the UCI Machine<br />

Learning Repository. The system developed achieved the highest overall classification accuracies for both homogeneous<br />

and heterogeneous ensembles.<br />

13:30-16:30, Paper ThBCT8.12<br />

Information Theory based WCE Video Summarization<br />

Granata, Eliana, Univ. of Catania<br />

Gallo, Giovanni, Univ. of Catania<br />

Torrisi, Alessandro, Univ. of Catania<br />

Wireless Capsule Endoscopy (WCE) is a technical breakthrough that makes it possible to produce a video of the entire intestine<br />

without surgery. It is reported that a medical clinician spends one or two hours to assess a WCE video. It is hence useful<br />

to help the physician with analysis and diagnosis using computerized methods. In this paper an algorithmic information-theoretic<br />

method is presented for the automatic summarization of meaningful changes in video sequences extracted from<br />

WCE videos. To segment a WCE video into anatomic parts (esophagus, stomach, small intestine, colon) we use a textons-based<br />

method. The local textons histogram sequence is used for image representation and the Normalized Compression<br />

Distance (NCD) measure is used to compute the similarity between images.<br />
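A minimal sketch of the Normalized Compression Distance used to compare frame representations, here with zlib as the compressor and byte strings standing in for serialized texton histograms.

```python
# Hedged sketch: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
# where C(.) is the compressed length; similar inputs score closer to 0.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

frame_a = bytes([1, 2, 3, 4] * 50)      # stand-ins for serialized histograms
frame_b = bytes([1, 2, 3, 5] * 50)
frame_c = bytes(range(200))
print(ncd(frame_a, frame_b), ncd(frame_a, frame_c))  # similar pair scores lower
```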

13:30-16:30, Paper ThBCT8.13<br />

An LDA-Based Relative Hysteresis Classifier with Application to Segmentation of Retinal Vessels<br />

Condurache, Alexandru Paul, Univ. of Luebeck<br />

Müller, Florian, Univ. of Luebeck<br />

Mertins, Alfred, Univ. of Luebeck<br />

In a pattern classification setup, image segmentation is achieved by assigning each pixel to one of two classes: object or<br />

background. The special case of vessel segmentation is characterized by a strong disproportion between the number of<br />

representatives of each class (i.e. class skew) and also by a strong overlap between classes. These difficulties can be solved<br />

using problem-specific knowledge. The proposed hysteresis classification makes use of such knowledge in an efficient<br />

way. We describe a novel, supervised, hysteresis-based classification method that we apply to the segmentation of retina<br />

photographs. This procedure is fast and achieves results that are comparable or even superior to other hysteresis methods<br />

and, for the problem of retina vessel segmentation, to known dedicated methods on similar data sets.<br />

13:30-16:30, Paper ThBCT8.14<br />

An Offline Map Matching via Integer Programming<br />

Yanagisawa, Hiroki, IBM<br />

The map matching problem is, given a spatial road network and a sequence of locations of an object moving on the network,<br />

to identify the path in the network that the moving object passed through. In this paper, an integer programming formulation<br />

for the offline map matching problem is presented. This is the first approach that gives the optimal solution with respect<br />

to a widely used objective function for map matching.<br />



13:30-16:30, Paper ThBCT8.15<br />

Invisible Calibration Pattern based on Human Visual Perception Characteristics<br />

Takimoto, Hironori, Okayama Prefectural Univ.<br />

Yoshimori, Seiki, Nippon Bunri Univ.<br />

Mitsukura, Yasue, Tokyo Univ. of Agriculture and Tech.<br />

Fukumi, Minoru, The Univ. of Tokushima<br />

In print-type steganographic and watermarking systems, a calibration pattern is arranged around the contents where invisible<br />

data is embedded, providing multiple feature points that establish correspondences between the original image and the scanned image for normalization<br />

of the scanned image. However, it is clear that conventional methods interfere with the page layout and artwork of the<br />

contents. In addition, visible calibration patterns are not suitable for security services. In this paper, we propose an arrangement<br />

and detection method for an invisible calibration pattern based on characteristics of human visual perception. The<br />

calibration pattern is embedded into the blue intensity of the original image by adding a high-frequency component.<br />

13:30-16:30, Paper ThBCT8.16<br />

Boosting Gray Codes for Red Eyes Removal<br />

Battiato, Sebastiano, Univ. of Catania<br />

Farinella, Giovanni Maria, Univ. of Catania<br />

Guarnera, Mirko, ST Microelectronics<br />

Messina, Giuseppe, ST Microelectronics<br />

Ravì, Daniele, ST Microelectronics<br />

With the wide diffusion of digital cameras and mobile devices with embedded cameras and flashguns, red-eye artifacts<br />

have de facto become a critical problem. The technique described herein makes use of three main steps to identify and remove<br />

red eyes. First, red-eye candidates are extracted from the input image by using an image filtering pipeline. A set of<br />

classifiers is then learned on gray code features extracted in the clustered patch space, and hence employed to distinguish<br />

between eye and non-eye patches. Once red eyes are detected, the artifacts are removed through desaturation and brightness<br />

reduction. The proposed method has been tested on a large dataset of images, achieving effective results in terms of hit-rate<br />

maximization, false-positive reduction and quality measures.<br />

13:30-16:30, Paper ThBCT8.17<br />

A New Rotation Feature for Single Tri-Axial Accelerometer based 3D Spatial Handwritten Digit Recognition<br />

Xue, Yang, South China Univ. of Tech.<br />

Jin, Lianwen, South China Univ. of Tech.<br />

A new rotation feature extracted from tri-axial acceleration signals for 3D spatial handwritten digit recognition is proposed.<br />

The feature can effectively express the clockwise and anti-clockwise direction changes of the users' movement while writing<br />

in a 3D space. Based on the rotation feature, an algorithm for 3D spatial handwritten digit recognition is presented.<br />

First, the rotation feature of the handwritten digit is extracted and coded. Then, the normalized edit distance between the<br />

digit and class model is computed. Finally, classification is performed using Support Vector Machine (SVM). The proposed<br />

approach outperforms time-domain features with a 22.12% accuracy improvement, peak-valley features with a 12.03%<br />

accuracy improvement, and FFT features with a 3.24% accuracy improvement. Experimental results show<br />

that the proposed approach is effective.<br />
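A minimal sketch of comparing a coded rotation sequence against a class model with a length-normalized edit distance; the two code strings are hypothetical, and the paper's exact normalization and coding scheme may differ.

```python
# Hedged sketch: standard Levenshtein distance, divided by the longer length
# as a simple normalization.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def normalized_edit_distance(a, b):
    return edit_distance(a, b) / max(len(a), len(b), 1)

digit_code = "++-+--+-"      # hypothetical clockwise/anti-clockwise coding
class_model = "++-+-++-"
print(normalized_edit_distance(digit_code, class_model))
```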

13:30-16:30, Paper ThBCT8.18<br />

Improved Mean Shift Algorithm with Heterogeneous Node Weights<br />

Yoon, Ji Won, Trinity Coll. Dublin<br />

Wilson, Simon, Trinity Coll. Dublin<br />

The conventional mean shift algorithm is known to be sensitive to the choice of bandwidth. We present a robust mean<br />

shift algorithm with heterogeneous node weights that come from the geometric structure of a given data set. Before running<br />

the MS procedure, we reconstruct un-normalized weights (a rough surface of the data points) from the Delaunay triangulation.<br />

The un-normalized weights help MS avoid the problem of misled mean shift vectors. As a result, we can<br />

obtain a more robust clustering result compared to the conventional mean shift algorithm. We also propose an alternative<br />

way to assign weights for large and noisy datasets.<br />



13:30-16:30, Paper ThBCT8.19<br />

Word Clustering using PLSA Enhanced with Long Distance Bigrams<br />

Bassiou, Nikoletta, Aristotle Univ. of Thessaloniki<br />

Kotropoulos, Constantine, Aristotle Univ. of Thessaloniki<br />

Probabilistic latent semantic analysis is enhanced with long distance bigram models in order to improve word clustering.<br />

The long distance bigram probabilities and the interpolated long distance bigram probabilities at varying distances within<br />

a context capture different aspects of contextual information. In addition, the baseline bigram, which incorporates trigger-pairs<br />

for various histories, is tested in the same framework. The experimental results collected on publicly available corpora<br />

(CISI, Cranfield, Medline, and NPL) demonstrate the superiority of the long distance bigrams over the baseline bigrams<br />

as well as the superiority of the interpolated long distance bigrams against the long distance bigrams and the baseline<br />

bigram with trigger-pairs in yielding more compact clusters containing fewer outliers.<br />
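The raw statistic behind a long distance bigram model is simply the count of word pairs separated by a fixed distance; a minimal sketch with a toy word sequence is shown below.

```python
# Hedged sketch: distance-d bigram counts (distance=1 gives ordinary bigrams).
from collections import Counter

def long_distance_bigrams(tokens, distance):
    """Count pairs (w_i, w_{i+distance})."""
    return Counter(zip(tokens, tokens[distance:]))

text = "the cat sat on the mat the cat slept".split()
print(long_distance_bigrams(text, 1))
print(long_distance_bigrams(text, 2))
```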

13:30-16:30, Paper ThBCT8.20<br />

Scene Classification using Local Co-Occurrence Feature in Subspace Obtained by KPCA of Local Blob Visual<br />

Words<br />

Hotta, Kazuhiro, Meijo University<br />

In recent years, scene classification based on the local correlation of binarized projection lengths in a subspace obtained by<br />

Kernel Principal Component Analysis (KPCA) of visual words was proposed and its effectiveness was shown. However,<br />

the local correlation of 2 binary features becomes 1 only when both features are 1; in all other cases, the local correlation becomes<br />

0, which discards information. In this paper, all kinds of co-occurrence of 2 binary features are used. This is the first device<br />

of our method. The second device is local Blob visual words. The conventional method builds visual words from an orientation<br />

histogram on each grid cell; however, this information is too local. We use the orientation histograms in a local Blob on the grid as a<br />

basic feature and develop local Blob visual words. The third device is norm normalization of each orientation histogram<br />

in a local Blob. By normalizing the local norm, the similarity between corresponding orientation histograms is reflected in<br />

the subspace obtained by KPCA. With these 3 devices, an accuracy of more than 84% is achieved, which is higher than that of conventional methods.<br />

13:30-16:30, Paper ThBCT8.21<br />

Recognition and Prediction of Situations in Urban Traffic Scenarios<br />

Käfer, Eugen, Daimler AG<br />

Hermes, Christoph, Bielefeld Univ.<br />

Wöhler, Christian, Dortmund University of Technology<br />

Kummert, Franz, Bielefeld Univ.<br />

Ritter, Helge, Bielefeld Univ.<br />

The recognition and prediction of intersection situations and an accompanying threat assessment are an indispensable skill<br />

of future driver assistance systems. This study focuses on the recognition of situations involving two vehicles at intersections.<br />

For each vehicle, a set of possible future motion trajectories is estimated and rated based on a motion database for<br />

a time interval of 2-4 s ahead. Possible situations involving two vehicles are generated by a pairwise combination of these<br />

individual motion trajectories. An interaction model based on the mutual visibility of the vehicles and the assumption that<br />

a driver will attempt to avoid a collision is used to rate possible situations. The correspondingly favoured situations are<br />

classified with a probabilistic framework. The proposed method is evaluated on a real-world differential GPS data set acquired<br />

during a test drive of about 10 km, including three road intersections. Our method is typically able to recognise the<br />

situation correctly about 1.5-3 s before the last vehicle has passed its minimum distance to the centre of the intersection.<br />

13:30-16:30, Paper ThBCT8.22<br />

Employing Decoding of Specific Error Correcting Codes as a New Classification Criterion in Multiclass Learning<br />

Problems<br />

Luo, Yurong, Virginia Commonwealth Univ.<br />

Kayvan, Najarian, Virginia Commonwealth Univ.<br />

The Error Correcting Output Codes (ECOC) method solves multiclass learning problems by combining the outputs of several<br />

binary classifiers according to an error correcting output code matrix. Traditionally, the minimum Hamming distance is<br />

adopted as the classification criterion to “vote” among multiple hypotheses, and the focus is given to the choice of the error<br />

correcting output code matrix. In this paper, we apply a decoding methodology in multiclass learning problems, in which<br />

class labels of testing samples are unknown. In other words, without comparing the predicted and actual class labels, it<br />

can be known whether testing samples are classified correctly. Based on this property, a new cascade classifier is introduced.<br />

The classifier can improve the accuracy and will not result in overfitting. The analytical results show feasibility, accuracy,<br />

and the advantages of the proposed method.<br />
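For reference, the conventional minimum-Hamming-distance ECOC decoding that this work builds on can be sketched as follows; the code matrix and the output word are hypothetical.

```python
# Hedged sketch: each class has a codeword, the binary classifiers produce an
# output word, and the predicted class is the codeword at minimum Hamming distance.
import numpy as np

code_matrix = np.array([      # hypothetical 4-class, 6-bit code matrix
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
])

def ecoc_decode(binary_outputs):
    distances = (code_matrix != binary_outputs).sum(axis=1)  # Hamming distances
    return int(np.argmin(distances)), int(distances.min())

predicted_word = np.array([1, 0, 0, 1, 1, 1])   # one bit flipped from class 1
print(ecoc_decode(predicted_word))               # -> (1, 1)
```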

13:30-16:30, Paper ThBCT8.23<br />

EEG-Based Emotion Recognition using Self-Organizing Map for Boundary Detection<br />

Khosrowabadi, Reza, Nanyang Tech. Univ. Singapore<br />

Ang, Kai Keng, Inst. for Infocomm Res. A*STAR<br />

Quek, Hiok Chai, Nanyang Tech. Univ.<br />

Bin Abdul Rahman, Abdul Wahab, International Islamic Univ. Malaysia<br />

This paper presents an EEG-based emotion recognition system using self-organizing map for boundary detection. Features<br />

from EEG signals are classified by considering the subjects‘ emotional responses using scores from SAM questionnaire.<br />

The selection of appropriate threshold levels for arousal and valence is critical to the performance of the recognition system.<br />

Therefore, this paper investigates the performance of a proposed EEG-based emotion recognition system that employed a self-organizing<br />

map to identify the boundaries between separable regions. A study was performed to collect 8 channels of EEG<br />

data from 26 healthy right-handed subjects experiencing 4 emotional states while exposed to audio-visual emotional<br />

stimuli. EEG features were extracted using the magnitude squared coherence of the EEG signals. The boundaries of the EEG<br />

features were then extracted using SOM. 5-fold cross-validation was then performed using the k-nn classifier. The results<br />

showed that the proposed method improved the accuracy to 84.5%.<br />

13:30-16:30, Paper ThBCT8.24<br />

Vocabulary-Based Approaches for Multiple-Instance Data: A Comparative Study<br />

Amores, Jaume, Univ. Autònoma de Barcelona<br />

Multiple Instance Learning (MIL) has become a hot topic and many different algorithms have been proposed in recent<br />

years. Despite this fact, there is a lack of comparative studies that shed light on the characteristics of the different methods<br />

and their behavior in different scenarios. In this paper we provide such an analysis. We include methods from different families,<br />

and pay special attention to vocabulary-based approaches, a new family of methods that has not received much attention<br />

in the MIL literature. The empirical comparison includes seven databases from four heterogeneous domains, implementations<br />

of eight popular MIL methods, and a study of the behavior under synthetic conditions. Based on this analysis, we show that,<br />

with an appropriate implementation, vocabulary-based approaches outperform other MIL methods in most of the cases,<br />

showing in general a more consistent performance.<br />

13:30-16:30, Paper ThBCT8.25<br />

A Multiple Classifier System Approach for Facial Expressions in Image Sequences Utilizing GMM Supervectors<br />

Schels, Martin, Univ. of Ulm<br />

Schwenker, Friedhelm, Univ. of Ulm<br />

The Gaussian mixture model (GMM) super vector approach is a well known technique in the domain of speech processing,<br />

e.g. speaker verification and audio segmentation. In this paper we apply this approach to video data in order to recognize<br />

human facial expressions. Three different image feature types (optical flow histograms, orientation histograms and principal<br />

components) from four pre-selected regions of the human’s face image were extracted and GMM super-vectors of the feature<br />

channels per sequence were constructed. Support vector machines (SVM) were trained using these super vectors for every<br />

channel separately and their results were combined using classifier fusion techniques. Thus, the performance of the classifier<br />

could be improved compared to the best individual classifier.<br />

13:30-16:30, Paper ThBCT8.26<br />

Incremental Learning of Visual Landmarks for Mobile Robotics<br />

Bandera, Antonio, Univ. of Malaga<br />

Vázquez-Martín, Ricardo, Centro Andaluz de Innovación y Tecnologías de la Información y las Comunicaciones CITIC<br />

Marfil, Rebeca, Univ. of Malaga<br />



This paper proposes an incremental scheme for visual landmark learning and recognition. The feature selection stage characterises<br />

the landmark using the Opponent SIFT, a color-based variant of the SIFT descriptor. To reduce the dimensionality<br />

of this descriptor, an incremental non-parametric discriminant analysis is conducted to seek directions for efficient discrimination<br />

(incremental eigenspace learning). On the other hand, the classification stage uses the incremental evolving clustering<br />

method (ECM) to group feature vectors into a set of clusters (incremental prototype learning). Then, the final classification<br />

is conducted based on the k-nearest neighbor approach, whose prototypes were updated by the ECM. This global scheme<br />

enables a classifier to learn incrementally, on-line, and in one pass. Besides, the ECM allows reducing the memory and computation<br />

expenses. Experimental results show that the proposed recognition system is well suited to be used by an autonomous<br />

mobile robot.<br />

13:30-16:30, Paper ThBCT8.27<br />

Subspace Methods with Globally/Locally Weighted Correlation Matrix<br />

Yamashita, Yukihiko, Tokyo Inst. of Tech.<br />

Wakahara, Toru, Hosei Univ.<br />

The discriminant function of a subspace method is provided by using correlation matrices that reflect the averaged feature<br />

of a category. As a result, it will not work well on unknown input patterns that are far from the average. To address this problem,<br />

we propose two kinds of weighted correlation matrices for subspace methods. The globally weighted correlation matrix<br />

(GWCM) attaches importance to training patterns that are far from the average. Then, it can reflect the distribution of patterns<br />

around the category boundary more precisely. The computational cost of a subspace method using GWCMs is almost the<br />

same as that using ordinary correlation matrices. The locally weighted correlation matrix (LWCM) attaches importance to<br />

training patterns that are near to an input pattern to be classified. Then, it can reflect the distribution of training patterns<br />

around the input pattern in more detail. The computational cost of a subspace method with LWCM at the recognition stage<br />

does not depend on the number of training patterns, while those of the conventional adaptive local and the nonlinear subspace<br />

methods do. We show the advantages of the proposed methods by experiments made on the MNIST database of handwritten<br />

digits.<br />
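A minimal sketch, under assumed weights, of a subspace classifier built from a weighted correlation matrix in the spirit of the GWCM: patterns far from the class mean contribute more to the accumulated matrix, and classification uses the projection length onto each class subspace. The weighting function and the dimensions are illustrative only.

```python
# Hedged sketch: weighted correlation matrix per class, eigenvectors as the
# class subspace, classification by projection length.
import numpy as np

def weighted_subspace(X, dim, alpha=1.0):
    """Orthonormal basis of a weighted correlation matrix of the rows of X."""
    w = 1.0 + alpha * np.linalg.norm(X - X.mean(axis=0), axis=1)  # hypothetical weights
    C = (X * w[:, None]).T @ X / w.sum()          # weighted sum of x x^T
    _, eigvecs = np.linalg.eigh(C)
    return eigvecs[:, -dim:]                       # top-dim eigenvectors

def classify(x, subspaces):
    # larger projection length onto a class subspace means a closer match
    scores = [np.linalg.norm(U.T @ x) for U in subspaces]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
class0 = rng.normal(loc=0.0, size=(50, 10))
class1 = rng.normal(loc=2.0, size=(50, 10))
subspaces = [weighted_subspace(class0, 3), weighted_subspace(class1, 3)]
print(classify(class1[0], subspaces))
```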

13:30-16:30, Paper ThBCT8.28<br />

The Binormal Assumption on Precision-Recall Curves<br />

Brodersen, Kay Henning, ETH Zurich<br />

Ong, Cheng Soon, ETH Zurich<br />

Stephan, Klaas Enno, Univ. of Zurich<br />

Buhmann, Joachim M., Swiss Federal Inst. of Tech. Zurich<br />

The precision-recall curve (PRC) has become a widespread conceptual basis for assessing classification performance. The curve<br />

relates the positive predictive value of a classifier to its true positive rate and often provides a useful alternative to the well-known<br />

receiver operating characteristic (ROC). The empirical PRC, however, turns out to be a highly imprecise estimate of the true curve,<br />

especially in the case of a small sample size and class imbalance in favour of negative examples. Ironically, this situation tends to<br />

occur precisely in those applications where the curve would be most useful, e.g., in anomaly detection or information retrieval. Here,<br />

we propose to estimate the PRC on the basis of a simple distributional assumption about the decision values that generalizes the established<br />

binormal model for estimating smooth ROC curves. Using simulations, we show that our approach outperforms empirical<br />

estimates, and that an account of the class imbalance is crucial for obtaining unbiased PRC estimates.<br />
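A minimal sketch of a binormal precision-recall estimate: Gaussians are fitted to the positive and negative decision values, and precision and recall follow in closed form from the two normal tails and the class prior (the data here are synthetic).

```python
# Hedged sketch: binormal PRC from fitted Gaussians over decision values.
import numpy as np
from scipy.stats import norm

def binormal_prc(scores_pos, scores_neg, prior_pos, thresholds):
    mu1, s1 = np.mean(scores_pos), np.std(scores_pos, ddof=1)
    mu0, s0 = np.mean(scores_neg), np.std(scores_neg, ddof=1)
    tpr = norm.sf(thresholds, mu1, s1)          # recall
    fpr = norm.sf(thresholds, mu0, s0)
    precision = prior_pos * tpr / (prior_pos * tpr + (1 - prior_pos) * fpr)
    return tpr, precision

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=20)             # few positives: class imbalance
neg = rng.normal(0.0, 1.0, size=400)
recall, precision = binormal_prc(pos, neg, prior_pos=20 / 420,
                                 thresholds=np.linspace(-3, 4, 50))
print(precision[:5])
```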

13:30-16:30, Paper ThBCT8.29<br />

Incremental Training of Multiclass Support Vector Machines<br />

Nikitidis, Symeon, Centre for Res. and Tech. Hellas<br />

Nikolaidis, Nikos, Aristotle Univ. of Thessaloniki<br />

Pitas, Ioannis, -<br />

We present a new method for the incremental training of multiclass Support Vector Machines that provides computational efficiency<br />

for training problems in the case where the training data collection is sequentially enriched and dynamic adaptation of the classifier<br />

is required. An auxiliary function that incorporates some desired characteristics in order to provide an upper bound of the objective<br />

function which summarizes the multiclass classification task has been designed and the global minimizer for the enriched dataset is<br />

found using a warm start algorithm, since faster convergence is expected when starting from the previous global minimum. Experimental<br />

evidence on two data collections verified that our method is faster than retraining the classifier from scratch, while the<br />

achieved classification accuracy is maintained at the same level.<br />



13:30-16:30, Paper ThBCT8.30<br />

User Adaptive Clustering of a Large Image Database<br />

Saboorian, Mohammad Mehdi, Sharif Univ. of Tech.<br />

Jamzad, Mansour, Sharif Univ. of Tech.<br />

Rabiee, Hamid Reza, Sharif Univ. of Tech.<br />

Searching large image databases is a time consuming process when done manually. Current CBIR methods mostly rely<br />

on training data in specific domains. When source and domain of images are unknown, unsupervised methods provide<br />

better solutions. In this work, we use a hierarchical clustering scheme to group images in an unknown and large image<br />

database. In addition, the user should provide the current class assignment of a small number of images as feedback to<br />

the system. The proposed method uses this feedback to guess the number of required clusters, and optimizes the weight<br />

vector in an iterative manner. In each step, after modification of the weight vector, the images are reclustered. We compared<br />

our method with a similar approach (but without user feedback) named CLUE. Our experimental results show that by<br />

considering the user feedback, the accuracy of clustering is considerably improved.<br />

13:30-16:30, Paper ThBCT8.31<br />

Alignment-Based Similarity of People Trajectories using Semi-Directional Statistics<br />

Calderara, Simone, Univ. of Modena and Reggio Emilia<br />

Prati, Andrea, Univ. of Modena and Reggio Emilia<br />

Cucchiara, Rita, Univ. of Modena and Reggio Emilia<br />

This paper presents a method for comparing people trajectories for video surveillance applications, based on semi-directional<br />

statistics. In fact, the modelling of a trajectory as a sequence of angles, speeds and time lags requires the use of a<br />

statistical tool capable of jointly considering periodic and linear variables. Our statistical method is compared with two state-of-the-art<br />

methods.<br />

13:30-16:30, Paper ThBCT8.32<br />

Contact Lens Detection based on Weighted LBP<br />

Zhang, Hui, Shanghai Inst. of Tech.<br />

Sun, Zhenan, Chinese Acad. of Sciences<br />

Tan, Tieniu, Chinese Acad. of Sciences<br />

Spoof detection is a critical function for iris recognition because it reduces the risk of iris recognition systems being forged.<br />

Among various counterfeit artifacts, cosmetic contact lenses are among the most common and the most difficult to detect. In this paper,<br />

we propose a novel fake iris detection algorithm based on improved LBP and statistical features. Firstly, a simplified<br />

SIFT descriptor is extracted at each pixel of the image. Secondly, the SIFT descriptor is used to rank the LBP encoding<br />

sequence. Then, statistical features are extracted from the weighted LBP map. Lastly, an SVM classifier is employed to<br />

classify the genuine and counterfeit iris images. Extensive experiments are conducted on a database containing more than<br />

5000 fake iris images obtained by wearing 70 kinds of contact lenses, captured by four iris devices. Experimental results show<br />

that the proposed method achieves state-of-the-art performance in contact lens spoof detection.<br />
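A minimal sketch of a weighted LBP histogram: each pixel's 8-neighbour LBP code votes with a per-pixel weight. The gradient-magnitude weight used here is a stand-in; the paper derives its weighting from a simplified SIFT descriptor ranking.

```python
# Hedged sketch: basic 8-neighbour LBP codes accumulated into a histogram with
# per-pixel weights instead of unit votes.
import numpy as np

def weighted_lbp_histogram(img, weights):
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:],
                  img[2:, 2:], img[2:, 1:-1], img[2:, :-2], img[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, n in enumerate(neighbours):
        codes |= ((n >= c).astype(np.int32) << bit)   # one bit per neighbour
    w = weights[1:-1, 1:-1]
    return np.bincount(codes.ravel(), weights=w.ravel(), minlength=256)

rng = np.random.default_rng(0)
iris = rng.integers(0, 256, size=(32, 32))             # stand-in iris patch
gy, gx = np.gradient(iris.astype(float))
hist = weighted_lbp_histogram(iris, np.hypot(gx, gy))  # hypothetical weights
print(hist.shape, hist.sum())
```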

13:30-16:30, Paper ThBCT8.33<br />

Integrating ILSR to Bag-of-Visual Words Model based on Sparse Codes of SIFT Features Representations<br />

Wu, Lina, Univ. Beijing<br />

Luo, Siwei, Univ. Beijing<br />

Sun, Wei, Beijing Jiaotong Univ.<br />

Zheng, Xiang, Beijing Jiaotong Univ.<br />

In computer vision, the bag-of-visual-words (BOV) approach has been shown to yield state-of-the-art results. To improve<br />

the BOV model, we use sparse codes of SIFT features instead of previous vector quantization (VQ) methods such as k-means, owing to<br />

the larger quantization errors of VQ. And as local features in most categories have spatial dependence in the real world, we use<br />

the neighbor features of one local feature as its implicit local spatial relationship (ILSR). This paper proposes an object categorization<br />

algorithm which integrates the implicit local spatial relationship with appearance features based on sparse codes<br />

of SIFT to form two sources of information for categorization. The algorithm is applied to the Caltech-101 and Caltech-256<br />

datasets to validate its effectiveness. The experimental results show its good performance.<br />



13:30-16:30, Paper ThBCT8.34<br />

Heteroscedastic Multilinear Discriminant Analysis for Face Recognition<br />

Safayani, Mehran, Sharif Univ. of Tech.<br />

Manzuri Shalmani, Mohammad Taghi, Sharif Univ. of Tech.<br />

There is growing attention to subspace learning using tensor-based approaches in high-dimensional spaces. In this paper<br />

we first indicate that these methods suffer from the heteroscedastic problem and then propose a new approach called Heteroscedastic<br />

Multilinear Discriminant Analysis (HMDA). Our method can solve this problem by utilizing the pairwise<br />

Chernoff distance between every pair of clusters with the same index in different classes. We also show that our method is<br />

a general form of the Multilinear Discriminant Analysis (MDA) approach. Experimental results on the CMU-PIE, AR and AT&T<br />

face databases demonstrate that the proposed method always performs better than MDA in terms of classification accuracy.<br />

13:30-16:30, Paper ThBCT8.35<br />

Applying Error Correcting Output Coding to Enhance the Convolutional Neural Network for Target Detection and<br />

Pattern Recognition<br />

Deng, Huiqun, Concordia Univ.<br />

Stathopoulos, George, Concordia Univ.<br />

Suen, Ching Y.<br />

This paper views target detection and pattern recognition as a kind of communications problem and applies error-correcting<br />

coding to the outputs of a convolutional neural network to improve the accuracy and reliability of detection and recognition<br />

of targets. The outputs of the convolutional neural network are designed according to codewords with maximum Hamming<br />

distances. The effects of the codewords on the performance of the convolutional neural network in target detection and<br />

recognition are then investigated. Images of hand-written digits and printed English letters and symbols are used in the<br />

experiments. Results show that error-correcting output coding provides the neural network with more reliable decision<br />

rules and enables it to perform more accurate and reliable detection and recognition of targets. Moreover, our error-correcting<br />

output coding can reduce the number of neurons required, which is highly desirable in efficient implementations.<br />

13:30-16:30, Paper ThBCT8.36<br />

Action Recognition using Direction Models of Motion<br />

Benabbas, Yassine, LIFL<br />

Lablack, Adel, UMR USTL/CNRS 8022<br />

Ihaddadene, Nacim, UMR USTL/CNRS 8022<br />

Djeraba, Chabane, UMR USTL/CNRS 8022<br />

In this paper, we present an effective method for human action recognition using statistical models based on optical flow<br />

orientations. We compute a distribution mixture over motion orientations at each spatial location of the video sequence.<br />

The set of estimated distributions constitutes the direction model, which is used as a mid-level feature for the video sequence.<br />

We recognize human actions using a distance metric to compare the direction model of a query sequence with the<br />

direction models of training sequences. The experiments have been performed on standard datasets and have shown<br />

promising results.<br />

13:30-16:30, Paper ThBCT8.37<br />

Boolean Combination of Classifiers in the ROC Space<br />

Khreich, Wael, École de Tech. Supérieure<br />

Granger, Eric, École de Tech. Supérieure<br />

Miri, Ali, Univ. of Ottawa<br />

Sabourin, R., École de Tech. Supérieure<br />

Using Boolean AND and OR functions to combine the responses of multiple one- or two-class classifiers in the ROC<br />

space may significantly improve performance of a detection system over a single best classifier. However, techniques<br />

found in the literature assume that the classifiers are conditionally independent and that their ROC curves are convex. These<br />

assumptions are not valid in most real-world applications, where classifiers are designed using limited and imbalanced<br />

training data. A new Iterative Boolean Combination (IBC) technique applies all Boolean functions to combine the ROC<br />

curves produced by multiple classifiers without prior assumptions, and its time complexity is linear according to the<br />

number of classifiers. The results of computer simulations conducted on synthetic and real-world host-based intrusion<br />

detection data indicate that combining the responses from multiple HMMs with IBC can achieve a significantly higher level<br />

of performance than with the AND and OR combinations, especially when training data is limited and imbalanced.<br />
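For context, the classical AND/OR fusion of two ROC operating points under the conditional-independence assumption (the assumption that IBC relaxes) can be sketched as follows.

```python
# Hedged sketch: fuse two (fpr, tpr) operating points with Boolean AND / OR,
# assuming the two detectors respond independently given the true class.
def combine_and(p1, p2):
    (fpr1, tpr1), (fpr2, tpr2) = p1, p2
    return fpr1 * fpr2, tpr1 * tpr2              # alarm only if both detectors fire

def combine_or(p1, p2):
    (fpr1, tpr1), (fpr2, tpr2) = p1, p2
    return (fpr1 + fpr2 - fpr1 * fpr2,
            tpr1 + tpr2 - tpr1 * tpr2)           # alarm if either detector fires

a = (0.10, 0.70)    # (false positive rate, true positive rate) of classifier A
b = (0.20, 0.80)
print("AND:", combine_and(a, b), "OR:", combine_or(a, b))
```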

13:30-16:30, Paper ThBCT8.38<br />

Stereo-Based Multi-Person Tracking using Overlapping Silhouette Templates<br />

Satake, Junji, Toyohashi Univ. of Tech.<br />

Miura, Jun, Toyohashi Univ. of Tech.<br />

This paper describes a stereo-based person tracking method for a person following robot. Many previous works on person<br />

tracking use laser range finders which can provide very accurate range measurements. Stereo-based systems have also<br />

been popular, but most of them are not used for controlling a real robot. We previously developed a tracking method which<br />

uses depth templates of person shape applied to a dense depth image. The method, however, sometimes failed when complex<br />

occlusions occurred. In this paper, we propose an accurate, stable tracking method using overlapping silhouette templates<br />

which consider how persons overlap in the image. Experimental results show the effectiveness of the proposed<br />

method.<br />

13:30-16:30, Paper ThBCT8.40<br />

Characterising Facial Gender Difference using Fisher-Rao Metric<br />

Ceolin, Simone Regina, Univ. of York<br />

Hancock, Edwin, Univ. of York<br />

The aim in this paper is to explore whether the Fisher-Rao metric can be used to measure different facets of facial shape<br />

estimated from fields of surface normals using the von-Mises Fisher distribution. In particular we aim to characterise the<br />

shape changes due to differences in gender. We make use of the von-Mises Fisher distribution since we are dealing with<br />

surface normal data over the unit sphere. Finally, we show the results achieved using the EAR and Max Planck datasets.<br />

13:30-16:30, Paper ThBCT8.41<br />

On-Line FMRI Data Classification using Linear and Ensemble Classifiers<br />

Plumpton, Catrin Oliver, Bangor Univ.<br />

Kuncheva, Ludmila I., Bangor Univ.<br />

Linden, David E. J., Bangor Univ.<br />

Johnston, Stephen Jaye, Bangor Univ.<br />

The advent of real-time fMRI pattern classification opens many avenues for interactive self-regulation where the brain’s<br />

response is better modelled by multivariate, rather than univariate techniques. Here we test three on-line linear classifiers,<br />

applied to a real fMRI dataset, collected as part of an experiment on the cortical response to emotional stimuli. We propose<br />

a random subspace ensemble as a fast and more accurate alternative to component classifiers. The on-line linear discriminant<br />

classifier (O-LDC) was found to be a better base classifier than the on-line versions of the perceptron and the balanced<br />

winnow.<br />

13:30-16:30, Paper ThBCT8.42<br />

Adaptive Feature and Score Level Fusion Strategy using Genetic Algorithms<br />

Ben Soltana, Wael, Ec. Centrale de Lyon<br />

Ardabilian, Mohsen, Ec. Centrale de Lyon<br />

Chen, Liming, Ec. Centrale de Lyon<br />

Ben Amar, Chokri, Res. Group on Intelligent Machines<br />

Classifier fusion is considered as one of the best strategies for improving performance of general purpose classification systems.<br />

On the other hand, fusion strategy space strongly depends on classifiers, features and data spaces. As the cardinality<br />

of this space is exponential, one needs to resort to a heuristic to find a sub-optimal fusion strategy. In this work, we present<br />

a new adaptive feature and score level fusion strategy (AFSFS) based on adaptive genetic algorithm. AFSFS tunes itself between<br />

feature and matching score level, and improves the final performance over the original on both levels, and as a fusion<br />

method, it does not only contain fusion strategy to combine the most relevant features so as to achieve adequate and optimized<br />

results, but also has the extensive ability to select the most discriminative features. Experiments are provided on the FRGC<br />

database showing that the proposed method produces significantly better results than the baseline fusion methods.<br />



13:30-16:30, Paper ThBCT8.43<br />

Local Binary Pattern-Based Features for Text Identification of Web Images<br />

Jung, Insook, Chonbuk National Univ.<br />

Oh, Il-Seok, Chonbuk National Univ.<br />

We present a method of robustly identifying a text block in complex web images. The method is an MLP (multi-layer perceptron)<br />

classifier trained on LBP (local binary patterns), wavelet and shape feature spaces. In particular, we propose adaptive<br />

masks of LBP which respond flexibly to various character sizes. Most previous works use a fixed mask size or<br />

multi-level scales by pyramid schemes, which may have weakness in dealing with diverse sizes of text. Experiments carried<br />

out on 100 web images show promising results.<br />

13:30-16:30, Paper ThBCT8.44<br />

Classification of Polarimetric SAR Images using Evolutionary RBF Neural Networks<br />

Turker, Ince, Izmir Univ. of Ec.<br />

Kiranyaz, Serkan, Tampere Univ. of Tech.<br />

Moncef, Gabbouj, Tampere Univ. of Tech.<br />

This paper proposes an evolutionary RBF network classifier for polarimetric synthetic aperture radar (SAR) images. The<br />

proposed feature extraction process utilizes the full covariance matrix, the gray level co-occurrence matrix (GLCM) based<br />

texture features, and the backscattering power (Span) combined with the H/α/A decomposition, which are projected<br />

onto a lower dimensional feature space using principal component analysis. An experimental study is performed using<br />

the fully polarimetric San Francisco Bay data set acquired by the NASA/Jet Propulsion Laboratory Airborne SAR<br />

(AIRSAR) at L-band to evaluate the performance of the proposed classifier. Classification results (in terms of confusion matrix,<br />

overall accuracy and classification map) compared to the Wishart and a recent NN-based classifier demonstrate the effectiveness<br />

of the proposed algorithm.<br />

13:30-16:30, Paper ThBCT8.45<br />

On the Use of Median String for Multi-Source Translation<br />

González Rubio, Jesús, Univ. Pol. de Valencia<br />

Casacuberta, Francisco, Univ. Pol. de Valencia<br />

State-of-the-art approaches to multi-source translation involve a multimodal-like process which applies an individual<br />

translation system to each source language. Then, the translations of the individual systems are combined to obtain a consensus<br />

output. We propose to use the (generalised) median string as the consensus output of the individual translation systems.<br />

Different approximations to the median string are studied as well as different approaches to improve the median<br />

string performance when dealing with natural language strings. The proposed approaches were evaluated on the Europarl<br />

corpus, achieving significant improvements in translation quality.<br />
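A minimal sketch of the set median string, i.e. the hypothesis with the smallest summed edit distance to all system outputs; the generalised median used in the paper searches beyond the candidate set and relies on the approximations the authors study.

```python
# Hedged sketch: Levenshtein distance plus a brute-force set median over the
# individual translation hypotheses.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def set_median(strings):
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))

hypotheses = ["the house is red", "the house is read", "a house is red"]
print(set_median(hypotheses))   # consensus of the individual system outputs
```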

13:30-16:30, Paper ThBCT8.47<br />

A Lip Contour Extraction Method using Localized Active Contour Model with Automatic Parameter Selection<br />

Liu, Xin, Hong Kong Baptist Univ.<br />

Cheung, Yiu-Ming, Hong Kong Baptist Univ.<br />

Li, Meng, Hong Kong Baptist Univ.<br />

Liu, Hailin, Guangdong Univ. of Technology<br />

Lip contour extraction is crucial to the success of a lipreading system. This paper presents a lip contour extraction algorithm<br />

using a localized active contour model with automatic selection of proper parameters. The proposed approach utilizes a<br />

minimum-bounding ellipse as the initial evolving curve to split the local neighborhoods into the local interior region and<br />

the local exterior region, respectively, and then computes the localized energy for evolution and extraction. This method is<br />

robust against uneven illumination, rotation, deformation, and the effects of teeth and tongue. Experiments show its<br />

promising result in comparison with the existing methods.<br />

13:30-16:30, Paper ThBCT8.48<br />

Multimodal Sleeping Posture Classification<br />

Huang, Weimin, I2R<br />

Phyo Wai, Aung Aung, Inst. for Infocomm Res.<br />

Foo, Siang Fook, Inst. for Infocomm Res.<br />



Biswas, Jit, Inst. for Infocomm Res.<br />

Liou, Kou Juch, Industrial Tech. Res. Inst.<br />

Hsia, C. C., ITRI<br />

Sleeping posture reveals important information for eldercare and patient care, especially for bed ridden patients. Traditionally,<br />

some works address the problem using either pressure sensors or video images. This paper presents a multimodal<br />

approach to sleeping posture classification. Features from pressure sensor map and video image have been proposed in<br />

order to characterize the posture patterns. The spatiotemporal registration of the two modalities has been considered in<br />

the design, and the joint feature extraction and data fusion are presented. Using a multi-class SVM, experimental results demonstrate<br />

that the multimodal approach achieves better performance than approaches using single-modality sensing.<br />

13:30-16:30, Paper ThBCT8.49<br />

Exploiting System Knowledge to Improve ECOC Reject Rules<br />

Simeone, Paolo, Univ. of Cassino<br />

Marrocco, Claudio, Univ. of Cassino<br />

Tortorella, Francesco, Univ. of Cassino<br />

Error Correcting Output Coding is a common technique for multi-class classification tasks which decomposes the original<br />

problem into several two-class problems solved through dichotomizers. Such a classification system can be improved<br />

with a reject option which can be defined according to the level of information available from the dichotomizers. This<br />

paper analyzes how this knowledge is useful when applying such reject rules. The nature of the outputs, the kind of the<br />

employed classifiers and the knowledge of their loss function are influential details for the improvement of the general<br />

performance of the system. Experimental results on popular benchmark data sets are reported to show the behavior of the<br />

different schemes.<br />

13:30-16:30, Paper ThBCT8.50<br />

Human Smoking Event Detection using Visual Interaction Clues<br />

Wu, Pin, Yuan-Ze University<br />

Hsieh, Jun-Wei, Yuan-Ze University<br />

Cheng, Jiun-Cheng, National Taiwan Ocean Univ.<br />

Cheng, Shyi-Chyi, National Taiwan Ocean Univ.<br />

Tseng, Shau-Yin, Industry Tech. Res. Institute<br />

This paper presents a novel scheme to automatically and directly detect smoking events in video. In this scheme, a color-based<br />

ratio histogram analysis is introduced to extract visual clues from the appearance interactions between a lighted cigarette<br />

and its human holder. The techniques of color re-projection and Gaussian Mixture Models (GMMs) enable the tasks<br />

of cigarette segmentation and tracking over the background pixels. Then, a key problem for event analysis is the nonregular<br />

form of smoking events. Thus, we propose a self-determined mechanism to analyze this suspicious event using<br />

an HMM framework. Due to the uncertainties of cigarette size and color, no existing automatic system can reliably analyze<br />

human smoking events directly from videos. The proposed scheme can detect smoking events involving uncertain<br />

actions with various cigarette sizes, colors, and shapes, and has the capacity to extend the visual analysis to human events with<br />

similar interaction relationships. Experimental results show the effectiveness and real-time performance of our scheme in<br />

smoking event analysis.<br />

13:30-16:30, Paper ThBCT8.51<br />

Malware Detection on Mobile Devices using Distributed Machine Learning<br />

Sharifi Shamili, Ashkan, RWTH Aachen Univ.<br />

Bauckhage, Christian, Fraunhofer IAIS<br />

Alpcan, Tansu, Tech. Univ. Berlin<br />

This paper presents a distributed Support Vector Machine (SVM) algorithm in order to detect malicious software (malware)<br />

on a network of mobile devices. The light-weight system monitors mobile user activity in a distributed and privacy-preserving<br />

way using a statistical classification model which is evolved by training with examples of both normal usage patterns<br />

and unusual behavior. The system is evaluated using the MIT reality mining data set. The results indicate that the<br />

distributed learning system trains quickly and performs reliably. Moreover, it is robust against failures of individual components.<br />

13:30-16:30, Paper ThBCT8.52<br />

Combining Single Class Features for Improving Performance of a Two Stage Classifier<br />

Cordella, Luigi P., Univ. di Napoli Federico II<br />

De Stefano, Claudio, Univ. of Cassino<br />

Fontanella, Francesco, Univ. of Cassino<br />

Marrocco, Cristina, Univ. of Cassino<br />

Scotto Di Freca, Alessandra, Univ. of Cassino<br />

We propose a feature selection-based approach for improving the classification performance of a two-stage classification<br />

system in contexts where a high number of features is involved. A problem with a set of N classes is subdivided into a set<br />

of N two-class problems. In each problem, a GA-based feature selection algorithm is used for finding the best subset of<br />

features. These subsets are then used for training N classifiers. In the classification phase, unknown samples are given as<br />

input to each of the trained classifiers by using the corresponding subspace. In case of conflicting responses, the sample<br />

is sent to a suitably trained supplementary classifier. The proposed approach has been tested on a real world dataset containing<br />

hyperspectral image data. The results compare favourably with those obtained by other methods on the same<br />

data.<br />

13:30-16:30, Paper ThBCT8.53<br />

The Rex Leopold II Model: Application of the Reduced Set Density Estimator to Human Categorization<br />

De Schryver, Maarten, Ghent Univ.<br />

Roelstraete, Bjorn, Ghent Univ.<br />

Reduction techniques are important tools in machine learning and pattern recognition. In this article, we demonstrate how<br />

a kernel-based density estimator can be used as a tool for understanding human category representation. Despite the dominance<br />

of exemplar models of categorization, there is still ambiguity about the number of exemplars stored in memory.<br />

Here, we illustrate that, even when exemplars are omitted, categorization performance is not affected.<br />

13:30-16:30, Paper ThBCT8.54<br />

A Hybrid Method for Feature Selection based on Mutual Information and Canonical Correlation Analysis<br />

Sakar, Cemal Okan, Bahcesehir Univ.<br />

Kursun, Olcay, Istanbul Univ.<br />

Mutual Information (MI) is a classical and widely used dependence measure that generally can serve as a good feature selection<br />

algorithm. However, under-sampled classes or rare but certain relations are overlooked by this measure, which can<br />

result in missing relevant features that could be very predictive of variables of interest, such as certain phenotypes or disorders<br />

in biomedical research, rare but dangerous factors in ecology, intrusions in network systems, etc. On the other hand,<br />

Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure effectively used to detect independence<br />

but its use for feature selection or ranking is limited due to the fact that its formulation is not intended to measure the<br />

amount of information (entropy) of the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid<br />

measure of relevance that is not only based on MI but also accounts for the predictability of signals from one another, as in<br />

KCCA. We show that PMI has better feature detection capability than MI and KCCA, especially in catching<br />

suspicious coincidences that are rare but potentially important not only for subsequent experimental studies but also for<br />

building computational predictive models, as demonstrated on two toy datasets and a real intrusion detection system<br />

dataset.<br />
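
A minimal sketch of the classical MI-based feature ranking that PMI builds on, using scikit-learn on hypothetical data; it does not implement the proposed PMI measure itself:<br />

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical data: 500 samples, 20 candidate features, binary class label.
X = np.random.rand(500, 20)
y = np.random.randint(0, 2, size=500)

# Classical MI relevance scores; the proposed PMI additionally accounts for
# the predictability of one signal from another, as in KCCA.
mi_scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi_scores)[::-1]
print(ranking[:5])  # indices of the five most relevant features under plain MI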

13:30-16:30, Paper ThBCT8.55<br />

Speech Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments<br />

Nolazco-Flores, Juan A., Inst. Tecnológico y de Estudios Superiores de Monterrey<br />

Aceves-López, Roberto A., Inst. Tecnológico y de Estudios Superiores de Monterrey<br />

García-Perera, L. Paola, Inst. Tecnológico y de Estudios Superiores de Monterrey<br />

The Magnitude-Spectrum Information-Entropy (MSIE) of the speech signal is presented as an alternative representation<br />

of the speech that can be used to mitigate the mismatch between training and testing conditions. The speech-magnitude<br />

spectrum is considered as a random variable from which entropy coefficients can be calculated for each frame. By concatenating<br />

these entropy coefficients to the corresponding MFCC vector and then calculating the dynamic coefficients,<br />

the results show an improvement compared to the baseline. The MSIE effectiveness was tested on the Aurora 2 database<br />

audio files. When trained on clean speech, the experimental results obtained by the MSIE concatenated to the MFCC outperform<br />

the results obtained with the MFCC baseline system for selected types of noises at different SNRs. For this selected<br />

group of noises, the overall performance improvement in the range 0 dB to 20 dB for the Aurora 2 database is 15.06%.<br />
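
A minimal sketch of computing a per-frame magnitude-spectrum entropy coefficient, assuming the normalised magnitude spectrum is treated as a probability distribution; the framing parameters and the exact entropy definition used in the paper may differ:<br />

import numpy as np

def frame_spectral_entropy(frame, eps=1e-12):
    # Treat the normalised magnitude spectrum of one frame as a probability
    # distribution and return its Shannon entropy (one coefficient per frame).
    mag = np.abs(np.fft.rfft(frame))
    p = mag / (mag.sum() + eps)
    return -np.sum(p * np.log2(p + eps))

# Hypothetical framing: 25 ms windows with a 10 ms hop at 8 kHz.
signal = np.random.randn(8000)
frame_len, hop = 200, 80
entropies = [frame_spectral_entropy(signal[s:s + frame_len])
             for s in range(0, len(signal) - frame_len, hop)]
# These entropy coefficients would then be appended to the MFCC vector of
# each frame before computing the dynamic (delta) coefficients.
print(len(entropies))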

13:30-16:30, Paper ThBCT8.56<br />

Unsupervised Image Retrieval with Similar Lighting Conditions<br />

Serrano Talamantes, Jose Felix, Centro De Investigacion en Computacion<br />

Aviles, Carlos, Univ. Autónoma Metropolitana-Azcapotzalco México<br />

Sossa, Humberto, Center for Computing Res. CIC-IPN<br />

Villegas, Juan, Univ. Autónoma Metropolitana-Azcapotzalco México<br />

Olague, Gustavo, Centro de Investigación Científica y de Educación Superior<br />

In this work a new method to retrieve images with similar lighting conditions is presented. It is based on automatic clustering<br />

and automatic indexing. Our proposal belongs to the Content-Based Image Retrieval (CBIR) category. The goal is to<br />

retrieve from a database images (by their content) with similar lighting conditions. When we look at images taken from<br />

outdoor scenes, much of the information perceived depends on the lighting conditions. The proposal combines fixed and<br />

random extracted points for feature extraction. The describing features are the mean, the standard deviation and the homogeneity<br />

(from the co-occurrence matrix) of a sub-image extracted from the three color channels: (H, S, I). A K-MEANS<br />

algorithm and a 1-NN classifier are used to build an indexed database of 300 images in order to retrieve images with<br />

similar lighting conditions applied on sky regions such as: sunny, partially cloudy and completely cloudy. One of the advantages<br />

of the proposal is that we do not need to manually label the images for their retrieval. The performance of our<br />

framework is demonstrated through several experimental results, including improved rates for image retrieval with<br />

similar lighting conditions. A comparison with another similar work is also presented.<br />
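
A minimal sketch of the unsupervised indexing-and-retrieval step (K-means clustering followed by 1-NN assignment), with hypothetical 9-dimensional descriptors standing in for the mean, standard deviation and homogeneity features of the H, S and I channels:<br />

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical descriptors, one row per database image.
db_features = np.random.rand(300, 9)

# Unsupervised indexing: cluster the database so no manual labels are needed.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(db_features)

# Retrieval: a 1-NN classifier over the cluster labels assigns a query image
# to the lighting-condition group of its closest indexed image.
knn = KNeighborsClassifier(n_neighbors=1).fit(db_features, kmeans.labels_)
query = np.random.rand(1, 9)
print(knn.predict(query))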

13:30-16:30, Paper ThBCT8.57<br />

Lattice-Based Anomaly Rectification for Sport Video Annotation<br />

Khan, Aftab, Univ. of Surrey<br />

Windridge, David, Univ. of Surrey<br />

De Campos, Teofilo, Univ. of Surrey<br />

Anomaly detection has received much attention within the literature as a means of determining, in an unsupervised manner,<br />

whether a learning domain has changed in a fundamental way. This may require continuous adaptive learning to be abandoned<br />

and a new learning process initiated in the new domain. A related problem is that of anomaly rectification; the adaptation<br />

of the existing learning mechanism to the change of domain. As a concrete instantiation of this notion, the current<br />

paper investigates a novel lattice-based HMM induction strategy for arbitrary court-game environments. We test (in real<br />

and simulated domains) the ability of the method to adapt to a change of rule structures going from tennis singles to tennis<br />

doubles. Our long term aim is to build a generic system for transferring game-rule inferences.<br />

13:30-16:30, Paper ThBCT8.58<br />

An Ensemble of Classifiers Approach to Steganalysis<br />

Bayram, Sevinc, Pol. Inst. of NYU<br />

Dirik, Ahmet Emir, Pol. Inst. of NYU<br />

Sencar, Husrev Taha, TOBB Univ. of Ec. and Tech.<br />

Memon, Nasir, Pol. Inst. of New York Univ.<br />

Most work on steganalysis, with a few exceptions, has primarily focused on providing features with high discrimination<br />

power without giving due consideration to issues concerning practical deployment of steganalysis methods. In this work,<br />

we focus on the machine learning aspect of steganalyzer design and utilize a hierarchical ensemble-of-classifiers approach<br />

to tackle two main issues. Firstly, the proposed approach provides a workable and systematic procedure to incorporate several<br />

steganalyzers together in a composite steganalyzer to improve detection performance in a scalable and cost-effective manner.<br />

Secondly, since the approach can be readily extended to multi-class classification, it can also be used to infer the<br />

steganographic technique deployed in generation of a stego-object. We provide results to demonstrate the potential of the<br />

proposed approach.<br />

13:30-16:30, Paper ThBCT8.59<br />

Discriminating Intended Human Objects in Consumer Videos<br />

Uegaki, Hiroshi, Osaka Univ.<br />

Nakashima, Yuta, Osaka Univ.<br />

Babaguchi, Noboru, Osaka Univ.<br />

In a consumer video, there are not only intended objects, which are intentionally captured by the camcorder user, but also<br />

unintended objects, which are accidentally framed-in. Since the intended objects are essential to present what the camcorder<br />

user wants to express in the video, discriminating the intended objects from the unintended objects is beneficial for many<br />

applications, e.g., video summarization, privacy protection, and so forth. In this paper, focusing on human objects, we<br />

propose a method for discriminating the intended human objects from the unintended human objects. We evaluated the<br />

proposed method using 10 videos captured by 3 camcorder users. The results demonstrate that the proposed method successfully<br />

discriminates the intended human objects with a recall of 0.45 and a precision of 0.80.<br />

13:30-16:30, Paper ThBCT8.60<br />

Detecting Human Activity Profiles with Dirichlet Enhanced Inhomogeneous Poisson Processes<br />

Shimosaka, Masamichi, The Univ. of Tokyo<br />

Ishino, Takahito, The Univ. of Tokyo<br />

Noguchi, Hiroshi, The Univ. of Tokyo<br />

Mori, Taketoshi, The Univ. of Tokyo<br />

Sato, Tomomasa, The Univ. of Tokyo<br />

This paper describes an activity pattern mining method via inhomogeneous Poisson point processes (IPPPs) from time series<br />

of count data generated in behavior detection by pyroelectric sensors. IPPP reflects the idea that typical human activity<br />

is rhythmic and periodic. We also focus on the idea that activity patterns are affected by exogenous phenomena,<br />

such as the day of the week and weather conditions. Because a single IPPP cannot capture these effects, Dirichlet process mixtures<br />

(DPM) are leveraged in order to discriminate and discover different activity patterns caused by such factors. The use<br />

of DPM allows the appropriate number of typical daily patterns to be discovered automatically. Experimental results using<br />

long-term count data show that our model successfully and efficiently discovers typical daily patterns.<br />

13:30-16:30, Paper ThBCT8.61<br />

I-FAC: Efficient Fuzzy Associative Classifier for Object Classes in Images<br />

Mangalampalli, Ashish, International Inst. of Information Tech. Hyderabad, India<br />

Chaoji, Vineet, Yahoo! Inc<br />

Sanyal, Subhajit, Yahoo! Lab. Bangalore, India<br />

We present I-FAC, a novel fuzzy associative classification algorithm for object class detection in images using interest<br />

points. In object class detection, the negative class CN is generally vague (CN = U \ CP, where U and CP are the universal<br />

and positive classes, respectively). However, image classification necessarily requires both positive and negative classes for<br />

training. I-FAC is a single class image classifier that relies only on the positive class for training. Because of its fuzzy<br />

nature, I-FAC also handles polysemy and synonymy (common problems in most crisp (non-fuzzy) image classifiers) very<br />

well. As associative classification leverages frequent patterns mined from a given dataset, its performance as adjudged<br />

from its false-positive-rate (FPR) versus recall curve is very good, especially at lower FPRs, where its recall is even better.<br />

I-FAC has the added advantage that the rules used for classification have clear semantics and can be comprehended easily,<br />

unlike other classifiers, such as SVM, which act as black-boxes. From an empirical perspective (on standard public<br />

datasets), the performance of I-FAC is much better, especially at lower FPRs, than that of either bag-of-words (BOW) or<br />

SVM (both using interest points).<br />

13:30-16:30, Paper ThBCT8.62<br />

Audio-Visual Data Fusion using a Particle Filter in the Application of Face Recognition<br />

Steer, Michael, Otto-von-guericke-Univ. Magdeburg<br />

This paper describes a methodology by which audio and visual data about a scene can be fused in a meaningful manner<br />

in order to locate a speaker in a scene. This fusion is implemented within a Particle Filter such that a single speaker can<br />

be identified in the presence of multiple visual observations. The advantages of this fusion are that weak sensory data<br />

from either modality can be reinforced and the presence of noise can be reduced.<br />

13:30-16:30, Paper ThBCT8.63<br />

The Problem of Fragile Feature Subset Preference in Feature Selection Methods and a Proposal of Algorithmic<br />

Workaround<br />

Somol, Petr, Inst. of Information Theory and Automation, Czech<br />

Grim, Jiří, Inst. of Information Theory and Automation<br />

Pudil, Pavel, Prague Univ. of Ec.<br />

We point out a problem inherent in the optimization scheme of many popular feature selection methods. It follows from<br />

the implicit assumption that a higher feature selection criterion value always indicates a more preferable subset, even if the<br />

value difference is marginal. This assumption ignores the reliability issues of particular feature preferences, over-fitting<br />

and feature acquisition cost. We propose an algorithmic extension applicable to many standard feature selection methods<br />

allowing better control over feature subset preference. We show experimentally that the proposed mechanism is capable<br />

of reducing the size of selected subsets as well as improving classifier generalization.<br />

ThBCT9 Lower Foyer<br />

Signal, Speech, and Image Processing Poster Session<br />

Session chair: Ariki, Yasuo (Kobe Univ.)<br />

13:30-16:30, Paper ThBCT9.1<br />

Removing Partial Occlusion from Blurred Thin Occluders<br />

Mccloskey, Scott, McGill Univ. Honeywell<br />

Langer, Michael, McGill Univ.<br />

Siddiqi, Kaleem, McGill Univ.<br />

We present a method to remove partial occlusion that arises from out-of-focus thin foreground occluders such as wires,<br />

branches, or a fence. Such partial occlusion causes the irradiance at a pixel to be a weighted sum of the radiances of a<br />

blurred foreground occluder and that of the background. The result is that the background component has lower contrast<br />

than it would if seen without the occluder. In order to remove the contribution of the foreground in such regions, we characterize<br />

the position and size of the occluder in a narrow aperture image. In subsequent images with wider apertures, we<br />

use this characterization to remove the contribution of the foreground, thereby restoring contrast in the background. We<br />

demonstrate our method on real camera images without assuming that the background is static.<br />

13:30-16:30, Paper ThBCT9.2<br />

A New Approach to Aircraft Surface Inspection based on Directional Energies of Texture<br />

Mumtaz, Mustafa, National Univ. of Sciences and Tech.<br />

Bin Mansoor, Atif, National Univ. of Sciences and Tech.<br />

Masood, Hassan, National Univ. of Sciences and Tech.<br />

Non-Destructive Inspection (NDI) plays a vital role in the aircraft industry as it determines the structural integrity of aircraft<br />

surfaces and material characterization. Since existing NDI methods are time consuming, we propose a new NDI approach<br />

using Digital Image Processing that has the potential to substantially decrease the inspection time. The aircraft imagery is<br />

analyzed by two methods, i.e., Contourlet Transform (CT) and Discrete Cosine Transform (DCT). With the help of the Contourlet<br />

Transform, the two-dimensional (2-D) spectrum is divided into fine slices using iterated directional filter banks. Next, directional<br />

energy components for each block of the decomposed subband outputs are computed. These energy values are<br />

used to distinguish between crack and scratch images using the Dot Product classifier. In the second approach, the aircraft<br />

imagery is decomposed into high and low frequency components using DCT and the first order moment is determined to<br />

form feature vectors. A correlation-based approach is then used to distinguish between crack and scratch surfaces. A comparative<br />

examination between the two techniques on a database of crack and scratch images revealed that texture analysis<br />

using the combined transform based approach gave the best results by giving an accuracy of 96.6% for the identification<br />

of crack surfaces and 98.3% for scratch surfaces.<br />

13:30-16:30, Paper ThBCT9.3<br />

A Generalized Anisotropic Diffusion for Defect Detection in Low-Contrast Surfaces<br />

Chao, Shin-Min, Utechzone Co. Ltd.<br />

Tsai, Du-Ming, Yuan-Ze Univ.<br />

Li, Wei-Chen, Yuan-Ze Univ.<br />

Chiu, Wei-Yao, Yuan-Ze Univ.<br />

In this paper, an anisotropic diffusion model with a generalized diffusion coefficient function is presented for defect detection<br />

in low-contrast surface images and, especially, aims at material surfaces found in liquid crystal display (LCD)<br />

manufacturing. A defect embedded in a low-contrast surface image is extremely difficult to detect because the intensity<br />

difference between the unevenly-illuminated background and defective regions is hardly observable. The proposed anisotropic<br />

diffusion model provides a generalized diffusion mechanism that can flexibly change the curve of the diffusion coefficient<br />

function. It adaptively carries out a smoothing process for faultless areas and performs a sharpening process for defect<br />

areas in an image. An entropy criterion is proposed as the performance measure of the diffused image and then a stochastic<br />

evolutionary computation algorithm, particle swarm optimization (PSO), is applied to automatically determine the best<br />

parameter values of the generalized diffusion coefficient function. Experimental results have shown that the proposed<br />

method can effectively and efficiently detect small defects in low-contrast surface images.<br />

13:30-16:30, Paper ThBCT9.4<br />

Impact of Vector Ordering Strategies on Morphological Unmixing of Remotely Sensed Hyperspectral Images<br />

Plaza, Antonio, Univ. of Extremadura<br />

Hyperspectral imaging is a new technique in remote sensing that generates hundreds of images, corresponding to different<br />

wavelength channels, for the same area on the surface of the Earth. In previous work, we have explored the application of<br />

morphological operations to integrate both spatial and spectral responses in hyperspectral data analysis. These operations<br />

rely on ordering pixel vectors in spectral space, but there is no unambiguous means of defining the minimum and maximum<br />

values between two vectors of more than one dimension. Our original contribution in this paper is to examine the impact<br />

of different vector ordering strategies on the definition of multi-channel morphological operations. Our focus is on morphological<br />

unmixing, which decomposes each pixel vector in the hyperspectral scene into a combination of pure spectral<br />

signatures (called endmembers) and their associated abundance fractions, allowing sub-pixel characterization. Experiments<br />

are conducted using real hyperspectral data sets collected by NASA/JPL’s Airborne Visible Infra-Red Imaging Spectrometer<br />

(AVIRIS) system.<br />

13:30-16:30, Paper ThBCT9.5<br />

A Recursive and Model-Constrained Region Splitting Algorithm for Cell Clump Decomposition<br />

Xiong, Wei, Inst. for Infocomm Res. A-STAR<br />

Ong, Sim Heng, National Univ. of Singapore<br />

Lim, Joo-Hwee, Inst. for Infocomm Res.<br />

Decomposition of cells in clumps is a difficult segmentation task requiring region splitting techniques. Techniques that do<br />

not employ prior shape constraints usually fail to achieve accurate segmentation. Those using shape constraints are unable<br />

to cope with large clumps and occlusions. In this work, we propose a model-constrained region splitting algorithm for cell<br />

clump decomposition. We build the cell model using joint probability distribution of invariant shape features. The shape<br />

model, the contour smoothness and the gradient information along the cut are used to optimize the splitting in a recursive<br />

manner. The short cut rule is also adopted as a strategy to speed up the process. The algorithm performs well in validation<br />

experiments using 60 images with 4516 cells and 520 clumps.<br />

13:30-16:30, Paper ThBCT9.6<br />

Bounding-Box based Segmentation with Single Min-Cut using Distant Pixel Similarity<br />

Pham, Viet-Quoc, The Univ. of Tokyo<br />

Takahashi, Keita, The Univ. of Tokyo<br />

Naemura, Takeshi, The Univ. of Tokyo<br />

This paper addresses the problem of interactive image segmentation with a user-supplied object bounding box. The underlying<br />

problem is the classification of pixels into foreground and background, where only background information is<br />

provided with sample pixels. Many approaches treat appearance models as an unknown variable and optimize the segmentation<br />

and appearance alternatively, in an expectation maximization manner. In this paper, we describe a novel approach<br />

to this problem: the objective function is expressed purely in terms of the unknown segmentation and can be optimized<br />

using only one minimum cut calculation. We aim to optimize the trade-off of making the foreground layer as large as possible<br />

while keeping the similarity between the foreground and background layers as small as possible. This similarity is<br />

formulated using the similarities of distant pixel pairs. We evaluated our algorithm on the GrabCut dataset and demonstrated<br />

that high-quality segmentations were attained at a fast calculation speed.<br />

13:30-16:30, Paper ThBCT9.7<br />

Image Retargeting in Compressed Domain<br />

Murthy, O. V. Ramana, Nanyang Tech. Univ.<br />

Muthuswamy, Karthik, Nanyang Tech. Univ.<br />

Rajan, Deepu, Nanyang Tech. Univ.<br />

Chia, Liang-Tien, Nanyang Tech. Univ.<br />

A simple algorithm for image retargeting in the compressed domain is proposed. Most existing retargeting algorithms<br />

work directly in the spatial domain of the raw image. Here, we work on the DCT coefficients of a JPEG-compressed image<br />

to generate a gradient map that serves as an importance map to help identify those parts in the image that need to be<br />

retained during the retargeting process. Each 8x8 block of DCT coefficients is scaled based on the least importance value.<br />

Retargeting can be done both in the horizontal and vertical directions with the same framework. We also illustrate image<br />

enlargement using the same method. Experimental results show that the proposed algorithm produces less distortion in<br />

the retargeted image compared to some other algorithms reported recently.<br />
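
A minimal sketch of deriving a block-importance map from JPEG DCT coefficients, using AC-coefficient energy as a simple stand-in for the gradient map described above; the array shapes and the importance definition are assumptions:<br />

import numpy as np

def block_importance(dct_blocks):
    # dct_blocks: array of shape (rows, cols, 8, 8) holding the DCT
    # coefficients of each 8x8 block of a JPEG-compressed image.
    # Importance of a block is taken here as the energy of its AC coefficients.
    ac = dct_blocks.copy()
    ac[..., 0, 0] = 0.0                      # drop the DC term
    return np.abs(ac).sum(axis=(-2, -1))

# Hypothetical coefficient grid for a 512x640 image (64x80 blocks of 8x8).
blocks = np.random.randn(64, 80, 8, 8)
importance = block_importance(blocks)

# Block columns (or rows) with the lowest importance are scaled down the most
# during retargeting; here we simply rank the block columns.
column_score = importance.sum(axis=0)
print(np.argsort(column_score)[:5])  # five least important block columns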

13:30-16:30, Paper ThBCT9.8<br />

Progressive MAP-Based Deconvolution with Pixel-Dependent Gaussian Prior<br />

Tanaka, Masayuki, Tokyo Inst. of Tech.<br />

Kanda, Takafumi, Tokyo Inst. of Tech.<br />

Okutomi, Masatoshi,<br />

Deconvolution is a fundamental technique used in various vision applications. Maximum a posteriori (MAP) estimation<br />

is known to be a powerful tool for this. In this paper, we propose a progressive MAP-based deconvolution algorithm with a pixel-dependent<br />

Gaussian image prior. In the proposed algorithm, a mean and a variance for each pixel are adaptively estimated.<br />

Then, the mean and the variance are progressively updated. We experimentally show that the proposed algorithm is comparable<br />

to the state-of-the-art algorithms in the case that the true point spread function (PSF) is used for the deconvolution,<br />

and that the proposed algorithm outperforms them in the non-true PSF case.<br />
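
The abstract does not state the exact objective; assuming a Gaussian noise model, a MAP deconvolution with a pixel-dependent Gaussian prior would take roughly the form<br />

\hat{x} \;=\; \arg\min_{x}\; \frac{1}{2\sigma^{2}}\,\lVert y - k * x \rVert^{2} \;+\; \sum_{i} \frac{(x_{i} - \mu_{i})^{2}}{2\,v_{i}}

where y is the observed image, k the point spread function, and \mu_{i}, v_{i} the adaptively estimated per-pixel prior mean and variance that the algorithm progressively updates.<br />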

13:30-16:30, Paper ThBCT9.9<br />

A Fast Image Inpainting Method based on Hybrid Similarity-Distance<br />

Liu, Jie, Chinese Acad. of Sciences<br />

Zhang, Shuwu, Chinese Acad. of Sciences<br />

Yang, Wuyi, Chinese Acad. of Sciences<br />

Li, Heping, Chinese Acad. of Sciences<br />

A fast image inpainting method based on hybrid similarity-distance is proposed in this paper. In Criminisi et al.’s work<br />

[1], the similarity distance is not reliable enough in many cases and the algorithm performs inefficiently. To solve these problems,<br />

we propose a new searching strategy to accelerate the algorithm. In addition, we modify the confidence-updating<br />

rule to make the distribution of confidences in the source region more reasonable. In addition, taking into account the stationarity<br />

of texture and the reliability of the source regions, we present a hybrid similarity-distance, which combines the<br />

distance in color space with the distance in spatial space by weight coefficients related to the confidence value. A more<br />

reasonable patch is found by this hybrid similarity-distance. The experiments verify that the proposed method<br />

yields qualitative improvements compared to Criminisi et al.’s work [1].<br />

13:30-16:30, Paper ThBCT9.10<br />

Reversible Integer 2-D Discrete Fourier Transform by Control Bits<br />

Dursun, Serkan, Univ. of Texas at San Antonio<br />

Grigoryan, Artyom M., Univ. of Texas at San Antonio<br />

This paper describes the 2-D reversible integer discrete Fourier transform (RiDFT), which is based on the concept of the<br />

paired representation of the 2-D image, which is referred to as the unique 2-D frequency and 1-D time representation. The<br />

2-D DFT of the image is split into a minimum set of short transforms, and the image is represented as a set of 1-D signals.<br />

The paired 2-D DFT involves a few operations of multiplication that can be approximated by integer transforms, such as<br />

one-point transforms with one control bit. 24 control bits are required to perform the 8x8-point RiDFT, and 264 control<br />

bits for the 16x16-point 2-D RiDFT of real inputs. The fast paired method of calculating the 1-D DFT is used. The computational<br />

complexity of the proposed 2-D RiDFTs is comparable with the complexity of the fast 2-D DFT.<br />

13:30-16:30, Paper ThBCT9.12<br />

Image Inpainting based on Local Optimisation<br />

Zhou, Jun, National ICT Australia<br />

Robles-Kelly, Antonio, National ICT Australia<br />

In this paper, we tackle the problem of image inpainting, which aims at removing objects from an image or repairing damaged<br />

pictures by replacing the missing regions using the information in the rest of the scene. The image inpainting method<br />

proposed here builds on an exemplar-based perspective so as to improve the local consistency of the inpainted region.<br />

This is done by selecting the optimal patch which maximises the local consistency with respect to abutting candidate<br />

patches. The similarity computation generates weights based upon an edge prior and the structural differences between<br />

inpainting exemplar candidates. This treatment permits the generation of an inpainting sequence based on a list of factors.<br />

The experiments show that the proposed method delivers a margin of improvement as compared to alternative methods.<br />

13:30-16:30, Paper ThBCT9.13<br />

Image Processing based Approach for Retrieving Data from a Seismic Section in Bitmap Format<br />

Chevion, Dan, IBM Res. Lab. in Haifa<br />

Navon, Yaakov, IBM<br />

Ramm, Dov, former Res. staff member of IBM Israel Res. Lab.<br />

A new method for retrieving seismic data from a seismic section provided in a bitmap format is described. The method is<br />

based on image processing techniques and includes creating a grey level image of a seismic section, processing the grey<br />

level image (by integration, filtering, etc.) and then reconstructing digitized values of individual seismic traces from the<br />

resulting image, thus ending with the data in standard SEG-Y format.<br />

13:30-16:30, Paper ThBCT9.14<br />

Visible Entropy: A Measure for Image Visibility<br />

Hou, Zujun, Inst. for Infocomm Res.<br />

Yau, Wei-Yun, Inst. for Infocomm Res.<br />

Image visibility is a fundamental issue in the field of computer vision. This paper investigates the connection between<br />

histogram and image visibility, where the concept of entropy is employed to depict the information content of the histogram.<br />

It turns out that image visibility is more dependent on the observed intensity levels with higher frequencies and the distribution<br />

of their locations in the range of intensity levels. With this in mind, the concept of visible entropy is proposed. The<br />

usefulness of the proposed visibility measure has been evaluated using a number of realistic images.<br />
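
A minimal sketch of the plain histogram-entropy computation that the visible-entropy measure starts from; the proposed measure additionally weights the dominant intensity levels and the spread of their locations, which is not reproduced here:<br />

import numpy as np

def histogram_entropy(image, bins=256):
    # Shannon entropy of the grey-level histogram, the basic quantity the
    # visible-entropy measure builds on.
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

img = np.random.randint(0, 256, size=(128, 128))
print(histogram_entropy(img))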

13:30-16:30, Paper ThBCT9.15<br />

Research the Performance of a Recursive Algorithm of the Local Discrete Wavelet Transform<br />

Kopenkov, Vasiliy, RAS<br />

Myasnikov, Vladislav, RAS<br />

We experimentally compare the performance of two fast algorithms for computing the local discrete wavelet transform of<br />

one-dimensional signals: the Mallat algorithm and a recursive algorithm. For comparison purposes, we analyze Haar<br />

wavelet bases for one- and two-dimensional signals, an extension of the Haar basis with scale coefficient 3, and biorthogonal<br />

polynomial spline wavelets with finite support.<br />

13:30-16:30, Paper ThBCT9.16<br />

Auditory Features Revisited for Robust Speech Recognition<br />

Harte, Naomi, Trinity Coll. Dublin<br />

Kelly, Finnian, Trinity Coll. Dublin<br />

Auditory based front-ends for speech recognition have been compared before, but this paper focuses on two of the most<br />

promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings<br />

with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC).<br />

Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for reference. The performance of all features is<br />

reported on the TIMIT database using a HMM-based recogniser. It is found that the PNCC features outperform MFCC in<br />

clean conditions and are most robust to noise. ZCPA performance is shown to vary widely with filter bank configuration<br />

and frame length. The ZCPA performance is poor in clean conditions but is the least affected by white noise. PNCC is<br />

shown to be the most promising new feature set for robust ASR in recent years.<br />

13:30-16:30, Paper ThBCT9.17<br />

Sparse Representation for Speaker Identification<br />

Naseem, Imran, The Univ. of Western Australia<br />

Togneri, Roberto, The Univ. of Western Australia<br />

Bennamoun, Mohammed, The Univ. of Western Australia<br />

We address the closed-set problem of speaker identification by presenting a novel sparse representation classification algorithm.<br />

We propose to develop an overcomplete dictionary using the GMM mean supervector kernel for all the training<br />

utterances. A given test utterance corresponds to only a small fraction of the whole training database. We therefore propose<br />

to represent a given test utterance as a linear combination of all the training utterances, thereby generating a naturally<br />

sparse representation. Using this sparsity, the unknown vector of coefficients is computed via l1 minimization, which is<br />

also the sparsest solution [12]. Ideally, the vector of coefficients so obtained has nonzero entries representing the class<br />

index of the given test utterance. Experiments have been conducted on the standard TIMIT [14] database and a comparison<br />

with state-of-the-art speaker identification algorithms yields a favorable performance index for the proposed algorithm.<br />
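
A minimal sketch of the sparse-representation classification idea, using an l1-regularised least-squares solver as a stand-in for the l1-minimisation step and random vectors as a hypothetical dictionary of training supervectors:<br />

import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical dictionary: one GMM mean-supervector per training utterance
# (columns); 40 utterances from 10 speakers, 512-dimensional supervectors.
rng = np.random.default_rng(0)
D = rng.standard_normal((512, 40))
speaker_of_column = np.repeat(np.arange(10), 4)

# Test supervector built mostly from speaker 3's training utterances.
y = D[:, speaker_of_column == 3] @ np.array([0.6, 0.3, 0.1, 0.0])

# l1-regularised regression yields a sparse coefficient vector; its nonzero
# entries should concentrate on the columns of the true speaker.
coef = Lasso(alpha=0.01, max_iter=10000).fit(D, y).coef_
scores = [np.abs(coef[speaker_of_column == s]).sum() for s in range(10)]
print(int(np.argmax(scores)))  # predicted speaker index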

13:30-16:30, Paper ThBCT9.18<br />

Latency in Speech Feature Analysis for Telepresence State Coding<br />

O’Gorman, Lawrence, Alcatel-Lucent Bell Lab.<br />

For video conferencing, there are network bandwidth and screen real-estate constraints that limit the number of user channels.<br />

We propose an intermediate transmission mode that transmits only at events, where these are detected by both audio<br />

and video changes from the short-term signal average. Our objective in this paper is to determine the latency until the audio<br />

portion of a single telepresence channel stabilizes. It is this stable signal from which we detect events. We describe a recursive<br />

filter approach for feature determination and experiments on the Switchboard telephone call database. Results<br />

show a latency to stable signal of up to 10 seconds, although events can be detected much more quickly.<br />
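
A minimal sketch of a first-order recursive filter of the kind used to track the short-term average of an audio feature; the feature, frame rate and smoothing constant are hypothetical:<br />

import numpy as np

def recursive_average(x, alpha=0.05):
    # First-order recursive (exponential) filter:
    # y[n] = alpha * x[n] + (1 - alpha) * y[n-1].
    y = np.empty_like(x, dtype=float)
    acc = x[0]
    for n, v in enumerate(x):
        acc = alpha * v + (1.0 - alpha) * acc
        y[n] = acc
    return y

# Hypothetical per-frame audio feature (e.g. frame energy) at 100 frames/s;
# an "event" would be flagged when the raw feature departs from this average.
feature = np.abs(np.random.randn(3000))
smoothed = recursive_average(feature)
print(smoothed[:5])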

13:30-16:30, Paper ThBCT9.19<br />

Automatically Detecting Peaks in Terahertz Time-Domain Spectroscopy<br />

Stephani, Henrike, Fraunhofer ITWM<br />

Jonuscheit, Joachim, Fraunhofer IPM<br />

Robiné, Christoph, Fraunhofer IPM<br />

Heise, Bettina, JKU<br />

To classify spectroscopic measurements it is necessary to have comparable methods of evaluation. In Terahertz (THz)<br />

time-domain spectroscopy, as a new technology, neither the presentation of the data nor the peak detection is standardized<br />

yet. We propose a procedure for automatic peak extraction in THz spectra of chemical compounds. After preprocessing in<br />

the time-domain, we use a variance based algorithm for determining the valid frequency region. We furthermore propose<br />

a baseline correction using simulated THz spectra. We illustrate how this procedure works on the example of hyperspectral<br />

THz measurements of six chemical compounds. Subsequently we propose to use unsupervised classification on the thus<br />

processed data to robustly detect the characteristic peaks of a compound.<br />

13:30-16:30, Paper ThBCT9.20<br />

Iwasawa Decomposition and Computational Riemannian Geometry<br />

Lenz, Reiner, Linköping Univ.<br />

Mochizuki, Rika, Nippon Telegraph and Telephone Corp.<br />

Chao, Jinhui, Chuo Univ.<br />

We investigate several topics related to manifold-techniques for signal processing. On the most general level we consider<br />

manifolds with a Riemannian Geometry. These manifolds are characterized by their inner products on the tangent spaces.<br />

We describe the connection between the symmetric positive-definite matrices defining these inner products and the Cartan<br />

and the Iwasawa decomposition of the general linear matrix groups. This decomposition gives rise to the decomposition<br />

of the inner product matrices into diagonal and orthonormal matrices, and into diagonal and upper triangular matrices.<br />

Next we describe the estimation of the inner product matrices from measured data as an optimization process on the homogeneous<br />

space of upper triangular matrices. We show that the decomposition leads to simple forms of partial derivatives<br />

that are commonly used in optimization algorithms. Using the group theoretical parametrization ensures also that all intermediate<br />

estimates of the inner product matrix are symmetric and positive definite. Finally we apply the method to a<br />

problem from psychophysics where the color perception properties of an observer are characterized with the help of color<br />

matching experiments. We will show that measurements from color weak observers require the enforcement of the positive-definiteness<br />

of the matrix with the help of the manifold optimization technique.<br />

13:30-16:30, Paper ThBCT9.21<br />

Rethinking Algorithm Design and Development in Speech Processing<br />

Stadelmann, Thilo, Univ. of Marburg<br />

Wang, Yinghui, Univ. of Marburg<br />

Smith, Matthew, Univ. of Hannover<br />

Ewerth, Ralph, Univ. of Marburg<br />

Freisleben, Bernd, Univ. of Marburg<br />

Speech processing is typically based on a set of complex algorithms requiring many parameters to be specified. When<br />

parts of the speech processing chain do not behave as expected, trial and error is often the only way to investigate the reasons.<br />

In this paper, we present a research methodology to analyze unexpected algorithmic behavior by making (intermediate)<br />

results of the speech processing chain perceivable and intuitively comprehensible by humans. The workflow of the<br />

process is explicated using a real-world example leading to considerable improvements in speaker clustering. The described<br />

methodology is supported by a software toolbox available for download.<br />

13:30-16:30, Paper ThBCT9.22<br />

Phone-Conditioned Suboptimal Wiener Filtering<br />

Gonzalez-Caravaca, Guillermo, Univ. Autonoma de Madrid<br />

Toledano, Doroteo, Univ. Autonoma de Madrid<br />

Puertas, Maria, Univ. Autonoma de Madrid<br />

A novel way of managing the compromise between noise reduction and speech distortion in Wiener filters is presented. It<br />

is based on adjusting the amount of noise reduced, and therefore the speech distortion introduced, on a phone-by-phone<br />

basis. We show empirically that optimal Wiener filters produce different amounts of speech distortion for different phones.<br />

Therefore we propose a phone-conditioned suboptimal Wiener filter that uses different amounts of noise reduction for<br />

each phone, based on a previous estimation of the amount of distortion introduced. Speech recognition results have shown<br />

that phone conditioning suboptimal Wiener filtering can provide almost a 5% additional relative improvement in word<br />

accuracy over comparable optimal Wiener filtering.<br />

13:30-16:30, Paper ThBCT9.23<br />

Geodesic Active Fields on the Sphere<br />

Zosso, Dominique, École Pol. Fédérale de Lausanne<br />

Thiran, Jean-Philippe, École Pol. Fédérale de Lausanne<br />

In this paper, we propose a novel method to register images defined on spherical meshes. Instances of such spherical<br />

images include inflated cortical feature maps in brain medical imaging or images from omnidirectional cameras. We apply<br />

the Geodesic Active Fields (GAF) framework locally at each vertex of the mesh. Therefore we define a dense deformation<br />

field, which is embedded in a higher dimensional manifold, and minimize the weighted Polyakov energy. While the<br />

Polyakov energy itself measures the hyperarea of the embedded deformation field, its weighting accounts for the<br />

quality of the current image alignment. Iteratively minimizing the energy drives the deformation field towards a smooth<br />

solution of the registration problem. Although the proposed approach does not necessarily outperform state-of-the-art<br />

methods that are tightly tailored to specific applications, it is of methodological interest due to its high degree of flexibility<br />

and versatility.<br />

13:30-16:30, Paper ThBCT9.24<br />

Emotional Speech Classification based on Multi View Characterization<br />

Mahdhaoui, Ammar, Univ. Pierre & Marie Curie<br />

Chetouani, M., Inst. des Systèmes Intelligents et Robotique<br />

Emotional speech classification is a key problem in social interaction analysis. Traditional emotional speech classification<br />

methods are completely supervised and require large amounts of labeled data. In addition, various feature sets are usually<br />

used to characterize the emotional speech signals. Therefore, we propose a new co-training algorithm based on multiview<br />

features. More specifically, we adopt different features for the characterization of speech signals to form different<br />

views for classification, so as to extract as much discriminative information as possible. We then use the co-training algorithm<br />

to classify emotional speech with only a few annotations. In this article, a dynamic weighted co-training algorithm is<br />

developed to combine different features (views) to predict the common class variable. Experiments prove the validity and<br />

effectiveness of this method compared to a self-training algorithm.<br />
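
A minimal sketch of a plain two-view co-training loop on hypothetical feature views; the dynamic weighting of views proposed in the paper is not reproduced here:<br />

import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y_init, labeled_mask, rounds=5, per_round=10):
    # Each view's classifier pseudo-labels the unlabelled samples it is most
    # confident about; those samples join the shared labelled pool.
    y_work, mask = y_init.copy(), labeled_mask.copy()
    for _ in range(rounds):
        for X_view in (X1, X2):
            unlabelled = np.nonzero(~mask)[0]
            if unlabelled.size == 0:
                return mask, y_work
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X_view[mask], y_work[mask])
            conf = clf.predict_proba(X_view[unlabelled]).max(axis=1)
            pick = unlabelled[np.argsort(conf)[::-1][:per_round]]
            y_work[pick] = clf.predict(X_view[pick])
            mask[pick] = True
    return mask, y_work

# Hypothetical two feature views for 200 utterances, only 20 of them labelled.
rng = np.random.default_rng(0)
X1, X2 = rng.random((200, 12)), rng.random((200, 8))
labels = rng.integers(0, 2, 200)          # only labels[:20] are actually used
labeled = np.zeros(200, dtype=bool)
labeled[:20] = True
mask, pseudo = co_train(X1, X2, labels, labeled)
print(int(mask.sum()), "samples labelled after co-training")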

13:30-16:30, Paper ThBCT9.25<br />

Image Inpainting using Structure-Guided Priority Belief Propagation and Label Transformations<br />

Hsin, Heng-Feng, National Chung Cheng Univ.<br />

Leou, Jin-Jang, National Chung Cheng Univ.<br />

Lin, Cheng-Shian, National Chung Cheng Univ.<br />

Chen, Hsuan-Ying, National Chung Cheng Univ.<br />

In this study, an image inpainting approach using structure-guided priority belief propagation (BP) and label transformations<br />

is proposed. The proposed approach contains five stages, namely, Markov random field (MRF) node determination,<br />

structure map generation, label set enlargement by label transformations, image inpainting by priority-BP optimization,<br />

and overlapped region composition. Based on experimental results obtained in this study, as compared with three comparison<br />

approaches, the proposed approach provides better image inpainting results.<br />

13:30-16:30, Paper ThBCT9.26<br />

Comparison of Syllable/Phone HMM based Mandarin TTS<br />

Duan, Quansheng, Tsinghua Univ.<br />

Kang, Shiyin, Tsinghua Univ.<br />

Shuang, Zhiwei, IBM Res. - China<br />

Wu, Zhiyong, Tsinghua Univ.<br />

Cai, Lianhong, Tsinghua Univ.<br />

Qin, Yong, IBM Res. - China<br />

The performance of an HMM-based text-to-speech (TTS) system is affected by the basic modeling units and the size of<br />

training data. This paper compares two HMM-based Mandarin TTS systems using syllable and phone as basic units respectively<br />

with 1000, 3000 and 5000 sentences’ training data. Two female speakers’ corpora are used as training data for<br />

evaluation. For both corpora, the system using syllable as basic unit outperforms the system using phone as basic unit<br />

with 3000 and 5000 sentences’ training data.<br />

13:30-16:30, Paper ThBCT9.27<br />

QRS Complex Detection by Non Linear Thresholding of Modulus Maxima<br />

Jalil, Bushra, Univ. de Bourgogne<br />

Laligant, Olivier, Univ. de Bourgogne<br />

Fauvet, Eric, Univ. de Bourgogne<br />

Beya, Ouadi, Univ. de Bourgogne<br />

The electrocardiogram (ECG) signal is used to analyze cardiovascular activity in the human body and has a primary role<br />

in the diagnosis of several heart diseases. The QRS complex is the most distinguishable component in the ECG. Therefore,<br />

the accuracy of the detection of QRS complex is crucial to the performance of subsequent machine learning algorithms<br />

for cardiac disease classification. The aim of the present work is to detect QRS wave from ECG signals. Wavelet transform<br />

filtering is applied to the signal in order to remove baseline drift, followed by QRS localization. By using the property of<br />

the R peak, which has the highest and most prominent amplitude, we apply a thresholding technique based on the median absolute<br />

deviation (MAD) of modulus maxima to detect the complex. In order to evaluate the algorithm, the analysis has been<br />

done on the MIT-BIH Arrhythmia database. The results have been examined and approved by medical doctors.<br />
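
A minimal sketch of MAD-based thresholding of modulus maxima for R-peak candidate selection; the constant k and the exact rule used in the paper are assumptions:<br />

import numpy as np

def qrs_candidates(modulus_maxima, k=3.0):
    # Threshold the wavelet modulus maxima at k times their median absolute
    # deviation (MAD); maxima above the threshold become R-peak candidates.
    mad = np.median(np.abs(modulus_maxima - np.median(modulus_maxima)))
    threshold = np.median(modulus_maxima) + k * mad
    return np.nonzero(modulus_maxima > threshold)[0]

# Hypothetical modulus-maxima magnitudes from a baseline-corrected ECG segment.
maxima = np.abs(np.random.randn(500))
maxima[[50, 210, 370]] += 8.0           # three prominent R peaks
print(qrs_candidates(maxima))            # indices near 50, 210, 370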

13:30-16:30, Paper ThBCT9.28<br />

Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-Overlapping Audio and Video<br />

Streams<br />

Roy, Anindya, Ec. Pol. Federale de Lausanne<br />

Marcel, Sebastien, Ec. Pol. Federale de Lausanne<br />

Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently<br />

or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task: person identification<br />

in a cross-modal scenario, i.e., matching the speaker in an audio recording to the same speaker in a video recording,<br />

where the two recordings have been made during different sessions, using speaker specific information which is<br />

common to both the audio and video modalities. Several recent psychological studies have shown how humans can indeed<br />

perform this task with an accuracy significantly higher than chance. Here we propose two systems which can solve this<br />

task comparably well, using purely pattern recognition techniques. We hypothesize that such systems could be put to practical<br />

use in multimodal biometric and surveillance systems.<br />

13:30-16:30, Paper ThBCT9.29<br />

Image Parsing with a Three-State Series Neural Network Classifier<br />

Seyedhosseini Tarzjani, Seyed Mojtaba, Univ. of Utah<br />

Paiva, Antonio, Univ. of Utah<br />

Tasdizen, Tolga, Univ. of Utah<br />

We propose a three-state series neural network for effective propagation of context and uncertainty information for image<br />

parsing. The activation functions used in the proposed model have three states instead of the normal two states. This makes<br />

the neural network more flexible than the two-state neural network, and allows for uncertainty to be propagated through<br />

the stages. In other words, decisions about difficult pixels can be left for later stages which have access to more contextual<br />

information than earlier stages. We applied the proposed method to three different datasets and experimental results demonstrate<br />

higher performance of the three-state series neural network.<br />

13:30-16:30, Paper ThBCT9.30<br />

Pan-Sharpening using an Adaptive Linear Model<br />

Liu, Lining, Beihang Univ.<br />

Wang, Yiding, North China Univ. of Tech.<br />

Wang, Yunhong, Beihang Univ.<br />

Yu, Haiyan, Beihang Univ.<br />

In this paper, we propose an algorithm to synthesize high-resolution multispectral images by fusing panchromatic (Pan)<br />

images and multispectral (MS) images. The algorithm is based on an adaptive linear model, which is automatically estimated<br />

by least-squares fitting. In this model, a virtual difference band is appended to the MS to guarantee the correlation<br />

between the Pan and MS. Then, an iterative procedure is carried out to generate the fused images using the steepest descent<br />

method. The efficiency of the presented technique is tested by performing pan-sharpening of IKONOS, QuickBird, and<br />

Landsat-7 ETM+ datasets. Experimental results show that our method provides better fusion results than other methods.<br />
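
A minimal sketch of fitting a linear Pan-from-MS model by least squares on hypothetical co-registered pixels; the virtual difference band and the iterative steepest-descent fusion described above are not reproduced:<br />

import numpy as np

# Hypothetical co-registered data: Pan and MS pixels flattened to vectors
# (one row per pixel, one column per MS band plus an offset term).
rng = np.random.default_rng(1)
ms = rng.random((10000, 4))                       # 4 multispectral bands
pan = ms @ np.array([0.3, 0.3, 0.3, 0.1]) + 0.02  # synthetic Pan response

A = np.hstack([ms, np.ones((ms.shape[0], 1))])    # append bias column
weights, *_ = np.linalg.lstsq(A, pan, rcond=None)

# The fitted linear model predicts the Pan band from the MS bands; this is
# the adaptive linear relation that drives the subsequent fusion.
print(np.round(weights, 3))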

13:30-16:30, Paper ThBCT9.31<br />

A Study of Voice Source and Vocal Tract Filter based Features in Cognitive Load Classification<br />

Le, Phu, The Univ. of New South Wales<br />

Epps, Julien, The Univ. of New South Wales<br />

Choi, Eric, National ICT Australia<br />

Ambikairajah, Eliathamby, The Univ. of New South Wales<br />

Speech has been recognized as an attractive method for the measurement of cognitive load. Previous approaches have<br />

used mel frequency cepstral coefficients (MFCCs) as discriminative features to classify cognitive load. The MFCCs contain<br />

information from both the voice source and the vocal tract, so that the individual contributions of each to cognitive load<br />

variation are unclear. This paper aims to extract speech features related to either the voice source or the vocal tract and use<br />

them to discriminate between cognitive load levels in order to identify the individual contribution of each for cognitive<br />

load measurement. Voice source-related features are then used to improve the performance of current cognitive load classification<br />

systems, using adapted Gaussian mixture models. Our experimental result shows that the use of voice source<br />

feature could yield around 12% reduction in relative error rate compared with the baseline system based on MFCCs, intensity,<br />

and pitch contour.<br />

13:30-16:30, Paper ThBCT9.32<br />

Adaptive Enhancement with Speckle Reduction for SAR Images using Mirror-Extended Curvelet and PSO<br />

Li, Ying, Northwestern Pol. Univ.<br />

Hongli, Gong, Northwestern Pol. Univ.<br />

Wang, Qing, Northwestern Pol. Univ.<br />

Speckle and low contrast can cause image degradation, which reduces the detectability of targets and impedes further investigation<br />

of synthetic aperture radar (SAR) images. This paper presents an adaptive enhancement method with speckle<br />

reduction for SAR images using the mirror-extended curvelet (ME-curvelet) transform and particle swarm optimization<br />

(PSO). First, an improved enhancement function is proposed to nonlinearly shrink and stretch the curvelet coefficients.<br />

Then, a novel objective evaluation criterion is introduced to adaptively obtain the optimal parameters in the enhancement<br />

function. Finally, a PSO algorithm with two improvements is used as a global search strategy for the best enhanced image.<br />

Experimental results indicate that the proposed method can reduce the speckle and enhance the edge features and the contrast<br />

of SAR images better in comparison with the wavelet-based and curvelet-based non-adaptive enhancement methods.<br />

13:30-16:30, Paper ThBCT9.33<br />

Recursive Video Matting and Denoising<br />

Prabhu, Sahana, Indian Inst. of Tech. Madras<br />

Ambasamudram, Rajagopalan, Indian Inst. of Tech. Madras<br />

In this paper, we propose a video matting method with simultaneous noise reduction based on the Unscented Kalman filter<br />

(UKF). This recursive approach extracts the alpha mattes and denoised foregrounds from noisy videos, in a unified framework.<br />

No assumptions are made about the type of motion of the camera or of the foreground object in the video. Moreover,<br />

user-specified trimaps are required only once every ten frames. In order to accurately extract information at the borders<br />

between the foreground and the background, we include a discontinuity-adaptive Markov random field (MRF) prior. It<br />

incorporates spatio-temporal information from the current and previous frame during estimation of the alpha matte as well<br />

as the foreground. Results are given on videos with real film-grain noise.<br />

13:30-16:30, Paper ThBCT9.35<br />

The Effects of Radiometry on the Accuracy of Intensity based Registration<br />

Selby, Boris Peter, Medcom GmbH<br />

Sakas, Georgios, Fraunhofer IGD<br />

Walter, Stefan, Medcom GmbH<br />

Groch, Wolf-Dieter, Univ. of Applied Sciences Darmstadt<br />

Stilla, Uwe, Tech. Univ. Muenchen<br />

Besides several other factors, radiometric differences between a reference and a floating image greatly influence the achievable accuracy of image registration. In this work we derive the magnitude of the registration inaccuracy arising from changes in radiometric properties, using medical X-ray image registration as an example. We estimate the change of image intensity with respect to object shape, the X-ray attenuation of the object material and the initial X-ray energy by modeling a simplified image formation process. The change in intensity is then used to derive a closed-form estimate of the resulting registration error, independent of any specific registration algorithm. Finally, the theoretical calculations are compared to the accuracy of intensity-based registration performed on X-ray images with different radiometric properties. Results show that the derived accuracy estimate is well suited to predicting the achievable registration accuracy for images with radiometric differences.
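
As a small illustration of the kind of simplified image formation such an analysis builds on, the sketch below evaluates Beer-Lambert attenuation for two radiometric settings; the attenuation coefficients and thickness profile are illustrative values, not the paper's experimental configuration.

```python
# Simplified X-ray image formation via Beer-Lambert attenuation: I = I0 * exp(-mu * t).
# All numeric values below are illustrative assumptions.
import numpy as np

def xray_intensity(I0, mu, thickness):
    """Transmitted intensity through material of given thickness (Beer-Lambert)."""
    return I0 * np.exp(-mu * thickness)

thickness = np.linspace(0.0, 5.0, 100)                       # cm, a simple wedge-shaped object
ref = xray_intensity(1.0, mu=0.20, thickness=thickness)      # reference image profile
flt = xray_intensity(1.2, mu=0.25, thickness=thickness)      # floating image: different energy / material response

# Radiometric difference that an intensity-based similarity measure has to cope with:
rel_diff = np.abs(ref - flt) / ref
print(f"max relative intensity difference: {rel_diff.max():.2f}")
```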



13:30-16:30, Paper ThBCT9.36<br />

Fence Removal from Multi-Focus Images<br />

Yamashita, Atsushi, Shizuoka Univ.<br />

Matsui, Akiyoshi, Shizuoka Univ.<br />

Kaneko, Toru, Shizuoka Univ.<br />

When an image of a scene is captured by a camera through a fence, a blurred fence image interrupts objects in the scene. In this paper, we propose a method for fence removal from the image using multiple focus settings. Most previous methods interpolate the interrupted regions by using information from surrounding textures. However, these methods fail when the information in the surrounding textures is not rich. On the other hand, there are methods that acquire multiple images for image restoration and composite them to generate a new clear image. The latter approach is adopted because it is robust and accurate. Multi-focus images are acquired and "defocusing" information is utilized to generate a clear image. Experimental results show the effectiveness of the proposed method.
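
The sketch below shows a generic multi-focus fusion step in which, per pixel, the value from the most sharply focused image in the stack is kept. It only illustrates how defocus information can drive pixel selection; the fence-removal pipeline itself is more involved and is not reproduced.

```python
# Generic multi-focus fusion sketch (not the paper's fence-removal method).
import numpy as np
from scipy import ndimage

def fuse_multifocus(stack, window=9):
    """stack: list/array of grayscale images taken with different focus settings."""
    stack = np.asarray(stack, dtype=float)
    sharpness = np.empty_like(stack)
    for i, img in enumerate(stack):
        lap = ndimage.laplace(img)
        sharpness[i] = ndimage.uniform_filter(lap ** 2, size=window)  # local Laplacian energy
    best = np.argmax(sharpness, axis=0)              # index of the sharpest image per pixel
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols]

# Usage: fused = fuse_multifocus([img_focus_near, img_focus_far])
```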

13:30-16:30, Paper ThBCT9.37<br />

Information Theoretic Expectation Maximization based Gaussian Mixture Modeling for Speaker Verification<br />

Memon, Sheeraz, RMIT Univ.<br />

Lech, Margaret, RMIT Univ.<br />

Maddage, Namunu, RMIT Univ.

The expectation maximization (EM) algorithm is widely used in the Gaussian mixture model (GMM) as the state-of-the-art statistical modeling technique. Like the classical EM method, the proposed EM-Information Theoretic algorithm (EM-IT) adapts means, covariances and weights; however, this process is not conducted directly on feature vectors but on a smaller set of centroids derived by an information-theoretic procedure, which simultaneously minimizes the divergence between the Parzen estimate of the feature vectors' distribution within a given Gaussian component and the centroids' distribution within the same component. The EM-IT algorithm was applied to the speaker verification problem using the NIST 2004 speech corpus and MFCCs with dynamic features. The results showed an improvement in the equal error rate (EER) of 1.5% over the classical EM approach. EM-IT also showed higher convergence rates compared to the EM method.
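
For reference, the classical EM baseline that EM-IT is compared against can be trained as below with scikit-learn on stand-in MFCC-like features; the information-theoretic centroid step of EM-IT is not reproduced here.

```python
# Classical EM training of a GMM (the baseline); features below are random placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 13))        # placeholder for MFCCs with dynamic features

gmm = GaussianMixture(n_components=64, covariance_type="diag", max_iter=100)
gmm.fit(features)                             # EM: E-step responsibilities, M-step re-estimation
log_likelihood = gmm.score(features)          # average per-frame log-likelihood under the model
```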

13:30-16:30, Paper ThBCT9.38<br />

A Gaussian Process Regression Framework for Spatial Error Concealment with Adaptive Kernels<br />

Asheri, Hadi, Sharif Univ. of Tech.<br />

Rabiee, Hamid Reza, Sharif Univ. of Tech.<br />

Pourdamghani, Nima, Sharif Univ. of Tech.<br />

Rohban, Mohammad H., Sharif Univ. of Tech.<br />

We have developed a Gaussian process regression method with adaptive kernels for concealment of missing macro-blocks of block-based video compression schemes in a packet video system. In addition to promising results, the proposed algorithm provides a solid framework for further improvements. In this paper, the problem of estimating lost macro-blocks is solved by estimating a proper covariance function (kernel) of the Gaussian process defined over a region around the missing macro-blocks. In order to preserve block edges, the kernel is constructed adaptively by using local edge-related information. Moreover, further improvement is achieved by local estimation of the kernel parameters. While restoring the prominent edges of the missing macro-blocks, the proposed method produces perceptually smooth concealed frames. Objective and subjective evaluations verify the effectiveness of the proposed method.
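
A minimal Gaussian process regression sketch for predicting missing pixels from their surroundings is given below with an isotropic RBF kernel. The adaptive, edge-aware kernel construction described in the abstract is not reproduced; this only shows the regression machinery the method builds on.

```python
# GP regression with an isotropic RBF kernel; length scale and noise are illustrative.
import numpy as np

def rbf_kernel(A, B, length_scale=2.0, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(coords_known, values_known, coords_missing, noise=1e-3):
    K = rbf_kernel(coords_known, coords_known) + noise * np.eye(len(coords_known))
    Ks = rbf_kernel(coords_missing, coords_known)
    return Ks @ np.linalg.solve(K, values_known)   # posterior mean at the missing pixels

# Usage: coords are (row, col) positions around a lost macro-block,
# values are the corresponding intensities from correctly received data.
```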

13:30-16:30, Paper ThBCT9.39<br />

Colour Constant Image Sharpening

Alsam, Ali, Sør-Trøndelag Univ. Coll.<br />

In this paper, we introduce a new sharpening method which guarantees colour constancy and resolves the problem of equiluminant colours. The algorithm is similar to unsharp masking in that the gradients are calculated at different scales by blurring the original with a variable-size kernel. The main difference is in the blurring stage, where we calculate the average of an n × n neighborhood by projecting each colour vector onto the space of the center pixel before averaging. Thus, starting with the center pixel, we define a projection matrix onto the space of that vector. Each neighboring colour is then projected onto the center and the results are summed. The projection step yields an average vector which shares the direction of the original center pixel. The difference between the center pixel and the average is therefore a vector that is a scalar multiple of the center pixel, so adding it to the center pixel is guaranteed not to result in colour shifts. This projection step is also shown to remedy the problem of equiluminant colours and can be used for m-dimensional data. Finally, the results indicate that the new method achieves better sharpening than unsharp masking, with noticeably fewer halos around strong edges. The latter aspect of the algorithm is believed to be due to the asymmetric nature of the projection step.
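
A sketch of the projection-based blurring step is given below: each colour in an n × n neighbourhood is projected onto the direction of the centre pixel before averaging, so the detail added back is parallel to the centre colour. Window size and gain are illustrative assumptions, and boundary handling is simplified.

```python
# Projection-based colour-constant sharpening sketch; parameters are illustrative.
import numpy as np

def colour_constant_sharpen(img, n=5, gain=1.0, eps=1e-8):
    """img: float RGB image of shape (H, W, 3) with values in [0, 1]."""
    H, W, _ = img.shape
    r = n // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    out = img.copy()
    for y in range(H):
        for x in range(W):
            c = img[y, x]
            norm2 = c @ c + eps
            patch = padded[y:y + n, x:x + n].reshape(-1, 3)
            # project each neighbour onto the centre-pixel direction, then average
            proj = (patch @ c)[:, None] * c / norm2
            avg = proj.mean(axis=0)
            detail = c - avg                      # parallel to c by construction
            out[y, x] = c + gain * detail         # only rescales c, so no hue shift
    return np.clip(out, 0.0, 1.0)
```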

13:30-16:30, Paper ThBCT9.40<br />

Maximally Stable Texture Regions<br />

Güney, Mesut, Turkish Naval Academy<br />

Arica, Nafiz, Turkish Naval Academy<br />

In this study, we propose to detect interest regions based on texture information of images. For this purpose, Maximally<br />

Stable Extremal Regions (MSER) approach is extended using the high dimensional texture features of image pixels. The<br />

regions with different textures from their vicinity are detected using agglomerative clustering successively. The proposed<br />

approach is evaluated in terms of repeatability and matching scores in an experimental setup used in the literature. It outperforms<br />

the intensity- and color-based detectors, especially in images containing textured regions. It performs better under transformations including viewpoint change, blurring, illumination change and JPEG compression, while producing comparable results for the other transformations tested in the experiments.

13:30-16:30, Paper ThBCT9.41<br />

Combining the Likelihood and the Kullback-Leibler Distance in Estimating the Universal Background Model for<br />

Speaker Verification using SVM<br />

Lei, Zhenchun, Jiangxi Normal Univ.

The state-of-the-art methods for speaker verification are based on the support vector machine. The Gaussian supervector SVM is a typical method which uses the Gaussian mixture model to create feature vectors for the discriminative SVM. All GMMs are adapted from the same universal background model (UBM), which is obtained by maximum likelihood estimation on a large amount of data, so the UBM should cover the feature space as widely as possible. We propose a new method to estimate the parameters of the UBM by combining the likelihood with the Kullback-Leibler distances between the UBM components. The aim is to find model parameters that achieve a high likelihood while keeping the Gaussian components dispersed enough to cover the feature space broadly. Experiments on the NIST 2001 task show that our method improves performance noticeably.
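
As an illustration, the Kullback-Leibler divergence between two diagonal-covariance Gaussians, the kind of pairwise distance that can be combined with the likelihood to keep UBM components dispersed, can be computed as below; the exact way the paper combines the two terms is not reproduced.

```python
# KL divergence between diagonal-covariance Gaussians; only the distance term is shown.
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for 1-D parameter arrays."""
    mu0, var0, mu1, var1 = map(np.asarray, (mu0, var0, mu1, var1))
    return 0.5 * np.sum(np.log(var1 / var0) - 1.0 + var0 / var1 + (mu1 - mu0) ** 2 / var1)

# A dispersion score could, e.g., sum the pairwise KL divergences between all UBM
# components and be traded off against the data log-likelihood during estimation.
```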

13:30-16:30, Paper ThBCT9.42<br />

Asymmetric Generalized Gaussian Mixture Models and EM Algorithm for Image Segmentation<br />

Nacereddine, Nafaa, LORIA<br />

Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />

Ziou, Djemel, Sherbrooke Univ.<br />

Hamami, Latifa, Ec. Nationale Pol.<br />

In this paper, a parametric and unsupervised histogram-based image segmentation method is presented. The histogram is<br />

assumed to be a mixture of asymmetric generalized Gaussian distributions. The mixture parameters are estimated by using<br />

the Expectation Maximization algorithm. Histogram fitting and region uniformity measures on synthetic and real images<br />

reveal the effectiveness of the proposed model compared to the generalized Gaussian mixture model.<br />
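
One common parameterisation of an asymmetric generalized Gaussian density, with a shared shape parameter and separate left/right scales, is sketched below; the parameterisation used in the paper may differ, and this is only meant to show the kind of component placed in the histogram mixture.

```python
# One possible asymmetric generalized Gaussian pdf (shared shape, split scales).
import numpy as np
from scipy.special import gamma

def aggd_pdf(x, mu=0.0, alpha_left=1.0, alpha_right=2.0, beta=1.5):
    x = np.asarray(x, dtype=float)
    norm = beta / ((alpha_left + alpha_right) * gamma(1.0 / beta))
    scale = np.where(x < mu, alpha_left, alpha_right)
    return norm * np.exp(-(np.abs(x - mu) / scale) ** beta)

# The density integrates to 1; the left and right halves carry masses
# alpha_left/(alpha_left+alpha_right) and alpha_right/(alpha_left+alpha_right).
```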

13:30-16:30, Paper ThBCT9.43<br />

Color Connectedness Degree for Mean-Shift Tracking<br />

Gouiffès, Michèle, IEF Univ. Paris Sud 11<br />

Laguzet, Florence, LRI Univ. Paris Sud 11<br />

Lacassagne, Lionel, IEF Univ. Paris Sud 11<br />

This paper proposes an extension to mean shift tracking. We introduce the color connectedness degrees (CCD) which, beyond providing statistical information about the target to track, embed information about the amount of connectedness of the color intervals which compose the target. With a small increase in complexity, this approach provides better robustness and tracking quality compared to using the RGB space. This is confirmed by experiments performed on several sequences showing vehicles and pedestrians in various contexts.

13:30-16:30, Paper ThBCT9.44<br />

Signal-to-Signal Ratio Independent Speaker Identification for Co-Channel Speech Signals

Saeidi, Rahim, Univ. of Eastern Finland<br />

Mowlaee, Pejman, Aalborg Univ.<br />

Kinnunen, Tomi, Univ. of Eastern Finland<br />

Tan, Zheng-Hua, Aalborg Univ.<br />

Christensen, Mads Græsbøll, Aalborg Univ.<br />

Jensen, Søren Holdt, Aalborg Univ.<br />

Fränti, Pasi, Univ. of Eastern Finland<br />

In this paper, we consider speaker identification for the co-channel scenario in which a speech mixture from two speakers is recorded by a single microphone. The goal is to identify both of the speakers from their mixed signal. High recognition accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper, we approach the problem without estimating the SSR. We show that a simple method based on fusion of adapted Gaussian mixture models and the Kullback-Leibler divergence calculated between models achieves accuracies of 97% and 93% when the two target speakers are listed among the three and two most probable speakers, respectively.

13:30-16:30, Paper ThBCT9.45<br />

Selection of Training Instances for Music Genre Classification<br />

Lopes, Miguel, INESC Porto<br />

Gouyon, Fabien, INESC Porto<br />

Koerich, Alessandro, PUCPR<br />

Oliveira, Luiz, Federal Univ. of Parana<br />

In this paper we present a method for the selection of training instances based on the classification accuracy of an SVM classifier. The instances consist of feature vectors representing short-term, low-level characteristics of music audio signals. The objective is to build, from only a portion of the training data, a music genre classifier with performance at least similar to that obtained when the whole data set is used. The particularity of our approach lies in a pre-classification of instances prior to the main classifier training: we select from the training data those instances that show better discrimination with respect to class membership. On a very challenging dataset of 900 music pieces divided among 10 music genres, the instance selection method slightly improves music genre classification, by 2.4 percentage points. At the same time, the resulting classification model is significantly smaller, permitting much faster classification of test data.
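
A sketch of pre-classification-based instance selection is given below: training frames are scored by a preliminary SVM and only the most class-consistent ones are kept before training the final classifier. The selection rule used here (cross-validated correctness) is a stand-in for the paper's criterion.

```python
# Instance selection via a pre-classification pass; the selection rule is an assumption.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

def select_instances(X, y, keep_correct_only=True):
    pre = SVC(kernel="rbf", C=1.0)
    y_pred = cross_val_predict(pre, X, y, cv=5)          # pre-classification of training frames
    mask = (y_pred == y) if keep_correct_only else np.ones(len(y), dtype=bool)
    return X[mask], y[mask]

# X: short-term, low-level feature vectors; y: genre label of each frame's source piece.
# X_sel, y_sel = select_instances(X, y)
# final_clf = SVC(kernel="rbf", C=1.0).fit(X_sel, y_sel)   # train on the reduced set
```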

13:30-16:30, Paper ThBCT9.46<br />

Semi-Blind Speech-Music Separation using Sparsity and Continuity Priors<br />

Erdogan, Hakan, Sabanci Univ.<br />

Grais, Emad M., Sabanci Univ.

In this paper we propose an approach for the problem of single channel source separation of speech and music signals.<br />

Our approach is based on representing each source’s power spectral density using dictionaries and nonlinearly projecting<br />

the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the<br />

dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose to use a novel coordinate<br />

descent technique for optimization, which nicely handles nonnegativity constraints and nonquadratic penalty terms.<br />

We use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture data after the

corresponding power spectral densities (PSDs) are estimated for each source. Using conventional metrics, we measure<br />

the performance of the system on simulated mixtures of single person speech and piano music sources. The results indicate<br />

that the proposed method is a promising technique for low speech-to-music ratio conditions and that sparsity and continuity<br />

priors help improve the performance of the proposed system.<br />
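
The reconstruction stage alone can be sketched as below: once per-source power spectral densities have been estimated from the dictionary coefficients, an adaptive Wiener mask splits the mixture spectrogram. The dictionary fitting and the sparsity/continuity-penalised coordinate descent are not reproduced.

```python
# Wiener-mask reconstruction from estimated per-source PSDs (reconstruction step only).
import numpy as np

def wiener_separate(mixture_stft, psd_speech, psd_music, eps=1e-12):
    """mixture_stft: complex STFT of the mixture; psd_*: estimated source PSDs (same shape)."""
    total = psd_speech + psd_music + eps
    speech = (psd_speech / total) * mixture_stft      # Wiener gain for the speech source
    music = (psd_music / total) * mixture_stft        # Wiener gain for the music source
    return speech, music
```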



13:30-16:30, Paper ThBCT9.47<br />

Comparative Analysis for Detecting Objects under Cast Shadows in Video Images<br />

Villamizar Vergel, Michael, CSIC-UPC<br />

Scandaliaris, Jorge, CSIC-UPC<br />

Sanfeliu, Alberto, Univ. Pol. de Catalunya<br />

Cast shadows make object detection more difficult because they locally modify image intensity and color. Shadows

may appear or disappear in an image when the object, the camera, or both are free to move through a scene. This<br />

work evaluates the performance of an object detection method based on boosted HOG paired with three different image<br />

representations in outdoor video sequences. We follow and extend the taxonomy of van de Sande with considerations

on the constraints assumed by each descriptor on the spatial variation of the illumination. We show that the intrinsic image<br />

representation consistently gives the best results. This proves the usefulness of this representation for object detection in<br />

varying illumination conditions, and supports the idea that in practice local assumptions in the descriptors can be violated.<br />

13:30-16:30, Paper ThBCT9.48<br />

Shape-Appearance Guided Level-Set Deformable Model for Image Segmentation<br />

Khalifa, Fahmi, Univ. of Louisville<br />

El-Baz, Ayman, Univ. of Louisville<br />

Gimel’Farb, Georgy, Univ. of Auckland<br />

Abou El-Ghar, Mohamed, Univ. of Mansoura<br />

A new speed function to guide evolution of a level-set based active contour is proposed for segmenting an object from its<br />

background in a given image. The guidance accounts for a learned spatially variant statistical shape prior, 1st-order visual<br />

appearance descriptors of the contour interior and exterior (associated with the object and background, respectively), and<br />

a spatially invariant 2nd-order homogeneity descriptor. The shape prior is learned from a subset of co-aligned training images.<br />

The visual appearances are described with marginal gray level distributions obtained by separating their mixture<br />

over the image. The evolving contour interior is modeled by a 2nd-order translation and rotation invariant Markov-Gibbs<br />

random field of object/background labels with analytically estimated potentials. Experiments with kidney CT images confirm<br />

robustness and accuracy of the proposed approach.<br />

13:30-16:30, Paper ThBCT9.49<br />

Iterative Ramp Sharpening for Structure/Signature-Preserving Simplification of Images<br />

Grazzini, Jacopo, Los Alamos National Lab.<br />

Soille, Pierre, Ec. Joint Res. Centre<br />

In this paper, we present a simple and heuristic ramp sharpening algorithm that achieves local contrast enhancement of vector-valued images. The proposed algorithm performs pixel-wise comparisons of intensity values, gradient strength and directional information in order to locate transition ramps around true edges in the image. The sharpening is then applied only to those pixels found on the ramps. This way, the contrast between objects and regions separated by a ramp is enhanced correspondingly, avoiding ringing artifacts. It is found that applying this technique in an iterative manner to blurred imagery produces a sharpening that preserves both the structure and the signature of the image. The final approach reaches a good compromise between complexity and effectiveness for image simplification, enhancing the image details efficiently while maintaining the overall image appearance.

13:30-16:30, Paper ThBCT9.50<br />

Learning Naive Bayes Classifiers for Music Classification and Retrieval<br />

Fu, Zhouyu, Monash Univ.<br />

Lu, Guojun, Monash Univ.<br />

Ting, Kai Ming, Monash Univ.<br />

Zhang, Dengsheng, Monash Univ.<br />

In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ<br />

all audio features extracted from local windows for classification instead of just using a single song-level feature<br />

vector produced by compressing the local features. Two variants of naive Bayes classifiers are studied based on the extensions<br />

of standard nearest neighbor and support vector machine classifiers. Experimental results have demonstrated superior<br />

performance achieved by the proposed naive Bayes classifiers for both music classification and retrieval as compared<br />

to the alternative methods.<br />
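
The core idea, classifying every local audio frame and aggregating frame-level log-probabilities into a song-level decision rather than compressing frames into one song-level vector, can be sketched as below with a Gaussian naive Bayes model; the paper's nearest-neighbour and SVM-based naive Bayes variants are not reproduced.

```python
# Frame-level naive Bayes with song-level aggregation (Gaussian variant as a stand-in).
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_frame_nb(frame_features, frame_labels):
    """frame_features: (n_frames, d) local audio features; frame_labels: genre of the source song."""
    return GaussianNB().fit(frame_features, frame_labels)

def classify_song(model, song_frames):
    log_probs = model.predict_log_proba(song_frames)            # (n_frames, n_classes)
    return model.classes_[np.argmax(log_probs.sum(axis=0))]     # frames treated as independent
```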



13:30-16:30, Paper ThBCT9.52<br />

An Empirical Study of Feature Extraction Methods for Audio Classification<br />

Parker, Charles, Eastman Kodak Company<br />

With the growing popularity of video sharing web sites and the increasing use of consumer-level video capture devices,<br />

new algorithms are needed for intelligent searching and indexing of such data. The audio from these video streams is particularly<br />

challenging due to its low quality and high variability. Here, we perform a broad empirical study of features used<br />

for intelligent audio processing. We perform experiments on a dataset of 200 consumer videos, in which we attempt to detect 10 semantic audio concepts.

13:30-16:30, Paper ThBCT9.53<br />

Geometric Total Variation for Texture Deformation<br />

Bespalov, Dmitriy, Drexel Univ.<br />

Dahl, Anders, Tech. Univ. of Denmark<br />

Shokoufandeh, Ali, Drexel Univ.<br />

In this work we propose a novel variational method that we intend to use for estimating non-rigid texture deformation.<br />

The method is able to capture variation in grayscale images with respect to the geometry of their features. Accurate localization

of features in the presence of unknown deformations is a crucial property for texture characterization. Our experimental<br />

evaluations demonstrate that accounting for geometry of features in texture images leads to significant<br />

improvements in localization of these features, when textures undergo geometrical transformations. In addition, feature<br />

descriptors using geometrical total variation energies discriminate between various regular textures with accuracy comparable<br />

to SIFT descriptors, while the reduced dimensionality of the TVG descriptor yields significant improvements over SIFT

in terms of retrieval time.<br />

13:30-16:30, Paper ThBCT9.54<br />

A Novel Approach to Detect Ship-Radiated Signal based on HMT<br />

Zhou, Yue, Shanghai Jiaotong Univ.<br />

Niu, Zhibin, Shanghai Jiaotong Univ.<br />

Wang, Chenhao, Shanghai Jiaotong Univ.<br />

In the presence of non-Gaussian noise, we propose a method for the detection of underwater ship-radiated signals. The wavelet decomposition of the underwater signal yields a natural tree structure, which is further modeled by a Hidden Markov Tree (HMT). The signal is therefore represented by the parameters of the corresponding HMT. We analyze the likelihood defined on these parameters and formulate a new detection criterion. Experimental results demonstrate that our method provides a reliable and robust solution.

13:30-16:30, Paper ThBCT9.55<br />

Speech Emotion Analysis in Noisy Real-World Environment<br />

Tawari, Ashish, Univ. of California, San Diego<br />

Trivedi, Mohan, Univ. of California, San Diego<br />

Automatic recognition of emotional states via speech signal has attracted increasing attention in recent years. A number<br />

of techniques have been proposed which are capable of providing reasonably high accuracy for controlled studio settings.<br />

However, their performance is considerably degraded when the speech signal is contaminated by noise. In this paper, we<br />

present a framework with adaptive noise cancellation as a front end to the speech emotion recognizer. We also introduce a new

feature set based on cepstral analysis of pitch and energy contours. Experimental analysis shows promising results.<br />
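
A minimal LMS adaptive noise canceller of the kind used as a front end here is sketched below: a reference noise signal is filtered adaptively and subtracted from the noisy speech. Filter length and step size are illustrative assumptions, and the paper's actual front end may differ.

```python
# LMS adaptive noise cancellation sketch; n_taps and mu are illustrative values.
import numpy as np

def lms_anc(noisy, noise_ref, n_taps=32, mu=0.01):
    noisy = np.asarray(noisy, dtype=float)
    noise_ref = np.asarray(noise_ref, dtype=float)
    w = np.zeros(n_taps)                     # adaptive filter weights
    cleaned = np.zeros_like(noisy)
    for n in range(n_taps, len(noisy)):
        x = noise_ref[n - n_taps:n][::-1]    # most recent reference-noise samples
        y = w @ x                            # estimate of the noise in the primary channel
        e = noisy[n] - y                     # error = de-noised speech sample
        w += 2 * mu * e * x                  # LMS weight update
        cleaned[n] = e
    return cleaned
```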

13:30-16:30, Paper ThBCT9.56<br />

Sampling and Ideal Reconstruction on the 3D Diamond Grid<br />

Strand, Robin, Uppsala Univ.<br />

This paper presents basic, yet important, properties that can be used when developing methods for image acquisition, processing,<br />

and visualization on the diamond grid. The sampling density needed to reconstruct a band-limited signal and the<br />

ideal interpolation function on the diamond grid are derived.<br />



13:30-16:30, Paper ThBCT9.57<br />

Detecting Faint Compact Sources using Local Features and a Boosting Approach<br />

Torrent, Albert, Univ. of Girona<br />

Peracaula, Marta, Univ. of Girona<br />

Llado, Xavier, Univ. of Girona<br />

Freixenet, Jordi, Univ. of Girona<br />

Sanchez-Sutil, Juan Ramon, Univ. de Jaén<br />

Martí, Josep, Univ. de Jaén<br />

Paredes, Josep Maria, Univ. de Barcelona<br />

Several techniques have been proposed so far in order to perform faint compact source detection in wide field interferometric<br />

radio images. However, all these methods can easily miss some detections or obtain a high number of false positive<br />

detections due to the low intensity of the sources, the noise level, and the interferometric patterns present in the images.

In this paper we present a novel strategy to tackle this problem. Our approach is based on using local features extracted<br />

from a bank of filters in order to provide a description of different types of faint source structures. We then perform a<br />

training step in order to automatically learn and select the most salient features, which are used in a Boosting classifier to<br />

perform the detection. The validity of our method is demonstrated using 19 real images that compose a radio mosaic. The<br />

comparison with two well-known state-of-the-art methods shows that our approach obtains more source detections while also reducing the number of false positives.

13:30-16:30, Paper ThBCT9.58<br />

Automatic Hair Detection in the Wild<br />

Julian, Pauline, IRIT, FittingBox<br />

Dehais, Christophe, FittingBox<br />

Lauze, Francois, Univ. of Copenhagen<br />

Charvillat, Vincent, IRIT<br />

Bartoli, Adrien, UdA<br />

Choukroun, Ariel, FittingBox<br />

This paper presents an algorithm for segmenting the hair region in images taken under uncontrolled, real-life conditions. Our method

is based on a simple statistical hair shape model representing the upper hair part. We detect this region by minimizing an<br />

energy which combines an active shape model and an active contour. The upper hair region then allows us to learn the hair appearance parameters

(color and texture) for the image considered. Finally, those parameters drive a pixel-wise segmentation technique<br />

that yields the desired (complete) hair region. We demonstrate the applicability of our method on several real images.<br />

13:30-16:30, Paper ThBCT9.59<br />

De-Noising of SRμCT Fiber Images by Total Variation Minimization<br />

Lindblad, Joakim, Swedish Univ. of Agricultural Sciences<br />

Sladoje, Natasa, Univ. of Novi Sad<br />

Lukic, Tibor, Univ. of Novi Sad<br />

SRμCT images of paper and pulp fiber materials are characterized by a low signal-to-noise ratio. De-noising is therefore a

common preprocessing step before segmentation into fiber and background components. We suggest a de-noising<br />

method based on total variation minimization using a modified Spectral Conjugate Gradient algorithm. Quantitative<br />

evaluation performed on synthetic 3D data and qualitative evaluation on real 3D paper fiber data confirm appropriateness<br />

of the suggested method for the particular application.<br />
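
A total-variation de-noising baseline is sketched below using the Chambolle solver in scikit-image rather than the modified Spectral Conjugate Gradient algorithm proposed above; it only illustrates the TV-minimisation preprocessing step on a 3D volume.

```python
# TV de-noising baseline via scikit-image (not the paper's SCG solver); weight is illustrative.
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def tv_denoise_volume(volume, weight=0.1):
    """volume: noisy 3D fiber image (float array); weight controls smoothing strength."""
    return denoise_tv_chambolle(np.asarray(volume, dtype=float), weight=weight)
```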
