Abstract book (pdf) - ICPR 2010
CONTENTS<br />
Organizing Committees<br />
Tracks and Co-Chairs<br />
Message from the General Chair<br />
Message from the Technical Program Chairs<br />
Technical Program Overview<br />
Technical Program for Monday<br />
Technical Program for Tuesday<br />
Technical Program for Wednesday<br />
Technical Program for Thursday<br />
Organizing Committees<br />
Conference Chair<br />
Aytül Erçil<br />
Sabanci University<br />
Turkey<br />
Technical Co-Chairs<br />
Kim Boyer<br />
Rensselaer<br />
Polytechnic Institute<br />
USA<br />
Müjdat Çetin<br />
Sabanci University<br />
Turkey<br />
Seong-Whan Lee<br />
Korea University<br />
Korea<br />
Advisory Committee<br />
Sergey Ablameyko<br />
National Academy of Sciences<br />
Belarus<br />
Hüseyin Abut<br />
San Diego<br />
State University<br />
USA<br />
Jake Aggarwal<br />
University of Texas<br />
USA<br />
Horst Bunke<br />
University of Bern<br />
Switzerland<br />
Rama Chellappa<br />
University of Maryland<br />
USA<br />
Igor B. Gurevich<br />
Russian Academy of Sciences<br />
Russia<br />
Anil K. Jain<br />
Michigan State University<br />
USA<br />
Takeo Kanade<br />
Carnegie Mellon University<br />
USA<br />
Rangachar Kasturi<br />
University of South Florida<br />
USA<br />
Josef Kittler<br />
University of Surrey<br />
UK<br />
Brian Lovell<br />
University of Queensland<br />
Australia<br />
Theo Pavlidis<br />
Stony Brook University<br />
USA<br />
Pietro Perona<br />
California Institute of Technology<br />
USA<br />
Fatih Porikli<br />
MERL<br />
USA<br />
Alberto Sanfeliu<br />
Polytechnic University of Catalonia<br />
Spain<br />
Bülent Sankur<br />
Bogazici University<br />
Turkey<br />
Bernhard Schölkopf<br />
Max Planck Institutes<br />
Germany<br />
Mubarak Shah<br />
University of Central Florida<br />
USA<br />
Tieniu Tan<br />
National Laboratory of<br />
Pattern Recognition<br />
China<br />
Sergios Theodoridis<br />
University of Athens<br />
Greece<br />
Plenary Speakers Committee<br />
Anil K. Jain<br />
Michigan State University<br />
USA<br />
Tutorials<br />
Denis Laurendeau<br />
Laval University<br />
Canada<br />
Arun Ross<br />
West Virginia University<br />
USA<br />
Birsen Yazıcı<br />
Rensselaer<br />
Polytechnic Institute<br />
USA<br />
Workshops<br />
Selim Aksoy<br />
Bilkent University<br />
Turkey<br />
Theo Gevers<br />
University of Amsterdam<br />
The Netherlands<br />
Denis Laurendeau<br />
Laval University<br />
Canada<br />
Bülent Sankur<br />
Bogazici University<br />
Turkey<br />
Contest Organization<br />
Selim Aksoy<br />
Bilkent University<br />
Turkey<br />
Zehra Çataltepe<br />
Istanbul Technical University<br />
Turkey<br />
Devrim Ünay<br />
Bahcesehir University<br />
Turkey<br />
Publicity<br />
Enis Çetin<br />
Bilkent University<br />
Turkey<br />
Pınar Duygulu Şahin<br />
Bilkent University<br />
Turkey<br />
Asian Liaisons<br />
Karthik Nandakumar<br />
Institute for Infocomm Research<br />
Singapore<br />
Yunhong Wang<br />
Beihang University<br />
China<br />
European Liaisons<br />
Javier Ortega-Garcia<br />
Universidad Autonoma de Madrid<br />
Spain<br />
Fabio Roli<br />
University of Cagliari<br />
Italy<br />
American Liaisons<br />
Deniz Erdoğmuş<br />
Northeastern University<br />
USA<br />
Publications<br />
Nafiz Arıca<br />
Naval Academy<br />
Turkey<br />
Cem Ünsalan<br />
Yeditepe University<br />
Turkey<br />
Local Arrangements<br />
Ayşın Baytan Ertüzün<br />
Bogazici University<br />
Turkey<br />
Mustafa Ünel<br />
Sabanci University<br />
Turkey<br />
Finance<br />
Gülbin Akgün<br />
Sabanci University<br />
Turkey<br />
Hakan Erdoğan<br />
Sabanci University<br />
Turkey<br />
Sponsorship<br />
Fatoş Yarman Vural<br />
Middle East Technical University<br />
Turkey<br />
Exhibits<br />
Olcay Kurşun<br />
Istanbul University<br />
Turkey
Tracks and Co-Chairs<br />
Track I: Computer Vision<br />
Joachim Buhmann<br />
ETH Zurich, Switzerland<br />
Xiaoyi Jiang<br />
University of Münster, Germany<br />
Jussi Parkkinen<br />
University of Joensuu, Finland<br />
Alper Yılmaz<br />
Ohio State University, USA<br />
Area Co-Chairs:<br />
Ahmet Ekin, Philips Research Europe, The Netherlands<br />
Georgy Gimel’farb, University of Auckland, New Zealand<br />
Muhittin Gökmen, Istanbul Technical University, Turkey<br />
Atsushi Imiya, Chiba University, Japan<br />
Nikos Paragios, Ecole Centrale de Paris, France<br />
Fatih Porikli, MERL, USA<br />
Sudeep Sarkar, University of South Florida, USA<br />
Bernt Schiele, TU Darmstadt, Germany<br />
Yaser Ajmal Sheikh, Carnegie Mellon, USA<br />
Dacheng Tao, Nanyang Technological University, Singapore<br />
Track II: Pattern Recognition and Machine Learning<br />
G. Sanniti di Baja<br />
Istituto di Cibernetica Eduardo Caianiello, Italy<br />
Mario Figueiredo<br />
Instituto Superior Técnico, Portugal<br />
Bilge Günsel<br />
Istanbul Technical University, Turkey<br />
D.Y. Yeung<br />
Hong Kong University of Science and Technology, China<br />
Area Co-Chairs:<br />
Ethem Alpaydın, Bogazici University, Turkey<br />
Gunilla Borgefors, CBA Uppsala, Sweden<br />
Yang Gao, Nanjing University, China<br />
Simone Marinai, University of Florence, Italy<br />
Aleix Martinez, The Ohio State University, USA<br />
Petr Somol, UTIA, Czech Republic<br />
Tolga Taşdizen, University of Utah, USA<br />
Zhi-Hua Zhou, Nanjing University, China<br />
Track III: Signal, Speech, Image and Video Processing<br />
Maria Petrou<br />
Imperial College, UK<br />
Kazuya Takeda<br />
Nagoya University, Japan<br />
Murat Tekalp<br />
Koc University, Turkey<br />
Jean-Philippe Thiran<br />
Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland<br />
Track IV:<br />
Biometrics and Human Computer Interaction<br />
Lale Akarun<br />
Bogazici University, Turkey<br />
Patrick Flynn<br />
University of Notre Dame, USA<br />
B. Vijaya Kumar<br />
Carnegie Mellon, USA<br />
Stan Z. Li<br />
Chinese Academy of Sciences, China<br />
Track V: Multimedia and Document Analysis,<br />
Processing and Retrieval<br />
Nozha Boujemaa<br />
INRIA, France<br />
David Doermann<br />
University of Maryland, USA<br />
B. S. Manjunath<br />
University of California, USA<br />
Nicu Sebe<br />
University of Trento, Italy<br />
Berrin Yanıkoğlu<br />
Sabanci University, Turkey<br />
Track VI: Bioinformatics and Biomedical Applications<br />
Rachid Deriche<br />
INRIA, France<br />
Tianzi Jiang<br />
Chinese Academy of Sciences, China<br />
Elena Marchiori<br />
Radboud University, Netherlands<br />
Dimitris Metaxas<br />
Rutgers, The State University of New Jersey, USA<br />
Gözde Ünal<br />
Sabanci University, Turkey<br />
Message from the General Chair<br />
It is my great honor and privilege to welcome all of you to the 20th International Conference on Pattern Recognition.<br />
Over the past 40 years, this conference has brought together the research communities of industry and academia from all over<br />
the world to discuss important issues, challenges, and solutions in pattern-recognition-related problems. The conference<br />
has established itself as a forum at which research as well as practical aspects of pattern recognition are enthusiastically<br />
addressed. We hope to continue this tradition by offering you another successful forum with an interesting program.<br />
Once again we have a very strong technical program, with technical sessions on computer vision, pattern recognition and<br />
machine learning, signal, speech, image and video processing, biometrics and human computer interaction, multimedia<br />
and document analysis, processing and retrieval, and bioinformatics and biomedical applications. We are also fortunate to have distinguished<br />
invited speakers: Christopher Bishop from Microsoft Research Cambridge, Shree K. Nayar from Columbia University, and<br />
Prabhakar Raghavan from Yahoo! Research, who will share their experiences and vision with us. The conference also has<br />
an extremely varied program: there will be 7 interesting tutorials that are an integral part of the program, as well as 9<br />
workshops that allow an even deeper focus on areas that are of interest to the conference participants. A new feature in the<br />
program this year is the organization of 9 contests which will provide a setting where participants will have the opportunity<br />
to evaluate their algorithms using publicly available datasets, and discuss technical topics in an atmosphere that fosters<br />
active exchange of ideas.<br />
A number of organizations, namely, Tüpraş (TR), Tübitak (TR), Havelsan (TR), Cybersoft (TR), Savronik (TR), Chryso<br />
(TR), Star Alliance (TR), Mitsubishi Electric Research Laboratories (USA), IBM Research (USA) and Elsevier (USA),<br />
kindly served as supporters of the Conference. We are most grateful to these organizations for their financial support and<br />
encouragement. The conference is technically co-sponsored by IEEE Computer Society, continuing our desire to seek<br />
closer collaboration between our two communities.<br />
During this period, I have had the opportunity to work closely with some of the best people in our community. We are extremely<br />
grateful to Prof. Müjdat Çetin and Osman Rahmi Fıçıcı, who worked day and night beyond their professional duties<br />
to make the conference a success. The success of any conference depends heavily on the quality of the selected papers.<br />
For selecting the best out of many excellent submitted papers, we are indebted to the technical co-chairs Müjdat Çetin,<br />
Kim Boyer and Seong-Whan Lee, all the track chairs, and the external referees for their hard work that has continued to<br />
uphold the high standard that is now customary for this conference series.<br />
Special thanks also to the conference organizing committee, in particular, Ayşın Ertüzün and Mustafa Ünel (Local arrangement<br />
chairs), Cem Ünsalan (Publication chair), Fatoş Yarman Vural (Sponsorship chair), Anil Jain (Plenary speakers chair),<br />
Hakan Erdoğan and Gülbin Akgün (Finance chairs), Olcay Kurşun (Exhibits chair), Denis Laurendeau, Arun Ross and<br />
Birsen Yazıcı (Tutorial chairs), Selim Aksoy, Theo Gevers, Denis Laurendeau and Bülent Sankur (Workshop chairs), Selim<br />
Aksoy, Zehra Çataltepe and Devrim Ünay (Contest chairs) and Pınar Duygulu, Enis Çetin (Publicity chairs). There would<br />
be no conference without them. We also thank IAPR ex-co members and past chairs of this event for their continued<br />
support and advice in helping us. We would also like to thank Sabancı University President Prof. Nihat Berker for his support<br />
and encouragement. Our special thanks go to the Teamcon staff members, who provided critical support overseeing<br />
all the logistics and making the smooth operation of the entire conference possible.<br />
Finally, no conference can ever take place without the support of those individuals who submit their original research results,<br />
or without the participants, who honor the conference with their presence.<br />
We hope that you will find the conference both enjoyable and valuable, and also enjoy the architectural, cultural and<br />
natural beauty of Istanbul, and Turkey.<br />
Aytül Erçil<br />
ICPR 2010 General Chair<br />
Sabancı University, Faculty of Engineering and Natural Sciences<br />
Message from the Technical Program Chairs<br />
The full technical program committee joins the three of us in welcoming you to the 2010 International Conference on Pattern<br />
Recognition in beautiful, fascinating İstanbul! This is the 20th edition of ICPR, world famous as the flagship conference<br />
of the International Association for Pattern Recognition. For nearly 40 years, ICPR has been the international forum<br />
for reporting the latest advances across a wide spectrum of fields including pattern recognition and machine learning,<br />
computer vision, image and signal understanding, medical image analysis, biometrics and human-computer interaction,<br />
multimedia and document analysis, bioinformatics and biomedical applications, and more.<br />
The conference program is the work of many people, whose names you will find in the accompanying lists. Track Chairs,<br />
in some cases supported by Area Chairs, pored over thoughtful, well-written reviews provided by an extensive set of referees<br />
drawn from the broad IAPR community. Preliminary decisions were funneled to a set of Track Chairs and Müjdat<br />
Çetin, who met in İstanbul to finalize the program. Papers submitted by Track Chairs were processed by Kim Boyer. General<br />
Chair’s and Technical Program Chairs’ papers were handled by a senior researcher and a separate set of reviewers in<br />
a process completely external to the main paper management system. Seong-Whan Lee took the point on awards.<br />
In all, we received 2140 submissions and accepted 1147, for an acceptance rate of 54%. Of the accepted papers, we were<br />
able to accommodate 385 for oral presentation and 762 as posters. This submission number continues an upward trend for<br />
ICPR, and underscores the health of our scientific community. A slight tightening of the acceptance rate ensures a high-quality<br />
meeting, and indeed was necessary to fit into the space and time constraints. It is, however, undoubtedly true that<br />
many quality submissions were left out. This is an unfortunate byproduct of the compressed time window in which such<br />
a large number of decisions need to be made.<br />
We thank all of the authors who took the time to prepare and submit their work. We are also deeply grateful to all of the<br />
reviewers, and especially the Track and Area Chairs who devoted so much time and expertise to bringing forth a quality<br />
meeting.<br />
We are confident that ICPR 2010 will prove to be a rewarding experience, both scientifically as you interact with others<br />
at the meeting, and culturally as you enjoy the rich heritage, local cuisine, crafts, shopping, and so much more that İstanbul<br />
has to offer.<br />
We look forward to seeing you during our time together, here where the continents meet.<br />
Müjdat Çetin, Kim Boyer, and Seong-Whan Lee<br />
Technical Program Chairs<br />
Technical Program for Monday<br />
August 23, 2010<br />
09:00-09:30, MoOT10 Anadolu Auditorium<br />
Opening Session<br />
09:30-10:30, MoP1L1 Anadolu Auditorium<br />
K.S. Fu Prize Lecture:<br />
Towards the Unification of Structural and Statistical Pattern Recognition<br />
Horst Bunke<br />
Plenary Session<br />
Research Group on Computer Vision and Artificial Intelligence (IAM)<br />
University of Bern, Switzerland<br />
Statistical pattern recognition is characterized by the use of feature vectors for pattern representation, while the structural<br />
approach is based on symbolic data structures, such as strings, trees, and graphs. Clearly, symbolic data structures have a<br />
higher representational power than feature vectors because they allow one to directly model relationships that may exist<br />
between the individual parts of a pattern. However, many operations that are needed in classification, clustering, and other<br />
pattern recognition tasks are not defined for graphs. Consequently, there has been a lack of algorithmic tools in the domain<br />
of structural pattern recognition since its beginning. This talk gives an overview of the development of the field of structural<br />
pattern recognition and shows various attempts to bridge the gap between statistical and structural pattern recognition, i.e.<br />
to make algorithmic tools originally developed for feature vectors applicable to symbolic data structures.<br />
MoAT1 Anadolu Auditorium<br />
Image Analysis - I Regular Session<br />
Session chair: Aksoy, Selim (Bilkent Univ.)<br />
11:00-11:20, Paper MoAT1.1<br />
Minimizing Geometric Distance by Iterative Linear Optimization<br />
Chen, Yisong, Peking Univ.<br />
Sun, Jiewei, Peking Univ.<br />
Wang, Guoping, Peking Univ.<br />
This paper proposes an algorithm that solves planar homography by iterative linear optimization. We iteratively employ the<br />
direct linear transformation (DLT) algorithm to robustly estimate the homography induced by a given set of point correspondences<br />
under perspective transformation. By simple on-the-fly homogeneous coordinate adjustment we progressively minimize<br />
the difference between the algebraic error and the geometric error. When the difference is sufficiently close to zero,<br />
the geometric error is equivalently minimized and the homography is reliably solved. Backward covariance propagation is<br />
employed to perform error analysis. The experiments show that the algorithm is able to find the global minimum despite erroneous<br />
initialization. It gives a very precise estimate at low computational cost and greatly outperforms existing techniques.<br />
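As context for the iterative scheme described above, a minimal sketch of the underlying DLT step is given below. This is a plain one-shot DLT, not the authors' iterative coordinate adjustment, and the test homography and point set are purely illustrative.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the direct
    linear transformation (DLT): two linear equations per correspondence,
    null vector taken from the SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # fix the scale ambiguity

# Four corners of a unit square mapped through a known homography
H_true = np.array([[1.2, 0.1, 3.0], [0.0, 0.9, -1.0], [1e-3, 2e-3, 1.0]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
pts = np.c_[src, np.ones(4)] @ H_true.T
dst = pts[:, :2] / pts[:, 2:]
H = dlt_homography(src, dst)
print(np.allclose(H, H_true, atol=1e-6))    # True
```

With noise-free correspondences the null space of the stacked system recovers the homography exactly, which is the baseline the paper's iterative reweighting improves on in the noisy case.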
11:20-11:40, Paper MoAT1.2<br />
Hyper Least Squares and its Applications<br />
Rangarajan, Prasanna, Southern Methodist Univ.<br />
Kanatani, Kenichi, Okayama Univ.<br />
Niitsuma, Hirotaka, Okayama Univ.<br />
Sugaya, Yasuyuki, Toyohashi Univ. of Tech.<br />
We present a new form of least squares (LS), called “hyper LS”, for geometric problems that frequently appear in computer<br />
vision applications. Doing rigorous error analysis, we maximize the accuracy by introducing a normalization that eliminates<br />
statistical bias up to second order noise terms. Our method yields a solution comparable to maximum likelihood (ML)<br />
without iterations, even in large noise situations where ML computation fails.<br />
11:40-12:00, Paper MoAT1.3<br />
Integrating a Discrete Motion Model into GMM based Background Subtraction<br />
Wolf, Christian, INSA de Lyon<br />
Jolion, Jean-Michel, Univ. de Lyon<br />
GMM based algorithms have become the de facto standard for background subtraction in video sequences, mainly because<br />
of their ability to track multiple background distributions, which allows them to handle complex scenes including moving<br />
trees, flags moving in the wind etc. However, it is not always easy to determine which distributions of the mixture belong<br />
to the background and which distributions belong to the foreground, which disturbs the results of the labeling process for<br />
each pixel. In this work we tackle this problem by taking the labeling decision jointly for all pixels of several consecutive<br />
frames, minimizing a global energy function that takes into account spatial and temporal relationships. A discrete approximate<br />
optical-flow-like motion model is integrated into the energy function, which is solved with Ishikawa’s convex graph cuts algorithm.<br />
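For reference, the per-pixel GMM baseline that such methods build on can be sketched as follows. This is a toy Stauffer-Grimson-style update for a single grayscale pixel; the learning rate, match threshold, and initial variance are illustrative assumptions, and the paper's spatio-temporal energy and graph-cut step are not modeled here.

```python
import numpy as np

def gmm_step(pixel, means, vars_, weights, lr=0.05, th=2.5):
    """One Stauffer-Grimson-style update of a per-pixel Gaussian mixture:
    match the pixel to the closest component (within th standard
    deviations), adapt that component, otherwise replace the weakest one.
    Returns True if the pixel matched an existing component."""
    z = np.abs(pixel - means) / np.sqrt(vars_)
    k = int(np.argmin(z))
    if z[k] < th:                                  # matched: adapt component k
        weights *= 1.0 - lr
        weights[k] += lr
        means[k] += lr * (pixel - means[k])
        vars_[k] += lr * ((pixel - means[k]) ** 2 - vars_[k])
        matched = True
    else:                                          # no match: replace weakest
        k = int(np.argmin(weights))
        means[k], vars_[k], weights[k] = pixel, 900.0, lr
        matched = False
    weights /= weights.sum()
    return matched

# A pixel that repeatedly observes a stable background value of 50
means, vars_ = np.array([50.0, 0.0]), np.array([900.0, 900.0])
weights = np.array([0.5, 0.5])
for _ in range(100):
    gmm_step(50.0, means, vars_, weights)
bg = gmm_step(50.0, means, vars_, weights)     # background-like observation
fg = gmm_step(200.0, means, vars_, weights)    # foreground-like observation
print(bg, fg)                                  # True False
```

Deciding which mixture components count as background is exactly the labeling ambiguity the paper addresses with its global energy formulation.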
12:00-12:20, Paper MoAT1.4<br />
Saliency based on Multi-Scale Ratio of Dissimilarity<br />
Huang, Rui, Huazhong Univ. of Science and Tech.<br />
Sang, Nong, Huazhong Univ. of Science and Tech.<br />
Liu, Leyuan, Huazhong Univ. of Science and Tech.<br />
Tang, Qiling, Huazhong Univ. of Science and Tech.<br />
Recently, many vision applications tend to utilize saliency maps derived from input images to guide them to focus on processing<br />
salient regions in images. In this paper, we propose a simple and effective method to quantify the saliency for<br />
each pixel in images. Specifically, we define the saliency for a pixel in a ratio form, where the numerator measures the<br />
number of dissimilar pixels in its center-surround and the denominator measures the total number of pixels in its center-surround.<br />
The final saliency is obtained by combining these ratios of dissimilarity over multiple scales. The<br />
saliency map generated by our method not only has high resolution but also looks more reasonable. Finally, we<br />
apply our saliency map to extract the salient regions in images, and compare the performance with some state-of-the-art<br />
methods over an established ground-truth which contains 1000 images.<br />
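The ratio idea can be illustrated with a naive single-channel sketch. The square windows standing in for the center-surround, the fixed dissimilarity threshold tau, and the radii are all assumptions for illustration; the paper's actual surround definition and combination rule may differ.

```python
import numpy as np

def ratio_saliency(img, radii=(1, 2, 4), tau=25.0):
    """Saliency of each pixel as the fraction of surround pixels whose
    intensity differs from the center by more than tau, averaged over
    several window radii (multi-scale ratio of dissimilarity)."""
    h, w = img.shape
    sal = np.zeros((h, w))
    for r in radii:
        for y in range(h):
            for x in range(w):
                win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
                # numerator: dissimilar pixels; denominator: window size
                sal[y, x] += np.mean(np.abs(win - img[y, x]) > tau)
    return sal / len(radii)

# A bright square on a dark background: saliency peaks near its boundary
img = np.zeros((16, 16))
img[6:10, 6:10] = 255.0
sal = ratio_saliency(img)
print(sal[7, 5] > sal[0, 0])   # boundary pixel beats a flat far-away pixel
```

Because the ratio is bounded in [0, 1] at every scale, the multi-scale average stays directly comparable across pixels.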
12:20-12:40, Paper MoAT1.5<br />
Online Principal Background Selection for Video Synopsis<br />
Feng, Shikun, Chinese Acad. of Sciences<br />
Liao, Shengcai, Chinese Acad. of Sciences<br />
Yuan, Zhiyong, Wuhan Univ.<br />
Li, Stan Z., Chinese Acad. of Sciences<br />
Video synopsis provides a means for fast browsing of activities in video. Principal background selection (PBS) is an important<br />
step in video synopsis. Existing methods make PBS in an offline way and at a high memory cost. In this paper we<br />
propose a novel background selection method, “online principal background selection” (OPBS). The OPBS selects n<br />
principal backgrounds from N backgrounds in an online fashion with a low memory cost, making it possible to build an<br />
efficient online video synopsis system. Another advantage is that, with OPBS, the selected backgrounds are related not only<br />
to background changes over time but also to video activities. Experimental results demonstrate the advantages of the proposed<br />
OPBS.<br />
MoAT2 Marmara Hall<br />
Support Vector Machines Regular Session<br />
Session chair: Alpaydin, Ethem (Bogazici Univ.)<br />
11:00-11:20, Paper MoAT2.1<br />
Large Margin Classifier based on Affine Hulls<br />
Cevikalp, Hakan, Eskisehir Osmangazi Univ.<br />
Yavuz, Hasan Serhan, Eskisehir Osmangazi Univ.<br />
This paper introduces a geometrically inspired large-margin classifier that can be a better alternative to the Support Vector<br />
Machines (SVMs) for classification problems with a limited number of training samples. In contrast to the SVM classifier,<br />
we approximate classes with affine hulls of their class samples rather than convex hulls, which may be unrealistically<br />
tight in high-dimensional spaces. To find the best separating hyperplane between any pair of classes approximated with<br />
the affine hulls, we first compute the closest points on the affine hulls and connect these two points with a line segment.<br />
The optimal separating hyperplane is chosen to be the hyperplane that is orthogonal to the line segment and bisects the<br />
line. To allow soft margin solutions, we first reduce affine hulls in order to alleviate the effects of outliers and then search<br />
for the best separating hyperplane between these reduced models. Multi-class classification problems are dealt with by constructing<br />
and combining several binary classifiers, as in SVM. The experiments on several databases show that the proposed<br />
method compares favorably with the SVM classifier.<br />
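The geometric core of the hard-margin case can be sketched as follows: a plain least-squares construction of the closest points between two affine hulls. Hull reduction for the soft margin and the multi-class machinery are omitted, and the data are toy illustrations.

```python
import numpy as np

def affine_hull(X, tol=1e-8):
    """Affine hull of row-sample matrix X as mean + orthonormal basis."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[s > tol * max(s.max(), 1.0)].T   # keep significant directions

def affine_hull_classifier(X1, X2):
    """Separating hyperplane between two affine hulls: find the closest
    points on the hulls by least squares, then take the perpendicular
    bisector of the segment joining them as the decision boundary."""
    mu1, U1 = affine_hull(X1)
    mu2, U2 = affine_hull(X2)
    coef = np.linalg.lstsq(np.hstack([U1, -U2]), mu2 - mu1, rcond=None)[0]
    p1 = mu1 + U1 @ coef[:U1.shape[1]]             # closest point on hull 1
    p2 = mu2 + U2 @ coef[U1.shape[1]:]             # closest point on hull 2
    w = p1 - p2
    b = w @ (p1 + p2) / 2.0
    return lambda x: 1 if x @ w - b > 0 else 2     # 1 iff on hull-1 side

X1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # class 1: line y = 0
X2 = np.array([[0.0, 3.0], [1.0, 3.0], [2.0, 3.0]])   # class 2: line y = 3
clf = affine_hull_classifier(X1, X2)
print(clf(np.array([5.0, 0.5])), clf(np.array([-3.0, 2.5])))   # 1 2
```

Note that two affine hulls in general position may intersect, in which case the segment degenerates; the paper's reduced hulls avoid this, whereas this sketch simply assumes disjoint hulls.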
11:20-11:40, Paper MoAT2.2<br />
2D Shape Recognition using Information Theoretic Kernels<br />
Bicego, Manuele, Univ. of Verona<br />
Torres Martins, André Filipe, Inst. Superior Técnico<br />
Murino, Vittorio, Univ. of Verona<br />
Aguiar, Pedro M. Q., Inst. for Systems and Robotics / Inst. Superior Tecnico<br />
Figueiredo, Mario A. T., Inst. Superior Técnico<br />
In this paper, a novel approach for contour based 2D shape recognition is proposed, using a class of information theoretic<br />
kernels recently introduced. These kernels, based on a non-extensive generalization of classical Shannon information<br />
theory, are defined on probability measures. In the proposed approach, chain code representations are first extracted<br />
from the contours; then n-gram statistics are computed and used as input to the information theoretic kernels. We tested<br />
different versions of such kernels, using support vector machine and nearest neighbor classifiers. An experimental evaluation<br />
on the Chicken Pieces dataset shows that the proposed approach significantly outperforms the current state-of-the-art<br />
methods.<br />
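The pipeline can be sketched as follows, with the caveat that a Jensen-Shannon kernel is used here as a simplified stand-in for the non-extensive (Tsallis-based) kernels of the paper, and the chain codes are made up for illustration.

```python
import numpy as np
from collections import Counter

def ngram_distribution(chain_code, n=2):
    """n-gram statistics of a contour chain code as a probability table."""
    grams = Counter(chain_code[i:i + n] for i in range(len(chain_code) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def js_kernel(p, q):
    """Jensen-Shannon kernel k(p, q) = 1 - JS(p, q) on probability tables
    (JS in bits, so k lies in [0, 1] and equals 1 iff p == q)."""
    keys = sorted(set(p) | set(q))
    P = np.array([p.get(k, 0.0) for k in keys])
    Q = np.array([q.get(k, 0.0) for k in keys])
    M = (P + Q) / 2.0
    def kl(a, b):
        nz = a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))
    return 1.0 - 0.5 * (kl(P, M) + kl(Q, M))

p = ngram_distribution("0011223344556677")   # one contour's chain code
q = ngram_distribution("0011223344556670")   # a nearly identical contour
r = ngram_distribution("7777000077770000")   # a very different contour
print(js_kernel(p, q) > js_kernel(p, r))     # True: similar shapes score higher
```

The resulting kernel values can be fed to an SVM or used directly for nearest-neighbor classification, matching the two classifier setups the abstract mentions.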
11:40-12:00, Paper MoAT2.3<br />
Time Series Classification using Support Vector Machine with Gaussian Elastic Metric Kernel<br />
Zhang, Dongyu, Harbin Inst. of Tech.<br />
Zuo, Wangmeng, Harbin Inst. of Tech.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Zhang, Hongzhi, Harbin Inst. of Tech.<br />
Motivated by the great success of dynamic time warping (DTW) in time series matching, the Gaussian DTW kernel was<br />
developed for support vector machine (SVM)-based time series classification. Counter-examples, however, were subsequently<br />
reported showing that the Gaussian DTW kernel usually cannot outperform the Gaussian RBF kernel in the SVM framework.<br />
In this paper, by extending the Gaussian RBF kernel, we propose one novel class of Gaussian elastic metric kernel (GEMK),<br />
and present two examples of GEMK: Gaussian time warp edit distance (GTWED) kernel and Gaussian edit distance with<br />
real penalty (GERP) kernel. Experimental results on UCR time series data sets show that, in terms of classification accuracy,<br />
SVM with GEMK is much superior to SVM with the Gaussian RBF kernel and the Gaussian DTW kernel, and to state-of-the-art<br />
similarity measure methods.<br />
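A minimal sketch of the kernel construction is given below, with plain DTW standing in for the elastic distance; TWED and ERP, the metrics the paper actually plugs in, have extra parameters omitted here, and DTW-based Gram matrices are not guaranteed positive semi-definite.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gaussian_elastic_kernel(a, b, sigma=1.0, dist=dtw):
    """Gaussian kernel on top of an elastic distance, exp(-d^2 / 2 sigma^2);
    the GEMK family plugs elastic metrics such as TWED or ERP into dist."""
    d = dist(a, b)
    return float(np.exp(-d * d / (2.0 * sigma ** 2)))

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])   # same shape, time-shifted
c = np.array([2.0, 2.0, 2.0, 2.0, 2.0])        # a flat, dissimilar series
print(gaussian_elastic_kernel(a, b) > gaussian_elastic_kernel(a, c))   # True
```

Because warping absorbs the time shift, the shifted copy of the series scores kernel value 1 while the flat series scores near 0.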
12:00-12:20, Paper MoAT2.4<br />
Multiplicative Update Rules for Multilinear Support Tensor Machines<br />
Kotsia, Irene, Queen Mary Univ. of London<br />
Patras, Ioannis, Queen Mary Univ. of London<br />
In this paper, we formulate the Multilinear Support Tensor Machines (MSTMs) problem in a way similar to the Non-negative<br />
Matrix Factorization (NMF) algorithm. A novel set of simple and robust multiplicative update rules is proposed in<br />
order to find the multilinear classifier. Update rules are provided for both hard and soft margin MSTMs, and the existence<br />
of a bias term is also investigated. We present results on standard gait and action datasets and report faster convergence with<br />
equivalent classification performance in comparison to standard MSTMs.<br />
12:20-12:40, Paper MoAT2.5<br />
Support Vectors Selection for Supervised Learning using an Ensemble Approach<br />
Guo, Li, Univ. of Bordeaux 3<br />
Boukir, Samia, Univ. of Bordeaux 3<br />
Chehata, Nesrine, Univ. of Bordeaux 3<br />
Support Vector Machines (SVMs) are popular for pattern classification. However, training an SVM requires large memory<br />
and long processing time, especially for large datasets, which limits its applicability. To speed up SVM training, we<br />
present a new efficient support vector selection method based on ensemble margin, a key concept in ensemble classifiers.<br />
This algorithm exploits a new version of the margin of ensemble-based classification and selects the smallest-margin<br />
instances as support vectors. Our experimental results show that our method reduces training set size significantly without<br />
degrading the performance of the resulting SVM classifiers.<br />
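The selection rule can be sketched as follows; bagged 1-NN classifiers stand in for the paper's ensemble, and the data, committee size, and margin definition details are illustrative assumptions.

```python
import numpy as np

def ensemble_margins(X, y, rounds=50, seed=0):
    """Ensemble margin of each training sample: (votes for its true class
    minus votes for the best other class) / committee size, under a bagged
    committee of 1-NN classifiers. Low-margin samples lie near the class
    boundary and are retained as candidate support vectors."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    col = np.searchsorted(classes, y)
    votes = np.zeros((len(X), len(classes)))
    for _ in range(rounds):
        idx = rng.integers(0, len(X), len(X))      # bootstrap replicate
        Xb, yb = X[idx], y[idx]
        for i, x in enumerate(X):
            label = yb[np.argmin(np.linalg.norm(Xb - x, axis=1))]
            votes[i, np.searchsorted(classes, label)] += 1
    true_v = votes[np.arange(len(X)), col]
    votes[np.arange(len(X)), col] = -1             # mask the true class
    return (true_v - votes.max(axis=1)) / rounds

# Two clusters whose last points sit in the gap between the classes
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.4, 0.5],
              [5.0, 0.0], [5.0, 1.0], [4.0, 1.0], [2.6, 0.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
m = ensemble_margins(X, y)
print(m.round(2))
```

Points deep inside a cluster are classified the same way by nearly every bootstrap replicate (margin near 1), while the two gap points flip labels across replicates and end up with the smallest margins, so they would be the ones kept for SVM training.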
MoAT3 Topkapı Hall A<br />
Motion and Multiple-View Vision – I Regular Session<br />
Session chair: Hancock, Edwin (Univ. of York)<br />
11:00-11:20, Paper MoAT3.1<br />
Estimating Apparent Motion on Satellite Acquisitions with a Physical Dynamic Model<br />
Huot, Etienne, INRIA and UVSQ<br />
Herlin, Isabelle, INRIA<br />
Mercier, Nicolas, INRIA<br />
Plotnikov, Evgeny, National Academy of Sciences, Ukraine<br />
The paper presents a motion estimation method based on data assimilation in a dynamic model, named Image Model, expressing<br />
the physical evolution of a quantity observed on the images. The application concerns the retrieval of apparent<br />
surface velocity from a sequence of satellite data, acquired over the ocean. The Image Model includes a shallow-water<br />
approximation for the dynamics of the velocity field (the evolutions of the two components of motion are linked by the<br />
water layer thickness) and a transport equation for the image field. For retrieving the surface velocity, a sequence of Sea<br />
Surface Temperature (SST) acquisitions is assimilated in the Image Model with a 4D-Var method. This is based on the<br />
minimization of a cost function including the discrepancy between model outputs and SST data and a regularization term.<br />
Several types of regularization norms have been studied. Results are discussed to analyze the impact of the different components<br />
of the assimilation system.<br />
11:20-11:40, Paper MoAT3.2<br />
Multiple View Geometries for Mirrors and Cameras<br />
Fujiyama, Shinji, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
In this paper, we analyze the multiple view geometry for a camera and mirrors, and propose a method for computing the<br />
geometry of the camera and mirrors accurately from fewer corresponding points than the existing methods. The geometry<br />
between a camera and mirrors can be described as the multiple view geometry for a real camera and virtual cameras. We<br />
show that very strong constraints on geometries can be obtained in addition to the ordinary multilinear constraints. By<br />
using these constraints, we can estimate multiple view geometry more accurately from fewer corresponding points than<br />
usual. The experimental results show the efficiency of the proposed method.<br />
11:40-12:00, Paper MoAT3.3<br />
Perspective Reconstruction and Camera Auto-Calibration as Rectangular Polynomial Eigenvalue Problem<br />
Pernek, Ákos, MTA SZTAKI, BME<br />
Hajder, Levente, MTA SZTAKI<br />
Motion-based 3D reconstruction (SfM) with missing data has been a challenging computer vision task since the late 90s.<br />
Under perspective camera model, one of the most difficult problems is camera auto-calibration which means determining<br />
the intrinsic camera parameters without using any known calibration object or assuming special properties of the scene.<br />
This paper presents a novel algorithm to perform camera auto-calibration from multiple images while dealing with the missing<br />
data problem. The method supposes semi-calibrated cameras (every intrinsic camera parameter except for the focal<br />
length is considered to be known) and constant focal length over all the images. The solution requires at least one image<br />
pair having at least eight common measured points. Tests verified that the algorithm is numerically stable and produces<br />
accurate results both on synthetic and real test sequences.<br />
12:00-12:20, Paper MoAT3.4<br />
Multi-Camera Platform Calibration using Multi-Linear Constraints<br />
Nyman, Patrik, Lund Univ.<br />
Heyden, Anders, Lund Univ.<br />
Astroem, Kalle, Lund Univ.<br />
We present a novel calibration method for multi-camera platforms, based on multi-linear constraints. The calibration<br />
method can recover the relative orientation between the different cameras on the platform, even when there are no corresponding<br />
feature points between the cameras, i.e. there are no overlaps between the cameras. It is shown that two translational<br />
motions in different directions are sufficient to linearly recover the rotational part of the relative orientation. Then<br />
two general motions, including both translation and rotation, are sufficient to linearly recover the translational part of the<br />
relative orientation. However, as a consequence of the speed-scale ambiguity, the absolute scale of the translational part<br />
cannot be determined if no prior information about the motions is known, e.g. from dead reckoning. It is shown that in the<br />
case of planar motion, the vertical component of the translational part cannot be determined. However, if at least one<br />
feature point can be seen in two different cameras, this vertical component can also be estimated. Finally, the performance<br />
of the proposed method is shown in simulated experiments.<br />
12:20-12:40, Paper MoAT3.5<br />
A Game-Theoretic Approach to Robust Selection of Multi-View Point Correspondence<br />
Rodolà, Emanuele, Univ. Ca’ Foscari Venezia<br />
Albarelli, Andrea, Univ. Ca’ Foscari di Venezia<br />
Torsello, Andrea, Univ. Ca’ Foscari<br />
In this paper we introduce a robust matching technique that allows very accurate selection of corresponding feature points<br />
from multiple views. Robustness is achieved by enforcing global geometric consistency at an early stage of the matching<br />
process, without the need for subsequent verification through reprojection. Global consistency is reduced to pairwise<br />
compatibility by making use of the size and orientation information provided by common feature descriptors, thus projecting<br />
what is a high-order compatibility problem into a pairwise setting. A game-theoretic approach is then used to select a<br />
maximally consistent set of candidate matches, where highly compatible matches are enforced while incompatible correspondences<br />
are driven to extinction.<br />
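The selection process described above, in which highly compatible matches reinforce one another while incompatible ones die out, can be sketched with discrete replicator dynamics over a pairwise compatibility matrix. This is a minimal illustration, not the authors' implementation; the matrix `A` below is a made-up example:<br />

```python
def replicator_dynamics(A, iters=100):
    """Discrete replicator dynamics over a non-negative pairwise
    compatibility (payoff) matrix A. Population mass concentrates on
    mutually compatible candidate matches; incompatible ones go extinct."""
    n = len(A)
    x = [1.0 / n] * n                       # uniform initial population
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        avg = sum(x[i] * Ax[i] for i in range(n))
        if avg == 0.0:
            break
        x = [x[i] * Ax[i] / avg for i in range(n)]  # replicator update
    return x

# Hypothetical example: candidates 0 and 1 support each other,
# candidate 2 is incompatible with both and should be driven to extinction.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
support = replicator_dynamics(A)
```

Candidates whose final mass exceeds a small threshold form the selected, mutually consistent match set.<br />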
MoAT4 Dolmabahçe Hall A<br />
Ensemble Learning Regular Session<br />
Session chair: Roli, Fabio (Univ. of Cagliari)<br />
11:00-11:20, Paper MoAT4.1<br />
A Bias-Variance Analysis of Bootstrapped Class-Separability Weighting for Error-Correcting Output Code Ensemble<br />
Smith, Raymond, Univ. of Surrey<br />
Windeatt, Terry, Univ. of Surrey<br />
We investigate the effects, in terms of a bias-variance decomposition of error, of applying class-separability weighting<br />
plus bootstrapping in the construction of error-correcting output code ensembles of binary classifiers. Evidence is presented<br />
to show that bias tends to be reduced at low training strength values whilst variance tends to be reduced across the full<br />
range. The relative importance of these effects, however, varies depending on the stability of the base classifier type.<br />
11:20-11:40, Paper MoAT4.2<br />
Multi-Class AdaBoost with Hypothesis Margin<br />
Jin, Xiaobo, Chinese Acad. of Sciences<br />
Hou, Xinwen, Chinese Acad. of Sciences<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
Most AdaBoost algorithms for multi-class problems, such as AdaBoost.MH and LogitBoost, have to decompose the multi-class<br />
classification into multiple binary problems. This paper proposes a new multi-class AdaBoost algorithm based<br />
on the hypothesis margin, called AdaBoost.HM, which directly combines multi-class weak classifiers. The hypothesis margin<br />
maximizes the output for the positive class while minimizing the maximal output over the negative classes. We<br />
discuss the upper bound of the training error for AdaBoost.HM and for a previous multi-class learning algorithm, AdaBoost.M1.<br />
Our experiments using feedforward neural networks as weak learners show that the proposed AdaBoost.HM<br />
yields higher classification accuracies than AdaBoost.M1 and AdaBoost.MH while remaining computationally efficient<br />
in training.<br />
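The hypothesis margin described above can be sketched per sample as the output of the true class minus the maximal output among the competing classes (a minimal stand-alone illustration, not the authors' code):<br />

```python
def hypothesis_margin(outputs, label):
    """Hypothesis margin of one sample: the classifier output for the
    true class minus the maximal output among all competing classes.
    A positive margin means the sample is correctly classified."""
    rival = max(s for k, s in enumerate(outputs) if k != label)
    return outputs[label] - rival

# Sample with true class 0 is classified correctly (positive margin);
# sample with true class 2 is not (negative margin).
m_good = hypothesis_margin([0.7, 0.2, 0.1], 0)
m_bad = hypothesis_margin([0.3, 0.4, 0.3], 2)
```

A boosting scheme in this spirit would reweight samples to increase this margin rather than a one-vs-rest binary margin.<br />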
11:40-12:00, Paper MoAT4.3<br />
A Score Decidability Index for Dynamic Score Combination<br />
Lobrano, Carlo, DIEE- Univ. of Cagliari<br />
Tronci, Roberto, Univ. of Cagliari<br />
Giacinto, Giorgio, Univ. of Cagliari<br />
Roli, Fabio, Univ. of Cagliari<br />
In two-class problems, the combination of the outputs (scores) of an ensemble of classifiers is widely used to attain high<br />
performance. Dynamic combination techniques, which estimate the combination parameters on a pattern-by-pattern basis,<br />
usually provide better performance than static combination techniques. In this paper, we propose an Index of Decidability,<br />
derived from the Wilcoxon-Mann-Whitney statistic, which is used to estimate the combination parameters. Reported<br />
results on a multimodal biometric dataset show the effectiveness of the proposed dynamic combination mechanisms<br />
in terms of misclassification errors.<br />
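For context, the Wilcoxon-Mann-Whitney statistic on which the index is based equals the empirical AUC: the probability that a randomly chosen positive score exceeds a randomly chosen negative one. A minimal sketch of that statistic (the paper's derived index itself is not reproduced here):<br />

```python
def wmw_auc(pos_scores, neg_scores):
    """Wilcoxon-Mann-Whitney statistic: fraction of (positive, negative)
    score pairs in which the positive score is larger; ties count one
    half. Equivalent to the area under the empirical ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

auc = wmw_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Here 8 of the 9 positive/negative pairs are correctly ordered, so the statistic is 8/9.<br />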
12:00-12:20, Paper MoAT4.4<br />
AUC-Based Combination of Dichotomizers: Is Whole Maximization also Effective for Partial Maximization?<br />
Ricamato, Maria Teresa, Univ. degli Studi di Cassino<br />
Tortorella, Francesco, Univ. degli Studi di Cassino<br />
The combination of classifiers is an established technique for improving classification performance. When dealing with<br />
two-class classification problems, a frequently used performance measure is the area under the ROC curve (AUC), since<br />
it is more informative than accuracy. However, in many applications, such as medical or biometric ones, operating points<br />
with a false positive rate above a given value are of no practical use and thus irrelevant for evaluating the performance of<br />
the system. In these cases, performance should be measured by looking only at the interesting part of the ROC curve.<br />
Consequently, the optimization goal is to maximize only a part of the AUC instead of the whole area. In this paper we<br />
propose a method tailored to these situations, which builds a linear combination of two dichotomizers maximizing the<br />
partial AUC (pAUC). Another aim of the paper is to understand whether methods that maximize the AUC can also<br />
maximize the pAUC. An empirical comparison between algorithms maximizing the AUC and the proposed method shows<br />
that the latter is more effective for pAUC maximization than methods designed to globally optimize the AUC.<br />
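The evaluation quantity discussed above, the partial AUC, can be computed from scores by integrating the empirical ROC curve only up to the false-positive-rate bound of interest. A minimal sketch (not the paper's optimization method, just the measure):<br />

```python
def partial_auc(pos, neg, fpr_max):
    """Empirical partial AUC: area under the ROC curve restricted to
    false positive rates in [0, fpr_max], via trapezoidal integration."""
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    pts = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(1 for s in neg if s >= t) / len(neg)
        tpr = sum(1 for s in pos if s >= t) / len(pos)
        pts.append((fpr, tpr))
    area = 0.0
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        if f0 >= fpr_max:
            break                      # past the region of interest
        f_hi = min(f1, fpr_max)
        if f1 > f0:
            # interpolate the TPR at the clipped right edge
            t_hi = t0 + (t1 - t0) * (f_hi - f0) / (f1 - f0)
            area += 0.5 * (t0 + t_hi) * (f_hi - f0)
    return area

# A perfectly separating dichotomizer attains pAUC = fpr_max.
p = partial_auc([0.9, 0.8], [0.2, 0.1], 0.5)
```

With `fpr_max=1.0` this reduces to the ordinary AUC.<br />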
12:20-12:40, Paper MoAT4.5<br />
Random Prototypes-Based Oracle for Selection-Fusion Ensembles<br />
Armano, Giuliano, Univ. of Cagliari<br />
Hatami, Nima, Univ. of Cagliari<br />
Classifier ensembles based on the selection-fusion strategy have recently aroused considerable interest. The main idea underlying<br />
this strategy is to use miniensembles instead of monolithic base classifiers in an ensemble in order to improve the overall<br />
performance. This paper proposes a classifier selection method to be used in selection-fusion strategies. The method first<br />
splits the original classification problem according to some prototypes randomly selected from the training<br />
data, and then builds a classifier on each subset. The trained classifiers, together with an oracle used to switch between<br />
them, form a classifier-selection miniensemble. Compared with other methods used in the selection-fusion framework,<br />
the proposed method has proven to be more efficient in the decomposition process, with no limitation on the number of<br />
resulting partitions. Experimental results on some datasets from the UCI repository show the validity of the proposed method.<br />
MoAT5 Dolmabahçe Hall B<br />
Detection and Segmentation of Audio Signals Regular Session<br />
Session chair: Erdogan, Hakan (Sabanci Univ.)<br />
11:00-11:20, Paper MoAT5.1<br />
Noise-Robust Voice Activity Detector based on Hidden Semi-Markov Models<br />
Liu, Xianglong, Beihang Univ.<br />
Liang, Yuan, Beihang Univ.<br />
Lou, Yihua, Beihang Univ.<br />
Li, He, Beihang Univ.<br />
Shan, Baosong, Beihang Univ.<br />
This paper focuses on speech duration distributions, which are usually invariant to noise, and proposes a noise-robust,<br />
real-time voice activity detector (VAD) that uses a hidden semi-Markov model (HSMM) to explicitly model state durations.<br />
Motivated by statistical observations and tests on TIMIT and the IEEE sentence database, we use Weibull distributions<br />
to approximate state durations and estimate their parameters by maximum likelihood. The final VAD decision is made<br />
according to a likelihood ratio test (LRT) incorporating state prior knowledge and modified forward variables. An efficient<br />
recursion for calculating the modified forward variables is devised, and a dynamic adjustment scheme is used to update<br />
the parameters. Experiments on noisy speech data show that the proposed method performs more robustly and accurately<br />
than the standard ITU-T G.729B VAD and AMR2.<br />
11:20-11:40, Paper MoAT5.2<br />
Simultaneous Segmentation and Modelling of Signals based on an Equipartition Principle<br />
Panagiotakis, Costas, Univ. of Crete<br />
We propose a general framework for simultaneous segmentation and modelling of signals based on an Equipartition Principle<br />
(EP). According to the EP, the signal is divided into segments with equal reconstruction errors by selecting the most<br />
suitable model to describe each segment. In addition, by taking change detection on the signal model into account, an<br />
efficient signal reconstruction is also obtained. The model selection concerns both the kind and the order of the model.<br />
The proposed methodology is flexible with respect to different error criteria and signal features.<br />
11:40-12:00, Paper MoAT5.3<br />
Voice Activity Detection based on Complex Exponential Atomic Decomposition and Likelihood Ratio Test<br />
Deng, Shiwen, Harbin Inst. of Tech.<br />
Han, Jiqing, Harbin Inst. of Tech.<br />
Voice activity detection (VAD) algorithms based on Discrete Fourier Transform (DFT) coefficients are widely found<br />
in the literature. However, some shortcomings of modeling a signal in the DFT domain can easily degrade the performance<br />
of a VAD in noisy environments. To overcome this problem, this paper presents a novel approach that uses the complex<br />
coefficients derived from a complex exponential atomic decomposition of the signal. These coefficients are modeled by a<br />
complex Gaussian probability distribution, and a statistical model is employed to derive the decision rule from the<br />
likelihood ratio test. According to the experimental results, the proposed VAD method shows better performance than the<br />
VAD based on DFT coefficients in various noise environments.<br />
12:00-12:20, Paper MoAT5.4<br />
Speaker Change Detection based on the Pairwise Distance Matrix<br />
Seo, Jin S., Gangneung-Wonju National Univ.<br />
Speaker change detection is most commonly done by statistically determining whether two adjacent segments of a<br />
speech stream are significantly different or not. In this paper, we propose a novel method to detect speaker change points<br />
based on the minimum statistics of the pairwise distance matrix of feature vectors. The use of the minimum statistics<br />
makes it possible to compare similar acoustic groups, which is effective in suppressing phonetic variation.<br />
Experimental results showed that the proposed method is promising for the speaker change detection problem.<br />
12:20-12:40, Paper MoAT5.5<br />
Real-Time User Position Estimation in Indoor Environments using Digital Watermarking for Audio Signals<br />
Kaneto, Ryosuke, Osaka Univ.<br />
Nakashima, Yuta, Osaka Univ.<br />
Babaguchi, Noboru, Osaka Univ.<br />
In this paper, we propose a method for estimating the position of a user holding a microphone in an indoor environment<br />
using digital watermarking for audio signals. The proposed method utilizes detection strengths, which are calculated<br />
while detecting spread-spectrum-based watermarks. Taking into account delays and attenuation of the watermarked<br />
signals emitted from multiple loudspeakers, among other factors, we construct a model of the detection strengths. The user<br />
position is estimated in real time using this model. The experimental results indicate that user positions are estimated with<br />
a root mean squared error of 1.3 m on average when the user is static. We demonstrate that the proposed method<br />
successfully estimates the user position even when the user moves.<br />
MoAT6 Topkapı Hall B<br />
Human Computer Interaction Regular Session<br />
Session chair: Drygajlo, Andrzej (EPFL)<br />
11:00-11:20, Paper MoAT6.1<br />
Gaze Probing: Event-Based Estimation of Objects being Focused On<br />
Yonetani, Ryo, Kyoto Univ.<br />
Kawashima, Hiroaki, Kyoto Univ.<br />
Hirayama, Takatsugu, Kyoto Univ.<br />
Matsuyama, Takashi, Kyoto Univ.<br />
We propose a novel method to estimate the object that a user is focusing on by using the synchronization between the<br />
movements of objects and a user’s eyes as a cue. We first design an event as a characteristic motion pattern, and we then<br />
embed it within the movement of each object. Since the user’s ocular reactions to these events are easily detected using a<br />
passive camera-based eye tracker, we can successfully estimate the object that the user is focusing on as the one whose<br />
movement is most synchronized with the user’s eye reaction. Experimental results obtained from the application of this<br />
system to dynamic content (consisting of scrolling images) demonstrate the effectiveness of the proposed method over<br />
existing methods.<br />
11:20-11:40, Paper MoAT6.2<br />
A Covariate Shift Minimisation Method to Alleviate Non-Stationarity Effects for an Adaptive Brain-Computer Interface<br />
Satti, Abdul Rehman, Univ. of Ulster<br />
Guan, Cuntai, Inst. For Infocomm Res.<br />
Coyle, Damien, Univ. of Ulster<br />
Prasad, Girijesh, Univ. of Ulster<br />
The non-stationary nature of the electroencephalogram (EEG) poses a major challenge for the successful operation of a<br />
brain-computer interface (BCI) when deployed over multiple sessions. The changes between the early training measurements<br />
and the proceeding multiple sessions can originate as a result of alterations in the subject’s brain process, new<br />
cortical activities, change of recording conditions and/or change of operation strategies by the subject. These differences<br />
and alterations over multiple sessions cause deterioration in BCI system performance if periodic or continuous adaptation<br />
to the signal processing is not carried out. In this work, the covariate shift is analyzed over multiple sessions to determine<br />
the non-stationarity effects and an unsupervised adaptation approach is employed to account for the degrading effects this<br />
might have on performance. To improve the system’s online performance, we propose a covariate shift minimization<br />
(CSM) method, which takes into account the distribution shift in the feature-set domain to reduce the feature-set overlap<br />
and imbalance between classes. The analysis and the results demonstrate the importance of CSM, as this method not<br />
only improves the accuracy of the system but also significantly reduces the classification imbalance between<br />
classes.<br />
11:40-12:00, Paper MoAT6.3<br />
A Probabilistic Language Model for Hand Drawings<br />
Akce, Abdullah, Univ. of Illinois at Urbana-Champaign<br />
Bretl, Timothy, Univ. of Illinois at Urbana-Champaign<br />
Probabilistic language models are critical to applications in natural language processing that include speech recognition,<br />
optical character recognition, and interfaces for text entry. In this paper, we present a systematic way to learn a similar<br />
type of probabilistic language model for hand drawings from a database of existing artwork by representing each stroke<br />
as a sequence of symbols. First, we propose a language in which the symbols are circular arcs with length fixed by a scale<br />
parameter and with curvature chosen from a fixed low-cardinality set. Then, we apply an algorithm based on dynamic<br />
programming to represent each stroke of the drawing as a sequence of symbols from our alphabet. Finally, we learn the<br />
probabilistic language model by constructing a Markov model. We compute the entropy of our language on a test set,<br />
measured as the expected number of bits required per symbol. Our language model might be applied in future work<br />
to create a drawing interface for noisy and low-bandwidth input devices, for example an electroencephalograph (EEG)<br />
that admits one binary command per second. The results indicate that by leveraging our language model, the performance<br />
of such an interface would be enhanced by about 20 percent.<br />
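The bits-per-symbol measure described above can be sketched with a smoothed bigram Markov model. This is a generic illustration under stated assumptions: the paper's symbols are circular arcs, whereas plain characters and add-one smoothing stand in here:<br />

```python
from collections import Counter
from math import log2

def train_bigram(sequences):
    """Count unigram contexts and bigram transitions over symbol sequences."""
    uni, bi = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            uni[a] += 1
            bi[a, b] += 1
    return uni, bi

def bits_per_symbol(test_sequences, uni, bi, alphabet_size, alpha=1.0):
    """Cross-entropy in bits per symbol of a test set under the
    add-alpha smoothed bigram model."""
    bits, count = 0.0, 0
    for seq in test_sequences:
        for a, b in zip(seq, seq[1:]):
            p = (bi[a, b] + alpha) / (uni[a] + alpha * alphabet_size)
            bits -= log2(p)
            count += 1
    return bits / count

# Toy two-symbol alphabet: a strictly alternating sequence is highly
# predictable, so the entropy falls well below 1 bit per symbol.
uni, bi = train_bigram(["ababab"])
h = bits_per_symbol(["ababab"], uni, bi, alphabet_size=2)
```

Lower bits per symbol means fewer binary commands needed per drawing stroke, which is the source of the interface speed-up the abstract reports.<br />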
12:00-12:20, Paper MoAT6.4<br />
AR-PCA-HMM Approach for Sensorimotor Task Classification in EEG-Based Brain-Computer Interfaces<br />
Argunsah, Ali Ozgur, Inst. Gulbenkian de Ciencia<br />
Cetin, Mujdat, Sabanci Univ.<br />
We propose an approach based on Hidden Markov models (HMMs) combined with principal component analysis (PCA)<br />
for classification of four-class single trial motor imagery EEG data for brain computer interfacing (BCI) purposes. We extract<br />
autoregressive (AR) parameters from EEG data and use PCA to decrease the number of features for better training<br />
of HMMs. We present experimental results demonstrating the improvements provided by our approach over an existing<br />
HMM-based EEG single trial classification approach as well as over state-of-the-art classification methods.<br />
12:20-12:40, Paper MoAT6.5<br />
Design, Implementation and Evaluation of a Real-Time P300-Based Brain-Computer Interface System<br />
Amcalar, Armagan, Sabanci Univ.<br />
Cetin, Mujdat, Sabanci Univ.<br />
We present a new end-to-end brain-computer interface system based on electroencephalography (EEG). Our system exploits<br />
the P300 signal in the brain, a positive deflection in event-related potentials, caused by rare events. P300 can be<br />
used for various tasks, perhaps the most well-known being a spelling device. We have designed a flexible visual stimulus<br />
mechanism that can be adapted to user preferences and developed and implemented EEG signal processing, learning and<br />
classification algorithms. Our classifier is based on Bayes linear discriminant analysis, in which we have explored various<br />
choices and improvements. We have designed data collection experiments for offline and online decision-making and<br />
have proposed modifications in the stimulus and decision-making procedure to increase online efficiency. We have evaluated<br />
the performance of our system on 8 healthy subjects on a spelling task and have observed that our system achieves<br />
higher average speed than state-of-the-art systems reported in the literature for a given classification accuracy.<br />
MoAT7 Dolmabahçe Hall C<br />
Video Classification and Retrieval Regular Session<br />
Session chair: Sarkar, Sudeep (Univ. of South Florida)<br />
11:00-11:20, Paper MoAT7.1<br />
Motion-Sketch based Video Retrieval using a Trellis Levenshtein Distance<br />
Hu, Rui, Univ. of Surrey<br />
Collomosse, John Philip, Univ. of Surrey<br />
We present a fast technique for retrieving video clips using free-hand sketched queries. Visual keypoints within each video<br />
are detected and tracked to form short trajectories, which are clustered to form a set of space-time tokens summarising<br />
video content. A Viterbi process matches a space-time graph of tokens to a description of colour and motion extracted<br />
from the query sketch. Inaccuracies in the sketched query are ameliorated by computing path cost using a Levenshtein<br />
(edit) distance. We evaluate over datasets of sports footage.<br />
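The Levenshtein (edit) distance used for the path cost above can be sketched with the standard dynamic-programming recurrence (a generic implementation over plain strings, not the trellis variant of the paper):<br />

```python
def levenshtein(a, b):
    """Levenshtein (edit) distance: minimum number of insertions,
    deletions and substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

d = levenshtein("kitten", "sitting")  # classic example: 3 edits
```

In the paper this cost is evaluated over token sequences inside a Viterbi trellis rather than over raw strings.<br />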
11:20-11:40, Paper MoAT7.2<br />
Extracting Key Sub-Trajectory Features for Supervised Tactic Detection in Sports Video<br />
Zhang, Yi, Chinese Acad. of Sciences<br />
Xu, Changsheng, Chinese Acad. of Sciences<br />
Lu, Hanqing, Chinese Acad. of Sciences<br />
Tactic analysis is receiving more attention in sports video analysis for its assistance to coaches and players. This paper<br />
proposes an efficient key sub-trajectory feature representation of ball trajectories for tactic analysis. Ball trajectories are<br />
modeled with a generalized suffix tree, in which frequent sub-trajectory patterns are searched for. Key sub-trajectory patterns<br />
are extracted by further filtering these frequent sub-trajectory patterns. Instead of directly using individual sub-trajectories<br />
as features to train tactic detectors, we take key sub-trajectory patterns as a whole. Key sub-trajectory feature representation<br />
effectively removes noise, reduces the dimension of features, and improves the performance of supervised learning to<br />
detect tactics.<br />
11:40-12:00, Paper MoAT7.3<br />
A New Symmetry based on Proximity of Wavelet-Moments for Text Frame Classification in Video<br />
Palaiahnakote, Shivakumara, National Univ. of Singapore<br />
Dutta, Anjan, Univ. Autonoma de Barcelona<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Pal, Umapada, Indian Statistical Inst.<br />
This paper proposes the use of a new symmetry property based on the proximity of median moments in the wavelet domain.<br />
The method divides a given frame into 16 equally sized blocks in order to classify true text frames. The average of the<br />
high-frequency subbands of a block is used for computing median moments, which brighten the text pixels in a block of a<br />
video frame. K-means clustering with K=2 is then applied to the median moments of the block to classify it as a probable<br />
text block. For the classified blocks, average wavelet median moments are computed over a sliding window. We introduce<br />
a Max-Min cluster to classify the probable text pixels in each probable text block. Four quadrants are formed around the<br />
centroid of the probable text pixels. The new concept, called symmetry, is introduced to identify true text blocks based on<br />
the proximity between probable text pixels in each quadrant. If the frame produces at least one true text block, it is<br />
considered a text frame; otherwise, it is a non-text frame. The method is tested on three datasets to evaluate its robustness<br />
in classifying text frames in terms of recall and precision.<br />
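The K=2 clustering step described above can be sketched for scalar features with a minimal two-centre k-means (the input values below are placeholders, not the paper's median moments):<br />

```python
def two_means_1d(values, iters=100):
    """K-means with K=2 on scalar values, initialised at the extremes.
    Returns the two cluster centres as (low, high)."""
    c = [min(values), max(values)]
    for _ in range(iters):
        low, high = [], []
        for v in values:
            (low if abs(v - c[0]) <= abs(v - c[1]) else high).append(v)
        new = [sum(low) / len(low) if low else c[0],
               sum(high) / len(high) if high else c[1]]
        if new == c:
            break                      # assignments stable: converged
        c = new
    return c

# Two well-separated groups of moment-like values.
centres = two_means_1d([1.0, 2.0, 1.5, 10.0, 11.0, 10.5])
```

In the paper, the cluster with the larger centre would mark the probable text blocks.<br />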
12:00-12:20, Paper MoAT7.4<br />
Edge based Binarization for Video Text Images<br />
Zhou, Zhiwei, National Univ. of Singapore<br />
Li, Linlin, Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
This paper introduces an edge-based binarization method for video text images, especially images with complex<br />
backgrounds or low contrast. The method first detects the contour of the text, utilizes a local thresholding<br />
method to decide the inner side of the contour, and then fills the contour to form characters that are recognizable to<br />
OCR software. Experimental results show that our method is especially effective on images with complex backgrounds<br />
and low contrast.<br />
12:20-12:40, Paper MoAT7.5<br />
Detecting Group Turn Patterns in Conversations using Audio-Video Change Scale-Space<br />
Krishnan, Ravikiran, Univ. of South Florida<br />
Sarkar, Sudeep, Univ. of South Florida<br />
Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative<br />
to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and<br />
video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. We consider<br />
the evolution of the detected change, using the Bayesian Information Criterion (BIC) at multiple temporal scales, to build<br />
an audio-visual change scale-space. Peaks detected in this representation yield group-turn-based conversational changes at<br />
different temporal scales. Conversation overlaps, changes and their inferred models offer an intermediate-level description<br />
of meeting videos that can be useful for summarization and indexing of meetings. Results on the NIST meeting room<br />
dataset showed a true positive rate of 88%.<br />
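The BIC change criterion underlying the scale-space above can be sketched for a one-dimensional feature sequence: a candidate change point is accepted when two Gaussian segment models beat a single Gaussian by more than the model-complexity penalty. This is a generic 1-D illustration (audio change detection typically uses full-covariance models over MFCC vectors), not the authors' multi-scale construction:<br />

```python
from math import log

def delta_bic(x, t, lam=1.0):
    """Delta-BIC for candidate change point t in a 1-D sequence x:
    positive values favour modelling x[:t] and x[t:] as two Gaussians
    over a single Gaussian for the whole sequence."""
    def half_n_log_var(seg):
        n = len(seg)
        mu = sum(seg) / n
        var = sum((v - mu) ** 2 for v in seg) / n
        return 0.5 * n * log(var)
    penalty = lam * 0.5 * 2.0 * log(len(x))   # 2 free params: mean, variance
    return (half_n_log_var(x) - half_n_log_var(x[:t])
            - half_n_log_var(x[t:]) - penalty)

# A sequence with an abrupt level shift at index 12 scores positive there.
score = delta_bic([0.0, 1.0] * 6 + [10.0, 11.0] * 6, 12)
```

Sweeping `t` over the sequence at several window sizes and keeping the peaks gives the flavour of a change scale-space.<br />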
14:00-15:00, MoP2L1 Anadolu Auditorium<br />
Embracing Uncertainty: The New Machine Intelligence<br />
Christopher M. Bishop Plenary Session<br />
Microsoft Research Cambridge, UK<br />
Professor Chris Bishop is Chief Research Scientist at Microsoft Research Cambridge. He also has a Chair in computer<br />
science at the University of Edinburgh, and is a Fellow of Darwin College Cambridge. Chris is the author of the leading<br />
textbook “Pattern Recognition and Machine Learning” (Springer, 2006). His research interests include probabilistic approaches<br />
to machine learning, as well as their application to fields such as biomedical sciences and healthcare.<br />
The first successful applications of machine intelligence were based on expert systems constructed using rules elicited<br />
from human experts. Limitations in the applicability of this approach helped drive the second generation of machine intelligence<br />
methods, as typified by neural networks and support vector machines, which can be characterised as black-box statistical<br />
models fitted to large data sets. In this talk I will describe a new paradigm for machine intelligence, based on probabilistic<br />
graphical models, which has emerged over the last five years and which allows strong prior knowledge from<br />
domain experts to be combined with machine learning techniques to enable a new generation of large-scale applications.<br />
The talk will be illustrated with tutorial examples as well as real-world case studies.<br />
MoBT1 Marmara Hall<br />
Tracking and Surveillance – I Regular Session<br />
Session chair: Goldgof, Dmitry (Univ of South Florida)<br />
15:30-15:50, Paper MoBT1.1<br />
Improved Shadow Removal for Robust Person Tracking in Surveillance Scenarios<br />
Sanin, Andres, NICTA<br />
Sanderson, Conrad, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
Shadow detection and removal is an important step employed after foreground detection, in order to improve the segmentation<br />
of objects for tracking. Methods reported in the literature typically have a significant trade-off between the shadow<br />
detection rate (classifying true shadow areas as shadows) and the shadow discrimination rate (discrimination between<br />
shadows and foreground). We propose a method that is able to achieve good performance in both cases, leading to improved<br />
tracking in surveillance scenarios. Chromaticity information is first used to create a mask of candidate shadow pixels, followed<br />
by employing gradient information to remove foreground pixels that were incorrectly included in the mask. Experiments<br />
on the CAVIAR dataset indicate that the proposed method leads to considerable improvements in multiple object<br />
tracking precision and accuracy.<br />
15:50-16:10, Paper MoBT1.2<br />
Multi-Cue Integration for Multi-Camera Tracking<br />
Chen, Kuan-Wen, National Taiwan Univ.<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
For target tracking across multiple cameras with disjoint views, previous works usually employed multiple cues and<br />
focused on learning a better matching model for each cue separately. However, to the best of our knowledge, none of them<br />
has discussed how to integrate these cues to improve performance. In this paper, we look into the multi-cue integration<br />
problem and propose an unsupervised learning method, since a complicated training phase is not always viable. In the<br />
experiments, we evaluate several types of score fusion methods and show that our approach learns well and can be applied<br />
to large camera networks more easily.<br />
16:10-16:30, Paper MoBT1.3<br />
Learning Pedestrian Trajectories with Kernels<br />
Ricci, Elisa, Fondazione Bruno Kessler<br />
Tobia, Francesco, Fondazione Bruno Kessler<br />
Zen, Gloria, Fondazione Bruno Kessler<br />
We present a novel method for learning pedestrian trajectories which is able to describe complex motion patterns such as<br />
multiple crossing paths. The approach adopts Kernel Canonical Correlation Analysis (KCCA) to build a mapping between<br />
the physical location space and the trajectory pattern space. To model crossing paths, we rely on a clustering algorithm<br />
based on kernel K-means with a Dynamic Time Warping (DTW) kernel. We demonstrate the effectiveness of our method<br />
by incorporating the learned motion model into a multi-person tracking algorithm and testing it on several video surveillance<br />
sequences.<br />
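The DTW measure inside the kernel above can be sketched with the standard dynamic-programming recurrence (a generic scalar-sequence version, not the authors' trajectory-specific variant):<br />

```python
def dtw(a, b, dist=lambda u, v: abs(u - v)):
    """Dynamic Time Warping distance between two sequences: the minimal
    accumulated point-wise cost over all monotone alignments."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j],      # stretch a
                                                     D[i][j - 1],      # stretch b
                                                     D[i - 1][j - 1])  # advance both
    return D[n][m]

# Time-warped copies of the same path align with zero cost.
d_same = dtw([1, 2, 3], [1, 2, 2, 3])
```

A kernel such as `exp(-dtw(a, b) / sigma)` can then be plugged into kernel K-means, though DTW-based kernels are not guaranteed to be positive semi-definite.<br />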
16:30-16:50, Paper MoBT1.4<br />
Bag of Features Tracking<br />
Yang, Fan, Dalian Univ. of Tech.<br />
Lu, Hu-Chuan, Dalian Univ. of Tech.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
In this paper, we propose a visual tracking approach based on the “bag of features” (BoF) algorithm. We randomly sample<br />
image patches within the object region in training frames to construct two codebooks using RGB and LBP features,<br />
instead of the single codebook of traditional BoF. Tracking is accomplished by searching for the highest similarity between<br />
candidates and the codebooks. In addition, an updating mechanism and a result-refinement scheme are included in BoF<br />
tracking. We fuse the patch-based approach and the global template-based approach into a unified framework. Experiments<br />
demonstrate that our approach is robust in handling occlusion, scaling and rotation.<br />
16:50-17:10, Paper MoBT1.5<br />
Gradient Constraints Can Improve Displacement Expert Performance<br />
Tresadern, Philip Andrew, Univ. of Manchester<br />
Cootes, Tim, The Univ. of Manchester<br />
The ‘displacement expert’ has recently proven popular for rapid tracking applications. In this paper, we note that experts<br />
are typically constrained only to produce approximately correct parameter updates at training locations. However, we<br />
show that incorporating constraints on the gradient of the displacement field within the learning framework results in an<br />
expert with better convergence and fewer local minima. We demonstrate this proposal for facial feature localization in<br />
static images and object tracking over a sequence.<br />
MoBT2 Topkapı Hall B<br />
Dimensionality Reduction Regular Session<br />
Session chair: Somol, Petr (Institute of Information Theory and Automation)<br />
15:30-15:50, Paper MoBT2.1<br />
Temporal Extension of Laplacian Eigenmaps for Unsupervised Dimensionality Reduction of Time Series<br />
Lewandowski, Michal, Kingston Univ.<br />
Martinez-Del-Rincon, Jesus, Kingston Univ.<br />
Makris, Dimitrios, Kingston Univ.<br />
Nebel, Jean-Christophe, Kingston Univ.<br />
A novel non-linear dimensionality reduction method, called Temporal Laplacian Eigenmaps, is introduced to process<br />
time series data efficiently. In this embedding-based approach, temporal information is intrinsic to the objective function,<br />
which produces descriptions of low-dimensional spaces with time coherence between data points. Since the proposed<br />
scheme also includes a bidirectional mapping between the data and embedded spaces and automatic tuning of key parameters,<br />
it offers the same benefits as mapping-based approaches. Experiments on a couple of computer vision applications demonstrate<br />
the superiority of the new approach to other dimensionality reduction methods in terms of accuracy. Moreover, its<br />
lower computational cost and generalisation abilities suggest it is scalable to larger datasets.<br />
15:50-16:10, Paper MoBT2.2<br />
Orthogonal Locality Sensitive Fuzzy Discriminant Analysis in Sleep-Stage Scoring<br />
Khushaba, Rami N., Univ. of Tech. Sydney<br />
Elliott, Rosalind, Univ. of Tech. Sydney<br />
Alsukker, Akram, Univ. of Tech. Sydney<br />
Al-Ani, Ahmed, Univ. of Tech. Sydney<br />
Mckinley, Sharon, Univ. of Tech. Sydney<br />
Sleep-stage scoring plays an important role in analyzing the sleep patterns of people. Studies have revealed that Intensive Care Unit (ICU) patients do not usually get enough quality sleep, and hence analyzing their sleep patterns is of increased importance. Because sleep data are usually collected from a number of Electroencephalogram (EEG), Electromyogram (EMG) and Electrooculography (EOG) channels, the feature set size can become large, which may affect the development of on-line scoring systems. Hence, a dimensionality reduction step is needed. One of the powerful dimensionality reduction approaches is based on the concept of Linear Discriminant Analysis (LDA). Unlike existing variants of LDA, this paper presents a new method that considers the fuzzy nature of input measurements while preserving their local structure. Practical results indicate the significance of preserving the local structure of sleep data, which is achieved by the proposed method, thereby attaining results superior to those of other dimensionality reduction methods.<br />
16:10-16:30, Paper MoBT2.3<br />
A Recursive Online Kernel PCA Algorithm<br />
Hasanbelliu, Erion, Univ. of Florida<br />
Sanchez-Giraldo, Luis Gonzalo, Univ. of Florida<br />
Principe, Jose, Univ. of Florida<br />
In this paper, we describe a new method for performing kernel principal component analysis that is online and has a fast convergence rate. The method follows the Rayleigh quotient to obtain a fixed-point update rule for extracting the leading eigenvalue and eigenvector. Online deflation is used to estimate the remaining components. These operations are performed in a reproducing kernel Hilbert space (RKHS) with linear-order memory and computational complexity. The derivation of the method and several applications are presented.<br />
16:30-16:50, Paper MoBT2.4<br />
Effective Dimensionality Reduction based on Support Vector Machine<br />
Moon, Sangwoo, Univ. of Tennessee<br />
Qi, Hairong, Univ. of Tennessee<br />
This paper presents an effective dimensionality reduction method based on the support vector machine. By utilizing mapping vectors from the support vector machine for dimensionality reduction purposes, we obtain features which are computationally efficient, providing high classification accuracy and robustness, especially in noisy environments. These characteristics are acquired from the generalization capability of the support vector machine, which minimizes the structural risk. To further reduce dimensionality, this paper introduces a redundancy removal process based on an asymmetric I relation measure with a kernel function. Experimental results show that the proposed dimensionality reduction method provides the most appropriate trade-off between classification accuracy and robustness in a relatively low dimensional space.<br />
16:50-17:10, Paper MoBT2.5<br />
Prototype Selection for Dissimilarity Representation by a Genetic Algorithm<br />
Plasencia, Yenisel, CENATAV, Cuba<br />
Garcia, Edel, Advanced Tech. Application Center<br />
Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales, Colombia<br />
Duin, Robert, TU Delft<br />
Dissimilarities can be a powerful way to represent objects like strings, graphs and images for which it is difficult to find<br />
good features. The resulting dissimilarity space may be used to train any classifier appropriate for feature spaces. There is,<br />
however, a strong need for dimension reduction. Straightforward procedures for prototype selection as well as feature selection<br />
have been used for this in the past. Complicated sets of objects may need more advanced procedures to overcome local minima.<br />
In this paper it is shown that genetic algorithms, previously used for feature selection, may be used for building good<br />
dissimilarity spaces as well, especially when small sets of prototypes are needed for computational reasons.<br />
MoBT3 Topkapı Hall A<br />
Motion and Multiple-View Vision – II Regular Session<br />
Session chair: Torsello, Andrea (Univ. Ca’ Foscari)<br />
15:30-15:50, Paper MoBT3.1<br />
Multiple View Geometry for Non-Rigid Motions Viewed from Curvilinear Motion Projective Cameras<br />
Wan, Cheng, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
This paper presents a tensorial representation of multiple projective cameras with arbitrary curvilinear motions. It enables us to define the multilinear relationship of image points derived from non-rigid object motions viewed from multiple cameras with arbitrary curvilinear motions. We show that the new multilinear relationship is useful for generating images of non-rigid object motions viewed from cameras with arbitrary curvilinear motions. The method is tested on real image sequences.<br />
15:50-16:10, Paper MoBT3.2<br />
Estimating Nonrigid Shape Deformation using Moments<br />
Liu, Wei, Florida Inst. of Tech.<br />
Ribeiro, Eraldo, Florida Inst. of Tech.<br />
Image moments have been widely used for designing robust shape descriptors that are invariant to rigid transformations. In this work, we address the problem of estimating non-rigid deformation fields based on image moment variations. By using a single family of polynomials both to parameterize the deformation field and to define image moments, we can represent image moment variations as a system of quadratic functions and solve for the deformation parameters. As a result, we can recover the deformation field between two images without solving the correspondence problem. Additionally, our method is highly robust to image noise. The method was tested on both synthetically deformed MPEG-7 shapes and cardiac MRI sequences.<br />
16:10-16:30, Paper MoBT3.3<br />
Optical Flow Estimation using Diffusion Distances<br />
Wartak, Szymon, Univ. of York<br />
Bors, Adrian, Univ. of York<br />
In this paper we apply the diffusion framework to dense optical flow estimation. Local image information is represented by matrices of gradients between paired locations. Diffusion distances are modelled as sums of eigenvectors weighted by their eigenvalues, extracted following the eigendecomposition of these matrices. Local optical flow is estimated by correlating diffusion distances characterizing features from different frames. A feature confidence factor is defined based on the local correlation efficiency compared to that of its neighbourhood. High-confidence optical flow estimates are propagated to areas of lower confidence.<br />
16:30-16:50, Paper MoBT3.4<br />
Novel Multi View Structure Estimation based on Barycentric Coordinates<br />
Ruether, Matthias, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Traditionally, multi-view stereo algorithms estimate three-dimensional structure from corresponding points by linear triangulation or bundle adjustment. This introduces systematic errors in the case of inaccurate camera calibration and partial occlusion. The errors are not negligible in applications requiring high accuracy, such as micro-metrology or quality inspection. We show how the accuracy of structure estimation can be significantly increased by using a barycentric coordinate representation for central perspective projection. Experiments show a reduction of geometric error by 50% compared with bundle adjustment. The error remains consistently low, even under partial occlusion.<br />
16:50-17:10, Paper MoBT3.5<br />
Estimation of Non-Rigid Surface Deformation using Developable Surface Model<br />
Watanabe, Yoshihiro, Univ. of Tokyo<br />
Nakashima, Takashi, Univ. of Tokyo<br />
Komuro, Takashi, Univ. of Tokyo<br />
Ishikawa, Masatoshi, Univ. of Tokyo<br />
There is a strong demand for a method of acquiring a non-rigid shape under deformation with high accuracy and high resolution. However, this is difficult to achieve because of performance limitations in measurement hardware. In this paper, we propose a model-based method for estimating the non-rigid deformation of a developable surface. The model is based on geometric characteristics of the surface, which are important in various applications. This method improves the accuracy of surface estimation and planar development from a low-resolution point cloud. Experiments using curved documents showed the effectiveness of the proposed method.<br />
MoBT4 Dolmabahçe Hall A<br />
Ocular Biometrics Regular Session<br />
Session chair: Zhang, David (The Hong Kong Polytechnic Univ.)<br />
15:30-15:50, Paper MoBT4.1<br />
On the Fusion of Periocular and Iris Biometrics in Non-Ideal Imagery<br />
Woodard, Damon, Clemson Univ.<br />
Pundlik, Shrinivas, Clemson Univ.<br />
Miller, Philip, Clemson Univ.<br />
Jillela, Raghavender, West Virginia Univ.<br />
Ross, Arun, West Virginia Univ.<br />
Human recognition based on the iris biometric is severely impacted when encountering non-ideal images of the eye characterized<br />
by occluded irises, motion and spatial blur, poor contrast, and illumination artifacts. This paper discusses the<br />
use of the periocular region surrounding the iris, along with the iris texture patterns, in order to improve the overall recognition<br />
performance in such images. Periocular texture is extracted from a small, fixed region of the skin surrounding the<br />
eye. Experiments on images extracted from the Near Infra-Red (NIR) face videos of the Multi Biometric Grand Challenge (MBGC) dataset demonstrate that valuable information is contained in the periocular region and that it can be fused with the iris texture to improve overall identification accuracy in non-ideal situations.<br />
15:50-16:10, Paper MoBT4.2<br />
Genetic-Based Type II Feature Extraction for Periocular Biometric Recognition: Less is More<br />
Adams, Joshua, North Carolina A&T Univ.<br />
Woodard, Damon, Clemson Univ.<br />
Dozier, Gerry, North Carolina A&T State Univ.<br />
Miller, Philip, Clemson Univ.<br />
Bryant, Kelvin, North Carolina A&T State Univ.<br />
Glenn, George, North Carolina A&T State Univ.<br />
Given an image from a biometric sensor, it is important for the feature extraction module to extract an original set of<br />
features that can be used for identity recognition. This form of feature extraction has been referred to as Type I feature extraction.<br />
For some biometric systems, Type I feature extraction is used exclusively. However, a second form of feature extraction<br />
does exist and is concerned with optimizing/minimizing the original feature set given by a Type I feature extraction<br />
method. This second form of feature extraction has been referred to as Type II feature extraction (feature selection). In<br />
this paper, we present a genetic-based Type II feature extraction system, referred to as GEFE (Genetic & Evolutionary Feature Extraction), for optimizing the feature sets returned by Local Binary Pattern Type I feature extraction for periocular biometric recognition. Our results show that not only does GEFE dramatically reduce the number of features needed, but the evolved feature sets also have higher recognition rates.<br />
16:10-16:30, Paper MoBT4.3<br />
Multispectral Eye Detection: A Preliminary Study<br />
Whitelam, Cameron, WVU<br />
Jafri, Zain, WVU<br />
Bourlai, Thirimachos, WVU<br />
In this paper, the problem of eye detection across three different bands, i.e., the visible, multispectral, and short-wave infrared (SWIR), is studied in order to illustrate the advantages and limitations of multi-band eye localization. The contributions of this work are two-fold. First, a multi-band database of 30 subjects is assembled and used to illustrate the challenges associated with the problem. Second, a set of experiments is performed in order to demonstrate the feasibility of multi-band eye detection. Experiments show that the eyes in face images captured under different bands can be detected with promising results. Finally, we illustrate that recognition performance in all studied bands is favorably affected by the geometric normalization of raw face images based on our proposed detection methodology. To the best of our knowledge, this is the first time that this problem has been investigated in the open literature in the context of human eye localization across different bands.<br />
16:30-16:50, Paper MoBT4.4<br />
Entropy of Feature Point-Based Retina Templates<br />
Jeffers, Jason, RMIT Univ.<br />
Arakala, Arathi, RMIT Univ.<br />
Horadam, Kathy, RMIT Univ.<br />
This paper studies the amount of distinctive information contained in a privacy-protecting and compact template of a retinal image created from the locations of crossings and bifurcations in the choroidal vasculature, otherwise called feature points. Using a training set of 20 different retinas, we build a template generator that simulates one million imposter comparisons and computes the number of imposter retina comparisons that successfully match at various thresholds. The template entropy thus computed was used to validate a theoretical model of imposter comparisons. The simulator and the model both estimate that 20 bits of entropy can be achieved by the feature point-based template. Our results reveal the distinctiveness of feature point-based retinal templates, hence establishing their potential as a biometric identifier for high-security and memory-intensive applications.<br />
16:50-17:10, Paper MoBT4.5<br />
Hierarchical Fusion of Face and Iris for Personal Identification<br />
Zhang, Xiaobo, Chinese Acad. of Sciences<br />
Sun, Zhenan, Chinese Acad. of Sciences<br />
Tan, Tieniu, Chinese Acad. of Sciences<br />
Most existing face and iris fusion schemes are concerned with improving performance on good-quality images acquired under controlled environments. In this paper, we propose a hierarchical fusion scheme for low-quality images acquired under uncontrolled situations. In the training stage, canonical correlation analysis (CCA) is adopted to construct a statistical mapping from face to iris at the pixel level. In the testing stage, the probe face image is first used to obtain a subset of candidate gallery samples via regression between the probe face and gallery irises; then, ordinal representation and sparse representation are performed on these candidate samples for iris recognition and face recognition, respectively. Finally, score-level fusion via min-max normalization is performed to make the final decision. Experimental results on our low-quality database show the superior performance of the proposed method.<br />
MoBT5 Anadolu Auditorium<br />
Image Analysis – II Regular Session<br />
Session chair: Mirmehdi, Majid (Univ. of Bristol)<br />
15:30-15:50, Paper MoBT5.1<br />
Wavelet-Based Texture Retrieval using a Mixture of Generalized Gaussian Distributions<br />
Allili, Mohand Said, Univ. du Québec en Outaouais<br />
In this paper, we address the texture retrieval problem using wavelet distributions. We propose a new statistical scheme to represent the marginal distribution of the wavelet coefficients using a mixture of generalized Gaussian distributions (MoGG). The MoGG captures a wide range of histogram shapes, which provides a better description of texture and enhances texture discrimination. We propose a similarity measure based on the Kullback-Leibler distance (KLD), which is calculated using the Metropolis-Hastings MCMC sampling algorithm. We show that our approach yields better texture retrieval results than previous methods that use only a single probability density function (pdf) for wavelet representation, or texture energy distribution.<br />
15:50-16:10, Paper MoBT5.2<br />
Adaptive Color Curve Models for Image Matting<br />
Cho, Sunyoung, Yonsei Univ.<br />
Byun, Hyeran, Yonsei Univ.<br />
Image matting is the process of extracting a foreground element from a single image with limited user input. To solve this inherently ill-posed problem, various methods exist that use a specific color model. One representative method assumes that the colors of the foreground and background elements satisfy a linear color model. Another recent method considers line-point and point-point color models. In this paper we present a new adaptive color curve model for image matting. We assume that the colors of a local region form a curve. Based on the pixels in the local region, we adaptively construct a curve model using a quadratic Bezier curve. This curve model enables us to derive a matting equation, using the quadratic formula, for estimating the alphas of pixels forming a curve. We show that our model estimates alpha mattes comparably to, or more accurately than, recent existing methods.<br />
16:10-16:30, Paper MoBT5.3<br />
Fast and Accurate Approximation of the Euclidean Opening Function in Arbitrary Dimension<br />
Coeurjolly, David, CNRS – Univ. Claude Bernard Lyon 1<br />
In this paper, we present a fast and accurate approximation of the Euclidean opening function, a widely used tool in mathematical morphology for analyzing binary shapes, since it allows us to define a local thickness distribution. The proposed algorithm can be defined in arbitrary dimension thanks to existing techniques for computing the discrete power diagram.<br />
16:30-16:50, Paper MoBT5.4<br />
Non-Ring Filters for Robust Detection of Linear Structures<br />
Läthén, Gunnar, Linköping Univ.<br />
Cros, Olivier, Linköping Univ.<br />
Knutsson, Hans,<br />
Borga, Magnus, Linköping Univ.<br />
Many applications in image analysis include the problem of linear structure detection, e.g. segmentation of blood vessels<br />
in medical images, roads in satellite images, etc. A simple and efficient solution is to apply linear filters tuned to the structures<br />
of interest and extract line and edge positions from the filter output. However, if the filter is not carefully designed,<br />
artifacts such as ringing can distort the results and hinder a robust detection. In this paper, we study the ringing effects<br />
using a common Gabor filter for linear structure detection, and suggest a method for generating non-ring filters in 2D and<br />
3D. The benefits of the non-ring design are motivated by results on both synthetic and natural images.<br />
16:50-17:10, Paper MoBT5.5<br />
Incremental Distance Transforms (IDT)<br />
Schouten, Theo, Radboud Univ. Nijmegen<br />
Van Den Broek, Egon L., Univ. of Twente<br />
A new generic scheme for incremental implementations of distance transforms (DT) is presented: Incremental Distance Transforms (IDT). This scheme is applied to the city-block, Chamfer, and three recent exact Euclidean DT (E2DT). A benchmark shows that for all five DT, the incremental implementation results in a significant speedup of 3.4-10 times. However, significant differences (i.e., up to 12.5 times) among the DT remain. The FEED transform, one of the recent E2DT, was even shown to be faster than both the city-block and Chamfer DT. Thus, through a very efficient incremental processing scheme for DT, the computational burden of E2DT is relieved.<br />
MoBT6 Dolmabahçe Hall B<br />
Document Segmentation Regular Session<br />
Session chair: Srihari, Sargur (Univ. at Buffalo)<br />
15:30-15:50, Paper MoBT6.1<br />
Text Separation from Mixed Documents using a Tree-Structured Classifier<br />
Peng, Xujun, State Univ. at Buffalo<br />
Setlur, Srirangaraj, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Sitaram, Ramachandrula, HP Lab. India<br />
In this paper, we propose a tree-structured multi-class classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, we boost the tree using all training data at each node with different weights. The evaluation of the proposed method is presented on a set of machine-printed documents which have been annotated by multiple writers in an office/collaborative environment.<br />
15:50-16:10, Paper MoBT6.2<br />
Document Segmentation using Pixel-Accurate Ground Truth<br />
An, Chang, Lehigh Univ.<br />
Yin, Dawei, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
We compare methodologies for trainable document image content extraction, using a variety of ground-truth policies: loose, tight, and pixel-accurate. The goal is to achieve pixel-accurate segmentation of document images. Which ground-truth policy is best has been debated. "Loose" truth is obtained by sweeping rectangles to enclose entire text blocks etc., and can be an efficient manual task. "Tight" truth requires more care, and more time, to enclose individual text lines. Pixel-accurate truth, in which only foreground pixels are labeled, can be obtained by applying the PARC PixLabeler tool; in our experience this tool was as quick to use as loose truthing. We have compared the accuracy of all three truthing policies, and report that tight truth supports higher accuracy than loose truth, and pixel-accurate truth yields the highest accuracy. We have also experimented with morphological expansions of pixel-accurate truth, expanding sets of foreground pixels morphologically, and report that expanded pixel-accurate truth supports higher accuracy than pixel-accurate truth.<br />
16:10-16:30, Paper MoBT6.3<br />
An Adaptive Script-Independent Block-Based Text Line Extraction<br />
Ziaratban, Majid, Amirkabir Univ. of Technology<br />
Faez, Karim, Amirkabir Univ. of Technology<br />
In this paper, a novel script-independent block-based text line extraction technique is proposed for multi-skewed document images. Three parameters are defined to adapt the method to various writing styles. Extensive experiments on different datasets demonstrate that the proposed algorithm outperforms previous methods.<br />
16:30-16:50, Paper MoBT6.4<br />
Automated Quality Assurance for Document Logical Analysis<br />
Meunier, Jean-Luc, XRCE<br />
We consider here the general problem of converting documents available in print-ready or image format into a structured<br />
format that reflects the logical structure of the document. One aspect of the problem involves reconstructing conventional<br />
constructs such as titles, headings, captions, footnotes, etc. In practice, another important aspect involves putting in place<br />
some automated Quality Assessment (QA) method. We propose here a method to automate the QA in the case of a homogeneous<br />
collection by considering multiple documents at once instead of focusing only on the document being processed.<br />
16:50-17:10, Paper MoBT6.5<br />
The PAGE (Page Analysis and Ground-Truth Elements) Format Framework<br />
Pletschacher, Stefan, Univ. of Salford<br />
Antonacopoulos, Apostolos, Univ. of Salford<br />
There is a plethora of established and proposed document representation formats but none that can adequately support individual<br />
stages within an entire sequence of document image analysis methods (from document image enhancement to<br />
layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation<br />
framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections,<br />
binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation<br />
of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications<br />
such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition<br />
series.<br />
MoBT7 Dolmabahçe Hall C<br />
Computer Aided Detection and Diagnosis Regular Session<br />
Session chair: Unal, Gozde (Sabanci Univ.)<br />
15:30-15:50, Paper MoBT7.1<br />
Dyslexia Diagnostics by Centerline-Based Shape Analysis of the Corpus Callosum<br />
Elnakib, Ahmed, Univ. of Louisville<br />
El-Baz, Ayman, Univ. of Auckland<br />
Casanova, Manuel, Univ. of Louisville<br />
Switala, Andrew, Univ. of Louisville<br />
Dyslexia severely impairs learning abilities, so improved diagnostic methods are called for. Neuropathological studies have revealed abnormal anatomy of the Corpus Callosum (CC) in dyslexic brains. We explore the possibility of distinguishing between dyslexic and normal (control) brains by quantitative CC shape analysis in 3D magnetic resonance images (MRI). Our approach consists of three steps: (i) segmenting the CC from a given 3D MRI using the learned CC shape and visual appearance; (ii) extracting the centerline of the CC; and (iii) classifying the subject as dyslexic or normal based on the estimated length of the CC centerline using a k-nearest neighbor classifier. Experiments revealed significant differences (at the 95% confidence level) between the CC centerlines of 14 normal and 16 dyslexic subjects. Our initial classification suggests that the proposed centerline-based shape analysis of the CC is a promising supplement to current dyslexia diagnostics.<br />
15:50-16:10, Paper MoBT7.2<br />
A Probabilistic Information Fusion Approach to MR-Based Automated Diagnosis of Dementia<br />
Akgul, Ceyhun Burak, Vistek Machine Vision and Automation<br />
Ekin, Ahmet, Philips Res. Europe<br />
In this work, we present a probabilistic information fusion approach for the diagnosis of dementia from cross-sectional<br />
magnetic resonance (MR) images. The approach relies on first mapping the outputs of a support vector classifier (SVM)<br />
trained on image features to probabilities and then on combining these probabilities with the class-conditional distributions<br />
of neuropsychiatric test scores, such as the mini-mental state examination (MMSE). The SVM classifier is trained and<br />
tested on 121 subjects drawn from the Open Access Series of Imaging Studies (OASIS) database. Two independent sets<br />
of MMSE related statistics are estimated from data, one from the training set in OASIS and the other from the Alzheimer’s<br />
Disease Neuroimaging Initiative (ADNI) database. The probabilistic fusion of image-based SVM decisions with non-visual MMSE information exhibits very steep receiver operating characteristic curves on the test set, giving 92% accuracy at the equal error rate operating point.<br />
16:10-16:30, Paper MoBT7.3<br />
Two-Level Algorithm for MCs Detection in Mammograms using Diverse-Adaboost-SVM<br />
Harirchi, Farshad, K. N. Toosi Univ. of Tech.<br />
Radparvar, Parham, K. N. Toosi Univ. of Tech.<br />
Abrishami Moghaddam, Hamid, K. N. Toosi Univ. of Tech.<br />
Dehghan, Faramarz, K. N. Toosi Univ. of Tech.<br />
Giti, Masoumeh, Tehran Univ. of Medical Sciences<br />
Clustered microcalcifications (MCs) are one of the early signs of breast cancer. In this paper, we propose a new computer-aided diagnosis (CAD) system for automatic detection of MCs in two steps. First, pixels corresponding to potential microcalcifications are found using a multilayer feed-forward neural network. The input of this network consists of 4 wavelet and 2 gray-level features. The output of the network is then transformed into potential microcalcification objects using spatial 4-point connectivity. Second, we extract 25 features from the potential MC objects and use Diverse Adaboost SVM (DA-SVM) and 3 other classifiers to detect individual MCs. A free-response operating characteristic (FROC) curve is used to evaluate the performance of the CAD system. A mean TP detection rate of 90.44% is achieved at the cost of 1.043 FPs per image using DA-SVM, showing quite satisfactory detection performance of the CAD system.<br />
16:30-16:50, Paper MoBT7.4<br />
An Image Analysis Approach for Detecting Malignant Cells in Digitized H&E-Stained Histology Images of Follicular<br />
Lymphoma<br />
Sertel, Olcay, The Ohio State Univ.<br />
Catalyurek, Umit, The Ohio State Univ.<br />
Lozanski, Gerard, The Ohio State Univ.<br />
Shana’Ah, Arwa, The Ohio State Univ.<br />
Gurcan, Metin, The Ohio State Univ.<br />
The gold standard in follicular lymphoma (FL) diagnosis and prognosis is histopathological examination of tumor tissue<br />
samples. However, the qualitative manual evaluation is tedious and subject to considerable inter- and intra-reader variations.<br />
In this study, we propose an image analysis system for quantitative evaluation of digitized FL tissue slides. The developed<br />
system uses a robust feature space analysis method, namely the mean shift algorithm followed by a hierarchical grouping<br />
to segment a given tissue image into basic cytological components. We then apply further morphological operations to<br />
achieve the segmentation of individual cells. Finally, we generate a likelihood measure to detect candidate cancer cells<br />
using a set of clinically driven features. The proposed approach has been evaluated on a dataset consisting of 100 region<br />
of interest (ROI) images and achieves a promising 89% average accuracy in detecting target malignant cells.<br />
16:50-17:10, Paper MoBT7.5<br />
Microaneurysm (MA) Detection via Sparse Representation Classifier with MA and Non-MA Dictionary Learning<br />
Zhang, Bob, Univ. of Waterloo<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
You, Jane, The Hong Kong Pol. Univ.<br />
Karray, Fakhri, Univ. of Waterloo<br />
Diabetic retinopathy (DR) is a common complication of diabetes that damages the retina and leads to sight loss if treated late. In its earliest stage, DR can be diagnosed by microaneurysm (MA) detection. Although some algorithms have been developed, the accurate detection of MAs in color retinal images is still a challenging problem. In this paper we propose a new method to detect MAs based on the Sparse Representation Classifier (SRC). We first roughly locate MA candidates using multiscale Gaussian correlation filtering, and then classify these candidates with the SRC. In particular, two dictionaries, one for MA and one for non-MA, are learned from example MA and non-MA structures and are used in the SRC process. Experimental results on the ROC database show that the proposed method can effectively distinguish MA from non-MA objects.<br />
MoBT8 Lower Foyer<br />
Object Detection and Recognition; Performance Evaluation of Computer Vision Algorithms; Computer Vision<br />
Applications Poster Session<br />
Session chair: Chen, Chu-Song (Academia Sinica)<br />
15:00-17:10, Paper MoBT8.1<br />
A Neurobiologically Motivated Stochastic Method for Analysis of Human Activities in Video<br />
Sethi, Ricky, Univ. of California, Riverside<br />
Roy-Chowdhury, Amit, Univ. of California, Riverside<br />
In this paper, we develop a neurobiologically-motivated statistical method for video analysis that simultaneously searches the combined motion and form space in a concerted and efficient manner using well-known Markov chain Monte Carlo (MCMC) techniques. Specifically, we leverage an MCMC variant called Hamiltonian Monte Carlo (HMC), which we extend to utilize data-based proposals rather than the blind proposals of traditional HMC, thus creating the Data-Driven HMC (DDHMC). We demonstrate the efficacy of our system on real-life video sequences.<br />
15:00-17:10, Paper MoBT8.2<br />
Arbitrary Stereoscopic View Generation using Multiple Omnidirectional Image Sequences<br />
Hori, Maiya, Nara Inst. of Science and Tech.<br />
Kanbara, Masayuki, Nara Inst. of Science and Tech.<br />
Yokoya, Naokazu, Nara Inst. of Science and Tech.<br />
This paper proposes a novel method for generating arbitrary stereoscopic views from multiple omnidirectional image sequences.<br />
Although conventional methods for arbitrary view generation with an image-based rendering approach can create<br />
binocular views, the positions and directions of viewpoints for stereoscopic vision are limited to a small range. In this research,<br />
we attempt to generate arbitrary stereoscopic views from omnidirectional image sequences captured along various<br />
multiple paths. To generate a high-quality stereoscopic view from a number of images captured at various viewpoints, appropriate<br />
ray information needs to be selected. In this paper, appropriate ray information is selected from a number of<br />
omnidirectional images using a penalty function expressed as ray similarity. In experiments, we show the validity of this<br />
penalty function by generating stereoscopic views from multiple real image sequences.<br />
15:00-17:10, Paper MoBT8.3<br />
Fast Odometry Integration in Local Bundle Adjustment-Based Visual SLAM<br />
Eudes, Alexandre, CEA LIST<br />
Lhuillier, Maxime, LASMEA<br />
Naudet Collette, Sylvie, CEA LIST, LVIC<br />
Dhome, Michel, Blaise Pascal Univ.<br />
Simultaneous Localisation And Mapping (SLAM) for a camera moving in a scene is a long-term research problem.<br />
Here we improve a recent visual SLAM method that applies Local Bundle Adjustment (LBA) to selected key-frames of a<br />
video: we show how to correct the scale drift observed in long monocular video sequences using an additional odometry<br />
sensor. Our method and results are interesting for several reasons: (1) the pose accuracy is improved on real examples; (2)<br />
we do not sacrifice the consistency between the reconstructed 3D points and image features to fit the odometry data; (3) the<br />
modification of the original visual SLAM method is not difficult.<br />
15:00-17:10, Paper MoBT8.4<br />
Classifying Textile Designs using Bags of Shapes<br />
Jia, Wei, Univ. of Dundee<br />
Mckenna, Stephen James, Univ. of Dundee<br />
The use of region shape descriptors was investigated for categorisation of textile design images. Images were segmented<br />
using MRF pixel labelling and the shapes of regions obtained were described with generic Fourier descriptors. Each image<br />
was represented as a bag of shapes. A simple yet competitive classification scheme based on nearest neighbour class-based<br />
matching was used. Classification performance was compared to that obtained when using bags of SIFT features.<br />
15:00-17:10, Paper MoBT8.5<br />
Driver Body-Height Prediction for an Ergonomically Optimized Ingress using a Single Omnidirectional Camera<br />
Scharfenberger, Christian, TU-Munich<br />
Chakraborty, Samarjit, TU-Munich<br />
Faerber, Georg, TU-Munich<br />
Maximizing passenger comfort is an important research topic in the domain of automotive systems engineering. In particular,<br />
automatic adjustment of the seat position according to driver height significantly increases the level of comfort<br />
during ingress. In this paper, we present a new method to estimate the height of approaching car drivers based on a single<br />
omnidirectional camera integrated into the side-view mirror of a car. Towards this, we propose mathematical descriptions<br />
of standard parking scenarios, allowing for accurate height estimation. First, approaching drivers are extracted from<br />
image frames captured by the camera. Second, the scenario and height are initially estimated based on gathered samples<br />
of angles to the head and foot points of an approaching driver. An iterative optimization process removes outliers and refines<br />
the initially estimated scenario and height. Finally, we present a number of experimental results based on image sequences<br />
captured from real-life ingress scenarios.<br />
15:00-17:10, Paper MoBT8.6<br />
Torchlight Navigation<br />
Felsberg, Michael, Linköping Univ.<br />
Larsson, Fredrik, Linköping Univ.<br />
Wang, Han, Nanyang Tech. Univ.<br />
Ynnerman, Anders, Linköping Univ.<br />
Schön, Thomas, Linköping Univ.<br />
A common computer vision task is navigation and mapping. Many indoor navigation tasks require depth knowledge of<br />
flat, unstructured surfaces (walls, floor, ceiling). With passive illumination only, this is an ill-posed problem. Inspired by<br />
small children using a torchlight, we use a spotlight for active illumination. Using our torchlight approach, depth and orientation<br />
estimation of unstructured, flat surfaces boils down to estimation of ellipse parameters. The extraction of ellipses<br />
is very robust and requires little computational effort.<br />
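The ellipse-parameter estimation this problem reduces to can be illustrated with a plain least-squares conic fit on synthetic points (a sketch; a real system would likely use an ellipse-constrained fit such as Fitzgibbon's direct method):<br />

```python
import numpy as np

def fit_conic(x, y):
    """Least-squares conic fit a x^2 + b xy + c y^2 + d x + e y + f = 0
    (general conic, not the ellipse-specific constrained fit)."""
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]          # coefficients, defined up to scale

# synthetic points on the ellipse (x/3)^2 + (y/2)^2 = 1
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
xs, ys = 3 * np.cos(t), 2 * np.sin(t)
a, b, c, d, e, f = fit_conic(xs, ys)
# for this axis-aligned, centered ellipse: b = d = e = 0 and a/c = 4/9
```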
15:00-17:10, Paper MoBT8.7<br />
Adaptive Image Projection Onto Non-Planar Screen using Projector-Camera Systems<br />
Yamanaka, Takashi, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
In this paper, we propose a method for projecting images onto non-planar screens using projector-camera systems while eliminating<br />
distortion in the projected images. In this system, point-to-point correspondences between a projector image and a camera<br />
image must be extracted. To find correspondences, the epipolar geometry between the projector and the camera is used.<br />
By applying a dynamic programming method on epipolar lines, correspondences between the projector image and the camera image<br />
are obtained. Furthermore, in order to achieve faster and more robust matching, the non-planar screen is approximately<br />
represented by a B-spline surface. The small number of B-spline surface parameters is estimated rapidly from corresponding<br />
pixels on epipolar lines. Experimental results show the proposed method works well for projecting images<br />
onto non-planar screens.<br />
15:00-17:10, Paper MoBT8.8<br />
Analysis and Adaptation of Integration Time in PMD Camera for Visual Servoing<br />
Gil, Pablo, Univ. of Alicante<br />
Pomares, Jorge, Univ. of Alicante<br />
Torres, Fernando, Univ. of Alicante<br />
Depth perception of the objects in a scene can be useful for tracking or for applying visual servoing in mobile systems. 3D<br />
time-of-flight (ToF) cameras provide range images which give measurements in real time to improve these types of tasks.<br />
However, the distance computed from these range images varies strongly with the integration-time parameter. This paper<br />
presents an analysis for the online adaptation of the integration time of ToF cameras. This online adaptation is necessary in order<br />
to capture images under the best conditions irrespective of the changes in distance (between camera and objects) caused by<br />
the camera's movement when it is mounted on a robotic arm.<br />
15:00-17:10, Paper MoBT8.9<br />
Detecting Paper Fibre Cross Sections in Microtomy Images<br />
Kontschieder, Peter, Graz Univ. of Tech.<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Kritzinger, Johannes, Graz Univ. of Tech.<br />
Bauer, Wolfgang, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
The goal of this work is the fully-automated detection of cellulose fibre cross sections in microtomy images. A lack of<br />
significant appearance information makes edges the only reliable cue for detection. We present a novel and highly discriminative<br />
edge fragment descriptor that represents angular relations between fragment points. We train a Random Forest<br />
with a plurality of these descriptors including their respective center votes. In such a way, the Random Forest exploits the<br />
knowledge about the object centroid for detection using a generalized Hough voting scheme. In the experiments we found<br />
that our method is able to robustly detect fibre cross sections in microtomy images and can therefore serve as initialization<br />
for successive fibre segmentation or tracking algorithms.<br />
15:00-17:10, Paper MoBT8.10<br />
Active Calibration of Camera-Projector Systems based on Planar Homography<br />
Park, Soon-Yong, Kyungpook National Univ.<br />
Park, Go Gwang, Kyungpook National Univ.<br />
This paper presents a simple and active calibration technique for camera-projector systems based on planar homography.<br />
From the camera image of a planar calibration pattern, we generate a projector image of the pattern through the homography<br />
between the camera and the projector. To determine the coordinates of the pattern corners from the view of the projector,<br />
we actively project a corner marker from the projector to align the marker with the printed pattern corners. Calibration is<br />
done in two steps. First, the four outer corners of the pattern are identified. Second, all other inner corners are identified. The<br />
pattern image from the projector is then used to calibrate the projector. Experimental results on two types of camera-projector<br />
systems show that the projection errors of both camera and projector are less than 1 pixel.<br />
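The planar homography underlying such a calibration can be estimated from four or more corner correspondences with the standard direct linear transform (DLT); a minimal sketch on synthetic data (the H_true and corner values below are invented):<br />

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)       # null vector = homography up to scale
    return H / H[2, 2]

# four pattern corners mapped through a known synthetic homography
H_true = np.array([[1.2, 0.1, 5.0], [0.0, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], float)
pts = np.hstack([src, np.ones((4, 1))]) @ H_true.T
dst = pts[:, :2] / pts[:, 2:]      # perspective divide
H = homography_dlt(src, dst)
```

With exact correspondences the DLT recovers H_true exactly (up to numerical precision), since four points in general position determine a unique homography.<br />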
15:00-17:10, Paper MoBT8.11<br />
Abnormal Traffic Detection using Intelligent Driver Model<br />
Sultani, Waqas, Seoul National Univ.<br />
Choi, Jin Young, Seoul National Univ.<br />
We present a novel approach for detecting and localizing abnormal traffic using the intelligent driver model. Specifically, we<br />
advect particles over the video sequence. By treating each particle as a car, we compute driver behavior using the intelligent driver<br />
model. The behaviors are learned using latent Dirichlet allocation, and frames are classified as abnormal using a likelihood<br />
threshold criterion. In order to localize the abnormality, we compute spatial gradients of the behaviors and construct a finite-time<br />
Lyapunov field. Finally, the region of abnormality is segmented using the watershed algorithm. The effectiveness of the<br />
proposed approach is validated using videos from stock footage websites.<br />
15:00-17:10, Paper MoBT8.12<br />
Detection of Moving Objects with Removal of Cast Shadows and Periodic Changes using Stereo Vision<br />
Moro, Alessandro, Univ. of Trieste<br />
Terabayashi, Kenji, Chuo Univ.<br />
Umeda, Kazunori, Chuo Univ.<br />
In this paper we present a method for the detection of moving objects in unknown and generic environments under cast<br />
shadows and periodic movements of non-relevant objects (such as waving leaves), using a combination of non-parametric<br />
thresholding algorithms and local cast shadow analysis with stereo camera information. Good detection rates were achieved<br />
in several environments under different lighting conditions, and objects could be detected independently of scene illumination,<br />
shadows, and periodic changes.<br />
15:00-17:10, Paper MoBT8.13<br />
Localized Image Matte Evaluation by Gradient Correlation<br />
Yao, Guilin, Harbin Inst. of Tech.<br />
Yao, Hongxun, Harbin Inst. of Tech.<br />
In natural image matting, various kinds of algorithms have recently been proposed, and alpha matting results have<br />
been generated for comparison and for composition into new backgrounds. However, all these methods have to compare an<br />
alpha matte to the ground truth in order to obtain a final pixel-wise evaluation of the results. When input datasets<br />
are used only for testing and no ground-truth mattes exist, it is not possible to perform such comparisons<br />
or to generate quantitative results. In this paper we combine the two ideas above and propose a<br />
new pixel-wise alpha matte evaluation method. The approach uses local windows to measure the gradient correlation<br />
between the image and the matte. An optimal image channel minimizing the image variance is also selected at each<br />
window in order to perform the correlation more accurately. Experimental results show that our system can generate a precise<br />
evaluation result for each pixel of each matte without ground truth.<br />
15:00-17:10, Paper MoBT8.14<br />
Multiple Plane Detection in Image Pairs using J-Linkage<br />
Fouhey, David Ford, Middlebury Coll.<br />
Scharstein, Daniel, Middlebury Coll.<br />
Briggs, Amy, Middlebury Coll.<br />
We present a new method for the robust detection and matching of multiple planes in pairs of images. Such planes can<br />
serve as stable landmarks for vision-based urban navigation. Our approach starts from SIFT matches and generates multiple<br />
local homography hypotheses using the recent J-linkage technique by Toldo and Fusiello, a robust randomized multimodel<br />
estimation algorithm. These hypotheses are then globally merged, spatially analyzed, robustly fitted, and checked<br />
for stability. When tested on more than 30,000 image pairs taken from panoramic views of a college campus, our method<br />
yields no false positives and recovers 72% of the matchable building walls identified by a human, despite significant occlusions<br />
and viewpoint changes.<br />
15:00-17:10, Paper MoBT8.15<br />
Contextual Features for Head Pose Estimation in Football Games<br />
Launila, Andreas, Royal Inst. of Tech. (KTH)<br />
Sullivan, Josephine, Royal Inst. of Tech. (KTH)<br />
We explore the benefits of using contextual features for head pose estimation in football games. Contextual features are<br />
derived from knowledge of the position of all players and combined with image based features derived from low-resolution<br />
footage. Using feature selection and combination techniques, we show that contextual features can aid head pose estimation<br />
in football games and potentially be an important complement to the image based features traditionally used.<br />
15:00-17:10, Paper MoBT8.16<br />
Coarse-To-Fine Multiclass Nested Cascades for Object Detection<br />
Verschae, Rodrigo, Univ. de Chile<br />
Ruiz-Del-Solar, Javier, Univ. de Chile<br />
Building robust and fast object detection systems is an important goal of computer vision. When several<br />
object types must be detected, the computational burden of running several class-specific classifiers in parallel becomes<br />
a problem, and both accuracy and training time can be greatly affected. Seeking to address these<br />
problems, we extend cascade classifiers to the multiclass case by proposing the use of multiclass coarse-to-fine (CTF)<br />
nested cascades. The presented results show that the proposed system scales well with the number of classes, both at<br />
training and at running time.<br />
15:00-17:10, Paper MoBT8.17<br />
Visual SLAM with an Omnidirectional Camera<br />
Rituerto, Alejandro, Univ. de Zaragoza<br />
Puig, Luis, Univ. de Zaragoza<br />
Guerrero, Jose J., Univ. de Zaragoza<br />
In this work we integrate the Spherical Camera Model for catadioptric systems into a Visual-SLAM application. The Spherical<br />
Camera Model is a projection model that unifies central catadioptric and conventional cameras. To integrate this model<br />
into the Extended Kalman Filter-based SLAM, we must linearize the direct and the inverse projections. We have performed<br />
initial experimentation with omnidirectional and conventional real sequences including challenging trajectories.<br />
The results confirm that the omnidirectional camera gives much better orientation accuracy, improving the estimated camera<br />
trajectory.<br />
15:00-17:10, Paper MoBT8.18<br />
Shape Index SIFT: Range Image Recognition using Local Features<br />
Bayramoglu, Neslihan, Middle East Tech. Univ.<br />
Alatan, A. Aydin, Middle East Tech. Univ.<br />
Range image recognition has gained importance in recent years due to developments in acquiring, displaying, and storing<br />
such data. In this paper, we present a novel method for matching range surfaces. Our method utilizes local surface properties<br />
and represents the geometry of local regions efficiently. Integrating the Scale Invariant Feature Transform (SIFT) with the<br />
shape index (SI) representation of the range images allows matching of surfaces with different scales and orientations. We<br />
apply the method to scaled, rotated, and occluded range images and demonstrate its effectiveness by comparison with<br />
previous studies.<br />
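The shape index can be computed directly from the principal curvatures of the range surface; a small sketch (sign conventions for the curvatures, and hence for which end of the scale is a cap or a cup, vary between authors):<br />

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from principal curvatures (sorted so k1 >= k2).

    Ranges over [-1, 1]: +1 spherical cap, +0.5 ridge, 0 saddle,
    -0.5 rut, -1 spherical cup, under the convention that convex
    surfaces have positive curvature. Undefined at planar points.
    """
    k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)
    # arctan2 handles the umbilic case k1 == k2 (zero denominator)
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

print(shape_index(1.0, 0.0))   # a ridge-like point
```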
15:00-17:10, Paper MoBT8.19<br />
Windows Detection using K-Means in CIE-Lab Color Space<br />
Recky, Michal, ICG TU Graz<br />
Leberl, Franz, ICG TU Graz<br />
In this paper, we present a method for window detection robust enough to process the complex facades of historical buildings.<br />
The method provides results even for facades under severe perspective distortion. Our algorithm can detect<br />
many different window types and does not require a learning step. We achieve this through an extended gradient<br />
projection method and the introduction of a color descriptor, based on k-means clustering in the CIE-Lab color space, into the<br />
process. This method is an important step towards creating large 3D city models in an automated workflow from large online<br />
image databases or industrial systems. As such, it was designed to provide a high level of robustness when processing<br />
a large variety of image types.<br />
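The k-means clustering at the core of the color descriptor can be sketched as plain Lloyd's iteration on Lab pixel triples (toy data and a deliberately simple initialization; a real pipeline would first convert RGB to CIE-Lab, e.g. with skimage's rgb2lab, and use k-means++):<br />

```python
import numpy as np

def kmeans(pixels, k, iters=20):
    """Plain Lloyd's k-means on an (N, 3) array of CIE-Lab pixels."""
    # toy init: evenly spaced samples (k-means++ would be better)
    centers = pixels[:: max(1, len(pixels) // k)][:k].astype(float)
    for _ in range(iters):
        # assign each pixel to the nearest center (squared L2 in Lab)
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(0)
    return labels, centers

# two well-separated synthetic "Lab" color populations, e.g. dark
# bluish window glass against a bright facade wall
rng = np.random.default_rng(1)
glass = rng.normal([30.0, 5.0, -20.0], 1.0, size=(50, 3))
wall = rng.normal([70.0, 10.0, 30.0], 1.0, size=(50, 3))
labels, centers = kmeans(np.vstack([glass, wall]), k=2)
```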
15:00-17:10, Paper MoBT8.20<br />
Robust Figure Extraction on Textured Background: A Game-Theoretic Approach<br />
Albarelli, Andrea, Univ. Ca’ Foscari Venezia<br />
Rodolà, Emanuele, Univ. Ca’ Foscari Venezia<br />
Cavallarin, Alberto, Univ. Ca’ Foscari Venezia<br />
Torsello, Andrea, Univ. Ca’ Foscari Venezia<br />
Feature-based image matching relies on the assumption that the features contained in the model are distinctive enough. When<br />
both model and data present a sizeable amount of clutter, the signal-to-noise ratio falls and detection becomes more challenging.<br />
If such clutter exhibits a coherent structure, as is the case for textured backgrounds, matching becomes even harder.<br />
In fact, the large number of repeatable features extracted from the texture dims the strength of the relatively few interesting<br />
points of the object itself. In this paper we introduce a game-theoretic approach that distinguishes foreground features<br />
from background ones. In addition, the same technique can be used to deal with the object matching itself. The whole procedure<br />
is validated by applying it to a practical scenario and by comparing it with a standard point-pattern matching technique.<br />
15:00-17:10, Paper MoBT8.21<br />
Image Retrieval of First-Person Vision for Pedestrian Navigation in Urban Area<br />
Kameda, Yoshinari, Univ. of Tsukuba<br />
Ohta, Yuich, Univ. of Tsukuba<br />
We propose a new computer vision approach to locate a walking pedestrian from a first-person-vision camera image in practical<br />
situations. We assume reference points have been registered with other first-person-vision images. We utilize SURF and<br />
define seven matching criteria, derived from the properties of first-person vision, to reject false matches. We have<br />
implemented a preliminary system that can respond to a query within half a second for a path approximately 1 km long<br />
in a downtown Tokyo area where pedestrians and vehicles are always present in the images.<br />
15:00-17:10, Paper MoBT8.22<br />
Unexpected Human Behavior Recognition in Image Sequences using Multiple Features<br />
Zweng, Andreas, Vienna Univ. of Tech.<br />
Kampel, Martin, Vienna Univ. of Tech.<br />
This paper presents a novel approach for unexpected behavior recognition in image sequences, with attention to high-density<br />
crowd scenes. Due to occlusions, object tracking in such scenes is challenging, and in cases of low resolution or poor image<br />
quality it is not robust enough to reliably detect abnormal behavior. The wide variety of possible actions performed by<br />
humans and the problem of occlusions make action recognition unsuitable for behavior recognition in high-density crowd<br />
scenes. The approach presented in this paper uses features based on motion information, instead of detecting<br />
actions or events, in order to detect abnormality. Experiments demonstrate the potential of the approach.<br />
15:00-17:10, Paper MoBT8.23<br />
Object Recognition based on N-Gram Expression of Human Actions<br />
Kojima, Atsuhiro, Osaka Prefecture Univ.<br />
Miki, Hiroshi, Osaka Prefecture Univ.<br />
Kise, Koichi, Osaka Prefecture Univ.<br />
In this paper, we propose a novel method for recognizing objects by observing human actions, based on bag-of-features. The<br />
key contribution of our method is that human actions are represented as n-grams of symbols and used to identify specific<br />
object categories. First, features of human actions taken on an object are extracted from video images and encoded into symbols.<br />
Then, n-grams are generated from the sequence of symbols and registered for the corresponding object category. In the recognition<br />
phase, actions taken on the object are converted into a set of n-grams in the same way and compared with those representing<br />
the object categories. We performed experiments to recognize objects in an office environment and confirmed the effectiveness<br />
of our method.<br />
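The n-gram registration and comparison steps can be sketched as follows (the symbol vocabulary and the Jaccard comparison score are invented for illustration; the paper's exact matching measure may differ):<br />

```python
def ngrams(symbols, n):
    """All contiguous n-grams of an action-symbol sequence."""
    return {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}

def match_category(observed, categories, n=2):
    """Pick the category whose registered n-gram set overlaps most
    with the n-grams of the observed action sequence."""
    obs = ngrams(observed, n)
    def score(ng):
        return len(obs & ng) / len(obs | ng) if obs | ng else 0.0
    return max(categories, key=lambda c: score(categories[c]))

# invented action-symbol vocabularies for two object categories
categories = {
    "cup": ngrams(["grasp", "lift", "drink", "putdown"], 2),
    "door": ngrams(["reach", "grasp", "turn", "push"], 2),
}
```

An observed sequence such as `["grasp", "lift", "drink"]` shares the bigrams (grasp, lift) and (lift, drink) with the cup category and none with the door category.<br />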
15:00-17:10, Paper MoBT8.24 CANCELED<br />
Image Feature Associations via Local Semantic Structure<br />
Parrish, Nicholas, Colorado State Univ.<br />
Draper, Bruce A., Colorado State Univ.<br />
Most research in object recognition suffers from two distinct weaknesses that limit its effectiveness in natural environments.<br />
First, it tends to rely on labeled training images to learn object models. Second, it tends to assume that the goal is<br />
to recognize a single, dominant foreground object. This paper presents a different method of object recognition that learns<br />
to recognize objects in natural scenes without supervision. The approach uses semantic co-occurrence information of local<br />
image features to form object models (called percepts) from groups of image features. These percepts are used to recognize<br />
objects in novel images. It will be shown that this approach is capable of learning object categories without supervision,<br />
and of recognizing objects in complex multi-object scenes. It will also be shown that it outperforms nearest-neighbor<br />
scene recognition.<br />
15:00-17:10, Paper MoBT8.25<br />
Unifying Approach for Fast License Plate Localization and Super-Resolution<br />
Nguyen, Chu Duc, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
This paper addresses the localization and super-resolution of license plates in a unifying approach. Higher-quality license<br />
plate images can be obtained by applying super-resolution to successive lower-resolution plate images. All existing methods assume that<br />
plate zones are correctly extracted from every frame. However, accurate localization requires sufficient image<br />
quality, which is not always available in real video. Super-resolution on all pixels is a possible but very time-consuming alternative.<br />
We propose a framework which successfully interlaces these two modules. First, coarse candidates are found by a weak<br />
but fast license plate detector based on edge map sub-sampling. Then, an improved fast MAP-based super-resolution, using<br />
local phase accurate registration and an edge-preserving prior, is applied to these regions of interest. Finally, our robust ICHT-based<br />
localizer rejects false alarms and localizes the high-resolution license plate more accurately. Experiments conducted<br />
on synthetic and real data demonstrate the robustness of our approach and its real-time potential.<br />
15:00-17:10, Paper MoBT8.26<br />
Dimensionality Reduction for Distributed Vision Systems using Random Projection<br />
Sulic, Vildana, Univ. of Ljubljana<br />
Pers, Janez, Univ. of Ljubljana<br />
Kristan, Matej, Univ. of Ljubljana<br />
Kovacic, Stanislav, Univ. of Ljubljana<br />
Dimensionality reduction is an important issue in the context of distributed vision systems. Processing dimensionality-reduced<br />
data requires far fewer network resources (e.g., storage space, network bandwidth) than processing the original data.<br />
In this paper we explore the performance of the random projection method for distributed smart cameras. In our tests, random<br />
projection is compared to principal component analysis in terms of recognition efficiency (i.e., object recognition).<br />
The results obtained on the COIL-20 image data set show good performance of random projection in comparison to<br />
principal component analysis, which requires distribution of a subspace and therefore consumes more network<br />
resources. This indicates that the random projection method can elegantly solve the problem of subspace distribution in embedded<br />
and distributed vision systems. Moreover, even without explicit orthogonalization or normalization of the random<br />
projection transformation subspace, the method achieves good object recognition efficiency.<br />
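The key property random projection exploits can be shown in a few lines: a seeded random matrix approximately preserves pairwise distances (the Johnson-Lindenstrauss lemma), so cameras only need to share a seed rather than transmit a learned subspace. All dimensions and data below are invented:<br />

```python
import numpy as np

# toy stand-ins for image feature vectors from a smart camera
rng = np.random.default_rng(0)
d, k, n = 1024, 64, 200          # original dim, reduced dim, #samples
X = rng.normal(size=(n, d))

# every camera can regenerate R from a shared seed, so no subspace
# (unlike PCA's) ever has to be distributed over the network
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                        # dimensionality-reduced data

# Johnson-Lindenstrauss: pairwise distances are roughly preserved
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(orig, proj)
```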
15:00-17:10, Paper MoBT8.27<br />
Sensor Fusion for Cooperative Head Localization<br />
Del Bimbo, Alberto, Univ. of Florence<br />
Dini, Fabrizio, Univ. of Florence<br />
Lisanti, Giuseppe, Univ. of Florence<br />
Pernici, Federico, Univ. of Florence<br />
In modern video surveillance systems, pan-tilt-zoom (PTZ) cameras have the potential to allow the coverage of<br />
wide areas with a much smaller number of sensors, compared to the common approach of fixed camera networks. This<br />
paper describes a general framework that aims at exploiting the capabilities of modern PTZ cameras in order to acquire<br />
high-resolution images of body parts, such as the head, from the observation of pedestrians moving in a wide outdoor<br />
area. The framework allows organizing the sensors in a network with arbitrary topology and establishing pairwise<br />
master-slave relationships between them. In this way a slave camera can be steered to acquire imagery of a target, taking<br />
into account both target and zooming uncertainties. Experiments show good performance in localizing a target's head, independently<br />
of the zooming factor of the slave camera.<br />
15:00-17:10, Paper MoBT8.28<br />
Shared Random Ferns for Efficient Detection of Multiple Categories<br />
Villamizar Vergel, Michael, CSIC-UPC<br />
Moreno-Noguer, Francesc, CSIC-UPC<br />
Andrade Cetto, Juan, CSIC-UPC<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
We propose a new algorithm for detecting multiple object categories that exploits the fact that different categories may<br />
share common features but with different geometric distributions. This yields an efficient detector which, in contrast to<br />
existing approaches, considerably reduces the computation cost at runtime, where the feature computation step is traditionally<br />
the most expensive. More specifically, at the learning stage we compute common features by applying the same<br />
Random Ferns over the Histograms of Oriented Gradients on the training images. We then apply a boosting step to build<br />
discriminative weak classifiers, and learn the specific geometric distribution of the Random Ferns for each class. At<br />
runtime, only a few Random Ferns have to be densely computed over each input image, and their geometric distribution<br />
allows performing the detection. The proposed method has been validated in public datasets achieving competitive detection<br />
results, which are comparable with state-of-the-art methods that use specific features per class.<br />
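A single random fern of the kind shared across categories here can be sketched as a set of pairwise feature comparisons packed into a bit index (a toy version on a bare feature vector; the paper applies shared ferns over Histograms of Oriented Gradients):<br />

```python
import numpy as np

class RandomFern:
    """A fern: m random pairwise comparisons on a feature vector,
    packed into an m-bit index into a 2**m-entry histogram."""

    def __init__(self, dim, m, rng):
        # each row picks two feature positions to compare
        self.pairs = rng.integers(0, dim, size=(m, 2))

    def index(self, x):
        bits = x[self.pairs[:, 0]] > x[self.pairs[:, 1]]
        return int(bits @ (1 << np.arange(len(bits))))

rng = np.random.default_rng(0)
fern = RandomFern(dim=36, m=8, rng=rng)   # e.g. a 36-bin HOG block
x = rng.normal(size=36)
idx = fern.index(x)                       # an index in [0, 255]
```

Because the comparisons are fixed once at training time, evaluating the same few ferns densely over an image is cheap, and per-class geometric distributions of the resulting indices can then be looked up rather than recomputed.<br />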
15:00-17:10, Paper MoBT8.29<br />
Age Recognition in the Wild<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Jahanbekam, Amirhossein, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
In this paper, we present a novel approach to age recognition from facial images. The method we propose combines<br />
several established features in order to characterize facial appearance and aging patterns. Since we explicitly consider<br />
age recognition in the wild, i.e. vast amounts of unconstrained Internet images, the methods we employ are tailored towards<br />
speed and efficiency. For evaluation, we test different classifiers on common benchmark data and on a new data set of unconstrained<br />
images harvested from the Internet. Extensive experimental evaluation shows state-of-the-art performance on<br />
the benchmarks, very high accuracy on the novel data set, and superior runtime performance; to our knowledge, this is<br />
the first time that automatic age recognition has been carried out on a large Internet data set.<br />
15:00-17:10, Paper MoBT8.30<br />
EKF-SLAM and Machine Learning Techniques for Visual Robot Navigation<br />
Casarrubias-Vargas, Heriberto, CINVESTAV<br />
Petrilli-Barceló, Alberto E., CINVESTAV<br />
Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />
In this work we propose the use of machine learning techniques to improve Simultaneous Localization and Mapping<br />
(SLAM) using an extended Kalman filter (EKF) and visual information for robot navigation. We use the Viola and<br />
Jones approach to look for specific visual landmarks in the environment. The landmarks are used to improve the robot localization<br />
in the EKF-SLAM system. Our experiments validate the efficiency of our algorithm.<br />
15:00-17:10, Paper MoBT8.31<br />
Boosting Clusters of Samples for Sequence Matching in Camera Networks<br />
Takala, Valtteri, Univ. of Oulu<br />
Cai, Yinghao, Univ. of Oulu<br />
Pietikäinen, Matti, Univ. of Oulu<br />
This study introduces a novel classification algorithm for learning and matching sequences in view-independent object<br />
tracking. The proposed learning method uses adaptive boosting and classification trees on a wide collection (shape, pose,<br />
color, texture, etc.) of image features that constitute a model for tracked objects. The temporal dimension is taken into account<br />
by using k-means clusters of sequence samples. Most of the utilized object descriptors also have a temporal quality.<br />
We argue that with a proper boosting approach and a decent number of reasonably descriptive image features it is feasible<br />
to do view-independent sequence matching in sparse camera networks. The experiments on real-life surveillance data support<br />
this statement.<br />
15:00-17:10, Paper MoBT8.32<br />
Saliency Detection and Object Localization in Indoor Environments<br />
Rudinac, Maja, Delft Univ. of Tech.<br />
Jonker, Pieter, Delft Univ. of Tech.<br />
In this paper we present a scene exploration method for the identification of interest regions in unknown indoor environments<br />
and the position estimation of the objects located in those regions. Our method consists of two stages: First, we<br />
generate a saliency map of the scene based on the spectral residual of three color channels, and interest points are detected<br />
in this map. Second, we propose and evaluate a method for the clustering of neighboring interest regions, the rejection of<br />
outliers, and the estimation of the positions of potential objects. Once the location of objects in the scene is known, recognition<br />
of objects/object classes can be performed or the locations can be used for grasping the object. The main contribution<br />
of this paper lies in a computationally inexpensive method for the localization of multiple salient objects in a scene. The<br />
performance obtained on a dataset of indoor scenes shows that our method performs well, is very fast, and is hence highly<br />
suitable for real-world applications, such as mobile robots and surveillance.<br />
15:00-17:10, Paper MoBT8.33<br />
Bubble Tag Identification using an Invariant–Under–Perspective Signature<br />
Patraucean, Viorica, Univ. of Toulouse<br />
Gurdjos, Pierre, Univ. of Toulouse<br />
Conter, Jean, Univ. of Toulouse<br />
We have at our disposal a large database containing images of various configurations of coplanar circles, randomly laid out,<br />
called Bubble Tags. The images are taken from different viewpoints. Given a new image (query image), the goal is to<br />
find in the database the image containing the same bubble tag as the query image. We propose representing the images<br />
through projective invariant signatures which allow identifying the bubble tag without passing through a Euclidean reconstruction<br />
step. This is justified by the size of the database, which imposes the use of queries in 1D/vectorial form, i.e.<br />
not in 2D/matrix form. The experiments carried out confirm the efficiency of our approach, in terms of precision and complexity.<br />
15:00-17:10, Paper MoBT8.35<br />
The Role of Polarity in Haar-Like Features for Face Detection<br />
Landesa-Vázquez, Iago, Univ. de Vigo<br />
Alba Castro, Jose Luis, Univ. of Vigo<br />
Human vision is primarily based on local contrast perception and its polarity. Viola and Jones proposed, in their well-known<br />
face detector framework, a boosted cascade of weak classifiers based on Haar-like features which encode local<br />
contrast and polarity information. Nevertheless, contrast polarity invariance, which is not directly modeled in their framework,<br />
has been shown to be perceptually relevant for the human capability of detecting faces. In this paper we study, from<br />
both algorithmic and perceptual points of view, the effect of enhancing Haar-like features with polarity invariance and<br />
how it may improve cascaded classifiers.<br />
15:00-17:10, Paper MoBT8.36<br />
A Human Detection Framework for Heavy Machinery<br />
Heimonen, Teuvo Antero, Univ. of Oulu<br />
Heikkilä, Janne, Univ. of Oulu<br />
A stereo camera based human detection framework for heavy machinery is proposed. The framework allows easy integration<br />
of different human detection and image segmentation methods. This integration is essential for diverse and challenging<br />
- 46 -
work machine environments, in which traditional, one-detector-based human detection approaches have been found to be<br />
insufficient. The framework is based on the idea of pixel-wise human probabilities, which are obtained by several separate<br />
detection trials following binomial distribution. The framework has been evaluated with extensive image sequences of<br />
authentic work machine environments, and it has proven to be feasible. Promising detection performance was achieved<br />
by utilizing publicly available human detectors.<br />
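The binomial view of repeated detection trials can be made concrete: if n independent detectors each fire at a background pixel with false-alarm rate p0, the number of votes follows a binomial distribution, and its tail bounds how often k or more spurious votes occur. The fusion rule below is an illustrative reading of the abstract, not the authors' exact formulation.<br />

```python
from math import comb

import numpy as np

def pixelwise_probability(masks):
    """Empirical per-pixel success rate over n binary detection trials."""
    return np.stack([np.asarray(m, dtype=float) for m in masks]).mean(axis=0)

def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): how likely it is that k or more
    of n independent detectors fire at a pure-background pixel."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
               for i in range(k, n + 1))
```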
15:00-17:10, Paper MoBT8.37<br />
Building a Videorama with Shallow Depth of Field<br />
Bae, Soonmin, Boston Coll.<br />
Jiang, Hao, Boston Coll.<br />
This paper presents a new automatic approach to building a videorama with shallow depth of field. We stitch the static background<br />
of video frames and render the dynamic foreground onto the enlarged background after foreground/background segmentation.<br />
To this end, we extract the depth information from a two-view video stream. We show that the depth cues combined<br />
with color cues improve segmentation. Finally, we use the depth cues to synthesize the shallow depth of field effects in the<br />
final videorama. Our approach stabilizes the camera motion as if the video was captured from a static camera and improves<br />
the visual quality with the increased field of view and shallow depth of field effects.<br />
15:00-17:10, Paper MoBT8.38<br />
Fast Training of Object Detection using Stochastic Gradient Descent<br />
Wijnhoven, Rob, ViNotion BV<br />
De With, Peter H. N., Eindhoven Univ. of Tech. / CycloMedia<br />
Training datasets for object detection problems are typically very large and Support Vector Machine (SVM) implementations<br />
are computationally complex. As opposed to these complex techniques, we use Stochastic Gradient Descent (SGD) algorithms<br />
that use only a single new training sample in each iteration and process samples in a stream-like fashion. We have incorporated<br />
SGD optimization in an object detection framework. The object detection problem is typically highly asymmetric, because<br />
of the limited variation in object appearance, compared to the background. Incorporating SGD speeds up the optimization<br />
process significantly, requiring only a single iteration over the training set to obtain results comparable to state-of-the-art<br />
SVM techniques. SGD optimization is linearly scalable in time and the obtained speedup in computation time is two to three<br />
orders of magnitude. We show that by considering only part of the total training set, SGD converges quickly to the overall<br />
optimum.<br />
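The stream-like update the abstract describes can be sketched with a Pegasos-style hinge-loss SGD. This is a generic formulation under assumed defaults (step size 1/(lambda*t), L2 regularization), not the authors' exact implementation.<br />

```python
import numpy as np

def sgd_linear_svm(X, y, lam=1e-3, epochs=1, seed=0):
    """Hinge-loss linear classifier trained by stochastic gradient descent.

    Each iteration touches a single sample, so the training set is consumed
    in a stream-like fashion; one epoch is one pass over the data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos-style decaying step
            margin = y[i] * (w @ X[i])
            w *= 1.0 - eta * lam             # gradient of the L2 regularizer
            if margin < 1.0:                 # hinge loss active for sample i
                w += eta * y[i] * X[i]
    return w
```

On well-separated data a single pass already yields a usable separator, which is the kind of speedup the abstract reports.<br />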
15:00-17:10, Paper MoBT8.39<br />
Assessing Water Quality by Video Monitoring Fish Swimming Behavior<br />
Serra-Toro, Carlos, Univ. Jaume I<br />
Montoliu, Raúl, Univ. Jaume I<br />
Traver, V. Javier, Univ. Jaume I<br />
Hurtado-Melgar, Isabel M., Univ. Jaume I<br />
Núñez-Redó, Manuela, Univ. Jaume I<br />
Cascales, Pablo, Univ. Jaume I<br />
Animals are known to alter their behavior in response to changes in their environments. Therefore, automatic visual monitoring<br />
of animal behavior is currently of great interest because of its many applications. In this paper, a video-based system<br />
is proposed for analyzing the swimming patterns of fishes so that the presence of toxins in the water can be inferred. This<br />
problem is challenging, among other reasons, because how fishes react when swimming in contaminated water is neither<br />
really known nor well defined. A novel use of recurrence plots is proposed, and very compact and simple descriptors based<br />
on this recurrence representation are found to be highly discriminative between videos of fishes in clean and polluted water.<br />
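A recurrence plot is simply a thresholded pairwise-distance matrix of the tracked trajectory, and compact descriptors such as the recurrence rate fall out directly. The paper's own descriptors are not detailed in the abstract, so the sketch below shows only the textbook starting point.<br />

```python
import numpy as np

def recurrence_plot(traj, eps):
    """Binary recurrence plot: R[i, j] = 1 when states i and j of the
    trajectory lie within eps of each other."""
    d = np.linalg.norm(traj[:, None, :] - traj[None, :, :], axis=-1)
    return (d < eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent pairs, ignoring the trivial diagonal: one of
    the simplest descriptors read off a recurrence plot (needs n >= 2)."""
    n = len(R)
    return (R.sum() - n) / (n * (n - 1))
```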
15:00-17:10, Paper MoBT8.40<br />
Detecting Wires in Cluttered Urban Scenes using a Gaussian Model<br />
Candamo, Joshua, Univ. of South Florida<br />
Goldgof, Dmitry, Univ. of South Florida<br />
Kasturi, Rangachar, Univ. of South Florida<br />
Godavarthy, Sridhar, Univ. of South Florida<br />
- 47 -
A novel wire detection algorithm for use by unmanned aerial vehicles (UAV) in low altitude urban reconnaissance is presented.<br />
This is of interest to urban search and rescue and military reconnaissance operations. Detection of wires plays an<br />
important role, because thin wires are hard to discern by tele-operators and automated systems. Our algorithm is based on<br />
identification of linear patterns in images. Most existing methods that search for linear patterns use a simple model of a<br />
line, which does not take into account the line surroundings. We propose the use of a robust Gaussian model to approximate<br />
the intensity profile of a line and its surroundings which allows effective discrimination of wires from other visually similar<br />
linear patterns. The algorithm is able to cope with highly cluttered urban backgrounds, moderate rain, and mist. Experimental<br />
results show a 17.7% detection improvement over the baseline.<br />
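The key modeling choice is that a thin wire's cross-sectional intensity profile looks like a Gaussian bump on a background level, whereas ramps and step edges do not. The scoring rule below is illustrative (the paper's robust fitting is not reproduced); it correlates a cross-section with a centred Gaussian template.<br />

```python
import numpy as np

def gaussian_profile(x, amp, mu, sigma, base):
    """Gaussian bump of height amp on a flat background level."""
    return base + amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def wire_score(profile, sigma=1.5):
    """Normalized correlation between an intensity cross-section and a
    centred Gaussian template: close to 1 for wire-like ridges, near 0
    for ramps and step edges."""
    x = np.arange(len(profile), dtype=float)
    template = gaussian_profile(x, 1.0, x.mean(), sigma, 0.0)
    p = profile - profile.mean()
    t = template - template.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float(p @ t / denom) if denom > 0 else 0.0
```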
15:00-17:10, Paper MoBT8.41<br />
Abandoned Objects Detection based on Radial Reach Correlation of Double Illumination Invariant Foreground Masks<br />
Li, Xunli, Peking Univ.<br />
Zhang, Chao, Peking Univ.<br />
Zhang, Duo,<br />
This paper proposes an automatic and robust method to detect and recognize abandoned objects for video surveillance<br />
systems. Two Gaussian Mixture Models (long-term and short-term) in the RGB color space are constructed to<br />
obtain two binary foreground masks. By refining the foreground masks with the Radial Reach Filter (RRF) method, the influence<br />
of illumination changes is greatly reduced. The height/width ratio and a linear SVM classifier based on the HOG (Histogram<br />
of Oriented Gradient) descriptor are also used to recognize left baggage. Tests on the datasets of PETS2006,<br />
PETS2007 and our own videos show that the proposed method can detect very small abandoned objects<br />
within low-quality surveillance videos, and it is also robust to varying illumination and dynamic backgrounds.<br />
15:00-17:10, Paper MoBT8.42<br />
Unsupervised Visual Object Categorisation via Self-Organisation<br />
Kinnunen, Juha Teemu Ensio, Lappeenranta Univ. of Tech.<br />
Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />
Lensu, Lasse, Lappeenranta Univ. of Tech.<br />
Kalviainen, Heikki, Lappeenranta Univ. of Tech.<br />
Visual object categorisation (VOC) has become one of the most actively investigated topics in computer vision. In the<br />
mainstream studies, the topic is considered as a supervised problem, but recently, the ultimate challenge has been posed:<br />
Unsupervised visual object categorisation. Hitherto only a few methods have been published, all of them being computationally<br />
demanding successors of their supervised counterparts. In this study, we address this problem with a simple and<br />
effective method: competitive learning leading to self-organisation (self-categorisation). The unsupervised competitive<br />
learning approach is implemented using the Kohonen self-organising map algorithm (SOM). The SOM is used to perform<br />
both unsupervised codebook generation and object categorisation. We present our method in detail and compare results<br />
to the supervised approach.<br />
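The competitive-learning core is compact: a Kohonen SOM's unit weights double as an unsupervised codebook, and a sample's best-matching unit is its (self-)category. The sketch below assumes a 1-D lattice and generic feature vectors; the paper's actual features and map topology are not specified in the abstract.<br />

```python
import numpy as np

def train_som(data, n_units=8, epochs=20, seed=0):
    """1-D Kohonen SOM: competitive learning with a neighborhood kernel
    that shrinks over time. The unit weights form the codebook."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_units, data.shape[1]))
    for e in range(epochs):
        lr = 0.5 * (1 - e / epochs)
        radius = max(1.0, n_units / 2 * (1 - e / epochs))
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))  # best match
            dist = np.abs(np.arange(n_units) - bmu)         # lattice dist
            h = np.exp(-(dist ** 2) / (2 * radius ** 2))    # neighborhood
            w += lr * h[:, None] * (x - w)
    return w

def assign(data, w):
    """Map each sample to its best-matching unit (self-categorisation)."""
    return np.array([np.argmin(np.linalg.norm(w - x, axis=1)) for x in data])
```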
15:00-17:10, Paper MoBT8.43<br />
A Novel Shape Feature for Fast Region-Based Pedestrian Recognition<br />
Shahrokni, Ali, Univ. of Reading<br />
Gawley, Darren, Univ. of Adelaide<br />
Ferryman, James, Univ. of Reading<br />
A new class of shape features for region classification and high-level recognition is introduced. The novel Randomised<br />
Region Ray (RRR) features can be used to train binary decision trees for object category classification using an abstract<br />
representation of the scene. In particular, we address the problem of human detection using an over-segmented input image.<br />
We therefore do not rely on pixel values for training, instead we design and train specialised classifiers on the sparse set<br />
of semantic regions which compose the image. Thanks to the abstract nature of the input, the trained classifier has the potential<br />
to be fast and applicable to extreme imagery conditions. We demonstrate and evaluate its performance in people<br />
detection using a pedestrian dataset.<br />
- 48 -
15:00-17:10, Paper MoBT8.44<br />
Road Change Detection from Multi-Spectral Aerial Data<br />
Mancini, Adriano, Univ. Pol. Delle Marche<br />
Frontoni, Emanuele, Univ. Pol. Delle Marche<br />
Zingaretti, Primo, Univ. Pol. Delle Marche<br />
The paper presents a novel approach to automate the Change Detection (CD) problem for the specific task of road extraction.<br />
Manual approaches to CD fail in terms of the time for releasing updated maps; on the contrary, automatic approaches,<br />
based on machine learning and image processing techniques, make it possible to update large areas in a short time with an accuracy<br />
and precision comparable to those obtained by human operators. This work is focused on the road-graph update starting<br />
from aerial, multi-spectral data. Georeferenced ground data, acquired by a GPS and an inertial sensor, are integrated with<br />
aerial data to speed up the change detector. After road extraction by means of a binary AdaBoost classifier, the old road-graph<br />
is updated exploiting a particle filter. In particular, this filter proves very useful for linking (tracking) parts of roads not extracted<br />
by the classifier due to the presence of occlusions (e.g., shadows, trees).<br />
15:00-17:10, Paper MoBT8.45<br />
Object Recognition and Localization via Spatial Instance Embedding<br />
Ikizler Cinbis, Nazli, Boston Univ.<br />
Sclaroff, Stan, Boston Univ.<br />
We propose an approach for improving object recognition and localization using spatial kernels together with instance embedding.<br />
Our approach treats each image as a bag of instances (image features) within a multiple instance learning framework,<br />
where the relative locations of the instances are considered as well as the appearance similarity of the localized image features.<br />
The introduced spatial kernel augments the recognition power of the instance embedding in an intuitive and effective way,<br />
providing increased localization performance. We test our approach over two object datasets and present promising results.<br />
15:00-17:10, Paper MoBT8.46<br />
Co-Recognition of Actions in Video Pairs<br />
Shin, Young Min, Seoul National Univ.<br />
Cho, Minsu, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
In this paper, we present a method that recognizes single or multiple common actions between a pair of video sequences.<br />
We establish an energy function that evaluates geometric and photometric consistency, and solve the action recognition<br />
problem by optimizing the energy function. The proposed stochastic inference algorithm based on the Monte Carlo method<br />
explores the video pair from the local spatio-temporal interest point matches to find the common actions. Our algorithm<br />
works in an unsupervised way without prior knowledge about the type and the number of common actions. Experiments<br />
show that our algorithm produces promising results on single and multiple action recognition.<br />
15:00-17:10, Paper MoBT8.47<br />
Detecting Moving Objects using a Camera on a Moving Platform<br />
Lin, Chung-Ching, Georgia Inst. of Tech.<br />
Wolf, Marilyn, Georgia Inst. of Tech.<br />
This paper proposes a new ego-motion estimation and background/foreground classification method to effectively segment<br />
moving objects from videos captured by a moving camera on a moving platform. Existing methods for moving-camera<br />
detection impose serious constraints. In our approach, an ellipsoid scene shape is applied in the motion model and a complicated<br />
ego-motion estimation formula is derived. A genetic algorithm is introduced to accurately solve the ego-motion parameters.<br />
After motion recovery, the noisy result is refined by motion vector correlation and the foreground is classified by a pixel-level probability<br />
model. Experimental results show that the method achieves significant detection performance without further<br />
restrictions and performs effectively in complex detection environments.<br />
- 49 -
15:00-17:10, Paper MoBT8.48<br />
A Unified Probabilistic Approach to Feature Matching and Object Segmentation<br />
Kim, Tae Hoon, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
Lee, Sang Uk, Seoul National Univ.<br />
This paper deals with feature matching and segmentation of common objects in a pair of images, simultaneously. For the<br />
feature matching problem, the matching likelihoods of all feature correspondences are obtained by combining their discriminative<br />
power with the spatial coherence constraint that favors their spatial aggregation via object segmentation. At<br />
the same time, for the object segmentation problem, our algorithm estimates the object likelihood that each subregion is<br />
a commonly existing part in two images by the affinity propagation of the resulting matching likelihoods. Since these two<br />
problems are related to each other, our main idea to solve them is to integrate all the priors about them into a unified framework<br />
that consists of several correlated quadratic cost functions. Eventually, all matching and object likelihoods are estimated<br />
simultaneously as the solution of a linear system of equations. Based on these likelihoods, we finally recover the optimal<br />
feature matches and the common object parts by imposing simple sequential mapping and thresholding techniques, respectively.<br />
The experiments demonstrate the superiority of our algorithm compared with the conventional methods.<br />
15:00-17:10, Paper MoBT8.49<br />
Automatic Restoration of Scratch in Old Archive<br />
Kim, Kyung-Tai, Konkuk Univ.<br />
Kim, Byunggeun, Konkuk Univ.<br />
Kim, Eun Yi, Konkuk Univ.<br />
This paper presents a scratch restoration method that can deal with scratches of various lengths and widths in old film. The<br />
proposed method consists of detection and reconstruction. The detection is performed using texture and shape properties<br />
of the scratches: first, each pixel is classified as scratch or non-scratch using a neural network (NN)-based texture<br />
classifier, and then some false alarms are removed by shape filtering. Thereafter, the detected region is reconstructed.<br />
Here, the reconstruction is formulated as an energy minimization problem, and a genetic algorithm is used for optimization.<br />
Experimental results on well-known old films show the effectiveness of the proposed method.<br />
15:00-17:10, Paper MoBT8.50<br />
Automatic Building Detection in Aerial Images using a Hierarchical Feature based Image Segmentation<br />
Izadi, Mohammad, Simon Fraser Univ.<br />
Saeedi, Parvaneh, Simon Fraser Univ.<br />
This paper introduces a novel automatic building detection method for aerial images. The proposed method incorporates<br />
a hierarchical multilayer feature based image segmentation technique using color. A number of geometrical/regional attributes<br />
are defined to identify potential regions in multiple layers of segmented images. A tree-based mechanism is utilized<br />
to inspect segmented regions using their spatial relationships with each other and their regional/geometrical characteristics.<br />
This process allows the creation of a set of candidate regions that are validated as rooftops based on the overlap between<br />
existing and predicted shadows of each region according to the image acquisition information. Experimental results show<br />
an overall shape accuracy and completeness of 96%.<br />
15:00-17:10, Paper MoBT8.51<br />
Making Visual Object Categorization More Challenging: Randomized Caltech-101 Data Set<br />
Kinnunen, Juha Teemu Ensio, Lappeenranta Univ. of Tech.<br />
Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />
Lensu, Lasse, Lappeenranta Univ. of Tech.<br />
Lankinen, Jukka, Lappeenranta Univ. of Tech.<br />
Kalviainen, Heikki, Lappeenranta Univ. of Tech.<br />
Visual object categorization is one of the most active research topics in computer vision, and Caltech-101 data set is one<br />
of the standard benchmarks for evaluating method performance. Despite its wide use, the data set has certain weaknesses:<br />
i) the objects are practically in a standard pose and scale in the middle of the images, and ii) the background varies too<br />
little in certain categories making it more discriminative than the foreground objects. In this work, we demonstrate how<br />
these weaknesses bias the evaluation results in an undesired manner. In addition, we reduce the bias effect by replacing<br />
- 50 -
the backgrounds with random landscape images from Google and by applying random Euclidean transformations to the<br />
foreground objects. We demonstrate how the proposed randomization process makes visual object categorization more<br />
challenging, improving the relative results of methods which categorize objects by their visual appearance and are invariant<br />
to pose changes. The new data set is made publicly available for other researchers.<br />
15:00-17:10, Paper MoBT8.52<br />
A Reliability Assessment Paradigm for Automated Video Tracking Systems<br />
Chen, Chung-Hao, North Carolina Central Univ.<br />
Yao, Yi, GE Global Res.<br />
Koschan, Andreas, The Univ. of Tennessee<br />
Abidi, Mongi, The Univ. of Tennessee<br />
Most existing performance evaluation methods concentrate on defining separate metrics over a wide range of conditions<br />
and generating standard benchmarking video sequences for examining the effectiveness of video tracking systems. In<br />
other words, these methods attempt to design a robustness margin or factor for the system. These methods are deterministic,<br />
in that a robustness factor, for example, 2 or 3 times the expected number of subjects to track or the strength of illumination,<br />
would be required in the design. This often results in over-design, thus increasing costs, or under-design, causing<br />
failure by unanticipated factors. In order to overcome these limitations, we propose in this paper an alternative framework<br />
to analyze the physics of the failure process and, through the concept of reliability, determine the time to failure in automated<br />
video tracking systems. The benefit of our proposed framework is that we can provide a unified and statistical index<br />
to evaluate the performance of an automated video tracking system for a task to be performed. At the same time, the uncertainty<br />
problem about a failure process, which may be caused by the system's complexity, imprecise measurements of the<br />
relevant physical constants and variables, or the indeterminate nature of future events, can be addressed accordingly based<br />
on our proposed framework.<br />
15:00-17:10, Paper MoBT8.53<br />
Road Sign Detection in Images: A Case Study<br />
Belaroussi, Rachid, Univ. Paris Est,INRETS-LCPC<br />
Foucher, Philippe, Lab. Des Ponts et Chaussées<br />
Tarel, Jean-Philippe, LCPC<br />
Soheilian, Bahman, Ins. Géographique National,<br />
Charbonnier, Pierre, ERA27 LCPC – LRPC<br />
Paparoditis, Nicolas, Inst. Geographique National<br />
Road sign identification in images is an important issue, in particular for vehicle safety applications. It is usually tackled<br />
in three stages: detection, recognition and tracking, and evaluated as a whole. To progress towards better algorithms, we<br />
focus in this paper on the first stage of the process, namely road sign detection. More specifically, we compare, on the<br />
same ground-truth image database, results obtained by three algorithms that sample different state-of-the-art approaches.<br />
The three tested algorithms: Contour Fitting, Radial Symmetry Transform, and pair-wise voting scheme, all use color and<br />
edge information and are based on geometrical models of road signs. The test dataset is made of 847 images (960x1080) of<br />
complex urban scenes (available at www.itowns.fr/benchmarking.html). They feature 251 road signs of different shapes<br />
(circular, rectangular, triangular), sizes and types. The pros and cons of the three algorithms are discussed, allowing new<br />
research perspectives to be drawn.<br />
15:00-17:10, Paper MoBT8.54<br />
ImageCLEF@ICPR Contest: Challenges, Methodologies and Results of the Photo Annotation Task<br />
Nowak, Stefanie, Fraunhofer Inst. For Digital Media Tech.<br />
The Photo Annotation Task is performed as one task in the ImageCLEF@ICPR contest and poses the challenge to annotate<br />
53 visual concepts in Flickr photos. Altogether 12 research teams met the multilabel classification challenge and submitted<br />
solutions. The participants were provided with a training and a validation set consisting of 5,000 and 3,000 annotated images,<br />
respectively. The test was performed on 10,000 images. Two evaluation paradigms have been applied, the evaluation per<br />
concept and the evaluation per example. The evaluation per concept was performed by calculating the Equal Error Rate and<br />
the Area Under Curve (AUC). The evaluation per example utilizes a recently proposed Ontology Score. For the concepts, an<br />
average AUC of 86.5% could be achieved, including concepts with an AUC of 96%. The classification performance for each<br />
image ranged between 59% and 100% with an average score of 85%.<br />
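Both per-concept measures are standard and easy to state exactly: the AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative (ties counting half), and the Equal Error Rate is the point of a threshold sweep where false-accept and false-reject rates cross. A small reference sketch:<br />

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank statistic: the probability
    that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def equal_error_rate(scores, labels):
    """Sweep thresholds for the point where false-accept (FAR) and
    false-reject (FRR) rates cross; return their mean at the closest gap."""
    negs, poss = labels.count(0), labels.count(1)
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s, l in zip(scores, labels) if l == 0) / negs
        frr = sum(s < t for s, l in zip(scores, labels) if l == 1) / poss
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```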
- 51 -
15:00-17:10, Paper MoBT8.55<br />
Task-Oriented Evaluation of Super-Resolution Techniques<br />
Tian, Li, NTT Corp.<br />
Suzuki, Akira, NTT Cyber Space Lab.<br />
Koike, Hideki, NTT Corp.<br />
The goal of super-resolution (SR) techniques is to enhance the resolution of low-resolution (LR) images. How to evaluate<br />
the performance of an SR algorithm is often overlooked as researchers keep producing new algorithms. This paper presents<br />
a task-oriented method for evaluating SR techniques. Our method includes both objective and subjective measures and is<br />
designed from the viewpoint of how SR impacts many essential image processing and vision tasks. We evaluate some<br />
state-of-the-art SR algorithms and the results suggest that different SR algorithms should be utilized for different applications.<br />
In general, the results reflect the consistency and conflict between objective and subjective measures, as well as between computer<br />
vision systems and human vision systems.<br />
15:00-17:10, Paper MoBT8.56<br />
FeEval – a Dataset for Evaluation of Spatio-Temporal Local Features<br />
Stoettinger, Julian, TU Vienna<br />
Zambanini, Sebastian, TU Vienna<br />
Khan, Rehanullah, TU Vienna<br />
Hanbury, Allan, Information Retrieval Facility<br />
The most successful approaches to video understanding and video matching use local spatio-temporal features as a sparse<br />
representation for video content. Until now, no principled evaluation of these features has been done. We present FeEval,<br />
a dataset for the evaluation of such features. For the first time, this dataset allows for a systematic measurement of the stability<br />
and the invariance of local features in videos. FeEval consists of 30 original videos from a great variety of different<br />
sources, including HDTV shows, 1080p HD movies and surveillance cameras. The videos are iteratively varied by increasing<br />
blur and noise, increasing or decreasing light, median filtering, compression quality, scale and rotation, leading to a total<br />
of 1710 video clips. Homography matrices are provided for geometric transformations. The surveillance videos are taken<br />
from 4 different angles in a calibrated environment. Similar to prior work on 2D images, this leads to a repeatability and<br />
matching measurement in videos for spatio-temporal features, estimating the overlap of features under increasing changes<br />
in the data.<br />
15:00-17:10, Paper MoBT8.57<br />
Performance Evaluation Tools for Zone Segmentation and Classification (PETS)<br />
Seo, Wontaek, Univ. of Maryland<br />
Agrawal, Mudit, Univ. of Maryland<br />
Doermann, David, Univ. of Maryland<br />
This paper describes a set of Performance Evaluation Tools (PETS) for document image zone segmentation and classification.<br />
The tools allow researchers and developers to evaluate, optimize and compare their algorithms by providing a<br />
variety of quantitative performance metrics. The evaluation of segmentation quality is based on the pixel-based overlaps<br />
between two sets of zones proposed by Randriamasy and Vincent. PETS extends the approach by providing a set of metrics<br />
for overlap analysis, RLE and polygonal representations of zones, and introduces type-matching to evaluate zone classification.<br />
The software is available for research use.<br />
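At its core, pixel-based zone overlap is set arithmetic on zone pixel sets. The simplified sketch below scores each ground-truth zone by its best intersection-over-union among detections; PETS itself implements the fuller Randriamasy-Vincent analysis plus type-matching, which this does not reproduce.<br />

```python
def zone_overlap_scores(gt_zones, det_zones):
    """Pixel-based overlap between ground-truth and detected zones.

    Zones are sets of (x, y) pixel coordinates; each ground-truth zone is
    scored by its best intersection-over-union among all detections."""
    scores = []
    for g in gt_zones:
        best = 0.0
        for d in det_zones:
            union = len(g | d)
            if union:
                best = max(best, len(g & d) / union)
        scores.append(best)
    return scores
```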
MoBT9 Upper Foyer<br />
Feature Extraction; Classification; Clustering; Bayesian Methods Poster Session<br />
Session chair: Pietikäinen, Matti (Univ of Oulu)<br />
15:00-17:10, Paper MoBT9.1<br />
Shape Filling Rate for Silhouette Representation and Recognition<br />
An, Guocheng, Chinese Acad. of Sciences<br />
Zhang, Fengjun, Chinese Acad. of Sciences<br />
Wang, Hong’An, Chinese Acad. of Sciences<br />
Dai, Guozhong, Chinese Acad. of Sciences<br />
- 52 -
Research on complex shape recognition has shown that the shape context algorithm is sensitive to relative position variation<br />
of articulation. To address this problem, a shape recognition method is proposed based on the local shape filling rate of various<br />
object silhouettes. We take each landmark point as a circle center and vary its radius. Then, under a particular radius, the<br />
ratio between the covered silhouette pixels and the total pixels is defined as the local shape filling rate. Thus, different radii<br />
may form different local shape filling rates. All landmark points with different radii constitute a characteristic matrix<br />
which effectively reflects the overall statistical properties of the object shape. Experiments on a variety of shape databases<br />
show that the novel method is insensitive to articulation and little influenced by the number of landmark points, so our algorithm<br />
has strong power in describing object details.<br />
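The characteristic matrix described above can be written down directly: one row per landmark, one column per radius, each entry the fraction of disc pixels covered by the silhouette. In this sketch discs are clipped at the image border, an assumption the abstract does not settle.<br />

```python
import numpy as np

def shape_filling_rates(silhouette, landmarks, radii):
    """Characteristic matrix of local shape filling rates.

    silhouette: binary 2-D array; landmarks: list of (row, col) centers;
    radii: list of disc radii. Entry (i, j) is the fraction of pixels
    inside disc j around landmark i that belong to the silhouette."""
    h, w = silhouette.shape
    yy, xx = np.mgrid[0:h, 0:w]
    M = np.zeros((len(landmarks), len(radii)))
    for i, (cy, cx) in enumerate(landmarks):
        d2 = (yy - cy) ** 2 + (xx - cx) ** 2
        for j, r in enumerate(radii):
            disc = d2 <= r * r          # disc clipped at the image border
            M[i, j] = silhouette[disc].sum() / disc.sum()
    return M
```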
15:00-17:10, Paper MoBT9.2<br />
Learning GMM using Elliptically Contoured Distributions<br />
Li, Bo, Beijing Inst. of Tech.<br />
Liu, Wenju, Chinese Acad. of Sciences<br />
Dou, Lihua, Beijing Inst. of Tech.<br />
Model order selection and parameter estimation for Gaussian mixture model (GMM) are important issues for clustering<br />
analysis and density estimation. Most methods for model selection usually add a penalty term in the objective function<br />
that can penalize the models and choose an optimal one from a set of candidate models. This paper presents a simple and<br />
novel approach to determine the number of components and simultaneously estimate the parameters for GMM. By introducing<br />
the degenerating model, the proposed approach overcomes the drawback that the likelihood estimate is a non-decreasing<br />
function and cannot be used to select the number of components. The degenerating model is a more general form<br />
of the mixture component density and it can degenerate into the component density or a crater-like density when its parameter<br />
K varies from 1 to a larger value. The likelihood of the crater-like density evaluated for the training data approaches<br />
zero. This characteristic of the degenerating model forms the foundation of the proposed approach. The experimental<br />
results show a robust and evident performance improvement of the approach.<br />
15:00-17:10, Paper MoBT9.3<br />
FIND: A Neat Flip Invariant Descriptor<br />
Guo, Xiaojie, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
In this paper, we introduce a novel Flip Invariant Descriptor (FIND). FIND improves the degraded performance resulting<br />
from image flips and reduces both space and time costs. The flip invariance of FIND enables the otherwise intractable flip detection to<br />
be achieved easily, instead of duplicating the procedure. To alleviate the pressure brought by the increasing<br />
scale of image and video data, FIND utilizes a concise structure with less storage space. Compared to SIFT, FIND reduces<br />
descriptor length by 35.94%. We compare FIND against SIFT with respect to accuracy, speed and space cost. An application<br />
to image search over a database of 3.27 million descriptors is also shown.<br />
15:00-17:10, Paper MoBT9.4<br />
Matching Image with Multiple Local Features<br />
Cao, Yudong, Beijing Univ. of Posts and Telecommunications/ Liaoning Univ. of Tech<br />
Zhang, Honggang, Beijing Univ. of Posts and Telecommunications<br />
Gao, Yanyan, Beijing Univ. of Posts and Telecommunications<br />
Xu, Xiaojun, Beijing Univ. of Posts and Telecommunications<br />
Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />
In this paper, we present a fused feature composed of Affine-SIFT, MSER and color moment invariants. The fused<br />
feature is more robust and distinctive than any single local feature. Instead of simply adding the three local features together, an<br />
efficient two-level matching strategy is devised for the fused feature, which speeds up the establishment of local<br />
correspondences. To remove some false positives, an affine transformation is estimated with a weighted RANSAC<br />
that reduces the number of iterations. The experimental results show that our approach achieves more accurate correspondences.<br />
Finally, we discuss the prospect of applying the fused feature and matching strategy to image retrieval.<br />
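The weighted RANSAC variant is not specified in the abstract. As a point of reference, a minimal plain-RANSAC estimator of a 2D affine transform from point correspondences (a generic sketch, not the authors' method; all parameter choices are hypothetical) might look like:

```python
import numpy as np

def fit_affine(src, dst):
    # Solve dst ~ [x y 1] @ M by least squares; M is 3x2.
    X = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M

def ransac_affine(src, dst, n_iter=200, thresh=1.0, seed=None):
    """Plain RANSAC: repeatedly fit an affine map to 3 random
    correspondences and keep the model with the most inliers."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(X @ M - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best model.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

A weighted variant would bias the minimal-sample selection or the inlier count by match quality, which is what reduces the required number of iterations.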
15:00-17:10, Paper MoBT9.5<br />
Lipreading: A Graph Embedding Approach<br />
Zhou, Ziheng, Univ. of Oulu<br />
Zhao, Guoying, Univ. of Oulu<br />
Pietikäinen, Matti, Univ. of Oulu<br />
In this paper, we propose a novel graph embedding method for the problem of lipreading. To characterize the temporal<br />
connections among video frames of the same utterance, a new distance metric is defined on a pair of frames and graphs<br />
are constructed to represent the video dynamics based on the distances between frames. Audio information is used to assist<br />
in calculating such distances. For each utterance, a subspace of the visual feature space is learned from a well-defined intrinsic<br />
and penalty graph within a graph-embedding framework. Video dynamics are found to be well preserved along<br />
some dimensions of the subspace. Discriminatory cues are then decoded from curves of the projected visual features to<br />
classify different utterances.<br />
15:00-17:10, Paper MoBT9.6<br />
Face Recognition using a Multi-Manifold Discriminant Analysis Method<br />
Yang, Wankou, Southeast Univ. Nanjing<br />
Sun, Changyin, Southeast Univ. Nanjing<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
In this paper, we propose a Multi-Manifold Discriminant Analysis (MMDA) method for face feature extraction and face<br />
recognition, based on graph-embedded learning under the Fisher discriminant analysis framework. In MMDA,<br />
a within-class graph and a between-class graph are designed to characterize the within-class compactness and the between-class<br />
separability, respectively, seeking the discriminant matrix that simultaneously maximizes the between-class<br />
scatter and minimizes the within-class scatter. In addition, the within-class graph can also represent the sub-manifold<br />
information and the between-class graph the multi-manifold information. The proposed MMDA is evaluated<br />
on the FERET face database, and the experimental results demonstrate that MMDA works well in feature<br />
extraction and leads to good recognition performance.<br />
15:00-17:10, Paper MoBT9.7<br />
Globally-Preserving based Locally Linear Embedding<br />
Hui, Kanghua, Chinese Acad. of Sciences<br />
Wang, Chunheng, Chinese Acad. of Sciences<br />
Xiao, Baihua, Chinese Acad. of Sciences<br />
The locally linear embedding (LLE) algorithm is a powerful method for nonlinear dimensionality<br />
reduction. In this paper, a new method called globally-preserving based LLE (GPLLE) is proposed. It not only<br />
preserves the local neighborhood, but also keeps distant samples far apart. This solves a problem that LLE<br />
may encounter: LLE preserves only the local neighborhood and cannot prevent distant samples from drawing near.<br />
Moreover, GPLLE can estimate the intrinsic dimensionality d of the manifold structure. The experimental results show that<br />
GPLLE consistently achieves better classification performance than LLE based on the estimated d.<br />
15:00-17:10, Paper MoBT9.8<br />
3d Human Pose Estimation by an Annealed Two-Stage Inference Method<br />
Wang, Yuan-Kai, Fu Jen Univ.<br />
Cheng, Kuang-You, Fu Jen Univ.<br />
This paper proposes a novel human motion capture method that locates human body joint position and reconstructs the<br />
human pose in 3D space from monocular images. We propose a two-stage framework including 2D and 3D probabilistic<br />
graphical models which can solve the occlusion problem for the estimation of human joint positions. The 2D and 3D<br />
models adopt directed acyclic structure to avoid error propagation of inference in the models. Both the 2D and 3D models<br />
utilize the Expectation-Maximization algorithm to learn prior distributions of the models. An annealed Gibbs sampling<br />
method is proposed for the two-stage framework to infer the maximum a posteriori distributions of joint positions. The annealing<br />
process can efficiently explore the modes of the distributions and find solutions in high-dimensional space. Experiments<br />
are conducted on the HumanEva dataset to show the effectiveness of the proposed method. The experimental data are<br />
image sequences of walking motion with a full 180° turn around a region, which causes occlusion of poses and loss of<br />
image observations. Experimental results show that the proposed two-stage approach can efficiently estimate more accurate<br />
human poses from monocular images.<br />
15:00-17:10, Paper MoBT9.9<br />
Extended Locality Preserving Discriminant Analysis for Face Recognition<br />
Yang, Liping, Chongqing Univ.<br />
Gong, Weiguo, Chongqing Univ.<br />
Gu, Xiaohua, Chongqing Univ.<br />
In this paper, an extended locality preserving discriminant analysis (ELPDA) method is proposed. To address the disadvantages<br />
of original locality preserving discriminant analysis (LPDA), a new locality preserving between-class scatter,<br />
which is characterized by each sample and its k nearest out-of-class neighbors, is defined. Moreover, the small<br />
sample size problem is also avoided by solving a new optimization function. Experimental results on AR and FERET subsets<br />
illustrate the effectiveness of the proposed method for face recognition.<br />
15:00-17:10, Paper MoBT9.10<br />
Beyond “Near-Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval<br />
Baluja, Shumeet, Google, Inc.<br />
Covell, Michele, Google, Inc.<br />
Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we<br />
present a two-tier similar-image retrieval system with the efficiency characteristics found in simpler systems designed to<br />
recognize near-duplicates. We compare the efficiency of lookups based on random projections and learned hashes to<br />
100-times-more-frequent exemplar sampling. Both approaches significantly improve on the results from exemplar sampling,<br />
despite having significantly lower computational costs. Learned-hash keys provide the best result, in terms of both recall<br />
and efficiency.<br />
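The paper's learned hash functions are not described here, but the random-projection baseline it compares against can be sketched with the standard sign-of-random-projection hashing construction (a generic illustration, not the authors' code):

```python
import numpy as np

def hash_codes(X, n_bits=32, seed=0):
    """Random-hyperplane LSH: each bit is the sign of one random
    projection, so nearby vectors tend to share most bits."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two hash codes.
    return int(np.sum(a != b))
```

Similar images then map to codes at small Hamming distance, so lookups reduce to cheap bit comparisons rather than full feature-space search.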
15:00-17:10, Paper MoBT9.11<br />
Rare Class Classification on SVM<br />
He, He, The Hong Kong Pol. Univ.<br />
Ghodsi, Ali, University of Waterloo<br />
The problem of classification on highly imbalanced datasets has been studied extensively in the literature. Most classifiers<br />
show significant deterioration in performance when dealing with skewed datasets. In this paper, we first examine the underlying<br />
reasons for SVM’s deterioration on imbalanced datasets. We then propose two modifications for the soft margin<br />
SVM, where we change or add constraints to the optimization problem. The proposed methods are compared with regular<br />
SVM, cost-sensitive SVM and two re-sampling methods. Our experimental results demonstrate that the constrained SVM<br />
consistently outperforms the competing methods.<br />
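The abstract does not detail the two proposed constraint modifications, but the cost-sensitive baseline it compares against can be illustrated with a per-class weighted hinge loss trained by subgradient descent (a minimal generic sketch, not the paper's formulation; all parameters are illustrative):

```python
import numpy as np

def weighted_linear_svm(X, y, class_cost, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM with per-class misclassification costs,
    trained by subgradient descent on
        lam * ||w||^2 + mean_i c_i * max(0, 1 - y_i (w.x_i + b)).
    y takes values in {-1, +1}; class_cost maps label -> cost."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    c = np.array([class_cost[int(yi)] for yi in y])
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0          # points inside the margin
        gw = 2 * lam * w - (c[active, None] * y[active, None] * X[active]).sum(0) / n
        gb = -(c[active] * y[active]).sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Giving the rare class a larger cost pushes the decision boundary away from it, which is the usual remedy for the imbalance-induced deterioration the abstract describes.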
15:00-17:10, Paper MoBT9.12<br />
Package Boosting for Readaption of Cascaded Classifiers<br />
Szczot, Magdalena, Daimler AG<br />
Löhlein, Otto, Daimler AG<br />
Forster, Julian, Daimler AG<br />
Palm, Günther, Univ. of Ulm<br />
This contribution presents an efficient and useful way to readapt a cascaded classifier. We introduce Package Boosting<br />
which combines the advantages of Real Adaboost and Online Boosting for the realization of the strong learners in each<br />
cascade layer. We also examine the conditions which need to be fulfilled by a cascade in order to meet the requirements<br />
of an online algorithm and present the evaluation results of the system.<br />
15:00-17:10, Paper MoBT9.13<br />
Baby-Posture Classification from Pressure-Sensor Data<br />
Boughorbel, Sabri, Philips Res. Lab.<br />
Bruekers, Fons, Philips Res. Lab.<br />
Breebaart, Jeroen, Philips Res. Lab.<br />
The activity, and more specifically the posture, of babies is an important aspect of their safety and development.<br />
In this paper, we study the automatic classification of baby posture using a pressure-sensitive mat. The posture classification<br />
problem is formulated as the design of features that describe the pressure patterns induced by the child, in combination<br />
with generic classifiers. Novel rotation-invariant features are constructed from high-order statistics computed over concentric<br />
rings around the center of gravity. Non-constant ring radii are used to ensure uniform cell areas and therefore<br />
equal importance of features. A vote fusion of various generic classifiers is used for classification. Temporal information<br />
was shown to improve the classification performance. The obtained results are promising and open new opportunities for<br />
applications and further research in the area of baby safety and development.<br />
15:00-17:10, Paper MoBT9.14<br />
Vector Quantization Mappings for Speaker Verification<br />
Brew, Anthony, Univ. Coll. Dublin<br />
Cunningham, Pádraig, Univ. Coll. Dublin<br />
In speaker verification several techniques have emerged to map variable length utterances into a fixed dimensional space<br />
for classification. One popular approach uses Maximum A-Posteriori (MAP) adaptation of a Gaussian Mixture Model<br />
(GMM) to create a super-vector. This paper investigates using Vector Quantisation (VQ) as the global model to provide a<br />
similar mapping. This less computationally complex mapping gives comparable results to its GMM counterpart while<br />
also providing the ability for an efficient iterative update enabling media files to be scanned with a fixed length window.<br />
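As a rough illustration of the VQ mapping idea, a k-means codebook can map a variable-length sequence of feature frames to a fixed-length codeword histogram (one standard construction, assumed here for illustration rather than taken from the paper):

```python
import numpy as np

def train_codebook(frames, k=8, n_iter=25, seed=0):
    """Lloyd's k-means over all training frames; the centers are the
    VQ codebook (the 'global model')."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(frames[:, None] - centers[None], axis=2)
        assign = d.argmin(1)
        for j in range(k):
            pts = frames[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def utterance_vector(frames, centers):
    """Map a variable-length utterance to a fixed-length, normalized
    histogram of nearest-codeword counts."""
    d = np.linalg.norm(frames[:, None] - centers[None], axis=2)
    hist = np.bincount(d.argmin(1), minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Because the histogram can be updated one frame at a time, this kind of mapping supports the efficient sliding-window scanning the abstract mentions.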
15:00-17:10, Paper MoBT9.15<br />
Maximum Entropy Model based Classification with Feature Selection<br />
Dukkipati, Ambedkar, Indian Inst. of Science<br />
Yadav, Abhay Kumar, Indian Inst. of Science<br />
M, Narasimha Murty, Indian Inst. of Science<br />
In this paper, we propose a classification algorithm based on the maximum entropy principle. This algorithm finds the<br />
most appropriate class-conditional maximum entropy distributions for classification. No prior knowledge about the form<br />
of the density function for estimating the class-conditional density is assumed, except that information is given in the form<br />
of expected values of features. This algorithm also incorporates a method to select relevant features for classification. The<br />
proposed algorithm is suitable for large data-sets and is demonstrated by simulation results on some real world benchmark<br />
data-sets.<br />
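Under expected-value feature constraints, the maximum entropy distribution takes an exponential-family form, and for discriminative classification this is equivalent to multinomial logistic regression. A minimal sketch of that equivalence (not the authors' algorithm, and without their feature-selection step):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, y, n_classes, lr=0.5, epochs=300):
    """Gradient descent on the multinomial logistic (maxent) likelihood.
    At the optimum, model feature expectations match the empirical ones.
    Append a constant column to X if a bias term is needed."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]               # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n        # gradient of cross-entropy
    return W
```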
15:00-17:10, Paper MoBT9.16<br />
Dimensionality Reduction by Minimal Distance Maximization<br />
Xu, Bo, Chinese Acad. of Sciences<br />
Huang, Kaizhu, Chinese Acad. of Sciences<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
In this paper, we propose a novel discriminant analysis method, called Minimal Distance Maximization (MDM). In contrast<br />
to the traditional LDA, which actually maximizes the average divergence among classes, MDM attempts to find a low-dimensional<br />
subspace that maximizes the minimal (worst-case) divergence among classes. This “minimal” setting solves<br />
the problem caused by the “average” setting of LDA that tends to merge similar classes with smaller divergence when<br />
used for multi-class data. Furthermore, we elegantly formulate the worst-case problem as a convex problem, making the<br />
algorithm solvable for larger data sets. Experimental results demonstrate the advantages of our proposed method against<br />
five other competitive approaches on one synthetic and six real-life data sets.<br />
15:00-17:10, Paper MoBT9.17<br />
Possibilistic Clustering based on Robust Modeling of Finite Generalized Dirichlet Mixture<br />
Ben Ismail, Maher, Univ. of Louisville<br />
Frigui, Hichem, Univ. of Louisville<br />
We propose a novel possibilistic clustering algorithm based on robust modelling of the Generalized Dirichlet (GD) finite<br />
mixture. The algorithm generates two types of membership degrees. The first one is a posterior probability that indicates<br />
the degree to which the point fits the estimated distribution. The second membership represents the degree of typicality<br />
and is used to identify and discard noise points. The algorithm minimizes a single objective function to optimize the GD mixture<br />
parameters and possibilistic membership values. This optimization is done iteratively by dynamically updating the Dirichlet<br />
mixture parameters and the membership values in each iteration. We compare the performance of the proposed algorithm<br />
with an EM based approach. We show that the possibilistic approach is more robust.<br />
15:00-17:10, Paper MoBT9.18<br />
Cluster-Pairwise Discriminant Analysis<br />
Makihara, Yasushi, The Inst. of Scientific and Industrial Res. Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
Pattern recognition problems often suffer from large intra-class variation caused by situational factors such as pose,<br />
walking speed, and clothing in gait recognition. This paper describes a method of discriminant subspace analysis<br />
focused on situation cluster pairs. In the training phase, both a situation cluster discriminant subspace and class discriminant<br />
subspaces for each situation cluster pair are learned using training samples of non-recognition-target classes. In the testing phase, given<br />
a matching pair of patterns of recognition-target classes, the posterior over situation cluster pairs is estimated first, and then<br />
the distance is calculated in the corresponding cluster-pairwise class discriminant subspace. Experiments with both<br />
simulated and real data show the effectiveness of the proposed method.<br />
15:00-17:10, Paper MoBT9.19<br />
Online Discriminative Kernel Density Estimation<br />
Kristan, Matej, Univ. of Ljubljana<br />
Leonardis, Ales, Univ. of Ljubljana<br />
We propose a new method for online estimation of probabilistic discriminative models. The method is based on the recently<br />
proposed online Kernel Density Estimation (oKDE) framework which produces Gaussian mixture models and allows<br />
adaptation using only a single data point at a time. The oKDE builds reconstructive models from the data, and we extend<br />
it to take into account the interclass discrimination through a new distance function between the classifiers. We arrive at<br />
an online discriminative Kernel Density Estimator (odKDE). We compare the odKDE to oKDE, batch state-of-the-art<br />
KDEs and support vector machine (SVM) on a standard database. The odKDE achieves comparable classification performance<br />
to that of best batch KDEs and SVM, while allowing online adaptation, and produces models of lower complexity<br />
than the oKDE.<br />
15:00-17:10, Paper MoBT9.20<br />
Local Outlier Detection based on Kernel Regression<br />
Gao, Jun, Chinese Acad. of Sciences<br />
Hu, Weiming, Chinese Acad. of Sciences<br />
Li, Wei, Chinese Acad. of Sciences<br />
Zhang, Zhongfei, State Univ. of New York, Binghamton<br />
Wu, Ou, Chinese Acad. of Sciences<br />
Outlier detection remains an important and attractive task in knowledge discovery in databases. In this paper, a novel<br />
approach named Multi-scale Local Kernel Regression is proposed. It recasts the unsupervised outlier detection problem<br />
as classical non-parametric regression. After preprocessing the original data with a basic local density-based<br />
method, it applies a local kernel regression estimator over multi-scale neighborhoods to determine outliers. Experiments<br />
on several real-life data sets demonstrate that this approach is promising in detection performance.<br />
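The multi-scale method itself is not given in the abstract, but its kernel-regression ingredient can be illustrated with leave-one-out Nadaraya-Watson residuals: a point that its neighbors predict poorly is a candidate outlier (an illustrative sketch with a Gaussian kernel, not the paper's estimator):

```python
import numpy as np

def loo_nw_residuals(x, y, bandwidth=0.3):
    """Leave-one-out Nadaraya-Watson regression residuals: each point's
    value is predicted from all other points, weighted by a Gaussian
    kernel on distance; a large residual flags a candidate outlier."""
    diffs = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    np.fill_diagonal(K, 0.0)                 # exclude the point itself
    y_hat = (K @ y) / K.sum(axis=1)          # local weighted average
    return np.abs(y - y_hat)
```

Running the same estimator at several bandwidths is one way to realize the "multiple scale neighborhoods" idea.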
15:00-17:10, Paper MoBT9.21<br />
Verification under Increasing Dimensionality<br />
Hendrikse, Anne, Univ. of Twente<br />
Veldhuis, Raymond, Univ. of Twente<br />
Spreeuwers, Luuk, Univ. of Twente<br />
Verification decisions are often based on second order statistics estimated from a set of samples. Ongoing growth of computational<br />
resources allows for considering more and more features, increasing the dimensionality of the samples. If the<br />
dimensionality is of the same order as the number of samples used in the estimation or even higher, then the accuracy of<br />
the estimate decreases significantly. In particular, the eigenvalues of the covariance matrix are estimated with a bias, and<br />
the estimates of the eigenvectors differ considerably from the real eigenvectors. We show how a classical approach to verification<br />
in high dimensions is severely affected by these problems, and we show how bias correction methods can reduce<br />
these problems.<br />
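The eigenvalue bias described above is easy to reproduce: with the true covariance equal to the identity, every true eigenvalue is 1, yet the sample spectrum spreads widely once the dimensionality approaches the sample count. A small demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 120                     # dimensionality close to sample count
X = rng.standard_normal((n, d))     # true covariance is the identity
S = np.cov(X, rowvar=False)         # sample covariance estimate
eig = np.linalg.eigvalsh(S)
# All true eigenvalues are 1, but the estimates are strongly biased:
print(eig.min(), eig.max())
```

The spread follows the Marchenko-Pastur law; bias-correction methods of the kind the abstract mentions aim to shrink the spectrum back toward its true values.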
15:00-17:10, Paper MoBT9.22<br />
Discriminant Feature Manifold for Facial Aging Estimation<br />
Fang, Hui, Swansea Univ.<br />
Grant, Phil, Swansea Univ.<br />
Min, Chen, Swansea Univ.<br />
Computerised facial aging estimation, which has the potential for many applications in human-computer interactions, has<br />
been investigated by many computer vision researchers in recent years. In this paper, a feature-based discriminant subspace<br />
is proposed to extract more discriminating and robust representations for aging estimation. After aligning all the faces by<br />
a piece-wise affine transform, orthogonal locality preserving projection (OLPP) is employed to project local binary patterns<br />
(LBP) from the faces into an age-discriminant subspace. The feature extracted from this manifold is more distinctive for<br />
age estimation than the features used in state-of-the-art methods. Based on the public database FG-NET,<br />
the performance of the proposed feature is evaluated by using two different regression techniques, quadratic function and<br />
neural-network regression. The proposed feature subspace achieves the best performance based on both types of regression.<br />
15:00-17:10, Paper MoBT9.23<br />
Tensor Voting based Color Clustering<br />
Nguyen Dinh, Toan, Chonnam National Univ.<br />
Park, Jonghyun, Chonnam National Univ.<br />
Lee, Chilwoo, Chonnam National Univ.<br />
Lee, Gueesang, Chonnam National Univ.<br />
A novel color clustering algorithm based on tensor voting is proposed. Each color feature vector is encoded by a second<br />
order tensor. Tensor voting is then applied to estimate the number of dominant colors and perform color clustering by exploiting<br />
the shape and data density of the color clusters. The experimental results show that the proposed method generates<br />
good results in image segmentation, especially for images containing multi-colored text.<br />
15:00-17:10, Paper MoBT9.24<br />
An Improved Structural EM to Learn Dynamic Bayesian Nets<br />
De Campos, Cassio, Dalle Molle Inst. For Artificial Intelligence<br />
Zeng, Zhi, Rensselaer Pol. Inst.<br />
Ji, Qiang, RPI<br />
This paper addresses the problem of learning structure of Bayesian and Dynamic Bayesian networks from incomplete<br />
data based on the Bayesian Information Criterion. We describe a procedure to map the problem of the dynamic case into<br />
a corresponding augmented Bayesian network through the use of structural constraints. Because the algorithm is exact<br />
and anytime, it is well suited to a structural Expectation–Maximization (EM) method where the only source of approximation<br />
is the EM itself. We show empirically that using a global maximizer inside the structural EM is computationally<br />
feasible and leads to more accurate models.<br />
15:00-17:10, Paper MoBT9.25<br />
Gaussian Process Learning from Order Relationships using Expectation Propagation<br />
Wang, Ruixuan, Univ. of Dundee<br />
Mckenna, Stephen James, Univ. of Dundee<br />
A method for Gaussian process learning of a scalar function from a set of pair-wise order relationships is presented. Expectation<br />
propagation is used to obtain an approximation to the log marginal likelihood which is optimised using an analytical<br />
expression for its gradient. Experimental results show that the proposed method performs well compared with a<br />
previous method for Gaussian process preference learning.<br />
15:00-17:10, Paper MoBT9.26<br />
Feature Ranking based on Decision Border<br />
Diamantini, Claudia, Univ. Pol. Delle Marche<br />
Gemelli, Alberto, Univ. Pol. Delle Marche<br />
Potena, Domenico, Univ. Pol. Delle Marche<br />
In this paper a Feature Ranking algorithm for classification is proposed, which is based on the notion of Bayes decision<br />
border. The method elaborates upon the results of the Decision Border Feature Extraction approach, exploiting properties<br />
of eigenvalues and eigenvectors of the orthogonal transformation to calculate the discriminative importance weights of<br />
the original features. Non-parametric classification is also considered by resorting to Labeled Vector Quantizer neural<br />
networks trained with the BVQ algorithm. The choice of this architecture leads to a cheap implementation of the ranking algorithm,<br />
which we call BVQ-FR. The effectiveness of BVQ-FR is tested on real datasets. The novelty of the method is to use a<br />
feature extraction technique to assess the weights of the original features, as opposed to the heuristic methods commonly used.<br />
15:00-17:10, Paper MoBT9.27<br />
Three-Layer Spatial Sparse Coding for Image Classification<br />
Dai, Dengxin, Wuhan Univ.<br />
Yang, Wen, Wuhan Univ.<br />
Wu, Tianfu, Lotus Hill Res. Inst.<br />
In this paper, we propose a three-layer spatial sparse coding (TSSC) method for image classification, with three objectives:<br />
recognizing image categories without a learning phase, naturally incorporating the spatial configuration of images, and<br />
counteracting intra-class variance. The method begins by representing the test images in a spatial pyramid<br />
as the signals to be recovered, and taking all image patches sampled at multiple scales from the labeled images as the<br />
bases. Then, three sets of coefficients are introduced into the standard sparse coding to obtain the TSSC: one to penalize spatial<br />
inconsistencies between the pyramid cells and the corresponding selected bases, one to guarantee the sparsity of selected images,<br />
and the other to guarantee the sparsity of selected categories. Finally, the test images are classified according to a simple<br />
image-to-category similarity defined on the coding coefficients. In experiments, we test our method on two publicly available<br />
datasets and achieve significantly more accurate results than the conventional sparse coding with only a modest increase<br />
in computational complexity.<br />
15:00-17:10, Paper MoBT9.28<br />
Theoretical Analysis of a Performance Measure for Imbalanced Data<br />
Garcia, Vicente, Univ. Jaume I<br />
Mollineda, Ramón A., Univ. Jaume I<br />
Sanchez, J. Salvador, Univ. Jaume I<br />
This paper analyzes a generalization of a new metric to evaluate the classification performance in imbalanced domains,<br />
combining some estimate of the overall accuracy with a plain index about how dominant the class with the highest individual<br />
accuracy is. A theoretical analysis shows the merits of this metric when compared to other well-known measures.<br />
15:00-17:10, Paper MoBT9.29<br />
Cluster Preserving Embedding<br />
Zhan, Yubin, National Univ. of Defense Tech.<br />
Yin, Jianping, National Univ. of Defense Tech.<br />
Most existing dimensionality reduction methods obtain the low-dimensional embedding by preserving a certain property<br />
of the data, such as locality or neighborhood relationships. However, the intrinsic cluster structure of the data, which plays a key<br />
role in analyzing and utilizing the data, has been ignored by state-of-the-art dimensionality reduction methods. Hence,<br />
in this paper we propose a novel dimensionality reduction method called Cluster Preserving Embedding (CPE), in which<br />
the cluster structure of the original data is preserved via the robust path-based similarity between pairwise points.<br />
We present two different methods to preserve this similarity. One is the Multidimensional Scaling (MDS) way, which tries<br />
to preserve the similarity matrix accurately; the other is a Laplacian-style way, which preserves the topological partial<br />
order of the similarities rather than the similarities themselves. Encouraging experimental results on a toy data set and handwritten<br />
digits from MNIST database demonstrate the effectiveness of our Cluster Preserving Embedding method.<br />
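The MDS route mentioned above can be illustrated with classical (Torgerson) MDS, which recovers an embedding from a distance matrix via the eigendecomposition of the double-centered Gram matrix (a generic sketch using plain Euclidean distances, not the paper's path-based similarity):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: given a matrix of pairwise distances D, recover
    coordinates whose distances approximate D, via the top eigenvectors
    of the double-centered Gram matrix B = -0.5 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]          # largest eigenvalues first
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

When D is truly Euclidean and of rank at most `dim`, the recovery is exact up to rotation and translation.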
15:00-17:10, Paper MoBT9.30<br />
Color Image Analysis by Quaternion Zernike Moments<br />
Chen, Beijing, Southeast Univ.<br />
Shu, Huazhong, Southeast Univ.<br />
Zhang, Hui, Southeast Univ.<br />
Chen, Gang, Southeast Univ.<br />
Luo, Limin, Southeast Univ.<br />
Moments and moment invariants are useful tools in pattern recognition and image analysis. Conventional methods to deal<br />
with color images are based on RGB decomposition or graying. In this paper, by using the theory of quaternions, we introduce<br />
a set of quaternion Zernike moments (QZMs) for color images in a holistic manner. It is shown that the QZMs<br />
can be obtained via the conventional Zernike moments of each channel. We also construct a set of combined invariants to<br />
rotation and translation (RT) using the modulus of central QZMs. Experimental results show that the proposed descriptors<br />
are more efficient than the existing ones.<br />
15:00-17:10, Paper MoBT9.32<br />
Topic-Sensitive Tag Ranking<br />
Jin, Yan’An, Huazhong Univ. of Science and Tech.<br />
Li, Ruixuan, Huazhong Univ. of Science and Tech.<br />
Lu, Zhengding, Huazhong Univ. of Science and Tech.<br />
Wen, Kunmei, Huazhong Univ. of Science and Tech.<br />
Gu, Xiwu, Huazhong Univ. of Science and Tech.<br />
Social tagging is an increasingly popular way to describe and classify documents on the web. However, the quality of the<br />
tags varies considerably since the tags are authored freely. How to rate the tags becomes an important issue. In this paper,<br />
we propose a topic-sensitive tag ranking (TSTR) approach to rate the tags on the web. We employ a generative probabilistic<br />
model to associate each tag with a distribution of topics. Then we construct a tag graph according to the co-tag relationships<br />
and perform a topic-level random walk over the graph to suggest a ranking score for each tag at different topics. Experimental<br />
results validate the effectiveness of the proposed tag ranking approach.<br />
15:00-17:10, Paper MoBT9.33<br />
Water Reflection Detection using a Flip Invariant Shape Detector<br />
Zhang, Hua, Tianjin Univ.<br />
Guo, Xiaojie, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
Water reflection detection is a tough task in computer vision, since the reflection is distorted by ripples irregularly. This<br />
paper proposes an effective method to detect water reflections. We introduce a descriptor that is not only invariant to<br />
scales, rotations and affine transformations, but also tolerant to the flip transformation and even non-rigid distortions, such<br />
as ripple effects. We analyze the structure of our descriptor and show how it outperforms the existing mirror feature descriptors<br />
in the context of water reflection. The experimental results demonstrate that our method is able to detect the<br />
water reflections.<br />
15:00-17:10, Paper MoBT9.34<br />
CDP Mixture Models for Data Clustering<br />
Ji, Yangfeng, Peking Univ.<br />
Lin, Tong, Peking Univ.<br />
Zha, Hongbin, Peking Univ.<br />
In Dirichlet process (DP) mixture models, the number of components is implicitly determined by the sampling parameters<br />
of the Dirichlet process. However, such models usually produce many small mixture components when modeling<br />
real-world data, especially high-dimensional data. In this paper, we propose a new class of Dirichlet process mixture<br />
models with some constrained principles, named constrained Dirichlet process (CDP) mixture models. Based on general<br />
DP mixture models, we add a resampling step to obtain latent parameters. In this way, CDP mixture models can suppress<br />
noise and generate the compact patterns of the data. Experimental results on data clustering show the remarkable performance<br />
of the CDP mixture models.<br />
15:00-17:10, Paper MoBT9.35<br />
A Simple Approach to Find the Best Wavelet Basis in Classification Problems<br />
Faradji, Farhad, Univ. of British Columbia<br />
Ward, Rabab K., Univ. of British Columbia<br />
Birch, Gary E., Neil Squire Society<br />
In this paper, we address the problem of finding the best wavelet basis in wavelet packet analysis for applications based<br />
on classification. We implement and evaluate our proposed method in the design of a self-paced 2-state mental task-based<br />
brain-computer interface (BCI) as one possible type of classification-based application. The autoregressive coefficients<br />
of the best wavelet basis are concatenated to form the feature vector. The 2-stage classification process is based on quadratic<br />
discriminant analysis and majority voting. Seventeen wavelets from 2 different families are tested. A cross-validation<br />
process is performed twice for model selection and system performance evaluation. The results show that the proposed<br />
method can be well applied to BCI systems.<br />
15:00-17:10, Paper MoBT9.36<br />
Learning Probabilistic Models of Contours<br />
Amate, Laure, Univ. of Nice-Sophia Antipolis, CNRS<br />
Rendas, Maria João, Univ. of Nice-Sophia Antipolis, CNRS<br />
We present a methodology for learning spline-based probabilistic models for sets of contours, proposing a new Monte<br />
Carlo variant of the EM algorithm to estimate the parameters of a family of distributions defined over the set of spline<br />
functions (with fixed complexity). The proposed model effectively captures the major morphological properties of the observed<br />
set of contours as well as its variability, as the simulation results presented demonstrate.<br />
15:00-17:10, Paper MoBT9.37<br />
Local Sparse Representation based Classification<br />
Li, Chun-Guang, Beijing Univ. of Posts and Telecommunications<br />
Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />
Zhang, Honggang, Beijing Univ. of Posts and Telecommunications<br />
In this paper, we address the computational complexity issue in Sparse Representation based Classification (SRC). In<br />
SRC, it is time-consuming to find a global sparse representation. To remedy this deficiency, we propose a Local Sparse<br />
Representation based Classification (LSRC) scheme, which performs sparse decomposition in a local neighborhood. In<br />
LSRC, instead of solving the L1-norm constrained least squares problem over all training samples, we solve a similar<br />
problem in a local neighborhood of each test sample. Experiments on the face recognition data sets ORL and Extended Yale<br />
B demonstrate that the proposed LSRC algorithm reduces the computational complexity while maintaining comparable<br />
classification accuracy and robustness.<br />
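A rough sketch of the local scheme, using a ridge least-squares solve as a cheap stand-in for the L1-constrained problem in the paper (all parameter choices are hypothetical):

```python
import numpy as np

def lsrc_predict(X_train, y_train, x, k=10, ridge=1e-3):
    """Local representation-based classification: restrict the dictionary
    to the k training samples nearest to x, represent x over them, and
    assign the class whose atoms reconstruct x with smallest residual.
    (The paper solves an L1-constrained problem; a ridge solve is used
    here as a simple stand-in.)"""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]                  # local neighborhood
    A, yk = X_train[idx].T, y_train[idx]
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(k), A.T @ x)
    best, best_r = None, np.inf
    for c in np.unique(yk):
        mask = yk == c
        r = np.linalg.norm(x - A[:, mask] @ coef[mask])   # class residual
        if r < best_r:
            best, best_r = c, r
    return best
```

Restricting the dictionary to k samples is what cuts the per-query cost relative to decomposing over the full training set.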
15:00-17:10, Paper MoBT9.38<br />
Manifold Modeling with Learned Distance in Random Projection Space for Face Recognition<br />
Tsagkatakis, Grigorios, Rochester Inst. of Tech.<br />
Savakis, Andreas, Rochester Inst. of Tech.<br />
In this paper, we propose the combination of manifold learning and distance metric learning for the generation of a representation<br />
that is both discriminative and informative, and we demonstrate that this approach is effective for face recognition.<br />
Initial dimensionality reduction is achieved using random projections, a computationally efficient and data independent<br />
linear transformation. Distance metric learning is then applied to increase the separation between classes and improve the<br />
accuracy of nearest neighbor classification. Finally, a manifold learning method is used to generate a mapping between<br />
the randomly projected data and a low dimensional manifold. Face recognition results suggest that the combination of<br />
distance metric learning and manifold learning can increase performance. Furthermore, random projections can be applied<br />
as an initial step without significantly affecting the classification accuracy.<br />
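The first stage of the pipeline, data-independent random projection, can be sketched as below; the Gaussian matrix and scaling are one standard Johnson-Lindenstrauss-style choice, assumed here rather than taken from the paper:

```python
import numpy as np

def random_projection(X, d_out, seed=0):
    """Data-independent linear dimensionality reduction (sketch).

    Projects the rows of X onto d_out dimensions using a Gaussian
    random matrix scaled so that pairwise distances are preserved in
    expectation.  This illustrates only the initial reduction step;
    the distance metric learning and manifold mapping stages of the
    paper are not reproduced.
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], d_out)) / np.sqrt(d_out)
    return X @ R
```

Because the projection matrix never looks at the data, it costs a single matrix multiply and can be reused across datasets.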
15:00-17:10, Paper MoBT9.39<br />
Part Detection, Description and Selection based on Hidden Conditional Random Fields<br />
Lu, Wenhao, Tsinghua Univ.<br />
Wang, Shengjin, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
In this paper, the problem of part detection, description and selection is discussed. This problem is crucial in the learning<br />
algorithms of part-based models, but cannot be solved well when some candidate parts are extracted from background.<br />
This paper studies this problem and introduces a new algorithm, HCRF-PS (Hidden Conditional Random Fields for Part<br />
Selection), for part detection, description, especially selection. Our algorithm is distinguished for its power to optimize<br />
multiple kinds of information at the same time, including texture, color, location and part label. Finally, we did some experiments<br />
with HCRF-PS algorithm which give good results on both virtual and real data.<br />
15:00-17:10, Paper MoBT9.40<br />
Boosting Bayesian MAP Classification<br />
Piro, Paolo, CNRS/Univ. of Nice-Sophia Antipolis<br />
Nock, Richard, Univ. des Antilles et de la Guyane<br />
Nielsen, Frank, Ec. Pol.<br />
Barlaud, Michel, CNRS/Univ. of Nice-Sophia Antipolis<br />
In this paper we redefine and generalize the classic k-nearest neighbors (k-NN) voting rule in a Bayesian maximum-aposteriori<br />
(MAP) framework. To this end, annotated examples are used to estimate pointwise class probabilities in the<br />
feature space, thus giving rise to a new instance-based classification rule. Namely, we propose to ``boost’’ the classic k-<br />
NN rule by inducing a strong classifier from a combination of sparse training data, called ``prototypes’’. In order to learn<br />
these prototypes, our MapBoost algorithm globally minimizes a multiclass exponential risk defined over the training data,<br />
which depends on the class probabilities estimated at sample points themselves. We tested our method for image categorization<br />
on three benchmark databases. Experimental results show that MapBoost significantly outperforms classic k-NN<br />
(up to 8%). Interestingly, due to the supervised selection of sparse prototypes and the multiclass classification framework,<br />
the accuracy improvement is obtained with a considerable computational cost reduction.<br />
15:00-17:10, Paper MoBT9.41<br />
Weighting of the K-Nearest-Neighbors<br />
Chernoff, Konstantin, Univ. of Copenhagen<br />
Nielsen, Mads<br />
This paper presents two distribution independent weighting schemes for k-Nearest-Neighbors (kNN). Applying the first<br />
scheme in a Leave-One-Out (LOO) setting corresponds to performing complete b-fold cross validation (b-CCV), while<br />
applying the second scheme corresponds to performing bootstrapping in the limit of infinite iterations. We demonstrate<br />
that the soft kNN errors obtained through b-CCV can be obtained by applying the weighted kNN in a LOO setting, and<br />
that the proposed weighting schemes can decrease the variance and improve the generalization of kNN in a CV setting.<br />
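A generic rank-weighted kNN vote of the kind the paper builds on can be sketched as follows; the specific weight profiles that emulate b-CCV and bootstrapping are derived in the paper and are not reproduced here — any weight vector can be plugged in:

```python
import numpy as np

def weighted_knn(X_train, y_train, x, weights):
    """Rank-weighted k-Nearest-Neighbors vote (sketch).

    weights[i] is the vote mass assigned to the (i+1)-th nearest
    neighbour of x; uniform weights recover the classic kNN rule.
    """
    k = len(weights)
    # rank training samples by distance to the query point
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    votes = {}
    for w, lab in zip(weights, y_train[order]):
        votes[lab] = votes.get(lab, 0.0) + w
    return max(votes, key=votes.get)
```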
15:00-17:10, Paper MoBT9.42<br />
Learning Sparse Face Features: Application to Face Verification<br />
Buyssens, Pierre, Greyc UMR6072<br />
Revenu, Marinette, GREYC UMR 6072<br />
We present a low resolution face recognition technique based on a Convolutional Neural Network approach. The network<br />
is trained to reconstruct a reference image per subject. In classical feature-based approaches, a first stage of feature extraction<br />
is followed by a classification to perform the recognition. In classical Convolutional Neural Network approaches,<br />
feature extraction stages are stacked (interlaced with pooling layers) with classical neural layers on top to form the complete<br />
architecture of the network. This paper addresses two questions: 1. Does pretraining the filters in an unsupervised<br />
manner improve the recognition rate compared to filters learned in a purely supervised scheme? 2. Is there<br />
an advantage to pretraining more than one feature extraction stage? We show in particular that refining the filters<br />
during the supervised training improves the results.<br />
15:00-17:10, Paper MoBT9.43<br />
Image Feature Extraction using 2D Mel-Cepstrum<br />
Cakir, Serdar, Bilkent Univ.<br />
Cetin, E., Bilkent Univ.<br />
In this paper, a feature extraction method based on two-dimensional (2D) mel-cepstrum is introduced. Feature matrices<br />
resulting from the 2D mel-cepstrum, Fourier LDA approach and original image matrices are individually applied to the<br />
Common Matrix Approach (CMA) based face recognition system. For each of these feature extraction methods, recognition<br />
rates are obtained on the AR, ORL and Yale face databases. Experimental results indicate that the recognition<br />
rates obtained by the 2D mel-cepstrum method are superior to those obtained using the Fourier LDA approach<br />
and raw image matrices. This indicates that 2D mel-cepstral analysis can be used in image feature extraction problems.<br />
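A plain 2D cepstrum feature matrix can be sketched as the inverse 2D DFT of the log magnitude spectrum; the mel-scale frequency warping that distinguishes the paper's 2D mel-cepstrum, and the block size `keep`, are omitted or assumed here:

```python
import numpy as np

def cepstrum_2d(img, keep=8, eps=1e-8):
    """2D cepstrum feature matrix (sketch, without mel warping).

    Takes the inverse 2D DFT of the log magnitude spectrum and keeps
    the low-quefrency keep x keep block as the feature matrix.
    """
    spec = np.abs(np.fft.fft2(img)) + eps      # eps guards log(0)
    ceps = np.real(np.fft.ifft2(np.log(spec)))
    return ceps[:keep, :keep]
```

The low-quefrency block summarizes the smooth spectral envelope of the image, which is the part used as a compact feature.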
15:00-17:10, Paper MoBT9.44<br />
Entropy Estimation and Multi-Dimensional Scale Saliency<br />
Suau, Pablo, Univ. of Alicante<br />
Escolano, Francisco, Univ. of Alicante<br />
In this paper we survey two multi-dimensional Scale Saliency approaches based on graphs and the k-d partition algorithm.<br />
In the latter case we introduce a new divergence metric and we show experimentally its suitability. We also show an application<br />
of multi-dimensional Scale Saliency to texture discrimination. We demonstrate that the use of multi-dimensional<br />
data can improve the performance of texture retrieval based on feature extraction.<br />
15:00-17:10, Paper MoBT9.45<br />
A Novel Facial Localization for Three-Dimensional Face using Multi-Level Partition of Unity Implicits<br />
Hu, Yuan, Shanghai Jiao Tong Univ.<br />
Yan, Jingqi, Shanghai Jiao Tong Univ.<br />
Li, Wei, Shanghai Jiao Tong Univ.<br />
Shi, Pengfei, Shanghai Jiao Tong Univ.<br />
This paper presents a novel facial localization method for 3D faces in the presence of pose and expression variation.<br />
The idea of using Multi-level Partition of Unity (MPU) Implicits in a hierarchical way is proposed for reconstructing the<br />
face surface. Based on the analysis of curvature features, the nose and eyehole regions can be uniquely detected on the<br />
lower-level reconstructed face surface. Experimental results show that this method is invariant to pose, holes, noise and<br />
expression. An overall performance of 99.18% is achieved.<br />
15:00-17:10, Paper MoBT9.46<br />
Automated Feature Weighting in Fuzzy Declustering-Based Vector Quantization<br />
Ng, Theam Foo, Univ. of New South Wales@ADFA<br />
Pham, Tuan D., Univ. of New South Wales@ADFA<br />
Sun, Changming, CSIRO<br />
Feature weighting plays an important role in improving the performance of clustering techniques. We propose an automated<br />
feature weighting scheme for fuzzy declustering-based vector quantization (FDVQ), namely the AFDVQ algorithm, for enhancing effectiveness<br />
and efficiency in classification. The proposed AFDVQ imposes weights on the modified fuzzy c-means (FCM)<br />
so that it automatically calculates feature weights based on their degrees of importance rather than treating them equally.<br />
Moreover, extensions of the FDVQ and AFDVQ algorithms based on generalized improved fuzzy partitions (GIFP), known<br />
as GIFP-FDVQ and GIFP-AFDVQ respectively, are proposed. Experimental results on real data (original and noisy<br />
data) and modified data (biased and noisy-biased data) demonstrate that the proposed algorithms outperform<br />
standard algorithms in classifying clusters, especially for biased data.<br />
15:00-17:10, Paper MoBT9.47<br />
A Discriminative and Heteroscedastic Linear Feature Transformation for Multiclass Classification<br />
Lee, Hung-Shin, National Taiwan Univ.<br />
Wang, Hsin-Min, Acad. Sinica<br />
Chen, Berlin, National Taiwan Normal Univ.<br />
This paper presents a novel discriminative feature transformation, named full-rank generalized likelihood ratio discriminant<br />
analysis (fGLRDA), on the grounds of the likelihood ratio test (LRT). fGLRDA attempts to seek a feature space, which is<br />
linearly isomorphic to the original n-dimensional feature space and is characterized by a full-rank transformation matrix,<br />
under the assumption that all the class-discrimination information resides in a d-dimensional subspace, by making<br />
the most confusing situation, described by the null hypothesis, as unlikely as possible, without the homoscedasticity<br />
assumption on class distributions. Our experimental results demonstrate that fGLRDA can yield moderate performance<br />
improvements over existing methods, such as linear discriminant analysis (LDA), on the speaker identification task.<br />
15:00-17:10, Paper MoBT9.48<br />
Sparse Representation Classifier Steered Discriminative Projection<br />
Yang, Jian, Nanjing Univ. of Science and Tech.<br />
Chu, Delin, National Univ. of Singapore<br />
The sparse representation-based classifier (SRC) has been developed and shows great potential for pattern classification.<br />
This paper aims to gain a discriminative projection such that SRC achieves the optimum performance in the projected<br />
pattern space. We use the decision rule of SRC to steer the design of a dimensionality reduction method, which is coined<br />
the sparse representation classifier steered discriminative projection (SRC-DP). SRC-DP matches SRC optimally in theory.<br />
Experiments are done on the AR and extended Yale B face image databases, and results show the proposed method is<br />
more effective than other dimensionality reduction methods with respect to the sparse representation-based classifier.<br />
15:00-17:10, Paper MoBT9.49<br />
Designing a Pattern Stabilization Method using Scleral Blood Vessels for Laser Eye Surgery<br />
Kaya, Aydin, Hacettepe Univ.<br />
Can, Ahmet Burak, Hacettepe Univ.<br />
Çakmak, Hasan Basri, Ataturk Research Hospital<br />
In laser eye surgery, the accuracy of the operation depends on coherent eye tracking and registration techniques. The main<br />
approach used in image-processing-based eye trackers is the extraction and tracking of the pupil and limbus regions. In the<br />
eye registration step, iris region features extracted from infrared images are generally used. The registration step determines<br />
the angular shift of the eye origin by comparing the eye position on the operation table with the eye topology obtained before<br />
the operation. Registration is only applied at the beginning, but patients' movements do not stop during the operation. Hence,<br />
we present a method for pattern stabilization that can be repeated at regular intervals during the operation. We use scleral<br />
blood vessels as features due to their texture and their resistance to errors caused by pupil center shift and corneal ablation.<br />
15:00-17:10, Paper MoBT9.51<br />
Aggregation of Probabilistic PCA Mixtures with a Variational-Bayes Technique over Parameters<br />
Bruneau, Pierrick, Nantes Univ.<br />
Gelgon, Marc, Nantes Univ.<br />
Picarougne, Fabien, Nantes Univ.<br />
This paper proposes a solution to the problem of aggregating versatile probabilistic models, namely mixtures of probabilistic<br />
principal component analyzers. These models are a powerful generative form for capturing high-dimensional, non<br />
Gaussian, data. They simultaneously perform mixture adjustment and dimensionality reduction. We demonstrate how such<br />
models may be advantageously aggregated by accessing mixture parameters only, rather than original data. Aggregation<br />
is carried out through Bayesian estimation with a specific prior and an original variational scheme. Experimental results<br />
illustrate the effectiveness of the proposal.<br />
15:00-17:10, Paper MoBT9.52<br />
Kernel Uncorrelated Adjacent-Class Discriminant Analysis<br />
Jing, Xiaoyuan, Nanjing Univ. of Posts and Telecommunications<br />
Li, Sheng, Nanjing Univ. of Posts and Telecommunications<br />
Yao, Yongfang, Nanjing Univ. of Posts and Telecommunications<br />
Bian, Lusha, Nanjing Univ. of Posts and Telecommunications<br />
Yang, Jingyu, Nanjing Univ. of Science and Tech.<br />
In this paper, a kernel uncorrelated adjacent-class discriminant analysis (KUADA) approach is proposed for image recognition.<br />
The optimal nonlinear discriminant vector obtained by this approach can differentiate one class from its adjacent<br />
classes, i.e., its nearest neighbor classes, by constructing the specific between-class and within-class scatter matrices in<br />
kernel space using the Fisher criterion. In this manner, KUADA acquires all discriminant vectors class by class. Furthermore,<br />
KUADA makes every discriminant vector satisfy locally statistical uncorrelated constraints by using the corresponding<br />
class and part of its most adjacent classes. Experimental results on the public AR and CAS-PEAL face databases<br />
demonstrate that the proposed approach outperforms several representative nonlinear discriminant methods.<br />
15:00-17:10, Paper MoBT9.53<br />
A Meta-Learning Approach to Conditional Random Fields using Error-Correcting Output Codes<br />
Ciompi, Francesco, Univ. de Barcelona<br />
Pujol, Oriol, UB<br />
Radeva, Petia, CVC<br />
We present a meta-learning framework for the design of potential functions for Conditional Random Fields. The design<br />
of both node potential and edge potential is formulated as a classification problem where margin classifiers are used. The<br />
set of state transitions for the edge potential is treated as a set of different classes, thus defining a multi-class learning<br />
problem. The Error-Correcting Output Codes (ECOC) technique is used to deal with the multi-class problem. Furthermore,<br />
the point defined by the combination of margin classifiers in the ECOC space is interpreted in a probabilistic manner, and<br />
the obtained distance values are then converted into potential values. The proposed model exhibits very promising results<br />
when applied to two real detection problems.<br />
15:00-17:10, Paper MoBT9.54<br />
Statistical Modeling of Image Degradation based on Quality Metrics<br />
Chetouani, Aladine, Inst. Galilée – Univ. Paris 13<br />
Beghdadi, Azeddine, Univ. Paris 13<br />
Deriche, Mohamed, KFUPM<br />
A plethora of Image Quality Metrics (IQMs) has been proposed during the last two decades. However, at present,<br />
there is no accepted IQM able to predict the perceptual level of image degradation across different types of visual distortions.<br />
Some measures are well adapted to one set of degradations but inefficient for others. Indeed, the efficiency of any<br />
IQM has been shown to depend upon the type of degradation. Thus, we propose here a new approach for predicting the<br />
type of degradation before using IQMs. The basic idea is first to identify the type of distortion using a Bayesian approach,<br />
then select the most appropriate IQM for estimating image quality for that specific type of distortion. The performance of<br />
the proposed method is evaluated in terms of classification accuracy across different types of degradations.<br />
15:00-17:10, Paper MoBT9.55<br />
Performance Evaluation of Automatic Feature Discovery Focused within Error Clusters<br />
Wang, Sui-Yu, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
We report performance evaluation of our automatic feature discovery method on the publicly available Gisette dataset: a<br />
set of 29 features discovered by our method ranks 129 among all 411 current entries on the validation set. Our approach<br />
is a greedy forward selection algorithm guided by error clusters. The algorithm finds error clusters in the current feature<br />
space, then projects one tight cluster into the null space of the feature mapping, where a new feature that helps to classify<br />
these errors can be discovered. This method assumes a ``data-rich’’ problem domain and works well when a large amount<br />
of labeled data is available. The result on the Gisette dataset shows that our method is competitive with many current<br />
feature selection algorithms. We also provide analytical results showing that our method is guaranteed to lower the error<br />
rate on Gaussian distributions and that our approach may outperform the standard Linear Discriminant Analysis (LDA)<br />
method in some cases.<br />
15:00-17:10, Paper MoBT9.56<br />
Optimized Entropy-Constrained Vector Quantization of Lossy Vector Map Compression<br />
Chen, Minjie, Univ. of Eastern Finland<br />
Xu, Mantao, Carestream Health Corp. Shanghai, China<br />
Fränti, Pasi, Univ. of Eastern Finland<br />
Quantization plays an important part in lossy vector map compression, for which the existing solutions are based on either<br />
a fixed-size open-loop codebook or a simple uniform quantization. In this paper, we propose an entropy-constrained<br />
vector quantization that optimizes both the structure and the size of the codebook at the same time using a closed-loop approach.<br />
In order to lower the distortion to a desirable level, we exploit a two-level design strategy, where the vector quantization<br />
codebook is designed only for the most common vectors and the remaining (outlier) vectors are coded by uniform quantization.<br />
15:00-17:10, Paper MoBT9.57<br />
Nonnegative Embeddings and Projections for Dimensionality Reduction and Information Visualization<br />
Zafeiriou, Stefanos, Imperial Coll. of London<br />
Laskaris, Nikolaos, AiiA-Lab. AUTH,<br />
In this paper, we propose novel algorithms for low dimensionality nonnegative embedding of vectorial and/or relational<br />
data, as well as nonnegative projections for dimensionality reduction. We start by introducing a novel algorithm for Metric<br />
Multidimensional Scaling (MMS). We propose algorithms for Nonnegative Locally Linear Embedding (NLLE) and Nonnegative<br />
Laplacian Eigenmaps (NLE). By reformulating the problem of MMS, NLLE and NLE for finding projections<br />
we propose algorithms for Nonnegative Principal Component Analysis (NPCA), for Nonnegative Orthogonal Neighbourhood<br />
Preserving Projections (NONPP) and Nonnegative Orthogonal Locality Preserving Projections (NOLPP). We demonstrate<br />
some first preliminary results of the proposed methods in data visualization.<br />
15:00-17:10, Paper MoBT9.58<br />
Unsupervised Learning from Linked Documents<br />
Guo, Zhen, SUNY at Binghamton<br />
Zhu, Shenghuo, NEC Lab.<br />
Chi, Yun, NEC Lab.<br />
Zhang, Zhongfei, State Univ. of New York, Binghamton<br />
Gong, Yihong, NEC Lab. America, Inc.<br />
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. In traditional<br />
topic models, which play an important role in unsupervised learning, the link information is either totally ignored<br />
or treated as a feature similar to content. We believe that neither approach is capable of accurately capturing the relations<br />
represented by links. To address the limitation of traditional topic models, in this paper we propose a citation-topic (CT)<br />
model that explicitly considers the document relations represented by links. In the CT model, instead of being treated as<br />
yet another feature, links are used to form the structure of the generative model. As a result, in the CT model a given document<br />
is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is<br />
related to the given document. We apply the CT model to several document collections and the experimental comparisons<br />
against state-of-the-art approaches demonstrate very promising performance.<br />
15:00-17:10, Paper MoBT9.59<br />
Tensor Power Method for Efficient MAP Inference in Higher-Order MRFs<br />
Semenovich, Dimitri, Univ. of New South Wales<br />
Sowmya, Arcot, Univ. of New South Wales<br />
We present a new efficient algorithm for maximizing energy functions with higher order potentials suitable for MAP inference<br />
in discrete MRFs. Initially we relax the integer constraints on the problem and obtain potential label assignments<br />
using a higher-order (tensor) power method. Then we utilise an ascent procedure similar to the classic ICM algorithm to<br />
converge to a solution meeting the original integer constraints.<br />
15:00-17:10, Paper MoBT9.60<br />
Detection and Characterization of Anomalous Entities in Social Communication Networks<br />
Gupta, Nithi, Tata Consultancy Services<br />
Dey, Lipika, Tata Consultancy Services<br />
Social networks generated from emails or calls provide enormous geospatial and interaction information about subscribers.<br />
These have served as important inputs to intelligence analysts. In this paper, we propose an efficient algorithm for anomaly<br />
detection from social networks. Anomalous users are detected based on their behavioral dissimilarity from others. A rich<br />
feature set is proposed for outlier detection. A method for providing visual explanation for the results is also proposed.<br />
15:00-17:10, Paper MoBT9.61<br />
Mahalanobis-Based Adaptive Nonlinear Dimension Reduction<br />
Aouada, Djamila, Univ. of Luxembourg, SnT<br />
Baryshnikov, Yuliy, Bell Lab.<br />
Krim, Hamid, NCSU<br />
We define a new adaptive embedding approach for data dimension reduction applications. Our technique entails a local<br />
learning of the manifold of the initial data, with the objective of defining local distance metrics that take into account the<br />
different correlations between the data points. We choose to illustrate the properties of our work on the isomap algorithm.<br />
We show through multiple simulations that the new adaptive version of isomap is more robust to noise than the original<br />
non-adaptive one.<br />
15:00-17:10, Paper MoBT9.62<br />
Maximum Likelihood Estimation of Gaussian Mixture Models using Particle Swarm Optimization<br />
Ari, Caglar, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
We present solutions to two problems that prevent the effective use of population-based algorithms in clustering problems.<br />
The first solution presents a new representation for arbitrary covariance matrices that allows independent updating of individual<br />
parameters while retaining the validity of the matrix. The second solution involves an optimization formulation<br />
for finding correspondences between different parameter orderings of candidate solutions. The effectiveness of the proposed<br />
solutions is demonstrated on a novel clustering algorithm based on particle swarm optimization for the estimation of<br />
Gaussian mixture models.<br />
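One way to let particles update covariance parameters independently while keeping every matrix valid, as the first solution above requires, is a Cholesky-factor parameterization; this is a standard construction sketched as an illustration, not the paper's exact representation:

```python
import numpy as np

def cov_from_params(theta, d):
    """Map an unconstrained parameter vector to a valid covariance (sketch).

    theta holds the d*(d+1)/2 lower-triangular entries of a Cholesky
    factor; exponentiating the diagonal makes it positive, so any real
    vector yields a symmetric positive-definite covariance matrix, and
    each entry can be perturbed independently.
    """
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = theta
    L[np.diag_indices(d)] = np.exp(np.diag(L))  # force positive diagonal
    return L @ L.T
```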
15:00-17:10, Paper MoBT9.63<br />
Object Discovery by Clustering Correlated Visual Word Sets<br />
Fuentes Pineda, Gibran, The Univ. of Electro-Communications<br />
Koga, Hisashi, Univ. of Electro-Communications<br />
Watanabe, Toshinori, Univ. of Electro-Communications<br />
This paper presents a novel approach to discovering particular objects from a set of unannotated images. We aim to find<br />
discriminative feature sets that can effectively represent particular object classes (as opposed to object categories). We<br />
achieve this by mining correlated visual word sets from the bag-of-features model. Specifically, we consider that a visual<br />
word set belongs to the same object class if all its visual words consistently occur together in the same image. To efficiently<br />
find such sets, we apply Min-LSH to the occurrence vector of each visual word. An agglomerative hierarchical clustering<br />
is further performed to eliminate redundancy and obtain more representative sets. We also propose a simple and efficient<br />
strategy for quantizing the feature descriptors based on locality-sensitive hashing. By experiment, we show that our approach<br />
can efficiently discover objects despite clutter and slight viewpoint variations.<br />
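The min-hash grouping step above rests on the fact that the probability two sets share a min-hash equals their Jaccard similarity; a minimal sketch (the salted-hash construction is illustrative, not the paper's implementation) is:

```python
import random

def minhash_signature(item_set, n_hashes=50, seed=0):
    """Min-hash signature of a visual word's occurrence set (sketch).

    Visual words occurring in nearly the same images get nearly
    identical signatures, so matching signature entries can be used
    to group correlated visual words, as in a Min-LSH step.
    """
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n_hashes)]
    # each salt defines one pseudo-random ordering of the universe
    return [min(hash((s, x)) & 0xFFFFFFFF for x in item_set) for s in salts]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching min-hashes estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```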
Technical Program for Tuesday<br />
August 24, 2010<br />
TuAT1 Marmara Hall<br />
Object Detection and Recognition – I Regular Session<br />
Session chair: Jiang, Xiaoyi (Univ. of Münster)<br />
09:00-09:20, Paper TuAT1.1<br />
Learning an Efficient and Robust Graph Matching Procedure for Specific Object Recognition<br />
Revaud, Jerome, Univ. de Lyon, CNRS<br />
Lavoue, Guillaume, Univ. de Lyon, CNRS<br />
Ariki, Yasuo, Kobe Univ.<br />
Baskurt, Atilla, LIRIS, INSA Lyon<br />
We present a fast and robust graph matching approach for 2D specific object recognition in images. From a small number<br />
of training images, a model graph of the object to learn is automatically built. It contains its local key points as well as<br />
their spatial proximity relationships. Training is based on a selection of the most efficient subgraphs using the mutual information.<br />
The detection uses dynamic programming with a lattice and thus is very fast. Experiments demonstrate that<br />
the proposed method outperforms state-of-the-art specific object detectors under realistic noise conditions.<br />
09:20-09:40, Paper TuAT1.2<br />
A New Biologically Inspired Feature for Scene Image Classification<br />
Jiang, Aiwen, Chinese Acad. of Sciences<br />
Wang, Chunheng, Chinese Acad. of Sciences<br />
Xiao, Baihua, Chinese Acad. of Sciences<br />
Dai, Ruvei, Chinese Acad. of Sciences<br />
Scene classification is a hot topic in the pattern recognition and computer vision areas. In this paper, based on past research<br />
in vision neuroscience, we propose a new biologically inspired feature method for scene image classification. The new<br />
feature accounts for the visual processing from simple cells to complex cells in the V1 area, as well as the spatial layout of the<br />
scene gist signature. It provides a different line of model revision to account for some nonlinearities in the V1 area. We compare<br />
it with the traditional HMAX model and the recently proposed ScSPM model in experiments on a popular 15-scene dataset. We<br />
show that our proposed method has many important differences and merits. The experimental results also show that our<br />
method outperforms state-of-the-art models such as ScSPM and KSPM.<br />
09:40-10:00, Paper TuAT1.3<br />
On a Quest for Image Descriptors based on Unsupervised Segmentation Maps<br />
Koniusz, Piotr, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
This paper investigates segmentation-based image descriptors for object category recognition. In contrast to commonly<br />
used interest points the proposed descriptors are extracted from pairs of adjacent regions given by a segmentation method.<br />
In this way we exploit semi-local structural information from the image. We propose to use the segments as spatial bins<br />
for descriptors of various image statistics based on gradient, colour and region shape. Proposed descriptors are validated<br />
on standard recognition benchmarks. Results show they outperform state-of-the-art reference descriptors with 5.6x less<br />
data and achieve comparable results to them with 8.6x less data. The proposed descriptors are complementary to SIFT<br />
and achieve state-of-the-art results when combined together within a kernel based classifier.<br />
10:00-10:20, Paper TuAT1.4<br />
An RST-Tolerant Shape Descriptor for Object Detection<br />
Su, Chih-Wen, Acad. Sinica<br />
Liao, Mark, Acad. Sinica, Taiwan<br />
Liang, Yu-Ming, Acad. Sinica<br />
Tyan, Hsiao-Rong, Chung Yuan Christian Univ.<br />
In this paper, we propose a new object detection method that does not need a learning mechanism. Given a hand-drawn<br />
model as a query, we can detect and locate objects that are similar to the query model in cluttered images. To ensure the<br />
invariance with respect to rotation, scaling, and translation (RST), high curvature points (HCPs) on edges are detected<br />
first. Each pair of HCPs is then used to determine a circular region and all edge pixels covered by the circular region are<br />
transformed into a polar histogram. Finally, we use these local descriptors to detect and locate similar objects within<br />
images. The experimental results show that the proposed method outperforms existing state-of-the-art work.<br />
10:20-10:40, Paper TuAT1.5<br />
Inverse Multiple Instance Learning for Classifier Grids<br />
Sternig, Sabine, Graz Univ. of Tech.<br />
Roth, Peter M., Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Recently, classifier grids have shown to be a considerable alternative for object detection from static cameras. However,<br />
one drawback of such approaches is drifting if an object is not moving over a long period of time. Thus, the goal of this<br />
work is to increase the recall of such classifiers while preserving their accuracy and speed. In particular, this is realized<br />
by adapting ideas from Multiple Instance Learning within a boosting framework. Since the set of positive samples is well<br />
defined, we apply this concept to the negative samples extracted from the scene: Inverse Multiple Instance Learning. By<br />
introducing temporal bags, we can ensure that each bag contains at least one sample having a negative label, providing<br />
the required stability. The experimental results demonstrate that the proposed approach obtains state-of-the-art detection results<br />
while showing superior classification results in the presence of non-moving objects.<br />
TuAT2 Topkapı Hall B<br />
Clustering Regular Session<br />
Session chair: Tasdizen, Tolga (Univ. of Utah)<br />
09:00-09:20, Paper TuAT2.1<br />
On Dynamic Weighting of Data in Clustering with K-Alpha Means<br />
Chen, Si-Bao, Anhui Univ.<br />
Wang, Hai-Xian, Southeast Univ.<br />
Luo, Bin, Anhui Univ.<br />
Although many methods of refining initialization have appeared, the sensitivity of K-Means to initial centers is still an<br />
obstacle in applications. In this paper, we investigate a new class of clustering algorithm, K-Alpha Means (KAM), which<br />
is insensitive to the initial centers. With K-Harmonic Means as a special case, KAM dynamically weights data points<br />
while iteratively updating the centers, deemphasizing data points that are close to a center and emphasizing data points<br />
that are not close to any center. By replacing the minimum operator in K-Means with the alpha-mean operator, KAM significantly<br />
improves the clustering performance.<br />
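The dynamic weighting idea can be sketched with one update step of K-Harmonic Means, the special case named above; the exponent `p` and the small distance floor are assumed values, and the general alpha-mean operator of KAM is not reproduced:

```python
import numpy as np

def khm_step(X, centers, p=3.5):
    """One K-Harmonic Means center update (sketch).

    Points far from every center receive large weights w, points
    already close to a center receive small ones, which is what makes
    the method insensitive to the initial centers.
    """
    # pairwise distances: (n_points, n_centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-8
    m = d ** (-p - 2) / (d ** (-p - 2)).sum(axis=1, keepdims=True)  # soft membership
    w = (d ** (-p - 2)).sum(axis=1) / (d ** (-p)).sum(axis=1) ** 2  # dynamic weight
    mw = m * w[:, None]
    return (mw.T @ X) / mw.sum(axis=0)[:, None]
```

Iterating `khm_step` from almost any starting centers pulls them toward the cluster means, unlike K-Means, which can get stuck near a poor initialization.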
09:20-09:40, Paper TuAT2.2<br />
ARImp: A Generalized Adjusted Rand Index for Cluster Ensembles<br />
Zhang, Shaohong, City Univ. of Hong Kong<br />
Wong, Hau-San, City Univ. of Hong Kong<br />
Adjusted Rand Index (ARI) is one of the most popular measures for evaluating the consistency between two partitions of data<br />
sets in pattern recognition. In this paper, ARI is generalized to a new measure, Adjusted Rand Index between<br />
a similarity matrix and a cluster partition (ARImp), to evaluate the consistency between a set of clustering solutions (or<br />
cluster partitions) and their associated consensus matrix in a cluster ensemble. The generalization property of ARImp from<br />
ARI is proved and its preservation of desirable properties of ARI is illustrated with simulated experiments. Also, we show<br />
with application experiments on several real data sets that ARImp can serve as a filter to identify the less effective cluster<br />
ensemble methods.<br />
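For reference, the pair-counting ARI that ARImp generalizes can be computed as follows (a minimal sketch; degenerate partitions where the denominator vanishes are not handled):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting Adjusted Rand Index between two partitions of the same items."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())   # pairs agreeing in both
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)                    # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Identical partitions score 1.0 regardless of label names; independent partitions score near 0.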
09:40-10:00, Paper TuAT2.3<br />
On the Scalability of Evidence Accumulation Clustering<br />
Lourenço, André, Inst. Superior de Engenharia de Lisboa (ISEL), Inst. Superior Técnico (IST), IT<br />
Fred, Ana Luisa Nobre, Inst. Superior Técnico<br />
Jain, Anil, Michigan State Univ.<br />
This work focuses on the scalability of the Evidence Accumulation Clustering (EAC) method. We first address the space<br />
complexity of the co-association matrix. The sparseness of the matrix is related to the construction of the clustering ensemble.<br />
Using a split and merge strategy combined with a sparse matrix representation, we empirically show that a linear<br />
space complexity is achievable in this framework, leading to the scalability of the EAC method to clustering large data sets.<br />
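The co-association matrix that dominates EAC's space complexity counts, for each pair of samples, the fraction of ensemble partitions that co-cluster them; a sparse accumulation can be sketched as follows (illustrative only, not the authors' split-and-merge implementation):

```python
from collections import defaultdict
from itertools import combinations

def co_association(ensemble):
    """Sparse co-association matrix: entry (i, j) is the fraction of
    partitions in which samples i and j share a cluster."""
    counts = defaultdict(int)
    for labels in ensemble:
        clusters = defaultdict(list)
        for idx, lab in enumerate(labels):
            clusters[lab].append(idx)
        for members in clusters.values():
            for i, j in combinations(members, 2):
                counts[(i, j)] += 1          # only co-occurring pairs are stored
    n = len(ensemble)
    return {pair: c / n for pair, c in counts.items()}
```

Only pairs that co-occur at least once are stored, which is where the sparseness discussed above comes from.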
10:00-10:20, Paper TuAT2.4<br />
A Hierarchical Clustering Method for Color Quantization<br />
Zhang, Jun, Waseda Univ.<br />
Hu, Jinglu, Waseda Univ.<br />
In this paper, we propose a hierarchical frequency sensitive competitive learning (HFSCL) method to achieve color quantization<br />
(CQ). In HFSCL, the appropriate number of quantized colors and the palette can be obtained by an adaptive procedure<br />
following a binary tree structure with nodes and layers. Starting from the root node, which contains all colors in an<br />
image, a binary tree is generated until all nodes have been examined by the split conditions. In each node of the tree, a frequency<br />
sensitive competitive learning (FSCL) network is used to achieve a two-way division. To avoid over-splitting, a merging condition<br />
is defined to merge clusters that are close enough to each other at each layer. Experimental results show that HFSCL<br />
has the desired ability for CQ.<br />
10:20-10:40, Paper TuAT2.5<br />
Combining Real and Virtual Graphs to Enhance Data Clustering<br />
Wang, Liang, The Univ. of Melbourne<br />
Leckie, Christopher, The Univ. of Melbourne<br />
Kotagiri, Rao, Univ. of Melbourne<br />
Fusion of multiple information sources can yield significant benefits in accomplishing certain learning tasks. This paper<br />
exploits the sparse representation of signals for the problem of data clustering. The method is built within the framework<br />
of spectral clustering algorithms, and convexly combines a real graph constructed from the given physical features with<br />
a virtual graph constructed from sparse reconstructive coefficients. The experimental results on several real-world data<br />
sets show that fusing the real and virtual graphs obtains better (or at least comparable) results than using<br />
either graph alone.<br />
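A minimal sketch of the underlying idea: convexly combine two affinity graphs, then apply a standard spectral embedding (the mixing weight alpha and the Ng-Jordan-Weiss-style normalization are assumptions here; the paper's virtual graph is built from sparse reconstructive coefficients):

```python
import numpy as np

def combined_spectral_embedding(W_real, W_virtual, k, alpha=0.5):
    """Convexly combine two affinity matrices and return the row-normalized
    spectral embedding (leading k eigenvectors of the normalized affinity)."""
    W = alpha * W_real + (1 - alpha) * W_virtual
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetric normalized affinity
    vals, vecs = np.linalg.eigh(S)           # ascending eigenvalue order
    Y = vecs[:, -k:]                         # leading k eigenvectors
    # row-normalize before running k-means on the embedding
    return Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
```

Rows of the embedding belonging to the same block of the combined graph coincide, while rows from different blocks are orthogonal, so any standard clusterer can finish the job.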
TuAT3 Topkapı Hall A<br />
3D Shape Recovery Regular Session<br />
Session chair: Sato, Jun (Nagoya Institute of Technology)<br />
09:00-09:20, Paper TuAT3.1<br />
Calibration Method for Line Structured Light Vision Sensor based on Vanish Points and Lines<br />
Wei, Zhenzhong, Beihang Univ.<br />
Xie, Meng, Beihang Univ. Ministry of Education<br />
Zhang, Guangjun, Beihang Univ.<br />
Line structured light vision sensor (LSLVS) calibration establishes the location relationship between the camera and<br />
the light plane projector. This paper proposes a geometrical calibration method for LSLVS based on the properties of vanishing<br />
points and lines, obtained by randomly moving a planar target. The method consists of two steps: (1) the vanishing point of the light<br />
stripe projected by the light plane is found in each target image, and all the obtained vanishing points form the vanishing line of<br />
the light plane, which helps determine the normal of the light plane; (2) one 3D feature point on the light plane is<br />
acquired (one is enough, though more may be used) to determine the d parameter of the light plane. Then the equation of<br />
the light plane in the camera coordinate system can be solved. Computer simulations and real experiments have<br />
been carried out to validate our method, and the real calibration reaches an accuracy of 0.141 mm within a<br />
field of view of about 300 mm × 200 mm.<br />
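Once step (1) yields the light-plane normal and step (2) a 3D point on the plane, the remaining parameter follows directly from the plane equation n·X + d = 0; a sketch (function and variable names are our own, not the paper's):

```python
import numpy as np

def light_plane_equation(normal, point_on_plane):
    """Given the plane normal (from the vanishing line, step 1) and one 3D
    point on the light plane (step 2), return (a, b, c, d) such that
    a*x + b*y + c*z + d = 0 in the camera coordinate system."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)                       # unit normal
    d = -float(n @ np.asarray(point_on_plane, float))
    return (*n, d)
```

With more than one feature point, d would instead be estimated as the mean (or least-squares fit) over all points.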
09:20-09:40, Paper TuAT3.2<br />
A Color Invariant based Binary Coded Structured Light Range Scanner for Shiny Objects<br />
Benveniste, Rifat, Yeditepe Univ.<br />
Unsalan, Cem, Yeditepe Univ.<br />
Object range data provide valuable information in recognition and modeling applications. Therefore, it is extremely important<br />
to reliably extract the range data from a given object. There are various range scanners based on different principles.<br />
Among these, structured light based range scanners deserve special attention. In these systems, coded light stripes are projected<br />
onto the object. Using the bending of these light stripes on the object and the triangulation principle, range information<br />
can be obtained. Since this method is simple and fast, it is used in most industrial range scanners. Unfortunately,<br />
these range scanners cannot scan shiny objects reliably. The main reason is either highlights on the shiny object or the<br />
ambient light in the environment. These disturb the coding by illumination. As the code is changed, the range data extracted<br />
from it will also be disturbed. In this study, we propose a color invariant based binary coded structured light range scanner<br />
to solve this problem. The color invariant used can eliminate the effects of highlights on the object and the ambient light<br />
from the environment. This way, we can extract the range data of shiny objects in a robust manner. To test our method, we<br />
developed a prototype range scanner. We provide the obtained range data of various test objects with our range scanner.<br />
09:40-10:00, Paper TuAT3.3<br />
Improving Shape-From-Focus by Compensating for Image Magnification Shift<br />
Pertuz, Said, Rovira I Virgili Univ.<br />
Puig, Domenec, Rovira I Virgili Univ.<br />
Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />
Images taken with different focus settings are used in shape-from-focus to reconstruct the depth map of a scene. A problem<br />
when acquiring images with different focus settings is the shift of image features due to changes in magnification. This<br />
paper shows that those changes affect the shape-from-focus performance and that the final reconstruction can be improved<br />
by compensating for that shift. The proposed scheme takes into account the effects due to magnification changes between<br />
near and far focused images and it is able to determine the depth of the scene points with higher accuracy than traditional<br />
techniques. Experimental results of the application of the proposed method are shown.<br />
10:00-10:20, Paper TuAT3.4<br />
Quasi-Dense Wide Baseline Matching for Three Views<br />
Koskenkorva, Pekka, Univ. of Oulu<br />
Kannala, Juho, Univ. of Oulu<br />
Brandt, Sami Sebastian, Univ. of Oulu<br />
This paper proposes a method for computing a quasi-dense set of matching points between three views of a scene. The<br />
method takes a sparse set of seed matches between pairs of views as input and then propagates the seeds to neighboring<br />
regions. The proposed method is based on the best-first match propagation strategy, which is here extended from two-view<br />
matching to the case of three views. The results show that utilizing the three-view constraint during the correspondence<br />
growing improves the accuracy of matching and reduces the occurrence of outliers. In particular, compared with<br />
two-view stereo, our method is more robust for repeating texture. Since the proposed approach is able to produce high<br />
quality depth maps from only three images, it could be used in multi-view stereo systems that fuse depth maps from multiple<br />
views.<br />
10:20-10:40, Paper TuAT3.5<br />
Robust Shape from Polarisation and Shading<br />
Huynh, Cong Phuoc, Australian National Univ.<br />
Robles-Kelly, Antonio, National ICT Australia<br />
Hancock, Edwin, Univ. of York<br />
In this paper, we present an approach to robust estimation of shape from single-view multi-spectral polarisation images.<br />
The developed technique tackles the problem of recovering the azimuth angle of surface normals robust to image noise<br />
and a low degree of polarisation. We note that the linear least-squares estimation results in a considerable phase shift from<br />
the ground truth in the presence of noise and weak polarisation in multispectral and hyperspectral imaging. This paper<br />
discusses the utility of robust statistics to discount the large error attributed to outliers and noise. Combining this approach<br />
with Shape from Shading, we fully recover the surface shape. We demonstrate the effectiveness of the robust estimator<br />
compared to the linear least-squares estimator through shape recovery experiments on both synthetic and real images.<br />
TuAT4 Dolmabahçe Hall A<br />
Signal Separation and Classification Regular Session<br />
Session chair: Erzin, Engin (Koc Univ.)<br />
09:00-09:20, Paper TuAT4.1<br />
Classifying Three-Way Seismic Volcanic Data by Dissimilarity Representation<br />
Porro, Diana, Advanced Tech. Application Center<br />
Duin, Robert, TU Delft<br />
Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales<br />
Talavera, Isneri, Advanced Tech. Application Center<br />
Londoño-Bonilla, John Makario, Inst. Colombiano de Geología y Minería<br />
Multi-way data analysis is a multivariate data analysis technique with wide application in several fields. Nevertheless,<br />
the development of classification tools for this type of representation is still incipient. In this paper we study the dissimilarity<br />
representation for the classification of three-way data, as dissimilarities allow the representation of multi-dimensional objects<br />
in a natural way. As an example, the classification of seismic volcanic events is used. It is shown that, in this application,<br />
dissimilarity-based classification on 2D spectrograms performs better than on 1D spectral features.<br />
09:20-09:40, Paper TuAT4.2<br />
Improved Blur Insensitivity for Decorrelated Local Phase Quantization<br />
Heikkilä, Janne, Univ. of Oulu<br />
Ojansivu, Ville, Univ. of Oulu<br />
Rahtu, Esa, Univ. of Oulu<br />
This paper presents a novel blur-tolerant decorrelation scheme for the local phase quantization (LPQ) texture descriptor. As opposed<br />
to previous methods, the introduced model can be applied with virtually any kind of blur regardless of the point spread<br />
function. The new technique also takes into account the changes in the image characteristics originating from the blur<br />
itself. The implementation does not suffer from multiple solutions like the decorrelation in the original LPQ, but still retains the<br />
same run-time computational complexity. The texture classification experiments illustrate considerable improvements in<br />
the performance of LPQ descriptors in the case of blurred images and show only negligible loss of accuracy with sharp<br />
images.<br />
09:40-10:00, Paper TuAT4.3<br />
Ensemble Discriminant Sparse Projections Applied to Music Genre Classification<br />
Kotropoulos, Constantine, Aristotle Univ. of Thessaloniki<br />
Arce, Gonzalo, Univ. of Delaware<br />
Panagakis, Yannis, Aristotle Univ. of Thessaloniki<br />
Resorting to the rich, psycho-physiologically grounded, properties of the slow temporal modulations of music recordings,<br />
a novel classifier ensemble is built, which applies discriminant sparse projections. More specifically, overcomplete dictionaries<br />
are learned and sparse coefficient vectors are extracted to optimally approximate the slow temporal modulations<br />
of the training music recordings. The sparse coefficient vectors are then projected to the principal subspaces of their within-class<br />
and between-class covariance matrices. Decisions are taken with respect to the minimum Euclidean distance from<br />
the class mean sparse coefficient vectors, which undergo the aforementioned projections. The application of majority<br />
voting to the decisions taken by 10 individual classifiers, which are trained on the 10 training folds defined by stratified<br />
10-fold cross-validation on the GTZAN dataset, yields a music genre classification accuracy of 84.96% on average. The<br />
latter exceeds by 2.46% the highest accuracy previously reported without employing any sparse representations.<br />
10:00-10:20, Paper TuAT4.4<br />
Single Channel Speech Separation using Source-Filter Representation<br />
Stark, Michael, Graz Univ. of Tech.<br />
Wohlmayr, Michael, Graz Univ. of Tech.<br />
Pernkopf, Franz, Graz Univ. of Tech.<br />
We propose a fully probabilistic model for source-filter based single channel source separation. In particular, we perform<br />
separation in a sequential manner, where we estimate the source-driven aspects by a factorial HMM used for multi-pitch<br />
estimation. Afterwards, these pitch tracks are combined with the vocal tract filter model to form an utterance dependent<br />
model. Additionally, we introduce a gain estimation approach to enable adaptation to arbitrary mixing levels in the speech<br />
mixtures. We thoroughly evaluate this system and finally end up with a speaker-independent model.<br />
10:20-10:40, Paper TuAT4.5<br />
Nonlinear Blind Source Separation using Slow Feature Analysis with Random Features<br />
Ma, Kuijun, Chinese Acad. of Sciences<br />
Tao, Qing, Chinese Acad. of Sciences<br />
Wang, Jue, Chinese Acad. of Sciences<br />
We develop an algorithm, RSFA, to perform nonlinear blind source separation with temporal constraints. The algorithm is<br />
based on slow feature analysis using random Fourier features for shift-invariant kernels, followed by a selection procedure<br />
to obtain the sought-after signals. This method not only obtains remarkable results in a short computing time, but also handles<br />
situations with multiple types of mixtures excellently. In kernel methods, since the problem is unsupervised,<br />
the need for multiple kernels is ubiquitous. Experiments on music excerpts illustrate the strong performance of our<br />
method.<br />
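The random Fourier features used by RSFA follow the standard Rahimi-Recht construction for shift-invariant kernels; a sketch for the Gaussian kernel (the bandwidth parameterization is an assumption):

```python
import numpy as np

def random_fourier_features(X, n_features=500, gamma=1.0, seed=0):
    """Map X (n x d) to z(X) so that z(x) @ z(y) approximates the Gaussian
    kernel exp(-gamma * ||x - y||^2) for a shift-invariant kernel method."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # frequencies sampled from the kernel's spectral density
    W = rng.normal(0.0, np.sqrt(2 * gamma), (d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

Slow feature analysis can then run linearly on z(X), avoiding the cubic cost of full kernel matrices, which is where the short computing time mentioned above comes from.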
TuAT5 Anadolu Auditorium<br />
Image Analysis – III Regular Session<br />
Session chair: Kittler, Josef (Univ. of Surrey)<br />
09:00-09:20, Paper TuAT5.1<br />
Canonical Image Selection by Visual Context Learning<br />
Zhou, Wengang, Univ. of Science and Tech. of China<br />
Lu, Yijuan, Texas State Univ. at San Marcos<br />
Li, Houqiang, Univ. of Science and Tech. of China<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Canonical image selection aims to select a subset of photos that best summarizes a photo collection. In this paper, we define<br />
canonical images as those that contain the most important and distinctive visual words. We propose to use visual context<br />
learning to discover visual word significance and develop a Weighted Set Coverage algorithm to select canonical images<br />
containing distinctive visual words. Experiments with web image datasets demonstrate that the canonical images selected<br />
by our approach not only represent the collected photos, but also exhibit a diverse set of views with minimal<br />
redundancy.<br />
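The Weighted Set Coverage selection can be illustrated with a standard greedy maximum-coverage loop (a sketch that assumes each image is represented as a set of visual-word ids with learned significance weights; not necessarily the authors' exact algorithm):

```python
def select_canonical(images, weights, k):
    """Greedy weighted set coverage: repeatedly pick the image whose
    visual words add the most uncovered significance weight.

    images:  dict image name -> set of visual-word ids
    weights: dict visual-word id -> significance weight
    """
    covered, chosen = set(), []
    for _ in range(k):
        best, best_gain = None, 0.0
        for name, words in images.items():
            if name in chosen:
                continue
            gain = sum(weights[w] for w in words - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:          # nothing left adds new weight
            break
        chosen.append(best)
        covered |= images[best]
    return chosen
```

Because each round only rewards uncovered words, the selected subset is diverse with minimal redundancy, matching the behavior described above.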
09:20-09:40, Paper TuAT5.2<br />
Exposing Digital Image Forgeries by using Canonical Correlation Analysis<br />
Zhang, Chi, Beijing Univ. of Tech.<br />
Zhang, Hongbin, Beijing Univ. of Tech.<br />
In this paper, we propose a new method to detect forgeries in digital images by using photo-response non-uniformity<br />
(PRNU) noise features. The method utilizes canonical correlation analysis (CCA) to measure the linear correlation<br />
between two sets of PRNU noise estimates from images taken by the same camera. The linear correlation<br />
maximizes the correlation between the noise reference pattern (or PRNU noise estimate) and PRNU noise features from<br />
the same camera. To further improve the detection accuracy, the difference of variance between an image region and<br />
its smoothed version is used to categorize the image region as heavily textured or non-heavily textured.<br />
For either region class, a Neyman-Pearson decision is used to calculate<br />
the corresponding threshold and obtain the final detection result.<br />
09:40-10:00, Paper TuAT5.3<br />
Adding Affine Invariant Geometric Constraint for Partial-Duplicate Image Retrieval<br />
Wu, Zhipeng, Chinese Acad. of Sciences<br />
Xu, Qianqian, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Cui, Peng, Chinese Acad. of Sciences<br />
Li, Liang, Chinese Acad. of Sciences<br />
The proliferation of large numbers of partial-duplicate images on the internet brings a new challenge to image retrieval<br />
systems. Rather than taking the image as a whole, researchers bundle the local visual words detected by MSER into groups<br />
and add a simple relative-ordering geometric constraint to the bundles. Experiments show that bundled features are<br />
much more discriminative than single features. However, this weak geometric constraint is only applicable when there is<br />
no significant rotation between duplicate images, and it cannot handle image flips or large rotation<br />
transformations. In this paper, we improve the bundled features with an affine invariant geometric constraint. It employs<br />
the area-ratio invariance property of affine transformations to build an affine invariant matrix for bundled visual words. Such<br />
an affine invariant geometric constraint can cope well with flips, rotations and other transformations. Experimental results on<br />
the internet partial-duplicate image database verify the improvement it brings to the original bundled-features approach. Since<br />
currently there is no available public corpus for partial-duplicate image retrieval, we also publish our dataset for future<br />
studies.<br />
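The property the constraint builds on is that an affine map x ↦ Ax + t scales every area by |det A|, so ratios of areas are affine invariant; a quick numerical check (illustrative only):

```python
import numpy as np

def tri_area(p, q, r):
    """Signed area of the 2D triangle pqr."""
    u, v = q - p, r - p
    return 0.5 * (u[0] * v[1] - u[1] * v[0])

# two triangles in the same image plane
pts = [np.array(v, float) for v in [(0, 0), (4, 0), (0, 3), (1, 1), (5, 2), (2, 6)]]
A = np.array([[2.0, 1.0], [0.5, 3.0]])   # arbitrary affine map (det != 0)
t = np.array([7.0, -2.0])
mapped = [A @ p + t for p in pts]

ratio_before = tri_area(*pts[:3]) / tri_area(*pts[3:])
ratio_after = tri_area(*mapped[:3]) / tri_area(*mapped[3:])
# both areas scale by det(A) = 5.5, so the ratio is unchanged
```

This is why a matrix of area ratios over bundled visual words survives flips, rotations and general affine transformations.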
10:00-10:20, Paper TuAT5.4<br />
Outlier-Resistant Dissimilarity Measure for Feature-Based Image Matching<br />
Palenichka, Roman, Univ. of Quebec<br />
Lakhssassi, Ahmed, Univ. of Quebec<br />
Zaremba, Marek, Univ. of Quebec<br />
A novel dissimilarity measure is proposed to perform correspondence image matching for object recognition, image registration<br />
and content-based image retrieval. This is feature-based matching, which assumes an image representation (object<br />
description) in the form of a set of multi-location descriptor vectors. The proposed measure, called the intersection matching<br />
distance, eliminates outliers (false or missing feature points) while matching two sets of descriptor<br />
vectors in a transformation-invariant way. A block-subdivision algorithm for time-efficient image matching is also described.<br />
10:20-10:40, Paper TuAT5.5<br />
The University of Surrey Visual Concept Detection System at ImageCLEF@ICPR: Working Notes<br />
Tahir, Muhammad Atif, Univ. of Surrey<br />
Fei, Yan, Univ. of Surrey<br />
Barnard, Mark, Univ. of Surrey<br />
Awais, Muhammad, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
Kittler, Josef, Univ. of Surrey<br />
Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system<br />
in the ImageCLEF@ICPR Visual Concept Detection Task, which ranked first for large-scale visual concept detection<br />
tasks in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of the hierarchical<br />
measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering,<br />
structured scene or object representation via localised histograms of vector codes, similarity measure for kernel construction<br />
and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis<br />
with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we<br />
obtain the best performance of all 12 submissions to this task.<br />
TuAT6 Dolmabahçe Hall B<br />
Texture Regular Session<br />
Session chair: Theodoridis, Sergios (Univ. of Athens)<br />
09:00-09:20, Paper TuAT6.1<br />
On Adapting Pixel-Based Classification to Unsupervised Texture Segmentation<br />
Melendez, Jaime, Rovira I Virgili Univ.<br />
Puig, Domenec, Univ. Rovira I Virgili<br />
Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />
An inherent problem of unsupervised texture segmentation is the absence of previous knowledge regarding the texture<br />
patterns present in the images to be segmented. A new efficient methodology for unsupervised image segmentation based<br />
on texture is proposed. It takes advantage of a supervised pixel-based texture classifier trained with feature vectors associated<br />
with a set of texture patterns initially extracted through a clustering algorithm. Therefore, the final segmentation is<br />
achieved by classifying each image pixel into one of the patterns obtained from the previous clustering process. Multi-sized<br />
evaluation windows following a top-down approach are applied during pixel classification in order to improve accuracy.<br />
The proposed technique has been experimentally validated on MeasTex, VisTex and Brodatz compositions, as well<br />
as on complex ground and aerial outdoor images. Comparisons with state-of-the-art unsupervised texture segmenters are<br />
also provided.<br />
09:20-09:40, Paper TuAT6.2<br />
Natural Material Recognition with Illumination Invariant Textural Features<br />
Vacha, Pavel, Inst. of Information Theory and Automation<br />
Haindl, Michael, Inst. of Information Theory and Automation<br />
The visual appearance of natural materials fundamentally depends on illumination conditions, which significantly complicates<br />
real scene analysis. We propose textural features based on fast Markovian statistics, which are simultaneously invariant<br />
to illumination colour and robust to illumination direction. No knowledge of illumination conditions is required, and<br />
recognition is possible from a single training image per material. Material recognition is tested on the currently most realistic<br />
visual representation – the Bidirectional Texture Function (BTF) – using the Amsterdam Library of Textures (ALOT), which<br />
contains 250 natural materials acquired under different illumination conditions. Our proposed features significantly outperform<br />
several leading alternatives including Local Binary Patterns (LBP, LBP-HF) and Gabor features.<br />
09:40-10:00, Paper TuAT6.3<br />
Gaze-Motivated Compression of Illumination and View Dependent Textures<br />
Filip, Jiri, Inst. of Information Theory and Automation of the AS CR<br />
Haindl, Michael, Inst. of Information Theory and Automation<br />
Chantler, Michael J., Heriot-Watt Univ.<br />
Illumination and view dependent textures provide ample information on the appearance of real materials at the cost of enormous<br />
data storage requirements. Hence, past research focused mainly on compression and modelling of these data; however,<br />
few papers have explicitly addressed the way in which humans perceive the compressed data. We analyzed human<br />
gaze information to determine appropriate texture statistics. These statistics were then exploited in a pilot illumination<br />
and view direction dependent data compression algorithm. Our results showed that taking into account local texture variance<br />
can increase compression of current methods more than twofold, while preserving original realistic appearance and<br />
allowing fast data reconstruction.<br />
10:00-10:20, Paper TuAT6.4<br />
Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets<br />
Alvarez, Susana, Univ. Rovira I Virgili<br />
Salvetella, Anna, Univ. Autònoma de Barcelona<br />
Vanrell, Maria, Univ. Autònoma de Barcelona<br />
Otazu, Xavier, Univ. Autònoma de Barcelona<br />
Color and texture are visual cues of different natures, and their integration in a useful visual descriptor is not an obvious step.<br />
One way to combine both features is to compute texture descriptors independently on each color channel. A second way<br />
is to integrate the features at the descriptor level, in which case the problem of normalizing both cues arises. Significant progress<br />
in object recognition in recent years has provided the bag-of-words framework, which again deals with the problem of<br />
feature combination through the definition of vocabularies of visual words. Inspired by this framework, here we present<br />
perceptual textons that allow fusing color and texture at the level of p-blobs, which is our feature detection step.<br />
Feature representation is based on two uniform spaces representing the attributes of the p-blobs. The low dimensionality<br />
of these texton spaces allows us to bypass the usual problems of previous approaches: firstly, there is no need for normalization<br />
between cues; and secondly, vocabularies are directly obtained from the perceptual properties of the texton spaces without<br />
any learning step. Our proposal improves the current state of the art in color-texture descriptors in an image retrieval experiment<br />
over a highly diverse texture dataset from Corel.<br />
10:20-10:40, Paper TuAT6.5<br />
Illumination Estimation of 3D Surface Texture based on Active Basis<br />
Dong, Junyu, Ocean Univ. of China<br />
Su, Liyuan, Ocean Univ. of China<br />
Duan, Yuanxu, Alcatel-Lucent R&D<br />
This paper describes an approach to estimate illumination directions of 3D surface texture based on Active Basis. Instead<br />
of applying Gabor wavelet transform to extract texture features, we represent our texture features with a simple Haar<br />
feature to improve efficiency. The Active Basis model can be learned from training image patches by the shared pursuit<br />
algorithm. A base histogram can then be obtained from each model. We measure the illumination direction by minimizing<br />
the Euclidean distance and the entropy difference of base histograms between the test image and the training sets.<br />
Experimental results demonstrate the effectiveness and accuracy of the proposed approach.<br />
TuAT7 Dolmabahçe Hall C<br />
Security and Privacy Regular Session<br />
Session chair: Veldhuis, Raymond (Univ of Twente)<br />
09:00-09:20, Paper TuAT7.1<br />
Binary Discriminant Analysis for Face Template Protection<br />
Feng, Y C, Hong Kong Baptist Univ.<br />
Yuen, Pong C, Hong Kong Baptist Univ.<br />
A biometric cryptosystem (BC) is a very secure approach for template protection because the stored template is encrypted.<br />
The key issues in the BC approach include (i) limited capability in handling intra-class variations and (ii) the requirement of binary input.<br />
To overcome these problems, this paper adopts the concept of discriminant analysis and develops a new binary<br />
discriminant analysis (BDA) method to convert a real-valued template into a binary template. Experimental results on CMU-<br />
PIE and FRGC face databases show that the proposed BDA method outperforms existing template binarization schemes.<br />
09:20-09:40, Paper TuAT7.2<br />
Renewable Minutiae Templates with Tunable Size and Security<br />
Yang, Bian, Gjovik Univ. Coll.<br />
Busch, Christoph, Gjovik Univ. Coll.<br />
Gafurov, Davrondzhon, Gjovik Univ. Coll.<br />
Bours, Patrick, Gjovik Univ. Coll.<br />
A renewable fingerprint minutiae template generation scheme is proposed that utilizes random projection for template diversification<br />
in a security-enhanced way. The scheme first achieves absolute pre-alignment over local minutiae quadruplets<br />
in the original template, resulting in a fixed-length feature vector; it then encrypts the feature vector by projecting it onto<br />
multiple random matrices and quantizing the projected result; and finally it post-processes the resultant binary vector in a size-<br />
and security-tunable way to obtain the final protected minutia vicinity. Experiments on the fingerprint database<br />
FVC2002DB2_A demonstrate the desirable biometric performance of the proposed scheme.<br />
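The core of such a scheme, projecting a fixed-length feature vector onto key-seeded random matrices and quantizing the result, can be sketched as follows (the parameter values and the sign-based binarization rule are assumptions, not the paper's):

```python
import numpy as np

def protect_template(feature_vec, user_key, n_projections=8, out_bits=32):
    """Renewable-template sketch: project a fixed-length feature vector onto
    key-seeded random matrices and binarize (quantize) each projection.
    Re-issuing a compromised template amounts to choosing a new user_key."""
    rng = np.random.default_rng(user_key)    # key drives the diversification
    bits = []
    for _ in range(n_projections):
        R = rng.standard_normal((out_bits, len(feature_vec)))
        bits.append((R @ feature_vec > 0).astype(np.uint8))
    return np.concatenate(bits)
```

The same feature vector with the same key always yields the same binary template, while a new key yields an unrelated one, which is the renewability property discussed above.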
09:40-10:00, Paper TuAT7.3<br />
Tokenless Cancelable Biometrics Scheme for Protecting IrisCodes<br />
Ouda, Osama, Chiba Univ.<br />
Tsumura, Norimichi, Chiba Univ.<br />
Nakaguchi, Toshiya, Chiba Univ.<br />
In order to satisfy the requirements of the cancelable biometrics construct, cancelable biometrics techniques rely on other<br />
authentication factors, such as password keys and/or user-specific tokens, in the transformation process. However, such<br />
multi-factor authentication techniques suffer from the same issues as traditional knowledge-based and token-based<br />
authentication systems. This paper presents a new one-factor cancelable biometrics scheme for protecting IrisCodes.<br />
The proposed method is based solely on Iris Codes; however, it satisfies the requirements of revocability, diversity and<br />
noninvertibility without deteriorating the recognition performance. Moreover, the transformation process is easy to implement<br />
and can be integrated simply with current iris matching systems. The impact of the proposed transformation<br />
process on the recognition accuracy is discussed and its noninvertibility is analyzed. The effectiveness of the proposed<br />
method is confirmed experimentally using CASIA-IrisV3-Interval dataset.<br />
10:00-10:20, Paper TuAT7.4<br />
A Novel Fingerprint Template Protection Scheme based on Distance Projection Coding<br />
Wang, Ruifang, Chinese Acad. of Sciences<br />
Yang, Xin, Chinese Acad. of Sciences<br />
Liu, Xia, Harbin University of Science and Technology<br />
Zhou, Sujing, Chinese Acad. of Sciences<br />
Li, Peng, Chinese Acad. of Sciences<br />
Cao, Kai, Chinese Acad. of Sciences<br />
Tian, Jie, Chinese Acad. of Sciences<br />
The biometric template, which is stored in the form of raw data, has become the greatest potential threat to the security of<br />
biometric authentication systems. As the compromise of biometric data is permanent, the protection of biometric data<br />
is particularly important. Consequently, biometric template protection technologies have recently attracted considerable research attention.<br />
One of the most popular template protection methods is the biometric cryptosystem method. In this paper, we design<br />
a codebook named distance projection for biometric coding to generate a secured biometric template, and propose a novel<br />
fingerprint biometric cryptosystem scheme based on the codebook. Experimental results on FVC2002 DB2 show that the<br />
proposed scheme obtains positive results on both security and authentication accuracy.<br />
10:20-10:40, Paper TuAT7.5<br />
Combination of Symmetric Hash Functions for Secure Fingerprint Matching<br />
Kumar, Gaurav, State Univ. of New York at Buffalo<br />
Tulyakov, Sergey, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Fingerprint based secure biometric authentication systems have received considerable research attention lately, where the<br />
major goal is to provide an anonymous, multipliable and easily revocable methodology for fingerprint verification. In our<br />
previous work, we have shown that symmetric hash functions are very effective in providing such secure fingerprint representation<br />
and matching since they are independent of order of minutiae triplets as well as location of singular points<br />
(e.g. core and delta). In this paper, we extend our prior work by generating a combination of symmetric hash functions,<br />
which increases the security of fingerprint matching by an exponential factor. Firstly, we extract k-plets from each fingerprint<br />
image and generate a unique key for combining multiple hash functions up to an order of (k-1). Each of these keys is generated<br />
using the features extracted from minutiae k-plets such as bin index of smallest angles in each k-plet. This combination<br />
provides extra security in the face of brute-force attacks, where the compromise of a few hash functions<br />
does not compromise the overall matching. Our experimental results suggest that the EER obtained using the combination<br />
of hash functions (4.98%) is comparable with the baseline system (3.0%), with the added advantage of being more secure.<br />
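The order-invariance that symmetric hash functions provide can be illustrated with elementary power sums over minutiae treated as complex numbers; this is a hedged sketch of the principle, not the authors' actual hash construction:<br />

```python
# Illustrative sketch: order-invariant hashing of a minutiae triplet via
# symmetric power sums (hypothetical functions, not the paper's exact ones).
def symmetric_hashes(points):
    """Map minutiae (x, y) pairs, viewed as complex numbers x + iy, to
    values independent of the order in which the points are listed."""
    zs = [complex(x, y) for x, y in points]
    h1 = sum(zs)                  # power sum of order 1
    h2 = sum(z * z for z in zs)   # power sum of order 2
    h3 = sum(z ** 3 for z in zs)  # power sum of order 3
    return h1, h2, h3

# Any permutation of the triplet yields identical hash values.
a = symmetric_hashes([(1, 2), (3, 4), (5, 6)])
b = symmetric_hashes([(5, 6), (1, 2), (3, 4)])
assert a == b
```

Because each power sum is symmetric in its arguments, the hash carries no information about minutiae ordering, which is what makes the representation independent of how the triplet is enumerated.<br />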
TuAT8 Lower Foyer<br />
Structural Methods and Speech/Image Analysis Poster Session<br />
Session chair: Aguiar, Pedro M. Q. (Institute for Systems and Robotics / Instituto Superior Tecnico)<br />
09:00-11:10, Paper TuAT8.1<br />
Face Recognition based on Illumination Adaptive LDA<br />
Liu, Zhonghua, Nanjing Univ. of Science and Tech.<br />
Zhou, Jingbo, Nanjing Univ. of Science and Tech.<br />
Jin, Zhong, Nanjing Univ. of Science and Tech.<br />
The variation of facial appearance due to illumination degrades face recognition systems considerably, and is well<br />
known as one of the bottlenecks in face recognition. However, the variations of each subject that are due to changes<br />
of illumination are extremely similar to each other. By collecting offline many face classes, each with many images<br />
under different lighting conditions, a common within-class scatter matrix describing the within-class illumination variations<br />
of all the face classes can be obtained. Based on this, illumination adaptive linear discriminant analysis (IALDA) is proposed<br />
to solve illumination variation problems in face recognition when each face class has only one training sample under the<br />
standard lighting conditions. In the IALDA method, the illumination direction of an input face image is firstly estimated.<br />
Then the corresponding LDA feature, which is robust to the variations between the images under the estimated lighting<br />
conditions and the standard lighting conditions, is extracted. Experiments on the face databases demonstrate the effectiveness<br />
of the proposed method.<br />
09:00-11:10, Paper TuAT8.2<br />
Topological Dynamic Bayesian Networks<br />
Bouchaffra, Djamel, Grambling State Univ.<br />
The objective of this research is to embed topology within the dynamic Bayesian network (DBN) formalism. This extension<br />
of a DBN (that encodes statistical or causal relationships) to a topological DBN (TDBN) allows continuous mappings<br />
(e.g., topological homeomorphisms), topological relations (e.g., homotopy equivalences) and invariance properties (e.g.,<br />
surface genus, compactness) to be exploited. The mission of a TDBN is not limited to classifying objects but extends to revealing<br />
how these objects are topologically related. Because the TDBN formalism uses geometric constructors that project a<br />
discrete space onto a continuous space, it is well suited to identify objects that undergo smooth deformation. Experimental<br />
results in face identification across ages represent conclusive evidence that the fusion of statistics and topology embodied<br />
by the TDBN concept holds promise. The TDBN formalism outperformed the DBN approach in facial identification across<br />
ages.<br />
09:00-11:10, Paper TuAT8.3<br />
Vector Space Embedding of Undirected Graphs with Fixed-Cardinality Vertex Sequences for Classification<br />
Richiardi, Jonas, Ec. Pol. Fédérale de Lausanne<br />
Van De Ville, Dimitri, Ec. Pol. Fédérale de Lausanne<br />
Riesen, Kaspar, Univ. of Bern<br />
Bunke, Horst, Univ. of Bern<br />
Simple weighted undirected graphs with a fixed number of vertices and fixed vertex orderings can be used to represent<br />
data and patterns in a wide variety of scientific and engineering domains. Classification of such graphs by existing graph<br />
matching methods performs rather poorly because these methods do not exploit this specificity. As an alternative, methods relying<br />
on vector-space embedding hold promise. We propose two such techniques that can be deployed as a front-end<br />
for any pattern recognition classifiers: one has low computational cost but generates high-dimensional spaces, while<br />
the other is more computationally demanding but can yield relatively low-dimensional vector space representations. We<br />
show experimental results on an fMRI brain state decoding task and discuss the shortfalls of graph edit distance for the<br />
type of graph under consideration.<br />
09:00-11:10, Paper TuAT8.4<br />
Hierarchical Large Margin Nearest Neighbor Classification<br />
Chen, Qiaona, East China Normal Univ.<br />
Sun, Shiliang, East China Normal Univ.<br />
Distance metric learning has exhibited its great power to enhance performance in metric related pattern recognition tasks.<br />
The recent large margin nearest neighbor classification (LMNN) improves the performance of k-nearest neighbor classification<br />
by learning a global distance metric. However, it does not consider the locality of data distributions, which is<br />
crucial in determining a proper metric. In this paper, we propose a novel local distance metric learning method called hierarchical<br />
LMNN (HLMNN), which first builds a hierarchical structure by grouping data points according to overlapping<br />
ratios that we define, and then learns distance metrics sequentially. Experimental results on real-world data sets, including<br />
comparisons with the traditional k-nearest neighbor and the state-of-the-art LMNN show the effectiveness of the proposed<br />
HLMNN.<br />
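As background, the setting that LMNN and HLMNN operate in is k-nearest neighbor classification under a learned linear metric; the sketch below uses a fixed matrix L as a stand-in for a learned one, and toy data of our own:<br />

```python
import numpy as np

# Sketch: k-NN classification with a Mahalanobis-style distance ||L(a - b)||.
# In (H)LMNN the matrix L is learned from data; here it is a fixed example.
def knn_predict(X_train, y_train, x, L, k=3):
    """Classify x by majority vote among its k nearest training points."""
    d = np.linalg.norm((X_train - x) @ L.T, axis=1)  # distances under L
    nearest = np.argsort(d)[:k]                      # k closest indices
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
L = np.eye(2)  # identity metric = plain Euclidean k-NN
print(knn_predict(X, y, np.array([0.05, 0.0]), L, k=3))
```

Replacing the identity with a learned L is what metric learning contributes: the same voting rule, but with distances reshaped to pull same-class neighbors together.<br />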
09:00-11:10, Paper TuAT8.5<br />
Adapting Information Theoretic Clustering to Binary Images<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
We consider the problem of finding points of interest along local curves of binary images. Information theoretic vector<br />
quantization is a clustering algorithm that shifts cluster centers towards the modes of principal curves of a data set. Its<br />
runtime characteristics, however, do not allow for efficient processing of many data points. In this paper, we show how to<br />
solve this problem when dealing with data on a 2D lattice. Borrowing concepts from signal processing, we adapt information<br />
theoretic clustering to the quantization of binary images and gain significant speedup.<br />
09:00-11:10, Paper TuAT8.6<br />
Nearest-Manifold Classification with Gaussian Processes<br />
Jun, Goo, Univ. of Texas at Austin<br />
Ghosh, Joydeep, Univ. of Texas<br />
Manifold models for nonlinear dimensionality reduction provide useful low-dimensional representations of high-dimensional<br />
data. Most manifold models are unsupervised algorithms and map the entire data onto a single manifold. Heterogeneous<br />
data with multiple classes are often better modeled by multiple manifolds rather than by a single global manifold,<br />
but there is no explicit way to compare instances embedded in different subspaces. We propose a novel low-to-high dimensional<br />
mapping using Gaussian processes that offers comparisons in the original space. Based on the mapping, we<br />
propose a nearest-manifold classification algorithm for high-dimensional data. Experimental results show that the proposed<br />
algorithm provides good classification accuracies for problems well-modeled by multiple manifolds.<br />
09:00-11:10, Paper TuAT8.7<br />
Mining Exemplars for Object Modelling using Affinity Propagation<br />
Xia, Shengping, Univ. of York<br />
Liu, Jianjun, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
This paper focuses on the problem of locating object class exemplars from a large corpus of images using affinity propagation.<br />
We use attributed relational graphs to represent groups of local invariant features together with their spatial arrangement.<br />
Rather than mining exemplars from the entire graph corpus, we prefer to cluster object specific exemplars. Firstly,<br />
we obtain an object specific cluster of graphs using similarity propagation. The popular affinity propagation method is then<br />
individually applied to each object specific cluster. Using this clustering method, we can obtain object specific exemplars<br />
together with a high precision for the data associated with each exemplar. Experiments are performed on over 80K images<br />
spanning 500 objects, and demonstrate the efficiency and scalability of the method.<br />
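Affinity propagation's defining feature, selecting exemplars from among the data points themselves, can be seen on a toy example with scikit-learn (illustrative data, unrelated to the paper's image corpus):<br />

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, -0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])

# Affinity propagation exchanges "responsibility" and "availability"
# messages until a set of exemplars emerges; no cluster count is fixed.
ap = AffinityPropagation(random_state=0).fit(X)
print(ap.cluster_centers_indices_)  # indices of the chosen exemplars
print(ap.labels_)                   # exemplar assignment of every point
```

Each cluster center is an actual data point, which is exactly the property exploited here for mining representative exemplars.<br />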
09:00-11:10, Paper TuAT8.8<br />
Background Filtering for Improving of Object Detection in Images<br />
Qin, Ge, Univ. of Surrey<br />
Vrusias, Bogdan, Univ. of Surrey<br />
Gillam, Lee, Univ. of Surrey<br />
We propose a method for improving object recognition in street scene images by identifying and filtering out background<br />
aspects. We analyse the semantic relationships between foreground and background objects and use the information obtained<br />
to remove areas of the image that are misclassified as foreground objects. We show that such background filtering<br />
improves the performance of four traditional object recognition methods by over 40%. Our method is independent of the<br />
recognition algorithms used for individual objects, and can be extended to generic object recognition in other environments<br />
by adapting other object models.<br />
09:00-11:10, Paper TuAT8.9<br />
Sparse Local Discriminant Projections for Feature Extraction<br />
Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />
Jin, Zhong, Nanjing Univ. of Science and Tech.<br />
Yang, Jian, Nanjing Univ. of Science and Tech.<br />
Wong, W.K., The Hong Kong Pol. Univ.<br />
One of the major disadvantages of linear dimensionality reduction algorithms, such as Principal Component Analysis<br />
(PCA) and Linear Discriminant Analysis (LDA), is that the projections are linear combinations of all the original features<br />
or variables, and all weights in the linear combination, known as loadings, are typically non-zero. Thus, they lack physical<br />
interpretation in many applications. In this paper, we propose a novel supervised learning method called Sparse Local<br />
Discriminant Projections (SLDP) for linear dimensionality reduction. SLDP introduces a sparse constraint into the objective<br />
function and obtains a set of sparse projective axes with a direct physical interpretation. The sparse projections can be efficiently<br />
computed by the elastic net combined with spectral analysis. The experimental results show that SLDP gives<br />
an explicit interpretation of its projections and achieves competitive performance compared with other dimensionality<br />
reduction techniques.<br />
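The sparsity mechanism at work here, an elastic-net penalty driving many loadings exactly to zero, can be illustrated on toy regression data; this shows the penalty's effect only, not the SLDP objective itself:<br />

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data: the target depends on only 2 of 20 features, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] * 3.0 + X[:, 1] * -2.0 + rng.normal(scale=0.1, size=200)

# The L1 part of the elastic net zeroes out irrelevant loadings,
# unlike an ordinary least-squares fit where all weights are non-zero.
en = ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X, y)
print(np.count_nonzero(en.coef_))   # only a few non-zero loadings survive
```

The surviving non-zero coefficients point directly at the relevant input variables, which is the "physical interpretation" that sparse projections buy.<br />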
09:00-11:10, Paper TuAT8.10<br />
Information-Theoretic Feature Selection from Unattributed Graphs<br />
Bonev, Boyan, Univ. of Alicante<br />
Escolano, Francisco, Univ. of Alicante<br />
Giorgi, Daniela, National Res. Council<br />
Biasotti, Silvia, CNR – IMATI<br />
In this work we evaluate purely structural graph measures for 3D object classification. We extract spectral features from<br />
different Reeb graph representations. Information-theoretic feature selection gives an insight on which are the most relevant<br />
features.<br />
09:00-11:10, Paper TuAT8.11<br />
Head Pose Estimation based on Random Forests for Multiclass Classification<br />
Huang, Chen, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
Fang, Chi, Tsinghua Univ.<br />
Head pose estimation remains a unique challenge for computer vision systems due to identity variation, illumination<br />
changes, noise, etc. Previous statistical approaches such as PCA and linear discriminant analysis (LDA), and machine learning<br />
methods, including SVM and AdaBoost, struggle to achieve both accuracy and robustness. In this paper, we propose<br />
to use Gabor feature based random forests as the classification technique since they naturally handle such multi-class classification<br />
problem and are accurate and fast. The two sources of randomness, random inputs and random features, make<br />
random forests robust and able to deal with large feature spaces. In addition, we implement LDA as the node test to improve<br />
the discriminative power of individual trees in the forest, with each node generating either a constant or a variable number of<br />
child nodes. Experiments are carried out on two public databases to show that the proposed algorithm outperforms other<br />
approaches in both accuracy and computational efficiency.<br />
09:00-11:10, Paper TuAT8.12<br />
Differential Morphological Decomposition Segmentation: A Multi-Scale Object based Image Description<br />
Gueguen, Lionel, JRC – European Commission<br />
Soille, Pierre, Ec. Joint Res. Centre<br />
Pesaresi, Martino, Ec. Joint Res. Centre<br />
In order to describe and extract image information content, segmentation is a well-known approach to representing the information<br />
in terms of objects. Image segmentation is a common image processing technique aiming at disintegrating an<br />
image into a partition of its support. Hierarchical or fuzzy segmentations extend this definition in order<br />
to provide a covering of the image support with overlapping segments. In this paper, we propose a novel approach for<br />
breaking up an image into multi-scale overlapping objects. The image is decomposed by granulometry or differential morphological<br />
pyramid, resulting in a discrete scale-space representation. Then, the scale-space transform is segmented by a<br />
region based method. Projecting the obtained scale-space partition into space constitutes the disintegrated image representation,<br />
which enables a multi-scale object based image description.<br />
09:00-11:10, Paper TuAT8.13<br />
Efficient Learning to Label Images<br />
Jia, Ke, Australian National Univ. National ICT Australia<br />
Cheng, Li, NICTA<br />
Liu, Nianjun, NICTA<br />
Wang, Lei, The Australian National Univ.<br />
Conditional random field methods (CRFs) have gained popularity for image labeling tasks in recent years. In this paper,<br />
we describe an alternative discriminative approach, by extending the large margin principle to incorporate spatial correlations<br />
among neighboring pixels. In particular, by explicitly enforcing the submodular condition, graph-cuts is conveniently<br />
integrated as the inference engine to attain the optimal label assignment efficiently. Our approach allows learning<br />
a model with thousands of parameters, and is shown to be capable of readily incorporating higher-order scene context.<br />
Empirical studies on a variety of image datasets suggest that our approach performs competitively compared to state-of-the-art<br />
scene labeling methods.<br />
09:00-11:10, Paper TuAT8.14<br />
NAVIDOMASS: Structural-Based Approaches towards Handling Historical Documents<br />
Jouili, Salim, LORIA<br />
Coustaty, Mickaël, Univ. of La Rochelle<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Ogier, Jean-Marc, Univ. de la Rochelle<br />
In the context of the NAVIDOMASS project, this paper addresses the clustering of historical document<br />
images. We propose a structural framework to handle data-sets of ancient ornamental letters. The contribution<br />
consists, firstly, of examining the structural (i.e. graph) representation of the ornamental letters; secondly, graph matching<br />
is applied to the resulting graph-based representations. In addition, a comparison between the structural (graph)<br />
and statistical (generic Fourier descriptor) techniques is drawn.<br />
09:00-11:10, Paper TuAT8.15<br />
Median Graph Shift: A New Clustering Algorithm for Graph Domain<br />
Jouili, Salim, LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Lacroix, Vinciane, Royal Military Acad. Belgium<br />
In the context of unsupervised clustering, a new algorithm for the domain of graphs is introduced. In this paper, the key idea<br />
is to adapt the mean-shift clustering and its variants proposed for the domain of feature vectors to graph clustering. These algorithms<br />
have been applied successfully in image analysis and computer vision domains. The proposed algorithm works in<br />
an iterative manner by shifting each graph towards the median graph in a neighborhood. Both the set median graph and the<br />
generalized median graph are tested for the shifting procedure. In the experimental part, a set of cluster validation indices is<br />
used to evaluate our clustering algorithm, and a comparison with the well-known k-means algorithm is provided.<br />
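For reference, the vector-space mean-shift step that the algorithm transplants to the graph domain (replacing the kernel-weighted mean with a median graph) can be sketched as follows; the bandwidth and data are illustrative:<br />

```python
import numpy as np

# Sketch of one mean-shift iteration in vector space: move the query
# point to the Gaussian-kernel-weighted mean of the data. The paper's
# graph version shifts each graph towards a median graph instead.
def mean_shift_step(points, x, bandwidth=1.0):
    d2 = np.sum((points - x) ** 2, axis=1)        # squared distances
    w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian weights
    return (w[:, None] * points).sum(axis=0) / w.sum()

pts = np.array([[0.0, 0.0], [0.2, 0.0], [0.1, 0.1]])
x = np.array([1.0, 1.0])
for _ in range(20):                               # iterate to convergence
    x = mean_shift_step(pts, x)
print(np.round(x, 2))  # settles near the centre of the cluster
```

In the graph adaptation, distances come from graph (dis)similarity and the weighted mean has no direct analogue, which is why the set median or generalized median graph takes its place.<br />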
09:00-11:10, Paper TuAT8.16<br />
A Discrete Labelling Approach to Attributed Graph Matching using SIFT Features<br />
Sanroma, Gerard, Univ. Rovira I Virgili<br />
Alquezar, Rene, Univ. Pol. De Catalunya<br />
Serratosa, Francesc, Univ. Rovira I Virgili<br />
Local invariant feature extraction methods are widely used for image-features matching. There exist a number of approaches<br />
aimed at the refinement of the matches between image-features. It is a common strategy among these approaches<br />
to use geometrical criteria to reject a subset of outliers. One limitation of the outlier rejection design is that it is unable to<br />
add new useful matches. We present a new model that integrates the local information of the SIFT descriptors along with<br />
global geometrical information to estimate a new robust set of feature-matches. Our approach encodes the geometrical information<br />
by means of graph structures while posing the estimation of the feature-matches as a graph matching problem.<br />
Some comparative experimental results are presented.<br />
09:00-11:10, Paper TuAT8.17<br />
A Conductance Electrical Model for Representing and Matching Weighted Undirected Graphs<br />
Igelmo, Manuel, Univ. Pol. De Catalunya<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
Ferrer, Miquel, Univ. Pol. De Catalunya<br />
In this paper we propose a conductance electrical model to represent weighted undirected graphs that allows us to efficiently<br />
compute approximate graph isomorphism in large graphs. The model is built by transforming a graph into an electrical<br />
circuit. Edges in the graph become conductances in the electrical circuit. This model follows the laws of electrical<br />
circuit theory, and we can potentially use all the existing theory and tools of this field to derive other approximate techniques<br />
for graph matching. In the present work, we use the proposed circuit model to derive approximate graph isomorphism<br />
solutions.<br />
09:00-11:10, Paper TuAT8.18<br />
Computing the Barycenter Graph by Means of the Graph Edit Distance<br />
Bardaji, Itziar, Univ. Pol. De Catalunya<br />
Ferrer, Miquel, Univ. Pol. De Catalunya<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
The barycenter graph has been shown to be an alternative way to obtain the representative of a given set of graphs. In this paper<br />
we propose an extension of the original algorithm which makes use of the graph edit distance in conjunction with the<br />
weighted mean of a pair of graphs. Our main contribution is that we can apply the method to attributed graphs with any<br />
kind of labels in both the nodes and the edges, equipped with a distance function less constrained than in previous approaches.<br />
Experiments done on four different datasets support the validity of the method, giving good approximations of<br />
the barycenter graph.<br />
09:00-11:10, Paper TuAT8.19<br />
Refined Morphological Methods of Moment Computation<br />
Suk, Tomas, Inst. of Information Theory and Automation<br />
Flusser, Jan, Inst. of Information Theory and Automation<br />
A new method of moment computation based on decomposition of the object into rectangular blocks is presented. The decomposition<br />
is accomplished by means of the distance transform. The method is compared with earlier morphological methods,<br />
namely with erosion decomposition into squares. All the methods are also compared with direct computation from the definition.<br />
09:00-11:10, Paper TuAT8.20<br />
Robust Computation of the Polarisation Image<br />
Saman, Gule, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
In this paper we show how to make the computation of polarisation information from multiple polariser-angle images robust.<br />
We make two contributions. First, we show how to use M-estimators to make robust moment estimates of the mean<br />
intensity, polarisation and phase. Second, we show how directional statistics can be used to smooth the phase-angle, and<br />
to improve its estimation when the polarisation is small. We apply the resulting techniques to polariser images and perform<br />
surface quality inspection. Compared to polarisation information delivered by the three-point method, our estimates reveal<br />
finer surface detail.<br />
09:00-11:10, Paper TuAT8.21<br />
Fast Polar and Spherical Fourier Descriptors for Feature Extraction<br />
Yang, Zhuo, Waseda Univ.<br />
Kamata, Sei-Ichiro, Waseda Univ.<br />
Polar Fourier Descriptor (PFD) and Spherical Fourier Descriptor (SFD) are rotation-invariant feature descriptors for two<br />
dimensional (2D) and three dimensional (3D) image retrieval and pattern recognition tasks. They have been demonstrated to be<br />
superior to other methods in describing rotation-invariant features of 2D and 3D images. However, in<br />
order to increase the computation speed, a fast computation method is needed, especially for applications such as real-time systems<br />
and large image databases. This paper presents a fast computation method for PFD and SFD based on mathematical<br />
properties of trigonometric functions and associated Legendre polynomials. The proposed fast PFD and SFD are 8 and 16<br />
times faster than the traditional ones, significantly speeding up the computation process.<br />
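The rotation invariance that Fourier-based descriptors rely on comes from taking coefficient magnitudes: rotating a pattern circularly shifts its angular signature, which changes only the phase of the DFT coefficients. A minimal 1-D illustration of this principle (not the descriptor itself):<br />

```python
import numpy as np

# A periodic angular signature sampled on a uniform grid; rolling the
# samples is exactly a rotation of the underlying pattern.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
signature = np.cos(3 * theta) + 0.5 * np.sin(5 * theta)
rotated = np.roll(signature, 7)          # circular shift = rotation

# DFT magnitudes are unchanged by the shift, so they form a
# rotation-invariant descriptor.
d1 = np.abs(np.fft.fft(signature))
d2 = np.abs(np.fft.fft(rotated))
print(np.allclose(d1, d2))
```

PFD and SFD apply the same idea in polar and spherical coordinates, where the fast methods above exploit structure in the trigonometric and Legendre basis functions.<br />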
09:00-11:10, Paper TuAT8.22<br />
RBM-Based Silhouette Encoding for Human Action Modelling<br />
Marin-Jimenez, Manuel Jesus, Univ. of Cordoba<br />
Perez De La Blanca, Nicolas, UGR<br />
Mendoza Perez, Maria Angeles, Univ. de Granada<br />
In this paper we evaluate the use of Restricted Boltzmann Machines (RBM) in the context of learning and recognizing<br />
human actions. The features used as a basis are binary silhouettes of persons. We test the proposed approach on two datasets<br />
of human actions where binary silhouettes are available: ViHASi (synthetic data) and Weizmann (real data). In addition,<br />
on Weizmann dataset, we combine features based on optical flow with the associated binary silhouettes. The results show<br />
that thanks to the use of RBM-based models, very informative and shorter feature vectors can be obtained for the classification<br />
tasks, improving the classification performance.<br />
09:00-11:10, Paper TuAT8.23<br />
Shape Classification using Tree-Unions<br />
Wang, Bo, Huazhong Univ. of Science and Tech.<br />
Shen, Wei, Huazhong Univ. of Science and Tech.<br />
Liu, Wenyu, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Bai, Xiang, Huazhong Univ. of Science and Tech.<br />
In this paper, we propose a novel approach to shape classification. A new shape tree based on junction nodes can represent<br />
the global structure in a simple way. The statistical distribution of junctions can be learned by merging the shape trees. In<br />
the process of learning, context of a junction node is obtained to improve the rate of classification. We illustrate the utility<br />
of the proposed method on the problem of 2D shape classification using the new shape tree representation.<br />
09:00-11:10, Paper TuAT8.24<br />
Sparse Coding of Linear Dynamical Systems with an Application to Dynamic Texture Recognition<br />
Ghanem, Bernard, Univ. of Illinois at Urbana-Champaign<br />
Ahuja, Narendra,<br />
Given a sequence of observable features of a linear dynamical system (LDS), we propose the problem of finding a representation<br />
of the LDS which is sparse in terms of a given dictionary of LDSs. Since LDSs do not belong to Euclidean<br />
space, traditional sparse coding techniques do not apply. We propose a probabilistic framework and an efficient MAP algorithm<br />
to learn this sparse code. Since dynamic textures (DTs) can be modeled as LDSs, we validate our framework and<br />
algorithm by applying them to the problems of DT representation and DT recognition. In the case of occlusion, we show<br />
that this sparse coding scheme outperforms conventional DT recognition methods.<br />
09:00-11:10, Paper TuAT8.25<br />
Background Modeling by Combining Joint Intensity Histogram with Time-Sequential Data<br />
Kita, Yasuyo, National Inst. of Advanced Industrial Science and Technology<br />
In this paper, a method for detecting changes from time-sequential images of outdoor scenes taken at intervals of several<br />
minutes is proposed. Recently, per-pixel statistical background intensity models using Gaussian mixture models<br />
(GMM) have shown their effectiveness for detecting changes in video streams. However, when the time interval between<br />
consecutive images is long, not enough frames can be sampled to build a useful GMM. To robustly build a<br />
pixel-wise background model at time t0 from a small number of preceding and following frames, we propose to use the joint intensity<br />
histogram of the images at times t0 and t0 + 1, H(It0, It0+1). Under the background dominance condition, the background probability<br />
distribution for each intensity level at t0 can be estimated from H(It0, It0+1). By taking this background probability<br />
distribution per intensity as a prior probability, a GMM that models the variation in each pixel can be robustly calculated even<br />
from only a few frames. Experimental results using actual field monitoring images show the advantage of the proposed<br />
method.<br />
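A simplified sketch of the joint-histogram idea, assuming background dominance: row-normalising the joint histogram of two consecutive frames gives, for each intensity at t0, an empirical distribution over the next frame's intensity. This is illustrative code for that single step, not the paper's full GMM pipeline:<br />

```python
import numpy as np

# Under background dominance, most pixels keep (nearly) the same
# intensity between frames, so each row of the row-normalised joint
# histogram approximates a per-intensity background probability.
def background_probability(frame0, frame1, levels=256):
    H, _, _ = np.histogram2d(frame0.ravel(), frame1.ravel(),
                             bins=levels, range=[[0, levels], [0, levels]])
    row_sums = H.sum(axis=1, keepdims=True)
    P = np.divide(H, row_sums, out=np.zeros_like(H), where=row_sums > 0)
    return P  # P[i, j] ~ Pr(intensity j at t0+1 | intensity i at t0)

f0 = np.full((8, 8), 100, dtype=np.uint8)
f1 = f0.copy()
f1[0, 0] = 200            # one changed (foreground) pixel
P = background_probability(f0, f1)
print(P[100, 100])        # high: staying at intensity 100 dominates
```

In the paper this per-intensity distribution serves as the prior from which a per-pixel GMM is then fitted robustly despite the small number of frames.<br />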
09:00-11:10, Paper TuAT8.26<br />
2LDA: Segmentation for Recognition<br />
Perina, Alessandro, Univ. of Verona<br />
Cristani, Marco, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
Following the trend of segmentation for recognition, we present 2LDA, a novel generative model to automatically segment<br />
an image into two segments, background and foreground, while inferring a latent Dirichlet allocation (LDA) topic distribution<br />
on both segments. The idea is to merge two separate modules, LDA and the segmentation module, explicitly considering<br />
(and exchanging) the uncertainty between them. The resulting model adds spatial relationships to LDA, which in turn<br />
helps in using the topics to segment an image. The experimental results show that, unlike LDA, our model can be used to<br />
recognize objects, and also outperforms state-of-the-art algorithms.<br />
09:00-11:10, Paper TuAT8.27<br />
Modeling and Generalization of Discrete Morse Terrain Decompositions<br />
De Floriani, L.<br />
Magillo, Paola, Univ. of Genova<br />
Vitali, Maria, DISI, Univ. of Genova<br />
We address the problem of morphological analysis of real terrains. We describe a morphological model for a terrain by<br />
considering extensions of Morse theory to the discrete case. We propose a two-level model of the morphology of a terrain<br />
based on a graph joining the critical points of the terrain through integral lines. We present a new set of generalization operators<br />
specific for discrete piece-wise linear terrain models, which are used to reduce noise and the size of the morphological<br />
representation. We show results of our approach on real terrains.<br />
09:00-11:10, Paper TuAT8.28<br />
Region Description using Extended Local Ternary Patterns<br />
Liao, Wen-Hung, National Chengchi Univ.<br />
The local binary pattern (LBP) operator is a computationally efficient local texture descriptor and has found many useful<br />
applications. However, its sensitivity to noise and the high dimensionality of the histogram associated with even a moderately sized<br />
neighborhood have raised some concerns. In this paper, we attempt to improve the original LBP by proposing a novel extension<br />
named extended local ternary pattern (ELTP). We will investigate the characteristics of ELTP in terms of noise<br />
sensitivity, discriminability and computational efficiency. Preliminary experimental results show the better efficacy of<br />
ELTP over the original LBP.<br />
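The ternary coding underlying LTP-style descriptors (the direction ELTP extends) codes each neighbour as -1, 0, or +1 depending on whether it lies below, within, or above a tolerance band around the centre pixel; this sketch uses an illustrative threshold and patch, not the paper's parameters:<br />

```python
import numpy as np

# Sketch of a local ternary code on a 3x3 patch: the tolerance band
# around the centre gives a 0 code, which is what makes the ternary
# variant less noise-sensitive than plain LBP's binary thresholding.
def local_ternary_code(patch, t=5):
    """3x3 patch -> 8 ternary codes, clockwise from top-left."""
    c = int(patch[1, 1])
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = []
    for i, j in order:
        d = int(patch[i, j]) - c
        codes.append(0 if abs(d) <= t else (1 if d > 0 else -1))
    return codes

patch = np.array([[50, 52, 90],
                  [40, 50, 50],
                  [50, 10, 55]], dtype=np.uint8)
print(local_ternary_code(patch, t=5))
```

A histogram of such codes over a region yields the texture descriptor; ELTP's extensions address the histogram's dimensionality and the choice of the tolerance t.<br />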
09:00-11:10, Paper TuAT8.29<br />
A Novel Multi-View Agglomerative Clustering Algorithm based on Ensemble of Partitions on Different Views<br />
Mirzaei, Hamidreza, SFU<br />
In this paper, we propose a new algorithm that extends hierarchical clustering methods and introduce a Multi-View<br />
Agglomerative Clustering approach to handle objects with multi-view representations. Experiments on real-world datasets indicate<br />
that our algorithm, by considering the relationships among multiple views, can provide solutions of improved quality in<br />
the multi-view setting. We find empirically that the multi-view version of our agglomerative clustering, independent of<br />
the linkage method and given any number of views, greatly improves on its single-view counterparts.<br />
09:00-11:10, Paper TuAT8.30<br />
Hydroacoustic Signal Classification using Kernel Functions for Variable Feature Sets<br />
Tuma, Matthias, Ruhr-Univ. Bochum<br />
Igel, Christian, Ruhr-Univ. Bochum<br />
Prior, Mark, Preparatory Commission for the CTBTO<br />
Large-scale geophysical monitoring systems raise the need for real-time feature extraction and signal classification. We<br />
study support vector machine (SVM) classification of hydroacoustic signals recorded by the Comprehensive Nuclear-<br />
Test-Ban Treaty’s verification network. Due to constraints in the early signal processing, most samples have incomplete<br />
feature sets with values missing not at random. We propose kernel functions explicitly incorporating Boolean representations<br />
of the missingness pattern through dedicated sub-kernels. For kernels with more than a few parameters, gradient-based<br />
model selection algorithms were employed. In the case of binary classification, an increase in classification accuracy<br />
as compared to baseline SVM and linear classifiers was observed. In the multi-class case we evaluated four different formulations<br />
of multi-class SVMs. Here, neither SVMs with standard nor with problem-specific kernels outperformed a baseline<br />
linear discriminant analysis.<br />
09:00-11:10, Paper TuAT8.31<br />
Large Margin Discriminant Hashing for Fast K-Nearest Neighbor Classification<br />
Shibata, Tomoyuki, Toshiba Corp.<br />
Kubota, Susumu, Toshiba Corp.<br />
Ito, Satoshi, Toshiba Corp.<br />
Since the k-nearest neighbor (k-NN) classification is computationally demanding in terms of time and memory, approximate<br />
nearest neighbor (ANN) algorithms that utilize dimensionality reduction and hashing are gathering interest. Dimensionality<br />
reduction saves memory usage for storing training patterns and hashing techniques significantly reduce the<br />
computation required for distance calculation. Several ANN methods have been proposed which make k-NN classification<br />
applicable to those tasks that have a large number of training patterns with very high-dimensional features. Though conventional<br />
ANN methods try to approximate the Euclidean distance calculation in the original high-dimensional feature space<br />
with a much lower-dimensional subspace, the Euclidean distance in the original feature space is not necessarily optimal for<br />
classification. According to recent studies, metric learning is effective in improving the accuracy of k-NN classification.<br />
In this paper, the Large Margin Discriminant Hashing (LMDH) method, which projects input patterns into a low-dimensional<br />
subspace with a metric optimized for k-NN classification, is proposed.<br />
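The pipeline the abstract describes, binary hashing plus Hamming-distance k-NN voting, can be illustrated with a generic random-projection (LSH-style) sketch. This is not LMDH itself: the paper would learn the projection matrix with a large-margin criterion, whereas here it is drawn at random, and all names below are illustrative.

```python
import numpy as np

def train_hash(X, n_bits, rng):
    # Random hyperplane projections (LSH baseline); LMDH would instead
    # learn these projections with a large-margin criterion.
    return rng.standard_normal((X.shape[1], n_bits))

def to_codes(X, W):
    # Binary codes: the sign of each projection, stored as 0/1.
    return (X @ W > 0).astype(np.uint8)

def knn_predict(codes_train, y_train, codes_test, k=3):
    preds = []
    for c in codes_test:
        # Hamming distance = number of differing bits
        d = np.count_nonzero(codes_train != c, axis=1)
        nn = np.argsort(d)[:k]
        vals, cnt = np.unique(y_train[nn], return_counts=True)
        preds.append(vals[np.argmax(cnt)])  # majority vote
    return np.array(preds)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))
y = (X[:, 0] > 0).astype(int)          # toy labels
W = train_hash(X, 16, rng)
pred = knn_predict(to_codes(X, W), y, to_codes(X[:5], W))
```

Replacing `train_hash` with a learned, discriminative projection is precisely where the proposed method departs from this baseline.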
09:00-11:10, Paper TuAT8.32<br />
Robust Frame-To-Frame Hybrid Matching<br />
Chen, Lei, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
Wang, Zhongli, Beijing Inst. of Tech.<br />
In this paper, we propose a hybrid approach for addressing the feature-based matching problem. We aim to obtain robust and<br />
accurate correspondences between features from image frames in unknown and unstructured environments. The approach<br />
incorporates image texture analysis, 2-D analytic signal theory and color modeling. It takes advantage of geometric<br />
invariant property in texture and monogenic signal information as well as photometric invariant property in HSV color<br />
information. The detected features are well localized with high accuracy and the selected matches are robust to changes<br />
in scale, blur, viewpoint, and illumination. Experiments conducted on a standard benchmark dataset demonstrate the effectiveness<br />
and reliability of our approach.<br />
- 88 -
09:00-11:10, Paper TuAT8.33<br />
A Fast Extension for Sparse Representation on Robust Face Recognition<br />
Qiu, Hui-Ning, Sun Yat-sen Univ.<br />
Pham, Duc-Son, Curtin Univ. of Tech.<br />
Venkatesh, Svetha, Curtin Univ. of Tech.<br />
Liu, Wanquan, Curtin Univ. of Tech.<br />
Lai, Jian-Huang, Sun Yat-sen Univ.<br />
We extend a recent Sparse Representation-based Classification (SRC) algorithm for face recognition to work on 2D images<br />
directly, aiming to reduce the computational complexity whilst still maintaining performance. Our contributions include:<br />
(1) a new 2D extension of the SRC algorithm; (2) an incremental computing procedure which can reduce the eigen-decomposition<br />
expense of each 2D-SRC for sequential input data; and (3) extensive numerical studies to validate the proposed<br />
methods.<br />
09:00-11:10, Paper TuAT8.34<br />
A MANOVA of Major Factors of RIU-LBP Feature for Face Recognition<br />
Luo, Jie, Shanghai Univ.<br />
Fang, Yuchun, Shanghai Univ.<br />
Cai, Qiyun, Shanghai Univ.<br />
The Local Binary Patterns (LBP) feature is one of the most popular representation schemes for face recognition. The four<br />
factors deciding its effectiveness are the blocking number, the image resolution, and the sampling radius and sampling density of the LBP<br />
operator. Numerous previous studies have adopted various combinations of values for these factors based on experimental comparisons.<br />
However, which factor among them contributes the most? Numerous revisions have been made to the LBP operator,<br />
for it is believed that the LBP coding is the most essential factor. Is this true? In this paper, using the simple and classical<br />
Multivariate Analysis of Variance (MANOVA), we discover that the blocking number contributes the most, though all<br />
four factors have a significant effect on the recognition rate. In addition, with the same analysis, we disclose the detailed effect<br />
of each factor and their interactions on the precision of LBP features.<br />
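For readers unfamiliar with the operator under study, the basic 8-neighbor LBP code (radius 1, before the rotation-invariant uniform mapping and the block-wise histogramming that RIU-LBP adds) can be sketched as follows; the function name and fixed neighbor ordering are illustrative choices, not the paper's:

```python
import numpy as np

def lbp8(image):
    # Basic 3x3 LBP: each interior pixel receives an 8-bit code whose
    # bits indicate whether each neighbor is >= the center pixel.
    img = np.asarray(image, dtype=float)
    c = img[1:-1, 1:-1]
    # neighbor offsets in a fixed circular order
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code
```

The factors the MANOVA studies then correspond to the image resolution fed in, the radius and number of samples (fixed here to 1 and 8), and how many blocks the code image is split into before histogramming.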
09:00-11:10, Paper TuAT8.35<br />
Consistent Estimators of Median and Mean Graph<br />
Jain, Brijnesh J., Berlin Univ. of Tech.<br />
Obermayer, Klaus, Berlin Univ. of Tech.<br />
The median and mean graph are basic building blocks for statistical graph analysis and unsupervised pattern recognition<br />
methods such as central clustering and graph quantization. This contribution provides sufficient conditions for consistent<br />
estimators of true but unknown central points of a distribution on graphs.<br />
09:00-11:10, Paper TuAT8.36<br />
Efficient Encoding of N-D Combinatorial Pyramids<br />
Fourey, Sébastien, GREYC Ensicaen & Univ. of Caen<br />
Brun, Luc, ENSICAEN<br />
Combinatorial maps define a general framework which allows one to encode any subdivision of an n-D orientable quasi-manifold<br />
with or without boundaries. Combinatorial pyramids are defined as stacks of successively reduced combinatorial<br />
maps. Such pyramids provide a rich framework which allows one to encode fine properties of objects (either shapes or partitions).<br />
Combinatorial pyramids have first been defined in 2D, then extended using n-D generalized combinatorial maps.<br />
We motivate and present here an implicit and efficient way to encode pyramids of n-D combinatorial maps.<br />
- 89 -
09:00-11:10, Paper TuAT8.37<br />
View-Invariant Object Recognition with Visibility Maps<br />
Raytchev, Bisser, Hiroshima Univ.<br />
Mino, Tetsuya, Hiroshima Univ.<br />
Tamaki, Toru, Hiroshima Univ.<br />
Kaneda, Kazufumi, Hiroshima Univ.<br />
In this paper we propose a new framework for view-invariant 3D object recognition, based on what we call Visibility Maps.<br />
A Visibility Map (VM) encodes a compact model of an arbitrary 3D object for which a set of images taken from different<br />
views is available. Representative local invariant features extracted from each image are selectively combined to form a visibility<br />
basis, in terms of which an arbitrary view of the modeled object can be represented. A metric which incorporates geometric<br />
information is also provided for comparing test images to the model, and can be used for recognition.<br />
09:00-11:10, Paper TuAT8.38<br />
Normalized Sum-Over-Paths Edit Distances<br />
García, Silvia, Univ. Catholique de Louvain<br />
Fouss, François, Facultés Univ. Catholiques de Mons<br />
Shimbo, Masashi, Graduate School of Information Science<br />
Saerens, Marco, Univ. Catholique de Louvain<br />
In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are<br />
investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances, which compute the expected<br />
cost over all sequence alignments while favoring low-cost ones, thereby favoring good alignments. Such distances consider two<br />
sequences tied by many optimal or nearly optimal alignments as more similar than two sequences sharing only one, optimal,<br />
alignment. They depend on a parameter, and reduce to the standard distances (the edit distance or the longest common subsequence)<br />
as this parameter tends to 0, while having the same time complexity. This paper puts the emphasis on applying a normalization<br />
of the expected cost. Experimental results for clustering and classification tasks performed on four OCR<br />
data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit distances,<br />
the normalized SoP edit distances clearly outperform the non-randomized measures, i.e., the standard edit distance and longest<br />
common subsequence.<br />
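For reference, the standard edit distance that the SoP variants recover in the limit of the parameter is computed by the classic Wagner-Fischer dynamic program; a minimal sketch (costs as keyword arguments are an illustrative choice):

```python
def edit_distance(s, t, sub_cost=1, indel_cost=1):
    # Wagner-Fischer dynamic program: D[i][j] is the cheapest
    # alignment cost of the prefixes s[:i] and t[:j].
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * indel_cost
    for j in range(1, n + 1):
        D[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = D[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else sub_cost)
            D[i][j] = min(match,
                          D[i - 1][j] + indel_cost,   # delete s[i-1]
                          D[i][j - 1] + indel_cost)   # insert t[j-1]
    return D[m][n]
```

The SoP distances replace the `min` over alignment operations with a softened expectation over all alignment paths, which is what lets many near-optimal alignments jointly lower the distance.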
09:00-11:10, Paper TuAT8.39<br />
Effective Multi-Level Image Representation for Image Categorization<br />
Li, Hao, Peking Univ.<br />
Peng, Yuxin, Peking Univ.<br />
This paper proposes a novel approach for image categorization based on effective multi-level image representation (MLIR).<br />
On one hand, to exploit fully the information of segmented regions at different levels in the image, we recursively segment<br />
the image into a hierarchical structure. On the other hand, to represent the information at different levels in a uniform manner,<br />
we construct a visual vocabulary based on the image regions of the hierarchical structure by a random sampling strategy.<br />
The intermediate feature mapping is then adopted to form a multi-level image representation, which encodes the information<br />
of the image at different levels and can be very useful for distinguishing images from different categories. Experimental results<br />
on the widely used COREL data set have shown that our proposed approach can achieve significant improvements compared<br />
with the state-of-the-art methods.<br />
09:00-11:10, Paper TuAT8.40<br />
Classification of Volcano Events Observed by Multiple Seismic Stations<br />
Duin, Robert, TU Delft<br />
Orozco-Alzate, Mauricio, Univ. Nacional de Colombia Sede Manizales, Colombia<br />
Londoño-Bonilla, John Makario, Inst. Colombiano de Geología y Minería (INGEOMINAS), Colombia<br />
Seismic events in and around volcanoes, like tremors, earthquakes, icequakes and lightning strikes, are usually observed<br />
by multiple stations. The question arises whether classifiers trained for one seismic station can be used for classifying observations<br />
by other stations, and, moreover, whether a combination of station signals improves the classification performance<br />
for a single station. We study this for seismic time signals represented by spectra and spectrograms obtained from five seismic<br />
stations on the Nevado del Ruiz volcano in Colombia.<br />
- 90 -
09:00-11:10, Paper TuAT8.41<br />
A Variational Bayesian EM Algorithm for Tree Similarity<br />
Takasu, Atsuhiro, National Inst. of Informatics<br />
Fukagawa, Daiji, National Inst. of Informatics<br />
Akutsu, Tatsuya, Kyoto Univ.<br />
In recent times, a vast amount of tree-structured data has been generated. For mining, retrieving, and integrating such data,<br />
we need a fine-grained tree similarity measure that can be adapted to objective data. To achieve this goal, this paper (1)<br />
proposes a probabilistic generative model that generates pairs of similar trees, and (2) derives a learning algorithm for estimating<br />
the parameters of the model based on the variational Bayesian expectation maximization (VBEM) method. This<br />
method can handle rooted, ordered, and labeled trees. We show that the tree similarity model obtained via the VBEM technique<br />
performs better than that obtained via maximum likelihood estimation by tuning the hyperparameters.<br />
09:00-11:10, Paper TuAT8.42<br />
Enhancing Image Classification with Class-Wise Clustered Vocabularies<br />
Wojcikiewicz, Wojciech, Fraunhofer Inst. FIRST<br />
Kawanabe, Motoaki, Fraunhofer FIRST and TU Berlin<br />
Binder, Alexander, Fraunhofer Inst. FIRST, Berlin<br />
In recent years bag-of-visual-words representations have gained increasing popularity in the field of image classification.<br />
Their performance relies highly on creating a good visual vocabulary from a set of image features (e.g. SIFT). For real-world<br />
photo archives such as Flickr, codebooks with more than a few thousand words are desirable, which is infeasible<br />
with standard k-means clustering. In this paper, we propose a two-step procedure which can generate more informative<br />
codebooks efficiently by class-wise k-means and a novel procedure for word selection. Our approach compared favorably<br />
to the standard k-means procedure on the PASCAL VOC data sets.<br />
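The first step of the two-step procedure, class-wise k-means followed by concatenation of the per-class centers, can be sketched as follows. This is illustrative only: the paper's second step (informative word selection) is omitted, and plain Lloyd iterations stand in for an efficient k-means implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd iterations; sufficient for a small illustration.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center
        assign = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                C[j] = pts.mean(axis=0)
    return C

def classwise_codebook(features_by_class, words_per_class):
    # Run k-means separately on each class's features and concatenate
    # the centers into one vocabulary; much cheaper than one global
    # k-means with the same total number of words.
    return np.vstack([kmeans(F, words_per_class, seed=c)
                      for c, F in enumerate(features_by_class)])

rng = np.random.default_rng(1)
feats = [rng.standard_normal((100, 8)) + c for c in range(3)]  # toy "SIFT" features
vocab = classwise_codebook(feats, words_per_class=10)
```

Because each per-class clustering touches only that class's features, the cost scales with the number of classes rather than with the square of the vocabulary size.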
09:00-11:10, Paper TuAT8.43<br />
Efficiently Computing Optimal Consensus of Digital Line Fitting<br />
Kenmochi, Yukiko, Univ. Paris-Est<br />
Buzer, Lilian, ESIEE<br />
Talbot, Hugues, ESIEE<br />
Given a set of discrete points in a 2D digital image containing noise, we formulate our problem as robust digital line<br />
fitting. More precisely, we seek the maximum subset whose points are included in a digital line, called the optimal consensus.<br />
The paper presents an efficient method for exactly computing the optimal consensus by using topological<br />
sweep, which yields quadratic time complexity and linear space complexity with respect to the number<br />
of input points.<br />
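The optimal-consensus notion can be made concrete with a brute-force baseline that tries digital lines supported by pairs of points and counts how many points each covers. This sketch uses a simplified membership test (unreduced integer coefficients and thickness max(|a|,|b|)), and is emphatically not the paper's O(n²)-time, O(n)-space topological-sweep algorithm:

```python
from itertools import combinations

def consensus(points, a, b, mu, width):
    # Points lying on the digital line 0 <= a*x + b*y + mu < width.
    return [p for p in points if 0 <= a * p[0] + b * p[1] + mu < width]

def best_consensus_brute(points, thickness=1):
    # Exhaustive baseline: for every pair of points take its direction,
    # anchor the line at every point, and keep the largest consensus.
    best = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        a, b = y2 - y1, x1 - x2                # normal of the direction
        width = thickness * max(abs(a), abs(b), 1)
        for p in points:
            mu = -(a * p[0] + b * p[1])        # anchor the line at p
            cur = consensus(points, a, b, mu, width)
            if len(cur) > len(best):
                best = cur
    return best

pts = [(0, 0), (1, 1), (2, 2), (3, 3), (5, 0)]
best = best_consensus_brute(pts)   # the four collinear diagonal points
```

The contribution of the paper is to reach the same optimum without enumerating all pair/anchor combinations.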
09:00-11:10, Paper TuAT8.44<br />
Learning a Joint Manifold Representation from Multiple Data Sets<br />
Torki, Marwan, Rutgers Univ.<br />
Elgammal, Ahmed, Rutgers Univ.<br />
Lee, Chan-Su, Yeungnam Univ.<br />
The problem we address in this paper is how to learn a joint representation from data lying on multiple manifolds. We are<br />
given multiple data sets, and there is an underlying common manifold among the different data sets. We propose a framework<br />
to learn an embedding of all the points on all the manifolds in a way that preserves the local structure on each manifold<br />
and, at the same time, collapses all the different manifolds into one manifold in the embedding space, while preserving<br />
the implicit correspondences between the points across different data sets. The proposed solution works as an extension to<br />
current state-of-the-art spectral-embedding approaches to handle multiple manifolds.<br />
09:00-11:10, Paper TuAT8.45<br />
A Multi-Scale Approach to Decompose a Digital Curve into Meaningful Parts<br />
Nguyen, Thanh Phuong, LORIA<br />
Debled-Rennesson, Isabelle, LORIA – Nancy Univ.<br />
A multi-scale approach is proposed for polygonal representation of a digital curve by using the notion of blurred segment<br />
and a split-and-merge strategy. Its main idea is to decompose the curve into meaningful parts that are represented by detected<br />
dominant points at the appropriate scale. The method uses no threshold and performs the decomposition automatically.<br />
09:00-11:10, Paper TuAT8.46<br />
A Memetic Algorithm for Selection of 3D Clustered Features with Applications in Neuroscience<br />
Björnsdotter, Malin, Univ. of Gothenburg<br />
Wessberg, Johan, Univ. of Gothenburg<br />
We propose a Memetic algorithm for feature selection in volumetric data containing spatially distributed clusters of informative<br />
features, typically encountered in neuroscience applications. The proposed method complements a conventional genetic<br />
algorithm with a local search utilizing inherent spatial relationships to efficiently identify informative feature clusters across<br />
multiple regions of the search volume. First, we demonstrate the utility of the algorithm on simulated data containing informative<br />
feature clusters of varying contrast-to-noise ratios. The Memetic algorithm identified a majority of the relevant<br />
features whereas a conventional genetic algorithm detected only a subset sufficient for fitness maximization. Second, we<br />
applied the algorithm to authentic functional magnetic resonance imaging (fMRI) brain activity data from a motor task study,<br />
where the Memetic algorithm identified expected brain regions and subsequent brain activity prediction in new individuals<br />
was accurate at an average of 76% correct classification. The proposed algorithm constitutes a novel method for efficient<br />
volumetric feature selection and is applicable in any 3D data scenario. In particular, the algorithm is a promising alternative<br />
for sensitive brain activity mapping and decoding.<br />
09:00-11:10, Paper TuAT8.47<br />
Pose Estimation of Known Objects by Efficient Silhouette Matching<br />
Reinbacher, Christian, Graz Tech. Univ.<br />
Ruether, Matthias, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Pose estimation is essential for automated handling of objects. In many computer vision applications only the object silhouettes<br />
can be acquired reliably, because untextured or slightly transparent objects do not allow for other features. We propose<br />
a pose estimation method for known objects, based on hierarchical silhouette matching and unsupervised clustering. The<br />
search hierarchy is created by an unsupervised clustering scheme, which makes the method less sensitive to parametrization,<br />
and still exploits spatial neighborhood for efficient hierarchy generation. Our evaluation shows a decrease in matching time<br />
of 80% compared to an exhaustive matching and scalability to large models.<br />
09:00-11:10, Paper TuAT8.48<br />
Learning Non-Linear Dynamical Systems by Alignment of Local Linear Models<br />
Joko, Masao, The Univ. of Tokyo<br />
Kawahara, Yoshinobu, Osaka Univ.<br />
Yairi, Takehisa, Univ. of Tokyo<br />
Learning dynamical systems is one of the important problems in many fields. In this paper, we present an algorithm for<br />
learning non-linear dynamical systems which works by aligning local linear models, based on a probabilistic formulation of<br />
subspace identification. Because the procedure for constructing a state sequence in subspace identification can be interpreted<br />
as the CCA between past and future observation sequences, we can derive a latent variable representation for this problem.<br />
Therefore, in a manner similar to recent works on learning mixtures of probabilistic models, we obtain a framework<br />
for constructing a state space by aligning local linear coordinates. This leads to a practical algorithm for learning non-linear<br />
dynamical systems. Finally, we apply our method to motion capture data and show that our algorithm works well.<br />
09:00-11:10, Paper TuAT8.49<br />
A Column Generation Approach for the Graph Matching Problem<br />
Silva, Freire, Alexandre, Univ. of Sao Paulo<br />
Cesar Jr., R. M., Univ. of Sao Paulo<br />
Ferreira, C.E., Univ. of Sao Paulo<br />
Graph matching plays a central role in different problems for structural pattern recognition. Examples of applications include<br />
matching 3D CAD models, shape matching and medical imaging, to name but a few. In this paper, we present a new integer<br />
linear formulation for the problem and employ a combinatorial optimization technique, called column generation, in order<br />
to solve instances of the problem. We also present computational experiments with generated instances.<br />
09:00-11:10, Paper TuAT8.50<br />
Pattern Recognition using Functions of Multiple Instances<br />
Zare, Alina, Univ. of Florida<br />
Gader, Paul, Univ. of Florida<br />
The Functions of Multiple Instances (FUMI) method for learning a target prototype from data points that are functions of<br />
target and non-target prototypes is introduced. In this paper, a specific case is considered where, given data points which are<br />
convex combinations of a target prototype and several non-target prototypes, the Convex-FUMI (C-FUMI) method learns<br />
the target and non-target patterns, the number of non-target patterns, and determines the weights (or proportions) of all the<br />
prototypes for each data point. For this method, training data need only binary labels indicating whether the data contains or<br />
does not contain some proportion of the target prototype; the specific target weights for the training data are not needed.<br />
After learning the target prototype using the binary labeled training data, target detection is performed on test data. Results<br />
showing detection of skin in hyperspectral imagery and sub-pixel target detection in simulated data are presented.<br />
09:00-11:10, Paper TuAT8.51<br />
Linear Decomposition of Planar Shapes<br />
Faure, Alexandre, LAIC Univ. d’Auvergne<br />
Feschet, Fabien, Univ. d’Auvergne Clermont-Ferrand 1<br />
The issue of decomposing digital shapes into sets of digital primitives has been widely studied over the years. Practically all<br />
existing approaches require perfect or cleaned shapes. Those are obtained using various pre-processing techniques such as<br />
thinning or skeletonization. The aim of this paper is to bypass the use of such pre-processings, in order to obtain decompositions<br />
of shapes directly from connected components. This method has the advantage of taking into account the intrinsic<br />
thickness of digital shapes, and provides a decomposition which is also robust to<br />
09:00-11:10, Paper TuAT8.52<br />
Sketched Symbol Recognition with a Latent-Dynamic Conditional Model<br />
Deufemia, Vincenzo, Univ. di Salerno<br />
Risi, Michele, Univ. of Salerno<br />
Tortora, Genoveffa, Univ. di Salerno<br />
In this paper we present a recognizer of sketched symbols based on Latent-Dynamic Conditional Random Fields (LDCRF),<br />
a discriminative model for sequence classification. The LDCRF model classifies unsegmented sequences of strokes into domain<br />
symbols by taking into account contextual and temporal information. In particular, LDCRFs learn the extrinsic dynamics<br />
among strokes by modeling a continuous stream of symbol labels, and learn internal stroke sub-structure by using intermediate<br />
hidden states. The performance of our work is evaluated in the electric circuit domain.<br />
09:00-11:10, Paper TuAT8.53<br />
Canonical Patterns of Oriented Topologies<br />
Mankowski, Walter, Drexel Univ.<br />
Shokoufandeh, Ali, Drexel Univ.<br />
Salvucci, Dario, Drexel Univ.<br />
A common problem in many areas of behavioral research is the analysis of the large volume of data recorded during the execution<br />
of the tasks being studied. Recent work has proposed the use of an automated method based on canonical sets to<br />
identify the most representative patterns in a large data set, and described an initial experiment in identifying canonical web-browsing<br />
patterns. However, there is a significant limitation to the method: it requires the similarity matrix to be symmetric,<br />
and thus can only be used for problems that can be modeled as unoriented topologies. In this paper we propose a novel enhancement<br />
to the method to support oriented topologies by allowing the similarity matrix to be nonsymmetric. We demonstrate<br />
the power of this new technique by applying the new method to find canonical lane changes in a driving simulator experiment.<br />
- 93 -
09:00-11:10, Paper TuAT8.54<br />
Hierarchical Anomality Detection based on Situation<br />
Nishio, Shuichi, Advanced Telecommunication Res. Inst. International<br />
Okamoto, Hiromi, Nara Women’s Univ.<br />
Babaguchi, Noboru, Osaka Univ.<br />
In this paper, we propose a novel anomality detection method based on external situational information and hierarchical<br />
analysis of behaviors. Past studies model normal behaviors to detect anomality as outliers. However, normal behaviors tend<br />
to differ by situations. Our method combines a set of simple classifiers with pedestrian trajectories as inputs. As mere path<br />
information is not sufficient for detecting anomality, trajectories are first decomposed into hierarchical features of different<br />
abstraction levels and then fed to appropriate classifiers corresponding to the situations they belong to. The effectiveness of the method<br />
is tested using real-environment data.<br />
09:00-11:10, Paper TuAT8.55<br />
Image Classification using Subgraph Histogram Representation<br />
Ozdemir, Bahadir, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
We describe an image representation that combines the representational power of graphs with the efficiency of the bag-of-words<br />
model. For each image in a data set, first, a graph is constructed from local patches of interest regions and their spatial<br />
arrangements. Then, each graph is represented with a histogram of subgraphs selected using a frequent subgraph mining algorithm<br />
on the whole data set. Using the subgraphs as the visual words of the bag-of-words model and transforming the<br />
graphs into a vector space using this model enables statistical classification of images using support vector machines. Experiments<br />
using images cut from a large satellite scene show the effectiveness of the proposed representation in classification<br />
of complex types of scenes into eight high-level semantic classes.<br />
09:00-11:10, Paper TuAT8.56<br />
Oriented Boundary Graph: A Framework to Design and Implement 3D Segmentation Algorithms<br />
Baldacci, Fabien, Univ. de Bordeaux<br />
Braquelaire, Achille, Univ. de Bordeaux<br />
Domenger, Jean Philippe, Univ. de Bordeaux<br />
In this paper we show the interest of a topological model to represent 3D segmented images which is a good compromise between<br />
the complete but time-consuming representations and the partial but insufficiently expressive ones. We show that this<br />
model, called the Oriented Boundary Graph, provides an effective framework for both volumetric image analysis and segmentation.<br />
The Oriented Boundary Graph provides an efficient implementation of a set of primitives suitable for the design of complex<br />
segmentation algorithms and for implementing the computation of the segmented-image characteristics needed by such algorithms.<br />
We first present the framework and give the time complexity of its main primitives. Then, we give some examples of the use<br />
of this framework in order to efficiently design non-trivial image analysis operations and image segmentation algorithms.<br />
Those examples are applied on 3D CT-scan data.<br />
09:00-11:10, Paper TuAT8.57<br />
Hierarchical Segmentation of Complex Structures<br />
Akcay, Huseyin Gokhan, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
Soille, Pierre, EC Joint Res. Centre<br />
We present an unsupervised hierarchical segmentation algorithm for detection of complex heterogeneous image structures<br />
that are comprised of simpler homogeneous primitive objects. An initial segmentation step produces regions corresponding<br />
to primitive objects with uniform spectral content. Next, the transitions between neighboring regions are modeled and clustered.<br />
We assume that the clusters that are dense and large enough in this transition space can be considered as significant.<br />
Then, the neighboring regions belonging to the significant clusters are merged to obtain the next level in the hierarchy. The<br />
experiments show that the algorithm that iteratively clusters and merges region groups is able to segment high-level complex<br />
structures in a hierarchical manner.<br />
- 94 -
TuAT9 Upper Foyer<br />
Biometrics Poster Session<br />
Session chair: Dobrišek, Simon (University of Ljubljana)<br />
09:00-11:10, Paper TuAT9.1<br />
Image Specific Error Rate: A Biometric Performance Metric<br />
Tabassi, Elham, NIST<br />
Image-specific false match and false non-match error rates are defined by inheriting concepts from the biometric zoo. These<br />
metrics support failure mode analyses by allowing association of a covariate (e.g., dilation for iris recognition) with a matching<br />
error rate without having to consider the covariate of a comparison image. Image-specific error rates are also useful in detection<br />
of ground truth errors in test datasets. Images with higher image-specific error rates are more “difficult” to recognize,<br />
so these metrics can be used to assess the level of difficulty of test corpora or to partition a corpus into sets with varying levels<br />
of difficulty. Results on the use of image-specific error rates for ground-truth error detection, covariate analysis and corpus partitioning<br />
are presented.<br />
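One plausible reading of these metrics (an illustrative sketch, not necessarily NIST's exact formulation): from a similarity-score matrix and a decision threshold, the image-specific FNMR of an image is the fraction of its genuine comparisons that fail to match, and its image-specific FMR is the fraction of its impostor comparisons that falsely match.

```python
import numpy as np

def image_specific_rates(scores, labels, thresh):
    # scores[i, j]: similarity between image i and image j (i != j);
    # labels[i]: subject identity of image i.
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)          # exclude self-comparisons
    diff = ~same
    np.fill_diagonal(diff, False)
    fnmr = np.array([  # genuine comparisons that fail to match
        (scores[i, same[i]] < thresh).mean() if same[i].any() else np.nan
        for i in range(n)])
    fmr = np.array([   # impostor comparisons that falsely match
        (scores[i, diff[i]] >= thresh).mean() if diff[i].any() else np.nan
        for i in range(n)])
    return fnmr, fmr

labels = np.array([0, 0, 1, 1])
scores = np.array([[1.0, 0.9, 0.2, 0.1],
                   [0.9, 1.0, 0.3, 0.2],
                   [0.2, 0.3, 1.0, 0.8],
                   [0.1, 0.2, 0.8, 1.0]])
fnmr, fmr = image_specific_rates(scores, labels, thresh=0.5)
```

Images with high `fnmr` or `fmr` values are the "difficult" ones the abstract refers to, and images whose rates are implausibly high are candidates for ground-truth labeling errors.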
09:00-11:10, Paper TuAT9.2<br />
Low Cost and Usable Multimodal Biometric System based on Keystroke Dynamics and 2D Face Recognition<br />
Giot, Romain, Univ. de Caen, Basse-Normandie – CNRS<br />
Hemery, Baptiste, Univ. de CAEN<br />
Rosenberger, Christophe, Lab. GREYC<br />
We propose in this paper a low cost multimodal biometric system combining keystroke dynamics and 2D face recognition.<br />
The objective of the proposed system is to be used while keeping in mind: good performance, acceptability, and respect of<br />
privacy. Different fusion methods have been used (min, max, mul, svm, weighted sum configured with genetic algorithms,<br />
and, genetic programming) on the scores of three keystroke dynamics algorithms and two 2D face recognition ones. This<br />
multimodal biometric system improves the recognition rate in comparison with each individual method. On a chimeric database<br />
composed of 100 individuals, the best keystroke dynamics method obtains an EER of 8.77%, the best face recognition<br />
one has an EER of 6.38%, while the best proposed fusion system provides an EER of 2.22%.<br />
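The simple fusion rules compared in the paper operate directly on normalized matcher scores; here is a sketch of such score-level fusion together with a basic EER estimate. This is illustrative only: the paper additionally tunes weighted-sum weights with genetic algorithms and includes SVM and genetic-programming fusers, none of which are shown.

```python
import numpy as np

def fuse(scores, rule="sum", w=None):
    # scores: (n_samples, n_matchers) normalized similarity scores.
    s = np.asarray(scores, dtype=float)
    if rule == "min":
        return s.min(axis=1)
    if rule == "max":
        return s.max(axis=1)
    if rule == "mul":
        return s.prod(axis=1)
    # weighted sum; the weights could be tuned, e.g., by a GA
    w = np.ones(s.shape[1]) / s.shape[1] if w is None else np.asarray(w)
    return s @ w

def eer(genuine, impostor):
    # Sweep thresholds over all observed scores and return the operating
    # point where false non-match and false match rates are balanced.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = 1.0
    for t in thresholds:
        fnmr = (genuine < t).mean()
        fmr = (impostor >= t).mean()
        best = min(best, max(fnmr, fmr))
    return best
```

A fused system improves when the fused genuine and impostor score distributions overlap less than those of any single matcher, which is what the reported EER drop from 6.38% to 2.22% reflects.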
09:00-11:10, Paper TuAT9.3<br />
Parallel versus Hierarchical Fusion of Extended Fingerprint Features<br />
Zhao, Qijun, The Hong Kong Pol. Univ.<br />
Liu, Feng, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Extended fingerprint features such as pores, dots and incipient ridges have been increasingly attracting attention from researchers<br />
and engineers working on automatic fingerprint recognition systems. A variety of methods have been proposed to<br />
combine these features with the traditional minutiae features. This paper comparatively analyses the parallel and hierarchical<br />
fusion approaches on a high resolution fingerprint image dataset. Based on the results, a novel and more effective hierarchical<br />
approach is presented for combining minutiae, pores, dots and incipient ridges.<br />
09:00-11:10, Paper TuAT9.4<br />
Feature Band Selection for Multispectral Palmprint Recognition<br />
Guo, Zhenhua, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Palmprint is a unique and reliable biometric characteristic with high usability. Many palmprint recognition algorithms and<br />
systems have been successfully developed in the past decades. Most of the previous works use white light sources for illumination.<br />
Recently, much research attention has been attracted to developing new biometric systems with both high<br />
accuracy and high anti-spoof capability. Multispectral palmprint imaging and recognition can be a potential solution for such<br />
systems because it can acquire more discriminative information for personal identity recognition. One crucial step in developing<br />
such systems is determining the minimal number of spectral bands and selecting the most representative bands to<br />
build the multispectral imaging system. This paper presents preliminary studies on feature band selection by analyzing hyperspectral<br />
palmprint data (420nm~1100nm). Our experiments showed that two spectral bands, at 700nm and 960nm, could provide<br />
the most discriminative information about the palmprint. This finding could be used as guidance for designing multispectral<br />
palmprint systems in the future.<br />
09:00-11:10, Paper TuAT9.5<br />
Automatic Gender Recognition using Fusion of Facial Strips<br />
Lee, Ping-Han, National Taiwan Univ.<br />
Hung, Jui-Yu, National Taiwan Univ.<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
We propose a fully automatic system that detects and normalizes faces in images and recognizes their genders. To boost the<br />
recognition accuracy, we correct the in-plane and out-of-plane rotations of faces, and align faces based on estimated eye positions.<br />
To perform gender recognition, a face is first decomposed into several horizontal and vertical strips. Then, a regression<br />
function for each strip gives an estimate of the likelihood that the strip sample belongs to a specific gender. The likelihoods<br />
from all strips are concatenated to form a new feature, based on which a gender classifier gives the final decision. The proposed<br />
approach achieved an accuracy of 88.1% in recognizing genders of faces in images collected from the World-Wide<br />
Web. For faces in the FERET dataset, our system achieved an accuracy of 98.8%, outperforming all six state-of-the-art<br />
algorithms compared in this paper.<br />
09:00-11:10, Paper TuAT9.6<br />
Benchmarking Local Orientation Extraction in Fingerprint Recognition<br />
Cappelli, Raffaele, Univ. of Bologna<br />
Maltoni, Davide, Univ. of Bologna<br />
Turroni, Francesco, Univ. of Bologna<br />
The computation of local orientations is a fundamental step in fingerprint recognition. Although a large number of approaches<br />
have been proposed in the literature, no systematic quantitative evaluations have been done yet, mainly due to the lack of<br />
proper datasets with associated ground truth information. In this paper we propose a new benchmark (which includes two<br />
datasets and an accuracy metric) and report preliminary results obtained by testing four well-known local orientation extraction<br />
algorithms.<br />
09:00-11:10, Paper TuAT9.7<br />
Efficient Finger Vein Localization and Recognition<br />
Li, Xu, Civil Aviation Univ. of China<br />
Yang, Jinfeng, Civil Aviation Univ. of China<br />
In order to achieve accurate recognition of human finger veins (FV), this paper addresses the problems of finger vein localization<br />
and vein feature extraction. An inherent physical property of human fingers, the inter-phalangeal joint prior, is used to<br />
localize the region of interest (ROI) of vein images and to remove uninformative vein imagery. In addition,<br />
vein images are characterized as a series of energy features through steerable filters. Experimental results show the promising<br />
performance of the proposed algorithm for human vein identification.<br />
09:00-11:10, Paper TuAT9.8<br />
Learning the Relationship between High and Low Resolution Images in Kernel Space for Face Super Resolution<br />
Zou, Wilman, W W, Hong Kong Baptist Univ.<br />
Yuen, Pong C, Hong Kong Baptist Univ.<br />
This paper proposes a new nonlinear face super resolution algorithm to address an important issue in face recognition from<br />
surveillance video, namely recognition of low resolution face images with nonlinear variations. The proposed method learns<br />
the nonlinear relationship between low resolution and high resolution face images in a (nonlinear) kernel feature<br />
space. Moreover, a discriminative term can be easily included in the proposed framework. Experimental results on the CMU-<br />
PIE and FRGC v2.0 databases show that the proposed method outperforms existing methods as well as recognition based on<br />
high resolution images.<br />
09:00-11:10, Paper TuAT9.9<br />
Robust Regression for Face Recognition<br />
Naseem, Imran, The Univ. of Western Australia<br />
Togneri, Roberto, The Univ. of Western Australia<br />
Bennamoun, Mohammed, The Univ. of Western Australia<br />
In this paper we address the problem of illumination invariant face recognition. Using the fundamental concept that, in<br />
general, patterns from a single object class lie on a linear subspace [2], we develop a linear model representing a probe<br />
image as a linear combination of class-specific galleries. In the presence of noise, the well-conditioned inverse problem<br />
is solved using robust Huber estimation, and the decision is ruled in favor of the class with the minimum reconstruction<br />
error. The proposed Robust Linear Regression Classification (RLRC) algorithm is extensively evaluated on two standard<br />
databases and shows a good performance index compared to state-of-the-art robust approaches.<br />
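The minimum-reconstruction-error decision rule can be sketched in a few lines (a minimal numpy sketch under stated assumptions; the function names and the IRLS solver for the Huber estimate are illustrative, not the authors' implementation):<br />

```python
import numpy as np

def huber_fit(X, y, delta=1.0, iters=20):
    # Robust regression via iteratively reweighted least squares (IRLS):
    # residuals beyond `delta` are down-weighted as in the Huber loss.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        a = np.abs(r)
        w = np.where(a > delta, delta / np.maximum(a, 1e-12), 1.0)
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    return beta

def rlrc_classify(probe, galleries, delta=1.0):
    # galleries: {class_label: (d, n_c) matrix of vectorized gallery images}.
    # The probe is regressed on each class-specific gallery; the class with
    # the smallest robust reconstruction error wins.
    errors = {}
    for label, G in galleries.items():
        beta = huber_fit(G, probe, delta)
        errors[label] = np.linalg.norm(probe - G @ beta)
    return min(errors, key=errors.get)
```

Here each column of a gallery matrix is assumed to be one vectorized training image of that class.<br />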
09:00-11:10, Paper TuAT9.10<br />
Recognition of Blurred Faces via Facial Deblurring Combined with Blur-Tolerant Descriptors<br />
Hadid, Abdenour, Univ. of Oulu<br />
Nishiyama, Masashi, Toshiba Corp.<br />
Sato, Yoichi, Univ. of Tokyo<br />
Blur is often present in real-world images and significantly affects the performance of face recognition systems. To improve<br />
the recognition of blurred faces, we propose a new approach which inherits the advantages of two recent methods. The<br />
idea consists of first reducing the amount of blur in the images via deblurring and then extracting blur-tolerant descriptors<br />
for recognition. We assess our analysis on real blurred face images (FRGC 1.0 database) and also on face images artificially<br />
degraded by focus blur (FERET database), demonstrating significant performance enhancement compared to the state-of-the-art.<br />
09:00-11:10, Paper TuAT9.11<br />
Diffusion-Based Face Selective Smoothing in DCT Domain to Illumination Invariant Face Recognition<br />
Ezoji, Mehdi, Amirkabir Univ. of Tech.<br />
Faez, Karim, Amirkabir Univ. of Tech.<br />
In this paper, a diffusion-based iterative algorithm is proposed for illumination invariant face representation using image<br />
selective smoothing in the DCT domain. We split the image I into three parts, (R+w)+L: an illumination invariant<br />
component R, an oscillating component w, and a smooth component L. At each iteration, the influence of the different frequency<br />
sub-bands of the image is determined and the additive oscillating component is reduced. The experimental results confirm<br />
that our approach provides a suitable representation for overcoming illumination variations.<br />
09:00-11:10, Paper TuAT9.12<br />
BioHashing for Securing Fingerprint Minutiae Templates<br />
Belguechi, Rima, National School of Computer Science<br />
Rosenberger, Christophe, Lab. GREYC<br />
Ait Aoudia, Samy, National School of Computer Science<br />
The storage of fingerprints is an important issue as this biometric modality is increasingly deployed in real applications.<br />
The a priori impossibility of revoking a biometric template (as one would a password) in case of theft is a major concern for privacy<br />
reasons. We propose in this paper a new method to secure fingerprint minutiae templates by storing a bio code while keeping<br />
good recognition results. We show the efficiency of the method in comparison to some published methods for different<br />
scenarios.<br />
09:00-11:10, Paper TuAT9.13<br />
Fusion of an Isometric Deformation Modeling Approach using Spectral Decomposition and a Region-Based Approach<br />
using ICP for Expression-Invariant 3D Face Recognition<br />
Smeets, Dirk, K.U.Leuven<br />
Fabry, Thomas, K.U.Leuven<br />
Hermans, Jeroen, K.U.Leuven<br />
Vandermeulen, Dirk<br />
Suetens, Paul, K.U.Leuven<br />
The recognition of faces under varying expressions is one of the current challenges in the face recognition community. In<br />
this paper, we propose a method fusing different complementary approaches each dealing with expression variations. The<br />
first approach uses an isometric deformation model and is based on the largest singular values of the geodesic distance<br />
matrix as an expression-invariant shape descriptor. The second approach performs recognition on the more rigid parts of<br />
the face that are less affected by expression variations. Several fusion techniques are examined for combining the approaches.<br />
The presented method is validated on a subset of 900 faces of the BU-3DFE face database resulting in an equal<br />
error rate of 5.85% for the verification scenario and a rank 1 recognition rate of 94.48% for the identification scenario<br />
using the sum rule as fusion technique. This result outperforms other 3D expression-invariant face recognition methods<br />
on the same database.<br />
09:00-11:10, Paper TuAT9.14<br />
Towards a Best Linear Combination for Multimodal Biometric Fusion<br />
Chia, Chaw, Nottingham Trent Univ.<br />
Sherkat, Nasser, Nottingham Trent Univ.<br />
Nolle, Lars, Nottingham Trent Univ.<br />
Owing to its effectiveness and ease of implementation, the Sum rule has been widely applied in the biometric research field.<br />
Different matcher information has been used to set the weighting parameters of the weighted Sum rule. In this work, a new<br />
parameter is devised to reduce the genuine/impostor distribution overlap. It is shown that the overlap region width has the<br />
best generalization performance as a weighting parameter amongst other commonly used matcher information. Furthermore,<br />
it is illustrated that the equally weighted Sum rule can generally perform better than the Equal Error Rate and d-prime<br />
weighted Sum rules. The publicly available NIST-BSSR1 multimodal biometric and XM2VTS score sets have<br />
been used.<br />
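The overlap-region-width weighting can be illustrated as follows (a hedged sketch: the abstract does not give the exact formula, so the overlap measure and the inverse-width weighting below are our assumptions):<br />

```python
import numpy as np

def overlap_width(genuine, impostor):
    # Width of the score region where genuine and impostor scores overlap;
    # zero when the two score sets are perfectly separated.
    return max(0.0, float(np.max(impostor) - np.min(genuine)))

def fuse(scores, genuine_sets, impostor_sets):
    # Weighted Sum rule: matchers with a narrower overlap region get larger weights.
    widths = np.array([overlap_width(g, i)
                       for g, i in zip(genuine_sets, impostor_sets)])
    w = 1.0 / (widths + 1e-9)   # small constant avoids division by zero
    w = w / w.sum()
    return float(np.dot(w, scores))
```

With a perfectly separating matcher, its weight dominates the fused score.<br />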
09:00-11:10, Paper TuAT9.15<br />
Slap Fingerprint Segmentation for Live-Scan Devices and Ten-Print Cards<br />
Zhang, Yongliang, Zhejiang Univ. of Technology<br />
Xiao, Gang, Zhejiang Univ. of Technology<br />
Li, Yanmiao, Jiaotong Univ. Dalian<br />
Wu, Hongtao, Hebei Univ. of Tech.<br />
Huang, Yaping, Zhejiang Univ. of Technology<br />
Presented here is a highly accurate and computationally efficient algorithm for slap fingerprint segmentation. The<br />
main advantages of this algorithm are as follows: 1) third-order cumulants are used to roughly segment the foreground; 2) frequency<br />
domain analysis is carried out in local areas for binarization and fine segmentation; 3) cumulative sum analysis<br />
is applied to extract the knuckle lines; 4) two shape features of the ellipse are adopted to calculate the confidence of each<br />
fingertip candidate. Experimental results show that the algorithm is more robust against noise and more precise,<br />
not only for live-scan four-finger slaps but also for ten-print-card five-finger slaps.<br />
09:00-11:10, Paper TuAT9.16<br />
A Metric of Information Gained through Biometric Systems<br />
Takahashi, Kenta, Hitachi Ltd.<br />
Murakami, Takao, Hitachi Ltd.<br />
We propose a metric of information gained through biometric matching systems. Firstly, we discuss how the information<br />
about the identity of a person is derived from biometric samples through a biometric system, and define the “biometric<br />
system entropy” or BSE. Then we prove that the BSE can be approximated asymptotically by the Kullback-Leibler divergence<br />
D(f_G(x) || f_I(x)), where f_G(x) and f_I(x) are the PDFs of matching scores between samples from the same individual and<br />
between samples across the population, respectively. We also discuss how to evaluate D(f_G || f_I) of a biometric system and show a numerical example of<br />
face and fingerprint matching systems.<br />
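The quantity D(f_G || f_I) can be estimated from matching-score samples; a minimal histogram-based sketch (the binning choices and the base-2 logarithm, giving bits, are our assumptions):<br />

```python
import numpy as np

def kl_from_scores(genuine, impostor, bins=32):
    # Histogram estimate of D(f_G || f_I) over a common score range.
    lo = min(np.min(genuine), np.min(impostor))
    hi = max(np.max(genuine), np.max(impostor))
    pg, _ = np.histogram(genuine, bins=bins, range=(lo, hi))
    pi, _ = np.histogram(impostor, bins=bins, range=(lo, hi))
    eps = 1e-12                      # guards against empty bins
    pg = pg / pg.sum() + eps
    pi = pi / pi.sum() + eps
    return float(np.sum(pg * np.log2(pg / pi)))  # bits
```

Well-separated genuine and impostor score distributions yield a large divergence, i.e. much identity information.<br />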
09:00-11:10, Paper TuAT9.17<br />
Probabilistic Measure for Signature Verification based on Bayesian Learning<br />
Pu, Danjun, State Univ. of New York at Buffalo<br />
Srihari, Sargur<br />
Signature verification is a common task in forensic document analysis. The goal is to make a decision whether a questioned<br />
signature belongs to a set of known signatures of an individual or not. In a typical forgery case a very limited number of<br />
known signatures may be available, with as few as four or five knowns [Stev95]. Here we describe a fully Bayesian<br />
approach which overcomes the limitation of having too few genuine samples. The algorithm has three steps: Step 1: Learn<br />
prior distributions of parameters from a population of known signatures; Step 2: Determine the posterior distributions of<br />
parameters using the genuine samples of a particular person; Step 3: Determine probabilities of the query from both genuine<br />
and forgery classes and the Log Likelihood Ratio (LLR) of the query. Rather than giving a hard decision, this method provides<br />
a probabilistic measure (the LLR) of the decision, and the performance of the Bayesian learning is improved especially in the<br />
case of limited known samples.<br />
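Step 3's log likelihood ratio can be illustrated with a toy one-dimensional score model (Gaussian class-conditional densities are an assumption for illustration only; the paper obtains its densities from the Bayesian learning of Steps 1 and 2):<br />

```python
import math

def gaussian_logpdf(x, mu, sigma):
    # Log density of a univariate Gaussian.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def llr(x, genuine_params, forgery_params):
    # Log likelihood ratio of a query feature under the genuine and forgery
    # models; positive values favour the genuine hypothesis.
    return gaussian_logpdf(x, *genuine_params) - gaussian_logpdf(x, *forgery_params)
```

The sign and magnitude of the LLR convey both the decision and its strength, rather than a hard accept/reject.<br />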
09:00-11:10, Paper TuAT9.18<br />
Gender Classification using a Single Frontal Image Per Person: Combination of Appearance and Geometric based<br />
Features<br />
Mozaffari, Saeed, Semnan Univ.<br />
Behravan, Hamid, Semnan Univ.<br />
Akbari, Rohollah, Qazvin Azad Univ.<br />
Today, many social interactions and services depend on gender. In this paper, we introduce a single-image gender classification<br />
algorithm using a combination of appearance-based and geometric-based features: the Discrete Cosine<br />
Transform (DCT), Local Binary Patterns (LBP), and a geometrical distance feature (GDF). The novel GDF feature proposed<br />
in this paper is inspired by physiological differences between male and female faces. Combining the appearance-based<br />
features (DCT and LBP) with the geometric-based feature (GDF) leads to higher gender classification accuracy.<br />
Our system estimates the gender of the input image based on a majority rule: if the results of the DCT and LBP features are not<br />
identical, gender classification is based on the GDF feature. The proposed method was evaluated on two databases, AR<br />
and an ethnic database. Experimental results show that the novel geometric feature improves the gender classification accuracy by<br />
13%.<br />
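The majority rule described above can be sketched directly (the function and label names are illustrative):<br />

```python
def classify_gender(dct_pred, lbp_pred, gdf_pred):
    # Majority rule from the abstract: when the two appearance-based
    # predictions (DCT and LBP) agree, take their common answer; otherwise
    # fall back to the geometric distance feature (GDF).
    if dct_pred == lbp_pred:
        return dct_pred
    return gdf_pred
```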
09:00-11:10, Paper TuAT9.19<br />
Residual Analysis for Fingerprint Orientation Modeling<br />
Jirachaweng, Suksan, Kasetsart Univ.<br />
Hou, Zujun, Inst. For Infocomm Res.<br />
Li, Jun, Inst. For Infocomm Res.<br />
Yau, Wei-Yun, Inst. For Infocomm Res.<br />
Areekul, Vutipong, Kasetsart Univ.<br />
This paper presents a novel method for fingerprint orientation modeling, which executes in two phases. First, the orientation<br />
field is reconstructed through fitting to a lower-order Legendre polynomial basis to capture the global orientation<br />
pattern. Then the preliminary model around the singular region is dynamically refined by fitting to a higher-order Legendre<br />
polynomial basis. The singular region is automatically detected through analysis of the residual field between<br />
the original orientation field and the orientation model. The method has been evaluated using the FVC 2004 data<br />
sets and compared with state-of-the-art methods. Experiments show that the proposed method attains higher accuracy in fingerprint<br />
matching and better singularity preservation.<br />
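The coarse global fit of the first phase can be sketched with numpy's Legendre utilities (an illustrative sketch: the doubled-angle representation and least-squares fit are standard practice, but the basis order is our assumption, and the dynamic refinement step is omitted):<br />

```python
import numpy as np
from numpy.polynomial import legendre

def fit_orientation(x, y, theta, deg):
    # Fit cos(2*theta) and sin(2*theta) to a 2D Legendre basis; doubling the
    # angle removes the 180-degree ambiguity of ridge orientations.
    V = legendre.legvander2d(x, y, [deg, deg])
    c_cos = np.linalg.lstsq(V, np.cos(2 * theta), rcond=None)[0]
    c_sin = np.linalg.lstsq(V, np.sin(2 * theta), rcond=None)[0]
    return c_cos, c_sin

def eval_orientation(x, y, c_cos, c_sin, deg):
    # Reconstruct the modeled orientation at the query points.
    V = legendre.legvander2d(x, y, [deg, deg])
    return 0.5 * np.arctan2(V @ c_sin, V @ c_cos)
```

A higher `deg` around detected singular regions would mimic the paper's refinement phase.<br />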
09:00-11:10, Paper TuAT9.20<br />
Dynamic Amelioration of Resolution Mismatches for Local Feature based Identity Inference<br />
Wong, Yongkang, NICTA<br />
Sanderson, Conrad, NICTA<br />
Mau, Sandra, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
While existing face recognition systems based on local features are robust to issues such as misalignment, they can exhibit<br />
accuracy degradation when comparing images of differing resolutions. This is common in surveillance environments<br />
where a gallery of high resolution mugshots is compared to low resolution CCTV probe images, or where the size of a<br />
given image is not a reliable indicator of the underlying resolution (e.g. poor optics). To alleviate this degradation, we<br />
propose a compensation framework which dynamically chooses the most appropriate face recognition system for a given<br />
pair of image resolutions. This framework applies a novel resolution detection method which does not rely on the size of<br />
the input images, but instead exploits the sensitivity of local features to resolution using a probabilistic multi-region histogram<br />
approach. Experiments on a resolution-modified version of the “Labeled Faces in the Wild” dataset show that the<br />
proposed resolution detector frontend obtains a 99% average accuracy in selecting the most appropriate face recognition<br />
system, resulting in higher overall face discrimination accuracy (across several resolutions) compared to the individual<br />
baseline face recognition systems.<br />
09:00-11:10, Paper TuAT9.21<br />
Patch-Based Similarity HMMs for Face Recognition with a Single Reference Image<br />
Vu, Ngoc-Son, Gipsa-Lab.<br />
Caplier, Alice, GIPSA-Lab. Grenoble Univ.<br />
In this paper we present a new architecture for face recognition with a single reference image, which completely separates<br />
the training process from the recognition process. In the training stage, by using a database containing various individuals,<br />
the spatial relations between face components are represented by two Hidden Markov Models (HMMs), one modeling<br />
within-subject similarities, and the other modeling inter-subject differences. This allows us during the recognition stage<br />
to take a pair of face images, neither of which has been seen before, and to determine whether or not they come from the<br />
same individual. Whilst other face-recognition HMMs use the Maximum Likelihood criterion, we test our approach using<br />
both the Maximum Likelihood and Maximum a Posteriori (MAP) criteria, and find that MAP provides better results. Importantly,<br />
the training database can be entirely separated from the gallery and test images: this means that adding new individuals<br />
to the system can be done without re-training. We present results based upon models trained on the FERET training<br />
dataset, and demonstrate that these give satisfactory recognition rates on both the FERET database itself and more impressively<br />
the unseen AR database. When compared to other HMM based face recognition techniques, our algorithm is of<br />
much lower complexity due to the small size of our observation sequence.<br />
09:00-11:10, Paper TuAT9.22<br />
How to Control Acceptance Threshold for Biometric Signatures with Different Confidence Values?<br />
Makihara, Yasushi, The Inst. of Scientific and Industrial Res., Osaka Univ.<br />
Hossain, Md. Altab, Osaka Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
In biometric verification, authentication is granted when the distance between biometric signatures from the enrollment and test<br />
phases is less than an acceptance threshold, and performance is usually evaluated by a so-called Receiver Operating<br />
Characteristic (ROC) curve expressing a trade-off between the False Rejection Rate (FRR) and False Acceptance Rate (FAR).<br />
On the other hand, it is also well known that performance is significantly affected by situation differences between the<br />
enrollment and test phases. This paper describes a method to adaptively control the acceptance threshold with quality measures<br />
derived from situation differences so as to optimize the ROC curve. We show that the optimal evolution of the adaptive<br />
threshold in the domain of the distance and quality measure is equivalent to a constant evolution in the domain of the error<br />
gradient defined as a ratio of a total error rate to a total acceptance rate. An experiment with simulation data demonstrates<br />
that the proposed method outperforms the previous methods, particularly under a lower FAR or FRR tolerance condition.<br />
09:00-11:10, Paper TuAT9.23<br />
Binary Representations of Fingerprint Spectral Minutiae Features<br />
Xu, Haiyun, Univ. of Twente<br />
Veldhuis, Raymond, Univ. of Twente<br />
A fixed-length binary representation of a fingerprint has the advantages of fast operation and small template storage.<br />
For many biometric template protection schemes, a binary string is also required as input. The spectral minutiae representation<br />
is a method to represent a minutiae set as a fixed-length real-valued feature vector. In order to be able to apply the<br />
spectral minutiae representation with a template protection scheme, we introduce two novel methods to quantize the<br />
spectral minutiae features into binary strings: Spectral Bits and Phase Bits. The experiments on the FVC2002 database<br />
show that the binary representations can even outperform the spectral minutiae real-valued features.<br />
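A simplest-possible quantization of a real-valued feature vector into a binary string looks like this (illustrative only; the paper's Spectral Bits and Phase Bits schemes are more elaborate than this median-threshold sketch):<br />

```python
import numpy as np

def binarize(feature):
    # One bit per coefficient: 1 if the value exceeds the vector's median.
    return (feature > np.median(feature)).astype(np.uint8)

def hamming_similarity(a, b):
    # Fraction of matching bits between two binary templates.
    return float(np.mean(a == b))
```

Binary templates allow fast Hamming-distance matching and are directly usable as input to template protection schemes.<br />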
09:00-11:10, Paper TuAT9.24<br />
Attacking Iris Recognition: An Efficient Hill-Climbing Technique<br />
Rathgeb, Christian, Univ. of Salzburg<br />
Uhl, Andreas, Univ. of Salzburg<br />
In this paper we propose a modified hill-climbing attack on iris biometric systems. Applying our technique we are able to<br />
effectively gain access to iris biometric systems with very low effort. Furthermore, we demonstrate that reconstructing approximations<br />
of the original iris images is highly non-trivial.<br />
09:00-11:10, Paper TuAT9.25<br />
Face Recognition at-a-Distance using Texture, Dense- and Sparse-Stereo Reconstruction<br />
Rara, Ham, CVIP Lab. Univ. of Louisville<br />
Ali, Asem, Univ. of Louisville<br />
Elhabian, Shireen, Univ. of Louisville<br />
Starr, Thomas, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
This paper introduces a framework for long-distance face recognition using dense and sparse stereo reconstruction, with<br />
texture of the facial region. Two methods to determine correspondences of the stereo pair are used in this paper: (a) dense<br />
global stereo-matching using maximum-a-posteriori Markov Random Fields (MAP-MRF) algorithms and (b) Active Appearance<br />
Model (AAM) fitting of both images of the stereo pair and using the fitted AAM mesh as the sparse correspondences.<br />
Experiments are performed using combinations of different features extracted from the dense and sparse<br />
reconstructions, as well as facial texture. The cumulative match characteristic (CMC) curves generated using the proposed<br />
framework confirm the feasibility of the proposed work for long-distance recognition of human faces.<br />
09:00-11:10, Paper TuAT9.26<br />
Automatic Asymmetric 3D-2D Face Recognition<br />
Huang, Di, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Wang, Yunhong, Beihang Univ.<br />
Chen, Liming, Ec. Centrale de Lyon<br />
In recent years, 3D face recognition has been considered a major solution to the unsolved issues of reliable 2D face recognition,<br />
i.e., lighting and pose variations. However, 3D techniques are currently limited by their high registration<br />
and computation cost. In this paper, an asymmetric 3D-2D face recognition method is presented, enrolling in textured 3D<br />
whilst performing automatic identification using only 2D facial images. The goal is to limit the use of 3D data to where it<br />
really helps to improve face recognition accuracy. The proposed approach contains two separate matching steps: Sparse<br />
Representation Classifier (SRC) is applied to 2D-2D matching, while Canonical Correlation Analysis (CCA) is exploited<br />
to learn the mapping between range LBP faces (3D) and texture LBP faces (2D). Both matching scores are combined for<br />
the final decision. Moreover, we propose a new preprocessing pipeline to enhance robustness to lighting and pose effects.<br />
The proposed method achieves better experimental results on the FRGC v2.0 dataset than 2D methods do, while avoiding<br />
the cost and inconvenience of the data acquisition and computation of 3D approaches.<br />
09:00-11:10, Paper TuAT9.27<br />
Model and Score Adaptation for Biometric Systems: Coping with Device Interoperability and Changing Acquisition<br />
Conditions<br />
Poh, Norman, Univ. of Surrey<br />
Kittler, Josef, Univ. of Surrey<br />
Marcel, Sebastien, IDIAP Res. Inst. EPFL<br />
Matrouf, Driss, Univ. d’Avignon et des Pays de Vaucluse<br />
Bonastre, Jean-Francois, Univ. d’Avignon et des Pays de Vaucluse<br />
The performance of biometric systems can be significantly affected by changes in signal quality. In this paper, two types<br />
of changes are considered: changes in the acquisition environment and in the sensing devices. We investigated three solutions: (i)<br />
model-level adaptation, (ii) score-level adaptation (normalisation), and (iii) the combination of the two, called compound<br />
adaptation. In order to cope with the above changing conditions, the model-level adaptation attempts to update the parameters<br />
of the expert systems (classifiers). This approach requires the authenticity of the candidate samples used for adaptation<br />
be known (corresponding to supervised adaptation), or can be estimated (unsupervised adaptation). In comparison, the<br />
score-level adaptation merely involves post processing the expert output, with the objective of rendering the associated<br />
decision threshold to be dependent only on the class priors despite the changing acquisition conditions. Since the above<br />
adaptation strategies treat the underlying biometric experts/classifiers as a black-box, they can be applied to any unimodal<br />
or multimodal biometric system, thus facilitating system-level integration and performance optimisation. Our contributions<br />
are: (i) the proposal of compound adaptation; (ii) the investigation and comparison of two different quality-dependent score normalisation<br />
strategies; and (iii) an empirical comparison of the merits of the above three solutions on the BANCA face (video)<br />
and speech databases.<br />
09:00-11:10, Paper TuAT9.28<br />
Online Boosting OC for Face Recognition in Continuous Video Stream<br />
Huo, Hongwen, Peking Univ.<br />
Feng, Jufu, Peking Univ.<br />
In this paper, we present a novel online face recognition approach for video streams called online boosting OC (output<br />
code). Recently, boosting has been successfully used in many fields such as object detection and tracking. It is a kind<br />
of large-margin classifier for binary classification problems and is also efficient for online learning. However, face recognition<br />
is a typical multi-class problem, so it is difficult to use boosting for face recognition, especially in an online<br />
version. In our work, we combine online boosting and the OC algorithm to solve real-time online multi-class classification<br />
problems. We evaluate online boosting OC in real-world experiments on face recognition in continuous video streams, and<br />
the results show that our algorithm is accurate and robust.<br />
09:00-11:10, Paper TuAT9.29<br />
On the Dimensionality Reduction for Sparse Representation based Face Recognition<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Yang, Meng, The Hong Kong Pol. Univ.<br />
Feng, Zhizhao, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Face recognition (FR) is an active yet challenging topic in computer vision applications. As a powerful tool to represent<br />
high dimensional data, recently sparse representation based classification (SRC) has been successfully used for FR. This<br />
paper discusses the dimensionality reduction (DR) of face images under the framework of SRC. Although one important<br />
merit of SRC is that it is insensitive to DR or feature extraction, a well trained projection matrix can lead to higher FR rate<br />
at a lower dimensionality. An SRC-oriented unsupervised DR algorithm is proposed in this paper, and the experimental results<br />
on benchmark face databases demonstrate the improvements brought by the proposed DR algorithm over PCA- or<br />
random-projection-based DR under the SRC framework.<br />
09:00-11:10, Paper TuAT9.30<br />
Improved Fingerprint Image Segmentation and Reconstruction of Low Quality Areas<br />
Mieloch, Krzysztof, Univ. of Goettingen<br />
Munk, Axel, Univ. of Goettingen<br />
Mihailescu, Preda, Univ. of Goettingen<br />
One of the main reasons for false recognition is noise added to fingerprint images during the acquisition step. Hence, improving<br />
the enhancement step affects the general accuracy of automatic recognition systems. In one of our previous<br />
publications we introduced hierarchically linked extended features – a new set of features which not only includes additional<br />
fingerprint features individually but also contains information about their relationships, such as line adjacency<br />
information at minutiae points or links between neighbouring fingerprint lines. In this work we present the application of<br />
the extended features to preprocessing and enhancement. We use structural information for improving the segmentation<br />
step, as well as for connecting disrupted fingerprint lines and recovering missing minutiae. Experiments show a decrease in<br />
the matching error rate.<br />
09:00-11:10, Paper TuAT9.31<br />
An Efficient Method for Offline Text Independent Writer Identification<br />
Ghiasi, Golnaz, Amirkabir Univ. of Tech.<br />
Safabakhsh, Reza, Amirkabir Univ. of Tech.<br />
This paper proposes an efficient method for text independent writer identification using a codebook. The occurrence histogram<br />
of the shapes in the codebook is used to create a feature vector for the handwriting. There is a wide variety of<br />
shapes in the connected components obtained from handwriting, so small fragments of connected components should<br />
be used to avoid complex patterns. A new and more efficient method is introduced for this purpose. To evaluate the method,<br />
writer identification is conducted on three varieties of a Farsi database, containing texts of short, medium and<br />
large lengths. Experimental results show the efficiency of the method, especially for short texts.<br />
09:00-11:10, Paper TuAT9.32<br />
Study on Color Spaces for Single Image Enrolment Face Authentication<br />
Hemery, Baptiste, Univ. de CAEN<br />
Schwartzmann, Jean-Jacques, Orange Lab.<br />
Rosenberger, Christophe, Lab. GREYC<br />
We propose in this paper to study different color spaces for representing an image for face authentication.<br />
We use a generic algorithm based on matching keypoints using SIFT descriptors computed on one color component.<br />
Ten color spaces have been studied on four large and significant benchmark databases (ENSIB, FACES94, AR and FERET).<br />
We show that not all color spaces provide the same efficiency and that using the color information allows an interesting<br />
improvement of verification results.<br />
09:00-11:10, Paper TuAT9.33<br />
Estimation of Fingerprint Orientation Field by Weighted 2D Fourier Expansion Model<br />
Tao, Xunqiang, Chinese Acad. of Sciences<br />
Yang, Xin, Chinese Acad. of Sciences<br />
Cao, Kai, Chinese Acad. of Sciences<br />
Wang, Ruifang, Chinese Acad. of Sciences<br />
Li, Peng, Chinese Acad. of Sciences<br />
Tian, Jie<br />
Accurate estimation of the fingerprint orientation field is an essential module in fingerprint recognition. This paper proposes<br />
a novel technique for improving fingerprint orientation field estimation using a fingerprint orientation model based on weighted<br />
2D Fourier expansion (W-FOMFE). The motivation for the proposed method is twofold: 1) the original FOMFE is<br />
sensitive to abrupt changes in the orientation field; 2) blocks of different quality should have different impacts on FOMFE.<br />
Thus, we take the Harris-corner strength (HCS) into account for orientation field estimation. In our<br />
method, we first calculate the fingerprint’s HCS; then use the HCS to remove abrupt changes in the orientation field; and finally,<br />
incorporate the normalized HCS as a weight into the original FOMFE. We test our method on FVC2004 DB1. Experimental<br />
results show that our method (W-FOMFE) provides better orientation field estimation than FOMFE.<br />
09:00-11:10, Paper TuAT9.34<br />
Iterative Fingerprint Enhancement with Matched Filtering and Quality Diffusion in Spatial-Frequency Domain<br />
Sutthiwichaiporn, Prawit, Kasetsart Univ.<br />
Areekul, Vutipong, Kasetsart Univ.<br />
Jirachaweng, Suksan, Kasetsart Univ.<br />
The proposed fingerprint enhancement algorithm utilizes the power spectrum in the spatial-frequency domain. The input fingerprint<br />
is partitioned and assessed as high/low quality zones using a signal-to-noise ratio (SNR) approach. For high quality<br />
zones, the signal spectrum with noise suppression is used to shape an enhancement filter in the frequency domain. Then, the algorithm<br />
feeds neighboring enhanced zones back in order to repair unreliable low quality regions. The proposed algorithm outperforms<br />
the Gabor and STFT approaches in fingerprint matching experiments on FVC2004 Db2 and Db3.<br />
- 103 -
09:00-11:10, Paper TuAT9.35<br />
Cancelable Face Recognition using Random Multiplicative Transform<br />
Wang, Yongjin, Univ. of Toronto<br />
Hatzinakos, Dimitrios, Univ. of Toronto<br />
The generation of cancelable and privacy preserving biometric templates is important for the pervasive deployment of<br />
biometric technology in a wide variety of applications. This paper presents a novel approach for cancelable biometric authentication<br />
using random multiplicative transform. The proposed method transforms the original biometric feature vector<br />
through element-wise multiplication with a random vector, and the sorted index numbers of the resulting vector in the<br />
transformed domain are stored as the biometric template. The changeability and privacy-protecting properties of the generated<br />
biometric template are analyzed in detail. The effectiveness of the proposed method is well supported by extensive<br />
experimentation on a face verification problem.<br />
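The template-generation step described above (element-wise multiplication by a user-specific random vector, then storing the sorted index order) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code; `key_seed` stands in for however the random vector is generated and stored in practice.

```python
import numpy as np

def cancelable_template(feature, key_seed):
    """Transform a biometric feature vector and keep only the sorted index order."""
    rng = np.random.default_rng(key_seed)
    r = rng.uniform(0.5, 2.0, size=feature.shape)  # user-specific random vector
    return np.argsort(feature * r)                 # index order = stored template

feature = np.array([0.2, 0.9, 0.5, 0.7])
t1 = cancelable_template(feature, key_seed=1)
t2 = cancelable_template(feature, key_seed=1)   # same key -> same template
t3 = cancelable_template(feature, key_seed=2)   # new key -> a revocable, different view
```

Because only index orderings are stored, the original feature values are not directly recoverable from the template, and issuing a new seed cancels a compromised template.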
09:00-11:10, Paper TuAT9.36<br />
Evaluation of Multi-Frame Fusion based Face Classification under Shadow<br />
Canavan, Shaun, SUNY Binghamton<br />
Johnson, Benjamin, Youngstown State Univ.<br />
Reale, Michael, Binghamton Univ.<br />
Zhang, Yong, Youngstown State Univ.<br />
Yin, Lijun, SUNY Binghamton<br />
Sullins, John, Youngstown State Univ.<br />
A video sequence of a head moving across a large pose angle contains much richer information than a single-view image,<br />
and hence has greater potential for identification purposes. This paper explores and evaluates the use of a multi-frame<br />
fusion method to improve face recognition in the presence of strong shadow. The dataset includes videos of 257 subjects<br />
who rotated their heads from 0 to 90 degrees. Experiments were carried out using ten video frames per subject, fused at<br />
the score level. The primary findings are: (i) a significant performance increase was observed, with the recognition rate<br />
doubling from 40% using a single frame to 80% using ten frames; (ii) the performance of multi-frame fusion is<br />
strongly related to its inter-frame variation, which measures its information diversity.<br />
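Score-level fusion of this kind can be sketched generically: each frame produces a match score per gallery identity, the per-identity scores are combined across frames, and the best fused score wins. A minimal sketch; the sum rule used here is an assumption, since the abstract does not state which fusion rule was applied.

```python
import numpy as np

def fuse_frames(frame_scores):
    """frame_scores: (n_frames, n_identities) match scores.
    Returns fused per-identity scores and the top identity."""
    fused = frame_scores.mean(axis=0)       # sum-rule (average) fusion across frames
    return fused, int(fused.argmax())

# Three frames, four gallery identities; identity 2 is correct but wins
# clearly only after fusion smooths out per-frame noise.
scores = np.array([[0.30, 0.40, 0.35, 0.10],
                   [0.20, 0.10, 0.60, 0.30],
                   [0.25, 0.20, 0.55, 0.15]])
fused, best = fuse_frames(scores)
```

Note that on frame 0 alone the wrong identity (1) would win; fusion recovers identity 2, mirroring the single-frame versus multi-frame gap reported above.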
09:00-11:10, Paper TuAT9.37<br />
Finger-Vein Authentication based on Wide Line Detector and Pattern Normalization<br />
Huang, Beining, Peking Univ.<br />
Dai, Yanggang, Peking Univ.<br />
Li, Rongfeng, Peking Univ.<br />
Tang, Darun, Peking Univ.<br />
Li, Wenxin, Peking Univ.<br />
In finger-vein authentication, two problems arise in practice. One is that the quality of the vein image is reduced under<br />
poor environmental conditions; the other is the irregular distortion of the image caused by variations in finger pose. Both<br />
problems raise error rates. In this paper, we introduce a wide line detector for feature extraction, which obtains precise<br />
width information of the vein and increases the information extracted from low-quality images. We also develop a new<br />
pattern normalization model based on the hypothesis that the finger’s cross-sections are approximately elliptical and that<br />
the veins which can be imaged lie close to the finger surface. It effectively reduces the distortion caused by pose. In our<br />
experiments on a database containing 50,700 images, our method shows advantages in dealing with low-quality data<br />
collected from a practical personal authentication system.<br />
09:00-11:10, Paper TuAT9.38<br />
Performance Evaluation of Micropattern Representation on Gabor Features for Face Recognition<br />
Zhao, Sanqiang, Griffith Univ. / National ICT Australia<br />
Gao, Yongsheng, Griffith Univ.<br />
Zhang, Baochang, Beihang Univ.<br />
Face recognition using micropattern representation has recently received much attention in the computer vision and pattern<br />
recognition community. Previous research demonstrated that micropattern representation based on Gabor features<br />
achieves better performance than its direct use on gray-level images. This paper conducts a comparative performance<br />
evaluation of micropattern representations on four forms of Gabor features for face recognition. Three evaluation rules<br />
are proposed and observed for a fair comparison. To reduce the high feature dimensionality, uniform quantization<br />
is used to partition the spatial histograms. The experimental results reveal that: 1) micropattern representation based on<br />
Gabor magnitude features outperforms the other three representations, whose performances are comparable;<br />
and 2) micropattern representation based on the combination of Gabor magnitude and phase features performs<br />
best.<br />
09:00-11:10, Paper TuAT9.39<br />
Block Pyramid based Adaptive Quantization Watermarking for Multimodal Biometric Authentication<br />
Ma, Bin, Beihang Univ.<br />
Li, Chunlei, Beihang Univ.<br />
Wang, Yunhong, Beihang Univ.<br />
Zhang, Zhaoxiang, Beihang Univ.<br />
Wang, Yiding, North China Univ. of Tech.<br />
This paper proposes a novel robust watermarking scheme that embeds fingerprint minutiae into face images for multimodal<br />
biometric authentication. First, a block pyramid is layered according to block-wise face region distinctiveness estimated<br />
by AdaBoost; upper levels indicate informative spatial regions. Then, we adopt a first-order statistics QIM method to perform<br />
watermark embedding in each pyramid level. Watermark bits with higher priority are embedded into upper pyramid<br />
levels with a larger embedding strength. By jointly differentiating host image regions and watermark bit priorities, our<br />
scheme achieves a trade-off among watermarking robustness, capacity and fidelity. Experimental results demonstrate<br />
that our approach guarantees the robustness of hidden biometric data while preserving the distinctiveness of host biometric<br />
images.<br />
09:00-11:10, Paper TuAT9.40<br />
A Topologic Approach to User-Dependent Key Extraction from Fingerprints<br />
Gudkov, Vladimir, Sonda<br />
Ushmaev, Oleg, Russian Acad. of Sciences<br />
The paper briefly describes an approach to key extraction from fingerprint images based on topological descriptors of<br />
minutiae point neighborhoods. The approach allows the design of biometric encryption procedures with variable key<br />
length and successful decryption rates.<br />
09:00-11:10, Paper TuAT9.41<br />
Robust Face Recognition using Block-Based Bag of Words<br />
Li, Zisheng, The Univ. of Electro-Communications<br />
Imai, Jun-Ichi, The Univ. of Electro-Communications<br />
Kaneko, Masahide, The Univ. of Electro-Communications<br />
A novel block-based bag-of-words (BboW) method is proposed for robust face recognition. In our approach, a face image<br />
is partitioned into multiple blocks; dense SIFT features are then calculated and vector-quantized into codewords<br />
within each block. Finally, the histograms of codeword distributions over the local blocks are concatenated to represent<br />
the face image. Experimental results on the AR database show that, using only one neutral-expression frame per person for<br />
training, our method obtains excellent face recognition results on face images with extreme expressions, varying illumination,<br />
and partial occlusions. Our method also achieves an average recognition rate of 100% on the XM2VTS database.<br />
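The block-wise histogram construction can be sketched as follows, assuming the dense descriptors have already been assigned to codeword indices (SIFT extraction and vocabulary learning are omitted; the grid and vocabulary sizes here are illustrative, not the paper's settings).

```python
import numpy as np

def bbow_descriptor(codeword_map, grid=(2, 2), vocab_size=8):
    """codeword_map: 2D array of codeword indices (one per dense sample point).
    Returns the concatenation of per-block normalized codeword histograms."""
    hists = []
    for row in np.array_split(codeword_map, grid[0], axis=0):
        for block in np.array_split(row, grid[1], axis=1):
            h = np.bincount(block.ravel(), minlength=vocab_size).astype(float)
            hists.append(h / max(h.sum(), 1.0))   # per-block L1 normalization
    return np.concatenate(hists)

# An 8x8 map of codeword assignments, split into a 2x2 grid of blocks.
cw = np.random.default_rng(0).integers(0, 8, size=(8, 8))
desc = bbow_descriptor(cw)
```

Keeping one histogram per block, rather than one global histogram, is what preserves the spatial layout that makes the representation robust to partial occlusions.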
09:00-11:10, Paper TuAT9.42<br />
Analysis of Fingerprint Pores for Vitality Detection<br />
Marcialis, Gian Luca, Univ. of Cagliari<br />
Roli, Fabio, Univ. of Cagliari<br />
Tidu, Alessandra, Univ. of Cagliari<br />
Spoofing is an open issue for fingerprint recognition systems. It consists of submitting an artificial replica of a genuine<br />
user’s fingerprint. Current sensors provide an image which is then processed as a true fingerprint. Recently, so-called<br />
3rd-level features, namely pores, which are visible in high-definition fingerprint images, have been used for matching. In this<br />
paper, we propose to analyse pore locations to characterize the liveness of fingerprints. Experimental results on a large<br />
dataset of spoofed and live fingerprints show the benefits of the proposed approach.<br />
09:00-11:10, Paper TuAT9.43<br />
Applying Dissimilarity Representation to Off-Line Signature Verification<br />
Batista, Luana, École de Tech. Supérieure<br />
Granger, Eric, École de Tech. Supérieure<br />
Sabourin, R., École de Tech. Supérieure<br />
In this paper, a two-stage off-line signature verification system based on dissimilarity representation is proposed. In the<br />
first stage, a set of discrete left-to-right HMMs trained with different numbers of states and codebook sizes is used to<br />
measure similarity values that populate new feature vectors. These vectors are then input to the second stage, which provides<br />
the final classification. Experiments were performed using two different classification techniques – AdaBoost, and<br />
Random Subspaces with SVMs – and a real-world signature verification database. Results indicate that the proposed<br />
system performs significantly better than other reference signature verification systems from the literature.<br />
09:00-11:10, Paper TuAT9.44<br />
3D Face Decomposition and Region Selection against Expression Variations<br />
Günlü, Göksel, Gazi Univ.<br />
Bilge, Hasan Sakir, Gazi Univ.<br />
3D face recognition exploits shape information in addition to the texture information used in 2D systems. Using the<br />
whole 3D face is sensitive to undesired conditions such as expression variations. To overcome this problem, we investigate<br />
a new approach that decomposes the whole 3D face into sub-regions and independently extracts features from each<br />
sub-region. A 3D DCT is applied to each sub-region and the most discriminating DCT coefficients are selected. The nose<br />
region contributes the most to the list of discriminating coefficients. Furthermore, a better recognition rate is achieved by<br />
using only the nose region. The highest rank-one recognition score in our experiments is 98.97%. The results of the<br />
proposed approach are compared to other methods that use the FRGC v2 database.<br />
09:00-11:10, Paper TuAT9.45<br />
Fusion of Qualities for Frame Selection in Video Face Verification<br />
Villegas, Mauricio, Univ. Pol. De Valencia<br />
Paredes, Roberto, Univ. Pol. De Valencia<br />
It is known that the use of video can help improve the performance of face verification systems. However, processing<br />
video on resource-constrained devices is prohibitive. In order to reduce the load of the algorithms, a quality-based selection<br />
of frames can be applied. Generally, several quality measures are available, and thus a good fusion scheme is required. This<br />
paper addresses the problem of fusing quality measures such that the resulting quality improves the performance of frame<br />
selection. A comparison of different methods for fusing qualities is presented. Also, some new quality measures based on<br />
time derivatives are proposed, which are shown to be beneficial for estimating the overall quality. Finally, a curve is proposed<br />
which shows that the qualities used for frame selection effectively improve verification performance, independent<br />
of the number of frames selected or the method employed for obtaining the overall biometric score.<br />
09:00-11:10, Paper TuAT9.46<br />
A Person Retrieval Solution using Finger Vein Patterns<br />
Tang, Darun, Peking Univ.<br />
Huang, Beining, Peking Univ.<br />
Li, Rongfeng, Peking Univ.<br />
Li, Wenxin, Peking Univ.<br />
Dai, Yanggang, Peking Univ.<br />
Personal identification based on finger vein patterns is a newly developed biometric technique, and several practical<br />
systems have been deployed in recent years. We developed a finger vein verification system for checking attendance and<br />
have collected a database of 0.8 million finger vein samples. Based on this database, we propose a person retrieval solution<br />
that searches for an image in the database and returns a response in acceptable time. To fit the retrieval solution, we designed<br />
a new encoding method. The experimental results show that our solution can return a result in about 10 seconds when<br />
working on a database of 50,700 samples. At the same time, the error rate is nearly the same as that of linear search.<br />
09:00-11:10, Paper TuAT9.47<br />
Multi-Classifier Q-Stack Aging Model for Adult Face Verification<br />
Li, Weifeng, Swiss Federal Inst. of Tech. Lausanne (EPFL)<br />
Drygajlo, Andrzej, Swiss Federal Inst. of Tech. Lausanne (EPFL)<br />
The influence of age progression on the performance of multi-classifier face verification systems is a challenging and<br />
largely open research problem that deserves increasing attention. In this paper, we propose to manage the influence of aging<br />
on adult face verification with a multi-classifier Q-stack age modeling technique, which uses age as a class-independent<br />
metadata quality measure together with scores from baseline classifiers, combining global and local patterns, in order to<br />
obtain better recognition rates. This allows for improved long-term class separation by introducing a 2D parameterized<br />
decision boundary in the scores-age space using a short-term enrollment model. This new method, based on the concept of<br />
classifier stacking and an age-dependent decision boundary, compares favorably with the conventional face verification<br />
approach, which uses an age-independent decision threshold calculated only in the score space at the time of enrollment.<br />
The proposed approach is evaluated on the MORPH database.<br />
09:00-11:10, Paper TuAT9.48<br />
Quality-Based Fusion for Multichannel Iris Recognition<br />
Vatsa, Mayank, IIIT Delhi<br />
Singh, Richa, IIIT Delhi<br />
Ross, Arun, West Virginia Univ.<br />
Noore, Afzel, West Virginia Univ.<br />
We propose a quality-based fusion scheme for improving the recognition accuracy using color iris images characterized<br />
by three spectral channels – Red, Green and Blue. In the proposed method, quality scores are employed to select two channels<br />
of a color iris image which are fused at the image level using a Redundant Discrete Wavelet Transform (RDWT). The<br />
fused image is then used in a score-level fusion framework along with the remaining channel to improve recognition accuracy.<br />
Experimental results on a heterogeneous color iris database demonstrate the efficacy of the technique when compared<br />
against other score-level and image-level fusion methods. The proposed method can potentially benefit the use of color<br />
iris images in conjunction with their NIR counterparts.<br />
09:00-11:10, Paper TuAT9.49<br />
Iris Image Retrieval based on Macro-Features<br />
Sam Sunder, Manisha, West Virginia Univ.<br />
Ross, Arun, West Virginia Univ.<br />
Most iris recognition systems use the global and local texture information of the iris in order to recognize individuals. In<br />
this work, we investigate the use of macro-features that are visible on the anterior surface of the iris in RGB images for<br />
matching and retrieval. These macro-features correspond to structures such as moles, freckles, nevi, melanoma, etc. and<br />
may not be present in all iris images. Given an image of a macro-feature, the goal is to determine if it can be used to successfully<br />
retrieve the associated iris from the database. To address this problem, we use features extracted by the Scale-<br />
Invariant Feature Transform (SIFT) to represent and match macro-features. Experiments using a subset of 770 distinct<br />
irides from the Miles Research Iris Database suggest the possibility of using macro-features for iris characterization and<br />
retrieval.<br />
09:00-11:10, Paper TuAT9.50<br />
A Gradient Descent Approach for Multi-Modal Biometric Identification<br />
Basak, Jayanta, IBM Res.<br />
Kate, Kiran, IBM Res. – India<br />
Tyagi, Vivek, IBM Res. - India<br />
Ratha, Nalini, IBM Res.<br />
While biometrics-based identification is a key technology in many critical applications, such as searching for an identity<br />
in a watch list or checking for duplicates in a citizen ID card system, there are many technical challenges in building a<br />
solution, because the database can be very large (often hundreds of millions of entries) and the underlying biometric<br />
engines have intrinsic errors. Multi-modal biometrics is often proposed as a way to improve the underlying biometric<br />
accuracy. In this paper, we propose a score-based fusion scheme tailored for identification applications. The proposed<br />
algorithm uses a gradient descent method to learn weights for each modality such that the weighted sum of genuine scores<br />
is larger than the weighted sum of all the impostor scores. During the identification phase, the top K candidates from each<br />
modality are retrieved and a super-set of identities is constructed. Using the learnt weights, we compute the weighted score<br />
for all the candidates in the super-set. The highest-scoring candidate is declared the top candidate for identification. The<br />
proposed algorithm has been tested on the NIST BSSR-1 dataset, and results in terms of both accuracy and speed (execution<br />
time) are shown to be far superior to the published results on this dataset.<br />
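The weight-learning step can be illustrated with a hinge-style gradient descent that adjusts each modality weight until the weighted genuine score exceeds every weighted impostor score. This is a minimal sketch of the idea, not the authors' algorithm; the margin, learning rate, and non-negativity clipping are all assumptions.

```python
import numpy as np

def learn_fusion_weights(genuine, impostors, lr=0.1, epochs=200, margin=1.0):
    """genuine: (M,) per-modality genuine scores; impostors: (N, M) impostor scores.
    Gradient descent on a hinge loss enforcing genuine.w >= impostor.w + margin."""
    w = np.full(genuine.shape, 1.0 / len(genuine))
    for _ in range(epochs):
        gaps = (genuine - impostors) @ w            # (N,) margins vs each impostor
        viol = gaps < margin                        # hinge-active impostors
        if not viol.any():
            break
        grad = -(genuine - impostors[viol]).sum(axis=0)
        w = np.maximum(w - lr * grad, 0.0)          # keep weights non-negative
    return w

# Modality 0 separates genuine from impostors; modality 1 does not.
genuine = np.array([0.9, 0.2])
impostors = np.array([[0.5, 0.8],
                      [0.4, 0.9]])
w = learn_fusion_weights(genuine, impostors)
```

With equal weights the genuine weighted score (0.55) loses to both impostors (0.65 each); after training, the discriminative modality dominates and the genuine score ranks first.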
09:00-11:10, Paper TuAT9.51<br />
Robust ECG Biometrics by Fusing Temporal and Cepstral Information<br />
Li, Ming, Univ. of Southern California<br />
Narayanan, Shrikanth, Univ. of Southern California<br />
The use of vital signs as a biometric is a potentially viable approach in a variety of application scenarios such as security<br />
and personalized health care. In this paper, a novel robust Electrocardiogram (ECG) biometric algorithm based on both<br />
temporal and cepstral information is proposed. First, in the time domain, after pre-processing and normalization, each<br />
heartbeat of the ECG signal is modeled by Hermite polynomial expansion (HPE) and support vector machine (SVM).<br />
Second, in the homomorphic domain, cepstral features are extracted from the ECG signals and modeled by Gaussian mixture<br />
modeling (GMM). In the GMM framework, heteroscedastic linear discriminant analysis and a GMM supervector kernel<br />
are used to perform feature dimension reduction and discriminative modeling, respectively. Finally, fusion of both temporal<br />
and cepstral system outcomes at the score level is used to improve the overall performance. Experimental results show that<br />
the proposed hybrid approach achieves 98.3% accuracy and 0.5% equal error rate on the MIT-BIH Normal Sinus Rhythm<br />
Database.<br />
09:00-11:10, Paper TuAT9.52<br />
A Comparative Study of Facial Landmark Localization Methods for Face Recognition using HOG Descriptors<br />
Monzo, David, Univ. Pol. Valencia<br />
Albiol, Alberto, Univ. Pol. Valencia<br />
Albiol, Antonio, Univ. Pol. Valencia<br />
Mossi, Jose M., Univ. Pol. Valencia<br />
This paper compares several approaches to extracting facial landmarks and studies their influence on face recognition.<br />
In order to obtain fair comparisons, we use the same number of facial landmarks and the same type of descriptors<br />
(HOG descriptors) for each approach. The comparative results are obtained using the FERET and FRGC datasets and show<br />
that better recognition rates are obtained when landmarks are located at real facial fiducial points. However, if the automatic<br />
detection of these points is compromised by the difficulty of the images, better results are obtained using fixed landmark grids.<br />
09:00-11:10, Paper TuAT9.53<br />
Confidence Weighted Subspace Projection Techniques for Robust Face Recognition in the Presence of Partial Occlusions<br />
Struc, Vitomir, Univ. of Ljubljana<br />
Dobrišek, Simon, Univ. of Ljubljana<br />
Pavesic, Nikola, Univ. of Ljubljana<br />
Subspace projection techniques are known to be susceptible to the presence of partial occlusions in the image data. To<br />
overcome this susceptibility, we present in this paper a confidence weighting scheme that assigns weights to pixels according<br />
to a measure, which quantifies the confidence that the pixel in question represents an outlier. With this procedure<br />
the impact of the occluded pixels on the subspace representation is reduced and robustness to partial occlusions is obtained.<br />
Next, the confidence weighting concept is improved by a local procedure for the estimation of the subspace representation.<br />
Both the global weighting approach and the local estimation procedure are assessed in face recognition experiments on<br />
the AR database, where encouraging results are obtained with partially occluded facial images.<br />
09:00-11:10, Paper TuAT9.54<br />
Face Recognition across Pose with Automatic Estimation of Pose Parameters through AAM-Based Landmarking<br />
Teijeiro-Mosquera, Lucía, Univ. de Vigo<br />
Alba Castro, Jose Luis, Univ. of Vigo<br />
Gonzalez-Jimenez, Daniel, Univ. of Vigo<br />
In this paper we present a fully automatic system for face recognition across pose, where no frontal view is needed for<br />
enrollment or testing. The system uses three Active Appearance Models (AAMs): the first is a generic multi-resolution AAM,<br />
while the remaining two are trained to cope with left/right variations (i.e., pose-dependent AAMs). During the fitting<br />
stage, pose is automatically estimated using eigenvector analysis, and a synthetic face is generated through texture warping.<br />
Experiments on the CMU PIE database show promising results compared to the performance achieved with manually<br />
landmarked faces.<br />
09:00-11:10, Paper TuAT9.55<br />
Cross-Spectral Face Verification in the Short Wave Infrared (SWIR) Band<br />
Bourlai, Thirimachos, WVU<br />
Kalka, Nathan, WVU<br />
Ross, Arun, West Virginia Univ.<br />
Cukic, Bojan, WVU<br />
Hornak, Lawrence, WVU<br />
The problem of face verification across the short wave infrared (SWIR) spectrum is studied in order to illustrate the<br />
advantages and limitations of SWIR face verification. The contributions of this work are twofold. First, a database of 50<br />
subjects is assembled and used to illustrate the challenges associated with the problem. Second, a set of experiments is<br />
performed in order to demonstrate the possibility of SWIR cross-spectral matching. Experiments also show that images<br />
captured under different SWIR wavelengths can be matched to visible images with promising results. The role of multispectral<br />
fusion in improving recognition performance in SWIR images is finally illustrated. To the best of our knowledge,<br />
this is the first time cross-spectral SWIR face recognition is being investigated in the open literature.<br />
09:00-11:10, Paper TuAT9.56<br />
Decision Fusion for Patch-Based Face Recognition<br />
Topçu, Berkay, Sabancı Univ.<br />
Erdogan, Hakan, Sabanci Univ.<br />
Patch-based face recognition is a recent method which uses the idea of analyzing face images locally in order to reduce<br />
the effects of illumination changes and partial occlusions. Feature fusion and decision fusion are two distinct ways to<br />
make use of the extracted local features. Apart from the well-known decision fusion methods, a novel approach for<br />
calculating weights for the weighted sum rule is proposed in this paper. Improvements in recognition accuracy are shown,<br />
and the superiority of decision fusion over feature fusion is advocated. On the challenging AR database, we obtain significantly<br />
better results than conventional methods and feature fusion methods by using a validation-accuracy weighting<br />
scheme and nearest-neighbor discriminant analysis for dimension reduction.<br />
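A weighted-sum decision fusion of the kind described above can be sketched as follows, with each patch classifier's weight set proportional to its validation accuracy. The numbers are illustrative, and this is only a reading of the general scheme, not the paper's exact formulation.

```python
import numpy as np

def weighted_sum_fusion(patch_scores, val_acc):
    """patch_scores: (P, C) class scores from P patch classifiers over C classes.
    val_acc: (P,) validation accuracies used as fusion weights."""
    w = np.asarray(val_acc, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to one
    fused = w @ patch_scores              # weighted sum rule
    return fused, int(fused.argmax())

# Two weak patches vote for class 0, but the reliable third patch favors class 1.
scores = np.array([[0.7, 0.3],
                   [0.7, 0.3],
                   [0.2, 0.8]])
acc = [0.5, 0.5, 0.99]
fused, label = weighted_sum_fusion(scores, acc)
```

With equal weights the two weaker patches would outvote the reliable one (class 0); accuracy weighting flips the decision to class 1, which is the point of weighting patches by how trustworthy they proved on validation data.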
09:00-11:10, Paper TuAT9.57<br />
Video based Palmprint Recognition<br />
Methani, Chhaya, IIIT-H<br />
Namboodiri, Anoop, International Inst. of Information Tech.<br />
The use of a camera as a biometric sensor is desirable due to its ubiquity and low cost, especially for mobile devices.<br />
The palmprint is an effective modality in such cases due to its discriminative power, ease of presentation, and the scale and<br />
size of its texture for capture by commodity cameras. However, the unconstrained nature of pose and lighting introduces<br />
several challenges in the recognition process. Even minor changes in the pose of the palm can induce significant changes<br />
in the visibility of the lines. We turn this property to our advantage by capturing a short video, in which the natural palm<br />
motion induces minor pose variations, providing additional texture information. We propose an efficient method to register<br />
multiple frames of the video without requiring correspondences. Experimental results on a set of 100 different palms<br />
show that the use of multiple frames reduces the error rate from 12.75% to 4.7%. We also propose a method for detecting<br />
poor-quality samples due to specularities and motion blur, which further reduces the EER to 1.8%.<br />
09:00-11:10, Paper TuAT9.58<br />
Profile Lip Reading for Vowel and Word Recognition<br />
Saitoh, Takeshi, Kyushu Inst. of Tech.<br />
Konishi, Ryosuke, Tottori Univ.<br />
This paper focuses on the profile view, which is the second most typical angle after the frontal face, and proposes a<br />
profile-view lip reading method. We apply the normalized cost method to detect the profile contour. Five feature points<br />
(the tip of the nose, upper lip, lip corner, lower lip, and chin) were detected from the contour, and eight features obtained<br />
from the five feature points were defined. We gathered two types of utterance scenes: five Japanese vowels and 20 Japanese<br />
words. We selected 20 combinations based on the eight features and carried out recognition experiments. Recognition rates<br />
of 99% for vowel recognition and 86% for word recognition were obtained with five features: two lip heights, two protrusion<br />
lengths, and one lip angle.<br />
11:10-12:10, TuPL1 Anadolu Auditorium<br />
Computational Cameras: Redefining the Image<br />
Shree Nayar Plenary Session<br />
Columbia University, USA<br />
Shree K. Nayar received his PhD degree in Electrical and Computer Engineering from the Robotics Institute at Carnegie<br />
Mellon University in 1990. He is currently the T. C. Chang Professor of Computer Science at Columbia University. He<br />
co-directs the Columbia Vision and Graphics Center. He also heads the Columbia Computer Vision Laboratory (CAVE),<br />
which is dedicated to the development of advanced computer vision systems. His research is focused on three areas: the<br />
creation of novel cameras, the design of physics based models for vision, and the development of algorithms for scene<br />
understanding. His work is motivated by applications in the fields of digital imaging, computer graphics, and robotics.<br />
He has received best paper awards at ICCV 1990, ICPR 1994, CVPR 1994, ICCV 1995, CVPR 2000 and CVPR 2004.<br />
He is the recipient of the David Marr Prize (1990 and 1995), the David and Lucile Packard Fellowship (1992), the National<br />
Young Investigator Award (1993), the NTT Distinguished Scientific Achievement Award (1994), the Keck Foundation<br />
Award for Excellence in Teaching (1995) and the Columbia Great Teacher Award (2006). In February 2008, he was elected<br />
to the National Academy of Engineering.<br />
The computational camera embodies the convergence of the camera and the computer. It uses new optics to select rays<br />
from the scene in unusual ways, and an appropriate algorithm to process the selected rays. This ability to manipulate<br />
images before they are recorded and process the recorded images before they are presented is a powerful one. It enables<br />
us to experience our visual world in rich and compelling ways.<br />
TuBT1 Anadolu Auditorium<br />
Image Analysis – IV Regular Session<br />
Session chair: Hlavac, Vaclav (Czech Technical Univ.)<br />
13:30-13:50, Paper TuBT1.1<br />
Joint Image GMM and Shading MAP Estimation<br />
Shekhovtsov, Alexander, Czech Tech. Univ. in Prague<br />
Hlavac, Vaclav, Czech Tech. Univ.<br />
We consider a simple statistical model of the image, in which the image is represented as a sum of two parts: one part is<br />
explained by an i.i.d. color Gaussian mixture and the other part by a (piecewise) smooth gray scale shading function. The<br />
smoothness is ensured by a quadratic (Tikhonov) or total variation regularization. We derive an EM algorithm to estimate<br />
simultaneously the parameters of the mixture model and the shading. Our algorithms for both kinds of regularization<br />
solve for the shading and the mean parameters of the mixture model jointly.<br />
13:50-14:10, Paper TuBT1.2<br />
Continuous Markov Random Field Optimization using Fusion Move Driven Markov Chain Monte Carlo Technique<br />
Kim, Wonsik, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
Many vision applications have been formulated as Markov Random Field (MRF) problems. Although many of them are<br />
discrete labeling problems, a continuous formulation often achieves great improvements in solution quality in some<br />
applications, such as stereo matching and optical flow. In a continuous formulation, however, it is much more difficult<br />
to optimize the target functions. In this paper, we propose a new method called the fusion move driven Markov Chain Monte<br />
Carlo method (MCMC-F) that combines the Markov Chain Monte Carlo method and the fusion move to solve continuous<br />
MRF problems effectively. The algorithm exploits the powerful fusion move while fully exploring the whole solution space.<br />
We evaluate it on the stereo matching problem. We empirically demonstrate that the proposed algorithm is more stable<br />
and always finds lower energy states than state-of-the-art optimization techniques.<br />
14:10-14:30, Paper TuBT1.3<br />
Approximate Belief Propagation by Hierarchical Averaging of Outgoing Messages<br />
Ogawara, Koichi, Kyushu Univ.<br />
This paper presents an approximate belief propagation algorithm that replaces the outgoing messages from a node with<br />
their average and propagates messages from a low-resolution graph to the original graph hierarchically. The<br />
proposed method reduces the computational time by half to two-thirds and reduces the required amount of memory by<br />
60% compared with the standard belief propagation algorithm when applied to an image. The proposed method was<br />
implemented on CPU and GPU, and was evaluated on the Middlebury stereo benchmark dataset in comparison with the<br />
standard belief propagation algorithm. It is shown that the proposed method outperforms the standard algorithm in terms<br />
of both computational time and the required amount of memory, with a minor loss of accuracy.<br />
14:30-14:50, Paper TuBT1.4<br />
Cascaded Background Subtraction using Block-Based and Pixel-Based Codebooks<br />
Guo, Jing-Ming, National Taiwan Univ. of Science and Tech.<br />
Hsu, Chih-Sheng, National Taiwan Univ. of Science and Tech.<br />
This paper presents a cascaded scheme with block-based and pixel-based code<strong>book</strong>s for background subtraction. The<br />
code<strong>book</strong> is mainly used to compress information to achieve high efficient processing speed. In the block-based stage, 12<br />
intensity values are employed to represent a block. The algorithm extends the concept of the Block Truncation Coding<br />
(BTC), and thus it can further improve the processing efficiency by enjoying its low complexity advantage. In detail, the<br />
block-based stage can remove most of the noise without reducing the True Positive (TP) rate, yet it has low precision. To<br />
overcome this problem, the pixel-based stage is adopted to enhance the precision, which also can reduce the False Positive<br />
(FP) rate. Moreover, this study also presents a color model and a match function which can classify an input pixel as<br />
shadow, highlight, background, or foreground. As documented in the experimental results, the proposed algorithm can<br />
provide performance superior to that of previous approaches.<br />
14:50-15:10, Paper TuBT1.5<br />
Moving Cast Shadow Removal based on Local Descriptors<br />
Qin, Rui, Chinese Acad. of Sciences<br />
Liao, Shengcai, Chinese Acad. of Sciences<br />
Lei, Zhen, Chinese Acad. of Sciences<br />
Li, Stan Z., Chinese Acad. of Sciences<br />
Moving cast shadow removal is an important yet difficult problem in video analysis and applications. This paper presents<br />
a novel algorithm for the detection of moving cast shadows, based on a local texture descriptor called the Scale Invariant<br />
Local Ternary Pattern (SILTP). An assumption is made that the texture properties of cast shadows bear similar patterns<br />
to those of the background beneath them. The likelihood of cast shadows is derived using information in both color and<br />
texture. An online learning scheme is employed to update the shadow model adaptively. Finally, the posterior probability<br />
of the cast shadow region is formulated by further incorporating prior contextual constraints using a Markov Random Field<br />
(MRF) model. The optimal solution is found using graph cuts. Experimental results tested on various scenes demonstrate<br />
the robustness of the algorithm.<br />
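The SILTP descriptor itself is compact enough to sketch. Below is a minimal illustration of a scale invariant local ternary pattern code for one 3×3 patch, assuming the common formulation with a tolerance factor `tau`; the function name, neighbour ordering and two-bit encoding are illustrative choices, not taken from the paper:

```python
import numpy as np

def siltp_code(patch, tau=0.05):
    """Scale Invariant Local Ternary Pattern code for a 3x3 patch centre.

    Each of the 8 neighbours is encoded with two bits: '01' if it is
    brighter than (1+tau)*centre, '10' if darker than (1-tau)*centre,
    and '00' otherwise."""
    c = patch[1, 1]
    upper, lower = (1 + tau) * c, (1 - tau) * c
    # 8 neighbours in clockwise order starting from the top-left
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = []
    for i, j in idx:
        v = patch[i, j]
        bits.append("01" if v > upper else "10" if v < lower else "00")
    return "".join(bits)

patch = np.array([[100, 100, 100],
                  [100, 100, 100],
                  [100, 100, 100]], dtype=float)
assert siltp_code(patch) == "00" * 8                # flat patch: all zeros
assert siltp_code(3 * patch) == siltp_code(patch)   # invariant to scaling
```

Because the thresholds scale with the centre intensity, multiplying the whole patch by a constant (as a cast shadow roughly does to the background beneath it) leaves the code unchanged, which is the property the detection algorithm relies on.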
TuBT2 Topkapı Hall A<br />
Feature Extraction – I Regular Session<br />
Session chair: Franke, Katrin (Gjøvik Univ. College)<br />
13:30-13:50, Paper TuBT2.1<br />
Local Rotation Invariant Patch Descriptors for 3D Vector Fields<br />
Fehr, Janis, Univ. Freiburg<br />
In this paper, we present two novel methods for the fast computation of local rotation invariant patch descriptors for 3D<br />
vectorial data. Patch-based algorithms have recently become a very popular approach for a wide range of 2D computer<br />
vision problems. Our local rotation invariant patch descriptors allow an extension of these methods to 3D vector fields.<br />
Our approaches are based on a harmonic representation for local spherical 3D vector field patches, which enables us to<br />
derive fast algorithms for the computation of rotation invariant power spectrum and bispectrum feature descriptors of such<br />
patches.<br />
13:50-14:10, Paper TuBT2.2<br />
Anomaly Detection for Longwave FLIR Imagery using Kernel Wavelet-RX<br />
Mehmood, Asif, US Army Res. Lab.<br />
Nasrabadi, Nasser, US Army Res. Lab.<br />
This paper describes a new kernel wavelet-based anomaly detection technique for long-wave (LW) Forward Looking Infrared<br />
(FLIR) imagery. The proposed approach, called the kernel wavelet-RX algorithm, is essentially an extension of the<br />
wavelet-RX algorithm (combination of wavelet transform and RX anomaly detector) to a high dimensional feature space<br />
(possibly infinite) via a certain nonlinear mapping function of the input data. The wavelet-RX algorithm in this high dimensional<br />
feature space can easily be implemented in terms of kernels that implicitly compute dot products in the feature<br />
space (kernelizing the wavelet-RX algorithm). In our kernel wavelet-RX algorithm, a 2-D wavelet transform is first applied<br />
to decompose the input image into uniform subbands. A number of significant subbands (high energy subbands) are concatenated<br />
together to form a subband-image cube. The kernel RX algorithm is then applied to these subband-image cubes<br />
obtained from wavelet decomposition of the LW database images. Experimental results are presented for the proposed<br />
kernel wavelet-RX, wavelet-RX and the classical CFAR algorithm for detecting anomalies (targets) in a large database of<br />
LW imagery. The ROC plots show that the proposed kernel wavelet-RX algorithm outperforms the wavelet-RX as well<br />
as the classical CFAR detector.<br />
14:10-14:30, Paper TuBT2.3<br />
Detection of Salient Image Points using Principal Subspace Manifold Structure<br />
Paiva, Antonio, Univ. of Utah<br />
Tasdizen, Tolga, Univ. of Utah<br />
This paper presents a method to find salient image points in images with regular patterns based on deviations from the<br />
overall manifold structure. The two main contributions are that: (i) the features used to extract salient points are derived directly<br />
and in an unsupervised manner from image neighborhoods, and (ii) the manifold structure is utilized, thus avoiding the<br />
assumption that data lies in clusters and the need to do density estimation. We illustrate the concept for the detection of<br />
fingerprint minutiae, fabric defects, and interesting regions of seismic data.<br />
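The core idea of scoring deviation from the overall structure of a regular pattern can be sketched with a linear (principal subspace) special case; this is a simplified stand-in, not the paper's method, and all names and the PCA-residual formulation below are illustrative assumptions:

```python
import numpy as np

def subspace_saliency(patches, k=4):
    """Saliency of each patch = reconstruction error after projecting
    onto the k-dimensional principal subspace of all patches.

    Patches lying on the dominant (regular-pattern) structure get low
    scores; deviating patches get high scores."""
    X = patches - patches.mean(axis=0)
    # principal directions via SVD of the centred data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k]                  # k x d basis of the principal subspace
    recon = X @ P.T @ P         # projection back into patch space
    return np.linalg.norm(X - recon, axis=1)

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))
regular = rng.normal(size=(99, 2)) @ basis     # patches on a 2-D subspace
outlier = rng.normal(size=(1, 16)) * 5         # off-subspace "salient" patch
scores = subspace_saliency(np.vstack([regular, outlier]), k=2)
assert scores[-1] == scores.max()              # the outlier is most salient
```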
14:30-14:50, Paper TuBT2.4<br />
Triangle-Constraint for Finding More Good Features<br />
Guo, Xiaojie, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
We present a novel method for finding more good feature pairs between two sets of features. We first select matched features<br />
by the bi-matching method as seed points, then organize these seed points using the Delaunay triangulation algorithm.<br />
Finally, we use the Triangle-Constraint (T-C) to increase both the number of correct matches and the matching score (the ratio<br />
between the number of correct matches and the total number of matches). The experimental evaluation shows that our method is<br />
robust to most geometric and photometric transformations, including rotation, scale change, blur, viewpoint change,<br />
JPEG compression and illumination change, and significantly improves both the number of correct matches and the matching<br />
score.<br />
14:50-15:10, Paper TuBT2.5<br />
Compressing Sparse Feature Vectors using Random Ortho-Projections<br />
Rahtu, Esa, Univ. of Oulu<br />
Salo, Mikko, Univ. of Helsinki<br />
Heikkilä, Janne, Univ. of Oulu<br />
In this paper we investigate the usage of random ortho-projections in the compression of sparse feature vectors. The study<br />
is carried out by evaluating the compressed features in classification tasks instead of concentrating on reconstruction accuracy.<br />
In the random ortho-projection method, the mapping for the compression can be obtained without any further<br />
knowledge of the original features. This makes the approach favorable if training data is costly or impossible to obtain.<br />
The independence from the data also enables one to embed the compression scheme directly into the computation of the<br />
original features. Our study is inspired by the results in compressive sensing, which state that up to a certain compression<br />
ratio and with high probability, such projections result in no loss of information. In comparison to learning-based compression,<br />
namely principal component analysis (PCA), the random projections achieved comparable performance even<br />
at high compression ratios, depending on the sparsity of the original features.<br />
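A minimal sketch of the data-independent compression step, assuming a random projection with orthonormalized rows (the dimensions, QR construction and sparse test vectors are illustrative, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 1024, 256                        # original and compressed dimensions

# Random ortho-projection: orthonormal rows drawn without seeing any data,
# so the mapping can be fixed before the features are ever computed.
Q, _ = np.linalg.qr(rng.normal(size=(d, m)))
R = Q.T                                 # m x d, satisfies R @ R.T == I_m

def sparse_vec(k=10):
    """A k-sparse feature vector in the original d-dimensional space."""
    v = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    v[idx] = rng.normal(size=k)
    return v

x = sparse_vec()
xc = R @ x                              # compressed feature

# For classification, geometry should be roughly preserved: the norm of an
# orthoprojection onto a random m-dim subspace concentrates at sqrt(m/d).
ratio = np.linalg.norm(xc) / np.linalg.norm(x)
assert abs(ratio - np.sqrt(m / d)) < 0.15
```

Note the deliberate design choice the abstract highlights: `R` is built from random numbers alone, so no training data is needed to define the compression.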
TuBT3 Marmara Hall<br />
Object Detection and Recognition – II Regular Session<br />
Session chair: Porikli, Fatih (MERL)<br />
13:30-13:50, Paper TuBT3.1<br />
Learning Discriminative Features based on Distribution<br />
Shen, Jifeng, Southeast Univ.<br />
Yang, Wankou, Southeast Univ.<br />
Sun, Changyin, Southeast Univ.<br />
In this paper, a novel feature named adaptive projection LBP (APLBP) is proposed for face detection. To improve its discriminative<br />
power, the distribution information of the training samples is embedded into the proposed feature. APLBP is generated<br />
by LDA, which adaptively maximizes the margin between positive and negative samples by exploiting the near-Gaussian<br />
distribution of the training samples. Asymmetric Gentle AdaBoost is utilized to train a strong classifier,<br />
and a nested cascade is applied to construct the final detector. Experimental results on the MIT+CMU database<br />
demonstrate that the APLBP feature outperforms several existing features due to its excellent discriminative power with<br />
fewer features.<br />
13:50-14:10, Paper TuBT3.2<br />
Sub-Category Optimization for Multi-View Multi-Pose Object Detection<br />
Das, Dipankar, Saitama Univ.<br />
Kobayashi, Yoshinori, Saitama Univ.<br />
Kuno, Yoshinori, Saitama Univ.<br />
Object category detection with large appearance variation is a fundamental problem in computer vision. The appearance<br />
of object categories can change due to intra-class variability, viewpoint, and illumination. For object categories with large<br />
appearance change, a sub-categorization-based approach is necessary. This paper proposes a sub-category optimization<br />
approach that automatically divides an object category into an appropriate number of sub-categories based on appearance<br />
variation. Instead of using a predefined intra-category sub-categorization based on domain knowledge or validation<br />
datasets, we divide the sample space by unsupervised clustering based on discriminative image features. Then the clustering<br />
performance is verified using a sub-category discriminant analysis. Based on the clustering performance of the unsupervised<br />
approach and the sub-category discriminant analysis results, we determine an optimal number of sub-categories per object category.<br />
Extensive experimental results are shown using two standard and the authors’ own databases. The comparison<br />
results show that our approach outperforms the state-of-the-art methods.<br />
14:10-14:30, Paper TuBT3.3<br />
Learning and Detection of Object Landmarks in Canonical Object Space<br />
Kamarainen, Joni-Kristian, Lappeenranta Univ. of Tech.<br />
Ilonen, Jarmo, Lappeenranta Univ. of Tech.<br />
This work contributes to part-based object detection and recognition by introducing an enhanced method for local part<br />
detection. The method is based on complex-valued multiresolution Gabor features and their ranking using multiple hypothesis<br />
testing. In the present work, our main contribution is the introduction of a canonical object space, where objects<br />
are represented in their "expected pose and visual appearance". The canonical space circumvents the problem of geometric<br />
image normalisation prior to feature extraction. In addition, we define a compact set of Gabor filter parameters, from<br />
which the optimal values can be easily derived. These enhancements make our method an attractive landmark detector<br />
for part-based object detection and recognition methods.<br />
14:30-14:50, Paper TuBT3.4<br />
Multiple-Shot Person Re-Identification by HPE Signature<br />
Bazzani, Loris, Univ. of Verona<br />
Cristani, Marco, Univ. of Verona<br />
Perina, Alessandro, Univ. of Verona<br />
Farenzena, Michela, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
In this paper, we propose a novel appearance-based method for person re-identification, that condenses a set of frames of<br />
the same individual into a highly informative signature, called Histogram Plus Epitome, HPE. It incorporates complementary<br />
global and local statistical descriptions of the human appearance, focusing on the overall chromatic content, via<br />
histogram representation, and on the presence of recurrent local patches, via epitome estimation. The matching of HPEs<br />
provides strong performance under low resolution, occlusions, and pose and illumination variations, setting new state-of-the-art<br />
results on all the datasets considered.<br />
14:50-15:10, Paper TuBT3.5<br />
Building Detection in a Single Remotely Sensed Image with a Point Process of Rectangles<br />
Benedek, Csaba, Computer and Automation Res. Inst. Hungarian<br />
Descombes, Xavier, INRIA<br />
Zerubia, Josiane, INRIA<br />
In this paper we introduce a probabilistic approach to building extraction in remotely sensed images. To cope with data<br />
heterogeneity we construct a flexible hierarchical framework which can create various building appearance models from<br />
different elementary feature-based modules. A global optimization process attempts to find the optimal configuration of<br />
buildings, considering simultaneously the observed data, prior knowledge, and interactions between the neighboring building<br />
parts. The proposed method is evaluated on various aerial image sets containing more than 500 buildings, and the<br />
results are matched against two state-of-the-art techniques.<br />
TuBT4 Dolmabahçe Hall A<br />
Model Selection and Clustering Regular Session<br />
Session chair: Shapiro, Linda (Univ. of Washington)<br />
13:30-13:50, Paper TuBT4.1<br />
A Relationship between Generalization Error and Training Samples in Kernel Regressors<br />
Tanaka, Akira, Hokkaido Univ.<br />
Imai, Hideyuki, Hokkaido Univ.<br />
Kudo, Mineichi, Hokkaido Univ.<br />
Miyakoshi, Masaaki, Hokkaido Univ.<br />
A relationship between generalization error and training samples in kernel regressors is discussed in this paper. The generalization<br />
error can be decomposed into two components. One is a distance between an unknown true function and an<br />
adopted model space. The other is a distance between an estimated function and the orthogonal projection of the unknown<br />
true function onto the model space. In our previous work, we gave a framework to evaluate the first component. In this<br />
paper, we theoretically analyze the second one and show that a larger set of training samples usually causes a larger generalization<br />
error.<br />
13:50-14:10, Paper TuBT4.2<br />
Localized Multiple Kernel Regression<br />
Gönen, Mehmet, Bogazici Univ.<br />
Alpaydin, Ethem, Bogazici Univ.<br />
Multiple kernel learning (MKL) uses a weighted combination of kernels where the weight of each kernel is optimized<br />
during training. However, MKL assigns the same weight to a kernel over the whole input space. Our main objective is the<br />
formulation of the localized multiple kernel learning (LMKL) framework that allows kernels to be combined with different<br />
weights in different regions of the input space by using a gating model. In this paper, we apply the LMKL framework to<br />
regression estimation and derive a learning algorithm for this extension. Canonical support vector regression may overfit<br />
unless the kernel parameters are selected appropriately; we see that even if we provide more kernels than necessary, LMKL<br />
uses only as many as needed and does not overfit due to its inherent regularization.<br />
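The locally combined kernel can be sketched as follows, assuming the gated form k(x_i, x_j) = Σ_m η_m(x_i) K_m(x_i, x_j) η_m(x_j) with a softmax gating model, as used in the LMKL literature; the RBF kernels, parameter values and names below are illustrative:

```python
import numpy as np

def gating(x, V):
    """Softmax gating: weight of each kernel at input x (assumed form)."""
    z = V @ x
    e = np.exp(z - z.max())
    return e / e.sum()

def lmkl_kernel(x1, x2, V, gammas=(0.1, 10.0)):
    """Locally combined RBF kernels: each kernel's contribution is
    modulated by the gating value at *both* inputs, so different
    regions of the input space can prefer different kernels."""
    g1, g2 = gating(x1, V), gating(x2, V)
    ks = [np.exp(-g * np.sum((x1 - x2) ** 2)) for g in gammas]
    return sum(g1[m] * g2[m] * ks[m] for m in range(len(ks)))

rng = np.random.default_rng(2)
V = rng.normal(size=(2, 3))              # gating parameters, 2 kernels in R^3
x1, x2 = rng.normal(size=3), rng.normal(size=3)
assert abs(lmkl_kernel(x1, x2, V) - lmkl_kernel(x2, x1, V)) < 1e-12  # symmetric
assert 0 < lmkl_kernel(x1, x1, V) <= 1.0  # k(x,x) = sum of eta_m(x)^2 <= 1
```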
14:10-14:30, Paper TuBT4.3<br />
Probabilistic Clustering using the Baum-Eagon Inequality<br />
Rota Bulo’, Samuel, Univ. Ca’ Foscari di Venezia<br />
Pelillo, Marcello, Ca’ Foscari Univ.<br />
The paper introduces a framework for clustering data objects in a similarity-based context. The aim is to cluster objects<br />
into a given number of classes without imposing a hard partition, but allowing for a soft assignment of objects to clusters.<br />
Our approach uses the assumption that similarities reflect the likelihood of objects being in the same class in order to<br />
derive a probabilistic model for estimating the unknown cluster assignments. This leads to a polynomial optimization in the<br />
probability domain, which is tackled by means of a result due to Baum and Eagon. Experiments on both synthetic and real<br />
standard datasets show the effectiveness of our approach.<br />
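The Baum-Eagon result underlying this optimization can be illustrated on the simplest case: a quadratic form P(x) = xᵀAx with nonnegative coefficients over the probability simplex, where the growth transform x_i ← x_i (∂P/∂x_i) / Σ_j x_j (∂P/∂x_j) never decreases P. This toy sketch is not the paper's full model; the matrix and names are illustrative:

```python
import numpy as np

def growth_transform(A, x, iters=50):
    """Baum-Eagon update for P(x) = x^T A x (A nonnegative) over the
    probability simplex. The inequality guarantees P never decreases."""
    vals = []
    for _ in range(iters):
        vals.append(x @ A @ x)
        g = A @ x                 # gradient of P up to a factor of 2
        x = x * g / (x @ g)       # multiplicative update, stays on simplex
    vals.append(x @ A @ x)
    return x, vals

rng = np.random.default_rng(3)
S = rng.random((5, 5))
S = (S + S.T) / 2                 # symmetric nonnegative "similarity" matrix
x0 = np.full(5, 1 / 5)            # uniform soft assignment as a start
x, vals = growth_transform(S, x0)
assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))  # monotone ascent
assert abs(x.sum() - 1) < 1e-9                              # still on simplex
```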
14:30-14:50, Paper TuBT4.4<br />
Ensemble Clustering via Random Walker Consensus Strategy<br />
Abdala, Daniel Duarte, Univ. of Münster<br />
Wattuya, Pakaket, Univ. of Münster<br />
Jiang, Xiaoyi, Univ. of Münster<br />
In this paper we present the adaptation of a random walker algorithm for combination of image segmentations to work<br />
with clustering problems. To achieve this, we pre-process the ensemble of clusterings to generate its graph representation.<br />
We show experimentally that a very small neighborhood produces results similar to those of larger choices.<br />
This fact alone improves the computational time needed to produce the final consensual clustering. We also present an experimental<br />
comparison of our results against other graph-based and well-known combination clustering methods in<br />
order to assess the quality of this approach.<br />
14:50-15:10, Paper TuBT4.5<br />
Bhattacharyya Clustering with Applications to Mixture Simplifications<br />
Nielsen, Frank, Ecole Polytechnique/SONY CLS<br />
Boltz, Sylvain, Ecole Polytechnique/SONY CLS<br />
Schwander, Olivier, Ecole Polytechnique/SONY CLS<br />
The Bhattacharyya distance (BD) is a widely used distance in statistics to compare probability density functions (PDFs). It has<br />
shown strong statistical properties (in terms of Bayes error) and it relates to Fisher information. It also has practical advantages,<br />
since it is closely related to measuring the overlap of the supports of the PDFs. Unfortunately, even with common<br />
parametric models on PDFs, few closed-form formulas are known. Moreover, the BD centroid estimation was limited to<br />
univariate Gaussian PDFs in the literature, and no convergence guarantees were provided. In this paper, we propose a<br />
closed-form formula for BD on a general class of parametric distributions named exponential families. We show that the<br />
BD is a Burbea-Rao divergence for the log normalizer of the exponential family. We propose an efficient iterative scheme<br />
to compute a BD centroid on exponential families. Finally, these results allow us to define a Bhattacharyya hierarchical<br />
clustering algorithm (BHC). It can be viewed as a generalization of k-means on the BD. Results on image segmentation<br />
show the stability of the method.<br />
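For reference, the univariate Gaussian special case mentioned in the abstract has a well-known closed form, D_B = (μ₁−μ₂)²/(4(σ₁²+σ₂²)) + ½ ln((σ₁²+σ₂²)/(2σ₁σ₂)), sketched below; the paper's contribution is generalizing such formulas to all exponential families:

```python
import math

def bhattacharyya_gauss(m1, s1, m2, s2):
    """Closed-form Bhattacharyya distance between the univariate
    Gaussians N(m1, s1^2) and N(m2, s2^2)."""
    v1, v2 = s1 * s1, s2 * s2
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * s1 * s2)))

assert bhattacharyya_gauss(0, 1, 0, 1) == 0.0   # identical PDFs: distance 0
assert bhattacharyya_gauss(0, 1, 3, 2) == bhattacharyya_gauss(3, 2, 0, 1)  # symmetric
```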
TuBT5 Dolmabahçe Hall B<br />
Watermarking and Authentication Regular Session<br />
Session chair: Sankur, Bülent (Boğaziçi Univ.)<br />
13:30-13:50, Paper TuBT5.1<br />
High Capacity Data Hiding for Binary Image Authentication<br />
Guo, Meng, Beijing Univ. of Tech.<br />
Zhang, Hongbin, Beijing Univ. of Tech.<br />
This paper proposes a novel data hiding scheme with high capacity for binary images, including document images, halftone<br />
images, scanned figures, text and signatures. In our scheme, the embedding efficiency and the placement of embedding<br />
changes are considered simultaneously. Given an M×N image block, the upper bound on the number of bits that can be<br />
embedded by the scheme is n·log2(M×N/n + 1) when changing at most n pixels. Experimental results show that the proposed<br />
scheme can embed more data while maintaining better quality, and has wider applications than existing schemes.<br />
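The stated capacity bound is easy to evaluate numerically; a quick sketch, with the block size and number of changeable pixels chosen arbitrarily for illustration:

```python
import math

def capacity_bound(M, N, n):
    """Upper bound on the number of embeddable bits in an MxN block
    when at most n pixels may be changed: n * log2(M*N/n + 1)."""
    return n * math.log2(M * N / n + 1)

# e.g. a 16x16 block with at most 4 changed pixels
bits = capacity_bound(16, 16, 4)
assert 24 < bits < 25            # 4 * log2(65), roughly 24.1 bits
```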
13:50-14:10, Paper TuBT5.2<br />
Secure Self-Recovery Image Authentication using Randomly-Sized Blocks<br />
Hassan, Ammar M., Otto-von-Guericke Univ.<br />
Al-Hamadi, Ayoub, IESK<br />
Michaelis, Bernd, IESK<br />
Hasan, Yassin M. Y., Assiut Univ.<br />
Wahab, Mohamed A. A., Minia Univ.<br />
In this paper, a secure variable-size block-based image authentication technique is proposed that can not only localize<br />
alterations but also recover the missing data. An image undergoes recursive arbitrarily-asymmetric binary tree<br />
partitioning to obtain randomly-sized blocks spanning the entire image. To enhance reliability of altered block recovery,<br />
multiple description coding (MDC) is utilized to generate two block descriptions. Block signature copies and the two<br />
block descriptions are embedded into two relatively-distant blocks making a doubly linked chain. The experimental results<br />
demonstrate that the proposed technique successfully both localizes and compensates for alterations. Furthermore, it is robust<br />
against the vector quantization (VQ) attack.<br />
14:10-14:30, Paper TuBT5.3<br />
Blind Wavelet based Logo Watermarking Resisting to Cropping<br />
Soheili, Mohammadreza, Tarbiat Moallem Univ.<br />
In this paper we propose a blind wavelet-based logo watermarking scheme focused on resistance to cropping. The binary<br />
logo is embedded in the LL2 sub-band of the host image using a quantization technique. To increase the robustness of the proposed<br />
algorithm, two-dimensional parity bits are added to the binary logo. Experimental results show that the proposed watermarking<br />
method can resist not only cropping attacks but also some common signal processing attacks, such as JPEG compression,<br />
average and median filtering, rotation and scaling.<br />
14:30-14:50, Paper TuBT5.4<br />
The New Blockwise Algorithm for Large-Scale Images Robust Watermarking<br />
Mitekin, Vitaly, Russian Acad. of Sciences<br />
Glumov, Nikolay, Russian Acad. of Sciences<br />
A new algorithm for digital watermarking of large-scale digital images is proposed. The algorithm<br />
provides watermark robustness to a wide range of host image distortions and has a number of advantages compared to<br />
existing algorithms for robust watermarking.<br />
14:50-15:10, Paper TuBT5.5<br />
Lossless ROI Medical Image Watermarking Technique with Enhanced Security and High Payload Embedding<br />
Kundu, Malay Kumar, Indian Statistical Inst.<br />
Das, Sudeb, Indian Statistical Inst.<br />
In this article, a new fragile, blind, high payload capacity, ROI (Region of Interest) preserving Medical image watermarking<br />
(MIW) technique in the spatial domain for gray scale medical images is proposed. We present a watermarking scheme<br />
that combines lossless data compression and encryption techniques in application to medical images. The effectiveness of<br />
the proposed scheme, proven through experiments on various medical images using image quality metrics<br />
such as PSNR, MSE and MSSIM, enables us to argue that the method will help to maintain Electronic Patient Record (EPR)/DICOM<br />
data privacy and medical image integrity.<br />
TuBT6 Topkapı Hall B<br />
Face Recognition – I Regular Session<br />
Session chair: Ross, Arun (West Virginia Univ.)<br />
13:30-13:50, Paper TuBT6.1<br />
Efficient Facial Attribute Recognition with a Spatial Codebook<br />
Ijiri, Yoshihisa, OMRON Corp.<br />
Lao, Shihong, OMRON Corp.<br />
Han, Tony X., Univ. of Missouri<br />
Murase, Hiroshi, Nagoya Univ.<br />
There is a large number of possible facial attributes such as hairstyle, with/without glasses, with/without mustache, etc.<br />
Considering the large number of facial attributes and their combinations, it is difficult to build attribute classifiers for all possible<br />
combinations needed in various applications, especially at the designing stage. To tackle this important and challenging<br />
problem, we propose a novel, efficient facial attribute recognition algorithm using a learned spatial codebook.<br />
The Maximum Entropy and Maximum Orthogonality (MEMO) criterion is followed to learn the spatial codebook. With<br />
a spatial codebook constructed at the designing stage, attribute classifiers can be trained on demand with a small number<br />
of exemplars with high accuracy on the testing data. Meanwhile, up to 600 times speedup is achieved in the on-demand<br />
training process, compared to the current state-of-the-art method. The effectiveness of the proposed method is supported by<br />
convincing experimental results.<br />
13:50-14:10, Paper TuBT6.2<br />
Feature Space Hausdorff Distance for Face Recognition<br />
Chen, Shaokang, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
We propose a novel face image similarity measure based on the Hausdorff distance (HD). In contrast to conventional HD-based<br />
measures, which are generally applied in the image space (such as edge maps or gradient images), the proposed<br />
HD-based similarity measure is applied in the feature space. By extending the concept of HD using a variable radius and<br />
reference set, we can generate a neighbourhood set for HD measures in feature space and then apply this concept for classification.<br />
Experiments on the Labeled Faces in the Wild and FRGC datasets show that the proposed measure improves<br />
the overall classification performance quite dramatically, especially under the highly desirable low false acceptance rate<br />
conditions.<br />
14:10-14:30, Paper TuBT6.3<br />
How to Measure Biometric Information?<br />
Sutcu, Yagiz, Pol. Inst. of New York Univ.<br />
Sencar, Husrev Taha, TOBB Univ. of Ec. and Tech.<br />
Memon, Nasir, Pol. Inst. of New York Univ.<br />
Being able to measure the actual information content of biometrics is very important but also a challenging problem. The main<br />
difficulty is related not only to the selected feature representation of the biometric data, but also to the matching<br />
algorithm employed in biometric systems. In this paper, we propose a new measure of biometric information<br />
using relative entropy between intra-user and inter-user distance distributions. As an example, we evaluated the proposed<br />
measure on a face image dataset.<br />
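A minimal sketch of such a relative-entropy measure, assuming histogram estimates of the intra-user (genuine) and inter-user (impostor) distance distributions; the binning, smoothing constant and synthetic score data are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def relative_entropy(intra, inter, bins=20):
    """KL divergence D(intra || inter) between histograms of genuine
    (intra-user) and impostor (inter-user) match distances; larger
    values suggest more discriminative (informative) features."""
    lo = min(intra.min(), inter.min())
    hi = max(intra.max(), inter.max())
    p, _ = np.histogram(intra, bins=bins, range=(lo, hi))
    q, _ = np.histogram(inter, bins=bins, range=(lo, hi))
    p = (p + 1e-9) / (p + 1e-9).sum()   # smooth to avoid log(0)
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log2(p / q)))

rng = np.random.default_rng(4)
intra = rng.normal(0.2, 0.05, 1000)     # genuine distances: small
inter = rng.normal(0.8, 0.10, 1000)     # impostor distances: large
weak  = rng.normal(0.75, 0.10, 1000)    # weak feature: overlaps impostors
# Well-separated distributions carry more biometric information
assert relative_entropy(intra, inter) > relative_entropy(weak, inter)
```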
14:30-14:50, Paper TuBT6.4<br />
Intensity-Based Congealing for Unsupervised Joint Image Alignment<br />
Storer, Markus, Graz Univ. of Tech.<br />
Urschler, Martin, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
We present an approach for unsupervised alignment of an ensemble of images called congealing. Our algorithm is based<br />
on image registration using the mutual information measure as a cost function. The cost function is optimized by a standard<br />
gradient descent method in a multiresolution scheme. As opposed to other congealing methods, which use the SSD measure,<br />
the mutual information measure is better suited as a similarity measure for registering images since no prior assumptions<br />
on the relation of intensities between images are required. We present alignment results on the MNIST handwritten digit<br />
database and on facial images obtained from the CVL database.<br />
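The mutual information cost at the heart of the method can be sketched from a joint intensity histogram; the bin count and synthetic images below are illustrative assumptions:

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """MI between two images from their joint intensity histogram.
    Used as a registration cost because it needs no assumption about
    how intensities in one image map to intensities in the other."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = h / h.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(5)
img = rng.integers(0, 256, size=(32, 32)).astype(float)
inverted = 255 - img                      # deterministic, non-identity mapping
noise = rng.integers(0, 256, size=(32, 32)).astype(float)
# MI stays high under any consistent intensity mapping (unlike SSD),
# and drops for unrelated images
assert mutual_information(img, inverted) > mutual_information(img, noise)
```

This is why the congealing cost needs no prior assumption on the intensity relation between images: an intensity inversion would ruin an SSD score but leaves the MI essentially unchanged.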
14:50-15:10, Paper TuBT6.5<br />
An Illumination Quality Measure for Face Recognition<br />
Rizo-Rodriguez, Dayron, Advanced Tech. Application Center<br />
Mendez-Vazquez, Heydi, Advanced Tech. Application Center<br />
Garcia, Edel, Advanced Tech. Application Center<br />
A method to determine whether or not face images are affected by lighting problems is proposed. The method is the result<br />
of combining the analysis of lighting effect on face regions with the analysis of special areas which have a weight on the<br />
decision. Good results were obtained in classifying well- and badly-illuminated images. The proposed method was inserted<br />
into a face recognition framework in order to apply the preprocessing step only to those images affected by illumination<br />
variations. The good performance achieved in verification and identification experiments confirms that it is better to apply<br />
the proposed methodology than to preprocess all images when the lighting conditions are variable.<br />
TuBT7 Dolmabahçe Hall C<br />
Biomedical Image Segmentation Regular Session<br />
Session chair: Kato, Zoltan (Univ. of Szeged)<br />
13:30-13:50, Paper TuBT7.1<br />
Cascaded Segmentation of Grained Cell Tissue with Active Contour Models<br />
Moeller, Birgit, Martin-Luther-Univ. Halle-Wittenberg<br />
Stöhr, Nadine, ZAMED, Martin Luther Univ. Halle-Wittenberg<br />
Hüttelmaier, Stefan, ZAMED, Martin Luther Univ. Halle-Wittenberg<br />
Posch, Stefan, Martin-Luther-Univ. Halle-Wittenberg<br />
Cell tissue in microscope images is often grained, and its intensities do not agree well with the Gaussian distribution assumptions<br />
widely used in many segmentation approaches. We present a new cascaded segmentation scheme for inhomogeneous<br />
cell tissue based on active contour models. Cell regions are iteratively expanded from initial nuclei regions applying a<br />
data-dependent number of optimization levels. Experimental results on a set of microscope images from a human hepatoma<br />
cell line demonstrate the high quality of the results with regard to the cell segmentation task and biomedical investigations.<br />
13:50-14:10, Paper TuBT7.2<br />
Live Cell Segmentation in Fluorescence Microscopy via Graph Cut<br />
Lesko, Milan, Univ. of Szeged<br />
Kato, Zoltan, Univ. of Szeged<br />
Nagy, Antal, Univ. of Szeged<br />
Gombos, Imre, Hungarian Acad. of Sciences<br />
Torok, Zsolt, Hungarian Acad. of Sciences<br />
Vigh Jr, Laszlo, Univ. of Szeged<br />
Vigh, Laszlo, Hungarian Acad. of Sciences<br />
We propose a novel Markovian segmentation model which takes edge information into account. By construction, the<br />
model uses only pairwise interactions and its energy is submodular. Thus the exact energy minimum is obtained via a max-flow/min-cut<br />
algorithm. The method has been quantitatively evaluated on synthetic images as well as on fluorescence microscopy<br />
images of live cells.<br />
14:10-14:30, Paper TuBT7.3<br />
Retinal Blood Vessels Segmentation using the Radial Projection and Supervised Classification<br />
Peng, Qinmu, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Zhou, Long, Wuhan Pol. Univ.<br />
Cheung, Yiu-Ming, Hong Kong Baptist Univ.<br />
The low-contrast and narrow blood vessels in retinal images are difficult to extract but useful in revealing certain<br />
systemic diseases. Motivated by the goal of improving the detection of such vessels, we propose a radial projection method<br />
to locate the vessel centerlines. Then supervised classification is used to extract the major structures of the vessels.<br />
The final segmentation is obtained as the union of the two types of vessels after removal schemes. Our approach is tested<br />
on the STARE database; the results demonstrate that our algorithm yields better segmentation.<br />
14:30-14:50, Paper TuBT7.4<br />
Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During Speech<br />
Fasel, Ian, Univ. of Arizona<br />
Berry, Jeff, Univ. of Arizona<br />
Ultrasound has become a useful tool for speech scientists studying mechanisms of language sound production. State-of-the-art<br />
methods for extracting tongue contours from ultrasound images of the mouth, typically based on active contour<br />
snakes, require considerable manual interaction by an expert linguist. In this paper we describe a novel method for fully<br />
automatic extraction of tongue contours based on a hierarchy of restricted Boltzmann machines (RBMs), i.e. deep belief<br />
networks (DBNs). Usually, DBNs are first trained generatively on sensor data, then discriminatively to predict human-provided<br />
labels of the data. In this paper we introduce the translational RBM (tRBM), which allows the DBN to make use<br />
of both human labels and raw sensor data at all stages of learning. This method yields performance in contour extraction<br />
comparable to human labelers, without any temporal smoothing or human intervention, and runs in real-time.<br />
14:50-15:10, Paper TuBT7.5<br />
Automated Gland Segmentation and Classification for Gleason Grading of Prostate Tissue Images<br />
Nguyen, Kien, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Allen, Ronald, BioImagene<br />
The well-known Gleason grading method for an H&E prostatic carcinoma tissue image uses morphological features of histology<br />
patterns within a tissue slide to classify it into 5 grades. We have developed an automated gland segmentation and<br />
classification method that will be used for automated Gleason grading of a prostatic carcinoma tissue image. We demonstrate<br />
the performance of the proposed classification system for a three-class classification problem (benign, grade 3 carcinoma<br />
and grade 4 carcinoma) on a dataset containing 78 tissue images and achieve a classification accuracy of 88.84%. In comparison<br />
to other segmentation-based methods, our approach combines the similarity of morphological patterns associated<br />
with a grade with domain knowledge, such as the appearance of nuclei and blue mucin, for the grading task.<br />
TuCT1 Topkapı Hall B<br />
Face Recognition – II Regular Session<br />
Session chair: Tistarelli, Massimo (Univ. of Sassari)<br />
15:40-16:00, Paper TuCT1.1<br />
Multi-Resolution Local Appearance-Based Face Verification<br />
Gao, Hua, Karlsruhe Inst. of Tech.<br />
Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />
Fischer, Mika, Karlsruhe Inst. of Tech.<br />
Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />
Facial analysis based on local regions/blocks usually outperforms holistic approaches because it is less sensitive to local<br />
deformations and occlusions. Moreover, modeling local features enables us to avoid the problem of high dimensionality<br />
of the feature space. In this paper, we model the local face blocks with Gabor features and project them into a discriminant<br />
identity space. The similarity score of a face pair is determined by fusion of the local classifiers. To acquire complementary<br />
information in different scales of face images, we integrate the local decisions from various image resolutions. The proposed<br />
multi-resolution block-based face verification system is evaluated on Experiment 4 of the Face Recognition Grand Challenge<br />
(FRGC) version 2.0. We obtained a 92.5% verification rate at 0.1% FAR, which is the highest performance reported<br />
on this experiment so far in the literature.<br />
16:00-16:20, Paper TuCT1.2<br />
Partial Face Biometry using Shape Decomposition on 2D Conformal Maps of Faces<br />
Szeptycki, Przemyslaw, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Zeng, Wei, Wayne State Univ.<br />
Gu, Xianfeng, State Univ. of New York at Stony Brook<br />
Samaras, Dimitris, Stony Brook Univ.<br />
In this paper, we introduce a new approach for partial 3D face recognition, which makes use of shape decomposition over<br />
the rigid part of a face. To explore the descriptiveness of shape dissimilarity over an isometric part of a face, which has<br />
a lower probability of being influenced by expression, we transform a 3D shape to a 2D domain using conformal mapping and<br />
use shape decomposition as a similarity measurement. In our work we investigate several classifiers as well as several<br />
shape descriptors for recognition purposes. Recognition tests on a subset of the FRGC data set show approximately 80%<br />
rank-one recognition rate using only the eyes and nose part of the face.<br />
16:20-16:40, Paper TuCT1.3<br />
Gender Classification using Interlaced Derivative Patterns<br />
Shobeirinejad, Ameneh, Griffith Univ.<br />
Gao, Yongsheng, Griffith Univ.<br />
Automated gender recognition has become an interesting and challenging research problem in recent years, with potential<br />
applications in the security industry and human-computer interaction systems. In this paper we present a novel feature representation,<br />
namely Interlaced Derivative Patterns (IDP), which is a derivative-based technique to extract discriminative<br />
facial features for gender classification. The proposed technique operates on a neighborhood around a pixel and concatenates<br />
the extracted regional feature distributions to form a feature vector. The experimental results demonstrate the effectiveness<br />
of the IDP method for gender classification, showing that the proposed approach achieves 29.6% relative error<br />
reduction compared to Local Binary Patterns (LBP), while it performs over four times faster than Local Derivative Patterns<br />
(LDP).<br />
16:40-17:00, Paper TuCT1.4<br />
Heterogeneous Face Recognition: Matching NIR to Visible Light Images<br />
Klare, Brendan, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Matching near-infrared (NIR) face images to visible light (VIS) face images offers a robust approach to face recognition<br />
with unconstrained illumination. In this paper we propose a novel method of heterogeneous face recognition that uses a<br />
common feature-based representation for both NIR and VIS images. Linear discriminant analysis is performed<br />
on a collection of random subspaces to learn discriminative projections. NIR and VIS images are matched (i) directly<br />
using the random subspace projections, and (ii) using sparse representation classification. Experimental results demonstrate<br />
the effectiveness of the proposed approach for matching NIR and VIS face images.<br />
17:00-17:20, Paper TuCT1.5<br />
Clustering Face Carvings: Application to Devatas of Angkor Wat<br />
Klare, Brendan, Michigan State Univ.<br />
Mallapragada, Pavan Kumar, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Davis, Kent, DatAsia Inc.<br />
We propose a framework for clustering and visualization of images of face carvings at archaeological sites. The pairwise<br />
similarities among face carvings are computed by performing Procrustes analysis on local facial features (eyes, nose,<br />
mouth, etc.). The distance between corresponding face features is computed using point distribution models; the final pairwise<br />
similarity is the weighted sum of feature similarities. A web-based interface is provided to allow domain experts to<br />
interactively assign different weights to each face feature, and display hierarchical clustering results in 2D or 3D projections<br />
obtained by multidimensional scaling. The proposed framework has been successfully applied to the devata goddesses<br />
depicted in the ancient Angkor Wat temple. The resulting clusterings and visualization will enable a systematic anthropological,<br />
ethnological and artistic analysis of nearly 1,800 stone portraits of devatas of Angkor Wat.<br />
TuCT2 Topkapı Hall A<br />
Feature Extraction – II Regular Session<br />
Session chair: Covell, Michele (Google, Inc.)<br />
15:40-16:00, Paper TuCT2.1<br />
Action Recognition using Spatial-Temporal Context<br />
Hu, Qiong, Chinese Acad. of Sciences<br />
Qin, Lei, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
The spatial-temporal local features and the bag of words representation have been widely used in the action recognition<br />
field. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in<br />
ambiguity in the action recognition task, especially for videos in the wild. In this paper, we solve this problem by utilizing the<br />
volumetric context around a video-word. Here, a local histogram of the video-word distribution is calculated, which is referred<br />
to as the context and further clustered into contextual words. To effectively use the contextual information, the descriptive<br />
video-phrases (ST-DVPs) and the descriptive video-cliques (ST-DVCs) are proposed. A general framework for ST-DVP<br />
and ST-DVC generation is described, and then action recognition can be done based on all these representations and their<br />
combinations. The proposed method is evaluated on two challenging human action datasets: the KTH dataset and the<br />
YouTube dataset. Experiment results confirm the validity of our approach.<br />
16:00-16:20, Paper TuCT2.2<br />
Feature Extraction for Simple Classification<br />
Stuhlsatz, André, Univ. of Applied Sciences Duesseldorf<br />
Lippel, Jens, Univ. of Applied Sciences Duesseldorf<br />
Zielke, Thomas, Univ. of Applied Sciences Duesseldorf<br />
Constructing a recognition system based on raw measurements for different objects usually requires expert knowledge of<br />
domain specific data preprocessing, feature extraction, and classifier design. We seek to simplify this process in a way<br />
that can be applied without any knowledge about the data domain and the specific properties of different classification algorithms.<br />
That is, a recognition system should be simple to construct and simple to operate in practical applications. For<br />
this, we have developed a nonlinear feature extractor for high-dimensional complex patterns, using Deep Neural Networks<br />
(DNN). Trained partly supervised and partly unsupervised, the DNN effectively implements a nonlinear discriminant analysis<br />
based on a Fisher criterion in a feature space of very low dimensions. Our experiments show that the automatically extracted<br />
features work very well with simple linear discriminants, while the recognition rates improve only minimally if more sophisticated<br />
classification algorithms like Support Vector Machines (SVM) are used instead.<br />
16:20-16:40, Paper TuCT2.3<br />
Towards a Generic Feature-Selection Measure for Intrusion Detection<br />
Nguyen, Hai Thanh, Gjøvik Univ. Coll.<br />
Franke, Katrin, Gjøvik Univ. Coll.<br />
Petrovic, Slobodan, Gjøvik Univ. Coll.<br />
Performance of a pattern recognition system depends strongly on the employed feature-selection method. We perform an<br />
in-depth analysis of two main measures used in the filter model: the correlation-feature-selection (CFS) measure and the<br />
minimal-redundancy-maximal-relevance (mRMR) measure. We show that these measures can be fused and generalized<br />
into a generic feature-selection (GeFS) measure. Further on, we propose a new feature-selection method that ensures globally<br />
optimal feature sets. The new approach is based on solving a mixed 0-1 linear programming problem (M01LP) by<br />
using the branch-and-bound algorithm. In this M01LP problem, the number of constraints and variables is linear, O(n),<br />
in the number n of features in the full set. In order to evaluate the quality of our GeFS measure, we chose the design of an intrusion<br />
detection system (IDS) as a possible application. Experimental results obtained over the KDD Cup’99 test data set<br />
for IDS show that the GeFS measure removes 93% of irrelevant and redundant features from the original data set, while<br />
maintaining or even improving the classification accuracy.<br />
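The CFS measure analyzed in this abstract has a well-known closed form (Hall's merit heuristic). The sketch below is a generic illustration under that standard definition, not the authors' GeFS formulation; the function and variable names are our own.<br />

```python
import math

def cfs_merit(k, avg_feat_class_corr, avg_feat_feat_corr):
    """Hall's CFS merit for a subset of k features: rewards average
    feature-class correlation, penalizes average feature-feature redundancy."""
    return (k * avg_feat_class_corr) / math.sqrt(k + k * (k - 1) * avg_feat_feat_corr)

# A single feature's merit is just its class correlation.
single = cfs_merit(1, 0.5, 0.0)  # 0.5
# With equal class correlation, a redundant subset scores lower.
redundant = cfs_merit(4, 0.5, 0.9)
diverse = cfs_merit(4, 0.5, 0.1)
```

The denominator grows with the redundancy term, which is exactly the trade-off the GeFS measure generalizes.<br />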
16:40-17:00, Paper TuCT2.4<br />
Discriminative Basis Selection using Non-Negative Matrix Factorization<br />
Jammalamadaka, Aruna, Univ. of California, Santa Barbara<br />
Joshi, Swapna, Univ. of California, Santa Barbara<br />
Shanmuga Vadivel, Karthikeyan, Univ. of California, Santa Barbara<br />
Manjunath, B. S., Univ. of California, Santa Barbara<br />
Non-negative matrix factorization (NMF) has proven to be useful in image classification applications such as face recognition.<br />
We propose a novel discriminative basis selection method for classification of image categories based on the popular<br />
term frequency-inverse document frequency (TF-IDF) weight used in information retrieval. We extend the algorithm to<br />
incorporate color, and overcome the drawbacks of using unaligned images. Our method is able to choose visually significant<br />
bases which best discriminate between categories and thus prune the classification space to increase correct classifications.<br />
We apply our technique to ETH-80, a standard image classification benchmark dataset. Our results show that our algorithm<br />
outperforms other state-of-the-art techniques.<br />
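The TF-IDF weight this abstract builds on is standard in information retrieval. A minimal generic sketch follows, assuming simple per-document term counts; it illustrates the weighting scheme only, not the authors' NMF basis-selection code.<br />

```python
import math

def tf_idf(term_counts_per_doc):
    """Weight each term by term frequency times inverse document frequency."""
    n_docs = len(term_counts_per_doc)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for counts in term_counts_per_doc:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    weighted = []
    for counts in term_counts_per_doc:
        total = sum(counts.values())
        weighted.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weighted

docs = [{"cat": 2, "dog": 1}, {"dog": 3}]
w = tf_idf(docs)
# "dog" appears in every document, so its idf is log(2/2) = 0
```

Terms occurring in every document receive zero weight, which is what makes the scheme discriminative between categories.<br />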
17:00-17:20, Paper TuCT2.5<br />
Recognizing Dance Motions with Segmental SVD<br />
Deng, Liqun, Univ. of Science & Tech. of China<br />
Leung, Howard, City Univ. of Hong Kong<br />
Gu, Naijie, Univ. of Science & Tech. of China<br />
Yang, Yang, Univ. of Science & Tech. of China<br />
In this paper, a novel concept of segmental singular value decomposition (SegSVD) is proposed to represent a motion<br />
pattern with a hierarchical structure. A similarity measure based on the SegSVD representation is also proposed. SegSVD<br />
is capable of capturing the temporal information of the time series. It is effective in matching patterns in a time series in<br />
which the start and end points of the patterns are not known in advance. We evaluate the performance of our method on<br />
both isolated motion classification and continuous motion recognition for dance movements. Experiments show that our<br />
method outperforms existing work in terms of recognition accuracy.<br />
TuCT3 Marmara Hall<br />
Object Detection and Recognition – III Regular Session<br />
Session chair: Nixon, Mark (Univ. of Southampton)<br />
15:40-16:00, Paper TuCT3.1<br />
Multi-Class Graph Boosting with Subgraph Sharing for Object Recognition<br />
Zhang, Bang, Univ. of New South Wales, National ICT Australia<br />
Ye, Getian, Univ. of New South Wales<br />
Wang, Yang, National ICT Australia, Univ. of New South Wales<br />
Wang, Wei, Univ. of New South Wales<br />
Xu, Jie, National ICT Australia, Univ. of New South Wales<br />
Herman, Gunawan, National ICT Australia, Univ. of New South Wales<br />
Yang, Jun, National ICT Australia, Univ. of New South Wales<br />
In this paper, we propose a novel multi-class graph boosting algorithm to recognize different visual objects. The proposed<br />
method treats subgraphs as features to construct base classifiers, and utilizes the popular error-correcting output code scheme to<br />
solve the multi-class problem. Both factors, the base classifiers and the error-correcting coding matrix, are considered simultaneously,<br />
and subgraphs that are shareable across different classes are used to improve the classification performance. The<br />
experimental results on multi-class object recognition show the effectiveness of the proposed algorithm.<br />
16:00-16:20, Paper TuCT3.2<br />
Level-Set Segmentation of Brain Tumors using a New Hybrid Speed Function<br />
Cho, Wanhyun, Chonnam National Univ.<br />
Park, Jonghyun, Chonnam National Univ.<br />
Park, Soonyoung, Mokpo National Univ.<br />
Kim, Soohyung, Chonnam National Univ.<br />
Kim, Sunworl, Chonnam National Univ.<br />
Ahn, Gukdong, Chonnam National Univ.<br />
Lee, Myungeun, Chonnam National Univ.<br />
Lee, Gueesang, Chonnam National Univ.<br />
This paper presents a new hybrid speed function needed to perform image segmentation within the level-set framework.<br />
This speed function provides a general form that incorporates the alignment term as a part of the driving force for the<br />
proper edge direction of an active contour by using the probability term derived from the region partition scheme and, for<br />
regularization, the geodesic contour term. First, we use the Gradient Vector Flow field as an external force for active contours.<br />
This is computed as the diffusion of the gradient vectors of a gray-level edge map derived from the image. Second, we<br />
partition the image domain by progressively fitting statistical models to the intensity of each region. Here we adopt two<br />
Gaussian distributions to model the intensity distribution of the inside and outside of the evolving curve partitioning the<br />
image domain. Third, we use an active contour model that computes geodesics, or minimal-distance curves, which<br />
allows stable boundary detection when the model’s gradients suffer from large variations, including gaps or noise.<br />
Finally, we test the accuracy and robustness of the proposed method for various medical images. Experimental results<br />
show that our method can properly segment low contrast, complex images.<br />
16:20-16:40, Paper TuCT3.3<br />
The Impact of Color on Bag-of-Words based Object Recognition<br />
Rojas Vigo, David Augusto, Computer Vision Center Barcelona<br />
Shahbaz Khan, Fahad, Computer Vision Center Barcelona<br />
Van De Weijer, Joost, Computer Vision Center Barcelona<br />
Gevers, Theo, Univ. of Amsterdam<br />
In recent years several works have aimed at exploiting color information in order to improve the bag-of-words based<br />
image representation. There are two stages in which color information can be applied in the bag-of-words framework.<br />
Firstly, feature detection can be improved by choosing highly informative color-based regions. Secondly, feature description,<br />
typically focusing on shape, can be improved with a color description of the local patches. Although both approaches<br />
have been shown to improve results, their combined merits have not yet been analyzed. Therefore, in this paper we investigate<br />
the combined contribution of color to both the feature detection and extraction stages. Experiments performed on two<br />
challenging data sets, namely Flower and Pascal VOC 2009, clearly demonstrate that incorporating color in both feature<br />
detection and extraction significantly improves the overall performance.<br />
16:40-17:00, Paper TuCT3.4<br />
Pyramidal Model for Image Semantic Segmentation<br />
Passino, Giuseppe, Queen Mary, Univ. of London<br />
Patras, Ioannis, Queen Mary, Univ. of London<br />
Izquierdo, Ebroul, Queen Mary, Univ. of London<br />
We present a new hierarchical model applied to the problem of image semantic segmentation, that is, the association of<br />
each pixel in an image with a category label (e.g. tree, cow, building, ...). This problem is usually addressed with a combination<br />
of an appearance-based pixel classification and a pixel context model. In our proposal, the images are initially<br />
over-segmented into dense patches. The proposed pyramidal model naturally embeds the compositional nature of a scene to<br />
achieve a multi-scale contextualisation of patches. This is obtained by imposing an order on the patches aggregation operations<br />
towards the final scene. The nodes of the pyramid (that is, a dendrogram) thus represent patch clusters, or superpatches.<br />
The probabilistic model favours the homogeneous labelling of super-patches that are likely to contain a single<br />
object instance, modelling the uncertainty in identifying such super-patches. The proposed model has several advantages,<br />
including computational efficiency and expandability. Initial results place the model in line with other works<br />
in the recent literature.<br />
17:00-17:20, Paper TuCT3.5<br />
Multi-View based Estimation of Human Upper-Body Orientation<br />
Rybok, Lukas, Karlsruhe Inst. of Tech.<br />
Voit, Michael, Fraunhofer Inst. of Optronics<br />
Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />
Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />
Knowledge of the body orientation of humans can improve the speed and performance of many service components<br />
of a smart-room. Since many such components run in parallel, an estimator to acquire this knowledge needs a very low<br />
computational complexity. In this paper we address these two points with a fast and efficient algorithm using the smart-room’s<br />
multiple camera output. The estimation is based on silhouette information only and is performed for each camera<br />
view separately. The single view results are fused within a Bayesian filter framework. We evaluate our system on a subset<br />
of videos from the CLEAR 2007 dataset and achieve an average correct classification rate of 87.8%, while the estimation<br />
itself just takes 12 ms when four cameras are used.<br />
TuCT4 Dolmabahçe Hall A<br />
Structural Methods Regular Session<br />
Session chair: Ghosh, Joydeep (Univ. of Texas)<br />
15:40-16:00, Paper TuCT4.1<br />
An Iterative Algorithm for Approximate Median Graph Computation<br />
Ferrer, Miquel, Univ. Pol. de Catalunya<br />
Bunke, Horst, Univ. of Bern<br />
Recently, the median graph has been shown to be a good choice to obtain a representative of a given set of graphs. It has<br />
been successfully applied to graph-based classification and clustering. In this paper we exploit a theoretical property of<br />
the median, which has not yet been utilized in the past, to derive a new iterative algorithm for approximate median graph<br />
computation. Experiments done using five different graph databases show that the proposed approach yields, in four out<br />
of these five datasets, better medians than two of the previous existing methods.<br />
16:00-16:20, Paper TuCT4.2<br />
A Supergraph-Based Generative Model<br />
Han, Lin, Univ. of York<br />
Wilson, Richard, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
This paper describes a method for constructing a generative model for sets of graphs. The method is posed in terms of<br />
learning a supergraph from which the samples can be obtained by edit operations. We construct a probability distribution<br />
for the occurrence of nodes and edges over the supergraph. We use the EM algorithm to learn both the structure of the supergraph<br />
and the correspondences between the nodes of the sample graphs and those of the supergraph, which are treated<br />
as missing data. In the experimental evaluation of the method, we a) prove that our supergraph learning method can lead<br />
to an optimal or suboptimal supergraph, and b) show that our proposed generative model gives good graph classification<br />
results.<br />
16:20-16:40, Paper TuCT4.3<br />
Levelings and Flatzone Morphology<br />
Meyer, Fernand, Mines-ParisTech<br />
Successive levelings are applied to document images. The residues of successive levelings are made of flat zones for<br />
which morphological transforms are described.<br />
16:40-17:00, Paper TuCT4.4<br />
Combining Force Histogram and Discrete Lines to Extract Dashed Lines<br />
Debled-Rennesson, Isabelle, LORIA – Nancy Univ.<br />
Wendling, Laurent, Univ. Paris Descartes<br />
A new method to extract dashed lines in technical documents is proposed in this paper by combining force histograms and<br />
discrete lines. The aim is to study the spatial location of pairs of connected components using the force histogram and to<br />
refine the recognition by considering surrounding discrete lines. This new model is fast and allows a good extraction of<br />
occluded patterns in the presence of noise. Efficient common methods require several thresholds to process technical<br />
documents. The proposed method requires only a few thresholds, which can be set automatically from the data.<br />
17:00-17:20, Paper TuCT4.5<br />
Heat Flow-Thermodynamic Depth Complexity in Networks<br />
Escolano, Francisco, Univ. of Alicante<br />
Lozano, Miguel Angel, Univ. of Alicante<br />
Hancock, Edwin, Univ. of York<br />
In this paper we establish a formal link between network complexity in terms of Birkhoff-von Neumann decompositions<br />
and heat flow complexity (in terms of quantifying the heat flowing through the network at a given inverse temperature).<br />
Furthermore, we also define heat flow complexity in terms of thermodynamic depth, which results in a novel approach<br />
for characterizing networks and quantifying their complexity. In our experiments we characterize several protein-protein interaction<br />
(PPI) networks and then highlight their evolutionary differences.<br />
TuCT5 Anadolu Auditorium<br />
Image Analysis – V Regular Session<br />
Session chair: Kasturi, Rangachar (Univ. of South Florida)<br />
15:40-16:00, Paper TuCT5.1<br />
Content Adaptive Hash Lookups for Near-Duplicate Image Search by Full or Partial Image Queries<br />
Harmanci, Oztan, Anvato Inc.<br />
Haritaoglu, Ismail, Pol. Rain Inc.<br />
In this paper we present a scalable and high performance near-duplicate image search method. The proposed algorithm<br />
follows the common paradigm of computing local features around repeatable scale-invariant interest points. Unlike existing<br />
methods, much shorter hashes (40 bits) are used. By leveraging the shortness of the hashes, a novel high-performance<br />
search algorithm is introduced which analyzes the reliability of each bit of a hash and performs content adaptive hash<br />
lookups by adaptively adjusting the “range” of each hash bit based on reliability. Matched features are post-processed to<br />
determine the final match results. We experimentally show that the algorithm can detect cropped, resized, print-scanned<br />
and re-encoded images and pieces from images among thousands of images. The proposed algorithm can search for a<br />
200x200 piece of an image in a database of 2,250 images of size 2400x4000 in 0.020 seconds on a 2.5 GHz Intel Core 2.<br />
16:00-16:20, Paper TuCT5.2<br />
The Good, the Bad, and the Ugly: Predicting Aesthetic Image Labels<br />
Wu, Yaowen, RWTH Aachen Univ. Fraunhofer Inst. IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
Automatic classification of the aesthetic content of a picture is one of the challenges in the emerging discipline of computational<br />
aesthetics. Any suitable solution must cope with the facts that aesthetic experiences are highly subjective and<br />
that a commonly agreed upon theory of their psychological constituents is still missing. In this paper, we present results<br />
obtained from an empirical basis of several thousand images. We train SVM based classifiers to predict aesthetic adjectives<br />
rather than aesthetic scores and we introduce a probabilistic post processing step that alleviates effects due to misleadingly<br />
labeled training data. Extensive experimentation indicates that aesthetics classification is possible to a large extent. In particular,<br />
we find that previously established low-level features are well suited to recognize beauty. Robust recognition of<br />
unseemliness, on the other hand, appears to require more high-level analysis.<br />
16:20-16:40, Paper TuCT5.3<br />
Information Fusion for Combining Visual and Textual Image Retrieval<br />
Zhou, Xin, Geneva Univ. Hospitals and Univ. of Geneva<br />
Depeursinge, Adrien, Geneva Univ. Hospitals and Univ. of Geneva<br />
Müller, Henning, Univ. of Applied Sciences Sierre, Switzerland<br />
In this paper, classical fusion approaches such as the maximum combination (combMAX), the sum combination (combSUM) and<br />
the product of the sum and the number of nonzero scores (combMNZ) were employed, and the trade-off between two fusion effects<br />
(the chorus and dark horse effects) was studied based on the sum of n maximums. Various normalization strategies were<br />
tried out. The fusion algorithms are evaluated using the best four visual and textual runs of the ImageCLEF medical image<br />
retrieval tasks of 2008 and 2009. The results show that the fused runs outperform the best original runs and that multi-modality fusion<br />
statistically outperforms single-modality fusion. The logarithmic rank penalization proves to be the most stable normalization.<br />
The dark horse effect is in competition with the chorus effect, and either can produce the best fusion performance<br />
depending on the nature of the input data.<br />
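The classical fusion rules named in this abstract (Fox and Shaw's combMAX, combSUM, combMNZ) are standard and simple to state. The sketch below is a generic illustration over hypothetical score dictionaries, not the authors' evaluation code.<br />

```python
def comb_max(runs):
    """combMAX: take the maximum score a document receives in any run."""
    fused = {}
    for run in runs:
        for doc, s in run.items():
            fused[doc] = max(fused.get(doc, 0.0), s)
    return fused

def comb_sum(runs):
    """combSUM: sum a document's scores over all runs (missing = 0)."""
    fused = {}
    for run in runs:
        for doc, s in run.items():
            fused[doc] = fused.get(doc, 0.0) + s
    return fused

def comb_mnz(runs):
    """combMNZ: combSUM times the number of runs with a nonzero score."""
    sums = comb_sum(runs)
    counts = {}
    for run in runs:
        for doc, s in run.items():
            if s > 0:
                counts[doc] = counts.get(doc, 0) + 1
    return {doc: sums[doc] * counts.get(doc, 0) for doc in sums}

# Hypothetical visual and textual runs for the same query.
visual = {"img1": 0.75, "img2": 0.25}
textual = {"img1": 0.5, "img3": 0.5}
fused = comb_mnz([visual, textual])
```

combMNZ amplifies the chorus effect (documents retrieved by several runs), while combMAX favors the dark horse effect (a single very confident run), matching the trade-off studied in the paper.<br />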
16:40-17:00, Paper TuCT5.4<br />
Perceptual Image Retrieval by Adding Color Information to the Shape Context Descriptor<br />
Rusiñol, Marçal, Univ. Autònoma de Barcelona<br />
Nourbakhsh, Farshad, Computer Vision Center / Univ. Autònoma de Barcelona<br />
Karatzas, Dimosthenis, Univ. Autonoma de Barcelona<br />
Valveny, Ernest, Computer Vision Center / Univ. Autònoma de Barcelona<br />
Llados, Josep, Computer Vision Center<br />
In this paper we present a method for the retrieval of images in terms of perceptual similarity. Local color information is<br />
added to the shape context descriptor in order to obtain an object description integrating both shape and color as visual cues.<br />
We use a color naming algorithm in order to represent the color information from a perceptual point of view. The proposed<br />
method has been tested in two different applications, an object retrieval scenario based on color sketch queries and a color<br />
trademark retrieval problem. Experimental results show that adding color information significantly outperforms<br />
the sole use of the shape context descriptor.<br />
17:00-17:20, Paper TuCT5.5<br />
Weighted Boundary Points for Shape Analysis<br />
Zhang, Jing, Univ. of South Florida<br />
Kasturi, Rangachar, Univ. of South Florida<br />
Shape analysis is an active and important branch of the computer vision research field. In recent years, many geometrical, topological,<br />
and statistical features have been proposed and widely used for shape-related applications. In this paper, based on<br />
the properties of the Distance Transform, we present a new shape feature, the weight of a boundary point. By computing the shortest<br />
distances between boundary points and the distance contours of a transformed shape, every boundary point is assigned a weight<br />
that contains the interior structure information of the shape. To evaluate the proposed shape feature, we tested the<br />
weighted boundary points on shape matching and shape decomposition. The experimental results demonstrate its validity.<br />
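The Distance Transform underlying this feature is a standard operation: each foreground cell is labeled with its distance to the nearest background cell. A brute-force sketch on a small binary grid follows, purely illustrative and not the authors' implementation.<br />

```python
def distance_transform(grid):
    """Euclidean distance transform of a binary grid, brute force:
    foreground cells (1) get the distance to the nearest background cell (0),
    background cells get 0. O(n^2), fine for illustration only."""
    h, w = len(grid), len(grid[0])
    background = [(y, x) for y in range(h) for x in range(w) if grid[y][x] == 0]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 1:
                out[y][x] = min(
                    ((y - by) ** 2 + (x - bx) ** 2) ** 0.5
                    for by, bx in background
                )
    return out

shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dt = distance_transform(shape)
# Every interior cell of this thin shape is one step from the background.
```

The level sets of `dt` are the distance contours against which the paper measures boundary-point weights.<br />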
TuCT6 Dolmabahçe Hall B<br />
Speech and Speaker Recognition Regular Session<br />
Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />
15:40-16:00, Paper TuCT6.1<br />
Dimension-Decoupled Gaussian Mixture Model for Short Utterance Speaker Recognition<br />
Stadelmann, Thilo, Univ. of Marburg<br />
Freisleben, Bernd, Univ. of Marburg<br />
The Gaussian Mixture Model (GMM) is often used in conjunction with Mel-frequency cepstral coefficient (MFCC) feature<br />
vectors for speaker recognition. A great challenge is to use these techniques in situations where only small sets of training<br />
and evaluation data are available, which typically results in poor statistical estimates and, ultimately, poor recognition scores. Based<br />
on the observation of marginal MFCC probability densities, we suggest greatly reducing the number of free parameters in<br />
the GMM by modeling the single dimensions separately after proper preprocessing. Saving about 90% of the free parameters<br />
as compared to an already optimized GMM and thus making the estimates more stable, this approach considerably improves<br />
recognition accuracy over the baseline as the utterances get shorter and saves a huge amount of computing time both in training<br />
and evaluation, enabling real-time performance. The approach is easy to implement and to combine with other short-utterance<br />
approaches, and applicable to other features as well.<br />
16:00-16:20, Paper TuCT6.2<br />
Modeling Syllable-Based Pronunciation Variation for Accented Mandarin Speech Recognition<br />
Zhang, Shilei, IBM Res.<br />
Shi, Qin, IBM Res. – China<br />
Qin, Yong, IBM Res. – China<br />
Pronunciation variation is a natural and inevitable phenomenon in accented Mandarin speech recognition applications.<br />
In this paper, we integrate knowledge-based and data-driven approaches for syllable-based pronunciation variation<br />
modeling to improve the performance of a Mandarin speech recognition system for speakers with a Southern accent. First,<br />
we generate the syllable-based pronunciation variation rules of the Southern accent observed in the training corpus with a Chinese<br />
linguistics expert. Second, we augment the dictionary with multiple pronunciation variants and pronunciation probabilities<br />
derived from forced-alignment statistics of the training data. The acoustic models are then retrained based on the expanded<br />
dictionary. Finally, pronunciation variation adaptation is performed at the decoding stage to further fit the data by<br />
taking the distribution of variation-rule clusters in the test set into account. The experimental results show that the proposed<br />
method provides a flexible framework to effectively improve recognition performance for accented speech.<br />
16:20-16:40, Paper TuCT6.3<br />
Automatic Pronunciation Transliteration for Chinese-English Mixed Language Keyword Spotting<br />
Zhang, Shilei, IBM Res.<br />
Shuang, Zhiwei, IBM Res. – China<br />
Qin, Yong, IBM Res. – China<br />
This paper presents an automatic pronunciation transliteration method with acoustic and contextual analysis for a Chinese-<br />
English mixed-language keyword spotting (KWS) system. In practice, we often need to develop robust Chinese-English<br />
mixed-language spoken language technology without Chinese-accented English acoustic data. In this paper, we exploit a<br />
pronunciation conversion method based on syllable-based characteristic analysis of pronunciation and data-driven phoneme-<br />
pair mappings to solve the mixed-language problem using only well-trained Chinese models. One obvious advantage of this<br />
method is that it provides a flexible framework for automatically converting the pronunciation of English keywords to<br />
Chinese. The efficiency of the proposed method was demonstrated on a KWS task over a mixed-language database.<br />
16:40-17:00, Paper TuCT6.4<br />
Learning Virtual HD Model for Bi-Model Emotional Speaker Recognition<br />
Huang, Ting, Zhejiang Univ.<br />
Yang, Yingchun, Zhejiang Univ.<br />
Pitch mismatch between training and testing is one of the important factors causing performance degradation in speaker<br />
recognition systems. In this paper, we adopt the missing feature theory and specify the Unreliable Region (UR) as the<br />
parts of the utterance with high emotion-induced pitch variation. To model these regions, a virtual HD (High Different<br />
from neutral, with large pitch offset) model for each target speaker was built from virtual speech, which was converted<br />
from neutral speech by the Pitch Transformation Algorithm (PTA). In the PTA, a polynomial transformation function<br />
is learned to model the relationship between the average pitch of the neutral and the high-pitched utterances. Compared<br />
with the traditional GMM-UBM and our previous method, our new method obtained promising identification rate (IR)<br />
increases of 1.88% and 0.84% on the MASC corpus, respectively.<br />
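The PTA's polynomial mapping between neutral and high-pitched average pitch can be sketched, in its simplest degree-1 form, as a least-squares fit; the pitch values below are invented for illustration:<br />

```python
def fit_linear(xs, ys):
    """Least-squares line y = a*x + b, the degree-1 case of a polynomial
    mapping average neutral pitch to average high-pitched pitch."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# hypothetical per-speaker average pitch pairs in Hz (neutral, emotional)
neutral = [110.0, 125.0, 140.0, 160.0]
high = [150.0, 172.0, 195.0, 224.0]
a, b = fit_linear(neutral, high)
transform = lambda f0: a * f0 + b  # PTA-style pitch mapping
```

Higher-degree polynomials would be fitted analogously with more coefficients.<br />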
17:00-17:20, Paper TuCT6.5<br />
Role of Synthetically Generated Samples on Speech Recognition in a Resource-Scarce Language<br />
Chakraborty, Rupayan, St. Thomas’ Coll. of Eng. & Tech.<br />
Garain, Utpal, Indian Statistical Inst.<br />
Speech recognition systems that make use of statistical classifiers require a large number of training samples. However,<br />
the collection of real samples has always been a difficult problem due to the substantial amount of human intervention<br />
and cost involved. Considering this problem, this paper presents a novel method for generating synthetic samples from<br />
a handful of real samples and investigates the role of these samples in designing a speech recognition system. Speaker-dependent,<br />
limited-vocabulary, isolated word recognition in an Indian language (Bengali) has been taken as a reference to<br />
demonstrate the potential of the proposed framework. The role of synthetic samples is demonstrated by showing a significant<br />
improvement in recognition accuracy. A maximum improvement of 10% is achieved using the proposed approach.<br />
TuCT7 Dolmabahçe Hall C<br />
Fingerprint Regular Session<br />
Session chair: Sankur, Bülent (Bogazici Univ.)<br />
15:40-16:00, Paper TuCT7.1<br />
Detecting Altered Fingerprints<br />
Feng, Jianjiang, Tsinghua Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Ross, Arun, West Virginia Univ.<br />
The widespread deployment of Automated Fingerprint Identification Systems (AFIS) in law enforcement and border<br />
control applications has prompted some individuals with criminal background to evade identification by purposely altering<br />
their fingerprints. Available fingerprint quality assessment software cannot detect most of the altered fingerprints since<br />
the implicit image quality does not always degrade due to alteration. In this paper, we classify the alterations observed in<br />
an operational database into three categories and propose an algorithm to detect altered fingerprints. Experiments were<br />
conducted on both real-world altered fingerprints and synthetically generated altered fingerprints. At a false alarm rate of<br />
7%, the proposed algorithm detected 92% of the altered fingerprints, while NFIQ, a well-known fingerprint quality<br />
assessment tool, detected only 20% of them.<br />
16:00-16:20, Paper TuCT7.2<br />
A Variational Formulation for Fingerprint Orientation Modeling<br />
Hou, Zujun, Inst. For Infocomm Res.<br />
Yau, Wei-Yun, Inst. For Infocomm Res.<br />
Fingerprint orientation plays an important role in fingerprint recognition. This paper proposes a framework for modeling<br />
the fingerprint orientation field based on the variational principle. The proposed method does not require any prior information<br />
about the structure of the acquired fingerprints. Comparisons have been made with state-of-the-art methods in fingerprint<br />
orientation modeling.<br />
16:20-16:40, Paper TuCT7.3<br />
Fingerprint Pore Matching based on Sparse Representation<br />
Liu, Feng, The Hong Kong Pol. Univ.<br />
Zhao, Qijun, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
This paper proposes an improved direct fingerprint pore matching method. It measures the differences between pores by<br />
using the sparse representation technique. The coarse pore correspondences are then established and weighted based on<br />
the obtained differences. The false correspondences among them are finally removed by using the weighted RANSAC algorithm.<br />
Experimental results have shown that the proposed method can greatly improve the accuracy of existing methods.<br />
16:40-17:00, Paper TuCT7.4<br />
Latent Fingerprint Core Point Prediction based on Gaussian Processes<br />
Su, Chang, Univ. at Buffalo, State Univ. of New York<br />
Srihari, Sargur, Univ. at Buffalo, State Univ. of New York<br />
Core point prediction is of critical importance to latent fingerprint individuality assessment. While tremendous efforts<br />
have been made in core point detection, locating core points in latent fingerprints continues to be a difficult problem<br />
because latent prints usually contain only partial images, with core points left outside the print. A novel method is proposed<br />
that predicts the locations and orientations of core points in latent fingerprints. The method is based on Gaussian processes<br />
and provides probabilistic predictions rather than binary decisions. The accuracy of the method is illustrated<br />
by experiments on a real-life latent fingerprint data set.<br />
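The underlying machinery can be sketched with generic Gaussian process regression, assuming a squared-exponential kernel and a single scalar output (e.g. one core-point coordinate regressed from 2D positions on the partial print); this is not the authors' exact model:<br />

```python
import math

def rbf(a, b, length=1.0):
    # squared-exponential kernel on 2D coordinates
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, b):
    # naive Gaussian elimination with partial pivoting (small systems only)
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(train_x, train_y, test_x, noise=1e-6, length=1.0):
    """Posterior mean and variance of a GP at one test input."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)
    k_star = [rbf(test_x, a, length) for a in train_x]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(test_x, test_x, length) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var
```

The posterior variance is what turns the prediction probabilistic rather than a binary decision.<br />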
17:00-17:20, Paper TuCT7.5<br />
Towards a Better Understanding of the Performance of Latent Fingerprint Recognition in Realistic Forensic Conditions<br />
Puertas, Maria, Univ. Autonoma de Madrid<br />
Ramos, Daniel, Univ. Autonoma de Madrid<br />
Fierrez, Julian, Univ. Autonoma de Madrid<br />
Ortega-Garcia, Javier, Univ. Autonoma de Madrid<br />
Exposito-Marquez, Nicomedes, Departamento de Identificacion, Servicio de Criminalistica de la Guardia Civil, Ministerio<br />
del Interior, Spain<br />
This work studies the performance of a state-of-the-art fingerprint recognition technology in several practical scenarios<br />
of interest in forensic casework. First, the differences in performance between manual and automatic minutiae extraction<br />
for latent fingerprints are presented. Then, automatic minutiae extraction is analyzed using three different types of fingerprints:<br />
latent, rolled and plain. The experiments are carried out using a database of latent finger marks and fingerprint impressions<br />
from real forensic cases. The results show high performance degradation in automatic minutiae extraction<br />
compared to manual extraction by human experts. Moreover, high degradation in performance on latent finger marks can<br />
be observed in comparison to fingerprint impressions.<br />
TuBCT8 Upper Foyer<br />
3D Shape Recovery; Image and Physics-Based Modeling; Motion and Multi-View Vision; Tracking and Surveillance<br />
Poster Session<br />
Session chair: Jiang, Xiaoyi (Univ. of Münster)<br />
13:30-16:30, Paper TuBCT8.1<br />
Online Next-Best-View Planning for Accuracy Optimization using an Extended E-Criterion<br />
Trummer, Michael, Friedrich-Schiller Univ. of Jena<br />
Munkelt, Christoph, Fraunhofer Society<br />
Denzler, Joachim, Friedrich-Schiller Univ. of Jena<br />
Next-best-view (NBV) planning is an important aspect of three-dimensional (3D) reconstruction within controlled environments,<br />
such as a camera mounted on a robotic arm. NBV methods aim at a purposive 3D reconstruction that satisfies<br />
predefined goals and limitations. Up to now, the literature mainly presents NBV methods for range sensors, model-based<br />
approaches or algorithms that address the reconstruction of a finite set of primitives. For this work, we use an intensity<br />
camera without active illumination. We present a novel combined online approach comprising feature tracking, 3D reconstruction,<br />
and NBV planning that addresses arbitrary unknown objects. In particular, we focus on accuracy optimization<br />
based on the reconstruction uncertainty. To this end, we introduce an extension of the statistical E-criterion to model directional<br />
uncertainty, and we present a closed-form, optimal solution to this NBV planning problem. Our experimental<br />
evaluation demonstrates the effectiveness of our approach using an absolute error measure.<br />
13:30-16:30, Paper TuBCT8.2<br />
Non Contact 3D Measurement Scheme for Transparent Objects using UV Structured Light<br />
Rantoson, Rindra, LE2I<br />
Fofi, David, Le2i UMR CNRS 5158<br />
Stolz, Christophe, LE2I<br />
Meriaudeau, Fabrice, LE2I<br />
This paper introduces a novel 3D measurement scheme based on UV laser triangulation to ascertain the shape of transparent<br />
objects. Transparent objects are extremely difficult to scan with traditional 3D scanners because of the refraction observed<br />
in the visible range; the object surface therefore needs to be powdered before being digitized with commercial scanners.<br />
Our approach is a non-contact measurement scheme that deals with the refraction problem in the visible environment. The<br />
object shape is computed by a classical triangulation method based on stereovision constraints. The proposed acquisition<br />
system is composed of two classical visible-range cameras and a UV laser source. The use of the UV laser in the triangulation<br />
system constitutes the novelty of the proposed approach: the fluorescence generated by the UV radiation makes it possible<br />
to acquire 3D data of transparent surfaces with a classical stereovision scheme.<br />
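The classical stereovision triangulation step can be sketched with the midpoint method: take the midpoint of the shortest segment between the two back-projected rays of the laser spot seen by the two cameras. Purely illustrative; the camera geometry below is made up:<br />

```python
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]
def scale(u, s): return [a * s for a in u]

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2,
    where c1, c2 are camera centres and d1, d2 the viewing directions."""
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b  # zero only for parallel rays
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)
```

With noisy real rays the two closest points differ slightly, and the midpoint is a common compromise.<br />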
13:30-16:30, Paper TuBCT8.3<br />
Extending Fast Marching Method under Point Light Source Illumination and Perspective Projection<br />
Iwahori, Yuji, Chubu Univ.<br />
Iwai, Kazuki, Chubu Univ.<br />
Woodham, Robert J., Univ. of British Columbia<br />
Kawanaka, Haruki, Aichi Prefectural Univ.<br />
Fukui, Shinji, Aichi Univ. of Education<br />
Kasugai, Kunio, Aichi Medical Univ.<br />
An endoscope is a medical instrument that acquires images inside the human body. An endoscope carries its own light<br />
source. Classic shape-from-shading can be used to recover the 3-D shape of objects in view. Recent implementations have<br />
used the Fast Marching Method (FMM). Previous FMM approaches recover 3-D shape under assumptions of parallel light<br />
source illumination and orthographic projection. This paper extends the FMM approach to recover the 3-D shape under<br />
more realistic conditions of endoscopy, namely nearby point light source illumination and perspective projection. The new<br />
approach is demonstrated through experiment and is seen to improve performance.<br />
13:30-16:30, Paper TuBCT8.4<br />
Effective Structure-From-Motion for Hybrid Camera Systems<br />
Bastanlar, Yalin, Middle East Tech. Univ.<br />
Temizel, Alptekin, Middle East Tech. Univ.<br />
Yardimci, Yasemin, Middle East Tech. Univ.<br />
Sturm, Peter, INRIA<br />
We describe a pipeline for structure-from-motion with mixed camera types, namely omnidirectional and perspective cameras.<br />
The steps of the pipeline can be summarized as calibration, point matching, pose estimation, triangulation and bundle<br />
adjustment. For these steps, we either propose improved methods or modify existing perspective camera methods to make<br />
the pipeline more effective and automatic when employed for hybrid camera systems.<br />
13:30-16:30, Paper TuBCT8.5<br />
Single View Metrology Along Orthogonal Directions<br />
Peng, Kun, Peking Univ.<br />
Hou, Lulu, Peking Univ.<br />
Ren, Ren, Peking Univ.<br />
Ying, Xianghua, Peking Univ.<br />
Zha, Hongbin, Peking Univ.<br />
In this paper, we describe how 3D metric measurements can be determined from a single uncalibrated image when only<br />
minimal geometric information is available in the image; this minimal information is just the orthogonal vanishing points.<br />
Given such limited information, we show that length ratios along different orthogonal directions can be directly computed.<br />
This discovery seems to oppose common sense: usually, in the calibration process, all edge lengths of a cuboid are known;<br />
here, the cuboid edge lengths are unknown, yet their ratios can be recovered from the image. 3D metric measurements<br />
can then be directly computed from the image using our linear method.<br />
13:30-16:30, Paper TuBCT8.6<br />
Depth Perception Model based on Fixational Eye Movements using Bayesian Statistical Inference<br />
Tagawa, Norio, Tokyo Metropolitan Univ.<br />
Small vibrations of the eyeball, which occur when we fix our gaze on an object, are called "fixational eye movements." It has<br />
been reported that such involuntary eye movements also contribute to monocular depth perception. In this study, we focus<br />
on "tremor", the smallest type of fixational eye movement, and construct a depth perception model based on tremor<br />
using the MAP-EM algorithm. Its effectiveness is confirmed through numerical evaluations using artificial images.<br />
13:30-16:30, Paper TuBCT8.7<br />
One-Shot Scanning using a Color Stripe Pattern<br />
Li, Renju, Peking Univ.<br />
Zha, Hongbin, Peking Univ.<br />
Structured light 3D scanning has many applications such as 3D modeling, animation, motion analysis, deformation measurement<br />
and so on. Traditional structured light methods make use of a sequence of patterns to obtain the dense 3D data of<br />
objects. However, few methods have been proposed to achieve pixel-wise reconstruction using only one pattern. In this<br />
paper, we propose a one-shot scanning system based on a novel stripe pattern, which uses color stripes with a quadratic<br />
intensity distribution in each stripe. The color distribution is based on a De Bruijn sequence with six colors and order<br />
three. Graph cut is utilized to decode the color information, and the resulting code is calculated using local intensity. Compared<br />
with traditional methods, the proposed method uses only one pattern and achieves pixel-wise reconstruction. Experimental<br />
results show that our one-shot scanning system can robustly capture 3D data with high accuracy.<br />
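The stripe colouring rests on a De Bruijn sequence of order 3 over 6 symbols, so any window of 3 consecutive stripes identifies its position in the pattern uniquely. A standard construction (the colour names are placeholders, not the paper's palette):<br />

```python
def de_bruijn(k: int, n: int) -> list:
    """Cyclic sequence over k symbols in which every length-n string of
    symbols appears exactly once (standard Lyndon-word construction)."""
    a = [0] * k * n
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return seq

colors = ["red", "green", "blue", "cyan", "magenta", "yellow"]
stripes = [colors[s] for s in de_bruijn(6, 3)]  # 6^3 = 216 stripes
```

Decoding then reduces to recognising three adjacent stripe colours and looking up that triple in the sequence.<br />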
13:30-16:30, Paper TuBCT8.8<br />
Face Appearance Reconstruction based on a Regional Statistical Craniofacial Model (RSCM)<br />
Yan-Fei, Zhang, Northwest Univ.<br />
Ming-Quan, Zhou, Northwest Univ.<br />
Geng, Guohua, Northwest Univ.<br />
Feng, Jun, Northwest Univ.<br />
The reconstruction of facial soft tissue is an essential processing phase in a number of fields. In this paper, we propose a face<br />
appearance reconstruction algorithm based on a Regional Statistical Craniofacial Model (RSCM). Specifically, the<br />
shape of the craniofacial model is decomposed into several segments, such as the eye, nose and mouth regions, and<br />
the joint statistical models of the different regions are constructed independently to address the small sample size problem.<br />
The face reconstruction task is formulated as a missing data problem and is likewise solved region by region.<br />
Finally, the recovered regions are assembled together to obtain a complete face model. The experimental results show<br />
that the proposed reconstruction scheme achieves a lower error rate than a state-of-the-art method.<br />
13:30-16:30, Paper TuBCT8.9<br />
3D Human Pose Reconstruction using Millions of Exemplars<br />
Jiang, Hao, Boston Coll.<br />
We propose a novel exemplar-based method to estimate 3D human poses from single images by using only the joint correspondences.<br />
Due to the inherent depth ambiguity, estimating 3D poses from a monocular view is a challenging problem.<br />
We solve the problem by searching through millions of exemplars for optimal poses. Compared with traditional parametric<br />
schemes, our method is able to handle very large pose databases, reduces parameter tweaking, is easier to train and is more<br />
effective for complex 3D pose reconstruction. The proposed method estimates upper-body poses and lower-body poses<br />
sequentially, which implicitly squares the size of the exemplar database and enables us to reconstruct unconstrained poses<br />
efficiently. Our implementation, based on a kd-tree, achieves real-time performance. Experiments on a variety of images<br />
show that the proposed method is efficient and effective.<br />
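The core lookup can be sketched with a brute-force stand-in for the kd-tree (the exemplar records, joint layout and pose labels below are hypothetical):<br />

```python
def pose_distance(joints_a, joints_b):
    # sum of squared differences between corresponding 2D joints
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(joints_a, joints_b))

def nearest_exemplar(query_joints, exemplars):
    """Return the stored 3D pose of the exemplar whose 2D joints
    best match the query (linear scan instead of a kd-tree)."""
    best = min(exemplars,
               key=lambda e: pose_distance(query_joints, e["joints2d"]))
    return best["pose3d"]

# tiny made-up database: two exemplars with two joints each
db = [
    {"joints2d": [(0.0, 0.0), (1.0, 1.0)], "pose3d": "standing"},
    {"joints2d": [(0.0, 0.0), (2.0, 0.0)], "pose3d": "arms-out"},
]
result = nearest_exemplar([(0.1, 0.0), (1.9, 0.1)], db)
```

A kd-tree replaces the linear scan with a logarithmic-time search, which is what makes millions of exemplars tractable in real time.<br />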
13:30-16:30, Paper TuBCT8.10<br />
Recovering 3D Shape using an Improved Fast Marching Method<br />
Zou, Chengming, Wuhan Univ. of Tech.<br />
Hancock, Edwin, Univ. of York<br />
In this paper we present an improved shape-from-shading (SFS) method based on an improved fast marching method. We<br />
commence by showing how to recover 3D shape from a single image using an improved fast marching method to solve the<br />
SFS problem. Then we use the level set method, constrained by energy minimization, to evolve the 3D shape. Finally, we show<br />
that the method can recover stable surface estimates from both synthetic and real-world images of complex objects. The experimental<br />
results show that the resulting method is both robust and accurate.<br />
13:30-16:30, Paper TuBCT8.11<br />
The Motion Dynamics Approach to the PnP Problem<br />
Wang, Bo, Chinese Acad. of Sciences<br />
Sun, Fengmei, North China University of Technology<br />
We propose a new motion dynamics approach to solve the PnP problem, in which a dynamic simulation system is constructed<br />
from springs and balls. The equivalence between minimizing the energy of the dynamic system and solving the PnP problem<br />
is proved. Under the assumption that resistances exist, the original PnP problem can be solved<br />
by simulating the movement of the balls.<br />
13:30-16:30, Paper TuBCT8.12<br />
Eigenbubbles: An Enhanced Apparent BRDF Representation<br />
Kumar, Ritwik, Harvard Univ.<br />
Vemuri, Baba C., Univ. of Florida<br />
Banerjee, Arunava, Univ. of Florida<br />
In this paper we address the problem of relighting faces in the presence of cast shadows and specularities. We present a solution<br />
to this problem by capturing the spatially varying Apparent Bidirectional Reflectance Distribution Function (ABRDF)<br />
fields of human faces using Spline Modulated Spherical Harmonics and representing them with a few salient spherical<br />
functions called Eigenbubbles. Through extensive experiments on the Extended Yale B and the CMU PIE benchmark datasets,<br />
we demonstrate that the proposed method clearly outperforms state-of-the-art techniques in synthesized image quality. Furthermore,<br />
we show that our framework allows for ABRDF field compression and can also be used to enhance the performance of<br />
face recognition algorithms.<br />
13:30-16:30, Paper TuBCT8.13<br />
Reactive Object Tracking with a Single PTZ Camera<br />
Al Haj, Murad, Univ. Autonoma de Barcelona<br />
Bagdanov, Andrew D., Univ. Autonoma de Barcelona<br />
Gonzalez, Jordi, Centre de Visio per Computador<br />
Roca, F. Xavier, Univ. Autonoma de Barcelona<br />
In this paper we describe a novel approach to reactive tracking of moving targets with a pan-tilt-zoom camera. The approach<br />
uses an extended Kalman filter to jointly track the object position in the real world, its velocity in 3D and the camera intrinsics,<br />
in addition to the rate of change of these parameters. The filter outputs are used as inputs to PID controllers which<br />
continuously adjust the camera motion in order to reactively track the object at a constant image velocity while simultaneously<br />
maintaining a desirable target scale in the image plane. We provide experimental results on simulated and real<br />
tracking sequences to show how our tracker is able to accurately estimate both 3D object position and camera intrinsics<br />
with very high precision over a wide range of focal lengths.<br />
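The PID control loop the camera commands pass through can be sketched as follows; the gains are arbitrary, and in the paper's setting the error would be, e.g., the pixel offset between the tracked target and the image centre:<br />

```python
class PID:
    """Textbook PID controller producing, e.g., a pan rate from a pixel error."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error: float, dt: float) -> float:
        # accumulate the integral term and difference the derivative term
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

Each frame, something like `pan_rate = pid.update(target_x - image_cx, dt)` would be sent to the pan motor; analogous loops drive tilt and zoom.<br />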
13:30-16:30, Paper TuBCT8.14<br />
An Experimental Study of Image Components and Data Metrics for Illumination-Robust Variational Optical Flow<br />
Chetverikov, Dmitry, MTA SZTAKI<br />
Molnar, Jozsef, ELTE<br />
Illumination-robust optical flow algorithms are needed in numerous machine vision applications such as vision-based intelligent<br />
vehicles, surveillance and traffic monitoring. Recently, we have proposed an implicit nonlinear scheme for variational<br />
optical flow that assumes no particular analytical form of energy functional and can accommodate various image<br />
components and data metrics. Using test data with brightness and colour illumination changes, we study different features<br />
and metrics and demonstrate that cross-correlation is superior to the L1 metric for all combinations of the features.<br />
13:30-16:30, Paper TuBCT8.15<br />
Multiple Human Tracking based on Multi-View Upper-Body Detection and Discriminative Learning<br />
Xing, Junliang, Tsinghua Univ.<br />
Ai, Haizhou, Tsinghua Univ.<br />
Lao, Shihong, OMRON Corp.<br />
This paper focuses on the problem of tracking multiple humans in dense environments which is very challenging due to<br />
recurring occlusions between different humans. To cope with the difficulties it presents, an offline boosted multi-view<br />
upper-body detector is used to automatically initialize a new human trajectory and is capable of dealing with partial human<br />
occlusions. What is more, an online learning process is proposed to learn discriminative human observations, including<br />
discriminative interest points and color patches, to effectively track each human when even more occlusions occur. The<br />
offline and online observation models are neatly integrated into the particle filter framework to robustly track multiple<br />
highly interactive humans. Experimental results on the CAVIAR dataset as well as many other challenging real-world cases<br />
demonstrate the effectiveness of the proposed method.<br />
13:30-16:30, Paper TuBCT8.16<br />
Visual Tracking using Sparsity Induced Similarity<br />
Liu, Huaping, Tsinghua Univ.<br />
Sun, Fuchun, Tsinghua Univ.<br />
Sparse signal reconstruction has recently gained considerable interest and is applied in many fields. In this paper, we propose<br />
a new approach that utilizes the sparsity-induced similarity to construct the tracking algorithm. Compared with the state-of-the-art,<br />
the advantage of this approach is that the sparse representation needs to be calculated only once, and therefore<br />
the time cost is dramatically decreased. In addition, extensive experimental comparisons show that the proposed approach<br />
is more robust than some existing approaches.<br />
13:30-16:30, Paper TuBCT8.17<br />
An Information Fusion Approach for Multiview Feature Tracking<br />
Ataer-Cansizoglu, Esra, Boston Univ.<br />
Betke, Margrit, Boston Univ.<br />
We propose an information fusion approach to tracking objects from different viewpoints that can detect and recover from<br />
tracking failures. We introduce a reliability measure that is a combination of terms associated with correlation-based template<br />
matching and the epipolar geometry of the cameras. The measure is computed to evaluate the performance of 2D<br />
trackers in each camera view and detect tracking failures. The 3D object trajectory is constructed using stereoscopy and<br />
evaluated to predict the next 3D position of the object. In case of track loss in one camera view, the projection of the predicted<br />
3D position onto the image plane of this view is used to reinitialize the lost 2D tracker. We conducted experiments<br />
with 34 subjects to evaluate our proposed system on videos of facial feature movements during human-computer interaction.<br />
The system successfully detected feature loss and gave promising results on accurate re-initialization of the feature.<br />
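The epipolar-geometry term of such a reliability measure can be sketched as the distance of the second view's tracked point from the epipolar line induced by the first view; the fundamental matrix below corresponds to a rectified horizontal stereo pair and is chosen purely for illustration:<br />

```python
import math

def epipolar_distance(F, x1, x2):
    """Pixel distance from point x2 (view 2) to the epipolar line F @ x1
    of point x1 (view 1); F is a 3x3 fundamental matrix."""
    p1 = (x1[0], x1[1], 1.0)
    line = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    return abs(line[0] * x2[0] + line[1] * x2[1] + line[2]) / \
        math.hypot(line[0], line[1])

# rectified pair: corresponding points must share the same image row
F_rect = [[0.0, 0.0, 0.0],
          [0.0, 0.0, -1.0],
          [0.0, 1.0, 0.0]]
```

A large distance, or a low template-correlation score, would flag the 2D track as lost and trigger the 3D-prediction-based re-initialization.<br />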
13:30-16:30, Paper TuBCT8.18<br />
Monocular 3D Tracking of Deformable Surfaces using Linear Programming<br />
Wang, Chenhao, Shanghai Jiao Tong Univ.<br />
Li, Xiong, Shanghai Jiao Tong Univ.<br />
Liu, Yuncai, Shanghai Jiao Tong Univ.<br />
We present a method for 3D shape reconstruction of inextensible deformable surfaces from monocular image sequences.<br />
The key to our approach is to represent the surface as a 3D triangulated mesh and formulate the reconstruction problem as<br />
a sequence of Linear Programming (LP) problems that can be solved effectively. The LP problem consists of data constraints,<br />
which are 3D-to-2D keypoint correspondences, and shape constraints, which prevent large changes of the edge orientation<br />
between consecutive frames. Furthermore, we use a refined bisection algorithm to accelerate the computing speed.<br />
The robustness and efficiency of our approach are validated on both synthetic and real data.<br />
13:30-16:30, Paper TuBCT8.19<br />
Exploiting Visual Quasi-Periodicity for Automated Chewing Event Detection using Active Appearance Models and<br />
Support Vector Machines<br />
Cadavid, Steven, Univ. of Miami<br />
Abdel-Mottaleb, Mohamed, Univ. of Miami<br />
We present a method that automatically detects chewing events in surveillance video of a subject. Firstly, an Active Appearance<br />
Model (AAM) is used to track a subject’s face across the video sequence. It is observed that the variations in the<br />
AAM parameters across chewing events demonstrate a distinct periodicity. We utilize this property to discriminate between<br />
chewing and non-chewing facial actions such as talking. A feature representation is constructed by applying spectral analysis<br />
to a temporal window of model parameter values. The estimated power spectra subsequently undergo non-linear dimensionality<br />
reduction via spectral regression. The low-dimensional representations of the power spectra are employed<br />
to train a Support Vector Machine (SVM) binary classifier to detect chewing events. Experimental results yielded a cross<br />
validated percentage agreement of 93.4%, indicating that the proposed system provides an efficient approach to automated<br />
chewing detection.<br />
13:30-16:30, Paper TuBCT8.20<br />
Slip and Fall Events Detection by Analyzing the Integrated Spatiotemporal Energy Map<br />
Huang, Chung-Lin, National Tsing-Hua Univ.<br />
Liao, Tim, National Tsing-Hua Univ.<br />
This paper presents a new method to detect slip and fall events by analyzing an integrated spatiotemporal energy (ISTE)<br />
map. The ISTE map encodes both motion and the time of its occurrence as our motion feature. The extracted human shape is<br />
represented by an ellipse that provides crucial information about human motion activities. We use these features to detect<br />
the events in videos with a non-fixed frame rate. This work assumes that the person lies on the ground with very little motion<br />
after the fall accident. Experimental results show that our method is effective for fall and slip detection.<br />
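The ellipse representation of an extracted silhouette can be sketched from image moments: a tall, near-vertical ellipse suggests standing, a wide, near-horizontal one lying down. The coordinates and any threshold on the angle or axis ratio below are illustrative, not the paper's:<br />

```python
import math

def ellipse_from_points(points):
    """Centroid, major-axis orientation and axis ratio of the ellipse
    fitted to a set of foreground (x, y) pixels via image moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)  # major-axis angle
    common = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    l1 = (mu20 + mu02 + common) / 2  # variance along the major axis
    l2 = (mu20 + mu02 - common) / 2  # variance along the minor axis
    return (cx, cy), theta, math.sqrt(l1 / max(l2, 1e-12))
```

Tracking the angle and ratio over time (via the ISTE map's timing information) is what distinguishes a fall from, say, sitting down slowly.<br />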
13:30-16:30, Paper TuBCT8.21<br />
Color Constancy using Standard Deviation of Color Channels<br />
Choudhury, Anustup, Univ. of Southern California<br />
Medioni, Gerard, Univ. of Southern California<br />
We address the problem of color constancy and propose a new method to achieve it based on the statistics<br />
of images with a color cast. In such images, the standard deviation of one color channel is significantly different<br />
from those of the other color channels. This observation also applies to local patches, and the ratio of the maximum<br />
to minimum standard deviation of the color channels in local patches is used as a prior to select a pixel color as the illumination<br />
color. We provide extensive validation of our method on commonly used datasets with images under varying illumination<br />
conditions, and show our method to be robust to the choice of dataset and at least as good as current state-of-the-art color<br />
constancy approaches.<br />
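The channel-spread prior can be sketched as follows, operating on lists of RGB tuples; what counts as "significantly different" is left open here, as in the abstract:<br />

```python
def channel_stds(pixels):
    """Per-channel standard deviation of a list of (r, g, b) pixels."""
    stds = []
    for ch in range(3):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / len(vals)
        stds.append((sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5)
    return stds

def cast_score(pixels):
    """Ratio of max to min channel std; large values suggest a colour
    cast in the patch (the prior used to pick an illuminant pixel)."""
    stds = channel_stds(pixels)
    return max(stds) / max(min(stds), 1e-9)
```

A patch scoring near 1 has balanced channels, while a high score marks a patch whose dominant channel plausibly reflects the illuminant colour.<br />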
13:30-16:30, Paper TuBCT8.22<br />
Recognizing Human Actions using Key Poses<br />
Baysal, Sermetcan, Bilkent Univ.<br />
Kurt, Mehmet Can, Bilkent Univ.<br />
Duygulu, Pinar, Bilkent Univ.<br />
In this paper, we explore the idea of using only pose, without utilizing any temporal information, for human action recognition.<br />
In contrast to the other studies using complex action representations, we propose a simple method, which relies on<br />
extracting key poses from action sequences. Our contribution is two-fold. Firstly, representing the pose in a frame as a<br />
collection of line-pairs, we propose a matching scheme between two frames to compute their similarity. Secondly, to<br />
extract key poses for each action, we present an algorithm, which selects the most representative and discriminative poses<br />
from a set of candidates. Our experimental results on the KTH and Weizmann datasets have shown that pose information by<br />
itself is quite effective in grasping the nature of an action and sufficient to distinguish one action from another.<br />
13:30-16:30, Paper TuBCT8.23<br />
Action Recognition using Three-Way Cross Correlations Feature of Local Motion Attributes<br />
Matsukawa, Tetsu, Univ. of Tsukuba<br />
Kurita, Takio, National Inst. of Advanced Industrial Science and Technology<br />
This paper proposes a spatio-temporal feature using three-way cross-correlations of local motion attributes for action<br />
recognition. Recently, the cubic higher-order local auto-correlation (CHLAC) feature has shown high classification<br />
performance for action recognition. In previous research, the CHLAC feature was applied to binary motion image sequences<br />
that indicate moving or static points. However, each binary motion image loses information about the type of motion, such<br />
as the timing of change or the motion direction. We can therefore further improve classification accuracy by extending<br />
CHLAC to multivalued motion image sequences that consider several types of local motion attributes. The proposed<br />
method can also be viewed as an extension of the popular bag-of-features approach. Experimental results on two datasets<br />
show that the proposed method outperforms the CHLAC feature and the bag-of-features approach.<br />
- 134 -
13:30-16:30, Paper TuBCT8.24<br />
Discriminative Level Set for Contour Tracking<br />
Li, Wei, Chinese Acad. of Sciences<br />
Conventional contour tracking algorithms with level set often use generative models to construct the energy function. For<br />
tracking through cluttered and noisy background, however, a generative model may not be discriminative enough. In this<br />
paper we integrate the discriminative methods into a level set framework when constructing the level set energy function.<br />
We train a set of weak classifiers to distinguish the object from the background. Each weak classifier is designed to select<br />
the most discriminative feature space, and the classifiers are integrated via AdaBoost according to their training errors. We also introduce a<br />
novel interaction term to explore the correlation between pixels near the object edge. This term, together with the discriminative<br />
model, enhances the discriminative power of the level set. The experimental results show that the contour<br />
tracked by our approach is more accurate than that of conventional algorithms with generative models. Our algorithm successfully<br />
tracks the object contour even in a cluttered environment.<br />
13:30-16:30, Paper TuBCT8.25<br />
Tracking Objects with Adaptive Feature Patches for PTZ Camera Visual Surveillance<br />
Xie, Yi, Beijing Inst. of Tech.<br />
Lin, Liang, Lotushill Inc<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
Compared to traditional tracking with fixed cameras, PTZ-camera-based tracking is more challenging due to (i) the<br />
lack of reliable background modeling and subtraction, and (ii) sudden and drastic changes in the appearance and scale of the target.<br />
To tackle these problems, this paper proposes a novel tracking algorithm using patch-based object models and<br />
demonstrates its advantages with a PTZ camera in the application of visual surveillance. In our method, the target model<br />
is learned and represented by a set of feature patches whose discriminative power is higher than that of others. The target model<br />
is matched and evaluated by both appearance and motion consistency measurements. The homography between frames is<br />
also calculated for scale adaptation. Experiments on several surveillance videos show that our method outperforms<br />
state-of-the-art approaches.<br />
13:30-16:30, Paper TuBCT8.26<br />
Counting Moving People in Videos by Salient Points Detection<br />
Conte, Donatello, Univ. di Salerno<br />
Foggia, Pasquale, Univ. di Salerno<br />
Percannella, Gennaro, Univ. di Salerno<br />
Tufano, Francesco, Univ. degli Studi di Salerno<br />
Vento, Mario, Univ. degli Studi di Salerno<br />
This paper presents a novel method to count people for video surveillance applications. The problem is faced by establishing<br />
a mapping between some scene features and the number of people. Moreover, the proposed technique takes specifically<br />
into account problems due to perspective. In the experimental evaluation, the method has been compared with<br />
the algorithm by Albiol et al., which provided the highest performance at the PETS 2009 contest on people counting, using<br />
the same datasets. The results confirm that the proposed method improves the accuracy, while retaining the robustness of<br />
Albiol’s algorithm.<br />
13:30-16:30, Paper TuBCT8.27<br />
Visualization of Customer Flow in an Office Complex over a Long Period<br />
Onishi, Masaki, National Inst. of Advanced Industrial Science and Technology<br />
Yoda, Ikushi, National Inst. of Advanced Industrial Science and Technology<br />
In facility management, analysis of customer trajectories in office complexes is considered critical. In this paper, we<br />
propose a novel approach for the visualization of customer flow in an office complex over a long period of time. We expressed<br />
the variation in the trajectories with respect to time by using a mixture model; this was used for the visualization<br />
of the trajectory flows. The effectiveness of our approach was evaluated from the results of the customer flow analysis experiments<br />
that were conducted in an office complex.<br />
- 135 -
13:30-16:30, Paper TuBCT8.28<br />
Incremental MPCA for Color Object Tracking<br />
Wang, Dong, Department of Electronic Engineering<br />
Lu, Hu-Chuan, Dalian Univ. of Tech.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
The task of visual tracking is to deal with dynamic image streams that change over time. For color object tracking, although<br />
a color object is in essence a third-order tensor, little attention has been paid to this attribute. In this paper, we propose a<br />
novel Incremental Multiple Principal Component Analysis (IMPCA) method for online learning of dynamic tensor streams.<br />
When a newly added tensor set arrives, the mean tensor and the covariance matrices of the different modes can be updated easily,<br />
and the projection matrices can then be calculated efficiently from the covariance matrices. Finally, we apply our IMPCA method<br />
to color object tracking using a Bayesian inference framework. Experiments are performed on several challenging public video sequences and our<br />
own. The experimental results demonstrate that the proposed method achieves good performance.<br />
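The incremental statistics update that the abstract relies on can be sketched with the standard merge formulas for a running mean and scatter matrix (an illustrative stand-in for one mode of the tensor model, not the paper's IMPCA derivation; the function name and batch interface are hypothetical):

```python
import numpy as np

def update_mean_scatter(mean, scatter, n, new_samples):
    """Merge running statistics with a newly arrived sample batch:
    the mean vector and the scatter matrix (sum of outer products of
    centered samples) are updated without revisiting old data."""
    X = np.atleast_2d(new_samples)
    m = X.shape[0]
    batch_mean = X.mean(axis=0)
    merged_mean = (n * mean + m * batch_mean) / (n + m)
    Xc = X - batch_mean
    batch_scatter = Xc.T @ Xc
    # Cross term accounts for the shift between the two batch means
    d = (batch_mean - mean).reshape(-1, 1)
    merged_scatter = scatter + batch_scatter + (n * m / (n + m)) * (d @ d.T)
    return merged_mean, merged_scatter, n + m
```

The merged statistics equal a batch recomputation over all samples, so projection matrices can be refreshed from the updated scatter as new frames arrive.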
13:30-16:30, Paper TuBCT8.29<br />
Epipolar-Based Stereo Tracking without Explicit 3D Reconstruction<br />
Gaschler, Andre Karlheinz, Tech. Univ. München<br />
Burschka, Darius, Tech. Univ. München<br />
Hager, Gregory<br />
We present a general framework for tracking image regions in two views simultaneously based on sum-of-squared differences<br />
(SSD) minimization. Our method allows for motion models up to affine transformations. Contrary to earlier approaches, we<br />
incorporate the well-known epipolar constraints directly into the SSD optimization process. Since the epipolar geometry can<br />
be computed from the image directly, no prior calibration is necessary. Our algorithm has been tested in different applications<br />
including camera localization, wide-baseline stereo, object tracking and medical imaging. We show experimental results on<br />
robustness and accuracy compared to the known ground truth given by a conventional tracking device.<br />
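A minimal illustration of the SSD-minimization idea, restricted here to integer translations found by exhaustive search rather than the paper's affine, epipolar-constrained optimization (all names are hypothetical):

```python
import numpy as np

def ssd_track(image, template, search=5, init=(0, 0)):
    """Locate `template` in `image` by exhaustive sum-of-squared-differences
    (SSD) minimization over integer translations within +/-`search` pixels
    of the initial position `init` (row, col)."""
    th, tw = template.shape
    best, best_pos = np.inf, init
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = init[0] + dy, init[1] + dx
            # Skip candidate windows that fall outside the image
            if y < 0 or x < 0 or y + th > image.shape[0] or x + tw > image.shape[1]:
                continue
            patch = image[y:y + th, x:x + tw]
            ssd = np.sum((patch - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best
```

In practice the full method optimizes richer motion models with gradient-based updates; this brute-force sketch only shows the objective being minimized.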
13:30-16:30, Paper TuBCT8.30<br />
Human Body Parts Tracking using Sequential Markov Random Fields<br />
Cao, Xiao-Qin, City Univ. of Hong Kong<br />
Zeng, Jia, Soochow University<br />
Liu, Zhi-Qiang, City Univ. of Hong Kong<br />
Automatically tracking human body parts is a difficult problem because of background clutter, missing body parts, and the<br />
high degrees of freedom and complex kinematics of the articulated human body. This paper presents sequential Markov<br />
random fields (SMRFs) for tracking and labeling moving human body parts automatically by learning the spatio-temporal<br />
structures of human motions in the presence of occlusions and clutter. We employ a hybrid strategy, where the temporal dependencies<br />
between two successive human poses are described by a sequential Monte Carlo method, and the spatial relationships<br />
between body parts within a pose are described by Markov random fields. Efficient inference and learning algorithms<br />
are developed based on relaxation labeling. Experimental results show that the SMRF can effectively track human body<br />
parts in natural scenes.<br />
13:30-16:30, Paper TuBCT8.31<br />
Action Recognition in Videos using Nonnegative Tensor Factorization<br />
Krausz, Barbara, Fraunhofer IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Recognizing human actions is of vital interest in video surveillance or ambient assisted living. We consider an action as a<br />
sequence of body poses which are themselves a linear combination of body parts. In an offline procedure, nonnegative tensor<br />
factorization is used to extract basis images that represent body parts. The weighting coefficients are obtained by filtering a<br />
frame with the set of basis images. Since the basis images are obtained from nonnegative tensor factorization, they are separable<br />
and filtering can be implemented efficiently. The weighting coefficients encode dynamics and are used for action<br />
recognition. In the proposed action recognition framework, neither explicit detection and tracking of humans nor background<br />
subtraction is needed. Furthermore, for recognizing location-specific actions, we implicitly take scene objects into account.<br />
- 136 -
13:30-16:30, Paper TuBCT8.32<br />
Action Detection in Crowded Videos using Masks<br />
Guo, Ping, Beijing Jiaotong Univ.<br />
Miao, Zhenjiang, Beijing Jiaotong Univ.<br />
In this paper, we investigate the task of human action detection in crowded videos. Different from action analysis in clean<br />
scenes, action detection in crowded environments is difficult due to the cluttered backgrounds, high densities of people<br />
and partial occlusions. This paper proposes a method for action detection based on masks. No human segmentation or<br />
tracking technique is required. To cope with the cluttered and crowded backgrounds, shape and motion templates are built<br />
and the shape templates are used as masks for feature refining. In order to handle the partial occlusion problem, only the<br />
moving body parts in each motion are involved in action training. Experiments using our approach are conducted on the<br />
CMU dataset with encouraging results.<br />
13:30-16:30, Paper TuBCT8.33<br />
3D Model based Vehicle Tracking using Gradient based Fitness Evaluation under Particle Filter Framework<br />
Zhang, Zhaoxiang, Beihang Univ.<br />
Huang, Kaiqi, Chinese Academy of Sciences<br />
Tan, Tieniu, Chinese Academy of Sciences<br />
Wang, Yunhong, Beihang Univ.<br />
We address the problem of 3D-model-based vehicle tracking from monocular videos of calibrated traffic scenes. A 3D<br />
wire-frame model is set up as prior information, and an efficient fitness evaluation method based on image gradients is introduced<br />
to estimate the fitness score between the projection of the vehicle model and the image data, which is then incorporated<br />
into a particle-filter-based framework for robust vehicle tracking. Numerous experiments are conducted, and the experimental<br />
results demonstrate the effectiveness of our approach for accurate vehicle tracking and its robustness to noise and occlusions.<br />
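The generic predict-weight-resample cycle of the particle filter framework the abstract builds on can be sketched as follows (a 1D toy state with a Gaussian motion model and a synthetic fitness function, not the paper's vehicle model or gradient-based fitness evaluation):

```python
import numpy as np

def particle_filter_step(particles, motion_std, fitness, rng):
    """One predict-weight-resample cycle of a generic particle filter."""
    # Predict: diffuse each state hypothesis with Gaussian motion noise
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: score every particle with the fitness evaluation
    weights = np.array([fitness(p) for p in particles])
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```

In the paper's setting the state would be the vehicle pose and the fitness would come from gradient agreement between the projected wire-frame model and the image.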
13:30-16:30, Paper TuBCT8.34<br />
Recovering 3D Shape and Light Source Positions from Non-Planar Shadows<br />
Yamashita, Yukihiro, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
Recently, the Shadow Graph has been proposed for recovering 3D shapes from shadows projected on curved surfaces. Unfortunately,<br />
this method incurs a large computational cost. In this paper, we introduce the 1D Shadow Graph, which can be<br />
used for recovering 3D shapes at a much smaller computational cost. We also extend our method so that we can estimate<br />
both 3D shapes and light source positions simultaneously when both are unknown.<br />
13:30-16:30, Paper TuBCT8.35<br />
3D Contour Model Creation for Stereo-Vision Systems<br />
Maruyama, Kenichi, National Inst. of Advanced Industrial Science and Tech.<br />
Kawai, Yoshihiro, National Inst. of Advanced Industrial Science and Tech.<br />
Tomita, Fumiaki, National Inst. of Advanced Industrial Science and Tech.<br />
The present paper describes a method for automatic 3D contour model creation for stereo-vision systems. The object<br />
model is a triangular surface mesh and a set of aspect models, which consists of model features and model points. Model<br />
features and model points are generated using 3D contours, which are estimated by the projected images of the triangular<br />
surface mesh from multiple discrete viewing directions. Using a non-photorealistic rendering approach, we extract not<br />
only the outer contours but also the inner contours of the projected images. Using both the inner and outer contours of the<br />
projected images, we create the object model which has 3D inner contour features and 3D contour generator features. Experimental<br />
results obtained using the 3D localization algorithm demonstrate the effectiveness of the proposed model.<br />
- 137 -
13:30-16:30, Paper TuBCT8.36<br />
Multibody Motion Classification using the Geometry of 6 Points in 2D Images<br />
Nordberg, Klas, Linköping Univ.<br />
Zografos, Vasileios, Linköping Univ.<br />
We propose a method for segmenting an arbitrary number of moving objects using the geometry of 6 points in 2D images to<br />
infer motion consistency. This geometry allows us to determine whether or not observations of 6 points over several frames<br />
are consistent with a rigid 3D motion. The matching between observations of the 6 points and an estimated model of their<br />
configuration in 3D space is quantified in terms of a geometric error derived from distances between the points and 6 corresponding<br />
lines in the image. This leads to a simple motion inconsistency score, derived from the geometric errors of the<br />
6 points, which in the ideal case is zero when the motion of the points can be explained by a rigid 3D motion. Initial<br />
clusters are determined in the spatial domain and merged in the motion trajectory domain based on this score. Each point is then<br />
assigned to a cluster by assigning the point to the segment with the lowest score. Our algorithm has been tested with real image<br />
sequences from the Hopkins155 database with very good results, competing with state-of-the-art methods, particularly<br />
for degenerate motion sequences. In contrast to motion segmentation methods based on multi-body factorization, which assume<br />
an affine camera model, the proposed method allows the mapping from 3D space to the 2D image to be fully projective.<br />
13:30-16:30, Paper TuBCT8.37<br />
Reflection Removal in Colour Videos<br />
Conte, Donatello, Univ. di Salerno<br />
Foggia, Pasquale, Univ. di Salerno<br />
Percannella, Gennaro, Univ. di Salerno<br />
Tufano, Francesco, Univ. degli Studi di Salerno<br />
Vento, Mario, Univ. degli Studi di Salerno<br />
This paper presents a novel method for reflection removal in the context of an object detection system. The method is based<br />
on chromatic properties of the reflections and does not require a geometric model of the objects. An experimental evaluation<br />
of the proposed method has been performed on a large database, showing its effectiveness.<br />
13:30-16:30, Paper TuBCT8.38<br />
A Compound MRF Texture Model<br />
Haindl, Michael, Inst. of Information Theory and Aut.<br />
Havlicek, Vojtech, Inst. of Information Theory and Aut.<br />
This paper describes a novel compound Markov random field model capable of realistic modelling of the multispectral bidirectional<br />
texture function, which is currently the most advanced representation of the visual properties of surface materials. The<br />
proposed compound Markov random field model combines a non-parametric control random field with an analytically solvable<br />
wide-sense Markov representation for single regions, and thus avoids demanding Markov chain Monte Carlo methods<br />
for both parameter estimation and compound random field synthesis.<br />
13:30-16:30, Paper TuBCT8.39<br />
Shape Prototype Signatures for Action Recognition<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Riemenschneider, Hayko, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Recognizing human actions in video sequences is frequently based on analyzing the shape of the human silhouette as the<br />
main feature. In this paper we introduce a method for recognizing different actions by comparing signatures of similarities<br />
to pre-defined shape prototypes. In training, we build a vocabulary of shape prototypes by clustering a training set of human<br />
silhouettes and calculate prototype similarity signatures for all training videos. During testing, a prototype signature is calculated<br />
for the test video and is aligned to each training signature by dynamic time warping. A simple voting scheme over<br />
the similarities to the training videos provides action classification results and temporal alignments to the training videos.<br />
Experimental evaluation on a reference data set demonstrates that state-of-the-art results are achieved.<br />
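The dynamic time warping alignment used to compare prototype signatures can be sketched as follows (scalar per-frame signatures for brevity; the paper's signatures are vectors of similarities to shape prototypes, and the function name is hypothetical):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D signature sequences,
    allowing non-linear temporal alignment of the frames."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best alignment extends a match, an insertion, or a deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

The test signature would be aligned against each training signature this way, with a voting scheme over the resulting distances giving the action label.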
- 138 -
13:30-16:30, Paper TuBCT8.40<br />
Shape Guided Maximally Stable Extremal Region (MSER) Tracking<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Riemenschneider, Hayko, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Maximally Stable Extremal Regions (MSERs) are one of the most prominent interest region detectors in computer vision<br />
due to their powerful properties and low computational demands. In general, MSERs are detected in single images, but given<br />
image sequences as input, the repeatability of MSER detection can be improved by exploiting correspondences between<br />
subsequent frames through feature-based analysis. Such an approach fails during fast movements, in heavily cluttered scenes, and<br />
in images containing several similarly sized regions, because of the simple feature-based analysis. In this paper we propose an<br />
extension of MSER tracking that considers shape similarity as a strong cue for defining the frame-to-frame correspondences.<br />
Efficient calculation of shape similarity scores ensures that real-time capability is maintained. Experimental evaluation<br />
demonstrates improved repeatability and an application for tracking weakly textured, planar objects.<br />
13:30-16:30, Paper TuBCT8.41<br />
Locating People in Images by Optimal Cue Integration<br />
Atienza-Vanacloig, Vicente, Pol. Univ. of Valencia<br />
Rosell Ortega, Juan, Pol. Univ. of Valencia<br />
Andreu-Garcia, Gabriela, Pol. Univ. of Valencia<br />
Valiente, Jose Miguel, Pol. Univ. of Valencia<br />
This paper describes an approach to segment and locate people in crowded scenarios, with application to a surveillance system<br />
for airport dependencies. To obtain robust operation, the system analyzes a variety of visual cues (color, motion and shape)<br />
and integrates them optimally. A general method for the automatic inference of optimal cue integration rules is presented. This<br />
scheme, based on supervised training on video sequences, avoids the need to explicitly formulate combination rules based<br />
on a priori constraints. The performance of the system is at least as good as classical fusion strategies such as those based on<br />
voting, because the optimized decision engine implicitly includes these and other strategies.<br />
13:30-16:30, Paper TuBCT8.42<br />
Visual Tracking Algorithm using Pixel-Pair Feature<br />
Nishida, Kenji, National Inst. of Advanced Industrial Science and Tech.<br />
Kurita, Takio, National Inst. of Advanced Industrial Science and Tech.<br />
Ogiuchi, Yasuo, Sumitomo Electric Industries Ltd.<br />
Higashikubo, Masakatsu, Sumitomo Electric Industries Ltd.<br />
A novel visual tracking algorithm is proposed in this paper. The algorithm uses pixel-pair features to discriminate between<br />
an image patch with the object in the correct position and image patches with the object in an incorrect position. The pixel-pair<br />
feature is robust to illumination changes, and also to partial occlusion when appropriate<br />
features are selected in every video frame. The tracking precision for a deforming object (a skier) is examined, and an occlusion<br />
detection method is also described.<br />
13:30-16:30, Paper TuBCT8.43<br />
Self-Calibration of Radially Symmetric Distortion by Model Selection<br />
Fujiki, Jun, National Inst. of Advanced Industrial Science and Tech.<br />
Hino, Hideitsu, Waseda Univ.<br />
Usami, Yumi, Waseda Univ.<br />
Akaho, Shotaro, National Inst. of Advanced Industrial Science and Tech.<br />
Murata, Noboru, Waseda Univ.<br />
For the self-calibration of general radially symmetric distortion (RSD) in omnidirectional cameras such as fish-eye lenses, calibration<br />
parameters are usually estimated so that curved lines, which are supposed to be straight in the real world, are mapped<br />
to straight lines in the calibrated image, which is assumed to be taken by an ideal pin-hole camera. In this paper, a method<br />
of calibrating RSD is introduced based on the notion of principal component analysis (PCA). In the proposed method, the<br />
distortion function, which maps a distorted image to an ideal pin-hole camera image, is assumed to be a linear combination<br />
of a certain class of basis functions, and an algorithm for estimating its coefficients using line patterns is given. Then a<br />
method of selecting good basis functions is proposed, which aims to realize appropriate calibration in practice. Experimental<br />
results for synthetic data and real images are presented to demonstrate the performance of our calibration method.<br />
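The core linear-estimation idea, expressing the undistortion map as a linear combination of basis functions and solving for the coefficients by least squares, can be sketched with a monomial basis (an illustrative choice on synthetic radius samples; the paper's basis class, line-pattern constraints, and model selection step are not reproduced here):

```python
import numpy as np

def fit_distortion(r_d, r_u, degree=3):
    """Fit the undistortion map r_u = sum_k c_k * r_d**k (k = 1..degree)
    as a linear combination of monomial basis functions via least squares."""
    # Stack the basis function evaluations as columns of the design matrix
    B = np.stack([r_d ** k for k in range(1, degree + 1)], axis=1)
    coeffs, *_ = np.linalg.lstsq(B, r_u, rcond=None)
    return coeffs

# Synthetic check: samples generated from a known cubic distortion model
r_d = np.linspace(0.0, 1.0, 50)
true_c = np.array([1.0, 0.0, 0.25])   # r_u = r_d + 0.25 * r_d**3
r_u = r_d + 0.25 * r_d ** 3
c = fit_distortion(r_d, r_u)
```

In the actual method the coefficients are constrained by observed line patterns rather than known radius pairs; the linear-in-coefficients structure is what makes that estimation tractable.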
13:30-16:30, Paper TuBCT8.44<br />
A Global Spatio-Temporal Representation for Action Recognition<br />
Deng, Chao, Tianjin Univ.<br />
Cao, Xiaochun, Tianjin Univ.<br />
Liu, Hanyu, Univ. of Southern Mississippi<br />
Chen, Jian, Univ. of Southern Mississippi<br />
In this paper we introduce an effective method to construct a global spatio-temporal representation for action recognition.<br />
This representation is inspired by the fact that human actions can be treated as 3D shapes induced by the silhouettes in the<br />
space-time volume. We estimate the silhouettes, which contain detailed shape information about the action, and present an<br />
efficient sampling method to extract interest points along the silhouettes. Each local interest point is represented by a spatio-temporal<br />
descriptor based on 2D DAISY. Our global space-time representation is the integration of these local descriptors<br />
in order along the silhouette. In this manner, we utilize not only the static shape information but also the spatio-temporal<br />
cue. We have obtained impressive results on publicly available action datasets.<br />
13:30-16:30, Paper TuBCT8.45<br />
Super-Resolution Texture Mapping from Multiple View Images<br />
Iiyama, Masaaki, Kyoto Univ.<br />
Kakusho, Koh, Kwansei Gakuin Univ.<br />
Minoh, Michihiko, Kyoto Univ.<br />
This paper presents artifact-free super-resolution texture mapping from multiple-view images. The multiple-view images<br />
are upscaled with a learning-based super-resolution technique and are mapped onto a 3D mesh model. However, mapping<br />
multiple-view images onto a 3D model is not an easy task, because artifacts may appear when different upscaled images are<br />
mapped onto neighboring meshes. We define a cost function that becomes large when artifacts appear on neighboring meshes,<br />
and our method seeks the image-and-mesh assignment that minimizes the cost function. Experimental results with real images<br />
demonstrate the effectiveness of our method.<br />
13:30-16:30, Paper TuBCT8.46<br />
Automatic Weak Calibration of Master-Slave Surveillance System based on Mosaic Image<br />
Li, You, Shanghai Jiao Tong University<br />
Song, Li, Shanghai Jiao Tong University<br />
Wang, Jia, Shanghai Jiao Tong University<br />
A master-slave camera surveillance system is composed of one (or more) wide-FOV (field-of-view) static camera and one (or<br />
more) dynamic PTZ (pan-tilt-zoom) camera. In such a system, the master camera monitors a wide field and provides positional<br />
information about objects of interest to the slave camera so that it can dynamically track them. This paper describes a novel method<br />
for the calibration of master-slave surveillance systems. The method uses a mosaic image created from snapshots of the slave camera to estimate<br />
the relationship between the static master camera plane and the pan-tilt controls of the slave camera. Compared with other approaches,<br />
this solution provides an efficient and automatic way to calibrate a master-slave system.<br />
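The final fitting step of such a calibration, mapping master-camera pixels to slave pan-tilt controls once correspondences are available, can be sketched as a least-squares affine fit (a simplification of the mosaic-based estimation described above; the affine form, the function name, and the sample values are assumptions):

```python
import numpy as np

def fit_pixel_to_pantilt(pixels, pantilts):
    """Least-squares affine map [pan, tilt]^T = M @ [u, v, 1]^T from
    corresponding master-camera pixels and slave pan/tilt readings."""
    # Homogeneous pixel coordinates: append a column of ones
    P = np.hstack([pixels, np.ones((len(pixels), 1))])
    M, *_ = np.linalg.lstsq(P, pantilts, rcond=None)
    return M.T  # 2 x 3 affine calibration matrix

# Hypothetical correspondences generated from a known ground-truth map
pixels = np.array([[100.0, 50.0], [400.0, 60.0], [120.0, 300.0], [380.0, 310.0]])
M_true = np.array([[0.1, 0.0, -20.0], [0.0, 0.08, -5.0]])
pantilts = np.hstack([pixels, np.ones((4, 1))]) @ M_true.T
M = fit_pixel_to_pantilt(pixels, pantilts)
```

A real system would replace the affine form with the homography/pan-tilt relationship estimated from the mosaic; the least-squares machinery stays the same.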
13:30-16:30, Paper TuBCT8.47<br />
Reconstruction-Free Parallel Planes Identification from Uncalibrated Images<br />
Habed, Adlane, Univ. de Bourgogne<br />
Amintabar, Amirhasan, Univ. of Windsor<br />
Boufama, Boubakeur, Univ. of Windsor<br />
This paper proposes a new method for identifying parallel planes in a scene from three or more uncalibrated images. By<br />
using the fact that parallel planes intersect at infinity, we were able to devise a linear relationship between the inter-image<br />
homographies of the parallel planes and the plane at infinity. This relationship is combined with the so-called modulus constraint<br />
for identifying pairs of parallel planes solely from point correspondences. Experiments with both synthetic and real<br />
images have validated our method.<br />
- 140 -
13:30-16:30, Paper TuBCT8.48<br />
Accurate Dense Stereo by Constraining Local Consistency on Superpixels<br />
Mattoccia, Stefano, Univ. of Bologna<br />
Segmentation is a low-level vision cue often deployed by stereo algorithms to assume that disparity within superpixels<br />
varies smoothly. In this paper, we show that constraining, on a superpixel basis, the cues provided by a recently proposed<br />
technique, which explicitly models local consistency among neighboring points, yields accurate and dense disparity fields.<br />
Our proposal, starting from the initial disparity hypotheses of a fast dense stereo algorithm based on scan line optimization,<br />
demonstrates its effectiveness by enabling us to obtain results comparable to top-ranked algorithms based on iterative disparity<br />
optimization methods.<br />
13:30-16:30, Paper TuBCT8.49<br />
On-Line Structure and Motion Estimation based on a Novel Parameterized Extended Kalman Filter<br />
Haner, Sebastian, Lund Univ. of Tech.<br />
Heyden, Anders, Lund Univ.<br />
Estimation of structure and motion in computer vision systems can be performed using a dynamic systems approach,<br />
where states and parameters in a perspective system are estimated. We present a novel on-line method for structure and<br />
motion estimation in densely sampled image sequences. The proposed method is based on an extended Kalman filter and<br />
a novel parameterization. We assume calibrated cameras and derive a dynamic system describing the motion of the camera<br />
and the image formation. By a change of coordinates, we represent this system by normalized image coordinates and the<br />
inverse depths. Then we apply an extended Kalman filter for estimation of both structure and motion. The performance of<br />
the proposed method is demonstrated in both simulated and real experiments. We furthermore compare our method to the<br />
unified inverse depth parameterization and show that we achieve superior results.<br />
13:30-16:30, Paper TuBCT8.51<br />
Discriminant and Invariant Color Model for Tracking under Abrupt Illumination Changes<br />
Scandaliaris, Jorge, CSIC-UPC<br />
Sanfeliu, Alberto, Univ. Pol. De Catalunya<br />
The output of a color imaging sensor, or apparent color, can change considerably with illumination conditions and<br />
scene geometry. In this work we take into account the dependence of apparent color on illumination in an attempt<br />
to find appropriate color models for the typical conditions found in outdoor settings. We evaluate three color-based trackers:<br />
one based on hue, another based on an intrinsic image representation, and a third based on a proposed combination of<br />
a chromaticity model with a physically reasoned adaptation of the target model. The evaluation is done on outdoor sequences<br />
with challenging illumination conditions, and shows that the proposed method improves the average track completeness<br />
by over 22% over the hue-based tracker and the track closeness by over 7% over the tracker based on the<br />
intrinsic image representation.<br />
13:30-16:30, Paper TuBCT8.52<br />
Using Local Affine Invariants to Improve Image Matching<br />
Fleck, Daniel, George Mason Univ.<br />
Duric, Zoran, George Mason Univ.<br />
A method to classify tentative feature matches as inliers or outliers to a transformation model is presented. It is well known<br />
that ratios of areas of corresponding shapes are affine invariants [6]. Our algorithm uses consistency of ratios of areas in<br />
pairs of images to classify matches as inliers or outliers. The method selects four matches within a region, and generates<br />
all possible corresponding triangles. All matches are classified as inliers or outliers based on the variance among the ratio<br />
of areas of the triangles. The selected inliers are used to compute a homography transformation. We present experimental<br />
results showing significant improvements over the baseline RANSAC algorithm for pairs of images from the Zurich Building<br />
Database.<br />
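The affine invariance the method relies on is easy to verify numerically: under any affine map, every triangle's area scales by the same factor |det A|, so ratios of areas are preserved (a minimal sketch with arbitrary points, not the paper's match-classification procedure):

```python
import numpy as np

def tri_area(p, q, r):
    """Signed area of triangle pqr via the 2D cross product."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

# Four points and an arbitrary affine map x -> A x + t
pts = np.array([[0.0, 0.0], [2.0, 0.5], [1.0, 3.0], [4.0, 2.0]])
A = np.array([[1.3, 0.4], [-0.2, 0.9]])
t = np.array([5.0, -2.0])
mapped = pts @ A.T + t

# Ratio of areas of two triangles built from the four points:
# both areas scale by det(A) under the map, so the ratio cancels
r_before = tri_area(*pts[[0, 1, 2]]) / tri_area(*pts[[0, 1, 3]])
r_after = tri_area(*mapped[[0, 1, 2]]) / tri_area(*mapped[[0, 1, 3]])
```

In the algorithm, low variance among such ratios over the triangles of four matched points marks the matches as inliers to an affine model.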
- 141 -
13:30-16:30, Paper TuBCT8.53<br />
Segmenting Video Foreground using a Multi-Class MRF<br />
Dickinson, Patrick, Univ. of Lincoln<br />
Hunter, Andrew, Univ. of Lincoln<br />
Appiah, Kofi, Lincoln Univ.<br />
Methods of segmenting objects of interest from video data typically use a background model to represent an empty, static<br />
scene. However, dynamic processes in the background, such as moving foliage and water, can act to undermine the robustness<br />
of such methods and result in false positive object detections. Techniques for reducing errors have been proposed,<br />
including Markov Random Field (MRF) based pixel classification schemes, and also the use of region-based models. The<br />
work we present here combines these two approaches, using a region-based background model to provide robust likelihoods<br />
for multi-class MRF pixel labelling. Our initial results show the effectiveness of our method, by comparing performance<br />
with an analogous per-pixel likelihood model.<br />
13:30-16:30, Paper TuBCT8.54<br />
Real-Time Pose Regression with Fast Volume Descriptor Computation<br />
Hirai, Michiro, NAIST<br />
Ukita, Norimichi, Nara Inst. of Science and Tech.<br />
Kidode, Masatsugu, NAIST<br />
We present a real-time method for estimating the pose of a human body using its 3D volume obtained from synchronized<br />
videos. The method achieves pose estimation by pose regression from its 3D volume. While the 3D volume allows us to<br />
estimate the pose robustly against self-occlusions, 3D volume analysis incurs a high computational cost. We<br />
propose fast and stable volume tracking with efficient volume representation in a low dimensional dynamical model. Experimental<br />
results demonstrated that pose estimation of a body with a significantly deformable clothing could run at around<br />
60 fps.<br />
TuBCT9 Lower Foyer<br />
Document Analysis Poster Session<br />
Session chair: Arica, Nafiz (Turkish Naval Academy)<br />
13:30-16:30, Paper TuBCT9.1<br />
Robust Staffline Thickness and Distance Estimation in Binary and Gray-Level Music Scores<br />
Cardoso S., Jaime, Univ. do Porto<br />
Silva, Rebelo, Ana Maria, Univ. do Porto<br />
The optical recognition of handwritten musical scores by computers remains far from ideal. Most OMR algorithms rely<br />
on an estimation of the staff line thickness and the vertical line distance within the same staff. Subsequent operations can<br />
use these values as references, dismissing the need for predetermined threshold values. In this work we improve on<br />
previous conventional estimates of these two reference lengths. We start by proposing a new method for binarized music<br />
scores and then extend the approach to gray-level music scores. An experimental study with 50 images is used to assess<br />
the merits of the novel method.<br />
13:30-16:30, Paper TuBCT9.2<br />
Hierarchical Decomposition of Handwriting Deformation Vector Field for Improving Recognition Accuracy<br />
Wakahara, Toru, Hosei Univ.<br />
Uchida, Seiichi, Kyushu Univ.<br />
This paper addresses the problem of how to extract, describe, and evaluate handwriting deformation from the deterministic<br />
viewpoint for improving recognition accuracy. The key ideas are threefold. The first is to extract handwriting deformation<br />
vector field (DVF) between a pair of input and target images by 2D warping. The second is to hierarchically decompose<br />
the DVF by a parametric deformation model of global/local affine transformation, where local affine transformation is iteratively<br />
applied to the DVF by decreasing window sizes. The third is to accept only low-order deformation components<br />
as natural, within-class handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B<br />
show that correlation-based matching absorbing components of global affine transformation and local affine transformation<br />
up to the 3rd order achieved a recognition rate of 92.1%, higher than the 87.0% obtained by the original 2D warping.<br />
13:30-16:30, Paper TuBCT9.3<br />
Prototype-Based Methodology for the Statistical Analysis of Local Features in Stereotypical Handwriting Tasks<br />
O’Reilly, Christian, École Pol. De Montreal<br />
Plamondon, Réjean, École Pol. De Montréal<br />
A three-step methodology is proposed to derive consistent sets of local features which may be easily compared between<br />
the different samples of a stereotypical human handwriting movement, allowing the statistical analysis of its local variability.<br />
This technique is illustrated using the Sigma-Lognormal modeling of on-line triangular trajectory patterns obtained from<br />
a standardized neuromuscular task. The overall approach can be adapted and generalized to the analysis of the end-effector<br />
kinematics of many planar upper limb movements.<br />
13:30-16:30, Paper TuBCT9.4<br />
The Snippet Statistics of Font Recognition<br />
Lidke, Jakub, Fraunhofer IAIS<br />
Thurau, Christian, Fraunhofer IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
This paper considers the topic of automatic font recognition. The task is to recognize a specific font from a text snippet.<br />
Unlike previous contributions, we evaluate how the frequencies of certain letters or words influence automatic recognition<br />
systems. The evaluation provides estimates on the general feasibility of font recognition under various changing conditions.<br />
Results on a data-set containing 747 different fonts show that precision can vary between 16% and 94%, depending on<br />
(i) which letters are provided, (ii) how many letters are provided, and (iii) which language is used, as these factors considerably<br />
influence the text snippet statistics. As a second contribution, we introduce a novel bag-of-features based approach<br />
to font recognition.<br />
13:30-16:30, Paper TuBCT9.5<br />
A Study of Designing Compact Recognizers of Handwritten Chinese Characters using Multiple-Prototype based<br />
Classifiers<br />
Wang, Yongqiang, The Univ. of Hong Kong<br />
Huo, Qiang, Microsoft Res. Asia<br />
We present a study of designing compact recognizers of handwritten Chinese characters using multiple-prototype based<br />
classifiers. A modified Quickprop algorithm is proposed to optimize a sample-separation-margin based minimum classification<br />
error objective function. Split vector quantization technique is used to compress classifier parameters. Benchmark<br />
results are reported for classifiers with different footprints trained from about 10 million samples on a recognition task<br />
with a vocabulary of 9282 character classes which include 9119 Chinese characters, 62 alphanumeric characters, 101<br />
punctuation marks and symbols.<br />
13:30-16:30, Paper TuBCT9.6<br />
Membership Functions for Zoning-Based Recognition of Handwritten Digits<br />
Impedovo, Sebastiano, Univ. degli Studi di Bari<br />
Impedovo, Donato, Pol. Di Bari<br />
Pirlo, Giuseppe, Univ. degli Studi di Bari<br />
Modugno, Raffaele, Univ. of Bari “Aldo Moro”<br />
This paper focuses on the role of membership functions in zoning-based classification. In fact, the effectiveness of a zoning<br />
method depends not only on the way in which the pattern image is partitioned by the zoning, but also on the criteria<br />
adopted to define the way in which a feature influences the diverse zones. For this purpose, an experimental investigation<br />
is presented that focuses on the most effective way in which a feature spreads its influence on the zones of the pattern image.<br />
The experimental tests have been carried out in the field of handwritten digit recognition, using the numeral digits of the<br />
CEDAR database. The results point out that the membership function has paramount relevance for classification performance<br />
and demonstrate that the exponential model outperforms other membership functions.<br />
13:30-16:30, Paper TuBCT9.7<br />
Scribe Identification in Medieval English Manuscripts<br />
Gilliam, Tara, Univ. of York<br />
Wilson, Richard, Univ. of York<br />
Clark, John A., Univ. of York<br />
In this paper we present work on automated scribe identification on a new Middle-English manuscript dataset from around<br />
the 14th–15th centuries. We discuss the image and textual problems encountered in processing historical documents, and<br />
demonstrate the effect of accounting for manuscript style on the writer identification rate. The grapheme codebook method<br />
is used to achieve a Top-1 classification accuracy of up to 77% with a modification to the distance measure. The performance<br />
of the Sparse Multinomial Logistic Regression classifier is compared against five k-nn classifiers. We also consider<br />
classification against the principal components and propose a method for visualising the principal component vectors in<br />
terms of the original grapheme features.<br />
13:30-16:30, Paper TuBCT9.8<br />
Recognition of Handwritten Arabic (Indian) Numerals using Freeman’s Chain Codes and Abductive Network Classifier<br />
Lawal, Isah Abdullahi, King Fahd Univ. of Petroleum & Minerals<br />
Abdel-Aal, Radwan E., King Fahd Univ. of Petroleum & Minerals<br />
Mahmoud, Sabri A., King Fahd Univ. of Petroleum & Minerals<br />
Accurate automatic recognition of handwritten Arabic numerals has several important applications, e.g. in banking transactions,<br />
automation of postal services, and other data entry related applications. A number of modelling and machine learning<br />
techniques have been used for handwritten Arabic numeral recognition, including neural networks, support vector<br />
machines, and hidden Markov models. This paper proposes applying abductive networks to the problem. We studied the<br />
performance of an abductive network architecture on a dataset of 21120 samples of handwritten 0-9 digits produced by 44<br />
writers. We developed a new feature set using histograms of contour points chain codes. Recognition rates as high as<br />
99.03% were achieved, which surpass the performance reported in the literature for other recognition techniques on the<br />
same data set. Moreover, the technique achieves a significant reduction in the number of features required.<br />
13:30-16:30, Paper TuBCT9.9<br />
A SVM-HMM based Online Classifier for Handwritten Chemical Symbols<br />
Zhang, Yang, Nankai Univ.<br />
Shi, Guangshun, Nankai Univ.<br />
Wang, Kai, Nankai Univ.<br />
This paper presents a novel double-stage classifier for the task of handwritten chemical symbol recognition. The first stage is<br />
rough classification, in which an SVM is used to distinguish non-ring structure (NRS) and organic ring structure (ORS) symbols,<br />
while an HMM is used for fine recognition in the second stage. A point-sequence-reordering algorithm is proposed<br />
to improve the recognition accuracy of ORS symbols. Our test data set contains 101 chemical symbols, 9090 training<br />
samples and 3232 test samples. Finally, we obtained top-1 accuracy of 93.10% and top-3 accuracy of 98.08% based on<br />
the test data set.<br />
13:30-16:30, Paper TuBCT9.10<br />
Symbol Recognition Combining Vectorial and Pixel-Level Features for Line Drawings<br />
Su, Feng, Nanjing Univ.<br />
Lu, Tong, Nanjing Univ.<br />
Yang, Ruoyu, Nanjing Univ.<br />
In this paper, we present an approach for symbol representation and recognition in line drawings, integrating both the vector-based<br />
structural description and pixel-level statistical features of the symbol. For the former, a vectorial template is<br />
defined on the basis of the vectorization model and exploited in segmenting symbols from the line network. For the latter,<br />
a Radon-transform-based signature is employed to characterize shapes at the symbol and component levels. Experimental<br />
results on real technical drawings are presented to show the promising aspect of our approach.<br />
13:30-16:30, Paper TuBCT9.11<br />
Writing Order Recovery from Off-Line Handwriting by Graph Traversal<br />
Cordella, Luigi P., Univ. di Napoli Federico II<br />
De Stefano, Claudio, Univ. di Napoli Federico II<br />
Marcelli, Angelo, Univ. of Salerno<br />
Santoro, Adolfo, Univ. of Salerno<br />
We present a method to recover the dynamic writing order from static images of handwriting. The static handwriting is<br />
initially represented by its skeleton, which is then converted into a graph, whose arcs correspond to the skeleton branches,<br />
and nodes to either end points or branch points of the skeleton. Criteria derived from handwriting generation are then applied<br />
to transform the graph in such a way that all its nodes, but the first and the last, have an even degree, so that it can be traversed<br />
from the first to the last by using Fleury's algorithm. The experimental results show that combining criteria derived<br />
from handwriting generation models with graph traversal makes it possible to reconstruct the original sequence produced by a<br />
writer even in the case of complex handwriting, i.e., handwriting with retracing, crossings and pen-ups.<br />
13:30-16:30, Paper TuBCT9.12<br />
Holistic Urdu Handwritten Word Recognition using Support Vector Machine<br />
Sagheer, Malik Waqas, CENPARMI, Concordia Univ.<br />
He, Chun Lei, Concordia Univ.<br />
Nobile, Nicola, Concordia Univ. CENPARMI<br />
Suen, Ching Y.<br />
Since the Urdu language has more isolated letters than Arabic and Farsi, research on Urdu handwritten words is desirable.<br />
We present a novel approach that uses compound features and a Support Vector Machine (SVM) for offline Urdu word recognition.<br />
Due to the cursive style of Urdu, a classification using a holistic approach is efficiently adopted. Compound feature<br />
sets, which combine structural and gradient (directional) features, are extracted from each Urdu word. Experiments<br />
have been conducted on the CENPARMI Urdu Words Database, and a high recognition accuracy of 97.00% has been<br />
achieved.<br />
13:30-16:30, Paper TuBCT9.13<br />
A Framework for the Combination of Different Arabic Handwritten Word Recognition Systems<br />
El Abed, Haikal, Braunschweig Tech. Univ.<br />
Märgner, Volker, Braunschweig Tech. Univ.<br />
In this paper we present a framework for the combination of different Arabic handwritten word recognition systems<br />
to achieve a decision with a higher performance. This performance can be expressed by lower rejection rates and higher<br />
recognition rates. The methods used range from voting schemes based on the results of different recognizers to a neural network<br />
decision based on normalized confidences. This work presents an extension of the well-known combination methods to<br />
a large lexicon, from a maximum of 30 classes (e.g., 10 classes for digit classification) to 937 classes for the<br />
IfN/ENIT-database. In addition, different reject rules based on the evaluation and analysis of individual and combined<br />
system outputs are discussed. Different threshold functions for reject levels are tested and evaluated. Tests with a set of<br />
recognizers that participated in the ICDAR 2007 competition, based on a set coming from the IfN/ENIT-database,<br />
show a word error rate (WER) of 5.29% without reject and, with a reject rate of less than 25%, even a word error rate of<br />
less than 1%.<br />
13:30-16:30, Paper TuBCT9.15<br />
Degraded Character Recognition by Image Quality Evaluation<br />
Liu, Chunmei, Tongji Univ.<br />
Character image quality plays an important role in degraded character recognition, as it indicates the recognition<br />
difficulty. This paper proposes a novel approach to degraded character recognition based on three kinds of independent degradation<br />
sources. It is composed of two stages: character image quality evaluation and character recognition. Firstly, it presents<br />
a dual-evaluation scheme to evaluate the image quality of the input character. Secondly, according to the evaluation result,<br />
the appropriate character recognition sub-systems are invoked adaptively. These sub-systems are trained on character sets whose image<br />
qualities are similar to the input's quality, and have dedicated features and classifiers respectively. Experimental results<br />
demonstrate that the proposed approach greatly improves the performance of a degraded character recognition system.<br />
13:30-16:30, Paper TuBCT9.16<br />
Offline Arabic Handwriting Identification using Language Diacritics<br />
Lutf, Mohammed, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Li, Hong, Huazhong Univ. of Science and Tech.<br />
In this paper, we present an approach for writer identification using off-line Arabic handwriting. The proposed method introduces<br />
Arabic writing in a new form, representing it by its basic components instead of its alphabet. We<br />
split the input document into two parts: one for the letters and the other for the diacritics. We extract all diacritics from the<br />
input image, calculate the LBP histogram for each diacritic, and concatenate these histograms to use them as handwriting<br />
features. We use the IFN/ENIT database in the experiments reported here and our tests involve 287 writers. The results<br />
show that our method is very effective and makes handling Arabic handwriting easier than before.<br />
13:30-16:30, Paper TuBCT9.17<br />
Removing Rule-Lines from Binary Handwritten Arabic Document Images using Directional Local Profile<br />
Shi, Zhixin, SUNY at Buffalo<br />
Setlur, Srirangaraj, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
In this paper, we present a novel approach for detecting and removing pre-printed rule-lines from binary handwritten<br />
Arabic document images. The proposed technique is based on a directional local profiling approach for the detection of<br />
the rule-line locations. Then a refined adaptive vertical run-length search is designed to remove the rule-line pixels<br />
without much damage to the text. The method is also tolerant to variations in the rule-lines such as broken lines, orientation<br />
changes and variation in the thickness of the rule-lines. Analysis of experimental results on the DARPA MADCAT Arabic<br />
handwritten document data indicates that the method is robust and is capable of correctly removing rule-lines.<br />
13:30-16:30, Paper TuBCT9.18<br />
A Bag-of-Pages Approach to Unordered Multi-Page Document Classification<br />
Gordo, Albert, Univ. Autònoma de Barcelona<br />
Perronnin, Florent, Xerox Res. Centre Europe<br />
We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a<br />
novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook<br />
of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider<br />
several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly<br />
outperforms a baseline system.<br />
13:30-16:30, Paper TuBCT9.19<br />
Fast Seamless Skew and Orientation Detection in Document Images<br />
Konya, Iuliu Vasile, Fraunhofer IAIS<br />
Eickeler, Stefan, Fraunhofer IAIS<br />
Seibert, Christoph, Fraunhofer IAIS<br />
Reliable and generic methods for skew detection are a necessity for any large-scale digitization project. As one of the<br />
first processing steps, skew detection and correction have a heavy influence on all further document analysis modules, such<br />
as geometric and logical layout analysis. This paper introduces a generic, scale-independent algorithm capable of accurately<br />
detecting the global skew angle of document images within the range [-90,90] degrees. By using the same framework, the<br />
algorithm is then extended for Roman script documents so as to cope with the full range [-180,180) degrees of possible<br />
skew angles. Despite its generality, the improved algorithm is very fast and requires no explicit parameters. Experiments<br />
on a combined test set comprising around 110000 real-life images show the accuracy and robustness of the proposed<br />
method.<br />
13:30-16:30, Paper TuBCT9.20<br />
Unsupervised Block Covering Analysis for Text-Line Segmentation of Arabic Ancient Handwritten Document Images<br />
Boussellaa, Wafa, Univ. of Sfax<br />
Zahour, Abderrazek, Havre Univ.<br />
El Abed, Haikal, Braunschweig Tech. Univ.<br />
Benabdelhafid, Abdellatif, Havre Univ.<br />
Alimi, Adel M., Univ. of Sfax<br />
This paper presents a new method for automatic text-line extraction from Arabic historical handwritten documents that present<br />
overlapping and multi-touching character problems. Our approach is based on block covering analysis using an unsupervised<br />
technique. The algorithm first performs a statistical block analysis which computes the optimal number of vertical strips for document<br />
decomposition. Then, our algorithm achieves a fuzzy baseline detection using the fuzzy C-means algorithm. Finally,<br />
blocks are assigned to their corresponding lines. Experimental results show that the proposed method achieves a high accuracy<br />
of about 95% for detecting text lines in Arabic historical handwritten document images written in different scripts.<br />
13:30-16:30, Paper TuBCT9.21<br />
A Bi-Modal Handwritten Text Corpus: Baseline Results<br />
Pastor, Moises, Univ. Pol. De Valencia<br />
Toselli, Alejandro Héctor, Univ. Pol. De Valencia<br />
Casacuberta, Francisco, Univ. Pol. De Valencia<br />
Vidal, Enrique, Univ. Pol. De Valencia<br />
Handwritten text is generally captured through two main modalities: off-line and on-line. Smart approaches to handwritten<br />
text recognition (HTR) may take advantage of both modalities if they are available. This is for instance the case in computer-assisted<br />
transcription of text images, where on-line text can be used to interactively correct errors made by a main offline<br />
HTR system. We present here baseline results on the biMod-IAM-PRHLT corpus, which was recently compiled for<br />
experimentation with techniques aimed at solving the proposed multi-modal HTR problem, and is being used in one of the<br />
official ICPR-2010 contests.<br />
13:30-16:30, Paper TuBCT9.22<br />
Feature Selection using Multiobjective Optimization for Named Entity Recognition<br />
Ekbal, Asif, Univ. of Heidelberg<br />
Saha, Sriparna, Univ. of Heidelberg<br />
Garbe, Christoph S., Univ. of Heidelberg<br />
Appropriate feature selection is a very crucial issue in any machine learning framework, especially in Maximum Entropy<br />
(ME). In this paper, the selection of appropriate features for constructing an ME based Named Entity Recognition (NER) system<br />
is posed as a multiobjective optimization (MOO) problem. Two classification quality measures, namely recall and precision,<br />
are simultaneously optimized using the search capability of a popular evolutionary MOO technique, NSGA-II. The<br />
proposed technique is evaluated to determine suitable feature combinations for NER in two languages, namely Bengali and<br />
English that have significantly different characteristics. Evaluation results yield the recall, precision and F-measure values<br />
of 70.76%, 81.88% and 75.91%, respectively for Bengali, and 78.38%, 81.27% and 79.80%, respectively for English. Comparison<br />
with an existing ME based NER system shows that our proposed feature selection technique is more efficient than<br />
heuristic-based feature selection.<br />
13:30-16:30, Paper TuBCT9.23<br />
Redif Extraction in Handwritten Ottoman Literary Texts<br />
Can, Ethem Fatih, Bilkent Univ.<br />
Duygulu, Pinar, Bilkent Univ.<br />
Can, Fazli, Bilkent Univ.<br />
Kalpakli, Mehmet, Bilkent Univ.<br />
Repeated patterns, rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They provide<br />
integrity of a poem by connecting its parts and bring a melody to its voice. In Ottoman literature, poets wrote their works by<br />
making use of the rhymes and redifs of previous poems according to the nazire (creative imitation) tradition either to prove<br />
their expertise or to show respect towards old masters. Automatic recognition of redifs would provide important data mining<br />
opportunities in literary analyses of Ottoman poetry, the majority of which is in handwritten form. In this study, we propose<br />
a matching criterion and a method, Redif Extraction using Contour Segments (RECS), that uses this criterion to<br />
detect redifs in handwritten Ottoman literary texts using only visual analysis. Our method provides a success rate of<br />
0.682 on a test collection of 100 poems.<br />
13:30-16:30, Paper TuBCT9.24<br />
Analysis of Local Features for Handwritten Character Recognition<br />
Uchida, Seiichi, Kyushu Univ.<br />
Liwicki, Marcus, DFKI<br />
This paper investigates a part-based recognition method of handwritten digits. In the proposed method, the global structure<br />
of digit patterns is discarded by representing each pattern by just a set of local feature vectors. The method then comprises<br />
two steps. First, each of the J local feature vectors of a target pattern is classified into one of ten categories (``0''-``9'') by<br />
nearest neighbor discrimination with a large database of reference vectors. Second, the category of the target pattern is<br />
determined by majority voting on the J local recognition results. Despite a pessimistic expectation, we have reached<br />
recognition rates much higher than 90% for the task of digit recognition.<br />
13:30-16:30, Paper TuBCT9.25<br />
Detect Visual Spoofing in Unicode-Based Text<br />
Qiu, Bite, City Univ. of Hong Kong<br />
Fang, Ning, City Univ. of Hong Kong<br />
Liu, Wenyin, City U of HK<br />
Visual spoofing in Unicode-based text is anticipated to become a severe web security problem in the near future as more and more<br />
Unicode-based web documents come into use. In this paper, to detect whether a suspicious Unicode character in a word is<br />
visual spoofing or not, the context of the suspicious character is utilized by employing a Bayesian framework. Specifically,<br />
two contexts are taken into consideration: simple context and general context. Simple context of a suspicious character is<br />
the word where the character exists while general context consists of all homoglyphs of the character within Universal Character<br />
Set (UCS). Three decision rules are designed and used jointly for convicting a suspicious character. Preliminary evaluations<br />
and user study show that the proposed approach can detect Unicode-based visual spoofing with high effectiveness<br />
and efficiency.<br />
13:30-16:30, Paper TuBCT9.26<br />
Comparing Several Techniques for Offline Recognition of Printed Mathematical Symbols<br />
Álvaro, Francisco, Inst. Tecnológico de Informática<br />
Sánchez, Joan Andreu, Univ. Pol. De Valencia<br />
Automatic recognition of printed mathematical symbols is a fundamental problem in the recognition of mathematical expressions.<br />
Several classification techniques have been used previously, but there are very few works that compare different classification<br />
techniques on the same database and under the same experimental conditions. In this work we have tested classical<br />
and novel classification techniques for mathematical symbol recognition on two databases.<br />
13:30-16:30, Paper TuBCT9.27<br />
Symbol Classification using Dynamic Aligned Shape Descriptor<br />
Fornés, Alicia, Computer Vision Center<br />
Escalera, Sergio, UB<br />
Llados, Josep, Computer Vision Center<br />
Valveny, Ernest, Univ. Autònoma de Barcelona<br />
Shape representation is a difficult task because of several symbol distortions, such as occlusions, elastic deformations, gaps<br />
or noise. In this paper, we propose a new descriptor and distance computation for coping with the problem of symbol recognition<br />
in the domain of Graphical Document Image Analysis. The proposed D-Shape descriptor encodes the arrangement information<br />
of object parts in a circular structure, allowing different levels of distortion. The classification is performed using<br />
a cyclic Dynamic Time Warping based method, allowing distortions and rotation. The methodology has been validated on<br />
different data sets, showing very high recognition rates.<br />
13:30-16:30, Paper TuBCT9.28<br />
Document Logo Detection and Recognition using Bayesian Model<br />
Wang, Hongye, Tsinghua Univ.<br />
Chen, Youbin, Tsinghua Univ.<br />
This paper presents a simple, dynamic approach to logo detection and recognition in document images. Although there<br />
is literature on both logo detection and logo recognition, current methods lack the adaptability to variable real-world<br />
documents. In this paper we first observe this deficiency from a different point of view and reveal its inherent<br />
cause. Then we reorganize the structure of the logo detection and recognition procedures and integrate them into a<br />
unified framework. By applying feedback and selecting proper features, we make our framework dynamic and interactive.<br />
Experiments show that the proposed method outperforms existing methods in document processing domain.<br />
13:30-16:30, Paper TuBCT9.29<br />
An Efficient Staff Removal Approach from Printed Musical Documents<br />
Dutta, Anjan, Univ. Autonoma de Barcelona<br />
Pal, Umapada, Indian Statistical Inst.<br />
Fornés, Alicia, Computer Vision Center<br />
Llados, Josep, Computer Vision Center<br />
Staff removal is an important preprocessing step in Optical Music Recognition (OMR). The process aims to remove<br />
the stafflines from a musical document and retain only the musical symbols; these symbols are later used effectively to<br />
identify the music information. This paper proposes a simple but robust method to remove stafflines from printed musical<br />
scores. In the proposed methodology we have considered a staffline segment as a horizontal linkage of vertical black runs<br />
with uniform height. We have used the neighbouring properties of a staffline segment to validate it as a true segment. We<br />
have considered the dataset along with the deformations described in \cite{ex8} for evaluation purposes. From our experiments<br />
we have obtained encouraging results.<br />
13:30-16:30, Paper TuBCT9.30<br />
Combining Spectral and Spatial Features for Robust Foreground-Background Separation<br />
Lettner, Martin, Vienna Univ. of Tech.<br />
Sablatnig, Robert, Vienna Univ. of Tech.<br />
Foreground-background separation in multispectral images of damaged manuscripts can benefit from both spectral and<br />
spatial information. Therefore, we incorporate a Markov Random Field which provides a powerful tool to combine both<br />
features simultaneously. Higher order models enable the inclusion of spatial constraints based on stroke characteristics.<br />
We apply belief propagation for inference and include the higher order potentials by upgrading the message update. The<br />
proposed segmentation method requires no training and is independent of script, size, and style of characters. We will<br />
demonstrate the robust performance on a set of degraded documents and on synthetic images.<br />
13:30-16:30, Paper TuBCT9.31<br />
Unsupervised Learning of Stroke Tagger for Online Kanji Handwriting Recognition<br />
Blondel, Mathieu, Kobe Univ.<br />
Seki, Kazuhiro, Kobe Univ.<br />
Uehara, Kuniaki, Kobe Univ.<br />
Traditionally, HMM-based approaches to online Kanji handwriting recognition have relied on a hand-made dictionary,<br />
mapping characters to primitives such as strokes or substrokes. We present an unsupervised way to learn a stroke tagger<br />
from data, which we eventually use to automatically generate such a dictionary. In addition to not requiring a prior handmade<br />
dictionary, our approach can improve the recognition accuracy by exploiting unlabeled data when the amount of labeled<br />
data is limited.<br />
13:30-16:30, Paper TuBCT9.32<br />
A Baseline Dependent Approach for Persian Handwritten Character Segmentation<br />
Alaei, Alireza, Univ. of Mysore<br />
Nagabhushan, P., Univ. of Mysore<br />
Pal, Umapada, Indian Statistical Inst.<br />
In this paper, an efficient approach to segmenting a Persian off-line handwritten text-line into characters is presented. The proposed<br />
algorithm first traces the baseline of the input text-line image and straightens it. Subsequently, it over-segments<br />
each word/subword using features extracted from histogram analysis and then removes extra segmentation points using<br />
baseline-dependent as well as language-dependent rules. We tested the proposed character segmentation scheme<br />
on two different datasets. On a test set of 899 Persian words/subwords created by us, 90.26% of the characters were segmented<br />
correctly. On another dataset of 200 handwritten Arabic word images [11] we obtained 93.49% correct segmentation<br />
accuracy.<br />
13:30-16:30, Paper TuBCT9.33<br />
Bayesian Networks Learning Algorithms for Online Form Classification<br />
Philippot, Emilie, Univ. Nancy 2, Loria<br />
Belaid, Yolande, Univ. Nancy 2, Loria<br />
Belaid, Abdel, Univ. Nancy 2, Loria<br />
In this paper a new method is presented for the recognition of online forms filled manually by a digital-type clip. This<br />
writing system transmits only the written fields without the pre-printed form. The form recognition consists in retrieving<br />
the original form directly from the filled fields without any context, which is a very challenging problem. We propose a<br />
method based on Bayesian networks. The networks use the conditional probabilities between fields in order to infer the<br />
real form. Two learning algorithms of form structures are employed to test their suitability for the case studied. The tests<br />
were conducted on a set of 3200 forms provided by the Act image company, a specialist in interactive writing processes.<br />
The first experiments show a recognition rate of more than 97%.<br />
13:30-16:30,Paper TuBCT9.34<br />
Bangla and English City Name Recognition for Indian Postal Automation<br />
Pal, Umapada, Indian Statistical Inst.<br />
Roy, R.K., Indian Statistical Inst.<br />
Kimura, Fumitaka, Mie Univ.<br />
Because of multi-lingual behavior, the destination address block of a postal document from an Indian state may be written in two<br />
or more scripts. From a statistical analysis of Indian postal documents, we noted that about 22.04% of Indian postal documents<br />
are written in two scripts. Because of the inter-mixing of these scripts in postal address writing, it is very difficult to<br />
identify the script in which a city name is written. To avoid such identification difficulties, in this paper we propose a<br />
lexicon-driven bi-lingual (English and Bangla) city name recognition scheme for Indian postal automation. We obtained<br />
93.19% accuracy when tested on 11875 city name samples.<br />
13:30-16:30,Paper TuBCT9.35<br />
Shape Code based Word-Image Matching for Retrieval of Indian Multi-Lingual Documents<br />
Tarafdar, Arundhati, Indian Statistical Inst.<br />
Mandal, Ranju, Indian Statistical Inst.<br />
Pal, Srikanta, Indian Statistical Inst.<br />
Pal, Umapada, Indian Statistical Inst.<br />
Kimura, Fumitaka, Mie Univ.<br />
In the current scenario, retrieving information from document images is a challenging problem. In this paper we propose<br />
a shape code based word-image matching (word-spotting) technique for retrieval of multilingual documents written in Indian<br />
languages. Here, each query word image to be searched is represented by a primitive shape code using (i) zonal information<br />
of extreme points, (ii) vertical shape based features, (iii) crossing count (with respect to vertical bar position), (iv)<br />
loop shape and position, and (v) background information. Each candidate word (a word having a similar aspect ratio and<br />
topological feature to the query word) of the document is also coded accordingly. Then, an inexact string matching technique<br />
is used to measure the similarity between the primitive codes generated from the query word image and each candidate<br />
word of the document with which the query image is to be searched. Based on the similarity score, we retrieve the<br />
document where the query image is found. Experimental results on Bangla, Devnagari and Gurumukhi scripts document<br />
image databases confirm the feasibility and efficiency of our proposed approach.<br />
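The inexact string matching between primitive shape codes can be illustrated with a standard edit-distance computation. This sketch assumes the codes are plain strings and uses unit operation costs; the paper's matcher may weight operations differently, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two primitive
    shape-code strings (unit insert/delete/substitute costs)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def similarity(query_code, word_code):
    """Normalised similarity in [0, 1]; 1 means identical codes."""
    dist = edit_distance(query_code, word_code)
    return 1.0 - dist / max(len(query_code), len(word_code), 1)
```

A candidate word is then retrieved when `similarity` between its code and the query code exceeds a threshold.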
13:30-16:30,Paper TuBCT9.36<br />
Stochastic Segment Model Adaptation for Offline Handwriting Recognition<br />
Prasad, Rohit, Raytheon BBN Tech.<br />
Bhardwaj, Anurag, SUNY Buffalo<br />
Subramanian, Krishna, Raytheon BBN Tech.<br />
Cao, Huaigu, Raytheon BBN Tech.<br />
Natarajan, P., BBN Tech.<br />
In this paper, we present techniques for unsupervised adaptation of stochastic segment models to improve accuracy on<br />
large vocabulary offline handwriting recognition (OHR) tasks. We build upon our previous work on stochastic segment<br />
modeling for Arabic OHR. In our previous work, stochastic character segments for each n-best hypothesis were generated<br />
by a hidden Markov model (HMM) recognizer, and then a segmental model was used as an additional knowledge source<br />
for re-ranking the n-best list. Here, we describe a novel framework for unsupervised adaptation. It integrates both HMM<br />
and segment model adaptation to achieve significant gains over un-adapted recognition. Experimental results demonstrate<br />
the efficacy of our proposed method on a large corpus of handwritten Arabic documents.<br />
13:30-16:30,Paper TuBCT9.37<br />
Shape-Based Image Retrieval using a New Descriptor based on the Radon and Wavelet Transforms<br />
Nacereddine, Nafaa, LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Ziou, Djemel, Sherbrooke Univ.<br />
Hamami, Latifa, Ec. Nationale Pol.<br />
In this paper, the Radon transform is used to design a new descriptor, called the Phi-signature, that is invariant to the usual geometric<br />
transformations. Experiments show the effectiveness of the multilevel representation of the descriptor built from the Phi-signature<br />
and R.<br />
13:30-16:30,Paper TuBCT9.38<br />
CUDA Implementation of Deformable Pattern Recognition and its Application to MNIST Handwritten Digit Database<br />
Mizukami, Yoshiki, Yamaguchi Univ.<br />
Tadamura, Katsumi, Yamaguchi Univ.<br />
Warrell, Jonathan, Oxford Brookes University<br />
Li, Peng, Univ. Coll. London<br />
Prince, Simon, Univ. Coll. London<br />
In this study we propose a deformable pattern recognition method with CUDA implementation. In order to achieve the<br />
proper correspondence between foreground pixels of input and prototype images, a pair of distance maps are generated<br />
from input and prototype images, whose pixel values are given based on the distance to the nearest foreground pixel. Then<br />
a regularization technique computes the horizontal and vertical displacements based on these distance maps. The dissimilarity<br />
is measured based on the eight-directional derivative of input and prototype images in order to leverage characteristic<br />
information on the curvature of line segments that might be lost after the deformation. The prototype-parallel displacement<br />
computation on CUDA and the gradual prototype elimination technique are employed for reducing the computational time<br />
without sacrificing accuracy. A simulation shows that the proposed method with the k-nearest neighbor classifier gives<br />
an error rate of 0.57% on the MNIST handwritten digit database.<br />
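The distance maps used to establish correspondences between foreground pixels can be sketched as below. This brute-force version is for illustration only (the paper computes the maps in a CUDA kernel), and all names are hypothetical.

```python
import numpy as np

def distance_map(img):
    """Brute-force map of the Euclidean distance from every pixel to the
    nearest foreground pixel (img == 1). Fine for tiny images; a real
    implementation would use a linear-time distance transform or, as in
    the paper, a GPU kernel."""
    fg = np.argwhere(img == 1)
    h, w = img.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            # distance from (y, x) to its closest foreground pixel
            out[y, x] = np.sqrt(((fg - (y, x)) ** 2).sum(axis=1)).min()
    return out

# toy example: a single foreground pixel in the centre of a 3x3 image
demo = np.zeros((3, 3), dtype=int)
demo[1, 1] = 1
dm = distance_map(demo)
```

A pair of such maps, one from the input and one from the prototype, then drives the regularised estimation of the displacement fields.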
13:30-16:30,Paper TuBCT9.39<br />
Text Independent Writer Identification for Bengali Script<br />
Chanda, Sukalpa, GJØVIK Univ. Coll.<br />
Franke, Katrin, Gjøvik Univ. Coll.<br />
Pal, Umapada, Indian Statistical Inst.<br />
Wakabayashi, Tetsushi, Mie Univ.<br />
Automatic identification of an individual based on his/her handwriting characteristics is an important forensic tool. In a<br />
computational forensic scenario, the presence of a large amount of text/information in a questioned document cannot always<br />
be ensured. At the same time, compromising system reliability in such situations is not desirable. We here propose a system<br />
to handle such adverse situations in the context of Bengali script. Experiments with a discrete directional feature and a gradient<br />
feature are reported here, using a Support Vector Machine (SVM) as the classifier. We obtained promising results: 95.19%<br />
writer identification accuracy for the top choice and 99.03% when considering the top three choices.<br />
13:30-16:30,Paper TuBCT9.40<br />
Document Image Retrieval using Feature Combination in Kernel Space<br />
Hassan, Ehtesham, Indian Inst. of Tech. Delhi<br />
Chaudhury, Santanu, Indian Inst. of Tech. Delhi<br />
Gopal, M, Indian Inst. of Tech. Delhi<br />
This paper presents the application of multiple features to word-based document image indexing and retrieval. A novel framework<br />
is proposed that performs Multiple Kernel Learning (MKL) for indexing using kernel-based Distance Based Hashing. A<br />
Genetic Algorithm based framework is used for optimization. Two different features representing the structural organization<br />
of word shape are defined. The optimal combination of the two features for indexing is learned by performing MKL. Retrieval<br />
results for a document collection in the Devanagari script are presented.<br />
13:30-16:30,Paper TuBCT9.41<br />
A Novel Handwritten Urdu Word Spotting based on Connected Components Analysis<br />
Sagheer, Malik Waqas, CENPARMI, Concordia Univ.<br />
Nobile, Nicola, Concordia Univ. CENPARMI<br />
He, Chun Lei, Concordia Univ.<br />
Suen, Ching Y.<br />
We propose a novel word spotting system for Urdu words within handwritten text lines. Spatial information of diacritics<br />
is integrated into the detection of the main connected components during candidate word generation. An Urdu word recognition<br />
system is effectively designed and applied to classify the candidate words. In this word recognition system, compound<br />
features and an SVM were adapted. The verification/rejection process was based on the outputs from the Urdu word recognition<br />
system, and the image’s global features were applied to achieve a promising result. As a result, a high 92.11% correct<br />
segmentation rate and a 50.75% word spotting precision rate were achieved, while maintaining a 70.1% recall on CENPARMI’s<br />
Urdu Database.<br />
13:30-16:30,Paper TuBCT9.42<br />
Computer Assisted Transcription of Text Images: Results on the GERMANA Corpus and Analysis of Improvements<br />
Needed for Practical Use<br />
Romero Gomez, Verónica, Univ. Pol. De Valencia<br />
Toselli, Alejandro Héctor, Univ. Pol. De Valencia<br />
Vidal, Enrique, Univ. Pol. De Valencia<br />
We present a study of the application of Computer Assisted Transcription of Text Images (CATTI) to a task which is much<br />
closer to real applications than tasks previously studied. The new task consists in the transcription of a new, publicly<br />
available historic handwritten document called GERMANA. A detailed analysis of the main factors influencing system<br />
performance is presented, and some strategies to circumvent them are proposed.<br />
13:30-16:30,Paper TuBCT9.43<br />
OCR Post-Processing using Weighted Finite-State Transducers<br />
Llobet, Rafael, Univ. Pol. De Valencia<br />
Navarro Cerdán, José Ramón, Univ. Pol. De Valencia<br />
Perez-Cortes, Juan-Carlos, Univ. Pol. De Valencia<br />
Arlandis, Joaquim, Univ. Pol. De Valencia<br />
A new approach for Stochastic Error-Correcting Language Modeling based on Weighted Finite-State Transducers (WFSTs)<br />
is proposed as a method to post-process the results of an optical character recognizer (OCR). Instead of using the recognized<br />
string as an input to the transducer, in our approach the complete set of OCR hypotheses, a sequence of vectors of a posteriori<br />
class probabilities, is used to build a WFST that is then composed with independent WFSTs for the error and language<br />
models. This combines the practical advantages of a decoupled (OCR + post-processor) model with the full power<br />
of an integrated model.<br />
13:30-16:30,Paper TuBCT9.44<br />
Top down Analysis of Line Structure in Handwritten Documents<br />
Kasiviswanathan, Harish, Univ. at Buffalo<br />
Ball, Gregory R., Univ. at Buffalo<br />
Srihari, Sargur, Univ. at Buffalo<br />
One of the most challenging tasks in analyzing handwritten documents is to tackle the inherent skew introduced<br />
by the writer’s handwriting, segment the handwritten lines, and estimate the skew angle and its direction. Unlike in printed<br />
documents, complexities such as variable spacing between words and lines, variable line skew, variable line width and height,<br />
and overlapping words and lines arise in handwritten documents. This paper explores the application of the Radon<br />
transform to processing handwritten documents and compares its performance with the Hough transform for segmenting lines<br />
and detecting skew. The computational advantage of the Radon transform over the Hough transform, with equally good results,<br />
makes it an ideal choice for processing handwritten documents.<br />
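The idea of projection-based skew detection can be sketched as follows: for each candidate angle, the ink pixels are projected along that direction (a discrete Radon-style projection), and the angle whose projection profile has maximal variance is chosen, since sharp peaks mean the projection direction matches the text lines. The function name and angle grid are illustrative, not the authors' implementation.

```python
import numpy as np

def estimate_skew(img, angles=np.linspace(-5, 5, 21)):
    """Estimate the global skew (in degrees) of a binary text image by
    projecting ink pixels along candidate directions and picking the
    angle with the most peaked (highest-variance) projection profile."""
    ys, xs = np.nonzero(img)
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        t = np.deg2rad(a)
        # signed distance of each ink pixel to the line family at angle a
        proj = ys * np.cos(t) - xs * np.sin(t)
        hist, _ = np.histogram(proj, bins=img.shape[0])
        score = hist.var()                    # peaked profile -> high variance
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle

# toy page: two perfectly horizontal text "lines"
page = np.zeros((40, 40), dtype=int)
page[10, :] = 1
page[20, :] = 1
skew_deg = estimate_skew(page)
```

On the toy page the estimated skew is (close to) zero, as the lines are horizontal.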
13:30-16:30,Paper TuBCT9.45<br />
Unsupervised Evaluation Methods based on Local Gray-Intensity Variances for Binarization of Historical Documents<br />
Ramírez-Ortegón, Marte Alejandro, Freie Univ. Berlin<br />
Rojas, Raul, Freie Univ. Berlin<br />
We attempt to evaluate the efficacy of six unsupervised evaluation methods for tuning Sauvola’s threshold in optical character<br />
recognition (OCR) applications. We propose local implementations of well-known measures based on gray-intensity variances.<br />
Additionally, we derive four new measures from them using the unbiased variance estimator and gray-intensity<br />
logarithms. In our experiment, we selected the well-binarized images according to each measure and computed the accuracy<br />
of the recognized text of each. The results show that the weighted and uniform variances (using logarithms) are suitable<br />
measures for OCR applications.<br />
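Sauvola's local threshold, and a gray-intensity variance measure of the kind being evaluated, can be sketched as follows. The exact weightings in the paper's six measures differ, so treat this as an illustration; the parameter values and function names are assumptions.

```python
import numpy as np

def sauvola_threshold(window, k=0.5, R=128.0):
    """Sauvola's local threshold for one grey-level window:
    t = m * (1 + k * (s / R - 1)), where m and s are the local mean and
    standard deviation; k is the parameter the unsupervised measures
    are used to tune."""
    m, s = window.mean(), window.std()
    return m * (1.0 + k * (s / R - 1.0))

def weighted_variance(gray, binary):
    """Within-class gray-intensity variance of a binarisation, weighted
    by class size: lower values indicate more homogeneous foreground
    and background classes."""
    fg, bg = gray[binary == 1], gray[binary == 0]
    score = 0.0
    for cls in (fg, bg):
        if cls.size:
            score += cls.size / gray.size * cls.var()
    return score
```

Candidate thresholds would be ranked by such a measure, and the best-scoring binarisation passed to the OCR engine.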
13:30-16:30,Paper TuBCT9.46<br />
On the Significance of Stroke Size and Position for Online Handwritten Devanagari Word Recognition: An Empirical<br />
Study<br />
Bharath, A, Hewlett-Packard Lab. India<br />
Madhvanath, Sriganesh, Hewlett-Packard Lab. India<br />
Stroke size and position are considered important information for online recognition of handwritten characters and<br />
words in the Oriental and Indic families of scripts, especially because of their multi-stroke and two-dimensional nature. In an<br />
Indic script such as Devanagari, the vowel diacritics (matras) can occur at any position around the base consonant, and<br />
there are even pairs of matras which have similar shapes and differ only in their position with respect to the base consonant.<br />
In this paper, we study the relevance of stroke size and position information for the recognition of online handwritten Devanagari<br />
words by comparing three different preprocessing schemes. Our experimental results indicate that the word recognition<br />
accuracy achieved using a preprocessing scheme that completely disregards the original sizes and positions of the<br />
strokes (and symbols) is comparable with the scheme that retains them, when the input is in discrete style, and contextual<br />
knowledge in the form of a lexicon is available.<br />
13:30-16:30,Paper TuBCT9.47<br />
Noise Tolerant Script Identification of Oriental and English Documents using a Downgraded Pixel Density Feature<br />
Wang, Ning, Concordia Univ.<br />
Lam, Louisa, Concordia Univ.<br />
Suen, Ching Y.<br />
Document Script Identification (DSI) is a very useful application in document processing. This paper presents a method<br />
for this application that uses a new noise tolerant feature, the Downgraded Pixel Density feature. Compared to other<br />
features widely used in existing DSI solutions, this new feature is much more robust to variations in slant, font and style<br />
of printed documents. Experimental results show that the method achieves promising identification performance.<br />
13:30-16:30,Paper TuBCT9.48<br />
Using Spatial Relations for Graphical Symbol Description<br />
K. C., Santosh, INRIA – LORIA and INPL<br />
Wendling, Laurent, Univ. Paris Descartes<br />
Lamiroy, Bart, LORIA – INPL<br />
In this paper, we address the use of unified spatial relations for symbol description. We present a topologically guided directional<br />
relation signature. It references a unique point set instead of one entity in a pair, thus avoiding problems related<br />
to erroneous choices of reference entities and preserving symmetry. We experimentally validate our method by showing its<br />
ability to serve in a symbol retrieval application, based only on a spatial relational descriptor that represents the links between<br />
the decomposed structural patterns, called “vocabulary”, in a spatial relational graph.<br />
13:30-16:30,Paper TuBCT9.49<br />
Automatic Discrimination between Confusing Classes with Writing Styles Verification in Arabic Handwritten Numeral<br />
Recognition<br />
He, Chun Lei, Concordia Univ.<br />
Lam, Louisa, Concordia Univ.<br />
Suen, Ching Y.<br />
In handwriting recognition, confusing/conflicting writing styles can result in irreducible errors, so the study of writing<br />
style consistencies is important for applications. In Arabic Handwritten Numeral Recognition, most errors occur between<br />
samples of classes two and three due to their very similar shapes in some writing styles. In this paper, an automated writing<br />
style detection process is effectively implemented in the pair-wise verification of samples in these two classes. As a result,<br />
the recognition results improve significantly, with a 25% reduction of previous errors. With rejection, when the<br />
LDA (Linear Discriminant Analysis) measurement rejection threshold is adjusted to maintain the same error rate, the<br />
recognition rate increases from 96.87% to 97.81%.<br />
13:30-16:30,Paper TuBCT9.50<br />
Random Subspace Method in Text Categorization<br />
Gangeh, Mehrdad, Univ. of Waterloo<br />
Kamel, Mohamed S, Univ. of Waterloo<br />
Duin, Robert, TU Delft<br />
In text categorization (TC), which is a supervised technique, a feature vector of terms or phrases is usually used to represent<br />
the documents. Due to the huge number of terms in even a moderate-size text corpus, the high-dimensional feature space is<br />
an intrinsic problem in TC. The random subspace method (RSM), a technique that divides the feature space into smaller subspaces,<br />
each submitted to a (base) classifier (BC) in an ensemble, can be an effective approach to reducing the dimensionality of the<br />
feature space. Inspired by similar research on functional magnetic resonance imaging (fMRI) of the brain, we here address<br />
the estimation of the ensemble parameters, i.e., the ensemble size (L) and the dimensionality of the feature subsets (M), by defining<br />
three criteria: usability, coverage, and diversity of the ensemble. We show that a relatively medium M and small L yield<br />
an ensemble that improves the performance of a single support vector machine, which is considered the state-of-the-art<br />
in TC.<br />
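The random subspace construction can be sketched as follows. For clarity a toy nearest-centroid classifier stands in for the SVM base learner, and all names, as well as the choice of L and M, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rsm(X, y, L=5, M=2):
    """Random subspace ensemble: each of the L base classifiers is
    trained on a random M-dimensional subset of the features. Here the
    base classifier is a simple nearest-centroid rule."""
    ensemble = []
    for _ in range(L):
        feats = rng.choice(X.shape[1], size=M, replace=False)
        cents = {c: X[y == c][:, feats].mean(axis=0) for c in np.unique(y)}
        ensemble.append((feats, cents))
    return ensemble

def predict_rsm(ensemble, x):
    """Majority vote over the base-classifier decisions."""
    votes = []
    for feats, cents in ensemble:
        sub = x[feats]
        votes.append(min(cents, key=lambda c: np.linalg.norm(sub - cents[c])))
    return max(set(votes), key=votes.count)
```

With well-separated classes, every random subspace preserves the separation, so the vote is unanimous; in TC the interplay of L and M against usability, coverage and diversity is exactly what the paper studies.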
13:30-16:30,Paper TuBCT9.51<br />
Shape-DNA: Effective Character Restoration and Enhancement for Arabic Text Documents<br />
Caner, Gulcin, Pol. Rain, Inc.<br />
Haritaoglu, Ismail, Pol. Rain, Inc.<br />
We present a novel learning-based image restoration and enhancement technique for improving character recognition performance<br />
of OCR products for degraded documents or documents/text captured with mobile devices such as cameraphones.<br />
The proposed technique is language independent and can simultaneously increase the effective resolution and<br />
restore broken characters with artifacts due to the image capturing device, such as a low-quality/low-resolution camera, or due<br />
to previous pre-processing such as extracting text region from the document image. The proposed technique develops a<br />
predictive relationship between high-resolution training images and their low-resolution/degraded counterparts, and exploits<br />
this relationship in a probabilistic scheme to generate a high resolution image from a low quality, low-resolution text<br />
image. We present a fast and scalable implementation of the proposed character restoration algorithm to improve the text<br />
recognition for document/text images captured by mobile phones. Experimental results demonstrate that the system effectively<br />
increases OCR performance for documents captured by mobile imaging devices, from levels of 50% to levels of<br />
over 80% for non-Latin document/scene text images at 120 dpi.<br />
13:30-16:30,Paper TuBCT9.52<br />
Linguistic Adaptation for Whole-Book Recognition<br />
Xiu, Pingping, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images,<br />
using automatic adaptation to improve accuracy. Our algorithm expects to be given approximate iconic and linguistic<br />
models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries, and then, guided entirely<br />
by evidence internal to the test set, corrects the models, yielding improved accuracy. The iconic model describes<br />
image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence<br />
probabilities. In previous work, we reported that adapting the iconic model alone (with a perfect linguistic model)<br />
was able to automatically reduce the word error rate on a 180-page book by a large factor. In this paper, we propose an<br />
algorithm that adapts both the iconic model and the linguistic model alternately to improve<br />
both models on the fly. The linguistic model adaptation method, which we report here, identifies new words and adds<br />
them to the dictionary. With 64.6% of words missing from the dictionary, our previous algorithm reduced the word error rate from<br />
40.2% to 23.2%. The new algorithm drives the word error rate down further, from 23.2% to 16.0%.<br />
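The dictionary-growing step of linguistic adaptation can be sketched as follows. The count and confidence thresholds here are hypothetical stand-ins for the paper's evidence-driven criteria, and all names are illustrative.

```python
from collections import Counter

def adapt_lexicon(dictionary, recognized, min_count=3, min_conf=0.9):
    """Promote words that are absent from the dictionary but recognised
    repeatedly with high confidence across the book to new dictionary
    entries. `recognized` is a list of (word, confidence) pairs."""
    counts = Counter(w for w, conf in recognized if conf >= min_conf)
    new_words = {w for w, n in counts.items()
                 if n >= min_count and w not in dictionary}
    return dictionary | new_words
```

The enlarged dictionary then feeds back into the next round of iconic-model adaptation, alternating as the abstract describes.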
13:30-16:30,Paper TuBCT9.53<br />
Online Arabic Handwriting Modeling System based on the Grapheme Segmentation<br />
Boubaker, Houcine, Univ. of Sfax<br />
El Baati, Abed El Karim, Univ. of Sfax<br />
Kherallah, Monji, Univ. of Sfax<br />
Alimi, Adel. M., Univ. of Sfax<br />
El Abed, Haikal, Braunschweig Tech. Univ.<br />
We present in this paper a new approach to online Arabic handwriting modeling based on grapheme segmentation.<br />
This segmentation relies on prior detection of the baseline. It involves the detection of two types of topologically meaningful<br />
points: the backs of the valleys adjoining the baseline and the angular points. The feature extraction stage<br />
models the shapes of the segmented graphemes with relevant geometric parameters and estimates their diacritics’<br />
fuzzy affectation rates. The test results show a significant improvement in recognition rate with the introduction of new<br />
pertinent parameters.<br />
Technical Program for Wednesday<br />
August 25, 2010<br />
WeAT1 Marmara Hall<br />
Tracking and Surveillance - II Regular Session<br />
Session chair: Yilmaz, Alper (The Ohio State Univ.)<br />
09:00-09:20, Paper WeAT1.1<br />
The Fusion of Deep Learning Architectures and Particle Filtering Applied to Lip Tracking<br />
Carneiro, Gustavo, Tech. Univ. of Lisbon<br />
Nascimento, Jacinto, Inst. de Sistemas e Robótica<br />
This work introduces a new pattern recognition model for segmenting and tracking lip contours in video sequences. We<br />
formulate the problem as a general nonrigid object tracking method, where the computation of the expected segmentation<br />
is based on a filtering distribution. This is a difficult task because one has to compute the expected value using the whole<br />
parameter space of segmentation. As a result, we compute the expected segmentation using sequential Monte Carlo sampling<br />
methods, where the filtering distribution is approximated with a proposal distribution to be used for sampling. The<br />
key contribution of this paper is the formulation of this proposal distribution using a new observation model based on<br />
deep belief networks and a new transition model. The efficacy of the model is demonstrated in publicly available databases<br />
of video sequences of people talking and singing. Our method produces results comparable to state-of-the-art models, while<br />
showing potential to be more robust to imaging conditions.<br />
09:20-09:40, Paper WeAT1.2<br />
Robust Head-Shoulder Detection by PCA-Based Multilevel HOG-LBP Detector for People Counting<br />
Zeng, Chengbin, Beijing Univ. of Posts and Telecommunications<br />
Ma, Huadong, Beijing Univ. of Posts and Telecommunications<br />
Robustly counting the number of people for surveillance systems has widespread applications. In this paper, we propose<br />
a robust and rapid head-shoulder detector for people counting. By combining the multilevel HOG (Histograms of Oriented<br />
Gradients) with the multilevel LBP (Local Binary Pattern) as the feature set, we can detect the head-shoulders of people<br />
robustly, even when partial occlusions occur. To further improve the detection performance, Principal Components<br />
Analysis (PCA) is used to reduce the dimension of the multilevel HOG-LBP feature set. Our experiments show<br />
that the PCA-based multilevel HOG-LBP descriptors are more discriminative and more robust than the state-of-the-art algorithms.<br />
For the application of real-time people-flow estimation, we also incorporate our detector into particle filter<br />
tracking and achieve convincing accuracy.<br />
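Reducing a concatenated HOG-LBP feature set with PCA can be sketched via an SVD, as below. This is a generic illustration of the dimensionality-reduction step, not the authors' implementation, and the names are assumptions.

```python
import numpy as np

def pca_project(features, n_components):
    """Reduce a (samples x dims) feature matrix -- e.g. concatenated
    multilevel HOG and LBP descriptors -- to n_components dimensions
    with PCA computed via the SVD of the centred data."""
    mean = features.mean(axis=0)
    centered = features - mean
    # rows of vt are the principal directions, by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Because the top components capture the dominant variance, a detector trained on the projected features can match the full feature set at a fraction of the dimensionality.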
09:40-10:00, Paper WeAT1.3<br />
Adaptive Motion Model for Human Tracking using Particle Filter<br />
Ghaeminia, Mohammad Hossein, Iran Univ. of Science and Tech.<br />
Shabani, Amir-Hossein, Univ. of Waterloo<br />
Baradaran Shokouhi, Shahriar, Iran Univ. of Science & Tech.<br />
This paper presents a novel approach to modeling the complex motion of humans using a probabilistic autoregressive moving<br />
average model. The parameters of the model are adaptively tuned during the course of tracking by utilizing the main<br />
varying components of the pdf of the target’s acceleration and velocity. This motion model, along with the color histogram<br />
as the measurement model, has been incorporated in the particle filtering framework for human tracking. The proposed<br />
method is evaluated by PETS benchmark in which the targets have non-smooth motion and suddenly change their motion<br />
direction. Our method competes with the state-of-the-art techniques for human tracking in the real world scenario.<br />
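One predict/weight/resample cycle of a basic particle filter can be sketched as follows. For clarity the state is 1-D position with a fixed constant-velocity motion model, whereas the paper adapts an autoregressive model online; all names and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, velocities, measurement,
                         proc_noise=0.5, meas_noise=1.0):
    """One iteration of a bootstrap particle filter with a fixed
    constant-velocity motion model over a 1-D position state."""
    # predict: move each particle by its velocity plus process noise
    particles = particles + velocities + rng.normal(0, proc_noise, len(particles))
    # weight: Gaussian likelihood of the measurement given each particle
    w = np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    w /= w.sum()
    # resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], velocities[idx]
```

Running a few cycles against measurements from a target moving at constant speed keeps the particle cloud centred on the target; the paper's contribution is to re-estimate the motion model itself as the target manoeuvres.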
10:00-10:20, Paper WeAT1.4<br />
Bayesian GOETHE Tracking<br />
Wirkert, Sebastian, Ec. Centrale de Lyon<br />
Dellandréa, Emmanuel, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Occlusions pose serious challenges when tracking multiple targets. By severely changing the measurement, they imply<br />
strong inter-target dependencies. Exact computation of these dependencies is not feasible. The GOETHE approximations<br />
preserve much of the information while staying computationally affordable.<br />
10:20-10:40, Paper WeAT1.5<br />
A Combined Self-Configuring Method for Object Tracking in Colour Video<br />
Rosell Ortega, Juan, Pol. Univ. of Valencia<br />
Andreu-Garcia, Gabriela, Pol. Univ. of Valencia<br />
Rodas-Jordà, Angel, Pol. Univ. of Valencia<br />
Atienza-Vanacloig, Vicente, Pol. Univ. of Valencia<br />
This paper introduces a novel approach to background modelling. We initially propose a method to extract scene<br />
parameters from a sequence of frames. These parameters, together with an initial background model, are used as a starting<br />
point for a background subtraction method based on fuzzy logic. Our method permits modelling the background and detecting<br />
moving objects in a video sequence without user intervention. The algorithm is designed to work with CIEL*a*b*<br />
coordinates with multi-modal support, and avoids the user parameters and fixed or probabilistic thresholds usually found in<br />
traditional background subtraction methods. Quantitative and qualitative results obtained with a well-known benchmark<br />
and comparisons with other approaches justify the model.<br />
WeAT2 Dolmabahçe Hall A<br />
Shape Modeling - I Regular Session<br />
Session chair: De Floriani, L.<br />
09:00-09:20, Paper WeAT2.1<br />
A Geometric Invariant Shape Descriptor based on the Radon, Fourier, and Mellin Transforms<br />
Hoang, Thai V., Univ. Nancy 2-LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
A new shape descriptor invariant to geometric transformation based on the Radon, Fourier, and Mellin transforms is proposed.<br />
The Radon transform converts a geometric transformation applied to a shape image into transformations in the<br />
columns and rows of the Radon image. Invariances to translation, rotation, and scaling are obtained by applying 1D Fourier-<br />
Mellin and Fourier transforms on the columns and rows of the shape’s Radon image respectively. Experimental results on<br />
different datasets show the usefulness of the proposed shape descriptor.<br />
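The translation-invariance building block of such descriptors (applying a 1D Fourier transform and keeping only magnitudes) can be illustrated as follows. The full descriptor also applies a Fourier-Mellin step for scale invariance, which is omitted here; the function name is an assumption.

```python
import numpy as np

def shift_invariant_signature(profile):
    """1-D Fourier magnitude of a Radon row/column profile: taking the
    FFT magnitude discards the phase that encodes (circular)
    translation, giving a shift-invariant signature."""
    return np.abs(np.fft.fft(profile))
```

Since a circular shift of the profile changes only the phase of its Fourier coefficients, two shifted copies of the same profile yield identical signatures.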
09:20-09:40, Paper WeAT2.2<br />
Fundamental Geodesic Deformations in Spaces of Treelike Shapes<br />
Feragen, Aasa, Univ. of Copenhagen<br />
Lauze, Francois, Univ. of Copenhagen<br />
Nielsen, Mads<br />
This paper presents a new geometric framework for analysis of planar treelike shapes for applications such as shape matching,<br />
recognition and morphology, using the geometry of the space of treelike shapes. Mathematically, the shape space is<br />
given the structure of a stratified set which is a quotient of a normed vector space with a metric inherited from the vector<br />
space norm. We give examples of geodesic paths in tree-space corresponding to fundamental deformations of small trees,<br />
and discuss how these deformations are key building blocks for understanding deformations between larger trees.<br />
09:40-10:00, Paper WeAT2.3<br />
Shape Interpolation with Flattenings<br />
Meyer, Fernand, Mines-ParisTech<br />
This paper presents the binary flattenings of shapes, first as a connected operator suppressing particles or holes, second as<br />
an erosion in a particular lattice of shapes. Using this erosion, it is then possible to construct a distance from a shape to<br />
another and derive from it an interpolation function between shapes.<br />
10:00-10:20, Paper WeAT2.4<br />
Circularity Measuring in Linear Time<br />
Nguyen, Thanh Phuong, LORIA<br />
Debled-Rennesson, Isabelle, LORIA - Nancy Univ.<br />
We propose a new circularity measure, inspired by the shape matching tools of Arkin [Arkin91] and Latecki [Latecki00],<br />
that is constructed in a tangent space. We then introduce a linear algorithm that uses this measure for circularity measuring.<br />
This method can also be regarded as a method for circular object recognition. Experimental results show the robustness<br />
of this simple method.<br />
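For comparison, the classic isoperimetric ratio is the simplest circularity baseline; the tangent-space measure proposed in the paper is designed to be a more discriminative alternative to this ratio.

```python
import math

def isoperimetric_circularity(area, perimeter):
    """Classic circularity baseline 4*pi*A / P**2: equals 1 for a
    perfect disc and decreases as the shape becomes less circular."""
    return 4.0 * math.pi * area / perimeter ** 2
```

A unit disc (area pi, perimeter 2*pi) scores exactly 1, while a unit square (area 1, perimeter 4) scores pi/4.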
10:20-10:40, Paper WeAT2.5<br />
Multiscale Analysis from 1D Parametric Geometric Decomposition of Shapes<br />
Feschet, Fabien, Univ. d’Auvergne Clermont-Ferrand 1<br />
This paper deals with the construction of a non-parametric multiscale analysis from a 1D parametric decomposition of<br />
shapes where the elements of the decomposition are geometric primitives. We focus on the case of linear structures in<br />
shapes but our construction readily extends to the case of any geometric primitives. One key point of the construction is<br />
that it is truly multiscale in the sense that a higher level is a sublevel of a lower one and that it preserves symmetries of<br />
shapes. We performed experiments to show the simplification it provides on classical shapes. Results are promising.<br />
WeAT3 Dolmabahçe Hall B<br />
Image and Physics-Based Modeling Regular Session<br />
Session chair: Heyden, Anders (Lund Univ.)<br />
09:00-09:20, Paper WeAT3.1<br />
Region-Based Image Transform for Transition between Object Appearances<br />
Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />
Kono, Yuki, Nagoya Univ.<br />
Ide, Ichiro, Nagoya Univ.<br />
Murase, Hiroshi, Nagoya Univ.<br />
We propose a method of region-based image transform to achieve accurate transition between object appearances. A<br />
view-transition model (VTM) is one of the statistical methods that learn appearance transition from a sample image dataset of<br />
a large number of objects with various appearances. However, the VTM method has a practical problem: the appearance<br />
transition cannot be performed accurately if a sufficient number of learning samples is not available in the dataset. To<br />
cope with the problem, the proposed method first determines the regions of input and output images whose pixel values<br />
mutually affect each other during appearance transition, then transforms iteratively between partial images in the regions.<br />
We conducted experiments using actual image datasets. The results show that the proposed method transforms appearances<br />
more accurately than the VTM method.<br />
09:20-09:40, Paper WeAT3.2<br />
Extended Multiple View Geometry for Lights and Cameras from Photometric and Geometric Constraints<br />
Kato, Kazuki, Nagoya Inst. of Tech.<br />
Sakaue, Fumihiko, Nagoya Inst. of Tech.<br />
Sato, Jun, Nagoya Inst. of Tech.<br />
In this paper, we derive a novel multilinear relationship for close light sources and cameras. In this multilinear relationship,<br />
image intensities and image point coordinates can be handled in a single framework. We first derive a linear representation<br />
of image intensity taken under a general close light source. We next analyze multiple view geometry among close light<br />
sources and cameras, and derive novel multilinear constraints among image intensity and image coordinates. In particular,<br />
we study the details of the multilinear relationship among 7 lights and a camera. Finally, we show some experimental<br />
results, and show that the new multilinear relationship can be used for linearly generating images illuminated by arbitrary<br />
close light sources.<br />
09:40-10:00, Paper WeAT3.3<br />
Near-Regular BTF Texture Model<br />
Haindl, Michael, Inst. of Information Theory and Automation<br />
Hatka, Martin, Inst. of Information Theory and Automation<br />
In this paper we present a method for seamless enlargement and editing of an intricate near-regular type of bidirectional<br />
texture function (BTF), which simultaneously contains both regular periodic and stochastic components. Such BTF textures<br />
cannot be convincingly synthesised using either simple tiling or purely stochastic models. However, these textures<br />
are ubiquitous in many man-made environments and also in some natural scenes, so they are required for realistic<br />
appearance visualisation. The principle of the presented BTF-NR synthesis and editing method is to automatically separate<br />
the periodic and random components from one or more input textures. Each of these components is subsequently independently<br />
modelled using its corresponding optimal method. The regular texture part is modelled using our roller method, while the<br />
random part is synthesised from its estimated, exceptionally efficient Markov random field based representation. Both independently<br />
enlarged texture components from the original measured textures, representing one (enlargement) or several<br />
(editing) materials, are combined in the resulting synthetic near-regular texture.<br />
10:00-10:20, Paper WeAT3.4<br />
Detecting Vorticity in Optical Flows of Fluids<br />
Doshi, Ashish, Univ. of Surrey<br />
Bors, Adrian, Univ. of York<br />
In this paper we apply the diffusion framework to dense optical flow estimation. Local image information is represented<br />
by matrices of gradients between paired locations. Diffusion distances are modelled as sums of eigenvectors weighted by<br />
their eigenvalues, extracted following the eigendecomposition of these matrices. Local optical flow is estimated by correlating<br />
diffusion distances characterizing features from different frames. A feature confidence factor is defined based on<br />
the local correlation efficiency when compared to that of its neighbourhood. High confidence optical flow estimates are<br />
propagated to areas of lower confidence.<br />
10:20-10:40, Paper WeAT3.5<br />
Modeling Facial Skin Motion Properties in Video and its Application to Matching Faces across Expressions<br />
Manohar, Vasant, Raytheon BBN Tech.<br />
Shreve, Matthew, Univ. of South Florida<br />
Goldgof, Dmitry, Univ. of South Florida<br />
Sarkar, Sudeep, Univ. of South Florida<br />
In this paper, we propose a method to model the material constants (Young’s modulus) of the skin in subregions of the<br />
face from the motion observed in multiple facial expressions and present its relevance to an image analysis task such as<br />
face verification. On a public database consisting of 40 subjects undergoing a set of facial motions associated with<br />
anger, disgust, fear, happiness, sadness, and surprise expressions, we present an expression-invariant strategy for matching faces<br />
using the Young’s modulus of the skin. Results show that it is indeed possible to match faces across expressions using the<br />
material properties of their skin.<br />
WeAT4 Topkapı Hall A<br />
Kernel Methods Regular Session<br />
Session chair: Aksoy, Selim (Bilkent Univ.)<br />
09:00-09:20, Paper WeAT4.1<br />
AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach<br />
Zhang, Ziming, Simon Fraser Univ.<br />
Li, Ze-Nian, Simon Fraser Univ.<br />
Drew, Mark S.<br />
In this paper, we propose a novel large-margin based approach for multiple kernel learning (MKL) using biconvex optimization,<br />
called Adaptive Multiple Kernel Learning (AdaMKL). To learn the weights for support vectors and the kernel<br />
coefficients, AdaMKL minimizes the objective function alternately by learning one component while fixing the other at a<br />
time, and in this way only one convex formulation needs to be solved. We also propose a family of biconvex objective<br />
functions with an arbitrary Lp-norm (p>=1) of kernel coefficients. As our experiments show, AdaMKL performs comparably<br />
with state-of-the-art convex optimization based MKL approaches, but its learning is much simpler and faster.<br />
09:20-09:40, Paper WeAT4.2<br />
Von Mises-Fisher Mean Shift for Clustering on a Hypersphere<br />
Kobayashi, Takumi, Nat. Inst. of Advanced Industrial Science<br />
Otsu, Nobuyuki, Nat. Inst. of Advanced Industrial Science<br />
We propose a method of clustering sample vectors on a hypersphere. Sample vectors are normalized in many cases, especially<br />
when applying kernel functions, and thus lie on a (unit) hypersphere. Considering the constraint of the hypersphere,<br />
the proposed method utilizes the von Mises-Fisher distribution in the framework of mean shift. It is also extended to the<br />
kernel-based clustering method via the kernel trick to cope with complex distributions. The algorithms of the proposed methods<br />
are based on simple matrix calculations. In the experiments, including a practical motion clustering task, the proposed<br />
methods produce favorable clustering results.<br />
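The core update the abstract describes, mean shift constrained to the unit hypersphere via the von Mises-Fisher distribution, can be sketched as follows. This is a minimal sketch: the concentration parameter kappa, the use of all samples as kernel support, and the convergence criterion are assumptions, and the kernelized extension is omitted.

```python
import numpy as np

def vmf_mean_shift(X, kappa=10.0, n_iter=50, tol=1e-8):
    """Mean shift on the unit hypersphere with von Mises-Fisher weights.

    X : (n, d) array of unit-norm sample vectors.
    Each mode estimate is shifted toward the kappa-weighted mean of all
    samples and re-projected onto the sphere until convergence.
    """
    M = X.copy()
    for _ in range(n_iter):
        # vMF kernel weights: exp(kappa * cosine similarity)
        W = np.exp(kappa * (M @ X.T))                   # (n, n)
        M_new = W @ X                                   # weighted means
        M_new /= np.linalg.norm(M_new, axis=1, keepdims=True)  # back onto sphere
        if np.max(np.abs(M_new - M)) < tol:
            return M_new
        M = M_new
    return M
```

Samples whose mode estimates converge to the same point on the sphere are assigned to the same cluster, mirroring ordinary Euclidean mean-shift clustering.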
09:40-10:00, Paper WeAT4.3<br />
Nonlinear Mappings for Generative Kernels on Latent Variable Models<br />
Carli, Anna, Univ. of Verona<br />
Bicego, Manuele, Univ. of Verona<br />
Baldo, Sisto, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
Generative kernels have emerged in recent years as an effective method for mixing discriminative and generative approaches.<br />
In particular, in this paper, we focus on kernels defined on generative models with latent variables (e.g. the states<br />
in a Hidden Markov Model). The basic idea underlying these kernels is to compare objects, via an inner product, in a feature<br />
space whose dimensions are related to the latent variables of the model. Here we propose to enhance these kernels via<br />
a nonlinear normalization of the space, namely a nonlinear mapping of the space dimensions able to exploit their discriminative<br />
characteristics. In this paper we investigate three possible nonlinear mappings for two HMM-based generative kernels,<br />
testing them on different sequence classification problems, with very promising results.<br />
10:00-10:20, Paper WeAT4.4<br />
Multiple Kernel Learning with High Order Kernels<br />
Wang, Shuhui, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Previous Multiple Kernel Learning (MKL) approaches employ different kernels through their linear combination. Though some<br />
improvements have been achieved over methods using a single kernel, the advantages of employing multiple kernels for<br />
machine learning are far from fully developed. In this paper, we propose to use high-order kernels to enhance the<br />
learning of MKL when a set of original kernels is given. High-order kernels are generated as products of real powers<br />
of the original kernels. We incorporate the original kernels and high-order kernels into a unified localized kernel logistic<br />
regression model. To avoid over-fitting, we apply group LASSO regularization to the kernel coefficients of each training<br />
sample. Experiments on image classification show that our approach outperforms many existing MKL approaches.<br />
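The construction of high-order kernels as products of real powers of the base kernels can be sketched as below. The exponent tuples are illustrative assumptions, and the paper's localized kernel logistic regression and group LASSO stages are omitted.

```python
import numpy as np

def high_order_kernels(base_grams, orders):
    """Generate high-order Gram matrices as element-wise products of
    real powers of base kernel Gram matrices.

    base_grams : list of (n, n) Gram matrices with non-negative entries
                 (e.g. RBF kernels), one per base kernel.
    orders     : list of exponent tuples, one exponent per base kernel,
                 e.g. (1.0, 0.5) -> K1 * K2**0.5 (element-wise).
    """
    out = []
    for exps in orders:
        K = np.ones_like(base_grams[0])
        for G, p in zip(base_grams, exps):
            K = K * np.power(G, p)   # element-wise product of powers
        out.append(K)
    return out
```

The resulting matrices can then be appended to the original kernel set before the coefficients are learned.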
10:20-10:40, Paper WeAT4.5<br />
Kernel-Based Implicit Regularization of Structured Objects<br />
Dupé, François-Xavier, GREYC<br />
Bougleux, Sébastien, Univ. de Caen<br />
Brun, Luc, ENSICAEN<br />
Lezoray, Olivier, Univ. de Caen<br />
Elmoataz, Abderrahim, Univ. de Caen<br />
Weighted graph regularization provides a rich framework for regularizing functions defined over the vertices of a weighted<br />
graph. Until now, such a framework has been defined only for real or multivalued functions, thereby restricting the regularization framework<br />
to numerical data. On the other hand, several kernels have been defined on structured objects such as strings or graphs. Using positive<br />
definite kernels, each original object is associated by the "kernel trick" with one element of a Hilbert space. Consequently, this<br />
paper proposes to extend the weighted graph regularization framework to objects implicitly defined by their kernel, thereby performing<br />
the regularization within the Hilbert space associated with the kernel. This work opens the door to the regularization of structured objects.<br />
WeAT5 Topkapı Hall B<br />
Face Analysis Regular Session<br />
Session chair: Lovell, Brian Carrington (The Univ. of Queensland)<br />
09:00-09:20, Paper WeAT5.1<br />
Face Sketch Synthesis via Sparse Representation<br />
Chang, Liang, Beijing Normal Univ.<br />
Zhou, Mingquan, Beijing Normal Univ.<br />
Han, Yanjun, Chinese Acad. of Sciences<br />
Deng, Xiaoming, Chinese Acad. of Sciences<br />
Face sketch synthesis from a photo is challenging because the psychological mechanism of sketch generation is difficult<br />
to express precisely with rules. Current learning-based sketch synthesis methods concentrate on learning the rules by<br />
optimizing cost functions over low-level image features. In this paper, a new face sketch synthesis method is presented,<br />
inspired by recent advances in sparse signal representation and by evidence from neuroscience that the human brain probably perceives<br />
images using high-level features which are sparse. Sparse representations are desirable in sketch synthesis because sparseness<br />
adaptively selects the most relevant samples, which give the best representations of the input photo. We assume that<br />
a face photo patch and its corresponding sketch patch follow the same sparse representation. In the feature extraction,<br />
we select succinct high-level features using the sparse coding technique, and in the sketch synthesis process each sketch<br />
patch is synthesized with respect to the high-level features by solving an $l_1$-norm optimization. Experiments on<br />
the CUHK database show that our synthesized sketches resemble the true sketches fairly well.<br />
09:20-09:40, Paper WeAT5.2<br />
Restoration of a Frontal Illuminated Face Image based on KPCA<br />
Xie, Xiaohua, Sun Yat-sen Univ.<br />
Zheng, Wei-Shi, Queen Mary Univ. of London<br />
Lai, Jian-Huang, Sun Yat-sen Univ.<br />
Suen, Ching Y.<br />
In this paper, we propose a novel illumination-normalization method. By combining Kernel Principal<br />
Component Analysis (KPCA) with the pre-image technique, this method can restore a frontal-illuminated face image from<br />
a single non-frontal-illuminated face image. In this method, a frontal-illumination subspace is first learned by KPCA. For<br />
each input face image, we project its large-scale features, which are affected by illumination variations, onto this subspace<br />
to normalize the illumination. Then the frontal-illuminated face image is reconstructed by combining the small-scale and the<br />
normalized large-scale features. Unlike most existing techniques, the proposed method does not require any shape modeling<br />
or lighting estimation. As a holistic reconstruction, the KPCA + pre-image technique incurs less local distortion. Compared<br />
to directly applying the KPCA + pre-image technique to the original image, our proposed method is better at<br />
processing images of faces outside the training set. Experiments on the CMU-PIE and Extended Yale B face databases<br />
show that the proposed method outperforms state-of-the-art algorithms.<br />
09:40-10:00, Paper WeAT5.3<br />
A Bayesian Approach to Face Hallucination using DLPP and KRR<br />
Tanveer, Muhammad, National Univ. of Science and Tech.<br />
Rao, Naveed Iqbal, National Univ. of Sciences and Tech.<br />
Low-resolution faces are a major barrier to efficient face recognition and identification in several applications, primarily<br />
surveillance systems. To mitigate this problem, we propose a novel learning-based two-step approach using Direct<br />
Locality Preserving Projections (DLPP), Maximum a Posteriori (MAP) estimation, and Kernel Ridge Regression (KRR)<br />
for super-resolution of face images, i.e., face hallucination. First, using DLPP for manifold learning and<br />
MAP estimation, a smooth global high-resolution image is obtained. In the second step, to introduce high-frequency components,<br />
KRR is used to model the residue high-resolution image, which is then added to the global image to obtain the final<br />
high-quality, detailed hallucinated face image. Experimental results show that the proposed system is robust and<br />
efficient in synthesizing, from low-resolution faces, images similar to the original high-resolution faces.<br />
10:00-10:20, Paper WeAT5.4<br />
Face Hallucination under an Image Decomposition Perspective<br />
Liang, Yan, Sun Yat-sen Univ.<br />
Lai, Jian-Huang, Sun Yat-sen Univ.<br />
Xie, Xiaohua, Sun Yat-sen Univ.<br />
Liu, Wanquan, Curtin Univ. of Tech.<br />
In this paper we propose to convert the task of face hallucination into an image decomposition problem, and then use the<br />
morphological component analysis (MCA) for hallucinating a single face image, based on a novel three-step framework.<br />
Firstly, a low-resolution input image is up-sampled by interpolation. Then, the MCA is employed to decompose the interpolated<br />
image into a high-resolution image and an unsharp mask, as MCA can properly decompose a signal into special<br />
parts according to typical dictionaries. Finally, a residue compensation, which is based on the neighbor reconstruction of<br />
patches, is performed to enhance the facial details. The proposed method can effectively exploit the facial properties for<br />
face hallucination under the image decomposition perspective. Experimental results demonstrate the effectiveness of our<br />
method, in terms of the visual quality of the hallucinated face images.<br />
10:20-10:40, Paper WeAT5.5<br />
Gender Classification using Local Directional Pattern (LDP)<br />
Jabid, Taskeed, Kyung Hee Univ.<br />
Kabir, Md. Hasanul, Kyung Hee Univ.<br />
Chae, Oksam, Kyung Hee Univ.<br />
In this paper, we present a novel texture descriptor, the Local Directional Pattern (LDP), to represent facial images for gender<br />
classification. The face area is divided into small regions, from which LDP histograms are extracted and concatenated<br />
into a single vector to efficiently represent the face image. The classification is performed using support vector machines<br />
(SVMs), which have been shown to be superior to traditional pattern classifiers for the gender classification problem. Experimental<br />
results on images collected from the FERET face database show the superiority of the proposed method, which<br />
achieves 95.05% accuracy.<br />
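As a rough illustration of the descriptor pipeline (per-pixel LDP codes, region-wise histograms, concatenation into one face vector), the sketch below assumes the common Kirsch-mask formulation of LDP with the three strongest directions and a 4x4 region grid; these choices and the helper names are assumptions, and the SVM stage is omitted.

```python
import numpy as np

def kirsch_masks():
    """The 8 Kirsch compass masks, obtained by rotating the ring of
    coefficients around the centre of the 3x3 base mask."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = np.array([-3, -3, 5, 5, 5, -3, -3, -3], dtype=float)
    masks = []
    for k in range(8):
        m = np.zeros((3, 3))
        for (i, j), v in zip(ring, np.roll(vals, k)):
            m[i, j] = v
        masks.append(m)
    return masks

def filter3(img, m):
    """Valid-mode 3x3 filter response via shifted slices (no SciPy)."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(3):
        for j in range(3):
            out += m[i, j] * img[i:i + H - 2, j:j + W - 2]
    return out

def ldp_codes(img, k=3):
    """Per-pixel LDP code: one bit for each of the k strongest absolute
    directional responses among the 8 Kirsch masks."""
    resp = np.stack([np.abs(filter3(img, m)) for m in kirsch_masks()])
    thresh = np.sort(resp, axis=0)[-k]          # k-th largest per pixel
    bits = (resp >= thresh).astype(int)
    return (bits * (1 << np.arange(8))[:, None, None]).sum(axis=0)

def face_descriptor(img, grid=(4, 4)):
    """Concatenate normalised 256-bin LDP histograms over a region grid."""
    codes = ldp_codes(img)
    h, w = codes.shape
    feats = []
    for gi in range(grid[0]):
        for gj in range(grid[1]):
            block = codes[gi * h // grid[0]:(gi + 1) * h // grid[0],
                          gj * w // grid[1]:(gj + 1) * w // grid[1]]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            feats.append(hist / block.size)
    return np.concatenate(feats)
```

The concatenated vector would then be fed to an SVM trained on labelled male/female faces.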
WeAT6 Anadolu Auditorium<br />
Document Analysis - I Regular Session<br />
Session chair: Baird, Henry (Lehigh Univ.)<br />
09:00-09:20, Paper WeAT6.1<br />
Generating Sets of Classifiers for the Evaluation of Multi-Expert Systems<br />
Impedovo, Donato, Pol. di Bari<br />
Pirlo, Giuseppe, Univ. degli Studi di Bari<br />
This paper addresses the problem of multi-classifier system evaluation by artificially generated classifiers. For this purpose,<br />
a new technique is presented for the generation of sets of artificial abstract-level classifiers with different characteristics<br />
at the individual-level (i.e. recognition performance) and at the collective-level (i.e. degree of similarity). The technique<br />
has been used to generate sets of classifiers simulating different working conditions in which the performance of combination<br />
methods can be estimated. The experimental tests demonstrate the effectiveness of the approach in generating simulated<br />
data useful to investigate the performance of combination methods for abstract-level classifiers.<br />
09:20-09:40, Paper WeAT6.2<br />
Imbalance and Concentration in K-NN Classification<br />
Yin, Dawei, Lehigh Univ.<br />
An, Chang, Lehigh Univ.<br />
Baird, Henry, Lehigh Univ.<br />
We propose algorithms for ameliorating difficulties in fast approximate k Nearest Neighbors (kNN) classifiers that arise<br />
from imbalances among classes in numbers of samples, and from concentrations of samples in small regions of feature<br />
space. These problems can occur with a wide range of binning kNN algorithms such as k-D trees and our variant, hashed<br />
k-D trees. The principal method we discuss automatically rebalances training data and estimates concentration in each k-D<br />
hash bin separately, which then controls how many samples should be kept in each bin. We report an experiment on<br />
86.7M training samples which shows a 7-times speedup and higher minimum per-class recall, compared to previously reported<br />
methods. The context of these experiments is the need for image classifiers able to handle an unbounded variety of<br />
inputs: in our case, highly versatile document classifiers which require training sets as large as a billion training samples.<br />
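The per-bin rebalancing idea can be illustrated with a minimal sketch. The cap-per-class rule below is an assumption, since the paper's exact rebalancing and concentration-estimation procedures are not detailed in the abstract.

```python
from collections import defaultdict

def rebalance_bin(samples, labels, cap):
    """Keep at most `cap` training samples per class within one hash bin.

    A minimal sketch of per-bin rebalancing: classes over-represented in
    the bin are truncated so no single class can dominate the
    nearest-neighbour votes drawn from that bin.
    """
    kept_samples, kept_labels = [], []
    counts = defaultdict(int)
    for s, l in zip(samples, labels):
        if counts[l] < cap:
            kept_samples.append(s)
            kept_labels.append(l)
            counts[l] += 1
    return kept_samples, kept_labels
```

In a full system the cap itself would vary per bin, driven by the estimated concentration of samples in that region of feature space.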
09:40-10:00, Paper WeAT6.3<br />
Gaussian Mixture Models for Arabic Font Recognition<br />
Slimane, Fouad, Univ. of Fribourg<br />
Kanoun, Slim, ENIS<br />
Alimi, Adel M., Univ. of Sfax<br />
Ingold, Rolf, Univ. of Fribourg<br />
Hennebert, Jean, Univ. of Applied Sciences<br />
We present in this paper a new approach for Arabic font recognition. Our proposal is to use a fixed-length sliding window<br />
for the feature extraction and to model feature distributions with Gaussian Mixture Models (GMMs). This approach presents<br />
a double advantage. First, we do not need to perform a priori segmentation into characters, which is a difficult task<br />
for Arabic text. Second, we use versatile and powerful GMMs able to finely model distributions of features in large multidimensional<br />
input spaces. We report on the evaluation of our system on the APTI (Arabic Printed Text Image) database<br />
using 10 different fonts and 10 font sizes. Considering the variability of the different font shapes and the fact that our<br />
system is independent of the font size, the obtained results are convincing and compare well with competing systems.<br />
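The modeling stage (one GMM per font over sliding-window feature vectors, recognition by maximum total log-likelihood) might look like the following sketch using scikit-learn; the diagonal covariances, component count, and synthetic feature dimensions are assumptions of this sketch, not details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_font_models(features_per_font, n_components=4):
    """Fit one GMM per font on its sliding-window feature vectors.

    features_per_font : dict mapping font name -> (n, d) feature array.
    """
    models = {}
    for font, X in features_per_font.items():
        models[font] = GaussianMixture(
            n_components=n_components, covariance_type="diag",
            random_state=0).fit(X)
    return models

def recognize_font(models, X):
    """Return the font whose GMM gives the highest total log-likelihood
    over the window features X extracted from one text-line image."""
    return max(models, key=lambda f: models[f].score_samples(X).sum())
```

Because the score is accumulated over all windows of a line, no character segmentation is required, which is the main point the abstract makes for Arabic text.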
10:00-10:20, Paper WeAT6.4<br />
Transfer of Supervision for Improved Address Standardization<br />
Kothari, Govind, IBM<br />
Faruquie, Tanveer, IBM Res. India<br />
Subramaniam, L. Venkata, IBM Res. India<br />
K, Hima Prasad, IBM Res. India<br />
Mohania, Mukesh, IBM Res. India<br />
Address cleansing is very challenging, particularly for geographies with high variability in how addresses are written. Supervised learners<br />
can be easily trained for different data sources. However, training requires labeled corpora for each data source,<br />
which are time-consuming and labor-intensive to create. We propose a method to automatically transfer supervision from a<br />
given labeled source to an unlabeled target source using a hierarchical Dirichlet process. Each Dirichlet process models data<br />
from one source. The component distribution shared across these Dirichlet processes captures the semantic relation between<br />
data sources. A feature projection on the component distributions from multiple sources is used to transfer supervision.<br />
10:20-10:40, Paper WeAT6.5<br />
Bag of Characters and SOM Clustering for Script Recognition and Writer Identification<br />
Marinai, Simone, Univ. of Florence<br />
Miotti, Beatrice, Univ. of Florence<br />
Soda, Giovanni, Univ. di Firenze<br />
In this paper, we describe a general approach for script (and language) recognition in printed documents and for writer<br />
identification in handwritten documents. The method is based on a bag-of-visual-words strategy where the visual words<br />
correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words<br />
in the case of script recognition) are classified by comparing their vectorial representations with those of a training set<br />
using cosine similarity. The comparison is improved using a similarity score that takes into account the<br />
SOM organization of the cluster centroids. Promising results are presented for both printed documents and handwritten<br />
musical scores.<br />
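The baseline classification step (cosine similarity between bag-of-characters vectors) can be sketched as below; the SOM-based refinement of the similarity score is omitted, and the toy vectors are illustrative.

```python
import numpy as np

def cosine_classify(query, train_vecs, train_labels):
    """Assign the label of the most cosine-similar training vector.

    query       : (d,) bag-of-visual-words count vector of an unknown page.
    train_vecs  : (n, d) training vectors; train_labels : n labels.
    """
    q = query / (np.linalg.norm(query) + 1e-12)
    T = train_vecs / (np.linalg.norm(train_vecs, axis=1, keepdims=True) + 1e-12)
    return train_labels[int(np.argmax(T @ q))]
```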
WeAT7 Dolmabahçe Hall C<br />
Gait and Gesture Regular Session<br />
Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />
09:00-09:20, Paper WeAT7.1<br />
Multi-View Gait Recognition based on Motion Regression using Multilayer Perceptron<br />
Kusakunniran, Worapan, Univ. of New South Wales<br />
Wu, Qiang, Univ. of Tech. Sydney<br />
Zhang, Jian, National ICT Australia<br />
Li, Hongdong, Australian National Univ.<br />
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, it is challenging<br />
to obtain reliable gait features when the viewing angle changes, because the body appearance can differ under<br />
the various viewing angles. In this paper, the above problem is formulated as a regression problem in which a novel View<br />
Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates<br />
the gait feature under an unknown viewing angle based on motion information in a well-selected Region of Interest<br />
(ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles<br />
into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have<br />
been obtained on a widely adopted benchmark database.<br />
09:20-09:40, Paper WeAT7.2<br />
Robust Gait Recognition against Speed Variation<br />
Aqmar, Muhammad Rasyid, Tokyo Inst. of Tech.<br />
Shinoda, Koichi, Tokyo Inst. of Tech.<br />
Furui, Sadaoki<br />
Variations in walking speed have a strong impact on gait recognition. We propose a gait recognition method<br />
that is robust against walking-speed variations. It is based on a combination of Fisher discriminant analysis (FDA)-based<br />
cubic higher-order local auto-correlation (CHLAC) and the statistical framework provided by hidden Markov models<br />
(HMMs). The HMMs in this method identify the phase of each gait even when walking speed changes nonlinearly, and<br />
the CHLAC features capture the within-phase spatio-temporal characteristics of each individual. We compared the performance<br />
of our method with other conventional methods in an evaluation using three different databases, i.e., USH,<br />
USF-NIST, and Tokyo Tech DB. Ours was equal to or better than the others when the speed did not change too much, and<br />
was significantly better when the speed varied across and within a gait sequence.<br />
09:40-10:00, Paper WeAT7.3<br />
Gait Recognition using Period-Based Phase Synchronization for Low Frame-Rate Videos<br />
Mori, Atsushi, Osaka Univ.<br />
Makihara, Yasushi, The Inst. of Scientific and Industrial Res., Osaka Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
This paper proposes a method for period-based gait trajectory matching in the eigenspace using phase synchronization for<br />
low frame-rate videos. First, a gait period is detected by maximizing the normalized autocorrelation of the gait silhouette<br />
sequence for the temporal axis. Next, a gait silhouette sequence is expressed as a trajectory in the eigenspace and the gait<br />
phase is synchronized by time stretching and time shifting of the trajectory based on the detected period. In addition, multiple<br />
period-based matching results are integrated via statistical procedures for more robust matching in the presence of<br />
fluctuations among gait sequences. Results of experiments conducted with 185 subjects to evaluate the performance of<br />
gait verification at various spatial and temporal resolutions demonstrate the effectiveness of the proposed method.<br />
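The first step the abstract describes (gait period detection by maximizing normalized autocorrelation along the temporal axis) can be sketched as follows; the mean removal and the period search range are assumptions of this sketch.

```python
import numpy as np

def detect_gait_period(seq, min_period=2, max_period=None):
    """Detect the gait period of a silhouette sequence by maximising the
    normalised autocorrelation along the temporal axis.

    seq : (T, ...) array, one (flattened or 2-D) silhouette per frame.
    """
    T = seq.shape[0]
    X = seq.reshape(T, -1).astype(float)
    X -= X.mean(axis=0)                      # remove the static component
    max_period = max_period or T // 2
    best_p, best_r = min_period, -np.inf
    for p in range(min_period, max_period + 1):
        a, b = X[:-p].ravel(), X[p:].ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        r = (a @ b) / denom if denom > 0 else 0.0
        if r > best_r:                       # normalised autocorrelation
            best_p, best_r = p, r
    return best_p
```

The detected period would then drive the time stretching and shifting that synchronizes trajectory phase before matching.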
10:00-10:20, Paper WeAT7.4<br />
Body Motion Analysis for Multi-Modal Identity Verification<br />
Williams, George, NYU<br />
Taylor, Graham, NYU<br />
Smolskiy, Kirill, NYU<br />
Bregler, Christoph, NYU<br />
This paper shows how Body Motion Signature Analysis, a new soft-biometrics technique, can be used for identity verification.<br />
It extracts motion features from the upper body of people and estimates so-called super-features for input<br />
to a classifier. We demonstrate how this new technique can be used to identify people based on their motion alone, or to<br />
significantly improve hard-biometrics techniques. For example, face verification on this domain achieves a 6.45%<br />
Equal Error Rate (EER), and the combined verification performance of motion features and face reduces the error to 4.96%<br />
using an adaptive score-level integration method. The more ambiguous motion-only performance is 17.1% EER.<br />
10:20-10:40, Paper WeAT7.5<br />
Robust Sign Language Recognition with Hierarchical Conditional Random Fields<br />
Yang, Hee-Deok, Chosun Univ.<br />
Lee, Seong-Whan, Korea Univ.<br />
Sign language spotting is the task of detection and recognition of signs (words in the predefined vocabulary) and fingerspellings<br />
(a combination of continuous alphabets that are not found in signs) in a signed utterance. The internal structures<br />
of signs and fingerspellings differ significantly. Therefore, it is difficult to spot signs and fingerspellings simultaneously.<br />
In this paper, a novel method for spotting signs and fingerspellings is proposed, which can distinguish signs, fingerspellings,<br />
and nonsign patterns. This is achieved through a hierarchical framework consisting of three steps: (1) Candidate segments<br />
of signs and fingerspellings are discriminated with a two-layer conditional random field (CRF). (2) Hand shapes of detected<br />
signs and fingerspellings are verified by BoostMap embeddings. (3) The motions of fingerspellings are verified in order<br />
to distinguish those which have similar hand shapes and differ only in hand trajectories. Experiments demonstrate that the<br />
proposed method can spot signs and fingerspellings from utterance data at rates of 83% and 78%, respectively.<br />
WeAT8 Upper Foyer<br />
Image and Video Processing Poster Session<br />
Session chair: Koch, Reinhard (Univ. of Kiel)<br />
09:00-11:10, Paper WeAT8.1<br />
Compressive Sampling Recovery for Natural Images<br />
Shang, Fei, Beijing Inst. of Tech.<br />
Du, Huiqian, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
Compressive sampling (CS) is a novel data collection and coding theory which allows us to recover sparse or compressible<br />
signals from a small set of measurements. This paper presents a new model for natural image recovery, in which the smooth<br />
l0 norm and the approximate total-variation (TV) norm are adopted simultaneously. By using first-order gradient descent,<br />
the speed of the algorithm for this new model can be guaranteed. Experimental results demonstrate that the principle of the<br />
model is correct and the performance is as good as that based on TV model. The computing speed of the proposed method<br />
is two orders of magnitude faster than that of interior point method and two times faster than that of the Nesta optimization<br />
based on TV model.<br />
09:00-11:10, Paper WeAT8.3<br />
De-Ghosting for Image Stitching with Automatic Content-Awareness<br />
Tang, Yu, The Univ. of Aizu<br />
Shin, Jungpil, The Univ. of Aizu<br />
Ghosting artifacts are a common problem in image stitching, and eliminating them is not an easy task. In this<br />
paper, we propose an intuitive technique that computes a stitching line from a novel energy map, which is essentially a<br />
combination of a gradient map, indicating the presence of structures, and a prominence map, determining the attractiveness<br />
of a region. We consider a region significant only if it is both structural and attractive. Using this improved<br />
energy map, the stitching line can easily skirt around moving objects or salient parts, based on the philosophy that<br />
human eyes mostly notice only the salient features of an image. We compare the results of our method to those of 4 state-of-the-art<br />
image stitching methods, and it turns out that our method outperforms all 4 in removing ghosting artifacts.<br />
09:00-11:10, Paper WeAT8.4<br />
Content-Adaptive Automatic Image Sharpening<br />
Kobayashi, Tatsuya, Nagoya City Univ.<br />
Tajima, Johji, Nagoya City Univ.<br />
Optimal sharpness differs from image to image, depending on the content. In general, human observers prefer images of<br />
artificial objects to be sharper and those of natural objects to be less sharp. We have developed a content-adaptive automatic image sharpening<br />
algorithm that relies on the length of lines extracted from the image. It is applicable to images with various regions,<br />
such as those containing natural and artificial objects. The proposed algorithm is expected to be used in the image processing<br />
modules of image input/output devices, e.g. digital cameras, printers, etc.<br />
09:00-11:10, Paper WeAT8.5<br />
Irradiance Preserving Image Interpolation<br />
Giachetti, Andrea, Univ. di Verona<br />
In this paper we present a new image upscaling (single-image super-resolution) algorithm. It is based on the refinement<br />
of a simple pixel decimation followed by an optimization step that maximizes the smoothness of the second-order derivatives<br />
of the image intensity while keeping the sum of the brightness values of each subdivided pixel (i.e. the estimated irradiance<br />
over the area) constant. The method is physically grounded and creates images that appear very sharp with reduced artifacts.<br />
Subjective and objective tests demonstrate the high quality of the results obtained.<br />
09:00-11:10, Paper WeAT8.7<br />
Interpolation and Sampling on a Honeycomb Lattice<br />
Strand, Robin, Uppsala Univ.<br />
In this paper, we focus on the three-dimensional honeycomb point-lattice in which the Voronoi regions are hexagonal<br />
prisms. The ideal interpolation function is derived by using a Fourier transform of the sampling lattice. From these results,<br />
the sampling efficiency of the lattice follows.<br />
09:00-11:10, Paper WeAT8.8<br />
Optimization of Topological Active Models with Multiobjective Evolutionary Algorithms<br />
Novo Buján, Jorge, Varpa group, Univ. of A Coruña<br />
Santos, Jose, Univ. of A Coruña<br />
Gonzalez Penedo, Manuel Francisco, Univ. of A Coruña<br />
Fernández Arias, Alba, VARPA Group, Univ. of A Coruña<br />
In this work we use the evolutionary multiobjective methodology for the optimization of topological active models, a deformable<br />
model that integrates features of region-based and boundary-based segmentation techniques. The model deformation<br />
is controlled by energy functions that must be minimized. As in other deformable models, a correct segmentation<br />
is achieved through the optimization of the model, governed by energy parameters that must be experimentally tuned.<br />
Evolutionary multiobjective optimization gives a solution to this problem by considering the optimization of several objectives<br />
in parallel. Concretely, we use the SPEA2 algorithm, adapted to our application, to search for the Pareto-optimal<br />
individuals. The proposed method was tested on several representative images from different domains, yielding highly accurate<br />
results.<br />
09:00-11:10, Paper WeAT8.9<br />
Fast Super-Resolution using Weighted Median Filtering<br />
Nasonov, Andrey, Lomonosov Moscow State Univ.<br />
Krylov, Andrey S., Lomonosov Moscow State Univ.<br />
A non-iterative method of image super-resolution based on weighted median filtering with Gaussian weights is proposed.<br />
Visual tests and basic edge metrics were used to examine the method. It was shown that the weighted median filtering<br />
reduces the errors caused by inaccurate motion vectors.<br />
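As a rough illustration of the core operation (a sketch only; the exact weights and window handling in the paper may differ), a weighted median with Gaussian weights can be computed by sorting the window values and picking the one at which the cumulative weight reaches half the total:<br />

```python
import math

def gaussian_weights(size, sigma=1.0):
    """1-D Gaussian weights centred on the window midpoint."""
    c = size // 2
    return [math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for i in range(size)]

def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]
```

Unlike a plain mean, the weighted median discards isolated outliers such as samples displaced by inaccurate motion vectors.<br />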
09:00-11:10, Paper WeAT8.10<br />
Geodesic Thin Plate Splines for Image Segmentation<br />
Lombaert, Herve, Ec. Pol. de Montreal<br />
Cheriet, Farida, Ec. Pol. de Montreal<br />
Thin Plate Splines are often used in image registration to model deformations. The physical analogy involves a thin<br />
sheet of metal that is deformed and forced to pass through a set of control points. The Thin Plate Spline equation minimizes<br />
that thin plate bending energy. Rather than using Euclidean distances between control points for image deformation, we<br />
are using geodesic distances for image segmentation. Control points become seed points and force the thin plate to pass<br />
through given heights. Intuitively, the thin plate surface in the vicinity of a seed point within a region should have similar<br />
heights. The minimally bent thin plate actually gives a “confidence” map telling what the closest seed point is for every<br />
surface point. The Thin Plate Spline has a closed-form solution which is fast to compute and globally optimal. This method<br />
shows comparable results to the Graph Cuts method.<br />
09:00-11:10, Paper WeAT8.11<br />
Gestures and Lip Shape Integration for Cued Speech Recognition<br />
Heracleous, Panikos, Advanced Telecommunications Res. Inst. International<br />
Beautemps, Denis, Gipsa-Lab.<br />
Hagita, Norihiro, Advanced Telecommunications Res. Inst. International<br />
In this article, automatic recognition of Cued Speech in French based on hidden Markov models (HMMs) is presented.<br />
Cued Speech is a visual mode which uses hand shapes in different positions, in combination with the lip patterns of speech,<br />
to make all the sounds of spoken language clearly understandable to deaf and hearing-impaired people. The aim of Cued<br />
Speech is to overcome the problems of lipreading and thus enable deaf children and adults to understand full spoken language.<br />
In this study, the lip shape component is fused with the hand component using multistream HMM decision fusion to<br />
realize Cued Speech recognition, and continuous phoneme recognition experiments using data from a normal-hearing and<br />
a deaf cuer were conducted. In the case of the normal-hearing cuer, the obtained phoneme accuracy was 83.5%, and in the<br />
case of the deaf cuer 82.1%.<br />
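Multistream decision fusion of the kind described above can be sketched as a weighted combination of per-class log-likelihoods from the two streams (the stream weight and the toy scores below are illustrative, not values from the paper):<br />

```python
def fuse_streams(lip_loglik, hand_loglik, lip_weight=0.6):
    """Combine per-phoneme log-likelihoods from the lip and hand streams
    and pick the best-scoring phoneme (multistream decision fusion)."""
    hand_weight = 1.0 - lip_weight
    scores = {ph: lip_weight * lip_loglik[ph] + hand_weight * hand_loglik[ph]
              for ph in lip_loglik}
    return max(scores, key=scores.get)
```

Varying the stream weight trades off how much the decision relies on lip shape versus hand shape.<br />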
09:00-11:10, Paper WeAT8.12<br />
IFLT based Real-Time Framework for Image-Matching<br />
Janney, Pranam, Univ. of New South Wales<br />
Geers, Glenn, National ICT Australia<br />
In this paper we show that the features generated by the recently presented Invariant Features of Local Textures (IFLT)<br />
technique can be used in a SIFT-like framework to deliver real-time pointwise image matching with performance comparable<br />
to existing state-of-the-art image matching systems. The proposed framework also saves a considerable<br />
amount of computation time.<br />
09:00-11:10, Paper WeAT8.13<br />
High-Order Circular Derivative Pattern for Image Representation and Recognition<br />
Zhao, Sanqiang, Griffith Univ. / National ICT Australia<br />
Gao, Yongsheng, Griffith Univ.<br />
Caelli, Terry, National ICT Australia<br />
Micropattern based image representation and recognition, e.g. Local Binary Pattern (LBP), has proved successful<br />
over the past few years due to its advantages of illumination tolerance and computational efficiency. However, LBP only<br />
encodes the first-order radial-directional derivatives of spatial images and is inadequate to completely describe the discriminative<br />
features for classification. This paper proposes a new Circular Derivative Pattern (CDP) which extracts high-order<br />
derivative information of images along circular directions. We argue that the high-order circular derivatives contain<br />
more detailed and more discriminative information than the first-order LBP in terms of recognition accuracy. Experimental<br />
evaluation through face recognition on the FERET database and insect classification on the NICTA Biosecurity Dataset<br />
demonstrated the effectiveness of the proposed method.<br />
09:00-11:10, Paper WeAT8.14<br />
Automatic Face Replacement in Video based on 2D Morphable Model<br />
Min, Feng, WuHan Inst. of Tech.<br />
Sang, Nong, Huazhong Univ. of Science and Tech.<br />
Wang, Zhefu, Wuhan Inst. of Tech.<br />
This paper presents an automatic face replacement approach in video based on a 2D morphable model. Our approach includes<br />
three main modules: face alignment, face morphing, and face fusion. Given a source image and a target video, Active Shape<br />
Models (ASM) are applied to the source image and target frames for face alignment. Then the source face shape is warped to<br />
match the target face shape by a 2D morphable model. The color and lighting of the source face are adjusted to keep them consistent<br />
with those of the target face, and the source face is seamlessly blended into the target face. Our approach is fully automatic, without user intervention,<br />
and generates natural and realistic results.<br />
09:00-11:10, Paper WeAT8.15<br />
3D Deformable Surfaces with Locally Self-Adjusting Parameters – a Robust Method to Determine Cell Nucleus Shapes<br />
Keuper, Margret, Univ. of Freiburg<br />
Schmidt, Thorsten, Univ. of Freiburg<br />
Padeken, Jan, Max-Planck-Institute of Immunobiology<br />
Heun, Patrick, Max-Planck-Inst. of Immunobiology<br />
Palme, Klaus, Univ. of Freiburg<br />
Burkhardt, Hans, Univ. of Freiburg<br />
Ronneberger, Olaf, Univ. of Freiburg<br />
When using deformable models for the segmentation of biological data, the choice of the best weighting parameters for<br />
the internal and external forces is crucial. Especially when dealing with 3D fluorescence microscopic data and cells within<br />
dense tissue, object boundaries are sometimes not visible. In these cases, a single weighting parameter set for the whole contour<br />
is not desirable. We present a method for the dynamic adjustment of the weighting parameters that depends only<br />
on the underlying data and does not need any prior information. The method is especially apt to handle blurred, noisy, and<br />
deficient data, as is often the case in biological microscopy.<br />
09:00-11:10, Paper WeAT8.16<br />
Decomposition of Dynamic Textures using Morphological Component Analysis: A New Adaptative Strategy<br />
Dubois, Sloven, Univ. de La Rochelle<br />
Péteri, Renaud, Univ. of La Rochelle<br />
Ménard, Michel, Univ. de La Rochelle<br />
The research context of this work is dynamic texture analysis and characterization. Many dynamic textures can be modeled<br />
as a large scale propagating wave and local oscillating phenomena. The Morphological Component Analysis algorithm is<br />
used to retrieve these components using a well chosen dictionary. We define a new strategy for adaptive thresholding in<br />
the Morphological Component Analysis framework, which greatly reduces the computation time when applied on videos.<br />
Tests on synthetic and real image sequences illustrate the efficiency of the proposed method, and future prospects are<br />
finally outlined.<br />
09:00-11:10, Paper WeAT8.17<br />
Anisotropic Contour Completion for Cell Microinjection Targeting<br />
Becattini, Gabriele, Italian Inst. of Tech.<br />
Mattos, Leonardo, Italian Inst. of Tech.<br />
Caldwell, Darwin G., Italian Inst. of Tech.<br />
This paper shows a novel application of the diffusion tensor for anisotropic image processing. The designed system aims<br />
at spotting and localizing injection points on a population of adherent cells lying on a Petri dish. The overall procedure<br />
is described including pre-filtering, ridge enhancement, cell segmentation, shape analysis and injection point detection.<br />
The anisotropic contour completion (ACC) employed is equivalent to a dilation with a continuous elliptic structural element<br />
that takes into account the local orientation of the contours to be closed, preventing extension towards the normal direction.<br />
Experiments carried out on real images from an optical microscope revealed a remarkable reliability with up to 86% of<br />
cells in the field of view correctly segmented and targeted for microinjection.<br />
09:00-11:10, Paper WeAT8.18<br />
Active Contours with Thresholding Value for Image Segmentation<br />
Chen, Gang, Chinese Acad. of Sciences<br />
Zhang, Haiying, Chinese Acad. of Sciences<br />
Chen, Iron, Chinese Acad. of Sciences<br />
Yang, Wen, Wuhan Univ.<br />
In this paper, we propose an active contour with a threshold value to detect objects while discarding unimportant<br />
parts rather than extracting all information. The basic idea of our model is to introduce a weight matrix into region-based<br />
active contours, which can enhance the weight of the main parts while filtering out weak intensities such as shadows, illumination<br />
and so on. Moreover, the threshold value used to set the weight matrix can be chosen manually for accurate image segmentation.<br />
Thus, the proposed method can extract objects of interest in practice. Coupled partial differential equations are used to<br />
implement this method with level set algorithms. Experimental results show the advantages of our method in terms of accuracy<br />
for image segmentation.<br />
09:00-11:10, Paper WeAT8.19<br />
An Iterative Method for Superresolution of Optical Flow Derived by Energy Minimisation<br />
Mochizuki, Yoshihiko, Chiba Univ.<br />
Kameda, Yusuke, Chiba Univ.<br />
Imiya, Atsushi, IMIT, Chiba Univ.<br />
Sakai, Tomoya, Chiba Univ.<br />
Super-resolution is a technique to recover a high-resolution image from a low-resolution image. We develop a variational<br />
super-resolution method for subpixel-accurate optical flow computation. We combine<br />
variational super-resolution and variational optical flow computation to obtain super-resolution optical flow.<br />
09:00-11:10, Paper WeAT8.20<br />
Non-Rigid Image Registration for Historical Manuscript Restoration<br />
Wang, Jie, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
This paper presents a non-rigid registration method for the restoration of double-sided historical manuscripts. Firstly, the<br />
gradient direction maps of the two images of a manuscript are examined to identify candidate control points. Then the<br />
correspondences of these points are established by minimizing a dissimilarity measure consisting of intensity, gradient and<br />
displacement terms. To fully capture the spatial relationship between the two images, a mapping function is defined as the combination<br />
of a global affine and a local B-spline transformation. The cost function for optimization consists of two parts:<br />
normalized mutual information for the goal of similarity and space integral of the square of the second order derivatives<br />
for smoothness. To evaluate the proposed method, a wavelet based restoration procedure is applied to registered images.<br />
Real documents from the National Archives of Singapore are used for testing and the experimental results are impressive.<br />
09:00-11:10, Paper WeAT8.21<br />
An Effective Decentralized Nonparametric Quickest Detection Approach<br />
Yang, Dayu, Univ. of Tennessee<br />
Qi, Hairong, Univ. of Tennessee<br />
This paper studies decentralized quickest detection schemes that can be deployed in a sensing environment where data<br />
streams are simultaneously collected from multiple distributed channels to jointly support the detection. Existing<br />
decentralized detection approaches are largely parametric and require knowledge of the pre-change and post-change distributions.<br />
In this paper, we first present an effective nonparametric detection procedure based on a Q-Q distance measure.<br />
We then describe two implementation schemes, binary quickest detection and local decision fusion by majority voting,<br />
that realize decentralized nonparametric detection. Experimental results show that the proposed method has a comparable<br />
performance to the parametric CUSUM test in binary detection. Its decision fusion-based implementation also outperforms<br />
the other three popular fusion rules under the parametric framework.<br />
09:00-11:10, Paper WeAT8.22<br />
On the Design of a Class of Odd-Length Biorthogonal Wavelet Filter Banks for Signal and Image Processing<br />
Baradarani, Aryaz, Univ. of Windsor<br />
Mendapara, Pankajkumar, Univ. of Windsor<br />
Wu, Q. M. Jonathan, Univ. of Windsor<br />
In this paper, we introduce an approach to the design of odd-length biorthogonal wavelet filter banks based on semidefinite<br />
programming employing Bernstein polynomials. The method is systematic and renders a simple optimization problem,<br />
yet it offers wavelet filters ranging from maximally flat to maximal passband/stopband width. The odd-length biorthogonal<br />
filter pairs are then used in multi-focus imaging to obtain a fully-focused image from a set of registered semi-focused<br />
input images at varying focus employing the distance transform and exponentially decaying function on the subbands in<br />
wavelet domain. Various images are tested and experimental results compare favorably to recent results in literature.<br />
09:00-11:10, Paper WeAT8.23<br />
Implicit Feature-Based Alignment System for Radiotherapy<br />
Yamakoshi, Ryoichi, Mitsubishi Electric Corp.<br />
Hirasawa, Kousuke, Mitsubishi Electric Corp.<br />
Okuda, Haruhisa, Mitsubishi Electric Corp.<br />
Kage, Hiroshi, Mitsubishi Electric Corp.<br />
Sumi, Kazuhiko, Mitsubishi Electric Corp.<br />
Ivanov, Yuri, MERL, USA<br />
Sakamoto, Hidenobu, Mitsubishi Electric Corp.<br />
Yanou, Toshihiro, Hyogo Ion Beam Medical Center, Tokyo<br />
Suga, Daisaku, Hyogo Ion Beam Medical Center, Tokyo<br />
Murakami, Masao, Hyogo Ion Beam Medical Center, Tokyo<br />
In this paper we present a robust alignment algorithm for correcting the effects of out-of-plane rotation to be used for automatic<br />
alignment of the Computed Tomography (CT) volumes and the generally low quality fluoroscopic images for radiotherapy<br />
applications. Analyzing not only in-plane but also out-of-plane rotation effects on the Digitally Reconstructed<br />
Radiograph (DRR) images, we develop a simple alignment algorithm that extracts a set of implicit features from the DRR.<br />
Using these SIFT-based features, we align DRRs with the fluoroscopic images of the patient and evaluate the alignment<br />
accuracy. We compare our approach with traditional techniques based on gradient-based operators and show that our algorithm<br />
performs faster while in most cases delivering higher accuracy.<br />
09:00-11:10, Paper WeAT8.24<br />
3D Vertebrae Segmentation in CT Images with Random Noises<br />
Aslan, Melih Seref, Univ. of Louisville<br />
Ali, Asem, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
Arnold, Ben, Image Analysis, Inc<br />
Chen, Dongqing, Univ. of Louisville<br />
Ping, Xiang, Image Analysis, Inc.<br />
Exposure levels (X-ray tube amperage and peak kilovoltage) are associated with various noise levels and radiation doses.<br />
When higher exposure levels are applied, the CT images have a higher signal-to-noise ratio (SNR). However,<br />
the patient receives a higher radiation dose in this case. In this paper, we use our robust 3D framework to segment vertebral<br />
bodies (VBs) in clinical computed tomography (CT) images with different noise levels. A matched filter is employed<br />
to detect the VB region automatically. In the graph cuts method, a VB (object) and surrounding organs (background) are<br />
represented using gray-level distribution models which are approximated by a linear combination of Gaussians (LCG).<br />
The initial segmentation based on the LCG models is then iteratively refined using a Markov-Gibbs random field (MGRF)<br />
with analytically estimated potentials. Experiments on the data sets show that the proposed segmentation approach is more<br />
accurate and robust than other known alternatives.<br />
09:00-11:10, Paper WeAT8.25<br />
An Improved Method for Cirrhosis Detection using Liver’s Ultrasound Images<br />
Fujita, Yusuke, Yamaguchi Univ.<br />
Hamamoto, Yoshihiko, Yamaguchi Univ.<br />
Segawa, Makoto, Yamaguchi Univ.<br />
Terai, Shuji, Yamaguchi Univ.<br />
Sakaida, Isao, Yamaguchi Univ.<br />
This paper describes an improved method for cirrhosis detection in the liver using Gabor features from ultrasound images.<br />
There are three main contributions of our cirrhosis detection method. The first contribution of this method is to combine<br />
weak classifiers using the AdaBoost algorithm. The second one is to use an artificial dataset to avoid the problem of<br />
overfitting the limited training dataset. The third one is to apply voting classification with the use of multiple regions of interest<br />
(ROIs). Although the accuracy rate of a single classifier designed with only the original dataset was 56%, that of the proposed<br />
method was 80% in cross-validation.<br />
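The first contribution (combining weak classifiers with AdaBoost) reduces at prediction time to a weighted vote; a minimal sketch with toy stumps, not the paper's Gabor-feature classifiers:<br />

```python
def adaboost_predict(x, weighted_stumps):
    """AdaBoost prediction: sign of the alpha-weighted sum of weak
    classifier votes. Each stump maps a sample to +1 or -1."""
    score = sum(alpha * stump(x) for alpha, stump in weighted_stumps)
    return 1 if score >= 0 else -1
```

Each weight alpha would be learned during boosting from the stump's weighted training error; here the stumps and weights are hypothetical.<br />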
09:00-11:10, Paper WeAT8.26<br />
A Dual Pass Video Stabilization System using Iterative Motion Estimation and Adaptive Motion Smoothing<br />
Pan, Pan, Fujitsu R&D Center Co., Ltd.<br />
Minagawa, Akihiro, Fujitsu Lab. LTD<br />
Sun, Jun, Fujitsu R&D Center Co., LTD<br />
Hotta, Yoshinobu, Fujitsu Lab. LTD.<br />
Naoi, Satoshi, Fujitsu R&D Center Co., LTD<br />
In this paper, we propose a novel dual pass video stabilization system using iterative motion estimation and adaptive<br />
motion smoothing. In the first pass, the transformation matrix to stabilize each frame is computed. The global motion estimation<br />
is carried out by a novel iterative method. The intentional motion is estimated using adaptive window smoothing.<br />
Before the beginning of the second pass, we obtain the optimal trim size for a specific video based on the statistics of the<br />
transformation parameters. In the second pass, the stabilized video is composed according to the optimal trim size. Experimental<br />
results show the superior performance of the proposed method in comparison to other existing methods.<br />
09:00-11:10, Paper WeAT8.27<br />
A Modified Particle Swarm Optimization Applied in Image Registration<br />
Niazi, Muhammad Khalid Khan, Uppsala Univ.<br />
Nystrom, Ingela, Uppsala Univ.<br />
We report a modified version of the particle swarm optimization (PSO) algorithm and its application to image registration.<br />
The modified version benefits from both the Gaussian and the uniform distribution when updating the velocity equation<br />
in the PSO algorithm. Which of the two distributions is selected depends on the direction of the cognitive and social components<br />
in the velocity equation. This direction checking and selection of the appropriate distribution provide the particles<br />
with an ability to jump out of local minima. The registration results achieved by this new version prove its robustness<br />
and ability to find a global minimum.<br />
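For reference, the standard PSO velocity update that the paper modifies looks like the sketch below; in the paper's variant the uniform draws are replaced by Gaussian ones depending on the direction of the cognitive (pbest - x) and social (gbest - x) components (how the switch is made is our assumption, not a detail given in the abstract):<br />

```python
import random

def pso_velocity(v, x, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Standard PSO velocity update with uniform random factors r1, r2:
    inertia term + cognitive pull toward pbest + social pull toward gbest."""
    rng = rng or random.Random()
    r1, r2 = rng.random(), rng.random()
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

The inertia weight w and acceleration coefficients c1, c2 shown here are common textbook defaults.<br />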
09:00-11:10, Paper WeAT8.28<br />
Image Segmentation based on Adaptive Fuzzy-C-Means Clustering<br />
Ayech, Mohamed Walid, Pol. de Recherche Informatique du Centre<br />
El Kalti, Karim, Faculty of Science of Monastir Tunisia<br />
El Ayeb, Bechir, Pol. de Recherche Informatique du Centre<br />
The Fuzzy-C-Means (FCM) clustering method is widely used in image segmentation. However, the major drawback of<br />
this method is its sensitivity to noise. In this paper, we propose a variant of this method which aims at resolving this<br />
problem. Our approach is based on an adaptive distance which is calculated according to the spatial position of the pixel<br />
in the image. The obtained results show a significant performance improvement of our approach compared to the<br />
standard version of FCM, especially regarding robustness to noise and the accuracy of the edges between regions.<br />
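For context, the standard FCM membership update, which an adaptive-distance variant such as the one above would modify by replacing the distance term, can be sketched for scalar samples as:<br />

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard Fuzzy C-Means membership of a scalar sample `x` in each
    cluster centre, with fuzzifier m. An adaptive variant would replace
    abs(x - c) with a spatially weighted distance."""
    d = [abs(x - c) for c in centers]
    if any(di == 0.0 for di in d):          # sample sits exactly on a centre
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
            for i in range(len(centers))]
```

The memberships always sum to one, and m controls how fuzzy the cluster boundaries are.<br />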
09:00-11:10, Paper WeAT8.29<br />
Multi-Spectral Satellite Image Registration using Scale-Restricted SURF<br />
Teke, Mustafa, Middle East Tech. Univ.<br />
Temizel, Alptekin, Middle East Tech. Univ.<br />
Satellites generally have arrays of sensors having different resolution and wavelength parameters. For some applications,<br />
images acquired from different viewpoints and positions are required to be aligned. This alignment process could be<br />
achieved by matching the image features followed by image registration. In this paper registration of multispectral satellite<br />
images using Speeded Up Robust Features (SURF) method is examined. The performance of SURF for registration of<br />
high resolution satellite images captured at different bands is evaluated. The scale restriction (SR) method, which has recently<br />
been proposed for SIFT, is adapted to SURF to improve multispectral image registration performance. Matching performance<br />
between different bands using SURF, U-SURF, SURF with SR and U-SURF with SR is tested, and the robustness of<br />
these with respect to orientation and scale is evaluated.<br />
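The scale-restriction idea can be illustrated independently of any particular SURF implementation: during nearest-neighbour descriptor matching, a candidate pair is rejected when the two keypoints' detected scales differ by more than a fixed ratio. The (scale, descriptor) tuples below are hypothetical stand-ins for real keypoints:<br />

```python
def match_scale_restricted(kps1, kps2, max_ratio=1.5):
    """Nearest-neighbour matching with scale restriction: pairs whose
    keypoint scales differ by more than `max_ratio` are ignored.
    Each keypoint is a (scale, descriptor) tuple."""
    matches = []
    for i, (s1, d1) in enumerate(kps1):
        best, best_dist = None, float("inf")
        for j, (s2, d2) in enumerate(kps2):
            if max(s1, s2) / min(s1, s2) > max_ratio:
                continue                      # scale restriction
            dist = sum((a - b) ** 2 for a, b in zip(d1, d2))
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            matches.append((i, best))
    return matches
```

The restriction prunes matches that are geometrically implausible between bands of similar ground resolution, even when their descriptors happen to be close.<br />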
09:00-11:10, Paper WeAT8.30<br />
Automatic Attribute Threshold Selection for Blood Vessel Enhancement<br />
Kiwanuka, Fred Noah, Univ. of Groningen<br />
Wilkinson, Michael H.f., Univ. of Groningen<br />
Attribute filters allow enhancement and extraction of features without distorting their borders, and never introduce new<br />
image features. These are highly desirable properties in biomedical imaging, where accurate shape analysis is paramount.<br />
However, setting the attribute-threshold parameters has to date only been done manually. This paper explores simple, fast<br />
and automated methods of computing attribute threshold parameters based on image segmentation, thresholding and data<br />
clustering techniques. Though several techniques perform well on blood-vessel filtering, the choice of technique appears<br />
to depend on the imaging mode.<br />
09:00-11:10, Paper WeAT8.31<br />
Initialisation-Free Active Contour Segmentation<br />
Xie, Xianghua, Swansea Univ.<br />
Mirmehdi, Majid, Univ. of Bristol<br />
We present a region based active contour model which does not require any initialisation and is capable of modelling<br />
multi-modal image regions. Its external force is based on statistical learning and grouping of image primitives in multiscale,<br />
and its numerical solution is carried out using radial basis function interpolation and time-dependent expansion<br />
coefficient updating. The initialisation-free property makes it attractive for applications such as detecting an unknown number<br />
of objects with unknown topologies.<br />
09:00-11:10, Paper WeAT8.32<br />
On Clock Offset Estimation in Wireless Sensor Networks with Weibull Distributed Network Delays<br />
Ahmad, Aitzaz, Texas A&M Univ. Coll. Station<br />
Noor, Amina, Texas A&M Univ. Coll. Station<br />
Serpedin, Erchin, Texas A&M Univ. Coll. Station<br />
Nounou, Hazem, Texas A&M Univ.<br />
Nounou, Mohamed, Texas A&M Univ.<br />
We consider the problem of Maximum Likelihood (ML) estimation of clock parameters in a two-way timing exchange<br />
scenario where the random delays assume a Weibull distribution, which represents a more generalized model. The ML estimate<br />
of the clock offset for the case of exponential distribution was obtained earlier. Moreover, it was reported that when<br />
the fixed delay is known, MLE is not unique. We determine the uniformly minimum variance unbiased (UMVU) estimators<br />
for exponential distribution under such a scenario and produce biased estimators having lower MSE than UMVU for all<br />
values of clock offset. We then consider the case when shape parameter is greater than one and reduce the corresponding<br />
optimization problems to their equivalent convex forms, thus guaranteeing convergence to a global minimum.<br />
09:00-11:10, Paper WeAT8.33<br />
Parallel Algorithm of Two-Dimensional Discrete Cosine Transform based on Special Data Representation<br />
Chicheva, Marina, Image Processing System Inst. of RAS<br />
The paper investigates the efficiency of a parallel approach to the two-dimensional discrete cosine transform. An algorithm<br />
based on data representation in hypercomplex algebra is proposed.<br />
09:00-11:10, Paper WeAT8.34<br />
Parallel Scales for More Accurate Displacement Estimation in Phase-Based Image Registration<br />
Forsberg, Daniel, Linköping Univ.<br />
Andersson, Mats, Linköping Univ.<br />
Knutsson, Hans<br />
Phase-based methods are commonly applied in image registration. Phase-difference methods employ only a<br />
single scale, although the algorithms are normally iterated over multiple scales, whereas phase-congruency<br />
methods utilize the phase from multiple scales simultaneously. This paper presents an extension to phase-difference<br />
methods employing parallel scales to achieve more accurate displacements. Results are also presented clearly favouring<br />
the use of parallel scales over single scale in more than 95% of the 120 tested cases.<br />
09:00-11:10, Paper WeAT8.35<br />
A Comprehensive Evaluation on Non-Deterministic Motion Estimation<br />
Wu, Changzhu, Northwestern Pol. Univ.<br />
Wang, Qing, Northwestern Pol. Univ.<br />
When computing optical flow with region-based matching, very few flow vectors can be reliably obtained, especially in<br />
high-contrast areas or those with little texture. Instead of using a single pixel from the reference frame, non-deterministic<br />
motion estimation utilizes multiple pixels within a neighborhood to represent the corresponding pixel in the current frame. Although<br />
remarkable improvement has been made with this method, the weight associated with each reference pixel is quite sensitive<br />
to the selection of its standard deviation. To address this issue, a dual probability is presented in this paper. Intuitively, it<br />
enhances the weights of pixels that are more similar to their counterparts in the current frame, while suppressing the rest<br />
of them. Experimental results show that the proposed method is effective to deal with intense motion and occlusion, especially<br />
in the case of reducing the adverse impact of noise.<br />
09:00-11:10, Paper WeAT8.36<br />
A Full-View Spherical Image Format<br />
Li, Shigang, Faculty of Engineering<br />
Hai, Ying, Tottori Univ.<br />
This paper proposes a full-view spherical image format which is based on the geodesic division of a sphere. In comparison<br />
with the conventional 3D array representation which consists of five parallelograms, the proposed spherical image format<br />
is a simple 2D array representation. Algorithms for finding the neighboring pixels of a given pixel of a spherical image<br />
and for mapping between spherical coordinates and spherical image pixels are also given.<br />
09:00-11:10, Paper WeAT8.37<br />
Shift-Map Image Registration<br />
Svärm, Linus, Lund Univ.<br />
Strandmark, Petter, Lund Univ.<br />
Shift-map image processing is a new framework based on energy minimization over a large space of labels. The optimization<br />
utilizes alpha-expansion moves and iterative refinement over a Gaussian pyramid. In this paper we extend the range<br />
of applications to image registration. To do this, new data and smoothness terms have to be constructed. We note a great<br />
improvement when we measure pixel similarities with the dense DAISY descriptor. The main contributions of this paper<br />
are: (1) the extension of the shift-map framework to include image registration; we register images for which SIFT only<br />
provides 3 correct matches; (2) a publicly available implementation of shift-map image processing (e.g. inpainting, registration).<br />
We conclude by comparing shift-map registration to a recent method for optical flow, with favorable results.<br />
09:00-11:10, Paper WeAT8.38<br />
An Adaptive Method for Efficient Detection of Salient Visual Object from Color Images<br />
Brezovan, Marius, Univ. of Craiova<br />
Burdescu, Dumitru Dan, Univ. of Craiova<br />
Ganea, Eugen, Univ. of Craiova<br />
Stanescu, Liana, Univ. of Craiova<br />
Stoica, Cosmin, Univ. of Craiova<br />
This paper presents an efficient graph-based method to detect salient objects from color images and to extract their color<br />
and geometric features. Unlike the majority of segmentation methods, our method is totally adaptive and does not<br />
require any parameter to be chosen in order to produce a better segmentation. The proposed segmentation method uses a<br />
hexagonal structure defined on the set of image pixels and performs two different steps: a pre-segmentation step that<br />
will produce a maximum spanning tree of the connected components of the visual graph constructed on the hexagonal<br />
structure of an image, and the final segmentation step that will produce a minimum spanning tree of the connected components,<br />
representing the visual objects, by using dynamic weights based on the geometric features of the regions. Experimental<br />
results are presented indicating a good performance of our method.<br />
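The spanning-tree machinery the method relies on is standard; a minimal Kruskal sketch with union-find is shown below (a maximum spanning tree, as used in the pre-segmentation step, is obtained by negating the weights). This is generic graph code, not the paper's hexagonal-structure implementation:<br />

```python
def kruskal_mst(n, edges):
    """Minimum spanning tree of a graph with nodes 0..n-1.
    `edges` is a list of (weight, u, v) tuples."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```

Segmentation methods of this family typically stop merging components when an edge weight exceeds a region-dependent threshold, rather than building the full tree.<br />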
09:00-11:10, Paper WeAT8.39<br />
Robust Matching in an Uncertain World<br />
Sur, Frédéric, INPL / INRIA Nancy Grand Est<br />
Finding point correspondences which are consistent with a geometric constraint is one of the cornerstones of many computer<br />
vision problems. This is a difficult task because of spurious measurements leading to ambiguously matched points<br />
and because of uncertainty in point location. In this article we address these problems and propose a new robust algorithm<br />
that explicitly takes account of location uncertainty. We propose applications to SIFT matching and 3D data fusion.<br />
09:00-11:10, Paper WeAT8.41<br />
Recursive Dynamically Variable Step Search Motion Estimation Algorithm for High Definition Video<br />
Tasdizen, Ozgur, Sabanci Univ.<br />
Hamzaoglu, Ilker, Sabanci Univ.<br />
For High Definition (HD) video formats, computational complexity of Full Search (FS) Motion Estimation (ME) algorithm<br />
is prohibitively high, whereas the Peak Signal-to-Noise Ratio obtained by fast search ME algorithms is low. Therefore, in<br />
this paper, we propose Recursive Dynamically Variable Step Search (RDVSS) ME algorithm for real-time processing of<br />
HD video formats. RDVSS algorithm dynamically determines the search patterns that will be used for each Macro block<br />
(MB) based on the motion vectors of its spatial and temporal neighboring MBs. RDVSS performs very close to FS while<br />
searching far fewer locations than FS, and it outperforms successful fast search ME algorithms by searching<br />
more locations than these algorithms. In addition, the RDVSS algorithm can be efficiently implemented by a reconfigurable<br />
systolic array based ME hardware.<br />
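The full-search baseline that RDVSS is compared against can be sketched as an exhaustive SAD minimization; the frames, block size, and search range below are illustrative, and the RDVSS search-pattern logic itself is not reproduced.<br />

```python
# Toy full-search block matching: find the displacement of a block
# between two frames by minimizing the sum of absolute differences
# (SAD). RDVSS searches a dynamically chosen subset of these
# locations instead of all of them.

def sad(ref, cur, bx, by, dx, dy, bs):
    # SAD between the current block at (bx, by) and the reference
    # block displaced by (dx, dy).
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(ref, cur, bx, by, bs, rng):
    """Return the (dx, dy) motion vector with minimum SAD."""
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if 0 <= bx + dx and bx + dx + bs <= len(ref[0]) \
               and 0 <= by + dy and by + dy + bs <= len(ref):
                cost = sad(ref, cur, bx, by, dx, dy, bs)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# Reference frame with a bright 2x2 patch at (1, 1); in the current
# frame the patch has moved to (2, 2).
ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        ref[y][x] = 9
        cur[y + 1][x + 1] = 9
print(full_search(ref, cur, 2, 2, 2, 2))  # -> (-1, -1)
```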
09:00-11:10, Paper WeAT8.42<br />
Spatial and Temporal Enhancement of Depth Images Captured by a Time-of-Flight Depth Sensor<br />
Kim, Sung-Yeol, The University of Tennessee<br />
Cho, Ji-Ho, Gwangju Institute of Science and Tech.<br />
Koschan, Andreas, The University of Tennessee<br />
Abidi, Mongi, The University of Tennessee<br />
In this paper, we present a new method to enhance depth images captured by a time-of-flight (TOF) depth sensor spatially<br />
and temporally. In practice, depth images obtained from TOF depth sensors have critical problems, such as optical noise,<br />
unmatched boundaries, and temporal inconsistency. In this work, we improve depth quality by performing a<br />
newly-designed joint bilateral filtering, color segmentation-based boundary refinement, and motion estimation-based temporal<br />
consistency. Experimental results show that the proposed method significantly minimizes the inherent problems of<br />
the depth images so that we can use them to generate a dynamic and realistic 3D scene.<br />
- 177 -
09:00-11:10, Paper WeAT8.43<br />
Transition Thresholds for Binarization of Historical Documents<br />
Ramírez-Ortegón, Marte Alejandro, Free Univ. of Berlin<br />
Rojas, Raul, Freie Univ. Berlin<br />
This paper extends the transition method for binarization based on transition pixels, a generalization of edge pixels. This<br />
method originally computes transition thresholds using the quantile thresholding algorithm, which has a critical parameter.<br />
We achieved an automatic version of the transition method by computing the transition thresholds with Rosin&rsquo;s algorithm.<br />
We experimentally tested four variants of the transition method combining the density and cumulative distribution<br />
functions of transition values, with gray-intensity thresholds based on the normal and lognormal density functions. The<br />
results of our experiments show that these unsupervised methods yield superior binarization compared with top-ranked<br />
algorithms.<br />
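Rosin&rsquo;s unimodal thresholding, which the abstract uses to remove the quantile method&rsquo;s critical parameter, can be sketched directly: draw a line from the histogram peak to the end of its tail and pick the bin farthest from that line. The histogram below is illustrative, not the paper&rsquo;s transition-value data.<br />

```python
# Rosin's unimodal thresholding: the threshold is the bin with the
# greatest perpendicular distance from the line joining the histogram
# peak to the end of its tail.

def rosin_threshold(hist):
    peak = max(range(len(hist)), key=lambda i: hist[i])
    # Last non-empty bin marks the end of the tail.
    tail = max(i for i, h in enumerate(hist) if h > 0)
    x1, y1, x2, y2 = peak, hist[peak], tail, hist[tail]
    norm = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    best, best_d = peak, 0.0
    for i in range(peak + 1, tail + 1):
        # Perpendicular distance from (i, hist[i]) to the peak-tail line.
        d = abs((y2 - y1) * i - (x2 - x1) * hist[i] + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best, best_d = i, d
    return best

# Sharply peaked histogram with a long, low tail.
hist = [2, 40, 12, 7, 4, 3, 2, 2, 1, 1]
print(rosin_threshold(hist))  # -> 3
```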
09:00-11:10, Paper WeAT8.44<br />
Image Quality Metrics: PSNR vs. SSIM<br />
Horé, Alain, Sherbrooke Univ.<br />
Ziou, Djemel, Sherbrooke Univ.<br />
In this paper, we analyse two well-known objective image quality metrics, the peak-signal-to-noise ratio (PSNR) as well<br />
as the structural similarity index measure (SSIM), and we derive a simple mathematical relationship between them which<br />
works for various kinds of image degradations such as Gaussian blur, additive Gaussian white noise, jpeg and jpeg2000<br />
compression. A series of tests realized on images extracted from the Kodak database gives a better understanding of the<br />
similarity and difference between the SSIM and the PSNR.<br />
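The two quantities being related can be made concrete with small pure-Python implementations; the SSIM here is a single-window global variant (real SSIM averages a locally windowed version of this statistic), with the usual K1 = 0.01, K2 = 0.03 constants and a peak of 255 assumed.<br />

```python
# PSNR and a single-window SSIM on small grayscale arrays.

import math

def psnr(a, b, peak=255.0):
    n = len(a) * len(a[0])
    mse = sum((a[i][j] - b[i][j]) ** 2
              for i in range(len(a)) for j in range(len(a[0]))) / n
    return float('inf') if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def ssim_global(a, b, peak=255.0):
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = [[52, 55], [61, 59]]
b = [[54, 55], [60, 58]]
print(round(psnr(a, b), 2))        # MSE = 1.5 -> about 46.37 dB
print(round(ssim_global(a, b), 4))
```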
09:00-11:10, Paper WeAT8.45<br />
Coarse Scale Feature Extraction using the Spiral Architecture Structure<br />
Coleman, Sonya, Univ. of Ulster<br />
Scotney, Bryan, Univ. of Ulster<br />
Gardiner, Bryan, Univ. of Ulster<br />
The Spiral Architecture has been developed as a fast way of indexing a hexagonal pixel-based image. In combination with<br />
spiral addition and spiral multiplication, methods have been developed for hexagonal image processing operations such<br />
as translation and rotation. Using the Spiral Architecture as the basis for our operator structure, we present a general approach<br />
to the computation of adaptive coarse scale Laplacian operators for use on hexagonal pixel-based images. We evaluate<br />
the proposed operators using simulated hexagonal images and demonstrate improved performance when compared<br />
with rectangular Laplacian operators such as the Marr-Hildreth operator.<br />
09:00-11:10, Paper WeAT8.46<br />
Visual Perception Driven Registration of Mammograms<br />
Boucher, Arnaud, Univ. Paris Descartes<br />
Cloppet, Florence, Paris Descartes Univ.<br />
Vincent, Nicole, Paris Descartes Univ.<br />
Jouve, Pierre Emmanuel, Fenics Company<br />
This paper aims to develop a methodology to register pairs of temporal mammograms. Control points based on anatomical<br />
features are detected in an automated way; image semantics are thereby used to extract landmarks from these control<br />
points. A referential is generated from these control points; based on this referential, the studied images are realigned using<br />
different levels of observation, leading to both rigid and pseudo non-rigid transforms according to expert mammogram<br />
reading.<br />
- 178 -
09:00-11:10, Paper WeAT8.47<br />
Robust Fourier-Based Image Alignment with Gradient Complex Image<br />
Su, Hong-Ren, National Tsing Hua Univ.<br />
Lai, Shang-Hong, National Tsing Hua Univ.<br />
Tsai, Ya-Hui, Industrial Tech. Res. Inst.<br />
The paper proposes a robust image alignment framework based on Fourier transform of a gradient complex image. The<br />
proposed Fourier-based algorithm can handle translation, rotation, and scaling, and it is robust against noise and non-uniform<br />
illumination. The proposed alignment algorithm is further extended to work under occlusion by partitioning the template<br />
and performing the Fourier-based alignment for all partitioned sub-templates in a voting framework. Our experiments<br />
show superior alignment results by using the proposed robust Fourier-based alignment over the previous related methods.<br />
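The Fourier route to alignment can be illustrated with 1-D phase correlation: a shift between two signals appears as a peak in the inverse transform of the normalized cross-power spectrum. This sketch uses a naive O(N^2) DFT to stay dependency-free and does not reproduce the paper&rsquo;s 2-D gradient complex images or its rotation, scaling, and occlusion handling.<br />

```python
# 1-D phase correlation for circular-shift estimation.

import cmath

def dft(x, sign=-1):
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def phase_correlate(a, b):
    """Estimate the circular shift s such that b[k] == a[k - s]."""
    fa, fb = dft(a), dft(b)
    # Normalized cross-power spectrum: keeps only phase information.
    cross = [x * y.conjugate() / max(abs(x * y.conjugate()), 1e-12)
             for x, y in zip(fb, fa)]
    corr = dft(cross, sign=+1)       # inverse DFT up to a 1/N factor
    return max(range(len(corr)), key=lambda i: corr[i].real)

a = [0, 0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 3, 1, 0]       # a shifted right by 2
print(phase_correlate(a, b))        # -> 2
```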
09:00-11:10, Paper WeAT8.48<br />
Rate Control of H.264 Encoded Sequences by Dropping Frames in the Compressed Domain<br />
Kapotas, Spyridon, Hellenic Open Univ.<br />
Skodras, Athanassios N., Hellenic Open Univ.<br />
A new technique for controlling the bitrate of H.264 encoded sequences is presented. Bitrate control is achieved by dropping<br />
frames directly in the compressed domain. The dropped frames are carefully selected so as to either eliminate or cause<br />
non-perceptible drift errors in the decoder. The technique is well suited to H.264 encoded sequences, such as movies and TV news,<br />
which are transmitted over wireless networks.<br />
09:00-11:10, Paper WeAT8.49<br />
Statistical Analysis of Kalman Filters by Conversion to Gauss Helmert Models with Applications to Process Noise Estimation<br />
Petersen, Arne, Christian-Albrechts-Univ. of Kiel<br />
Koch, Reinhard, Univ. of Kiel<br />
This paper introduces a reformulation of the extended Kalman Filter using the Gauss-Helmert model for least squares estimation.<br />
By proving the equivalence of both estimators it is shown how the methods of statistical analysis in least squares<br />
estimation can be applied to the prediction and update process in Kalman filtering. In particular, the efficient computation<br />
of the reliability (or redundancy) matrix allows the implementation of self-supervising systems. As an application, an unparameterized<br />
method for estimating the variances of the filter&rsquo;s process noise is presented.<br />
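The predict/update cycle the paper recasts in Gauss-Helmert form looks as follows in the simplest scalar, constant-position case; the noise variances q and r here are assumed values, not estimates produced by the paper&rsquo;s method.<br />

```python
# Scalar Kalman filter predict/update cycle (constant-position model).

def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by process noise q.
        p = p + q
        # Update: blend prediction and measurement by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

est = kalman_1d([1.1, 0.9, 1.05, 0.95, 1.0], r=0.1)
print(round(est[-1], 3))  # settles near the true value 1.0
```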
09:00-11:10, Paper WeAT8.50<br />
Color Adjacency Modeling for Improved Image and Video Segmentation<br />
Price, Brian, Brigham Young Univ.<br />
Morse, Bryan, Brigham Young Univ.<br />
Cohen, Scott, Adobe Systems<br />
Color models are often used for representing object appearance for foreground segmentation applications. The relationships<br />
between colors can be just as useful for object selection. In this paper, we present a method of modeling color adjacency<br />
relationships. By using color adjacency models, the importance of an edge in a given application can be determined and<br />
scaled accordingly. We apply our model to foreground segmentation of similar images and video. We show that given one<br />
previously-segmented image, we can greatly reduce the error when automatically segmenting other images by using our<br />
color adjacency model to weight the likelihood that an edge is part of the desired object boundary.<br />
09:00-11:10, Paper WeAT8.51<br />
Paired Transform Slice Theorem of 2-D Image Reconstruction from Projections<br />
Dursun, Serkan, Univ. of Texas at San Antonio<br />
Du, Nan, Univ. of Texas at San Antonio<br />
Grigoryan, Artyom M., Univ. of Texas at San Antonio<br />
This paper discusses the paired transform-based method of reconstruction of 2-D images from their projections. The complete<br />
set of basic functions of the 2-D discrete paired transform is defined by specific directions, i.e., the transform is directional<br />
- 179 -
and can be calculated from the projection data. A simple formula is presented for image reconstruction without<br />
calculating the 2-D discrete Fourier transform in the case when the size of the image is L^r x L^r, where L is prime. The image<br />
reconstruction is described by the discrete model that is used in the series expansion methods of image reconstruction.<br />
The proposed method of reconstruction has been implemented and successfully applied for modeled images on Cartesian<br />
grid of sizes up to 256x256.<br />
09:00-11:10, Paper WeAT8.52<br />
Segmentation of Cervical Cell Images<br />
Kale, Asli, Bilkent Univ.<br />
Aksoy, Selim, Bilkent Univ.<br />
The key step of a computer-assisted screening system that aims at early diagnosis of cervical cancer is the accurate segmentation<br />
of cells. In this paper, we propose a two-phase approach to cell segmentation in Pap smear test images with the<br />
challenges of inconsistent staining, poor contrast, and overlapping cells. The first phase consists of segmenting an image<br />
by a non-parametric hierarchical segmentation algorithm that uses spectral and shape information as well as the gradient<br />
information. The second phase aims to obtain nucleus regions and cytoplasm areas by classifying the segments resulting<br />
from the first phase based on their spectral and shape features. Experiments using two data sets show that our method performs<br />
well for images containing both a single cell and many overlapping cells.<br />
09:00-11:10, Paper WeAT8.53<br />
Principal Contour Extraction and Contour Classification to Detect Coronal Loops from the Solar Images<br />
Durak, Nurcan, Univ. of Louisville<br />
Nasraoui, Olfa, Univ. of Louisville<br />
In this paper, we describe a system that determines coronal loop existence from a given solar image region in two stages:<br />
1) extracting principal contours from the solar image regions, 2) deciding whether the extracted contours are in a loop<br />
shape. In the first stage, we propose a principal contour extraction method that achieves 88% accuracy in extracting the<br />
desired contours from the cluttered regions. In the second stage, we analyze the extracted contours in terms of their geometric<br />
features such as linearity, elliptical features, curvature, proximity, smoothness, and corner points. To distinguish<br />
loop contours from other forms, we train an AdaBoost classifier based on C4.5 decision trees by using geometric features<br />
of 150 loop contours and 250 non-loop contours. Our system achieves 85% F1-Score from 10-fold cross validation experiments.<br />
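One of the geometric features listed above, linearity, can be sketched as the dominance of the larger eigenvalue of the 2x2 covariance of the contour points; the exact feature definitions used in the paper may differ.<br />

```python
# Contour linearity as an eigenvalue-dominance score in [0.5, 1.0]:
# 1.0 for a perfectly straight contour.

def linearity(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form.
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = max(tr ** 2 / 4 - det, 0.0) ** 0.5
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc
    return lam1 / (lam1 + lam2) if lam1 + lam2 > 0 else 0.0

line = [(i, 2 * i) for i in range(10)]          # perfectly straight
arc = [(i, (i - 4.5) ** 2) for i in range(10)]  # curved
print(round(linearity(line), 3))  # -> 1.0
print(round(linearity(arc), 3))
```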
09:00-11:10, Paper WeAT8.54<br />
Human Shadow Removal with Unknown Light Source<br />
Chen, Chia-Chih, The Univ. of Texas at Austin<br />
Aggarwal, J. K., The Univ. of Texas at Austin<br />
In this paper, we present a shadow removal technique which effectively eliminates a human shadow cast from an unknown<br />
direction of light source. A multi-cue shadow descriptor is proposed to characterize the distinctive properties of shadows.<br />
We employ a 3-stage process to detect then remove shadows. Our algorithm improves the shadow detection accuracy by<br />
imposing the spatial constraint between the foreground subregions of human and shadow. We collect a dataset containing<br />
81 human-shadow images for evaluation. Both descriptor ROC curves and qualitative results demonstrate the superior<br />
performance of our method.<br />
09:00-11:10, Paper WeAT8.55<br />
Generalizing Tableau to Any Color of Teaching Boards<br />
Oliveira, Daniel Marques, Univ. Federal de Pernambuco<br />
Lins, Rafael Dueire, Univ. Federal de Pernambuco<br />
Teaching boards are omnipresent in classrooms throughout the world. Tableau is a software environment for processing<br />
images from teaching boards acquired using portable digital cameras and cell phones. The previous versions of Tableau<br />
were restricted to white-board processing. This paper generalizes the enhancement algorithm to work with boards of any<br />
color, being the first software environment able to process non-white boards.<br />
- 180 -
09:00-11:10, Paper WeAT8.56<br />
Enhancing the Filtering-out of the Back-to-Front Interference in Color Documents with a Neural Classifier<br />
Silva, Gabriel De França Pereira E, Univ. Federal de Pernambuco<br />
Lins, Rafael Dueire, Univ. Federal de Pernambuco<br />
Silva, João Marcelo Monte Da, Univ. Federal de Pernambuco<br />
Banerjee, Serene, Hewlett-Packard Labs - India<br />
Kuchibhotla, Anjaneyulu, Hewlett-Packard Labs - India<br />
Thielo, Marcelo, Hewlett-Packard Labs - Brazil<br />
Back-to-front, show-through, or bleeding are the names given to the interference that appears whenever one writes or<br />
prints on both sides of translucent paper. Such interference degrades image binarization and document transcription via<br />
OCR. The technical literature presents several algorithms to remove the back-to-front noise, but no algorithm is good<br />
enough in all cases. This article presents a new technique to remove such noise in color documents, which makes use of<br />
neural classifiers to evaluate the intensity of the interference and, in addition, to indicate the existence of blur.<br />
Such a classifier allows tuning the parameters of an algorithm for back-to-front interference removal and document enhancement.<br />
09:00-11:10, Paper WeAT8.57<br />
A Scale Estimation Algorithm using Phase-Based Correspondence Matching for Electron Microscope Images<br />
Suzuki, Ayako, Tohoku Univ.<br />
Ito, Koichi, Tohoku Univ.<br />
Aoki, Takafumi, Tohoku Univ.<br />
Tsuneta, Ruriko, Hitachi, Ltd., Central Res. Lab.<br />
This paper proposes a multi-stage scale estimation algorithm using phase-based correspondence matching for electron<br />
microscope images. Consider a sequence of microscope images of the same target object, where the image magnification<br />
is gradually increased so that the final image has a very large scale factor S (e.g., S=1,000) with respect to the initial image.<br />
The problem considered in this paper is to estimate the overall scale factor S of the given image sequence. The proposed<br />
scale estimation technique provides a new methodology for high-accuracy magnification calibration of electron microscopes.<br />
Experimental evaluation using Mandelbrot images as a precisely scale-controlled image sequence shows that the<br />
proposed method can estimate the scale factor S=1,000 with approximately 0.1%-scale error. This paper also describes an<br />
application of the proposed algorithm to the magnification calibration of an actual STEM (Scanning Transmission Electron<br />
Microscope).<br />
09:00-11:10, Paper WeAT8.58<br />
Edge Drawing: An Heuristic Approach to Robust Real-Time Edge Detection<br />
Topal, Cihan, Anadolu Univ.<br />
Akinlar, Cuneyt, Anadolu Univ.<br />
Genc, Yakup, Siemens Corp. Res.<br />
We propose a new edge detection algorithm that works by computing a set of anchor edge points in an image and then<br />
linking these anchor points by drawing edges between them. The resulting edge map consists of perfectly contiguous, one-<br />
pixel-wide edges. The performance tests show that our algorithm is up to 16% faster than the fastest known edge detection<br />
algorithm, i.e., OpenCV implementation of the Canny edge detector. We believe that our edge detector is a novel step in<br />
edge detection and would be very suitable for the next generation real-time image processing and computer vision applications.<br />
09:00-11:10, Paper WeAT8.59<br />
MPEG-2 Video Watermarking using Pattern Consideration<br />
Mansouri, Azadeh, Shahid Beheshti Univ.<br />
Mahmoudi Aznaveh, Ahmad, Shahid Beheshti Univ.<br />
Torkamani-Azar, Farah, Shahid Beheshti Univ.<br />
This paper proposes a new method for digital video watermarking in compressed domain. Both the embedding and extracting<br />
phases are performed after entropy decoding. Consequently, fully decompressing the compressed video is not<br />
necessary, making this scheme an appropriate choice for real-time applications. Furthermore, taking the structural information<br />
into account leads to presenting a robust watermarking scheme along with less quality degradation. To select suitable<br />
- 181 -
coefficients for embedding the watermark, three different aspects, imperceptibility, security, and bit rate increase, have<br />
been considered. These performance factors are adjusted by defining three priority matrices. In addition, a content based<br />
key is proposed in order to overcome the collusion attack. The flexibility of our method in providing the desired characteristics<br />
is another advantage.<br />
09:00-11:10, Paper WeAT8.60<br />
Lip Segmentation using Level Set Method: Fusing Landmark Edge Distance and Image Information<br />
Banimahd, Seyed Reza, Sahand Univ. of Tech.<br />
Ebrahimnezhad, Hossein, Sahand Univ. of Tech.<br />
Lip segmentation is an essential step in audio-visual processing systems. In this paper, we incorporate the color and edge<br />
information in a level set formulation for extraction of the lip contour. We build two auxiliary images by mixing<br />
different color spaces to extract the landmark edges for the upper and lower parts of the lip. The performance of this approach on<br />
the VidTIMIT database is tested and an accuracy of 91.2% is reached.<br />
09:00-11:10, Paper WeAT8.61<br />
Adaptive Color Independent Components based SIFT Descriptors for Image Classification<br />
Ai, Danni, Ritsumeikan Univ.<br />
Han, Xian-Hua, Ritsumeikan Univ.<br />
Ruan, Xiang, Omron Corporation<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
This paper proposes an adaptive color independent components based SIFT descriptor (termed CIC-SIFT) for image classification.<br />
Our motivation is to seek an adaptive and efficient color space for color SIFT feature extraction. Our work has<br />
two key contributions. First, based on independent component analysis (ICA), an adaptive and efficient color space is<br />
proposed for color image representation. Second, in this ICA-based color space, a discriminative CIC-SIFT descriptor is<br />
calculated for image classification. The experimental results indicate that (1) contrast between objects and background can<br />
be enhanced on the ICA-based color space and (2) the CIC-SIFT descriptor outperforms other conventional color SIFT<br />
descriptors on image classification.<br />
WeAT9 Lower Foyer<br />
Bioinformatics and Biomedical Applications Poster Session<br />
Session chair: Unay, Devrim (Bahcesehir Univ.)<br />
09:00-11:10, Paper WeAT9.1<br />
Joint Registration and Segmentation of Histological Volume Data by Diffusion-Based Label Adaption<br />
Bollenbeck, Felix, Fraunhofer Inst. for Factory Operation and Automation<br />
Seiffert, Udo, Fraunhofer IFF Magdeburg<br />
Three-dimensional serial section imaging delivers high spatial resolution and histological detail, which facilitates analysis<br />
of differentiation and development by exact labelling of tissues and cells, at a level unavailable to other 3-D imaging modalities. We<br />
propose an algorithm for interleaved reconstruction and segmentation of tissues in serial section volumes by diffusion-based<br />
registration and adaptation of two-dimensional reference labellings. Iterative refinement of the global image congruence<br />
and local deformation of labellings delivers an efficient algorithm for processing of large volume data-sets. The<br />
benefits of the approach are shown by means of reconstruction and segmentation of giga-voxel serial section volumes of<br />
plant specimen.<br />
09:00-11:10, Paper WeAT9.2<br />
The Use of Genetic Programming for Learning 3D Craniofacial Shape Quantification<br />
Atmosukarto, Indriyati, Univ. of Washington<br />
Shapiro, Linda,<br />
Heike, Carrie, Seattle Children’s Hospital, Craniofacial Center<br />
Craniofacial disorders commonly result in various head shape dysmorphologies. The goal of this work is to quantify the<br />
various 3D shape variations that manifest in the different facial abnormalities in individuals with a craniofacial disorder<br />
- 182 -
called 22q11.2 Deletion Syndrome. Genetic programming (GP) is used to learn the different 3D shape quantifications.<br />
Experimental results show that the GP method achieves a higher classification rate than those of human experts and existing<br />
computer algorithms.<br />
09:00-11:10, Paper WeAT9.3<br />
Identification of Ancestry Informative Markers from Chromosome-Wide Single Nucleotide Polymorphisms using Symmetrical<br />
Uncertainty Ranking<br />
Piroonratana, Theera, King Mongkut’s Univ. of Tech.<br />
Wongseree, Waranyu, King Mongkut’s Univ. of Tech.<br />
Usavanarong, Touchpong, King Mongkut’s Univ. of Tech.<br />
Assawamakin, Anunchai, Mahidol Univ.<br />
Limwongse, Chanin, Mahidol Univ.<br />
Chaiyaratana, Nachol, King Mongkut’s Univ. of Tech.<br />
Ancestry informative markers (AIMs) have been proven to contain necessary information for population classification. In<br />
this article, round robin symmetrical uncertainty ranking for preliminary AIM screening is proposed. Each single nucleotide<br />
polymorphism (SNP) is assigned a rank based on its ability to separate two populations from each other. In a multi-population<br />
scenario, all possible population pairs are considered and the screened SNP set incorporates top-ranked SNPs from<br />
every pair-wise comparison. After the preliminary screening, SNPs are further screened by a wrapper which is embedded<br />
with a naive Bayes classifier. A classification model is subsequently constructed from the finally screened SNPs via a<br />
naive Bayes classifier. The application of the proposed procedure to the HapMap data indicates that AIM panels can be<br />
found on all chromosomes. Each panel consists of 11 to 24 SNPs and can be used to completely classify the CEU, CHB,<br />
JPT and YRI populations. Moreover, all panels are smaller than the AIM panels reported in previous studies.<br />
09:00-11:10, Paper WeAT9.4<br />
Evaluation of a New Point Clouds Registration Method based on Group Averaging Features<br />
Temerinac-Ott, Maja, Univ. of Freiburg<br />
Keuper, Margret, Univ. of Freiburg<br />
Burkhardt, Hans, Univ. of Freiburg<br />
Registration of point clouds is required in the processing of large biological data sets. The trade-off between computation<br />
time and accuracy of the registration is the main challenge in this task. We present a novel method for registering point<br />
clouds in two and three dimensional space based on Group Averaging on the Euclidean transformation group. It is applied<br />
on a set of neighboring points whose size directly controls computing time and accuracy. The method is evaluated regarding<br />
dependencies of the computing time and the registration accuracy versus the point density assuming their random distribution.<br />
Results are verified in two biological applications on 2D and 3D images.<br />
09:00-11:10, Paper WeAT9.5<br />
Cell Tracking in Video Microscopy using Bipartite Graph Matching<br />
Chowdhury, Ananda, Jadavpur Univ.<br />
Chatterjee, Rohit, Jadavpur Univ.<br />
Ghosh, Mayukh, Jadavpur Univ.<br />
Ray, Nilanjan, Univ. of Alberta<br />
Automated visual tracking of cells from video microscopy has many important biomedical applications. In this paper, we<br />
model the problem of cell tracking over pairs of video microscopy image frames as a minimum weight matching problem in<br />
bipartite graphs. The bipartite matching essentially establishes one-to-one correspondences between the cells in different<br />
frames. A key advantage of using bipartite matching is the inherent scalability, which arises from its polynomial time complexity.<br />
We propose two different tracking methods based on bipartite graph matching and properties of Gaussian distributions.<br />
In both the methods, i) the centers of the cells appearing in two frames are treated as vertices of a bipartite graph and ii) the<br />
weight matrix contains information about distance between the cells (in two frames) and cell velocity. In the first method,<br />
we identify fast-moving cells based on distance and filter them out using Gaussian distributions before the matching is<br />
applied. In the second method, we remove false matches using Gaussian distributions after the bipartite graph matching is<br />
employed. Experimental results indicate that both the methods are promising while the second method has higher accuracy.<br />
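The one-to-one assignment at the core of both methods can be sketched with a brute-force minimum-weight bipartite matching; the coordinates and plain-distance cost below are illustrative, and the paper&rsquo;s velocity term and Gaussian outlier filtering are omitted. A polynomial-time solver (e.g. the Hungarian algorithm) would replace the factorial enumeration in practice.<br />

```python
# Brute-force minimum-weight bipartite matching between cell centres
# in two consecutive frames.

from itertools import permutations

def match_cells(frame_a, frame_b):
    """Return the b-index assigned to each a-index, minimizing total cost."""
    def cost(a, b):
        # Euclidean distance between two cell centres.
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    best, best_total = None, float('inf')
    for perm in permutations(range(len(frame_b))):
        total = sum(cost(frame_a[i], frame_b[j]) for i, j in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return list(best)

a = [(0, 0), (5, 5), (10, 0)]
b = [(11, 1), (1, 1), (6, 4)]   # the same cells, slightly moved
print(match_cells(a, b))        # -> [1, 2, 0]
```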
- 183 -
09:00-11:10, Paper WeAT9.6<br />
Human State Classification and Prediction for Critical Care Monitoring by Real-Time Bio-Signal Analysis<br />
Li, Xiaokun, DCM Res. LLC<br />
Porikli, Fatih, MERL<br />
To address the challenges in critical care monitoring, we present a multi-modality bio-signal modeling and analysis<br />
framework for real-time human state classification and prediction. The novel bioinformatic framework is developed<br />
to solve the human state classification and prediction issues from two aspects: a) achieve 1:1 mapping between the biosignal<br />
and the human state via discriminant feature analysis and selection by using probabilistic principal component<br />
analysis (PPCA); b) avoid time-consuming data analysis and extensive integration resources by using Dynamic Bayesian<br />
Network (DBN). In addition, intelligent and automatic selection of the most suitable sensors from the bio-sensor array is<br />
also integrated in the proposed DBN.<br />
09:00-11:10, Paper WeAT9.7<br />
Automated Cephalometric Landmark Identification using Shape and Local Appearance Models<br />
Keustermans, Johannes, K.U. Leuven<br />
Mollemans, Wouter, Medicim nv.<br />
Vandermeulen, Dirk<br />
Suetens, Paul, K.U.Leuven<br />
In this paper a method is presented for the automated identification of cephalometric anatomical landmarks in craniofacial<br />
cone-beam CT images. This method makes use of statistical models, incorporating both local appearance and shape knowledge<br />
obtained from training data. Firstly, the local appearance model captures the local intensity pattern around each<br />
anatomical landmark in the image. Secondly, the shape model contains a local and a global component. The former improves<br />
the flexibility, whereas the latter improves the robustness of the algorithm. Using a leave-one-out approach to the<br />
training data, we assess the overall accuracy of the method. The mean and median error values for all landmarks are equal<br />
to 2.55mm and 1.72mm, respectively.<br />
09:00-11:10, Paper WeAT9.8<br />
Color Analysis for Segmenting Digestive Organs in VCE<br />
Vu, Hai, The Inst. of Scientific and Industrial Res. Osaka<br />
Echigo, Tomio, Osaka Electro-Communication Univ.<br />
Yagi, Yasushi, Osaka Univ.<br />
Yagi, Keiko, Kobe Pharmaceutical Univ.<br />
Shiba, Masatsugu, Osaka City Univ.<br />
Higuchi, Kazuhide, Osaka City Univ.<br />
Arakawa, Tetsuo, Osaka City Univ.<br />
This paper presents an efficient method for automatically segmenting the digestive organs in a Video Capsule Endoscopy<br />
(VCE) sequence. The method is based on unique characteristics of color tones of the digestive organs. We first introduce<br />
a color model of the gastrointestinal (GI) tract containing the color components of GI wall and non-wall regions. Based<br />
on the wall regions extracted from images, the distribution along the time dimension for each color component is exploited<br />
to learn the dominant colors that are candidates for discriminating digestive organs. The strongest candidates are then<br />
combined to construct a representative signal to detect the boundary of two adjacent regions. The results of the experiments<br />
are comparable with previous works, but the computational cost is lower.<br />
09:00-11:10, Paper WeAT9.9<br />
A New Application of MEG and DTI to Word Recognition<br />
Meng, Lu, Northeastern Univ.<br />
Xiang, Jing, CCHMC<br />
Zhao, Hong, Northeastern Univ.<br />
Zhao, Dazhe, Northeastern Univ.<br />
This paper presents a novel application of magnetoencephalography (MEG) and diffusion tensor imaging (DTI) to word<br />
recognition, in which the spatiotemporal signature and the neural network of brain activation associated with word recognition<br />
were investigated. The word stimuli consisted of matched and mismatched words, which were visually and acoustically<br />
- 184 -
presented simultaneously. Twenty participants were recruited to distinguish and give different reactions to these two<br />
types of stimuli. The neural activations caused by their reactions were recorded by an MEG system and a 3T DTI<br />
scanner. The virtual sensor technique and wavelet beamformer source analysis, which are state-of-the-art methods, were<br />
used to study the MEG and DTI data. Three responses were evoked in the MEG waveform and M160 was identified in<br />
the left temporal-occipital junction. All the results coincided with the conclusions of previous studies, which indicates that<br />
the integration of virtual sensors and wavelet beamforming is an effective technique for analyzing MEG and DTI data.<br />
09:00-11:10, Paper WeAT9.10<br />
A Hypothesis Testing Approach for Fluorescent Blob Identification<br />
Wu, Le-Shin, Indiana Univ.<br />
Shaw, Sidney, Indiana Univ.<br />
Template matching is a common approach for identifying fluorescent objects within a biological image, but deciding<br />
a threshold value for judging the goodness of the matching score is a difficult task. In this paper, we<br />
propose a framework that dynamically chooses appropriate threshold values for correct object identification at a non-arbitrary<br />
statistical power based on the local measure of signal and noise. We validate the feasibility of our proposed framework<br />
by presenting simulation experiments conducted with both synthetic and live-cell data sets. The experimental results<br />
suggest that our auto-thresholding algorithm and local signal-to-noise ratio estimation can provide a solid means for effective<br />
spot identification in place of an ad hoc fitted threshold value or minimization method.<br />
09:00-11:10, Paper WeAT9.11<br />
Automated Detection of Nucleoplasmic Bridges for DNA Damage Scoring in Binucleated Cells<br />
Sun, Changming, CSIRO<br />
Vallotton, Pascal, CSIRO<br />
Fenech, Michael, CSIRO<br />
Thomas, Phil, CSIRO<br />
Quantification of DNA damage, which may be caused by radiation or exposure to chemicals, is very important and can be<br />
very time-consuming and subject to variability if carried out visually. The scoring of DNA damage includes<br />
biomarkers such as micronuclei, nucleoplasmic bridges, and nuclear buds as scored in cytokinesis-blocked binucleated<br />
cells. In this paper, we present a new algorithm based on a shortest path technique that enables us to detect the nucleoplasmic<br />
bridges joining two nuclei in cell images of binucleated cells. The effectiveness of our algorithm is illustrated using<br />
a set of cell images. We believe that this is the first time that a feasible automated nucleoplasmic bridge detection system<br />
has been reported.<br />
09:00-11:10, Paper WeAT9.12<br />
Multiple Model Estimation for the Detection of Curvilinear Segments in Medical X-Ray Images using Sparse-Plus-<br />
Dense-RANSAC<br />
Papalazarou, Chrysi, Eindhoven Univ. of Tech.<br />
De With, Peter H. N., Eindhoven Univ. of Tech. / CycloMedia<br />
Rongen, Peter, Philips Healthcare<br />
In this paper, we build on the RANSAC method to detect multiple instances of objects in an image, where the objects are<br />
modeled as curvilinear segments with distinct endpoints. Our approach differs from previously presented work in that it<br />
incorporates soft constraints, based on a dense image representation, that guide the estimation process in every step. This<br />
enables (1) better correspondence with image content, (2) explicit endpoint detection and (3) a reduction in the number of<br />
iterations required for accurate estimation. In the case of curvilinear objects examined in this paper, these constraints are<br />
formulated as binary image labels, where the estimation proved to be robust to mislabeling, e.g. in case of intersections.<br />
Results for both synthetic and real data from medical X-ray images show the improvement from incorporating soft image-based<br />
constraints.<br />
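For readers unfamiliar with the RANSAC baseline the paper above builds on, a minimal single-model RANSAC line fit looks as follows. This is a generic sketch; the paper's sparse-plus-dense variant, soft constraints, and endpoint detection are not reproduced.

```python
import random

def fit_line(p, q):
    # Line through two points as (a, b, c) with a*x + b*y + c = 0, unit-normalized.
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    n = (a * a + b * b) ** 0.5
    return a / n, b / n, -(a * x1 + b * y1) / n

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Repeatedly fit a line to a random minimal sample and keep the
    hypothesis with the largest inlier set (point-to-line distance < tol)."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p == q:
            continue  # degenerate sample: identical coordinates
        a, b, c = fit_line(p, q)
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b, c), inliers
    return best, best_inliers
```

Detecting multiple curvilinear segments, as in the paper, would run such a step sequentially while removing inliers of each accepted model.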
- 185 -
09:00-11:10, Paper WeAT9.13<br />
Statistical Texture Modeling for Medical Volume using Generalized N-Dimensional Principal Component Analysis<br />
Method and 3D Volume Morphing<br />
Qiao, Xu, Ritsumeikan Univ.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
In this paper, a statistical texture modeling method is proposed for medical volumes. As the shape of a human organ<br />
varies greatly from one case to another, 3D volume morphing is applied to normalize all the volume datasets to the same<br />
shape, removing shape variations. In order to deal with the problems of high dimensionality and the small number of medical<br />
samples, we propose an effective image compression method named Generalized N-dimensional Principal Component<br />
Analysis (GND-PCA) to construct a statistical model. Experiments applied on liver volumes show good performance on<br />
generalization using our method. A simple experiment is employed to show that the features extracted by the statistical<br />
texture model can discriminate between different types of data, such as normal and abnormal.<br />
09:00-11:10, Paper WeAT9.14 CANCELED<br />
Distinguishing Patients with Gastritis and Cholecystitis from the Healthy by Analyzing Wrist Radial Arterial Doppler<br />
Blood Flow Signals<br />
Jiang, Xiaorui, Harbin Inst. of Tech.<br />
Zhang, Dongyu, Harbin Inst. of Tech.<br />
Wang, Kuanquan, Harbin Inst. of Tech.<br />
Zuo, Wangmeng, Harbin Inst. of Tech.<br />
This paper tries to fill the gap between Traditional Chinese Pulse Diagnosis (TCPD) and Doppler diagnosis by applying<br />
digital signal analysis and pattern classification techniques to wrist radial arterial Doppler blood flow signals. Doppler<br />
blood flow signals (DBFS) of patients with cholecystitis, gastritis and healthy people are classified by L2-soft margin<br />
SVM and 5 linear classifiers using the proposed feature - piecewise axially integrated bispectra (PAIB). A 5-fold cross<br />
validation is used for performance evaluation. The classification accuracies between any two groups of subjects are<br />
greater than 93%. Gastritis can be recognized with higher accuracy than cholecystitis. Cholecystitis can be recognized<br />
with higher accuracy on left hand data than right. The findings in this paper partly conform to the theory of TCPD. Though<br />
the sample size is relatively small, we could still argue that the methods proposed here are effective and could serve as an<br />
assistive tool for TCPD.<br />
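The 5-fold cross-validation protocol used above is standard; as a generic illustration, a fold generator and an accuracy loop might look like the following. The SVM and PAIB features themselves are not reproduced; `train_fn` and `predict_fn` are placeholders for any classifier.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation,
    assigning samples round-robin so fold sizes differ by at most one."""
    folds = [list(range(f, n_samples, k)) for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

def cross_validated_accuracy(xs, ys, train_fn, predict_fn, k=5):
    """Average test accuracy over k folds; the model is always trained
    on data disjoint from the fold it is evaluated on."""
    accs = []
    for train, test in k_fold_splits(len(xs), k):
        model = train_fn([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict_fn(model, xs[i]) for i in test]
        accs.append(sum(p == ys[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / k
```

In practice, samples would be shuffled (or stratified by class) before splitting.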
09:00-11:10, Paper WeAT9.15<br />
Pelvic Organs Dynamic Features Analysis for MRI Sequences Discrimination<br />
Rahim, Mehdi, Univ. Paul Cézanne<br />
Bellemare, Marc-Emmanuel, Univ. Paul Cézanne<br />
Pirro, Nicolas, Hôpital La Timone<br />
Bulot, Rémy, Univ. Paul Cézanne<br />
Dynamic magnetic resonance imaging (MRI) acquisitions are used in the clinical assessment of the pelvic organs’ behaviour<br />
during an abdominal strain. The main organs (bladder, uterus-vagina, rectum) undergo deformations and intrinsic movements<br />
along a sequence. Anatomical references and measurements are generally used by clinicians to evaluate pathology<br />
grades. In this context, we have established quantitative elements, which consist of deformation and movement features,<br />
for the pelvic dynamic characterization, by using shape descriptors computed from organ contours. Moreover, the deformation<br />
and movement features have been assessed for their relevance to efficient sequence discrimination and pathology<br />
detection.<br />
09:00-11:10, Paper WeAT9.16<br />
Multiple Atlas Inference and Population Analysis with Spectral Clustering<br />
Sfikas, Giorgos, Univ. of Ioannina<br />
Heinrich, Christian, Univ. de Strasbourg<br />
Nikou, Christophoros, Univ. of Ioannina<br />
In medical imaging, constructing an atlas and bringing an image set into a single common reference frame may easily lead<br />
the analysis to erroneous conclusions, especially when the population under study is heterogeneous. In this paper, we propose<br />
a framework based on spectral clustering that is capable of partitioning an image population into sets that require a<br />
- 186 -
separate atlas, and identifying the most suitable templates to be used as coordinate reference frames. The spectral analysis<br />
step relies on pairwise distances that express anatomical differences between subjects as a function of the diffeomorphic<br />
warp required to match one subject onto the other, plus residual information. The methodology is validated numerically<br />
on artificial and medical imaging data.<br />
09:00-11:10, Paper WeAT9.17<br />
Automatic Pathology Annotation on Medical Images: A Statistical Machine Translation Framework<br />
Gong, Tianxia, National Univ. of Singapore<br />
Li, Shimiao, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Pang, Boon Chuan, National Neuroscience Inst. Tan Tock Seng Hospital<br />
Lim, Tchoyoson, National Neuroscience Inst. Tan Tock Seng Hospital<br />
Lee, Cheng Kiang, National Neuroscience Inst. Tan Tock Seng Hospital<br />
Tian, Qi, Inst. for Infocomm Res.<br />
Zhang, Zhuo, Inst. for Infocomm Res.<br />
Large numbers of medical images are produced daily in hospitals and medical institutions, and the need to efficiently process,<br />
index, search and retrieve these images is great. In this paper, we propose a pathology-based medical image annotation<br />
framework using a statistical machine translation approach. After pathology terms and regions of interest (ROIs) are extracted<br />
from training text and images respectively, we use machine translation model IBM Model 1 to iteratively learn the<br />
alignment between the ROIs and the pathology terms and generate an ROI-to-pathology translation table. In the testing phase,<br />
we annotate the ROI in the image with the pathology label of the highest probability in the translation table. The overall<br />
annotation results and the retrieval performance are promising to doctors and medical professionals.<br />
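IBM Model 1, used above to align ROIs with pathology terms, learns a translation table by expectation-maximization. A compact sketch of the standard algorithm (the toy data and variable names are ours, not the paper's):

```python
from collections import defaultdict

def ibm_model1(pairs, n_iter=10):
    """pairs: list of (source_tokens, target_tokens), e.g. (ROI labels,
    pathology terms). Returns t[(f, e)] = P(target term f | source label e),
    learned by the IBM Model 1 EM iterations."""
    # Uniform initialization over the observed target vocabulary.
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(n_iter):
        count = defaultdict(float)   # expected alignment counts
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over alignments
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate the translation table.
        t = defaultdict(float, {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t
```

Annotation then labels each ROI with the term of highest probability in the learned table, as the abstract describes.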
09:00-11:10, Paper WeAT9.18<br />
3D Cell Nuclei Fluorescence Quantification using Sliding Band Filter<br />
Quelhas, Pedro, INEB- Inst. de Engenharia Biomedica<br />
Mendonça, Ana Maria, INEB - Inst. de Engenharia Biomédica<br />
Aurélio, Campilho, Faculdade de Engenharia da Univ. do Porto<br />
Plant development is orchestrated by transcription factors whose expression has become observable in living plants through<br />
the use of fluorescence microscopy. However, the exact quantification of expression levels is still not solved and most<br />
analysis is only performed through visual inspection. With the objective of automating the quantification of cell nuclei<br />
fluorescence we present a new approach to detect cell nuclei in 3D fluorescence confocal microscopy, based on the use of<br />
the sliding band convergence filter (SBF). The SBF detects cell nuclei and estimates their shape with high accuracy<br />
in each 2D image plane. For 3D detection, individual 2D shapes are joined into 3D estimates and then corrected based on<br />
the analysis of the fluorescence profile. The final nuclei detection achieves precision/recall of 0.779/0.803, respectively, and<br />
an average Dice coefficient of 0.773.<br />
09:00-11:10, Paper WeAT9.19<br />
AP-Based Consensus Clustering for Gene Expression Time Series<br />
Chiu, Tai-Yu, National Tsing Hua Univ.<br />
Hsu, Ting-Chieh, National Tsing Hua Univ.<br />
Wang, Jia-Shung, National Tsing Hua Univ.<br />
We propose an unsupervised approach for analyzing gene time-series datasets. Our method combines Affinity Propagation<br />
(AP) and the spirit of consensus clustering: extracting multiple partitions from different time intervals. Without prior<br />
knowledge of the total number of clusters and exemplars, this method preserves the relationships between genes across different<br />
time intervals and eliminates the influence of noise and outliers. We demonstrate our method with both synthetic and<br />
real gene expression datasets showing significant improvement in accuracy and efficiency.<br />
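The consensus-clustering idea invoked above can be illustrated generically with a co-association matrix over partitions from different time intervals. This sketch uses simple thresholded connected components to form the consensus, not the paper's AP machinery:

```python
def coassociation(partitions, n):
    """partitions: list of label lists over the same n items, e.g. one
    clustering per time interval. Returns the fraction of partitions in
    which each pair of items shares a cluster."""
    m = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(partitions)
    return m

def consensus_groups(partitions, n, threshold=0.5):
    """Greedy consensus: link items co-clustered in more than `threshold`
    of the partitions, then return connected-component labels."""
    m = coassociation(partitions, n)
    group = list(range(n))          # union-find parent array
    def find(i):
        while group[i] != i:
            i = group[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] > threshold:
                group[find(j)] = find(i)
    return [find(i) for i in range(n)]
```

Pairs that co-cluster only in a minority of intervals (e.g. due to noise) fall below the threshold and are kept apart in the consensus.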
- 187 -
09:00-11:10, Paper WeAT9.21<br />
Unsupervised Tissue Image Segmentation through Object-Oriented Texture<br />
Tosun, Akif Burak, Bilkent Univ.<br />
Sokmensuer, Cenk, Hacettepe Univ.<br />
Gunduz-Demir, Cigdem, Bilkent Univ.<br />
This paper presents a new algorithm for the unsupervised segmentation of tissue images. It relies on using the spatial information<br />
of cytological tissue components. As opposed to the previous study, it not only uses this information in<br />
defining its homogeneity measures, but also uses it in its region growing process. This algorithm has been implemented<br />
and tested. Its visual and quantitative results are compared with those of the previous study. The results show that the proposed<br />
segmentation algorithm is more robust, giving better accuracies with a smaller number of segmented regions.<br />
09:00-11:10, Paper WeAT9.22<br />
Automated Tracking of Vesicles in Phase Contrast Microscopy Images<br />
Usenik, Peter, Univ. of Ljubljana<br />
Vrtovec, Tomaž, Univ. of Ljubljana<br />
Pernus, Franjo, Univ. of Ljubljana<br />
Likar, Bostjan, Univ. of Ljubljana<br />
We propose an algorithm for automated tracking of the contours of phospholipid vesicles, which can be used to evaluate<br />
the power, magnitude and frequency distribution of vesicle contour movements induced by thermal fluctuations. The algorithm<br />
was tested on vesicles of different structural composition that were exposed to varying temperatures. The results<br />
show that the proposed algorithm is fast, robust and reliable, and that the resulting description of vesicle contours enables<br />
straightforward spectral analysis of their fluctuations, which can be also used for the determination of other vesicle properties,<br />
e.g. the bending rigidity or spontaneous curvature.<br />
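The spectral analysis of contour fluctuations mentioned above can be illustrated with a direct DFT of the contour's radius signal. This is an illustrative sketch only; a real analysis would use an FFT and the physical normalization appropriate for bending-rigidity estimation.

```python
import cmath
import math

def fluctuation_spectrum(radii):
    """Magnitude spectrum of a closed-contour radius signal via a direct
    DFT of the mean-subtracted radii; peaks reveal the dominant
    fluctuation modes of the vesicle contour."""
    n = len(radii)
    mean_r = sum(radii) / n
    centered = [r - mean_r for r in radii]
    return [abs(sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))) / n
            for k in range(n // 2 + 1)]
```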
09:00-11:10, Paper WeAT9.23<br />
Automatic Detection and Segmentation of Focal Liver Lesions in Contrast Enhanced CT Images<br />
Militzer, Arne, Friedrich-Alexander-Univ. Erlangen-Nuremberg<br />
Hager, Tobias, Friedrich-Alexander-Univ. Erlangen-Nuremberg<br />
Jäger, Florian, Pattern Recognition Lab. Univ. of Erlangen<br />
Tietjen, Christian, Siemens Healthcare<br />
Hornegger, Joachim, Friedrich-Alexander-Univ.<br />
In this paper a novel system for automatic detection and segmentation of focal liver lesions in CT images is presented. It<br />
utilizes a probabilistic boosting tree to classify points in the liver as either lesion or parenchyma, thus providing both detection<br />
and segmentation of the lesions at the same time and fully automatically. To make the segmentation more robust,<br />
an iterative classification scheme is integrated, that incorporates knowledge gained from earlier iterations into later decisions.<br />
Finally, a comprehensive evaluation of both the segmentation and the detection performance for the most common<br />
hypodense lesions is given. Detection rates of 77% were achieved, with a sensitivity of 0.95 and a specificity of 0.93<br />
for lesion segmentation at the same settings.<br />
09:00-11:10, Paper WeAT9.24<br />
Automatic Diagnosis of Masses by using Level Set Segmentation and Shape Description<br />
Oliver, Arnau, Univ. of Girona<br />
Torrent, Albert, Univ. of Girona<br />
Llado, Xavier, Univ. of Girona<br />
Martí, Joan, Univ. of Girona<br />
We present here an approach for automatic mass diagnosis in mammographic images. Our strategy contains three main<br />
steps. Firstly, regions of interest containing the mass and background are segmented using a level set algorithm based on<br />
region information. Secondly, the characterisation of each segmented mass is obtained using the Zernike moments for<br />
modelling its shape. The final step is the diagnosis of masses as benign or malignant lesions, which is done using the Gentleboost<br />
algorithm that also assigns a likelihood value to the final result. The experimental evaluation, performed using<br />
two different digitised databases and Receiver Operating Characteristics (ROC) analysis, proves the feasibility of our proposal,<br />
showing the benefits of a correct shape description for improving automatic mass diagnosis.<br />
- 188 -
09:00-11:10, Paper WeAT9.25<br />
3D Reconstruction of Tumors for Applications in Laparoscopy using Conformal Geometric Algebra<br />
Machucho, Rubén, CINVESTAV, Unidad Guadalajara<br />
Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />
This paper presents a method for the 3D reconstruction of tumors for applications in laparoscopy. It uses simultaneously<br />
recorded stereo endoscopic and ultrasound images. The ultrasound probe is tracked throughout the stereo<br />
endoscopic images using a particle filter, and an auxiliary method based on thresholding in HSV space is used in order<br />
to improve the tracking. Then, the 3D pose of the ultrasound probe is calculated using conformal geometric algebra. The<br />
2D ultrasound images have been segmented using two methods: the level sets method and morphological operators, and<br />
a comparison between their performances has been done. Finally, the processed ultrasound images are compounded into<br />
a 3D volume, using the calculated ultrasound pose.<br />
09:00-11:10, Paper WeAT9.26<br />
Vessel Bend-Based Cup Segmentation in Retinal Images<br />
Joshi, Gopal Datt, IIIT Hyderabad<br />
Sivaswamy, Jayanthi, IIIT Hyderabad<br />
Karan, Kundan, AECS, Madurai<br />
Ranganath, Prashanth, AECS, Madurai<br />
Krishnadas, S. R., AECS, Madurai<br />
In this paper, we present a method for cup boundary detection from monocular colour fundus image to help quantify cup<br />
changes. The method is based on anatomical evidence such as vessel bends at cup boundary, considered relevant by glaucoma<br />
experts. Vessels are modeled and detected in a curvature space to better handle inter-image variations. Bends in a<br />
vessel are robustly detected using a region of support concept, which automatically selects the right scale for analysis. A<br />
reliable subset called r-bends is derived using a multi-stage strategy, and local spline fitting is used to obtain the desired<br />
cup boundary. The method has been successfully tested on 133 images comprising 32 normal and 101 glaucomatous<br />
images against three glaucoma experts. The proposed method shows high sensitivity in cup-to-disk ratio-based glaucoma<br />
detection and local assessment of the detected cup boundary shows good consensus with the expert markings.<br />
09:00-11:10, Paper WeAT9.27<br />
A Spot Segmentation Approach for 2D Gel Electrophoresis Images based on 2D Histograms<br />
Zacharia, Eleni, Univ. of Athens<br />
Kostopoulou, Eirini, Univ. of Athens<br />
Maroulis, Dimitris, Univ. of Athens<br />
Kossida, Sophia, Foundation of Biomedical Res. of the Acad. of Athens<br />
Spot segmentation, an essential stage in the processing of 2D gel electrophoresis images, remains a challenging process. The<br />
available software programs and techniques fail to separate overlapping protein spots correctly and cannot detect low-intensity<br />
spots without human intervention. This paper presents an original approach to spot segmentation in 2D gel electrophoresis<br />
images. The proposed approach is based on 2D histograms of these images. Experiments conducted<br />
on a set of 16-bit 2D gel electrophoresis images demonstrate that the proposed method is very effective and<br />
it outperforms existing techniques even when it is applied to images containing several overlapping spots as well as to<br />
images containing spots of various intensities, sizes and shapes.<br />
09:00-11:10, Paper WeAT9.28<br />
Automated Tracking of the Carotid Artery in Ultrasound Image Sequences using a Self Organizing Neural Network<br />
Hamid Muhammed, Hamed, Royal Inst. of Tech. (KTH)<br />
Azar, Jimmy C., STH, KTH<br />
An automated method for the segmentation and tracking of moving vessel walls in 2D ultrasound image sequences is introduced.<br />
The method was tested on simulated and real ultrasound image sequences of the carotid artery. Tracking was<br />
achieved via a self-organizing neural network known as Growing Neural Gas. This topology-preserving algorithm assigns<br />
a net of nodes connected by edges that distributes itself within the vessel walls and adapts to changes in topology with<br />
time. The movement of the nodes was analyzed to uncover the dynamics of the vessel wall. In this way, radial and longitudinal<br />
strain and strain rates have been estimated. Finally, wave intensity signals were computed from these measurements.<br />
- 189 -
The proposed method improves upon wave intensity wall analysis (WIWA) and opens up the possibility of easy and<br />
efficient analysis and diagnosis of vascular disease through noninvasive ultrasonic examination.<br />
09:00-11:10, Paper WeAT9.29<br />
Quantification of Subcellular Molecules in Tissue MicroArray<br />
Can, Ali, General Electric<br />
Gerdes, Michael, General Electric<br />
Bello, Musodiq, General Electric<br />
Quantifying expression levels of proteins with subcellular resolution is critical to many applications ranging from biomarker<br />
discovery to treatment planning. In this paper, we present a fully automated method and a new metric that quantifies<br />
the expression of target proteins in immunohistochemically stained tissue microarray (TMA) samples. The proposed<br />
metric is superior to existing intensity or ratio-based methods. We compared performance with the majority decision of a<br />
group of 19 observers scoring estrogen receptor (ER) status, achieving a detection rate of 96% with 90% specificity. The<br />
presented methods will accelerate the processes of biomarker discovery and transitioning of biomarkers from research<br />
bench to clinical utility.<br />
09:00-11:10, Paper WeAT9.30<br />
Actual Midline Estimation from Brain CT Scan using Multiple Regions Shape Matching<br />
Chen, Wenan, Virginia Commonwealth Univ.<br />
Ward, Kevin, Virginia Commonwealth Univ.<br />
Kayvan, Najarian, Virginia Commonwealth Univ.<br />
Computer assisted medical image processing can extract vital information that may be elusive to human eyes. In this paper,<br />
an algorithm is proposed to automatically estimate the position of the actual midline from the brain CT scans using multiple<br />
regions shape matching. The method matches feature points identified from a set of ventricle templates, extracted from<br />
MRI, with the corresponding feature points in the segmented ventricles from CT images. Then based on the matched<br />
feature points, the position of the actual midline is estimated. The proposed multiple regions shape matching algorithm<br />
addresses the deformation problem arising from the intrinsic multiple regions nature of the brain ventricles. Experiments<br />
on CT scans from patients with traumatic brain injuries (TBI) show promising results; in particular, the proposed algorithm<br />
proves to be quite robust.<br />
09:00-11:10, Paper WeAT9.31<br />
Boosting Alzheimer Disease Diagnosis using PET Images<br />
Silveira, Margarida, Inst. Superior Técnico / Inst. de Sistema e Robótica<br />
Marques, Jorge S., Inst. Superior Técnico<br />
Alzheimer’s disease (AD) is one of the most frequent types of dementia. Currently there is no cure for AD, and early diagnosis<br />
is crucial to the development of treatments that can delay the disease progression. Brain imaging can be a biomarker<br />
for Alzheimer’s disease. This has been shown in several works with MR images, but in the case of functional imaging<br />
such as PET, further investigation is still needed to determine its ability to diagnose AD, especially at the early stage of<br />
Mild Cognitive Impairment (MCI). In this paper we study the use of PET images of the ADNI database for the diagnosis<br />
of AD and MCI. We adopt a Boosting classification method, a technique based on a mixture of simple classifiers, which<br />
performs feature selection concurrently with the segmentation and is thus well suited to high-dimensional problems. The Boosting<br />
classifier achieved an accuracy of 90.97% in the detection of AD and 79.63% in the detection of MCI.<br />
09:00-11:10, Paper WeAT9.32<br />
Efficient Quantitative Information Extraction from PCR-RFLP Gel Electrophoresis Images<br />
Maramis, Christos, Aristotle Univ. of Thessaloniki<br />
Delopoulos, Anastasios, Aristotle Univ. of Thessaloniki<br />
For the purpose of PCR-RFLP analysis, as in the case of human papillomavirus (HPV) typing, quantitative information<br />
needs to be extracted from images resulting from one-dimensional gel electrophoresis by associating the image intensity<br />
with the concentration of biological material at the corresponding position on a gel matrix. However, the background intensity<br />
of the image stands in the way of quantifying this association. We propose a novel, efficient methodology for modeling<br />
- 190 -
the image background with a polynomial function and prove that this can benefit the extraction of accurate information<br />
from the lane intensity profile when modeled by a superposition of properly shaped parametric functions.<br />
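The polynomial background model described above can be sketched as a least-squares fit to the lane intensity profile followed by subtraction. This is a generic illustration via normal equations; the paper's properly shaped parametric peak functions are not reproduced.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (adequate
    for the low degrees used in background modeling)."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, m))) / A[i][i]
    return coeffs  # lowest degree first

def subtract_background(profile, degree=2):
    """Fit a polynomial background to a 1D lane profile and subtract it."""
    xs = list(range(len(profile)))
    c = polyfit(xs, profile, degree)
    return [y - sum(c[i] * x ** i for i in range(len(c)))
            for x, y in zip(xs, profile)]
```

After background removal, the residual profile is what would be fitted by the peak-shaped parametric functions mentioned in the abstract.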
09:00-11:10, Paper WeAT9.33<br />
Heart Murmur Classification using Complexity Signatures<br />
Kumar, Dinesh, Univ. of Coimbra<br />
Carvalho, Paulo, Univ. of Coimbra<br />
Couceiro, Ricardo, Univ. of Coimbra<br />
Antunes, Manuel, Univ. Hospital of Coimbra<br />
Paiva, Rui Pedro, Univ. of Coimbra<br />
Henriques, Jorge, Univ. of Coimbra<br />
In this work, we propose a two-stage classifier based on the analysis of the heart sound’s complexity for murmur identification<br />
and classification. The first stage of the classifier verifies if the heart sound (HS) exhibits murmurs. To this end,<br />
the chaotic nature of the signal is assessed using the Lyapunov exponents (LEs). The second stage of the method is devoted<br />
to the classification of the type of murmur. In contrast to current state-of-the-art methods for murmur classification, a<br />
reduced set of features is proposed. This set includes both well-known and new features designed to capture the<br />
morphological and the chaotic nature of murmurs. The classification scheme is evaluated with three classification methods:<br />
Learning Vector Quantization, Gaussian Mixture Models and Support Vector Machines. The achieved results are comparable<br />
to results reported in the literature, while relying on a significantly smaller set of features.<br />
09:00-11:10, Paper WeAT9.34<br />
3D Filtering for Injury Detection in Brain MRI<br />
Sun, Yu, Univ. of California, Riverside<br />
Bhanu, Bir, Univ. of California Riverside<br />
This paper introduces a brain injury detection approach, using a 3D filtering technique, for images acquired by magnetic<br />
resonance imaging (MRI). The proposed method uses the symmetry property of brain MRI on both 2D<br />
images and 3D volumetric information of the MRI sequences. The approach consists of two key steps: (1) each slice of a<br />
brain image is segmented into different parts using a region growing algorithm, and a symmetry affinity matrix is computed,<br />
(2) non-symmetric regions are extracted, and they are further used to detect brain injury. The Kalman filter is explicitly<br />
used in step (2) to filter out the non-injury regions in 3D. Experiments demonstrate the high efficiency of the<br />
method in detecting brain injuries.<br />
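The Kalman filtering step above can be illustrated in its simplest scalar form: smoothing a noisy per-slice measurement (e.g. a symmetry score) along the slice axis so isolated non-injury responses are suppressed. A generic sketch, not the paper's exact formulation; q, r and the constant-state model are illustrative assumptions.

```python
def kalman_smooth(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant hidden state observed
    with noise: process variance q, measurement variance r."""
    x, p = x0, p0
    out = []
    for z in measurements:
        p += q                    # predict: state constant, uncertainty grows
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # update with the innovation z - x
        p *= (1 - k)
        out.append(x)
    return out
```

Applied slice by slice, the filtered estimate tracks regions that persist across the 3D volume while attenuating one-slice outliers.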
09:00-11:10, Paper WeAT9.35<br />
Prediction of Protein Sub-Nuclear Location by Clustering mRMR Ensemble Feature Selection<br />
Sakar, Cemal Okan, Bahcesehir Univ.<br />
Kursun, Olcay, Istanbul Univ.<br />
Seker, Huseyin, De Montfort Univ.<br />
Gürgen, Fikret, Boğaziçi Univ.<br />
In many applications of pattern recognition in the bioinformatics and biomedical fields, input variables are organized into<br />
natural partitions that are called views in the literature. Mutual information can be used to select a minimal yet capable<br />
subset of views. Ignoring the presence of views, dismantling them, and treating their variables intermixed with those<br />
of others at best results in a complex, uninterpretable predictive system for researchers in these fields. Moreover, it would<br />
require measuring or computing the majority of the views. We use the clustering indices of the views and rank the views according<br />
to the unique information they have with the target using minimum redundancy-maximum relevance (mRMR)<br />
approach. We also propose an ensemble approach to reduce the random variations in clusterings.<br />
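The mRMR criterion used above greedily picks features (or views) that are informative about the target but not redundant with what is already selected. A toy sketch with discrete variables (our own illustration; the clustering-index and ensemble machinery of the paper are not shown):

```python
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """Mutual information (bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR: at each step pick the feature maximizing relevance
    I(f; target) minus the mean redundancy with the already selected
    features. `features` maps name -> list of discrete values."""
    selected = []
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for name, vals in features.items():
            if name in selected:
                continue
            rel = mutual_info(vals, target)
            red = (sum(mutual_info(vals, features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            if rel - red > best_score:
                best, best_score = name, rel - red
        selected.append(best)
    return selected
```

Note how a feature carrying the same information as one already chosen loses to a complementary, individually weaker feature.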
09:00-11:10, Paper WeAT9.36<br />
Multivariate Brain Mapping by Random Subspaces<br />
Sona, Diego, Fondazione Bruno Kessler<br />
Avesani, Paolo, Fondazione Bruno Kessler<br />
- 191 -
Functional neuroimaging uses imaging technologies to record functional brain activity in real time. Among these<br />
techniques, functional magnetic resonance imaging produces data encoded as sequences of 3D images<br />
of thousands of voxels. The main investigation performed on this data, termed brain mapping, aims at producing functional<br />
maps of the brain, i.e. detecting the portion of voxels concerned with specific perceptual or cognitive<br />
brain activities. This challenge can be shaped as a problem of feature selection. The excessive feature-to-instance ratio<br />
characterizing this data is a major issue for the computation of statistically robust maps. We propose a solution based on<br />
a Random Subspace Method that extends the reference approach (Search Light) adopted by the neuroscientific community.<br />
A comparison of the two methods is supported by the results of an empirical evaluation.<br />
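The Random Subspace Method invoked above evaluates many small random subsets of features (voxels) and accumulates evidence for features that appear in informative subsets. A minimal sketch, assuming a nearest-centroid scorer as the base learner (our choice, not the paper's):

```python
import random

def random_subspace_scores(X, y, n_subspaces=200, subspace_size=2, seed=0):
    """Score each random feature subset by its nearest-centroid training
    accuracy and credit every feature in the subset proportionally.
    Features that often occur in discriminative subsets accumulate credit."""
    rng = random.Random(seed)
    n_features = len(X[0])
    credit = [0.0] * n_features
    for _ in range(n_subspaces):
        idx = rng.sample(range(n_features), subspace_size)
        # Class centroids restricted to this subspace.
        cent = {}
        for c in set(y):
            rows = [x for x, lbl in zip(X, y) if lbl == c]
            cent[c] = [sum(r[i] for r in rows) / len(rows) for i in idx]
        correct = 0
        for x, lbl in zip(X, y):
            d = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, cent[c]))
                 for c in cent}
            correct += min(d, key=d.get) == lbl
        acc = correct / len(X)
        for i in idx:
            credit[i] += acc / n_subspaces
    return credit
```

Thresholding the resulting credit map would yield a brain map analogous in spirit to the Search Light output the paper compares against.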
09:00-11:10, Paper WeAT9.37<br />
Dual Channel Colocalization for Cell Cycle Analysis using 3D Confocal Microscopy<br />
Jaeger, Stefan, Chinese Academy of Sciences<br />
Casas-Delucchi, Corella S., Tech. Univ. Darmstadt<br />
Cardoso, M. Cristina, Tech. Univ. Darmstadt<br />
Palaniappan, Kannappan, Univ. of Missouri<br />
We present a cell cycle analysis that aims to improve our previous work by adding another channel and using one<br />
more dimension. The data we use is a set of 3D images of mouse cells captured with a spinning disk confocal microscope.<br />
All images are available in two channels showing the chromocenters and the fluorescently marked protein PCNA, respectively.<br />
In the present paper, we will describe our recent colocalization study in which we use Hessian-based blob detectors<br />
in combination with radial features to measure the degree of overlap between both channels. We show that colocalization<br />
performed in such a way provides additional discriminative power and allows us to distinguish between phases that we<br />
were not able to distinguish with a single 2D channel.<br />
09:00-11:10, Paper WeAT9.38<br />
Automated Cell Phase Classification for Zebrafish Fluorescence Microscope Images<br />
Lu, Yanting, Nanjing Univ. of Science and Tech.<br />
Lu, Jianfeng, Nanjing Univ. of Science and Tech.<br />
Liu, Tianming, Univ. of Georgia<br />
Yang, Jingyu, Univ. of Georgia<br />
Automated cell phenotype image classification is an interesting bioinformatics problem. In this paper, an automated cell<br />
phase classification framework is investigated for zebrafish presomitic mesoderm (PSM) images. Low image resolution,<br />
gradual transitions between adjacent categories, and the irregularity of real cell images make this classification task tough but<br />
intriguing. The proposed framework first segments the zebrafish image into cell patches by a two-stage segmentation procedure,<br />
then extracts the feature set NF9, designed especially for this low-resolution image set, on each cell patch, and finally<br />
employs a support vector machine (SVM) as the cell classifier. At present, the total accuracy achieved by NF9 is 75%.<br />
09:00-11:10, Paper WeAT9.39<br />
Data-Driven Lung Nodule Models for Robust Nodule Detection in Chest CT<br />
Farag, Amal, Univ. of Louisville<br />
Graham, James, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
The quality of the lung nodule models determines the success of lung nodule detection. This paper describes aspects of<br />
our data-driven approach for modeling lung nodules using the texture and shape properties of real nodules to form an average<br />
model template per nodule type. The ELCAP low-dose CT (LDCT) scan database is used to create the required statistics<br />
for the models based on modern computer vision techniques. These models suit various machine learning approaches<br />
for nodule detection including Bayesian methods, SVM and Neural Networks, and computations may be enhanced through<br />
genetic algorithms and AdaBoost. The quality of the new nodule models is studied with respect to parametric models,<br />
showing significant improvements in both sensitivity and specificity.<br />
- 192 -
09:00-11:10, Paper WeAT9.41<br />
Segmentation of Anatomical Structures in Brain MR Images using Atlases in FSL - a Quantitative Approach<br />
Soldea, Octavian, Sabanci Univ.<br />
Ekin, Ahmet, Philips Res. Europe<br />
Soldea, Diana Florentina, Sabanci Univ.<br />
Unay, Devrim, Bahcesehir Univ.<br />
Cetin, Mujdat, Sabanci Univ.<br />
Ercil, Aytul, Sabanci Univ.<br />
Uzunbas, Gokhan Mustafa, Rutgers State University<br />
Firat, Zeynep, Yeditepe University Hospital<br />
Cihangiroglu, Mutlu, Yeditepe University Hospital<br />
Segmentation of brain structures from MR images is crucial in understanding the disease progress, diagnosis, and treatment<br />
monitoring. Atlases, showing the expected locations of the structures, are commonly used to start and guide the segmentation<br />
process. In many cases, the quality of the atlas may have a significant effect on the final result. In the literature,<br />
commonly used atlases may be obtained from one subject’s data, built only from healthy subjects, or depict only certain structures,<br />
which limits their accuracy. Anatomical variations, pathologies, and imaging artifacts can all aggravate the problems related to<br />
the application of atlases. In this paper, we propose to use multiple atlases that are as different from each other as<br />
possible in order to handle such problems. To this effect, we have built a library of atlases and computed their similarity<br />
values to each other. Our study showed that the existing atlases have varying levels of similarity for different structures.<br />
09:00-11:10, Paper WeAT9.42<br />
Graphical Model-Based Tracking of Curvilinear Structures in Bio-Image Sequences<br />
Koulgi, Pradeep, Univ. of California, Santa Barbara<br />
Sargin, Mehmet Emre, Univ. of California, Santa Barbara<br />
Rose, Kenneth, Univ. of California, Santa Barbara<br />
Manjunath, B. S., Univ. of California, Santa Barbara<br />
Tracking of curvilinear structures is a task of fundamental importance in the quantitative analysis of biological structures<br />
such as neurons, blood vessels, retinal interconnects, microtubules, etc. The state of the art HMM-based contour tracking<br />
scheme for tracking microtubules, while performing well in most scenarios, can miss the track if, during its growth, it intersects<br />
another microtubule in its neighbourhood. In this paper we present a graphical model-based tracking algorithm<br />
which propagates across frames information about the dynamics of all the microtubules. This allows the algorithm to faithfully<br />
differentiate the contour of interest from others that contribute to the clutter, and maintain tracking accuracy. We<br />
present results of experiments on real microtubule images captured using fluorescence microscopy, and show that our proposed<br />
scheme outperforms the existing HMM-based scheme.<br />
11:10-12:10, WePL1 Anadolu Auditorium<br />
The Quantitative Analysis of User Behavior Online: Data, Models and Algorithms<br />
Prabhakar Raghavan Plenary Session<br />
Yahoo! Research, USA<br />
Prabhakar Raghavan has been the head of Yahoo! Research since 2005. His research interests include text and web mining,<br />
and algorithm design. He is a consulting professor of Computer Science at Stanford University and editor-in-chief of the<br />
Journal of the ACM. Prior to joining Yahoo!, he was the chief technology officer at Verity and has held a number of technical<br />
and managerial positions at IBM Research. Prabhakar received his PhD from Berkeley and is a fellow of the ACM<br />
and of the IEEE.<br />
By blending principles from mechanism design, algorithms, machine learning and massive distributed computing, the search<br />
industry has become good at optimizing monetization on sound scientific principles. This represents a successful and<br />
growing partnership between computer science and microeconomics. When it comes to understanding how online users<br />
respond to the content and experiences presented to them, we have more of a lacuna in the collaboration between computer<br />
science and certain social sciences. We will use a concrete technical example from image search results presentation, developing<br />
in the process some algorithmic and machine learning problems of interest in their own right. We then use this<br />
example to motivate the kinds of studies that need to grow between computer science and the social sciences; a critical<br />
element of this is the need to blend large-scale data analysis with smaller-scale eye-tracking and “individualized” lab studies.<br />
WeBT1 Marmara Hall<br />
Tracking and Surveillance - III Regular Session<br />
Session chair: Liao, Mark (Univ. of Southampton)<br />
13:30-13:50, Paper WeBT1.1<br />
Object Tracking by Structure Tensor Analysis<br />
Donoser, Michael, Graz Univ. of Tech.<br />
Kluckner, Stefan, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Covariance matrices have recently been a popular choice for versatile tasks like recognition and tracking due to their powerful<br />
properties as local descriptors and their low computational demands. This paper outlines similarities of covariance matrices<br />
to the well-known structure tensor. We show that the generalized version of the structure tensor is a powerful descriptor and<br />
that it can be calculated in constant time by exploiting the properties of integral images. To measure the similarities between<br />
several structure tensors, we describe an approximation scheme which allows comparison in a Euclidean space. Such an approach<br />
is also much more efficient than the common, computationally demanding Riemannian manifold distances. Experimental<br />
evaluation proves the applicability to the task of object tracking, demonstrating improved performance compared to<br />
covariance tracking.<br />
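The constant-time window computation exploited above is standard: after building integral images of the gradient products, the structure tensor of any rectangular window costs a handful of table lookups. A minimal sketch, not the authors' implementation (function names and the windowing interface are illustrative):<br />

```python
import numpy as np

def integral(img):
    """Summed-area table with a zero row/column prepended, for O(1) box sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in constant time from an integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def structure_tensor(img, r0, c0, r1, c1):
    """2x2 structure tensor of a rectangular window via integral images."""
    gy, gx = np.gradient(img.astype(float))
    ii_xx = integral(gx * gx)  # in a tracker these three tables would be
    ii_yy = integral(gy * gy)  # built once per frame, so every candidate
    ii_xy = integral(gx * gy)  # window afterwards costs O(1)
    jxx = box_sum(ii_xx, r0, c0, r1, c1)
    jyy = box_sum(ii_yy, r0, c0, r1, c1)
    jxy = box_sum(ii_xy, r0, c0, r1, c1)
    return np.array([[jxx, jxy], [jxy, jyy]])
```

In a tracking loop the three integral images would be computed once per frame and reused for every candidate window.<br />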
13:50-14:10, Paper WeBT1.2<br />
Prototype Learning using Metric Learning based Behavior Recognition<br />
Zhu, Pengfei, Chinese Acad. of Sciences<br />
Hu, Weiming, Chinese Acad. of Sciences<br />
Yuan, Chunfeng, Chinese Acad. of Sciences<br />
Li, Li, Chinese Acad. of Sciences<br />
Behavior recognition is an attractive direction in the computer vision domain. In this paper, we propose a novel behavior<br />
recognition method based on prototype learning using metric learning. The prototype learning algorithm improves the classification<br />
performance of the nearest-neighbor classifier and reduces its storage and computation requirements, while the metric learning<br />
algorithm is used to further improve the performance of the prototype learning. We use a compound feature<br />
combining local features and motion features to recognize human behaviors. The experimental results show the effectiveness<br />
of our method.<br />
14:10-14:30, Paper WeBT1.3<br />
Are Correlation Filters Useful for Human Action Recognition?<br />
Ali, Saad, Carnegie Mellon Univ.<br />
Lucey, Simon, CSIRO<br />
It has been argued in recent work that correlation filters are attractive for human action recognition from videos. Motivation<br />
for their employment in this classification task lies in their ability to: (i) specify where the filter should peak in contrast to<br />
all other shifts in space and time, (ii) tolerate some degree of noise and intra-class variation (allowing learning<br />
from multiple examples), and (iii) be computed deterministically with low computational overhead. Specifically, Maximum<br />
Average Correlation Height (MACH) filters have exhibited encouraging results [Mikel] on a variety of human<br />
action datasets. Here, we challenge the utility of correlation filters, like the MACH filter, in these circumstances. First, we<br />
demonstrate empirically that performance identical to the MACH filter can be attained by simply taking the average<br />
of the same action-specific training examples. Second, we characterize theoretically and empirically under what circumstances<br />
a MACH filter becomes equivalent to the average of the action-specific training examples. Based on this characterization,<br />
we offer an alternative type of filter, based on a discriminative paradigm, that circumvents the inherent limitations of<br />
correlation filters for action recognition and demonstrate improved action recognition performance.<br />
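The paper's first empirical point, that a plain average of the action-specific examples can stand in for a MACH filter, is easy to sketch. The toy version below (illustrative only; the 2D setting and all names are assumptions) averages training examples into a template and classifies by the peak of a Fourier-domain circular cross-correlation:<br />

```python
import numpy as np

def average_filter(examples):
    """Baseline filter: simply the mean of same-shape training examples."""
    return np.mean(examples, axis=0)

def correlation_peak(signal, filt):
    """Peak of the circular cross-correlation, computed in the Fourier domain."""
    corr = np.real(np.fft.ifftn(np.fft.fftn(signal) * np.conj(np.fft.fftn(filt))))
    return corr.max()

def classify(test, filters):
    """Assign the test array to the class whose filter correlates best."""
    scores = {name: correlation_peak(test, f) for name, f in filters.items()}
    return max(scores, key=scores.get)
```

Because the correlation is circular, the peak score is invariant to cyclic shifts of the test input, which is the shift tolerance the abstract refers to.<br />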
14:30-14:50, Paper WeBT1.4<br />
Tracking Hand Rotation and Grasping from an IR Camera using Cylindrical Manifold Embedding<br />
Lee, Chan-Su, Yeungnam Univ.<br />
Park, Shin Won, Yeungnam Univ.<br />
This paper presents a new approach for tracking hand rotation and grasping from a single IR camera. Due to the complexity and<br />
ambiguity of hand pose, it is difficult to track hand pose and view variations simultaneously from a single camera. We propose<br />
a cylindrical manifold embedding for one-dimensional hand pose variation and cyclic viewpoint variation. A hand pose shape<br />
from a specific viewpoint can be generated from an embedding point on the cylindrical manifold after learning nonlinear<br />
generative models from the embedding space to the corresponding observed shape. Hand grasping with simultaneous hand<br />
rotation is tracked using a particle filter on the manifold space. Experimental results for synthetic and real data show accurate<br />
tracking of a grasping hand with rotation. The proposed approach shows potential for advanced user interfaces in dark environments.<br />
14:50-15:10, Paper WeBT1.5<br />
Particle Filter Tracking with Online Multiple Instance Learning<br />
Ni, Zefeng, Univ. of California, Santa Barbara<br />
Sunderrajan, Santhoshkumar, Univ. of California, Santa Barbara<br />
Rahimi, Amir, Univ. of California, Santa Barbara<br />
Manjunath, B. S., Univ. of California, Santa Barbara<br />
This paper addresses the problem of object tracking by learning a discriminative classifier to separate the object from its<br />
background. The online-learned classifier is used to adaptively model the object’s appearance and its background. To solve the<br />
typical problem of erroneous training examples generated during tracking, an online multiple instance learning (MIL) algorithm<br />
is used by allowing false positive examples. In addition, a particle filter is applied to make best use of the learned classifier<br />
and to help generate a more representative set of training examples for the online MIL learning. The effectiveness of the<br />
proposed algorithm is demonstrated in some challenging environments for human tracking.<br />
WeBT2 Topkapı Hall A<br />
Pattern Recognition Systems and Applications - I Regular Session<br />
Session chair: Fred, Ana Luisa Nobre (Instituto Superior Técnico)<br />
13:30-13:50, Paper WeBT2.1<br />
A Test of Granger Non-Causality based on Nonparametric Conditional Independence<br />
Seth, Sohan, Univ. of Florida<br />
Principe, Jose, Univ. of Florida<br />
In this paper we describe a test of Granger non-causality from the perspective of a new measure of nonparametric conditional<br />
independence. We apply the proposed test on two synthetic nonlinear problems where linear Granger causality fails and<br />
show that the proposed method is able to derive the true causal connectivity effectively.<br />
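For context, the linear Granger test that the abstract says fails on nonlinear problems reduces to comparing residual variances of two lagged least-squares fits. A minimal illustrative sketch of that linear baseline, not of the authors' nonparametric conditional-independence test (names and the score normalization are assumptions):<br />

```python
import numpy as np

def lagmat(x, p):
    """Matrix whose t-th row holds the p most recent lags of x, aligned with x[p:]."""
    return np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])

def granger_score(x, y, p=2):
    """Linear Granger score of y -> x: relative drop in residual sum of
    squares when lags of y are added to an AR(p) model of x."""
    X_own = lagmat(x, p)
    X_full = np.column_stack([X_own, lagmat(y, p)])
    target = x[p:]
    def rss(A):
        D = np.column_stack([np.ones(len(A)), A])  # design with intercept
        beta, *_ = np.linalg.lstsq(D, target, rcond=None)
        resid = target - D @ beta
        return (resid ** 2).sum()
    r0, r1 = rss(X_own), rss(X_full)
    return (r0 - r1) / r0
```

A score near zero means the lags of y add no linear predictive power for x; the nonparametric test in the paper targets exactly the cases where this linear score stays near zero despite a true (nonlinear) causal link.<br />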
13:50-14:10, Paper WeBT2.2<br />
Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification<br />
Larios, Natalia, Univ. of Washington<br />
Soran, Bilge, Univ. of Washington<br />
Shapiro, Linda,<br />
Martinez-Muñoz, Gonzalo, Univ. Autonoma de Madrid<br />
Lin, Junyuan, Oregon State Univ.<br />
Dietterich, Thomas G., Oregon State Univ.<br />
This paper proposes an image classification method based on extracting image features using Haar random forests and combining<br />
them with a spatial matching kernel SVM. The method works by combining multiple efficient, yet powerful, learning<br />
algorithms at every stage of the recognition process. On the task of identifying aquatic stonefly larvae, the method has state-of-the-art<br />
or better performance, but with much higher efficiency.<br />
14:10-14:30, Paper WeBT2.3<br />
Incorporating Lane Estimation as Context Source in Pedestrian Recognition Task<br />
Szczot, Magdalena, Daimler AG<br />
Dannenmann, Iris, Daimler AG<br />
Löhlein, Otto, Daimler AG<br />
This contribution presents a method for incorporating information given by a lane estimation system into the pedestrian<br />
recognition task. The lane in front of the vehicle is represented by a three-dimensional set of points belonging to the middle<br />
of the road. A cascaded classifier solves the first stage of the pedestrian recognition task, delivering a list of detections in a camera<br />
image. We present a fusion system which combines the information provided by the cascaded classifier and the lane estimation.<br />
The fusion system delivers a probability map of the environment in front of the vehicle. The map indicates regions in<br />
front of the vehicle which, with a certain probability, contain a relevant detected pedestrian.<br />
14:30-14:50, Paper WeBT2.4<br />
PILL-ID: Matching and Retrieval of Drug Pill Imprint Images<br />
Lee, Young-Beom, Korea Univ.<br />
Park, Unsang, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
Automatic illicit drug pill matching and retrieval is becoming an important problem due to an increase in the number of<br />
tablet type illicit drugs being circulated in our society. We propose an automatic method to match drug pill images based on<br />
the imprints appearing on the tablet. This will help identify the source and manufacturer of the illicit drugs. The feature<br />
vector extracted from tablet images is based on edge localization and invariant moments. Instead of storing a single template<br />
for each pill type, we generate multiple templates during the edge detection process. This circumvents the difficulties during<br />
matching due to variations in illumination and viewpoint. Experimental results using a set of real drug pill images (822 illicit<br />
drug pill images and 1,294 legal drug pill images) showed 76.74% rank-1 and 93.02% rank-20 matching accuracy.<br />
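The invariant moments mentioned above are commonly Hu-style moment invariants computed from the (edge) image; the paper's exact feature vector is not reproduced here. A sketch of the first two invariants, which are insensitive to translation, scale, and rotation of the imprint:<br />

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a non-negative image (e.g. an edge map)."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()

def hu_invariants(img):
    """First two Hu moment invariants from scale-normalized central moments."""
    mu00 = central_moment(img, 0, 0)
    eta = lambda p, q: central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)
    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([h1, h2])
```

Matching would then compare such vectors (here truncated to two components for brevity) between query and database imprints.<br />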
14:50-15:10, Paper WeBT2.5<br />
Identifying Gender from Unaligned Facial Images by Set Classification<br />
Chu, Wen-Sheng, Acad. Sinica<br />
Huang, Chun-Rong, Acad. Sinica<br />
Chen, Chu-Song, Acad. Sinica<br />
Rough face alignments lead to suboptimal performance of face identification systems. In this study, we present a novel approach<br />
for identifying gender from facial images without proper face alignment. Instead of using only a single test input,<br />
we generate an image set by randomly cropping out a set of image patches from a neighborhood of the face detection region.<br />
Each image set is represented as a subspace and compared with other image sets by measuring the canonical correlation between<br />
two associated subspaces. By finding an optimal discriminative transformation for all training subspaces, the proposed<br />
approach with unaligned facial images is shown to outperform the state-of-the-art methods with face alignment.<br />
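The comparison step described above, canonical correlations between two patch-set subspaces, amounts to the singular values of the product of their orthonormal bases. A minimal sketch (illustrative, not the authors' code; the learned discriminative transformation is omitted):<br />

```python
import numpy as np

def subspace(patches, dim):
    """Orthonormal basis (via SVD) spanning a set of vectorized image patches."""
    X = np.stack([p.ravel() for p in patches], axis=1).astype(float)
    X = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]

def canonical_correlations(U1, U2):
    """Cosines of the principal angles between two subspaces:
    the singular values of U1^T U2 (all in [0, 1])."""
    return np.linalg.svd(U1.T @ U2, compute_uv=False)
```

A simple set-to-set similarity is then, e.g., the mean of these correlations; identical subspaces yield all ones.<br />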
WeBT3 Dolmabahçe Hall A<br />
Shape Modeling - II Regular Session<br />
Session chair: Imiya, Atsushi (Chiba Univ.)<br />
13:30-13:50, Paper WeBT3.1<br />
Detection of Shapes in 2D Point Clouds Generated from Images<br />
Su, Jingyong, Florida State Univ.<br />
Zhu, Zhiqiang, Florida State Univ.<br />
Srivastava, Anuj, Florida State Univ.<br />
Huffer, Fred W., Florida State Univ.<br />
We present a novel statistical framework for detecting pre-determined shape classes in 2D cluttered point clouds, which are<br />
in turn extracted from images. In this model-based approach, we use a 1D Poisson process for sampling points on shapes, a<br />
2D Poisson process for points from background clutter, and an additive Gaussian model for noise. Combining these with a<br />
previously proposed stochastic model on shapes of continuous 2D contours, and optimizing over unknown pose and scale, we develop a<br />
generalized likelihood ratio test for shape detection. We demonstrate the efficiency of this method and its robustness to clutter<br />
using both simulated and real data.<br />
13:50-14:10, Paper WeBT3.2<br />
Gait Learning-Based Regenerative Model: A Level Set Approach<br />
Al-Huseiny, Muayed Sattar, Univ. of Southampton<br />
Mahmoodi, Sasan, Univ. of Southampton<br />
Nixon, Mark, Univ. of Southampton<br />
We propose a learning method for gait synthesis from a sequence of shapes (frames) with the ability to extrapolate to novel<br />
data. It involves the application of PCA, first to reduce the data dimensionality to certain features, and second to model corresponding<br />
features derived from the training gait cycles as a Gaussian distribution. This approach transforms a non-Gaussian<br />
shape deformation problem into a Gaussian one by considering features of entire gait cycles as vectors in a Gaussian space.<br />
We show that these features, which we formulate as continuous functions, can be modeled by PCA. We also use this model<br />
to in-between (generate intermediate unknown) shapes in the training cycle. Furthermore, this paper demonstrates that the<br />
derived features can be used in the identification of pedestrians.<br />
14:10-14:30, Paper WeBT3.3<br />
Scale-Space Spectral Representation of Shape<br />
Bates, Jonathan, Florida State Univ.<br />
Liu, Xiuwen, Florida State Univ.<br />
Mio, Washington, Florida State Univ.<br />
We construct a scale space of shape of closed Riemannian manifolds, equipped with metrics derived from spectral representations<br />
and the Hausdorff distance. The representation depends only on the intrinsic geometry of the manifolds, making it<br />
robust to pose and articulation. The computation of shape distance involves an optimization problem over the 2^p-element<br />
group of all p-bit strings, which is approached with Markov chain Monte Carlo techniques. The methods are applied to cluster<br />
surfaces in 3D space.<br />
14:30-14:50, Paper WeBT3.4<br />
Learning Metrics for Shape Classification and Discrimination<br />
Fan, Yu, Florida State Univ.<br />
Houle, David, Florida State Unversity<br />
Mio, Washington, Florida State Univ.<br />
We propose a family of shape metrics that generalize the classical Procrustes distance by attributing weights to general linear<br />
combinations of landmarks. We develop an algorithm to learn a metric that is optimally suited to a given shape classification<br />
problem. Shape discrimination experiments are carried out with phantom data, as well as landmark data representing the<br />
shape of the wing of different species of fruit flies.<br />
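A weighted generalization of the Procrustes distance of the kind described can be sketched by applying a landmark-weighting matrix before the orthogonal alignment; W = I recovers the classical distance. This is an illustrative reconstruction, not the authors' metric-learning algorithm:<br />

```python
import numpy as np

def procrustes_distance(X, Y, W=None):
    """Procrustes-style distance between landmark configurations X, Y (k x 2),
    with an optional k x k weight matrix acting on linear combinations of
    landmarks. Alignment is over the full orthogonal group, so this sketch
    also allows reflections."""
    if W is None:
        W = np.eye(len(X))
    A = W @ (X - X.mean(0))        # remove translation, then weight
    B = W @ (Y - Y.mean(0))
    A = A / np.linalg.norm(A)      # remove scale
    B = B / np.linalg.norm(B)
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt                     # optimal orthogonal alignment
    return np.linalg.norm(A @ R - B)
```

Learning the metric would then amount to optimizing W for a given classification task, which is the part the paper contributes.<br />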
14:50-15:10, Paper WeBT3.5<br />
Non-Parametric 3D Shape Warping<br />
Hillenbrand, Ulrich, German Aerospace Center (DLR)<br />
A method is presented for non-rigid alignment of a source shape to a target shape through estimating and interpolating pointwise<br />
correspondences between their surfaces given as point clouds. The resulting mapping can be non-smooth and non-isometric,<br />
relate shapes across large variations, and find partial matches. It does not require a parametric model or a prior of<br />
deformations. Results are shown for some objects from the Princeton Shape Benchmark and a range scan.<br />
WeBT4 Dolmabahçe Hall B<br />
Image Denoising Regular Session<br />
Session chair: Skodras, A. (Hellenic Open Univ.)<br />
13:30-13:50, Paper WeBT4.1<br />
Edge Preserving Image Denoising in Reproducing Kernel Hilbert Spaces<br />
Bouboulis, Pantelis, Univ. of Athens<br />
Slavakis, Konstantinos, Univ. of Peloponnese<br />
Theodoridis, Sergios, Univ. of Athens<br />
The goal of this paper is the development of a novel approach to the problem of noise removal, based on the theory of<br />
Reproducing Kernel Hilbert Spaces (RKHS). The problem is cast as an optimization task in an RKHS, by taking advantage<br />
of the celebrated semiparametric Representer Theorem. Examples verify that in the presence of Gaussian noise the proposed<br />
method performs relatively well compared to wavelet-based techniques and outperforms them significantly in the presence<br />
of impulse or mixed noise.<br />
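By the representer theorem invoked above, the RKHS minimizer is a finite kernel expansion over the training points, so a heavily simplified denoiser can be written as Gaussian-kernel ridge regression of intensity on pixel coordinates. A toy sketch for small images (illustrative; the paper's semiparametric formulation and loss are richer):<br />

```python
import numpy as np

def kernel_ridge_denoise(img, sigma=2.0, lam=0.1):
    """Fit f(x, y) ~ intensity by Gaussian-kernel ridge regression; the
    representer theorem gives f as a kernel expansion over all pixels."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))          # Gram matrix of the RKHS kernel
    alpha = np.linalg.solve(K + lam * np.eye(len(pts)), img.ravel())
    return (K @ alpha).reshape(h, w)            # evaluate the expansion
```

The ridge term lam penalizes the RKHS norm, which is what suppresses the rough (noisy) component of the image.<br />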
13:50-14:10, Paper WeBT4.2<br />
Multichannel Image Regularisation using Anisotropic Geodesic Filtering<br />
Grazzini, Jacopo, Los Alamos National Lab.<br />
Soille, Pierre, EC Joint Res. Centre<br />
Dillard, Scott, Los Alamos National Lab.<br />
This paper extends a recent image-dependent regularisation approach introduced in [Grazzini and Soille, PR09&CCIS09]<br />
aiming at edge-preserving smoothing. For that purpose, geodesic distances equipped with a Riemannian metric need to be<br />
estimated in local neighbourhoods. By deriving an appropriate metric from the gradient structure tensor, the associated geodesic<br />
paths are constrained to follow salient features in images. Building on this, we design a generalised anisotropic geodesic<br />
filter, incorporating not only a measure of the edge strength, as in the original method, but also further directional information<br />
about the image structures. The proposed filter is particularly efficient at smoothing heterogeneous areas while preserving<br />
relevant structures in multichannel images.<br />
14:10-14:30, Paper WeBT4.3<br />
Local Jet based Similarity for NL-Means Filtering<br />
Manzanera, Antoine, ENSTA-ParisTech<br />
Reducing the dimension of local descriptors in images is useful for performing pixel comparisons faster. We show here that, for<br />
computing the NL-means denoising filter, image patches can be favourably replaced by a vector of spatial derivatives (the local<br />
jet) to calculate the similarity between pixels. First, we present the basic, limited-range implementation and compare it with<br />
the original NL-means. We use a fast estimation of the noise variance to automatically adjust the decay parameter of the<br />
filter. Next, we present the unlimited-range implementation using nearest-neighbour search in the local jet space, based on<br />
a binary search tree representation.<br />
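The core substitution, local-jet vectors in place of patches inside the NL-means weights, can be written down directly. The brute-force version below corresponds to an unlimited-range filter without the binary-search-tree acceleration; parameter names are illustrative:<br />

```python
import numpy as np

def local_jet(img):
    """Per-pixel descriptor: intensity plus spatial derivatives up to 2nd order."""
    gy, gx = np.gradient(img.astype(float))
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    return np.stack([img, gx, gy, gxx, gxy, gyy], axis=-1)

def nlmeans_jet(img, h=0.5):
    """NL-means where the patch distance is replaced by local-jet distance.
    Brute force, O(N^2): every pixel is weighted against every other."""
    jet = local_jet(img)
    flat = jet.reshape(-1, jet.shape[-1])
    vals = img.ravel().astype(float)
    out = np.empty_like(vals)
    for i in range(len(vals)):
        d2 = ((flat - flat[i]) ** 2).sum(1)      # jet distance to all pixels
        wgt = np.exp(-d2 / (h * h))              # decay parameter h
        out[i] = (wgt * vals).sum() / wgt.sum()
    return out.reshape(img.shape)
```

The 6-dimensional jet replaces, say, a 49-dimensional 7x7 patch, which is precisely the dimensionality reduction motivating the approach.<br />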
14:30-14:50, Paper WeBT4.4<br />
Image Denoising based on Fuzzy and Intra-Scale Dependency in Wavelet Transform Domain<br />
Saeedi, Jamal, Amirkabir Univ. of Tech.<br />
Moradi, Mohammad Hassan, Amirkabir Univ. of Tech.<br />
Abedi, Ali, Amirkabir Univ. of Tech.<br />
In this paper, we propose a new wavelet shrinkage algorithm based on fuzzy logic. Fuzzy logic is used to take neighbor dependency and<br />
the uncorrelated nature of noise into account in wavelet-based image denoising. For this reason, we use a fuzzy feature for enhancing wavelet coefficient<br />
information in the shrinkage step. Then a fuzzy membership function shrinks wavelet coefficients based on the fuzzy feature. We<br />
examine our image denoising algorithm in the dual-tree discrete wavelet transform domain, a shiftable, modified version of the discrete<br />
wavelet transform. Extensive comparisons with state-of-the-art image denoising algorithms indicate that our algorithm<br />
has better performance in noise suppression and edge preservation.<br />
14:50-15:10, Paper WeBT4.5<br />
Noise-Insensitive Contrast Enhancement for Rendering High-Dynamic-Range Images<br />
Lin, Hsueh-Yi Sean, Lunghwa Univ. of Science and Tech.<br />
The process of compressing high luminance values into the displayable range inevitably incurs a loss of image contrast. Although a<br />
local adaptation process, such as the two-scale contrast reduction scheme, is capable of preserving details during the HDR compression<br />
process, it cannot be used to enhance the local contrasts of image contents. Moreover, the effect of noise artifacts cannot be eliminated when<br />
detail manipulation is subsequently performed. We propose a new tone reproduction scheme, which incorporates local contrast enhancement<br />
and noise suppression processes, for the display of HDR images. Our experimental results show that the proposed scheme is<br />
indeed effective in enhancing local contrasts of image contents and suppressing noise artifacts while increasing the visibility of HDR<br />
scenes.<br />
WeBT5 Topkapı Hall B<br />
Feature Extraction for Face Recognition Regular Session<br />
Session chair: Govindaraju, Venu (Univ. at Buffalo)<br />
13:30-13:50, Paper WeBT5.1<br />
Monogenic Binary Pattern (MBP): A Novel Feature Extraction and Representation Model for Face Recognition<br />
Yang, Meng, The Hong Kong Pol. Univ.<br />
Zhang, Lei, The Hong Kong Pol. Univ.<br />
Zhang, Lin, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
A novel feature extraction method, namely monogenic binary pattern (MBP), is proposed in this paper based on the theory<br />
of monogenic signal analysis, and the histogram of MBP (HMBP) is subsequently presented for robust face representation<br />
and recognition. MBP consists of two parts: one is monogenic magnitude encoded via uniform LBP, and the other is monogenic<br />
orientation encoded as quadrant-bit codes. The HMBP is established by concatenating the histograms of MBP of all<br />
sub-regions. Compared with the well-known and powerful Gabor filtering based LBP schemes, one clear advantage of<br />
HMBP is its lower time and space complexity because monogenic signal analysis needs fewer convolutions and generates<br />
more compact feature vectors. The experimental results on the AR and FERET face databases validate that the proposed<br />
MBP algorithm has better performance than or comparable performance with state-of-the-art local feature based methods<br />
but with significantly lower time and space complexity.<br />
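The uniform-LBP encoding used for the monogenic magnitude is a standard building block: 8-neighbour codes with at most two circular bit transitions get individual histogram bins (there are 58 such patterns) and all remaining codes share one bin. A minimal sketch, independent of the MBP specifics:<br />

```python
import numpy as np

def lbp8(img, r, c):
    """8-neighbour LBP code of pixel (r, c), clockwise from top-left."""
    center = img[r, c]
    nbrs = [img[r-1, c-1], img[r-1, c], img[r-1, c+1], img[r, c+1],
            img[r+1, c+1], img[r+1, c], img[r+1, c-1], img[r, c-1]]
    return sum(int(v >= center) << i for i, v in enumerate(nbrs))

def is_uniform(code):
    """A pattern is 'uniform' if its circular bit string has <= 2 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

def uniform_lbp_histogram(img):
    """58 bins for the uniform patterns plus one bin for all the rest."""
    uniform_codes = sorted(c for c in range(256) if is_uniform(c))
    index = {c: i for i, c in enumerate(uniform_codes)}
    hist = np.zeros(len(uniform_codes) + 1)
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            hist[index.get(lbp8(img, r, c), len(uniform_codes))] += 1
    return hist / hist.sum()
```

The HMBP descriptor described above would concatenate such histograms over sub-regions, alongside the quadrant-bit orientation codes.<br />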
13:50-14:10, Paper WeBT5.2<br />
Automatic Frequency Band Selection for Illumination Robust Face Recognition<br />
Ekenel, Hazim Kemal, Karlsruhe Inst. of Tech.<br />
Stiefelhagen, Rainer, Karlsruhe Inst. of Tech. & Fraunhofer IITB<br />
Varying illumination conditions cause a dramatic change in facial appearance that leads to a significant drop in face recognition<br />
algorithms’ performance. In this paper, to overcome this problem, we utilize an automatic frequency band selection<br />
scheme. The proposed approach is incorporated into a local appearance-based face recognition algorithm, which employs<br />
the discrete cosine transform (DCT) for processing local facial regions. From the extracted DCT coefficients, the approach<br />
determines the ones that should be used for classification. Extensive experiments conducted on the extended Yale face<br />
database B have shown that benefiting from frequency information provides robust face recognition under changing illumination<br />
conditions.<br />
14:10-14:30, Paper WeBT5.3<br />
Directed Random Subspace Method for Face Recognition<br />
Harandi, Mehrtash, NICTA<br />
Nili Ahmadabadi, Majid, Univ. of Tehran<br />
Nadjar Araabi, Babak, Univ. of Tehran<br />
Bigdeli, Abbas, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
With growing attention to ensemble learning, various ensemble methods for face recognition have been<br />
proposed in recent years that show promising results. Among diverse ensemble construction approaches, the random subspace method has<br />
received considerable attention in face recognition. Although random feature selection in the random subspace method improves<br />
accuracy in general, it is not free of serious difficulties and drawbacks. In this paper we present a learning scheme<br />
to overcome some of the drawbacks of random feature selection in the random subspace method. The proposed learning<br />
method derives a feature discrimination map based on a measure of accuracy and uses it in a probabilistic recall mode to<br />
construct an ensemble of subspaces. Experiments on different face databases revealed that the proposed method gives superior<br />
performance over well-known benchmarks and state-of-the-art ensemble methods.<br />
14:30-14:50, Paper WeBT5.4<br />
Raw vs. Processed: How to Use the Raw and Processed Images for Robust Face Recognition under Varying Illumination<br />
Xu, Li, Chinese Acad. of Sciences<br />
Lei, Huang, Chinese Acad. of Sciences<br />
Liu, Changping, Chinese Acad. of Sciences<br />
Many previous image processing methods discard low-frequency components of images to extract illumination invariants<br />
for face recognition. However, such methods may cause distortion of the processed images and perform poorly under normal<br />
lighting. In this paper, a new method is proposed to deal with the illumination problem in face recognition. Firstly, we define<br />
a score to denote the relative difference between the first and second largest similarities between the query input and the individuals<br />
in the gallery classes. Then, according to the score, we choose the appropriate images, raw or processed, to use in<br />
the recognition. Experiments on the ORL, CMU-PIE and Extended Yale B face databases show that our adaptive method<br />
gives more robust results after combination and performs better than the traditional fusion operators, the sum and the maximum<br />
of similarities.<br />
14:50-15:10, Paper WeBT5.5<br />
Discriminative Prototype Learning in Open Set Face Recognition<br />
Han, Zhongkai, Tsinghua Univ.<br />
Fang, Chi, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
We address the problem of prototype design for open set face recognition (OSFR) using a single sample image. Normalized<br />
Correlation (NC), also known as Cosine Distance, offers many benefits in accuracy and robustness compared to other distance<br />
measurements in the OSFR problem. Inspired by classical Learning Vector Quantization (LVQ), a novel discriminative<br />
learning method is proposed to design a discriminative prototype used by the NC classifier. Specifically, we develop an objective<br />
function that fixes the NC score between the prototype and within-class samples at a high level and minimizes the<br />
similarity between the prototype and between-class samples. Several experiments conducted on benchmark databases<br />
demonstrate the superior performance of the designed prototype compared to the original one.<br />
WeBT6 Anadolu Auditorium<br />
Document Analysis - II Regular Session<br />
Session chair: Lopresti, Daniel (Lehigh Univ.)<br />
13:30-13:50, Paper WeBT6.1<br />
On-Line Handwriting Word Recognition using a Bi-Character Model<br />
Prum, Sophea, Univ. of La Rochelle<br />
Visani, Muriel, Univ. of La Rochelle<br />
Ogier, Jean-Marc, Univ. de la Rochelle<br />
This paper deals with on-line handwriting recognition. Analytic approaches have attracted increasing interest during<br />
the last ten years. These approaches rely on a preliminary segmentation stage, which remains one of the most difficult<br />
problems and may strongly affect the quality of the global recognition process. In order to circumvent this problem, this<br />
paper introduces a bi-character model, where each character is recognized jointly with its neighboring characters. This<br />
model yields two main advantages. First, it reduces the number of confusions due to connections between characters<br />
during the character recognition step. Second, it avoids some possible confusions at the character recognition level during<br />
the word recognition stage. Our experiments on significant databases show interesting improvements: the<br />
recognition rate is increased from 65% to 83% by using this bi-character strategy.<br />
13:50-14:10, Paper WeBT6.2<br />
Ruling Line Removal in Handwritten Page Images<br />
Lopresti, Daniel, Lehigh Univ.<br />
Kavallieratou, Ergina, Univ. of the Aegean<br />
In this paper we present a procedure for removing ruling lines from a handwritten document image that does not break existing<br />
characters. We take advantage of common ruling line properties such as uniform width, predictable spacing, position<br />
vs. text, etc. The proposed process has no effect on document images without ruling lines, hence no a priori discrimination<br />
is required. The system is evaluated on synthetic page images in five different languages.<br />
14:10-14:30, Paper WeBT6.3<br />
Script Identification – a Han & Roman Script Perspective<br />
Chanda, Sukalpa, GJØVIK Univ. Coll.<br />
Pal, Umapada, Indian Statistical Inst.<br />
Franke, Katrin, Gjøvik Univ. Coll.<br />
Kimura, Fumitaka, Mie Univ.<br />
All Han-based scripts (Chinese, Japanese, and Korean) possess similar visual characteristics. Hence system development<br />
for identification of Chinese, Japanese and Korean scripts from a single document page is quite challenging. It is noted<br />
that a Han-based document page might also have Roman script in them. A multi-script OCR system dealing with Chinese,<br />
Japanese, Korean, and Roman scripts, demands identification of scripts before execution of respective OCR modules. We<br />
propose a system to address this problem using directional features along with a Gaussian Kernel-based Support Vector<br />
Machine. We obtained promising results: 98.39% script identification accuracy at the character level and 99.85% at the<br />
block level when no rejection was considered.<br />
14:30-14:50, Paper WeBT6.4<br />
Robust 1D Barcode Recognition on Mobile Devices<br />
Rocholl, Johann, Stuttgart Univ.<br />
Klenk, Sebastian, Stuttgart Univ.<br />
Heidemann, Gunther, Stuttgart Univ.<br />
In the following we will describe a novel method for decoding linear barcodes from blurry camera images. Our goal was<br />
to develop an algorithm that can be used on mobile devices to recognize product numbers from EAN or UPC barcodes.<br />
14:50-15:10, Paper WeBT6.5<br />
Fast Logo Detection and Recognition in Document Images<br />
Li, Zhe, Siemens AG<br />
Schulte-Austum, Matthias, Siemens AG<br />
Neschen, Martin, Recosys GmbH<br />
The scientific significance of automatic logo detection and recognition is growing steadily because of the increasing<br />
requirements of intelligent document image analysis and retrieval. In this paper, we introduce a system architecture which<br />
is aiming at segmentation-free and layout-independent logo detection and recognition. Along with the unique logo feature<br />
design, a novel way to ensure the geometrical relationships among the features, and different optimizations in the recognition<br />
process, this system can achieve improvements concerning both the recognition performance and the running time.<br />
The experimental results on several sets of real-world documents demonstrate the effectiveness of our approach.<br />
WeBT7 Dolmabahçe Hall C<br />
Classification in Biomedicine Regular Session<br />
Session chair: Gurcan, Metin (Ohio State Univ.)<br />
13:30-13:50, Paper WeBT7.1<br />
Joint Independent Component Analysis of Brain Perfusion and Structural Magnetic Resonance Images in Dementia<br />
Tosun, Duygu, Center for Imaging Neurodegenerative Diseases<br />
Rosen, Howard, UCSF<br />
Miller, Bruce L., UCSF<br />
Weiner, Michael W., UCSF<br />
Schuff, Norbert, UCSF<br />
Magnetic Resonance Imaging (MRI) provides various imaging modes to study the brain. We tested the benefits of joint<br />
analysis of multimodality MRI data using joint independent components analysis (jICA) in comparison to unimodality<br />
analyses. Specifically, we designed a jICA to decompose the joint distributions of multimodality MRI data across image<br />
voxels and subjects into independent components that explain joint variations between image modalities across subjects.<br />
We applied jICA to structural and perfusion-weighted MRI data from 12 patients diagnosed with behavioral variant frontotemporal<br />
dementia (bvFTD) and 12 healthy elderly individuals. While unimodality analyses showed<br />
widespread brain atrophy and hypoperfusion in the patients, jICA further revealed links between atrophy and hypoperfusion<br />
in specific brain regions. Moreover, significant links were confined to the right brain hemisphere in FTLD, consistent with<br />
the clinical symptoms. Considering the multimodality effect size between bvFTD patients and controls, the brain atrophy and hypoperfusion<br />
regions identified by multimodality jICA yielded a large effect size, while the regions identified by unimodality<br />
analysis of atrophy and hypoperfusion differences yielded only a medium effect size between bvFTD patients<br />
and controls. The findings demonstrate the power of jICA to effectively evaluate multimodality brain imaging data.<br />
13:50-14:10, Paper WeBT7.2<br />
Endoscopic Image Classification using Edge-Based Features<br />
Häfner, Michael, St. Elisabeth Hospital<br />
Gangl, Alfred, Medical Univ. of Vienna<br />
Liedlgruber, Michael, Univ. of Salzburg<br />
Uhl, Andreas, Univ. of Salzburg<br />
Vécsei, Andreas, St. Anna Children’s Hospital<br />
Wrba, Friedrich, Medical Univ. of Vienna<br />
We present a system for automated colon cancer detection based on pit pattern classification. In contrast to previous<br />
work we exploit the visual nature of the underlying classification scheme by extracting features based on detected edges.<br />
To focus on the most discriminative subset of features we use a greedy forward feature subset selection. The classification<br />
is then carried out using the k-nearest neighbors (k-NN) classifier. The results obtained are very promising and show that<br />
an automated classification of the given imagery is feasible by using the proposed method.<br />
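The greedy forward feature subset selection mentioned above, wrapped around a k-NN scorer, can be sketched as follows. This is a generic illustration under stated assumptions (leave-one-out accuracy as the selection criterion, Euclidean distance), not the authors' code; all names are hypothetical.<br />

```python
import numpy as np

def knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-NN classifier (Euclidean distance)."""
    n = len(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude the sample itself
    correct = 0
    for i in range(n):
        nn = np.argsort(d[i])[:k]          # indices of the k nearest samples
        correct += np.bincount(y[nn]).argmax() == y[i]
    return correct / n

def forward_select(X, y, k=3):
    """Greedily add the feature that most improves LOO k-NN accuracy."""
    remaining, chosen, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        scores = {f: knn_accuracy(X[:, chosen + [f]], y, k) for f in remaining}
        f, s = max(scores.items(), key=lambda kv: kv[1])
        if s <= best:                      # no further improvement: stop
            break
        chosen.append(f)
        remaining.remove(f)
        best = s
    return chosen, best
```

The stopping rule keeps only features that strictly improve the score, which is one common way to "focus on the most discriminative subset".<br />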
14:10-14:30, Paper WeBT7.3<br />
Biclustering of Expression Microarray Data with Topic Models<br />
Bicego, Manuele, Univ. of Verona<br />
Lovato, Pietro, Univ. of Verona<br />
Ferrarini, Alberto, Univ. of Verona<br />
Delledonne, Massimo, Univ. of Verona<br />
This paper presents an approach to extracting biclusters from expression microarray data using topic models, a class of probabilistic<br />
models that can detect interpretable groups of highly correlated genes and samples. Starting from a topic<br />
model learned from the expression matrix, some automatic rules to extract biclusters are presented, which overcome the<br />
drawbacks of previous approaches. The methodology has been positively tested with synthetic benchmarks, as well as<br />
with a real experiment involving two different species of grape plants (Vitis vinifera and Vitis riparia).<br />
14:30-14:50, Paper WeBT7.4<br />
A Multiple Instance Learning Approach Toward Optimal Classification of Pathology Slides<br />
Dundar, Murat, IUPUI<br />
Badve, Sunil, Indiana Univ.<br />
Raykar, Vikas, Siemens Medical<br />
Jain, Rohit, IUPUI<br />
Sertel, Olcay, The Ohio State Univ.<br />
Gurcan, Metin, The Ohio State Univ.<br />
Pathology slides are diagnosed based on the histological descriptors extracted from regions of interest (ROIs) identified<br />
on each slide by the pathologists. A slide usually contains multiple regions of interest and a positive (cancer) diagnosis is<br />
confirmed when at least one of the ROIs in the slide is identified as positive. For a negative diagnosis the pathologist has<br />
to rule out cancer for each and every ROI available. Our research is motivated toward computer-assisted classification of<br />
digitized slides. The objective in this study is to develop a classifier to optimize classification accuracy at the slide level.<br />
Traditional supervised training techniques which are trained to optimize classifier performance at the ROI level yield suboptimal<br />
performance in this problem. We propose a multiple instance learning approach based on the implementation of<br />
the large margin principle with different loss functions defined for positive and negative samples. We consider the classification<br />
of intraductal breast lesions as a case study, and perform experimental studies comparing our approach against<br />
the state-of-the-art.<br />
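The slide-level decision rule stated above (positive if at least one ROI is positive, negative only if every ROI is) is the classic multiple-instance "max" aggregation, sketched here as a minimal illustration; score values and the threshold are hypothetical, not the paper's trained classifier.<br />

```python
import numpy as np

def slide_label(roi_scores, threshold=0.0):
    """Aggregate ROI-level classifier scores into a slide-level call.

    A slide is positive (1) if at least one ROI score exceeds the
    threshold, and negative (0) only if every ROI falls below it --
    the rule described in the abstract above.
    """
    return int(np.max(roi_scores) > threshold)
```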
14:50-15:10, Paper WeBT7.5<br />
Gaussian ERP Kernel Classifier for Pulse Waveforms Classification<br />
Zuo, Wangmeng, Harbin Inst. of Tech.<br />
Zhang, Dongyu, Harbin Inst. of Tech.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Wang, Kuanquan, Harbin Inst. of Tech.<br />
Li, Naimin, Harbin Inst. of Tech.<br />
While advances in sensor and signal processing techniques have provided effective tools for quantitative research on traditional<br />
Chinese pulse diagnosis (TCPD), the automatic classification of pulse waveforms remains a difficult problem.<br />
To address this issue, this paper proposes a novel edit distance with real penalty (ERP)-based k-nearest neighbors (KNN)<br />
classifier, drawing on recent progress in time series matching and KNN classification. Taking advantage of the metric<br />
property of ERP, we first develop a Gaussian ERP kernel, and then embed it into kernel difference-weighted KNN classifier.<br />
The proposed Gaussian ERP kernel classifier is evaluated on a dataset which includes 2470 pulse waveforms. Experimental<br />
results show that the proposed classifier is much more accurate than several other pulse waveform classification approaches.<br />
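The ERP distance and the Gaussian kernel built on it can be sketched as follows. This follows the standard ERP dynamic program (gaps penalised relative to a constant g), shown as an illustration; the gap constant, bandwidth, and function names are assumptions, not the paper's settings.<br />

```python
import numpy as np

def erp(a, b, g=0.0):
    """Edit distance with Real Penalty (ERP) between two 1-D sequences.

    Gaps are penalised by the distance to a constant reference value g,
    which makes ERP a proper metric (triangle inequality holds) -- the
    property the Gaussian kernel below relies on.
    """
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[1:, 0] = np.cumsum(np.abs(np.asarray(a, float) - g))  # all-gap cost
    D[0, 1:] = np.cumsum(np.abs(np.asarray(b, float) - g))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i-1, j-1] + abs(a[i-1] - b[j-1]),  # match
                          D[i-1, j] + abs(a[i-1] - g),          # gap in b
                          D[i, j-1] + abs(b[j-1] - g))          # gap in a
    return D[n, m]

def gaussian_erp_kernel(a, b, sigma=1.0):
    """Gaussian kernel on the ERP metric."""
    return np.exp(-erp(a, b) ** 2 / (2 * sigma ** 2))
```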
WeCT1 Marmara Hall<br />
Tracking and Surveillance - IV Regular Session<br />
Session chair: Carneiro, Gustavo (Technical Univ. of Lisbon)<br />
15:40-16:00, Paper WeCT1.1<br />
Human 3D Motion Recognition based on Spatial-Temporal Context of Joints<br />
Zhao, Qiong, Univ. of Science and Tech. of China<br />
Wang, Lihua, City Univ. of Hong Kong<br />
Ip, Horace,<br />
Zhou, Xuehai, Univ. of Science and Tech. of China<br />
The paper presents a novel human motion recognition method based on a new form of the Hidden Markov Models, called<br />
spatial-temporal hidden Markov models (ST-HMM), which can be learnt from a sequence of joint positions. To cope with<br />
the high dimensionality of the pose space, in this paper, we exploit the spatial dependency between each pair of spatially<br />
connected joints in the articulated skeletal structure, as well as the temporal dependency due to the continuous movement<br />
of each of the joints. The spatial-temporal contexts of these joints are learnt from the sequences of joint movements and<br />
captured by our ST-HMM. Results of recognizing 11 different action classes on a large number of motion capture sequences<br />
as well as synthetic tracking data show that our approach outperforms the traditional HMM approach in terms of robustness<br />
and recognition rates.<br />
16:00-16:20, Paper WeCT1.2<br />
Matching Groups of People by Covariance Descriptor<br />
Cai, Yinghao, Univ. of Oulu<br />
Takala, Valtteri, Univ. of Oulu<br />
Pietikäinen, Matti, Univ. of Oulu<br />
In this paper, we present a new solution to the problem of matching groups of people across multiple non-overlapping<br />
cameras. Similar to the problem of matching individuals across cameras, matching groups of people also faces challenges<br />
such as variations of illumination conditions, poses and camera parameters. Moreover, people often swap their positions<br />
while walking in a group. In this paper, we propose to use the covariance descriptor for appearance matching of group images.<br />
The covariance descriptor is shown to be discriminative, capturing both the appearance and statistical properties<br />
of image regions. Furthermore, it offers a natural way of combining multiple heterogeneous features with a<br />
relatively low dimensionality. Experimental results on two different datasets demonstrate the effectiveness of the proposed<br />
method.<br />
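A region covariance descriptor and the usual generalized-eigenvalue distance between two such descriptors can be sketched as below. The per-pixel feature set (position, intensity, gradient magnitudes) is one common choice, not necessarily the one used in the paper.<br />

```python
import numpy as np

def covariance_descriptor(region):
    """Covariance descriptor of a grey-level image region.

    Per-pixel feature vector: (x, y, intensity, |dI/dx|, |dI/dy|);
    the descriptor is the 5x5 covariance of these features over the
    region, combining heterogeneous features in a low-dimensional way.
    """
    region = region.astype(float)
    ys, xs = np.mgrid[0:region.shape[0], 0:region.shape[1]]
    gy, gx = np.gradient(region)
    F = np.stack([xs, ys, region, np.abs(gx), np.abs(gy)])
    return np.cov(F.reshape(5, -1))    # rows = features, cols = pixels

def covariance_distance(c1, c2):
    """Riemannian distance between covariance matrices:
    sqrt(sum_i ln^2 lambda_i), where lambda_i are the generalized
    eigenvalues of (c1, c2)."""
    lam = np.linalg.eigvals(np.linalg.solve(c1, c2)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

Because the generalized eigenvalues of (c2, c1) are the reciprocals of those of (c1, c2), the distance is symmetric, as required for matching across cameras.<br />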
16:20-16:40, Paper WeCT1.3<br />
Boosting Incremental Semi-Supervised Discriminant Analysis for Tracking<br />
Wang, Heng, Chinese Acad. of Sciences<br />
Hou, Xinwen, Chinese Acad. of Sciences<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
Tracking has recently been formulated as the problem of discriminating the object from its nearby background, where the classifier<br />
is updated by new samples successively arriving during tracking. Depending on whether the samples are labeled, the<br />
tracker can be designed in a supervised or semi-supervised manner. This paper proposes a novel semi-supervised algorithm<br />
for tracking by combining Semi-supervised Discriminant Analysis (SDA) with an online boosting framework. Using the<br />
local geometric structure information from the samples, the SDA-based weak classifier is made more robust to outliers.<br />
Meanwhile, we design an incremental updating mechanism for SDA so that it can adapt to appearance changes. We further<br />
propose an Extended SDA (ESDA) algorithm, which gives better discrimination ability. Results on several challenging<br />
video sequences demonstrate the effectiveness of the method.<br />
16:40-17:00, Paper WeCT1.4<br />
Optical Rails: View-Based Track Following with Hemispherical Environment Model and Orientation View<br />
Descriptors<br />
Dederscheck, David, Goethe Univ. Frankfurt<br />
Zahn, Martin, Goethe Univ. Frankfurt<br />
Friedrich, Holger, Goethe Univ. Frankfurt<br />
Mester, Rudolf, Goethe Univ. Frankfurt<br />
We present a purely view-based method for robot navigation along a prerecorded track using compact omnidirectional<br />
view-descriptors. This paper focuses on a new model for the navigation environment to determine the steering direction<br />
by efficient holistic comparison of views. The concept of view descriptors based on low-order expansion of local orientation<br />
vectors into spherical harmonic basis functions is augmented by a linear illumination model, providing discriminative<br />
view matching even under illumination changes.<br />
17:00-17:20, Paper WeCT1.5<br />
Forward-Backward Error: Automatic Detection of Tracking Failures<br />
Kalal, Zdenek, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
Matas, Jiri, CTU Prague<br />
This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error,<br />
i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured.<br />
We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories<br />
in video sequences. We demonstrate that the approach is complementary to commonly used normalized<br />
cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance<br />
is achieved on challenging benchmark video sequences which include non-rigid objects.<br />
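The Forward-Backward check described above is simple enough to sketch directly: track forward, track back, and measure the round-trip discrepancy. The `track` interface below is hypothetical (any point tracker can be plugged in); this is an illustration, not the authors' implementation.<br />

```python
import numpy as np

def forward_backward_error(track, points):
    """Forward-Backward error for a set of tracked points.

    track(points, forward=...) -> new point positions (any point
    tracker).  Each point is tracked forward in time and then backward;
    the FB error is the distance between the original position and the
    backtracked one.  Large errors indicate tracking failures.
    """
    fwd = track(points, forward=True)
    back = track(fwd, forward=False)
    return np.linalg.norm(points - back, axis=1)

def reliable_points(track, points, max_error=2.0):
    """Keep only points whose FB error stays below max_error --
    the trajectory-selection rule described in the abstract."""
    return points[forward_backward_error(track, points) < max_error]
```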
WeCT2 Topkapı Hall A<br />
Pattern Recognition Systems and Applications - II Regular Session<br />
Session chair: Marinai, Simone (Univ. of Florence)<br />
15:40-16:00, Paper WeCT2.1<br />
Scene-Adaptive Human Detection with Incremental Active Learning<br />
Joshi, Ajay, Univ. of Minnesota, Twin Cities<br />
Porikli, Fatih, MERL<br />
In many computer vision tasks, scene changes hinder the generalization ability of trained classifiers. For instance, a human<br />
detector trained with one set of images is unlikely to perform well in different scene conditions. In this paper, we propose<br />
an incremental learning method for human detection that can take generic training data and build a new classifier adapted<br />
to the new deployment scene. Two operation modes are proposed: i) a completely autonomous mode wherein the first few<br />
empty frames of video are used for adaptation, and ii) an active learning approach with the user in the loop, for more challenging<br />
scenarios including situations where empty initialization frames may not exist. Results show the strength of the<br />
proposed methods for quick adaptation.<br />
16:00-16:20, Paper WeCT2.2<br />
Direct Printability Prediction in VLSI using Features from Orthogonal Transforms<br />
Kryszczuk, Krzysztof, IBM Zurich Res. Lab.<br />
Hurley, Paul, IBM Zurich Res. Lab.<br />
Sayah, Robert, IBM Systems and Tech. Group<br />
Full-chip printability simulations for VLSI layouts use analytical and heuristic physical process models, and require an<br />
explicit creation of a mask and image. This is a computationally expensive task, often prohibitively so, especially when<br />
prototyping new designs. In this paper we show that using orthogonal transform-based, fixed-length feature vector representations<br />
of 22nm VLSI layouts to perform classification-based rapid printability prediction can help avoid or reduce<br />
the number of simulations. Furthermore, in order to overcome the problem of scarcity of training data, we show<br />
how re-scaled, abundant 45nm designs can train error prediction models for new, native 22nm designs. Our experiments,<br />
run on M1 layer data and line width errors, demonstrate the viability of the proposed approach.<br />
16:20-16:40, Paper WeCT2.3<br />
Improving Performance of Network Traffic Classification Systems by Cleaning Training Data<br />
Gargiulo, Francesco, Univ. of Naples Federico II<br />
Sansone, Carlo, Univ. of Naples Federico II<br />
In this paper we propose an algorithm for detecting and removing mislabeled training samples in an adversarial<br />
learning context, in which a malicious user tries to camouflage training patterns in order to limit the classification system's<br />
performance. In particular, we describe how this algorithm can be effectively applied to the problem of identifying HTTP<br />
traffic flowing through TCP port 80, where mislabeled samples can be forced by using port-spoofing attacks.<br />
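The abstract does not spell out the cleaning algorithm, so the sketch below shows one common instantiation of training-set cleaning: an edited-nearest-neighbour style filter that drops samples whose neighbours disagree with their label. All names and thresholds are assumptions; the paper's actual algorithm may differ.<br />

```python
import numpy as np

def clean_training_set(X, y, k=3, min_agreement=0.5):
    """Filter likely-mislabeled samples by k-NN label agreement.

    A sample is kept only if at least min_agreement of its k nearest
    neighbours (excluding itself) share its label.  Samples whose
    labels were camouflaged by an adversary tend to sit inside the
    opposite class and get filtered out.
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample cannot vouch for itself
    keep = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        nn = np.argsort(d[i])[:k]
        keep[i] = np.mean(y[nn] == y[i]) >= min_agreement
    return X[keep], y[keep]
```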
16:40-17:00, Paper WeCT2.4<br />
Bayesian Networks for Predicting IVF Blastocyst Development<br />
Uyar, Asli, Bogazici Univ.<br />
Bener, Ayse, Bogazici Univ.<br />
Ciray, H. Nadir, Bahceci Woman Healthcare Centre<br />
Bahceci, Mustafa, Bahceci Woman Healthcare Centre<br />
In in-vitro fertilization (IVF) treatment, blastocyst stage embryo transfers at day 5 result in higher pregnancy rates. However,<br />
there is a risk of transfer cancelation due to embryonic developmental failure. Clinicians need reliable models in<br />
predicting blastocyst development. In this study, we apply Bayesian networks in order to investigate cause-effect relationships<br />
of the variables of interest in embryo growth process and to predict blastocyst development. We have analyzed 7745<br />
embryo records including embryo morphological characteristics and patient-related data. Experimental results revealed<br />
that Bayesian networks can predict blastocyst development with a 63.5% true positive rate and a 33.8% false positive rate.<br />
17:00-17:20, Paper WeCT2.5<br />
Spectral Invariant Representation for Spectral Reflectance Image<br />
Ibrahim, Abdelhameed, Chiba Univ.<br />
Tominaga, Shoji, Chiba Univ.<br />
Horiuchi, Takahiko, Chiba Univ.<br />
Although spectral images contain a large amount of information compared with color images, image acquisition is affected<br />
by several factors such as shading and specular highlights. Many researchers have introduced color invariant and<br />
spectral invariant representations for these factors using the standard dichromatic reflection model of inhomogeneous dielectric<br />
materials. However, these representations are inadequate for other materials like metal. This paper proposes a<br />
more general spectral invariant representation for obtaining reliable spectral reflectance images. Our invariant representation<br />
is derived from the standard dichromatic reflection model for dielectric materials and the extended dichromatic reflection<br />
model for metals. We prove that the invariant formulas for spectral images of most natural objects preserve spectral<br />
information and are invariant to highlights, shading, surface geometry, and illumination intensity. The method is applied<br />
to the problem of material classification and image segmentation of a raw circuit board. Experiments are done with real<br />
spectral images to examine the performance of the proposed method.<br />
WeCT3 Dolmabahçe Hall A<br />
Active Contours and Related Methods Regular Session<br />
Session chair: Burkhardt, Hans (Univ. of Freiburg)<br />
15:40-16:00, Paper WeCT3.1<br />
Level Set based Segmentation using Local Feature Distribution<br />
Xie, Xianghua, Swansea Univ.<br />
We propose a level set based framework to segment textured images. The snake deforms in the image domain in searching<br />
for object boundaries by minimizing an energy functional, which is defined based on dynamically selected local distribution<br />
of orientation-invariant features. We also explore the use of user initialization to simplify the segmentation and improve accuracy.<br />
Experimental results on both synthetic and real data show significant improvements compared to direct modeling of filtering<br />
responses or piecewise constant modeling.<br />
16:00-16:20, Paper WeCT3.2<br />
Mean Shift Gradient Vector Flow: A Robust External Force Field for 3D Active Surfaces<br />
Keuper, Margret, Univ. of Freiburg<br />
Padeken, Jan, Max-Planck-Inst. of Immunobiology<br />
Heun, Patrick, Max-Planck-Inst. of Immunobiology<br />
Burkhardt, Hans, Univ. of Freiburg<br />
Ronneberger, Olaf, Univ. of Freiburg<br />
Gradient vector flow snakes are a very common method in bio-medical image segmentation. The use of gradient vector flow<br />
herein brings some major advantages like a large capture range and a good adaption of the snakes in concave regions. In<br />
some cases though, the application of gradient vector flow can also have undesired effects, e.g. if only parts of an image are<br />
strongly blurred, the remaining weak gradients will be smoothed away. Also, large gradients resulting from small but bright<br />
image structures usually have strong impact on the overall result. To tackle this problem, we present an improvement of the<br />
gradient vector flow, using the mean shift procedure and show its advantages on the segmentation of 3D cell nuclei.<br />
16:20-16:40, Paper WeCT3.3<br />
Adaptive Diffusion Flow for Parametric Active Contours<br />
Wu, Yuwei, Beijing Inst. of Tech.<br />
Wang, Yuanquan, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
This paper proposes a novel external force for active contours, called adaptive diffusion flow (ADF). We reconsider the generative<br />
mechanism of gradient vector flow (GVF) diffusion process from the perspective of image restoration, and exploit a<br />
harmonic hypersurface minimal functional to substitute for the smoothness energy term of GVF, alleviating the possible leakage<br />
problem. Meanwhile, a Laplacian functional is incorporated in the ADF framework to ensure that the vector flow diffuses<br />
mainly along the normal direction in homogeneous regions of an image. Experiments on synthetic and real images demonstrate<br />
the good properties of the ADF snake, including noise robustness, weak edge preserving, and concavity convergence.<br />
16:40-17:00, Paper WeCT3.4<br />
Using Snakes with Asymmetric Energy Terms for the Detection of Varying-Contrast Edges in SAR Images<br />
Seppke, Benjamin, Univ. of Hamburg<br />
Dreschler-Fischer, Leonie, Univ. of Hamburg<br />
Hübbe, Nathanael, Univ. of Hamburg<br />
Active contour methods, like snakes, have become a basic tool in computer vision and image analysis in recent years.<br />
They have proven to be adequate for the task of finding boundary features like broken edges in an image. However, when<br />
applying the basic snake technique to synthetic aperture radar (SAR) remote sensing images, the detection of varying-contrast<br />
edges may not be satisfactory. This is caused by the special imaging technique of SAR and the well-known speckle noise.<br />
In this paper we propose the use of asymmetric external energy terms to cope with this problem. We show first results of the<br />
method for the detection of edges of tidal creeks using an ENVISAT ASAR image. These creeks can be found in the World<br />
Heritage Site Wadden Sea located at the German Bight (North Sea).<br />
17:00-17:20, Paper WeCT3.5<br />
Length Increasing Active Contour for the Segmentation of Small Blood Vessels<br />
Rivest-Hénault, David, École de Tech. Supérieure<br />
Deschênes, Sylvain, Sainte-Justine Hospital<br />
Lapierre, Chantale, Hospital Sainte-Justine<br />
Cheriet, Mohammed, École de Tech. Supérieure<br />
A new level-set based active contour method for the segmentation of small blood vessels and other elongated structures<br />
is presented. Its main particularity is the presence of a length increasing force in the contour driving equation. The effect<br />
of this force is to push the active contour in the direction of thin elongated shapes. Although the proposed force is not<br />
stable in general, our experiments show that with a few precautions it can successfully be integrated in a practical segmentation<br />
scheme and that it helps to segment a longer part of the structures of interest. For the segmentation of blood vessels,<br />
this may reduce the amount of user interaction needed: only a small region inside the structure of interest needs to be<br />
specified.<br />
WeCT4 Anadolu Auditorium<br />
Graphical Models and Bayesian Methods Regular Session<br />
Session chair: Murino, Vittorio (Univ. of Verona)<br />
15:40-16:00, Paper WeCT4.1<br />
Using Sequential Context for Image Analysis<br />
Paiva, Antonio, Univ. of Utah<br />
Jurrus, Elizabeth, Univ. of Utah<br />
Tasdizen, Tolga, Univ. of Utah<br />
This paper proposes the sequential context inference (SCI) algorithm for Markov random field (MRF) image analysis.<br />
This algorithm is designed primarily for fast inference on an MRF model, but its application also requires a specific modeling<br />
architecture. The architecture is composed of a sequence of stages, each modeling the conditional probability of the<br />
labels, conditioned on a neighborhood of the input image and output of the previous stage. By learning the model at each<br />
stage sequentially with regard to the true output labels, the stages learn different models which can cope with errors in<br />
the previous stage.<br />
16:00-16:20, Paper WeCT4.2<br />
Recovery Video Stabilization using MRF-MAP Optimization<br />
Kim, Soo Wan, Seoul National Univ.<br />
Yi, Kwang Moo, Automation and System Res. Inst. Univ.<br />
Oh, Songhwai, Seoul National Univ.<br />
Choi, Jin Young, Seoul National University<br />
In this paper, we propose a novel approach for video stabilization using Markov random field (MRF) modeling and maximum<br />
a posteriori (MAP) optimization. We build an MRF model describing a sequence of unstable images and find joint<br />
pixel matchings over all image sequences with MAP optimization via Gibbs sampling. The resulting displacements of<br />
matched pixels in consecutive frames indicate the camera motion between frames and can be used to remove the camera<br />
motion to stabilize image sequences. The proposed method shows robust performance even when a scene has moving<br />
foreground objects and brings more accurate stabilization results. The performance of our algorithm is evaluated on outdoor<br />
scenes.<br />
16:20-16:40, Paper WeCT4.3<br />
Annealed SMC Samplers for Dirichlet Process Mixture Models<br />
Ülker, Yener, Istanbul Tech. Univ.<br />
Gunsel, Bilge, Istanbul Tech. Univ.<br />
Cemgil, Ali Taylan, Bogazici Univ.<br />
In this work we propose a novel algorithm that approximates sequentially the Dirichlet Process Mixtures (DPM) model<br />
posterior. The proposed method takes advantage of the Sequential Monte Carlo (SMC) samplers framework to design an<br />
effective annealing procedure that prevents the algorithm from getting trapped in a local mode. We evaluate the performance in<br />
a Bayesian density estimation problem with unknown number of components. The simulation results suggest that the proposed<br />
algorithm represents the target posterior much more accurately and provides significantly smaller Monte Carlo error<br />
when compared to particle filtering.<br />
16:40-17:00, Paper WeCT4.4<br />
Bayesian Inference for Nonnegative Matrix Factor Deconvolution Models<br />
Kirbiz, Serap, Istanbul Tech. Univ.<br />
Cemgil, Ali Taylan, Bogazici Univ.<br />
Gunsel, Bilge, Istanbul Tech. Univ.<br />
In this paper we develop a probabilistic interpretation and a full Bayesian inference for the nonnegative matrix factor deconvolution<br />
(NMFD) model. Our ultimate goal is unsupervised extraction of multiple sound objects from a single channel auditory<br />
scene. The proposed method facilitates automatic model selection and determination of the sparsity criteria. Our approach<br />
retains attractive features of standard NMFD based methods such as fast convergence and easy implementation. We demonstrate<br />
the use of this algorithm in the log-frequency magnitude spectrum domain, where we employ it to perform model<br />
order selection and control sparseness directly.<br />
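NMFD reduces to plain NMF when the convolutive extent is a single frame; as background, the classic multiplicative updates for the KL-divergence objective are sketched below. The paper itself performs Bayesian inference rather than these point-estimate updates, so this is only the underlying building block; names and defaults are illustrative.<br />

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative factorisation V ~ W @ H via multiplicative updates
    minimising KL divergence (Lee-Seung style) -- the single-frame
    special case of the NMFD model discussed above."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # update H: H <- H * (W^T (V/WH)) / (column sums of W)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = W @ H + eps
        # update W: W <- W * ((V/WH) H^T) / (row sums of H)
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H
```

In the audio setting described above, V would be a magnitude spectrogram, W the spectral templates and H their activations over time.<br />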
17:00-17:20, Paper WeCT4.5<br />
A Graph Matching Algorithm using Data-Driven Markov Chain Monte Carlo Sampling<br />
Lee, Jungmin, Seoul National Univ.<br />
Cho, Minsu, Seoul National Univ.<br />
Lee, Kyoung Mu, Seoul National Univ.<br />
We propose a novel stochastic graph matching algorithm based on data-driven Markov Chain Monte Carlo (DDMCMC)<br />
sampling. The algorithm explores the solution space efficiently and avoids local minima by taking advantage of<br />
the spectral properties of the given graphs in data-driven proposals. Thus, it makes graph matching robust to deformation<br />
and outliers arising in practical correspondence problems. Our comparative experiments using synthetic<br />
and real data demonstrate that the algorithm outperforms the state-of-the-art graph matching algorithms.<br />
WeCT5 Topkapı Hall B<br />
Image Processing Applications Regular Session<br />
Session chair: Zafeiriou, Stefanos (Imperial College of London)<br />
15:40-16:00, Paper WeCT5.1<br />
Tensor-Driven Hyperspectral Denoising: A Strong Link for Classification Chains?<br />
Martín-Herrero, Julio, Univ. de Vigo<br />
Ferreiro-Armán, Marcos, Univ. de Vigo<br />
We show how a tensor-driven anisotropic diffusion denoising method affects the performance of a classifier trained to<br />
discriminate among vine varieties in noisy hyperspectral images. We compare the classification statistics on the original<br />
and denoised images and discuss the convenience of this kind of preprocessing for classification in hyperspectral images.<br />
16:00-16:20, Paper WeCT5.2<br />
Search Strategies for Image Multi-Distortion Estimation<br />
Caron, Andre Louis, Univ. of Sherbrooke<br />
Jodoin, Pierre-Marc, Univ. of Sherbrooke<br />
Charrier, Christophe, Univ. de Caen<br />
In this paper, we present a method for estimating the amount of Gaussian noise and Gaussian blur in a distorted image.<br />
Our method is based on the MS-SSIM framework which, although designed to measure image quality, is used to estimate<br />
the amount of blur and noise in a degraded image given a reference image. Various search strategies such as Newton, Simplex,<br />
and brute force search are presented and rigorously compared. Based on quantitative results, we show that the amount<br />
of blur and noise in a distorted image can be recovered with an accuracy up to 0.95% and 5.40%, respectively. To our<br />
knowledge, such precision has never been achieved before.<br />
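The brute-force variant of the search described above can be sketched as follows: re-degrade the reference with each candidate blur level and keep the best match. Plain MSE stands in for the MS-SSIM criterion of the paper, and only blur (not noise) is searched, purely to keep the sketch short; all names are hypothetical.<br />

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (pure NumPy)."""
    if sigma <= 0:
        return img.astype(float)
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img.astype(float), r, mode="reflect")
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 0, tmp)

def estimate_blur(reference, distorted, sigmas):
    """Brute-force search over candidate blur levels: re-degrade the
    reference with each sigma and keep the one whose result is closest
    to the distorted image."""
    errors = [np.mean((gaussian_blur(reference, s) - distorted) ** 2)
              for s in sigmas]
    return sigmas[int(np.argmin(errors))]
```

Newton or Simplex search, as compared in the paper, would replace the exhaustive loop with an iterative minimiser over the same objective.<br />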
16:20-16:40, Paper WeCT5.3<br />
Development of a High-Definition and Multispectral Image Capturing System for Digital Archiving of Early Modern<br />
Tapestries of the Kyoto Gion Festival<br />
Tsuchida, Masaru, NTT Corp.<br />
Tanaka, Hiromi, Ritsumei Univ.<br />
Yano, Keiji, Ritsumeikan Univ.<br />
We developed a two-shot 6-band image capturing system consisting of a large-format camera, a customized interference<br />
filter, and a scanning digital back to capture 185-M-pixel images. The interference filter is set in front of the camera lens<br />
to obtain a 6-band image, that is, two 3-band images, one taken with the filter and the other without it. After correction of<br />
optical aberrations caused by the interference filter as well as system arrangement errors, the two images are combined<br />
into a 6-band image. The 6-band image was converted into a color-managed RGB image with an embedded ICC profile. In experiments,<br />
object images were captured as several divided parts and synthesized into an almost 500-M-pixel image by using an<br />
image stitching technique. Resolution of the captured images is 0.02 mm/pixel. This paper discusses the camera system<br />
with its focus on some early modern tapestries used in the Kyoto Gion Festival. After the experiments, we interviewed a<br />
craftsman to assess the image’s importance in archiving and analyzing fabric structures.<br />
16:40-17:00, Paper WeCT5.4<br />
Appearance Control using Projection with Model Predictive Control<br />
Amano, Toshiyuki, Nara Inst. of Science and Tech.<br />
Kato, Hirokazu, Nara Inst. of Science and Tech.<br />
This paper proposes a unified technique for irradiance correction and appearance enhancement of a real scene. The<br />
proposed method employs a model predictive control (MPC) algorithm for a projector-camera system and enables arbitrary appearance<br />
control of the real world, much like photo-retouching software. In the experiments, appearance control including saturation enhancement,<br />
color removal, phase control, edge enhancement, image blur, selective brightening, and other enhancements<br />
of the real scene is demonstrated.<br />
17:00-17:20, Paper WeCT5.5<br />
Decision Trees for Fast Thinning Algorithms<br />
Grana, Costantino, Univ. degli Studi di Modena e Reggio Emilia<br />
Borghesani, Daniele, Univ. degli Studi di Modena e Reggio Emilia<br />
Cucchiara, Rita, Univ. degli Studi di Modena e Reggio Emilia<br />
We propose a new efficient approach for neighborhood exploration, optimized with decision tables and decision trees,<br />
suitable for local algorithms in image processing. In this work, it is employed to speed up two widely used thinning techniques.<br />
The performance gain is shown over a large freely available dataset of scanned document images.<br />
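The decision-table idea can be sketched as follows (a generic illustration built on the classic Zhang-Suen deletion test, not the authors' exact tables): the test over the 8-neighbourhood is precomputed once for all 256 neighbourhood codes, so the per-pixel work at run time reduces to a table lookup.

```python
import numpy as np

def zhang_suen_deletable(nb, step):
    # nb: the 8 neighbours P2..P9, clockwise from north. Returns True when the
    # classic Zhang-Suen conditions mark the centre pixel for deletion.
    b = sum(nb)                                    # number of object neighbours
    a = sum(nb[i] == 0 and nb[(i + 1) % 8] == 1    # 0->1 transitions around the ring
            for i in range(8))
    if not (2 <= b <= 6 and a == 1):
        return False
    if step == 0:
        return nb[0] * nb[2] * nb[4] == 0 and nb[2] * nb[4] * nb[6] == 0
    return nb[0] * nb[2] * nb[6] == 0 and nb[0] * nb[4] * nb[6] == 0

# Decision tables: one boolean per 8-bit neighbourhood code, per sub-iteration.
tables = [np.array([zhang_suen_deletable([(code >> i) & 1 for i in range(8)], step)
                    for code in range(256)], dtype=bool)
          for step in (0, 1)]

# At run time a pixel's neighbourhood is packed into one byte and looked up:
code = 0b00000011            # only the two northern-most neighbours are set
deletable = tables[0][code]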
WeCT6 Dolmabahçe Hall B<br />
Iris Regular Session<br />
Session chair: Kittler, Josef (Univ. of Surrey)<br />
15:40-16:00, Paper WeCT6.1<br />
Personal Identification from Iris Images using Localized Radon Transform<br />
Zhou, Yingbo, The Hong Kong Pol. Univ.<br />
Kumar, Ajay, The Hong Kong Pol. Univ.<br />
Personal identification using iris images has attracted a great deal of attention in the literature and offers high accuracy. However,<br />
the computational complexity in the feature extraction from the normalized iris images is still of key concern and further<br />
efforts are required to develop efficient feature extraction approaches. In this paper, we investigate a new approach for the<br />
efficient and effective extraction of iris features using localized Radon transforms. The feature extraction process exploits<br />
the orientation information from the local iris texture features using finite Radon transform. The dominant orientation<br />
from these Radon transform features is used to generate a binarized/compact feature representation. The similarity between<br />
two feature vectors is computed from the minimum matching distance that can account for the variations resulting from<br />
translation and rotation of the images. The feasibility of this approach is rigorously evaluated on two publicly available<br />
iris image databases, i.e., the IITD iris image database v1 and the CASIA v3 iris image database. We also investigate multi-scale<br />
analysis of iris images to enhance the performance. The experimental results presented in this paper are highly promising<br />
and suggest a computationally attractive alternative for online iris identification.<br />
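The orientation-coding step can be illustrated with a toy sketch (our own simplification: four discrete projections stand in for the localized/finite Radon transform, and the patch is synthetic). Each direction's projection is scored by its variance, and the most energetic one is kept as the dominant orientation:

```python
import numpy as np

def dominant_orientation(patch):
    # Project the patch along a few discrete directions (a crude finite-Radon
    # stand-in) and return the index of the direction with the strongest response.
    projections = [
        patch.sum(axis=0),   # responds to vertical structure
        patch.sum(axis=1),   # responds to horizontal structure
        np.array([np.trace(patch, k)
                  for k in range(-patch.shape[0] + 1, patch.shape[1])]),        # diagonal
        np.array([np.trace(patch[::-1], k)
                  for k in range(-patch.shape[0] + 1, patch.shape[1])]),        # anti-diagonal
    ]
    return int(np.argmax([p.var() for p in projections]))

patch = np.zeros((8, 8))
patch[:, 3] = 1.0            # a vertical stripe in the iris texture
code = dominant_orientation(patch)
```

Coding every patch of the normalised iris this way yields the compact binarised representation the abstract describes.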
16:00-16:20, Paper WeCT6.2<br />
Segmentation of Unideal Iris Images using Game Theory<br />
Roy, Kaushik, Concordia Univ.<br />
Bhattacharya, Prabir, Concordia Univ.<br />
Suen, Ching Y.<br />
Robust localization of inner/outer boundary from an iris image plays an important role in iris recognition. However, the<br />
conventional iris/pupil localization methods using the region-based segmentation or the gradient-based boundary finding<br />
are often hampered by non-linear deformations, pupil dilations, head rotations, motion blurs, reflections, non-uniform intensities,<br />
low image contrast, camera angles and diffusions, and presence of eyelids and eyelashes. The novelty of this research<br />
effort is that we apply a parallel game-theoretic decision-making procedure using the modified Chakraborty<br />
and Duncan’s algorithm, which integrates the region-based segmentation and gradient-based boundary finding methods<br />
and fuses the complementary strengths of each of these individual methods. This integrated scheme forms a unified approach,<br />
which is robust to noise and poor localization.<br />
16:20-16:40, Paper WeCT6.3<br />
Iris-Biometric Hash Generation for Biometric Database Indexing<br />
Rathgeb, Christian, Univ. of Salzburg<br />
Uhl, Andreas, Univ. of Salzburg<br />
Performing identification on large-scale biometric databases requires an exhaustive linear search. Since biometric data<br />
does not have any natural sorting order, indexing databases, in order to minimize the response time of the system, represents<br />
a great challenge. In this work we propose a biometric hash generation technique for the purpose of biometric database<br />
indexing, applied to iris biometrics. Experimental results demonstrate that the presented approach substantially accelerates biometric<br />
identification.<br />
16:40-17:00, Paper WeCT6.4<br />
A Robust Iris Localization Method using an Active Contour Model and Hough Transform<br />
Koh, Jaehan, SUNY Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Chaudhary, Vipin, SUNY Buffalo<br />
Iris segmentation is one of the crucial steps in building an iris recognition system since it affects the accuracy of the iris<br />
matching significantly. This segmentation should accurately extract the iris region despite the presence of noise sources such as<br />
varying pupil sizes, shadows, specular reflections and highlights. Considering these obstacles, several attempts have been<br />
made in robust iris localization and segmentation. In this paper, we propose a robust iris localization method that uses an<br />
active contour model and a circular Hough transform. Experimental results on 100 images from CASIA iris image database<br />
show that our method achieves 99% accuracy and is about 2.5 times faster than Daugman’s method in locating the pupillary<br />
and the limbic boundaries.<br />
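The circular Hough voting at the core of such localization can be sketched as follows (a generic toy implementation on a synthetic edge map; grid sizes and angle sampling are our choices): each edge pixel votes for every centre that would place it on a circle of a candidate radius, and the accumulator peak gives the boundary.

```python
import numpy as np

def circular_hough(edges, radii):
    # Each edge pixel votes for every centre that would put it on a circle of
    # radius r; the accumulator peak gives (radius, centre_y, centre_x).
    h, w = edges.shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
    for ri, r in enumerate(radii):
        cy = np.round(ys[:, None] - r * np.sin(thetas)).astype(int)
        cx = np.round(xs[:, None] - r * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc[ri], (cy[ok], cx[ok]), 1)   # unbuffered accumulation
    ri, by, bx = np.unravel_index(int(np.argmax(acc)), acc.shape)
    return radii[ri], by, bx

# Synthetic "pupil" boundary: a circle of radius 10 centred at (32, 32).
edges = np.zeros((64, 64), dtype=bool)
tt = np.linspace(0.0, 2.0 * np.pi, 200)
edges[np.round(32 + 10 * np.sin(tt)).astype(int),
      np.round(32 + 10 * np.cos(tt)).astype(int)] = True
r, cy, cx = circular_hough(edges, radii=[8, 10, 12])
```

In the paper's pipeline, the active contour then refines this coarse circle against the true, non-circular boundary.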
17:00-17:20, Paper WeCT6.5<br />
ISIS: Iris Segmentation for Identification Systems<br />
Nappi, Michele, Univ. of Salerno<br />
Riccio, Daniel, Univ. of Salerno<br />
De Marsico, Maria, Sapienza Univ. of Rome<br />
Advances in processing procedures make the iris a realistic candidate for the role of the biometric of the future. Precise detection<br />
and segmentation for such a biometric are a crucial ongoing research area. We propose an iris segmentation technique and<br />
show that it is more reliable than existing ones.<br />
WeCT7 Dolmabahçe Hall C<br />
Handwriting Recognition Regular Session<br />
Session chair: Doermann, David (Univ. of Maryland)<br />
15:40-16:00, Paper WeCT7.1<br />
Consensus Network based Hypotheses Combination for Arabic Offline Handwriting Recognition<br />
Prasad, Rohit, Raytheon BBN Tech.<br />
Kamali, Matin, BBN Tech.<br />
Belanger, David, Raytheon BBN Tech.<br />
Rosti, Antti-Veikko, Raytheon BBN Tech.<br />
Matsoukas, Spyros, Raytheon BBN Tech.<br />
Natarajan, P., BBN Tech.<br />
Offline handwriting recognition (OHR) is an extremely challenging task because of many factors including variations in<br />
writing style, writing device and material, and noise in the scanning and collection process. Due to the diverse nature of<br />
the above challenges, it is highly unlikely that a single recognition technique can address all the characteristics of real-world<br />
handwritten documents. Therefore, one must consider designing different systems, each addressing specific challenges<br />
in the handwritten corpus, and then combining the hypotheses from these diverse systems. To that end, we present<br />
an innovative approach for combining hypotheses from multiple handwriting recognition systems. Our approach is based<br />
on generating a consensus network using hypotheses from a diverse set of handwriting recognition systems. Next, we decode<br />
the consensus network for producing the best possible hypothesis given an error criterion. Experimental results on<br />
an Arabic OHR task show that our combination algorithm outperforms the NIST ROVER technique and results in a 7%<br />
relative reduction in the word error rate over the single best OHR system.<br />
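The final decoding step over a consensus network can be illustrated with a toy majority vote (hypothetical word hypotheses; a real system weights each slot's words by posterior probability and builds the network by aligning recognition lattices):

```python
from collections import Counter

def consensus_decode(aligned):
    # aligned: one word sequence per system, already aligned slot-by-slot;
    # "" marks a deletion (epsilon) in that system's hypothesis.
    out = []
    for slot in zip(*aligned):
        word, _ = Counter(slot).most_common(1)[0]
        if word:                      # a majority epsilon deletes the slot
            out.append(word)
    return out

hyps = [["the", "cat", "",    "sat"],
        ["the", "cap", "has", "sat"],
        ["a",   "cat", "",    "sat"]]
result = consensus_decode(hyps)
```

Each slot is decided independently, which is what lets the combined hypothesis mix words from different systems.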
16:00-16:20, Paper WeCT7.2<br />
A Novel Lexicon Reduction Method for Arabic Handwriting Recognition<br />
Wshah, Safwan, SUNY Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Li, Huiping, Applied Media Analysis Inc.<br />
Cheng, Yanfen, Wuhan Univ. of Tech.<br />
In this paper, we present a method for lexicon size reduction which can be used as an important pre-processing step for offline<br />
Arabic word recognition. The method involves extraction of dot descriptors and PAWs (Pieces of Arabic Words).<br />
Then the number and position of dots and the number of the PAWs are used to eliminate unlikely candidates. The extraction<br />
of the dot descriptors is based on defined rules followed by a convolutional neural network for verification. The reduction<br />
algorithm makes use of the combination of two features with a dynamic matching scheme. On the IFN/ENIT database of<br />
26459 Arabic handwritten word images we achieved a reduction rate of 87% with accuracy above 93%.<br />
16:20-16:40, Paper WeCT7.3<br />
A Novel Verification System for Handwritten Words Recognition<br />
Guichard, Laurent, IRISA - INRIA<br />
Toselli, Alejandro Héctor, Univ. Pol. de Valencia<br />
Couasnon, Bertrand, Irisa / Insa<br />
In the field of isolated handwritten word recognition, the development of highly effective verification systems to reject<br />
words presenting ambiguities is still an active research topic. In this paper, a novel verification system based on support<br />
vector machine scoring and multiple reject class-dependent thresholds is presented. In essence, a set of support vector machines<br />
appended to a standard HMM-based recognition system provides class-dependent confidence measures employed<br />
by the verification mechanism to accept or reject the recognized hypotheses. Experimental results on the RIMES database<br />
show that this approach outperforms other state-of-the-art approaches.<br />
16:40-17:00, Paper WeCT7.4<br />
Multi-Template GAT/PAT Correlation for Character Recognition with a Limited Quantity of Data<br />
Wakahara, Toru, Hosei Univ.<br />
Yamashita, Yukihiko, Tokyo Inst. of Tech.<br />
This paper addresses the problem of improving the accuracy of character recognition with a limited quantity of data. The<br />
key ideas are twofold. One is distortion-tolerant template matching via hierarchical global/partial affine transformation<br />
(GAT/PAT) correlation to absorb both linear and nonlinear distortions in a parametric manner. The other is use of multiple<br />
templates per category obtained by k-means clustering in a gradient feature space for dealing with topological distortion.<br />
Recognition experiments using the handwritten numerical database IPTP CDROM1B show that the proposed method<br />
achieves a recognition rate of 97.9%, much higher than the 85.8% obtained by conventional simple correlation<br />
matching with a single template per category. Furthermore, comparative experiments show that k-NN classification<br />
using the tangent distance and the GAT correlation technique achieves recognition rates of 97.5% and 98.7%, respectively.<br />
17:00-17:20, Paper WeCT7.5<br />
Structure Adaptation of HMM Applied to OCR<br />
Ait Mohand, Kamel, Univ. of Rouen<br />
Paquet, Thierry, Univ. of Rouen<br />
Ragot, Nicolas, Univ. François Rabelais Tours<br />
Heutte, Laurent, Univ. of Rouen<br />
In this paper we present a new algorithm for the adaptation of Hidden Markov Models (HMMs). The principle of<br />
our iterative adaptive algorithm is to alternate an HMM structure adaptation stage with an HMM Gaussian MAP adaptation<br />
stage for the parameters. This algorithm is applied to the recognition of printed characters to adapt the character models of<br />
a polyfont general-purpose character recognizer to new fonts, never seen during training. A comparison of<br />
the results with those of the classical MAP adaptation scheme shows a slight increase in recognition performance.<br />
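The Gaussian MAP stage follows the standard conjugate-prior mean update; a minimal sketch (the relevance factor and data are illustrative assumptions, not values from the paper):

```python
import numpy as np

def map_adapt_mean(mu0, data, tau=10.0):
    # MAP update of a Gaussian mean: a blend of the prior mean mu0 and the
    # adaptation data, weighted by the relevance factor tau.
    n = len(data)
    return (tau * mu0 + data.sum(axis=0)) / (tau + n)

mu0 = np.array([0.0, 0.0])              # mean from the polyfont model
data = np.tile([2.0, 4.0], (30, 1))     # 30 feature vectors from the new font
mu_adapted = map_adapt_mean(mu0, data, tau=10.0)
```

With few samples the prior dominates; as adaptation data grows, the estimate shifts toward the new font's statistics.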
WeBCT8 Upper Foyer<br />
SVM, NN, Kernel and Learning; Object Detection and Recognition Poster Session<br />
Session chair: Ross, Arun (West Virginia Univ.)<br />
13:30-16:30, Paper WeBCT8.1<br />
Multi-Class Pattern Classification in Imbalanced Data<br />
Ghanem, Amal Saleh, Univ. of Bahrain<br />
Venkatesh, Svetha, Curtin Univ. of Tech.<br />
West, Geoff, Curtin Univ. of Tech.<br />
The majority of multi-class pattern classification techniques are proposed for learning from balanced datasets. However,<br />
in several real-world domains, the datasets have imbalanced data distribution, where some classes of data may have few<br />
training examples compared to other classes. In this paper we present our research in learning from imbalanced multi-class<br />
data and propose a new approach, named Multi-IM, to deal with this problem. Multi-IM derives its fundamentals<br />
from the probabilistic relational technique (PRMs-IM), designed for learning from imbalanced relational data for the two-class<br />
problem. Multi-IM extends PRMs-IM to a generalized framework for multi-class imbalanced learning for both relational<br />
and non-relational domains.<br />
13:30-16:30, Paper WeBCT8.2<br />
Deep Quantum Networks for Classification<br />
Zhou, Shusen, Harbin Inst. of Tech.<br />
Chen, Qingcai, Harbin Inst. of Tech.<br />
Wang, Xiaolong, Harbin Inst. of Tech.<br />
This paper introduces a new type of deep learning method named Deep Quantum Network (DQN) for classification. DQN<br />
inherits the capability of modeling the structure of a feature space with fuzzy sets. First, we propose the architecture of<br />
DQN, which consists of quantum neurons and sigmoid neurons and can guide the embedding of samples so that they become<br />
separable in a new Euclidean space. The parameters of DQN are initialized through greedy layer-wise unsupervised learning. Then, the parameter<br />
space of the deep architecture and the quantum representation are refined by supervised learning based on a global gradient-descent<br />
procedure. An exponential loss function is introduced in this paper to guide the supervised learning procedure.<br />
Experiments conducted on standard datasets show that DQN outperforms other feed-forward neural networks and neuro-fuzzy<br />
classifiers.<br />
13:30-16:30, Paper WeBCT8.3<br />
Nonlinear Combination of Multiple Kernels for Support Vector Machines<br />
Li, Jinbo, East China Normal Univ.<br />
Sun, Shiliang, East China Normal Univ.<br />
Support vector machines (SVMs) are effective kernel methods to solve pattern recognition problems. Traditionally, they<br />
adopt a single kernel chosen beforehand, which makes them lack flexibility. The recent multiple kernel learning (MKL)<br />
overcomes this issue by optimizing over a linear combination of kernels. Despite its success, MKL neglects useful information<br />
generated from the nonlinear interaction of different kernels. In this paper, we propose SVMs based on the nonlinear<br />
combination of multiple kernels (NCMK), which surmounts this drawback of MKL through its potential to exploit<br />
more information. We show that our method can be formulated as a semi-definite programming (SDP) problem and then solved<br />
by interior-point algorithms. Empirical studies on several data sets indicate that the presented approach is very effective.<br />
13:30-16:30, Paper WeBCT8.4<br />
Data Transformation of the Histogram Feature in Object Detection<br />
Zhang, Rongguo, Chinese Acad. of Sciences<br />
Xiao, Baihua, Chinese Acad. of Sciences<br />
Wang, Chunheng, Chinese Acad. of Sciences<br />
Detecting objects in images is very important for several application domains in computer vision. This paper presents an<br />
experimental study on data transformation of the feature vector in object detection. We use the modified Pyramid of Histograms<br />
of Orientation Gradients descriptor and the SVM classifier to form an object detection model. We apply a simple<br />
transformation to the histogram features before training and testing. This transformation equals a small change in the<br />
kernel function for Support Vector Machines. Applying the transformation is much quicker than changing the kernel, yet obtains better results. Experimental<br />
evaluations on the UIUC Image Database and TU Darmstadt Database show that the transformed features perform<br />
better than the raw features, and this transformation improves the linear separability of the histogram feature.<br />
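One transformation with exactly this kernel-equivalence property (our example; the abstract does not state the paper's formula) is the Hellinger map: taking the element-wise square root of an L1-normalised histogram makes a plain linear kernel equal to the Bhattacharyya coefficient of the original histograms.

```python
import numpy as np

def hellinger_map(h):
    # Element-wise square root of an L1-normalised histogram.
    h = h / h.sum()
    return np.sqrt(h)

a = np.array([4.0, 1.0, 3.0])
b = np.array([2.0, 2.0, 4.0])

# Linear kernel on the transformed features ...
linear_on_mapped = hellinger_map(a) @ hellinger_map(b)
# ... equals the Bhattacharyya coefficient on the raw histograms.
bhattacharyya = np.sum(np.sqrt((a / a.sum()) * (b / b.sum())))
```

This is why transforming the features once is far cheaper than evaluating a non-linear kernel at every SVM dot product.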
13:30-16:30, Paper WeBCT8.5<br />
A New Learning Formulation for Kernel Classifier Design<br />
Sato, Atsushi, NEC<br />
This paper presents a new learning formulation for classifier design called “General Loss Minimization”. The formulation<br />
is based on Bayes decision theory, which can handle various losses as well as prior probabilities. A learning method for<br />
RBF kernel classifiers is derived from the formulation. Experimental results reveal that the classification accuracy of<br />
the proposed method is almost the same as or better than that of the Support Vector Machine (SVM), while the number of<br />
reference vectors obtained by the proposed method is much smaller than the number of support vectors of the SVM.<br />
13:30-16:30, Paper WeBCT8.6<br />
Variable Selection for Five-Minute Ahead Electricity Load Forecasting<br />
Koprinska, Irena, Univ. of Sydney<br />
Sood, Rohen, Univ. of Sydney<br />
Agelidis, Vassilios, Univ. of Sydney<br />
We use autocorrelation analysis to extract 6 nested feature sets of previous electricity loads for 5-minute ahead electricity<br />
load forecasting. We evaluate their predictive power using Australian electricity data. Our results show that the most important<br />
variables for accurate prediction are previous loads from the forecast day and from 1, 2 and 7 days ago. By also using load<br />
variables from 3 and 6 days ago, we achieved small further improvements. The 3 bigger feature sets (37-51 features), when<br />
used with linear regression and support vector regression algorithms, were more accurate than the benchmarks. The overall<br />
best prediction model in terms of accuracy and training time was linear regression using the set of 51 features.<br />
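A minimal sketch of such a lag-based linear model (synthetic data; the lag set and series are our assumptions, not the Australian data): a design matrix of past loads at the selected lags is built and fitted by ordinary least squares.

```python
import numpy as np

# Toy periodic "load": a daily cycle plus noise stands in for 5-minute load data.
rng = np.random.default_rng(1)
period = 288                                    # 5-minute samples per day
t = np.arange(period * 10)
load = 100 + 20 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, t.size)

def lagged_matrix(series, lags):
    # Design matrix of past loads at the given lags, plus an intercept column.
    start = max(lags)
    X = np.column_stack([series[start - l: len(series) - l] for l in lags])
    return np.column_stack([np.ones(len(X)), X]), series[start:]

lags = [1, 2, period, 2 * period]               # recent samples + 1 and 2 days ago
X, y = lagged_matrix(load, lags)
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares fit
mae = np.abs(X @ w - y).mean()                  # in-sample mean absolute error
```

Autocorrelation analysis is what selects which lags enter `lags` in the first place.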
13:30-16:30, Paper WeBCT8.7<br />
Enhancing Web Page Classification via Local Co-Training<br />
Du, Youtian, Xi’an Jiaotong Univ.<br />
Guan, Xiaohong, Xi’an Jiaotong Univ., Tsinghua University<br />
Cai, Zhongmin, Xi’an Jiaotong Univ.<br />
In this paper we propose a new multi-view semi-supervised learning algorithm called Local Co-Training (LCT). The proposed<br />
algorithm employs a set of local models with vector outputs to model the relations among examples in a local region<br />
on each view, and iteratively refines the dominant local models (i.e. the local models related to the unlabeled examples<br />
chosen for enriching the training set) using unlabeled examples through the co-training process. Compared with previous co-training<br />
style algorithms, local co-training has two advantages: firstly, it has higher classification precision by introducing<br />
local learning; secondly, only the dominant local models need to be updated, which significantly decreases the computational<br />
load. Experiments on the WebKB and Cora datasets demonstrate that the LCT algorithm can effectively exploit unlabeled<br />
data to improve the performance of web page classification.<br />
13:30-16:30, Paper WeBCT8.8<br />
Robust Face Recognition using Multiple Self-Organized Gabor Features and Local Similarity Matching<br />
Aly, Saleh, Kyushu Univ.<br />
Shimada, Atsushi, Kyushu Univ.<br />
Tsuruta, Naoyuki, Fukuoka Univ.<br />
Taniguchi, Rin-Ichiro, Kyushu Univ.<br />
Gabor-based face representation has achieved enormous success in face recognition. However, one drawback of Gabor-based<br />
face representation is the huge amount of data that must be stored. Due to the nonlinear structure of the data obtained<br />
from Gabor response, classical linear projection methods like principal component analysis fail to learn the distribution<br />
of the data. A nonlinear projection method based on a set of self-organizing maps is employed to capture this nonlinearity<br />
and to represent faces in a new reduced feature space. The Multiple Self-Organized Gabor Features (MSOGF) algorithm<br />
is used to represent the input image using all winner indices from each SOM map. A new local matching algorithm based<br />
on the similarity between local features is also proposed to classify unlabeled data. Experimental results on the FERET database<br />
prove that the proposed method is robust to expression variations.<br />
13:30-16:30, Paper WeBCT8.9<br />
Exploring Pattern Selection Strategies for Fast Neural Network Training<br />
Vajda, Szilard, Tech. Univ. of Dortmund<br />
Fink, Gernot, TU Dortmund Univ.<br />
Nowadays, neural network strategies are a widely considered solution in pattern recognition. In this paper we<br />
propose three different strategies to select patterns more efficiently for fast learning in such a neural framework by<br />
reducing the number of available training patterns. All the strategies rely on the idea of dealing only with samples close to<br />
the decision boundaries of the classifiers. The effectiveness (accuracy, speed) of these methods is confirmed through different<br />
experiments on the MNIST handwritten digit data [1], Bangla handwritten numerals [2] and the Shuttle data from<br />
the UCI machine learning repository [3].<br />
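The boundary-sample idea can be sketched as follows (our own toy criterion, not one of the paper's three strategies: the distance difference to two class centroids serves as a cheap margin proxy, and only the most ambiguous samples are kept for training).

```python
import numpy as np

def boundary_subset(X, y, keep_frac=0.5):
    # Rank samples by how balanced their distances to the class centroids are
    # (a cheap margin proxy) and keep only the most ambiguous fraction.
    centroids = [X[y == c].mean(axis=0) for c in np.unique(y)]
    d = np.stack([np.linalg.norm(X - m, axis=1) for m in centroids], axis=1)
    margin = np.abs(d[:, 0] - d[:, 1])      # small margin = near the boundary
    keep = np.argsort(margin)[: int(len(X) * keep_frac)]
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
Xs, ys = boundary_subset(X, y, keep_frac=0.25)   # train the network on Xs only
```

Training only on this subset shrinks each epoch while preserving the samples that actually shape the decision boundary.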
13:30-16:30, Paper WeBCT8.10<br />
The Detection of Concept Frames using Clustering Multi-Instance Learning<br />
Tax, David, Delft Univ. of Tech.<br />
Hendriks, E. , Delft Univ. of Tech.<br />
Valstar, Michel, Imperial Coll.<br />
Pantic, M., Imperial Coll.<br />
The classification of sequences requires the combination of information from different time points. In this paper the detection<br />
of facial expressions is considered. Experiments on the detection of certain facial muscle activations in videos<br />
show that it is not always required to model the sequences fully, but that the presence of specific frames (the concept<br />
frames) can be sufficient for reliable detection of certain facial expression classes. For the detection of these concept<br />
frames a standard classifier is often sufficient, although a more advanced clustering approach performs better in some<br />
cases.<br />
13:30-16:30, Paper WeBCT8.11<br />
Kernel Domain Description with Incomplete Data: Using Instance-Specific Margins to Avoid Imputation<br />
Gripton, Adam, Heriot-Watt Univ.<br />
Lu, Weiping, Heriot-Watt Univ.<br />
We present a method of performing kernel space domain description of a dataset with incomplete entries without the need<br />
for imputation, allowing kernel features of a class of data with missing features to be rigorously described. This addresses<br />
the problem that completion of absent data is usually required before kernel classifiers, such as support vector domain description<br />
(SVDD), can be applied; equally, few existing techniques for incomplete data adequately address the issue of<br />
kernel spaces. Our method, which we call instance-specific domain description (ISDD), uses a parametrisation framework<br />
to compute minimal kernelised distances between data points with missing features through a series of optimisation runs,<br />
allowing evaluation of the kernel distance while avoiding subjective completions of missing data. We compare results of<br />
our method against those achieved by SVDD applied to an imputed dataset, using synthetic and experimental datasets<br />
where feature absence has a non-trivial structure. We show that our method can achieve tighter sphere bounds when applied<br />
to linear and quadratic kernels.<br />
13:30-16:30, Paper WeBCT8.12<br />
Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts<br />
Fausser, Stefan, Univ. of Ulm<br />
Schwenker, Friedhelm, Univ. of Ulm<br />
Having a large game-tree complexity and being EXPTIME-complete, English Draughts, recently weakly solved after<br />
almost two decades of effort, is still hard for intelligent computer agents to learn. In this paper we present a Temporal-Difference<br />
method that is nonlinearly approximated by a 4-layer multi-layer perceptron. We have built multiple English Draughts<br />
playing agents, each starting with a randomly initialized strategy, which use this method during self-play to improve their<br />
strategies. We show that the agents are learning by comparing their winning rates relative to their parameters. Our best<br />
agent wins against the computer draughts programs Neuro Draughts, KCheckers and CheckerBoard with the easych engine<br />
and loses to Chinook, GuiCheckers and CheckerBoard with the strong cake engine. Overall our best agent has reached<br />
an amateur league level.<br />
13:30-16:30, Paper WeBCT8.13<br />
Learning the Kernel Combination for Object Categorization<br />
Zhang, Deyuan, Harbin Inst. of Tech.<br />
Wang, Xiaolong, Harbin Inst. of Tech.<br />
Liu, Bingquan, Harbin Inst. of Tech.<br />
Although Support Vector Machines (SVM) succeed in classifying several image databases using image descriptors proposed<br />
in the literature, no single descriptor can be optimal for general object categorization. This paper describes a novel framework<br />
to learn the optimal combination of kernels corresponding to multiple image descriptors before SVM training, which leads<br />
to a quadratic programming problem that can be solved efficiently. Our framework takes into account the variation of the kernel matrix<br />
and imbalanced dataset, which are common in real world image categorization tasks. Experimental results on Graz-01<br />
and Caltech-101 image databases show the effectiveness and robustness of our algorithm.<br />
13:30-16:30, Paper WeBCT8.14<br />
SemiCCA: Efficient Semi-Supervised Learning of Canonical Correlations<br />
Kimura, Akisato, NTT Corp.<br />
Kameoka, Hirokazu, NTT Corp.<br />
Sugiyama, Masashi, Tokyo Inst. of Tech.<br />
Nakano, Takuho, University of Tokyo<br />
Maeda, Eisaku, Communication Science Lab.<br />
Sakano, Hitoshi, NTT<br />
Ishiguro, Katsuhiko, NTT<br />
Canonical correlation analysis (CCA) is a powerful tool for analyzing multi-dimensional paired data. However, CCA tends<br />
to perform poorly when the number of paired samples is limited, which is often the case in practice. To cope with this<br />
problem, we propose a semi-supervised variant of CCA named “SemiCCA” that allows us to incorporate additional unpaired<br />
samples for mitigating overfitting. The proposed method smoothly bridges the eigenvalue problems of CCA and<br />
principal component analysis (PCA), and thus its solution can be computed efficiently just by solving a single (generalized)<br />
eigenvalue problem, as in the original CCA. Preliminary experiments with artificially generated samples and PASCAL VOC<br />
data sets demonstrate the effectiveness of the proposed method.<br />
13:30-16:30, Paper WeBCT8.15<br />
Spatial String Matching for Image Classification<br />
Liu, Yunqiang, Barcelona Media - Innovation Center<br />
Caselles, Vicent, Univ. Pompeu Fabra<br />
This paper presents a spatial string matching method to incorporate spatial information into the bag-of-words model, which<br />
represents an image as an unordered distribution of local features. Spatial constraints among neighboring features are explored<br />
in order to achieve better discrimination power for image classification. The features from neighboring points are<br />
combined together and taken as a spatial string, and then our method matches the images according to the similarity of<br />
string pairs. The categorization problem can be formulated using a KNN or SVM classifier based on the spatial string matching<br />
kernel. The proposed method is able to capture spatial dependencies across the neighboring features. Experimental<br />
results show promising performance for image classification tasks.<br />
13:30-16:30, Paper WeBCT8.16<br />
A Semi-Supervised Gaussian Mixture Model for Image Segmentation<br />
Martínez-Usó, Adolfo, Univ. Jaume I<br />
Pla, F., Univ. Jaume I<br />
Martínez Sotoca, Jose, Univ. Jaume I<br />
In this paper, the results of a semi-supervised approach based on the Expectation-Maximisation algorithm for model-based<br />
clustering are presented. We show in this work that, if the appropriate generative model is chosen, the classification accuracy<br />
on clustering for image segmentation can be significantly improved by the combination of a reduced set of labelled<br />
data and a large set of unlabelled data. This technique has been tested on real images as well as on medical images from<br />
a dermatology application. The preliminary results are quite promising. Not only have the unsupervised accuracies been<br />
improved as expected, but the segmentation results obtained are considerably better than those obtained by other powerful<br />
and well-known unsupervised image segmentation techniques.<br />
13:30-16:30, Paper WeBCT8.17<br />
Adding Classes Online in Error Correcting Output Codes Framework<br />
Escalera, Sergio, UB<br />
Masip, David, CVC, UOC<br />
Puertas, Eloi, Univ. de Barcelona<br />
Radeva, Petia, CVC<br />
Pujol, Oriol, UB<br />
This article proposes a general extension of the Error Correcting Output Codes (ECOC) framework to the online learning<br />
scenario. As a result, the final classifier handles the addition of new classes independently of the base classifier used. Validation<br />
on the UCI database and two real machine vision applications shows that the online problem-dependent ECOC proposal<br />
provides a feasible and robust way for handling new classes using any base classifier.<br />
13:30-16:30, Paper WeBCT8.18<br />
Training Multi-Level Features for the RobotVision@ICPR 2010 Challenge<br />
Paris, Sebastien, Univ. de la Méditerranée<br />
Glotin, Herve, LSIS<br />
This paper combines and proposes two novel multi-level spatial pyramidal (sp) features: spELBP (Extended Local Binary<br />
Pattern), spELBOP (Extended Local Binary Orientation Pattern) and spHOEE (Histogram of Oriented Edge Energy).<br />
These features feed state-of-the-art SVM algorithms for the localization of a robot in indoor environments. Two tasks are<br />
associated with the RobotVision@ICPR 2010 Challenge: the first uses only a frame of stereoscopic images; the second<br />
takes into account the dynamics of the robot to improve results. Our scores ranked 3rd for Task 1 and 1st for Task 2.<br />
13:30-16:30, Paper WeBCT8.19<br />
Subclass Error Correcting Output Codes using Fisher’s Linear Discriminant Ratio<br />
Arvanitopoulos, Nikolaos, Aristotle Univ. of Thessaloniki<br />
Bouzas, Dimitrios, Aristotle Univ. of Thessaloniki<br />
Tefas, Anastasios, Aristotle Univ. of Thessaloniki<br />
Error-Correcting Output Codes (ECOC) with sub-classes provide a common way to solve multi-class classification problems.<br />
According to this approach, a multi-class problem is decomposed into several binary ones based on the maximization of<br />
the mutual information (MI) between the classes and their respective labels. The MI is modelled through the fast quadratic<br />
mutual information (FQMI) procedure. However, FQMI is not applicable on large datasets due to its high algorithmic<br />
complexity. In this paper we propose Fisher’s Linear Discriminant Ratio (FLDR) as an alternative decomposition criterion<br />
which is of much less computational complexity and achieves in most experiments conducted better classification performance.<br />
Furthermore, we compare FLDR against FQMI for facial expression recognition over the Cohn-Kanade database.<br />
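As background for the decomposition described above, the final ECOC step assigns the class whose codeword best matches the binary classifiers' outputs. A minimal sketch with a hypothetical one-vs-all coding matrix (illustrative only, not the authors' sub-class design):

```python
import numpy as np

# Hypothetical one-vs-all ECOC coding matrix for 3 classes:
# rows = classes, columns = binary problems, entries in {-1, +1}.
CODES = np.array([
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
])

def ecoc_decode(binary_outputs):
    """Assign the class whose codeword is closest, in Hamming distance,
    to the signs of the binary classifier outputs."""
    signs = np.sign(binary_outputs)
    hamming = np.sum(CODES != signs, axis=1)
    return int(np.argmin(hamming))

print(ecoc_decode([0.9, -0.4, -0.7]))  # codeword (+1, -1, -1) -> class 0
```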
13:30-16:30, Paper WeBCT8.20<br />
Pattern Recognition Method using Ensembles of Regularities Found by Optimal Partitioning<br />
Senko, Oleg, Inst. of Russian Acad. of Sciences<br />
Kuznetsova, Anna, Inst. of Russian Acad. of Sciences<br />
A new pattern recognition method based on ensembles of syndromes is considered. The method, referred to<br />
as Multi-model Statistically Weighted Syndromes (MSWS), is a further development of the earlier Statistically Weighted<br />
Syndromes (SWS) method. Syndromes are subregions of the prognostic feature space where the content of objects from one<br />
of the classes differs significantly from the content of the same class in neighboring subregions. Syndromes serve as<br />
simple base classifiers that are combined with the help of a weighted voting procedure. A method of optimal partitioning of the<br />
input feature space is used to search for syndromes; syndromes are selected depending on the quality of data separation<br />
and the complexity of the partitioning model (partition family). The performance of MSWS is compared with that of<br />
SWS and alternative techniques on several applied tasks. The influence of the characteristics of syndrome<br />
selection on recognition ability is also studied.<br />
13:30-16:30, Paper WeBCT8.21<br />
A Geometric Radial Basis Function Network for Robot Perception and Action<br />
Bayro Corrochano, Eduardo Jose, CINVESTAV, Unidad Guadalajara<br />
Vázquez Santacruz, Eduardo, CINVESTAV, Unidad Guadalajara<br />
This paper presents a new hypercomplex-valued Radial Basis Function (RBF) network: a Geometric RBF Network (GRBF-N)<br />
designed in the geometric algebra framework, which constitutes a generalization of the standard real-valued RBF network.<br />
The geometric RBF can be used in real time to estimate changes in linear transformations between sets of geometric entities.<br />
Experiments using stereo image sequences validate the proposal.<br />
13:30-16:30, Paper WeBCT8.22<br />
Kernel on Graphs based on Dictionary of Paths for Image Retrieval<br />
Haugeard, Jean-Emmanuel, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />
Philipp-Foliguet, Sylvie, ENSEA/UCP/CNRS<br />
Gosselin, Philippe Henri, CNRS<br />
Recent approaches of graph comparison consider graphs as sets of paths. Kernels on graphs are then computed from<br />
kernels on paths. A common strategy for graph retrieval is to perform pairwise comparisons. In this paper, we propose to<br />
follow a different strategy: we collect a set of paths into a dictionary and then project each graph onto this dictionary.<br />
Graphs can then be classified using powerful classification methods such as SVM. Furthermore, we collect the paths<br />
through interaction with a user. This strategy is ten times faster than a straightforward pairwise comparison of paths. Experiments have<br />
been carried out on a database of city windows.<br />
13:30-16:30, Paper WeBCT8.23<br />
An Efficient Active Constraint Selection Algorithm for Clustering<br />
Vu, Viet-Vu, Univ. Pierre et Marie Curie - Paris 6<br />
Labroche, Nicolas, Univ. Pierre et Marie Curie - Paris 6<br />
Bouchon-Meunier, Bernadette, Univ. Pierre et Marie Curie - Paris 6<br />
In this paper, we address the problem of active query selection for clustering with constraints. The objective is to automatically<br />
determine a set of queries and their associated must-link and cannot-link constraints to help constraint-based clustering<br />
algorithms converge. Some work on active constraint learning has already been proposed, but it applies only<br />
to K-Means-like clustering algorithms, which are known to be limited to spherical clusters, whereas we are interested in constraint-based<br />
clustering algorithms that deal with clusters of arbitrary shapes and sizes (such as Constrained-DBSCAN and<br />
Constrained-Hierarchical Clustering). Our novel approach relies on a k-nearest-neighbors graph to estimate the dense<br />
regions of the data space and generates queries at the frontier between clusters, where cluster membership is most uncertain.<br />
Experiments show that our framework improves the performance of constraint-based clustering algorithms.<br />
13:30-16:30, Paper WeBCT8.24<br />
Fuzzy Support Vector Machines for ECG Arrhythmia Detection<br />
Özcan, N. Özlem, Boğaziçi Univ.<br />
Gürgen, Fikret, Boğaziçi Univ.<br />
Among cardiovascular diseases, heart attacks are a leading cause of death around the world. Pre-monitoring and pre-diagnosis<br />
help to prevent heart attacks and strokes, and the ECG plays a key role in this regard. In recent studies, SVMs with different<br />
kernel functions and parameter values have been applied to classification of ECG data. The classification model of the SVM can<br />
be improved by assigning membership values to the inputs. The SVM combined with fuzzy theory, FSVM, is exercised on the UCI<br />
Arrhythmia Database. Five different membership functions are defined. It is shown that the accuracy of classification can<br />
be improved by defining appropriate membership functions. ANFIS is used to interpret the resulting classification<br />
model. The ANFIS model of the ECG data is compared to, and found consistent with, medical knowledge.<br />
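The five membership functions themselves are not specified in the abstract; as one hedged illustration, a distance-to-class-centroid membership (a common FSVM choice, not necessarily one of the functions used by the authors) can be sketched as:

```python
import numpy as np

def centroid_membership(X, eps=1e-8):
    """Assign each training sample a fuzzy membership in (0, 1]:
    samples far from their class centroid (likely outliers) get
    smaller weights in the FSVM objective."""
    centroid = X.mean(axis=0)
    dists = np.linalg.norm(X - centroid, axis=1)
    return 1.0 - dists / (dists.max() + eps)

X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])  # last point is an outlier
m = centroid_membership(X)
print(m)  # the outlier receives the smallest membership
```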
13:30-16:30, Paper WeBCT8.25<br />
ROC Analysis and Cost-Sensitive Optimization for Hierarchical Classifiers<br />
Paclik, Pavel, PR Sys Design<br />
Lai, Carmen, TU Delft<br />
Landgrebe, Thomas, De Beers<br />
Duin, Robert, TU Delft<br />
Instead of solving complex pattern recognition problems using a single complicated classifier, it is often beneficial to<br />
leverage our prior knowledge and decompose the problem into parts. These may be tackled using specific feature subsets<br />
and simpler classifiers resulting in a hierarchical system. In this paper, we propose an efficient and scalable approach for<br />
cost-sensitive optimization of a general hierarchical classifier using ROC analysis. This allows the designer to view the<br />
hierarchy of trained classifiers as a system, and tune it according to the application needs.<br />
13:30-16:30, Paper WeBCT8.26<br />
Variational Mixture of Experts for Classification with Applications to Landmine Detection<br />
Yuksel, Seniha Esen, Univ. of Florida<br />
Gader, Paul, Univ. of Florida<br />
In this paper, we (1) provide a complete framework for classification using Variational Mixture of Experts (VME); (2) derive<br />
the variational lower bound; and (3) apply the method to landmine, or simply mine, detection and compare the results<br />
to the Mixtures of Experts trained with Expectation Maximization (EMME). VME has previously been used for regression<br />
and Waterhouse explained how to apply VME to classification (which we call VMEC). However, the steps to train<br />
the model were not made clear, since the equations were applicable to vector-valued parameters as opposed to the matrices for<br />
each expert. Also, a variational lower bound was not provided. The variational lower bound provides an excellent stopping<br />
criterion that resists over-training. We demonstrate the efficacy of the method on real-world mine classification, in which<br />
training robust mine classification algorithms is difficult because of the small number of samples per class. In our experiments,<br />
VMEC consistently improved performance over EMME.<br />
13:30-16:30, Paper WeBCT8.27<br />
A Unifying Framework for Learning the Linear Combiners for Classifier Ensembles<br />
Erdogan, Hakan, Sabanci Univ.<br />
Sen, Mehmet Umut, Sabanci Univ.<br />
For classifier ensembles, an effective combination method is to combine the outputs of each classifier using a linearly<br />
weighted combination rule. There are multiple ways to linearly combine classifier outputs and it is beneficial to analyze<br />
them as a whole. We present a unifying framework for multiple linear combination types in this paper. This unification<br />
enables using the same learning algorithms for different types of linear combiners. We present various ways to train the<br />
weights using regularized empirical loss minimization. We propose using the hinge loss for better performance as compared<br />
to the conventional least-squares loss. We analyze the effects of using hinge loss for various types of linear weight training<br />
by running experiments on three different databases. We show that, in certain problems, linear combiners with fewer parameters<br />
may perform as well as those with a much larger number of parameters, even in the presence of regularization.<br />
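The simplest member of this family of combiners uses a single weight per classifier; a hedged sketch with hypothetical score matrices (the paper's framework also covers richer per-class weightings):

```python
import numpy as np

def linear_combine(scores, w):
    """Combine per-classifier class-score matrices with one weight per
    classifier: scores has shape (n_classifiers, n_classes)."""
    return w @ scores  # weighted sum of the classifiers' score rows

# Hypothetical outputs of 3 classifiers on a 4-class problem.
scores = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.7, 0.1, 0.1, 0.1],  # a weak, disagreeing classifier
])
w = np.array([0.45, 0.45, 0.10])  # learned weights would downweight it
combined = linear_combine(scores, w)
print(int(np.argmax(combined)))  # -> 1
```

In the paper, such weights are learned by regularized empirical loss minimization (e.g. with the hinge loss) rather than set by hand.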
13:30-16:30, Paper WeBCT8.28<br />
Reinforcement Learning for Robust and Efficient Real-World Tracking<br />
Cohen, Andre, Rutgers Univ.<br />
Pavlovic, Vladimir, Rutgers Univ.<br />
In this paper we present a new approach for combining several independent trackers into one robust real-time tracker. Unlike<br />
previous work that employs multiple tracking objectives in unison, our tracker determines an optimal sequence<br />
of individual trackers given the characteristics present in the video and the desire to achieve maximally efficient tracking.<br />
This allows for the selection of fast, less robust trackers when little movement is sensed, while using more robust but computationally<br />
intensive trackers in more dynamic scenes. We test this approach on the problem of real-world face tracking.<br />
Results show that this approach is a viable method for combining several independent trackers into one robust real-time<br />
tracker capable of tracking faces in varied lighting conditions, video resolutions, and with occlusions.<br />
13:30-16:30, Paper WeBCT8.29<br />
An Efficient and Stable Algorithm for Learning Rotations<br />
Arora, Raman, Univ. of Washington<br />
Sethares, William A., Univ. of Wisconsin-Madison<br />
This paper analyses the computational complexity and stability of an online algorithm recently proposed for learning rotations.<br />
The proposed algorithm involves multiplicative updates that are matrix exponentials of skew-symmetric matrices comprising<br />
the Lie algebra of the rotation group. The rank-deficiency of the skew-symmetric matrices involved in the updates is exploited<br />
to reduce the updates to a simple quadratic form. The Lyapunov stability of the algorithm is established and the application<br />
of the algorithm to registration of point-clouds in n-dimensional Euclidean space is discussed.<br />
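In the 3-D case the matrix exponential of a skew-symmetric matrix has a closed form via Rodrigues' formula; a minimal sketch of that special case (illustrative background, not the paper's reduced quadratic-form update):

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix S with S @ x = v x x (cross product)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(v):
    """Matrix exponential of skew(v) via Rodrigues' formula, yielding
    a rotation by angle ||v|| about the axis v / ||v||."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    S = skew(v / theta)
    return np.eye(3) + np.sin(theta) * S + (1.0 - np.cos(theta)) * (S @ S)

R = exp_so3(np.array([0.0, 0.0, np.pi / 2]))  # 90 degrees about the z-axis
# R is orthogonal with determinant +1, i.e. a proper rotation.
```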
13:30-16:30, Paper WeBCT8.30<br />
An Incremental Learning Algorithm for Nonstationary Environments and Class Imbalance<br />
Ditzler, Greg, Rowan Univ.<br />
Chawla, Nitesh, Univ. of Notre Dame<br />
Polikar, Robi, Rowan Univ.<br />
Learning in a non-stationary environment and in the presence of class imbalance has been receiving more recognition from the computational<br />
intelligence community, but little work has been done to create an algorithm or a framework that can handle both issues simultaneously. We<br />
have recently introduced a new member to the Learn++ family of algorithms, Learn++.NSE, which is designed to track non-stationary environments.<br />
However, this algorithm does not work well when there is class imbalance, as it was not designed to handle this problem. On<br />
the other hand, SMOTE, a popular algorithm that can handle class imbalance, is not designed to learn in nonstationary environments, because<br />
it is a method for oversampling the data. In this work we describe and present preliminary results for integrating SMOTE and Learn++.NSE<br />
to create an algorithm that is robust to learning in a non-stationary environment and under class imbalance.<br />
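For reference, SMOTE's oversampling step interpolates a minority-class sample toward one of its minority-class nearest neighbors; a minimal sketch generating one synthetic point (not the full algorithm, and independent of the Learn++.NSE integration):

```python
import numpy as np

def smote_sample(x, minority, k=3, rng=np.random.default_rng(0)):
    """Generate one SMOTE-style synthetic point: pick one of the k
    nearest minority-class neighbors of x and interpolate a random
    fraction of the way toward it."""
    dists = np.linalg.norm(minority - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]  # skip x itself at distance 0
    nn = minority[rng.choice(neighbors)]
    return x + rng.random() * (nn - x)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(minority[0], minority)
# synthetic lies on the segment from minority[0] toward one neighbor
```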
13:30-16:30, Paper WeBCT8.31<br />
Feature-Based Partially Occluded Object Recognition<br />
Fan, Na, East China Normal Univ.<br />
We propose a framework that combines geometry, color and texture information among pairwise feature points into a graph and finds the correct<br />
assignments from all candidates using graph-matching techniques. Thanks to our informative similarity matrix, objects can still be recognized<br />
under severe occlusion, and matching errors can be greatly reduced when images are taken from very different view angles or are partially occluded.<br />
13:30-16:30, Paper WeBCT8.32<br />
A Sample Pre-Mapping Method Enhancing Boosting for Object Detection<br />
Ren, Haoyu, Chinese Acad. of Sciences<br />
Hong, Xiaopeng, Harbin Inst. of Tech.<br />
Heng, Cher Keng, Panasonic Singapore Lab. Pte Ltd<br />
Liang, Luhong, Chinese Acad. of Sciences<br />
Chen, Xilin, Chinese Acad. of Sciences<br />
We propose a novel method to improve the training efficiency and accuracy of boosted classifiers for object detection.<br />
The key step of the proposed method is a sample pre-mapping on the original space, by reference to a selected reference<br />
sample, before the samples are fed into the weak classifiers. The reference sample corresponds to an approximation of the optimal separating<br />
hyper-plane in an implicit high-dimensional space, so that the resulting classifier can achieve performance<br />
similar to that of a kernel method while incurring only the computational cost of a linear classifier in both training and detection. We employ<br />
two different non-linear mappings to verify the proposed method under the boosting framework. Experimental results show<br />
that the proposed approach achieves performance comparable with commonly used methods on public datasets in both<br />
pedestrian detection and car detection.<br />
13:30-16:30, Paper WeBCT8.33<br />
Context Inspired Pedestrian Detection in Far-field Videos<br />
Ma, Wenhua, Chinese Acad. of Sciences<br />
He, Peng, Chinese Acad. of Sciences<br />
Lei, Huang, Chinese Acad. of Sciences<br />
Liu, Changping, Chinese Acad. of Sciences<br />
A novel pedestrian detection method that integrates context information with sliding-window search is proposed. The method<br />
uses cues such as corners, motion, and appearance to localize pedestrians in far-field videos without performing brute-force search.<br />
Corners direct attention to a set of conspicuous locations that serve as starting points for the search, while motion<br />
detection restricts the search area to the foreground mask. Based on these two cues, sliding-window search is applied<br />
to confirm the exact locations of pedestrians. Experiments demonstrate that the proposed method is efficient in detecting<br />
pedestrians in far-field videos.<br />
13:30-16:30, Paper WeBCT8.34<br />
Theme-Based Multi-Class Object Recognition and Segmentation<br />
Wu, Shilin, Chinese Acad. of Sciences<br />
Geng, Jiajia, Chinese Acad. of Sciences<br />
Zhu, Feng, Chinese Acad. of Sciences<br />
In this paper, we propose a new theme-based CRF model and investigate its performance on class based pixel-wise segmentation<br />
of images. By including the theme of an image, we also propose a new texture-environment potential to represent<br />
texture environment of a pixel, which alone gives satisfactory recognition results. The pixel-wise segmentation accuracy<br />
is remarkably improved by introducing the texture potential. We compare our results to recently published results on the MSRC<br />
21-class database and show that our theme-based CRF model significantly outperforms the current state-of-the-art. In particular,<br />
by assigning a theme to each image, our model obtains greatly improved accuracy on structured classes with high<br />
visual variability and few training examples, classes whose accuracy is very low in most related work.<br />
13:30-16:30, Paper WeBCT8.35<br />
Boosted Sigma Set for Pedestrian Detection<br />
Hong, Xiaopeng, Harbin Inst. of Tech.<br />
Chang, Hong, Chinese Acad. of Sciences<br />
Chen, Xilin, Chinese Acad. of Sciences<br />
Gao, Wen, Peking Univ.<br />
This paper presents a new method to detect pedestrians in still images, using Sigma Sets as image region descriptors in a<br />
boosting framework. A Sigma Set encodes the second-order statistics of an image region implicitly, in the form of a point set.<br />
Compared with the covariance matrix, the traditional second-order-statistics-based region descriptor, which requires computationally<br />
demanding operations on the Riemannian manifold, the Sigma Set preserves similar robustness and discriminative<br />
power more efficiently, because classification on Sigma Sets can be performed directly in vector space. Experimental<br />
results on the INRIA and DaimlerChrysler pedestrian datasets show the effectiveness and efficiency of the<br />
proposed method.<br />
13:30-16:30, Paper WeBCT8.36<br />
Reverse Indexing for Reading Graffiti Tags<br />
Thurau, Christian, Fraunhofer IAIS<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
In this paper, we consider the problem of automatically reading graffiti tags. As a preparatory step, we create a large set<br />
of synthetic graffiti-like characters, generated from publicly available TrueType fonts. For each character in the database,<br />
we extract a number of scale-independent local binary descriptors. Then, using binary non-negative matrix factorization,<br />
a sufficient number of basis functions is learned. The basis-function coefficients of novel images can then be used directly<br />
to hash characters from the database of prototypes. Finally, graffiti tags are recognized by means of a localized spatial<br />
voting scheme.<br />
13:30-16:30, Paper WeBCT8.37<br />
Generic Object Recognition by Tree Conditional Random Field based on Hierarchical Segmentation<br />
Okumura, Takeshi, Kobe Univ.<br />
Takiguchi, Tetsuya, Kobe Univ.<br />
Ariki, Yasuo, Kobe Univ.<br />
Generic object recognition by computer has been strongly demanded in recent years in fields such as robot vision and image retrieval.<br />
Conventional methods use a Conditional Random Field (CRF) that recognizes the class of each region using<br />
features extracted from local regions and the class co-occurrence between adjoining regions. However, the<br />
discriminative ability of features extracted from local regions is insufficient, and these methods<br />
are not robust to scale variance. To solve this problem, we propose a method that integrates the recognition results at<br />
multiple scales using a tree conditional random field based on hierarchical segmentation. On an image dataset of 7<br />
classes, the proposed method improves the recognition rate by 2.2%.<br />
13:30-16:30, Paper WeBCT8.38<br />
A Fast Approach for Pixelwise Labeling of Facade Images<br />
Fröhlich, Björn, Friedrich-Schiller Univ. of Jena<br />
Rodner, Erik, Friedrich-Schiller Univ. of Jena<br />
Denzler, Joachim, Friedrich-Schiller Univ. of Jena<br />
Facade classification is an important subtask for automatically building large 3D city models. In the following we present<br />
an approach for pixelwise labeling of facade images using an efficient Randomized Decision Forest classifier and robust<br />
local color features. Experiments are performed with a popular facade dataset and a new, demanding dataset of pixelwise<br />
labeled images from the LabelMe project. Our method achieves high recognition rates and is significantly faster in both<br />
training and testing than other methods based on expensive feature-transformation techniques.<br />
13:30-16:30, Paper WeBCT8.39<br />
Real-Time Traffic Sign Detection: An Evaluation Study<br />
Li, Ying, IBM T. J. Watson Res. Center<br />
Guan, Weiguang, IBM<br />
Pankanti, Sharath<br />
This paper presents an experimental evaluation of three different traffic sign detection approaches, which detect or localize<br />
various types of traffic signs from real-time videos. Specifically, the first approach exploits geometric features to identify<br />
traffic signs, while the other two are developed based on SVM (Support Vector Machine) and AdaBoost learning mechanisms.<br />
We describe each of the three approaches, conduct a detailed comparison among them, and examine their pros and<br />
cons. Our conclusions should lead to useful guidelines for developing a real-time traffic sign detector.<br />
13:30-16:30, Paper WeBCT8.40<br />
Image Categorization by Learned Nonlinear Subspace of Combined Visual-Words and Low-Level Features<br />
Han, Xian-Hua, Ritsumeikan Univ.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
Ruan, Xiang, Omron Corporation<br />
Image category recognition is important to access visual information on the level of objects and scene types. This paper<br />
presents a new algorithm for the automatic recognition of object and scene classes. Compact and yet discriminative visual-words<br />
and low-level-features object class subspaces are automatically learned from a set of training images by a Supervised<br />
Nonlinear Neighborhood Embedding (SNNE) algorithm, which can learn an adaptive nonlinear subspace by<br />
preserving the neighborhood structure of the visual feature space. The main contribution of this paper is twofold: i) an<br />
optimally compact and discriminative feature subspace is learned by the proposed SNNE algorithm for each feature<br />
space (visual-word and low-level features); ii) an effective merging of the different feature subspaces can be implemented simply.<br />
High classification accuracy is demonstrated on different databases, including a scene database (Simplicity) and an object<br />
recognition database (Caltech). We confirm that the proposed strategy performs much better than state-of-the-art methods on these<br />
databases.<br />
13:30-16:30, Paper WeBCT8.41<br />
Can Motion Segmentation Improve Patch-Based Object Recognition?<br />
Ulges, Adrian, DFKI<br />
Breuel, Thomas<br />
Patch-based methods, which constitute the state of the art in object recognition, are often applied to video data, where<br />
motion information provides a valuable clue for separating objects of interest from the background. We show that such<br />
motion-based segmentation improves the robustness of patch-based recognition with respect to clutter. Our approach,<br />
which employs segmentation information to rule out incorrect correspondences between training and test views, is demonstrated<br />
empirically to distinctly outperform baselines operating on unsegmented images. Relative improvements reach<br />
50% for the recognition of specific objects, and 33% for object category retrieval.<br />
13:30-16:30, Paper WeBCT8.42<br />
Semi-Supervised and Interactive Semantic Concept Learning for Scene Recognition<br />
Han, Xian-Hua, Ritsumeikan Univ.<br />
Chen, Yen-Wei, Ritsumeikan Univ.<br />
Ruan, Xiang, Omron Corporation<br />
In this paper, we present a novel semi-supervised and interactive concept learning algorithm for scene recognition by local<br />
semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and to<br />
model the semantic content of images. The basic idea of semantic modeling is to classify local image regions into semantic<br />
concept classes such as water, sunset, or sky [1]. However, manually labeling concept samples for training a semantic<br />
model is fairly expensive, and the labeling results are, to some extent, subjective to the operators. In this paper, using<br />
the proposed semi-supervised and interactive learning algorithm, training samples and new concepts can be obtained accurately<br />
and efficiently. Through extensive experiments, we demonstrate that the image concept representation is well<br />
suited to modeling the semantic content of heterogeneous scene categories, and thus to recognition and retrieval. Furthermore,<br />
higher recognition accuracy can be achieved by adding new training samples and concepts obtained<br />
by the proposed algorithm.<br />
13:30-16:30, Paper WeBCT8.43<br />
Dense Structure Inference for Object Classification in Aerial LIDAR Dataset<br />
Kim, Eunyoung, Univ. of Southern California<br />
Medioni, Gerard, Univ. of Southern California<br />
We present a framework to classify small freeform objects in 3D aerial scans of a large urban area. The system first identifies<br />
large structures such as the ground surface and roofs of buildings densely built in the scene, by fitting planar patches<br />
and grouping adjacent patches similar in pose together. Then, it segments initial object candidates which represent the<br />
visible surface of an object using the identified structures. To deal with the sparse density of the points representing each candidate,<br />
we also propose a novel method to infer a dense 3D structure from the given sparse and noisy points without any meshes<br />
- 222 -
and iterations. To label object candidates, we build a tree-structure database of object classes, which captures latent patterns<br />
in shape of 3D objects in a hierarchical manner. We demonstrate our system on the aerial LIDAR dataset acquired from a<br />
few square kilometers of Ottawa.<br />
13:30-16:30, Paper WeBCT8.44<br />
Data-Driven Foreground Object Detection from a Non-Stationary Camera<br />
Sun, Shih-Wei, Acad. Sinica, Taiwan<br />
Huang, Fay, National Ilan Univ. Taiwan<br />
Liao, Mark, Acad. Sinica, Taiwan<br />
In this paper, we propose a data-driven foreground object detection technique which can detect foreground objects from<br />
a moving camera. We propose to build a data-driven consensus foreground object template (CFOT) and then detect the<br />
foreground object region in each frame. The proposed foreground object detection technique is equipped with the following<br />
functions: (1) the ability to detect a foreground object captured by a fast-moving camera; (2) the ability to detect a<br />
low-contrast (spatially/temporally) foreground object; and (3) the ability to detect a foreground object against a dynamic background.<br />
Our method makes three contributions: (1) a newly proposed data-driven foreground region decision process for generating<br />
the CFOT, which has been shown to be robust and efficient; (2) a foreground object probability proposed for properly<br />
dealing with imperfect initial foreground region estimations; and (3) a CFOT generated for precise foreground object<br />
detection.<br />
13:30-16:30, Paper WeBCT8.45<br />
Efficient Shape Retrieval under Partial Matching<br />
Demirci, Fatih, TOBB Univ. of Ec. and Tech.<br />
Indexing into large database systems is essential for a number of applications. This paper presents a new indexing structure<br />
that overcomes an important restriction of a previous indexing technique, using a recently developed theorem from the<br />
domain of matrix analysis. Specifically, given a set of distance values computed by a distance function that does not necessarily<br />
satisfy the triangle inequality, this paper shows that computing the nearest distance values that obey the properties<br />
of a metric enables us to overcome the limitations of the previous indexing algorithm. We demonstrate the proposed framework<br />
in the context of a recognition task.<br />
13:30-16:30, Paper WeBCT8.46<br />
Component Identification in the 3D Model of a Building<br />
Xu, Mai, Imperial Coll.<br />
Petrou, Maria, Imperial Coll.<br />
Jahangiri, Mohammad, Imperial Coll.<br />
This paper addresses the problem of identifying the components (such as balconies and windows) of the 3D model of a<br />
building. A novel method, based on a voting scheme, is presented for solving this problem. Intuitively, interference<br />
(such as shadows and occlusions) rarely happens at the same place when a scene is viewed from different<br />
directions or at different times. In the spirit of this intuition, the voting-based method combines information from various images to<br />
identify and segment the components of a building.<br />
13:30-16:30, Paper WeBCT8.48<br />
Multi-Scale Color Local Binary Patterns for Visual Object Classes Recognition<br />
Zhu, Chao, Ec. Centrale de Lyon<br />
Bichot, Charles-Edmond, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
The Local Binary Pattern (LBP) operator is a computationally efficient yet powerful feature for analyzing local texture<br />
structures. While the LBP operator has been successfully applied to tasks as diverse as texture classification, texture segmentation,<br />
face recognition and facial expression recognition, it has rarely been used in the domain of Visual Object<br />
Classes (VOC) recognition, mainly due to its limited power in dealing with the various changes in lighting and viewing<br />
conditions of real-world scenes. In this paper, we propose six novel multi-scale color LBP operators in order to increase<br />
the photometric invariance and discriminative power of the original LBP operator. Experimental results on the<br />
PASCAL VOC 2007 image benchmark show significant accuracy improvements by the proposed operators as compared<br />
with both the original LBP and other popular texture descriptors such as Gabor filters.<br />
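For background, the original grayscale LBP compares each pixel to its 8 neighbors and packs the comparison bits into a code. A minimal single-pixel sketch (the paper's multi-scale color variants extend this idea per channel and per radius):

```python
import numpy as np

# Fixed clockwise order of the 8 neighbors, starting at the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c): bit i is 1 if the i-th neighbor
    is >= the center pixel."""
    center = img[r, c]
    code = 0
    for i, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr, c + dc] >= center:
            code |= 1 << i
    return code

img = np.array([[9, 9, 9],
                [1, 5, 1],
                [1, 1, 1]])
print(lbp_code(img, 1, 1))  # top row >= 5 -> bits 0, 1, 2 set -> 7
```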
13:30-16:30, Paper WeBCT8.49<br />
Object Localization by Propagating Connectivity via Superfeatures<br />
Chakraborty, Ishani, Rutgers Univ.<br />
Elgammal, Ahmed, Rutgers Univ.<br />
In this paper, we propose a part-based approach to localize objects in cluttered images. We represent object parts as boundary<br />
segments and image patches. A semi-local grouping of parts named superfeatures encodes appearance and connectivity<br />
within a neighborhood. To match parts, we integrate inter-feature similarities and intra-feature connectivity via a relaxation<br />
labeling framework. Additionally, we use a global elliptical shape prior to match the shape of the solution space to that of<br />
the object. Finally, we demonstrate the efficacy of the method for detecting various objects in cluttered images by<br />
comparing it to simple object models.<br />
13:30-16:30, Paper WeBCT8.50<br />
Efficient Object Detection and Matching using Feature Classification<br />
Dornaika, Fadi, Univ. of the Basque Country<br />
Chakik, Fadi, Lebanese Univ.<br />
This paper presents a new approach for efficient object detection and matching in images and videos. We propose a<br />
classification stage that classifies the features extracted from new images into object features and non-object<br />
features. This binary classification scheme has turned out to be an efficient tool for object detection and<br />
matching. By means of this classification, not only does the matching process become more robust and faster, but robust<br />
object registration also becomes fast. We provide quantitative evaluations showing the advantages of using the classification<br />
stage for object matching and registration. Our approach could lend itself nicely to real-time object tracking and detection.<br />
13:30-16:30, Paper WeBCT8.51<br />
A Discriminative Model for Object Representation and Detection via Sparse Features<br />
Song, Xi, Beijing Inst. of Tech.<br />
Luo, Ping, Sun Yat-Sen Univ.<br />
Lin, Liang, Sun Yat-Sen Univ.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
This paper proposes a discriminative model that represents an object category with a batch of boosted image patches, motivated<br />
by detecting and localizing objects with sparse features. Instead of designing features carefully and category-specifically<br />
as in previous work, we extract a massive number of local image patches from the positive object instances and<br />
quantize them as weak classifiers. Then we extend the Adaboost algorithm for learning the patch-based model integrating<br />
object appearance and structure information. With the learned model, a few features are activated to localize instances in<br />
the testing images. In the experiments, we apply the proposed method to several public datasets and achieve improved<br />
performance.<br />
13:30-16:30, Paper WeBCT8.52<br />
A Robust Recognition Technique for Dense Checkerboard Patterns<br />
Dao, Vinh Ninh, The Univ. of Tokyo<br />
Sugimoto, Masanori, The Univ. of Tokyo<br />
The checkerboard pattern is widely used in computer vision techniques for camera calibration and simple geometry acquisition,<br />
both in practical use and research. However, most of the current techniques fail to recognize the checkerboard<br />
pattern under distorted, occluded or discontinuous conditions, especially when the checkerboard pattern is dense. This<br />
paper proposes a novel checkerboard recognition technique that is robust to noise, surface distortion or discontinuity, supporting<br />
checkerboard recognition in dynamic conditions for a wider range of applications. When the checkerboard pattern<br />
is used in a projector camera system for geometry reconstruction, by using epipolar geometry, this technique can recognize<br />
the corresponding positions of the crossing points, even if the checkerboard pattern is only partly detected.<br />
13:30-16:30, Paper WeBCT8.53<br />
Spike-Based Convolutional Network for Real-Time Processing<br />
Pérez-Carrasco, Jose-Antonio, Univ. de Sevilla<br />
Serrano-Gotarredona, Carmen, Univ. de Sevilla<br />
Acha-Piñero, Begoña, Univ. de Sevilla<br />
Serrano-Gotarredona, Teresa, Univ. de Sevilla<br />
Linares-Barranco, Bernabe, Univ. de Sevilla<br />
In this paper we propose the first bio-inspired, non-frame-based, six-layer convolutional network (ConvNet) that can be implemented<br />
with already physically available spike-based electronic devices. The system was designed to recognize people in<br />
three different positions: standing, lying, or upside down. The inputs were spikes obtained with a motion retina chip. We<br />
provide simulation results showing recognition delays of 16 milliseconds from stimulus onset (time-to-first-spike) with a<br />
recognition rate of 94%. The weight-sharing property of ConvNets and the use of the AER protocol allow a great reduction in<br />
the number of both trainable parameters and connections: only 748 trainable parameters and 123 connections in our AER<br />
system, out of the 506,998 connections that would be required in a frame-based implementation.<br />
13:30-16:30, Paper WeBCT8.54<br />
Learning Affordances for Categorizing Objects and Their Properties<br />
Dag, Nilgun, Middle East Tech. Univ.<br />
Atil, Ilkay, Middle East Tech. Univ.<br />
Kalkan, Sinan, Middle East Tech. Univ.<br />
Sahin, Erol, Middle East Tech. Univ.<br />
In this paper, we demonstrate that simple interactions with objects in the environment lead to a manifestation of the perceptual<br />
properties of objects. This is achieved by deriving a condensed representation of the effects of actions (called effect prototypes<br />
in the paper), and investigating the relevance between perceptual features extracted from the objects and the actions that can<br />
be applied to them. With this at hand, we show that the agent can categorize (i.e., partition) its raw sensory perceptual feature<br />
vector, extracted from the environment, which is an important step for development of concepts and language. Moreover,<br />
after learning how to predict the effect prototypes of objects, the agent can categorize objects based on the predicted effects<br />
of actions that can be applied on them.<br />
13:30-16:30, Paper WeBCT8.55<br />
Feature Pairs Connected by Lines for Object Recognition<br />
Awais, Muhammad, Univ. of Surrey<br />
Mikolajczyk, Krystian, Univ. of Surrey<br />
In this paper we exploit image edges and segmentation maps to build features for object category recognition. We build a<br />
parametric line based image approximation to identify the dominant edge structures. Line ends are used as features described<br />
by histograms of gradient orientations. We then form descriptors based on connected line ends to incorporate weak topological<br />
constraints which improve their discriminative power. Using point pairs connected by an edge assures higher repeatability<br />
than a random pair of points or edges. The results are compared with the state of the art and show a significant improvement on<br />
the challenging Pascal VOC 2007 recognition benchmark. Kernel-based fusion is performed to emphasize the complementary<br />
nature of our descriptors with respect to the state-of-the-art features.<br />
13:30-16:30, Paper WeBCT8.56<br />
Using Gait Features for Improving Walking People Detection<br />
Bouchrika, Imed, Univ. of Southampton<br />
Carter, John, Univ. of Southampton<br />
Nixon, Mark, Univ. of Southampton<br />
Morzinger, Roland, Joanneum Res.<br />
Thallinger, Georg, Joanneum Res.<br />
In this paper, we explore a new approach for enriching the HoG method for pedestrian detection in an unconstrained outdoor<br />
environment. The proposed algorithm is based on gait motion, since the rhythmic footprint pattern of walking people<br />
is a stable and characteristic feature for their detection. The novelty of our approach is motivated<br />
by the latest research on people identification using gait. The experimental results confirmed the robustness of our method<br />
in enhancing HoG to detect walking people, as well as in discriminating between a single walking subject, groups of people and<br />
vehicles, with a detection rate of 100%. Furthermore, the results revealed the potential of our method to be used in visual surveillance<br />
systems for identity tracking over different camera views.<br />
13:30-16:30, Paper WeBCT8.57<br />
Learning-Based Vehicle Detection using Up-Scaling Schemes and Predictive Frame Pipeline Structures<br />
Tsai, Yi-Min, National Taiwan Univ.<br />
Huang, Keng-Yen, National Taiwan Univ.<br />
Tsai, Chih-Chung, National Taiwan Univ.<br />
Chen, Liang-Gee, National Taiwan Univ.<br />
This paper aims at detecting preceding vehicles at a variety of distances. A sub-region up-scaling scheme significantly raises<br />
the far-distance detection capability. Three frame pipeline structures involving object predictors are explored to further enhance<br />
accuracy and efficiency. The proposed methodology achieves a 140-meter detection distance, with a 97.1% detection rate and a<br />
4.2% false alarm rate. Finally, a benchmark of several learning-based vehicle detection approaches is provided.<br />
13:30-16:30, Paper WeBCT8.58<br />
Dynamic Hand Pose Recognition using Depth Data<br />
Suryanarayan, Poonam, The Pennsylvania State Univ.<br />
Subramanian, Anbumani, HP Lab.<br />
Mandalapu, Dinesh, HP Lab.<br />
Hand pose recognition has been a problem of great interest to the computer vision and human-computer interaction communities<br />
for many years, and current solutions either require additional accessories at the user end or enormous computation<br />
time. These limitations arise mainly from the high dexterity of the human hand and the occlusions created in the limited view of<br />
the camera. This work utilizes the depth information and a novel algorithm to recognize scale and rotation invariant hand<br />
poses dynamically. We have designed a volumetric shape descriptor enfolding the hand to generate a 3D cylindrical histogram<br />
and achieved robust pose recognition in real time.<br />
13:30-16:30, Paper WeBCT8.59<br />
A Hierarchical GIST Model Embedding Multiple Biological Feasibilities for Scene Classification<br />
Han, Yina, Xi’an Jiaotong Univ.<br />
Liu, Guizhong, Xi’an Jiaotong Univ.<br />
We propose a hierarchical GIST model embedding multiple biological feasibilities for scene classification. In the perceptual<br />
layer, the spatial layout of Gabor features is extracted in a bio-vision guided way: introducing diagnostic color information and<br />
tuning the orientations and scales of the Gabor filters, as well as the spatial pooling size, to biologically feasible values. In the<br />
conceptual layer, for the first time, we attempt to build a computational model of the biological conceptual GIST by a<br />
kernel-PCA-based prototype representation, which is task-oriented like the biological GIST, and also in accordance with the<br />
unsupervised learning assumption in the primary visual cortex and prototype-similarity-based categorization in human cognition.<br />
Using around 200 dimensions, our model is shown to outperform existing GIST models, and to achieve state-of-the-art<br />
performance on four scene datasets.<br />
13:30-16:30, Paper WeBCT8.60<br />
Road Network Extraction using Edge Detection and Spatial Voting<br />
Sirmacek, Beril, Deutsches Zentrum fur Luft und Raumfahrt<br />
Unsalan, Cem, Yeditepe Univ.<br />
Road network detection from very high resolution satellite images is important for two main reasons. First, the detection<br />
result can be used in automated map making. Second, the detected network can be used in trajectory planning for unmanned<br />
aerial vehicles. Although an expert can label road pixels in a given satellite image, this operation is prone to errors. Therefore,<br />
an automated system is needed to detect the road network in a given satellite image in a robust manner. In this study, we propose<br />
a novel approach to detect the road network from a given panchromatic Ikonos satellite image. Our method has five<br />
main steps. First, we apply nonlinear bilateral filtering to smooth the given image. Then, we extract Canny edges and the<br />
gradient information as local features. Using these local features, we generate a spatial voting matrix. This voting matrix<br />
indicates the possible locations of the road network pixels. By processing this voting matrix in an iterative manner, we detect<br />
initial road pixels. Finally, we apply a tracking algorithm on the voting matrix to detect the missing road pixels. We tested<br />
our method on various satellite images and provided the extracted road networks in the experiments section.<br />
13:30-16:30, Paper WeBCT8.61<br />
Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration<br />
Soda, Paolo, Univ. Campus Bio-Medico di Roma<br />
Iannello, Giulio, Univ. Campus Bio-Medico di Roma<br />
Decomposition methods are multiclass classification schemes where the polychotomy is reduced into several dichotomies.<br />
Each dichotomy is addressed by a classifier trained on a training set derived from the original one on the basis of the decomposition<br />
rule adopted. These new training sets may present a disproportion between the classes, harming the global recognition<br />
accuracy. Indeed, traditional learning algorithms are biased towards the majority class, resulting in poor predictive accuracy<br />
over the minority one. This paper investigates whether the application of learning methods specifically tailored to imbalanced<br />
training sets introduces any performance improvement when used by the dichotomizers of decomposition methods. The results<br />
on five public datasets show that the application of these learning methods improves the global performance of decomposition<br />
schemes.<br />
13:30-16:30, Paper WeBCT8.62<br />
The Balanced Accuracy and its Posterior Distribution<br />
Brodersen, Kay Henning, ETH Zurich<br />
Ong, Cheng Soon, ETH Zurich<br />
Stephan, Klaas Enno, Univ. of Zurich<br />
Buhmann, Joachim M., Swiss Federal Inst. of Tech. Zurich<br />
Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples<br />
have been identified with their correct class labels. In practice, generalizability is frequently estimated by averaging the accuracies<br />
obtained on individual cross-validation folds. This procedure, however, is problematic in two ways. First, it does<br />
not allow for the derivation of meaningful confidence intervals. Second, it leads to an optimistic estimate when a biased<br />
classifier is tested on an imbalanced dataset. We show that both problems can be overcome by replacing the conventional<br />
point estimate of accuracy by an estimate of the posterior distribution of the balanced accuracy.<br />
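The replacement the abstract argues for can be sketched numerically. Below is a minimal illustration under flat Beta priors; the function name and its arguments are hypothetical, not the authors' code: with a Beta(1, 1) prior, sensitivity and specificity have Beta posteriors, and the balanced accuracy is their average.<br />

```python
import numpy as np

def balanced_accuracy_posterior(tp, fn, tn, fp, n_samples=100_000, seed=0):
    """Draw samples from the posterior of the balanced accuracy.

    With flat Beta(1, 1) priors, sensitivity has posterior
    Beta(tp + 1, fn + 1) and specificity has Beta(tn + 1, fp + 1);
    the balanced accuracy is their average.
    """
    rng = np.random.default_rng(seed)
    sens = rng.beta(tp + 1, fn + 1, n_samples)
    spec = rng.beta(tn + 1, fp + 1, n_samples)
    return 0.5 * (sens + spec)

# A biased classifier on an imbalanced test set: all 90 positives right,
# all 10 negatives wrong. Conventional accuracy is 0.90, yet the posterior
# of the balanced accuracy concentrates near 0.5 and yields a credible interval.
post = balanced_accuracy_posterior(tp=90, fn=0, tn=0, fp=10)
lo, hi = np.percentile(post, [2.5, 97.5])
```

Unlike averaging fold accuracies, the posterior directly exposes both problems the abstract names: it gives a meaningful interval, and it is not fooled by class imbalance.<br />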
WeBCT9, Lower Foyer<br />
Multimedia Analysis and Retrieval, Poster Session<br />
Session chair: Cetin, E. (Bilkent Univ.)<br />
13:30-16:30, Paper WeBCT9.1<br />
A Study on Detecting Patterns in Twitter Intra-Topic User and Message Clustering<br />
Cheong, Marc, Monash Univ.<br />
Lee, Vincent C S, Monash Univ.<br />
Timely detection of hidden patterns is key to the analysis and estimation of the driving determinants in mission-critical decision<br />
making. This study applies Cheong and Lee’s context-aware content analysis framework to extract latent properties<br />
from Twitter messages (tweets). In addition, we incorporate an unsupervised Self-Organizing Feature Map (SOM) as a machine<br />
learning-based clustering tool that has not been investigated in the context of opinion mining and sentiment analysis<br />
using microblogging. Our experimental results reveal the detection of interesting patterns for topics of interest which are<br />
latent and cannot be easily detected from the observed tweets without the aid of machine learning tools.<br />
13:30-16:30, Paper WeBCT9.2<br />
Classification of Near-Duplicate Video Segments based on Their Appearance Patterns<br />
Ide, Ichiro, Nagoya Univ.<br />
Shamoto, Yuji, Nagoya Univ.<br />
Deguchi, Daisuke, Nagoya Univ.<br />
Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />
Murase, Hiroshi, Nagoya Univ.<br />
We propose a method that analyzes the structure of a large volume of general broadcast video data by the appearance patterns<br />
of near-duplicate video segments. We define six classification rules based on the appearance patterns of near-duplicate video<br />
segments according to their roles, and evaluate them on more than 1,000 hours of actual broadcast video data.<br />
13:30-16:30, Paper WeBCT9.3<br />
Motion Vector based Features for Content based Video Copy Detection<br />
Tasdemir, Kasim, Bilkent Univ.<br />
Cetin, E., Bilkent Univ.<br />
In this article, we propose a motion vector based feature set for Content-Based Copy Detection (CBCD) of video clips. The<br />
motion vectors of image frames are one of the signatures of a given video. However, they are not descriptive enough when<br />
consecutive image frames are used, because most vectors are too small. To overcome this problem, we calculate motion vectors<br />
at a lower frame rate than the actual frame rate of the video. As a result, we obtain longer vectors, which form a robust<br />
parameter set representing a given video. Experimental results are presented.<br />
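As an illustrative sketch (not the authors' implementation), a motion vector can be computed by exhaustive block matching between two frames; sampling frames at a lower rate, as the abstract proposes, makes the recovered vectors longer and hence more descriptive.<br />

```python
import numpy as np

def block_motion_vector(prev, curr, y, x, block=8, search=8):
    """Exhaustive block matching: find the displacement (dy, dx) of the
    block at (y, x) in `curr` relative to `prev` by minimizing the sum
    of absolute differences (SAD)."""
    ref = curr[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            cand = prev[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = int(np.abs(cand - ref).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Two synthetic "frames" that differ by a global shift; with a larger
# temporal gap between the frames the shift, and thus the vector, grows.
rng = np.random.default_rng(1)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, (3, 5), axis=(0, 1))
mv = block_motion_vector(prev, curr, 16, 16)
```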
13:30-16:30, Paper WeBCT9.4<br />
A Statistical Learning Approach to Spatial Context Exploitation for Semantic Image Analysis<br />
Papadopoulos, Georgios Th., Centre for Res. and Tech. Hellas<br />
Mezaris, Vasileios, Centre for Res. and Tech. Hellas<br />
Kompatsiaris, Yiannis, Centre for Res. and Tech. Hellas<br />
Strintzis, Michael-Gerasimos,<br />
In this paper, a statistical learning approach to spatial context exploitation for semantic image analysis is presented. The proposed<br />
method constitutes an extension of the key parts of the authors’ previous work on spatial context utilization, where a Genetic<br />
Algorithm (GA) was introduced for exploiting fuzzy directional relations after performing an initial classification of image regions<br />
to semantic concepts using solely visual information. In the extensions reported in this work, a more elaborate approach<br />
is followed during the spatial knowledge acquisition and modeling process. Additionally, the impact of every resulting spatial<br />
constraint on the final outcome is adaptively adjusted. Experimental results as well as comparative evaluation on three datasets<br />
of varying complexity in terms of the total number of supported semantic concepts demonstrate the efficiency of the proposed<br />
method.<br />
13:30-16:30, Paper WeBCT9.5<br />
Wavelet-Based Texture Retrieval Modeling the Magnitudes of Wavelet Detail Coefficients with a Generalized Gamma Distribution<br />
De Ves Cuenca, Esther, Univ. of Valencia<br />
Benavent, Xaro, Univ. of Valencia<br />
Ruedin, Ana María Clara, Univ. de Buenos Aires<br />
Acevedo, Daniel Germán, Univ. de Buenos Aires<br />
Seijas, Leticia María, Univ. de Buenos Aires<br />
This paper presents a texture descriptor based on the fine detail coefficients at three resolution levels of a translation-invariant<br />
undecimated wavelet transform. First, we consider vertical and horizontal wavelet detail coefficients at the same position as the<br />
components of a bivariate random vector, and the magnitude and angle of these vectors are computed. The magnitudes are modeled<br />
by a Generalized Gamma distribution. Their parameters, together with the circular histograms of angles, are used to characterize<br />
each texture image of the database. The Kullback-Leibler divergence is used as the similarity measurement. Retrieval<br />
experiments, in which we compare two wavelet transforms, are carried out on the Brodatz texture collection. Results reveal the<br />
good performance of this wavelet-based texture descriptor obtained via the Generalized Gamma distribution.<br />
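The Kullback-Leibler divergence used as the similarity measurement can be sketched in its generic discrete form over quantized magnitude histograms (an illustration only; the authors may use a closed-form expression for fitted Generalized Gamma models):<br />

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q) between two
    histograms, normalized to sum to one; `eps` guards empty bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Note that the KL divergence is asymmetric, D(p || q) != D(q || p), so retrieval systems sometimes symmetrize it by summing both directions.<br />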
13:30-16:30, Paper WeBCT9.6<br />
3D-Shape Retrieval using Curves and HMM<br />
Tabia, Hedi, Lagis Univ. Lille 1<br />
Daoudi, Mohamed, TELECOM Lille1<br />
Vandeborre, Jean-Philippe, Univ. of Lille 1<br />
Colot, Olivier, Univ. Lille 1<br />
In this paper, we propose a new approach for 3D-shape matching. This approach comprises an off-line step and an on-line step.<br />
In the off-line step, an alphabet, of which any shape can be composed, is constructed. First, 3D-objects are subdivided into a set<br />
of 3D-parts. The subdivision consists of extracting from each object a set of feature points with associated curves. Then the whole<br />
set of 3D-parts is clustered into different classes from a semantic point of view. After that, each class is modeled by a Hidden<br />
Markov Model (HMM). The HMM, which represents a character in the alphabet, is trained using the set of curves corresponding<br />
to the class parts. Hence, any 3D-object can be represented by a set of characters. The on-line step consists of comparing the set<br />
of characters representing the 3D-object query with that of each object in the given dataset. The experimental results obtained<br />
on the TOSCA dataset show that the system performs efficiently in retrieving similar 3D-models.<br />
13:30-16:30, Paper WeBCT9.7<br />
Fast Fingerprint Retrieval with Line Detection<br />
Lian, Hui-Cheng, Shanghai University<br />
In this paper, a retrieval method is proposed for audio and video fingerprinting systems by adopting a line detection technique.<br />
To achieve fast retrieval, lines are generated from the sub-fingerprints of the query and the database, and non-candidate lines are<br />
filtered out, so that the distance between the query and the references can be computed quickly. To demonstrate the superiority of<br />
this method, audio fingerprints and video fingerprints are generated for comparison. The experimental results indicate that the<br />
proposed method outperforms the direct hashing method.<br />
13:30-16:30, Paper WeBCT9.8<br />
A High-Dimensional Access Method for Approximated Similarity Search in Text Mining<br />
Artigas-Fuentes, Fernando José, Univ. de Oriente, CERPAMID<br />
Badía-Contelles, José Manuel, Univ. Jaume I, Castellón<br />
Gil-García, Reynaldo, Univ. de Oriente, CERPAMID<br />
In this paper, a new access method for very high-dimensional data space is proposed. The method uses a graph structure and<br />
pivots for indexing objects, such as documents in text mining. It also applies a simple search algorithm that uses distance- or<br />
similarity-based functions in order to obtain the k-nearest neighbors of novel query objects. This method shows good selectivity<br />
over very high-dimensional data spaces, and better performance than other state-of-the-art methods. Although it is a probabilistic<br />
method, it shows a low error rate. The method is evaluated on data sets from the well-known Reuters corpus<br />
version 1 collection (RCV1-v2), dealing with thousands of dimensions.<br />
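A standard pivot trick in metric-space indexing (illustrative, not necessarily the authors' exact scheme) lower-bounds the query-object distance from precomputed distances to pivots via the triangle inequality, so most objects can be skipped without computing their true distance:<br />

```python
import numpy as np

def pivot_lower_bound(q_to_pivots, o_to_pivots):
    """Triangle-inequality lower bound on d(q, o) given the distances
    of q and o to a shared set of pivots:
    d(q, o) >= max_p |d(q, p) - d(o, p)|."""
    return float(np.max(np.abs(np.asarray(q_to_pivots) - np.asarray(o_to_pivots))))

def filter_candidates(q_to_pivots, objects_to_pivots, radius):
    """Keep only objects whose lower bound is within `radius`; the true
    (expensive) distance is then computed only for these candidates."""
    return [i for i, o in enumerate(objects_to_pivots)
            if pivot_lower_bound(q_to_pivots, o) <= radius]
```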
13:30-16:30, Paper WeBCT9.9<br />
3D Model Comparison through Kernel Density Matching<br />
Wang, Yiming, Nanjing Univ.<br />
Lu, Tong, Nanjing Univ.<br />
Gao, Rongjun, Nanjing Univ.<br />
Liu, Wenyin, City U of HK<br />
A novel 3D shape matching method is proposed in this paper. We first extract angular and distance feature pairs from preprocessed<br />
3D models, then estimate their kernel densities after quantifying the feature pairs into a fixed number of bins. During<br />
3D matching, we adopt the KL-divergence as the distance for 3D comparison. Experimental results show that our method is effective<br />
in matching similar 3D shapes, and robust to model deformations and rotation transformations.<br />
13:30-16:30, Paper WeBCT9.10<br />
Improving the Efficiency of Content-Based Multimedia Exploration<br />
Beecks, Christian, RWTH Aachen Univ.<br />
Wiedenfeld, Sascha, RWTH Aachen Univ.<br />
Seidl, Thomas, RWTH Aachen Univ.<br />
Visual exploration systems enable users to search, browse, and explore voluminous multimedia databases in an interactive and<br />
playful manner. Whether users know the database’s contents in advance or not, these systems guide the user’s exploration<br />
process by visualizing the database contents and allowing him or her to issue queries intuitively. In order to improve the efficiency<br />
of content-based visual exploration systems, we propose an efficient query evaluation scheme which aims at reducing the total<br />
number of costly similarity computations. We evaluate our approach on different state-of-the-art image databases.<br />
13:30-16:30, Paper WeBCT9.11<br />
Tertiary Hash Tree: Indexing Structure for Content-Based Image Retrieval<br />
Tak, Yoon-Sik, Korea Univ.<br />
Hwang, Eenjun, Korea Univ.<br />
Dominant features for content-based image retrieval usually consist of high-dimensional values. So far, much research<br />
has been done on indexing such values for fast retrieval. Still, many existing indexing schemes suffer from performance<br />
degradation due to the curse of dimensionality. As an alternative, heuristic algorithms have been proposed to calculate<br />
the result with high probability at the cost of accuracy. In this paper, we propose a new hash tree-based indexing structure<br />
called tertiary hash tree for indexing high-dimensional feature values. Tertiary hash tree provides several advantages compared<br />
to the traditional extendible hash structure in terms of resource usage and search performance. Through extensive experiments,<br />
we show that our proposed index structure achieves outstanding performance.<br />
13:30-16:30, Paper WeBCT9.12<br />
An Augmented Reality Setup with an Omnidirectional Camera based on Multiple Object Detection<br />
Hayashi, Tomoki, Keio Univ.<br />
Uchiyama, Hideaki, Keio Univ.<br />
Pilet, Julien, Keio Univ.<br />
Saito, Hideo, Keio Univ.<br />
We propose a novel augmented reality (AR) setup with an omnidirectional camera on a tabletop display. The table acts as<br />
a mirror on which real playing cards appear augmented with virtual elements. The omnidirectional camera captures and recognizes<br />
its surroundings based on a feature-based image retrieval approach which achieves fast and scalable registration. It<br />
allows our system to superimpose virtual visual effects on the omnidirectional camera image. In our AR card game, users sit<br />
around a tabletop display and show a card to the other players. The system recognizes it and augments it with virtual elements<br />
in the omnidirectional image acting as a mirror. While playing the game, the users can interact with each other directly and<br />
through the display. Our setup is a new, simple, and natural approach to augmented reality. It opens new doors to traditional<br />
card games.<br />
13:30-16:30, Paper WeBCT9.13<br />
Enhancing SVM Active Learning for Image Retrieval using Semi-Supervised Bias-Ensemble<br />
Wu, Jun, Dalian Maritime Univ.<br />
Lu, Ming-Yu, Dalian Maritime Univ.<br />
Wang, Chun-Li, Dalian Maritime Univ.<br />
Support vector machine (SVM) based active learning plays a key role in alleviating the burden of labeling in relevance<br />
feedback. However, most SVM-based active learning algorithms are challenged by the small example problem and the<br />
asymmetric distribution problem. This paper proposes a novel active learning scheme that deals with SVM ensembles under the<br />
semi-supervised setting to address the first problem. For the second problem, a bias-ensemble mechanism is developed to<br />
guide the classification model to pay more attention to the positive examples than to the negative ones. An empirical study<br />
shows that the proposed scheme is significantly more effective than some existing approaches.<br />
13:30-16:30, Paper WeBCT9.14<br />
Interactive Browsing of Remote JPEG 2000 Image Sequences<br />
Garcia Ortiz, Juan Pablo, Univ. of Almeria<br />
Ruiz, Gonzalez V., Univ. of Almeria<br />
Garcia, I., Univ. of Almeria<br />
Müller, D., European Space Agency/NASA<br />
Dimitoglou, G., European Space Agency/NASA<br />
This paper studies a novel prefetching scheme for the remote browsing of sequences of high-resolution JPEG 2000 images.<br />
Using this scheme, a user is able to randomly select any of the remote images for analysis, repeating this process with<br />
other images after some undefined time. Our solution is proposed in a low bit-rate communication context, where the<br />
complete transmission of any of the images for its lossless recovery would take too much time for interactive visualization.<br />
For this reason, quality scalability is used in order to minimize the decoding latency. Frequently, the user can also play a<br />
“video”, moving sequentially over the neighbouring (temporally consecutive, previous or following) images of the currently<br />
displayed one. With the objective of also hiding the link latency, the proposed data scheduler transmits in parallel data of the<br />
currently displayed image and data of the temporally adjacent images. This scheduler uses a model based<br />
on the quality progression of the image in order to estimate what percentage of the bandwidth is dedicated to prefetching data.<br />
Our experimental results show that a significant benefit can be achieved in terms of both subjective quality and responsiveness<br />
by means of prefetching.<br />
13:30-16:30, Paper WeBCT9.15<br />
Binarization of Color Characters in Scene Images using K-Means Clustering and Support Vector Machines<br />
Wakahara, Toru, Hosei Univ.<br />
Kita, Kohei, Hosei Univ.<br />
This paper proposes a new technique for binarizing multicolored characters subject to heavy degradation. The key ideas are<br />
threefold. The first is the generation of tentatively binarized images via every dichotomization of the k clusters obtained by k-means<br />
clustering in the HSI color space; the total number of tentatively binarized images equals 2^k - 2. The second is the use of support<br />
vector machines (SVM) to determine whether, and to what degree, each tentatively binarized image represents a character or<br />
non-character. We feed the SVM with mesh and weighted direction code histogram features to output the degree of character-likeness.<br />
The third is the selection of the single binarized image with the maximum degree of character-likeness as the optimal<br />
binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust<br />
OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.<br />
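The count of tentative binarizations follows from enumerating every split of the k color clusters into a non-empty foreground and a non-empty background; a sketch of the combinatorics, assuming ordered foreground/background splits:<br />

```python
from itertools import combinations

def dichotomies(k):
    """All ways to split k color clusters into a non-empty foreground
    set and a non-empty background set: 2**k - 2 ordered splits."""
    clusters = set(range(k))
    return [(set(fg), clusters - set(fg))
            for r in range(1, k)
            for fg in combinations(sorted(clusters), r)]

# k = 4 clusters yield 2**4 - 2 = 14 tentatively binarized images
splits = dichotomies(4)
```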
13:30-16:30, Paper WeBCT9.16<br />
A Self-Training Learning Document Binarization Framework<br />
Su, Bolan, National Univ. of Singapore<br />
Lu, Shijian, -<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Document image binarization techniques have been studied for many years, and many practical binarization techniques have<br />
been developed and applied successfully in commercial document analysis systems. However, the current state-of-the-art<br />
methods fail to produce good binarization results for many badly degraded document images. In this paper, we propose a<br />
self-training learning framework for document image binarization. Based on reported binarization methods, the proposed<br />
framework first divides document image pixels into three categories, namely foreground pixels, background pixels and uncertain<br />
pixels. A classifier is then trained by learning from the document image pixels in the foreground and background categories.<br />
Finally, the uncertain pixels are classified using the learned pixel classifier. Extensive experiments have been<br />
conducted on the dataset used in the recent Document Image Binarization Contest (DIBCO) 2009. Experimental results<br />
show that our proposed framework significantly improves the performance of reported document image binarization<br />
methods.<br />
13:30-16:30, Paper WeBCT9.17<br />
Novel Edge Features for Text Frame Classification in Video<br />
Palaiahnakote, Shivakumara, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Text frame classification is needed in many multimedia applications such as event identification, exact event boundary identification,<br />
navigation, and video surveillance. To the best of our knowledge, no methods dedicated solely to text frame<br />
classification have been reported so far. Hence, this paper presents a new approach to text frame classification in video based on<br />
capturing local observable edge properties of text frames, by virtue of the strong presence of sharp edges, the straight appearance<br />
of edges, and the consistent proximity between edges. The approach initially classifies the blocks of a frame into text blocks<br />
and non-text blocks. True text blocks are then identified among the classified text blocks by the proposed<br />
features. If a frame contains at least one true text block, it is considered a text frame; otherwise, it is a non-text frame. We<br />
evaluate the proposed approach on a large database containing both text and non-text frames, and on publicly available data, at<br />
two levels, i.e., estimating recall and precision at the block level and the frame level.<br />
13:30-16:30, Paper WeBCT9.18<br />
Image Matching and Retrieval by Repetitive Patterns<br />
Doubek, Petr, Czech Tech. Univ. in Prague<br />
Matas, Jiri, Czech Tech. Univ. in Prague<br />
Perdoch, Michal, Czech Tech. Univ. in Prague<br />
Chum, Ondrej, Czech Tech. Univ. in Prague<br />
Detection of repetitive patterns in images has been studied for a long time in computer vision. This paper discusses a<br />
method for representing a lattice or line pattern by a shift-invariant descriptor of the repeating element. The descriptor overcomes<br />
shift ambiguity and can be matched between different views. The pattern matching is then demonstrated in a retrieval<br />
experiment, where different images of the same buildings are retrieved solely by their repetitive patterns.<br />
13:30-16:30, Paper WeBCT9.19<br />
An Approach for Recognizing Text Labels in Raster Maps<br />
Chiang, Yao-Yi, USC ISI<br />
Knoblock, Craig, USC ISI<br />
Text labels in raster maps provide valuable geospatial information by associating geographical names with geospatial locations.<br />
Although present commercial optical character recognition (OCR) products can achieve a high recognition rate<br />
on documents containing text lines of the same orientation, text recognition on raster maps is challenging due to the varying<br />
text orientations and the overlap of text labels. This paper presents a text recognition approach that focuses on locating individual<br />
text labels in the map and detecting their orientations to then leverage the horizontal text recognition capability<br />
of commercial OCR software. We show that our approach detects accurate string orientations and achieves 96.2% precision<br />
and 94.7% recall on character recognition and 80.6% precision and 84.1% recall on word recognition.<br />
13:30-16:30, Paper WeBCT9.20<br />
Local Visual Pattern Indexing for Matching Screenshots with Videos<br />
Poullot, Sebastien, National Inst. of Informatics<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
In this paper, a particular issue is addressed: matching still images (screenshots) with videos. A content-based similarity<br />
search approach using image queries is proposed. A fast method based on local visual patterns both for matching and indexing<br />
is employed. However, we argue that using every frame may limit the scalability of the approach. Therefore, only<br />
keyframes are extracted and used. The main contribution of this paper is an investigation over the trade-off between accuracy<br />
and scalability using different keyframe rates for sampling the video database. This trade-off is evaluated on a<br />
ground truth using a large reference video database (1000 hours).<br />
13:30-16:30, Paper WeBCT9.21<br />
Suggesting Songs for Media Creation using Semantics<br />
Joshi, Dhiraj, Kodak Res. Lab.<br />
Wood, Mark, Eastman Kodak Company<br />
Luo, Jiebo, -<br />
In this paper, we describe a method for matching song lyrics with semantic annotations of picture collections in order to<br />
suggest songs that reflect picture content in lyrics or genre. Picture collections are first analyzed to extract a variety of semantic<br />
information including scene type, event type, and geospatial information. When aggregated over a picture collection,<br />
this semantic information forms a semantic signature of the collection. Typical picture collections in our scenario consist<br />
of photo subdirectories in which people store pictures of a place, activity, or event. Picture collections are expected to<br />
contain coherent semantic content describing in part or whole the event or activity they depict. The semantic signature of<br />
a picture collection is compared against song lyrics using WordNet-expansion-based text matching to find songs relevant<br />
to the collection. We present interesting song suggestions, compare and contrast scenarios with human versus machine labels,<br />
and perform a user study to validate the usefulness of the proposed method. The proposed method will be a useful<br />
tool to support user media creation.<br />
13:30-16:30, Paper WeBCT9.22<br />
Color Feature based Approach for Determining Ink Age in Printed Documents<br />
Halder, Biswajit, Mallabhum Inst. of Tech.<br />
Garain, Utpal, Indian Statistical Inst.<br />
Answering a query such as when a particular document was printed is quite helpful in practice, especially for forensic purposes.<br />
This study attempts to develop a general framework that makes use of image processing and pattern recognition principles<br />
for ink age determination in printed documents. The approach first computationally extracts a set of suitable color features<br />
and then analyzes them to properly associate them with ink age. Finally, a neural net is designed and trained to determine the<br />
ages of unknown samples. The dataset used for the present experiment consists of the cover pages of LIFE<br />
magazines published between the 1930s and 1970s (five decades). Test results show that the proposed framework is viable<br />
for involving machines in assisting human experts in determining the age of printed documents.<br />
13:30-16:30, Paper WeBCT9.23<br />
Automatic Detection and Localization of Natural Scene Text in Video<br />
Huang, Xiaodong, Beijing Univ. of Posts and Telecommunications<br />
Ma, Huadong, Beijing Univ. of Posts and Telecommunications<br />
Video scene text contains semantic information and thus can contribute significantly to video indexing and summarization.<br />
However, most previous approaches to detecting scene text in videos have difficulty handling text<br />
with varying character sizes and alignments. In this paper, we propose a novel algorithm for scene text detection and<br />
localization in video. Based on our observation that character strokes exhibit dense edge details in fixed orientations<br />
regardless of text alignment and size, a stroke map is first generated. For scene text detection, we extract<br />
the texture features of the stroke map to locate text lines. The detected scene text lines are then accurately localized using Harris<br />
corners in the stroke map. Experimental results show that this approach is robust and can be effectively applied to scene<br />
text detection and localization in video.<br />
13:30-16:30, Paper WeBCT9.24<br />
High-Level Feature Extraction using SIFT GMMs and Audio Models<br />
Inoue, Nakamasa, Tokyo Inst. of Tech.<br />
Saito, Tatsuhiko, Tokyo Inst. of Tech.<br />
Shinoda, Koichi, Tokyo Inst. of Tech.<br />
Furui, Sadaoki,<br />
We propose a statistical framework for high-level feature extraction that uses SIFT Gaussian mixture models (GMMs)<br />
and audio models. SIFT features were extracted from all the image frames and modeled by a GMM. In addition, we used<br />
mel-frequency cepstral coefficients and ergodic hidden Markov models to detect high-level features in audio streams. The<br />
best result obtained by using SIFT GMMs in terms of mean average precision on the TRECVID 2009 corpus was 0.150<br />
and was improved to 0.164 by using audio information.<br />
13:30-16:30, Paper WeBCT9.25<br />
Pairwise Features for Human Action Recognition<br />
Ta, Anh Phuong, Univ. de Lyon, CNRS, INSA-Lyon, LIRIS<br />
Wolf, Christian, INSA de Lyon<br />
Lavoue, Guillaume, Univ. de Lyon, CNRS<br />
Baskurt, Atilla, LIRIS, INSA Lyon<br />
Jolion, Jean-Michel, Univ. de Lyon<br />
Existing action recognition approaches mainly rely on the discriminative power of individual local descriptors extracted<br />
from spatio-temporal interest points (STIP), while the geometric relationships among the local features are ignored. This<br />
paper presents new features, called pairwise features (PWF), which encode both the appearance and the spatio-temporal<br />
relations of the local features for action recognition. First STIPs are extracted, then PWFs are constructed by grouping<br />
pairs of STIPs which are close in both space and time. We propose a combination of two codebooks for video<br />
representation. Experiments on two standard human action datasets: the KTH dataset and the Weizmann dataset show that<br />
the proposed approach outperforms most existing methods.<br />
13:30-16:30, Paper WeBCT9.26<br />
Group Activity Recognition by Gaussian Processes Estimation<br />
Cheng, Zhongwei, Chinese Acad. of Sciences<br />
Qin, Lei, Chinese Acad. of Sciences<br />
Huang, Qingming, Chinese Acad. of Sciences<br />
Jiang, Shuqiang, Chinese Acad. of Sciences<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Human action recognition has been well studied recently, but recognizing the activities of more than three persons remains<br />
a challenging task. In this paper, we propose a motion-trajectory-based method to classify human group activities. Gaussian<br />
processes are introduced to represent human motion trajectories from a probabilistic perspective, handling the variability<br />
of people's activities in groups. With respect to the relationships among persons in group activities, three discriminative descriptors<br />
are designed: the Individual, Dual, and Unitized Group Activity Patterns. We adopt the bag-of-words approach<br />
to solve the problem of the unbalanced number of persons across different activities. Experiments are conducted on the<br />
human group-activity video database, and the results show that our approach outperforms the state-of-the-art.<br />
13:30-16:30, Paper WeBCT9.27<br />
Extracting Captions in Complex Background from Videos<br />
Liu, Xiaoqian, Chinese Acad. of Sciences<br />
Wang, Weiqiang, Chinese Acad. of Sciences<br />
Captions in videos play a significant role for automatically understanding and indexing video content, since much semantic<br />
information is associated with them. This paper presents an effective approach to extracting captions from videos, in which<br />
multiple different categories of features (edge, color, stroke etc.) are utilized, and the spatio-temporal characteristics of<br />
captions are considered. First, our method exploits the distribution of gradient directions to temporally decompose a video<br />
into a sequence of clips, so that each clip contains at most one caption, which makes the subsequent extraction computation<br />
more efficient and accurate. For each clip, edge and corner information is then utilized to locate text regions. Further,<br />
text pixels are extracted based on the assumption that text pixels in text regions always have homogeneous color, and their<br />
quantity dominates the region relative to non-text pixels with different colors. Finally, the segmentation results are further<br />
refined. The encouraging experimental results on 2565 characters have preliminarily validated our approach.<br />
13:30-16:30, Paper WeBCT9.28<br />
Keyframe-Guided Automatic Non-Linear Video Editing<br />
Rajgopalan, Vaishnavi, Concordia Univ.<br />
Ranganathan, Ananth, Honda Res. Inst. USA<br />
Rajagopalan, Ramgopal, Res. in Motion<br />
Mudur, Sudhir, Concordia Univ.<br />
We describe a system for generating coherent movies from a collection of unedited videos. The generation process is<br />
guided by one or more input keyframes, which determine the content of the generated video. The basic mechanism involves<br />
similarity analysis using the histogram intersection function. The function is applied to spatial pyramid histograms computed<br />
on the video frames in the collection using Dense SIFT features. A two-directional greedy path finding algorithm is<br />
used to select and arrange frames from the collection while maintaining visual similarity, coherence, and continuity. Our<br />
system demonstrates promising results on large video collections and is a first step towards increased automation in nonlinear<br />
video editing.<br />
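The histogram intersection function mentioned above has a simple closed form: the sum of bin-wise minima of two histograms. A minimal sketch (variable names are illustrative; spatial pyramid binning and Dense SIFT extraction are assumed to happen upstream):

```python
import numpy as np

def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Histogram intersection similarity: sum of bin-wise minima.
    For L1-normalized histograms the value lies in [0, 1]."""
    return float(np.minimum(h1, h2).sum())

a = np.array([0.2, 0.5, 0.3])  # L1-normalized toy histograms
b = np.array([0.3, 0.4, 0.3])
sim = histogram_intersection(a, b)
```

Identical histograms score 1.0; disjoint ones score 0.0, so the measure is directly usable for ranking frames.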
13:30-16:30, Paper WeBCT9.29<br />
Images in News<br />
Sankaranarayanan, Jagan, Univ. of Maryland<br />
Samet, Hanan, Univ. of Maryland<br />
A system, called NewsStand, is introduced that automatically extracts images from news articles. The system takes RSS feeds of news<br />
articles and applies an online clustering algorithm so that articles belonging to the same news topic are associated with the same cluster.<br />
Using the feature vector associated with the cluster, the images from news articles that form the cluster are extracted. First, the caption text<br />
associated with each of the images embedded in the news article is determined. This is done by analyzing the structure of the news article’s<br />
HTML page. If the caption and feature vector of the cluster are found to contain keywords in common, then the image is added to an image<br />
repository. Additional meta-information is then associated with each image, such as the caption, cluster features, and names of people in the news<br />
article, etc. A very large repository containing more than 983k images from 12 million news articles was built using this approach. This<br />
repository also contained more than 86.8 million keywords associated with the images. The key contribution of this work is that it combines<br />
clustering and natural language processing tasks to automatically create a large corpus of news images with good quality tags or meta-information<br />
so that interesting vision tasks can be performed on it.<br />
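The keyword test that decides whether an extracted image enters the repository can be as simple as checking term overlap between the caption and the cluster's feature terms. A hedged sketch (the tokenization and the keyword set are illustrative assumptions):

```python
def image_matches_cluster(caption: str, cluster_keywords: set) -> bool:
    """Admit an image to the repository only if its caption shares at
    least one keyword with the cluster's feature terms."""
    caption_terms = {w.strip(".,").lower() for w in caption.split()}
    return bool(caption_terms & cluster_keywords)

keywords = {"earthquake", "chile", "santiago"}
hit = image_matches_cluster("Rescue teams in Santiago, Chile", keywords)
miss = image_matches_cluster("Stock markets rally", keywords)
```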
13:30-16:30, Paper WeBCT9.30<br />
A Multimodal Approach to Violence Detection in Video Sharing Sites<br />
Giannakopoulos, Theodoros, Univ. of Athens<br />
Pikrakis, Aggelos, Univ. of Piraeus<br />
Theodoridis, Sergios, Univ. of Athens<br />
This paper presents a method for detecting violent content in video sharing sites. The proposed approach operates on a fusion<br />
of three modalities: audio, moving image and text data, the latter being collected from the accompanying user comments.<br />
The problem is treated as a binary classification task (violent vs non-violent content) on a 9-dimensional feature<br />
space, where 7 out of 9 features are extracted from the audio stream. The proposed method has been evaluated on 210<br />
YouTube videos and the overall accuracy has reached 82%.<br />
13:30-16:30, Paper WeBCT9.31<br />
Video Retrieval based on Tracked Features Quantization<br />
Kubo, Hiroaki, Keio Univ.<br />
Pilet, Julien, Keio Univ.<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
Saito, Hideo, Keio Univ.<br />
In this paper, we present an image retrieval method based on feature tracking. Feature tracks are summarized into a compact<br />
discrete value and used for video indexing purposes. As opposed to existing space-time features, we make no assumptions<br />
about the motion visible in the indexed videos. As a result, given an example query, our system is able to retrieve<br />
related videos from a large database. We evaluated our system with the copy detection benchmark MUSCLE-VCD-2007.<br />
We also ran a retrieval experiment on hours of TV broadcasts.<br />
13:30-16:30, Paper WeBCT9.32<br />
Interactive Web Video Advertising with Context Analysis and Search<br />
Wang, Bo, Chinese Acad. of Sciences<br />
Wang, Jinqiao, Chinese Acad. of Sciences<br />
Duan, Lingyu, Peking Univ.<br />
Tian, Qi, Univ. of Texas at San Antonio<br />
Lu, Hanqing, Chinese Acad. of Sciences<br />
Gao, Wen, PeKing Univ.<br />
Online media services and electronic commerce have been booming recently. Previous studies have been devoted to contextual<br />
advertising, but few works deal with interactive web advertising. In this paper, we propose to put users in the loop of collecting<br />
contextual ad information through an interaction process, establishing semantic ad links across media platforms. Given<br />
an ad video, the key frames with explicit product information are located, allowing users to click favorite key frames<br />
to search ads interactively. A three-stage contextual search is applied to find relevant products or services from web<br />
pages, i.e., searching visually similar product images on shopping websites, ranking product tags by text aggregation, and<br />
re-searching textual items consisting of semantically meaningful tags to make a recommendation. In addition, users can choose<br />
automatically suggested keywords to reflect their intentions. Subjective evaluation has demonstrated the effectiveness of<br />
the proposed approach to interactive video advertising over the Web.<br />
13:30-16:30, Paper WeBCT9.33<br />
Selection of Photos for Album Building Applications<br />
Egorova, Marta, National Nuclear Res. Univ.<br />
Safonov, Ilia, National Nuclear Res. Univ.<br />
In this work, we propose a new algorithm for selecting high-quality photos for album building applications. We describe<br />
how to select features for detecting well-exposed, sharp, and artifact-free photos. We considered two approaches: the first is the<br />
typical way, in which all features are used in a single AdaBoost classifier committee; the second uses a decision tree<br />
comprising three committees. Careful analysis of the features and decision tree construction allowed better outcomes to be reached.<br />
13:30-16:30, Paper WeBCT9.34<br />
Comparison of Multidimensional Data Access Methods for Feature-Based Image Retrieval<br />
Arslan, Serdar, Middle East Tech. Univ.<br />
Açar, Esra, Middle East Tech. Univ.<br />
Saçan, Ahmet, Middle East Tech. Univ.<br />
Toroslu, Ismail Hakkı , Middle East Tech. Univ.<br />
Yazıcı, Adnan, Middle East Tech. Univ.<br />
Within the scope of information retrieval, efficient similarity search in large document or multimedia collections is a<br />
critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem,<br />
including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy<br />
trade-offs for each of these methods are demonstrated on a large Corel image database. Similarity of images is obtained<br />
via a feature-based similarity measure using four MPEG-7 low-level descriptors. We show that an optimization of feature<br />
contributions to the distance measure can identify irrelevant features and is necessary to obtain the maximum accuracy.<br />
We further show that using multidimensional scaling can achieve comparable accuracy, while speeding-up the query times<br />
significantly by allowing the use of spatial access methods.<br />
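Multidimensional scaling, as used above to enable spatial access methods, maps items into a low-dimensional Euclidean space whose pairwise distances approximate the original feature-based distances. A minimal classical (Torgerson) MDS sketch, which is one standard variant and not necessarily the one used in the paper:

```python
import numpy as np

def classical_mds(D2: np.ndarray, k: int) -> np.ndarray:
    """Embed n items into k dimensions from an n x n matrix of squared
    distances D2, via classical (Torgerson) multidimensional scaling."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # double-centering matrix
    B = -0.5 * J @ D2 @ J                    # Gram matrix of the embedding
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # keep largest eigenvalues
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

# Three points on a line are recovered from their squared distances.
pts = np.array([[0.0], [1.0], [3.0]])
D2 = (pts - pts.T) ** 2
X = classical_mds(D2, 1)
```

The embedded coordinates can then be indexed with any spatial access method (e.g., an R-tree) to speed up queries.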
13:30-16:30, Paper WeBCT9.35<br />
A Pixel-Based Evaluation Method for Text Detection in Color Images<br />
Anthimopoulos, Marios, National Center for Scientific Res. “Demokritos”<br />
Vlissidis, Nikolaos, National Center for Scientific Res. “Demokritos”<br />
Gatos, B., National Center for Scientific Res. “Demokritos”<br />
This paper proposes a performance evaluation method for text detection in color images. The method, contrary to previous<br />
approaches, is not based on loosely defined text bounding boxes for evaluating the text detection result, but<br />
considers only the text pixels, detected by binarizing the image and applying a color inversion if needed. Moreover, in<br />
order to gain independence from the chosen binarization algorithm, the method uses the skeleton of the binarized image.<br />
The results produced by the proposed evaluation protocol proved to be quite representative and reasonable compared to<br />
the corresponding visual results.<br />
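The evaluation above reduces to pixel-set precision and recall: precision over the detected text pixels, recall over the ground-truth skeleton pixels. A minimal sketch (representing pixels as coordinate sets is an illustrative choice):

```python
def pixel_precision_recall(detected: set, gt_skeleton: set):
    """Pixel-level evaluation: precision over detected text pixels,
    recall over ground-truth skeleton pixels ((x, y) tuples)."""
    tp = len(detected & gt_skeleton)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(gt_skeleton) if gt_skeleton else 0.0
    return precision, recall

det = {(0, 0), (0, 1), (1, 1), (2, 2)}
gt = {(0, 1), (1, 1), (5, 5)}
p, r = pixel_precision_recall(det, gt)
```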
13:30-16:30, Paper WeBCT9.36<br />
Active Boosting for Interactive Object Retrieval<br />
Lechervy, Alexis, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />
Gosselin, Philippe Henri, CNRS<br />
Precioso, Frederic, ETIS, CNRS, ENSEA, Univ. Cergy-Pontoise<br />
This paper presents a new algorithm based on boosting for interactive object retrieval in images. Recent works propose<br />
online boosting algorithms where weak classifier sets are iteratively trained from data. These algorithms are proposed for<br />
visual tracking in videos, and are not well adapted to online boosting for interactive retrieval. We propose in this paper to<br />
iteratively build weak classifiers from images, labeled as positive by the user during a retrieval session. A novel active<br />
learning strategy for selecting images for user annotation is also proposed. This strategy is used to enhance the<br />
strong classifier resulting from the boosting process, but also to build new weak classifiers. Experiments have been carried<br />
out on a generalist database in order to compare the proposed method to an SVM-based reference approach.<br />
13:30-16:30, Paper WeBCT9.37<br />
Geotagged Photo Recognition using Corresponding Aerial Photos with Multiple Kernel Learning<br />
Yaegashi, Keita, Univ. of Electro-Communications<br />
Yanai, Keiji, Univ. of Electro-Communications<br />
In this paper, we address generic object recognition for geotagged images. As a recognition method for geotagged photos,<br />
we have previously proposed exploiting aerial photos around geotagged places as additional image features for the visual recognition<br />
of geotagged photos. In the previous work, we simply concatenated the two kinds of features to fuse them. Instead, in this paper,<br />
we introduce Multiple Kernel Learning (MKL) to integrate the features of photos and aerial images. MKL can estimate<br />
the contribution weights for integrating both kinds of features. In the experiments, we confirmed the effectiveness of using<br />
aerial photos for the recognition of geotagged photos, and we evaluated the weights of both features estimated by MKL for<br />
eighteen concepts.<br />
13:30-16:30, Paper WeBCT9.38<br />
Efficient Semantic Indexing for Image Retrieval<br />
Pulla, Chandrika, International Inst. of Information Tech. Hyderabad<br />
Karthik, Suman, International Inst. of Information Tech. Hyderabad<br />
Jawahar, C. V., IIIT<br />
Semantic analysis of a document collection can be viewed as an unsupervised clustering of the constituent words and documents<br />
around hidden or latent concepts. This has been shown to improve the performance of visual bag-of-words in image retrieval.<br />
However, the enhancement in performance depends heavily on the right choice of the number of semantic concepts.<br />
Most semantic indexing schemes are also computationally costly. In this paper, we employ a bipartite graph model<br />
(BGM) for image retrieval. BGM is a scalable data structure that aids semantic indexing in an efficient manner, and it can<br />
be incrementally updated. BGM uses tf-idf values for building a semantic bipartite graph. We also introduce a<br />
graph partitioning algorithm that works on the BGM to retrieve semantically relevant images from a database. We demonstrate<br />
the properties as well as performance of our semantic indexing scheme through a series of experiments. We also<br />
compare our methods with incremental pLSA.<br />
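The tf-idf weights that BGM assigns to word-document edges follow the usual term-frequency times inverse-document-frequency form. A minimal sketch (the smoothed idf variant shown is one common choice, not necessarily the paper's exact formula):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document tf-idf weights, usable as edge weights in a
    word-document bipartite graph (smoothed idf variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # document frequency per word
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(1 + n / df[w])
                        for w, c in tf.items()})
    return weights

docs = [["sky", "tree", "sky"], ["tree", "car"], ["car", "road"]]
W = tf_idf(docs)
```

A word that is frequent in one document but rare across the collection ("sky" here) gets a higher weight than a widespread one ("tree").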
13:30-16:30, Paper WeBCT9.39<br />
Improving and Aligning Speech with Presentation Slides<br />
Swaminathan, Ranjini, Univ. of Arizona<br />
Thompson, Michael E., Univ. of Arizona<br />
Fong, Sandiway, Univ. of Arizona<br />
Efrat, Alon, Univ. of Arizona<br />
Amir, Arnon<br />
Barnard, Kobus, Univ. of Arizona<br />
We present a novel method to correct automatically generated speech transcripts of talks and lecture videos using text<br />
from accompanying presentation slides. The approach finesses the challenges of dealing with technical terms which are<br />
often outside the vocabulary of speech recognizers. Further, we align the transcript to the slide word sequence so that we<br />
can improve the organization of closed captioning for hearing impaired users, and improve automatic highlighting or magnification<br />
for visually impaired users. For each speech segment associated with a slide, we construct a sequential Hidden<br />
Markov Model for the observed phonemes that follows slide word order, interspersed with text not on the slide. Incongruence<br />
between slide words and mistaken transcript words is accounted for using phoneme confusion probabilities. Hence,<br />
transcript words different from aligned high probability slide words can be corrected. Experiments on six talks show improvement<br />
in transcript accuracy and alignment with slide words.<br />
13:30-16:30, Paper WeBCT9.40<br />
The ImageCLEF Medical Retrieval Task at ICPR 2010 - Information Fusion<br />
Kalpathy-Cramer, Jayashree, Oregon Health & Science Univ.<br />
Müller, Henning, Univ. of Applied Sciences<br />
An increasing number of clinicians, researchers, educators and patients routinely search for medical information on the<br />
Internet as well as in image archives. However, image retrieval is far less understood and developed than text-based search.<br />
The ImageCLEF medical image retrieval task is an international benchmark that enables researchers to assess and compare<br />
techniques for medical image retrieval using standard test collections. Although text retrieval is mature and well researched,<br />
it is limited by the quality and availability of the annotations associated with the images. Advances in computer vision<br />
have led to methods for using the image itself as the search entity. However, the success of purely content-based techniques<br />
has been limited, and these systems have not had much clinical success. On the other hand, a combination of text- and content-based<br />
retrieval can achieve improved retrieval performance if combined effectively. Based on experience in ImageCLEF, combining<br />
visual and textual runs is not trivial. The goal of the fusion challenge at ICPR is to encourage participants<br />
to combine visual and textual results to improve search performance. Participants were provided with textual and visual runs,<br />
as well as the results of the manual judgments from ImageCLEFmed 2008, as training data. The goal was to combine<br />
textual and visual runs from 2009. In this paper, we present the results from this ICPR contest.<br />
13:30-16:30, Paper WeBCT9.41<br />
Unified Approach to Detection and Identification of Commercial Films by Temporal Occurrence Pattern<br />
Putpuek, Narongsak, Chulalongkorn Univ.<br />
Cooharojananone, Nagul, Chulalongkorn Univ.<br />
Lursinsap, Chidchanok, Chulalongkorn Univ.<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
In this paper, we propose a method to detect and identify commercial films in broadcast videos by using the Temporal Occurrence<br />
Pattern (TOP). Our method uses the characteristic of broadcast videos in Japan that each individual commercial<br />
film appears multiple times in the broadcast stream and typically has the same duration (e.g., 15 seconds). Using this characteristic,<br />
the method can detect as well as identify individual commercial films within a given video archive. Based on a simple<br />
signature (global feature) for each frame image, the method first puts all frames into a number of buckets, where each bucket<br />
contains frames having the same signature and thus appearing the same. For each bucket, a TOP, a binary sequence<br />
representing the occurrence times within the video archive, is then generated. All buckets are then clustered using simple hierarchical<br />
clustering, with the similarity between TOPs allowing a possible temporal offset. This clustering stage can stitch together all<br />
frames of each commercial film and identify multiple occurrences of the same commercial film at the same time. We tested<br />
our method on an actual broadcast video archive and confirmed good performance in detecting and identifying commercial<br />
films.<br />
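A TOP is just a binary sequence marking when a bucket's signature occurs in the archive; comparing two TOPs while tolerating a temporal offset can be done by taking the best Jaccard score over small shifts. A hedged sketch (the offset-tolerant Jaccard form is an illustrative assumption, not the paper's exact similarity):

```python
def top_similarity(a, b, max_offset=2):
    """Best Jaccard similarity between two binary Temporal Occurrence
    Patterns over all temporal offsets in [-max_offset, max_offset]."""
    best, n = 0.0, len(a)
    for off in range(-max_offset, max_offset + 1):
        inter = union = 0
        for i in range(n):
            bj = b[i + off] if 0 <= i + off < n else 0
            inter += a[i] & bj
            union += a[i] | bj
        if union:
            best = max(best, inter / union)
    return best

a = [0, 1, 0, 0, 1, 0, 0, 1]
b = [1, 0, 0, 1, 0, 0, 1, 0]  # the same pattern, one step earlier
```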
Technical Program for Thursday<br />
August 26, 2010<br />
ThAT1 Marmara Hall<br />
Object Detection and Recognition - IV Regular Session<br />
Session chair: Lee, Kyoung Mu (Seoul National Univ.)<br />
09:00-09:20, Paper ThAT1.1<br />
Visual Recognition of Types of Structural Corridor Landmarks using Vanishing Points Detection and Hidden Markov<br />
Models<br />
Park, Young-Bin, Hanyang Univ.<br />
Kim, Sung-Su, Hanyang Univ.<br />
Suh, Il Hong, Hanyang Univ.<br />
In this paper, to provide a robot with information about the structure of its environment, we propose a method to recognize<br />
types of structural corridor landmarks, such as T-junctions, L-junctions, and corridor ends, using vanishing-point-based visual<br />
image features and hidden Markov models. Several experimental results are presented to demonstrate the validity of the proposed<br />
approach in a real environment.<br />
09:20-09:40, Paper ThAT1.2<br />
Multi-Object Segmentation in a Projection Plane using Subtraction Stereo<br />
Ubukata, Toru, Chuo University / CREST, JST<br />
Terabayashi, Kenji, Chuo Univ.<br />
Moro, Alessandro, Univ. of Trieste<br />
Umeda, Kazunori, Chuo Univ.<br />
We propose a method for multi-object segmentation in a projection plane. Our algorithm requires a stereo camera system<br />
called Subtraction Stereo, which extracts foreground information with a fixed stereo camera. The main contribution of this<br />
paper is how the image sequences that include partial occlusion of the foreground objects can be accurately segmented using<br />
mean shift clustering in real-time processing. The proposed method is suitable for medium-sized indoor environments, such<br />
as a room. Finally, we segment sequences that include occlusion and show the accuracy of the proposed method.<br />
09:40-10:00, Paper ThAT1.3<br />
Transitive Closure based Visual Words for Point Matching in Video Sequence<br />
Bhat, Srikrishna, INRIA<br />
Berger, Marie-Odile, INRIA<br />
Simon, Gilles, Nancy-Univ.<br />
Sur, Frédéric, INPL / INRIA Nancy Grand Est<br />
We present a transitive-closure-based visual word formation technique for obtaining robust object representations from<br />
smoothly varying multiple views. Each of our visual words is represented by a set of feature vectors, obtained<br />
by performing a transitive closure operation on SIFT features. We also present a range-reducing tree structure to speed up the<br />
transitive closure operation. The robustness of our visual word representation is demonstrated for Structure from Motion<br />
(SfM) and location identification in video images.<br />
10:00-10:20, Paper ThAT1.4<br />
Constrained Energy Minimization for Matching-Based Image Recognition<br />
Gass, Tobias, RWTH Aachen Univ.<br />
Dreuw, Philippe, RWTH Aachen Univ.<br />
Ney, Hermann, RWTH Aachen Univ.<br />
We propose to use energy minimization in MRFs for matching-based image recognition tasks. To this end, the Tree-<br />
Reweighted Message Passing algorithm is modified by geometric constraints and efficiently used by exploiting the guaranteed<br />
monotonicity of the lower bound within a nearest-neighbor based classification framework. The constraints allow for a<br />
speedup linear in the dimensionality of the reference image, and the lower bound allows optimal pruning of the nearest-neighbor<br />
search without losing accuracy, effectively allowing the number of optimization iterations to be increased without an<br />
effect on runtime. We evaluate our approach on well-known OCR and face recognition tasks, and on the latter we outperform<br />
the current state of the art.<br />
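The pruning idea described above — skip any reference whose lower bound on the matching energy already exceeds the best exact cost found so far — is the standard branch-and-bound trick for nearest-neighbor search. A minimal sketch (the callables stand in for TRW-S lower bounds and exact matching energies, which is an illustrative assumption):

```python
def prune_nn_search(references):
    """Nearest-neighbor search with lower-bound pruning: the exact
    matching cost is computed only for references whose lower bound
    can still beat the best exact cost found so far."""
    best_cost, best_idx = float("inf"), -1
    for i, (lower_bound, exact_cost) in enumerate(references):
        if lower_bound() >= best_cost:
            continue  # safely pruned: cannot improve on the best match
        cost = exact_cost()
        if cost < best_cost:
            best_cost, best_idx = cost, i
    return best_idx, best_cost

refs = [(lambda: 5.0, lambda: 7.0),
        (lambda: 1.0, lambda: 2.0),
        (lambda: 3.0, lambda: 3.5)]   # pruned: bound 3.0 >= best 2.0
idx, cost = prune_nn_search(refs)
```

Because the bound is a true lower bound, pruning never changes the returned nearest neighbor, only the amount of work done.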
10:20-10:40, Paper ThAT1.5<br />
A Re-Evaluation of Pedestrian Detection on Riemannian Manifolds<br />
Tosato, Diego, Univ. of Verona<br />
Farenzena, Michela, Univ. of Verona<br />
Cristani, Marco, Univ. of Verona<br />
Murino, Vittorio, Univ. of Verona<br />
Boosting covariance data on Riemannian manifolds has proven to be a convenient strategy in a pedestrian detection context.<br />
In this paper we show that the detection performances of the state-of-the-art approach of Tuzel et al. [7] can be greatly improved,<br />
from both a computational and a qualitative point of view, by considering practical and theoretical issues, and<br />
also allowing fine-grained estimation of occlusions. The resulting detection system reaches the best performance on the<br />
INRIA dataset, setting novel state-of-the-art results.<br />
ThAT2 Anadolu Auditorium<br />
Classification - I Regular Session<br />
Session chair: Duin, Robert (TU Delft)<br />
09:00-09:20, Paper ThAT2.1<br />
An Optimum Class-Rejective Decision Rule and its Evaluation<br />
Le Capitaine, Hoel, Univ. of La Rochelle<br />
Frelicot, Carl, Univ. of La Rochelle<br />
Decision-making systems aim to mimic human reasoning, which often consists of eliminating highly improbable situations<br />
(e.g. diseases, suspects) rather than selecting the most reliable ones. In this paper, we present the concept of class-rejective<br />
rules for pattern recognition. Contrary to usual reject-option schemes, where classes are selected when they may correspond<br />
to the true class of the input pattern, such a rule discards classes that cannot be the true one. Optimality of the rule is proven<br />
and an upper bound for the error probability is given. We also propose a criterion to evaluate such class-rejective rules. Classification<br />
results on artificial and real datasets are provided.<br />
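The elimination idea can be illustrated with a minimal sketch (a plain posterior-threshold rule, purely illustrative — not the authors' optimal rule, whose threshold is derived from the error-probability analysis):

```python
def reject_classes(posteriors, threshold=0.05):
    """Class-rejective decision: discard every class whose posterior
    probability is too low to correspond to the true class, and return
    the surviving candidate set (illustrative threshold rule)."""
    kept = {c for c, p in posteriors.items() if p >= threshold}
    # classes outside `kept` are eliminated, like ruling out suspects
    return kept
```

Unlike a classical reject option, the output is a set of still-plausible classes rather than a single accepted label.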
09:20-09:40, Paper ThAT2.2<br />
A Practical Heterogeneous Classifier for Relational Databases<br />
Manjunath, Geetha, Indian Inst. of Science<br />
M, Narasimha Murty, Indian Inst. of Science<br />
Sitaram, Dinkar, Hewlett Packard Company<br />
Most enterprise data is distributed in multiple relational databases with expert-designed schemas. Using traditional single-table<br />
machine learning techniques over such data not only incurs a computational penalty for converting to a flat form (mega-join),<br />
but also loses the human-specified semantic information present in the relations. In this paper, we present a two-phase<br />
hierarchical meta-classification algorithm for relational databases with a semantic divide and conquer approach. We propose<br />
a recursive, prediction aggregation technique over heterogeneous classifiers applied on individual database tables. A preliminary<br />
evaluation on TPCH and UCI benchmarks shows reduced training time without any loss of prediction accuracy.<br />
09:40-10:00, Paper ThAT2.3<br />
Spatial Representation for Efficient Sequence Classification<br />
Kuksa, Pavel, Rutgers Univ.<br />
Pavlovic, Vladimir, Rutgers Univ.<br />
We present a general, simple feature representation of sequences that allows efficient inexact matching, comparison and<br />
classification of sequential data. This approach, recently introduced for the problem of biological sequence classification,<br />
exploits a novel multi-scale representation of strings. The new representation leads to discovery of very efficient algorithms<br />
for string comparison, independent of the alphabet size. We show that these algorithms can be generalized to handle a wide<br />
gamut of sequence classification problems in diverse domains such as music and text sequence classification. The presented<br />
algorithms offer low computational cost and highly scalable implementations across different application domains.<br />
The new method demonstrates order-of-magnitude running-time improvements over existing state-of-the-art approaches<br />
while matching or exceeding their predictive accuracy.<br />
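A toy multi-scale string feature map gives the flavor of such a representation (counting substrings at several lengths and comparing by dot product — a simplified stand-in for the paper's representation, not its actual algorithm):

```python
from collections import Counter

def multiscale_features(seq, scales=(1, 2, 3)):
    """Count substrings of several lengths; the union of these counts
    forms a simple multi-scale representation of the sequence."""
    feats = Counter()
    for k in scales:
        for i in range(len(seq) - k + 1):
            feats[seq[i:i + k]] += 1
    return feats

def similarity(a, b):
    """Dot product of two sparse feature maps (a string-kernel value);
    the cost depends on the substrings present, not the alphabet size."""
    fa, fb = multiscale_features(a), multiscale_features(b)
    return sum(fa[s] * fb[s] for s in fa if s in fb)
```

The same code applies unchanged to music event sequences or text, since only substring identity matters.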
- 242 -
10:00-10:20, Paper ThAT2.4<br />
Rectifying Non-Euclidean Similarity Data using Ricci Flow Embedding<br />
Xu, Weiping, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
Wilson, Richard, Univ. of York<br />
Similarity based pattern recognition is concerned with the analysis of patterns that are specified in terms of object dissimilarity<br />
or proximity rather than ordinal values. For many types of data and measures, these dissimilarities are not Euclidean.<br />
This hinders the use of many machine-learning techniques. In this paper, we provide a means of correcting or rectifying<br />
the similarities so that the non-Euclidean artifacts are minimized. We consider the data to be embedded as points on a<br />
curved manifold and then evolve the manifold so as to increase its flatness. Our work uses the idea of Ricci flow on the<br />
constant curvature Riemannian manifold to modify the Gaussian curvatures on the edges of a graph representing the non-<br />
Euclidean data. We demonstrate the utility of our method on the standard "Chicken Pieces" dataset and show that we can<br />
transform the non-Euclidean distances into Euclidean space.<br />
10:20-10:40, Paper ThAT2.5<br />
One-Vs-All Training of Prototype Classifier for Pattern Classification and Retrieval<br />
Liu, Cheng-Lin, Chinese Acad. of Sciences<br />
Prototype classifiers trained with a multi-class classification objective are inferior in pattern retrieval and outlier rejection.<br />
To improve the binary classification (detection, verification, retrieval, outlier rejection) performance of prototype classifiers,<br />
we propose a one-vs-all training method, which enriches each prototype as a binary discriminant function with a local<br />
threshold, and optimizes both the prototype vectors and the thresholds on training data using a binary classification objective,<br />
the cross-entropy (CE). Experimental results on two OCR datasets show that prototype classifiers trained by the one-vs-all<br />
method are superior in both multi-class classification and binary classification.<br />
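The idea of a prototype enriched with a local threshold can be sketched as follows (a simplified illustration in which only the threshold is optimized by the CE gradient and the prototype vector is kept fixed, unlike the full method):

```python
import math

def train_threshold(prototype, samples, labels, lr=0.1, epochs=200):
    """Learn a local threshold t so that sigmoid(t - ||x - p||^2) acts
    as a binary discriminant for the prototype's class, by minimizing
    the cross-entropy (simplified: the prototype itself is fixed)."""
    t = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):            # y in {0, 1}
            d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))
            p = 1.0 / (1.0 + math.exp(-(t - d2)))    # acceptance probability
            t -= lr * (p - y)                        # CE gradient w.r.t. t
    return t

def accepts(prototype, t, x):
    """Binary decision: accept x if it is closer than the learned threshold."""
    d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))
    return t - d2 > 0
```

The local threshold is what turns a multi-class prototype into a detector that can also reject outliers.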
ThAT3 Topkapı Hall A<br />
Computer Vision Applications - I Regular Session<br />
Session chair: Haindl, Michael (Institute of Information Theory)<br />
09:00-09:20, Paper ThAT3.1<br />
Probabilistic Modeling of Dynamic Traffic Flow across Non-Overlapping Camera Views<br />
Huang, Ching-Chun, National Chiao Tung University<br />
Chiu, Wei-Chen, Department of Computer Science<br />
Wang, Sheng-Jyh, National Chiao Tung Univ.<br />
Chuang, Jen-Hui, National Chiao Tung Univ.<br />
In this paper, we propose a probabilistic method to model the dynamic traffic flow across non-overlapping camera views.<br />
By assuming the transition time of object movement follows a certain global model, we may infer the time-varying traffic<br />
status in the unseen region without performing explicit object correspondence between camera views. In this paper, we<br />
model object correspondence and parameter estimation as a unified problem under the proposed Expectation-Maximization<br />
(EM) based framework. By treating object correspondence as a latent random variable, the proposed framework can iteratively<br />
search for the optimal model parameters with the implicit consideration of object correspondence.<br />
09:20-09:40, Paper ThAT3.2<br />
Vehicle Recognition as Changes in Satellite Imagery<br />
Ozcanli, Ozge Can, Brown Univ.<br />
Mundy, Joseph,<br />
Over the last several years, a new probabilistic representation for 3-d volumetric modeling has been developed. The main purpose of the<br />
model is to detect deviations from the normal appearance and geometry of the scene, i.e. change detection. In this paper, the model is<br />
utilized to characterize changes in the scene as vehicles. In the training stage, a compositional part hierarchy is learned to represent the<br />
geometry of Gaussian intensity extrema primitives exhibited by vehicles. In the test stage, the learned compositional model produces vehicle<br />
detections. Vehicle recognition performance is measured on low-resolution satellite imagery and detection accuracy is significantly improved<br />
over the initial change map given by the 3-d volumetric model. A PCA-based Bayesian recognition algorithm is implemented for comparison,<br />
which exhibits worse performance than the proposed method.<br />
- 243 -
09:40-10:00, Paper ThAT3.3<br />
Crowd Motion Analysis using Linear Cyclic Pursuit<br />
Viswanathan, Srikrishnan, I.I.T Bombay<br />
Chaudhuri, Subhasis, IIT<br />
Crowd motion analysis, where there is interdependence amongst the constituent elements, is a relatively unexplored application<br />
area in computer vision. In this work, we propose a fast method for short-term crowd motion prediction using a<br />
sparse set of particles. We study the dynamics of a crowd motion model and linear cyclic pursuit. We show that linear<br />
cyclic pursuit naturally captures the repulsive and attractive forces acting on the individual crowd member. The pursuit<br />
parameters are estimated from videos in an online manner using a feature tracker. Short-term trajectory prediction is done<br />
by numerical solution of the estimated cyclic pursuit equation. We demonstrate the suitability of the proposed technique<br />
through extensive experimentation.<br />
10:00-10:20, Paper ThAT3.4<br />
Integrating Object Detection with 3D Tracking towards a Better Driver Assistance System<br />
Prisacariu, Victor Adrian, Univ. of Oxford<br />
Timofte, Radu, Katholieke Univ. Leuven<br />
Zimmermann, Karel, Katholieke Univ. Leuven<br />
Reid, Ian,<br />
Van Gool, Luc<br />
Driver assistance helps save lives. Accurate 3D pose is required to establish whether a traffic sign is relevant to the driver. We<br />
propose a real-time system that integrates single-view detection with region-based 3D tracking of road signs. The optimal<br />
set of candidate detections is found using AdaBoost cascades followed by SVMs. The 2D detections are then employed in<br />
simultaneous 2D segmentation and 3D pose tracking, using the known 3D model of the recognised traffic sign. We demonstrate<br />
the abilities of our system by tracking multiple road signs in real-world scenarios.<br />
10:20-10:40, Paper ThAT3.5<br />
Real-Time Automatic Traffic Accident Recognition using HFG<br />
Bakheet, Samy, Otto-von-Guericke Univ. Magdeburg<br />
Al-Hamadi, Ayoub, Otto-von-Guericke Univ. Magdeburg<br />
Michaelis, Bernd, Otto-von-Guericke Univ. Magdeburg<br />
Sayed, Usama, Otto-von-Guericke Univ. Magdeburg<br />
Recently, the problem of automatic traffic accident recognition has attracted the machine vision community due to its<br />
implications for the development of autonomous Intelligent Transportation Systems (ITS). In this paper, a new framework<br />
for real-time automated traffic accident recognition using the Histogram of Flow Gradient (HFG) is proposed. This framework<br />
performs two major steps. First, HFG-based features are extracted from video shots. Second, logistic regression is employed<br />
to model the probability of occurrence of an accident by fitting the data to a logistic curve. If an accident occurs,<br />
the trajectory of the vehicle that caused the accident is determined. Preliminary results on real<br />
video sequences confirm the effectiveness and applicability of the proposed approach, which can offer delay guarantees<br />
for real-time surveillance and monitoring scenarios.<br />
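The second step — fitting a logistic curve to map features to an accident probability — can be sketched on a one-dimensional toy feature (the HFG extraction itself is omitted; feature values here are synthetic, not from the paper):

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit P(accident | feature) = sigmoid(w*x + b) by gradient descent
    on the log-loss (a 1-D stand-in for the paper's HFG feature vectors)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):          # y = 1 for accident shots
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x         # log-loss gradient step
            b -= lr * (p - y)
    return w, b

def accident_probability(w, b, x):
    """Evaluate the fitted logistic curve at a feature value x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

A video shot would then be flagged when its predicted probability crosses a chosen operating threshold.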
ThAT4 Dolmabahçe Hall A<br />
Semi-Supervised and Metric Learning Regular Session<br />
Session chair: Sanfeliu, Alberto (Universitat Politecnica de Catalunya)<br />
09:00-09:20, Paper ThAT4.1<br />
Semi-Supervised Distance Metric Learning by Quadratic Programming<br />
Cevikalp, Hakan, Eskisehir Osmangazi Univ.<br />
This paper introduces a semi-supervised distance metric learning algorithm which uses pair-wise equivalence (similarity<br />
and dissimilarity) constraints to improve the original distance metric in lower-dimensional input spaces. We restrict ourselves<br />
to pseudo-metrics that are in quadratic forms parameterized by positive semi-definite matrices. The proposed method<br />
works in both the input space and the kernel-induced feature space, and learning the distance metric is formulated as a quadratic<br />
optimization problem which returns a globally optimal solution. Experimental results on several databases show that the<br />
learned distance metric improves the performance of the subsequent classification and clustering algorithms.<br />
09:20-09:40, Paper ThAT4.2<br />
A Comparative Study on the Use of an Ensemble of Feature Extractors for the Automatic Design of Local Image Descriptors<br />
Carneiro, Gustavo, Tech. Univ. of Lisbon<br />
The use of an ensemble of feature spaces trained with distance metric learning methods has been empirically shown to be<br />
useful for the task of automatically designing local image descriptors. In this paper, we present a quantitative analysis<br />
which shows that in general, nonlinear distance metric learning methods provide better results than linear methods for automatically<br />
designing local image descriptors. In addition, we show that the learned feature spaces yield better results<br />
than state-of-the-art hand-designed features in benchmark quantitative comparisons. We discuss the results and suggest<br />
relevant problems for further investigation.<br />
09:40-10:00, Paper ThAT4.3<br />
A Study on Combining Sets of Differently Measured Dissimilarities<br />
Ibba, Alessandro, Delft Univ. of Tech.<br />
Duin, Robert, Delft Univ. of Tech.<br />
Lee, Wan-Jui, Delft Univ. of Tech.<br />
The ways distances are computed or measured enable us to have different representations of the same objects. In this paper<br />
we want to discuss possible ways of merging different sources of information given by differently measured dissimilarity<br />
representations. We compare here a simple averaging scheme [1] with dissimilarity forward selection and other techniques<br />
based on the learning of weights of linear and quadratic forms. Our general conclusion is that, although the more advanced<br />
forms of combination cannot always lead to better classification accuracies, combining given distance matrices prior to<br />
training is always worthwhile. We can thereby suggest which combination schemes are preferable with respect to the problem<br />
data.<br />
10:00-10:20, Paper ThAT4.4<br />
Efficient Kernel Learning from Constraints and Unlabeled Data<br />
Soleymani Baghshah, Mahdieh, Sharif Univ. of Tech.<br />
Bagheri Shouraki, Saeed, Sharif Univ. of Tech.<br />
Recently, distance metric learning has received increasing attention and has been found to be a powerful approach for semi-supervised<br />
learning tasks. In the last few years, several methods have been proposed for metric learning when must-link<br />
and/or cannot-link constraints are available as supervisory information. Although many of these methods learn global Mahalanobis<br />
metrics, some recently introduced methods have tried to learn more flexible distance metrics using a kernel-based<br />
approach. In this paper, we consider the problem of kernel learning from both pairwise constraints and unlabeled<br />
data. We propose a method that adapts a flexible distance metric via learning a nonparametric kernel matrix. We formulate<br />
our method as an optimization problem that can be solved efficiently. Experimental evaluations show the effectiveness of<br />
our method compared to some recently introduced methods on a variety of data sets.<br />
10:20-10:40, Paper ThAT4.5<br />
Semi-Supervised Graph Learning: Near Strangers or Distant Relatives<br />
Chen, Weifu, Sun Yat-sen Univ.<br />
Feng, Guocan, Sun Yat-Sen Univ.<br />
In this paper, an easily implemented semi-supervised graph learning method is presented for dimensionality reduction and<br />
clustering, making the most of the prior knowledge in limited pairwise constraints. We extend instance-level constraints to<br />
space-level constraints to construct a more meaningful graph. By decomposing the (normalized) Laplacian matrix of this<br />
graph, the bottom eigenvectors lead to new representations of the data, which are expected to capture the intrinsic<br />
structure. The proposed method improves upon previous constrained learning methods. Furthermore, to achieve a given<br />
clustering accuracy, fewer constraints are required in our method. Experimental results demonstrate the advantages of the<br />
proposed method.<br />
- 245 -
ThAT5 Dolmabahçe Hall B<br />
Image Segmentation - I Regular Session<br />
Session chair: Puig, Domenec (Univ. Rovira i Virgili)<br />
09:00-09:20, Paper ThAT5.1<br />
Robust Color Image Segmentation through Tensor Voting<br />
Moreno, Rodrigo, Rovira i Virgili Univ.<br />
Garcia Garcia, Miguel Angel, Autonomous Univ. of Madrid<br />
Puig, Domenec, Univ. Rovira i Virgili<br />
This paper presents a new method for robust color image segmentation based on tensor voting, a robust perceptual grouping<br />
technique used to extract salient information from noisy data. First, an adaptation of tensor voting to both image denoising<br />
and robust edge detection is applied. Second, pixels in the filtered image are classified into likely-homogeneous and likely-inhomogeneous<br />
by means of the edginess maps generated in the first step. Third, the likely-homogeneous pixels are segmented<br />
through an efficient graph-based segmenter. Finally, a modified version of the same graph-based segmenter is<br />
applied to the likely-inhomogeneous pixels in order to obtain the final segmentation. Experiments show that the proposed<br />
algorithm performs better than the state of the art.<br />
09:20-09:40, Paper ThAT5.2<br />
An Improved Fluid Vector Flow for Cavity Segmentation in Chest Radiographs<br />
Xu, Tao, Univ. of Alberta<br />
Cheng, Irene, Univ. of Alberta<br />
Mandal, Mrinal, Univ. of Alberta<br />
Fluid vector flow (FVF) is a recently developed edge-based parametric active contour model for segmentation. While keeping<br />
its merits of a large capture range and the ability to handle acute concave shapes, we improve the model in two aspects:<br />
edge leakage and control point selection. Experimental results of cavity segmentation in chest radiographs show that the<br />
proposed method provides at least an 8% improvement over the original FVF method.<br />
09:40-10:00, Paper ThAT5.3<br />
Patchy Aurora Image Segmentation based on ALBP and Block Threshold<br />
Fu, Rong, Xidian Univ.<br />
Gao, Xinbo, Xidian Univ.<br />
Jian, Yongjun, Xidian Univ.<br />
The proportion of the aurora region to the field of view is an important index for measuring the range and scale of aurorae. A<br />
crucial step in obtaining the index is to segment the aurora region from the background. A simple and efficient aurora image segmentation<br />
algorithm is proposed, which is composed of feature representation based on adaptive local binary patterns<br />
(ALBP) and aurora region estimation through block thresholding. First, the ALBP features of the sky image are extracted and the<br />
threshold is determined. The aurora image to be segmented is then equally divided into detection blocks, from which ALBP<br />
features are also extracted. An aurora block is identified by comparing its ALBP features with the threshold. Despite its simplicity,<br />
the method makes processing of huge data sets possible. Experiments illustrate that the segmentation results of the proposed<br />
method are satisfying in terms of both human visual assessment and segmentation accuracy.<br />
10:00-10:20, Paper ThAT5.4<br />
Retinal Image Segmentation based on Mumford-Shah Model and Gabor Wavelet Filter<br />
Du, Xiaojun, Concordia Univ.<br />
Bui, Tien D., Concordia Univ.<br />
Automatic retinal image segmentation is desirable for the diagnosis of diseases such as diabetes. In this paper, we propose<br />
a new image segmentation method to segment retinal images. The new method is based on the Mumford-Shah (MS)<br />
model. As a region-based approach, the MS model is a good segmentation technique. However, due to non-uniform illumination,<br />
some traditional approximations of the MS model cannot deal with this type of problem. We present a new<br />
method that requires no approximations. Instead, a Gabor wavelet filter is used, and the method can segment objects with<br />
complicated image intensity distributions. The method is used to detect blood vessels in retinal images. The results are<br />
comparable with or better than the state of the art. Our method requires no training and is relatively fast.<br />
- 246 -
10:20-10:40, Paper ThAT5.5<br />
On Selecting an Optimal Number of Clusters for Color Image Segmentation<br />
Le Capitaine, Hoel, Univ. of La Rochelle<br />
Frelicot, Carl, Univ. of La Rochelle<br />
This paper addresses the problem of region-based color image segmentation using a fuzzy clustering algorithm, e.g. a<br />
spatial version of fuzzy c-means, in order to partition the image into clusters corresponding to homogeneous regions. We<br />
propose to determine the optimal number of clusters, and so the number of regions, by using a new cluster validity index<br />
computed on fuzzy partitions. Experimental results and comparison with other existing methods show the validity and the<br />
efficiency of the proposed method.<br />
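The selection loop described above — cluster for each candidate count and keep the count maximizing a validity index — can be sketched with a minimal fuzzy c-means and the classical partition coefficient (an illustrative stand-in for the paper's new index, and a plain rather than spatial FCM):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns the membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        centers = (U ** m).T @ X / (U ** m).sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        # standard FCM membership update: u_ik = 1 / sum_j (d_ik/d_ij)^p
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
    return U

def best_cluster_count(X, candidates=range(2, 6)):
    """Pick the number of clusters maximizing the partition coefficient
    (mean squared membership), a classical validity index used here as
    a stand-in for the paper's new one."""
    scores = {c: (fuzzy_cmeans(X, c) ** 2).sum() / len(X) for c in candidates}
    return max(scores, key=scores.get)
```

The chosen count then fixes the number of homogeneous regions in the final segmentation.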
ThAT6 Topkapı Hall B<br />
Face Ageing Regular Session<br />
Session chair: Yanikoglu, Berrin (Sabanci Univ.)<br />
09:00-09:20, Paper ThAT6.1<br />
Cross-Age Face Recognition on a Very Large Database: The Performance versus Age Intervals and Improvement<br />
using Soft Biometric Traits<br />
Guo, Guodong, West Virginia Univ.<br />
Mu, Guowang, North Carolina Central Univ.<br />
Ricanek, Karl, Univ. of North Carolina<br />
Facial aging can degrade face recognition performance dramatically. Traditional face recognition studies focus on<br />
dealing with pose, illumination, and expression (PIE) changes. Considering a large span of age difference, the influence<br />
of facial aging could be very significant compared to the PIE variations. How big could the aging influence be? What is<br />
the relation between recognition accuracy and age intervals? Can soft biometrics be used to improve face recognition<br />
performance under age variations? In this paper we address all these issues. First, we investigate the face recognition performance<br />
degradation with respect to age intervals between the probe and gallery images on a very large database which<br />
contains about 55,000 face images of more than 13,000 individuals. Second, we study if soft biometric traits, e.g., race,<br />
gender, height, and weight, could be used to improve the cross-age face recognition accuracies, and how useful each of<br />
them could be.<br />
09:20-09:40, Paper ThAT6.2<br />
A Ranking Approach for Human Age Estimation based on Face Images<br />
Chang, Kuang-Yu, Acad. Sinica<br />
Chen, Chu-Song, Acad. Sinica<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
In our daily life, it is much easier to judge which of two persons is older than to tell how old a person is. When<br />
inferring a person’s age, we may compare his or her face with many people whose ages are known, resulting in a series of<br />
comparative results, and then conjecture the age based on the comparisons. This process involves a large amount of pairwise<br />
preference information obtained by a series of queries, where each query compares the target person’s face to those faces<br />
in a database. In this paper, we propose a ranking-based framework consisting of a set of binary queries. Each query<br />
collects a binary-classification-based comparison result. All the query results are then fused to predict the age. Experimental<br />
results show that our approach performs better than traditional multi-class-based and regression-based approaches for age<br />
estimation.<br />
09:40-10:00, Paper ThAT6.3<br />
Perceived Age Estimation under Lighting Condition Change by Covariate Shift Adaptation<br />
Ueki, Kazuya, NEC Soft, Ltd.<br />
Sugiyama, Masashi, Tokyo Inst. of Tech.<br />
Ihara, Yasuyuki, NEC Soft, Ltd.<br />
Over recent years, a great deal of effort has been devoted to age estimation from face images. It has been reported that<br />
age can be accurately estimated under controlled environments such as frontal faces, no expression, and static lighting conditions.<br />
However, it is not straightforward to achieve the same accuracy level in real-world environments because of considerable<br />
variations in camera settings, facial poses, and illumination conditions. In this paper, we apply a recently proposed<br />
machine learning technique called covariate shift adaptation to alleviate lighting condition changes between laboratory<br />
and practical environments. Through real-world age estimation experiments, we demonstrate the usefulness of the proposed<br />
method.<br />
10:00-10:20, Paper ThAT6.4<br />
Ranking Model for Facial Age Estimation<br />
Yang, Peng, Rutgers Univ.<br />
Lin, Zhong, Rutgers Univ.<br />
Metaxas, Dimitris, Rutgers Univ.<br />
Feature design and feature selection are two key problems in facial-image-based age perception. In this paper, we propose<br />
to use a ranking model to perform feature selection on Haar-like features. In order to build the pairwise samples for the ranking<br />
model, age sequences are organized by personal aging pattern within each subject. The pairwise samples are extracted<br />
from the sequence of each subject; therefore, the order information is intuitively contained in the pairwise data. The ranking<br />
model is used to select the discriminative features based on the pairwise data. The combination of the ranking model and<br />
the personal aging pattern is powerful for selecting the discriminative features for age estimation. Based on the selected features,<br />
different kinds of regression models are used to build prediction models. The experimental results show that the performance of<br />
our method is comparable to state-of-the-art work.<br />
10:20-10:40, Paper ThAT6.5<br />
Development of Recognition Engine for Baby Faces<br />
Di, Wen, Tsinghua Univ.<br />
Zhang, Tong, Hewlett-Packard Lab.<br />
Fang, Chi, Tsinghua Univ.<br />
Ding, Xiaoqing, Tsinghua Univ.<br />
Existing face recognition approaches are mostly developed based on adult faces and may not work well in distinguishing<br />
the faces of kids. In particular, baby faces tend to have common features such as round cheeks and chins, so current face<br />
recognition engines often fail to differentiate them. In this paper, we present methods for discriminating baby faces from<br />
adult faces, and for training a special engine to recognize the faces of different babies. To achieve these, we collected a huge<br />
number of baby face images and developed a software system to annotate the image database. Experimental results show<br />
that the trained baby face recognizer achieves a dramatic improvement in differentiating baby faces, and that its fusion<br />
with the conventional adult face recognition engine also works well on the overall data set containing both baby and adult<br />
faces.<br />
ThAT7 Dolmabahçe Hall C<br />
Document Retrieval Regular Session<br />
Session chair: Faruquie, Tanveer (IBM Res. India)<br />
09:00-09:20, Paper ThAT7.1<br />
An Information Extraction Model for Unconstrained Handwritten Documents<br />
Thomas, Simon, LITIS<br />
Chatelain, Clement, LITIS Lab. INSA de Rouen<br />
Heutte, Laurent, Univ. de Rouen<br />
Paquet, Thierry, Univ. of Rouen<br />
In this paper, a new information extraction system based on statistical shallow parsing of unconstrained handwritten documents<br />
is introduced. Unlike classical approaches found in the literature, such as keyword spotting or full document recognition, our<br />
approach relies on a strong and powerful global handwriting model. An entire text line is considered as an indivisible entity<br />
and is modeled with Hidden Markov Models. In this way, text line shallow parsing allows fast extraction of the relevant<br />
information in any document while rejecting irrelevant information at the same time. First results are promising and show<br />
the interest of the approach.<br />
- 248 -
09:20-09:40, Paper ThAT7.2<br />
HMM-Based Word Spotting in Handwritten Documents using Subword Models<br />
Fischer, Andreas, Univ. of Bern<br />
Keller, Andreas, Univ. of Bern<br />
Frinken, Volkmar, Univ. of Bern<br />
Bunke, Horst, Univ. of Bern<br />
Handwritten word spotting aims at making document images amenable to browsing and searching by keyword retrieval.<br />
In this paper, we present a word spotting system based on Hidden Markov Models (HMM) that uses trained subword models<br />
to spot keywords. With the proposed method, arbitrary keywords can be spotted that do not need to be present in the<br />
training set. Also, no text line segmentation is required. On the modern IAM off-line database and the historical George<br />
Washington database we show that the proposed system outperforms a standard template matching approach based on dynamic<br />
time warping (DTW).<br />
09:40-10:00, Paper ThAT7.3<br />
A Content Spotting System for Line Drawing Graphic Document Images<br />
Luqman, Muhammad Muzzamil, Univ. François Rabelais de Tours, France; CVC Barcelona<br />
Brouard, Thierry, Univ. François Rabelais de Tours, France<br />
Ramel, Jean-Yves, Univ. François Rabelais de Tours<br />
Llados, Josep, Computer Vision Center<br />
We present a content spotting system for line drawing graphic document images. The proposed system is largely domain<br />
independent and takes keyword-based information retrieval for graphic documents one step forward, to Query<br />
By Example (QBE) and focused retrieval. During the offline learning mode, we vectorize the documents in the repository,<br />
represent them by attributed relational graphs, extract regions of interest (ROIs) from them, convert each ROI to a fuzzy<br />
structural signature, cluster similar signatures to form ROI classes, and build an index for the repository. During the online<br />
querying mode, a Bayesian network classifier recognizes the ROIs in the query image and the corresponding documents<br />
are fetched by looking up in the repository index. Experimental results are presented for synthetic images of architectural<br />
and electronic documents.<br />
10:00-10:20, Paper ThAT7.4<br />
Toward Massive Scalability in Image Matching<br />
Moraleda, Jorge, Ricoh Innovations Inc.<br />
Hull, Jonathan, Ricoh<br />
A method for image matching from partial blurry images is presented that leverages existing text retrieval algorithms to<br />
provide a solution that scales to hundreds of thousands of images. As an initial application, we present a document image<br />
matching system in which the user supplies a query image of a small patch of a paper document taken with a cell phone<br />
camera, and the system returns a label identifying the original electronic document if found in a previously indexed collection.<br />
Experimental results show that a retrieval rate of over 70% is achieved on a collection of nearly 500,000 document<br />
pages.<br />
10:20-10:40, Paper ThAT7.5<br />
Learning Image Anchor Templates for Document Classification and Data Extraction<br />
Sarkar, Prateek, Palo Alto Res. Center<br />
Image anchor templates are used in document image analysis for document classification, data localization, and other<br />
tasks. Current tools allow human operators to mark out small sub-images from documents to act as anchor templates.<br />
However, this requires time and expertise, because operators have to make informed decisions based on the behavior of the<br />
template matching algorithms and the expected degradation patterns in documents. We propose learning templates for a<br />
task automatically and quickly from a few training examples. Document classification or data localization can then be done<br />
more robustly by combining evidence from many more discriminating templates (e.g., hundreds) than would be practicable<br />
for operators to specify.<br />
- 249 -
ThAT8 Upper Foyer<br />
Image Analysis; Scene Understanding; Shape Modeling; Tracking and Surveillance; Vision Sensors<br />
Poster Session<br />
Session chair: Gimel’farb, Georgy (Univ. of Auckland)<br />
09:00-11:10, Paper ThAT8.2<br />
Sparse Embedding Visual Attention Systems Combined with Edge Information<br />
Zhao, Cairong, Nanjing Univ. of Science and Tech.<br />
Liu, ChuanCai, Nanjing Univ. of Science and Tech.<br />
Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />
Yang, Jingyu, Nanjing Univ. of Science and Tech.<br />
General computational models of visual attention obtain multi-scale feature maps in terms of visual properties<br />
such as intensity, color, and orientation, and then combine them into one saliency map. However, due to the lack of object edge information<br />
and a reasonable feature combination strategy, the resulting visual saliency map of the image is blurred. Aware<br />
of this, we propose a new scheme for saliency extraction. In this paper, we first put forward a sparse embedding feature<br />
combination strategy, inspired by sparse representation. The strategy is used to combine the salient regions from the individual<br />
feature maps based on a novel feature sparse indicator that measures the contribution of each map to saliency. Then<br />
we combine traditional visual attention with edge information. Results on different scene images show that our method<br />
outperforms other traditional feature combination strategies.<br />
09:00-11:10, Paper ThAT8.4<br />
LLN-Based Model-Driven Validation of Data Points for Random Sample Consensus Methods<br />
Zhang, Liang, Communications Res. Centre Canada<br />
Wang, Demin, Communications Res. Centre Canada<br />
This paper presents an on-the-fly model-driven validation of data points for random sample consensus (RANSAC) methods.<br />
The novelty resides in the idea that an analysis of the outcomes of previous random model samplings can benefit subsequent<br />
samplings. Given a sequence of successful model samplings, information from the inlier sets and the model errors is used<br />
to estimate the validity of each data point. This validity is used to guide subsequent model samplings, so that data points<br />
with higher validity have more chance of being selected. To evaluate the performance, the proposed method is applied to<br />
line model fitting and fundamental matrix estimation. Experimental results confirm that the<br />
proposed algorithm improves the performance of RANSAC in terms of estimation accuracy and the number of samplings.<br />
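As an illustration of validity-guided sampling, the sketch below applies the idea to line fitting. The weighting scheme, inlier threshold, and additive validity update are illustrative assumptions, not the authors' exact formulation.<br />

```python
import random
import math

def fit_line(p, q):
    """Return the normalized line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    n = math.hypot(a, b)
    return a / n, b / n, -(a * x1 + b * y1) / n

def weighted_ransac_line(points, iters=200, thresh=0.1, seed=0):
    """RANSAC line fitting with validity-guided sampling: each point's
    validity starts uniform and is increased whenever the point appears in
    the inlier set of a successful sampling, so later samplings favor it."""
    rng = random.Random(seed)
    validity = [1.0] * len(points)
    best_line, best_inliers = None, []
    for _ in range(iters):
        i, j = rng.choices(range(len(points)), weights=validity, k=2)
        if i == j or points[i] == points[j]:
            continue
        a, b, c = line = fit_line(points[i], points[j])
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) < thresh]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
            for k in inliers:  # reward points consistent with the best model so far
                validity[k] += 1.0
    return best_line, best_inliers
```

On a toy set of ten collinear points plus a few gross outliers, the guided sampler recovers the line while progressively concentrating samples on the consistent points.<br />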
09:00-11:10, Paper ThAT8.5<br />
Estimating 3D Human Pose from Single Images using Iterative Refinement of the Prior<br />
Daubney, Ben Christopher, Swansea Univ.<br />
Xie, Xianghua, Swansea Univ.<br />
This paper proposes a generative method to extract 3D human pose from just a single image. Unlike many existing approaches,<br />
we assume that accurate foreground/background segmentation is not possible, and we do not use binary silhouettes.<br />
A stochastic method is used to search the pose space, and the posterior distribution is maximized using Expectation Maximization<br />
(EM). We assume some a priori knowledge about the position, scale, and orientation of the person<br />
present, and we specifically develop an approach to exploit this. The result is that we can learn a more constrained prior<br />
without having to sacrifice its generality to a specific action type. A single prior is learnt using all actions in the HumanEva<br />
dataset [9], and we provide quantitative results for images selected across all action categories and subjects, captured<br />
from differing viewpoints.<br />
09:00-11:10, Paper ThAT8.6<br />
Human-Area Segmentation by Selecting Similar Silhouette Images based on Weak-Classifier Response<br />
Ando, Hiroaki, Chubu Univ.<br />
Fujiyoshi, Hironobu, Chubu Univ.<br />
Human-area segmentation is a major issue in video surveillance. Many existing methods estimate individual human areas<br />
from the foreground area obtained by background subtraction, but the effects of camera movement can make it difficult<br />
to obtain a background image. We have achieved human-area segmentation requiring no background image by using<br />
chamfer matching to match the results of human detection using Real AdaBoost with silhouette images. Although accuracy<br />
in chamfer matching drops as the number of templates increases, the proposed method enables segmentation accuracy to<br />
be improved by selecting silhouette images similar to the matching target beforehand based on response values from weak<br />
classifiers in Real AdaBoost.<br />
09:00-11:10, Paper ThAT8.7<br />
Local Optical Operators for Subpixel Scene Analysis<br />
Jean, Yves, City Univ. of NY<br />
In this paper we present a scene analysis technique with subpixel filtering based on dense coded light fields. Our technique<br />
computes alignment and optically projects analysis filters onto local surfaces within the extent of a camera pixel. The resolution<br />
gain depends on the local light field density rather than on the point spread function of the camera optics. An initial<br />
structured light sequence is used to establish each camera pixel’s footprint in the projector-generated light field. Then<br />
a sequence of basis functions embedded in the light field, with camera pixel support, combines with the local surface texture<br />
and is integrated by the camera sensor to produce a localized response at the subpixel scale. We address optical modeling<br />
and aliasing issues, since the dense light field is undersampled by the camera pixels. Results are provided for objects of<br />
planar and non-planar topology.<br />
09:00-11:10, Paper ThAT8.8<br />
Aesthetic Image Classification for Autonomous Agents<br />
Desnoyer, Mark, Carnegie Mellon Univ.<br />
Wettergreen, David, Carnegie Mellon Univ.<br />
Computational aesthetics is the study of applying machine learning techniques to identify aesthetically pleasing imagery.<br />
Prior work used online datasets scraped from large user communities like Flickr to obtain labeled data. However, online imagery<br />
represents results from late in the media generation process, as the photographer has already framed the shot and picked<br />
the best results to upload. Thus, this technique can only identify quality imagery once it has been taken. In contrast, automatically<br />
creating pleasing imagery requires understanding the imagery present earlier in the process. This paper applies<br />
computational aesthetics techniques to a novel dataset from earlier in that process in order to understand how the problem<br />
changes when an autonomous agent, like a robot or a real-time camera aid, creates pleasing imagery instead of simply<br />
identifying it.<br />
09:00-11:10, Paper ThAT8.9<br />
Removal of Moving Objects from a Street-view Image by Fusing Multiple Image Sequences<br />
Uchiyama, Hiroyuki, Nagoya Univ.<br />
Deguchi, Daisuke, Nagoya Univ.<br />
Takahashi, Tomokazu, Gifu Shotoku Gakuen Univ.<br />
Ide, Ichiro, Nagoya Univ.<br />
Murase, Hiroshi, Nagoya Univ.<br />
We propose a method to remove moving objects from an in-vehicle camera image sequence by fusing multiple image sequences.<br />
Driver assistance systems and services such as Google Street View require images containing no moving object.<br />
The proposed scheme consists of three parts: (i) collection of many image sequences along the same route by using vehicles<br />
equipped with an omni-directional camera, (ii) temporal and spatial registration of image sequences, and (iii) mosaicing<br />
partial images containing no moving object. Experimental results show that 97.3% of the moving object area could be removed<br />
by the proposed method.<br />
09:00-11:10, Paper ThAT8.10<br />
Improving SIFT-Based Descriptors Stability to Rotations<br />
Bellavia, Fabio, Univ. of Palermo<br />
Tegolo, Domenico, Univ. of Palermo<br />
Trucco, Emanuele<br />
Image descriptors are widely adopted structures to match image features. SIFT-based descriptors are collections of gradient<br />
orientation histograms computed on different feature regions, commonly divided by using a regular Cartesian grid or a<br />
log-polar grid. In order to achieve rotation invariance, feature patches generally have to be rotated in the direction of the<br />
dominant gradient orientation. In this paper we present a modification of the GLOH descriptor, a SIFT-based descriptor<br />
based on a log-polar grid, which avoids rotating the feature patch before computing the descriptor, since predefined discrete<br />
orientations can easily be derived by shifting the descriptor vector. The proposed descriptors, called sGLOH and sGLOH+,<br />
have been compared with the SIFT descriptor on the Oxford image dataset, with good results that demonstrate their robustness<br />
and stability.<br />
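The shift-based matching idea can be sketched as follows, under a simplified descriptor layout (each of `n_rings` radial rings contributes `n_sectors` angular bins, and one rotation step shifts every ring by one bin); the actual sGLOH packing differs in detail.<br />

```python
import numpy as np

def shift_match_distance(d1, d2, n_rings, n_sectors):
    """Distance between two log-polar descriptors that is invariant to
    discrete patch rotations: rotating the patch by one sector step
    corresponds to cyclically shifting each ring's block of angular bins,
    so we take the minimum Euclidean distance over all n_sectors shifts
    instead of rotating the patch itself."""
    d1 = np.asarray(d1, float).reshape(n_rings, n_sectors)
    d2 = np.asarray(d2, float).reshape(n_rings, n_sectors)
    best = np.inf
    for s in range(n_sectors):
        shifted = np.roll(d2, s, axis=1)  # same angular shift applied to every ring
        best = min(best, float(np.linalg.norm(d1 - shifted)))
    return best
```

A descriptor compared against a rotated copy of itself then yields distance zero at the matching shift, with no patch rotation required.<br />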
09:00-11:10, Paper ThAT8.11<br />
Inpainting Large Missing Regions in Range Images<br />
Bhavsar, Arnav, Indian Inst. of Tech. Madras<br />
Ambasamudram, Rajagopalan, Indian Inst. of Tech. Madras<br />
We propose a technique to inpaint large missing regions in range images. Such a technique can be used to restore degraded/occluded<br />
range maps. It can also serve to reconstruct dense depth maps from sparse measurements, which can speed<br />
up acquisition. Our method uses the visual cue from segmentation of an intensity image registered to the range image.<br />
Our approach enforces that pixels in the same segment should have similar range. Our simple strategy involves plane fitting<br />
and local medians over segments to compute local energies for labeling unknown pixels. Our results exhibit high-quality<br />
inpainting with very low errors.<br />
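A minimal sketch of the segment-consistency cue: fill unknown range values with the median of the known values in the same intensity-image segment. The paper additionally uses plane fitting and local energies, which are omitted here.<br />

```python
import numpy as np

def inpaint_range_by_segments(range_map, segments, unknown=-1.0):
    """Fill unknown range values using the median of known range values in
    the same segment of the registered intensity image, enforcing that
    pixels in one segment share similar range. `segments` holds an integer
    segment label per pixel; `unknown` marks missing range values."""
    out = range_map.astype(float).copy()
    for label in np.unique(segments):
        mask = segments == label
        known = out[mask & (range_map != unknown)]
        if known.size:  # segment has at least one valid range measurement
            out[mask & (range_map == unknown)] = np.median(known)
    return out
```

For example, a hole inside a segment whose known depths cluster around 5 is filled with 5, while a hole in a nearer segment is filled from that segment's own measurements.<br />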
09:00-11:10, Paper ThAT8.12<br />
Angular Variation as a Monocular Cue for Spatial Perception<br />
Aranda, Joan, UPC<br />
Navarro, Agustin A., UPC<br />
Perspective projection presents objects as they are naturally seen by the eye. However, this type of mapping strongly<br />
distorts geometric properties such as angles, which are not preserved under perspective transformations. In this work, this<br />
angular variation serves to model the visual effect of perspective projection. Knowing that the angular distortion depends<br />
on the point of view of the observer, we demonstrate that it is possible to determine the pose of an object from<br />
its perspective distortion. This is a computational approach to direct perception in which spatial information about<br />
a scene is calculated directly from the optic array. Experimental results show the robustness provided by the use of angles<br />
and establish this 3D measurement technique as an emulation of a visual perception process.<br />
09:00-11:10, Paper ThAT8.13<br />
An Exploration Scheme for Large Images: Application to Breast Cancer Grading<br />
Veillard, Antoine, NUS<br />
Lomenie, Nicolas, CNRS<br />
Racoceanu, Daniel, CNRS - French National Res. Center<br />
Most research focuses on pattern recognition within small sample images, but strategies for running these<br />
algorithms efficiently over large images are rarely, if ever, specifically considered. In particular, the new generation of satellite and<br />
microscopic images is acquired at very high resolution and at a very high daily rate. We propose an efficient, generic<br />
strategy to explore large images by combining computational geometry tools with a local signal measure of relevance in<br />
a dynamic sampling framework. An application to breast cancer grading from huge histopathological images illustrates<br />
the benefit of such a general strategy for new major applications in the field of microscopy.<br />
09:00-11:10, Paper ThAT8.14<br />
3D Human Body Modeling using Range Data<br />
Yamauchi, Koichiro, Keio Univ.<br />
Bhanu, Bir, Univ. of California<br />
Saito, Hideo, Keio Univ.<br />
For the 3D modeling of walking humans the determination of body pose and extraction of body parts, from the sensed 3D<br />
range data, are challenging image processing problems. Real body data may have holes because of self-occlusions and<br />
grazing angle views. Most of the existing modeling methods rely on directly fitting a 3D model to the data without considering<br />
the fact that the parts in an image are indeed human body parts. In this paper, we present a method for 3D<br />
human body modeling using range data that attempts to overcome these problems. In our approach the entire human body<br />
is first decomposed into major body parts by a parts-based image segmentation method, and then a kinematics model is<br />
fitted to the segmented body parts in an optimized manner. The fitted model is adjusted by the iterative closest point (ICP)<br />
algorithm to resolve the gaps in the body data. Experimental results and comparisons demonstrate the effectiveness of our<br />
approach.<br />
09:00-11:10, Paper ThAT8.15<br />
Scale Matching of 3D Point Clouds by Finding Keyscales with Spin Images<br />
Tamaki, Toru, Hiroshima Univ.<br />
Tanigawa, Shunsuke, Hiroshima Univ.<br />
Ueno, Yuji, Hiroshima Univ.<br />
Raytchev, Bisser, Hiroshima Univ.<br />
Kaneda, Kazufumi, Hiroshima Univ.<br />
In this paper we propose a method for matching the scales of 3D point clouds. 3D point sets of the same scene obtained<br />
by 3D reconstruction techniques usually differ in scale. To match scales, we propose a keyscale that characterizes the<br />
scale of a given 3D point cloud. By performing PCA of spin images over different scales, the keyscale is defined as the<br />
scale that minimizes the cumulative contribution rate of the PCA at a specific dimension of the eigenspace. Simulations<br />
with the Stanford bunny and experimental results with 3D reconstructions of a real scene demonstrate that keyscales of<br />
any 3D point clouds can be uniquely found and effectively used for scale matching.<br />
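The keyscale criterion can be sketched as below: for each candidate scale, run PCA on the stack of spin images computed at that scale and evaluate the cumulative contribution rate (fraction of variance captured) at a fixed dimension, then pick the scale minimizing it. The spin-image computation itself is assumed given; here each scale simply maps to an (n_points, n_features) array.<br />

```python
import numpy as np

def keyscale(spin_images_by_scale, dim=10):
    """Select the keyscale of a point cloud: the candidate scale whose
    spin-image PCA has the minimum cumulative contribution rate at
    dimension `dim`. `spin_images_by_scale` maps each candidate scale to
    an (n_points, n_features) array of spin images at that scale."""
    def contribution(X, d):
        X = X - X.mean(axis=0)
        # eigenvalues of the covariance matrix via SVD of the centered data
        s = np.linalg.svd(X, compute_uv=False) ** 2
        return s[:d].sum() / s.sum()
    rates = {scale: contribution(X, dim) for scale, X in spin_images_by_scale.items()}
    return min(rates, key=rates.get)
```

Intuitively, a scale at which the spin images concentrate their variance in few components (contribution rate near 1) is less informative than one at which variance spreads over many components, which is the scale the minimum picks out.<br />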
09:00-11:10, Paper ThAT8.16<br />
Tracking Multiple People with Illumination Maps<br />
Zen, Gloria, Fondazione Bruno Kessler<br />
Lanz, Oswald, Fondazione Bruno Kessler<br />
Messelodi, Stefano, Fondazione Bruno Kessler<br />
Ricci, Elisa, Fondazione Bruno Kessler<br />
We address the problem of multiple people tracking under non-homogeneous and time-varying illumination conditions.<br />
We propose a unified framework for jointly estimating the position of the targets and their illumination conditions. For<br />
each target multiple templates are considered to model appearance variations due to lighting changes. The template choice<br />
is driven by an illumination map which describes the light conditions in different areas of the scene. This map is computed<br />
with a novel algorithm for efficient inference in a hierarchical Markov Random Field (MRF) and is updated online to<br />
adapt to slow lighting changes. Experimental results demonstrate the effectiveness of our approach.<br />
09:00-11:10, Paper ThAT8.17<br />
Combining Foreground / Background Feature Points and Anisotropic Mean Shift for Enhanced Visual Object<br />
Tracking<br />
Haner, Sebastian, Lund Univ. of Tech.<br />
Gu, Irene Yu-Hua, Chalmers Univ. of Tech.<br />
This paper proposes a novel visual object tracking scheme, exploiting both local point feature correspondences and global<br />
object appearance using the anisotropic mean shift tracker. Using a RANSAC cost function incorporating the mean shift<br />
motion estimate, motion smoothness and complexity terms, an optimal feature point set for motion estimation is found<br />
even when a high proportion of outliers is present. The tracker dynamically maintains sets of both foreground and background<br />
features, the latter providing information on object occlusions. The mean shift motion estimate is further used to<br />
guide the inclusion of new point features in the object model. Our experiments on videos containing long-term partial occlusions,<br />
object intersections, and backgrounds that are cluttered or have similar color distributions have shown more stable and robust tracking<br />
performance in comparison to three existing methods.<br />
09:00-11:10, Paper ThAT8.18<br />
Enhanced Measurement Model for Subspace-Based Tracking<br />
Yin, Shimin, Seoul National Univ.<br />
Yoo, Haan Ju, Seoul National Univ.<br />
Choi, Jin Young, Automation and System Res. Inst., Seoul National University<br />
We present an efficient and robust measurement model for visual tracking. This approach builds on and extends work on<br />
the measurement model of subspace representations. Subspace-based tracking algorithms have been part of the visual tracking<br />
literature for a decade and show considerable tracking performance due to their robustness in matching. However, the<br />
measures used in their measurement models are not robust enough in cluttered backgrounds. We propose a novel measure<br />
of object matching, referred to as WDIFS, which aims to improve the discriminability of matching within the subspace.<br />
Our measurement model can distinguish the target from similar background clutter, which often causes erroneous drift with the conventional<br />
DFFS-based measure. Experiments demonstrate the effectiveness of the proposed tracking algorithm under cluttered<br />
backgrounds.<br />
09:00-11:10, Paper ThAT8.19<br />
Person-Specific Face Shape Estimation under Varying Head Pose from Single Snapshots<br />
Dornaika, Fadi, Univ. of the Basque Country<br />
Raducanu, Bogdan, Computer Vision Center<br />
This paper presents a new method for person-specific face shape estimation under varying head pose of a previously<br />
unseen person from a single image. We describe a featureless approach based on a deformable 3D model and a learned<br />
face subspace. The proposed approach is based on maximizing a likelihood measure associated with a learned face subspace,<br />
which is carried out by a stochastic and genetic optimizer. We conducted experiments on a subset of the Honda<br />
Video Database, showing the feasibility and robustness of the proposed approach. As a result, our approach could lend<br />
itself nicely to complex frameworks involving 3D face tracking and face gesture recognition in monocular videos.<br />
09:00-11:10, Paper ThAT8.20<br />
Tracking Ships from Fast Moving Camera through Image Registration<br />
Fefilatyev, Sergiy, Univ. of South Florida<br />
Goldgof, Dmitry, Univ. of South Florida<br />
Lembke, Chad, Univ. of South Florida<br />
This paper presents an algorithm that detects and tracks marine vessels in video taken by a nonstationary camera installed<br />
on an untethered buoy. The video is characterized by large inter-frame motion of the camera, cluttered background, and<br />
presence of compression artifacts. Our approach performs segmentation of ships in individual frames processed with a<br />
color-gradient filter. The threshold selection is based on the histogram of the search region. Tracking of ships in a sequence<br />
is enabled by registering the horizon images in one coordinate system and by using a multi-hypothesis framework. The registration<br />
step uses an area-based technique to correlate a processed strip of the image about the found horizon line. The results<br />
of the evaluation of detection, localization, and tracking of the ships show a significant increase in performance in comparison<br />
to the previously used technique.<br />
09:00-11:10, Paper ThAT8.21<br />
Boosted Multiple Kernel Learning for Scene Category Recognition<br />
Jhuo, I-Hong, National Taiwan Univ.<br />
Lee, Der-Tsai, National Taiwan Univ.<br />
Scene images typically include diverse and distinctive properties. It is reasonable to consider different features in establishing<br />
a scene category recognition system with a promising performance. We propose an adaptive model to represent<br />
various features in a unified domain, i.e., a set of kernels, and transform the discriminant information contained in each<br />
kernel into a set of weak learners, called dyadic hypercuts. Based on this model, we present a novel approach to carrying<br />
out incremental multiple kernel learning for feature fusion by applying AdaBoost to the union of the sets of weak learners.<br />
We further evaluate the performance of this approach by a benchmark dataset for scene category recognition. Experimental<br />
results show a significantly improved performance in both accuracy and efficiency.<br />
09:00-11:10, Paper ThAT8.22<br />
Receding Horizon Estimation for Hybrid Particle Filters and Application for Robust Visual Tracking<br />
Kim, Du Yong, Gwangju Inst. of Science and Tech.<br />
Yang, Ehwa, Gwangju Inst. of Science and Tech.<br />
Jeon, Moongu, Gwangju Inst. of Science and Tech.<br />
Shin, Vladimir, Gwangju Inst. of Science and Tech.<br />
Receding horizon estimation is applied to design robust visual trackers. The most recent data within a fixed-size window<br />
is processed to obtain an estimate of the object state at the current time. In visual tracking such a<br />
scheme improves filter accuracy by avoiding accumulated approximation errors. A newly derived unscented Kalman filter<br />
(UKF) based on the receding horizon strategy is proposed for determining the importance density of the hybrid particle<br />
filter. The importance density derived by the receding horizon-based UKF (RHUKF) provides significantly improved accuracy<br />
and performance consistency compared to the unscented particle filter (UPF). Visual tracking examples are subsequently<br />
tested to demonstrate the advantages of the filter.<br />
09:00-11:10, Paper ThAT8.23<br />
Efficient Polygonal Approximation of Digital Curves via Monte Carlo Optimization<br />
Zhou, Xiuzhuang, Beijing Inst. of Tech.<br />
Lu, Yao, Beijing Inst. of Tech.<br />
A novel stochastic searching scheme based on Monte Carlo optimization is presented for the polygonal approximation<br />
(PA) problem. We propose to combine split-and-merge based local optimization with Monte Carlo sampling to<br />
give an efficient stochastic optimization scheme. Our approach is, in essence, a well-designed Basin-Hopping scheme,<br />
which performs stochastic hopping among the reduced energy peaks. Experimental results on various benchmarks show<br />
that our method achieves high-quality solutions at lower computational cost, and outperforms most state-of-the-art<br />
algorithms for the PA problem.<br />
09:00-11:10, Paper ThAT8.24<br />
Weakly Supervised Action Recognition using Implicit Shape Models<br />
Thi, Tuan Hue, Univ. of New South Wales and National ICT of Australia<br />
Cheng, Li, National ICT of Australia<br />
Zhang, Jian, National ICT of Australia<br />
Wang, Li, Nanjing Forest Univ.<br />
Satoh, Shin’Ichi, National Inst. of Informatics<br />
In this paper, we present a robust framework for action recognition in video that is able to perform competitively against<br />
state-of-the-art methods, yet does not rely on sophisticated background subtraction preprocessing to remove background<br />
features. In particular, we extend the Implicit Shape Model (ISM) of [10] for object recognition to 3D to integrate local<br />
spatiotemporal features, which are produced by a weakly supervised Bayesian kernel filter. Experiments on benchmark<br />
datasets (including KTH and Weizmann) verify the effectiveness of our approach.<br />
09:00-11:10, Paper ThAT8.25<br />
Moments of Elliptic Fourier Descriptors<br />
Soldea, Octavian, Sabanci Univ.<br />
Unel, Mustafa, Sabanci Univ.<br />
Ercil, Aytul, Sabanci Univ.<br />
This paper develops a recursive method for computing moments of 2D objects described by elliptic Fourier descriptors<br />
(EFD). Green’s theorem is utilized to transform 2D surface integrals into 1D line integrals, and the EFD description is employed<br />
to derive recursions for moment computation. Experiments are performed to quantify the accuracy of our proposed<br />
method. Comparison with Bernstein-Bezier representations is also provided.<br />
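As background for the recursion, Green's theorem reduces each 2D moment integral over the region enclosed by the curve to a line integral along its boundary; the following is the standard identity (one of several equivalent forms), from which recursions over the EFD harmonics can be derived:<br />

```latex
m_{pq} = \iint_{\Omega} x^{p}\, y^{q} \, dx\, dy
       = \frac{1}{p+1} \oint_{\partial\Omega} x^{p+1}\, y^{q} \, dy ,
```

where $\partial\Omega$ is traversed counterclockwise and $x(t)$, $y(t)$ are given by the truncated elliptic Fourier series of the contour.<br />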
09:00-11:10, Paper ThAT8.26<br />
Semi-Supervised Trajectory Learning using a Multi-Scale Key Point based Trajectory Representation<br />
Liu, Yang, Chinese Acad. of Sciences<br />
Li, Xi, CNRS, TELECOM ParisTech<br />
Hu, Weiming, National Lab. of Pattern Recognition, Inst.<br />
Motion trajectories contain rich high-level semantic information, such as object behaviors and gestures, which can be effectively<br />
captured by supervised trajectory learning. However, it is usually a tough task to obtain a large number of high-quality<br />
manually labeled samples in real applications. Thus, how to perform trajectory learning in small-training-sample-size<br />
situations is an important research topic. In this paper, we propose a trajectory learning framework using graph-based<br />
semi-supervised transductive learning, which propagates training sample labels along a particular graph. Furthermore, a<br />
novel trajectory descriptor based on multi-scale key points is proposed to characterize the spatial structural information.<br />
Experimental results demonstrate the effectiveness of our framework.<br />
09:00-11:10, Paper ThAT8.27<br />
Detection based Low Frame Rate Human Tracking<br />
Wang, Lu, The Univ. of Hong Kong<br />
Yung, Nelson, the Univ. of Hong Kong<br />
Tracking by association of low-frame-rate detection responses is not trivial, as motion is less continuous and hence ambiguous.<br />
The problem becomes more challenging when occlusion occurs. To solve this problem, we first propose a<br />
robust data association method that explicitly differentiates ambiguous tracklets, which are likely to introduce incorrect<br />
linking, from other tracklets, and deals with them effectively. Secondly, we solve the long-term occlusion problem by detecting<br />
inter-track relationships and performing track splits and merges according to appearance similarity and occlusion<br />
order. Experiments on a challenging human surveillance dataset show the effectiveness of the proposed method.<br />
09:00-11:10, Paper ThAT8.28<br />
Detecting Dominant Motion Flows in Unstructured/Structured Crowd Scenes<br />
Ozturk, Ovgu, The Univ. of Tokyo<br />
Yamasaki, Toshihiko, The Univ. of Tokyo<br />
Aizawa, Kiyoharu, The Univ. of Tokyo<br />
Detecting dominant motion flows in crowd scenes is one of the major problems in video surveillance. This is particularly<br />
difficult in unstructured crowd scenes, where the participants move randomly in various directions. This paper presents a<br />
novel method which utilizes SIFT features’ flow vectors to calculate the dominant motion flows in both unstructured and<br />
structured crowd scenes. SIFT features can represent the characteristic parts of objects, allowing robust tracking under<br />
non-rigid motion. First, flow vectors of SIFT features are calculated at certain intervals to form a motion flow map of the<br />
video. Next, this map is divided into equally sized square regions, and in each region dominant motion flows are estimated<br />
by clustering the flow vectors. Then, local dominant motion flows are combined to obtain the global dominant motion<br />
flows. Experimental results demonstrate the successful application of the proposed method to challenging real-world<br />
scenes.<br />
09:00-11:10, Paper ThAT8.29<br />
Statistical Shape Modeling using Morphological Representations<br />
Velasco-Forero, Santiago, MINES ParisTech<br />
Angulo, Jesus, MINES ParisTech<br />
The aim of this paper is to propose tools for the statistical analysis of shape families using morphological operators. Given a<br />
series of shape families (or shape categories), the approach consists in empirically computing shape statistics (i.e., mean<br />
shape and variance of shape) and then using simple algorithms for random shape generation, for empirical shape confidence<br />
boundary computation, and for shape classification using Bayes rules. The main ingredients required for the present methods<br />
are well known in image processing, such as the watershed on distance functions or the log-polar transformation. Classification<br />
performance is presented on a well-known shape database.<br />
09:00-11:10, Paper ThAT8.30<br />
Recovering the Topology of Multiple Cameras by Finding Continuous Paths in a Trellis<br />
Cai, Yinghao, Univ. of Oulu<br />
Kaiqi, Huang, CAS Inst. of Automation<br />
Tan, Tieniu, CAS Inst. of Automation<br />
Pietikäinen, Matti, Univ. of Oulu<br />
In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping<br />
fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera while the connectivity between<br />
nodes is inferred through finding continuous paths in a trellis where appearance information and temporal information<br />
of moving objects are encoded. Unlike previous methods, which assume a single-mode transition distribution between<br />
nodes, our method is capable of dealing with multi-modal transition situations when both cars and pedestrians are in the<br />
scene. Results on simulated and real-life datasets demonstrate the effectiveness of the proposed method.<br />
09:00-11:10, Paper ThAT8.31<br />
On-Line Random Naive Bayes for Tracking<br />
Godec, Martin, Graz Univ. of Tech.<br />
Leistner, Christian, Graz Univ. of Tech.<br />
Saffari, Amir, Graz Univ. of Tech.<br />
Bischof, Horst, Graz Univ. of Tech.<br />
Randomized learning methods (i.e., Forests or Ferns) have shown excellent capabilities for various computer vision applications.<br />
However, it has been shown that the tree structure in Forests can be replaced by even simpler structures, e.g., Random<br />
Naive Bayes classifiers, yielding similar performance. The goal of this paper is to benefit from these findings to develop<br />
an efficient on-line learner. Based on the principles of on-line Random Forests, we adapt the Random Naive Bayes classifier<br />
to the on-line domain. For that purpose, we propose to use on-line histograms as weak learners, which yield much better<br />
performance than simple decision stumps. Experimentally, we show that the approach is applicable to incremental learning<br />
on machine learning datasets. Additionally, we propose to use an IIR-filter-like forgetting function for the weak learners<br />
to enable adaptivity, and evaluate our classifier on the task of tracking by detection.<br />
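The on-line histogram weak learner with forgetting can be sketched as below; the binning, binary-class setup, and multiplicative `decay` factor are illustrative assumptions rather than the authors' exact formulation.<br />

```python
import bisect

class OnlineHistogramLearner:
    """On-line histogram weak learner for two classes: one fixed-binned
    histogram per class accumulates feature values seen so far, and
    prediction returns the class with the higher count in the bin the
    sample falls into. A forgetting factor `decay` (< 1) down-weights old
    counts before each update, mimicking IIR-filter-like forgetting."""
    def __init__(self, edges, decay=1.0):
        self.edges = list(edges)                 # ascending bin boundaries
        n_bins = len(self.edges) + 1
        self.counts = {0: [0.0] * n_bins, 1: [0.0] * n_bins}
        self.decay = decay

    def _bin(self, x):
        return bisect.bisect_right(self.edges, x)

    def update(self, x, label):
        for c in self.counts:                    # forget old evidence, then count
            self.counts[c] = [v * self.decay for v in self.counts[c]]
        self.counts[label][self._bin(x)] += 1.0

    def predict(self, x):
        b = self._bin(x)
        return 1 if self.counts[1][b] > self.counts[0][b] else 0
```

Unlike a decision stump, which commits to a single threshold, the histogram captures an arbitrary per-bin class preference, which is why such learners can outperform stumps as weak learners.<br />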
09:00-11:10, Paper ThAT8.32<br />
Interest Point based Tracking<br />
Kloihofer, Werner, Center Communication Systems GmbH<br />
Kampel, Martin, Vienna Univ. of Tech.<br />
This paper presents a novel method for object tracking. In the first step, interest points are detected and feature descriptors<br />
around them are calculated. Sets of known points are created, allowing tracking based on point matching. The set representation<br />
is updated online at every tracking step. Our method uses one-shot learning with the first frame, so no offline<br />
or supervised learning is required. Following an object-recognition-based approach, there is no need for a background<br />
model or motion model, allowing tracking of abrupt motion and with non-stationary cameras. We compare our method to<br />
Mean Shift and Tracking via Online Boosting, showing the benefits of our approach.<br />
09:00-11:10, Paper ThAT8.33<br />
Stochastic Filtering of Level Sets for Curve Tracking<br />
Avenel, Christophe, Irisa<br />
Memin, Etienne<br />
Perez, Patrick<br />
This paper focuses on the tracking of free curves using non-linear stochastic filtering techniques. It relies on a particle<br />
filter that includes color measurements. The curve and its velocity are defined through two coupled implicit level set<br />
representations. The stochastic dynamics of the curve is expressed directly on the level set function associated with the curve<br />
representation and incorporates a velocity field captured from the second level set attached to the past curve's<br />
point locations. The curve's dynamics combines a low-dimensional noise model and a data-driven local force. We demonstrate<br />
how this approach allows the tracking of highly and rapidly deforming objects, such as convective cells in infra-red<br />
satellite images, while providing a location-dependent assessment of the estimation confidence.<br />
09:00-11:10, Paper ThAT8.34<br />
Scalable Cage-Driven Feature Detection and Shape Correspondence for 3D Point Sets<br />
Seversky, Lee, State Univ. of New York at Binghamton<br />
Yin, Lijun, State Univ. of New York at Binghamton<br />
We propose an automatic deformation-driven correspondence algorithm for 3D point sets of non-rigid articulated shapes.<br />
Our approach uses simple geometric cages to embed the point set data and extract and match a coarse set of prominent<br />
features. We seek feature correspondences which lead to low-distortion deformations of the cages while satisfying the feature<br />
pairing. Our approach operates on the simplified geometric domain of the cage instead of the more complex 3D point<br />
data. Thus, it is robust to noise and partial occlusions, and insensitive to non-regular sampling. We demonstrate the potential<br />
of our approach by finding pairwise correspondences for sequences of acquired time-varying 3D scan point data.<br />
09:00-11:10, Paper ThAT8.35<br />
Event Recognition based on Top-Down Motion Attention<br />
Li, Li, Chinese Acad. of Sci.<br />
Hu, Weiming, Chinese Acad. of Sci.<br />
Li, Bing, Chinese Acad. of Sci.<br />
Yuan, Chunfeng, Chinese Acad. of Sci.<br />
Zhu, Pengfei, Chinese Acad. of Sci.<br />
Li, Wanqing, Univ. of Wollongong<br />
How to fuse static and dynamic information is a key issue in event analysis. In this paper, a top-down motion-guided<br />
fusion method is proposed for recognizing events in unconstrained news video. In the method, static information is<br />
represented as a bag of SIFT features, and motion information is employed to generate an event-specific attention map to<br />
direct the sampling of interest points. We build class-specific motion histograms for each event so as to give more<br />
weight to the interest points that are discriminative for the corresponding event. Experimental results on the TRECVID 2005<br />
video corpus demonstrate that the proposed method can improve the mean average accuracy of recognition.<br />
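The attention-guided weighting can be illustrated with a toy weighted bag-of-words histogram, in which each interest point's vote is scaled by a motion-attention value; `attention_weighted_bow` and its arguments are hypothetical simplifications, not the paper's actual pipeline.<br />

```python
import numpy as np

def attention_weighted_bow(word_ids, attention, n_words):
    """Build a bag-of-words histogram in which each interest point's
    vote is scaled by its motion-attention weight, then L1-normalized.
    """
    hist = np.zeros(n_words)
    for w, a in zip(word_ids, attention):
        hist[w] += a
    s = hist.sum()
    return hist / s if s > 0 else hist
```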
09:00-11:10, Paper ThAT8.36<br />
Construction of Precise Local Affine Frames<br />
Mikulik, Andrej, CMP FEE, CTU Prague<br />
Matas, Jiri, CTU Prague<br />
Perdoch, Michal, CMP, FEE, CTU Prague<br />
Chum, Ondrej,<br />
We propose a novel method for the refinement of Maximally Stable Extremal Region (MSER) boundaries to sub-pixel<br />
precision by taking into account the intensity function in the 2x2 neighborhood of the contour points. The proposed method<br />
improves the repeatability and precision of Local Affine Frames (LAFs) constructed on extremal regions. Additionally,<br />
we propose a novel method for detection of local curvature extrema on the refined contour. Experimental evaluation on<br />
publicly available datasets shows that matching with the modified LAFs leads to a higher number of correspondences and<br />
a higher inlier ratio in more than 80% of the test image pairs. Since the processing time of the contour refinement is negligible,<br />
there is no reason not to include the algorithms as a standard part of the MSER detector and LAF construction.<br />
09:00-11:10, Paper ThAT8.37<br />
Foreground Segmentation via Background Modeling on Riemannian Manifolds<br />
Caseiro, Rui, Univ. of Coimbra<br />
Henriques, João F, Univ. of Coimbra<br />
Batista, Jorge, Univ. of Coimbra<br />
Statistical modeling in color space is a widely used approach to background modeling for foreground segmentation. Nevertheless,<br />
computing such statistics directly on image values is sometimes not enough to achieve good discrimination.<br />
Thus the image may be converted into a more information-rich form, such as a tensor field, in which color<br />
and gradients can be encoded. In this paper, we exploit the theoretically well-founded differential geometric properties of the Riemannian<br />
manifold where tensors lie. We propose a novel and efficient approach for foreground segmentation on tensor fields based<br />
on data modeling by means of Gaussian mixture models (GMM) directly in the tensor domain. We introduce an Expectation-<br />
Maximization (EM) algorithm to estimate the mixture parameters, and propose two algorithms based on an online<br />
K-means approximation of EM in order to speed up the process. Theoretical analysis and experimental evaluations demonstrate<br />
the promise and effectiveness of the proposed framework.<br />
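An online K-means approximation of EM can be sketched as a single incremental update per sample; the snippet below uses plain Euclidean distance, whereas the paper works with Riemannian distances in the tensor domain, so treat it as an illustrative simplification only.<br />

```python
import numpy as np

def online_kmeans_update(centers, counts, x):
    """One online K-means step: assign sample `x` to its nearest center
    and move that center toward `x` with step size 1/count, which makes
    each center the running mean of the samples assigned to it."""
    k = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    counts[k] += 1
    centers[k] = centers[k] + (x - centers[k]) / counts[k]
    return k
```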
09:00-11:10, Paper ThAT8.38<br />
Robust Human Behavior Modeling from Multiple Cameras<br />
Kosmopoulos, D., NCSR Demokritos<br />
Voulodimos, Athanasios, National Tech. Univ. of Athens<br />
Varvarigou, Theodora, National Tech. Univ. of Athens<br />
In this work, we propose a framework for classifying structured human behavior in complex real environments, where<br />
problems such as frequent illumination changes and heavy occlusions are expected. Since target recognition and tracking<br />
can be very challenging, we bypass these problems by employing an approach similar to Motion History Images for feature<br />
extraction. Furthermore, to tackle outliers residing within the training data, which might severely affect the training algorithm<br />
of models with Gaussian observation likelihoods, we scrutinize the effectiveness of the multivariate Student-t distribution<br />
as the observation likelihood of the employed Hidden Markov Models. Additionally, the problem of visibility<br />
and occlusions is addressed by providing various extensions of the framework for multiple cameras, both at the feature<br />
and at the state level. Finally, we evaluate the performance of the examined approaches under real-life visual behavior understanding<br />
scenarios and we compare and discuss the obtained results.<br />
09:00-11:10, Paper ThAT8.39<br />
Unsupervised Learning of Activities in Video using Scene Context<br />
Oh, Sangmin, Kitware Inc.<br />
Hoogs, Anthony, Kitware Inc.<br />
Unsupervised learning of semantic activities from video collected over time is an important problem for visual surveillance<br />
and video scene understanding. Our goal is to cluster tracks into semantically interpretable activity models that are independent<br />
of scene locations; most previous work in video scene understanding is focused on learning location-specific normalcy<br />
models. Location-independent models can be used to detect instances of the same activity anywhere in the scene,<br />
or even across multiple scenes. Our insight for this unsupervised activity learning problem is to incorporate scene context<br />
to characterize the behavior of every track. By scene context, we mean local scene structures, such as building entrances,<br />
parking spots and roads, that moving objects frequently interact with. Each track is attributed with a large number of potentially<br />
useful features that capture the relationships and interactions with a set of existing scene context elements. Once<br />
feature vectors are obtained, tracks are grouped in this feature space using state-of-the-art clustering techniques, without<br />
considering scene location. Experiments are conducted on webcam video of a complex scene, with many interacting<br />
objects and very noisy tracks resulting from low frame rates and poor image quality. Our results demonstrate that location-independent<br />
and semantically interpretable groupings can be successfully obtained using unsupervised clustering<br />
methods, and that the models are superior to standard location-dependent clustering.<br />
09:00-11:10, Paper ThAT8.40<br />
Multipath Interference Compensation in Time-of-Flight Camera Images<br />
Fuchs, Stefan, German Aerospace Center<br />
Multipath interference is inherent to the working principle of a Time-of-flight camera and can influence the measurements<br />
by several centimeters. Especially in applications that demand high accuracy, such as object localization for robotic<br />
manipulation or ego-motion estimation of mobile robots, multipath interference is not tolerable. In this paper we formulate<br />
a multipath model in order to estimate the interference and correct the measurements. The proposed approach incorporates<br />
the measured scene structure. All distracting surfaces are assumed to be Lambertian radiators and the directional interference<br />
is simulated for correction purposes. The positive impact of these corrections is experimentally demonstrated.<br />
09:00-11:10, Paper ThAT8.41<br />
Segment-Based Foreground Extraction Dedicated to 3D Reconstruction<br />
Kim, Jungwhan, Soongsil Univ.<br />
Park, Anjin, AIST<br />
Jung, Keechul, Soongsil Univ.<br />
Research on image-based 3D reconstruction has recently produced a number of good results, but it assumes that an accurate<br />
foreground to be reconstructed has already been extracted from each input image. This paper proposes a novel approach<br />
to extract more accurate foregrounds by iteratively performing foreground extraction and 3D reconstruction, in a manner<br />
similar to an EM algorithm, on regions segmented in an initial stage, called segments. After definitively extracting the<br />
foregrounds in multi-views based on simply selecting segments corresponding to the real foreground in only one image,<br />
further improved foregrounds are extracted by back-projecting 3D objects reconstructed based on the foreground extracted<br />
in the previous step into segments of each image in multi-views. These two steps are iteratively performed until the energy<br />
function is optimized. In the experiments, more accurate boundaries were obtained, although the proposed method used a<br />
simple 3D reconstruction method.<br />
09:00-11:10, Paper ThAT8.42<br />
Human Pose Estimation for Multiple Persons based on Volume Reconstruction<br />
Luo, Xinghan, Utrecht Univ.<br />
Berendsen, Berend<br />
Tan, Robby T., Utrecht Univ.<br />
Veltkamp, R. C., Utrecht Univ.<br />
Most of the development in pose recognition has focused on a single person. However, many applications of computer vision<br />
essentially require the estimation of multiple people. Hence, in this paper, we address the problem of estimating the poses of<br />
multiple persons using volumes estimated from multiple cameras. One of the main issues that makes the multiple-person,<br />
multiple-camera setting problematic is the presence of ghost volumes. This problem arises when the projections of<br />
two different silhouettes of two different persons onto the 3D world overlap in a place where in fact there is no person.<br />
To solve this problem, we first introduce a novel principal-axis-based framework to estimate the 3D ground plane positions<br />
of multiple people, and then use the position cues to label the multi-person volumes (voxels), while considering<br />
the voxel connectivity. Having labeled the voxels, we fit the volume of each person with a body model, and determine the<br />
pose of the person based on the model. The results on real videos demonstrate the accuracy and efficiency of our approach.<br />
09:00-11:10, Paper ThAT8.43<br />
3D Articulated Shape Segmentation using Motion Information<br />
Kalafatlar, Emre, Koç Univ.<br />
Yemez, Yucel, Koç Univ.<br />
We present a method for segmentation of articulated 3D shapes by incorporating the motion information obtained from<br />
time-varying models. We assume that the articulated shape is given in the form of a mesh sequence with fixed connectivity<br />
so that the inter-frame vertex correspondences, hence the vertex movements, are known a priori. We use different postures<br />
of an articulated shape in multiple frames to constitute an affinity matrix which encodes both temporal and spatial similarities<br />
between surface points. The shape is then decomposed into segments in the spectral domain based on the affinity<br />
matrix using a standard K-means clustering algorithm. The performance of the proposed segmentation method is demonstrated<br />
on the mesh sequence of a human actor.<br />
09:00-11:10, Paper ThAT8.44<br />
Online Learning with Self-Organizing Maps for Anomaly Detection in Crowd Scenes<br />
Feng, Jie, Peking Univ.<br />
Zhang, Chao, Peking Univ.<br />
Hao, Pengwei, Queen Mary Univ. of London<br />
Detecting abnormal behaviors in crowd scenes is quite important for public security and has received more and more attention.<br />
Most previous methods use an offline-trained model to perform detection, which cannot handle the constantly changing<br />
crowd environment. In this paper, we propose a novel unsupervised algorithm to detect abnormal behavior patterns in<br />
crowd scenes with online learning. The crowd behavior pattern is extracted from a local spatio-temporal volume which<br />
consists of multiple motion patterns in temporal order. An online self-organizing map (SOM) is used to model the large<br />
number of behavior patterns in the crowd. Each neuron can be updated by incrementally learning from new observations. To<br />
demonstrate the effectiveness of our proposed method, we have performed experiments on real-world crowd scenes. The<br />
online learning efficiently reduces false alarms while still detecting most of the anomalies.<br />
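The incremental SOM update underlying such a model can be sketched for a 1-D lattice of neurons as follows; the learning rate and Gaussian neighbourhood width are illustrative assumptions, not the paper's settings.<br />

```python
import numpy as np

def som_update(weights, x, sigma=1.0, lr=0.1):
    """One online SOM step: find the best matching unit (BMU) for the
    observation `x` and pull every neuron toward `x`, scaled by a
    Gaussian falloff over lattice distance to the BMU."""
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    idx = np.arange(len(weights))
    h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)  # in-place update
    return bmu
```

An observation far from every neuron (large BMU distance) would then be flagged as a candidate anomaly.<br />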
09:00-11:10, Paper ThAT8.45<br />
Scene Classification using Spatial Pyramid of Latent Topics<br />
Ergul, Emrah, Turkish Naval Academy<br />
Arica, Nafiz, Turkish Naval Academy<br />
We propose a scene classification method, which combines two popular methods in the literature: Spatial Pyramid Matching<br />
(SPM) and probabilistic Latent Semantic Analysis (pLSA) modeling. The proposed scheme, called Cascaded pLSA, performs<br />
pLSA in a hierarchical sense after a soft-weighted BoW representation based on dense local features is extracted.<br />
We associate spatial layout information by dividing each image into overlapping regions iteratively at different resolution<br />
levels and implementing a pLSA model for each region individually. Finally, an image is represented by concatenated<br />
topic distributions of each region. In performance evaluation, we compare the proposed method with the most successful<br />
methods in the literature, using the popular 15-class dataset. In the experiments, our method slightly outperforms<br />
the others on that particular dataset.<br />
09:00-11:10, Paper ThAT8.46<br />
Optimization of Target Objects for Natural Feature Tracking<br />
Gruber, Lukas, Graz Univ. of Tech.<br />
Zollmann, Stefanie, Graz Univ. of Tech.<br />
Wagner, Daniel, Graz Univ. of Tech.<br />
Schmalstieg, Dieter, Graz Univ. of Tech.<br />
Hollerer, Tobias, UCSB<br />
This paper investigates possible physical alterations of tracking targets to obtain improved 6DoF pose detection for a<br />
camera observing the known targets. We explore the influence of several texture characteristics on the pose detection, by<br />
simulating a large number of different target objects and camera poses. Based on statistical observations, we rank the importance<br />
of characteristics such as texturedness and feature distribution for a specific implementation of a 6DoF tracking<br />
technique. These findings allow informed modification strategies for improving the tracking target objects themselves, in<br />
the common case of man-made targets, as for example used in advertising. This fundamentally differs from and complements<br />
the traditional approach of leaving the targets unchanged while trying to optimize the tracking algorithms and parameters.<br />
09:00-11:10, Paper ThAT8.47<br />
View-Invariant Action Recognition using Rank Constraint<br />
Ashraf, Nazim, Univ. of Central Florida<br />
Shen, Yuping, Univ. of Central Florida<br />
Foroosh, Hassan, Univ. of Central Florida<br />
We propose a new method for view-invariant action recognition based on the rank constraint on the family of planar homographies<br />
associated with triplets of body points. We represent action as a sequence of poses and we use the fact that the<br />
family of homographies associated with two identical poses would have rank 4 to gauge similarity of the pose between<br />
two subjects, observed by different perspective cameras and from different viewpoints. Extensive experimental results<br />
show that our method can accurately identify action from video sequences when they are observed from totally different<br />
viewpoints with different camera parameters.<br />
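The rank test at the heart of this method can be sketched with a numerical rank computed from singular values; this is only an illustrative check of the rank constraint on a stacked matrix, not the full recognition pipeline.<br />

```python
import numpy as np

def numerical_rank(M, tol=1e-8):
    """Numerical rank via singular values: count those above a relative
    tolerance.  For the paper's constraint, a matrix stacking the family
    of homographies from two identical poses should test as rank <= 4."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))
```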
09:00-11:10, Paper ThAT8.48<br />
Coarse-To-Fine Particle Filter by Implicit Motion Estimation for 3D Head Tracking on Mobile Devices<br />
Sung, Hacheon, Yonsei Univ.<br />
Choi, Kwontaeg, Yonsei Univ.<br />
Byun, Hyeran, Yonsei Univ.<br />
Due to the wide spread of mobile devices over the years, a low-cost implementation of an efficient head tracking system<br />
is becoming more useful for a wide range of applications. In this paper, we make an attempt at solving the real-time 3D head<br />
tracking problem on mobile devices by enhancing the fitness of the dynamics. In our method, the particles are generated<br />
by implicit motion estimation between two particles rather than explicit motion estimation using corresponding-point<br />
matching between two consecutive frames. This generation is applied iteratively using a coarse-to-fine strategy in order to<br />
handle large motion using a small number of particles. This reduces the computational cost while preserving the performance.<br />
We evaluate the efficiency and effectiveness of the proposed algorithm by empirical experiments. Finally, we demonstrate<br />
our method on a recent mobile phone.<br />
09:00-11:10, Paper ThAT8.49<br />
Visibility of Multiple Cameras in a Scene with Unknown Geometry<br />
Zhang, Liuxin, Beijing Inst. of Tech.<br />
Jia, Yunde, Beijing Inst. of Tech.<br />
In this paper, we investigate the problem of determining the visible regions of multiple cameras in a 3D scene without a<br />
priori knowledge of the scene geometry. Our approach is based on a variational energy functional where both the unresolved<br />
visibility information of multiple cameras and the unknown scene geometry are included. We cast visibility estimation<br />
and scene geometry reconstruction as an optimization of the variational energy functional amenable for minimization with<br />
the Euler-Lagrange driven evolution. Starting from any initial value, the accurate visibility of multiple cameras as well as<br />
the true scene geometry can be obtained at the end of the evolution. Experimental results show the validity of our approach.<br />
09:00-11:10, Paper ThAT8.50<br />
Low-Level Image Segmentation based Scene Classification<br />
Akbas, Emre, Univ. of Illinois<br />
Ahuja, Narendra, Univ. of Illinois<br />
This paper is aimed at evaluating the semantic information content of multiscale, low-level image segmentation. As a<br />
method of doing this, we use selected features of segmentation for semantic classification of real images. To estimate the<br />
relative measure of the information content of our features, we compare the results of classifications we obtain using them<br />
with those obtained by others using the commonly used patch/grid based features. To classify an image using segmentation<br />
based features, we model the image in terms of a probability density function, a Gaussian mixture model (GMM) to be<br />
specific, of its region features. This GMM is fit to the image by adapting a universal GMM which is estimated so it fits<br />
all images. Adaptation is done using a maximum a posteriori criterion. We use kernelized versions of the Bhattacharyya distance<br />
to measure the similarity between two GMMs and support vector machines to perform classification. We outperform previously<br />
reported results on a publicly available scene classification dataset. These results suggest further experimentation<br />
in evaluating the promise of low level segmentation in image classification.<br />
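The GMM comparison can be illustrated for a single component: the closed form below is the standard Bhattacharyya distance between two multivariate Gaussians, while the paper applies kernelized versions of the distance to whole adapted GMMs.<br />

```python
import numpy as np

def bhattacharyya_gauss(m1, S1, m2, S2):
    """Bhattacharyya distance between two multivariate Gaussians
    N(m1, S1) and N(m2, S2): a Mahalanobis-like mean term plus a
    covariance-mismatch log-determinant term."""
    S = 0.5 * (S1 + S2)
    d = m1 - m2
    term_mean = 0.125 * d @ np.linalg.solve(S, d)
    term_cov = 0.5 * np.log(np.linalg.det(S) /
                            np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term_mean + term_cov
```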
09:00-11:10, Paper ThAT8.51<br />
Learning Scene Semantics using Fiedler Embedding<br />
Liu, Jingen, Univ. of Michigan<br />
Ali, Saad, Carnegie Mellon Univ.<br />
We propose a framework to learn scene semantics from surveillance videos. Using the learnt scene semantics, a video analyst<br />
can efficiently and effectively retrieve the hidden semantic relationship between homogeneous and heterogeneous<br />
entities existing in the surveillance system. For learning scene semantics, the algorithm treats different entities as nodes<br />
in a graph, where weighted edges between the nodes represent the “initial” strength of the relationship between entities.<br />
The graph is then embedded into a k-dimensional space by Fiedler Embedding.<br />
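Fiedler Embedding can be sketched via the eigenvectors of the graph Laplacian; the snippet below is a minimal unweighted-scaling version under that assumption and omits any eigenvalue scaling the full method may apply.<br />

```python
import numpy as np

def fiedler_embedding(W, k):
    """Embed graph nodes into k dimensions using the Laplacian
    eigenvectors with the smallest non-zero eigenvalues (the Fiedler
    vector and its successors).  W is a symmetric affinity matrix."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)  # ascending eigenvalues
    # Skip the trivial constant eigenvector at eigenvalue ~ 0.
    return vecs[:, 1:k + 1]
```

Entities close in this embedded space are the semantically related ones the abstract says an analyst could retrieve.<br />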
09:00-11:10, Paper ThAT8.52<br />
Counting Vehicles in Highway Surveillance Videos<br />
Tamersoy, Birgi, The Univ. of Texas at Austin<br />
Aggarwal, J. K., The Univ. of Texas at Austin<br />
This paper presents a complete system for accurately and efficiently counting vehicles in a highway surveillance video.<br />
The proposed approach employs vehicle detection and tracking modules. In the detection module, an automatically trained<br />
binary classifier detects vehicles while providing robustness against viewpoint changes, poor video quality and clutter. Efficient<br />
tracking is then achieved by a simplified multi-hypothesis approach. First an over-complete set of tracks is created considering<br />
every observed detection within a time interval. As needed, hypothesized detections are generated to force continuous<br />
tracks. Finally, a scoring function is used to separate the valid tracks in the over-complete set. Our tracking system<br />
achieved accurate results in significantly challenging highway surveillance videos.<br />
09:00-11:10, Paper ThAT8.53<br />
Efficient 3D Upper Body Tracking with Self-Occlusions<br />
Chen, Jixu, RPI<br />
Ji, Qiang, RPI<br />
We propose an efficient 3D upper body tracking method, which recovers the positions and orientations of six upper-body<br />
parts from the video sequence. Our method is based on a probabilistic graphical model (PGM), which incorporates the<br />
spatial relationships among the body parts, and a robust multi-view image likelihood using probabilistic PCA (PPCA).<br />
For efficiency, we use a tree-structured graphical model and particle-based belief propagation to perform the<br />
inference. Since our image likelihood is based on multiple views, we address the self-occlusion by modeling the likelihood<br />
of the body part in each view, and automatically decrease the influence of the occluded view in the inference procedure.<br />
09:00-11:10, Paper ThAT8.54<br />
Track Initialization in Low Frame Rate and Low Resolution Videos<br />
Cuntoor, Naresh, Kitware Inc.<br />
Basharat, Arslan, Kitware Inc.<br />
Perera, A. G. Amitha, Kitware Inc.<br />
Hoogs, Anthony, Kitware Inc.<br />
The problem of object detection and tracking has received relatively little attention in low frame rate and low resolution<br />
videos. Here we focus on motion segmentation in videos where objects appear small (people less than 30 pixels tall) and<br />
the frame rate is low (less than 5 Hz). We study challenging cases where some otherwise successful approaches<br />
may break down. We investigate a number of popular techniques in computer vision that have been shown to be useful<br />
for discriminating various spatio-temporal signatures. These include Histograms of Oriented Gradients (HOG), Histograms<br />
of Oriented optical Flow (HOF) and Haar features (Viola and Jones). We use these features to classify the motion segmentations<br />
into person vs. other and vehicle vs. other. We rely on aligned motion history images to create a more consistent<br />
object representation across frames. We present results on these features using webcam data and wide-area aerial video<br />
sequences.<br />
09:00-11:10, Paper ThAT8.55<br />
On the Performance of Handoff and Tracking in a Camera Network<br />
Li, Yiming, Univ. of California Riverside<br />
Bhanu, Bir, Univ. of California Riverside<br />
Nguyen, Vincent, Univ. of California Riverside<br />
Camera handoff is an important problem when using multiple cameras to follow a number of objects in a video network.<br />
However, almost all the handoff techniques rely on a robust tracker. State-of-the-art techniques used to evaluate the performance<br />
of camera handoff use either annotated videos or simulated data, and the handoff performance is evaluated in<br />
conjunction with a tracker. This does not allow a deeper understanding into the performance of a tracker and a handoff<br />
technique separately in real-world settings. In this paper, we evaluate three camera handoff techniques and two different<br />
color-based trackers in seven real-life cases, with varying numbers of cameras and objects, and changing environmental<br />
conditions. We also perform experiments on annotated videos to provide the ground truth for all the scenarios.<br />
This evaluation of performance isolates the effect of tracking and handoff techniques and clarifies their role in a video<br />
network.<br />
09:00-11:10, Paper ThAT8.56<br />
Object Tracking with Ratio Cycles using Shape and Appearance Cues<br />
Sargin, Mehmet Emre, UC Santa Barbara<br />
Ghosh, Pratim, UC Santa Barbara<br />
Manjunath, B. S., UC Santa Barbara<br />
Rose, Kenneth, UC Santa Barbara<br />
We present a method for object tracking over time sequence imagery. The image plane is represented with a 4-connected<br />
planar graph where vertices are associated with pixels. On each image, the outer contour of the object is localized by<br />
finding the optimal cycle in the graph such that a cost function based on temporal, appearance and shape priors is minimized.<br />
Our contribution is the particle filtering-based framework to integrate the shape cue with the temporal and appearance<br />
cues. We demonstrate that incorporating the shape prior yields promising performance improvement over temporal<br />
and appearance priors on various object tracking scenarios.<br />
09:00-11:10, Paper ThAT8.57<br />
Real-Time Abnormal Event Detection in Complicated Scenes<br />
Shi, Yinghuan, Nanjing Univ.<br />
Gao, Yang, Nanjing Univ.<br />
Wang, Ruili, Massey Univ.<br />
In this paper, we propose a novel real-time abnormal event detection framework that requires a short training period<br />
and has a fast processing speed. Our approach is based on phase correlation and our newly developed spatial-temporal<br />
co-occurrence Gaussian mixture models (STCOG), with the following steps: (i) a frame is divided into non-overlapping<br />
local regions; (ii) phase correlation is used to estimate the motion vectors between two successive frames for all corresponding<br />
local regions; and (iii) STCOG is used to model normal events and detect abnormal events if any deviation<br />
from the trained STCOG is found. Our proposed approach is also able to update the parameters incrementally and can<br />
be applied in complicated scenes. The proposed approach outperforms previous ones in terms of shorter training periods<br />
and lower computational complexity.<br />
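Step (ii), phase correlation, can be sketched with the normalized cross-power spectrum; the function below recovers an integer translation between two equally sized patches and is a minimal illustration, not the paper's implementation.<br />

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer translation (dy, dx) such that
    b ~= np.roll(a, (dy, dx), axis=(0, 1)), via the peak of the
    inverse FFT of the normalized cross-power spectrum."""
    F = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    F /= np.abs(F) + 1e-12  # keep only phase information
    corr = np.fft.ifft2(F).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts beyond half the patch into negative displacements.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)
```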
ThAT9 Lower Foyer<br />
Human Computer Interaction and Biometrics Poster Session<br />
Session chair: Alba Castro, Jose Luis (Univ. of Vigo)<br />
09:00-11:10, Paper ThAT9.1<br />
Encoding Actions via Quantized Vocabulary of Averaged Silhouettes<br />
Wang, Liang, The Univ. of Melbourne<br />
Leckie, Christopher, The Univ. of Melbourne<br />
Human action recognition from video clips has received increasing attention in recent years. This paper proposes a simple<br />
yet effective method for the problem of action recognition. The method aims to encode human actions using the quantized<br />
vocabulary of averaged silhouettes that are derived from space-time windowed shapes and implicitly capture local temporal<br />
motion as well as global body shape. Experimental results on the publicly available Weizmann dataset have demonstrated<br />
that, despite its simplicity, our method is effective for recognizing actions, and is comparable to other state-of-the-art methods.<br />
09:00-11:10, Paper ThAT9.2<br />
Action Recognition using Space-Time Shape Difference Images<br />
Qu, Hao, The Univ. of Melbourne<br />
Wang, Liang, The Univ. of Melbourne<br />
Leckie, Christopher, The Univ. of Melbourne<br />
A common approach to human action recognition is to use 2-D silhouettes in the space-time volume as a basis for further<br />
extraction of useful features. In this paper, we present a novel motion representation based on difference images. We show<br />
that this representation exploits the dynamics of motion, and show its effectiveness in action recognition. Moreover, experimental<br />
results demonstrate that this method is highly accurate and is not sensitive to the resolution of the video.<br />
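A plausible reading of the core representation is the absolute frame-to-frame difference of binary silhouettes; the sketch below shows only that base step, on top of which the paper's shape-difference descriptor would be built.<br />

```python
import numpy as np

def difference_images(silhouettes):
    """Absolute differences between consecutive binary silhouettes in a
    (T, H, W) stack; nonzero pixels mark where the shape changed."""
    s = np.asarray(silhouettes, dtype=np.int8)
    return np.abs(s[1:] - s[:-1]).astype(np.uint8)
```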
09:00-11:10, Paper ThAT9.3<br />
A Brain Computer Interface for Communication using Real-Time fMRI<br />
Eklund, Anders, Linköping Univ.<br />
Andersson, Mats, Linköping Univ.<br />
Ohlsson, Henrik, Linköping Univ.<br />
Ynnerman, Anders, Linköping Univ.<br />
Knutsson, Hans,<br />
We present the first step towards a brain computer interface (BCI) for communication using real-time functional magnetic<br />
resonance imaging (fMRI). The subject in the MR scanner sees a virtual keyboard and steers a cursor to select different<br />
letters that can be combined to create words. The cursor is moved to the left by activating the left hand, to the right by activating<br />
the right hand, down by activating the left toes and up by activating the right toes. To select a letter, the subject<br />
simply rests for a number of seconds. We can thus communicate with the subject in the scanner by, for example, showing<br />
questions that the subject can answer. Similar BCIs for communication have been built with electroencephalography<br />
(EEG). In those implementations the subject, for example, focuses on a letter while different rows and columns of the virtual<br />
keyboard flash, and the system tries to detect whether the correct letter is flashing. In our setup we instead classify<br />
the brain activity. Our system is not limited to a communication interface, but can be used for any interface where five degrees<br />
of freedom are necessary.<br />
09:00-11:10, Paper ThAT9.4<br />
Combined Top-Down/Bottom-Up Human Articulated Pose Estimation using AdaBoost Learning<br />
Wang, Sheng, Tsinghua Univ.<br />
Ai, Haizhou, Tsinghua Univ.<br />
Yamashita, Takayoshi, OMRON Corp.<br />
Lao, Shihong, OMRON Corp.<br />
In this paper, a novel human articulated pose estimation method based on AdaBoost algorithm is presented. The human<br />
articulated pose is estimated by locating major human joint positions. We learn the classifiers on a normalized image for<br />
classifying each pixel position into a certain category. Two different kinds of classifiers, bottom-up joint position classifier<br />
and top-down skeleton classifier, are combined to achieve final results. HOG (Histogram of Oriented Gradient) feature is<br />
used for training both classifiers. Our human pose estimation system consists of three modules: human detection, view classification,<br />
and pose estimation. The implemented system can automatically estimate human pose from different views. Experimental<br />
results show that our proposed method can work on relatively small human images without using<br />
human silhouettes as a prerequisite, and is efficient, robust and accurate enough for potential applications in visual<br />
surveillance.<br />
09:00-11:10, Paper ThAT9.5<br />
The Human Action Image<br />
Sethi, Ricky, Univ. of California, Riverside<br />
Roy-Chowdhury, Amit, Univ. of California, Riverside<br />
Recognizing a person’s motion is intuitive for humans but represents a challenging problem in machine vision. In this<br />
paper, we present a multi-disciplinary framework for recognizing human actions. We develop a novel descriptor, the<br />
Human Action Image (HAI): a physically-significant, compact representation for the motion of a person, which we derive<br />
from first principles in physics using Hamilton’s Action. We embed the HAI as the Motion Energy Pathway of the latest<br />
neurobiological model of motion recognition. The Form Pathway is modelled using existing low-level feature descriptors<br />
based on shape and appearance. Experimental validation of the theory is provided on the well-known Weizmann and USF<br />
Gait datasets.<br />
09:00-11:10, Paper ThAT9.6<br />
Combining Spatial and Temporal Information for Gait based Gender Classification<br />
Hu, Maodi, Beihang Univ.<br />
Wang, Yunhong, Beihang Univ.<br />
Zhang, Zhaoxiang, Beihang Univ.<br />
Wang, Yiding, North China Univ. of Tech.<br />
In this paper, we address the problem of gait-based gender classification. The Gabor feature, a new attempt for<br />
gait analysis, not only improves robustness to segmentation noise but also provides a feasible way to purge additional<br />
influence factors, such as changes in clothing and carrying condition, before supervised learning. Furthermore, by means<br />
of Maximization of Mutual Information (MMI), a low-dimensional discriminative representation is obtained as<br />
the Gabor-MMI feature. Gender-related Gaussian Mixture Model-Hidden Markov Models (GMM-HMMs) are then<br />
constructed for classification. Here, supervised learning reduces the dimension of the parameter space and significantly<br />
increases the gap between the likelihoods of the gender models. To assess the performance of our proposed<br />
approach, we compare it with other methods on the standard CASIA Gait Database (Dataset B). Experimental results<br />
demonstrate that our approach achieves a better Correct Classification Rate (CCR) than state-of-the-art methods.<br />
09:00-11:10, Paper ThAT9.7<br />
A Vision-Based Taiwanese Sign Language Recognition System<br />
Huang, Chung-Lin, National Tsing-Hua Univ.<br />
Tsai, Bo-Lin, National Tsing-Hua Univ.<br />
This paper presents a vision-based continuous sign language recognition system to interpret the Taiwanese Sign Language<br />
(TSL). The continuous sign language, which consists of a sequence of hold and movement segments, can be decomposed<br />
into non-signs and signs. The signs can be either static signs or dynamic signs. The former can be found in the hold<br />
segment, whereas the latter can be identified in the combination of hold and movement segments. We use a Support Vector<br />
Machine (SVM) to recognize the static signs and apply Hidden Markov Models (HMMs) to identify the dynamic signs. Finally, we use a<br />
finite state machine to verify the grammatical correctness of the recognized TSL sentence and to correct misrecognized<br />
signs.<br />
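As a sketch of the grammar-checking idea (not the paper's actual TSL grammar), a finite state machine over recognized sign tokens can be implemented as:<br />

```python
def make_fsm(transitions, start, accepting):
    """Build a deterministic FSM acceptor.

    transitions maps (state, token) -> next state; a missing entry
    rejects the sequence.
    """
    def accepts(tokens):
        state = start
        for tok in tokens:
            state = transitions.get((state, tok))
            if state is None:
                return False
        return state in accepting
    return accepts

# Toy subject-verb-object grammar (hypothetical, for illustration only)
svo = make_fsm(
    {("S", "subj"): "V", ("V", "verb"): "O", ("O", "obj"): "F"},
    start="S",
    accepting={"F"},
)
```

Sequences that end in a non-accepting state, or take an undefined transition, are flagged as ungrammatical and become candidates for correction.<br />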
09:00-11:10, Paper ThAT9.8<br />
Fusing Audio-Visual Nonverbal Cues to Detect Dominant People in Group Conversations<br />
Aran, Oya, Idiap Res. Inst.<br />
Gatica-Perez, Daniel,<br />
This paper addresses the multimodal nature of social dominance and presents multimodal fusion techniques to combine<br />
audio and visual nonverbal cues for dominance estimation in small group conversations. We combine the two modalities<br />
both at the feature extraction level and at the classifier level via score and rank level fusion. The classification is done by<br />
a simple rule-based estimator. We perform experiments on a new 10-hour dataset derived from the popular AMI meeting<br />
corpus. We objectively evaluate the performance of each modality and each cue alone and in combination. Our results<br />
show that the combination of audio and visual cues is necessary to achieve the best performance.<br />
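Score- and rank-level fusion of the kind described above can be sketched as follows (min-max normalization and Borda-style rank summing are common choices; the paper's exact rules may differ):<br />

```python
def score_fusion(scores_a, scores_b, w=0.5):
    """Weighted-sum score-level fusion after min-max normalization."""
    def norm(s):
        lo, hi = min(s), max(s)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in s]
    na, nb = norm(scores_a), norm(scores_b)
    return [w * a + (1 - w) * b for a, b in zip(na, nb)]

def rank_fusion(scores_a, scores_b):
    """Borda-style rank-level fusion: lower summed rank = more dominant."""
    def ranks(s):
        order = sorted(range(len(s)), key=lambda i: -s[i])
        r = [0] * len(s)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(scores_a), ranks(scores_b)
    return [a + b for a, b in zip(ra, rb)]
```

For example, per-person dominance scores from the audio and visual cues can be fused and the person with the highest fused score (or lowest summed rank) declared most dominant.<br />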
09:00-11:10, Paper ThAT9.9<br />
Wavelet Domain Local Binary Pattern Features for Writer Identification<br />
Du, Liang, Huazhong Univ. of Science and Tech.<br />
You, Xinge, Huazhong Univ. of Science and Tech.<br />
Xu, Huihui, Huazhong Univ. of Science and Tech.<br />
Gao, Zhifan, Huazhong Univ. of Science and Tech.<br />
Tang, Yuanyan, Hongkong Baptist University<br />
The representation of writing styles is a crucial step of writer identification schemes. However, the large intra-writer variance<br />
makes it a challenging task. Thus, a good feature of writing style plays a key role in writer identification. In this<br />
paper, we present a simple and effective feature for off-line, text-independent writer identification, namely wavelet domain<br />
local binary patterns (WD-LBP). Based on WD-LBP, a writer identification algorithm is developed. WD-LBP is able to<br />
capture the essential characteristics of a writer while ignoring the variations intrinsic to each individual writer. Unlike other<br />
texture-based frameworks, we impose no statistical distribution assumption on the proposed method. This prevents<br />
us from making possibly erroneous assumptions about the feature distributions of handwritten images. The experimental<br />
results show that the proposed writer identification method achieves high identification accuracy and outperforms recent<br />
writer identification methods such as the wavelet-GGD model and the Gabor filtering method.<br />
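For readers unfamiliar with local binary patterns, here is a minimal sketch of the basic 8-neighbour LBP histogram, computed in the pixel domain (WD-LBP applies the pattern to wavelet subbands instead):<br />

```python
def lbp_code(img, r, c):
    """8-neighbour local binary pattern code at pixel (r, c)."""
    center = img[r][c]
    # neighbours enumerated clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin LBP histogram over the interior pixels of a 2-D list."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Such histograms, one per subband, can then be concatenated into a writer descriptor and compared with any histogram distance.<br />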
09:00-11:10, Paper ThAT9.10<br />
Audio-Visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space<br />
Nicolaou, Mihalis, Imperial Coll.<br />
Gunes, Hatice, Imperial Coll.<br />
Pantic, Maja, Imperial Coll.<br />
This paper focuses on audio-visual (facial expression, shoulder, and audio cues) classification of spontaneous affect,<br />
utilising generative models for classification: (i) Maximum Likelihood Classification, under the assumption that<br />
the generative model structure in the classifier is correct; and (ii) Likelihood Space Classification, under the assumption<br />
that the generative model structure in the classifier may be incorrect, so that classification performance can be<br />
improved by projecting the outputs of generative classifiers onto a likelihood space and then using discriminative classifiers.<br />
Experiments are conducted utilising Hidden Markov Models for single-cue classification, and 2- and 3-chain coupled<br />
Hidden Markov Models for fusing multiple cues and modalities. For discriminative classification, we utilise Support<br />
Vector Machines. Results show that Likelihood Space Classification (91.76%) improves upon Maximum<br />
Likelihood Classification (79.1%). We then introduce the concept of fusion in the likelihood space, which is shown<br />
to outperform the typically used model-level fusion, attaining a classification accuracy of 94.01% and further improving<br />
upon all previous results.<br />
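The likelihood-space idea can be sketched with single 1-D Gaussians standing in for the HMMs: each sample is mapped to its vector of per-class log-likelihoods, on which a discriminative classifier can then be trained (illustrative only):<br />

```python
import math

def gauss_loglik(x, mu, sigma):
    """Log-likelihood of x under a 1-D Gaussian N(mu, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def likelihood_space(x, models):
    """Project a sample onto its vector of per-class log-likelihoods."""
    return [gauss_loglik(x, mu, s) for mu, s in models]

def ml_classify(x, models):
    """Maximum Likelihood Classification: pick the most likely class."""
    ll = likelihood_space(x, models)
    return max(range(len(models)), key=lambda k: ll[k])
```

Maximum Likelihood Classification stops at the arg-max; Likelihood Space Classification instead feeds the whole likelihood vector to a discriminative classifier such as an SVM.<br />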
09:00-11:10, Paper ThAT9.12<br />
Improved Mandarin Keyword Spotting using Confusion Garbage Model<br />
Zhang, Shilei, IBM Res., China<br />
Shuang, Zhiwei, IBM Res., China<br />
Shi, Qin, IBM Res., China<br />
Qin, Yong, IBM Res., China<br />
This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model in<br />
Mandarin conversational speech. Examining the KWS corpus, we found many words whose pronunciation is similar to that of the<br />
predefined keywords, even though they have different Chinese characters and meanings; these words easily result in<br />
a high false alarm rate. We therefore developed an improved acoustic KWS method with confusion garbage models<br />
that absorb words whose pronunciation can be confused with the specific keywords of a given task. One obvious advantage of<br />
this method is that it provides a flexible framework to implement the selection procedure and effectively reduce the false alarm rate<br />
for a specific task. The efficiency of the proposed architecture was evaluated with HMM-based confidence<br />
measure (CM) methods and demonstrated on a conversational telephone dataset.<br />
09:00-11:10, Paper ThAT9.13<br />
Human Activity Recognition using Local Shape Descriptors<br />
Venkatesha, Sharath, Univ. of California, Santa Barbara<br />
Turk, Matthew, Univ. of California, Santa Barbara<br />
We propose a method for human activity recognition in videos, based on shape analysis. We define local shape descriptors<br />
for interest points on the detected contour of the human action and build an action descriptor using a Bag of Features<br />
method. We also use the temporal relation among matching interest points across successive video frames. Further, an<br />
SVM is trained on these action descriptors to classify the activity in the scene. The method is invariant to the length of the<br />
video sequence, and hence it is suitable for online activity recognition. We have demonstrated the results on an action database<br />
consisting of nine actions, such as walking, jumping, and bending, performed by twenty people in indoor and outdoor scenarios. The proposed<br />
method achieves an accuracy of 87%, which is comparable to other state-of-the-art methods.<br />
09:00-11:10, Paper ThAT9.14<br />
Use of Line Spectral Frequencies for Emotion Recognition from Speech<br />
Bozkurt, Elif, Koc Univ.<br />
Erzin, Engin, Koc Univ.<br />
Eroglu Erdem, Cigdem, Bahcesehir Univ.<br />
Erdem, Arif Tanju, Ozyegin Univ.<br />
We propose the use of line spectral frequency (LSF) features for emotion recognition from speech; to the best of our knowledge,<br />
they have not previously been employed for emotion recognition. Spectral features such as mel-scaled<br />
cepstral coefficients have already been used successfully to parameterize speech signals for emotion recognition.<br />
The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant<br />
structure, which is related to the emotional state of the speaker [4]. We use a Gaussian mixture model (GMM)<br />
classifier architecture, which captures the static color of the spectral features. Experimental studies performed on the Berlin<br />
Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF<br />
features bring a consistent improvement over the MFCC based emotion classification rates.<br />
09:00-11:10, Paper ThAT9.15<br />
Spatially Regularized Common Spatial Patterns for EEG Classification<br />
Lotte, Fabien, Inst. for Infocomm Res.<br />
Guan, Cuntai, Inst. for Infocomm Res.<br />
In this paper, we propose a new algorithm for Brain-Computer Interfaces (BCI): Spatially Regularized Common Spatial<br />
Patterns (SRCSP). SRCSP is an extension of the well-known CSP algorithm that incorporates spatial priors into the learning<br />
process by adding a regularization term that penalizes spatially non-smooth filters. We compared the SRCSP and CSP algorithms<br />
on data from 14 subjects from BCI competitions. Results suggest that SRCSP can improve classification accuracy,<br />
by around 10%, for subjects with poor CSP performance. They also suggest that SRCSP leads to<br />
more physiologically relevant filters than CSP.<br />
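One common way to write such a spatially regularized CSP objective (the precise penalty used by SRCSP is defined in the paper; the weight α and penalty matrix K below are assumptions of this sketch) is:<br />

```latex
% C_1, C_2: class covariance matrices of the band-passed EEG signals;
% K: a spatial penalty matrix built from electrode distances;
% alpha: regularization weight trading discrimination for smoothness.
w^{*} = \arg\max_{w} \;
  \frac{w^{\top} C_{1}\, w}{\; w^{\top} C_{2}\, w + \alpha\, w^{\top} K\, w \;}
```

With α = 0 this reduces to the standard CSP Rayleigh quotient; larger α favours spatially smoother filters.<br />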
09:00-11:10, Paper ThAT9.16<br />
Comparing Multiple Classifiers for Speech-Based Detection of Self-Confidence – A Pilot Study<br />
Krajewski, Jarek, Univ. of Wuppertal<br />
Batliner, Anton, Univ. of Erlangen-Nuremberg<br />
Kessel, Silke, Univ. of Wuppertal<br />
The aim of this study is to compare several classifiers commonly used within the field of speech emotion recognition<br />
(SER) for the speech-based detection of self-confidence. A standard acoustic feature set was computed, resulting in 170<br />
features per one-minute speech sample (e.g. fundamental frequency, intensity, formants, MFCCs). In order to identify<br />
speech correlates of self-confidence, the lectures of 14 female participants were recorded, resulting in 306 one-minute<br />
segments of speech. Five expert raters independently assessed the impression of self-confidence. Several classification models<br />
(e.g. Random Forest, Support Vector Machine, Naive Bayes, Multi-Layer Perceptron) and ensemble classifiers (AdaBoost,<br />
Bagging, Stacking) were trained. AdaBoost procedures achieved the best performance, both for single models<br />
(AdaBoost LR: 75.2% class-wise averaged recognition rate) and for average boosting (59.3%) in speaker-independent<br />
settings.<br />
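For reference, discrete AdaBoost of the kind compared in such studies can be sketched with one-dimensional threshold stumps (illustrative only; the study's classifiers operate on 170-dimensional acoustic feature vectors):<br />

```python
import math

def stump_predict(x, thr, sign):
    """Threshold stump: returns sign if x >= thr, else -sign."""
    return sign if x >= thr else -sign

def adaboost_train(xs, ys, rounds=10):
    """Discrete AdaBoost over 1-D threshold stumps (labels in {-1, +1})."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # pick the stump with the lowest weighted error
        best = None
        for thr in xs:
            for sign in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, thr, sign) != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = max(err, 1e-10)          # avoid log(0) on a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, sign))
        # reweight: boost the misclassified samples
        w = [wi * math.exp(-alpha * y * stump_predict(x, thr, sign))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def adaboost_predict(model, x):
    s = sum(a * stump_predict(x, thr, sign) for a, thr, sign in model)
    return 1 if s >= 0 else -1
```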
09:00-11:10, Paper ThAT9.17<br />
Hierarchical Human Action Recognition by Normalized-Polar Histogram<br />
Ziaeefard, Maryam, Sahand Univ. of Tech.<br />
Ebrahimnezhad, Hossein, Sahand Univ. of Tech.<br />
This paper proposes a novel human action recognition approach which represents each video sequence by a cumulative<br />
skeletonized image (CSI) over one action cycle. A normalized polar histogram corresponding to each CSI is computed,<br />
i.e., the number of CSI pixels located at given distances and angles within the normalized circle. Human action is recognized using<br />
hierarchical classification with two levels. At the first level, coarse classification is performed with<br />
all histogram bins. At the second level, the more similar actions are examined again using selected bins, and<br />
the fine classification is completed. We use a linear multi-class SVM as the classifier at both levels. The real human action dataset<br />
Weizmann is selected for evaluation. The resulting average recognition rate of the proposed method is 97.6%.<br />
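The normalized polar histogram can be sketched as follows (the paper's bin counts and normalization details may differ from this illustrative choice):<br />

```python
import math

def polar_histogram(points, center, radius, nr=3, na=8):
    """Count points over nr radial x na angular bins of a circle.

    Distances are normalized by radius, so all bins live inside the
    unit circle; points at or beyond the radius are ignored.
    """
    hist = [[0] * na for _ in range(nr)]
    cx, cy = center
    for x, y in points:
        d = math.hypot(x - cx, y - cy) / radius
        if d >= 1:
            continue
        a = math.atan2(y - cy, x - cx) % (2 * math.pi)
        hist[int(d * nr)][int(a / (2 * math.pi) * na)] += 1
    return hist
```

Applied to the foreground pixels of a CSI, the flattened bins form the feature vector for the SVM.<br />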
09:00-11:10, Paper ThAT9.18<br />
Automatic 3D Facial Expression Recognition based on a Bayesian Belief Net and a Statistical Facial Feature Model<br />
Zhao, Xi, Ec. Centrale de Lyon<br />
Huang, Di, Ec. Centrale de Lyon<br />
Dellandréa, Emmanuel, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Automatic facial expression recognition on 3D face data is still a challenging problem. In this paper we propose a novel<br />
approach to perform expression recognition automatically and flexibly by combining a Bayesian Belief Net (BBN) and<br />
a Statistical Facial Feature Model (SFAM). A novel BBN is designed for this specific problem with our proposed parameter<br />
computation method. By learning global variations in the face landmark configuration (morphology) and local ones in terms of<br />
texture and shape around landmarks, the morphable SFAM allows not only automatic<br />
landmarking but also computation of the beliefs that feed the BBN. Tested on the public 3D facial expression database<br />
BU-3DFE, our automatic approach recognizes expressions successfully, reaching an average recognition rate<br />
of over 82%.<br />
09:00-11:10, Paper ThAT9.19<br />
EEG-Based Personal Identification: From Proof-of-Concept to a Practical System<br />
Su, Fei, Beijing Univ. of Posts and Telecommunications<br />
Xia, Liwen, Beijing Univ. of Posts and Telecommunications<br />
Ma, Junshui, Merck Res. Lab. Merck & Co, Inc.<br />
Although the concept of using brain waves, e.g. the Electroencephalogram (EEG), for personal identification has been validated<br />
in several studies, some unanswered practical and theoretical questions prevent this technology from developing further<br />
toward commercialization. Based on a well-designed personal identification experiment using EEG recordings, this study addressed<br />
three of these questions: (1) the feasibility of using portable EEG equipment, (2) the necessity of controlling<br />
factors that influence the EEG, and (3) the optimal set of features. With our understanding of the answers to these questions, the<br />
EEG-based personal identification system we built achieved an average accuracy of 97.5% on a dataset with 40 subjects.<br />
The results of this study provide supporting evidence that taking EEG-based personal identification from proof of concept to system<br />
implementation is promising.<br />
09:00-11:10, Paper ThAT9.20<br />
Improved Facial Expression Recognition with Trainable 2-D Filters and Support Vector Machines<br />
Peiyao, Li, Univ. of Wollongong<br />
Phung, Son Lam, Univ. of Wollongong<br />
Bouzerdoum, Abdesselam, Univ. of Wollongong<br />
Tivive, Fok Hing Chi, Univ. of Wollongong<br />
Facial expression is one way humans convey their emotional states. Accurate recognition of facial expressions is essential<br />
in perceptual human-computer interfaces, robotics, and mimetic games. This paper presents a novel approach to facial expression<br />
recognition from static images that combines fixed and adaptive 2-D filters in a hierarchical structure. The fixed<br />
filters are used to extract primitive features. They are followed by the adaptive filters that are trained to extract more complex<br />
facial features. Both types of filters are non-linear and are based on the biological mechanism of shunting inhibition.<br />
The features are finally classified by a support vector machine. The proposed approach is evaluated on the JAFFE database<br />
with seven types of facial expressions: anger, disgust, fear, happiness, neutral, sadness and surprise. It achieves a classification<br />
rate of 96.7%, which compares favorably with several existing techniques for facial expression recognition tested<br />
on the same database.<br />
09:00-11:10, Paper ThAT9.21<br />
A Biologically-Inspired Top-Down Learning Model based on Visual Attention<br />
Sang, Nong, Huazhong Univ. of Science and Tech.<br />
Wei, Longsheng, Huazhong Univ. of Science and Tech.<br />
Wang, Yuehuan, Huazhong Univ. of Science and Tech.<br />
A biologically-inspired top-down learning model based on visual attention is proposed in this paper. Low-level visual features<br />
are extracted from the learning object itself and do not depend on background information. All the features are expressed<br />
as a feature vector, which is treated as a random variable following a normal distribution, so every learning object<br />
is represented by a mean and standard deviation. All the learning objects are combined into an object class, which is represented<br />
by the class mean and class standard deviation stored in long-term memory (LTM). The learned knowledge<br />
is then used to find similar locations in an attended image. Experimental results indicate that when the attended object<br />
does not appear against a background similar to that of the learning objects, or when their configurations change greatly between<br />
the learning images and the attended images, our model outperforms the top-down approach of VOCUS and Navalpakkam’s<br />
statistical model.<br />
09:00-11:10, Paper ThAT9.22<br />
Human Action Recognition using Segmented Skeletal Features<br />
Yoon, Sang Min, Tech. Univ. of Darmstadt<br />
Kuijper, Arjan, Fraunhofer IGD<br />
We present a novel human action recognition system based on segmented skeletal features which are separated into several<br />
human body parts such as face, torso and limbs. Our proposed human action recognition system consists of two steps: (i)<br />
automatic skeletal feature extraction and splitting by measuring the similarity in the space of diffusion tensor fields, and<br />
(ii) multiple-kernel Support Vector Machine based human action recognition. Experimental results on a set of test databases<br />
show that our proposed method is efficient and effective at recognizing human actions using few parameters, independently<br />
of dimensions, shadows, and viewpoints.<br />
09:00-11:10, Paper ThAT9.23<br />
Action Recognition by Multiple Features and Hyper-Sphere Multi-Class SVM<br />
Liu, Jia, Shanghai Jiao Tong Univ.<br />
Yang, Jie, Shanghai Jiao Tong Univ.<br />
Zhang, Yi, Shanghai Jiao Tong Univ.<br />
He, Xiangjian, University of Technology, Sydney<br />
In this paper we propose a novel framework for action recognition in videos based on multiple features. The fusion of<br />
multiple features is important for recognizing actions, as a single feature based representation is often not enough<br />
to capture the imaging variations (view-point, illumination, etc.) and attributes of individuals (size, age,<br />
gender, etc.). Hence, we use two kinds of features: (i) a quantized vocabulary of local spatio-temporal (ST) volumes (cuboids<br />
and 2-D SIFT), and (ii) higher-order statistical models of interest points, which aim to capture the global information<br />
about the actor. We construct video representations in terms of local space-time features and global features and integrate such<br />
representations with a hyper-sphere multi-class SVM. Experiments on publicly available datasets show that our proposed<br />
approach is effective. An additional experiment shows that using both local and global features provides a richer representation<br />
of human action than the use of a single feature type.<br />
09:00-11:10, Paper ThAT9.24<br />
Multimodal Recognition of Cognitive Workload for Multitasking in the Car<br />
Putze, Felix, Karlsruhe Inst. of Tech.<br />
Jarvis, Jan-Philip, Karlsruhe Inst. of Tech.<br />
Schultz, Tanja, Univ. Karlsruhe<br />
This work describes the development and evaluation of a recognizer for different levels of cognitive workload in the car.<br />
We collected multiple biosignal streams (skin conductance, pulse, respiration, EEG) during an experiment in a driving<br />
simulator in which the drivers performed a primary driving task and several secondary tasks of varying difficulty. From<br />
this data, an SVM based workload classifier was trained and evaluated, yielding recognition rates of up to for three levels<br />
of workload.<br />
09:00-11:10, Paper ThAT9.25<br />
Automatic Facial Action Detection using Histogram Variation between Emotional States<br />
Senechal, Thibaud, ISIR, UPMC<br />
Bailly, Kevin, Univ. PIERRE 1 MARIE CURIE - PARIS 6<br />
Prevost, Lionel, Univ. PIERRE 1 MARIE CURIE - PARIS 6<br />
This article presents an appearance-based method to automatically detect facial actions. Our approach focuses on reducing<br />
the sensitivity of the features to the identity of the subject. From an expressive image we compute a Local Gabor Binary Pattern (LGBP)<br />
histogram and synthesize an LGBP histogram approximating the one we would compute on a neutral face. The differences between<br />
these two histograms are used as inputs to Support Vector Machine (SVM) binary detectors with a new kernel:<br />
the Histogram Difference Intersection (HDI) kernel. Experimental results for 16 Action Units (AUs) on the<br />
benchmark Cohn-Kanade database compare favorably with two state-of-the-art methods.<br />
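The HDI kernel itself is defined in the paper; for orientation, the classical histogram intersection kernel it is named after is simply:<br />

```python
def hist_intersection(h1, h2):
    """Classical histogram intersection kernel: sum of bin-wise minima.

    A valid (positive-definite) SVM kernel for non-negative histograms;
    the paper's HDI kernel is its own variant, defined there.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))
```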
09:00-11:10, Paper ThAT9.27<br />
Decoding Finger Flexion from Electrocorticographic Signals using Sparse Gaussian Process<br />
Wang, Zuoguan, RPI<br />
Ji, Qiang, RPI<br />
Schalk, Gerwin, NYS Dept of Health<br />
Miller, Kai J., Univ. of Washington,<br />
A brain-computer interface (BCI) creates a direct communication pathway between the brain and an external device, and<br />
can thereby restore function in people with severe motor disabilities. A core component in a BCI system is the decoding<br />
algorithm that translates brain signals into action commands for an output device. Most current decoding algorithms are<br />
based on linear models (e.g., derived using linear regression), which may have important shortcomings. The use of nonlinear<br />
models (e.g., neural networks) could overcome some of these shortcomings, but has difficulties with high-dimensional<br />
feature spaces. Here we propose a decoding algorithm based on the sparse Gaussian process with pseudo-inputs<br />
(SPGP). As a nonparametric method, it can model more complex relationships than linear methods. As a<br />
kernel method, it can readily deal with high-dimensional feature spaces. The evaluations in this paper demonstrate<br />
that SPGP can decode the flexion of finger movements from electrocorticographic (ECoG) signals more accurately than<br />
a previously described algorithm that used a linear model. In addition, by formulating the problem in a Bayesian probabilistic<br />
framework, SPGP can provide an estimate of the prediction uncertainty. Furthermore, the trained SPGP offers a very effective<br />
way of identifying important features.<br />
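For context, these are the standard (full) Gaussian process regression predictions that SPGP approximates, where K is the train-train kernel matrix, k<sub>*</sub> the train-test kernel vector, and σ² the noise variance:<br />

```latex
\mu_{*} = k_{*}^{\top}\left(K + \sigma^{2} I\right)^{-1} y,
\qquad
\sigma_{*}^{2} = k(x_{*}, x_{*})
  - k_{*}^{\top}\left(K + \sigma^{2} I\right)^{-1} k_{*} + \sigma^{2}
% SPGP replaces K with a low-rank approximation induced by M << N
% pseudo-inputs, reducing training cost from O(N^3) to O(N M^2).
```

The predictive variance σ²<sub>*</sub> is what supplies the per-prediction uncertainty estimate mentioned in the abstract.<br />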
09:00-11:10, Paper ThAT9.28<br />
Hand Pointing Estimation for Human Computer Interaction based on Two Orthogonal-Views<br />
Hu, Kaoning, State Univ. of New York at Binghamton<br />
Canavan, Shaun, State Univ. of New York at Binghamton<br />
Yin, Lijun, State Univ. of New York at Binghamton<br />
Hand pointing has been an intuitive gesture for human interaction with computers. However, accurately<br />
estimating the finger pointing direction in 3D space remains a significant challenge. In this paper, we present a novel hand pointing estimation<br />
system based on two regular cameras, which includes hand region detection, hand finger estimation, feature<br />
detection in the two views, and 3D pointing direction estimation. Based on the idea of a binary-pattern face detector, we extend the work to<br />
hand detection, in which a polar coordinate system is proposed to represent the hand region, achieving good<br />
robustness to hand orientation variation. To estimate the pointing direction, we apply an AAM-based approach<br />
to detect and track 14 feature points along the hand contour from a top view and a side view. Combining the two views<br />
of the hand features, the 3D pointing direction is estimated. The experiments have demonstrated the feasibility of the system.<br />
09:00-11:10, Paper ThAT9.29<br />
A Brain-Computer Interface for Mental Arithmetic Task from Single-Trial Near-Infrared Spectroscopy Brain Signals<br />
Ang, Kai Keng, Inst. for Infocomm Res. A*STAR<br />
Guan, Cuntai, Inst. for Infocomm Res.<br />
Lee, Kerry, National Inst. of Education<br />
Lee, Jie Qi, National Inst. of Education<br />
Nioka, Shoko, Univ. of Pennsylvania<br />
Chance, Britton, Univ. of Pennsylvania<br />
Near-infrared spectroscopy (NIRS) enables non-invasive recording of cortical hemoglobin oxygenation in human subjects<br />
through the intact skull using light in the near-infrared range. Recently, NIRS-based brain-computer interfaces<br />
have been introduced for discriminating left- and right-hand motor imagery. A neuroimaging study has also revealed event-related<br />
hemodynamic responses associated with the performance of mental arithmetic tasks. This paper proposes a novel BCI for<br />
detecting changes resulting from increases in the magnitude of operands used in a mental arithmetic task, using data from<br />
single-trial NIRS brain signals. We measured hemoglobin responses from 20 healthy subjects as they solved mental arithmetic<br />
problems with three difficulty levels. The accuracy in recognizing one difficulty level from another is then presented<br />
using 5×5-fold cross-validation on the collected data. The results yielded an overall average accuracy of 71.2%,<br />
demonstrating the potential of the proposed NIRS-based BCI to recognize the difficulty of problems encountered by mental<br />
arithmetic problem solvers.<br />
09:00-11:10, Paper ThAT9.30<br />
Articulated Human Body: 3D Pose Estimation using a Single Camera<br />
Wang, Zibin, The Chinese Univ. of Hong Kong<br />
Chung, Chi-Kit Ronald, The Chinese Univ. of Hong Kong<br />
We address how human pose in 3D can be tracked from a monocular video using a probabilistic inference method. The human<br />
body is modeled as a number of cylinders in space, each with an appearance facet as well as a pose facet. The appearance<br />
facets are acquired in a learning phase from the beginning frames of the input video. For this, the visual hull description<br />
of the target human subject, constructed from multiple images, proves instrumental. In the operation phase, the 3D<br />
pose of the target subject in the subsequent frames of the input video is tracked. A bottom-up framework is used, which<br />
for each current image frame first extracts tentative candidates for each body part in the image space. The human<br />
model, with the appearance facets already learned and the pose entries initialized with those of the previous image<br />
frame, is then brought in under a belief propagation algorithm to establish correspondence with these 2D body part candidates<br />
while enforcing proper articulation between the body parts, thereby determining the 3D pose of the human<br />
body in the current frame. The tracking performance on a number of monocular videos is shown.<br />
09:00-11:10, Paper ThAT9.31<br />
Resampling Approach to Facial Expression Recognition using 3D Meshes<br />
Murthy, O. V. Ramana, NUS<br />
Venkatesh, Y. V., NUS<br />
Kassim, Ashraf, NUS<br />
We propose a novel strategy, based on resampling of 3D meshes, to recognize facial expressions. This entails conversion<br />
of the existing irregular 3D mesh structure in the database to a uniformly sampled 3D matrix structure. An important consequence<br />
of this operation is that the classical correspondence problem can be dispensed with. In the present paper, in<br />
order to demonstrate the feasibility of the proposed strategy, we employ only spectral flow matrices as features to recognize<br />
facial expressions. Experimental results are presented, along with suggestions for possible refinements to the strategy to<br />
improve classification accuracy.<br />
09:00-11:10, Paper ThAT9.33<br />
Facial Expression Mimicking System<br />
Fukui, Ryuichi, Toyohashi Univ. of Tech.<br />
Katsurada, Kouichi, Toyohashi Univ. of Tech.<br />
Iribe, Yurie, Toyohashi Univ. of Tech.<br />
Nitta, Tsuneo, Toyohashi Univ. of Tech.<br />
We propose a facial expression mimicking system that copies the facial expression of one person on the image of another.<br />
The system uses the active appearance model (AAM), a commonly used model in the field of facial expression processing.<br />
The AAM comprises parameters representing facial shape, brightness, and the illumination environment.<br />
Therefore, in addition to the facial expression elements, the model parameters express other elements, such as individuality<br />
and the direction of the face. In order to extract the facial expression elements from the compositional parameters of the AAM, we<br />
applied principal component analysis (PCA) to the AAM parameter values, collected over changes in facial expression.<br />
The obtained facial expression model is applied to the facial expression mimicking system, and experiments show its<br />
effectiveness for mimicking.<br />
09:00-11:10, Paper ThAT9.34<br />
A Framework for Hand Gesture Recognition and Spotting using Sub-Gesture Modeling<br />
Malgireddy, Manavender, Univ. at Buffalo, SUNY<br />
Corso, Jason, Univ. at Buffalo, SUNY<br />
Setlur, Srirangaraj, Univ. at Buffalo<br />
Govindaraju, Venu, Univ. at Buffalo<br />
Mandalapu, Dinesh, HP Lab.<br />
Hand gesture interpretation is an open research problem in Human Computer Interaction (HCI), which involves locating<br />
gesture boundaries (Gesture Spotting) in a continuous video sequence and recognizing the gesture. Existing techniques<br />
model each gesture as a temporal sequence of visual features extracted from individual frames, which is not efficient due<br />
to the large variability of frames at different timestamps. In this paper, we propose a new sub-gesture modeling approach<br />
which represents each gesture as a sequence of fixed sub-gestures (groups of consecutive frames with locally coherent<br />
context) and provides robust modeling of the visual features. We further extend this approach to the task of gesture spotting,<br />
where the gesture boundaries are identified using a filler model and a gesture completion model. Experimental results<br />
show that the proposed method outperforms state-of-the-art Hidden Conditional Random Fields (HCRF) based methods<br />
and baseline gesture spotting techniques.<br />
09:00-11:10, Paper ThAT9.35<br />
Off-Line Signature Verification using Graphical Model<br />
Lv, Hairong<br />
Bai, Xinxin, IBM Res. – China<br />
Yin, Wenjun, IBM Res. – China<br />
Dong, Jin, IBM Res. – China<br />
In this paper, we propose a novel probabilistic graphical model to address the off-line signature verification problem. Unlike<br />
previous work, our approach introduces the concept of feature roles according to the features’ distributions in genuine<br />
and forged signatures, with all these features represented by a single graphical model. We also propose several new techniques<br />
to improve the performance of the new signature verification system. Results based on 200 persons’ signatures<br />
(16,000 signature samples) indicate that the proposed method outperforms other popular techniques for off-line signature<br />
verification by a large margin.<br />
09:00-11:10, Paper ThAT9.36<br />
Linear Facial Expression Transfer with Active Appearance Models<br />
De La Hunty, Miles, Australian National Univ.<br />
Asthana, Akshay, Australian National Univ.<br />
Goecke, Roland, Univ. of Canberra<br />
The issue of transferring facial expressions from one person’s face to another’s has been an area of interest for the movie<br />
industry and the computer graphics community for quite some time. In recent years, with the proliferation of online image<br />
and video collections and web applications, such as Google Street View, the question of preserving privacy through face<br />
de-identification has gained interest in the computer vision community. In this paper, we focus on the problem of real-time<br />
dynamic facial expression transfer using an Active Appearance Model framework. We provide a theoretical foundation<br />
for a generalisation of two well-known expression transfer methods and demonstrate the improved visual quality of the<br />
proposed linear extrapolation transfer method on examples of face swapping and expression transfer using the AVOZES<br />
data corpus. Realistic talking faces can be generated in real-time at low computational cost.<br />
09:00-11:10, Paper ThAT9.37<br />
Fractal and Multi-Fractal for Arabic Offline Writer Identification<br />
Chaabouni, Aymen, Univ. of Sfax<br />
Boubaker, Houcine, Univ. of Sfax<br />
Kherallah, Monji, Univ. of Sfax<br />
El Abed, Haikal, Technische Universitat Braunschweig<br />
Alimi, Adel M., Univ. of Sfax<br />
In recent years, fractal and multi-fractal analysis has been widely applied in many domains, especially in the field of<br />
image processing. In this paper, we present a novel method for Arabic text-dependent writer identification<br />
based on fractal and multi-fractal features: from the images of Arabic words, we calculate their fractal dimensions<br />
using the box-counting method, and then their multi-fractal dimensions using the method of DLA (Diffusion<br />
Limited Aggregates). To evaluate our method, we used 50 writers from the ADAB database; each writer wrote 288 words<br />
(24 Tunisian city names repeated 12 times), with 2/3 of the words used for the learning phase and the rest for identification.<br />
The results, obtained using a k-nearest neighbor classifier, demonstrate the effectiveness of our proposed method.<br />
09:00-11:10, Paper ThAT9.38<br />
A Simulation Study on the Generative Neural Ensemble Decoding Algorithms<br />
Kim, Sung-Phil, Korea Univ.<br />
Kim, Min-Ki, Korea Univ.<br />
Park, Gwi-Tae, Korea Univ.<br />
Brain-computer interfaces rely on accurate decoding of cortical activity to understand intended action. Algorithms for<br />
neural decoding can be broadly categorized into two groups: direct versus generative methods. Two generative models,<br />
the population vector algorithm (PVA) and the Kalman filter (KF), have been widely used for many intracortical BCI studies,<br />
where the KF generally showed superior decoding to the PVA. However, little is known about the conditions under which each algorithm<br />
works properly or about how the KF translates the ensemble information. To address these questions, we performed a<br />
simulation study and demonstrated that KF and PVA worked congruently for uniformly distributed preferred directions<br />
(PDs) whereas KF outperformed PVA for non-uniform PDs. In addition, we showed that KF decoded better than PVA for<br />
low signal-to-noise ratio (SNR) or a small ensemble size. The results suggest that KF may decode direction better than<br />
PVA with non-uniform PDs or with low SNR and small ensemble size.<br />
09:00-11:10, Paper ThAT9.39<br />
3D Active Shape Model for Automatic Facial Landmark Location Trained with Automatically Generated Landmark<br />
Points<br />
Zhou, Dianle, TMSP<br />
Petrovska-Delacretaz, Dijana, Inst. Telecom SudParis (ex GET-INT)<br />
Dorizzi, Bernadette, TELECOM & Management SudParis<br />
In this paper, a 3D Active Shape Model (3DASM) algorithm is presented to automatically locate facial landmarks from different<br />
views. The 3DASM is trained by setting different shape and texture parameters of 3D Morphable Model (3DMM).<br />
Using the 3DMM to synthesize training data offers two advantages: first, few manual operations are needed, apart from labeling<br />
landmarks on the mean face of the 3DMM. Second, since the learning data come directly from the 3DMM, there is a one-to-one<br />
correspondence between the 2D points detected in the image and the 3D points on the 3DMM. This correspondence<br />
benefits subsequent 3D face reconstruction. During fitting, 3D rotation parameters are added compared to the 2D Active Shape<br />
Model (ASM), so we separate shape variations into intrinsic changes (caused by differences between persons) and extrinsic<br />
changes (caused by model projection). The experimental results show that our method is robust to pose variation.<br />
09:00-11:10, Paper ThAT9.40<br />
Using Moments on Spatiotemporal Plane for Facial Expression Recognition<br />
Ji, Yi, INSA de Lyon<br />
Idrissi, Khalid, INSA de Lyon<br />
In this paper, we propose a novel approach to capture the dynamic deformation caused by facial expressions. The proposed<br />
method concentrates on the spatiotemporal plane, which has not been well explored. It uses moments as features to describe<br />
the movements of essential components, such as the eyes and mouth, on the vertical time plane. The system we developed can automatically<br />
recognize expressions on single images as well as on image sequences. The experiments are performed on 348 sequences<br />
from 95 subjects in the Cohn-Kanade database and obtain good results, as high as 96.1% in 7-class recognition for<br />
frames and 98.5% in 6-class recognition for sequences.<br />
09:00-11:10, Paper ThAT9.41<br />
Towards a More Realistic Appearance-Based Gait Representation for Gender Recognition<br />
Martín-Félez, Raúl, Univ. Jaume I<br />
Mollineda, Ramón A., Univ. Jaume I<br />
Sanchez, J. Salvador, Univ. Jaume I<br />
A realistic appearance-based representation of side-view gait sequences is introduced here. It is based on a prior method<br />
in which a set of appearance-based features of a gait sample is used for gender recognition. These features are computed from<br />
the parameters of ellipses fitted to body parts enclosed by previously defined regions that ignore well-known facts about<br />
human body structure. This work presents an improved regionalization method, supported by adaptive heuristic<br />
rules, to better adjust the regions to the body parts. As a result, more realistic ellipses and a more meaningful feature space are obtained.<br />
Gender recognition experiments conducted on the CASIA Gait Database show better classification results when using<br />
the new features.<br />
09:00-11:10, Paper ThAT9.42<br />
A Calibration-Free Head Gesture Recognition System with Online Capability<br />
Wöhler, Nils-Christian, Bielefeld Univ.<br />
Großekathöfer, Ulf, Bielefeld Univ.<br />
Dierker, Angelika, Bielefeld Univ.<br />
Hanheide, Marc, Univ. of Birmingham<br />
Kopp, Stefan, Bielefeld Univ.<br />
Hermann, Thomas, Bielefeld Univ.<br />
In this paper, we present a calibration-free head gesture recognition system using a motion-sensor-based approach. For<br />
data acquisition we conducted a comprehensive study with 10 subjects. We analyzed the resulting head movement data<br />
with regard to separability and transferability to new subjects. Ordered means models (OMMs) were used for classification,<br />
since they provide an easy-to-use, fast, and stable approach to machine learning of time series. As a result, we achieved classification<br />
rates of 85-95% for nodding, head-shaking, and head-tilting gestures, with good transferability. Finally, we show<br />
first promising attempts towards online recognition.<br />
09:00-11:10, Paper ThAT9.43<br />
TrajAlign: A Method for Precise Matching of 3-D Trajectories<br />
Aung, Zeyar, Inst. for Infocomm Res. Singapore<br />
Sim, Kelvin, Inst. for Infocomm Res. Singapore<br />
Ng, Wee Siong, Inst. for Infocomm Res. Singapore<br />
Matching two 3-D trajectories is an important task in a number of applications. The trajectory matching problem can be<br />
solved by aligning the two trajectories and taking the alignment score as their similarity measurement. In this paper, we<br />
propose a new method called “TrajAlign” (Trajectory Alignment). It aligns two trajectories by means of aligning their<br />
representative distance matrices. Experimental results show that our method is significantly more precise than the existing<br />
state-of-the-art methods. While the existing methods can provide correct answers in only up to 67% of the test cases, TrajAlign<br />
can offer correct results in 79% (i.e., 12% more) of the test cases. TrajAlign is also computationally inexpensive<br />
and can be used practically in applications that demand efficiency.<br />
09:00-11:10, Paper ThAT9.44<br />
Real-Time 3D Model based Gesture Recognition for Multimedia Control<br />
Lin, Shih-Yao, National Taiwan Univ.<br />
Lai, Yun-Chien, National Taiwan Univ.<br />
Chan, Li-Wei, National Taiwan Univ.<br />
Hung, Yi-Ping, National Taiwan Univ.<br />
This paper presents a new 3D model-based gesture tracking system for controlling a multimedia player in an intuitive way.<br />
The motivation of this work is to make home appliances aware of the user’s intentions. The 3D model-based gesture tracking<br />
system adopts a Bayesian framework to track the user’s 3D hand position and to recognize the meaning of hand postures for<br />
controlling the player interactively. To avoid the high dimensionality of a full 3D upper-body model, which may complicate<br />
the gesture tracking problem, our system applies a novel hierarchical tracking algorithm to improve system<br />
performance. Moreover, the system applies multiple cues to improve the accuracy of the tracking results. Based on the<br />
above ideas, we have implemented a 3D hand gesture interface for controlling multimedia players. Experimental results<br />
have shown that the proposed system robustly tracks the 3D position of the hand and has high potential for controlling the<br />
multimedia player.<br />
09:00-11:10, Paper ThAT9.45<br />
Motif Discovery and Feature Selection for CRF-Based Activity Recognition<br />
Zhao, Liyue, Univ. of Central Florida<br />
Wang, Xi, Univ. of Central Florida<br />
Sukthankar, Gita, Univ. of Central Florida<br />
Sukthankar, Rahul, Intel Labs Pittsburgh and Carnegie Mellon University<br />
Due to their ability to model sequential data without making unnecessary independence assumptions, conditional random<br />
fields (CRFs) have become an increasingly popular discriminative model for human activity recognition. However, how<br />
to represent sensor signal data to achieve the best classification performance within a CRF model is not obvious. This<br />
paper presents a framework for extracting motif features for CRF-based classification of IMU (inertial measurement unit)<br />
data. To do this, we convert the signal data into a set of motifs, approximately repeated symbolic subsequences, for each<br />
dimension of IMU data. These motifs leverage structure in the data and serve as the basis to generate a large candidate set<br />
of features from the multi-dimensional raw data. By measuring reductions in the conditional log-likelihood error of the<br />
training samples, we can select features and train a CRF classifier to recognize human activities. An evaluation of our<br />
classifier on the CMU Multi-Modal Activity Database reveals that it outperforms the CRF-classifier trained on the raw<br />
features as well as other standard classifiers used in prior work.<br />
09:00-11:10, Paper ThAT9.46<br />
On-Line Signature Verification using 1-D Velocity-Based Directional Analysis<br />
Ibrahim, Muhammad Talal, Ryerson Univ.<br />
Kyan, Matthew, Ryerson Univ.<br />
Khan, M. Aurangzeb, COMSATS Inst. of Information Tech.<br />
Guan, Ling, Ryerson Univ.<br />
In this paper, we propose a novel approach for identity verification based on the directional analysis of velocity-based<br />
partitions of an on-line signature. First, inter-feature dependencies in a signature are exploited by decomposing the shape<br />
(horizontal trajectory, vertical trajectory) into two partitions based on the velocity profile of the base-signature for each<br />
signer, which offers the flexibility of analyzing both low- and high-curvature portions of the trajectory independently. Further,<br />
these velocity-based shape partitions are analyzed directionally on the basis of relative angles. A Support Vector Machine<br />
(SVM) is then used to find the decision boundary between the genuine and forgery class. Experimental results demonstrate<br />
the superiority of our approach in on-line signature verification in comparison with other techniques.<br />
09:00-11:10, Paper ThAT9.47<br />
Age Classification based on Gait using HMM<br />
Zhang, De, Beihang Univ.<br />
Wang, Yunhong, Beihang Univ.<br />
Bhanu, Bir, Univ. of California<br />
In this paper we propose a new framework for age classification based on human gait using Hidden Markov Model (HMM).<br />
A gait database including young people and elderly people is built. To extract appropriate gait features, we consider a contour-related<br />
method in terms of shape variations during human walking. The image feature is then transformed to a lower-dimensional<br />
space by using the Frame to Exemplar (FED) distance. An HMM is trained on the FED vector sequences.<br />
Thus, the framework provides flexibility in the selection of gait feature representation. In addition, the framework is robust<br />
for classification due to the statistical nature of HMM. The experimental results show that video-based automatic age classification<br />
from human gait is feasible and reliable.<br />
09:00-11:10, Paper ThAT9.48<br />
Human Electrocardiogram for Biometrics using DTW and FLDA<br />
N, Venkatesh, Tata Consultancy Services Innovation Lab.<br />
Jayaraman, Srinivasan, Tata Consultancy Services, Bangalore<br />
This paper proposes a new approach to person identification and person authentication using the single-lead human<br />
electrocardiogram (ECG). Nine feature parameters were extracted from the ECG in the spatial domain for classification. For person<br />
identification, Dynamic Time Warping (DTW) and Fisher’s Linear Discriminant Analysis (FLDA) with a k-Nearest Neighbor<br />
classifier (k-NNC) as single-stage classifiers yielded recognition accuracies of 96% and 97%, respectively. To further<br />
improve the performance of the system, a two-stage classification technique was adopted, in which FLDA with k-NNC<br />
is used at the first stage, followed by a DTW classifier at the second stage, yielding 100% recognition<br />
accuracy. For person authentication, we adopted a QRS-complex-based threshold technique. The overall performance<br />
of the system, 96% for both legitimate and intruder cases, was verified on the MIT-BIH normal database of 375 recordings<br />
from 15 individuals’ ECGs.<br />
09:00-11:10, Paper ThAT9.49<br />
Recognizing Sign Language from Brain Imaging<br />
Mehta, Nishant, Georgia Inst. of Tech.<br />
Starner, Thad, Georgia Inst. of Tech.<br />
Moore Jackson, Melody, Georgia Inst. of Tech.<br />
Babalola, Karolyn, Georgia Inst. of Tech.<br />
James, George Andrew, Univ. of Arkansas<br />
Classification of complex motor activities from brain imaging is relatively new in the fields of neuroscience and brain-computer<br />
interfaces (BCIs). We report sign language classification results for a set of three contrasting pairs of signs. Executed<br />
sign accuracy was 93.3%, and imagined sign accuracy was 76.7%. For a full multiclass problem, we used a decision<br />
directed acyclic graph of pairwise support vector machines, resulting in 63.3% accuracy for executed sign and 31.4% accuracy<br />
for imagined sign. Pairwise comparison of phrases composed of these signs yielded a mean accuracy of 73.4%.<br />
These results suggest the possibility of BCIs based on sign language.<br />
09:00-11:10, Paper ThAT9.50<br />
American Sign Language Phrase Verification in an Educational Game for Deaf Children<br />
Zafrulla, Zahoor, Georgia Inst. of Tech.<br />
Brashear, Helene, Georgia Inst. of Tech.<br />
Yin, Pei, Georgia Inst. of Tech.<br />
Presti, Peter, Georgia Inst. of Tech.<br />
Starner, Thad, Georgia Inst. of Tech.<br />
Hamilton, Harley, Georgia Inst. of Tech.<br />
We perform real-time American Sign Language (ASL) phrase verification for an educational game, CopyCat, which is<br />
designed to improve deaf children’s signing skills. Taking advantage of context information in the game, we verify a phrase,<br />
using Hidden Markov Models (HMMs), by applying a rejection threshold on the probability of the observed sequence for<br />
each sign in the phrase. We tested this approach using 1204 signed phrase samples from 11 deaf children playing the game<br />
during the phase two deployment of CopyCat. The CopyCat data set is particularly challenging because sign samples are<br />
collected during live game play and contain many variations in signing and disfluencies. We achieved a phrase verification<br />
accuracy of 83% compared to 90% real-time performance by a sign linguist. We report on the techniques required to reach<br />
this level of performance.<br />
09:00-11:10, Paper ThAT9.51<br />
A Robust Method for Hand Gesture Segmentation and Recognition using Forward Spotting Scheme in Conditional<br />
Random Fields<br />
Elmezain, Mahmoud, Otto-von-Guericke-Univ. Magdeburg<br />
Al-Hamadi, Ayoub, Otto-von-Guericke-Univ. Magdeburg<br />
Michaelis, Bernd, Otto-von-Guericke-Univ. Magdeburg<br />
This paper proposes a forward spotting method that handles hand gesture segmentation and recognition simultaneously<br />
without time delay. To spot meaningful gestures of numbers (0-9) accurately, a stochastic method for designing a non-gesture<br />
model using Conditional Random Fields (CRFs), requiring no training data, is proposed. The non-gesture model provides<br />
confidence measures that are used as an adaptive threshold to find the start and end points of meaningful gestures.<br />
Experimental results show that the proposed method can successfully recognize isolated gestures with 96.51% and meaningful<br />
gestures with 90.49% reliability.<br />
09:00-11:10, Paper ThAT9.52<br />
Real-Time Upper-Limbs Posture Recognition based on Particle Filters and AdaBoost Algorithms<br />
Fahn, Chin-Shyurng, National Taiwan Univ. of Science and Tech.<br />
Chiang, Sheng-Lung, National Taiwan Univ. of Science and Tech.<br />
In this paper, we employ particle filters to dynamically locate the face and upper limbs. To prevent disturbance<br />
from skin-color regions, such as other naked parts of the human body or skin-color-like objects in the background,<br />
we further take the motion cue as a feature during tracking. Currently, we prescribe eight kinds of upper-limb postures<br />
with reference to the characteristics of flag semaphore. The advantage is that we can utilize the relative positions of the face<br />
and two hands to recognize the postures easily. To achieve posture recognition, we evaluate three different classifiers based on<br />
machine learning methods: multi-layer perceptrons, support vector machines, and AdaBoost algorithms. The experimental<br />
results reveal that the AdaBoost algorithm performs best, recognizing upper-limb postures with an accuracy<br />
above 95% while requiring much less training time than the other two.<br />
09:00-11:10, Paper ThAT9.53<br />
One-Lead ECG-Based Personal Identification using Ziv-Merhav Cross Parsing<br />
Pereira Coutinho, David, Inst. Superior de Engenharia de Lisboa<br />
Fred, Ana Luisa Nobre, Inst. Superior Técnico<br />
Figueiredo, Mario A. T., Inst. Superior Técnico<br />
The advance of falsification technology increases security concerns and gives biometrics an important role in security solutions.<br />
The electrocardiogram (ECG) is an emerging biometric that does not need liveness verification. There is strong<br />
evidence that ECG signals contain sufficient discriminative information to allow the identification of individuals from a<br />
large population. Most approaches rely on ECG data and the fiducial points of different parts of the heartbeat waveform. However,<br />
non-fiducial approaches have recently proved to be effective as well, and have the advantage of not relying critically on the<br />
accurate extraction of fiducial data. In this paper, we propose a new non-fiducial ECG biometric identification<br />
method based on data compression techniques, namely the Ziv-Merhav cross parsing algorithm for symbol sequences<br />
(strings). Our method relies on a string similarity measure derived from the algorithmic cross-complexity concept and its compression-based<br />
approximation. We present results on real data, one-lead ECG, acquired during a concentration<br />
task, from 19 healthy individuals. Our approach achieves a 100% subject recognition rate despite the existence of differentiated<br />
stress states.<br />
09:00-11:10, Paper ThAT9.54<br />
Multimodal Human Computer Interaction with MIDAS Intelligent Infokiosk<br />
Karpov, Alexey, Russian Acad. of Sciences<br />
Ronzhin, Andrey, Russian Acad. of Sciences<br />
Kipyatkova, Irina, Russian Acad. of Sciences<br />
Ronzhin, Alexander, Russian Acad. of Sciences<br />
Akarun, Lale, Bogazici Univ.<br />
In this paper, we present an intelligent information kiosk called MIDAS (Multimodal Interactive-Dialogue Automaton for<br />
Self-service), including its hardware and software architecture and the stages of deployment of its speech recognition and synthesis<br />
technologies. MIDAS uses the Wizard of Oz (WOZ) methodology, which allows an expert to correct speech recognition<br />
results and control the dialogue flow. User statistics of the multimodal human-computer interaction (HCI) have been analyzed<br />
for the operation of the kiosk in the automatic and automated modes. The infokiosk offers information about the<br />
structure and staff of laboratories, the location and phones of departments and employees of the institution. The multimodal<br />
user interface is provided with a touch screen, natural speech input and head and manual gestures, both for ordinary and<br />
physically handicapped users.<br />
09:00-11:10, Paper ThAT9.55<br />
View Invariant Body Pose Estimation based on Biased Manifold Learning<br />
Hur, Dongcheol, Korea Univ.<br />
Lee, Seong-Whan, Korea Univ.<br />
Wallraven, Christian, MPI for Biological Cybernetics<br />
In human body pose estimation, manifold learning is a popular technique for reducing the dimension of 2D images and<br />
3D body configuration data. This technique, however, is especially vulnerable to silhouette variation such as that caused by<br />
viewpoint changes. In this paper, we propose a novel approach that combines three separate manifolds for representing<br />
variations in viewpoint, pose, and 3D body configuration. We use biased manifold learning to learn these manifolds with<br />
appropriately weighted distances. A set of four mapping functions is then learned by a generalized regression neural network<br />
for added robustness. Despite using only three manifolds, we show that this method can reliably estimate 3D body<br />
poses from 2D images with all learned viewpoints.<br />
09:00-11:10, Paper ThAT9.56<br />
Visual Gaze Estimation by Joint Head and Eye Information<br />
Valenti, Roberto, Univ. of Amsterdam<br />
Lablack, Adel, UMR USTL/CNRS 8022<br />
Sebe, Nicu, Univ. of Trento<br />
Djeraba, Chabane, UMR USTL/CNRS 8022<br />
Gevers, Theo, Univ. of Amsterdam<br />
In this paper, we present an unconstrained visual gaze estimation system. The proposed method extracts the visual field<br />
of view of a person looking at a target scene in order to estimate the approximate location of interest (visual gaze). The<br />
novelty of the system is the joint use of head pose and eye location information to fine-tune the visual gaze estimated by<br />
the head pose only, so that the system can be used in multiple scenarios. The improvements obtained by the proposed approach<br />
are validated using the Boston University head pose dataset, on which the standard deviation of the joint visual<br />
gaze estimation improved by 61.06% horizontally and 52.23% vertically with respect to the gaze estimation obtained by<br />
the head pose only. A user study shows the potential of the proposed system.<br />
09:00-11:10, Paper ThAT9.57<br />
Discrimination of Moderate and Acute Drowsiness based on Spontaneous Facial Expressions<br />
Vural, Esra, Univ. of California San Diego<br />
Bartlett, Marian Stewart, Univ. of California San Diego<br />
Littlewort, Gwen, Univ. of California San Diego<br />
Cetin, Mujdat, Sabanci Univ.<br />
Ercil, Aytul, Sabanci Univ.<br />
Movellan, Javier, Univ. of California San Diego<br />
It is important for drowsiness detection systems to identify different levels of drowsiness and respond appropriately at<br />
each level. This study explores how to discriminate moderate from acute drowsiness by applying computer vision techniques<br />
to the human face. In our previous study, spontaneous facial expressions measured through computer vision techniques<br />
were used as an indicator to discriminate alert from acutely drowsy episodes. In this study, we explore which<br />
facial muscle movements are predictive of moderate and acute drowsiness. The effect of the temporal dynamics of action<br />
units on prediction performance is explored by capturing temporal dynamics using an overcomplete representation of<br />
temporal Gabor filters. In the final system, we perform feature selection to build a classifier that can discriminate moderately<br />
drowsy from acutely drowsy episodes. The system achieves a classification rate of 0.96 A’ in discriminating moderately<br />
drowsy versus acutely drowsy episodes. Moreover, the study reveals new information about facial behavior occurring during<br />
different stages of drowsiness.<br />
11:10-12:10, ThPL1 Anadolu Auditorium<br />
J.K. Aggarwal Prize Lecture:<br />
Scene and Object Recognition in Context<br />
Antonio Torralba Plenary Session<br />
Computer Science and Artificial Intelligence Laboratory<br />
Dept. of Electrical Engineering and Computer Science<br />
MIT, USA<br />
Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been<br />
much progress and there are already object recognition systems operating in commercial products. Most of the algorithms<br />
for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions<br />
with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem<br />
by brute force. However, in the real world, objects tend to co-vary with other objects, providing a rich collection of<br />
contextual associations. These contextual associations can be used to reduce the search space by looking only in places in<br />
which the object is expected to be; this also increases performance by rejecting image patterns that look like the<br />
target object but are in unlikely places.<br />
As the field moves toward integrated systems that try to recognize many object classes and learn about contextual relationships<br />
between objects, the lack of large annotated datasets hinders the fast development of robust solutions. In this talk I will<br />
describe recent work on visual scene understanding that tries to build integrated models for scene and object recognition,<br />
emphasizing the power of large databases of annotated images in computer vision.<br />
ThBT1 Marmara Hall<br />
Object Detection and Recognition - V Regular Session<br />
Session chair: Wang, Yunhong (Beihang Univ.)<br />
13:30-13:50, Paper ThBT1.1<br />
Finding Multiple Object Instances with Occlusion<br />
Guo, Ge, Chinese Acad. of Sciences<br />
Jiang, Tingting, Peking Univ.<br />
Wang, Yizhou, School of EECS, Peking<br />
Gao, Wen, Peking Univ.<br />
In this paper we provide a framework for the detection and localization of multiple similar shapes or object instances in an<br />
image based on shape matching. The problem poses three challenges: first, the basic shape matching<br />
problem of finding the correspondence and transformation between two shapes; second, matching shapes under<br />
occlusion; and last, recognizing and locating all the matched shapes in the image. We solve these problems by using<br />
both graph partition and shape matching in a global optimization framework. A Hough-like collaborative voting is adopted,<br />
which provides a good initialization and data-driven information, and plays an important role in solving the partial matching<br />
problem due to occlusion. Experiments demonstrate the efficiency of our method.<br />
13:50-14:10, Paper ThBT1.2<br />
Bag of Hierarchical Co-Occurrence Features for Image Classification<br />
Kobayashi, Takumi, National Inst. of Advanced Industrial Science and Tech.<br />
Otsu, Nobuyuki, National Inst. of Advanced Industrial Science and Tech.<br />
We propose a bag-of-hierarchical-co-occurrence features method incorporating hierarchical structures for image classification.<br />
Local co-occurrences of visual words effectively characterize the spatial alignment of objects‘ components. The<br />
visual words are hierarchically constructed in the feature space, which helps us to extract higher-level words and to avoid<br />
quantization error in assigning the words to descriptors. For extracting descriptors, we employ two types of features hierarchically:<br />
narrow (local) descriptors, like SIFT [1], and broad descriptors based on co-occurrence features. The proposed<br />
method thus captures the co-occurrences of both small and large components. We conduct an experiment on image classification<br />
by applying the method to the Caltech 101 dataset and show the favorable performance of the proposed method.<br />
14:10-14:30, Paper ThBT1.3<br />
Person Detection using Temporal and Geometric Context with a Pan Tilt Zoom Camera<br />
Del Bimbo, Alberto, Univ. of Florence<br />
Lisanti, Giuseppe, Univ. of Florence<br />
Masi, Iacopo, Univ. of Florence<br />
Pernici, Federico, Univ. of Florence<br />
In this paper we present a system that integrates automatic camera geometry estimation and object detection from a Pan<br />
Tilt Zoom camera. We estimate camera pose with respect to a world scene plane in real-time and perform human detection<br />
exploiting the relative space-time context. Using camera self-localization, 2D object detections are clustered in a 3D world<br />
coordinate frame. Target scale inference is further exploited to reduce the number of false alarms and also to increase the<br />
detection rate in the final non-maximum suppression stage. Our integrated system applied on real-world data shows superior<br />
performance with respect to the standard detector used.<br />
14:30-14:50, Paper ThBT1.4<br />
Disparity Map Refinement for Video based Scene Change Detection using a Mobile Stereo Camera Platform<br />
Haberdar, Hakan, Univ. of Houston<br />
Shah, Shishir, Univ. of Houston<br />
This paper presents a novel disparity map refinement method and vision based surveillance framework for the task of detecting<br />
objects of interest in dynamic outdoor environments from two stereo video sequences taken at different times and<br />
from different viewing angles by a mobile camera platform. The proposed framework includes several steps, the first of<br />
which computes disparity maps of the same scene in two video sequences. Preliminary disparity images are refined based<br />
on estimated disparities in neighboring frames. Segmentation is performed to estimate ground planes, which in turn are<br />
used for establishing spatial registration between the two video sequences. Finally, the regions of change are detected<br />
using the combination of texture and intensity gradient features. We present experiments on detection of objects of different<br />
sizes and textures in real videos.<br />
14:50-15:10, Paper ThBT1.5<br />
Using Symmetry to Select Fixation Points for Segmentation<br />
Kootstra, Gert, KTH<br />
Bergström, Niklas, Royal Inst. of Tech.<br />
Kragic, Danica, KTH<br />
For the interpretation of a visual scene, it is important for a robotic system to pay attention to the objects in the scene and<br />
segment them from their background. We focus on the segmentation of previously unseen objects in unknown scenes. The<br />
attention model therefore needs to be bottom-up and context-free. In this paper, we propose the use of symmetry, one of<br />
the Gestalt principles for figure-ground segregation, to guide the robot’s attention. We show that our symmetry-saliency<br />
model outperforms the contrast-saliency model proposed by Itti et al. (1998). The symmetry model performs better in finding<br />
the objects of interest and selects a fixation point closer to the center of the object. Moreover, the objects are better<br />
segmented from the background when the initial points are selected on the basis of symmetry.<br />
ThBT2 Anadolu Auditorium<br />
Classification - II Regular Session<br />
Session chair: Pelillo, Marcello (Ca’Foscari Univ.)<br />
13:30-13:50, Paper ThBT2.1<br />
Data Classification on Multiple Manifolds<br />
Xiao, Rui, Shanghai Jiao Tong Univ.<br />
Zhao, Qijun, The Hong Kong Pol. Univ.<br />
Zhang, David, The Hong Kong Pol. Univ.<br />
Shi, Pengfei, Shanghai Jiao Tong Univ.<br />
Unlike most previous manifold-based data classification algorithms, which assume that all the data points lie on a single manifold,<br />
we expect that data from different classes may reside on different manifolds of possibly different dimensions. Therefore,<br />
better classification accuracy would be achieved by modeling the data by multiple manifolds each corresponding to a<br />
class. To this end, a general framework for data classification on multiple manifolds is presented. The manifolds are firstly<br />
learned for each class separately, and a stochastic optimization algorithm is then employed to get the near optimal dimensionality<br />
of each manifold from the classification viewpoint. Then, classification is performed under a newly defined minimum<br />
reconstruction error based classifier. Our method could be easily extended by involving various manifold learning<br />
methods and searching strategies. Experiments on both synthetic data and databases of facial expression images show the<br />
effectiveness of the proposed multiple manifold based approach.<br />
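The classification rule described above (one manifold per class, decide by minimum reconstruction error) can be sketched with per-class PCA subspaces; this is a simplified linear stand-in for the paper's learned manifolds, and the fixed `dim` parameter replaces their stochastic dimensionality search:

```python
import numpy as np

def fit_class_subspaces(X_by_class, dim):
    """Fit one linear subspace per class via PCA (a linear stand-in for
    'one manifold per class'); `dim` is assumed fixed rather than
    searched for as in the paper."""
    models = {}
    for label, X in X_by_class.items():
        mean = X.mean(axis=0)
        # principal directions of the centred class data
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        models[label] = (mean, vt[:dim])
    return models

def classify(x, models):
    """Assign x to the class whose subspace reconstructs it best
    (minimum reconstruction error classifier)."""
    best, best_err = None, float("inf")
    for label, (mean, basis) in models.items():
        # project x onto the class subspace and measure the residual
        proj = mean + (x - mean) @ basis.T @ basis
        err = np.linalg.norm(x - proj)
        if err < best_err:
            best, best_err = label, err
    return best
```

With two classes lying along different lines in the plane, a query point near one line is assigned to that class even if it is far from the training points themselves.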
13:50-14:10, Paper ThBT2.2<br />
Unsupervised Ensemble Ranking: Application to Large-Scale Image Retrieval<br />
Lee, Jung-Eun, Michigan State Univ.<br />
Jin, Rong, Michigan State Univ.<br />
Jain, Anil, Michigan State Univ.<br />
The continued explosion in the growth of image and video databases makes automatic image search and retrieval an extremely<br />
important problem. Among the various approaches to Content-based Image Retrieval (CBIR), image similarity<br />
based on local point descriptors has shown promising performance. However, this approach suffers from the scalability<br />
problem. Although the bag-of-words model resolves the scalability problem, it suffers from a loss in retrieval accuracy. We circumvent<br />
this performance loss by an ensemble ranking approach in which rankings from multiple bag-of-words models<br />
are combined to obtain more accurate retrieval results. An unsupervised algorithm is developed to learn the weights for<br />
fusing the rankings from multiple bag-of-words models. Experimental results on a database of 100,000 images show that<br />
this approach is both efficient and effective in finding visually similar images.<br />
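The fusion step can be illustrated with a weighted Borda-style combination of the per-model rankings; the weights are simply assumed given here, whereas the paper's contribution is learning them with an unsupervised algorithm:

```python
def fuse_rankings(rankings, weights):
    """Weighted Borda-style fusion of several rankings.

    rankings: list of lists, rankings[m] holding model m's retrieved
    images ordered best-first. weights: one non-negative weight per
    model (assumed given here; the paper learns them unsupervised).
    """
    scores = {}
    for w, ranking in zip(weights, rankings):
        n = len(ranking)
        for pos, item in enumerate(ranking):
            # items near the top of a ranking earn more score
            scores[item] = scores.get(item, 0.0) + w * (n - pos)
    # final ranking: highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A model with a larger weight pulls the fused ranking toward its own ordering, which is the mechanism the learned weights exploit.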
14:10-14:30, Paper ThBT2.3<br />
Cross Entropy Optimization of the Random Set Framework for Multiple Instance Learning<br />
Bolton, Jeremy, Univ. of Florida<br />
Gader, Paul, Univ. of Florida<br />
Multiple instance learning (MIL) is a recently researched technique used for learning a target concept in the presence of<br />
noise. Previously, a random set framework for multiple instance learning (RSF-MIL) was proposed; however, the proposed<br />
optimization strategy did not permit the harmonious optimization of model parameters. A cross entropy based optimization<br />
strategy is proposed. Experimental results on synthetic examples, benchmark and landmine data sets illustrate the benefits<br />
of the proposed optimization strategy.<br />
14:30-14:50, Paper ThBT2.4<br />
A Constant Average Time Algorithm to Allow Insertions in the LAESA Fast Nearest Neighbour Search Index<br />
Oncina, Jose, Univ. de Alicante<br />
Micó, Luisa, Univ. de Alicante<br />
Nearest Neighbour search is a widely used technique in Pattern Recognition. In order to speed up the search many indexing<br />
techniques have been proposed. However, most of the proposed techniques are static, that is, once the index is built the<br />
incorporation of new data is not possible unless a costly rebuild of the index is performed. As a consequence, changes<br />
in the environment are very costly to take into account. In this work, we propose a technique to allow the insertion of<br />
elements in the LAESA index. The resulting index is exactly the same as the one that would be obtained by building it<br />
from scratch. In this paper we also obtain an upper bound for its expected running time. Surprisingly, this bound is independent<br />
of the database size.<br />
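The pivot-based pruning at the heart of LAESA, and the constant-cost bookkeeping an insertion needs, can be sketched as follows (an illustrative simplification, not the authors' exact algorithm; real LAESA additionally orders candidates by their lower bounds):

```python
import math

class LAESA:
    """Minimal LAESA-style index: distances from every element to a
    fixed set of pivots are precomputed, and at query time the
    triangle inequality gives lower bounds that prune distance
    computations."""

    def __init__(self, pivots):
        self.pivots = list(pivots)
        self.data = []          # stored elements
        self.pivot_dists = []   # pivot_dists[i][p] = d(data[i], pivots[p])

    @staticmethod
    def dist(a, b):
        return math.dist(a, b)

    def insert(self, x):
        # a constant number of distance computations per insertion
        # (one per pivot); the rest of the index is untouched
        self.data.append(x)
        self.pivot_dists.append([self.dist(x, p) for p in self.pivots])

    def nearest(self, q):
        dq = [self.dist(q, p) for p in self.pivots]
        best, best_d = None, float("inf")
        for x, pd in zip(self.data, self.pivot_dists):
            # triangle-inequality lower bound on d(q, x) over all pivots
            lb = max(abs(a - b) for a, b in zip(dq, pd))
            if lb >= best_d:
                continue  # pruned without computing d(q, x)
            d = self.dist(q, x)
            if d < best_d:
                best, best_d = x, d
        return best, best_d
```

Because an insertion touches only the new element's pivot distances, the index stays identical to one built from scratch, which mirrors the property claimed in the abstract.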
14:50-15:10, Paper ThBT2.5<br />
Feature Extraction from Discrete Attributes<br />
Yildiz, Olcay Taner, Isik Univ.<br />
In many pattern recognition applications, decision trees are a first choice due to their simplicity and easily interpretable nature.<br />
In this paper, we extract new features by combining k discrete attributes, where for each subset of size k of the attributes,<br />
we generate all orderings of values of those attributes exhaustively. We then apply the usual univariate decision tree classifier<br />
using these orderings as the new attributes. Our simulation results on 16 datasets from UCI repository show that the<br />
novel decision tree classifier performs better than the standard one in terms of error rate and tree complexity. The same idea can<br />
also be applied to other univariate rule learning algorithms such as C4.5 Rules and Ripper.<br />
ThBT3 Topkapı Hall A<br />
Computer Vision Applications - II Regular Session<br />
Session chair: Foggia, Pasquale (Univ. di Salerno)<br />
13:30-13:50, Paper ThBT3.1<br />
Fire-Flame Detection based on Fuzzy Finite Automata<br />
Ko, Byoungchul, Keimyung Univ.<br />
Ham, Seoun-Jae, Keimyung Univ.<br />
Nam, Jaeyeal, Keimyung Univ.<br />
This paper proposes a new fire-flame detection method using probabilistic membership function of visual features and<br />
Fuzzy Finite Automata (FFA). First, moving regions are detected by background subtraction, and candidate<br />
flame regions are then identified by applying flame color models. Since flame regions generally exhibit continuously irregular patterns,<br />
membership functions of the variance of intensity, wavelet energy and motion orientation are generated and applied<br />
to FFA. Since FFA combines the capabilities of automata with fuzzy logic, it not only provides a systemic approach to<br />
handle uncertainty in computational systems, but also can handle continuous spaces. The proposed algorithm is successfully<br />
applied to various fire videos and shows better detection performance when compared with other methods.<br />
13:50-14:10, Paper ThBT3.2<br />
Extrinsic Camera Parameter Estimation using Video Images and GPS Considering GPS Positioning Accuracy<br />
Kume, Hideyuki, Nara Inst. of Science and Tech.<br />
Taketomi, Takafumi, Nara Inst. of Science and Tech.<br />
Sato, Tomokazu, Nara Inst. of Science and Tech.<br />
Yokoya, Naokazu, Nara Inst. of Science and Tech.<br />
This paper proposes a method for estimating extrinsic camera parameters using video images and position data acquired<br />
by GPS. In conventional methods, the accuracy of the estimated camera position largely depends on the accuracy of GPS<br />
positioning data because they assume that GPS position error is very small or normally distributed. However, the actual<br />
error of GPS positioning easily grows to the 10 m level and the distribution of these errors changes depending on satellite<br />
positions and conditions of the environment. In order to achieve more accurate camera positioning in outdoor environments,<br />
in this study, we have employed a simple assumption that true GPS position exists within a certain range from the observed<br />
GPS position and the size of the range depends on the GPS positioning accuracy. Concretely, the proposed method estimates<br />
camera parameters by minimizing an energy function that is defined by using the reprojection error and the penalty term<br />
for GPS positioning.<br />
14:10-14:30, Paper ThBT3.3<br />
Combining Monocular and Stereo Cues for Mobile Robot Localization using Visual Words<br />
Fraundorfer, Friedrich, ETH Zurich<br />
Wu, Changchang, UNC-Chapel Hill<br />
Pollefeys, Marc,<br />
This paper describes an approach for mobile robot localization using a visual word based place recognition approach. In<br />
our approach we exploit the benefits of a stereo camera system for place recognition. Visual words computed from SIFT<br />
features are combined with VIP (viewpoint invariant patches) features that use depth information from the stereo setup.<br />
The approach was evaluated under the ImageCLEF@ICPR 2010 competition. The results achieved on the competition<br />
datasets are published in this paper.<br />
14:30-14:50, Paper ThBT3.4<br />
Fast Derivation of Soil Surface Roughness Parameters using Multi-Band SAR Imagery and the Integral Equation<br />
Model<br />
Seppke, Benjamin, Univ. of Hamburg<br />
Dreschler-Fischer, Leonie, Univ. of Hamburg<br />
Heiming, Jo-Ann, Univ. of Hamburg<br />
Wengenroth, Felix, Univ. of Hamburg<br />
The Integral Equation Model (IEM) predicts the normalized radar cross section (NRCS) of dielectric surfaces given surface<br />
and radar parameters. To derive the surface parameters from the NRCS using the IEM, the model needs to be inverted. We<br />
present a fast method of this model inversion to derive soil surface roughness parameters from synthetic aperture radar<br />
(SAR) remote sensing data. The model inversion is based on two different collocated SAR images of different bands, since the<br />
derivation of the parameters cannot be done using one band alone. The computation of the model and the model inversion<br />
are very time consuming tasks and therefore may be impractical for large remote sensing data. We present an approach<br />
that is based on a few model assumptions to speed up the computation of the surface parameters. We applied the algorithm<br />
to detect the correlation length of the surface for dry-fallen areas in the World Cultural Heritage Wadden Sea, a coastal<br />
tidal flat at the German Bight (North Sea). The results are very promising and may be used for a classification of the area<br />
in future steps.<br />
14:50-15:10, Paper ThBT3.5<br />
Social Network Approach to Analysis of Soccer Game<br />
Park, Kyoung-Jin, The Ohio State Univ.<br />
Yilmaz, Alper, The Ohio State Univ.<br />
Video understanding has been an active area of research, where many articles have been published on how to detect and<br />
track objects in videos, and how to analyze their trajectories. These methods, however, only provided heuristic low level<br />
information without providing a higher level understanding of global relations within the whole context. This paper presents<br />
a new way to provide such understanding using a social network approach in soccer videos. Our approach represents<br />
interactions between the objects in the video as a social network. This network is then analyzed by detecting small<br />
communities using modularity, which reflects social interaction. Additionally, we analyze the centrality of nodes, which<br />
provides the importance of the individuals composing the network. In particular, we introduce five centralities exploiting the directed<br />
and weighted social network. The partitions of the resulting social network are shown to correspond to clusters of soccer players<br />
with respect to their role in the game.<br />
ThBT4 Dolmabahçe Hall B<br />
Image Segmentation - II Regular Session<br />
Session chair: Farag, Aly A. (Univ. of Louisville)<br />
13:30-13:50, Paper ThBT4.1<br />
Robust Foreground Object Segmentation via Adaptive Region-Based Background Modelling<br />
Reddy, Vikas, NICTA, The Univ. of Queensland<br />
Sanderson, Conrad, NICTA<br />
Lovell, Brian Carrington, The Univ. of Queensland<br />
We propose a region-based foreground object segmentation method capable of dealing with image sequences containing<br />
noise, illumination variations and dynamic backgrounds (as often present in outdoor environments). The method utilises<br />
contextual spatial information through analysing each frame on an overlapping block-by-block basis and obtaining a low-dimensional<br />
texture descriptor for each block. Each descriptor is passed through an adaptive multi-stage classifier, comprised<br />
of a likelihood evaluation, an illumination invariant measure, and a temporal correlation check. The overlapping of<br />
blocks not only ensures smooth contours of the foreground objects but also effectively minimises the number of false positives<br />
in the generated foreground masks. The parameter settings are robust against a wide variety of sequences, and postprocessing<br />
of foreground masks is not required. Experiments on the challenging I2R dataset show that the proposed method<br />
obtains considerably better results (both qualitatively and quantitatively) than methods based on Gaussian mixture models<br />
(GMMs), feature histograms, and normalised vector distances. On average, the proposed method achieves 36% more accurate<br />
foreground masks than the GMM based method.<br />
13:50-14:10, Paper ThBT4.2<br />
Flooding and MRF-Based Algorithms for Interactive Segmentation<br />
Grinias, Ilias, Univ. of Crete<br />
Komodakis, Nikos, Univ. of Crete<br />
Tziritas, G., Univ. of Crete<br />
We propose a method for interactive colour image segmentation. The goal is to detect an object from the background,<br />
when some markers on the object(s) and the background are given. As features, only probability distributions of the data are<br />
used. At first, all the labelled seeds are independently propagated for obtaining homogeneous connected components for<br />
each of them. Then the image is divided into blocks, which are classified according to their probabilistic distance from the<br />
classified regions. A topographic surface for each class is obtained, using Bayesian dissimilarities and a min-max criterion.<br />
Two algorithms are proposed: a regularized classification based on the topographic surface and incorporating an MRF<br />
model, and a priority multi-label flooding algorithm. Segmentation results on the LHI data set are presented.<br />
14:10-14:30, Paper ThBT4.3<br />
Steerable Filtering using Novel Circular Harmonic Functions with Application to Edge Detection<br />
Papari, Giuseppe, Univ. of Groningen<br />
Campisi, Patrizio, Univ. degli Studi Roma TRE<br />
Petkov, N, Univ. of Groningen<br />
In this paper, we perform approximate steering of the elongated 2D Hermite-Gauss functions with respect to rotations and<br />
provide compact analytical expressions for the related basis functions. A special notation introduced here considerably<br />
simplifies the derivation and unifies the cases of even and odd indices. The proposed filters are applied to edge detection.<br />
Quantitative analysis shows a performance increase of about 12.5% in terms of the Pratt’s figure of merit with respect to<br />
the well-established Gaussian gradient proposed by Canny.<br />
14:30-14:50, Paper ThBT4.4<br />
3D Vertebral Body Segmentation using Shape based Graph Cuts<br />
Aslan, Melih Seref, Univ. of Louisville<br />
Ali, Asem, Univ. of Louisville<br />
Farag, Aly A., Univ. of Louisville<br />
Rara, Ham, Univ. of Louisville<br />
Arnold, Ben, Image Analysis, Inc.<br />
Ping, Xiang, Image Analysis, Inc.<br />
Bone mineral density (BMD) measurements and fracture analysis of the spine bones are restricted to the vertebral bodies<br />
(VBs). In this paper, we propose a novel 3D shape based method to segment VBs in clinical computed tomography (CT)<br />
images without any user intervention. The proposed method depends on both image appearance and shape information.<br />
3D shape information is obtained from a set of training data sets. Then, we estimate the shape variations using a distance<br />
probabilistic model which approximates the marginal densities of the VB and background in the variability region. To<br />
segment a VB, a matched filter is used to detect the VB region automatically. We align the detected volume with the 3D<br />
shape prior so that it can be used in the distance probabilistic model. Then, the graph cuts method which integrates the linear<br />
combination of Gaussians (LCG), Markov Gibbs Random Field (MGRF), and distance probabilistic model obtained from<br />
3D shape prior is used. Experiments on the data sets show that the proposed segmentation approach is more accurate than<br />
other known alternatives.<br />
14:50-15:10, Paper ThBT4.5<br />
Locally Deformable Shape Model to Improve 3D Level Set based Esophagus Segmentation<br />
Kurugol, Sila, Northeastern Univ.<br />
Ozay, Necmiye, Northeastern Univ.<br />
Dy, Jennifer G., Northeastern Univ.<br />
Sharp, Gregory C., Mass. General Hospital and Harvard Medical School<br />
Brooks, Dana H., Northeastern Univ.<br />
In this paper we propose a supervised 3D segmentation algorithm to locate the esophagus in thoracic CT scans using a<br />
variational framework. To address challenges due to low contrast, several priors are learned from a training set of segmented<br />
images. Our algorithm first estimates the centerline based on a spatial model learned at a few manually marked anatomical<br />
reference points. Then an implicit shape model is learned by subtracting the centerline and applying PCA to these shapes.<br />
To allow local variations in the shapes, we propose to use nonlinear smooth local deformations. Finally, the esophageal<br />
wall is located within a 3D level set framework by optimizing a cost function including terms for appearance, the shape<br />
model, smoothness constraints and an air/contrast model.<br />
ThBT5 Topkapı Hall B<br />
3D Face Recognition Regular Session<br />
Session chair: Li, Stan Z. (CASIA)<br />
13:30-13:50, Paper ThBT5.1<br />
3D Face Recognition by Deforming the Normal Face<br />
Li, Xiaoli, Southeast Univ.<br />
Da, Feipeng, Southeast Univ.<br />
3D face recognition is complicated by the presence of expression variation. In this paper, we present an automatic 3D face<br />
recognition method which can differentiate the expression deformations from the interpersonal differences and recognize<br />
faces with expressions removed. The deformations caused by expression and by interpersonal difference are first<br />
learned from a training set. Then the deformations are linearly combined to synthesize a new face with a certain expression.<br />
When a target face comes in, the synthesized face is used to match it by adjusting the coefficients in the linear<br />
combination. After the matching process, coefficients corresponding to the interpersonal differences are chosen as features<br />
for recognition. We perform experiments on the FRGC v2.0 database and good performance is obtained.<br />
13:50-14:10, Paper ThBT5.2<br />
Real-Time 3D Face and Facial Action Tracking using Extended 2D+3D AAMs<br />
Zhou, Mingcai, Chinese Acad. of Sciences<br />
Wang, Yangsheng, Chinese Acad. of Sciences<br />
Huang, Xiangsheng, Chinese Acad. of Sciences<br />
In this work, we address the problem of tracking three dimensional (3D) faces and facial actions in video sequences. The<br />
main contributions of the paper are as follows. First, we develop an extended 2D+3D Active Appearance Models (AAM)<br />
based 3D face and facial action tracking framework using 2D view-based AAMs and a modified 3D face model. Second,<br />
we develop a robust shape initialization method based on local feature matching to track fast face motion. Experiments<br />
evaluating the effectiveness of the proposed algorithm are reported.<br />
14:10-14:30, Paper ThBT5.3<br />
A Novel Face Recognition Approach using a 2D-3D Searching Strategy<br />
Dahm, Nicholas, NICTA<br />
Gao, Yongsheng, Griffith Univ.<br />
Many face recognition techniques focus on 2D-2D or 3D-3D comparison; however, few techniques explore<br />
the idea of cross-dimensional comparison. This paper presents a novel face recognition approach that implements cross-dimensional<br />
comparison to solve the issue of pose invariance. Our approach implements a Gabor representation during<br />
comparison to allow for variations in texture, illumination, expression and pose. Kernel scaling is used to reduce comparison<br />
time during the branching search, which determines the facial pose of input images. The conducted experiments prove<br />
the viability of this approach, with our larger kernel experiments returning 91.6% - 100% accuracy on a database comprised<br />
of both local data and data from the USF Human ID 3D database.<br />
14:30-14:50, Paper ThBT5.4<br />
Initialization and Pose Alignment in Active Shape Model<br />
Xiong, Pengfei, Chinese Acad. of Sciences<br />
Lei, Huang, Chinese Acad. of Sciences<br />
Liu, Changping, Chinese Acad. of Sciences<br />
In this paper, we propose a new algorithm for shape initialization and 3D pose alignment in Active Shape Model (ASM).<br />
Instead of initializing with the average shape as in previous works, we build a scattered data interpolation model from key points<br />
to obtain the initial shape, which ensures the shape is initialized around facial organs. These key points are chosen from the organs<br />
of the face shape and first located with a strong classifier. They are then used to build a Radial Basis Function (RBF)<br />
model that deforms the average shape into the initial shape. Besides, to cope with a variety of face poses, we define a 3D general shape<br />
to align face shapes in 3D instead of the 2D alignment of classic ASM. With accurate 3D rotation angles iteratively calculated<br />
by Levenberg-Marquardt (LM) algorithm, shapes can be aligned to standard shape more reliably. Experiments<br />
and comparisons on FERET show that both shape initialization and 3D pose alignment of our algorithm greatly improve<br />
the location accuracy.<br />
14:50-15:10, Paper ThBT5.5<br />
3D Face Reconstruction using a Single or Multiple Views<br />
Choi, Jongmoo, Univ. of Southern California<br />
Medioni, Gerard, Univ. of Southern California<br />
Lin, Yuping, Univ. of Southern California<br />
Silva, Luciano, Univ. Federal do Parana<br />
Bellon, Olga Regina Pereira, Univ. Federal do Parana<br />
Pamplona, Mauricio, Univ. Federal do Parana<br />
Faltemier, Timothy, Progeny Systems<br />
We present a 3D face reconstruction system that takes as input either a single view or several different views. Given a<br />
facial image, we first classify the facial pose into one of five predefined poses, then detect two anchor points that are then<br />
used to detect a set of predefined facial landmarks. Based on these initial steps, for a single view we apply a warping<br />
process using a generic 3D face model to build a 3D face. For multiple views, we apply sparse bundle adjustment to reconstruct<br />
3D landmarks which are used to deform the generic 3D face model. Experimental results on the Color FERET<br />
and CMU multi-PIE databases confirm our framework is effective in creating realistic 3D face models that can be used in<br />
many computer vision applications, such as 3D face recognition at a distance.<br />
ThBT6 Dolmabahçe Hall A<br />
Text Analysis and Detection Regular Session<br />
Session chair: Kholmatov, Alisher (TUBITAK UEKAE)<br />
13:30-13:50, Paper ThBT6.1<br />
Text Detection using Edge Gradient and Graph Spectrum<br />
Zhang, Jing, Univ. of South Florida<br />
Kasturi, Rangachar, Univ. of South Florida<br />
In this paper, we propose a new unsupervised text detection approach based on the Histogram of Oriented Gradients<br />
and the Graph Spectrum. By investigating the properties of text edges, the proposed approach first extracts text edges from<br />
an image and localizes candidate character blocks using Histograms of Oriented Gradients; then the Graph Spectrum is utilized<br />
to capture the global relationships among candidate blocks and cluster them into groups to generate the bounding boxes<br />
of text objects in the image. The proposed method is robust to the color and size of text. The ICDAR 2003 text locating dataset<br />
and video frames were used to evaluate the performance of the proposed approach. Experimental results demonstrated the<br />
validity of our approach.<br />
13:50-14:10, Paper ThBT6.2<br />
Scene Text Extraction with Edge Constraint and Text Collinearity<br />
Lee, Seonghun, KAIST<br />
Cho, MinSu, KAIST<br />
Jung, Kyomin, KAIST<br />
Kim, Jin Hyung, KAIST<br />
In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two<br />
functions: it generates text region candidates, and it verifies the labels of the candidates (text or non-text). The text region<br />
candidates are generated through a modified K-means clustering algorithm, which references texture features, edge information<br />
and color information. The candidate labels are then verified in a global sense by a Markov Random Field model<br />
in which a collinearity weight is added, since most texts are aligned. The proposed method achieves reasonable accuracy<br />
for text extraction from moderately difficult examples from the ICDAR 2003 database.<br />
14:10-14:30, Paper ThBT6.3<br />
Typographical Features for Scene Text Recognition<br />
Weinman, Jerod, Grinnell Coll.<br />
Scene text images feature an abundance of font style variety but a dearth of data in any given query. Recognition methods<br />
must be robust to this variety or adapt to the query data’s characteristics. To achieve this, we augment a semi-Markov<br />
model (integrating character segmentation and recognition) with a bigram model of character widths. Softly promoting<br />
segmentations that exhibit font metrics consistent with those learned from examples, we use the limited information available<br />
while avoiding error-prone direct estimates and hard constraints. Incorporating character width bigrams in this fashion<br />
improves recognition on low-resolution images of signs containing text in many fonts.<br />
14:30-14:50, Paper ThBT6.4<br />
A Visual Attention based Approach to Text Extraction<br />
Sun, Qiaoyu, Huaihai Institute of Tech.<br />
Lu, Yue, East China Normal Univ.<br />
Sun, Shiliang, East China Normal Univ.<br />
A visual attention based approach is proposed to extract texts from complicated background in camera-based images.<br />
First, it applies the simplified visual attention model to highlight the region of interest (ROI) in an input image and to<br />
yield a map, named the VA map, consisting of the ROIs. Second, an edge map of the image containing the edge information<br />
of four directions is obtained by Sobel operators. Character areas are detected by connected component analysis and<br />
merged into candidate text regions. Finally, the VA map is employed to confirm the candidate text regions. The experimental<br />
results demonstrate that the proposed method can effectively extract text information and locate text regions contained in<br />
camera-based images. It is robust not only for font, size, color, language, space, alignment and complexity of background,<br />
but also for perspective distortion and skewed texts embedded in images.<br />
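The four-direction edge map mentioned above can be sketched with Sobel-style convolutions; the two diagonal kernels below are a common choice assumed here for illustration, not taken from the paper:

```python
import numpy as np

def sobel_edge_maps(img):
    """Absolute edge responses in four directions (horizontal,
    vertical and the two diagonals) via 3x3 Sobel-style kernels."""
    k_h = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)    # horizontal edges
    k_v = k_h.T                                                    # vertical edges
    k_d1 = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], float)   # 45-degree diagonal
    k_d2 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], float)   # 135-degree diagonal

    def conv(im, k):
        # naive 3x3 correlation with edge replication at the border
        p = np.pad(im, 1, mode="edge")
        out = np.zeros_like(im, float)
        for i in range(im.shape[0]):
            for j in range(im.shape[1]):
                out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
        return out

    return [np.abs(conv(img, k)) for k in (k_h, k_v, k_d1, k_d2)]
```

On a vertical intensity step, only the vertical-edge map responds, which is the directional separation the text detector relies on before connected component analysis.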
14:50-15:10, Paper ThBT6.5<br />
New Wavelet and Color Features for Text Detection in Video<br />
Palaiahnakote, Shivakumara, National Univ. of Singapore<br />
Phan, Trung Quy, National Univ. of Singapore<br />
Tan, Chew-Lim, National Univ. of Singapore<br />
Automatic text detection in video is an important task for efficient and accurate indexing and retrieval of multimedia data<br />
such as event identification and event boundary identification. This paper presents a new method comprising wavelet<br />
decomposition and color features, namely R, G and B. The wavelet decomposition is applied to the three color bands separately<br />
to obtain three high frequency sub-bands (LH, HL and HH), and the average of the three sub-bands for each color<br />
band is then computed to enhance the text pixels in the video frame. To take advantage of both wavelet and color information,<br />
we again take the average of the three average images (AoA) obtained in the former step to increase the gap between text<br />
and non-text pixels. Our previous Laplacian method is employed on the AoA for text detection. The proposed method is evaluated<br />
by testing on a large dataset which includes publicly available data, non-text data and ICDAR-03 data. A comparative<br />
study with existing methods shows that the results of the proposed method are encouraging and useful.<br />
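The per-band enhancement step (one-level wavelet decomposition, then averaging the three high-frequency sub-bands) can be sketched with a Haar transform; the Haar basis and the normalization are assumptions, since the abstract does not name the wavelet:

```python
import numpy as np

def high_freq_average(band):
    """One-level Haar-like decomposition of a single colour band
    (even dimensions assumed), returning the average of the absolute
    LH, HL and HH high-frequency sub-bands."""
    a = band[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = band[0::2, 1::2]   # top-right
    c = band[1::2, 0::2]   # bottom-left
    d = band[1::2, 1::2]   # bottom-right
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    # average of the three high-frequency sub-bands
    return (np.abs(lh) + np.abs(hl) + np.abs(hh)) / 3.0
```

Smooth regions produce near-zero responses while high-contrast text strokes survive, which is what widens the gap between text and non-text pixels before the averaging across colour bands.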
ThBT7 Dolmabahçe Hall C<br />
Quantitative Biological Image and Signal Analysis Regular Session<br />
Session chair: Tasdizen, Tolga (Univ. of Utah)<br />
13:30-13:50, Paper ThBT7.1
Improving Undersampled MRI Reconstruction using Non-Local Means
Adluru, Ganesh, Univ. of Utah
Tasdizen, Tolga, Univ. of Utah
Whitaker, Ross, Univ. of Utah
Dibella, Edward, Univ. of Utah
Obtaining high quality images in MR is desirable not only for accurate visual assessment but also for automatic processing
to extract clinically relevant parameters. Filtering-based techniques are extremely useful for reducing artifacts caused
by undersampling of k-space (to reduce scan time). The recently proposed Non-Local Means (NLM) filtering method
offers a promising means to denoise images. Compared to most previous approaches, NLM is based on a more realistic
model of images, which results in little loss of information while removing the noise. Here we extend the NLM method
for MR image reconstruction from undersampled k-space data. The method is applied on T1-weighted images of the
breast and T2-weighted anatomical brain images. Results show that NLM offers a promising method that can be used for
accelerating MR data acquisitions.
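For intuition, a minimal pixel-domain NLM denoiser is sketched below. This is the generic filter only, not the paper's k-space reconstruction; the patch size, search-window size and filtering parameter `h` are arbitrary choices.

```python
import numpy as np

def nlm_denoise(img, patch=3, search=7, h=0.1):
    """Minimal non-local means: each pixel is replaced by a weighted
    average of search-window pixels, weighted by how similar their
    surrounding patches are to the patch around the target pixel."""
    pad = patch // 2
    padded = np.pad(img, pad, mode="reflect")
    H, W = img.shape
    half = search // 2
    out = np.zeros_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            p = padded[i:i + patch, j:j + patch]      # reference patch
            i0, i1 = max(0, i - half), min(H, i + half + 1)
            j0, j1 = max(0, j - half), min(W, j + half + 1)
            weights, values = [], []
            for ii in range(i0, i1):
                for jj in range(j0, j1):
                    q = padded[ii:ii + patch, jj:jj + patch]
                    d2 = np.mean((p - q) ** 2)        # patch dissimilarity
                    weights.append(np.exp(-d2 / h ** 2))
                    values.append(img[ii, jj])
            out[i, j] = np.average(values, weights=weights)
    return out
```

Because the weights depend on patch similarity rather than spatial distance alone, repeated structures reinforce each other while noise averages out, which is the "more realistic model of images" the abstract mentions.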
13:50-14:10, Paper ThBT7.2
Towards an Intelligent Bed Sensor: Non-Intrusive Monitoring of Sleep Irregularities with Computer Vision Techniques
Branzan Albu, Alexandra, Univ. of Victoria
Malakuti, Kaveh, Univ. of Victoria
This paper proposes a novel approach for monitoring sleep using pressure data. The goal of sleep monitoring is to detect
and log events of normal breathing, sleep apnea and body motion. The proposed approach is based on translating the signal
data to the image domain by computing a sequence of inter-frame similarity matrices from pressure maps acquired with
a mattress of pressure sensors. Periodicity analysis was performed on similarity matrices via a new algorithm based on
segmentation of elementary patterns using the watershed transform, followed by aggregation of quasi-rectangular patterns
into breathing cycles. Once breathing events are detected, all remaining elementary patterns aligned on the main diagonal
are considered as belonging to either apnea or motion events. The discrimination between these two events is based on
detecting movement times from a statistical analysis of pressure data. Experimental results confirm the validity of our approach.
14:10-14:30, Paper ThBT7.3
Automatic Selection of Keyframes from Angiogram Videos
Syeda-Mahmood, Tanveer, IBM Almaden Res. Center
Wang, Fei, Almaden Res. Center
Beymer, David, IBM Almaden Res. Center
Mahmood, Aafreen, Monta Vista High School
Lundstrom, Robert, Kaiser Permanente SFO Medical Center
In this paper we address the problem of automatic selection of important vessel-depicting key frames within 2D angiography
videos. Two different methods of frame selection are described, one based on the Frangi filter, and the other based on
detecting parallel curves formed from edges in angiography images. Results are shown by comparison to physician annotations
of such key frames on 2D coronary angiograms.
14:30-14:50, Paper ThBT7.4
A Computer-Aided Method for Scoliosis Fusion Level Selection by a Topologically Ordered Self Organizing Kohonen Network
Mezghani, Neila, Centre de Recherche du CHUM
Phan, Philippe, Sainte-Justine University Hospital Center
Mitiche, Amar, and Labella, Hubert, École Polytechnique de Montréal
de Guise, Jacques, Centre de Recherche du CHUM
Surgical instrumentation for adolescent idiopathic scoliosis (AIS) is a complex procedure involving many difficult
decisions. Selection of the appropriate fusion level remains one of the most challenging decisions in scoliosis surgery.
Currently, the Lenke classification model is generally followed in surgical planning. The purpose of our study is to investigate
a computer-aided method for Lenke classification and scoliosis fusion level selection. The method uses a self-organizing
neural network trained on a large database of surgically treated AIS cases. The neural network produces two
maps, one of Lenke classes and the other of fusion levels. These two maps show that the Lenke classes are associated
with the proper fusion level categories everywhere in the map except at the Lenke class transitions. The topological
ordering of the Cobb angles in the neural network justifies determining a patient's scoliosis treatment instrumentation
directly from the fusion level map rather than via the Lenke classification.
14:50-15:10, Paper ThBT7.5
A Fast and Robust Graph-Based Approach for Boundary Estimation of Fiber Bundles Relying on Fractional Anisotropy Maps
Bauer, Miriam Helen Anna, Univ. of Marburg
Egger, Jan, Univ. of Marburg
O'Donnell, Thomas Patrick, Siemens Corp. Res.
Freisleben, Bernd, Univ. of Marburg
Barbieri, Sebastiano, Fraunhofer MEVIS
Klein, Jan, Fraunhofer MEVIS
Hahn, Horst Karl, Fraunhofer MEVIS
Nimsky, Christopher, Univ. of Marburg
In this paper, a fast and robust graph-based approach for boundary estimation of fiber bundles derived from Diffusion
Tensor Imaging (DTI) is presented. DTI is a non-invasive imaging technique that allows the estimation of the location of
white matter tracts based on measurements of water diffusion properties. Based on DTI data, the fiber bundle boundary
can be determined to gain information about eloquent structures, which is of major interest for neurosurgery. DTI in combination
with tracking algorithms allows the estimation of the position and course of fiber tracts in the human brain. The presented
method uses these tracking results as the starting point for a graph-based approach. The overall method starts by
computing the fiber bundle centerline between two user-defined regions of interest (ROIs). This centerline determines
the planes that are used for creating a directed graph. Then, the min-cut of the graph is calculated, creating an optimal
boundary of the fiber bundle.
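The final step relies on a standard s-t minimum cut. The snippet below shows a generic Edmonds-Karp max-flow/min-cut on an adjacency-matrix graph; the construction of the actual boundary graph from the centerline planes is specific to the paper and is not reproduced here.

```python
from collections import deque

def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp max-flow; returns (flow value, set of nodes on the
    source side of the minimum cut). `capacity` is an n x n matrix."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]

    def bfs():
        # breadth-first search in the residual graph, recording parents
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    total = 0
    while True:
        parent = bfs()
        if parent[t] == -1:          # no augmenting path left
            break
        path_flow, v = float("inf"), t
        while v != s:                # bottleneck along the augmenting path
            u = parent[v]
            path_flow = min(path_flow, capacity[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:                # push flow along the path
            u = parent[v]
            flow[u][v] += path_flow
            flow[v][u] -= path_flow
            v = u
        total += path_flow
    # nodes still reachable in the residual graph form the source side of the cut
    reachable = {v for v in range(n) if bfs()[v] != -1}
    return total, reachable
```

By max-flow/min-cut duality, the saturated edges separating the returned source side from the rest form the minimum-weight boundary, which is what makes the cut "optimal" in this kind of formulation.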
ThCT1 Marmara Hall
Object Detection and Recognition - VI Regular Session
Session chair: Denzler, Joachim (Friedrich-Schiller Univ. of Jena)
15:40-16:00, Paper ThCT1.1
Recognizing 3D Objects with 3D Information from Stereo Vision
Yoon, Kuk-Jin, GIST
Shin, Min-Gil, GIST
Lee, Ji-Hyo, Samsung Electronics
Conventional local feature-based object recognition methods try to recognize learned 3D objects by using unordered local
feature matching followed by verification. However, the matching between unordered feature sets can be ambiguous
and, moreover, it is difficult to deal with generally shaped 3D objects in the verification stage. In this paper, we present a
new framework for general 3D object recognition, which is based on invariant local features and their 3D information
obtained with stereo cameras. We extend the conventional object recognition framework to stereo cameras. Since the proposed
method is based on stereo vision, it is possible to utilize the 3D information of local features visible from two cameras.
16:00-16:20, Paper ThCT1.2
Combining Geometry and Local Appearance for Object Detection
Pascual García-Tubío, Manuel, Vienna Univ. of Tech.
Wildenauer, Horst, Vienna Univ. of Tech.
Szumilas, Lech, Ind. Research Inst. for Automation & Measurement
In this paper we address the problem of object detection in cluttered scenes. Local image features and their spatial configuration
act as a representation of object classes which are learned in a discriminative fashion. Recent contributions in the
area of object detection indicate the importance of using geometrical properties for representing object classes. Prompted
by this, we devised an approach tailored to control the importance of the features and their spatial alignment. We quantitatively
show that modeling the spatial distribution of local features and optimising the influence of both cues significantly
boosts object detection performance.
16:20-16:40, Paper ThCT1.3
Illumination and Expression Invariant Face Recognition using SSIM based Sparse Representation
Khwaja, Asim, The Australian National Univ.
Asthana, Akshay, Australian National Univ.
Goecke, Roland, Univ. of Canberra
The sparse representation technique has provided a new way of looking at object recognition. As we demonstrate in this
paper, however, the mean-squared error (MSE) measure, which is at the heart of this technique, is not a very robust measure
when it comes to comparing facial images that differ significantly in luminance values, as it only performs pixel-by-pixel
comparisons. This requires a significantly large training set with enough variation in it to offset the drawback of the
MSE measure. A large training set, however, is often not available. We propose the replacement of the MSE measure by
the structural similarity (SSIM) measure in the sparse representation algorithm, which performs a more robust comparison
using only one training sample per subject. In addition, since the off-the-shelf sparsifiers are also written using the MSE
measure, we developed our own sparsifier using genetic algorithms with the SSIM measure. We applied the modified
algorithm to the Extended Yale Face B database as well as to the Multi-PIE database with expression and illumination
variations. The improved performance demonstrates the effectiveness of the proposed modifications.
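The contrast between the two measures can be seen with a single-window SSIM. This is a simplification: the standard index averages the statistic over local windows, and the toy images below are assumptions, not the paper's data.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM for images scaled to [0, 1]: luminance,
    contrast and structure terms combined in one statistic."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mse(x, y):
    """Pixel-by-pixel mean-squared error, for comparison."""
    return ((x - y) ** 2).mean()
```

A brightness-shifted copy of an image keeps its structure, so SSIM stays high even though MSE is large, which is exactly the luminance-robustness argument the abstract makes.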
16:40-17:00, Paper ThCT1.4
Improving Classification Accuracy by Comparing Local Features through Canonical Correlations
Dikmen, Mert, Univ. of Illinois at Urbana-Champaign
Huang, Thomas, Univ. of Illinois at Urbana-Champaign
Classifying images using features extracted from densely sampled local patches has enjoyed significant success in many detection
and recognition tasks. It is also well known that generally more than one type of feature is needed to achieve robust
classification performance. Previous works using multiple features have addressed this issue either through simple concatenation
of feature vectors or through combining feature-specific kernels at the classifier level. In this work we introduce a
novel approach for combining features at the feature level by projecting two types of features onto two respective subspaces
in which they are maximally correlated. We use their correlation as an augmented feature and demonstrate improvement in
classification accuracy over simple combination through concatenation in a pedestrian detection framework.
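A compact version of this idea is classical canonical correlation analysis (CCA) computed from the whitened cross-covariance. The sketch below is generic and not necessarily the authors' exact formulation; the small regularizer `reg` is an added assumption for numerical stability.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Find projections of two feature sets that are maximally correlated
    (classical CCA via SVD of the whitened cross-covariance)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(K)
    # projection bases for X and Y, plus the canonical correlations
    return inv_sqrt(Cxx) @ U[:, :k], inv_sqrt(Cyy) @ Vt[:k].T, s[:k]
```

The projected feature pair (or its correlation) can then be appended to the representation as the augmented feature described in the abstract.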
17:00-17:20, Paper ThCT1.5
A Robust Approach for Person Localization in Multi-Camera Environment
Sun, Luo, Tsinghua Univ.
Di, Huijun, Tsinghua Univ.
Tao, Linmi, Tsinghua Univ.
Xu, Guangyou, Tsinghua Univ.
Person localization is fundamental in human-centered computing, since a person must be localized before services can be actively
provided. This paper proposes a robust approach to localizing persons based on geometric constraints in a multi-camera
environment. The proposed algorithm has several advantages: 1) no assumption on the positions and orientations of the cameras,
except that they should share a common field of view; 2) no assumption on the visibility of particular body parts (e.g., feet),
except that a portion of the person should be observed in at least two views; 3) reliability in terms of tolerating occlusion,
body posture change and inaccurate motion detection. It can also provide error control and be further extended to measure
person height. The efficacy of the approach is demonstrated on challenging real-world scenarios.
ThCT2 Anadolu Auditorium
Classification - III Regular Session
Session chair: Tortorella, Francesco (Univ. degli Studi di Cassino)
15:40-16:00, Paper ThCT2.1
Nearest Archetype Hull Methods for Large-Scale Data Classification
Thurau, Christian, Fraunhofer IAIS
This paper introduces an efficient geometric approach for data classification that can build class models from large amounts
of high dimensional data. We determine a convex model of the data as the outcome of convex hull non-negative matrix
factorization, a large-scale variant of Archetypal Analysis. The resulting convex regions or archetype hulls give an optimal
(in a least squares sense) bounding of the data region and can be efficiently computed. We classify based on the minimum
distance to the closest archetype hull. The proposed method offers (i) an intuitive geometric interpretation, (ii) single as
well as multi-class classification, and (iii) handling of large amounts of high dimensional data. Experimental evaluation
on common benchmark data sets shows promising results.
16:00-16:20, Paper ThCT2.2
A Bound on the Performance of LDA in Randomly Projected Data Spaces
Durrant, Robert John, Univ. of Birmingham
Kaban, Ata, Univ. of Birmingham
We consider the problem of classification in nonadaptive dimensionality reduction. Specifically, we bound the increase in
classification error of Fisher's Linear Discriminant classifier resulting from randomly projecting the high dimensional
data into a lower dimensional space and both learning the classifier and performing the classification in the projected
space. Our bound is reasonably tight, and unlike existing bounds on learning from randomly projected data, it becomes
tighter as the quantity of training data increases, without requiring any sparsity structure from the data.
16:20-16:40, Paper ThCT2.3
Adaptive Incremental Learning with an Ensemble of Support Vector Machines
Kapp, Marcelo N., École de Tech. Supérieure - Univ. of Quebec
Sabourin, R., École de Tech. Supérieure
Maupin, Patrick, Defence Res. and Development Canada
The incremental updating of classifiers implies that their internal parameter values can vary according to incoming data.
As a result, in order to achieve high performance, incremental learner systems should not only consider the integration of
knowledge from new data, but also maintain an optimum set of parameters. In this paper, we propose an approach for performing
incremental learning in an adaptive fashion with an ensemble of support vector machines. The key idea is to track,
evolve, and combine optimum hypotheses over time, based on dynamic optimization processes and ensemble selection.
From experimental results, we demonstrate that the proposed strategy is promising, since it outperforms a single classifier
variant of the proposed approach and other classification methods often used for incremental learning.
16:40-17:00, Paper ThCT2.4
Margin Preserved Approximate Convex Hulls for Classification
Takahashi, Tetsuji, Hokkaido Univ.
Kudo, Mineichi, Hokkaido Univ.
The usage of convex hulls for classification is discussed with a practical algorithm, in which a sample is classified according
to its distances to convex hulls. Sometimes the convex hulls of classes are too close to maintain a large margin. In this paper, we
discuss a way to keep the margin larger than a specified value. To do this, we introduce the concept of an "expanded convex
hull" and confirm its effectiveness.
17:00-17:20, Paper ThCT2.5
Evolving Fuzzy Classifiers: Application to Incremental Learning of Handwritten Gesture Recognition Systems
Almaksour, Abdullah, IRISA/INSA de Rennes
Anquetil, Eric, IRISA/INSA
Quiniou, Solen, École de Tech. Supérieure
Cheriet, Mohammed, École de Tech. Supérieure
In this paper, we present a new method to design customizable self-evolving fuzzy rule-based classifiers. The presented
approach combines an incremental clustering algorithm with a fuzzy adaptation method in order to learn and maintain the
model. We use this method to build an evolving handwritten gesture recognition system. The self-adaptive nature of this
system allows it to start its learning process with few learning data, to continuously adapt and evolve according to any
new data, and to remain robust when a new unseen class is introduced at any moment in the life-long learning process.
ThCT3 Topkapı Hall A
Computer Vision Applications - III Regular Session
Session chair: Yilmaz, Alper (Ohio State Univ.)
15:40-16:00, Paper ThCT3.1
Fast and Spatially-Smooth Terrain Classification using Monocular Camera
Jakkoju, Chetan, IIIT Hyderabad
Krishna, Madhava, IIIT Hyderabad
Jawahar, C. V., IIIT
In this paper, we present a monocular camera based terrain classification scheme. The uniqueness of the proposed scheme
is that it inherently incorporates spatial smoothness while segmenting an image, without requiring post-processing
smoothing methods. The algorithm is extremely fast because it is built on top of a Random Forest classifier. We present
comparisons across features and classifiers. The baseline algorithm uses color, texture and their combination with classifiers
such as SVM and Random Forests. We further enhance the algorithm through a label transfer method. The efficacy of the
proposed solution can be seen in the low error rates we reach on both our dataset and other publicly available datasets.
16:00-16:20, Paper ThCT3.2
Learning Major Pedestrian Flows in Crowded Scenes
Widhalm, Peter, Austrian Inst. of Tech.
Braendle, Norbert, Austrian Inst. of Tech.
We present a crowd analysis approach computing a representation of the major pedestrian flows in complex scenes. It
treats crowds as a set of moving particles and builds a spatio-temporal model of motion events. A Growing Neural Gas algorithm
encodes optical flow particle trajectories as sequences of local motion events and learns a topology which is the
basis for trajectory distance computations. Trajectory prototypes are aligned with a two-open-ends version of Dynamic
Time Warping to cope with fragmented trajectories. The trajectories are grouped into an automatically determined number
of clusters with self-tuning spectral clustering. The clusters are compactly represented with the help of Principal Component
Analysis, providing a technique for unusual motion detection based on residuals. We demonstrate results for a publicly
available crowded video and a scene with volunteers moving according to defined origin-destination flows.
16:20-16:40, Paper ThCT3.3
On-Line Video Recognition and Counting of Harmful Insects
Bechar, Ikhlef, INRIA
Moisan, Sabine, INRIA
Thonnat, Monique, INRIA
Bremond, Francois, INRIA
This article is concerned with the on-line counting of harmful insects of certain species in videos, in the framework of in situ
video-surveillance that aims at the early detection of prominent pest attacks in greenhouse crops. The video-processing
challenges to be coped with mainly concern the low spatial resolution and color contrast of the objects of interest
in the videos, outdoor imaging conditions, and the need for quasi-real-time processing. Thus, we propose an
approach which uses a pattern recognition algorithm to extract the locations of the harmful insects of interest in
a video, combined with video-processing algorithms in order to achieve an on-line video-surveillance solution.
The system has been validated off-line on the whitefly species (one potentially harmful insect) and has shown acceptable
performance in terms of accuracy versus computational time.
16:40-17:00, Paper ThCT3.4
Boosted Edge Orientation Histograms for Grasping Point Detection
Lefakis, Leonidas, TU Vienna
Wildenauer, Horst, Vienna Univ. of Tech.
Pascual García-Tubío, Manuel, Vienna Univ. of Tech.
Szumilas, Lech, Ind. Research Inst. for Automation & Measurement
In this paper, we describe a novel algorithm for the detection of grasping points in images of previously unseen objects.
A basic building block of our approach is the use of a newly devised descriptor, representing semi-local grasping point
shape by the use of edge orientation histograms. Combined with boosting, our method learns discriminative grasp point models
for new objects from a set of annotated real-world images. The method has been extensively evaluated on challenging
images of real scenes, exhibiting largely varying characteristics concerning illumination conditions, scene complexity,
and viewpoint. Our experiments show that the method works in a stable manner and that its performance compares favorably
with the state of the art.
17:00-17:20, Paper ThCT3.5
Automatic Refinement of Foreground Regions for Robot Trail Following
Kocamaz, Mehmet Kemal, Univ. of Delaware
Rasmussen, Christopher, Univ. of Delaware
Continuous trails are extended regions along the ground, such as roads, hiking paths, rivers, and pipelines, which can be
navigationally useful for ground-based or aerial robots. Finding trails in an image and determining possible obstacles on
them are important tasks for robot navigation systems. Assuming that a rough initial segmentation or outline of the region
of interest is available, our goal is to refine the initial guess to obtain a more accurate and detailed representation of the true
trail borders. In this paper, we compare the suitability of several previously published segmentation algorithms, both in
terms of agreement with ground truth and speed, on a range of trail images with diverse appearance characteristics. These
algorithms include generic graph cut, a shape-based version of graph cut which employs a distance penalty, GrabCut, and
an iterative superpixel grouping method.
ThCT4 Dolmabahçe Hall A
Image Representation and Analysis Regular Session
Session chair: Debled-Rennesson, Isabelle (LORIA-Nancy Univ.)
15:40-16:00, Paper ThCT4.1
Object Decomposition via Curvilinear Skeleton Partition
Serino, Luca, Istituto di Cibernetica
Sanniti Di Baja, Gabriella, CNR
Arcelli, Carlo, Istituto di Cibernetica
A method to decompose a complex 3D object into simpler parts is presented, based on a suitable partition of the curvilinear
skeleton of the object. The curvilinear skeleton is divided into subsets by taking into account the regions of influence that
can be associated with its branch points. The obtained subsets are then used to recover the parts into which the object can
be decomposed.
16:00-16:20, Paper ThCT4.2
Differential Area Profiles
Ouzounis, Georgios, Joint Res. Center - Ispra, European Commission
Soille, Pierre, Joint Res. Center - Ispra, European Commission
In this paper a new feature descriptor, the differential area profile (DAP), is presented. DAPs, like regular differential
morphological profiles (DMPs), are computed from a size distribution. The proposed method is based on the area metric given
by regular connected area filters. Using area instead of local width, i.e. the diameter of the structuring element in the corresponding
set of openings by reconstruction in classical DMPs, leads to a rather different multi-scale decomposition. This
is investigated here and an example on a very high resolution satellite image tile is given.
16:20-16:40, Paper ThCT4.3
Connected Component Trees for Multivariate Image Processing and Applications in Astronomy
Perret, Benjamin, Univ. of Strasbourg, LSIIT-CNRS
Lefèvre, Sébastien, Univ. of Strasbourg
Collet, Christophe, Univ. of Strasbourg, LSIIT-CNRS
Slezak, Eric Jean Marc, Univ. de Nice - Sophia Antipolis
In this paper, we investigate the possibilities offered by the extension of connected component trees (cc-trees) to multivariate
images. We propose a general framework for image processing using the cc-tree based on lattice theory and
we discuss the possible applications depending on the properties of the underlying ordered set. This theoretical reflection
is illustrated by two applications in multispectral astronomical imaging: source separation and object detection.
16:40-17:00, Paper ThCT4.4
Multiresolution Analysis of 3D Images based on Discrete Distortion
Weiss, Kenneth, Univ. of Maryland, Coll. Park
Mesmoudi, Mohammed Mostefa, Univ. of Genova
De Floriani, L., Univ. of Genova
We consider a model of a 3D image obtained by discretizing it into a multiresolution tetrahedral mesh known as a hierarchy
of diamonds. This model enables us to extract crack-free approximations of the 3D image at any uniform or variable resolution,
thus reducing the size of the data set without reducing the accuracy. A 3D intensity image is a scalar field (the intensity
field) defined at the vertices of a 3D regular grid, and thus the graph of the image is a hypersurface in $R^4$. We
measure the discrete distortion, a generalization of the notion of curvature, of the transformation which maps the tetrahedralized
3D grid onto its graph in $R^4$. We evaluate the use of a hierarchy of diamonds to analyze properties of a 3D
image, such as its discrete distortion, directly on lower resolution approximations. Our results indicate that distortion-guided
extractions focus the resolution of approximated images on the salient features of the intensity image.
17:00-17:20, Paper ThCT4.5
Multiscale Analysis of Digital Segments by Intersection of 2D Digital Lines
Said, Mouhammad, Univ. de Savoie, Univ. d'Auvergne
Lachaud, Jacques-Olivier, Univ. of Savoie
Feschet, Fabien, Univ. d'Auvergne Clermont-Ferrand 1
A theory for the multiscale analysis of digital shapes would be very interesting for the pattern recognition community,
giving a digital equivalent of the continuous scale-space theory. We focus here on providing analytical formulae of the
multiresolution of Digital Straight Segments (DSS), which is a fundamental tool for describing digital shape contours.
ThCT5 Dolmabahçe Hall B
Image/Video Processing Regular Session
Session chair: Hamzaoğlu, İlker (Sabancı Univ.)
15:40-16:00, Paper ThCT5.1
Stereoscopic Image Inpainting: Distinct Depth Maps and Images Inpainting
Hervieu, Alexandre, Barcelona Media, Univ. Pompeu Fabra of Barcelona
Papadakis, Nicolas, Barcelona Media
Bugeau, Aurélie, Barcelona Media
Gargallo, Pau, Barcelona Media
Caselles, Vicent, Univ. Pompeu Fabra
In this paper we propose an algorithm for the inpainting of stereo images. The issue is to reconstruct the holes in a pair of
stereo images as if they were the projection of a 3D scene. Hence, the reconstruction of the missing information has to produce
a consistent visual perception of depth. Thus, the first step of the algorithm consists in the computation and inpainting
of the disparity maps in the given holes. The second step of the algorithm is to fill in the missing regions using the complete disparity
maps in a way that avoids the creation of 3D artifacts. We present some experiments on several pairs of stereo images.
16:00-16:20, Paper ThCT5.2
Panoramic Video Generation by Multi View Data Synthesis
D’Orazio, Tiziana, Italian National Res. Council
Leo, Marco, Italian National Res. Council
Mosca, Nicola, Italian National Res. Council
This paper presents a mosaic-based approach for enlarged-view soccer video production that can be provided to the audience
as a complementary view for greater enjoyment of relevant events, such as offside, counter attack or goal, that spread out
all over the playing field. Firstly, an enlarged view of the whole field is produced by fusing the images of six cameras
placed on the two sides of the field. Then a color transformation is applied to obtain uniform colors on the parts of the
playing field acquired from different cameras. Finally, the players are segmented by each camera and projected onto the
enlarged view to produce videos of the most interesting events.
16:20-16:40, Paper ThCT5.3
An Adaptive True Motion Estimation Algorithm for Frame Rate Conversion of High Definition Video
Cetin, Mert, Sabanci Univ.
Hamzaoglu, Ilker, Sabanci Univ.
Frame Rate Up-Conversion (FRUC) is necessary for displaying low frame rate video signals on high frame rate flat panel
displays. This paper proposes an adaptive true Motion Estimation (ME) algorithm for FRUC of High Definition video
formats. The proposed ME algorithm produces similar quality results with fewer calculations, or better quality
results with a similar number of calculations, compared to the 3D Recursive Search true ME algorithm, by adaptively using optimized
sets of candidate search locations and several computational complexity reduction techniques.
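As a baseline reference for what an ME algorithm computes, here is a plain exhaustive-search block matcher using the sum of absolute differences (SAD). The paper's contribution, adaptive candidate sets and complexity reduction, is precisely about avoiding this exhaustive search; this sketch only illustrates the underlying matching problem.

```python
import numpy as np

def block_motion(prev, cur, block=8, search=4):
    """Exhaustive-search block matching: for each block of `cur`, find the
    displacement into `prev` minimizing the sum of absolute differences."""
    H, W = cur.shape
    vectors = {}
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            tgt = cur[by:by + block, bx:bx + block]
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= H - block and 0 <= x <= W - block:
                        sad = np.abs(prev[y:y + block, x:x + block] - tgt).sum()
                        if best is None or sad < best:
                            best, best_mv = sad, (dy, dx)
            vectors[(by, bx)] = best_mv
    return vectors
```

For a frame that is a pure translation of the previous one, interior blocks recover the global motion vector exactly, which FRUC then uses to interpolate intermediate frames.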
16:40-17:00, Paper ThCT5.4
Human Daily Activities Indexing in Videos from Wearable Cameras for Monitoring of Patients with Dementia Diseases
Karaman, Svebor, Lab.
Benois-Pineau, Jenny, Lab.
Megret, Remi, Univ. of Bordeaux
Dovgalecs, Vladislavs, IMS
Gaëstel, Yann, INSERM U.897
Dartigues, Jean-Francois, INSERM U.897
Our research focuses on analysing human activities according to a known behaviorist scenario, in case of noisy and high
dimensional collected data. The data come from the monitoring of patients with dementia diseases by wearable cameras.
We define a structural model of video recordings based on a Hidden Markov Model. New spatio-temporal features, color
features and localization features are proposed as observations. First results in recognition of activities are promising.
17:00-17:20, Paper ThCT5.5
Automatic Composition of an Informative Wide-View Image from Video
Habe, Hitoshi, NAIST
Makiyama, Shota, NAIST
Kidode, Masatsugu, NAIST
We describe a method for generating an informative wide-view image using images captured by a moving camera. The
generated image allows for events in the scene observed by the camera to be understood easily. Our method does not use
3D shape information explicitly. Instead, it employs the trajectory of feature points across multiple images and generates
a composite image by taking into account the distribution of the trajectories of the feature points.
ThCT6 Topkapı Hall B
Facial Expression Regular Session
Session chair: Akarun, Lale (Bogazici Univ.)
15:40-16:00, Paper ThCT6.1<br />
Regression-Based Multi-View Facial Expression Recognition<br />
Rudovic, Ognjen, Imperial Coll.<br />
Patras, Ioannis, Queen Mary Univ. of London<br />
Pantic, Maja, Imperial Coll.<br />
We present a regression-based scheme for multi-view facial expression recognition based on 2D geometric features. We<br />
address the problem by mapping facial points (e.g. mouth corners) from non-frontal to frontal view where further recognition<br />
of the expressions can be performed using a state-of-the-art facial expression recognition method. To learn the mapping<br />
functions we investigate four regression models: Linear Regression (LR), Support Vector Regression (SVR),<br />
Relevance Vector Regression (RVR) and Gaussian Process Regression (GPR). Our extensive experiments on the CMU<br />
Multi-PIE facial expression database show that the proposed scheme outperforms view-specific classifiers by utilizing<br />
considerably less training data.<br />
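Of the four regression models investigated, the Linear Regression variant is the simplest to sketch: a linear map from non-frontal to frontal 2D point coordinates fitted by least squares. The data below are synthetic stand-ins (the actual CMU Multi-PIE landmarks and the paper's training protocol are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each row is a flattened set of 20 (x, y) facial
# points. The "true" map is a random near-identity transform plus noise,
# standing in for the non-frontal -> frontal correspondence.
n_points = 20
X_nonfrontal = rng.normal(size=(200, 2 * n_points))
true_W = rng.normal(size=(2 * n_points, 2 * n_points)) * 0.1 + np.eye(2 * n_points)
X_frontal = X_nonfrontal @ true_W + rng.normal(scale=0.01, size=(200, 2 * n_points))

# Linear Regression (LR) mapping with a bias term, fitted by least squares.
X_aug = np.hstack([X_nonfrontal, np.ones((200, 1))])
W, *_ = np.linalg.lstsq(X_aug, X_frontal, rcond=None)

def map_to_frontal(points):
    """Apply the learned linear map to new non-frontal point sets."""
    return np.hstack([points, np.ones((len(points), 1))]) @ W

pred = map_to_frontal(X_nonfrontal)
rmse = float(np.sqrt(np.mean((pred - X_frontal) ** 2)))
```

The SVR, RVR and GPR variants replace the least-squares fit with their respective regressors while keeping the same input/output structure.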
16:00-16:20, Paper ThCT6.2<br />
A Set of Selected SIFT Features for 3D Facial Expression Recognition<br />
Berretti, Stefano, Univ. of Firenze<br />
Del Bimbo, Alberto, Univ. of Florence<br />
Pala, Pietro, Univ. of Firenze<br />
Ben Amor, Boulbaba, LIFL UMR 8022<br />
Daoudi, Mohamed, TELECOM Lille1<br />
In this paper, the problem of person-independent facial expression recognition is addressed on 3D shapes. To this end, an<br />
original approach is proposed that computes SIFT descriptors on a set of facial landmarks of depth images, and then selects<br />
the subset of most relevant features. Using SVM classification of the selected features, an average recognition rate of<br />
77.5% on the BU-3DFE database has been obtained. Comparative evaluation on a common experimental setup shows<br />
that our solution obtains state-of-the-art results.<br />
16:20-16:40, Paper ThCT6.3<br />
Local 3D Shape Analysis for Facial Expression Recognition<br />
Maalej, Ahmed, LIFL UMR 8022<br />
Ben Amor, Boulbaba, LIFL UMR 8022<br />
Daoudi, Mohamed, TELECOM Lille1<br />
Srivastava, Anuj, Florida State Univ.<br />
Berretti, Stefano, Univ. of Firenze<br />
We investigate the problem of facial expression recognition using 3D face data. Our approach is based on local shape<br />
analysis of several relevant regions of a given face scan. These regions or patches from facial surfaces are extracted and<br />
represented by sets of closed curves. A Riemannian framework is used to derive the shape analysis of the extracted patches.<br />
The framework permits the calculation of similarity (or dissimilarity) distances between patches and of the<br />
optimal deformation between them. Once calculated, these measures are employed as inputs to commonly used classification<br />
techniques such as AdaBoost and Support Vector Machines (SVM). A quantitative evaluation of our novel approach<br />
is conducted on a subset of the publicly available BU-3DFE database.<br />
16:40-17:00, Paper ThCT6.4 CANCELED<br />
Incorporating Action Unit Co-Movement in Classification of Dynamic Facial Expressions using Lasso<br />
Rastad, Mahdi, Univ. of Illinois<br />
Zhu, Lusha, Univ. of Illinois<br />
Koenker, Roger, Univ. of Illinois<br />
Spencer-Smith, Jesse, Univ. of Illinois<br />
Hsu, Ming, Univ. of California, Berkeley<br />
Current work on facial expression analysis is often restricted to static facial images and a small set of expressions.<br />
In this research we generate a novel dataset of facial action unit dynamics over several experiment sessions by means<br />
of an avatar controlled by participants using a joystick. Previous studies have shown that this generates highly realistic<br />
facial expressions, comparable to popular displays of facial expressions used in computer vision experiments. Here we<br />
extend this work by using functional data analysis (FDA) to classify facial movement functions into basic emotion categories.<br />
Several single and hybrid classification algorithms are tested. By incorporating action unit co-movement in a Lasso<br />
shrinkage method, we achieved a recognition rate of 89%, substantially outperforming competitor approaches. Application<br />
to real expressions, and introduction of intensity and other temporal features of expressions are discussed as examples of<br />
extensions of our method.<br />
17:00-17:20, Paper ThCT6.5<br />
Multi-Modal Emotion Recognition using Canonical Correlations and Acoustic Features<br />
Gajsek, Rok, Univ. of Ljubljana<br />
Struc, Vitomir, Univ. of Ljubljana<br />
Mihelic, France, Univ. of Ljubljana<br />
Information about the psycho-physical state of the subject is becoming a valuable addition to modern audio and video<br />
recognition systems. As well as enabling a better user experience, it can also improve the recognition accuracy of the<br />
base system. In this article, we present our approach to a multi-modal (audio-video) emotion recognition system. For the audio<br />
sub-system, a feature set comprising prosodic, spectral and cepstral features is selected, and a support vector classifier is<br />
used to produce the scores for each emotional category. For the video sub-system a novel approach is presented which does<br />
not rely on the tracking of specific facial landmarks and thus eliminates the problems caused when the tracking algorithm<br />
fails to detect the correct area. The system is evaluated on the eNTERFACE database, and the recognition accuracy<br />
of our audio-video fusion is compared to published results in the literature.<br />
ThCT7 Dolmabahçe Hall C<br />
Multimedia and Document Analysis Applications Regular Session<br />
Session chair: Duygulu Sahin, Pinar (Bilkent Univ.)<br />
15:40-16:00, Paper ThCT7.1<br />
Automatic Music Genre Classification using Bass Lines<br />
Simsekli, Umut, Bogazici Univ.<br />
A bass line is an instrumental melody that encapsulates rhythmic, melodic, and harmonic features and arguably contains<br />
sufficient information for accurate genre classification. In this paper a bass line based automatic music genre classification<br />
system is described. “Melodic Interval Histograms” are used as features, and k-nearest neighbor classifiers are<br />
utilized and compared with SVMs on a small standard MIDI database. Apart from standard distance metrics for k-nearest<br />
neighbor classification (Euclidean, symmetric Kullback-Leibler, earth mover’s, normalized compression distances), we propose<br />
a novel distance metric, the perceptually weighted Euclidean distance (PWED). The maximum classification accuracy (84%)<br />
is obtained with k-nearest neighbor classifiers, and the added utility of the novel metric is illustrated in our experiments.<br />
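A k-nearest neighbor classifier with a weighted Euclidean distance over interval histograms can be sketched as follows. The weight vector and the toy "genre" histograms are hypothetical; the paper's actual perceptual weighting scheme is not reproduced here:

```python
import numpy as np

def pwed(h1, h2, w):
    """Weighted Euclidean distance between two interval histograms.
    The per-bin weights stand in for the paper's perceptual weighting."""
    return float(np.sqrt(np.sum(w * (h1 - h2) ** 2)))

def knn_classify(query, train_X, train_y, w, k=3):
    # Vote among the k training histograms closest under the weighted metric.
    d = np.array([pwed(query, x, w) for x in train_X])
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return int(labels[counts.argmax()])

# Toy "melodic interval histograms" for two genres (entirely made up).
rng = np.random.default_rng(0)
proto = {0: np.array([.4, .2, .1, .1, .1, .1]),   # genre 0 favours small intervals
         1: np.array([.1, .1, .1, .1, .2, .4])}   # genre 1 favours large intervals
X = np.vstack([proto[c] + rng.normal(scale=0.02, size=6) for c in [0] * 10 + [1] * 10])
y = [0] * 10 + [1] * 10
w = np.linspace(1.0, 0.5, 6)  # hypothetical perceptual weights per interval bin

pred = knn_classify(proto[1] + rng.normal(scale=0.02, size=6), X, y, w, k=5)
```

Swapping `pwed` for a symmetric Kullback-Leibler or compression-based distance changes only the distance function, not the classifier.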
16:00-16:20, Paper ThCT7.2<br />
Exploiting Combined Multi-Level Model for Document Sentiment Analysis<br />
Li, Si, Beijing Univ. of Posts and Telecommunications<br />
Zhang, Hao, Beijing Univ. of Posts and Telecommunications<br />
Xu, Weiran, Beijing Univ. of Posts and Telecommunications<br />
Guo, Jun, Beijing Univ. of Posts and Telecommunications<br />
This paper focuses on the task of text sentiment analysis in hybrid online articles and web pages. Traditional approaches<br />
to text sentiment analysis typically work at a particular level, such as the phrase, sentence or document level, which might<br />
not be suitable for documents with too few or too many words. Considering that analysis at each level has its own advantages,<br />
we expect that a combination model may achieve better performance. In this paper, a novel combined model based on<br />
phrase-level and sentence-level analyses and a discussion of how the different levels complement each other are presented.<br />
For the phrase-level sentiment analysis, a newly defined Left-Middle-Right template and Conditional Random Fields<br />
are used to extract the sentiment words. The Maximum Entropy model is used for the sentence-level sentiment analysis.<br />
The experimental results verify that the combination model with a specific combination of features is better than any<br />
single-level model.<br />
16:20-16:40, Paper ThCT7.3<br />
MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases<br />
Akune, Fernando, Univ. of Campinas<br />
Valle, Eduardo, Univ. of Campinas<br />
Torres, Ricardo, Univ. of Campinas<br />
We propose MONORAIL, an indexing scheme for very large multimedia descriptor databases. Our index is based on the<br />
Hilbert curve, which is able to map the high-dimensional space of those descriptors to a single dimension. Instead of using<br />
several curves to mitigate boundary effects, we use a single curve with several surrogate points for each descriptor. Thus,<br />
we are able to reduce the random accesses to the bare minimum. In a rigorous empirical comparison with another method<br />
based on multiple surrogates, ours shows a significant improvement, due to our careful choice of the surrogate points.<br />
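The idea of mapping high-dimensional descriptors onto one dimension with a space-filling curve, sorting by curve key, and answering queries with a short sequential scan can be sketched as follows. MONORAIL uses the Hilbert curve; the simpler Z-order (Morton) curve is used here as a stand-in, since both map nearby points to nearby keys, and the surrogate-point mechanism is omitted:

```python
import numpy as np

def morton_key(coords, bits=8):
    """Interleave the bits of quantized coordinates into a single integer
    (Z-order key); a Hilbert key would preserve locality somewhat better."""
    key = 0
    for b in range(bits):                 # bit position, least significant first
        for d, c in enumerate(coords):
            key |= ((int(c) >> b) & 1) << (b * len(coords) + d)
    return key

def build_index(descriptors, bits=8):
    """Quantize descriptors (assumed in [0, 1]) and sort them by curve key,
    so curve neighbours are adjacent on disk."""
    q = np.clip((descriptors * (2 ** bits - 1)).astype(int), 0, 2 ** bits - 1)
    keys = np.array([morton_key(row, bits) for row in q])
    order = np.argsort(keys)
    return keys[order], order

def query(keys, order, descriptors, probe, k=2, bits=8):
    """Scan a small window around the probe's curve position (sequential reads)."""
    qp = np.clip((probe * (2 ** bits - 1)).astype(int), 0, 2 ** bits - 1)
    pos = int(np.searchsorted(keys, morton_key(qp, bits)))
    lo, hi = max(0, pos - k), min(len(order), pos + k)
    cand = order[lo:hi]
    dist = np.linalg.norm(descriptors[cand] - probe, axis=1)
    return cand[np.argsort(dist)][:k]

descriptors = np.array([[0.10, 0.10, 0.10, 0.10],
                        [0.12, 0.10, 0.11, 0.10],
                        [0.90, 0.90, 0.90, 0.90],
                        [0.88, 0.90, 0.90, 0.91]])
keys, order = build_index(descriptors)
hits = query(keys, order, descriptors, np.array([0.11, 0.10, 0.10, 0.10]))
```

The boundary effects the abstract mentions arise exactly when near neighbours land far apart on the curve; the paper's surrogate points widen the scanned region to cover such cases.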
16:40-17:00, Paper ThCT7.4<br />
Localized Supervised Metric Learning on Temporal Physiological Data<br />
Sun, Jimeng, IBM T. J. Watson Res. Center<br />
Sow, Daby, IBM T.J. Watson Res. Center<br />
Hu, Jianying, IBM<br />
Ebadollahi, Shahram, IBM T.J. Watson Res. Center<br />
Effective patient similarity assessment is important for clinical decision support. It enables the capture of past experience<br />
as manifested in the collective longitudinal medical records of patients to help clinicians assess the likely outcomes resulting<br />
from their decisions and actions. However, it is challenging to devise a patient similarity metric that is clinically relevant<br />
and semantically sound. Patient similarity is highly context sensitive: it depends on factors such as the disease, the particular<br />
stage of the disease, and co-morbidities. One way to discern the semantics in a particular context is to take advantage of<br />
physicians’ expert knowledge as reflected in labels assigned to some patients. In this paper we present a method that leverages<br />
localized supervised metric learning to effectively incorporate such expert knowledge to arrive at semantically sound<br />
patient similarity measures. Experiments using data obtained from the MIMIC II database demonstrate the effectiveness<br />
of this approach.<br />
17:00-17:20, Paper ThCT7.5<br />
Automatic Detection of Phishing Target from Phishing Webpage<br />
Liu, Gang, City Univ. of Hong Kong<br />
Qiu, Bite, City Univ. of Hong Kong<br />
Liu, Wenyin, City Univ. of Hong Kong<br />
An approach to identifying the phishing target of a given (suspicious) webpage is proposed by clustering the webpage<br />
set consisting of all its associated webpages and the given webpage itself. We first find its associated webpages, and then<br />
explore their relationships to the given webpage as features for clustering. Such relationships include link relationship,<br />
ranking relationship, text similarity, and webpage layout similarity. A DBSCAN clustering method is employed<br />
to find whether there is a cluster around the given webpage. If such a cluster exists, we claim the given webpage is a phishing<br />
webpage and then find its phishing target (i.e., the legitimate webpage it is attacking) from this cluster. Otherwise, we<br />
identify it as a legitimate webpage. Our test dataset consists of 8745 phishing pages (targeting 76 well-known websites)<br />
selected from PhishTank, and preliminary experiments show that the approach can successfully identify 91.44% of their<br />
phishing targets. Another dataset of 1000 legitimate webpages is collected to test our method’s false alarm rate, which is<br />
3.40%.<br />
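The clustering step can be illustrated with a minimal DBSCAN on toy 2-D feature vectors. In the paper the features would be the link, ranking, text and layout similarities listed above; the vectors, `eps` and `min_pts` values here are arbitrary:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue
        labels[i] = cluster                  # new core point: start a cluster
        seeds = list(neighbors[i])
        while seeds:                         # expand the cluster transitively
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    seeds.extend(neighbors[j])
        cluster += 1
    return labels

# Toy vectors: a dense group around the suspicious page, a second group,
# and one isolated page. A cluster forming around the given page would
# suggest it is a phishing page with a target inside that cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [10.0, 0.0]])
labels = dbscan(X, eps=0.5, min_pts=2)
```

Points in the two dense groups receive cluster labels, while the isolated page is marked as noise, mirroring the "no cluster → legitimate" decision rule.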
ThBCT8 Upper Foyer<br />
Pattern Recognition Systems and Applications - III Poster Session<br />
Session chair: Radeva, Petia (CVC)<br />
13:30-16:30, Paper ThBCT8.1<br />
Underwater Mine Classification with Imperfect Labels<br />
Williams, David, NATO Undersea Res. Centre<br />
A new algorithm for performing classification with imperfectly labeled data is presented. The proposed approach is motivated<br />
by the insight that the average prediction of a group of sufficiently informed people is often more accurate than the<br />
prediction of any one supposed expert. This idea that the “wisdom of crowds” can outperform a single expert is implemented<br />
by drawing sets of labels as samples from a Bernoulli distribution with a specified labeling error rate. Additionally,<br />
ideas from multiple imputation are exploited to provide a principled way for determining an appropriate number of label<br />
sampling rounds to consider. The approach is demonstrated in the context of an underwater mine classification application<br />
on real synthetic aperture sonar data collected at sea, with promising results.<br />
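The label-sampling idea can be sketched directly: flip each given label with the assumed error rate (a Bernoulli draw) once per round, then aggregate across rounds. In the paper a classifier is trained per sampled label set; here only the sampling and the "wisdom of crowds" averaging are shown, with invented labels:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_label_sets(labels, error_rate, n_rounds):
    """Draw noisy binary label sets: each label is flipped independently
    with probability error_rate, once per sampling round."""
    labels = np.asarray(labels)
    flips = rng.random((n_rounds, len(labels))) < error_rate
    return np.where(flips, 1 - labels, labels)

def crowd_vote(label_sets):
    # Average the sampled label sets and threshold: the majority view.
    return (label_sets.mean(axis=0) >= 0.5).astype(int)

true = np.array([0, 1, 1, 0, 1, 0, 1, 1])        # hypothetical mine/not-mine labels
sets = sample_label_sets(true, error_rate=0.2, n_rounds=101)
consensus = crowd_vote(sets)
```

With enough rounds the per-label majority recovers the underlying labels despite the 20% flip rate, which is the intuition behind averaging over imperfect labelers; the multiple-imputation machinery the paper uses to choose the number of rounds is not reproduced.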
13:30-16:30, Paper ThBCT8.2<br />
Optimizing Optimum-Path Forest Classification for Huge Datasets<br />
Papa, Joao Paulo, Sao Paulo State Univ.<br />
Cappabianco, Fabio, Univ. of Campinas<br />
Falcao, Alexandre Xavier, State Univ. of Campinas<br />
Traditional pattern recognition techniques cannot handle the classification of large datasets with both efficiency and effectiveness.<br />
In this context, the Optimum-Path Forest (OPF) classifier was recently introduced, trying to achieve high<br />
recognition rates and low computational cost. Although OPF was much faster than Support Vector Machines for training,<br />
it was slightly slower for classification. In this paper, we present the Efficient OPF (EOPF), which is an enhanced and<br />
faster version of the traditional OPF, and validate it for the automatic recognition of white matter and gray matter in magnetic<br />
resonance images of the human brain.<br />
13:30-16:30, Paper ThBCT8.3<br />
Model-Based Detection of Acoustically Dense Objects in Ultrasound<br />
Banerjee, Jyotirmoy, General Electric<br />
Krishnan, Kajoli B., General Electric<br />
Traditional detection methods tend to underperform in the presence of the strong and variable background clutter that<br />
characterizes a medical ultrasound image. In this paper, we present a novel diffusion-based technique to localize acoustically<br />
dense objects in an ultrasound image. The approach is premised on the observation that the topology of noise in ultrasound<br />
images is more sensitive to diffusion than that of any such physical object. We show that our method when applied to the<br />
problem of fetal head detection and automatic measurement of head circumference in 59 obstetric scans compares remarkably<br />
well with manually assisted measurements. Based on fetal age estimates and their bounds specified in Standard<br />
OB Tables [6], the Gestational Age predictions from automated measurements are found to be within 2SD in 95% and 98%<br />
of cases when compared with manual measurements by two experts. The framework is general and can be extended to<br />
object localization in diverse applications of ultrasound imaging.<br />
13:30-16:30, Paper ThBCT8.4<br />
SubXPCA versus PCA: A Theoretical Investigation<br />
Negi, Atul, Univ. of Hyderabad<br />
Kadappagari, Vijaya Kumar, Vasavi Coll. of Engineering<br />
Principal Component Analysis (PCA) is a widely accepted dimensionality reduction technique that is optimal in an MSE<br />
sense. PCA extracts ‘global’ variations and is insensitive to ‘local’ variations in sub-patterns. Recently, we proposed<br />
a novel approach, SubXPCA, which is computationally more effective than PCA and also effective in computing principal<br />
components with both global and local information across sub-patterns. In this paper, we show the near-optimality<br />
of SubXPCA (in terms of summarization of variance) by proving analytically that SubXPCA approaches PCA as the number<br />
of local principal components of the sub-patterns increases. This is demonstrated empirically on the CMU Face Data.<br />
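The two-stage structure described above can be sketched as follows: split the feature vector into sub-patterns, run PCA within each, then run PCA across the concatenated local projections. This follows only the outline given in the abstract; the data, split and component counts are arbitrary:

```python
import numpy as np

def pca(X, k):
    """Project centred data onto its top-k principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def subxpca(X, n_sub, k_local, k_global):
    """Sketch of SubXPCA: local PCA per sub-pattern, then a global PCA
    across the concatenated local principal components."""
    parts = np.array_split(X, n_sub, axis=1)        # split features into sub-patterns
    local = np.hstack([pca(p, k_local) for p in parts])
    return pca(local, k_global)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 32))                      # hypothetical pattern matrix
Z = subxpca(X, n_sub=4, k_local=4, k_global=6)
```

Because each stage is an orthogonal projection, the variance summarized by `Z` can never exceed the total variance of `X`; the paper's result is that it approaches the PCA optimum as `k_local` grows.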
13:30-16:30, Paper ThBCT8.5<br />
Feature Extraction Based on Class Mean Embedding (CME)<br />
Wan, Minghua, Nanjing Univ. of Science and Tech.<br />
Lai, Zhihui, Nanjing Univ. of Science and Tech.<br />
Jin, Zhong, Nanjing Univ. of Science and Tech.<br />
Recently, local discriminant embedding (LDE) was proposed for manifold learning and pattern classification. In the LDE<br />
framework, the neighborhood and class relations of data points are used to construct a graph embedding for classification problems.<br />
In the mapping from the high-dimensional space to a low-dimensional subspace, data points of the same class maintain their intrinsic neighbor<br />
relations, whereas neighboring data points of different classes no longer stick to one another. However, neighboring data points<br />
of different classes are not deemphasized efficiently by LDE, which may degrade classification performance. In this<br />
paper, we investigate an extension, called class mean embedding (CME), which uses the class means of the data points to enhance the<br />
discriminant power of the mapping into a low-dimensional space. Experimental results on the ORL and FERET face databases<br />
show the effectiveness of the proposed method.<br />
13:30-16:30, Paper ThBCT8.6<br />
Forest Species Recognition using Color-Based Features<br />
Paula, Pedro Luiz, UFPR<br />
Oliveira, Luiz, Federal Univ. of Parana<br />
Britto, Alceu, Pontificia Univ. Católica do Paraná<br />
Sabourin, R., École de Tech. supérieure<br />
In this work we address the problem of forest species recognition which is a very challenging task and has several potential<br />
applications in the wood industry. The first contribution of this work is a database composed of 22 different species of the<br />
Brazilian flora that has been carefully labeled by experts in wood anatomy. In addition, we demonstrate through<br />
a series of comprehensive experiments that color-based features are quite useful to increase the discrimination power for<br />
this kind of application. Last but not least, we propose a segmentation approach so that the wood can be processed locally<br />
to mitigate the intra-class variability featured in some classes. Such an approach also contributes significantly to improving<br />
the final classification performance.<br />
13:30-16:30, Paper ThBCT8.7<br />
An Information Theoretic Linear Discriminant Analysis Method<br />
Zhang, Haihong, Inst. for Infocomm Res.<br />
Guan, Cuntai, Inst. for Infocomm Res.<br />
We propose a novel linear discriminant analysis method and demonstrate its superiority over existing linear methods.<br />
Based on information theory, we introduce a non-parametric estimate of mutual information with variable kernel bandwidth.<br />
Furthermore, we derive a gradient-based optimization algorithm for learning the optimal linear reduction vectors which<br />
maximizes the mutual information estimate. We evaluate the proposed method by running cross-validation on 2 data sets<br />
from the UCI repository, together with linear and nonlinear SVMs as classifiers. The results attest to the superiority of the<br />
method over conventional LDA and its variant, aPAC.<br />
13:30-16:30, Paper ThBCT8.8<br />
Framewise Phone Classification using Weighted Fuzzy Classification Rules<br />
Dehzangi, Omid, Nanyang Tech. Univ.<br />
Ma, Bin, Inst. for Infocomm Res.<br />
Chng, Eng Siong, Nanyang Tech. Univ.<br />
Li, Haizhou, Inst. for Infocomm Res.<br />
Our aim in this paper is to propose a rule-weight learning algorithm in fuzzy rule-based classifiers. The proposed algorithm<br />
is presented in two modes: first, all training examples are assumed to be equally important and the algorithm attempts to<br />
minimize the error-rate of the classifier on the training data by adjusting the weight of each fuzzy rule in the rule-base,<br />
and second, a weight is assigned to each training example as the cost of misclassifying it, using the class distribution<br />
of its neighbors. Then, instead of minimizing the error-rate, the learning algorithm is modified to minimize the sum of<br />
costs for misclassified examples. Using six data sets from the UCI-ML repository and the TIMIT speech corpus for framewise<br />
phone classification, we show that our proposed algorithm considerably improves the prediction ability of the classifier.<br />
13:30-16:30, Paper ThBCT8.9<br />
Statistical Fourier Descriptors for Defect Image Classification<br />
Timm, Fabian, Univ. of Lübeck<br />
Martinetz, Thomas, Univ. of Lübeck<br />
In many industrial applications, Fourier descriptors are commonly used when the description of the object shape is an<br />
important characteristic of the image. However, these descriptors are limited to single objects. We propose a general<br />
Fourier-based approach, called statistical Fourier descriptor (SFD), which computes shape statistics in grey level images. The SFD<br />
is computationally efficient and can be used for defect image classification. In a first example, we deployed the SFD to<br />
the inspection of welding seams with promising results.<br />
13:30-16:30, Paper ThBCT8.10<br />
A Measure of Competence based on Randomized Reference Classifier for Dynamic Ensemble Selection<br />
Woloszynski, Tomasz, Wroclaw Univ. of Tech.<br />
Kurzynski, Marek, Wroclaw Univ. of Tech.<br />
This paper presents a measure of competence based on a randomized reference classifier (RRC) for classifier ensembles.<br />
The RRC can be used to model, in terms of class supports, any classifier in the ensemble. The competence of a modelled<br />
classifier is calculated as the probability of correct classification of the respective RRC. A multiple classifier system (MCS)<br />
was developed and its performance was compared against five MCSs using eight databases taken from the UCI Machine<br />
Learning Repository. The system developed achieved the highest overall classification accuracies for both homogeneous<br />
and heterogeneous ensembles.<br />
13:30-16:30, Paper ThBCT8.12<br />
Information Theory based WCE Video Summarization<br />
Granata, Eliana, Univ. of Catania<br />
Gallo, Giovanni, Univ. of Catania<br />
Torrisi, Alessandro, Univ. of Catania<br />
Wireless Capsule Endoscopy (WCE) is a technical breakthrough that allows a video of the entire intestine to be produced<br />
without surgery. It is reported that a medical clinician spends one to two hours assessing a WCE video. It is hence useful<br />
to help the physician in the diagnostic analysis using computerized methods. In this paper an algorithmic information-theoretic<br />
method is presented for the automatic summarization of meaningful changes in video sequences extracted from<br />
WCE videos. To segment a WCE video into anatomic parts (esophagus, stomach, small intestine, colon) we use a textons-based<br />
method. The local texton histogram sequence is used for image representation and the Normalized Compression<br />
Distance (NCD) measure is used to compute the similarity between images.<br />
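The NCD itself is simple to compute with any off-the-shelf compressor. The byte strings below are invented stand-ins for serialized texton-histogram sequences:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar strings,
    near 1 for unrelated ones."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical serialized texton-histogram sequences for three frames.
frame_a = b"texton:3,1,4,1,5,9,2,6;" * 40
frame_b = b"texton:3,1,4,1,5,9,2,6;" * 38 + b"texton:2,7,1,8,2,8,1,8;" * 2
frame_c = bytes(range(256)) * 4          # unrelated content

similar = ncd(frame_a, frame_b)
different = ncd(frame_a, frame_c)
```

Because `frame_b` shares most of its content with `frame_a`, compressing them together costs little extra, so their NCD is much smaller than the distance to the unrelated `frame_c`; thresholding such distances is what flags "meaningful changes" between frames.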
13:30-16:30, Paper ThBCT8.13<br />
An LDA-Based Relative Hysteresis Classifier with Application to Segmentation of Retinal Vessels<br />
Condurache, Alexandru Paul, Univ. of Luebeck<br />
Müller, Florian, Univ. of Luebeck<br />
Mertins, Alfred, Univ. of Luebeck<br />
In a pattern classification setup, image segmentation is achieved by assigning each pixel to one of two classes: object or<br />
background. The special case of vessel segmentation is characterized by a strong disproportion between the number of<br />
representatives of each class (i.e. class skew) and also by a strong overlap between classes. These difficulties can be solved<br />
using problem-specific knowledge. The proposed hysteresis classification makes use of such knowledge in an efficient<br />
way. We describe a novel, supervised, hysteresis-based classification method that we apply to the segmentation of retinal<br />
photographs. This procedure is fast and achieves results that are comparable or even superior to those of other hysteresis methods<br />
and, for the problem of retinal vessel segmentation, to known dedicated methods on similar data sets.<br />
13:30-16:30, Paper ThBCT8.14<br />
An Offline Map Matching via Integer Programming<br />
Yanagisawa, Hiroki, IBM<br />
The map matching problem is, given a spatial road network and a sequence of locations of an object moving on the network,<br />
to identify the path in the network that the moving object passed through. In this paper, an integer programming formulation<br />
for the offline map matching problem is presented. This is the first approach that gives the optimal solution with respect<br />
to a widely used objective function for map matching.<br />
13:30-16:30, Paper ThBCT8.15<br />
Invisible Calibration Pattern based on Human Visual Perception Characteristics<br />
Takimoto, Hironori, Okayama Prefectural Univ.<br />
Yoshimori, Seiki, Nippon Bunri Univ.<br />
Mitsukura, Yasue, Tokyo Univ. of Agriculture and Tech.<br />
Fukumi, Minoru, The Univ. of Tokushima<br />
In print-type steganographic and watermarking systems, a calibration pattern is arranged around the content where invisible<br />
data is embedded; it provides feature points that establish the correspondence between the original image and the scanned image, for normalization<br />
of the scanned image. However, conventional methods clearly interfere with the page layout and artwork of the<br />
content. In addition, visible calibration patterns are not suitable for security services. In this paper, we propose an arrangement<br />
and detection method for an invisible calibration pattern based on characteristics of human visual perception. The<br />
calibration pattern is embedded into the blue channel of the original image by adding a high-frequency component.<br />
13:30-16:30, Paper ThBCT8.16<br />
Boosting Gray Codes for Red Eyes Removal<br />
Battiato, Sebastiano, Univ. of Catania<br />
Farinella, Giovanni Maria, Univ. of Catania<br />
Guarnera, Mirko, ST Microelectronics<br />
Messina, Giuseppe, ST Microelectronics<br />
Ravì, Daniele, ST Microelectronics<br />
With the wide diffusion of digital cameras and mobile devices with embedded cameras and flashguns, red-eye artifacts<br />
have de facto become a critical problem. The technique described herein makes use of three main steps to identify and remove<br />
red eyes. First, red-eye candidates are extracted from the input image using an image filtering pipeline. A set of<br />
classifiers is then learned on gray code features extracted in the clustered patch space, and hence employed to distinguish<br />
between eye and non-eye patches. Once red eyes are detected, the artifacts are removed through desaturation and brightness<br />
reduction. The proposed method has been tested on a large dataset of images, achieving effective results in terms of hit rate<br />
maximization, false positive reduction and quality measures.<br />
13:30-16:30, Paper ThBCT8.17<br />
A New Rotation Feature for Single Tri-Axial Accelerometer based 3D Spatial Handwritten Digit Recognition<br />
Xue, Yang, South China Univ. of Tech.<br />
Jin, Lianwen, South China Univ. of Tech.<br />
A new rotation feature extracted from tri-axial acceleration signals for 3D spatial handwritten digit recognition is proposed.<br />
The feature can effectively express the clockwise and anti-clockwise direction changes of the users’ movement while writing<br />
in 3D space. Based on the rotation feature, an algorithm for 3D spatial handwritten digit recognition is presented.<br />
First, the rotation feature of the handwritten digit is extracted and coded. Then, the normalized edit distance between the<br />
digit and the class model is computed. Finally, classification is performed using a Support Vector Machine (SVM). The proposed<br />
approach outperforms time-domain features by 22.12%, peak-valley features by 12.03%, and FFT features by 3.24%<br />
in accuracy. Experimental results show that the proposed approach is effective.<br />
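The normalized edit distance between a coded rotation sequence and a class model can be sketched with the classic dynamic-programming Levenshtein distance. The direction-change codes below are hypothetical, and the normalization by the longer length is one common convention (the abstract does not specify which is used):

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming (two rolling rows)."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitution
        prev = cur
    return prev[n]

def normalized_edit_distance(s, t):
    """Edit distance divided by the longer sequence length."""
    return edit_distance(s, t) / max(len(s), len(t), 1)

# Hypothetical clockwise (R) / anti-clockwise (L) direction-change codes.
digit_sample = "RRLRRLLR"
class_model  = "RRLRLLLR"
```

Here the sample differs from the model by a single substitution, giving a normalized distance of 1/8; such distances would then feed the SVM classifier.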
13:30-16:30, Paper ThBCT8.18<br />
Improved Mean Shift Algorithm with Heterogeneous Node Weights<br />
Yoon, Ji Won, Trinity Coll. Dublin<br />
Wilson, Simon, Trinity Coll. Dublin<br />
The conventional mean shift algorithm is known to be sensitive to the choice of bandwidth. We present a robust mean<br />
shift algorithm with heterogeneous node weights that come from the geometric structure of a given data set. Before running<br />
the MS procedure, we reconstruct un-normalized weights (a rough surface over the data points) from the Delaunay triangulation.<br />
The un-normalized weights help MS avoid the problem of misled mean shift vectors. As a result, we<br />
obtain a more robust clustering result compared to the conventional mean shift algorithm. We also propose an alternative<br />
way to assign weights for large and noisy datasets.<br />
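Mean shift with per-point weights amounts to a small change in the update rule: each kernel value is multiplied by the point's weight before averaging. The sketch below uses a Gaussian kernel and uniform weights on toy data; the paper's Delaunay-based weight construction is not reproduced, so the weights are simply given as input:

```python
import numpy as np

def weighted_mean_shift(X, weights, bandwidth, n_iter=50):
    """Mean shift where each data point carries its own weight: every mode
    moves to the weight- and kernel-weighted mean of the data."""
    modes = X.astype(float).copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            k = np.exp(-np.sum((X - m) ** 2, axis=1) / (2 * bandwidth ** 2))
            wk = k * weights                        # heterogeneous node weights
            modes[i] = (wk[:, None] * X).sum(axis=0) / wk.sum()
    return modes

X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]])
w = np.ones(len(X))                                 # uniform weights for the demo
modes = weighted_mean_shift(X, w, bandwidth=0.5)
```

Points started in the same group converge to a common mode while the two groups stay apart; down-weighting noisy points (as the Delaunay-derived weights would) damps their pull on the mean shift vectors.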
13:30-16:30, Paper ThBCT8.19<br />
Word Clustering using PLSA Enhanced with Long Distance Bigrams<br />
Bassiou, Nikoletta, Aristotle Univ. of Thessaloniki<br />
Kotropoulos, Constantine, Aristotle Univ. of Thessaloniki<br />
Probabilistic latent semantic analysis is enhanced with long distance bigram models in order to improve word clustering.<br />
The long distance bigram probabilities and the interpolated long distance bigram probabilities at varying distances within<br />
a context capture different aspects of contextual information. In addition, the baseline bigram, which incorporates trigger-pairs<br />
for various histories, is tested in the same framework. The experimental results collected on publicly available corpora<br />
(CISI, Cranfield, Medline, and NPL) demonstrate the superiority of the long distance bigrams over the baseline bigrams,<br />
as well as the superiority of the interpolated long distance bigrams over the long distance bigrams and the baseline<br />
bigram with trigger-pairs, in yielding more compact clusters containing fewer outliers.<br />
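A long distance bigram at distance d simply pairs each word with the word d positions ahead; d = 1 recovers the ordinary bigram model. A minimal counting sketch (the toy sentence is invented, and the PLSA machinery built on top of such counts is not shown):

```python
from collections import Counter

def long_distance_bigrams(tokens, distance):
    """Count pairs (w_i, w_{i+d}) for a fixed distance d over a token list."""
    return Counter(zip(tokens, tokens[distance:]))

text = "the cat sat on the mat the cat lay on the rug".split()
b1 = long_distance_bigrams(text, 1)   # ordinary bigrams
b3 = long_distance_bigrams(text, 3)   # distance-3 bigrams
```

Counts at different distances capture different contextual regularities, which is what the interpolated model combines.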
13:30-16:30, Paper ThBCT8.20<br />
Scene Classification using Local Co-Occurrence Feature in Subspace Obtained by KPCA of Local Blob Visual<br />
Words<br />
Hotta, Kazuhiro, Meijo Univ.<br />
In recent years, scene classification based on the local correlation of binarized projection lengths in a subspace obtained by<br />
Kernel Principal Component Analysis (KPCA) of visual words was proposed and its effectiveness was shown. However,<br />
the local correlation of two binary features is 1 only when both features are 1; in all other cases it is 0,<br />
which discards information. In this paper, all kinds of co-occurrence of two binary features are used. This is the first device<br />
of our method. The second device is local Blob visual words. The conventional method builds visual words from an orientation<br />
histogram on each grid cell, but this information is too local. We use the orientation histograms in a local Blob on a grid as the<br />
basic feature and develop local Blob visual words. The third device is norm normalization of each orientation histogram<br />
in a local Blob. By normalizing the local norm, the similarity between corresponding orientation histograms is reflected in<br />
the subspace obtained by KPCA. With these three devices, an accuracy of more than 84% is achieved, which is higher than that of conventional methods.<br />
13:30-16:30, Paper ThBCT8.21<br />
Recognition and Prediction of Situations in Urban Traffic Scenarios<br />
Käfer, Eugen, Daimler AG<br />
Hermes, Christoph, Bielefeld Univ.<br />
Wöhler, Christian, Dortmund University of Technology<br />
Kummert, Franz, Bielefeld Univ.<br />
Ritter, Helge, Bielefeld Univ.<br />
The recognition and prediction of intersection situations and an accompanying threat assessment are an indispensable skill<br />
of future driver assistance systems. This study focuses on the recognition of situations involving two vehicles at intersections.<br />
For each vehicle, a set of possible future motion trajectories is estimated and rated based on a motion database for<br />
a time interval of 2-4 s ahead. Possible situations involving two vehicles are generated by a pairwise combination of these<br />
individual motion trajectories. An interaction model based on the mutual visibility of the vehicles and the assumption that<br />
a driver will attempt to avoid a collision is used to rate possible situations. The correspondingly favoured situations are<br />
classified with a probabilistic framework. The proposed method is evaluated on a real-world differential GPS data set acquired<br />
during a test drive of about 10 km, including three road intersections. Our method is typically able to recognise the<br />
situation correctly about 1.5-3 s before the last vehicle has passed its minimum distance to the centre of the intersection.<br />
13:30-16:30, Paper ThBCT8.22<br />
Employing Decoding of Specific Error Correcting Codes as a New Classification Criterion in Multiclass Learning<br />
Problems<br />
Luo, Yurong, Virginia Commonwealth Univ.<br />
Najarian, Kayvan, Virginia Commonwealth Univ.<br />
The Error Correcting Output Codes (ECOC) method solves multiclass learning problems by combining the outputs of several<br />
binary classifiers according to an error correcting output code matrix. Traditionally, the minimum Hamming distance is<br />
adopted as the classification criterion to “vote” among multiple hypotheses, and the focus is given to the choice of the error<br />
correcting output code matrix. In this paper, we apply a decoding methodology to multiclass learning problems in which<br />
the class labels of testing samples are unknown. In other words, without comparing the predicted and actual class labels, it<br />
can be known whether testing samples are classified correctly. Based on this property, a new cascade classifier is introduced.<br />
The classifier can improve the accuracy and will not result in overfitting. The analytical results show the feasibility, accuracy,<br />
and advantages of the proposed method.<br />
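The minimum-Hamming-distance decoding that the paper takes as its traditional baseline can be sketched as follows (the code matrix here is an illustrative toy, not the one used by the authors):<br />

```python
def hamming(a, b):
    """Number of positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, outputs):
    """Classic ECOC decoding: pick the class whose codeword is nearest,
    in Hamming distance, to the dichotomizers' binary outputs."""
    distances = [hamming(row, outputs) for row in code_matrix]
    return min(range(len(code_matrix)), key=distances.__getitem__)

# 3 classes encoded by 4 binary classifiers (toy matrix)
M = [(0, 0, 1, 1),
     (0, 1, 0, 1),
     (1, 1, 1, 0)]
print(ecoc_decode(M, (1, 1, 1, 1)))  # -> 2, the row at Hamming distance 1
```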
13:30-16:30, Paper ThBCT8.23<br />
EEG-Based Emotion Recognition using Self-Organizing Map for Boundary Detection<br />
Khosrowabadi, Reza, Nanyang Tech. Univ. Singapore<br />
Ang, Kai Keng, Inst. for Infocomm Res. A*STAR<br />
Quek, Hiok Chai, Nanyang Tech. Univ.<br />
Bin Abdul Rahman, Abdul Wahab, International Islamic Univ. Malaysia<br />
This paper presents an EEG-based emotion recognition system using a self-organizing map (SOM) for boundary detection. Features<br />
from EEG signals are classified by considering the subjects' emotional responses using scores from the SAM questionnaire.<br />
The selection of appropriate threshold levels for arousal and valence is critical to the performance of the recognition system.<br />
Therefore, this paper investigates the performance of a proposed EEG-based emotion recognition system that employs a self-organizing<br />
map to identify the boundaries between separable regions. A study was performed to collect 8 channels of EEG<br />
data from 26 healthy right-handed subjects experiencing 4 emotional states while exposed to audio-visual emotional<br />
stimuli. EEG features were extracted using the magnitude squared coherence of the EEG signals. The boundaries of the EEG<br />
features were then extracted using the SOM, and 5-fold cross-validation was performed using a k-NN classifier. The results<br />
showed that the proposed method improved the accuracy to 84.5%.<br />
13:30-16:30, Paper ThBCT8.24<br />
Vocabulary-Based Approaches for Multiple-Instance Data: A Comparative Study<br />
Amores, Jaume, Univ. Autònoma de Barcelona<br />
Multiple Instance Learning (MIL) has become a hot topic, and many different algorithms have been proposed in recent<br />
years. Despite this fact, there is a lack of comparative studies that shed light on the characteristics of the different methods<br />
and their behavior in different scenarios. In this paper we provide such an analysis. We include methods from different families,<br />
and pay special attention to vocabulary-based approaches, a new family of methods that has not received much attention<br />
in the MIL literature. The empirical comparison includes seven databases from four heterogeneous domains, implementations<br />
of eight popular MIL methods, and a study of the behavior under synthetic conditions. Based on this analysis, we show that,<br />
with an appropriate implementation, vocabulary-based approaches outperform other MIL methods in most cases,<br />
showing in general a more consistent performance.<br />
13:30-16:30, Paper ThBCT8.25<br />
A Multiple Classifier System Approach for Facial Expressions in Image Sequences Utilizing GMM Supervectors<br />
Schels, Martin, Univ. of Ulm<br />
Schwenker, Friedhelm, Univ. of Ulm<br />
The Gaussian mixture model (GMM) supervector approach is a well-known technique in the domain of speech processing,<br />
e.g. speaker verification and audio segmentation. In this paper we apply this approach to video data in order to recognize<br />
human facial expressions. Three different image feature types (optical flow histograms, orientation histograms and principal<br />
components) from four pre-selected regions of the human face image were extracted, and GMM supervectors of the feature<br />
channels per sequence were constructed. Support vector machines (SVMs) were trained using these supervectors for every<br />
channel separately, and their results were combined using classifier fusion techniques. Thus, the performance of the classifier<br />
could be improved compared to the best individual classifier.<br />
13:30-16:30, Paper ThBCT8.26<br />
Incremental Learning of Visual Landmarks for Mobile Robotics<br />
Bandera, Antonio, Univ. of Malaga<br />
Vázquez-Martín, Ricardo, Centro Andaluz de Innovación y Tecnologías de la Información y las Comunicaciones CITIC<br />
Marfil, Rebeca, Univ. of Malaga<br />
This paper proposes an incremental scheme for visual landmark learning and recognition. The feature selection stage characterises<br />
the landmark using Opponent SIFT, a color-based variant of the SIFT descriptor. To reduce the dimensionality<br />
of this descriptor, an incremental non-parametric discriminant analysis is conducted to seek directions for efficient discrimination<br />
(incremental eigenspace learning). The classification stage, in turn, uses the incremental evolving clustering<br />
method (ECM) to group feature vectors into a set of clusters (incremental prototype learning). The final classification<br />
is then conducted using the k-nearest neighbor approach, whose prototypes are updated by the ECM. This global scheme<br />
enables a classifier to learn incrementally, on-line, and in one pass. Moreover, the ECM reduces memory and computation<br />
expenses. Experimental results show that the proposed recognition system is well suited for use by an autonomous<br />
mobile robot.<br />
13:30-16:30, Paper ThBCT8.27<br />
Subspace Methods with Globally/Locally Weighted Correlation Matrix<br />
Yamashita, Yukihiko, Tokyo Inst. of Tech.<br />
Wakahara, Toru, Hosei Univ.<br />
The discriminant function of a subspace method is provided by using correlation matrices that reflect the averaged feature<br />
of a category. As a result, it will not work well on unknown input patterns that are far from the average. To address this problem,<br />
we propose two kinds of weighted correlation matrices for subspace methods. The globally weighted correlation matrix<br />
(GWCM) attaches importance to training patterns that are far from the average. Then, it can reflect the distribution of patterns<br />
around the category boundary more precisely. The computational cost of a subspace method using GWCMs is almost the<br />
same as that using ordinary correlation matrices. The locally weighted correlation matrix (LWCM) attaches importance to<br />
training patterns that are near to the input pattern to be classified. Then, it can reflect the distribution of training patterns<br />
around the input pattern in more detail. The computational cost of a subspace method with LWCM at the recognition stage<br />
does not depend on the number of training patterns, while those of the conventional adaptive local and the nonlinear subspace<br />
methods do. We show the advantages of the proposed methods by experiments made on the MNIST database of handwritten<br />
digits.<br />
13:30-16:30, Paper ThBCT8.28<br />
The Binormal Assumption on Precision-Recall Curves<br />
Brodersen, Kay Henning, ETH Zurich<br />
Ong, Cheng Soon, ETH Zurich<br />
Stephan, Klaas Enno, Univ. of Zurich<br />
Buhmann, Joachim M., Swiss Federal Inst. of Tech. Zurich<br />
The precision-recall curve (PRC) has become a widespread conceptual basis for assessing classification performance. The curve<br />
relates the positive predictive value of a classifier to its true positive rate and often provides a useful alternative to the well-known<br />
receiver operating characteristic (ROC). The empirical PRC, however, turns out to be a highly imprecise estimate of the true curve,<br />
especially in the case of a small sample size and class imbalance in favour of negative examples. Ironically, this situation tends to<br />
occur precisely in those applications where the curve would be most useful, e.g., in anomaly detection or information retrieval. Here,<br />
we propose to estimate the PRC on the basis of a simple distributional assumption about the decision values that generalizes the established<br />
binormal model for estimating smooth ROC curves. Using simulations, we show that our approach outperforms empirical<br />
estimates, and that an account of the class imbalance is crucial for obtaining unbiased PRC estimates.<br />
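A minimal sketch of the binormal idea, assuming equal-variance Gaussian decision values in each class (a simplification of the general model): precision and recall at a threshold follow directly from the two normal CDFs and the class prevalence, which makes the effect of class imbalance explicit.<br />

```python
from math import erf, sqrt

def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binormal_prc(mu_pos, mu_neg, sigma, prevalence, threshold):
    """Precision and recall at one threshold under a binormal model
    with equal class variances (an illustrative simplification)."""
    recall = 1.0 - _phi((threshold - mu_pos) / sigma)  # P(score > t | positive)
    fpr = 1.0 - _phi((threshold - mu_neg) / sigma)     # P(score > t | negative)
    tp = prevalence * recall
    fp = (1.0 - prevalence) * fpr
    precision = tp / (tp + fp)
    return precision, recall

# strong class imbalance: only 10% positives
p, r = binormal_prc(mu_pos=1.0, mu_neg=0.0, sigma=1.0, prevalence=0.1, threshold=0.5)
print(round(p, 3), round(r, 3))  # precision stays low despite a decent recall
```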
13:30-16:30, Paper ThBCT8.29<br />
Incremental Training of Multiclass Support Vector Machines<br />
Nikitidis, Symeon, Centre for Res. and Tech. Hellas<br />
Nikolaidis, Nikos, Aristotle Univ. of Thessaloniki<br />
Pitas, Ioannis, -<br />
We present a new method for the incremental training of multiclass Support Vector Machines that provides computational efficiency<br />
for training problems in which the training data collection is sequentially enriched and dynamic adaptation of the classifier<br />
is required. An auxiliary function with the desired characteristics is designed to provide an upper bound on the objective<br />
function that summarizes the multiclass classification task, and the global minimizer for the enriched dataset is<br />
found using a warm-start algorithm, since faster convergence is expected when starting from the previous global minimum. Experimental<br />
evidence on two data collections verified that our method is faster than retraining the classifier from scratch, while the<br />
achieved classification accuracy is maintained at the same level.<br />
13:30-16:30, Paper ThBCT8.30<br />
User Adaptive Clustering of a Large Image Database<br />
Saboorian, Mohammad Mehdi, Sharif Univ. of Tech.<br />
Jamzad, Mansour, Sharif Univ. of Tech.<br />
Rabiee, Hamid Reza, Sharif Univ. of Tech.<br />
Searching large image databases is a time-consuming process when done manually. Current CBIR methods mostly rely<br />
on training data in specific domains. When the source and domain of the images are unknown, unsupervised methods provide<br />
better solutions. In this work, we use a hierarchical clustering scheme to group images in an unknown and large image<br />
database. In addition, the user provides the current class assignment of a small number of images as feedback to<br />
the system. The proposed method uses this feedback to guess the number of required clusters, and optimizes the weight<br />
vector in an iterative manner. In each step, after modification of the weight vector, the images are reclustered. We compared<br />
our method with a similar approach (but without user feedback) named CLUE. Our experimental results show that by<br />
considering the user feedback, the accuracy of clustering is considerably improved.<br />
13:30-16:30, Paper ThBCT8.31<br />
Alignment-Based Similarity of People Trajectories using Semi-Directional Statistics<br />
Calderara, Simone, Univ. of Modena and Reggio Emilia<br />
Prati, Andrea, Univ. of Modena and Reggio Emilia<br />
Cucchiara, Rita, Univ. of Modena and Reggio Emilia<br />
This paper presents a method for comparing people trajectories for video surveillance applications, based on semi-directional<br />
statistics. Indeed, modelling a trajectory as a sequence of angles, speeds and time lags requires a<br />
statistical tool capable of jointly considering periodic and linear variables. Our statistical method is compared with two state-of-the-art<br />
methods.<br />
13:30-16:30, Paper ThBCT8.32<br />
Contact Lens Detection based on Weighted LBP<br />
Zhang, Hui, Shanghai Inst. of Tech.<br />
Sun, Zhenan, Chinese Acad. of Sciences<br />
Tan, Tieniu, Chinese Acad. of Sciences<br />
Spoof detection is a critical function for iris recognition because it reduces the risk of iris recognition systems being forged.<br />
Among the various counterfeit artifacts, cosmetic contact lenses are among the most common and the most difficult to detect. In this paper,<br />
we propose a novel fake iris detection algorithm based on improved LBP and statistical features. Firstly, a simplified<br />
SIFT descriptor is extracted at each pixel of the image. Secondly, the SIFT descriptor is used to rank the LBP encoding<br />
sequence. Then, statistical features are extracted from the weighted LBP map. Lastly, an SVM classifier is employed to<br />
classify genuine and counterfeit iris images. Extensive experiments are conducted on a database containing more than<br />
5000 fake iris images covering 70 kinds of contact lenses, captured by four iris devices. Experimental results show<br />
that the proposed method achieves state-of-the-art performance in contact lens spoof detection.<br />
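For reference, the basic (unweighted) LBP operator that such weighted variants build on can be sketched as follows; this is a toy illustration, not the authors' improved encoding:<br />

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch: threshold the 8 neighbours at the
    centre value and read them clockwise as a binary word."""
    c = patch[1][1]
    # neighbours read clockwise starting from the top-left corner
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(neighbours):
        if v >= c:           # neighbour at least as bright as the centre
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241
```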
13:30-16:30, Paper ThBCT8.33<br />
Integrating ILSR to Bag-of-Visual Words Model based on Sparse Codes of SIFT Features Representations<br />
Wu, Lina, Univ. Beijing<br />
Luo, Siwei, Univ. Beijing<br />
Sun, Wei, Beijing Jiaotong Univ.<br />
Zheng, Xiang, Beijing Jiaotong Univ.<br />
In computer vision, the bag-of-visual-words (BOV) approach has been shown to yield state-of-the-art results. To improve<br />
the BOV model, we use sparse codes of SIFT features instead of the previously used vector quantization (VQ), such as k-means, since<br />
VQ incurs larger quantization errors. Moreover, as local features in most categories exhibit spatial dependence in the real world, we use<br />
the neighboring features of a local feature as its implicit local spatial relationship (ILSR). This paper proposes an object categorization<br />
algorithm which integrates the implicit local spatial relationship with appearance features based on sparse codes<br />
of SIFT, forming two sources of information for categorization. The algorithm is applied to the Caltech-101 and Caltech-256<br />
datasets to validate its effectiveness. The experimental results show its good performance.<br />
13:30-16:30, Paper ThBCT8.34<br />
Heteroscedastic Multilinear Discriminant Analysis for Face Recognition<br />
Safayani, Mehran, Sharif Univ. of Tech.<br />
Manzuri Shalmani, Mohammad Taghi, Sharif Univ. of Tech.<br />
There is growing attention to subspace learning using tensor-based approaches in high-dimensional spaces. In this paper<br />
we first show that these methods suffer from the heteroscedastic problem and then propose a new approach called Heteroscedastic<br />
Multilinear Discriminant Analysis (HMDA). Our method solves this problem by utilizing the pairwise<br />
Chernoff distance between every pair of clusters with the same index in different classes. We also show that our method is<br />
a general form of the Multilinear Discriminant Analysis (MDA) approach. Experimental results on the CMU-PIE, AR and AT&T<br />
face databases demonstrate that the proposed method always performs better than MDA in terms of classification accuracy.<br />
13:30-16:30, Paper ThBCT8.35<br />
Applying Error Correcting Output Coding to Enhance the Convolutional Neural Network for Target Detection and<br />
Pattern Recognition<br />
Deng, Huiqun, Concordia Univ.<br />
Stathopoulos, George, Concordia Univ.<br />
Suen, Ching Y.<br />
This paper views target detection and pattern recognition as a kind of communications problem and applies error-correcting<br />
coding to the outputs of a convolutional neural network to improve the accuracy and reliability of detection and recognition<br />
of targets. The outputs of the convolutional neural network are designed according to codewords with maximum Hamming<br />
distances. The effects of the codewords on the performance of the convolutional neural network in target detection and<br />
recognition are then investigated. Images of hand-written digits and printed English letters and symbols are used in the<br />
experiments. Results show that error-correcting output coding provides the neural network with more reliable decision<br />
rules and enables it to perform more accurate and reliable detection and recognition of targets. Moreover, our error-correcting<br />
output coding can reduce the number of neurons required, which is highly desirable in efficient implementations.<br />
13:30-16:30, Paper ThBCT8.36<br />
Action Recognition using Direction Models of Motion<br />
Benabbas, Yassine, LIFL<br />
Lablack, Adel, UMR USTL/CNRS 8022<br />
Ihaddadene, Nacim, UMR USTL/CNRS 8022<br />
Djeraba, Chabane, UMR USTL/CNRS 8022<br />
In this paper, we present an effective method for human action recognition using statistical models based on optical flow<br />
orientations. We compute a distribution mixture over motion orientations at each spatial location of the video sequence.<br />
The set of estimated distributions constitutes the direction model, which is used as a mid-level feature for the video sequence.<br />
We recognize human actions using a distance metric to compare the direction model of a query sequence with the<br />
direction models of training sequences. Experiments have been performed on standard datasets and have shown<br />
promising results.<br />
13:30-16:30, Paper ThBCT8.37<br />
Boolean Combination of Classifiers in the ROC Space<br />
Khreich, Wael, École de Tech. Supérieure<br />
Granger, Eric, École de Tech. Supérieure<br />
Miri, Ali, Univ. of Ottawa<br />
Sabourin, R., École de Tech. Supérieure<br />
Using Boolean AND and OR functions to combine the responses of multiple one- or two-class classifiers in the ROC<br />
space may significantly improve the performance of a detection system over that of the single best classifier. However, techniques<br />
found in the literature assume that the classifiers are conditionally independent and that their ROC curves are convex. These<br />
assumptions are not valid in most real-world applications, where classifiers are designed using limited and imbalanced<br />
training data. A new Iterative Boolean Combination (IBC) technique applies all Boolean functions to combine the ROC<br />
curves produced by multiple classifiers without prior assumptions, and its time complexity is linear in the<br />
number of classifiers. The results of computer simulations conducted on synthetic and real-world host-based intrusion<br />
detection data indicate that combining the responses from multiple HMMs with IBC can achieve a significantly higher level<br />
of performance than the AND and OR combinations, especially when training data is limited and imbalanced.<br />
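The basic effect of AND/OR fusion in the ROC space can be seen in a small sketch (toy responses, not the IBC algorithm itself): AND lowers the false positive rate, while OR raises the true positive rate.<br />

```python
def rates(decisions, labels):
    """True and false positive rates of a vector of binary decisions."""
    tp = sum(d and y for d, y in zip(decisions, labels))
    fp = sum(d and not y for d, y in zip(decisions, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

labels = [1, 1, 1, 0, 0, 0, 0, 0]
c1 =     [1, 1, 0, 1, 0, 0, 0, 0]   # responses of classifier 1
c2 =     [1, 0, 1, 0, 1, 0, 0, 0]   # responses of classifier 2

c_and = [a and b for a, b in zip(c1, c2)]
c_or  = [a or b for a, b in zip(c1, c2)]
print(rates(c1, labels), rates(c2, labels))
print(rates(c_and, labels))  # fewer false alarms than either classifier
print(rates(c_or, labels))   # higher detection rate than either classifier
```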
13:30-16:30, Paper ThBCT8.38<br />
Stereo-Based Multi-Person Tracking using Overlapping Silhouette Templates<br />
Satake, Junji, Toyohashi Univ. of Tech.<br />
Miura, Jun, Toyohashi Univ. of Tech.<br />
This paper describes a stereo-based person tracking method for a person following robot. Many previous works on person<br />
tracking use laser range finders which can provide very accurate range measurements. Stereo-based systems have also<br />
been popular, but most of them are not used for controlling a real robot. We previously developed a tracking method which<br />
uses depth templates of person shape applied to a dense depth image. The method, however, sometimes failed when complex<br />
occlusions occurred. In this paper, we propose an accurate, stable tracking method using overlapping silhouette templates<br />
which consider how persons overlap in the image. Experimental results show the effectiveness of the proposed<br />
method.<br />
13:30-16:30, Paper ThBCT8.40<br />
Characterising Facial Gender Difference using Fisher-Rao Metric<br />
Ceolin, Simone Regina, Univ. of York<br />
Hancock, Edwin, Univ. of York<br />
The aim of this paper is to explore whether the Fisher-Rao metric can be used to measure different facets of facial shape<br />
estimated from fields of surface normals using the von Mises-Fisher distribution. In particular we aim to characterise the<br />
shape changes due to differences in gender. We make use of the von Mises-Fisher distribution since we are dealing with<br />
surface normal data over the unit sphere S^2. Finally, we show the results achieved using the EAR and Max Planck datasets.<br />
13:30-16:30, Paper ThBCT8.41<br />
On-Line FMRI Data Classification using Linear and Ensemble Classifiers<br />
Plumpton, Catrin Oliver, Bangor Univ.<br />
Kuncheva, Ludmila I., Bangor Univ.<br />
Linden, David E. J., Bangor Univ.<br />
Johnston, Stephen Jaye, Bangor Univ.<br />
The advent of real-time fMRI pattern classification opens many avenues for interactive self-regulation where the brain’s<br />
response is better modelled by multivariate, rather than univariate techniques. Here we test three on-line linear classifiers,<br />
applied to a real fMRI dataset, collected as part of an experiment on the cortical response to emotional stimuli. We propose<br />
a random subspace ensemble as a fast and more accurate alternative to component classifiers. The on-line linear discriminant<br />
classifier (O-LDC) was found to be a better base classifier than the on-line versions of the perceptron and the balanced<br />
winnow.<br />
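The random subspace idea can be sketched generically as follows; for brevity this toy uses a nearest-centroid base learner as a stand-in for the authors' on-line LDC, so the data and parameter choices here are purely illustrative:<br />

```python
import random

def nearest_centroid_fit(X, y, features):
    """Per-class mean of the training rows, restricted to a feature subset."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return centroids

def nearest_centroid_predict(centroids, x, features):
    """Label of the closest centroid in the member's subspace."""
    sub = [x[f] for f in features]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, centroids[c])))

def random_subspace_ensemble(X, y, n_members=11, subspace=2, seed=0):
    """Train one base classifier per random feature subset."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        feats = rng.sample(range(len(X[0])), subspace)
        members.append((feats, nearest_centroid_fit(X, y, feats)))
    return members

def ensemble_predict(members, x):
    """Majority vote over the subspace members."""
    votes = [nearest_centroid_predict(c, x, f) for f, c in members]
    return max(set(votes), key=votes.count)

# toy 4-dimensional data with two well-separated classes
X = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
     [5, 5, 5, 5], [4, 5, 4, 5], [5, 4, 5, 4]]
y = [0, 0, 0, 1, 1, 1]
members = random_subspace_ensemble(X, y)
print(ensemble_predict(members, [5, 5, 4, 5]))  # -> 1
```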
13:30-16:30, Paper ThBCT8.42<br />
Adaptive Feature and Score Level Fusion Strategy using Genetic Algorithms<br />
Ben Soltana, Wael, Ec. Centrale de Lyon<br />
Ardabilian, Mohsen, Ec. Centrale de Lyon<br />
Chen, Liming, Ec. Centrale de Lyon<br />
Ben Amar, Chokri, Res. Group on Intelligent Machines<br />
Classifier fusion is considered one of the best strategies for improving the performance of general-purpose classification systems.<br />
On the other hand, the fusion strategy space strongly depends on the classifiers, features and data spaces. As the cardinality<br />
of this space is exponential, one needs to resort to a heuristic to find a sub-optimal fusion strategy. In this work, we present<br />
a new adaptive feature and score level fusion strategy (AFSFS) based on an adaptive genetic algorithm. AFSFS tunes itself between<br />
the feature and matching score levels, improving the final performance over the original performance at both levels. As a fusion<br />
method, it not only combines the most relevant features so as to achieve adequate and optimized<br />
results, but also has the ability to select the most discriminative features. Experiments on the FRGC<br />
database show that the proposed method produces significantly better results than the baseline fusion methods.<br />
13:30-16:30, Paper ThBCT8.43<br />
Local Binary Pattern-Based Features for Text Identification of Web Images<br />
Jung, Insook, Chonbuk National Univ.<br />
Oh, Il-Seok, Chonbuk National Univ.<br />
We present a method for robustly identifying text blocks in complex web images. The method is an MLP (multi-layer perceptron)<br />
classifier trained on LBP (local binary pattern), wavelet and shape feature spaces. In particular, we propose adaptive<br />
LBP masks which respond flexibly to various character sizes. Most previous works use a fixed mask size or<br />
multi-level scales via pyramid schemes, which may be weak in dealing with text of diverse sizes. Experiments carried<br />
out on 100 web images show promising results.<br />
13:30-16:30, Paper ThBCT8.44<br />
Classification of Polarimetric SAR Images using Evolutionary RBF Neural Networks<br />
Ince, Turker, Izmir Univ. of Ec.<br />
Kiranyaz, Serkan, Tampere Univ. of Tech.<br />
Gabbouj, Moncef, Tampere Univ. of Tech.<br />
This paper proposes an evolutionary RBF network classifier for polarimetric synthetic aperture radar (SAR) images. The<br />
proposed feature extraction process utilizes the full covariance matrix, gray level co-occurrence matrix (GLCM) based<br />
texture features, and the backscattering power (Span) combined with the H/α/A decomposition, which are projected<br />
onto a lower-dimensional feature space using principal component analysis. An experimental study is performed using<br />
the fully polarimetric San Francisco Bay data set acquired at L-band by the NASA/Jet Propulsion Laboratory Airborne<br />
SAR (AIRSAR) to evaluate the performance of the proposed classifier. Classification results (in terms of confusion matrix,<br />
overall accuracy and classification map) compared to the Wishart and recent NN-based classifiers demonstrate the effectiveness<br />
of the proposed algorithm.<br />
13:30-16:30, Paper ThBCT8.45<br />
On the Use of Median String for Multi-Source Translation<br />
González Rubio, Jesús, Univ. Pol. de Valencia<br />
Casacuberta, Francisco, Univ. Pol. de Valencia<br />
State-of-the-art approaches to multi-source translation involve a multimodal-like process which applies an individual<br />
translation system to each source language. Then, the translations of the individual systems are combined to obtain a consensus<br />
output. We propose to use the (generalised) median string as the consensus output of the individual translation systems.<br />
Different approximations to the median string are studied as well as different approaches to improve the median<br />
string performance when dealing with natural language strings. The proposed approaches were evaluated on the Europarl<br />
corpus, achieving significant improvements in translation quality.<br />
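A common first approximation to the generalised median is the set median: the hypothesis string minimising the sum of edit distances to all the others. A minimal sketch (the strings here are illustrative, not Europarl output):<br />

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def set_median(strings):
    """String of the set minimising the sum of edit distances to all
    others -- a standard starting point for the generalised median."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))

hyps = ["the house is green", "the house is grean", "a house is green"]
print(set_median(hyps))  # -> "the house is green"
```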
13:30-16:30, Paper ThBCT8.47<br />
A Lip Contour Extraction Method using Localized Active Contour Model with Automatic Parameter Selection<br />
Liu, Xin, Hong Kong Baptist Univ.<br />
Cheung, Yiu-Ming, Hong Kong Baptist Univ.<br />
Li, Meng, Hong Kong Baptist Univ.<br />
Liu, Hailin, Guangdong Univ. of Technology<br />
Lip contour extraction is crucial to the success of a lipreading system. This paper presents a lip contour extraction algorithm<br />
using a localized active contour model with automatic selection of the proper parameters. The proposed approach utilizes a<br />
minimum-bounding ellipse as the initial evolving curve to split the local neighborhoods into local interior and<br />
local exterior regions, and then computes the localized energy for evolution and extraction. The method is<br />
robust against uneven illumination, rotation, deformation, and the effects of teeth and tongue. Experiments show<br />
promising results in comparison with existing methods.<br />
13:30-16:30, Paper ThBCT8.48<br />
Multimodal Sleeping Posture Classification<br />
Huang, Weimin, I2R<br />
Phyo Wai, Aung Aung, Inst. for Infocomm Res.<br />
Foo, Siang Fook, Inst. for Infocomm Res.<br />
Biswas, Jit, Inst. for Infocomm Res.<br />
Liou, Kou Juch, Industrial Tech. Res. Inst.<br />
Hsia, C. C., ITRI<br />
Sleeping posture reveals important information for eldercare and patient care, especially for bedridden patients. Traditionally,<br />
works address the problem using either pressure sensors or video images. This paper presents a multimodal<br />
approach to sleeping posture classification. Features from the pressure sensor map and the video image are proposed in<br />
order to characterize the posture patterns. The spatiotemporal registration of the two modalities is considered in<br />
the design, and joint feature extraction and data fusion are presented. Using a multi-class SVM, experimental results demonstrate<br />
that the multimodal approach achieves better performance than approaches using single-modal sensing.<br />
13:30-16:30, Paper ThBCT8.49<br />
Exploiting System Knowledge to Improve ECOC Reject Rules<br />
Simeone, Paolo, Univ. of Cassino<br />
Marrocco, Claudio, Univ. of Cassino<br />
Tortorella, Francesco, Univ. of Cassino<br />
Error Correcting Output Coding is a common technique for multiclass classification tasks which decomposes the original<br />
problem into several two-class problems solved through dichotomizers. Such a classification system can be improved<br />
with a reject option, which can be defined according to the level of information available from the dichotomizers. This<br />
paper analyzes how this knowledge is useful when applying such reject rules. The nature of the outputs, the kind of<br />
classifiers employed and the knowledge of their loss function are influential details for improving the overall<br />
performance of the system. Experimental results on popular benchmark data sets are reported to show the behavior of the<br />
different schemes.<br />
13:30-16:30, Paper ThBCT8.50<br />
Human Smoking Event Detection using Visual Interaction Clues<br />
Wu, Pin, Yuan-Ze University<br />
Hsieh, Jun-Wei, Yuan-Ze University<br />
Cheng, Jiun-Cheng, National Taiwan Ocean Univ.<br />
Cheng, Shyi-Chyi, National Taiwan Ocean Univ.<br />
Tseng, Shau-Yin, Industry Tech. Res. Institute<br />
This paper presents a novel scheme to automatically and directly detect smoking events in video. In this scheme, a color-based<br />
ratio histogram analysis is introduced to extract visual clues from the appearance interactions between a lighted cigarette<br />
and its human holder. The techniques of color re-projection and Gaussian Mixture Models (GMMs) enable the tasks<br />
of cigarette segmentation and tracking over the background pixels. A key problem for event analysis is the non-regular<br />
form of smoking events; thus, we propose a self-determined mechanism to analyze this suspicious event using<br />
an HMM framework. Due to the uncertainties of cigarette size and color, no existing automatic system can reliably analyze<br />
human smoking events directly from videos. The proposed scheme can detect smoking events involving uncertain<br />
actions with various cigarette sizes, colors, and shapes, and has the capacity to extend visual analysis to human events with<br />
similar interaction relationships. Experimental results show the effectiveness and real-time performance of our scheme in<br />
smoking event analysis.<br />
13:30-16:30, Paper ThBCT8.51<br />
Malware Detection on Mobile Devices using Distributed Machine Learning<br />
Sharifi Shamili, Ashkan, RWTH Aachen Univ.<br />
Bauckhage, Christian, Fraunhofer IAIS<br />
Alpcan, Tansu, Tech. Univ. Berlin<br />
This paper presents a distributed Support Vector Machine (SVM) algorithm in order to detect malicious software (malware)<br />
on a network of mobile devices. The light-weight system monitors mobile user activity in a distributed and privacy-preserving<br />
way using a statistical classification model which is evolved by training with examples of both normal usage patterns<br />
and unusual behavior. The system is evaluated using the MIT reality mining data set. The results indicate that the<br />
distributed learning system trains quickly and performs reliably. Moreover, it is robust against failures of individual components.<br />
13:30-16:30, Paper ThBCT8.52<br />
Combining Single Class Features for Improving Performance of a Two Stage Classifier<br />
Cordella, Luigi P., Univ. di Napoli Federico II<br />
De Stefano, Claudio, Univ. of Cassino<br />
Fontanella, Francesco, Univ. of Cassino<br />
Marrocco, Cristina, Univ. of Cassino<br />
Scotto Di Freca, Alessandra, Univ. of Cassino<br />
We propose a feature-selection-based approach for improving the classification performance of a two-stage classification<br />
system in contexts where a high number of features is involved. A problem with a set of N classes is subdivided into a set<br />
of N two-class problems. In each problem, a GA-based feature selection algorithm is used for finding the best subset of<br />
features. These subsets are then used for training N classifiers. In the classification phase, unknown samples are given as<br />
input to each of the trained classifiers using the corresponding subspace. In case of conflicting responses, the sample<br />
is sent to a suitably trained supplementary classifier. The proposed approach has been tested on a real-world dataset containing<br />
hyper-spectral image data. The results compare favourably with those obtained by other methods on the same<br />
data.<br />
13:30-16:30, Paper ThBCT8.53<br />
The Rex Leopold II Model: Application of the Reduced Set Density Estimator to Human Categorization<br />
De Schryver, Maarten, Ghent Univ.<br />
Roelstraete, Bjorn, Ghent Univ.<br />
Reduction techniques are important tools in machine learning and pattern recognition. In this article, we demonstrate how<br />
a kernel-based density estimator can be used as a tool for understanding human category representation. Despite the dominance<br />
of exemplar models of categorization, there is still ambiguity about the number of exemplars stored in memory.<br />
Here, we illustrate that categorization performance is not affected by omitting exemplars.<br />
13:30-16:30, Paper ThBCT8.54<br />
A Hybrid Method for Feature Selection based on Mutual Information and Canonical Correlation Analysis<br />
Sakar, Cemal Okan, Bahcesehir Univ.<br />
Kursun, Olcay, Istanbul Univ.<br />
Mutual Information (MI) is a classical and widely used dependence measure that generally serves as a good basis for feature<br />
selection. However, under-sampled classes or rare but certain relations are overlooked by this measure, which can<br />
result in missing relevant features that could be very predictive of variables of interest, such as certain phenotypes or disorders<br />
in biomedical research, rare but dangerous factors in ecology, intrusions in network systems, etc. On the other hand,<br />
Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure effectively used to detect independence,<br />
but its use for feature selection or ranking is limited because its formulation is not intended to measure the<br />
amount of information (entropy) of the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid<br />
relevance measure that is not only based on MI but also accounts for the predictability of signals from one another, as in<br />
KCCA. We show that PMI has better feature detection capability than MI and KCCA, especially in catching<br />
suspicious coincidences that are rare but potentially important not only for subsequent experimental studies but also for<br />
building computational predictive models, as demonstrated on two toy datasets and a real intrusion detection system<br />
dataset.<br />
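For reference, the plain MI baseline that PMI extends can be estimated for discrete variables from the joint histogram; this sketch shows only that baseline, not the PMI measure itself:<br />

```python
import numpy as np

def mutual_information(x, y):
    """MI (in bits) between two discrete sequences, estimated from the joint histogram."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    for i, j in zip(xi, yi):
        joint[i, j] += 1
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)      # marginal of x
    py = p.sum(axis=0, keepdims=True)      # marginal of y
    nz = p > 0                             # avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

x = [0, 1, 0, 1, 0, 1, 0, 1]
mi_self = mutual_information(x, x)         # a fair binary signal with itself: 1 bit
mi_const = mutual_information(x, [0] * 8)  # anything with a constant: 0 bits
```

The failure mode the paper targets is visible here: a rare but perfectly certain relation contributes almost nothing to this histogram-based estimate.<br />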
13:30-16:30, Paper ThBCT8.55<br />
Speech Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments<br />
Nolazco-Flores, Juan A., Inst. Tecnológico y de Estudios Superiores de Monterrey<br />
Aceves-López, Roberto A., Inst. Tecnológico y de Estudios Superiores de Monterrey<br />
García-Perera, L. Paola, Inst. Tecnológico y de Estudios Superiores de Monterrey<br />
The Magnitude-Spectrum Information-Entropy (MSIE) of the speech signal is presented as an alternative representation<br />
of speech that can be used to mitigate the mismatch between training and testing conditions. The speech magnitude<br />
spectrum is considered as a random variable from which entropy coefficients can be calculated for each frame. These<br />
entropy coefficients are concatenated to the corresponding MFCC vector, the dynamic coefficients are then calculated, and<br />
the results show an improvement compared to a baseline. The MSIE effectiveness was tested on the Aurora 2 database<br />
audio files. When trained on clean speech, the experimental results obtained with MSIE concatenated to the MFCC outperform<br />
those obtained with the MFCC baseline system for selected types of noise at different SNRs. For this selected<br />
group of noises, the overall performance improvement in the range 0 dB to 20 dB for the Aurora 2 database is 15.06%.<br />
13:30-16:30, Paper ThBCT8.56<br />
Unsupervised Image Retrieval with Similar Lighting Conditions<br />
Serrano Talamantes, Jose Felix, Centro De Investigacion en Computacion<br />
Aviles, Carlos, Univ. Autónoma Metropolitana-Azcapotzalco México<br />
Sossa, Humberto, Center for Computing Res. CIC-IPN<br />
Villegas, Juan, Univ. Autónoma Metropolitana-Azcapotzalco México<br />
Olague, Gustavo, Centro de Investiación Científica y de Educación Superior<br />
In this work, a new method to retrieve images with similar lighting conditions is presented. It is based on automatic clustering<br />
and automatic indexing, and belongs to the Content Based Image Retrieval (CBIR) category. The goal is to<br />
retrieve images from a database (by their content) with similar lighting conditions. When we look at images taken from<br />
outdoor scenes, much of the information perceived depends on the lighting conditions. The proposal combines fixed and<br />
randomly extracted points for feature extraction. The describing features are the mean, the standard deviation, and the homogeneity<br />
(from the co-occurrence matrix) of a sub-image extracted from the three color channels (H, S, I). A K-MEANS<br />
algorithm and a 1-NN classifier are used to build an indexed database of 300 images in order to retrieve images with<br />
similar lighting conditions in sky regions: sunny, partially cloudy, and completely cloudy. One of the advantages<br />
of the proposal is that we do not need to manually label the images for their retrieval. The performance of our<br />
framework is demonstrated through several experimental results, including improved rates for retrieving images with<br />
similar lighting conditions. A comparison with another similar work is also presented.<br />
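The indexing-and-retrieval pipeline (K-MEANS clustering plus a 1-NN lookup) can be sketched as follows; the two-dimensional (mean, standard deviation) features and the data are invented for illustration and are far simpler than the descriptors used in the paper:<br />

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns final centers and labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# invented per-image features, e.g. (mean intensity, std) of a sky region:
# two "sunny" images (bright, low spread) and two "cloudy" ones
feats = np.array([[0.90, 0.05], [0.85, 0.06], [0.20, 0.30], [0.25, 0.28]])
centers, labels = kmeans(feats, k=2)

# retrieval: 1-NN assigns the query to the closest cluster center,
# and every image indexed under that cluster is returned
query = np.array([0.88, 0.05])
cluster = int(np.argmin(((centers - query) ** 2).sum(-1)))
retrieved = np.where(labels == cluster)[0]
```

Because the clusters are formed without labels, this is what lets the authors avoid manual annotation of the database.<br />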
13:30-16:30, Paper ThBCT8.57<br />
Lattice-Based Anomaly Rectification for Sport Video Annotation<br />
Khan, Aftab, Univ. of Surrey<br />
Windridge, David, Univ. of Surrey<br />
De Campos, Teofilo, Univ. of Surrey<br />
Anomaly detection has received much attention within the literature as a means of determining, in an unsupervised manner,<br />
whether a learning domain has changed in a fundamental way. This may require continuous adaptive learning to be abandoned<br />
and a new learning process initiated in the new domain. A related problem is that of anomaly rectification; the adaptation<br />
of the existing learning mechanism to the change of domain. As a concrete instantiation of this notion, the current<br />
paper investigates a novel lattice-based HMM induction strategy for arbitrary court-game environments. We test (in real<br />
and simulated domains) the ability of the method to adapt to a change of rule structures going from tennis singles to tennis<br />
doubles. Our long term aim is to build a generic system for transferring game-rule inferences.<br />
13:30-16:30, Paper ThBCT8.58<br />
An Ensemble of Classifiers Approach to Steganalysis<br />
Bayram, Sevinc, Pol. Inst. of NYU<br />
Dirik, Ahmet Emir, Pol. Inst. of NYU<br />
Sencar, Husrev Taha, TOBB Univ. of Ec. and Tech.<br />
Memon, Nasir, Pol. Inst. of New York Univ.<br />
Most work on steganalysis, with few exceptions, has primarily focused on providing features with high discrimination<br />
power without giving due consideration to issues concerning practical deployment of steganalysis methods. In this work,<br />
we focus on the machine learning aspect of steganalyzer design and utilize a hierarchical ensemble-of-classifiers approach<br />
to tackle two main issues. First, the proposed approach provides a workable and systematic procedure to incorporate several<br />
steganalyzers into a composite steganalyzer to improve detection performance in a scalable and cost-effective manner.<br />
Second, since the approach can be readily extended to multi-class classification, it can also be used to infer the<br />
steganographic technique deployed in the generation of a stego-object. We provide results to demonstrate the potential of the<br />
proposed approach.<br />
13:30-16:30, Paper ThBCT8.59<br />
Discriminating Intended Human Objects in Consumer Videos<br />
Uegaki, Hiroshi, Osaka Univ.<br />
Nakashima, Yuta, Osaka Univ.<br />
Babaguchi, Noboru, Osaka Univ.<br />
In a consumer video, there are not only intended objects, which are intentionally captured by the camcorder user, but also<br />
unintended objects, which are accidentally framed in. Since the intended objects are essential to presenting what the camcorder<br />
user wants to express in the video, discriminating the intended objects from the unintended objects is beneficial for many<br />
applications, e.g., video summarization, privacy protection, and so forth. In this paper, focusing on human objects, we<br />
propose a method for discriminating the intended human objects from the unintended human objects. We evaluated the<br />
proposed method using 10 videos captured by 3 camcorder users. The results demonstrate that the proposed method successfully<br />
discriminates the intended human objects with a recall of 0.45 and a precision of 0.80.<br />
13:30-16:30, Paper ThBCT8.60<br />
Detecting Human Activity Profiles with Dirichlet Enhanced Inhomogeneous Poisson Processes<br />
Shimosaka, Masamichi, The Univ. of Tokyo<br />
Ishino, Takahito, The Univ. of Tokyo<br />
Noguchi, Hiroshi, The Univ. of Tokyo<br />
Mori, Taketoshi, The Univ. of Tokyo<br />
Sato, Tomomasa, The Univ. of Tokyo<br />
This paper describes an activity pattern mining method based on inhomogeneous Poisson point processes (IPPPs) for time<br />
series of count data generated in behavior detection by pyroelectric sensors. The IPPP reflects the idea that typical human activity<br />
is rhythmic and periodic. We also focus on the idea that activity patterns are affected by exogenous phenomena,<br />
such as the day of the week and weather conditions. Because a single IPPP cannot capture this, Dirichlet process mixtures<br />
(DPM) are leveraged in order to discriminate and discover the different activity patterns caused by such factors. The use<br />
of DPM allows us to discover the appropriate number of typical daily patterns automatically. Experimental results using<br />
long-term count data show that our model successfully and efficiently discovers typical daily patterns.<br />
13:30-16:30, Paper ThBCT8.61<br />
I-FAC: Efficient Fuzzy Associative Classifier for Object Classes in Images<br />
Mangalampalli, Ashish, International Inst. of Information Tech. Hyderabad, India<br />
Chaoji, Vineet, Yahoo! Inc<br />
Sanyal, Subhajit, Yahoo! Lab. Bangalore, India<br />
We present I-FAC, a novel fuzzy associative classification algorithm for object class detection in images using interest<br />
points. In object class detection, the negative class CN is generally vague (CN = U \ CP, where U and CP are the universal<br />
and positive classes, respectively), yet image classification normally requires both positive and negative classes for<br />
training. I-FAC is a single-class image classifier that relies only on the positive class for training. Because of its fuzzy<br />
nature, I-FAC also handles polysemy and synonymy (common problems in most crisp, i.e., non-fuzzy, image classifiers) very<br />
well. As associative classification leverages frequent patterns mined from a given dataset, its performance as judged<br />
by its false-positive-rate (FPR) versus recall curve is very good, especially at lower FPRs, where its recall is even better.<br />
I-FAC has the added advantage that the rules used for classification have clear semantics and can be comprehended easily,<br />
unlike classifiers such as SVM, which act as black boxes. From an empirical perspective (on standard public<br />
datasets), the performance of I-FAC is much better, especially at lower FPRs, than that of either bag-of-words (BOW) or<br />
SVM (both using interest points).<br />
13:30-16:30, Paper ThBCT8.62<br />
Audio-Visual Data Fusion using a Particle Filter in the Application of Face Recognition<br />
Steer, Michael, Otto-von-guericke-Univ. Magdeburg<br />
This paper describes a methodology by which audio and visual data about a scene can be fused in a meaningful manner<br />
in order to locate a speaker in a scene. This fusion is implemented within a Particle Filter such that a single speaker can<br />
be identified in the presence of multiple visual observations. The advantages of this fusion are that weak sensory data<br />
from either modality can be reinforced and the presence of noise can be reduced.<br />
13:30-16:30, Paper ThBCT8.63<br />
The Problem of Fragile Feature Subset Preference in Feature Selection Methods and a Proposal of Algorithmic<br />
Workaround<br />
Somol, Petr, Inst. of Information Theory and Automation, Czech<br />
Grim, Jiří, Inst. of Information Theory and Automation<br />
Pudil, Pavel, Prague Univ. of Ec.<br />
We point out a problem inherent in the optimization scheme of many popular feature selection methods. It follows from<br />
the implicit assumption that a higher feature selection criterion value always indicates a more preferable subset, even if the<br />
value difference is marginal. This assumption ignores the reliability of particular feature preferences, over-fitting,<br />
and feature acquisition cost. We propose an algorithmic extension, applicable to many standard feature selection methods,<br />
that allows better control over feature subset preference. We show experimentally that the proposed mechanism is capable<br />
of reducing the size of selected subsets as well as improving classifier generalization.<br />
ThBCT9 Lower Foyer<br />
Signal, Speech, and Image Processing Poster Session<br />
Session chair: Ariki, Yasuo (Kobe Univ.)<br />
13:30-16:30, Paper ThBCT9.1<br />
Removing Partial Occlusion from Blurred Thin Occluders<br />
Mccloskey, Scott, McGill Univ. Honeywell<br />
Langer, Michael, McGill Univ.<br />
Siddiqi, Kaleem, McGill Univ.<br />
We present a method to remove partial occlusion that arises from out-of-focus thin foreground occluders such as wires,<br />
branches, or a fence. Such partial occlusion causes the irradiance at a pixel to be a weighted sum of the radiances of a<br />
blurred foreground occluder and that of the background. The result is that the background component has lower contrast<br />
than it would if seen without the occluder. In order to remove the contribution of the foreground in such regions, we characterize<br />
the position and size of the occluder in a narrow aperture image. In subsequent images with wider apertures, we<br />
use this characterization to remove the contribution of the foreground, thereby restoring contrast in the background. We<br />
demonstrate our method on real camera images without assuming that the background is static.<br />
13:30-16:30, Paper ThBCT9.2<br />
A New Approach to Aircraft Surface Inspection based on Directional Energies of Texture<br />
Mumtaz, Mustafa, National Univ. of Sciences and Tech.<br />
Bin Mansoor, Atif, National Univ. of Sciences and Tech.<br />
Masood, Hassan, National Univ. of Sciences and Tech.<br />
Non-Destructive Inspection (NDI) plays a vital role in the aircraft industry, as it determines the structural integrity of aircraft<br />
surfaces and material characterization. Since existing NDI methods are time-consuming, we propose a new NDI approach<br />
using digital image processing that has the potential to substantially decrease inspection time. The aircraft imagery is<br />
analyzed by two methods, the Contourlet Transform (CT) and the Discrete Cosine Transform (DCT). With the Contourlet<br />
Transform, the two-dimensional (2-D) spectrum is divided into fine slices using iterated directional filter banks. Next, directional<br />
energy components for each block of the decomposed subband outputs are computed. These energy values are<br />
used to distinguish between crack and scratch images using a dot product classifier. In the second approach, the aircraft<br />
imagery is decomposed into high and low frequency components using the DCT, and the first order moment is determined to<br />
form feature vectors. A correlation based approach is then used to distinguish between crack and scratch surfaces. A comparative<br />
examination of the two techniques on a database of crack and scratch images revealed that texture analysis<br />
using the combined transform based approach gave the best results, with an accuracy of 96.6% for the identification<br />
of crack surfaces and 98.3% for scratch surfaces.<br />
13:30-16:30, Paper ThBCT9.3<br />
A Generalized Anisotropic Diffusion for Defect Detection in Low-Contrast Surfaces<br />
Chao, Shin-Min, Utechzone Co. Ltd.<br />
Tsai, Du-Ming, Yuan-Ze Univ.<br />
Li, Wei-Chen, Yuan-Ze Univ.<br />
Chiu, Wei-Yao, Yuan-Ze Univ.<br />
In this paper, an anisotropic diffusion model with a generalized diffusion coefficient function is presented for defect detection<br />
in low-contrast surface images and, especially, aims at material surfaces found in liquid crystal display (LCD)<br />
manufacturing. A defect embedded in a low-contrast surface image is extremely difficult to detect because the intensity<br />
difference between the unevenly-illuminated background and defective regions is hardly observable. The proposed anisotropic<br />
diffusion model provides a generalized diffusion mechanism that can flexibly change the curve of the diffusion coefficient<br />
function. It adaptively carries out a smoothing process for faultless areas and performs a sharpening process for defect<br />
areas in an image. An entropy criterion is proposed as the performance measure of the diffused image and then a stochastic<br />
evolutionary computation algorithm, particle swarm optimization (PSO), is applied to automatically determine the best<br />
parameter values of the generalized diffusion coefficient function. Experimental results have shown that the proposed<br />
method can effectively and efficiently detect small defects in low-contrast surface images.<br />
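The classical Perona-Malik scheme that this model generalizes can be written in a few lines. Note that the paper's contribution is precisely to replace the fixed coefficient function g() below with a flexible, PSO-tuned one, which this sketch does not attempt:<br />

```python
import numpy as np

def diffuse(img, iters=10, kappa=0.1, dt=0.2):
    """Classical anisotropic (Perona-Malik) diffusion with g(d) = exp(-(d/kappa)^2)."""
    img = img.astype(float).copy()
    g = lambda d: np.exp(-((d / kappa) ** 2))
    for _ in range(iters):
        # differences to the four neighbours, with zero flux at the borders
        n = np.roll(img, -1, axis=0) - img; n[-1, :] = 0
        s = np.roll(img, 1, axis=0) - img;  s[0, :] = 0
        e = np.roll(img, -1, axis=1) - img; e[:, -1] = 0
        w = np.roll(img, 1, axis=1) - img;  w[:, 0] = 0
        img += dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
    return img

# a flat surface with mild noise: diffusion should smooth it
rng = np.random.default_rng(0)
noisy = 0.5 + 0.01 * rng.normal(size=(16, 16))
smoothed = diffuse(noisy)
```

Small intensity differences (flat background) are smoothed because g is near 1 there, while large differences (defect edges) are preserved because g decays toward 0; the generalized model additionally lets the coefficient sharpen defect areas.<br />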
13:30-16:30, Paper ThBCT9.4<br />
Impact of Vector Ordering Strategies on Morphological Unmixing of Remotely Sensed Hyperspectral Images<br />
Plaza, Antonio, Univ. of Extremadura<br />
Hyperspectral imaging is a new technique in remote sensing that generates hundreds of images, corresponding to different<br />
wavelength channels, for the same area on the surface of the Earth. In previous work, we have explored the application of<br />
morphological operations to integrate both spatial and spectral responses in hyperspectral data analysis. These operations<br />
rely on ordering pixel vectors in spectral space, but there is no unambiguous means of defining the minimum and maximum<br />
values between two vectors of more than one dimension. Our original contribution in this paper is to examine the impact<br />
of different vector ordering strategies on the definition of multi-channel morphological operations. Our focus is on morphological<br />
unmixing, which decomposes each pixel vector in the hyperspectral scene into a combination of pure spectral<br />
signatures (called endmembers) and their associated abundance fractions, allowing sub-pixel characterization. Experiments<br />
are conducted using real hyperspectral data sets collected by NASA/JPL’s Airborne Visible Infra-Red Imaging Spectrometer<br />
(AVIRIS) system.<br />
13:30-16:30, Paper ThBCT9.5<br />
A Recursive and Model-Constrained Region Splitting Algorithm for Cell Clump Decomposition<br />
Xiong, Wei, Inst. for Infocomm Res. A-STAR<br />
Ong, Sim Heng, National Univ. of Singapore<br />
Lim, Joo-Hwee, Inst. for Infocomm Res.<br />
Decomposition of cells in clumps is a difficult segmentation task requiring region splitting techniques. Techniques that do<br />
not employ prior shape constraints usually fail to achieve accurate segmentation. Those using shape constraints are unable<br />
to cope with large clumps and occlusions. In this work, we propose a model-constrained region splitting algorithm for cell<br />
clump decomposition. We build the cell model using joint probability distribution of invariant shape features. The shape<br />
model, the contour smoothness and the gradient information along the cut are used to optimize the splitting in a recursive<br />
manner. The short cut rule is also adopted as a strategy to speed up the process. The algorithm performs well in validation<br />
experiments using 60 images with 4516 cells and 520 clumps.<br />
13:30-16:30, Paper ThBCT9.6<br />
Bounding-Box based Segmentation with Single Min-Cut using Distant Pixel Similarity<br />
Pham, Viet-Quoc, The Univ. of Tokyo<br />
Takahashi, Keita, The Univ. of Tokyo<br />
Naemura, Takeshi, The Univ. of Tokyo<br />
This paper addresses the problem of interactive image segmentation with a user-supplied object bounding box. The underlying<br />
problem is the classification of pixels into foreground and background, where only background information is<br />
provided with sample pixels. Many approaches treat appearance models as an unknown variable and optimize the segmentation<br />
and appearance alternatively, in an expectation maximization manner. In this paper, we describe a novel approach<br />
to this problem: the objective function is expressed purely in terms of the unknown segmentation and can be optimized<br />
using only one minimum cut calculation. We aim to optimize the trade-off of making the foreground layer as large as possible<br />
while keeping the similarity between the foreground and background layers as small as possible. This similarity is<br />
formulated using the similarities of distant pixel pairs. We evaluated our algorithm on the GrabCut dataset and demonstrated<br />
that high-quality segmentations were attained at a fast calculation speed.<br />
13:30-16:30, Paper ThBCT9.7<br />
Image Retargeting in Compressed Domain<br />
Murthy, O.v. Ramana, Nanyang Tech. Univ.<br />
Muthuswamy, Karthik, Nanyang Tech. Univ.<br />
Rajan, Deepu, Nanyang Tech. Univ.<br />
Chia, Liang-Tien, Nanyang Tech. Univ.<br />
A simple algorithm for image retargeting in the compressed domain is proposed. Most existing retargeting algorithms<br />
work directly in the spatial domain of the raw image. Here, we work on the DCT coefficients of a JPEG-compressed image<br />
to generate a gradient map that serves as an importance map to help identify those parts in the image that need to be<br />
retained during the retargeting process. Each 8x8 block of DCT coefficients is scaled based on the least importance value.<br />
Retargeting can be done both in the horizontal and vertical directions with the same framework. We also illustrate image<br />
enlargement using the same method. Experimental results show that the proposed algorithm produces less distortion in<br />
the retargeted image compared to some other algorithms reported recently.<br />
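The key idea, an importance map computed directly from DCT block energy, can be illustrated as below. This is a loose sketch: it computes the DCT itself rather than reading coefficients from a JPEG stream, and it scores blocks by total AC magnitude rather than the paper's gradient map:<br />

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def block_importance(img):
    """Per-8x8-block importance: total magnitude of the AC DCT coefficients."""
    d = dct_matrix()
    h, w = img.shape
    imp = np.zeros((h // 8, w // 8))
    for bi in range(h // 8):
        for bj in range(w // 8):
            blk = img[bi * 8:(bi + 1) * 8, bj * 8:(bj + 1) * 8]
            coef = d @ blk @ d.T
            imp[bi, bj] = np.abs(coef).sum() - abs(coef[0, 0])  # drop the DC term
    return imp

# flat left half, textured (checkerboard) right half
img = np.zeros((8, 16))
img[:, 8:] = np.indices((8, 8)).sum(axis=0) % 2
imp = block_importance(img)   # textured block scores high, flat block near zero
```

Blocks with the lowest importance are the ones a retargeting step can scale down most aggressively with the least visible distortion.<br />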
13:30-16:30, Paper ThBCT9.8<br />
Progressive MAP-Based Deconvolution with Pixel-Dependent Gaussian Prior<br />
Tanaka, Masayuki, Tokyo Inst. of Tech.<br />
Kanda, Takafumi, Tokyo Inst. of Tech.<br />
Okutomi, Masatoshi,<br />
Deconvolution is a fundamental technique used in various vision applications, and maximum a posteriori (MAP) estimation<br />
is known to be a powerful tool for it. In this paper, we propose a progressive MAP-based deconvolution algorithm with a pixel-dependent<br />
Gaussian image prior. In the proposed algorithm, a mean and a variance for each pixel are adaptively estimated.<br />
Then, the mean and the variance are progressively updated. We experimentally show that the proposed algorithm is comparable<br />
to the state-of-the-art algorithms in the case that the true point spread function (PSF) is used for the deconvolution,<br />
and that the proposed algorithm outperforms them when the true PSF is not available.<br />
13:30-16:30, Paper ThBCT9.9<br />
A Fast Image Inpainting Method based on Hybrid Similarity-Distance<br />
Liu, Jie, Chinese Acad. of Sciences<br />
Zhang, Shuwu, Chinese Acad. of Sciences<br />
Yang, Wuyi, Chinese Acad. of Sciences<br />
Li, Heping, Chinese Acad. of Sciences<br />
A fast image inpainting method based on a hybrid similarity-distance is proposed in this paper. In Criminisi et al.’s work<br />
[1], the similarity distance is not reliable enough in many cases and the algorithm performs inefficiently. To solve these problems,<br />
we propose a new searching strategy to accelerate the algorithm. In addition, we modify the confidence-updating<br />
rule to make the distribution of confidences in the source region more reasonable. Furthermore, taking into account the stationarity<br />
of texture and the reliability of the source regions, we present a hybrid similarity-distance, which combines the<br />
distance in color space with the distance in spatial space using weight coefficients related to the confidence value. A more<br />
suitable patch is found by this hybrid similarity-distance. The experiments verify that the proposed method<br />
yields qualitative improvements compared to Criminisi et al.’s work [1].<br />
13:30-16:30, Paper ThBCT9.10<br />
Reversible Integer 2-D Discrete Fourier Transform by Control Bits<br />
Dursun, Serkan, Univ. of Texas at San Antonio<br />
Grigoryan, Artyom M., Univ. of Texas at San Antonio<br />
This paper describes the 2-D reversible integer discrete Fourier transform (RiDFT), which is based on the concept of the<br />
paired representation of the 2-D image, referred to as the unique 2-D frequency and 1-D time representation. The<br />
2-D DFT of the image is split into a minimum set of short transforms, and the image is represented as a set of 1-D signals.<br />
The paired 2-D DFT involves a few multiplications that can be approximated by integer transforms, such as<br />
one-point transforms with one control bit. 24 control bits are required to perform the 8x8-point RiDFT, and 264 control<br />
bits for the 16x16-point 2-D RiDFT of real inputs. The fast paired method of calculating the 1-D DFT is used. The computational<br />
complexity of the proposed 2-D RiDFTs is comparable with the complexity of the fast 2-D DFT.<br />
13:30-16:30, Paper ThBCT9.12<br />
Image Inpainting based on Local Optimisation<br />
Zhou, Jun, National ICT Australia<br />
Robles-Kelly, Antonio, National ICT Australia<br />
In this paper, we tackle the problem of image inpainting, which aims at removing objects from an image or repairing damaged<br />
pictures by replacing the missing regions using information from the rest of the scene. The image inpainting method<br />
proposed here builds on an exemplar-based perspective so as to improve the local consistency of the inpainted region.<br />
This is done by selecting the optimal patch which maximises the local consistency with respect to abutting candidate<br />
patches. The similarity computation generates weights based upon an edge prior and the structural differences between<br />
inpainting exemplar candidates. This treatment permits the generation of an inpainting sequence based on a list of factors.<br />
The experiments show that the proposed method delivers a margin of improvement as compared to alternative methods.<br />
13:30-16:30, Paper ThBCT9.13<br />
Image Processing based Approach for Retrieving Data from a Seismic Section in Bitmap Format<br />
Chevion, Dan, IBM Res. Lab. in Haifa<br />
Navon, Yaakov, IBM<br />
Ramm, Dov, former research staff member, IBM Israel Res. Lab.<br />
A new method for retrieving seismic data from a seismic section provided in a bitmap format is described. The method is<br />
based on image processing techniques and includes creating a grey level image of a seismic section, processing the grey<br />
level image (by integration, filtering, etc.) and then reconstructing digitized values of individual seismic traces from the<br />
resulting image, thus ending with the data in standard SEG-Y format.<br />
13:30-16:30, Paper ThBCT9.14<br />
Visible Entropy: A Measure for Image Visibility<br />
Hou, Zujun, Inst. for Infocomm Res.<br />
Yau, Wei-Yun, Inst. for Infocomm Res.<br />
Image visibility is a fundamental issue in the field of computer vision. This paper investigates the connection between<br />
the histogram and image visibility, where the concept of entropy is employed to depict the information content of the histogram.<br />
It turns out that image visibility depends mainly on the observed intensity levels with higher frequencies and on the distribution<br />
of their locations within the range of intensity levels. With this in mind, the concept of visible entropy is proposed. The<br />
usefulness of the proposed visibility measure has been evaluated on a number of real images.<br />
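The starting point, the Shannon entropy of the grey-level histogram, is easy to state; the visible-entropy measure itself additionally weights how the occupied intensity levels are distributed over the range, which this sketch omits:<br />

```python
import numpy as np

def histogram_entropy(img, bins=256):
    """Shannon entropy (in bits) of the grey-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                   # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

flat = np.full((4, 4), 7)                   # one grey level: zero entropy
two_level = np.array([[0, 128], [0, 128]])  # two equally likely levels: one bit
```

Plain histogram entropy ignores where the occupied levels sit in the intensity range, which is exactly the gap the proposed measure addresses.<br />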
13:30-16:30, Paper ThBCT9.15<br />
Research the Performance of a Recursive Algorithm of the Local Discrete Wavelet Transform<br />
Kopenkov, Vasiliy, RAS<br />
Myasnikov, Vladislav, RAS<br />
We experimentally compare the performance of two fast algorithms for computing the local discrete wavelet transform of<br />
one-dimensional signals: the Mallat algorithm and a recursive algorithm. For comparison purposes, we analyze Haar<br />
wavelet bases for one- and two-dimensional signals, an extension of the Haar basis with scale coefficient 3, and biorthogonal<br />
polynomial spline wavelets with finite support.<br />
13:30-16:30, Paper ThBCT9.16<br />
Auditory Features Revisited for Robust Speech Recognition<br />
Harte, Naomi, Trinity Coll. Dublin<br />
Kelly, Finnian, Trinity Coll. Dublin<br />
Auditory based front-ends for speech recognition have been compared before, but this paper focuses on two of the most<br />
promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings<br />
with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC).<br />
Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for reference. The performance of all features is<br />
reported on the TIMIT database using an HMM-based recogniser. It is found that the PNCC features outperform MFCC in<br />
clean conditions and are most robust to noise. ZCPA performance is shown to vary widely with filter bank configuration<br />
and frame length. The ZCPA performance is poor in clean conditions but is the least affected by white noise. PNCC is<br />
shown to be the most promising new feature set for robust ASR in recent years.<br />
13:30-16:30, Paper ThBCT9.17<br />
Sparse Representation for Speaker Identification<br />
Naseem, Imran, The Univ. of Western Australia<br />
Togneri, Roberto, The Univ. of Western Australia<br />
Bennamoun, Mohammed, The Univ. of Western Australia<br />
We address the closed-set problem of speaker identification by presenting a novel sparse representation classification algorithm.<br />
We propose to develop an over complete dictionary using the GMM mean super vector kernel for all the training<br />
utterances. A given test utterance corresponds to only a small fraction of the whole training database. We therefore propose<br />
to represent a given test utterance as a linear combination of all the training utterances, thereby generating a naturally<br />
sparse representation. Using this sparsity, the unknown vector of coefficients is computed via l1minimization which is<br />
also the sparsest solution [12]. Ideally, the vector of coefficients so obtained has nonzero entries representing the class<br />
index of the given test utterance. Experiments have been conducted on the standard TIMIT [14] database and a comparison<br />
with state-of-the-art speaker identification algorithms yields a favorable performance index for the proposed algorithm.<br />
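The sparse-representation step described above can be sketched in a few lines, assuming a plain ISTA solver for the l1 problem and an energy-based class decision; the dictionary layout and all names are illustrative, not the authors' implementation:<br />

```python
import numpy as np

def ista_l1(A, y, lam=0.05, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x

def identify_speaker(A, labels, y):
    """Pick the class whose training columns carry the most coefficient energy."""
    x = ista_l1(A, y)
    classes = sorted(set(labels))
    energy = [sum(x[i] ** 2 for i, l in enumerate(labels) if l == c) for c in classes]
    return classes[int(np.argmax(energy))]
```

Each column of `A` stands for one training utterance's supervector; a test utterance close to one speaker's training data concentrates the recovered coefficients on that speaker's columns.<br />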
13:30-16:30, Paper ThBCT9.18<br />
Latency in Speech Feature Analysis for Telepresence State Coding<br />
O’Gorman, Lawrence, Alcatel-Lucent Bell Lab.<br />
For video conferencing, there are network bandwidth and screen real-estate constraints that limit the number of user channels.<br />
We propose an intermediate transmission mode that transmits only at events, where these are detected by both audio<br />
and video changes from the short-term signal average. Our objective in this paper is to determine latency until the audio<br />
portion of a single telepresence channel stabilizes. It is this stable signal from which we detect events. We describe a recursive<br />
filter approach for feature determination and experiments on the Switchboard telephone call database. Results<br />
show a latency to stable signal of up to 10 seconds, although events can be detected much more quickly.<br />
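A minimal sketch of a first-order recursive averaging filter and a stabilization-latency measure of the kind discussed above; the filter coefficient and tolerance are illustrative values, not the paper's:<br />

```python
import numpy as np

def smooth(x, alpha=0.95):
    # first-order recursive (exponential) averaging filter
    y = np.empty(len(x), dtype=float)
    acc = 0.0
    for i, v in enumerate(x):
        acc = alpha * acc + (1.0 - alpha) * v
        y[i] = acc
    return y

def latency_to_stable(x, target, tol=0.05, alpha=0.95):
    # index of the first sample after which the smoothed signal
    # stays within tol of the target level
    y = smooth(x, alpha)
    outside = np.where(np.abs(y - target) > tol)[0]
    return 0 if outside.size == 0 else int(outside[-1]) + 1
```

A larger `alpha` gives a smoother estimate but a longer latency before the signal is considered stable, which mirrors the trade-off the abstract reports.<br />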
13:30-16:30, Paper ThBCT9.19<br />
Automatically Detecting Peaks in Terahertz Time-Domain Spectroscopy<br />
Stephani, Henrike, Fraunhofer ITWM<br />
Jonuscheit, Joachim, Fraunhofer IPM<br />
Robiné, Christoph, Fraunhofer IPM<br />
Heise, Bettina, JKU<br />
To classify spectroscopic measurements it is necessary to have comparable methods of evaluation. In Terahertz (THz)<br />
time-domain spectroscopy, a new technology, neither the presentation of the data nor the peak detection is standardized<br />
yet. We propose a procedure for automatic peak extraction in THz spectra of chemical compounds. After preprocessing in<br />
the time-domain, we use a variance based algorithm for determining the valid frequency region. We furthermore propose<br />
a baseline correction using simulated THz spectra. We illustrate how this procedure works on the example of hyperspectral<br />
THz measurements of six chemical compounds. Subsequently we propose to use unsupervised classification on the thus<br />
processed data to robustly detect the characteristic peaks of a compound.<br />
13:30-16:30, Paper ThBCT9.20<br />
Iwasawa Decomposition and Computational Riemannian Geometry<br />
Lenz, Reiner, Linköping Univ.<br />
Mochizuki, Rika, Nippon Telegraph and Telephone Corp.<br />
Chao, Jinhui, Chuo Univ.<br />
We investigate several topics related to manifold-techniques for signal processing. On the most general level we consider<br />
manifolds with a Riemannian Geometry. These manifolds are characterized by their inner products on the tangent spaces.<br />
We describe the connection between the symmetric positive-definite matrices defining these inner products and the Cartan<br />
and the Iwasawa decomposition of the general linear matrix groups. This decomposition gives rise to the decomposition<br />
of the inner product matrices into diagonal and orthonormal matrices, and into diagonal and upper triangular matrices.<br />
Next we describe the estimation of the inner product matrices from measured data as an optimization process on the homogeneous<br />
space of upper triangular matrices. We show that the decomposition leads to simple forms of partial derivatives<br />
that are commonly used in optimization algorithms. Using the group theoretical parametrization also ensures that all intermediate<br />
estimates of the inner product matrix are symmetric and positive definite. Finally we apply the method to a<br />
problem from psychophysics where the color perception properties of an observer are characterized with the help of color<br />
matching experiments. We will show that measurements from color weak observers require the enforcement of the positive-definiteness<br />
of the matrix with the help of the manifold optimization technique.<br />
13:30-16:30, Paper ThBCT9.21<br />
Rethinking Algorithm Design and Development in Speech Processing<br />
Stadelmann, Thilo, Univ. of Marburg<br />
Wang, Yinghui, Univ. of Marburg<br />
Smith, Matthew, Univ. of Hannover<br />
Ewerth, Ralph, Univ. of Marburg<br />
Freisleben, Bernd, Univ. of Marburg<br />
Speech processing is typically based on a set of complex algorithms requiring many parameters to be specified. When<br />
parts of the speech processing chain do not behave as expected, trial and error is often the only way to investigate the reasons.<br />
In this paper, we present a research methodology to analyze unexpected algorithmic behavior by making (intermediate)<br />
results of the speech processing chain perceivable and intuitively comprehensible by humans. The workflow of the<br />
process is explicated using a real-world example leading to considerable improvements in speaker clustering. The described<br />
methodology is supported by a software toolbox available for download.<br />
13:30-16:30, Paper ThBCT9.22<br />
Phone-Conditioned Suboptimal Wiener Filtering<br />
Gonzalez-Caravaca, Guillermo, Univ. Autonoma de Madrid<br />
Toledano, Doroteo, Univ. Autonoma de Madrid<br />
Puertas, Maria, Univ. Autonoma de Madrid<br />
A novel way of managing the compromise between noise reduction and speech distortion in Wiener filters is presented. It<br />
is based on adjusting the amount of noise reduced, and therefore the speech distortion introduced, on a phone-by-phone<br />
basis. We show empirically that optimal Wiener filters produce different amounts of speech distortion for different phones.<br />
Therefore we propose a phone-conditioned suboptimal Wiener filter that uses different amounts of noise reduction for<br />
each phone, based on a previous estimation of the amount of distortion introduced. Speech recognition results have shown<br />
that phone-conditioned suboptimal Wiener filtering can provide almost a 5% additional relative improvement in word<br />
accuracy over comparable optimal Wiener filtering.<br />
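The trade-off described above can be illustrated with a parametric Wiener gain; the phone-dependent factor `beta` is a hypothetical parameterization used here to show how noise reduction can be varied per phone, not the paper's actual filter:<br />

```python
import numpy as np

def wiener_gain(snr_prior, beta=1.0):
    # suboptimal Wiener gain: larger beta reduces more noise (more distortion),
    # smaller beta preserves speech at the cost of residual noise
    snr_prior = np.asarray(snr_prior, dtype=float)
    return snr_prior / (snr_prior + beta)

def denoise_frame(spectrum, noise_psd, beta=1.0):
    # apply the per-bin gain to one spectral frame, given a noise PSD estimate
    snr = np.maximum(np.abs(spectrum) ** 2 / noise_psd - 1.0, 1e-6)
    return wiener_gain(snr, beta) * spectrum
```

Choosing `beta` per phone, as motivated in the abstract, lets vowels (where distortion is more audible) use a gentler gain than, say, unvoiced fricatives.<br />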
13:30-16:30, Paper ThBCT9.23<br />
Geodesic Active Fields on the Sphere<br />
Zosso, Dominique, École Pol. Fédérale de Lausanne<br />
Thiran, Jean-Philippe, École Pol. Fédérale de Lausanne<br />
In this paper, we propose a novel method to register images defined on spherical meshes. Instances of such spherical<br />
images include inflated cortical feature maps in brain medical imaging or images from omnidirectional cameras. We apply<br />
the Geodesic Active Fields (GAF) framework locally at each vertex of the mesh. Therefore we define a dense deformation<br />
field, which is embedded in a higher dimensional manifold, and minimize the weighted Polyakov energy. While the<br />
Polyakov energy itself measures the hyper-area of the embedded deformation field, its weighting accounts for the<br />
quality of the current image alignment. Iteratively minimizing the energy drives the deformation field towards a smooth<br />
solution of the registration problem. Although the proposed approach does not necessarily outperform state-of-the-art<br />
methods that are tightly tailored to specific applications, it is of methodological interest due to its high degree of flexibility<br />
and versatility.<br />
13:30-16:30, Paper ThBCT9.24<br />
Emotional Speech Classification based on Multi View Characterization<br />
Mahdhaoui, Ammar, Univ. Pierre & Marie Curie<br />
Chetouani, M., Inst. des Systèmes Intelligents et Robotique<br />
Emotional speech classification is a key problem in social interaction analysis. Traditional emotional speech classification<br />
methods are completely supervised and require large amounts of labeled data. In addition, various feature sets are usually<br />
used to characterize the emotional speech signals. Therefore, we propose a new co-training algorithm based on multi-view<br />
features. More specifically, we adopt different features for the characterization of speech signals to form different<br />
views for classification, so as to extract as much discriminative information as possible. We then use the co-training algorithm<br />
to classify emotional speech with only a few annotations. In this article, a dynamic weighted co-training algorithm is<br />
developed to combine different features (views) to predict the common class variable. Experiments prove the validity and<br />
effectiveness of this method compared to a self-training algorithm.<br />
13:30-16:30, Paper ThBCT9.25<br />
Image Inpainting using Structure-Guided Priority Belief Propagation and Label Transformations<br />
Hsin, Heng-Feng, National Chung Cheng Univ.<br />
Leou, Jin-Jang, National Chung Cheng Univ.<br />
Lin, Cheng-Shian, National Chung Cheng Univ.<br />
Chen, Hsuan-Ying, National Chung Cheng Univ.<br />
In this study, an image inpainting approach using structure-guided priority belief propagation (BP) and label transformations<br />
is proposed. The proposed approach contains five stages, namely, Markov random field (MRF) node determination,<br />
structure map generation, label set enlargement by label transformations, image inpainting by priority-BP optimization,<br />
and overlapped region composition. Based on experimental results obtained in this study, as compared with three comparison<br />
approaches, the proposed approach provides better image inpainting results.<br />
13:30-16:30, Paper ThBCT9.26<br />
Comparison of Syllable/Phone HMM based Mandarin TTS<br />
Duan, Quansheng, Tsinghua Univ.<br />
Kang, Shiyin, Tsinghua Univ.<br />
Shuang, Zhiwei, IBM Res. - China<br />
Wu, Zhiyong, Tsinghua Univ.<br />
Cai, Lianhong, Tsinghua Univ.<br />
Qin, Yong, IBM Res. - China<br />
The performance of an HMM-based text-to-speech (TTS) system is affected by the basic modeling units and the size of<br />
training data. This paper compares two HMM-based Mandarin TTS systems using syllable and phone as basic units, respectively,<br />
with 1000, 3000 and 5000 sentences’ training data. Two female speakers’ corpora are used as training data for<br />
evaluation. For both corpora, the system using syllable as basic unit outperforms the system using phone as basic unit<br />
with 3000 and 5000 sentences’ training data.<br />
13:30-16:30, Paper ThBCT9.27<br />
QRS Complex Detection by Non Linear Thresholding of Modulus Maxima<br />
Jalil, Bushra, Univ. de Bourgogne<br />
Laligant, Olivier, Univ. de Bourgogne<br />
Fauvet, Eric, Univ. de Bourgogne<br />
Beya, Ouadi, Univ. de Bourgogne<br />
Electrocardiogram (ECG) signal is used to analyze the cardiovascular activity in the human body and has a primary role<br />
in the diagnosis of several heart diseases. The QRS complex is the most distinguishable component in the ECG. Therefore,<br />
the accuracy of the detection of QRS complex is crucial to the performance of subsequent machine learning algorithms<br />
for cardiac disease classification. The aim of the present work is to detect QRS wave from ECG signals. Wavelet transform<br />
filtering is applied to the signal in order to remove baseline drift, followed by QRS localization. Exploiting the property<br />
that the R peak has the highest and most prominent amplitude, we apply a thresholding technique based on the median absolute<br />
deviation (MAD) of the modulus maxima to detect the complex. In order to evaluate the algorithm, the analysis has been<br />
done on the MIT-BIH Arrhythmia database. The results have been examined and approved by medical doctors.<br />
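The MAD-based thresholding idea can be sketched as follows; the wavelet filtering and modulus-maxima computation are omitted, so this only shows the robust threshold applied to an already-filtered signal, with an illustrative multiplier `k`:<br />

```python
import numpy as np

def mad_threshold_peaks(x, k=8.0):
    # threshold derived from the median absolute deviation (MAD) of the signal,
    # then keep local maxima above it (candidate R peaks)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    thr = med + k * mad
    return [i for i in range(1, len(x) - 1)
            if x[i] > thr and x[i] >= x[i - 1] and x[i] > x[i + 1]]
```

Because the MAD is insensitive to the sparse, large R peaks themselves, the threshold tracks the baseline noise level rather than the peak amplitudes.<br />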
13:30-16:30, Paper ThBCT9.28<br />
Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-Overlapping Audio and Video<br />
Streams<br />
Roy, Anindya, Ec. Pol. Federale de Lausanne<br />
Marcel, Sebastien, Ec. Pol. Federale de Lausanne<br />
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently<br />
or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task: person identification<br />
in a cross-modal scenario, i.e., matching the speaker in an audio recording to the same speaker in a video recording,<br />
where the two recordings have been made during different sessions, using speaker specific information which is<br />
common to both the audio and video modalities. Several recent psychological studies have shown how humans can indeed<br />
perform this task with an accuracy significantly higher than chance. Here we propose two systems which can solve this<br />
task comparably well, using purely pattern recognition techniques. We hypothesize that such systems could be put to practical<br />
use in multimodal biometric and surveillance systems.<br />
13:30-16:30, Paper ThBCT9.29<br />
Image Parsing with a Three-State Series Neural Network Classifier<br />
Seyedhosseini Tarzjani, Seyed Mojtaba, Univ. of Utah<br />
Paiva, Antonio, Univ. of Utah<br />
Tasdizen, Tolga, Univ. of Utah<br />
We propose a three-state series neural network for effective propagation of context and uncertainty information for image<br />
parsing. The activation functions used in the proposed model have three states instead of the normal two states. This makes<br />
the neural network more flexible than the two-state neural network, and allows for uncertainty to be propagated through<br />
the stages. In other words, decisions about difficult pixels can be left for later stages which have access to more contextual<br />
information than earlier stages. We applied the proposed method to three different datasets and experimental results demonstrate<br />
higher performance of the three-state series neural network.<br />
13:30-16:30, Paper ThBCT9.30<br />
Pan-Sharpening using an Adaptive Linear Model<br />
Liu, Lining, Beihang Univ.<br />
Wang, Yiding, North China Univ. of Tech.<br />
Wang, Yunhong, Beihang Univ.<br />
Yu, Haiyan, Beihang Univ.<br />
In this paper, we propose an algorithm to synthesize high-resolution multispectral images by fusing panchromatic (Pan)<br />
images and multispectral (MS) images. The algorithm is based on an adaptive linear model, which is automatically estimated<br />
by least-squares fitting. In this model, a virtual difference band is appended to the MS to guarantee the correlation<br />
between the Pan and MS. Then, an iterative procedure is carried out to generate the fused images using the steepest descent<br />
method. The efficiency of the presented technique is tested by performing pan-sharpening of IKONOS, QuickBird, and<br />
Landsat-7 ETM+ datasets. Experimental results show that our method provides better fusion results than other methods.<br />
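The first ingredient above, fitting the Pan band as a linear combination of the MS bands by least squares, can be sketched as follows; the global per-image fit and the variable names are assumptions for illustration:<br />

```python
import numpy as np

def fit_linear_pan_model(ms_pixels, pan_pixels):
    """Least-squares weights w such that pan ≈ ms_pixels @ w.

    ms_pixels:  (n_pixels, n_bands) multispectral samples
    pan_pixels: (n_pixels,) co-registered panchromatic samples
    """
    w, *_ = np.linalg.lstsq(ms_pixels, pan_pixels, rcond=None)
    return w
```

The fitted weights adapt automatically to each sensor's spectral response, which is what makes the linear model "adaptive" across IKONOS, QuickBird and Landsat-7 data.<br />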
13:30-16:30, Paper ThBCT9.31<br />
A Study of Voice Source and Vocal Tract Filter based Features in Cognitive Load Classification<br />
Le, Phu, The Univ. of New South Wales<br />
Epps, Julien, The Univ. of New South Wales<br />
Choi, Eric, National ICT Australia<br />
Ambikairajah, Eliathamby, The Univ. of New South Wales<br />
Speech has been recognized as an attractive method for the measurement of cognitive load. Previous approaches have<br />
used mel frequency cepstral coefficients (MFCCs) as discriminative features to classify cognitive load. The MFCCs contain<br />
information from both the voice source and the vocal tract, so that the individual contributions of each to cognitive load<br />
variation are unclear. This paper aims to extract speech features related to either the voice source or the vocal tract and use<br />
them to discriminate between cognitive load levels in order to identify the individual contribution of each for cognitive<br />
load measurement. Voice source-related features are then used to improve the performance of current cognitive load classification<br />
systems, using adapted Gaussian mixture models. Our experimental results show that the use of voice source<br />
features could yield around a 12% reduction in relative error rate compared with the baseline system based on MFCCs, intensity,<br />
and pitch contour.<br />
13:30-16:30, Paper ThBCT9.32<br />
Adaptive Enhancement with Speckle Reduction for SAR Images using Mirror-Extended Curvelet and PSO<br />
Li, Ying, Northwestern Pol. Univ.<br />
Hongli, Gong, Northwestern Pol. Univ.<br />
Wang, Qing, Northwestern Pol. Univ.<br />
Speckle and low contrast can cause image degradation, which reduces the detectability of targets and impedes further investigation<br />
of synthetic aperture radar (SAR) images. This paper presents an adaptive enhancement method with speckle<br />
reduction for SAR images using the mirror-extended curvelet (ME-curvelet) transform and particle swarm optimization<br />
(PSO). First, an improved enhancement function is proposed to nonlinearly shrink and stretch the curvelet coefficients.<br />
Then, a novel objective evaluation criterion is introduced to adaptively obtain the optimal parameters in the enhancement<br />
function. Finally, a PSO algorithm with two improvements is used as a global search strategy for the best enhanced image.<br />
Experimental results indicate that the proposed method can reduce the speckle and enhance the edge features and the contrast<br />
of SAR images better in comparison with the wavelet-based and curvelet-based non-adaptive enhancement methods.<br />
13:30-16:30, Paper ThBCT9.33<br />
Recursive Video Matting and Denoising<br />
Prabhu, Sahana, Indian Inst. of Tech. Madras<br />
Ambasamudram, Rajagopalan, Indian Inst. of Tech. Madras<br />
In this paper, we propose a video matting method with simultaneous noise reduction based on the Unscented Kalman filter<br />
(UKF). This recursive approach extracts the alpha mattes and denoised foregrounds from noisy videos, in a unified framework.<br />
No assumptions are made about the type of motion of the camera or of the foreground object in the video. Moreover,<br />
user-specified trimaps are required only once every ten frames. In order to accurately extract information at the borders<br />
between the foreground and the background, we include a discontinuity-adaptive Markov random field (MRF) prior. It<br />
incorporates spatio-temporal information from the current and previous frame during estimation of the alpha matte as well<br />
as the foreground. Results are given on videos with real film-grain noise.<br />
13:30-16:30, Paper ThBCT9.35<br />
The Effects of Radiometry on the Accuracy of Intensity based Registration<br />
Selby, Boris Peter, Medcom GmbH<br />
Sakas, Georgios, Fraunhofer IGD<br />
Walter, Stefan, Medcom GmbH<br />
Groch, Wolf-Dieter, Univ. of Applied Sciences Darmstadt<br />
Stilla, Uwe, Tech. Univ. Muenchen<br />
Besides several other factors, radiometric differences between a reference and a floating image greatly influence the achievable<br />
accuracy of image registration. In this work we derive the magnitude of registration inaccuracy arising from changes<br />
in radiometric properties. This is done for the example of medical X-ray image registration. We therefore estimate the<br />
change of image intensity with respect to object shape, X-ray attenuation of the object material and the initial X-ray energy<br />
by modeling a simplified image formation process. The change in intensity is then used to determine a closed form estimation<br />
of the resulting registration error, independent from a specific registration algorithm. Finally the theoretical calculations<br />
are compared to the accuracy of intensity based registration performed on X-ray images with different radiometric<br />
properties. Results show that the herewith derived accuracy estimation is well suited to predict the achievable accuracy of<br />
a registration for images with radiometric differences.<br />
13:30-16:30, Paper ThBCT9.36<br />
Fence Removal from Multi-Focus Images<br />
Yamashita, Atsushi, Shizuoka Univ.<br />
Matsui, Akiyoshi, Shizuoka Univ.<br />
Kaneko, Toru, Shizuoka Univ.<br />
When an image of a scene is captured by a camera through a fence, a blurred fence image interrupts objects in the scene.<br />
In this paper, we propose a method for fence removal from such images using multiple focusing. Most previous methods<br />
interpolate the interrupted regions by using information of surrounding textures. However, these methods fail when information<br />
of surrounding textures is not rich. On the other hand, there are methods that acquire multiple images for image<br />
restoration and composite them to generate a new clear image. The latter approach is adopted because it is robust and accurate.<br />
Multi-focus images are acquired and "defocusing" information is utilized to generate a clear image. Experimental<br />
results show the effectiveness of the proposed method.<br />
13:30-16:30, Paper ThBCT9.37<br />
Information Theoretic Expectation Maximization based Gaussian Mixture Modeling for Speaker Verification<br />
Memon, Sheeraz, RMIT Univ.<br />
Lech, Margaret, RMIT Univ.<br />
Namunu, Maddage, RMIT Univ.<br />
The expectation maximization (EM) algorithm is widely used in the Gaussian mixture model (GMM) as the state-of-art<br />
statistical modeling technique. Like the classical EM method, the proposed EM-Information Theoretic (EM-IT) algorithm<br />
adapts means, covariances and weights; however, this process is not conducted directly on feature vectors but on a<br />
smaller set of centroids derived by the information theoretic procedure, which simultaneously minimizes the divergence<br />
between the Parzen estimates of the feature vector’s distribution within a given Gaussian component and the centroid’s<br />
distribution within the same Gaussian component. The EM-IT algorithm was applied to the speaker verification problem<br />
using the NIST 2004 speech corpus and MFCCs with dynamic features. The results showed an improvement of the equal<br />
error rate (EER) by 1.5% over the classical EM approach. The EM-IT also showed higher convergence rates compared to<br />
the EM method.<br />
13:30-16:30, Paper ThBCT9.38<br />
A Gaussian Process Regression Framework for Spatial Error Concealment with Adaptive Kernels<br />
Asheri, Hadi, Sharif Univ. of Tech.<br />
Rabiee, Hamid Reza, Sharif Univ. of Tech.<br />
Pourdamghani, Nima, Sharif Univ. of Tech.<br />
Rohban, Mohammad H., Sharif Univ. of Tech.<br />
We have developed a Gaussian Process Regression method with adaptive kernels for concealment of the missing macroblocks<br />
of block-based video compression schemes in a packet video system. In addition to producing promising results, the proposed algorithm<br />
provides a solid framework for further improvements. In this paper, the problem of estimating lost macro-blocks<br />
will be solved by estimating the proper covariance function of the Gaussian process defined over a region around the missing<br />
macro-blocks (i.e. its kernel function). In order to preserve block edges, the kernel is constructed adaptively by using<br />
the local edge related information. Moreover, we can achieve more improvement by local estimation of the kernel parameters.<br />
While restoring the prominent edges of the missing macro-blocks, the proposed method produces perceptually<br />
smooth concealed frames. Objective and subjective evaluations verify the effectiveness of the proposed method.<br />
13:30-16:30, Paper ThBCT9.39<br />
Colour Constant Image Sharpening<br />
Alsam, Ali, Sør-Trøndelag Univ. Coll.<br />
In this paper, we introduce a new sharpening method which guarantees colour constancy and resolves the problem of equiluminance<br />
colours. The algorithm is similar to unsharp masking in that the gradients are calculated at different scales by<br />
blurring the original with a variable size kernel. The main difference is in the blurring stage where we calculate the average<br />
of an n × n neighborhood by projecting each colour vector onto the space of the center pixel before averaging. Thus<br />
starting with the center pixel we define a projection matrix onto the space of that vector. Each neighboring colour is then<br />
projected onto the center and the result is summed up. The projection step results in an average vector which shares the<br />
direction of the original center pixel. The difference between the center pixel and the average is by definition a vector<br />
which is a scalar multiple of the center pixel. Thus adding the average to the center pixel is guaranteed not to result in colour<br />
shifts. This projection step is also shown to remedy the problem of equiluminance colours and can be used for m-dimensional<br />
data. Finally, the results indicate that the new sharpening method results in better sharpening than that achieved<br />
using unsharp masking, with noticeably fewer halos around strong edges. The latter aspect of the algorithm is believed to be<br />
due to the asymmetric nature of the projection step.<br />
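The projection-then-average step described above can be sketched for a single pixel; the 3-vector RGB layout and the sharpening gain are illustrative assumptions:<br />

```python
import numpy as np

def directional_unsharp(center, neighbours, gain=0.5):
    # project each neighbour colour onto the direction of the center pixel,
    # average the projections, and sharpen along that same direction
    u = center / np.linalg.norm(center)            # unit direction of center colour
    proj = np.array([(c @ u) * u for c in neighbours])
    avg = proj.mean(axis=0)
    # (center - avg) is parallel to u, so the result keeps the center's hue
    return center + gain * (center - avg)
```

Because every vector involved lies along the center pixel's direction, the sharpened pixel can only change in magnitude, which is the colour-constancy property the abstract claims.<br />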
13:30-16:30, Paper ThBCT9.40<br />
Maximally Stable Texture Regions<br />
Güney, Mesut, Turkish Naval Academy<br />
Arica, Nafiz, Turkish Naval Academy<br />
In this study, we propose to detect interest regions based on texture information of images. For this purpose, Maximally<br />
Stable Extremal Regions (MSER) approach is extended using the high dimensional texture features of image pixels. The<br />
regions with different textures from their vicinity are detected using agglomerative clustering successively. The proposed<br />
approach is evaluated in terms of repeatability and matching scores in an experimental setup used in the literature. It outperforms<br />
the intensity and color based detectors, especially in images containing textured regions. It performs better<br />
under transformations including viewpoint change, blurring, illumination change and JPEG compression, while producing comparable<br />
results in the other transformations tested in the experiments.<br />
13:30-16:30, Paper ThBCT9.41<br />
Combining the Likelihood and the Kullback-Leibler Distance in Estimating the Universal Background Model for<br />
Speaker Verification using SVM<br />
Lei, Zhenchun, Jiangxi Normal Univ.<br />
The state-of-the-art methods for speaker verification are based on the support vector machine. The Gaussian supervector<br />
SVM is a typical method which uses the Gaussian mixture model for creating feature vectors for the discriminative SVM.<br />
All GMMs are adapted from the same universal background model (UBM), which is obtained by maximum likelihood estimation<br />
on a large number of data sets, so the UBM should cover the feature space as widely as possible. We propose a new method<br />
to estimate the parameters of the UBM by combining the likelihood and Kullback-Leibler distances. Its<br />
aim is to find model parameters that achieve a high likelihood value while the Gaussian distributions are dispersed to<br />
cover the feature space broadly. Experiments on the NIST 2001 task show that our method can improve performance<br />
noticeably.<br />
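One ingredient of such a criterion, the Kullback-Leibler distance between Gaussian components, has a closed form for diagonal covariances; this standalone formula is shown for illustration and is not the authors' full objective:<br />

```python
import numpy as np

def kl_diag_gauss(m1, v1, m2, v2):
    """KL( N(m1, diag(v1)) || N(m2, diag(v2)) ) in closed form."""
    m1, v1, m2, v2 = map(np.asarray, (m1, v1, m2, v2))
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
```

Maximizing pairwise divergences of this kind pushes the mixture components apart, which is the "dispersed to cover the feature space" behaviour the abstract aims for.<br />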
13:30-16:30, Paper ThBCT9.42<br />
Asymmetric Generalized Gaussian Mixture Models and EM Algorithm for Image Segmentation<br />
Nacereddine, Nafaa, LORIA<br />
Tabbone, Salvatore, Univ. Nancy 2-LORIA<br />
Ziou, Djemel, Sherbrooke Univ.<br />
Hamami, Latifa, Ec. Nationale Pol.<br />
In this paper, a parametric and unsupervised histogram-based image segmentation method is presented. The histogram is<br />
assumed to be a mixture of asymmetric generalized Gaussian distributions. The mixture parameters are estimated by using<br />
the Expectation Maximization algorithm. Histogram fitting and region uniformity measures on synthetic and real images<br />
reveal the effectiveness of the proposed model compared to the generalized Gaussian mixture model.<br />
13:30-16:30, Paper ThBCT9.43<br />
Color Connectedness Degree for Mean-Shift Tracking<br />
Gouiffès, Michèle, IEF Univ. Paris Sud 11<br />
Laguzet, Florence, LRI Univ. Paris Sud 11<br />
Lacassagne, Lionel, IEF Univ. Paris Sud 11<br />
This paper proposes an extension to the mean shift tracking. We introduce the color connectedness degrees (CCD) which,<br />
more than providing statistical information about the target to track, embeds information about the amount of connectedness<br />
of the color intervals which compose the target. With a low increase in complexity, this approach provides better robustness<br />
and tracking quality compared to the use of the RGB space. This is confirmed by experiments performed on<br />
several sequences showing vehicles and pedestrians in various contexts.<br />
13:30-16:30, Paper ThBCT9.44<br />
Signal-To-Signal Ratio Independent Speaker Identification for Co-Channel Speech Signals<br />
Saeidi, Rahim, Univ. of Eastern Finland<br />
Mowlaee, Pejman, Aalborg Univ.<br />
Kinnunen, Tomi, Univ. of Eastern Finland<br />
Tan, Zheng-Hua, Aalborg Univ.<br />
Christensen, Mads Græsbøll, Aalborg Univ.<br />
Jensen, Søren Holdt, Aalborg Univ.<br />
Fränti, Pasi, Univ. of Eastern Finland<br />
In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from two speakers is<br />
recorded by one microphone only. The goal is to identify both of the speakers from their mixed signal. High recognition<br />
accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper,<br />
we approach the problem without estimating SSR. We show that a simple method based on fusion of adapted Gaussian<br />
mixture models and the Kullback-Leibler divergence calculated between models achieves an accuracy of 97% and 93% when<br />
the two target speakers are listed among the three and two most probable speakers, respectively.<br />
13:30-16:30, Paper ThBCT9.45<br />
Selection of Training Instances for Music Genre Classification<br />
Lopes, Miguel, INESC Porto<br />
Gouyon, Fabien, INESC Porto<br />
Koerich, Alessandro, PUCPR<br />
Oliveira, Luiz, Federal Univ. of Parana<br />
In this paper we present a method for the selection of training instances based on the classification accuracy of a SVM<br />
classifier. The instances consist of feature vectors representing short-term, low-level characteristics of music audio signals.<br />
The objective is to build, from only a portion of the training data, a music genre classifier with at least similar performance<br />
as when the whole data is used. The particularity of our approach lies in a pre-classification of instances prior to the main<br />
classifier training: i.e. we select from the training data those instances that show better discrimination with respect to class<br />
memberships. On a very challenging dataset of 900 music pieces divided among 10 music genres, the instance selection<br />
method slightly improves the music genre classification by 2.4 percentage points. On the other hand, the resulting classification<br />
model is significantly reduced, permitting much faster classification over test data.<br />
13:30-16:30, Paper ThBCT9.46<br />
Semi-Blind Speech-Music Separation using Sparsity and Continuity Priors<br />
Erdogan, Hakan, Sabanci Univ.<br />
M. Grais, Emad, Sabanci Univ.<br />
In this paper we propose an approach for the problem of single channel source separation of speech and music signals.<br />
Our approach is based on representing each source’s power spectral density using dictionaries and nonlinearly projecting<br />
the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the<br />
dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose to use a novel coordinate<br />
descent technique for optimization, which nicely handles nonnegativity constraints and nonquadratic penalty terms.<br />
We use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture data after<br />
the corresponding power spectral densities (PSDs) are estimated for each source. Using conventional metrics, we measure<br />
the performance of the system on simulated mixtures of single-person speech and piano music sources. The results indicate<br />
that the proposed method is a promising technique for low speech-to-music ratio conditions and that sparsity and continuity<br />
priors help improve the performance of the proposed system.<br />
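The nonnegative projection with a sparsity penalty described above can be sketched with a minimal coordinate-descent loop. This is a simplified stand-in, not the paper's solver: it handles only an L1 (sparsity) penalty on the coefficients, omits the continuity prior, and all parameter values are illustrative assumptions.

```python
# Minimal coordinate descent for projecting a mixture spectrum x onto a
# combined dictionary (list of atoms) under nonnegativity and an L1
# sparsity penalty: minimize ||x - sum_j w_j a_j||^2 + lam * sum_j w_j,
# with w_j >= 0. Each coordinate has a closed-form clipped update.

def sparse_project(x, atoms, lam=0.1, iters=50):
    w = [0.0] * len(atoms)
    for _ in range(iters):
        for j, a in enumerate(atoms):
            # residual with atom j's contribution removed
            r = [x[i] - sum(w[k] * atoms[k][i]
                            for k in range(len(atoms)) if k != j)
                 for i in range(len(x))]
            num = sum(a[i] * r[i] for i in range(len(x))) - lam / 2.0
            den = sum(ai * ai for ai in a)
            w[j] = max(0.0, num / den)  # clip at zero: nonnegativity
    return w
```

The clipping in the coordinate update is what makes the nonnegativity constraint easy to handle, which is the appeal of coordinate descent the abstract points to.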
- 325 -
13:30-16:30, Paper ThBCT9.47<br />
Comparative Analysis for Detecting Objects under Cast Shadows in Video Images<br />
Villamizar Vergel, Michael, CSIC-UPC<br />
Scandaliaris, Jorge, CSIC-UPC<br />
Sanfeliu, Alberto, Univ. Pol. de Catalunya<br />
Cast shadows make object detection more difficult because they locally modify image intensity and color. Shadows<br />
may appear or disappear in an image when the object, the camera, or both are free to move through a scene. This<br />
work evaluates the performance of an object detection method based on boosted HOG paired with three different image<br />
representations in outdoor video sequences. We follow and extend the taxonomy of van de Sande with considerations<br />
of the constraints each descriptor assumes about the spatial variation of the illumination. We show that the intrinsic image<br />
representation consistently gives the best results. This demonstrates the usefulness of this representation for object detection under<br />
varying illumination conditions, and supports the idea that in practice the local assumptions in the descriptors can be violated.<br />
13:30-16:30, Paper ThBCT9.48<br />
Shape-Appearance Guided Level-Set Deformable Model for Image Segmentation<br />
Khalifa, Fahmi, Univ. of Louisville<br />
El-Baz, Ayman, Univ. of Louisville<br />
Gimel’Farb, Georgy, Univ. of Auckland<br />
Abou El-Ghar, Mohamed, Univ. of Mansoura<br />
A new speed function to guide evolution of a level-set based active contour is proposed for segmenting an object from its<br />
background in a given image. The guidance accounts for a learned spatially variant statistical shape prior, 1st-order visual<br />
appearance descriptors of the contour interior and exterior (associated with the object and background, respectively), and<br />
a spatially invariant 2nd-order homogeneity descriptor. The shape prior is learned from a subset of co-aligned training images.<br />
The visual appearances are described with marginal gray level distributions obtained by separating their mixture<br />
over the image. The evolving contour interior is modeled by a 2nd-order translation and rotation invariant Markov-Gibbs<br />
random field of object/background labels with analytically estimated potentials. Experiments with kidney CT images confirm<br />
the robustness and accuracy of the proposed approach.<br />
13:30-16:30, Paper ThBCT9.49<br />
Iterative Ramp Sharpening for Structure/Signature-Preserving Simplification of Images<br />
Grazzini, Jacopo, Los Alamos National Lab.<br />
Soille, Pierre, EC Joint Res. Centre<br />
In this paper, we present a simple heuristic ramp-sharpening algorithm that achieves local contrast enhancement of<br />
vector-valued images. The proposed algorithm performs pixel-wise comparisons of intensity values, gradient strength, and<br />
directional information in order to locate transition ramps around true edges in the image. The sharpening is then applied<br />
only to those pixels found on the ramps. In this way, the contrast between objects and regions separated by a ramp is enhanced<br />
correspondingly while avoiding ringing artifacts. Applying this technique iteratively to blurred<br />
imagery produces sharpening that preserves both the structure and the signature of the image. The final approach reaches a good<br />
compromise between complexity and effectiveness for image simplification, efficiently enhancing image<br />
details while maintaining the overall image appearance.<br />
13:30-16:30, Paper ThBCT9.50<br />
Learning Naive Bayes Classifiers for Music Classification and Retrieval<br />
Fu, Zhouyu, Monash Univ.<br />
Lu, Guojun, Monash Univ.<br />
Ting, Kai Ming, Monash Univ.<br />
Zhang, Dengsheng, Monash Univ.<br />
In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ<br />
all audio features extracted from local windows for classification instead of just using a single song-level feature<br />
vector produced by compressing the local features. Two variants of naive Bayes classifiers are studied based on the extensions<br />
of standard nearest neighbor and support vector machine classifiers. Experimental results demonstrate the superior<br />
performance achieved by the proposed naive Bayes classifiers for both music classification and retrieval as compared<br />
to the alternative methods.<br />
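The song-level idea above, scoring every local window under each class model and combining the scores rather than classifying one compressed song-level vector, can be sketched as follows. A diagonal Gaussian per class is used here as a hedged stand-in for the paper's nearest-neighbor and SVM-based variants; all function names are illustrative.

```python
# Naive Bayes over local windows: sum per-window log-likelihoods under each
# class model (naive independence assumption across windows) and pick the
# best-scoring class. A diagonal Gaussian stands in for the paper's models.
import math

def fit_gaussian(windows):
    """Per-dimension mean/variance over all local feature windows of a class."""
    d, n = len(windows[0]), len(windows)
    mu = [sum(w[i] for w in windows) / n for i in range(d)]
    var = [max(1e-6, sum((w[i] - mu[i]) ** 2 for w in windows) / n)
           for i in range(d)]
    return mu, var

def log_lik(window, model):
    mu, var = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(window, mu, var))

def classify_song(windows, models):
    """Combine window scores per class; no song-level feature compression."""
    scores = {lab: sum(log_lik(w, m) for w in windows)
              for lab, m in models.items()}
    return max(scores, key=scores.get)
```

Because every window contributes to the score, information that would be lost by compressing a song into one vector is retained at classification time.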
- 326 -
13:30-16:30, Paper ThBCT9.52<br />
An Empirical Study of Feature Extraction Methods for Audio Classification<br />
Parker, Charles, Eastman Kodak Company<br />
With the growing popularity of video sharing web sites and the increasing use of consumer-level video capture devices,<br />
new algorithms are needed for intelligent searching and indexing of such data. The audio from these video streams is particularly<br />
challenging due to its low quality and high variability. Here, we perform a broad empirical study of features used<br />
for intelligent audio processing. We perform experiments on a dataset of 200 consumer videos over which we attempt to<br />
detect 10 semantic audio concepts.<br />
13:30-16:30, Paper ThBCT9.53<br />
Geometric Total Variation for Texture Deformation<br />
Bespalov, Dmitriy, Drexel Univ.<br />
Dahl, Anders, Tech. Univ. of Denmark<br />
Shokoufandeh, Ali, Drexel Univ.<br />
In this work we propose a novel variational method that we intend to use for estimating non-rigid texture deformation.<br />
The method is able to capture variation in gray-scale images with respect to the geometry of their features. Accurate localization<br />
of features in the presence of unknown deformations is a crucial property for texture characterization. Our experimental<br />
evaluations demonstrate that accounting for the geometry of features in texture images leads to significant<br />
improvements in the localization of these features when textures undergo geometric transformations. In addition, feature<br />
descriptors using geometric total variation energies discriminate between various regular textures with accuracy comparable<br />
to SIFT descriptors, while the reduced dimensionality of the TVG descriptor yields significant improvements over SIFT<br />
in terms of retrieval time.<br />
13:30-16:30, Paper ThBCT9.54<br />
A Novel Approach to Detect Ship-Radiated Signal based on HMT<br />
Zhou, Yue, Shanghai Jiaotong Univ.<br />
Niu, Zhibin, Shanghai Jiaotong Univ.<br />
Wang, Chenhao, Shanghai Jiaotong Univ.<br />
We propose a method for the detection of underwater ship-radiated signals in the presence of non-Gaussian noise. The<br />
wavelet decomposition of the underwater signal yields a natural tree structure, which is further modeled by a Hidden<br />
Markov Tree (HMT). The signal is thus represented by the parameters of the corresponding HMT. We analyze the<br />
likelihood defined on these parameters and form a new detection criterion. Experimental results demonstrate that our<br />
method provides a reliable and robust solution.<br />
13:30-16:30, Paper ThBCT9.55<br />
Speech Emotion Analysis in Noisy Real-World Environment<br />
Tawari, Ashish, Univ. of California, San Diego<br />
Trivedi, Mohan, Univ. of California, San Diego<br />
Automatic recognition of emotional states via speech signal has attracted increasing attention in recent years. A number<br />
of techniques have been proposed which are capable of providing reasonably high accuracy for controlled studio settings.<br />
However, their performance degrades considerably when the speech signal is contaminated by noise. In this paper, we<br />
present a framework with adaptive noise cancellation as a front end to a speech emotion recognizer. We also introduce a new<br />
feature set based on cepstral analysis of pitch and energy contours. Experimental analysis shows promising results.<br />
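An adaptive noise cancellation front end of the kind mentioned above is classically built from an LMS adaptive filter; the sketch below shows that standard construction, not the authors' specific filter, and the tap count and step size are illustrative assumptions.

```python
# Toy LMS adaptive noise canceller: the primary channel carries speech plus
# noise, the reference channel carries correlated noise only. The filter
# learns to predict the noise from the reference; the prediction error is
# the noise-reduced speech estimate.

def lms_cancel(primary, reference, taps=4, mu=0.05):
    w = [0.0] * taps          # filter weights
    buf = [0.0] * taps        # sliding window of recent reference samples
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y                                   # error = cleaned sample
        w = [wi + 2 * mu * e * bi for wi, bi in zip(w, buf)]  # LMS update
        out.append(e)
    return out
```

When the primary channel contains only noise that is a filtered version of the reference, the error signal converges toward zero, which is the sense in which the front end "cancels" the noise before emotion features are extracted.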
13:30-16:30, Paper ThBCT9.56<br />
Sampling and Ideal Reconstruction on the 3D Diamond Grid<br />
Strand, Robin, Uppsala Univ.<br />
This paper presents basic, yet important, properties that can be used when developing methods for image acquisition, processing,<br />
and visualization on the diamond grid. The sampling density needed to reconstruct a band-limited signal and the<br />
ideal interpolation function on the diamond grid are derived.<br />
- 327 -
13:30-16:30, Paper ThBCT9.57<br />
Detecting Faint Compact Sources using Local Features and a Boosting Approach<br />
Torrent, Albert, Univ. of Girona<br />
Peracaula, Marta, Univ. of Girona<br />
Llado, Xavier, Univ. of Girona<br />
Freixenet, Jordi, Univ. of Girona<br />
Sanchez-Sutil, Juan Ramon, Univ. de Jaén<br />
Martí, Josep, Univ. de Jaén<br />
Paredes, Josep Maria, Univ. de Barcelona<br />
Several techniques have been proposed so far to perform faint compact source detection in wide-field interferometric<br />
radio images. However, all these methods can easily miss some detections or produce a high number of false positive<br />
detections due to the low intensity of the sources, the noise, and the interferometric patterns present in the images.<br />
In this paper we present a novel strategy to tackle this problem. Our approach is based on using local features extracted<br />
from a bank of filters in order to provide a description of different types of faint source structures. We then perform a<br />
training step to automatically learn and select the most salient features, which are used in a boosting classifier to<br />
perform the detection. The validity of our method is demonstrated using 19 real images that compose a radio mosaic. A<br />
comparison with two well-known state-of-the-art methods shows that our approach obtains more source detections<br />
while also reducing the number of false positives.<br />
13:30-16:30, Paper ThBCT9.58<br />
Automatic Hair Detection in the Wild<br />
Julian, Pauline, IRIT, FittingBox<br />
Dehais, Christophe, FittingBox<br />
Lauze, Francois, Univ. of Copenhagen<br />
Charvillat, Vincent, IRIT<br />
Bartoli, Adrien, UdA<br />
Choukroun, Ariel, FittingBox<br />
This paper presents an algorithm for segmenting the hair region in images taken in uncontrolled, real-life conditions. Our method<br />
is based on a simple statistical hair shape model representing the upper part of the hair. We detect this region by minimizing an<br />
energy that combines active shape and active contour models. The upper hair region then allows us to learn the hair appearance parameters<br />
(color and texture) for the image considered. Finally, those parameters drive a pixel-wise segmentation technique<br />
that yields the desired (complete) hair region. We demonstrate the applicability of our method on several real images.<br />
13:30-16:30, Paper ThBCT9.59<br />
De-Noising of SRμCT Fiber Images by Total Variation Minimization<br />
Lindblad, Joakim, Swedish Univ. of Agricultural Sciences<br />
Sladoje, Natasa, Univ. of Novi Sad<br />
Lukic, Tibor, Univ. of Novi Sad<br />
SRμCT images of paper and pulp fiber materials are characterized by a low signal-to-noise ratio. De-noising is therefore a<br />
common preprocessing step before segmentation into fiber and background components. We suggest a de-noising<br />
method based on total variation minimization using a modified Spectral Conjugate Gradient algorithm. Quantitative<br />
evaluation on synthetic 3D data and qualitative evaluation on real 3D paper fiber data confirm the appropriateness<br />
of the suggested method for the particular application.<br />
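The total-variation objective behind such de-noising can be illustrated on a 1-D signal. The sketch below minimizes a smoothed TV energy by plain gradient descent; it is a toy stand-in for intuition only, since the paper uses a modified Spectral Conjugate Gradient solver on 3-D volumes, and the regularization weight, smoothing epsilon, and step size are illustrative assumptions.

```python
# 1-D total-variation de-noising: minimize
#   sum_i (u_i - f_i)^2 + lam * sum_i sqrt((u_{i+1} - u_i)^2 + eps)
# by gradient descent. The sqrt(.^2 + eps) term is a smoothed |.| so the
# energy is differentiable everywhere.
import math

def tv_denoise(f, lam=1.0, eps=1e-2, step=0.02, iters=300):
    u = list(f)
    n = len(u)
    for _ in range(iters):
        grad = [2.0 * (u[i] - f[i]) for i in range(n)]  # data-fidelity term
        for i in range(n - 1):
            d = u[i + 1] - u[i]
            g = lam * d / math.sqrt(d * d + eps)  # d/du of smoothed |d|
            grad[i] -= g
            grad[i + 1] += g
        u = [u[i] - step * grad[i] for i in range(n)]
    return u
```

Each descent step trades data fidelity against total variation, flattening small oscillations (noise) while the fidelity term keeps large jumps (structure) in place, which is why TV minimization suits the fiber/background segmentation setting.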
- 328 -