Abstract book (pdf) - ICPR 2010

86.7M training samples which shows a 7-times speedup and higher minimum per-class recall, compared to previously reported methods. The context of these experiments is the need for image classifiers able to handle an unbounded variety of inputs: in our case, highly versatile document classifiers which require training sets as large as a billion training samples.

09:40-10:00, Paper WeAT6.3

Gaussian Mixture Models for Arabic Font Recognition

Slimane, Fouad, Univ. of Fribourg

Kanoun, Slim, ENIS

Alimi, Adel M., Univ. of Sfax

Ingold, Rolf, Univ. of Fribourg

Hennebert, Jean, Univ. of Applied Sciences

We present in this paper a new approach for Arabic font recognition. Our proposal is to use a fixed-length sliding window for the feature extraction and to model feature distributions with Gaussian Mixture Models (GMMs). This approach has two advantages. First, we do not need to perform a priori segmentation into characters, which is a difficult task for Arabic text. Second, we use versatile and powerful GMMs that can finely model distributions of features in large multidimensional input spaces. We report on the evaluation of our system on the APTI (Arabic Printed Text Image) database using 10 different fonts and 10 font sizes. Considering the variability of the different font shapes and the fact that our system is independent of the font size, the obtained results are convincing and compare well with competing systems.
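As a rough illustration of how GMMs can be used for this kind of classification, the sketch below scores a sequence of sliding-window feature vectors against one diagonal-covariance GMM per font and picks the font with the highest mean log-likelihood. The abstract does not specify the features, the covariance type, or the decision rule, so all of these are assumptions; the font names, the two-dimensional features, the hand-set model parameters, and the function names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x) under a GMM given as a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, font_models):
    """Return the font whose GMM gives the highest mean log-likelihood."""
    def score(gmm):
        return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
    return max(font_models, key=lambda name: score(font_models[name]))

# Hypothetical hand-set models; a real system would train them (e.g. with EM).
FONT_MODELS = {
    "font_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "font_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Hypothetical feature vectors, one per sliding-window position.
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
predicted = classify(frames, FONT_MODELS)
```

Because every window position is scored independently, no character segmentation is needed in this sketch, which mirrors the first advantage named in the abstract.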

10:00-10:20, Paper WeAT6.4

Transfer of Supervision for Improved Address Standardization

Kothari, Govind, IBM

Faruquie, Tanveer, IBM Res. India

Subramaniam, L. Venkata, IBM Res. India

K, Hima Prasad, IBM Res. India

Mohania, Mukesh, IBM Res. India

Address cleansing is very challenging, particularly for geographies with variability in writing addresses. Supervised learners can be easily trained for different data sources. However, training requires labeling large corpora for each data source, which are time-consuming and labor-intensive to create. We propose a method to automatically transfer supervision from a given labeled source to a target unlabeled source using a hierarchical Dirichlet process. Each Dirichlet process models data from one source. The shared component distribution across these Dirichlet processes captures the semantic relation between data sources. A feature projection on the component distributions from multiple sources is used to transfer supervision.

10:20-10:40, Paper WeAT6.5

Bag of Characters and SOM Clustering for Script Recognition and Writer Identification

Marinai, Simone, Univ. of Florence

Miotti, Beatrice, Univ. of Florence

Soda, Giovanni, Univ. di Firenze

In this paper, we describe a general approach for script (and language) recognition from printed documents and for writer identification in handwritten documents. The method is based on a bag-of-visual-words strategy where the visual words correspond to characters and the clustering is obtained by means of Self Organizing Maps (SOM). Unknown pages (words in the case of script recognition) are classified by comparing their vectorial representations with those of a training set using a cosine similarity. The comparison is improved using a similarity score that takes into account the SOM organization of cluster centroids. Promising results are presented for both printed documents and handwritten musical scores.
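The basic comparison step can be sketched as follows, assuming each character has already been assigned to a cluster (here just an integer index standing in for a SOM unit). The sketch builds a bag-of-characters histogram per page and labels an unknown page with the training item of highest cosine similarity. It does not implement the SOM clustering or the SOM-aware similarity score mentioned in the abstract, and the labels, cluster indices, and function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_characters(cluster_ids, n_clusters):
    """Histogram of cluster indices: one count per visual word."""
    counts = Counter(cluster_ids)
    return [counts.get(k, 0) for k in range(n_clusters)]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify_page(page_vec, training):
    """Return the label of the most similar (label, vector) training item."""
    return max(training, key=lambda item: cosine_similarity(page_vec, item[1]))[0]

N_CLUSTERS = 4  # hypothetical number of clusters
training = [
    ("writer_1", bag_of_characters([0, 0, 1, 2], N_CLUSTERS)),
    ("writer_2", bag_of_characters([3, 3, 3, 2], N_CLUSTERS)),
]
unknown = bag_of_characters([0, 1, 1, 0, 2], N_CLUSTERS)
label = classify_page(unknown, training)
```

Cosine similarity ignores vector length, so pages with different numbers of characters remain comparable in this sketch.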

WeAT7 Dolmabahçe Hall C<br />

Gait and Gesture Regular Session<br />

Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />
