Abstract book (pdf) - ICPR 2010
86.7M training samples which shows a 7-times speedup and higher minimum per-class recall, compared to previously reported
methods. The context of these experiments is the need for image classifiers able to handle an unbounded variety of
inputs: in our case, highly versatile document classifiers which require training sets as large as a billion training samples.
09:40-10:00, Paper WeAT6.3
Gaussian Mixture Models for Arabic Font Recognition
Slimane, Fouad, Univ. of Fribourg
Kanoun, Slim, ENIS
Alimi, Adel M., Univ. of Sfax
Ingold, Rolf, Univ. of Fribourg
Hennebert, Jean, Univ. of Applied Sciences
We present a new approach for Arabic font recognition. Our proposal is to use a fixed-length sliding window
for feature extraction and to model feature distributions with Gaussian Mixture Models (GMMs). This approach has
two advantages. First, we do not need to perform a priori segmentation into characters, which is a difficult task
for Arabic text. Second, we use versatile and powerful GMMs able to finely model feature distributions in large multidimensional
input spaces. We report on the evaluation of our system on the APTI (Arabic Printed Text Image) database
using 10 different fonts and 10 font sizes. Considering the variability of the different font shapes and the fact that our
system is independent of the font size, the obtained results are convincing and compare well with competing systems.
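The pipeline described in this abstract (fixed-length sliding window, then one GMM per font, then a decision by likelihood) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the column ink-density feature, the window width, the diagonal covariances, the hand-set model parameters, and the font labels `bold_like` and `thin_like` are hypothetical and are not taken from the paper, which also trains its GMMs rather than fixing them by hand.

```python
import math

def sliding_windows(image, width, step):
    """Yield one feature vector per fixed-length window along a text-line image.

    `image` is a list of rows of 0/1 pixels. The feature used here (ink density
    of each column inside the window) is a stand-in, not the paper's feature set.
    """
    n_rows, n_cols = len(image), len(image[0])
    for start in range(0, n_cols - width + 1, step):
        yield [sum(row[c] for row in image) / n_rows
               for c in range(start, start + width)]

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, variance) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def recognize_font(image, font_models, width=2, step=1):
    """Return the font whose GMM gives the highest summed window log-likelihood."""
    scores = {
        font: sum(gmm_log_likelihood(x, gmm)
                  for x in sliding_windows(image, width, step))
        for font, gmm in font_models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical, hand-set models: one GMM per font class.
font_models = {
    "bold_like": [(1.0, [0.8, 0.8], [0.05, 0.05])],
    "thin_like": [(0.5, [0.2, 0.2], [0.05, 0.05]),
                  (0.5, [0.3, 0.1], [0.05, 0.05])],
}

# A tiny, densely inked 4x5 "text line".
image = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
]

best_font, scores = recognize_font(image, font_models)
print(best_font)
```

Because every window is scored independently and the scores are summed, no character segmentation is needed, which is the first advantage the abstract names.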
10:00-10:20, Paper WeAT6.4
Transfer of Supervision for Improved Address Standardization
Kothari, Govind, IBM
Faruquie, Tanveer, IBM Res. India
Subramaniam, L. Venkata, IBM Res. India
K, Hima Prasad, IBM Res. India
Mohania, Mukesh, IBM Res. India
Address cleansing is very challenging, particularly for geographies with variability in writing addresses. Supervised learners
can be easily trained for different data sources. However, training requires labeling large corpora for each data source,
which are time-consuming and labor-intensive to create. We propose a method to automatically transfer supervision from a
given labeled source to a target unlabeled source using a hierarchical Dirichlet process. Each Dirichlet process models data
from one source. The shared component distribution across these Dirichlet processes captures the semantic relation between
data sources. A feature projection on the component distributions from multiple sources is used to transfer supervision.
10:20-10:40, Paper WeAT6.5
Bag of Characters and SOM Clustering for Script Recognition and Writer Identification
Marinai, Simone, Univ. of Florence
Miotti, Beatrice, Univ. of Florence
Soda, Giovanni, Univ. di Firenze
In this paper, we describe a general approach for script (and language) recognition from printed documents and for writer
identification in handwritten documents. The method is based on a bag-of-visual-words strategy where the visual words
correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words
in the case of script recognition) are classified by comparing their vectorial representations with those of one training set
using a cosine similarity. The comparison is improved using a similarity score that is obtained by taking into account the
SOM organization of cluster centroids. Promising results are presented for both printed documents and handwritten
musical scores.
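The basic comparison step in this abstract (a bag-of-characters vector per page or word, matched against training vectors by cosine similarity) can be sketched as follows. This is an illustrative sketch under stated assumptions: the character-to-cluster assignment is taken as already computed (in the paper it comes from a SOM), the labels `latin` and `arabic` and all counts are hypothetical, and the SOM-aware similarity score that the abstract mentions as an improvement is not reproduced.

```python
import math
from collections import Counter

def bag_of_characters(cluster_ids, n_clusters):
    """Histogram of character-cluster indices for one page or word.

    `cluster_ids` holds, for each character image, the index of the cluster it
    was assigned to (assumed here to be given; the paper obtains it from a SOM).
    """
    counts = Counter(cluster_ids)
    return [counts.get(k, 0) for k in range(n_clusters)]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(query, training):
    """Return the label of the training vector most similar to `query`."""
    label, _ = max(training, key=lambda item: cosine_similarity(query, item[1]))
    return label

# Hypothetical training set: one bag-of-characters vector per labeled sample.
training = [
    ("latin", [5, 1, 0, 0]),
    ("arabic", [0, 0, 4, 6]),
]

# Unknown sample whose six characters fell into clusters 2, 3, 3, 2, 3 and 0.
query = bag_of_characters([2, 3, 3, 2, 3, 0], n_clusters=4)
print(query, classify(query, training))
```

Plain cosine similarity treats all clusters as unrelated; the refinement described in the abstract instead exploits the fact that neighboring SOM units hold similar character shapes.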
WeAT7 Dolmabahçe Hall C<br />
Gait and Gesture Regular Session<br />
Session chair: Shinoda, Koichi (Tokyo Institute of Technology)<br />