26.04.2013 Views

Handwritten Word Spotting in Old Manuscript Images using Shape ...


Figure 5: We present two approaches to word spotting. Both share the same first steps.

first to quickly reject a large number of dissimilar words (first level), and then to perform the intensive search with more discriminant features (BSM) in the second level, over a reduced number of target words.
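The two-level search described above can be sketched as a coarse-to-fine cascade. This is a minimal sketch under assumptions the text does not state: both levels compare fixed-length vectors with the Euclidean distance, the first level keeps a fixed fraction of the closest words, and the second-level vectors merely stand in for the BSM descriptors. The names `two_level_spotting` and `keep_fraction` are hypothetical.

```python
import numpy as np


def two_level_spotting(query_coarse, query_fine, db_coarse, db_fine,
                       keep_fraction=0.2, top_k=5):
    """Rank database words with a cheap first level and an intensive second level.

    query_coarse / db_coarse: cheap first-level descriptors (hypothetical).
    query_fine / db_fine: more discriminant second-level descriptors
    (stand-ins for BSM vectors; any fixed-length vectors work).
    Returns the indices of the best-matching database words.
    """
    db_coarse = np.asarray(db_coarse, dtype=float)
    db_fine = np.asarray(db_fine, dtype=float)

    # First level: Euclidean distance on the cheap features; keep only the
    # closest keep_fraction of the database and reject the rest.
    d1 = np.linalg.norm(db_coarse - np.asarray(query_coarse, dtype=float), axis=1)
    n_keep = max(1, int(np.ceil(keep_fraction * len(d1))))
    candidates = np.argsort(d1, kind="stable")[:n_keep]

    # Second level: the intensive comparison runs only on the survivors.
    d2 = np.linalg.norm(db_fine[candidates] - np.asarray(query_fine, dtype=float), axis=1)
    ranking = np.argsort(d2, kind="stable")[:top_k]
    return candidates[ranking]
```

The point of the cascade is cost: the expensive second-level distance is computed only for the `keep_fraction` of words that survive the cheap first level.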

The second approach is oriented to pseudo-structural features. Its descriptor is the Loci characteristic feature, and its indexing structure is a table in which each column corresponds to an observation in the documents and each row to a feature of the words. Each word, or character, is composed of several features, and where they appear inside the image is not significant. This approach uses features based on Loci characteristics [3; 4; 8]. Given a word image, a feature vector based on Loci characteristics is computed at a set of characteristic points.
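A Loci feature at a single point can be sketched as follows. This is an illustration under assumptions not taken from the text: the word image is binary with 1 for ink, the rays go up, down, left and right, each intersection count is capped at `max_count`, and every background pixel serves as a characteristic point. The word is then described by the frequency of each Loci code, which discards where the codes occur, and these histograms form the columns of the feature table. All function names are hypothetical.

```python
import numpy as np


def loci_code(img, r, c, max_count=2):
    """Loci code of pixel (r, c) in a binary word image (1 = ink).

    Counts how many strokes each ray (up, down, left, right) crosses,
    caps each count at max_count, and packs the four counts into one
    integer in base (max_count + 1).
    """
    img = np.asarray(img)
    rays = (img[:r, c][::-1], img[r + 1:, c], img[r, :c][::-1], img[r, c + 1:])
    code = 0
    for ray in rays:
        # A crossing is a 0 -> 1 transition walking away from the point;
        # the leading 0 also counts a stroke that touches the point.
        path = np.concatenate(([0], ray.astype(int)))
        crossings = int(np.sum(np.diff(path) == 1))
        code = code * (max_count + 1) + min(crossings, max_count)
    return code


def loci_histogram(img, max_count=2):
    """Frequency of each Loci code over all background pixels of a word."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.zeros((max_count + 1) ** 4, dtype=int)
    for r, c in zip(*np.nonzero(img == 0)):
        hist[loci_code(img, r, c, max_count)] += 1
    return hist


def build_table(word_images, max_count=2):
    """Feature table: one row per Loci code, one column per word observation."""
    return np.column_stack([loci_histogram(w, max_count) for w in word_images])
```

With `max_count=2` there are 3^4 = 81 possible codes, so each column of the table has 81 rows.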

Some approaches in the literature have used the background pixels of the image; others have used the foreground pixels, and some have even used the contour or the skeleton of the image. Loci characteristics encode the frequency of intersection counts for a given characteristic point along paths in different directions starting from that point. The Loci vectors extracted from the words of the image database are stored in a hashing structure. Word spotting is then performed by a voting process after the Loci vectors of the query word are indexed in the hashing table.
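The hashing and voting stage can be sketched with a Python dictionary as the hash table, keyed by Loci code. The voting rule here, where each code shared by the query and a database word adds the smaller of its two occurrence counts, is an assumption chosen for illustration; the text does not specify how votes are weighted. `build_index` and `spot` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_index(word_codes):
    """Hash table mapping each Loci code to the words that contain it.

    word_codes: dict word_id -> iterable of Loci codes (one per characteristic point).
    """
    index = defaultdict(list)
    for word_id, codes in word_codes.items():
        for code, count in Counter(codes).items():
            index[code].append((word_id, count))
    return index


def spot(index, query_codes, top_k=5):
    """Rank database words by the votes cast by the query's Loci codes."""
    votes = Counter()
    for code, q_count in Counter(query_codes).items():
        # Look up the query code in the hash table; every word sharing it
        # receives votes equal to the occurrences the two words have in common.
        for word_id, w_count in index.get(code, ()):
            votes[word_id] += min(q_count, w_count)
    return votes.most_common(top_k)
```

For example, with `db = {"w1": [5, 5, 7, 12], "w2": [5, 40, 41], "w3": [7, 7, 12, 12]}`, the call `spot(build_index(db), [5, 7, 12])` returns `[('w1', 3), ('w3', 2), ('w2', 1)]`. Only words that share at least one code with the query are ever touched, which is what makes the hash lookup cheaper than comparing against every word.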

Let us describe the different steps of the two approaches. Both share the same preliminary steps: a pre-processing step, in which the documents are segmented and their words extracted; a fast rejection step, in which unsuitable words are discarded; and a noise removal step, in which the noise of the image is removed and the bounding box is fitted to the contour of the image. These preliminary steps are explained in Section 6. Section 7 explains the first approach and Section 8 the second one.
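The noise removal step can be sketched on a binary word image as below. Treating noise as ink pixels with no ink among their eight neighbours is a simplifying assumption (practical filters usually also drop larger specks); after filtering, the image is cropped to the tightest box around the remaining ink. The function names are hypothetical.

```python
import numpy as np


def remove_isolated_pixels(img):
    """Delete ink pixels that have no ink among their 8 neighbours."""
    img = np.asarray(img, dtype=bool)
    padded = np.pad(img, 1, constant_values=False)
    h, w = img.shape
    neighbours = np.zeros((h, w), dtype=int)
    # Sum the 8 shifted copies of the image to count ink neighbours per pixel.
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbours += padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
    return img & (neighbours > 0)


def fit_bounding_box(img):
    """Crop the image to the tightest box containing all remaining ink."""
    img = np.asarray(img, dtype=bool)
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:
        return img[:0, :0]  # no ink left: return an empty image
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Applying `fit_bounding_box(remove_isolated_pixels(word))` to a word image removes single-pixel specks first, so they cannot stretch the bounding box.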
