26.04.2013 Views

Handwritten Word Spotting in Old Manuscript Images using Shape ...


Figure 21: Choosing the best number of clusters. β = 0 means homogeneity. β = 1 means completeness.
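The two quantities named in the caption of Figure 21 can be illustrated with a short sketch. It assumes the standard entropy-based definitions of homogeneity and completeness; this excerpt does not show which definition the authors used, nor how β combines the two, so the sketch only reports both values separately. The function names are hypothetical.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given), in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness(classes, clusters):
    """Return (homogeneity, completeness) for one clustering.

    Assumed definitions (not taken from the paper):
      homogeneity  = 1 - H(class | cluster) / H(class)
      completeness = 1 - H(cluster | class) / H(cluster)
    Each score is defined as 1 when its denominator is zero.
    """
    if not classes or len(classes) != len(clusters):
        raise ValueError("classes and clusters must be non-empty and equally long")
    h_class = _entropy(classes)
    h_cluster = _entropy(clusters)
    homogeneity = (
        1.0 if h_class == 0
        else 1.0 - _conditional_entropy(classes, clusters) / h_class
    )
    completeness = (
        1.0 if h_cluster == 0
        else 1.0 - _conditional_entropy(clusters, classes) / h_cluster
    )
    return homogeneity, completeness
```

For example, putting four observations of two classes into four singleton clusters gives a homogeneity of 1 (every cluster is pure) but a completeness of only 0.5 (every class is split in two). This trade-off is why the number of clusters has to be chosen with both criteria in mind.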

We have observed that the clustering algorithm used in the first approach does not perform well with the selected corpus. As preliminary work for future research, we ran some experiments with the Self-Organizing Map¹ (SOM). A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional, discrete representation of the input space of the training examples, called a map. Appendix A contains figures with the results of this algorithm. Figure 26 shows a map of the observations of the training set using BSM features. Each cell represents a different cluster, and each colour a different class. We observe that the observations of each class are grouped in nearby clusters. Figure 27 shows a similar map, but using characteristic Loci as features. We observe that in this case the observations are more concentrated in the same clusters.
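To make the idea of a map whose cells act as clusters concrete, here is a minimal SOM sketch in plain Python. It is not the toolbox referenced in the footnote and not the authors' configuration: the grid size, the Gaussian neighbourhood, the linearly decaying learning rate and all function names are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per cell, initialised from random training samples.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
            step += 1
    return weights


def class_counts_per_cell(weights, data, labels):
    """Map each occupied cell to a Counter of the class labels it receives."""
    counts = {}
    for x, label in zip(data, labels):
        cell = best_matching_unit(weights, x)
        counts.setdefault(cell, Counter())[label] += 1
    return counts
```

Calling `class_counts_per_cell` on the trained map gives, for each cell, the class mix of the observations assigned to it. That is the information a figure of this kind encodes with one cell per cluster and one colour per class.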

10. Conclusions

Word-spotting appears to be an attractive alternative to the seemingly obvious recognize-then-retrieve approach to historical manuscript retrieval. With the capability of matching word images quickly and accurately, partial transcriptions of a collection can be achieved with reasonable accuracy and little human interaction, and results improve as the number of observations in the training set increases. Word-spotting has the capability to automatically identify indexing

1. http://www.cis.hut.fi/somtoolbox/<br />
