26.04.2013 Views

Handwritten Word Spotting in Old Manuscript Images using Shape ...


Figure 18: Distribution of the observations in the clusters using basic features.

The ideal solution in the clustering process reaches 100% completeness and homogeneity. In our case we have not obtained an ideal solution, so we have to choose a measure that is a trade-off between the two. In Figure 21 we observe two plots for each experiment: β = 0 means that the plot measures homogeneity and β = 1 means that the plot measures completeness. For each experiment we observe that with a small number of clusters the homogeneity is low and the completeness is good. As the number of clusters increases, the homogeneity increases and the completeness decreases. The best number of clusters for each experiment is the point where the two plots cross. For example, the best number of clusters for the BSM features is 15.
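The selection rule described above (pick the number of clusters where the homogeneity and completeness curves cross) can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact formulas, so the sketch assumes the common entropy-based definitions of homogeneity and completeness, and it approximates the crossing point by the number of clusters with the smallest gap between the two scores. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) of two equal-length sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def homogeneity(classes, clusters):
    """1 - H(class | cluster) / H(class); 1.0 when each cluster holds one class.

    Assumption: entropy-based definition, not confirmed by the text.
    """
    h_classes = _entropy(classes)
    if h_classes == 0:
        return 1.0
    return 1.0 - _conditional_entropy(classes, clusters) / h_classes


def completeness(classes, clusters):
    """1 - H(cluster | class) / H(cluster); the symmetric counterpart."""
    return homogeneity(clusters, classes)


def crossing_k(scores):
    """Return the cluster count whose two scores are closest to each other.

    `scores` maps a number of clusters k to (homogeneity, completeness).
    """
    return min(scores, key=lambda k: abs(scores[k][0] - scores[k][1]))


# Toy labels for illustration only; these are not data from the experiments.
toy_classes = ["a", "a", "b", "b"]
one_cluster = [0, 0, 0, 0]   # everything merged: complete, not homogeneous
singletons = [0, 1, 2, 3]    # everything split: homogeneous, less complete
```

With the toy labels, merging everything into one cluster gives full completeness and zero homogeneity, while one cluster per observation gives full homogeneity and reduced completeness, which is the same trend the plots show as the number of clusters grows.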

The experiment on the retrieval process evaluates its accuracy. We have run several experiments using different combinations of basic features, the subset of the ground truth and the BSM features (Fig. 22). We observe that the worst results are obtained when we use the ground truth with all the basic features. The BSM features give the best results, followed by the experiment using the basic features height and width. Using all the basic features we have obtained the worst results.

In the last experiment we evaluate the performance in terms of scalability (an increasing number of documents and classes) and of the descriptor. Using the same descriptor with different numbers of documents and classes, we observe that the accuracy is better with fewer classes. We also observe that BSM is the better and more accurate descriptor: with it, the performance improves even on the bigger ground truth, compared with the best result on the smaller ground truth.
