26.04.2013 Views

Handwritten Word Spotting in Old Manuscript Images using Shape ...


The segmentation experiments evaluate the accuracy of the word segmentation. Each segmented word is overlapped with the corresponding labeled word in order to check whether they are the same word. Different overlap-percentage thresholds are used to evaluate the accuracy of the segmentation process.
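The overlap check described above can be sketched as follows. This is a minimal illustration, not the procedure used in the experiments: it assumes that words are represented by axis-aligned bounding boxes and that the overlap percentage is the intersection area divided by the union area. The names `Box`, `overlap_ratio` and `segmentation_accuracy` are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box (assumed word representation)."""
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive


def area(b: Box) -> int:
    # Degenerate (empty) boxes get area 0 rather than a negative value.
    return max(0, b.x1 - b.x0) * max(0, b.y1 - b.y0)


def overlap_ratio(segmented: Box, labeled: Box) -> float:
    """Intersection area over union area (an assumed overlap measure)."""
    inter = Box(
        max(segmented.x0, labeled.x0),
        max(segmented.y0, labeled.y0),
        min(segmented.x1, labeled.x1),
        min(segmented.y1, labeled.y1),
    )
    inter_area = area(inter)
    union_area = area(segmented) + area(labeled) - inter_area
    return inter_area / union_area if union_area else 0.0


def segmentation_accuracy(pairs: Sequence[Tuple[Box, Box]], threshold: float) -> float:
    """Fraction of (segmented, labeled) pairs whose overlap reaches the threshold."""
    if not pairs:
        return 0.0
    hits = sum(overlap_ratio(s, l) >= threshold for s, l in pairs)
    return hits / len(pairs)
```

Calling `segmentation_accuracy` with several thresholds (for example 0.5, 0.7 and 0.9) gives one accuracy value per threshold, which mirrors the idea of evaluating the segmentation at different overlap percentages.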

The first approach has two types of experiments. The first type evaluates how well the clustering process performs. The second type evaluates the accuracy of the retrieval process:

• The experiment shows the relation between the chosen basic features using 2D plots.
• Using visual results, we observe how the observations of our ground truth are distributed among the clusters.
• We evaluate the accuracy, homogeneity and completeness of the clustering using V-measure (explained in section 9.3).
• The accuracy of the retrieval process is evaluated by means of a precision-recall curve.

The second approach is evaluated by means of precision-recall curves:<br />

• Two experiments are used to assess the accuracy of this approach using different characteristic pixels (background and foreground pixels).
• Both types of characteristic points are compared in order to evaluate them.

9.3 Metrics

One drawback of the clustering process is that the number of clusters must be selected properly. The learning process consists of grouping the observations into different clusters. The ideal solution is achieved when all the instances of the same word are in the same cluster and each cluster contains instances of only one word. The results of the retrieval process depend on the accuracy of the clustering process.

The clustering process has been evaluated using V-measure [22]. V-measure is an entropy-based measure which explicitly measures how successfully the criteria of homogeneity and completeness have been satisfied. V-measure is computed as the harmonic mean of distinct homogeneity and completeness scores, and this mean can be weighted to favour the contribution of either homogeneity or completeness. A clustering result satisfies homogeneity if each of its clusters contains only data points which are members of a single class, and it satisfies completeness if all the data points that are members of a given class are elements of the same cluster.
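As an illustration of how homogeneity, completeness and V-measure relate, the following sketch computes them from two label lists using the entropy-based definitions of the V-measure reference [22]. It is a minimal sketch rather than the implementation used in this work; the function names and the `beta` weighting parameter are assumptions made for the example (with `beta` greater than 1 weighting completeness more strongly, and `beta` less than 1 weighting homogeneity more strongly).

```python
from collections import Counter
from math import log


def _entropy(counts, total):
    """Shannon entropy of a distribution given as raw counts."""
    return -sum((n / total) * log(n / total) for n in counts if n)


def _conditional_entropy(joint, given_counts, total):
    """H(X | Y) from joint counts keyed by (x, y) and marginal counts of y."""
    return -sum(
        (n / total) * log(n / given_counts[y]) for (_x, y), n in joint.items()
    )


def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, v) for two equal-length label lists."""
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must have the same length")
    total = len(classes)
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    joint_ck = Counter(zip(classes, clusters))   # keyed by (class, cluster)
    joint_kc = Counter(zip(clusters, classes))   # keyed by (cluster, class)

    h_c = _entropy(class_counts.values(), total)
    h_k = _entropy(cluster_counts.values(), total)

    # Homogeneity: 1 - H(C|K) / H(C); defined as 1 when H(C) is zero.
    homogeneity = 1.0 if h_c == 0 else (
        1.0 - _conditional_entropy(joint_ck, cluster_counts, total) / h_c
    )
    # Completeness: 1 - H(K|C) / H(K); defined as 1 when H(K) is zero.
    completeness = 1.0 if h_k == 0 else (
        1.0 - _conditional_entropy(joint_kc, class_counts, total) / h_k
    )

    # Weighted harmonic mean of the two scores.
    denom = beta * homogeneity + completeness
    v = 0.0 if denom == 0 else (1 + beta) * homogeneity * completeness / denom
    return homogeneity, completeness, v
```

For example, placing every observation in its own cluster yields perfect homogeneity but reduced completeness, while placing all observations in a single cluster yields perfect completeness but zero homogeneity when more than one word class is present.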

The retrieval process is evaluated using precision-recall curves:

$$\text{recall} = \frac{\text{number of relevant items retrieved}}{\text{number of relevant items in collection}} \tag{4}$$

$$\text{precision} = \frac{\text{number of relevant items retrieved}}{\text{total number of items retrieved}} \tag{5}$$
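The recall and precision definitions can be turned into a precision-recall curve by evaluating them after each position of a ranked retrieval list. The sketch below assumes that the retrieval output is an ordered list of relevance flags and that the number of relevant items in the collection is known; the function name and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    ranked_relevance: Sequence[bool], total_relevant: int
) -> List[Tuple[float, float]]:
    """Return one (recall, precision) pair per rank of a retrieved list.

    ranked_relevance[i] is True when the item at rank i + 1 is relevant.
    total_relevant is the number of relevant items in the whole collection.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0  # relevant items retrieved so far
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        hits += bool(is_relevant)
        recall = hits / total_relevant   # relevant retrieved / relevant in collection
        precision = hits / rank          # relevant retrieved / items retrieved
        points.append((recall, precision))
    return points
```

Plotting the returned pairs with recall on the horizontal axis and precision on the vertical axis gives the curve for one query; how curves are combined across queries is not specified here.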
