26.04.2013

Handwritten Word Spotting in Old Manuscript Images using Shape ...


• The number of intersections is quantized. We have bounded the number of intersections in intervals, with a different interval for each direction. This bounding makes the feature more robust.

• Two modes are implemented to compute the feature vector, using either background or foreground pixels.

To obtain the number of intersections in each direction, a thinning operator is first applied to the image. Thinning yields the skeleton of the image, which consists of lines 1 pixel wide.
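
The text does not say which thinning operator was used. As an illustration only, the sketch below swaps in the classic Zhang-Suen algorithm, assuming a binary image stored as a list of rows with 1 for ink and 0 for background; the function name `zhang_suen_thin` is hypothetical. Each sub-iteration removes boundary pixels that have between 2 and 6 ink neighbours and exactly one 0-to-1 transition around them, and the passes repeat until nothing changes.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows, 1 = ink) to a 1-pixel-wide skeleton.

    Hypothetical helper: Zhang-Suen is an assumed choice, not stated in the text.
    The input is copied, not modified.
    """
    img = [list(row) for row in image]
    rows, cols = len(img), len(img[0])
    # Neighbours P2..P9, clockwise from the pixel above; outside the image counts as 0.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(r, c):
        return [img[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr, dc in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p3, p4, p5, p6, p7, p8, p9 = p
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # Apply deletions only after the whole pass (parallel update).
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

Deletions are collected and applied only at the end of each sub-iteration, so every pixel in a pass is tested against the same snapshot, as in the parallel formulation of Zhang-Suen. On a bar three pixels thick, the result is a single horizontal line one pixel wide.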

Figure 12: Characteristic Loci feature of a single point of the word page.

The feature vector is computed by assigning a number to each background (or foreground) pixel, as shown in Fig. 12. This number is derived from the number of intersections with the image strokes in the right, upward, left and downward directions, and in the four diagonal directions. In previous works, the Characteristic Loci method has been applied to digit and isolated letter recognition.
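
A minimal sketch of the per-pixel counting step, assuming a thinned binary image stored as a list of rows with 1 for stroke pixels. The text names the right, upward, left and downward directions and uses eight directions in total; the direction order below and the rule for the foreground mode (the stroke the start pixel lies on is not counted) are assumptions, and the names `count_intersections` and `raw_loci` are hypothetical.

```python
# Assumed direction order (not given in the text): right, up-right, up, up-left,
# left, down-left, down, down-right, as (row step, column step); rows grow downward.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def count_intersections(img, r, c, dr, dc):
    """Count strokes entered by a ray that starts at (r, c) and runs to the image border.

    A stroke is counted each time the ray moves from a 0-pixel onto a 1-pixel, so for a
    foreground start pixel (assumed foreground mode) its own stroke is not counted.
    """
    rows, cols = len(img), len(img[0])
    prev = img[r][c]
    count = 0
    r, c = r + dr, c + dc
    while 0 <= r < rows and 0 <= c < cols:
        if prev == 0 and img[r][c] == 1:
            count += 1
        prev = img[r][c]
        r, c = r + dr, c + dc
    return count


def raw_loci(img, r, c):
    """Raw (not yet bounded) intersection counts of pixel (r, c) in the 8 directions."""
    return [count_intersections(img, r, c, dr, dc) for dr, dc in DIRECTIONS]
```

For a background pixel halfway between two vertical strokes, `raw_loci` reports one intersection to the right, one to the left and none elsewhere. One caveat of this simple ray walk: on an 8-connected skeleton, a diagonal ray can pass between two diagonally adjacent stroke pixels without touching either, so diagonal counts may be underestimated.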

In this work, to reduce the dimension of the feature space, the number of intersections has been limited to 3 values (0, 1 and 2). Limiting the number of possible values reduces the number of combinations: with $v$ possible values and $d$ directions there are $v^d$ combinations, and the length of the feature vector equals this number. For example, with 3 possible values and 8 directions we obtain $3^8 = 6561$ combinations; with 4 possible values we obtain $4^8 = 65536$. The number of combinations grows very quickly (exponentially in the number of directions), and the computational cost (and time) grows in the same way.
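
The combination counts above can be checked by treating the bounded counts of one pixel as the digits of a $d$-digit number in base $v$: every digit string maps to a distinct integer in $[0, v^d)$, so there are exactly $v^d$ codes. The sketch below also assumes, for illustration, that the feature vector is a histogram of these codes over the word image; all function names are hypothetical.

```python
def num_combinations(values, directions):
    """Number of distinct codes when each of `directions` counts takes `values` values."""
    return values ** directions


def locus_number(digits, base=3):
    """Read bounded counts as base-`base` digits, most significant first (Horner's rule)."""
    n = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is out of range for base {base}")
        n = n * base + d
    return n


def loci_histogram(codes, base=3, directions=8):
    """Hypothetical feature vector: how many pixels received each locus number."""
    hist = [0] * num_combinations(base, directions)
    for code in codes:
        hist[code] += 1
    return hist
```

For instance, `locus_number([2, 2, 1, 1, 1, 1, 2, 2])` returns 6200, the same value as Python's built-in `int("22111122", 3)`, and the largest code, `locus_number([2] * 8)`, is 6560.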

The Characteristic Loci feature was designed for digit and isolated letter recognition, where the number of intersections was also bounded; however, the original approach uses the same interval in all directions. In this work we have also bounded and normalized the number of intersections, but we have defined a different interval for each value in each direction: the horizontal direction has a wider interval than the vertical direction. In the original approach, digits or characters have similar height and width, whereas words are usually wider than they are tall, so the ranges of the intervals are set in accordance with the dimensions of the words. The diagonal directions use a combination of the horizontal and vertical intervals. Table 1 shows the intervals for each direction.
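
The direction-dependent bounding can be sketched as follows. Table 1 is not reproduced in this section, so the interval bounds below are placeholders chosen only to show the shape of the idea (a wider horizontal range than vertical range). The direction order, the separate diagonal bounds standing in for the "combination" of the other two directions, and all names are hypothetical.

```python
# Placeholder bounds, NOT the values of Table 1. A raw count n maps to
# 0 if n <= t0, to 1 if t0 < n <= t1, and to 2 if n > t1.
HYPOTHETICAL_BOUNDS = {
    "horizontal": (0, 2),  # wider range: words are usually wider than tall
    "vertical": (0, 1),
    "diagonal": (0, 1),    # stand-in; the text only says diagonals combine the other two
}

# Assumed direction order: right, up-right, up, up-left, left, down-left, down, down-right.
DIRECTION_KIND = ["horizontal", "diagonal", "vertical", "diagonal",
                  "horizontal", "diagonal", "vertical", "diagonal"]


def quantize(count, bounds):
    """Map a raw intersection count to one of the three values 0, 1, 2."""
    t0, t1 = bounds
    if count <= t0:
        return 0
    if count <= t1:
        return 1
    return 2


def quantize_loci(raw_counts, bounds=HYPOTHETICAL_BOUNDS):
    """Bound the 8 raw counts of one pixel using direction-specific intervals."""
    if len(raw_counts) != len(DIRECTION_KIND):
        raise ValueError("expected one count per direction")
    return [quantize(n, bounds[kind]) for n, kind in zip(raw_counts, DIRECTION_KIND)]
```

With these placeholder bounds, two horizontal crossings still map to 1 while two vertical crossings already map to 2, which is the asymmetry the paragraph describes.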

According to the above encoding, an eight-digit base-3 number is obtained for each background pixel. For instance, the locus number of point P in Fig. 12 is $(22111122)_3 = (6200)_{10}$. Locus numbers range from 0 to 6560 ($= 3^8 - 1$), so there are $3^8 = 6561$ possible values. This is done for all background pixels. In this case,
