26.04.2013 Views

Handwritten Word Spotting in Old Manuscript Images using Shape ...

Handwritten Word Spotting in Old Manuscript Images using Shape ...

Handwritten Word Spotting in Old Manuscript Images using Shape ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

7. Pixel-based descriptors organized <strong>in</strong> a hierarchical structure<br />

The first approach of this work is based <strong>in</strong> two pixel-based descriptors (basic features and BSM<br />

features), and they are organized us<strong>in</strong>g a hierarchical structure of clusters. The objective is to do<br />

several layers of clusters us<strong>in</strong>g diverse features. Top layer is composed by basic features. Down<br />

layer consists of pixel distribution based features, <strong>in</strong> particular BSM.<br />

This approach bunches the words <strong>in</strong>to clusters with similar features. When we downward of<br />

layer, we only use the observations of the cluster chosen to cluster the words with the new k<strong>in</strong>d<br />

of features (Fig. 10). We reduce <strong>in</strong> each layer the number of observations that we are comput<strong>in</strong>g<br />

features for the new layer, and the classification process is faster.<br />

7.1 Feature extraction<br />

In pattern recognition and <strong>in</strong> image process<strong>in</strong>g feature extraction is a special form of dimensionality<br />

reduction. The objective is to transform the <strong>in</strong>put data <strong>in</strong>to a reduced representation set of features<br />

(feature vector). The observations of the experiments have different ways to represent those us<strong>in</strong>g<br />

different features. The objective is to select the best features that describe the image.<br />

The marriage licences corpus of the cathedral of Barcelona is composed of 244 volumes, too<br />

much <strong>in</strong>formation to be <strong>in</strong>dexed directly. Consequently the computational time cost <strong>in</strong>crease with<br />

the number of documents of our corpus.<br />

Retrieval time can be reduced us<strong>in</strong>g a hierarchical <strong>in</strong>dexation structure. The features of our<br />

corpus are divided <strong>in</strong> several groups (clusters) <strong>in</strong> each layer, and then, each layer is divided <strong>in</strong> other<br />

groups us<strong>in</strong>g other features (Fig. 10).<br />

Figure 10: The structure <strong>in</strong> layers of the feature extraction.<br />

In this work two types of features have been used: Basic features and BSM features. The first<br />

layer uses basic features to do a rough separation of the word classes. In the second layer we have<br />

used features based <strong>in</strong> pixel distribution. Each cluster of the first layer is split us<strong>in</strong>g BSM features.<br />

16

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!