29.05.2013 Views

RR_03_02

marked as common elements and classified as background elements.

3. Results and Discussion:

We tested our technique on 7 documents, including marketing brochures, interior design books, a catalogue, and a weekly journal, for a total of 130 pages. As shown in Table I, there are four false negative cases, where the header, footer, or pattern differs slightly from the ones on the rest of the pages in the document. There are three false positive cases, where one pattern belonging to the foreground content is accidentally repeated on three different pages. Figure 1 shows examples of background pattern recognition and extraction from a catalogue: Figure 1a shows the original pages from the document, and Figure 1b shows the extracted background pattern.
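The false negatives above come from headers or footers that differ slightly from their counterparts on other pages. The matching criterion is not stated in this section, so the following is only a minimal sketch, assuming elements are compared by bounding box. The `Box` type, the `same_pattern` function, and the `tol` parameter are hypothetical names introduced for illustration. It shows how an exact comparison rejects a slightly longer header while a tolerant one accepts it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of a page element, in page units."""
    x: float
    y: float
    width: float
    height: float


def same_pattern(a: Box, b: Box, tol: float = 0.0) -> bool:
    """Treat two elements as the same pattern if every box field differs by at most tol."""
    return (
        abs(a.x - b.x) <= tol
        and abs(a.y - b.y) <= tol
        and abs(a.width - b.width) <= tol
        and abs(a.height - b.height) <= tol
    )


# A header that repeats on most pages, and a slightly longer variant on one page.
usual_header = Box(x=50, y=20, width=400, height=12)
longer_header = Box(x=50, y=20, width=412, height=12)

strict = same_pattern(usual_header, longer_header)           # exact comparison
relaxed = same_pattern(usual_header, longer_header, tol=15)  # tolerant comparison

print("strict match:", strict)
print("relaxed match:", relaxed)
```

Under these assumptions, a larger tolerance would recover such headers, but it could also admit more foreground elements as background, so the value would need tuning per document type.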

We also tested the algorithm on documents that are collections of tickets for different tourist attractions, and on supermarket weekly advertisements. Because of the repetitive text patterns in their content, text elements that belong to the content were mistakenly classified as background pattern elements. For these types of document, further restriction of the element types may be required.
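One way to picture the suggested restriction is to limit which element types may become background candidates. The sketch below is an assumption, not the method used in this work: the `Element` type, its `kind` and `signature` fields, and the `background_candidates` function are hypothetical, and the signature is assumed to summarize an element's position, size, and content.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical page element: a kind ("text", "image", "graphic") and a signature."""
    kind: str
    signature: str  # assumed to summarize position, size, and content


def background_candidates(pages, min_pages, allowed_kinds=None):
    """Return signatures seen on at least min_pages pages, optionally restricted by kind."""
    pages_seen = defaultdict(set)
    for page_number, elements in enumerate(pages):
        for element in elements:
            # Skip element types that are excluded from background detection.
            if allowed_kinds is not None and element.kind not in allowed_kinds:
                continue
            pages_seen[element.signature].add(page_number)
    return {sig for sig, seen in pages_seen.items() if len(seen) >= min_pages}


# Toy advertisement: the logo is true background, the "SALE" text is repeated content.
logo = Element("graphic", "logo@top-left")
sale = Element("text", "SALE@center")
pages = [[logo, sale], [logo, sale], [logo, sale]]

unrestricted = background_candidates(pages, min_pages=3)
restricted = background_candidates(pages, min_pages=3, allowed_kinds={"graphic", "image"})

print("unrestricted:", sorted(unrestricted))
print("restricted:", sorted(restricted))
```

In this toy case the unrestricted run picks up the repeated "SALE" text as background, while restricting candidates to graphic and image elements keeps only the logo. The cost is that genuine text headers or footers would also be excluded.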

| Document | Total number of pages | Number of false negative pages (missing background elements) | Number of false positive pages (picking elements in the foreground content) |
|---|---|---|---|
| Catalogue | 21 | 2, missing a header which is slightly longer than the ones in the rest of the pages | 0 |
| Book | 16 | 0 | 0 |
| Marketing brochure | 4 | 0 | 0 |
| Marketing brochure | 10 | 0 | 3, adding one graphic pattern in the content to the background pattern |
| Book | 22 | 1, the footnote pattern is slightly longer than the ones in the rest of the pages | 0 |
| Marketing brochure | 2 | 0 | 0 |
| Marketing brochure | 32 | 0 | 0 |
| Weekly journal | 24 | 1, missing a footer which is longer and in a slightly different position than the ones in the rest of the pages | 0 |

Table I: Results from background pattern extraction
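As a quick check of the counts quoted in the text, the per-document error pages in the table above can be tallied. This is only an illustrative sketch: the tuple layout and variable names are assumptions, and the values are transcribed from the table rows.

```python
# (document, total pages, false negative pages, false positive pages), as listed in the table
rows = [
    ("Catalogue", 21, 2, 0),
    ("Book", 16, 0, 0),
    ("Marketing brochure", 4, 0, 0),
    ("Marketing brochure", 10, 0, 3),
    ("Book", 22, 1, 0),
    ("Marketing brochure", 2, 0, 0),
    ("Marketing brochure", 32, 0, 0),
    ("Weekly journal", 24, 1, 0),
]

# Sum the error pages over all documents.
false_negatives = sum(fn for _, _, fn, _ in rows)
false_positives = sum(fp for _, _, _, fp in rows)

# Report only the documents that had at least one error page.
for name, pages, fn, fp in rows:
    if fn or fp:
        print(f"{name}: {fn}/{pages} false negative, {fp}/{pages} false positive pages")
print(f"Total: {false_negatives} false negative, {false_positives} false positive pages")
```

The totals agree with the four false negative and three false positive cases reported in the text.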
