12.07.2015 Views

A new method for perspective correction of document images


in the document image, and some parameters that are not automatically tuned, such as those used to discard false detections of the top and bottom lines of the text; thus, small changes in the values of those parameters can make the method fail. More recently, Lu and Tan have presented an extended version of the mentioned work [3], although it inherits the drawbacks discussed above.

Morphological operations are also used in the method proposed by Miao and Peng [4], although it requires less knowledge about the contents of the document than the approaches of Lu et al. Additionally, an adaptive thresholding technique is adopted to binarize the capture, which makes the method capable of dealing with lighting variations. Nevertheless, no guidance is given on how to obtain the values of some parameters, such as the size of the structuring elements used to cluster the detected connected components into text lines. Furthermore, the image correction process requires three transformations, which are computationally expensive.

Methods that do not depend so much on the contents of the document, as they are not exclusively aimed at text-based documents, are due to Yin et al. [5,6]. The first of them is centered on the perspective estimation problem, while the second one is focused on the rectification system design. Indeed, Yin et al. [5] describe a method that uses textual information, if it is available, and also other sources of information, such as document boundaries. Nevertheless, most of the method is based on the clues provided by textual information. In this case there are also some parameters that are not automatically tuned, such as the thresholds used to classify a detected line as horizontal. On the other hand, both methods are carefully designed to minimize the computational cost. In fact, a multi-stage approach to perspective correction is proposed [6] which is able to skip some unnecessary stages. Nevertheless, the most important drawback of these methods is that they are not able to recover the original aspect ratio of the document.

Iwamura et al. [7] proposed a method that estimates the depth of each area of the document by using measurements of its textual contents, like the variation of the area of characters with respect to their position. Nevertheless, the proposed approach does not obtain the focal length, being only able to recover an affinely distorted version of the original document.

Finally, there exist some considerations that can make the methods based on text documents undesirable for general applications. First of all, the requirement of dealing with text documents reduces the generality of the designed methods and hence the number of potential applications.
In addition, most of the described methods constrain features like the size of the text and its variation over the document, or parameters of the paragraph formatting. Furthermore, according to the described algorithms, most of them could fail when restoring handwritten documents, or those written with particular typefaces, such as italics, where the tips of the characters are not vertical. Also, the presence of several columns of text can make some methods fail, as can different writing systems, such as those that are not ordered from left to right and from top to bottom. For these reasons, it is reasonable to pay attention to those methods that do not require the presence of text in the document.

1.2 Methods not requiring text in the document

A more theoretical approach than those followed by the methods presented in the previous section is introduced by Liebowitz and Zisserman [8]. This approach is suitable not only for recovering the fronto-parallel view of text documents, but also for describing the geometry, constraints and algorithmic implementations that allow metric properties of figures on a plane, like angles and length ratios, to be measured from a captured image of that plane. Perhaps the most novel contribution of this work is the presentation of different ways of providing geometrical constraints, including a known angle in the original scene, two equal but unknown angles, or a known length ratio. Unfortunately, formal proofs supporting those procedures are not provided. It must also be noted that, depending on the level of knowledge about the contents of the document (measured by the number of known pairs of orthogonal lines in the original scene), more than a single image transformation may be required.

On the other hand, there is also a great number of publications that deal with camera calibration. Although they are not designed for performing perspective distortion correction of captures, they can be used to estimate the camera position and its orientation in relation to the imaged document. Unfortunately, since those methods have been designed for other purposes, most of them cannot be applied to the perspective distortion correction problem.
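As an illustration of why more than one image transformation may be needed, the following minimal sketch shows only the first stage of a stratified rectification: removing the projective part of the distortion by mapping the imaged line at infinity back to infinity. It is a textbook projective-geometry construction, not code from any of the cited works. All function names and the corner coordinates are hypothetical, and the sketch assumes that two pairs of lines known to be parallel in the scene (here, the opposite edges of the page) have already been located in the image.

```python
# Sketch: affine rectification from two pairs of scene-parallel lines.
# Function names are hypothetical; points are (x, y) tuples in pixels.

def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_line(pair_a, pair_b):
    """Imaged line at infinity, scaled so that its third coordinate is 1.

    Each pair is ((p0, p1), (q0, q1)): two lines, each given by two image
    points, that are parallel in the original scene.
    """
    v_a = cross(line_through(*pair_a[0]), line_through(*pair_a[1]))
    v_b = cross(line_through(*pair_b[0]), line_through(*pair_b[1]))
    l = cross(v_a, v_b)  # line joining the two vanishing points
    scale = max(abs(c) for c in l)
    if scale == 0.0 or abs(l[2]) < 1e-12 * scale:
        # Happens when both pairs share a direction, or when the vanishing
        # line passes through the image origin (the matrix H below would
        # then be singular).
        raise ValueError("degenerate configuration")
    return (l[0] / l[2], l[1] / l[2], 1.0)

def rectify_point(l, p):
    """Apply H = [[1, 0, 0], [0, 1, 0], [l0, l1, l2]] and dehomogenize.

    Assumes p does not lie on the vanishing line (w would be zero).
    """
    w = l[0] * p[0] + l[1] * p[1] + l[2]
    return (p[0] / w, p[1] / w)

# Corners of a perspectively distorted page, in order A, B, C, D
# (made-up values for illustration only).
quad = [(0.0, 0.0), (90.9, 9.1), (92.3, 84.6), (16.7, 83.3)]
A, B, C, D = quad
l_inf = vanishing_line(((A, B), (D, C)), ((A, D), (B, C)))
rectified = [rectify_point(l_inf, p) for p in quad]
```

After this step the opposite edges of the page are parallel again, but the result is in general only a parallelogram: angles and the aspect ratio are still unknown. Recovering them requires a second, affine transformation determined by metric constraints such as a known angle or a known length ratio.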
