Document Binarization with Automatic Parameter Tuning

More documents

Recommendations

Info

labeled as such. In the equation below, φ will takeon a large negative value.L 1 ij ={ −∇ 2 I ij I ij ≤ µ r ij + 2σr ijφ I ij > µ r ij + 2σr ij(4)The neighbor mismatch penalties C h ij and Cv ij offerthe opportunity to employ the third strategy mentionedabove, incorporating the Canny edge map.(Recall that Canny [6] first smooths the image with aGaussian filter of small radius σ E , then finds edges atlocal directed maxima of the image gradient, choosingto retain only those edges selected using a hysteresisprocedure with two thresholds t hi and t lo .) The algorithmsets the mismatch penalties to a uniform valuec everywhere except between pixels where a Cannyedge has identified a likely discontinuity. To be morespecific, Canny identifies pixels as edges, while Equation1 requires locating discontinuities in the connectionsbetween pairs of pixels. To address this discrepancy,the formulation below zeros out the discontinuitypenalty between Canny edge pixels and theirbrighter neighbors, effectively choosing to include theCanny edge pixels within the inked area. The oppositechoice would also be self-consistent, but wouldinhibit detection of single pixel width strokes.⎧⎨ 0 if E ij ∧ (I ij < I i+1,j )Cij h = 0 if E i+1,j ∧ (I ij ≥ I i+1,j )⎩c otherwise⎧⎨ 0 E ij ∧ I ij < I i,j+1Cij v = 0 E i,j+1 ∧ I ij ≥ I i,j+1⎩c otherwise(5)(6)The choice of constant discontinuity penalties everywhereexcept at edges deserves a note. Onemight imagine using a penalty that varies continuouslyaccording to the similarity in intensity betweenthe neighbors. Empirically this approach seems lesssuccessful, perhaps because it actually gives littleguidance about the best precise location of the inkbackgroundtransition: the intensity differences tendto be fairly large everywhere within a few pixels ofthe actual boundary, and thus it becomes too easy tochoose the wrong location.2.1 ParametersThe algorithm described above includes six free parameters,of which only two strongly influence the binarizationoutcome and require varying settings fordifferent images. The four less important parameterscan be set to a constant with negligible consequencesfor all images tested, either because they donot strongly influence the result or because the bestvalue appears not to fluctuate much for different images.Thus finding automatic values for the two importantparameters suffices to create a method thatcan be applied as a “black box” with little or no parametertesting required.Two of the less important parameters appear inEquation 4: r and φ. Of these, r must be largeenough to encompass at least a few background pixels,and thus should be set to some value larger thanthe expected stroke width. φ can be any sufficientlynegative value. No attempt is made to optimize theseparameters, and the experiments all use r = 20 andφ = −500, for images with grayscale intensity inthe range from 0 to 255. The remaining less importantparameters are two of the three that control theCanny edge detection algorithm, namely the lower ofthe two edge detection thresholds t lo and the radiusof smoothing applied prior to edge detection σ E . Badvalues for these parameters can certainly hurt binarizationquality, but fortunately the experiments willshow that there exists a uniform setting that workson all test images with near-optimal results.The two important parameters that remain interactwith each other to determine the final binarizationresult. Most crucial is c, whose magnitude determinesthe relative balance between data fidelityand regularization. The high Canny edge detectionthreshold t hi matters most when c is large becausethe detection or nondetection of an edge can determinewhether an entire ink component appears ordisappears in the minimal energy solution. The highCanny threshold plays a particularly important rolein images that exhibit ink bleeding through from theopposite side of the paper: since the bled ink tends tohave weaker edges, fortuitous thresholding can eliminatemost of the false ink components.Early experiments with this binarization scheme,4
and indeed for most approaches to binarization, haverelied on choosing parameter values that work reasonablywell across a wide range of images. Thiscan be achieved by searching for the best values on atraining set of images similar to the ones that will beused for testing, and yields good results [11]. However,even globally optimal parameter values embodysome compromise on individual images, and tuningthe settings to the image can offer significant gains.The next section explains how to do this.2.2 Automatic Parameter TuningTo understand how the crucial c parameter can be setautomatically, it is worthwhile to examine the behaviorof the base binarization algorithm under a rangeof different settings. Figure 1 shows the binarizationresults on a small image patch for a logarithmicallyincreasing set of values. When c is very low, the binarizationlooks like a simple sign operator on theLaplacian, with many small noisy components. Asc increases the noise components become more consolidatedand many disappear. For a range of middlevalues of c, the binarization stays fairly stable,with only small changes as c increases. At the highestvalues, the result becomes unstable again as largeink components disappear, and occasionally join orhave voids filled in. As c goes to infinity, the binarizationwill tend toward the majority label in eachedge-isolated component.The region of stable results at middle levels of cis significant, and turns out to show up consistentlyon ink-and-paper documents with reasons subject toexplanation. To visualize the phenomenon, define thenormalized binarization instability ξ ν (c) in terms of∆(B c , B νc ) the fraction of pixels that change labelsbetween two values of c differing by a factor ν.∆(B, B ′ ) =ξ ν (c) =m∑i=0 j=0n∑(B ij ≠ B ij) ′ (7)∆(B c , B νc )mn (ln(νc) − ln(c))(8)In the equation above, Bij c is the binarization label atpixel (i, j) using mismatch penalty c, and the Booleanoperator ≠ converts to 0 or 1 in the customary fashion.As Figure 2 shows, the general shape of theinstability curve is an intrinsic property of the imagethat does not depend on the exact value of νchosen; thus in most cases ξ can drop its subscript.Histograms provide a useful analogy: although thedetails vary slightly, a histogram made using a sufficientnumber of samples from the same distributionalways takes on the general appearance of that distributionfor any reasonable choice of bin size. Indeed,the instability plots may be seen as histogramsof pixel label transition events in respect to c, withlogarithmically increasing bin sizes.Figure 3 shows as solid lines the instability plots forseveral document images. Although the stable zonediffers somewhat in appearance and location in eachcase, its nearly universal presence is striking. Yet inhindsight, the observation of such a stable zone atmiddling c values should not offer such a surprise.The twin peaks on either side arise from understandablemechanisms that apply for all document images,and thus the absence of a stable zone for any particularimage would arise from exceptional circumstances.The c parameter controls the incentive for neighboringpatches to share the same label despite a datafidelity term indicating otherwise. The left peak inthe instability curve appears at low levels of c, as itbecomes large enough to overcome small noise fluctuationsand label most large homogeneous foregroundand background regions correctly. On the other hand,the right peak appears when c becomes large enoughto force a uniform label even across regions of differentunderlying ground truth. Unless the noise amplitudeexceeds the contrast between foreground andbackground, a zone of stability will normally separatethe two. Furthermore, this stability zone coincideswith binarizations that adhere to the underlying imagestructure while ignoring noise. Figure 3 showsas dotted lines the binarization error (based on theF-measure, as defined in Equation 14 below), whichtends to reach a minimum at or near the lowest pointin the instability curve. Computing the instabilitycurve does not require access to the ground truth, yetobserving the low point of the stable region revealsa value of c that typically achieves low ground-truthbinarization error.5
Page 1 and 2: Document Binarization with Automati
Page 3: 2 ApproachThe base approach to bina
Page 7: Figure 3: Binarization instability
Page 10 and 11: Algorithm 2. The metaparameters τ
Page 12 and 13: mean F-measure of 95.3% across all
Page 14 and 15: Distance reciprocal distortion metr
Page 16: [7] Chen, Y., Leedham, D.: Decompos

Document Binarization with Automatic Parameter Tuning

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?