The Light Field Camera: Extended Depth of Field, Aliasing, and Superresolution

$l = H_s r$,   (1)

where $l$ and $r$ are rearranged as column vectors. $H_s$ embeds both the camera geometry (e.g., its internal structure, the number, size, and parameters of the optics) and the scene disparity map $s$. In general, the only quantities directly observable are the LF image $l$ and the camera geometry, and one has to recover both $r$ and $s$. Due to the dimensionality of the problem, in this manuscript we consider a two-step approach where we first estimate the disparity map $s$ and then recover the radiance $r$ given $s$. In both steps, we formulate inference as an energy minimization.

For now, assume that the disparity map $s$ is known. Then, one can perform superresolution (SR) by estimating $r$ directly from the observations. Because the problem may be particularly ill-posed, depending on the extent of the complete system, proper regularization of the solution through prior modeling of the image data is essential. We can then formulate the estimation of $r$ in the Bayesian framework. Under the typical assumption of additive Gaussian observation noise $w$, the model becomes $l = H_s r + w$, to which we can associate a conditional probability density function (PDF), the likelihood $p(l \mid r, H_s)$.

We then introduce priors on $r$. We use a recently developed prior [31], [32], which can locally recover texture, in addition to smooth and edge regions in the image. By combining the prior $p(r)$ with the likelihood from the noisy image formation model, we can then solve the maximum a posteriori (MAP) problem:

$\hat{r} = \arg\max_{r}\, p(l \mid r, H_s)\, p(r)$.   (2)

The MAP problem requires evaluating $H_s$, which depends on the unknown disparity map $s$. To obtain $s$, we consider extracting views (images from different viewpoints) from the LF so that our input data are suitable for a multiview geometry algorithm (see Section 6). The multiview depth estimation problem can then be formulated as inferring a disparity map $s \doteq \{s(c_k)\}$ by finding correspondences between the views for each 2D location $c_k$ visible in the scene. Let $\hat{V}_q$ denote the sampled view from the 2D viewing direction $q$ and $\hat{V}_q(k)$ the color measured at a pixel $k$ within $\hat{V}_q$. Then, as we will see, depth estimation can be posed as the minimization of the joint matching error (plus a suitable regularization term) between all combinations of pairs of views:

$E_{\mathrm{data}}(s) = \sum_{\forall q_1, q_2, k} \Phi\big(\hat{V}_{q_1}(k + s(c_k)\, q_1) - \hat{V}_{q_2}(k + s(c_k)\, q_2)\big)$,   (3)

where $\Phi$ is some robust norm and $q_1, q_2$ are the 2D offsets between each view and the central view (the exact definition is given in Section 6.1). In practice, to save computational effort, only a subset of view pairs $\{q_1, q_2\}$ may be used in (3). Notice that this definition of the 2D offset implicitly fixes the central view as the reference frame for the disparity map $s$.
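As a concrete illustration of the data term in (3), the sketch below builds a cost volume over a set of candidate disparities: each view is resampled at $k + s(c_k)\,q$ with bilinear interpolation, and a truncated L1 penalty stands in for the robust norm $\Phi$. The view container, the offsets, and the penalty choice are illustrative assumptions rather than the exact choices of Section 6, and the interpolation step is precisely the operation whose validity the aliasing analysis below calls into question.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def matching_cost(views, offsets, disparities, tau=0.1):
    """Evaluate the data term of (3) for each candidate disparity.

    views       : dict mapping a 2D offset q (tuple) -> 2D view image V_q
    offsets     : list of 2D offsets q, relative to the central view
    disparities : 1D array of candidate disparity values s
    tau         : truncation threshold of the robust penalty (assumed here)

    Returns a cost volume of shape (len(disparities), H, W): the summed
    pairwise matching error at every pixel of the central (reference) view.
    """
    H, W = views[offsets[0]].shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    cost = np.zeros((len(disparities), H, W))

    for i, s in enumerate(disparities):
        # Sample each view at k + s*q with bilinear interpolation (order=1).
        warped = []
        for q in offsets:
            coords = np.stack([yy + s * q[0], xx + s * q[1]])
            warped.append(map_coordinates(views[q], coords, order=1,
                                          mode='nearest'))
        # Accumulate a robust (truncated L1) penalty over all view pairs.
        for a in range(len(warped)):
            for b in range(a + 1, len(warped)):
                cost[i] += np.minimum(np.abs(warped[a] - warped[b]), tau)
    return cost

# A winner-takes-all depth map (no regularization) would then be
# s_hat = disparities[np.argmin(cost, axis=0)].
```

Under ideal Lambertian, noise-free, and alias-free conditions, the per-pixel minimum of such a cost volume would coincide with the true disparity; the discussion below explains why aliasing in the sampled views breaks this assumption.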
As the views may be aliased, minimizing (3) is liable to cause incorrect depth estimates around areas of high spatial frequency in the scene. Put simply, even when scene objects are Lambertian and in the absence of noise, the views might not satisfy the photoconsistency criterion sufficiently well, so that $E_{\mathrm{data}}$ may not have a minimum at the true depth map. Moreover, subpixel accuracy is usually obtained through interpolation. This might be a reasonable approximation when the views collect samples of a band-limited (i.e., sufficiently smooth) texture. However, as shown in Fig. 3, this is not the case with LF cameras. Therefore, we have to explicitly define how samples are interpolated and study how this affects the matching of views.

We shall also see that there are certain planes where the sample locations from different views coincide. At these planes, aliasing no longer affects depth estimation, but the extra information available for SR is diminished.

Fig. 4. (a) 2D schematic of a LF camera. Rays from a point $p$ are split into several beams by the microlens array. (b) Three example images corresponding to the colored planes in space (dashed) and their conjugates (solid). Top: $p'$ before the microlenses; subimages flipped. Middle: $p'$ on the microlenses; no repetitions. Bottom: $p'$ is virtual, beyond the microlenses; no flipping.

4 IMAGE FORMATION OF A LIGHT-FIELD CAMERA

In this section, we derive the image formation model of a plenoptic camera and define the relationship between the different camera parameters. To yield a practical computational model suitable for our algorithm (Section 7), we investigate the imaging process with tools from geometric optics [33], ignoring diffraction effects and using the thin lens model. We will also analyze the sampling of the LF camera by using the phase-space domain [7].

4.1 Imaging Model

In our investigation, we rebuilt a light field camera similar to that of Ng et al. [3], essentially a regular camera with a microlens array placed near the sensor (see Fig. 4), but, as in [21], we consider the imaging system under a general configuration of the optical elements. However, unlike in any previous work, we determine the image formation model of the camera so that it can be used for SR or more general tasks.

We use a general 3D scene representation (ignoring occlusions), consisting of the all-focused image, or radiance, $r(u)$ (as captured by a pinhole camera, i.e., with an infinitesimally small aperture), plus a depth map $z(u)$ associated with each point $u$. Both $r$ and $z$ are defined at the microlens array plane, such that $r$ is the all-focused image that would be captured by a regular camera. In this way, we can analyze both the PSF of each point in space (corresponding to a column of $H_s$) and the sampling and aliasing effects in the captured LF with a less bulky notation. We first consider the equivalence between using points in space
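To make the geometric-optics setup above concrete, the following sketch applies the thin lens law twice, first at the main lens and then at a microlens, to locate a scene point's conjugate relative to the microlens array; whether that conjugate lies in front of, on, or behind the array corresponds to the three regimes illustrated in Fig. 4. The parameter names and the numerical values are hypothetical, chosen only for illustration, not the calibration of the actual camera.

```python
def thin_lens_conjugate(distance, focal_length):
    """Thin lens law: 1/distance + 1/conjugate = 1/focal_length.
    Returns the conjugate distance on the other side of the lens
    (a negative input distance models a virtual object)."""
    if distance == focal_length:
        return float('inf')  # object at the focal plane: image at infinity
    return 1.0 / (1.0 / focal_length - 1.0 / distance)

def classify_point(z_obj, F_main, D_main_to_array, f_micro):
    """Where does a point at depth z_obj come to focus relative to the array?

    z_obj           : object distance in front of the main lens
    F_main, f_micro : focal lengths of the main lens and of a microlens
    D_main_to_array : separation between main lens and microlens array
    (all values hypothetical, for illustration only)
    """
    z_img = thin_lens_conjugate(z_obj, F_main)   # conjugate through main lens
    offset = D_main_to_array - z_img             # distance from array to conjugate
    if offset > 0:
        regime = "conjugate in front of the array (subimages flipped)"
    elif offset == 0:
        regime = "conjugate on the array (no repetitions)"
    else:
        regime = "virtual conjugate beyond the array (no flipping)"
    # Each microlens then re-images that conjugate toward the sensor
    # (degenerate when the conjugate lies exactly on the array).
    sensor_side = thin_lens_conjugate(offset, f_micro) if offset != 0 else 0.0
    return regime, sensor_side

if __name__ == "__main__":
    # Made-up numbers: 80 mm main lens, array 82 mm behind it,
    # 0.5 mm microlenses, object 3 m away.
    print(classify_point(z_obj=3000.0, F_main=80.0,
                         D_main_to_array=82.0, f_micro=0.5))
```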
