BISHOP AND FAVARO: THE LIGHT FIELD CAMERA: EXTENDED DEPTH OF FIELD, ALIASING, AND SUPERRESOLUTION 979

Let us use Fig. 7b to consider: 1) how a vector or a point is mapped from the radiance onto the sensor under an arbitrary microlens at $c$, and 2) where the correspondences of the point $c$ lie in other views if this point is at angle $\theta$ in one view.

To find 1, begin with the purple vector $u - c$. By similar triangles, and by projecting first through $O$ to the red vector and then through $c$ to the green one, we see that the purple vector's image under lens $c$ is scaled by $\lambda$ (see (12)). Noting that the local origin is $\theta = 0$, we can equivalently express the mapping of the point $u$ in $r$ (the tip of the vector) through a lens at $c$ to a subimage correspondence at

    $\theta = \dfrac{v'}{z'(u)}\,(c - u)$    (15)
    $\phantom{\theta} = \lambda(u)\,(c - u).$    (16)

By inverting this relation, the original point $u$ in $r$ corresponding to any $\theta$ and $c$ is $u(\theta, c) = c - \theta/\lambda(u)$, and the views and subimages are related to the radiance as $V_\theta(c) = S_c(\theta) = r\big(c - \theta/\lambda(u)\big)$. $V_\theta(c)$ and $S_c(\theta)$ differ only by which of $\theta$ or $c$ we hold fixed.

Considering (2), we can reformulate the above ideas. For a point $c_1$ in a particular view at angle $\theta_1$, we can find its correspondence $u(\theta_1, c_1)$ in the radiance, and then solve for $c_2$ so that $V_{\theta_1}(c_1) = r(u) = V_{\theta_2}(c_2)$, for arbitrary $\theta_2$. The trick is to refer everything to a common reference frame where $\lambda(u)$ is defined (the points share the same depth/magnification). We choose this reference frame to be the central view $\theta_0 = 0$, where we have $c = u$ and $V_0(c) = V_0(u) = r(u)$, i.e., this view samples the radiance directly. This can be seen in Fig. 7b as the microlens placed at $u$. The result is that $c_1 = u + \theta_1/\lambda(u)$ and $c_2 = u + \theta_2/\lambda(u)$.
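These continuous-domain relations can be checked numerically. The following is a minimal sketch with illustrative values, not calibrated camera parameters; the helper names `theta_of` and `u_of` are ours:

```python
# Numeric check of the continuous correspondence relations: lambda = v'/z'(u)
# is the local magnification, theta = lambda*(c - u) maps a radiance point u
# to its subimage coordinate under the microlens at c (eq. 16), and
# u = c - theta/lambda inverts it. All numbers below are made up.

def theta_of(u, c, lam):
    """Subimage coordinate of radiance point u under a microlens at c."""
    return lam * (c - u)

def u_of(theta, c, lam):
    """Invert the mapping: radiance point imaged at angle theta under lens c."""
    return c - theta / lam

lam = 0.25                   # example magnification lambda(u)
u = 1.0                      # reference point in the radiance (central view)
theta1, theta2 = 0.05, -0.10 # two view angles
c1 = u + theta1 / lam        # lens under which u appears at angle theta1
c2 = u + theta2 / lam        # lens under which u appears at angle theta2

# Both correspondences map back to the same radiance point:
assert abs(u_of(theta1, c1, lam) - u) < 1e-12
assert abs(u_of(theta2, c2, lam) - u) < 1e-12
```

This mirrors the statement that $c_1$ and $c_2$ are correspondences of the same point $u$ once everything is referred to the central view.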
The discrete version of these equations, which we describe below, leads us to the view matching in (3). We may also interpret these matches as positions $(c_1, \theta_1)$ and $(c_2, \theta_2)$ on the same ray in Fig. 8, where $\theta_1$ is the slope of the ray and $u$ is where the ray intersects the $x$-axis.

6.2 Discretization of Views and Subimages

$V_\theta(c)$ and $S_c(\theta)$ are defined for all possible $c$ and $\theta$. In practice, if we approximate the microlens array with an array of pinholes,[4] only a discrete set of samples in each view is available, corresponding to the pinholes at positions $c = c_k = kd$, where $d$ is the microlens pitch. Furthermore, the pixels in each subimage sample the possible views at angles $\theta_q = q\,\theta_1$. Therefore, we define the discrete observed view $\hat V_q$ at angle $\theta_q$ as the image

    $\hat V_q(k) \doteq V_{\theta_q}(c_k) = r\big(c_k - \theta_q/\lambda(c_k)\big) = r\big(d(k - s(c_k)\,q)\big),$    (17)

where we defined the view disparity, in pixels per view, as

    $s(u) \doteq \dfrac{\theta_1}{d\,\lambda(u)}.$    (18)

The discrete disparity is $s(c_k)$ and depends on the depth $z$. The discretized subimages are just a rearrangement of the LF samples; in fact, they are also defined by (17), i.e., $\hat S_k(q) = \hat V_q(k)$.

[4] We will see that the addition of microlens blur due to finite apertures will integrate around these sample locations.

In a similar manner to the continuous case, two discrete views at $\theta_{q_1}$ and $\theta_{q_2}$ can be related via the reference view as

    $\hat V_{q_1}\big(k + s(c_k)\,q_1\big) = \hat V_0(k) = \hat V_{q_2}\big(k + s(c_k)\,q_2\big),$    (19)

thus obtaining the matching terms in (3).
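The discrete matching relation can be verified on a synthetic 1D example. The sketch below assumes a fronto-parallel scene (constant disparity $s$) and an integer disparity so that shifts land exactly on sample positions; the radiance `r` is a made-up test signal:

```python
import numpy as np

# Discrete view relations for a fronto-parallel scene: views are samples of a
# 1D radiance r at spacing d with per-view shifts s*d*q. For integer s*q the
# matching relation Vhat_q(k + s*q) = Vhat_0(k) holds exactly.
# Values below are illustrative, not camera settings.

d = 1.0        # microlens pitch (arbitrary units)
s = 2          # constant integer disparity (pixels per view), for exactness
r = lambda x: np.sin(0.1 * x) + 0.3 * np.cos(0.37 * x)  # synthetic radiance

def view(q, ks):
    """Discrete view Vhat_q(k) = r(d*(k - s*q))."""
    return r(d * (ks - s * q))

ks = np.arange(0, 50)
v0 = view(0, ks)
# Matching: shifting view q by s*q lenslets aligns it with the central view.
for q in (1, 3):
    assert np.allclose(view(q, ks + s * q), v0)
```

With subpixel (non-integer) disparities the same relation holds only up to an interpolation of $r$, which is exactly the reconstruction issue discussed next.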
By defining the subimage disparity $t(c_k) \doteq 1/s(c_k)$, subimages may also be related via

    $\hat S_{k_0+k_1}\big(q + t(c_{k_0})\,k_1\big) = \hat S_{k_0}(q) = \hat S_{k_0+k_2}\big(q + t(c_{k_0})\,k_2\big).$    (20)

The discrete views in (17) are just samples of $r$ with spacing $d$, but with different shifts $s(u)\,dq$, depending on the view angle and the depth. The multiview disparity estimation task is to estimate $s(u)$ by shifting the views so that they are best aligned. However, this requires subpixel accuracy, i.e., an implicit or explicit reconstruction of $r$ in the continuum. According to the sampling theorem, $r$ may be reconstructed exactly from the samples taken at spacing $d$ so long as the original radiance image contains no frequencies higher than the Nyquist rate $f_0 = \frac{1}{2d}$. In practice, this condition is often not satisfied due to the low resolution of the views, and aliasing occurs. Observe that a larger microlens pitch $d$ leads to greater aliasing of the views.

6.3 Ideal and Approximate Antialiasing Filtering

Ideally, the LF should be antialiased before views are extracted, i.e., we should combine information across views. We make use of an extension of the sampling theorem by Papoulis [37], which shows that if $r$ is bandlimited with bandwidth $f_r = Q f_0$, then it can be accurately reconstructed on a grid with spacing $d$ if we have $Q$ sets of samples available, with any shifts or linear filtering of the original signal. This implies that we can obtain the correctly antialiased views $\tilde V_q(k)$ from the sampled light field as follows:

1. Use a reconstruction method $F(\cdot)$ jointly on all samples to obtain
   $r(u) = F\big(\{\hat V_{q'}(k')\};\, u\big) = \sum_{k',q'} \phi_{k',q'}(u)\, \hat V_{q'}(k')$
   for some set of interpolating kernels $\phi_{k',q'}$ (we could use the theorem from [37] to define these kernels, but essentially this operation corresponds to applying any superresolution (SR) method).
2. Filter these samples with an antialiasing filter $h_{f_0}$ at the correct Nyquist rate $f_0$ to obtain
   $\tilde r(u) = (h_{f_0} \ast r)(u).$
3. Resample to obtain $\tilde V_q(k) = \tilde r\big(c_k - s(c_k)\,dq\big)$.

A drawback of this approach is that it requires a computationally demanding superresolution step, as well as filtering at high resolution before extracting the low-resolution views. Moreover, a chicken-and-egg problem is apparent: the depth-dependent filters depend on the unknown depth map. Thus, we look for an approximate but efficient method. Rather than filtering the whole LF simultaneously, we filter each subimage directly, bypassing the reconstruction step.

980 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 5, MAY 2012

Fig. 10. Antialiasing filtering, increasing from (a) to (d). Top row: Detail of subimages. Middle: Corresponding filtered full view. Bottom: Magnified detail of the view.

Since each subimage is a windowed projection of $r$ onto the sensor (ignoring blur for now), we may equivalently project the filters in the same way. This is approximate at subimage boundaries, where we must use filters with a support limited to the domain of $\theta$. Hence, we upper-bound the filter size using a Lanczos-windowed version of the ideal sinc kernel. The antialiasing filter $h_{f_0}$, defined in $r$, is projected onto the sensor via the conjugate image at $z'$, i.e., by scaling by $|\lambda|$, as in (16). Hence, the scaled filter has physical cutoff frequency $f_0|\lambda|$. We propose an iterative method, beginning with a strong antialiasing filter and refining the estimate based upon the current depth map. Too much filtering might remove detail needed for valid matches, while too little may leave aliasing behind (see Fig. 10). We summarize the algorithm as follows:

1. Initialize all filters with cutoff $f_0|\lambda|_{\max}$, i.e., assuming the depth that yields the most aliasing in the working volume.
2. Estimate the disparity map $s(c_k)$ (see Section 6.5).
3. Rearrange the views as subimages $\hat S_k(q)$.
4. For each $k$, filter $\hat S_k(q)$ by $h_{f_0|\lambda|}$, using $\lambda = \theta_1/(d\,s(c_k))$ from (18).
5. Repeat from step 2 until the disparity map update is negligible.

6.4 Microlens Blur

With finite microlens apertures, each pixel integrates over a larger area and aliasing is reduced due to the additional blur (see Fig. 8).
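A toy 1D computation illustrates why finite-aperture integration reduces aliasing. In this sketch (illustrative numbers, not the paper's settings), a sinusoid above the Nyquist rate $f_0 = 1/(2d)$ aliases at full amplitude under point sampling, while averaging each sample over an aperture of width $a$ attenuates it by roughly $|\mathrm{sinc}(fa)|$:

```python
import numpy as np

# Point sampling vs. aperture integration of a sinusoid above Nyquist.
# d = 1 gives f0 = 0.5; the frequency f = 0.8 aliases to |f - 1/d| = 0.2 at
# full strength under point sampling. Averaging over an aperture a = 1/f
# (chosen so sinc(f*a) = 0) suppresses it almost completely.

d = 1.0                  # sample spacing -> Nyquist f0 = 0.5
f = 0.8                  # frequency above f0
x = d * np.arange(256)   # sample positions

point = np.sin(2 * np.pi * f * x)   # point samples (fully aliased)

a = 1.0 / f                          # aperture width, one full period of f
t = np.linspace(-a / 2, a / 2, 201)  # quadrature grid across the aperture
integrated = np.mean(np.sin(2 * np.pi * f * (x[:, None] + t[None, :])), axis=1)

# The aliased component survives point sampling but not the aperture blur:
assert np.abs(point).max() > 0.9
assert np.abs(integrated).max() < 0.05
```

The choice $a = 1/f$ is the idealized best case; a smaller aperture attenuates less, which is why some explicit antialiasing generally remains necessary.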
By taking this into account, we can use milder antialiasing. As the antialiasing filter for an array of pinhole lenses is a sinc filter, we define the antialiasing kernel size as this filter's first zero crossing, i.e., $\frac{1}{2 f_0 |\lambda|}$. The correct amount of antialiasing is readily obtained by comparing this size with the blur radius $b$. Then, the final antialiasing filter has a radius approximated as $\big|\frac{1}{2 f_0 |\lambda|} - b\big|$, clipped from below at $0$ and from above by $\frac{d}{2}$. Fig. 11 shows the resulting filter sizes for the settings used in Section 8.1.2.

Fig. 11. Microlens blur and antialiasing filter sizes versus depth. (a) Overlap of filter kernel size and microlens blur radius for different disparity (depth) values. (b) Resulting antialiasing kernel size for different depth values.

6.5 Regularized Depth Estimation

We now have all the necessary ingredients to work on the energy introduced in (3). The depth map $s$ is discretized at the $c_k$ as a vector $\mathbf{s} = \{s(u)\}_{u \in \{c_k, \forall k\}}$. Due to the ill-posedness of the problem, we introduce regularization, favoring piecewise-constant solutions by using the total variation term $\|\nabla s(u)\|_1$, where $\nabla$ is the 2D gradient with respect to $u$. Hence, we wish to solve

    $\tilde{\mathbf{s}} = \arg\min_{\mathbf{s}}\; E_{\mathrm{data}}(\mathbf{s}) + \alpha \,\|\nabla s(u)\|_1,$    (21)

where $\alpha > 0$ determines the tradeoff between regularization and data fidelity (in our experiments we chose $\alpha = 10^{-3}$). We minimize this energy by using an iterative solution. By noticing that $E_{\mathrm{data}}$ can be written as a sum of terms each depending on a single entry of $\mathbf{s}$ at a time, we find an initialization $\mathbf{s}^0$ by performing a fast brute-force search in $E_{\mathrm{data}}$ for each $c_k$ independently.
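The per-lenslet brute-force initialization can be sketched as follows. The matching cost used here (sum of squared differences against the central view, with linear interpolation for subpixel shifts) is a hypothetical stand-in for the paper's data term, and all scene values are synthetic:

```python
import numpy as np

# Because the data term decomposes over individual entries of s, each s(c_k)
# can be initialized by an independent brute-force search over a discrete set
# of candidate disparities. The SSD-to-central-view cost below is a stand-in
# for E_data, for illustration only.

def brute_force_init(views, qs, candidates):
    """views: dict q -> 1D view; qs: view indices; candidates: disparities."""
    v0 = views[0]
    k = np.arange(len(v0), dtype=float)
    costs = np.empty((len(candidates), len(v0)))
    for i, s in enumerate(candidates):
        c = np.zeros(len(v0))
        for q in qs:
            if q == 0:
                continue
            # V_q sampled at k + s*q, i.e., the matching relation of (19).
            shifted = np.interp(k + s * q, k, views[q])
            c += (shifted - v0) ** 2
        costs[i] = c
    # Per-lenslet argmin over the candidate disparities.
    return np.asarray(candidates)[np.argmin(costs, axis=0)]

# Synthetic test: constant ground-truth disparity s_true = 2.
s_true = 2
k = np.arange(64, dtype=float)
r = lambda x: np.sin(0.3 * x)
views = {q: r(k - s_true * q) for q in (-2, -1, 0, 1, 2)}
s0 = brute_force_init(views, (-2, -1, 0, 1, 2), [0.0, 1.0, 2.0, 3.0])
# Away from the boundaries, the search recovers the true disparity:
assert np.all(s0[8:56] == 2.0)
```

The subsequent Newton-style refinement then only needs to correct this coarse initialization, which keeps the candidate grid small.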
Then, we approximate $E_{\mathrm{data}}$ via a second-order Taylor expansion, i.e.,

    $E_{\mathrm{data}}(\mathbf{s}_{t+1}) \simeq E_{\mathrm{data}}(\mathbf{s}_t) + \nabla E_{\mathrm{data}}(\mathbf{s}_t)\,(\mathbf{s}_{t+1} - \mathbf{s}_t) + \frac{1}{2}\,(\mathbf{s}_{t+1} - \mathbf{s}_t)^T H_{E_{\mathrm{data}}}(\mathbf{s}_t)\,(\mathbf{s}_{t+1} - \mathbf{s}_t),$    (22)

where $\nabla E_{\mathrm{data}}$ and $H_{E_{\mathrm{data}}}$ are the gradient and Hessian of $E_{\mathrm{data}}$, and the subscripts $t$ and $t+1$ denote the iteration number. To ensure that our local approximation is convex, we take the absolute value (componentwise) of $H_{E_{\mathrm{data}}}(\mathbf{s}_t)$. In the case of the term $\|\nabla s(u)\|_1$, we use a first-order Taylor expansion of its gradient. Computing the Euler-Lagrange equations of the approximate energy with respect to $\mathbf{s}_{t+1}$ under this linearization results in

    $\nabla E_{\mathrm{data}}(\mathbf{s}_t) + |H_{E_{\mathrm{data}}}(\mathbf{s}_t)|\,(\mathbf{s}_{t+1} - \mathbf{s}_t) - \alpha\,\nabla \cdot \left( \dfrac{\nabla \mathbf{s}_{t+1}}{|\nabla \mathbf{s}_t|} \right) = 0,$    (23)

which is a linear system in the unknown $\mathbf{s}_{t+1}$, and can be solved efficiently using conjugate gradients (CG).

7 LIGHT FIELD SUPERRESOLUTION

So far, we have devised an algorithm to reduce aliasing in the views and to estimate the depth map. We now define a computational PSF model, and formulate the MAP problem presented in Section 3.

7.1 Light Field Camera Point Spread Function

7.1.1 PSF Definition

Combining the analysis from Sections 4 and 5, we can determine the system PSF of the plenoptic camera, $h^{\mathrm{LI}}_s$ — which is unique for each point in 3D space, and will be a combination of main lens and microlens array blurs. We define this PSF such that the intensity at a pixel $i$ caused by a unit radiance point at $u$ with a disparity $s(u)$ is