IEEE COMSOC MMTC E-Letter

Wijnand A. IJsselsteijn is Associate Professor at the Human-Technology Interaction group of Eindhoven University of Technology in The Netherlands. He has a background in psychology and artificial intelligence, with an MSc in cognitive neuropsychology from Utrecht University, and a PhD in media psychology/HCI from Eindhoven University of Technology on the topic of telepresence. His current research interests include social digital media, immersive and stereoscopic media technology, and digital gaming. His focus is on conceptualizing and measuring human experiences, including social perception and self-perception, in relation to these advanced media. Wijnand is significantly involved in various industry- and government-sponsored projects on awareness systems, enriched communication media, digital games, and autostereoscopic display technologies. He is associate director of the Media, Interface, and Network Design labs, co-founder and co-director of the Game Experience Lab (http://www.gamexplab.nl), and director of the 3D/ice Lab at Eindhoven University of Technology. He has published over 120 peer-reviewed journal and conference papers, and co-edited five books.

http://www.comsoc.org/~mmc/ 18/41 Vol.4, No.7, August 2009
Three-Dimensional Video Capture and Analysis
Cha Zhang, Microsoft Research, USA
chazhang@microsoft.com

Three-dimensional movies have regained a great deal of interest recently, with companies such as Walt Disney and DreamWorks Animation investing heavily in making 3D films. While it is relatively easy to render stereoscopic images from graphical models, many other applications, such as 3D TV broadcasting, free-viewpoint 3D video, and 3D teleconferencing, require capturing 3D video content with camera arrays. In this short article, we briefly review the techniques required for 3D video capture and analysis, which serve as the front end for many 3D video applications.

Considering that most 3D displays today (including 3D films) need only two slightly different views to be sent to the left and right eyes of the user, the simplest 3D video capture front end is a stereo camera pair. To create satisfactory 3D images, one needs to be careful about a few things. First, the cameras need to be synchronized. Camera synchronization used to require a common external trigger; nowadays, when the number of cameras is small, one can simply daisy-chain a few IEEE 1394 FireWire cameras on a common bus, and the cameras will be synchronized automatically. The second important issue is camera calibration and image rectification [1]. For human eyes to perceive 3D content comfortably, the two video streams must be rectified. Rectification is a transformation that projects images from multiple cameras onto a common imaging plane, such that the epipolar lines are aligned horizontally. In other words, when a point in the 3D scene is projected into the two cameras, the projected pixels must lie on the same horizontal line. Image rectification requires careful calibration of the cameras. The most popular camera calibration algorithm today was introduced by Zhang [2].
Bouguet provides a nice MATLAB implementation of the same algorithm [3], which also includes a routine to rectify an image pair.

The data format for stereoscopic displays is not limited to image pairs. As part of the European Information Society Technologies (IST) project "Advanced Three-Dimensional Television System Technologies" (ATTEST), Fehn et al. [4] proposed using image plus depth as the capture and transmission format for stereoscopic video, which is now the default 3D video format for some 3D display manufacturers such as Philips. Image plus depth is an evolutionary format: it has low overhead in terms of bitrate (because the depth map can be compressed as a grayscale image) and is backward compatible with 2D video standards. The depth map can be created manually for legacy videos, computed from a stereo camera pair, or captured directly by depth sensors. The main challenge in using the image-plus-depth format is that the left and right views to be displayed still need to be synthesized. For scenes with many occlusions, this task is non-trivial, since some of the occluded regions in the synthesized views are not visible in the 2D image. Various algorithms have been proposed in the literature for this hole-filling challenge [5]. Fortunately, for most users this is not a concern, as it is handled internally by the 3D display manufacturers.

It is worth spending a few additional words on depth sensors. Triangulation and time-of-flight are the two most popular mechanisms for depth sensing. In triangulation, a stripe pattern is projected onto the scene and captured by a camera positioned at a distance from the projector. To avoid contaminating the color and texture of the scene, the projector can periodically switch among a few stripe patterns so that temporal averaging yields a uniform illumination. Another possibility is to use invisible-spectrum light, such as infrared light sources, for the stripe pattern.
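The geometry behind triangulation-based depth sensing reduces to similar triangles: once a correspondence between the two viewpoints (camera/camera or camera/projector) is established, depth follows from the baseline, the focal length, and the disparity. A toy sketch, with all numbers made up for illustration:

```python
import numpy as np

# Depth from triangulation for a rectified pair:
#     Z = f * B / d
# where f is the focal length in pixels, B the baseline between the two
# viewpoints, and d the disparity (horizontal offset between corresponding
# points). The same relation governs a camera/projector structured-light
# setup once stripe correspondences are decoded.
f = 800.0        # focal length in pixels (assumed)
B = 0.06         # 60 mm baseline, in meters (assumed)
disparity = np.array([48.0, 24.0, 12.0])  # matched offsets, in pixels

depth = f * B / disparity  # depth in meters for each correspondence
```

Note the inverse relationship: halving the disparity doubles the estimated depth, which is why triangulation accuracy degrades with distance.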
Time-of-flight depth sensors can be roughly divided into two main categories: pulsed wave and continuous modulated wave. Pulsed-wave sensors measure the round-trip delay directly, while continuous-modulated-wave sensors measure the phase shift between the emitted and received laser beams to determine the scene depth. One example of pulsed-wave sensors is the family of 3D terrestrial laser scanner systems manufactured by Riegl (http://www.riegl.com/). Continuous-modulated-wave sensors include the SwissRanger (http://www.swissranger.ch/) and the ZCam depth camera from 3DV Systems (http://www.3dvsystems.com/).
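For the continuous-modulated-wave case, the phase-to-distance conversion is a standard relation; the modulation frequency and measured phase below are illustrative values, not the parameters of any particular sensor:

```python
import math

# Continuous-wave time-of-flight: the sensor measures the phase shift
# dphi between the emitted and received signals at modulation frequency
# f_mod. The round trip covers twice the distance, so
#     d = c * dphi / (4 * pi * f_mod)
c = 299_792_458.0    # speed of light, m/s
f_mod = 20e6         # 20 MHz modulation frequency (assumed)
dphi = math.pi / 2   # measured phase shift, radians (assumed)

d = c * dphi / (4 * math.pi * f_mod)   # estimated distance, ~1.87 m

# The phase wraps at 2*pi, so distances are only unambiguous up to
#     d_max = c / (2 * f_mod)
d_max = c / (2 * f_mod)                # ~7.49 m at 20 MHz
```

The wrap-around term explains a practical trade-off in these sensors: raising the modulation frequency improves depth resolution but shrinks the unambiguous measurement range.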