IEEE COMSOC MMTC R-Letter

Does Kinect Provide a Simple and Cheap Solution for Telepresence?

A short review of "Enhanced Personal Autostereoscopic Telepresence System Using Commodity Depth Cameras"

Edited by Irene Cheng

Andrew Maimone, Jonathan Bidwell, Kun Peng and Henry Fuchs, "Enhanced Personal Autostereoscopic Telepresence System Using Commodity Depth Cameras", Elsevier: Computers & Graphics 36 (2012) 791-807.

Telepresence technology enables a user to feel engaged as if he/she were part of the virtual scene. "Telepresence" and "Kinect" are often associated when either keyword is searched on the Internet. Applications ranging from the entertaining "Kinect Star Wars" to Kinect-based medical image exploration and collaborative telepresence in social settings have become commonplace since the launch of the Kinect depth sensor in 2010.

The paper presents a low-cost Kinect-based telepresence system that offers real-time 3D scene capture and head-tracked stereo 3D display without requiring the user to wear any eyewear. The system is an enhancement of the authors' previous version published at ISMAR 2011 [1]. Quite a few telepresence systems have been developed before, but none of them is based on Kinect or addresses the many issues associated with an array of depth sensors. The appealing features of Kinect lie in its low cost and simplicity. The paper shows a way to build a cheap telepresence system in a domain where such systems have traditionally been expensive. However, Kinect also brings unique challenges, including various artifacts in the depth maps, such as holes and noise, and interference among multiple Kinects.
Many existing techniques are employed or adapted by the authors for denoising, hole filling, smoothing, data merging, surface generation, color correction and head tracking. The system takes advantage of a fully GPU-accelerated data processing and rendering pipeline. The main contribution lies in the integration of various existing techniques to deliver a workable solution. The complete software and hardware framework for implementing the system is presented, including the GPU acceleration.

The Introduction gives a flavor of the evolution of 3D data acquisition using depth cameras and visualization using eyewear in telepresence environments since the late 1990s. The proposed system is based on the inexpensive Microsoft Kinect sensor, which provides a 58° × 45° field of view with high depth accuracy. After proper sensor calibration, an entire room-sized scene can be captured in real time. By combining a 2D eye detection technique with depth data, Kinect is able to offer a markerless tracking solution.

However, the authors encountered several challenges in using Kinect sensors to implement their system. Inter-unit interference is a major problem because each sensor projects a fixed structured light pattern at similar wavelengths. There is also difficulty in presenting a seamless integration of color-matched data between cameras. The enhancements therefore include a software solution to the Kinect interference problem, a visibility-based method to merge data between cameras, and dynamic color matching between color-plus-depth cameras. The hardware configuration and software implementation are detailed in the paper. Interested readers can refer to Section 4.2 to understand how the multi-Kinect interference problem is addressed. Color matching is a common problem in many camera systems.
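To give a flavor of the kind of depth-map cleanup mentioned above, the hole-filling step can be sketched as follows. This is not the authors' GPU-accelerated pipeline, only a minimal CPU illustration of one common approach (replacing missing depth readings with a neighborhood median); the function name and window size are my own choices:

```python
import numpy as np

def fill_depth_holes(depth, window=2):
    """Fill zero-valued (missing) depth pixels with the median of the
    valid readings inside a (2*window+1) x (2*window+1) patch."""
    filled = depth.astype(float).copy()
    holes = np.argwhere(depth == 0)          # Kinect reports 0 where depth is unknown
    for y, x in holes:
        patch = depth[max(0, y - window):y + window + 1,
                      max(0, x - window):x + window + 1]
        valid = patch[patch > 0]             # ignore other holes inside the patch
        if valid.size:
            # the median is robust to the speckle noise at object edges
            filled[y, x] = np.median(valid)
    return filled
```

A real-time system would of course run such a filter per-pixel on the GPU rather than looping over holes in Python.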
Even devices of the same camera model often exhibit different color gamuts [2], and so do Kinect sensors. The Kinect driver available at the time of this paper allows only automatic color and exposure control, so color values can vary dramatically between adjacent sensors. Here the authors argue that applying traditional color matching techniques is ineffective because the automatic control may alter color balances. They instead use depth information to find color correspondences between cameras and build a color matching function. Details are described in Section 4.6.

Another enhancement explored in this paper relates to eye position tracking accuracy, speed and latency, described in Section 4.7. A comparison of results shows the good performance of the proposed telepresence system. In the Conclusion, the authors point out that although the system is functional, the output

http://committees.comsoc.org/mmc 6/22 Vol.4, No.4, August 2013
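The idea of a color matching function built from correspondences can be illustrated with a simplified sketch. The paper's actual method (Section 4.6) derives the correspondences from depth reprojection between cameras; here I assume the corresponding RGB samples are already available and fit only a per-channel linear (gain/offset) map, which is a simplification of mine, not the authors' exact model:

```python
import numpy as np

def fit_color_transfer(src_colors, ref_colors):
    """Fit a per-channel linear map (gain, offset) taking the source
    camera's colors toward the reference camera's, from Nx3 arrays of
    RGB values sampled at corresponding surface points."""
    params = np.empty((3, 2))
    for c in range(3):
        # least-squares fit: ref ~ gain * src + offset
        params[c] = np.polyfit(src_colors[:, c], ref_colors[:, c], 1)
    return params

def apply_color_transfer(image, params):
    """Apply the fitted per-channel map, rounding and clamping to 8-bit range."""
    out = image.astype(float)
    for c in range(3):
        gain, offset = params[c]
        out[..., c] = out[..., c] * gain + offset
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Because the correspondences come from geometry rather than from image features, this kind of matching keeps working even when each camera's automatic exposure drifts independently.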
image quality still needs improvement, in particular the temporal noise artifacts present at the edges of objects at the depth-pixel level. Beyond its technical discussions and computational analysis, what I like about this paper is its clarity and readability. A short overview suitable for general readers is given at the beginning of each section, followed by a more in-depth explanation. The developed system shows a promising way to bring telepresence to common users, which will stimulate more subsequent multimedia communication research.

Acknowledgement:
This paper is nominated by Jianfei Cai of the MMTC 3D Processing, Rendering and Communication (3DPRC) Interest Group.

References:
[1] Maimone A, Fuchs H. Encumbrance-free telepresence system with real-time 3D capture and display using commodity depth cameras. In: Tenth IEEE international symposium on mixed and augmented reality (ISMAR); 2011. p. 137-46. http://dx.doi.org/10.1109/ISMAR.2011.6092379
[2] Ilie A, Welch G. Ensuring color consistency across multiple cameras. In: Proceedings of the tenth IEEE international conference on computer vision - volume 2. ICCV '05; Washington, DC, USA: IEEE Computer Society; 2005, p. 1268-75. ISBN 0-7695-2334-X-02. http://dx.doi.org/10.1109/ICCV.2005.88

Irene Cheng, SMIEEE is the Scientific Director of the Multimedia Research Centre, and an Adjunct Professor in the Faculty of Science, as well as the Faculty of Medicine & Dentistry, University of Alberta, Canada. She is also a Research Affiliate with the Glenrose Rehabilitation Hospital in Alberta, Canada.
She is a Co-Chair of the IEEE SMC Society Human Perception in Vision, Graphics and Multimedia Technical Committee; was the Chair of the IEEE Northern Canada Section Engineering in Medicine and Biological Science (EMBS) Chapter (2009-2011), and the Chair of the IEEE Communication Society Multimedia Technical Committee (MMTC) 3D Processing, Rendering and Communication Interest Group (2010-2012). She is now the Director of the Review-Letter Editorial Board of MMTC (2012-2014).

Over the last ten years, she has produced more than 110 international peer-reviewed publications, including 2 books and 31 journal papers. Her research interests include multimedia communication techniques, Quality of Experience (QoE), Levels-of-Detail, 3D graphics visualization and perceptual quality evaluation. In particular, she introduced applying a human perception measure, the Just-Noticeable-Difference, following psychophysical methodology to generate multi-scale 3D models.