IEEE COMSOC MMTC R-Letter

How to Analyze and Optimize the Encoding Latency for Multiview Video Coding

A short review for “A Framework for the Analysis and Optimization of Encoding Latency for Multiview Video”

Edited by Christian Timmerer

P. Carballeira, J. Cabrera, A. Ortega, F. Jaureguizar and N. García, “A Framework for the Analysis and Optimization of Encoding Latency for Multiview Video”, IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 5, pp. 583–596, Sep. 2012.

Multiview video with additional scene geometry information, such as depth maps, is a widely adopted data format to enable key functionalities in new visual media systems, such as 3D Video (3DV) and Free Viewpoint Video (FVV) [1]. Given that the data size of multiview video grows linearly with the number of cameras, while the available bandwidth is generally limited, new schemes for the efficient compression of multiview video [2] and additional data [3] have been under investigation in recent years.

The authors argue that the design of multiview prediction structures for multiview video coding [4] has mostly focused on improving rate-distortion (RD) performance, ignoring important differences in the latency behavior of the resulting codecs. These differences in latency may be critical for delay-constrained applications such as immersive video conferencing, in which the end-to-end delay, the communication latency, needs to be kept low in order to preserve interactivity [5]. In hybrid video encoders there is a clear trade-off between RD performance and encoding delay, mainly due to the use of backward prediction and hierarchical prediction structures.
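To make this trade-off concrete, consider a small single-view illustration (the GOP layouts, frame period, and delay measure below are assumptions for illustration, not taken from the paper): a frame that uses backward prediction cannot be encoded before its future reference has even been captured, so the structural encoding delay grows with the reordering depth of the prediction structure.

```python
# Illustrative comparison of the structural encoding delay of two
# single-view prediction structures. Numbers are hypothetical; a frame
# period of 40 ms (25 fps) is assumed.

FRAME_PERIOD_MS = 40.0

def structural_delay_ms(refs):
    """refs[i] = display indices of the frames that frame i predicts from.
    A frame cannot start encoding before its latest reference has been
    captured, so its delay is the capture-time gap to that reference
    (zero if no reference lies in the future)."""
    delay = 0.0
    for i, r in refs.items():
        latest = max([i] + list(r))  # latest frame that must be captured
        delay = max(delay, (latest - i) * FRAME_PERIOD_MS)
    return delay

# IPPP...: every frame predicts only from its past neighbor -> no waiting.
ippp = {0: [], 1: [0], 2: [1], 3: [2], 4: [3]}

# Hierarchical-B GOP of size 4: frame 2 references frame 4, so it must
# wait two frame periods for frame 4 to be captured.
hier_b = {0: [], 4: [0], 2: [0, 4], 1: [0, 2], 3: [2, 4]}

print(structural_delay_ms(ippp))    # 0.0
print(structural_delay_ms(hier_b))  # 80.0
```

This only counts capture-time waiting, not processing time, but it already shows why low-delay applications favor forward-only prediction structures.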
In single-view video encoders, the encoding delay can be easily estimated and reduced by simple decisions in the design of prediction structures. The analysis of the encoding delay in the case of multiview video is more challenging, as it requires handling more complex dependency structures than in single-view video, including not only temporal but also inter-view prediction. Additionally, the encoder may have to manage the encoding of several frames at the same time (frames from several views), due to the inherent parallel nature of multiview video, so the characteristics of multi-processor hardware platforms play a significant role in the analysis.

In this paper, the authors propose a general framework for the characterization of the encoding latency in multiview encoders that captures the influence of 1) the prediction structure and 2) the hardware encoder model. This framework allows a systematic analysis of the encoding latency for arbitrary multiview prediction structures in a multiview encoder. The primary element of the proposed framework is an encoding latency model based on graph theory algorithms that assumes that the processing capacity of the encoder is essentially unbounded: the directed acyclic graph encoding latency (DAGEL) model. It can be seen as a task scheduling model [6] (the encoding of a frame is the task unit) that is used to compute the encoding latency rather than the schedule length. The paper also demonstrates that, despite the assumption of unbounded processing capacity, the encoding latency values obtained with the DAGEL model are accurate for multiview encoders with a finite number of processors greater than a required minimum, which can be identified.
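The core idea of a DAG-based latency model can be sketched as follows (an illustrative reconstruction, not the authors' code or exact formulation): frames are nodes, prediction dependencies are edges, and with unbounded processors the finish time of a frame is set by the longest dependency chain leading to it. The two-view structure, capture interval, and uniform per-frame processing time below are all assumptions.

```python
# Sketch of a DAGEL-style encoding-latency computation over a frame
# dependency DAG (hypothetical structure and timings).

from functools import lru_cache

T_FRAME = 40.0  # capture interval in ms (25 fps), assumed
T_PROC = 10.0   # per-frame encoding time in ms, assumed uniform

# deps[frame] = frames it predicts from (temporal and inter-view).
# Two views over three instants; view v1 predicts inter-view from v0.
deps = {
    ("v0", 0): [],
    ("v0", 1): [("v0", 0)],
    ("v0", 2): [("v0", 1)],
    ("v1", 0): [("v0", 0)],
    ("v1", 1): [("v1", 0), ("v0", 1)],
    ("v1", 2): [("v1", 1), ("v0", 2)],
}

def capture_time(frame):
    _, t = frame
    return t * T_FRAME

@lru_cache(maxsize=None)
def finish_time(frame):
    """Earliest time the frame is fully encoded, assuming unbounded
    processors: encoding starts once the frame has been captured and
    all of its references have finished encoding."""
    start = max([capture_time(frame)] +
                [finish_time(r) for r in deps[frame]])
    return start + T_PROC

# Encoding latency of the structure: worst-case gap between a frame's
# capture and the end of its encoding, i.e. the longest path in the DAG
# measured against capture times.
latency = max(finish_time(f) - capture_time(f) for f in deps)
print(latency)  # 20.0 for this toy structure
```

With a finite processor pool the computed value becomes a lower bound, matching the paper's observation that the unbounded-capacity assumption is exact only above a minimum processor count.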
Otherwise, the results provided by the DAGEL model represent a lower bound on the actual encoding latency of the encoder.

As an example of the applications of the DAGEL model, the authors show how it can be used to reduce the encoding latency of a given multiview prediction structure to meet a target value while preserving the RD performance as much as possible. In this approach, the objective is to prune the minimum number of frame dependencies (those that introduce a higher encoding delay in the original structure) until the latency target is achieved. Therefore, the degradation of RD performance due to the removal of prediction dependencies is limited. Finally, the authors demonstrate that the pruned prediction structures still produce a minimum encoding latency, as compared to other pruning options, even on hardware platform models that
do not meet the minimum requirements of the DAGEL model in terms of the number of processors.

Following this research direction, future work includes the extension of this framework to multiview decoders and the use of graph models to analyze the delay behavior in more realistic encoder/decoder hardware architectures [7].

This paper is nominated by Cha Zhang of the MMTC 3D Processing, Rendering and Communication (3DPRC) Interest Group.

References:
[1] P. Merkle, K. Mueller, and T. Wiegand, “3D video: acquisition, coding, and display,” IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 946–950, 2010.
[2] A. Vetro, T. Wiegand, and G. Sullivan, “Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard,” Proceedings of the IEEE, vol. 99, no. 4, pp. 626–642, Apr. 2011.
[3] ISO/IEC JTC1/SC29/WG11, “Call for Proposals on 3D Video Coding Technology,” MPEG output doc. N12036, Geneva, Switzerland, Mar. 2011.
[4] P. Merkle, A. Smolic, K. Müller, and T. Wiegand, “Efficient prediction structures for multiview video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1461–1473, Nov. 2007.
[5] G. Karlsson, “Asynchronous transfer of video,” IEEE Communications Magazine, vol. 34, no. 8, pp. 118–126, Aug. 1996.
[6] Y.-K. Kwok and I. Ahmad, “Static scheduling algorithms for allocating directed task graphs to multiprocessors,” ACM Computing Surveys, vol. 31, no. 4, pp. 406–471, Dec. 1999.
[7] P. Carballeira, J. Cabrera, F. Jaureguizar and N.
García, “Systematic Analysis of the Decoding Delay in Multiview Video”, Journal of Visual Communication and Image Representation, Special Issue on Advances in 3D Video Processing, in press (doi: 10.1016/j.jvcir.2013.04.004).

Christian Timmerer is an assistant professor at the Institute of Information Technology (ITEC), Alpen-Adria-Universität Klagenfurt, Austria. His research interests include immersive multimedia communication, streaming, adaptation, and Quality of Experience, with more than 100 publications in this domain. He was the general chair of WIAMIS’08, ISWM’09, EUMOB’09, AVSTP2P’10, WoMAN’11, and QoMEX’13, and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, QUALINET, and SocialSensor. He also participated in ISO/MPEG work for several years, notably in the areas of MPEG-21, MPEG-M, MPEG-V, and DASH/MMT. He received his PhD in 2006 from the Alpen-Adria-Universität Klagenfurt. Publications and MPEG contributions can be found at research.timmerer.com; follow him on twitter.com/timse7 and subscribe to his blog at blog.timmerer.com.

http://committees.comsoc.org/mmc 5/22 Vol.4, No.4, August 2013