R-letter of August 2013 - IEEE Communications Society
IEEE COMSOC MMTC R-Letter

Message from the Review Board

Introduction

Since the launch of the R-Letter in October 2010, there have been fifteen publications. Credit goes to all contributors. To deliver the R-Letter on a timely bi-monthly schedule, the Review Board needs to maintain a pool of nominated papers so that board members have sufficient time to complete the review and editorial process. We therefore invite the MMTC community to participate actively in the nomination process; please refer to the Paper Nomination Policy at the end of this issue. Nominators of reviewed articles will be acknowledged in the respective R-Letter.

The Review Board aims at recommending recent (within the last year and a half) state-of-the-art and emerging publications of general interest to the MMTC community. In this issue, the two distinguished articles discuss multiview video encoding and telepresence based on an array of Kinect sensors.

Distinguished Category

The need for applications and services ranging from stereoscopic telepresence to multiview-encoded content is growing rapidly. However, it is not yet clear how such services can be deployed with commodity hardware like the Kinect. For multiview video encoding, the actual encoding delay is an important issue that calls for analysis and optimization.

The first paper, published in the IEEE Journal of Selected Topics in Signal Processing, provides a framework for the analysis and optimization of the encoding latency for multiview video. The second paper, published in Elsevier Computers & Graphics, shows how to use commodity depth cameras to provide enhanced personal autostereoscopic telepresence.

Regular Category

As more and more social data become available, how to extract and analyze visual signals and images has been studied extensively in the literature. How to allocate resources for these services is another important area of study. In this issue, the regular category assembles six papers on these topics. The first paper, published in the IEEE Transactions on Signal Processing, proposes the construction of two-channel wavelet filter banks for analyzing graph signals. The second paper, from the IEEE Transactions on Multimedia, proposes a blind resource allocation scheme that takes into account fairness among the users; the authors derive the convergence time of the proposed scheme and show that it provides almost the same MOS value as the optimum solution that knows the QoE model in advance. In the third paper, published in the IEEE Transactions on Multimedia, the authors provide a generalized framework for optimizing the resources needed to support real-time IPTV services in a virtualized architecture, which exploits the different deadlines associated with each service to multiplex these services effectively through time-shifted scheduling. The fourth paper, the best paper of IEEE ICME 2013, proposes a novel scheme to automatically summarize and depict human movements from 2D videos without 3D motion capture or manually labeled data. The fifth paper, the best student paper of IEEE ICME 2013, proposes a unified model to automatically identify visual concepts and estimate their visual characteristics, or visualness, from a large-scale image dataset. The sixth paper, published in IEEE Wireless Communications, proposes a cross-layer optimization framework for cooperative video summary transmission.

We would like to thank all the authors, nominators, reviewers, editors and others who contributed to the release of this issue.

http://committees.comsoc.org/mmc 2/22 Vol.4, No.4, August 2013


IEEE ComSoc MMTC R-Letter

Director:
Irene Cheng, University of Alberta, Canada
Email: locheng@ualberta.ca

Co-Director:
Weiyi Zhang, AT&T Research, USA
Email: maxzhang@research.att.com

Co-Director:
Christian Timmerer, Alpen-Adria-Universität Klagenfurt, Austria
Email: christian.timmerer@itec.aau.at


How to Analyze and Optimize the Encoding Latency for Multiview Video Coding

A short review for "A Framework for the Analysis and Optimization of Encoding Latency for Multiview Video"

Edited by Christian Timmerer

P. Carballeira, J. Cabrera, A. Ortega, F. Jaureguizar and N. García, "A Framework for the Analysis and Optimization of Encoding Latency for Multiview Video", IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 5, pp. 583-596, Sep. 2012.

Multiview video with additional scene geometry information, such as depth maps, is a widely adopted data format for enabling key functionalities in new visual media systems, such as 3D Video (3DV) and Free Viewpoint Video (FVV) [1]. Given that the data size of multiview video grows linearly with the number of cameras, while the available bandwidth is generally limited, new schemes for the efficient compression of multiview video [2] and additional data [3] have been under investigation in recent years.

The authors argue that the design of multiview prediction structures for multiview video coding [4] has mostly focused on improving rate-distortion (RD) performance, ignoring important differences in the latency behavior of the resulting codecs. These differences in latency may be critical for delay-constrained applications such as immersive video conferencing, in which the end-to-end delay, i.e., the communication latency, needs to be kept low in order to preserve interactivity [5]. In hybrid video encoders there is a clear trade-off between RD performance and encoding delay, mainly due to the use of backward prediction and hierarchical prediction structures. In single-view video encoders, the encoding delay can be easily estimated and reduced through simple decisions in the design of prediction structures.

The analysis of the encoding delay is more challenging for multiview video, as it requires handling more complex dependency structures than in single-view video, including not only temporal but also inter-view prediction. Additionally, because of the inherent parallel nature of multiview video, the encoder may have to manage the encoding of several frames (from several views) at the same time, so the characteristics of multi-processor hardware platforms play a significant role in the analysis.

In this paper, the authors propose a general framework for the characterization of the encoding latency in multiview encoders that captures the influence of 1) the prediction structure and 2) the hardware encoder model. This framework allows a systematic analysis of the encoding latency for arbitrary multiview prediction structures in a multiview encoder. The primary element of the proposed framework is an encoding latency model based on graph-theory algorithms that assumes the processing capacity of the encoder to be essentially unbounded: the directed acyclic graph encoding latency (DAGEL) model. It can be seen as a task-scheduling model [6] (with the encoding of a frame as the task unit) that is used to compute the encoding latency rather than the schedule length. The paper also demonstrates that, despite the assumption of unbounded processing capacity, the encoding latency values obtained with the DAGEL model are accurate for multiview encoders with a finite number of processors greater than a required minimum, which can be identified.
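The core of such a DAG-based model can be sketched as a longest-path computation over the frame-dependency graph. The following minimal illustration is not the authors' actual framework: the frames, dependency structure and unit encoding time are all hypothetical, and it only shows how the earliest finish time of each frame follows from its references under the unbounded-processor assumption.

```python
# Minimal sketch: encoding latency as the longest path in a frame-dependency
# DAG. Nodes are (view, time) frames; a frame can start encoding only after
# all of its reference frames have finished. Everything here is illustrative.

from functools import lru_cache

# dependencies[frame] = frames that must be encoded before this one
dependencies = {
    ("v0", 0): [],
    ("v0", 1): [("v0", 0)],             # temporal prediction
    ("v1", 0): [("v0", 0)],             # inter-view prediction
    ("v1", 1): [("v1", 0), ("v0", 1)],  # temporal + inter-view
}
ENC_TIME = 1.0  # assumed per-frame encoding time (arbitrary units)

@lru_cache(maxsize=None)
def finish_time(frame):
    """Earliest completion time with unbounded processors (DAGEL assumption)."""
    deps = dependencies[frame]
    start = max((finish_time(d) for d in deps), default=0.0)
    return start + ENC_TIME

latency = max(finish_time(f) for f in dependencies)
print(latency)  # longest chain v0t0 -> v0t1 -> v1t1 gives 3.0
```

With enough processors, every frame starts as soon as its references finish, so the overall latency equals the weight of the longest dependency chain; this is also why pruning an edge on that chain reduces latency at some RD cost.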
Otherwise, the results provided by the DAGEL model represent a lower bound on the actual encoding latency of the encoder.

As an example of the applications of the DAGEL model, the authors show how it can be used to reduce the encoding latency of a given multiview prediction structure to meet a target value while preserving the RD performance as much as possible. In this approach, the objective is to prune the minimum number of frame dependencies (those that introduce the largest encoding delay in the original structure) until the latency target is achieved; the degradation of RD performance due to the removal of prediction dependencies is therefore limited. Finally, the authors demonstrate that the pruned prediction structures still produce a minimum encoding latency, as compared to other pruning options, even in hardware platform models that


do not meet the minimum requirement on the number of processors of the DAGEL model.

Following this research direction, future work includes the extension of this framework to multiview decoders and the use of graph models to analyze the delay behavior in more realistic encoder/decoder hardware architectures [7].

Acknowledgement:
This paper is nominated by Cha Zhang of the MMTC 3D Processing, Rendering and Communication (3DPRC) Interest Group.

References:
[1] P. Merkle, K. Mueller, and T. Wiegand, "3D video: acquisition, coding, and display," IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 946–950, 2010.
[2] A. Vetro, T. Wiegand, and G. Sullivan, "Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard," Proceedings of the IEEE, vol. 99, no. 4, pp. 626–642, Apr. 2011.
[3] ISO/IEC JTC1/SC29/WG11, "Call for Proposals on 3D Video Coding Technology," MPEG output doc. N12036, Geneva, Switzerland, Mar. 2011.
[4] P. Merkle, A. Smolic, K. Müller, and T. Wiegand, "Efficient prediction structures for multiview video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1461–1473, Nov. 2007.
[5] G. Karlsson, "Asynchronous transfer of video," IEEE Communications Magazine, vol. 34, no. 8, pp. 118–126, Aug. 1996.
[6] Y.-K. Kwok and I. Ahmad, "Static scheduling algorithms for allocating directed task graphs to multiprocessors," ACM Computing Surveys, vol. 31, no. 4, pp. 406–471, Dec. 1999.
[7] P. Carballeira, J. Cabrera, F. Jaureguizar and N.
García, "Systematic Analysis of the Decoding Delay in Multiview Video", Journal of Visual Communication and Image Representation, Special Issue on Advances in 3D Video Processing, (in press) (doi: 10.1016/j.jvcir.2013.04.004).

Christian Timmerer is an assistant professor at the Institute of Information Technology (ITEC), Alpen-Adria-Universität Klagenfurt, Austria. His research interests include immersive multimedia communication, streaming, adaptation, and Quality of Experience, with more than 100 publications in this domain. He was the general chair of WIAMIS'08, ISWM'09, EUMOB'09, AVSTP2P'10, WoMAN'11, and QoMEX'13, and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, QUALINET, and SocialSensor. He also participated in ISO/MPEG work for several years, notably in the area of MPEG-21, MPEG-M, MPEG-V, and DASH/MMT. He received his PhD in 2006 from the Alpen-Adria-Universität Klagenfurt. His publications and MPEG contributions can be found at research.timmerer.com; follow him at twitter.com/timse7 and subscribe to his blog at blog.timmerer.com.


Does Kinect Provide a Simple and Cheap Solution for Telepresence?

A short review for "Enhanced Personal Autostereoscopic Telepresence System Using Commodity Depth Cameras"

Edited by Irene Cheng

Andrew Maimone, Jonathan Bidwell, Kun Peng and Henry Fuchs, "Enhanced Personal Autostereoscopic Telepresence System Using Commodity Depth Cameras", Elsevier: Computers & Graphics 36 (2012) 791-807.

Telepresence technology enables a user to feel engaged as if he/she were part of the virtual scene. "Telepresence" and "Kinect" are often associated when either keyword is searched on the Internet. Applications, from the entertaining "Kinect Star Wars" to Kinect-based medical image exploration and collaborative telepresence in social settings, have become commonplace since the launch of the Kinect depth sensor in 2010.

The paper presents a low-cost Kinect-based telepresence system that offers real-time 3D scene capture and head-tracked stereo 3D display without the user wearing any eyewear device. The system is an enhancement of the authors' previous version published in ISMAR 2011 [1]. Quite a few telepresence systems have been developed before, but none of them is based on Kinect and addresses the many issues associated with an array of depth sensors. The appealing features of Kinect lie in its low cost and simplicity. The paper shows a way to develop a cheap telepresence system, which traditionally has been of high cost. But Kinect also brings unique challenges, including various artifacts in depth maps, such as holes and noise, and the interference among multiple Kinects.
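As a toy illustration of the depth-map holes mentioned above (this is not the authors' reconstruction pipeline, and the tiny depth map below is made up), a naive pass might fill each invalid sample with the mean of its valid neighbors:

```python
# Illustrative only: a naive depth-map hole-filling pass that replaces each
# invalid (zero) depth sample with the mean of its valid 4-neighbors.

def fill_holes(depth):
    """depth: 2D list of floats, 0.0 marking a hole. Returns a filled copy."""
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] != 0.0:
                continue
            neighbors = [
                depth[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols and depth[nr][nc] != 0.0
            ]
            if neighbors:  # leave the hole as-is if no valid neighbor exists
                out[r][c] = sum(neighbors) / len(neighbors)
    return out

depth_map = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],  # one-pixel hole in the middle
    [1.0, 1.0, 1.0],
]
print(fill_holes(depth_map)[1][1])  # 1.0
```

Real systems, including the one reviewed here, combine such filling with denoising, smoothing and multi-camera merging rather than a single local pass.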
Many existing techniques are employed or adapted by the authors for denoising, hole filling, smoothing, data merging, surface generation, color correction and head tracking. The system takes advantage of a fully GPU-accelerated data processing and rendering pipeline. The main contribution lies in the integration of various existing techniques to deliver a workable solution. The complete software and hardware framework for implementing the system is presented, including GPU acceleration.

The Introduction gives a flavor of the evolution of 3D data acquisition using depth cameras and of visualization using eyewear in telepresence environments since the late '90s. The proposed system is based on the inexpensive Microsoft Kinect sensor, providing a 58° x 45° field of view with high depth accuracy. After proper sensor calibration, an entire room-sized scene can be captured in real time. By combining a 2D eye detection technique with depth data, Kinect is able to offer a markerless tracking solution.

However, the authors encountered challenges in using Kinect sensors to implement their system. Inter-unit interference is a major problem because each sensor projects a fixed structured-light pattern of similar wavelengths. There is also difficulty in presenting a seamless integration of color-matched data between cameras. The enhancements therefore include a software solution to the Kinect interference problem and a visibility-based method to merge data between cameras, as well as dynamic color matching between color-plus-depth cameras. The hardware configuration and software implementation are detailed in the paper. Interested readers can refer to Section 4.2 to understand how the multi-Kinect interference problem is addressed. Color matching is a common problem in many camera systems.
Even devices of the same camera model often exhibit different color gamuts [2], and so do Kinect sensors. The Kinect driver available at the time of this paper allows only automatic color and exposure control, so color values can vary dramatically between adjacent sensors. The authors argue that applying traditional color matching techniques is ineffective because automatic control may alter color balances. They instead use depth information to find color correspondences between cameras and build a color matching function; details are described in Section 4.6. Another enhancement explored in this paper relates to eye position tracking accuracy, speed and latency, described in Section 4.7.

Comparison of results shows the good performance of the proposed telepresence system. In the Conclusion, the authors point out that although the system is functional, the output


image quality still needs improvement, in particular the temporal noise artifacts present at the edges of objects at the depth-pixel level. Beyond the technical discussions and computational analysis, what I like about this paper is its clarity and readability: a short overview suitable for general readers is given at the beginning of each section, followed by a more in-depth explanation. The developed system shows a promising way to bring telepresence to common users, which will stimulate more subsequent multimedia communication research.

Acknowledgement:
This paper is nominated by Jianfei Cai of the MMTC 3D Processing, Rendering and Communication (3DPRC) Interest Group.

References:
[1] Maimone A, Fuchs H. Encumbrance-free telepresence system with real-time 3D capture and display using commodity depth cameras. In: Tenth IEEE International Symposium on Mixed and Augmented Reality (ISMAR); 2011. p. 137–46. http://dx.doi.org/10.1109/ISMAR.2011.6092379
[2] Ilie A, Welch G. Ensuring color consistency across multiple cameras. In: Proceedings of the Tenth IEEE International Conference on Computer Vision – Volume 2. ICCV '05; Washington, DC, USA: IEEE Computer Society; 2005. p. 1268–75. ISBN 0-7695-2334-X-02. http://dx.doi.org/10.1109/ICCV.2005.88

Irene Cheng, SMIEEE, is the Scientific Director of the Multimedia Research Centre, and an Adjunct Professor in the Faculty of Science, as well as the Faculty of Medicine & Dentistry, University of Alberta, Canada. She is also a Research Affiliate with the Glenrose Rehabilitation Hospital in Alberta, Canada.
She is a Co-Chair of the IEEE SMC Society Human Perception in Vision, Graphics and Multimedia Technical Committee; was the Chair of the IEEE Northern Canada Section Engineering in Medicine and Biological Science (EMBS) Chapter (2009-2011), and the Chair of the IEEE Communications Society Multimedia Technical Committee 3D Processing, Rendering and Communication (3DPRC) Interest Group (2010-2012). She is now the Director of the Review-Letter Editorial Board of MMTC (2012-2014).

Over the last ten years, she has produced more than 110 international peer-reviewed publications, including 2 books and 31 journal articles. Her research interests include multimedia communication techniques, Quality of Experience (QoE), levels of detail, 3D graphics visualization and perceptual quality evaluation. In particular, she introduced applying human perception – the Just-Noticeable-Difference, following psychophysical methodology – to generate multi-scale 3D models.


Extending Signal Processing Techniques to the Graph Domain

A short review for "Perfect Reconstruction Two-Channel Wavelet Filter Banks for Graph Structured Data"

Edited by Jun Zhou

Sunil K. Narang and Antonio Ortega, "Perfect Reconstruction Two-Channel Wavelet Filter Banks for Graph Structured Data", IEEE Transactions on Signal Processing, vol. 60, no. 6, pages 2786-2799, 2012.

Graph theory has been successfully adopted in many computer vision and pattern recognition applications. When dealing with large-scale data, one of the problems that hinders the wide adoption of graphical models is the very high computational complexity caused by the large number of nodes and edges in a graph. To address this challenge, one would like only a few nodes of the graph to form a compact representation of the original graph, so that data processing can be performed on a small neighborhood of each node. Some recent efforts in this direction have explored traditional signal processing techniques, such as the wavelet transform, as possible solutions.

The paper published by Narang and Ortega in IEEE TSP is a seminal work on graph sampling and the design of critically sampled wavelet filter banks on graphs. It not only provides a comprehensive review of the spatial/spectral representation of graph signals and existing work on two-channel filter banks, but also establishes the characteristics that the sampling strategy and filter banks must have for perfect reconstruction on bipartite graphs.

The key idea behind this method is to apply a two-channel filter bank that decomposes a graph signal into high-pass and low-pass channels, each containing only part of the nodes in the graph after the downsampling and subsequent upsampling operations.
When these two channels are combined, they form a perfect reconstruction of the original graph signal. To achieve such distortion-free reconstruction, an aliasing component, which is composed of the filter banks and downsampling functions, must be set to zero. The goal of this research is therefore to determine the proper filter banks and downsampling functions that meet this requirement.

To develop the downsampling strategy, the authors propose that the decomposed high-pass and low-pass channels shall contain complementary node sets of the original graph. This leads to building a bipartition of the graph nodes [1]. Based on graph spectral theory, this strategy generates spectral coefficients at graph frequencies symmetric around a central frequency, which corresponds to the aliasing component of the reconstruction function.

To design the filter banks, the authors point out that they shall meet three conditions: aliasing cancellation, perfect reconstruction, and orthogonality. Therefore, a quadrature mirror filter bank method [2] (the wavelet is one such method) was chosen and extended to bipartite graphs. This method allows a single basis spectral kernel to be created, while all other kernels are built on top of the basis kernel.

While it is straightforward to adopt the wavelet filter banks on a bipartite graph, applying this framework to an arbitrary graph requires generating a series of bipartite subgraphs from the original graph. Each subgraph can then be processed independently, with a cascaded transform implemented at the end.
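The bipartition step described above can be sketched with a standard graph two-coloring. The toy 4-cycle below is hypothetical, and the code only illustrates how the two channels receive complementary node sets; it is not the authors' spectral algorithm.

```python
# Minimal sketch: split the nodes of a bipartite graph into two complementary
# sets (one kept by the low-pass channel, one by the high-pass channel) via
# BFS two-coloring. The example graph is illustrative only.

from collections import deque

adjacency = {
    0: [1, 3],
    1: [0, 2],
    2: [1, 3],
    3: [2, 0],
}  # a 4-cycle, which is bipartite

def bipartition(adj):
    """Two-color the graph by BFS; raises if the graph is not bipartite."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    raise ValueError("graph is not bipartite")
    low = {n for n, c in color.items() if c == 0}
    high = {n for n, c in color.items() if c == 1}
    return low, high

low_set, high_set = bipartition(adjacency)
print(low_set, high_set)  # complementary node sets covering the whole graph
```

For an arbitrary graph, such a two-coloring does not exist, which is precisely why the paper decomposes the graph into a series of bipartite subgraphs first.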
In this paper, the authors propose to use the biparticity method from Harary et al. [3] for subgraph generation.

Two experiments were performed to demonstrate how the proposed method can be applied to image processing (as an example of a regular graph) and traffic graph analysis (as an example of an irregular graph). These examples show that the two-channel wavelet filter banks and the sampling method form a practical solution for graph decomposition and reconstruction. It enables efficient graph computation, which has long been sought by the research community. I believe this work will have a long-term impact on the development of graph theory because it provides an elegant way


of applying signal processing techniques to solve structured pattern recognition problems.

References:
[1] S. Narang and A. Ortega, "Downsampling graphs using spectral theory," Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pages 4208-4211, 2011.
[2] J. Johnston, "A filter family designed for use in quadrature mirror filter banks," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 291-294, 1980.
[3] F. Harary, D. Hsu, and Z. Miller, "The biparticity of a graph," Journal of Graph Theory, vol. 1, no. 2, pp. 131–133, 1977.

Jun Zhou received the B.S. degree in computer science and the B.E. degree in international business from Nanjing University of Science and Technology, China, in 1996 and 1998, respectively. He received the M.S. degree in computer science from Concordia University, Canada, in 2002, and the Ph.D. degree in computing science from the University of Alberta, Canada, in 2006.

He joined the School of Information and Communication Technology at Griffith University as a lecturer in June 2012. Prior to this appointment, he was a research fellow at the Australian National University and a researcher at NICTA. His research interests are in statistical pattern recognition, interactive computer vision, and their applications to hyperspectral imaging and environmental informatics.


Fairness Resource Allocation in Blind Wireless Multimedia Communications

A short review for "Fairness Resource Allocation in Blind Wireless Multimedia Communications"

Edited by Koichi Adachi

L. Zhou, M. Chen, Y. Qian, and H.-H. Chen, "Fairness Resource Allocation in Blind Wireless Multimedia Communications," IEEE Trans. on Multimedia, vol. 15, no. 4, pp. 946-956, Jun. 2013.

A scenario is considered in which one base station (BS) assigns the available resource to multiple multimedia users. The fairness resource allocation problem is formulated with a fairness parameter α to maximize the sum of the users' quality of experience (QoE). Traditional α-fairness resource allocation in wireless multimedia communication systems assumes that the QoE model (or utility function) of each user is available at the scheduler. In such a system, balancing the trade-off between fairness and performance is an important task for the BS, as it is known that the introduction of fairness generally has a negative impact on performance [1-4]. However, the critical assumption in most existing studies is the availability of the QoE model at the scheduler, which may not be practical.

Therefore, different from previous works, a blind scenario is considered in this paper, in which the BS has no knowledge of the QoE model during the whole resource allocation procedure. The paper answers the following two questions: 1) How to set the fairness parameter α from the perspective of the performance-fairness trade-off?
2) Given a specific fairness parameter α, how to implement the α-fairness resource allocation online?

The main contributions of this paper lie in its qualitative analysis and its technical realization. Some recent works theoretically analyzed the performance for special cases of the α value, e.g., proportional fairness (α → 1) and max-min fairness (α → ∞) [1-3,5,6]. For the more general case of α, only empirical modelling was provided [2]. Here, an exact expression is derived for the upper bound of the performance loss caused by α-fairness, characterizing the fairness-performance trade-off (Theorem 1). This enables a BS scheduler to choose the appropriate fairness parameter α, which answers the first question.

For the technical realization of a specific fairness resource allocation, convex optimization [4], [7-10] and game theory [10-15] are generally used in previous works. However, both approaches require the utility function of each user to be available at the BS or the controller. In this paper, the blind fairness-aware resource allocation problem is decomposed into two subproblems describing the behaviors of the users and the BS, respectively. The second question is answered by proposing a bidding game for the reconciliation between the two subproblems. The authors show that although all the users behave selfishly, any specific α-fairness scheme can be implemented by the bidding game between the users and the BS (Theorem 2).

In Theorem 1, the upper bound of the performance loss incurred by the α-fairness is derived. The derived upper bound connects the number of users in the system and the fairness parameter α, and it is independent of the QoE model as long as the model satisfies some assumptions, which are not too restrictive.
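For readers unfamiliar with α-fairness, the standard utility family behind these special cases is U_α(x) = x^(1-α)/(1-α) for α ≠ 1 and U_1(x) = log(x). The sketch below is only an illustration of that family, not the paper's QoE-based formulation: the weights and capacity are made up, and it uses the textbook closed-form split of a single shared resource, x_i = C·w_i^(1/α)/Σ_j w_j^(1/α), to show how a large α pushes the allocation toward max-min fairness.

```python
# Sketch of the alpha-fair utility family and its closed-form allocation for
# one shared resource: maximize sum_i w_i * U_alpha(x_i) s.t. sum_i x_i = C.
# Weights and capacity below are illustrative only.

import math

def alpha_utility(x, alpha):
    """U_alpha(x) = log(x) if alpha == 1, else x**(1-alpha) / (1-alpha)."""
    if alpha == 1:
        return math.log(x)
    return x ** (1 - alpha) / (1 - alpha)

def alpha_fair_allocation(weights, capacity, alpha):
    """Closed-form alpha-fair split of one resource among weighted users."""
    powered = [w ** (1 / alpha) for w in weights]
    total = sum(powered)
    return [capacity * p / total for p in powered]

weights, capacity = [4.0, 1.0], 10.0
print(alpha_fair_allocation(weights, capacity, 1))    # proportional: [8.0, 2.0]
print(alpha_fair_allocation(weights, capacity, 100))  # near max-min: ~[5.03, 4.97]
```

The point of the reviewed paper is that the BS can reach such α-fair operating points without knowing the users' utility functions, via the bidding game described next.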
The authors propose a bidding game to decompose the optimization problem into two subproblems, similar to [16]. The first subproblem describes the behavior of the users and the second that of the BS. Each user tries to maximize its own objective function through the bidding game. Then, given the payment from each user, the BS strives to find the optimal transmission rate for each user to maximize its own objective (control) function. The control function is composed of the bidding money, the allocated resource, and the fairness parameter α. In Theorem 2, the form of the control function that resolves the original optimization problem is given. For the proposed bidding game to work smoothly in a realistic blind scenario, it must be assumed that no user cheats during the whole bidding process; the authors also provide counter-measures for this issue. The convergence property of the proposed bidding game is given in Theorem 3.

The performance of the proposed bidding game is evaluated based on real-world traces consisting of three multimedia applications: audio, file, and video. For
For comparison, resource allocation with full information, where the BS knows the QoE model of each user in advance, is considered [17]. Firstly, the accuracy of the derived upper bound on the loss is confirmed by comparing it with the actually observed loss values. The real loss value is shown to be close to the obtained upper bound for different values of α. It is also confirmed that a higher value of α yields a higher performance loss, which is consistent with previous works [2]. Furthermore, a larger number of users also incurs a larger performance loss. This observation suggests a basic operating rule for the BS: when the system has a relatively small number of users, the BS can achieve fair allocations without significant performance deterioration; with a large number of users, however, the BS should be careful in enforcing fairness, as it can easily lead to a large performance loss. Secondly, the total MOS value of the proposed bidding game scheme is compared with that of the full-information case. The proposed bidding game is shown to achieve almost the same performance as the full-information case when α is large. Finally, the convergence of the proposed bidding game is clarified: 1) it converges within a limited number of iterations, and 2) the number of users affects the convergence time.

The derived upper bound on the performance loss incurred by α-fairness is useful for characterizing and understanding the trade-off between performance loss and fairness. Since the proposed bidding game does not require knowledge of each user's QoE, it is applicable to practical multimedia communication systems.

References:
[1] T.
Lan, et al., An Axiomatic Theory of Fairness in Resource Allocation, Princeton, NJ, USA: Princeton Univ., 2009.
[2] L. Kaplow and S. Shavell, Fairness Versus Welfare, Cambridge, MA, USA: Harvard Univ., 2002.
[3] L. Zhou, et al., “Distributed media services in P2P-based vehicular networks,” IEEE Trans. Veh. Technol., vol. 60, no. 2, pp. 692–703, Feb. 2011.
[4] L. Zhou, et al., “Distributed scheduling scheme for video streaming over multi-channel multi-radio multi-hop wireless networks,” IEEE J. Sel. Areas Commun., vol. 28, no. 3, pp. 409–419, Mar. 2010.
[5] T. Nguyen, et al., “Efficient multimedia distribution in source constraint networks,” IEEE Trans. Multimedia, vol. 10, no. 3, pp. 523–537, Apr. 2008.
[6] Y. Li, et al., “Content-aware distortion-fair video streaming in congested networks,” IEEE Trans. Multimedia, vol. 11, no. 6, pp. 1182–1193, Oct. 2009.
[7] D. Hu, et al., “Scalable video multicast in cognitive radio networks,” IEEE J. Sel. Areas Commun., vol. 28, no. 3, pp. 334–344, Mar. 2010.
[8] H.-P. Shiang and M. van der Schaar, “Information-constrained resource allocation in multi-camera wireless surveillance networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 4, pp. 505–517, Apr. 2010.
[9] Y. Zhang, et al., “Multihop packet delay bound violation modeling for resource allocation in video streaming over mesh networks,” IEEE Trans. Multimedia, vol. 12, no. 8, pp. 886–900, Dec. 2010.
[10] H. Hu, et al., “Peer-to-peer streaming of layered video: Efficiency, fairness and incentive,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 8, pp. 1013–1026, Aug. 2011.
[11] Z. Han, et al., “Fair multiuser channel allocation for OFDMA networks using Nash bargaining solutions and coalitions,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1366–1376, Aug. 2005.
[12] H. Park and M.
van der Schaar, “Fairness strategies for wireless resource allocation among autonomous multimedia users,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 2, pp. 297–309, Feb. 2010.
[13] Y. Chen, et al., “Multiuser rate allocation games for multimedia communications,” IEEE Trans. Multimedia, vol. 11, no. 6, pp. 1170–1181, Oct. 2009.
[14] Q. Zhang and G. Liu, “Rate allocation games in multiuser multimedia communications,” IET Commun., vol. 5, no. 3, pp. 396–407, 2011.
[15] H. Park and M. van der Schaar, “Bargaining strategies for networked multimedia resource management,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3496–3511, Jul. 2007.
[16] F. Kelly, “Charging and rate control for elastic traffic,” Eur. Trans. Telecommun., vol. 8, no. 1, pp. 33–37, 1997.
[17] A. Khan, et al., “Quality of experience-driven adaptation scheme for video applications over wireless networks,” IET
Commun., vol. 4, no. 11, pp. 1337–1347, 2010.

Koichi Adachi received the B.E., M.E., and Ph.D. degrees in engineering from Keio University, Japan, in 2005, 2007, and 2009, respectively. From 2007 to 2010, he was a Japan Society for the Promotion of Science (JSPS) research fellow. He is currently with the Institute for Infocomm Research, A*STAR, Singapore. His research interests include cooperative communications. He was a visiting researcher at City University of Hong Kong in April 2009 and a visiting research fellow at the University of Kent from June to August 2009.


Improved Cloud Resource Utilization for IPTV Transmission
A short review for “Optimizing Cloud Resources for Delivering IPTV Services through Virtualization”
Edited by Carl James Debono
V. Aggarwal, V. Gopalakrishnan, R. Jana, K. K. Ramakrishnan, and V. A. Vaishampayan, "Optimizing Cloud Resources for Delivering IPTV Services through Virtualization," IEEE Transactions on Multimedia, vol. 15, no. 4, pp. 789-801, June 2013.

Internet Protocol-based video delivery is increasing in popularity, with the result that its resource requirements are continuously growing. It is estimated that by the year 2017 video traffic will account for 69% of total consumer Internet traffic [1]. Content and service providers typically configure their resources to handle the peak demand of each service they provide across the subscriber population. However, this means that the resources are under-utilized during off-peak times. The predominant types of Internet Protocol TeleVision (IPTV) services that the authors of the original paper focus on are Live TV and Video on Demand (VoD), as these are the primary capabilities supported by service providers. Live TV presents a very bursty workload profile with tight deadlines, whereas VoD has a relatively steady load and less stringent delay requirements.

The solution presented takes advantage of the temporal differences in the demands of these IPTV workloads to better utilize the servers deployed to support these services. While VoD is delivered via unicast, Live TV is delivered over multicast to reduce bandwidth demands. However, to support Instant Channel Change (ICC) in Live TV, service providers send a unicast stream for the selected channel for a short period of time to maintain a good quality of experience.
If a number of users change their channels around the same time, this produces a large burst load on the server, which has to support the corresponding number of users. Compared to the ICC workload, which is very bursty and has a large peak-to-average ratio, VoD has a relatively steady load and a relatively lax delay requirement. By multiplexing across these services, the resource requirements for supporting the combined set of services can be reduced.

Two services whose workloads differ significantly over time can be combined on the same virtualized platform. This allows the number of resources to be scaled according to each service's current workload. It is, however, possible that the peak workloads of different services overlap. Under such scenarios, the benefit of a virtualized infrastructure diminishes, unless there is an opportunity to time-shift one of the services in anticipation of the other service's requirements, so as to avoid having to deliver both services at the same time instant. In general, the cloud service provider strives to optimize the cost over all time instants, not just to reduce the peak server load. The authors of the original paper consider a generalized cost function, which can be specialized to the peak server load or to tiered pricing. Consider a scenario with multiple services, each having its own deadline constraint. The optimization problem tackled is to determine the number of servers needed at each time instant by minimizing a generalized cost function, while satisfying all the deadlines associated with these services.
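The multiplexing argument can be seen in a toy example. All numbers are illustrative: the bursty series stands in for ICC and the anti-phase one for delay-tolerant VoD.

```python
import numpy as np

# Hypothetical per-slot workloads (arbitrary units).
icc = np.array([0, 9, 1, 0, 8, 1, 0, 0])  # bursty channel-change load
vod = np.array([5, 1, 4, 5, 1, 4, 5, 5])  # steadier, shiftable VoD load

separate = icc.max() + vod.max()  # provision each service for its own peak
shared = (icc + vod).max()        # provision the multiplexed workload
```

Here `separate` is 14 units while `shared` is only 10, because the peaks do not align; time-shifting the delay-tolerant VoD load away from ICC bursts, as the paper proposes, flattens the combined series further.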
To achieve this, the authors identified the server-capacity region, formed by the server tuples (the number of servers at each time instant) for which all arriving requests meet their deadlines. The results show that, for any server tuple with integer entries inside the server-capacity region, adopting an Earliest Deadline First (EDF) strategy [2] services all the requests without missing deadlines.

After identifying the server-capacity region, several cost functions were considered, namely: a separable concave function, a separable convex function, and a maximum function. The original authors find that the feasible set of server tuples is the set of all integer tuples in the server-capacity region. This integer constraint increases the difficulty of finding a solution to the problem. However, for a piecewise-linear separable convex function, an algorithm that minimizes the cost function can easily be found. Moreover, only causal information about the requests arriving at each time instant is required. On the other hand, for concave cost functions, the original paper reports that the integer constraint can be relaxed, since all the corner points of the region of interest have integer coordinates. Therefore, concave programming techniques without integer constraints [3] can be applied. The paper also investigates a "two-tier" cost function as a basic strategy for cloud service pricing. A closed-form expression is found for the optimal number of servers needed in such a scenario. The algorithm developed reduces the running time to O(T^2), compared to the O(T^3) complexity of directly implementing the expression.

The authors of the original paper study two approaches for sharing the resources: (a) postponing and (b) advancing the delivery of VoD. The postponement approach assumes that chunk i is requested at time t and has a deadline d time units after the initial request. Conversely, the advancement technique assumes that all the chunks are requested when the video is first demanded by the user and that each chunk has its playout time as its deadline. A series of simulations was set up for both scenarios to study the effect of varying the ICC duration and the delay tolerance of VoD services on the total number of servers needed for the combined workload. Two cost functions were considered to determine the number of servers, namely the maximum and the piecewise-linear convex functions. A limit on the downlink bandwidth was also considered for the VoD delivery advancement method.
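The EDF feasibility test behind the server-capacity region can be sketched as follows. This is a simplified slot-based model with illustrative semantics (servers[t] is the total work the pool can process in slot t; a request's deadline is the last slot it may be served in), not the paper's implementation:

```python
import heapq

def edf_feasible(requests, servers):
    """Check whether a per-slot server tuple can serve all requests by
    their deadlines using Earliest Deadline First.
    requests: (arrival_slot, deadline_slot, work_units) triples.
    servers[t]: work units the server pool can process in slot t."""
    reqs = sorted(requests)  # by arrival slot
    heap, i = [], 0          # pending requests as [deadline, remaining work]
    for t, cap in enumerate(servers):
        while i < len(reqs) and reqs[i][0] <= t:   # release new arrivals
            heapq.heappush(heap, [reqs[i][1], reqs[i][2]])
            i += 1
        while cap > 0 and heap:                     # serve earliest deadline first
            served = min(cap, heap[0][1])
            heap[0][1] -= served
            cap -= served
            if heap[0][1] == 0:
                heapq.heappop(heap)
        if heap and heap[0][0] <= t:                # a deadline passed with work left
            return False
    return i == len(reqs) and not heap
```

For example, `edf_feasible([(0, 0, 2), (0, 2, 3)], [2, 2, 2])` holds, while shrinking any entry of the server tuple below the region's boundary makes some deadline miss.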
The reported results show that server bandwidth savings of between 10% and 32% can be obtained by anticipating the ICC load and shifting the VoD load ahead of the ICC bursts.

The reported results also show that if peak load is the cost metric, the algorithm is capable of reducing the peak by around 24%. This has a direct impact on the cost of the infrastructure, since 24% fewer servers would be required to serve all the requests in the simulated scenario.

The possibility of predicting and time-shifting IPTV load in wired and wireless networks allows for better utilization of the cloud infrastructure. Further work is needed to improve prediction techniques and to include other parameters, so that the ever-increasing demand for video services can be sustained. Furthermore, storage requirements and other traffic on the network also need to be considered in the optimization strategy. The solution presented relies on homogeneous servers, something which cannot be guaranteed, so heterogeneous systems need to be studied too. Moreover, low-complexity security solutions to prevent eavesdropping on the video data, and the related processing, need to be developed as these services proliferate.

References:
[1] Cisco Visual Networking Index: Forecast and Methodology, 2012-2017. [Online] Available: http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360.pdf
[2] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo, Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms, Norwell, MA, USA: Kluwer, 1998.
[3] N. V. Thoai and H. Tuy, “Convergent algorithms for minimizing a concave function,” Mathematics of Operations Research, vol. 5, no. 4, pp. 556-566, 1980.

Carl James Debono (S’97, M’01, SM’07) received his B.Eng. (Hons.) degree in Electrical Engineering from the University of Malta, Malta, in 1997 and the Ph.D.
degree in Electronics and Computer Engineering from the University of Pavia, Italy, in 2000. Between 1997 and 2001 he was employed as a Research Engineer in the area of Integrated Circuit Design with the Department of Microelectronics at the University of Malta. In 2000 he was also engaged as a Research Associate with Texas A&M University, Texas, USA. In 2001 he was appointed Lecturer with the Department of Communications and Computer Engineering at the University of Malta and is now an Associate Professor. He is currently the Deputy Dean of the Faculty of ICT at the University of Malta.

Prof. Debono is a senior member of the IEEE and served as chair of the IEEE Malta Section between 2007 and 2010. He is the IEEE Region 8 Vice-Chair of Technical Activities for 2013. He has served on various technical program committees of international conferences and as a reviewer for journals and conferences. His research interests are in wireless systems design and applications, multi-view video coding, resilient multimedia transmission, and modeling of communication systems.


Automatic Output Motion Depiction Using 2D Videos as Direct Input
A short review for “Human Movement Summarization and Depiction from Videos”
Edited by Irene Cheng
Yijuan Lu and Hao Jiang, “Human Movement Summarization and Depiction from Videos,” in Proceedings of the International Conference on Multimedia & Expo (ICME) 2013 (Best Paper).

Motion analysis is a popular research topic studied in different disciplines, including computer vision, image processing, communication, and computer graphics. It has diverse applications, such as training, medicine, entertainment, surveillance, and navigation. The analysis can be based on 2D videos or on 3D motion-capture data. Starting from 3D motion data often produces good results [1, 2, 3] because the motion sequence is view-invariant and not affected by visual occlusion. However, capturing 3D motion requires sensors to track the movements of feature points. The apparatus setup can be complex and expensive, so the operation is not generally accessible to non-professionals. In contrast, motion analysis based on 2D videos can be affected by view variation and occlusion, but video data can easily be obtained by amateurs.

This paper presents an automatic method to depict human movement using 2D videos as direct input. The method requires neither 3D motion-capture data nor manual intervention. It analyzes inter-frame as well as frame-group trajectories of body feature points based on both body part detection and optical flow adjusted by error correction. The output consists of color-coded arrows and motion particles, which are particularly useful for training and rehabilitation purposes to show how a specific movement should be performed. The compact depiction can also be used for trajectory integration, action recognition, and movement analysis.
There are three steps in the proposed method: segment videos into sub-actions, track feature points, and depict motion using the estimated movement. The authors tested their method on a number of videos with satisfactory results.

One finding is that the number of clusters is difficult to determine when clustering-based methods are applied to action segmentation. Thus, in the first step, the authors use clusters of streamlines to complement an action boundary detection scheme. Seed points are randomly selected in each frame, and trajectories are generated by linking the points between frames. In the current implementation, a group of 15 frames is used to compute motion trajectories. The authors note that with this simple scheme there is no guarantee the motion trajectories will intersect; however, for motion depiction purposes only the overall path is needed, so a rough representation is adequate. The obtained motion trajectories are then shifted so that they all start from the point (0, 0, 0), where the three coordinates represent x, y, and time. The trajectories are further projected onto the xy plane, and the 2D coordinates of points on the trajectories are normalized. Detecting movement boundaries requires the action features to be stable while body parts keep their motion direction. The distance between streamlines at successive time instants is computed, and the results are plotted as a 1D curve. Potential action changes are indicated by local maxima on the curve, which are used in the subsequent motion estimation step.

In the next step, body parts are detected together with the trajectories of their associated feature points. Ten body parts are detected: the head, the torso, four half-arms, and four half-legs. The paper points out that the detector does not distinguish the left and right arms and legs, and that there are many detection errors.
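The boundary detection in the first step can be caricatured as peak-picking on the 1D distance curve. The function below is a simplified stand-in for the paper's scheme; the threshold parameter is illustrative:

```python
import numpy as np

def action_boundaries(dist_curve, min_height=0.0):
    """Flag candidate action boundaries as local maxima of a 1D curve of
    streamline distances between successive time instants. A simplified
    stand-in for the paper's boundary detection, not its implementation."""
    d = np.asarray(dist_curve, dtype=float)
    peaks = []
    for t in range(1, len(d) - 1):
        # strictly rising into t, not rising out of it, above the threshold
        if d[t] > d[t - 1] and d[t] >= d[t + 1] and d[t] >= min_height:
            peaks.append(t)
    return peaks
```

For instance, `action_boundaries([0, 1, 0, 2, 0])` flags frames 1 and 3, where the streamline distance spikes and a sub-action plausibly ends.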
To obtain accurate and efficient body part association, a linear multiple-shortest-path-following problem is formulated. I find the graphs constructed in this paper for each pair of limbs, with four possible body part assignments at each layer of the graph, quite interesting. Graphical models are commonly used in research and are often an efficient means of solving or simplifying complex high-dimensional problems. Since a trajectory can drift over a long span if point locations are propagated from frame to frame using optical flow, the method formulates point-cloud trajectory estimation as an optimization problem that constrains trajectories by the body part direction, the optical flow, and the object foreground estimate. Erroneous body part movement estimates are then cleaned up.

In the final step, static illustration integrated with artwork is adopted to translate the human movement estimates into graphical representations. The authors use directional arrows to depict body part movement, particles to illustrate local motion, and ghost images to indicate transitional and ending poses. The mean trajectory is computed and used as the center line of an arrow with a predefined width. In order to
remove the error from the mean trajectory, they fit the trajectory to a second-order polynomial, which is sufficient to capture the general shape of the motion. The color at each point on an arrow is proportional to the speed of the motion. Only directional arrows of significant length are kept. The final image with the overlaid arrows is then rendered.

The proposed method was tested on two ballet sequences and two recorded videos containing complex motions. The results demonstrate that even with occlusion and low-quality video shots taken with a shaky hand-held camera, the output motion depictions are satisfactory. Though more tests could have further validated the robustness of the approach, I find this paper interesting, and it can inspire further research on motion data analysis.

References:
[1] J. Assa, Y. Caspi, and D. Cohen-Or, “Action Synopsis: Pose Selection and Illustration,” SIGGRAPH 2005.
[2] S. Bouvier-Zappa, V. Ostromoukhov, and P. Poulin, “Motion Cues for Illustration of Skeletal Motion Capture Data,” Symposium on Non-Photorealistic Animation and Rendering 2007.
[3] J. Barbic, A. Safonova, J. Pan, C. Faloutsos, J. K. Hodgins, and N. S. Pollard, “Segmenting Motion Capture Data into Distinct Behaviors,” ACM Graphics Interface 2004.

Irene Cheng, SMIEEE, is the Scientific Director of the Multimedia Research Centre and an Adjunct Professor in the Faculty of Science, as well as the Faculty of Medicine & Dentistry, University of Alberta, Canada.
She is also a Research Affiliate with the Glenrose Rehabilitation Hospital in Alberta, Canada. She is a Co-Chair of the IEEE SMC Society Human Perception in Vision, Graphics and Multimedia Technical Committee; was the Chair of the IEEE Northern Canada Section Engineering in Medicine and Biology Society (EMBS) Chapter (2009-2011); and was the Chair of the IEEE Communications Society Multimedia Technical Committee (MMTC) 3D Processing, Render and Communication Interest Group (2010-2012). She is now the Director of the Review-Letter Editorial Board of MMTC (2012-2014).

Over the last ten years, she has produced more than 110 international peer-reviewed publications, including 2 books and 31 journal papers. Her research interests include multimedia communication techniques, Quality of Experience (QoE), levels of detail, 3D graphics visualization, and perceptual quality evaluation. In particular, she introduced the application of human perception – the Just-Noticeable-Difference – following psychophysical methodology to generate multi-scale 3D models.


Identify Visualizable Concepts
A short review for “Mining Visualness”
Edited by Weiyi Zhang
Zheng Xu, Xin-Jing Wang, and Chang Wen Chen, "Mining Visualness," in Proceedings of the International Conference on Multimedia & Expo (ICME) 2013 (Best Student Paper).

Despite decades of successful research on multimedia and computer vision, the semantic gap between low-level visual features and high-level semantic concepts remains a problem. Instead of generating more powerful features or learning more intelligent models, researchers have started to investigate which concepts can be more easily modeled by existing visual features [1, 2, 3]. Understanding to what extent a concept has visual characteristics, i.e., its “visualness,” is valuable in many ways. For instance, it can benefit recent research efforts on constructing image databases [4, 5]. These efforts generally attempt to attach images to a pre-defined lexical ontology, while existing ontologies were built without taking visual characteristics into consideration. Knowing which concepts are more likely to have relevant images helps save labor and control noise in database construction. Visualness estimation is also useful for image-to-text [2, 6] and text-to-image [7] translation; e.g., words of more visualizable concepts are potentially better annotations for an image.

Despite this usefulness, a general solution for visualness estimation faces many challenges: 1) It is unknown which concepts, or which types of concepts, are visualizable, i.e., whether representative images can be found to visualize their semantics.
For instance, “dignity” and “fragrant” are both abstract words, but the former is more difficult to visualize, as “fragrant” is closely related to visual concepts such as flowers and fruits. 2) Different visual concepts have diverse visual compactness and consistency, especially collective nouns (e.g., “animal”) and ambiguous concepts (e.g., “apple,” which may refer to a fruit or a company). 3) Even when a concept is highly visualizable, it may still be difficult to capture its visual characteristics due to the semantic gap. Few previous works in the literature have touched on this research topic; they either use a pre-defined concept list or rely on insufficiently justified assumptions about prototypical concepts.

In this paper, the authors attempt to discover and quantify the visualness of concepts automatically from a large-scale dataset. The quantitative measure of a concept is based on visual and semantic synsets (named visualsets), rather than on a single image cluster or keyword as in previous works. Visualsets disambiguate the semantics of a concept and ensure visual compactness and consistency, an idea inspired by the synsets in ImageNet [4] and Visual Synsets [6]. In this paper's approach, a visualset is a group of visually similar images and related words, both scored by their membership probabilities. Visualsets contain prototypical visual cues as well as prototypical semantic concepts. Given the visualsets, the visualness of a concept is then modeled as a mixture distribution over its corresponding visualsets.
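The mixture idea can be caricatured as follows. Everything here — the quality weights, the membership probabilities, and the averaging rule — is an illustrative simplification, not the paper's estimator:

```python
def visualness(word, visualsets):
    """Score a word's visualness as a quality-weighted average of its
    membership probabilities over visualsets. visualsets is a list of
    (quality_weight, {word: membership_prob}) pairs. Illustrative only."""
    total = sum(q for q, _ in visualsets)
    if total == 0.0:
        return 0.0
    return sum(q * members.get(word, 0.0) for q, members in visualsets) / total

# Hypothetical visualsets: a compact flower/fruit set and a diffuse one.
sets = [
    (0.6, {"flower": 0.8, "fragrant": 0.5}),
    (0.4, {"dignity": 0.1}),
]
```

Under these made-up numbers, "fragrant" scores 0.30 while "dignity" scores 0.04, mirroring the intuition above that the former is easier to visualize because it co-occurs with compact visual clusters.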
Moreover, the authors discover both simple concepts (keywords) and compound concepts (combinations of unique keywords) simultaneously from the generated visualsets.

The proposed approach contains three steps. 1) Build an image heterogeneous graph with attribute nodes generated from multi-type features: given a (noisily) tagged image dataset, such as a web image collection, the proposed scheme connects the images into a graph to facilitate the clustering approach for visualset mining. Specifically, the scheme extracts multiple types of visual features and a textual feature for the images to generate attribute nodes. The edges of the graph are defined by links between images and attribute nodes, instead of by the image similarities generally adopted in previous works [2]. 2) Mine visualsets from the heterogeneous graph with an iterative clustering-ranking algorithm: an iterative ranking-clustering approach is applied to form visual and textual synsets, i.e., visualsets. Each iteration starts with a guess of the image clusters. Based on this guess, the solution scores and ranks each image as well as the attribute nodes in each visualset. Images are then mapped to the feature space defined by the visualset mixture model. Clusters are refined based on the estimated posteriors,
which gives the guess of the image clusters for the next iteration. 3) Estimate the visualness of concepts from the visualsets: after the clustering-ranking approach converges, the scheme estimates the visualness of concepts (simple and compound) from the visualsets based on the final scores of the images and attribute nodes.

The authors also conducted extensive experiments to verify the proposed scheme, using the NUS-WIDE dataset containing 269,648 images and 5,018 unique tag words from Flickr. Two types of global features, a 64-D color histogram and a 512-D GIST descriptor, are extracted. Each type of global feature is further clustered into 2000 clusters by k-means clustering, whose centers form the set-based attribute nodes of the image heterogeneous graph. Local SIFT features are also extracted and clustered into 2000 visual words by k-means clustering, from which word-based attribute nodes are generated. The proposed scheme achieved promising results and discovered 26,378 visualizable compound concepts from NUS-WIDE.

References:
[1] K. Yanai and K. Barnard, “Image region entropy: a measure of ‘visualness’ of web images associated with one concept,” in ACM Multimedia, 2005.
[2] Y. Lu, L. Zhang, Q. Tian, and W. Y. Ma, “What are the high-level concepts with small semantic gaps?,” in CVPR, 2008.
[3] J. W. Jeong, X. J. Wang, and D. H. Lee, “Towards measuring the visualness of a concept,” in CIKM, 2012.
[4] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in CVPR, 2009.
[5] X. J. Wang, Z. Xu, L. Zhang, C. Liu, and Y. Rui, “Towards indexing representative images on the web,” in ACM Multimedia Brave New Idea Track, 2012.
[6] D. Tsai, Y. Jing, Y. Liu, H. A. Rowley, S. Ioffe, and J. M. Rehg, “Large-scale image annotation using visual synset,” in ICCV, 2011.
[7] X. Zhu, A. B.
Goldberg, M. Eldawy, C. R. Dyer, and B. Strock, “A text-to-picture synthesis system for augmenting communication,” in AAAI, 2007.

Weiyi Zhang is currently a Research Staff Member of the Network Evolution Research Department at AT&T Labs Research, Middletown, NJ. Before joining AT&T Labs Research, he was an Assistant Professor in the Computer Science Department, North Dakota State University, Fargo, North Dakota, from 2007 to 2010. His research interests include routing, scheduling, and cross-layer design in wireless networks; localization and coverage issues in wireless sensor networks; and survivable design and quality-of-service provisioning of communication networks. He has published more than 70 refereed papers in his research areas, including papers in prestigious conferences and journals such as IEEE INFOCOM, ACM MobiHoc, ICDCS, IEEE/ACM Transactions on Networking, ACM Wireless Networks, IEEE Transactions on Vehicular Technology, and IEEE Journal on Selected Areas in Communications. He received the AT&T Labs Research Excellence Award in 2013 and a Best Paper Award at the 2007 IEEE Global Communications Conference (GLOBECOM 2007). He has served on the technical or executive committees of many internationally reputable conferences, such as IEEE INFOCOM. He was the Finance Chair of IEEE IWQoS 2009 and the Student Travel Grant Chair of IEEE INFOCOM 2011.


Cooperative Video Summary Delivery over Wireless Networks

A short review of "Video Summary Delivery over Cooperative Wireless Networks"

Edited by Xiaoli Chu

S. Ci, D. Wu, Y. Ye, Z. Han, G.-M. Su, H. Wang, and H. Tang, "Video summary delivery over cooperative wireless networks," IEEE Wireless Communications, vol. 19, no. 2, pp. 80-87, Apr. 2012.

It has been widely accepted that cooperative wireless communications can improve node connectivity, increase link throughput, and reduce network power consumption. Popular cooperative communication schemes include amplify-and-forward (AF), decode-and-forward (DF), and coded cooperation (CC), but none of them can perform video rate and quality adaptation at the relay node.

Video summarization generates a short summary of the content of a possibly huge volume of video data, such that the essential information of the original data is still delivered to the receiver. It can significantly reduce the amount of data to be transmitted while maintaining video content coverage.

For resource-limited wireless video applications with stringent requirements on power consumption, video delivery timeliness, video quality, and content coverage, such as video surveillance, integrating video summarization with cooperative communications would bring significant benefits. However, due to the complexity of combining scene understanding, video coding, and wireless communications, only limited research efforts on video summary transmission over cooperative wireless networks have been reported [1-3].

In this paper, the authors propose a cross-layer optimization framework for cooperative video summary transmission.
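For intuition, video summarization in its crudest form can be sketched as keyframe selection by inter-frame histogram change. The toy function below is our illustration, not the summarizer used in the paper; the bin count and threshold are arbitrary choices:

```python
import numpy as np

def summarize(frames, n_bins=16, threshold=0.25):
    """Pick summary frames from a grayscale video (T x H x W array):
    keep a frame whenever its intensity histogram differs from the
    last kept frame by more than `threshold` (L1 distance between
    normalized histograms). A crude stand-in for real summarization."""
    def hist(f):
        h, _ = np.histogram(f, bins=n_bins, range=(0, 256))
        return h / max(h.sum(), 1)
    kept = [0]                 # always keep the first frame
    last = hist(frames[0])
    for t in range(1, len(frames)):
        h = hist(frames[t])
        if np.abs(h - last).sum() > threshold:
            kept.append(t)     # scene changed enough: keep this frame
            last = h
    return kept
```

For a clip of ten black frames followed by ten bright frames, `summarize(frames)` returns `[0, 10]`: one representative frame per static segment, which is exactly the data reduction the paragraph above describes.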
A decode-process-and-forward (DPF) scheme is proposed for video summary transmission, in which a relay node with video processing capability extracts the most useful information from video summary frames and generates a concise version of each summary frame, namely a summary of summary (SoS). The destination node then uses the received SoS information to enhance its error concealment capability, resulting in improved video reconstruction quality. In the proposed cross-layer framework, source coding, relay processing, power allocation between the source and the relay, and the error concealment strategy are jointly considered.

The SoS can be obtained through video processing steps of various levels of computational complexity. Depending on the specific system settings and network conditions, the video processing method can be chosen from down-sampling the image, filtering out the high-frequency components of the image, encoding the video frame with a lower bit budget, extracting the region of interest (ROI) information [4], and dropping the current video summary frame.

The video processing methods and error concealment strategies used at the relay node and the destination are known to the system controller, which resides in the source node and controls and optimizes the parameter settings of all modules based on application requirements, channel conditions, and computational complexity. For a practical solution, a trade-off needs to be made between the computational capability and the optimality of the solution [5].

The relationship between the video frame loss probability and the packet loss probability of each link depends on the packet encapsulation or packet fragmentation scheme used.
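Among the relay-side processing options listed above, down-sampling is the simplest to picture. The sketch below block-averages a frame by a factor of 2; this particular filter is our illustrative choice, not necessarily the paper's exact method:

```python
import numpy as np

def summary_of_summary(frame, factor=2):
    """Down-sample a grayscale summary frame (H x W array) by
    block-averaging, producing a smaller 'summary of summary' (SoS)
    that the relay can forward at lower cost. One of several relay
    processing options the paper lists (alongside filtering,
    re-encoding, ROI extraction, and frame dropping)."""
    h, w = frame.shape
    # crop so both dimensions are multiples of the factor
    h2, w2 = h - h % factor, w - w % factor
    blocks = frame[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))
```

A 4x4 frame becomes a 2x2 SoS, i.e., a quarter of the data; the destination can still use it to conceal a lost full-resolution summary frame.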
In the problem formulation, it is assumed that each video summary frame is compressed into one packet, so the video frame loss probability equals the packet loss probability of each link.

Experiments have been carried out to evaluate the performance of the proposed DPF scheme using the H.264/AVC JM 12.2 reference software, comparing it, in terms of peak signal-to-noise ratio (PSNR), with conventional direct transmission, DF, and multipath transmission (MT). The experimental results show that the proposed DPF scheme significantly outperforms the other three schemes. This indicates that the proposed DPF scheme is able not only to exploit the channel fading diversity of cooperative communications but also to adapt resource allocation through flexible video processing at the relay node.

The numerical results also show that excessive power consumption by either the source or the relay node does not improve the distortion performance significantly. The power allocation between the source and relay nodes needs to be optimized to achieve a remarkable performance gain with the proposed DPF scheme.

In conclusion, the proposed DPF scheme achieves a significant improvement in received video quality compared with existing cooperative transmission schemes in the literature.

References:

[1] Z. Li, et al., "Video summarization for energy efficient wireless streaming," Proc. SPIE Visual Commun. and Image Processing, Beijing, China, 2005.

[2] P. V. Pahalawatta, et al., "Rate-distortion optimized video summary generation and transmission over packet lossy networks," Proc. SPIE Image and Video Commun. and Processing, San Jose, CA, 2005.

[3] D. Wu, S. Ci, and H. Wang, "Cross-layer optimization for video summary transmission over wireless networks," IEEE JSAC, vol. 25, no. 4, pp. 841-850, May 2007.

[4] D. Wu, et al., "Quality-driven optimization for content-aware real-time video streaming in wireless mesh networks," Proc. IEEE GLOBECOM, New Orleans, LA, Dec. 2008.

[5] G. M. Schuster and A. K. Katsaggelos, Rate-Distortion Based Video Compression: Optimal Video Frame Compression and Object Boundary Encoding, Kluwer, 1997.

Xiaoli Chu is a Lecturer in the Department of Electronic and Electrical Engineering at the University of Sheffield, UK. She received the B.Eng. degree in Electronic and Information Engineering from Xi'an Jiao Tong University, China, in 2001, and the Ph.D. degree in Electrical and Electronic Engineering from the Hong Kong University of Science and Technology in 2005. From Sep. 2005 to Apr. 2012, she was with the Centre for Telecommunications Research at King's College London.
Her current research interests include heterogeneous networks, cooperative communications, cognitive communications, and green radios. She has published more than 60 peer-reviewed journal and conference papers. She is the lead editor/author of the Cambridge University Press book Heterogeneous Cellular Networks, May 2013. She was a guest editor of the Special Issue on Cooperative Femtocell Networks (Oct. 2012) for the ACM/Springer journal Mobile Networks & Applications, and is a guest editor of the Special Section on Green Mobile Multimedia Communications (Apr. 2014) for IEEE Transactions on Vehicular Technology.


Paper Nomination Policy

Following the direction of MMTC, the R-Letter platform aims at providing a research exchange that examines systems, applications, services, and techniques in which multiple media are used to deliver results. Multimedia include, but are not restricted to, voice, video, image, music, data, and executable code. The scope covers not only the underlying networking systems, but also visual, gesture, signal, and other aspects of communication.

Any HIGH QUALITY paper published in Communications Society journals/magazines, MMTC-sponsored conferences, IEEE proceedings, or other distinguished journals/conferences within the last two years is eligible for nomination.

Nomination Procedure

Paper nominations have to be emailed to the R-Letter Editorial Board Directors:

Irene Cheng (locheng@ualberta.ca),
Weiyi Zhang (maxzhang@research.att.com), and
Christian Timmerer (christian.timmerer@itec.aau.at)

The nomination should include the complete reference of the paper, author information, a brief supporting statement (maximum one page) highlighting the contribution, the nominator information, and an electronic copy of the paper when possible.

Review Process

Each nominated paper will be reviewed by members of the IEEE MMTC Review Board. To avoid potential conflicts of interest, nominated papers co-authored by a Review Board member will be reviewed by guest editors external to the Board. The reviewers' names will be kept confidential. If two reviewers agree that the paper is of R-Letter quality, a board editor will be assigned to complete the review letter (partially based on the nomination supporting document) for publication. The review result is final (no multiple nominations of the same paper). Nominators external to the board will be acknowledged in the review letter.

R-Letter Best Paper Award

Accepted papers in the R-Letter are eligible for the Best Paper Award competition if they meet the election criteria (set by the MMTC Award Board).

For more details, please refer to http://committees.comsoc.org/mmc/rletters.asp


MMTC R-Letter Editorial Board

Director: Irene Cheng, University of Alberta, Canada
Co-Director: Weiyi Zhang, AT&T Research, USA
Co-Director: Christian Timmerer, Alpen-Adria-Universität Klagenfurt, Austria

Editors:
Koichi Adachi, Institute of Infocom Research, Singapore
Pradeep K. Atrey, University of Winnipeg, Canada
Gene Cheung, National Institute of Informatics (NII), Tokyo, Japan
Xiaoli Chu, University of Sheffield, UK
Ing. Carl James Debono, University of Malta, Malta
Guillaume Lavoue, LIRIS, INSA Lyon, France
Joonki Paik, Chung-Ang University, Seoul, Korea
Lifeng Sun, Tsinghua University, China
Alexis Michael Tourapis, Apple Inc., USA
Vladan Velisavljevic, University of Bedfordshire, Luton, UK
Jun Zhou, Griffith University, Australia
Jiang Zhu, Cisco Systems Inc., USA

Multimedia Communications Technical Committee (MMTC) Officers

Chair: Jianwei Huang
Steering Committee Chair: Pascal Frossard
Vice Chair – North America: Chonggang Wang
Vice Chair – Asia: Yonggang Wen
Vice Chair – Europe: Luigi Atzori
Vice Chair – Letters & Member Communications: Kai Yang
Secretary: Liang Zhou

MMTC examines systems, applications, services, and techniques in which two or more media are used in the same session. These media include, but are not restricted to, voice, video, image, music, data, and executable code. The scope of the committee includes conversational, presentational, and transactional applications and the underlying networking systems that support them.
