IEEE COMSOC MMTC E-Letter

Another important aspect to consider is attention, which is the cognitive process of selectively concentrating on one aspect of the environment while ignoring other things. For example, humans can focus their listening attention on the voice of a particular talker in an environment in which many others are speaking at the same time, a phenomenon known as the cocktail party effect (Bregman, 1990). For effective communication, the listener must pay attention and the speaker must react to where the listener’s attention is. In face-to-face communication, the bandwidth between people can be considered boundless, but a person can only consume a certain amount of information because his or her attention is limited. In telecommunication, bandwidth is limited, and it is even more crucial to be aware of each participant’s attention and to deliver only the information important to the participants.

Decades ago, Herbert Simon articulated this very nicely: “...in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it” (Simon, 1971). When we build telecommunication systems, we should treat attention scarcity as a design requirement and filter out unimportant or irrelevant information. With the rapid growth of bandwidth capacity, and with the falling cost and rising quality of audio-visual capture devices, it is all too easy to overload the recipient with information.

Below is a diagram illustrating how current telecommunication systems work. Information flows from the source (here Site 1) to the destination (here Site 2). Filter H(R) represents a person’s attention and sensory limitations.
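The design requirement stated above, treating attention as scarce and delivering only what matters when bandwidth is limited, can be illustrated with a minimal sketch. Nothing in it comes from the letter itself: the `Stream` type, the `attention` scores, the `min_attention` threshold, and the greedy selection rule are all assumptions chosen for illustration, and a real system would have to estimate attention in real time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stream:
    """One media stream that Site 1 could send (hypothetical type)."""
    name: str
    bitrate_kbps: int   # bandwidth the stream would consume
    attention: float    # assumed estimate in [0, 1] of the receiver's attention


def select_streams(streams: List[Stream], budget_kbps: int,
                   min_attention: float = 0.1) -> List[Stream]:
    """Greedily keep the most-attended streams that fit the bandwidth budget."""
    # Filter out streams the receiver is (almost) not attending to.
    relevant = [s for s in streams if s.attention >= min_attention]
    # Consider the most-attended streams first.
    relevant.sort(key=lambda s: s.attention, reverse=True)

    chosen: List[Stream] = []
    used = 0
    for s in relevant:
        # Send a stream only if it still fits in the remaining budget.
        if used + s.bitrate_kbps <= budget_kbps:
            chosen.append(s)
            used += s.bitrate_kbps
    return chosen


if __name__ == "__main__":
    # Illustrative values only; they are not measurements.
    candidates = [
        Stream("active_talker_video", 1500, 0.9),
        Stream("shared_slides", 800, 0.6),
        Stream("side_talker_video", 1500, 0.2),
        Stream("room_overview", 1000, 0.05),
    ]
    for s in select_streams(candidates, budget_kbps=2500):
        print(s.name)
```

The greedy rule is only one possible policy; it is used here because it makes the trade-off between attention and bandwidth easy to see, not because it is optimal.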
In current systems, the person at Site 2 has his or her own expectation of what he or she wants. However, the capture devices, processing algorithms and rendering equipment are pre-designed to meet what the designer assumes that expectation to be. As a result, the information the person at Site 2 actually obtains usually does not meet his or her expectation.

Below is a diagram illustrating a new framework that I am proposing. The capture devices, processing algorithms and rendering equipment should be designed to reflect the receiver’s expectation and attention in real time.

There are a number of challenges in this framework:
• How to model users’ expectation?
• How to determine users’ attention?

http://www.comsoc.org/~mmc/ 8/41 Vol.4, No.7, August 2009
• How to accommodate the deficiency of human perception?
• How to incorporate users’ expectation and attention in each component?
• Real-Time! Real-Time!! Real-Time!!!

There have been several efforts in this direction. One example is high-end boardroom video conferencing systems, including HP Halo and Cisco TelePresence. My research group has been working on a number of projects. One is audio spatialization to leverage the cocktail party effect. We have done audio spatialization over headphones (Chen & Zhang, 2009) as well as over loudspeakers (Zhang, Cai, & Stokes, 2008). For the latter, we have to address the issue of multichannel acoustic echo cancellation. Another is the Personal Telepresence Station (Knies, 2009), which provides correct gaze cues in multiparty conferencing through video spatialization. We also work on 3D videoconferencing to make conferencing visually more immersive. Much more, however, needs to be done; this is an exciting area, and now is the right time to work on it.

Acknowledgment

Many ideas in this paper were presented in an ICME 2006 panel and in an ImmersCom 2009 panel.

Bibliography

[1] Argyle, M., Salter, V., Nicholson, H., Williams, M., & Burgess, P. (1970). The communication of inferior and superior attitudes by verbal and non-verbal signals. British Journal of Social and Clinical Psychology, 9, 222-231.
[2] Baldis, J. (2001). Effects of spatial audio on memory, comprehension, and preference during desktop conferences. Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 166-173). Seattle, USA: ACM Press.
[3] Blauert, J. (1983). Spatial Hearing. MIT Press.
[4] Bregman, A. (1990). Auditory Scene Analysis. MIT Press.
[5] Chen, W.-g., & Zhang, Z. (2009). Highly realistic audio spatialization for multiparty conferencing using headphones. IEEE International Workshop on Multimedia Signal Processing (MMSP'09).
[6] Knies, R. (2009, March 9). Making Virtual Meetings Feel Real. Retrieved from Microsoft Research: http://research.microsoft.com/en-us/news/features/personaltelepresencestation-030909.aspx
[7] Mehrabian, A. (1981). Silent messages: Implicit communication of emotions and attitudes (2nd ed.). Belmont, California: Wadsworth.
[8] Moore, B. C. (2003). An Introduction to the Psychology of Hearing (5th ed.). Academic Press.
[9] Simon, H. A. (1971). Designing Organizations for an Information-Rich World. In M. Greenberger, Computers, Communication, and the Public Interest (pp. 40-41). The Johns Hopkins Press.
[10] Zhang, Z., Cai, Q., & Stokes, J. (2008). Multichannel Acoustic Echo Cancelation in Multiparty Spatial Audio Conferencing With Constrained Kalman Filtering. Proc. International Workshop on Acoustic Echo and Noise Control (IWAENC'08).

Zhengyou Zhang (SM’97–F’05) received the B.S. degree in electronic engineering from the University of Zhejiang, Hangzhou, China, in 1985, the M.S. degree in computer science from the University of Nancy, Nancy, France, in 1987, and the Ph.D. degree in computer science and the Doctor of Science (Habilitation à diriger des recherches) diploma from the University of Paris XI, Paris, France, in 1990 and 1994, respectively. He is a Principal Researcher with Microsoft Research, Redmond, WA, USA, and manages