<strong>IEEE</strong> COMSOC MMTC E-Letter

How to Submit Papers to ICC 2010

For ICC 2010, the Multimedia Services, Communication Software and Service Symposium (MCS) is the ONLY symposium fully sponsored by MMTC. We therefore encourage all our members to submit their multimedia-related papers to the MCS symposium, where a Co-Chair and many TPC members recommended by MMTC will handle the entire review process for multimedia-related paper submissions.

Here are the steps to submit your paper to ICC 2010:

(1) Go to the EDAS website, http://edas.info/index.php, and sign in to your account;
(2) Click the "Submit paper" tab at the top of the webpage;
(3) In the list of conferences, find "ICC'10" and click the icon in the rightmost column;
(4) In the list of ICC'10 symposia, select "ICC'10 MCS" and click the icon in the rightmost column;
(5) Proceed with the normal paper submission process.

http://www.comsoc.org/~mmc/ 6/41 Vol.4, No.7, August 2009
TECHNOLOGY ADVANCES

Distinguished Position Paper Series

A Framework on Multimodal Telecommunications from a Human Perspective

Zhengyou Zhang (<strong>IEEE</strong> Fellow), Microsoft Research, USA
zhang@microsoft.com

In this position paper, I argue for researching and developing multimodal telecommunications systems from the perspective of users, who are communicating creatures. Humans' innate communication skills are the natural consequence of hundreds of thousands of years of evolution. We should follow and leverage our own skills rather than trying to change our behaviors.

One of the major goals of communication is to get messages across to people on the other side. There are a variety of messages in any face-to-face communication. However, one can basically find three elements, known as the "3 Vs", behind each message:

• Verbal: Words, what you say;
• Vocal: Tone of voice, how you say the words;
• Visual: Facial expression, gaze, body language.

The second and third elements are sometimes simply referred to as non-verbal elements. Prof. Albert Mehrabian conducted experiments on the communication of feelings and attitudes (i.e., like-dislike) in the 1960s and found that the non-verbal elements are particularly important for communicating feelings and attitudes, especially when the elements are incongruent: if words and body language disagree, one tends to believe the body language (Mehrabian, 1981). More concretely, according to Mehrabian, when a person talks about their feelings or attitudes, the three elements account for 7%, 38% and 55% of the message, respectively.
The exact numbers may be arguable depending on the experimental settings, but clearly non-verbal messages are extremely important, as confirmed by other researchers (Argyle, Salter, Nicholson, Williams, & Burgess, 1970).

Communication across distances is much harder, and throughout human history many tools have been invented to overcome this difficulty:

• Letters: Verbal
• Telephony: Verbal + Vocal
• Videoconferencing: Verbal + Vocal + Visual

Videoconferencing leverages all 3 Vs. So, are people satisfied with the videoconferencing systems we have? Obviously, the answer is no. Let's take a look at the visual aspect of some non-verbal behaviors in face-to-face communication:

• Facial expressions
• Eye gaze
• Eye gestures
• Head gaze
• Head gestures
• Arm gestures
• Body postures

How much can a current videoconferencing system convey? Not much: mostly facial expressions and head gestures. Other behaviors are lost. Even when facial expressions and head gestures are displayed in the video, without the other elements, say eye gaze, one may get an imprecise or even wrong perception (who is she really smiling at?).

In the area of audio, there is also much to be desired. Current conferencing systems (audio-only or audio-visual) are essentially monaural. When multiple people are involved in a conference, one major problem is that a participant at one end has difficulty identifying who is talking at the other end and comprehending what is being discussed. The reason is that the voices of multiple participants are intermixed into a single audio stream. This is in sharp contrast to face-to-face communication, where sounds come from all directions and humans can perceive the direction and distance of a sound source using two ears. The human auditory system exploits cues such as the interaural time difference (ITD) and the interaural level difference (ILD), which arise from the distance between the two ears and the shadowing by the head (Blauert, 1983).
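To make the ITD cue concrete, the following is a minimal sketch in Python of the textbook far-field approximation ITD ≈ (d/c)·sin θ, where d is the interaural distance, c the speed of sound, and θ the source azimuth. The formula, the function name, and the numeric constants (0.18 m ear spacing, 343 m/s) are standard illustrative values, not taken from this article; more accurate models (e.g., Woodworth's spherical-head formula) add a head-diffraction term.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed value)
EAR_DISTANCE = 0.18     # m, typical interaural distance (assumed value)

def interaural_time_difference(azimuth_deg: float) -> float:
    """Far-field ITD approximation: ITD = (d / c) * sin(theta).

    azimuth_deg: source azimuth in degrees (0 = straight ahead,
    90 = directly to one side). Returns the ITD in seconds.
    """
    theta = math.radians(azimuth_deg)
    return (EAR_DISTANCE / SPEED_OF_SOUND) * math.sin(theta)

# A source directly to the side gives the maximum delay between the ears
# (about half a millisecond for these constants); a source straight ahead
# gives zero delay, which is why frontal sources are hard to lateralize.
print(f"side:  {interaural_time_difference(90) * 1e3:.2f} ms")
print(f"front: {interaural_time_difference(0) * 1e3:.2f} ms")
```

Spatial (binaural) audio rendering in a conferencing system works by reintroducing exactly such per-talker ITD and ILD cues, so that each remote voice appears to come from a distinct direction.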