Conclusions

The word recognition test of sentences with degraded audio showed that animations based on real movements resulted in significantly better speech perception than rule-based animations. The classification test then showed that subjects were unable to tell whether the displayed animated movements were real or synthetic, and could only to a modest extent discriminate between the two.

This study is small and has several factors of uncertainty (e.g., variation between subjects in both tests, the influence of the face movements, differences in articulatory range between the real and rule-based movements), and it is hence not possible to draw any general conclusions on audiovisual speech perception with augmented reality. It nevertheless points out a very interesting path for future research: the fact that subjects were unable to tell whether animations were created from real speech movements or not, yet received more support from this type of animation than from realistic synthetic movements, gives an indication of a subconscious influence of visual gestures on speech perception. This study cannot prove that there is a direct mapping between audiovisual speech perception and speech motor planning, but it does hint at the possibility that audiovisual speech is perceived in the listener's brain in terms of vocal tract configurations (Fowler, 2008). Additional studies of this type could help determine the plausibility of different speech perception theories linked to the listener's articulations.

Acknowledgements

This work is supported by the Swedish Research Council project 80449001 Computer-Animated LAnguage TEAchers (CALATEA). The estimation of parameter values from motion capture and articulography data was performed by Jonas Beskow.

References

Agelfors, E., Beskow, J., Dahlquist, M., Granström, B., Lundeberg, M., Spens, K.-E. and Öhman, T. (1998). Synthetic faces as a lipreading support. Proceedings of ICSLP, 3047–3050.
Badin, P., Tarabalka, Y., Elisei, F. and Bailly, G. (2008). Can you "read tongue movements"? Proceedings of Interspeech, 2635–2638.
Benoît, C. and LeGoff, B. (1998). Audio-visual speech synthesis from French text: Eight years of models, design and evaluation at the ICP. Speech Communication 26, 117–129.
Beskow, J. (1995). Rule-based visual speech synthesis. Proceedings of Eurospeech, 299–302.
Beskow, J., Engwall, O. and Granström, B. (2003). Resynthesis of facial and intraoral motion from simultaneous measurements. Proceedings of ICPhS, 431–434.
Branderud, P. (1985). Movetrack – a movement tracking system. Proceedings of the French-Swedish Symposium on Speech, 113–122.
Engwall, O. (2003). Combining MRI, EMA & EPG in a three-dimensional tongue model. Speech Communication 41/2-3, 303–329.
Fowler, C. (2008). The FLMP STMPed. Psychonomic Bulletin & Review 15, 458–462.
Grauwinkel, K., Dewitt, B. and Fagel, S. (2007). Visual information and redundancy conveyed by internal articulator dynamics in synthetic audiovisual speech. Proceedings of Interspeech, 706–709.
Liberman, A., Cooper, F., Shankweiler, D. and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review 74, 431–461.
Liberman, A. and Mattingly, I. (1985). The motor theory of speech perception revised. Cognition 21, 1–36.
Siciliano, C., Williams, G., Beskow, J. and Faulkner, A. (2003). Evaluation of a multilingual synthetic talking face as a communication aid for the hearing impaired. Proceedings of ICPhS, 131–134.
Skipper, J., van Wassenhove, V., Nusbaum, H. and Small, S. (2007). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex 17, 2387–2399.
Sumby, W. and Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26, 212–215.
Traunmüller, H. (2007). Demodulation, mirror neurons and audiovisual perception nullify the motor theory. Proceedings of Fonetik 2007, KTH-TMH-QPSR 50, 17–20.
Wik, P. and Engwall, O. (2008). Can visualization of internal articulators support speech perception? Proceedings of Interspeech 2008, 2627–2630.
