Proceedings Fonetik 2009 - Institutionen för lingvistik
Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

Studies on using the SynFace talking head for the hearing impaired

Samer Al Moubayed 1, Jonas Beskow 1, Ann-Marie Öster 1, Giampiero Salvi 1, Björn Granström 1, Nic van Son 2, Ellen Ormel 2, Tobias Herzke 3

1 KTH Centre for Speech Technology, Stockholm, Sweden.
2 Viataal, Nijmegen, The Netherlands.
3 HörTech gGmbH, Germany.

sameram@kth.se, {beskow, annemarie, giampi, bjorn}@speech.kth.se, n.vson@viataal.nl, elleno@socsci.ru.nl, t.herzke@hoertech.de

Abstract

SynFace is a lip-synchronized talking agent optimized as a visual reading support for the hearing impaired. In this paper we present the large-scale hearing-impaired user studies carried out for three languages in the Hearing at Home project. The user tests focus on measuring the gain in Speech Reception Threshold (SRT) in noise, and the effort scaling, when SynFace is used by hearing-impaired people; groups of hearing-impaired subjects with impairment levels from mild to severe, as well as cochlear-implant users, are tested. Preliminary analysis of the results shows no significant gain in SRT or in effort scaling. However, given the large cross-subject variability in both tests, it is clear that many subjects benefit from SynFace, especially with speech in stereo babble.

Introduction

There is a growing number of hearing-impaired persons in society today. The ongoing EU project Hearing at Home (HaH) (Beskow et al., 2008) aims to develop the next generation of assistive devices that will allow this group - which predominantly includes the elderly - equal participation in communication and empower them to play a full role in society. The project focuses on the needs of hearing-impaired persons in home environments.

For a hearing-impaired person, it is often necessary to be able to lip-read as well as hear the person they are talking with in order to communicate successfully. Often, only the audio signal is available, e.g.
during telephone conversations or certain TV broadcasts. One of the goals of the HaH project is to study the use of visual lip-reading support by hard-of-hearing people for home information, home entertainment, automation, and care applications.

The SynFace Lip-Synchronized Talking Agent

SynFace (Beskow et al., 2008) is a supportive technology for hearing-impaired persons which aims to re-create the visible articulation of a speaker in the form of an animated talking head. SynFace employs a specially developed real-time phoneme recognition system, based on a hybrid of recurrent artificial neural networks (ANNs) and hidden Markov models (HMMs), that delivers information about the speech articulation to a speech animation module, which renders the talking face to the computer screen using 3D graphics.

SynFace has previously been trained on four languages: English, Flemish, German and Swedish. The training used the multilingual SpeechDat corpora. To align the corpora, the HTK (Hidden Markov Model Toolkit) based RefRec recognizer (Lindberg et al., 2000) was trained to derive the phonetic transcription of the corpus. Table 1 presents the % correct frame of the recognizers of the four languages SynFace contains.

Table 1. Complexity and % correct frame of the recognizers of different languages in SynFace.

Language   Connections   % correct frame
Swedish    541,250       54.2
English    184,848       53.0
German     541,430       61.0
Flemish    186,853       51.0

User Studies

SynFace has previously been evaluated by subjects in many ways, in Agelfors et al. (1998), Agelfors et al. (2006) and Siciliano et al. (2003). In the present study, a large-scale test of the use of SynFace as an audio-visual support
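The "% correct frame" figures in Table 1 are a frame-level accuracy: the percentage of frames whose recognized phoneme label matches the reference transcription. A minimal sketch of that computation (the function and variable names here are illustrative, not taken from the SynFace codebase):

```python
def percent_correct_frame(reference, predicted):
    """Percentage of frames where the recognized phoneme label
    matches the reference label; both sequences must align
    frame-by-frame and have the same length."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty, equal-length label sequences")
    hits = sum(ref == hyp for ref, hyp in zip(reference, predicted))
    return 100.0 * hits / len(reference)

# Example: 3 of 4 frames agree.
ref = ["s", "s", "y", "n"]
hyp = ["s", "s", "y", "m"]
print(percent_correct_frame(ref, hyp))  # 75.0
```

A score of 54.2 for Swedish thus means that, on the evaluation data, the recognizer's frame-level phoneme label agreed with the reference transcription in roughly 54 out of every 100 frames.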
