
Proceedings Fonetik 2009 - Institutionen för lingvistik



Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

Project presentation: Spontal – multimodal database of spontaneous speech in dialog

Jonas Beskow, Jens Edlund, Kjell Elenius, Kahl Hellmer, David House & Sofia Strömbergsson
KTH Speech, Music & Hearing, Stockholm, Sweden

Abstract

We describe the ongoing Swedish speech database project Spontal: Multimodal database of spontaneous speech in dialog (VR 2006-7482). The project takes as its point of departure the fact that both vocal signals and gesture involving the face and body are important in everyday, face-to-face communicative interaction, and that there is a great need for data with which we can measure these more precisely.

Introduction

Spontal: Multimodal database of spontaneous speech in dialog is an ongoing Swedish speech database project which began in 2007 and will be concluded in 2010. It is funded by the Swedish Research Council, KFI - Grant for large databases (VR 2006-7482). The project takes as its point of departure the fact that both vocal signals and gesture involving the face and body are key components in everyday face-to-face interaction – arguably the context in which speech was born – and focuses in particular on spontaneous conversation.

Although we have a growing understanding of the vocal and visual aspects of conversation, we lack data with which we can make more precise measurements.
There is currently very little data with which we can precisely measure multimodal aspects, such as the timing relationships between vocal signals and facial and body gestures, or acoustic properties that are specific to conversation, as opposed to read speech or monologue, such as the acoustics involved in floor negotiation, feedback and grounding, and resolution of misunderstandings.

The goal of the Spontal project is to address this situation through the creation of a Swedish multimodal spontaneous speech database rich enough to capture important variations among speakers and speaking styles, and so meet the demands of current research on conversational speech.

Scope

The project will record 60 hours of dialog, consisting of 120 half-hour sessions. Each session consists of three consecutive 10-minute blocks. The subjects are all native speakers of Swedish and are balanced (1) for gender, (2) as to whether the interlocutors are of the opposite gender and (3) as to whether they know each other or not. This balance will result in 15 dialogs of each configuration: 15×2×2×2 for a total of 120 dialogs. Currently (April 2009), about 33% of the database has been recorded. The remainder is scheduled for recording during 2010. All subjects give written permission (1) for the recordings to be used for scientific analysis, (2) for the analyses to be published in scientific writings and (3) for the recordings to be replayed in front of audiences at scientific conferences and the like.

In the base configuration, the recordings consist of high-quality audio and high-definition video, with about 5% of the recordings also making use of a motion capture system using infrared cameras and reflective markers for recording facial gestures in 3D.
In addition, the motion capture system is used on virtually all recordings to capture body and head gestures, although resources to treat and annotate this data have yet to be allocated.

Instruction and scenarios

Subjects are told that they are allowed to talk about absolutely anything they want at any point in the session, including meta-comments on the recording environment and the like, with the intention of relieving subjects of any feeling that they are forced to behave in a particular manner.

The recordings are formally divided into three 10-minute blocks, although the conversation is allowed to continue seamlessly across the blocks, with the exception that subjects are informed, briefly, about the time after each 10-minute block. After 20 minutes, they are also asked to open a wooden box which has been placed on the floor beneath them prior to the recording. The box contains objects whose identity or function is not immediately obvious. The subjects may then hold, examine and
