Proceedings Fonetik 2009 - Institutionen för lingvistik

Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

ican) English (and the system is used increasingly for the prosodic annotation of other languages, too), and good inter-transcriber consistency can be achieved as long as the voice quality analyzed represents normal (modal) phonation.

Certain speech situations, however, seem to consistently produce voice qualities different from modal phonation, and the prosodic analysis of such speech data with traditional ToBI labeling may be problematic. Typical examples are breathy, creaky and harsh voice qualities. Pitch analysis algorithms, which are used to produce a record of the fundamental frequency (f0) contour of the utterance to aid the ToBI labeling, yield a messy or lacking f0 track on non-modal voice segments. Non-modal voice qualities may represent habitual speaking styles or idiosyncrasies of speakers but they are often prosodic characteristics of emotional discourse (sadness, anger, etc.). It is likely, for example, that the speech of a depressed subject is to a significant extent characterized by low f0 targets and creak. Therefore, some special (possibly emotion-specific) speech genres (observed and recorded in clinical settings) might be problematic for traditional ToBI labeling.

A potential modified system would be “4-Tone EVo” – a ToBI-based framework for transcribing the prosody of modal/non-modal voice in (emotional) English. As in the original ToBI system, intonation is transcribed as a sequence of pitch accents and boundary pitch movements (phrase accents and boundary tones). The original ToBI break index tier (with four strengths of boundaries) is also used. The fundamental difference between 4-Tone EVo and the original ToBI is that four main tones (H, L, h, l) are used instead of two (H, L). In 4-Tone EVo, H and L are high and low tones, respectively, as are “h” and “l”, but “h” is a high tone with non-modal phonation and “l” a low tone with non-modal phonation.
Basically, “h” is H without a clear pitch representation in the record of the f0 contour, and “l” is a similar variant of L.

Preliminary tests for (emotional) English prosodic annotation have been made using the model, and the results seem promising (Toivanen, 2006). To assess the usefulness of 4-Tone EVo, informal interviews with British exchange students (speakers of southern British English) were used (with permission obtained from the subjects). The speakers described, among other things, their reactions to certain personal dilemmas (the emotional overtone was, predictably, rather low-keyed).

The discussions were recorded in a sound-treated room; the speakers’ speech data was recorded directly to hard disk (44.1 kHz, 16 bit) using a high-quality microphone. The interaction was visually recorded with a high-quality digital video recorder directly facing the speaker. The speech data consisted of 574 orthographic words (82 utterances) produced by three female students (20-27 years old). Five Finnish students of linguistics/phonetics listened to the tapes and watched the video data; the subjects transcribed the data prosodically using 4-Tone EVo. The transcribers had been given a full training course in 4-Tone EVo style labeling. Each subject transcribed the material independently of the others.

As in the evaluation studies of the original ToBI, a pairwise analysis was used to evaluate the consistency of the transcribers: the label of each transcriber was compared against the labels of every other transcriber for the particular aspect of the utterance. The 574 words were transcribed by the five subjects; thus a total of 5740 (574 x 10 pairs of transcribers) transcriber-pair-words were produced.
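The pairwise comparison described above can be sketched in Python as follows. This is only an illustrative sketch, not the procedure or software used in the study: the function name pairwise_agreement, the "-" symbol for "no accent" and the toy labels are assumptions made for the example. It also shows where the figure of 5740 comes from: five transcribers form 10 unordered pairs, and 574 words x 10 pairs = 5740 comparisons.

```python
from itertools import combinations


def pairwise_agreement(labels_by_transcriber):
    """Count agreeing and total transcriber-pair-words.

    labels_by_transcriber: one label sequence per transcriber,
    each holding one label per word (all sequences the same length).
    Returns a tuple (agreeing, total).
    """
    lengths = {len(seq) for seq in labels_by_transcriber}
    if len(lengths) != 1:
        raise ValueError("all transcribers must label the same words")

    agreeing = 0
    total = 0
    # Every unordered pair of transcribers is compared word by word.
    for labels_a, labels_b in combinations(labels_by_transcriber, 2):
        for label_a, label_b in zip(labels_a, labels_b):
            total += 1
            if label_a == label_b:
                agreeing += 1
    return agreeing, total


# Hypothetical toy data (not from the study): 5 transcribers, 4 words.
# H, L, h, l are the four tones; "-" is an assumed "no accent" marker.
toy = [
    ["H", "-", "l", "L"],
    ["H", "-", "L", "L"],
    ["H", "-", "l", "L"],
    ["h", "-", "l", "L"],
    ["H", "-", "l", "-"],
]

agreeing, total = pairwise_agreement(toy)
print(f"{agreeing}/{total} pair-words agree ({100 * agreeing / total:.0f} %)")

# Size of the comparison set reported in the text: 10 pairs x 574 words.
n_pairs = len(list(combinations(range(5), 2)))
print(n_pairs * 574)
```

A consistency rate for one aspect of the transcription (for example, choice of pitch accent) would then be the agreeing count divided by the total for that aspect.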
The following consistency rates were obtained: presence of pitch accent (73 %), choice of pitch accent (69 %), presence of phrase accent (82 %), presence of boundary tone (89 %), choice of phrase accent (78 %), choice of boundary tone (85 %), choice of break index (68 %).

The level of consistency achieved for 4-Tone EVo transcription was somewhat lower than that reported for the original ToBI system. However, the differences in the agreement levels seem quite insignificant bearing in mind that 4-Tone EVo uses four tones instead of two!

Gaze direction analysis

Our second proposal concerns the multimodality of a (clinical) situation, e.g. a patient interview, in which (emotional) speech is produced. It seems necessary to record the interactive situation as fully as possible, also visually. In a clinical situation, where the subject’s overall behavior is being (at least indirectly) assessed, it is essential that other modalities than speech be analyzed and annotated. Thus, as far as emotion expression and emotion evaluation in interaction are concerned, the coding of the visually observable behavior of the subject should be a standard procedure. We suggest that, after recording the discourse event with a video recorder, the gaze of the subject is annotated as follows. The gaze of the subject (patient) may be directed towards the interlocutor (+directed gaze) or shifted away from the interlocutor (-directed gaze). The position of the subject relative to the interlocutor (interviewer, clinician) may be neutral (0-proxemics), closer to the interlocutor (+proxemics) or withdrawn from the interlocutor (-proxemics). Preliminary studies indicate that the inter-transcriber consistency even for the visual annotation is promising (Toivanen, 2006).

Post-analysis: meta-interview

Our third proposal concerns the interactionality and negotiability of a (clinical) situation yielding emotional speech. We suggest that, at some point, the subject is given an opportunity to evaluate and assess his/her emotional (speech) behavior. Therefore, we suggest that the interviewer (the clinician) will watch the video recording together with the subject (the patient) and discuss the events of the situation. The aim of the post-interview is to study whether the subject can accept and/or confirm the evaluations made by the clinician. An essential question would seem to be: are certain (assumed) manifestations of emotion/affect “genuine” emotional effects caused by the underlying mental state (mental disorder) of the subject, or are they effects of the interactional (clinical) situation reflecting the moment-by-moment developing communicative/attitudinal stances between the speakers? That is, to what extent is the speech situation, rather than the underlying mental state or mood of the subject, responsible for the emotional features observable in the situation? We believe that this kind of post-interview would enrich the clinical evaluation of the subject’s behavior.
Especially after a treatment, it would be useful to chart the subject’s reactions to his/her recorded behavior in an interview situation: does he/she recognize certain elements of his/her behavior being due to his/her pre-treatment mental state/disorder?

Conclusion

The outlined approach to a clinical evaluation of an emotional speech situation reflects the Systemic Approach: emotions, along with other aspects of human behavior, serve to achieve intended behavioral and interactional goals in co-operation with the environment. Thus, emotions are always reactions also to the behavioral acts unfolding in the moment-by-moment face-to-face interaction (in real time). In addition, emotions often reflect the underlying long-term affective state of the speaker (possibly including mental disorders in some subjects). An analysis of emotions in a speech situation must take these aspects into account, and a speech analyst doing research on clinical speech material should see and hear beyond “prosodemes” and given emotional labels when looking into the data.

References

Beckman M.E. and Ayers G.M. (1993) Guidelines for ToBI Labeling. Linguistics Department, Ohio State University.
Clemmer E.J. (1980) Psycholinguistic aspects of pauses and temporal patterns in schizophrenic speech. Journal of Psycholinguistic Research 9, 161-185.
Cornelius R.R. (1996) The science of emotion. Research and tradition in the psychology of emotion. New Jersey: Prentice-Hall.
Covington M., He C., Brown C., Naci L., McClain J., Fjorbak B., Semple J. and Brown J. (2005) Schizophrenia and the structure of language: the linguist’s view. Schizophrenia Research 77, 85-98.
Damasio A. (1994) Descartes’ error. New York: Grosset/Putnam.
Golfarb R. and Bekker N. (2009) Noun-verb ambiguity in chronic undifferentiated schizophrenia. Journal of Communication Disorders 42, 74-88.
Laver J. (1994) Principles of phonetics. Cambridge: Cambridge University Press.
Murphy D. and Cutting J. (1990) Prosodic comprehension and expression in schizophrenia. Journal of Neurology, Neurosurgery and Psychiatry 53, 727-730.
Ohala J. (1983) Cross-language use of pitch: an ethological view. Phonetica 40, 1-18.
Scherer K.R. (2000) Vocal communication of emotion. In Lewis M. and Haviland-Jones J. (eds.) Handbook of Emotions, 220-235. New York: The Guilford Press.
Scherer K.R. (2003) Vocal communication of emotion: a review of research paradigms. Speech Communication 40, 227-256.
Toivanen J. (2006) Evaluation study of “4-Tone EVo”: a multimodal transcription model for emotion in voice in spoken English. In Toivanen J. and Henrichsen P. (eds.) Current Trends in Research on Spoken Language in the Nordic Countries, 139-140. Oulu University & CMOL, Copenhagen Business School: Oulu University Press.