
Affective Computing:
Computational Approaches in Emotion Recognition

Abe Kazemzadeh, Chi-Chun (Jeremy) Lee, Angeliki Metallinou
SAIL lab: sail.usc.edu
Univ. of Southern California


Emotional Speech

• Speech conveys rich emotional information


Multimodality

• Emotions are often expressed multimodally
– e.g., sarcasm: conflicting multimodal cues


Complex Representation of Emotion

• Emotions have variable intensity and clarity
• Categorical descriptions may not give the full picture


Emotion Evolution

• Emotions generally happen in context
– of a situation
– of a conversation topic, e.g., lost luggage
– of an emotional history, e.g., the speaker was angry until now


Motivation

• Quantitative study of human emotions
• Human-computer interfaces
– Education
– Entertainment
– Dialog systems
– Personalized applications/software
– Virtual agents
– …
• Behavioral informatics


Outline

• Emotional Representations
• Collecting Emotional Databases
• Multimodal Feature Extraction
• Methods for Emotion Recognition
• Beyond Recognizing Emotions
• Conclusions and Open Questions


Emotion Representations
Categorical and Dimensional

• Categorical Representations
– Description using discrete categories, e.g., angry, happy, sad
– Six 'basic' emotions*: anger, disgust, fear, happiness, sadness, surprise
– Choice of affective states can be application-driven, e.g., interest, frustration...
• Dimensional Representations**
– Activation, Valence, Dominance
– Description of attributes (dimensions) of an emotion

* P. Ekman (1999), "Basic Emotions," in Handbook of Cognition and Emotion
** H. Schlosberg (1954), "Three dimensions of emotion," Psychological Review
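
To make the two schemes concrete, here is a minimal sketch of how a single annotated utterance might carry both kinds of labels (the field names and the [1, 5] scales are illustrative assumptions, not a standard):

```python
from dataclasses import dataclass

@dataclass
class EmotionLabel:
    """One utterance annotated both ways (illustrative [1, 5] scales)."""
    category: str      # categorical label, e.g., "angry", "happy", "sad"
    activation: float  # calm (1) ... excited (5)
    valence: float     # negative (1) ... positive (5)
    dominance: float   # submissive (1) ... dominant (5)

# A hypothetical annotation: hot anger is active, negative, and dominant.
label = EmotionLabel(category="angry", activation=4.5, valence=1.5, dominance=4.0)
print(label)
```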


Emotion Representations
Continuous Dimensional


Continuous Dimensional Representations

• Feeltrace tool for continuous annotations*
• Agreement on the trends of emotional curves, rather than on absolute values
• Easier to rate emotions in relative terms**

* "Feeltrace: an instrument for recording perceived emotion in real time," Cowie et al.
** "Ranking-based emotion recognition for music organization and retrieval," Yang and Chen
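
The trends-versus-absolute-values point can be made concrete with a toy pair of annotation curves (synthetic data, not Feeltrace output): two raters may differ widely in level yet agree perfectly on the trend.

```python
import numpy as np

t = np.linspace(0, 10, 200)
rater1 = np.sin(t)              # synthetic activation curve from rater 1
rater2 = 0.5 * np.sin(t) + 1.0  # same trend, different scale and offset

mae = np.abs(rater1 - rater2).mean()      # large absolute disagreement (~0.9)
corr = np.corrcoef(rater1, rater2)[0, 1]  # perfect trend agreement (1.0)
print(f"mean absolute difference: {mae:.2f}, correlation: {corr:.2f}")
```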


Emotion Representations
Challenges

• Emotional descriptions are subjective
– perceptual differences among individuals
– vague or subtle emotional expressions (real-life emotions)
• Ground truth for the recognition task may be ambiguous
• Level of detail of emotional descriptions
– How many emotional categories need to be considered?
– How many levels of valence and activation?


Natural Language Descriptions
EMO20Q

• EMO20Q: an emotion twenty-questions game; one player picks an emotion word and the other asks natural-language yes/no questions to identify it

Q: Do you feel this emotion at Disneyland?
A: no
Q: Do you feel this emotion when you run over a dog?
A: possibly, yes.
Q: is it remorse?
A: no
Q: Do you feel this when someone close dies?
A: Not necessarily, but you could I suppose
Q: When stealing something from a friend do you feel like this?
A: I think so, but I don't usually steal stuff though.
Q: There is a sound that does not let you sleep at night at your apartment, do you feel like this in reaction to this noise?
A: yes
Q: is it annoyed?
A: no
Q: You are walking through South Central very late with your very expensive laptop and you see a stranger quickly moving towards you, do you feel like this when that happens?
A: yes, getting closer.
Q: Fear?
A: no but close
Q: Nervousness?
A: no, that's a near synonym but I think it's slightly different
Q: Do you feel like this when there is a big event coming up and you "can't wait" for it to happen
A: no, actually the opposite... you don't want it to happen.
Q: how about anxious?
A: yes, it's anxious... I think I'll count it, but that wasn't the exact word. Do you know it?


Data Collection

• IEMOCAP* and CreativeIT**
– Multimodal emotional databases collected by SAIL
• Use of improvisations and theatrical techniques
– Elicitation of naturalistic emotions
– Dyadic settings
• Collected data
– Detailed MoCap information of face or body
– Microphones and cameras
– Dialog transcriptions

* "IEMOCAP: Interactive emotional dyadic motion capture database," Busso et al.
** "The USC CreativeIT database: A multimodal database of theatrical improvisation," A. Metallinou, C.-C. Lee, C. Busso, S. Carnicke, S. Narayanan


A clip from the CreativeIT database

• A scene from Chekhov's play 'Uncle Vanya'


Feature Extraction


Speech Production and Perception

(Gray's Anatomy via wikipedia.org)


MRI Recordings

(SAIL Realtime MRI Corpus)


Speech Processing: Spectrogram

(Rob Hagiwara, http://home.cc.umanitoba.ca/~robh/howto.html)
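
For readers who want to reproduce a figure like the one on this slide, here is a minimal spectrogram sketch with SciPy (the input file name and the 16 kHz mono assumption are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("utterance.wav")  # hypothetical 16 kHz mono file
freqs, times, sxx = spectrogram(samples.astype(float), fs=rate,
                                nperseg=400, noverlap=240)  # 25 ms window, 10 ms hop
plt.pcolormesh(times, freqs, 10 * np.log10(sxx + 1e-10))    # log-power in dB
plt.xlabel("Time (s)"); plt.ylabel("Frequency (Hz)")
plt.show()
```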


Text Features

• Bag of words (unigrams), as in the sketch after this list
• N-gram language models
• Emotion dictionaries
• Lattices
• Orthography (punctuation, capitalization, emoticons)
• WordNet
• Syntax
• Semantic roles
• World knowledge
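
As a toy illustration of the bag-of-words item, a minimal unigram counter in plain Python (the example utterance is hypothetical, not from the databases discussed here):

```python
from collections import Counter

def bag_of_words(utterance: str) -> Counter:
    """Count unigram occurrences after lowercasing and stripping punctuation."""
    tokens = [w.strip(".,!?") for w in utterance.lower().split()]
    return Counter(t for t in tokens if t)

# Hypothetical emotional utterance.
print(bag_of_words("I am so, so happy to see you!"))
# Counter({'so': 2, 'i': 1, 'am': 1, 'happy': 1, 'to': 1, 'see': 1, 'you': 1})
```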


ASR: HMM

(http://en.wikipedia.org/wiki/Hidden_Markov_model)
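
The slide showed an HMM diagram; as a reminder of the underlying computation, here is a toy forward-algorithm sketch in NumPy (the two-state model and its probabilities are made up, far smaller than any real ASR model):

```python
import numpy as np

# Toy 2-state HMM with 2 discrete observation symbols (made-up numbers).
A = np.array([[0.7, 0.3],   # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emission probabilities per state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution

def forward(obs):
    """Likelihood of an observation sequence under the HMM (forward algorithm)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 1, 1]))  # P(observation sequence | model)
```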


ASR: Lattice

(Georgiou et al., ACII 2011)


Facial Feature Extraction

• Facial expressions convey emotional information
• Facial Action Coding System (FACS) and Action Units (AUs)*
• Extract facial features
– FACS-based approaches
– Data-driven approaches
– Statistical functionals over low-level face features (see the sketch after this list)

* Facial Action Coding System Manual, P. Ekman and W. Friesen
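
A minimal sketch of the functionals idea: summarize a variable-length track of low-level features with fixed-length statistics (the five functionals chosen here are common examples, not the exact set used in the slides' experiments):

```python
import numpy as np

def functionals(frames: np.ndarray) -> np.ndarray:
    """Summarize a (num_frames, num_features) track of low-level face
    features (e.g., marker coordinates) with utterance-level statistics."""
    stats = [frames.mean(0), frames.std(0), frames.min(0),
             frames.max(0), np.ptp(frames, axis=0)]
    return np.concatenate(stats)

# Hypothetical 100-frame track of 6 low-level features.
track = np.random.randn(100, 6)
print(functionals(track).shape)  # (30,) = 5 functionals x 6 features
```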


Body Language Feature Extraction

• Body language expresses rich emotional information*
– Body movement, gestures and posture
– Relative behavior, e.g., approach/avoidance, looking/turning away, touching
• Extract detailed features from MoCap (a toy sketch follows)

* The new handbook of methods in nonverbal behavior research, J. Harrigan, R. Rosenthal and K. Scherer
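
One relative-behavior feature, sketched from first principles: a hypothetical approach/avoidance measure computed over synthetic MoCap trajectories (not the feature set of the cited work):

```python
import numpy as np

def approach_rate(pos_a: np.ndarray, pos_b: np.ndarray, fps: float = 60.0):
    """Rate of change of the distance between two actors' (num_frames, 3)
    torso positions: negative values indicate approach, positive avoidance."""
    dist = np.linalg.norm(pos_a - pos_b, axis=1)
    return np.diff(dist) * fps  # meters per second

# Hypothetical trajectories: actor A walks toward a stationary actor B.
t = np.linspace(0, 1, 61)[:, None]
a = np.hstack([2.0 - 1.5 * t, np.zeros_like(t), np.zeros_like(t)])
b = np.zeros((61, 3))
print(approach_rate(a, b).mean())  # ~ -1.5 (approaching at 1.5 m/s)
```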


Examples of Features

* "Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues," A. Metallinou, A. Katsamanis, Y. Wang and S. Narayanan


Emotion Recognition


Emotion Recognition
Turn-by-turn recognition

• Recognition Task
– recognize emotion in the IEMOCAP database
• Extracted Features
– audio features (384 dimensions)
• Emotion Representation
– 4 categorical emotion labels: Angry, Happy, Sad, Neutral
• Technological Difficulties
– multiclass emotion classification
– audio features only
– database-specific
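
A minimal turn-level classification sketch (random placeholder data stands in for the 384-dimensional audio features and the IEMOCAP labels; the classifier choice is illustrative, not the system from the talk):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

labels = ["angry", "happy", "sad", "neutral"]
X = np.random.randn(400, 384)           # placeholder for 384-dim audio features
y = np.random.choice(labels, size=400)  # placeholder turn-level labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # z-normalize, then RBF-SVM
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))  # ~chance (0.25) on random data
```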


Hierarchical Tree Classification

• Easily adaptable to other databases (e.g., the AIBO database)
• Flexible framework
• Exploits expert knowledge

Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee and Shrikanth S. Narayanan, "Emotion recognition using a hierarchical binary decision tree approach" (2011), Speech Communication, 53:9-10, 1162-1171
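
A sketch of the general idea: binary classifiers at internal nodes peel off one class at a time. The node ordering, classifier choice, and data below are illustrative assumptions, not the design of the cited paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class Node:
    """Internal tree node: a binary 'leaf_label vs. rest' classifier.
    Samples judged 'rest' fall through to the child node (or final label)."""
    def __init__(self, leaf_label, child):
        self.clf = LogisticRegression()
        self.leaf_label, self.child = leaf_label, child

    def fit(self, X, y):
        self.clf.fit(X, y == self.leaf_label)
        rest = y != self.leaf_label
        if isinstance(self.child, Node):
            self.child.fit(X[rest], y[rest])
        return self

    def predict_one(self, x):
        if self.clf.predict(x[None])[0]:
            return self.leaf_label
        return self.child if isinstance(self.child, str) else self.child.predict_one(x)

# Illustrative ordering: split off "angry", then "sad", then happy vs. neutral.
tree = Node("angry", Node("sad", Node("happy", "neutral")))
X = np.random.randn(200, 10)
y = np.random.choice(["angry", "sad", "happy", "neutral"], 200)
tree.fit(X, y)
print(tree.predict_one(X[0]))
```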


Emotion Recognition
Context-Sensitive Multimodal

• Considering temporal emotional context
– when classifying the emotion of the current observation
• Hierarchical framework to model dynamics
– within emotions (e.g., within an emotional utterance)
– between emotions, during a conversation (temporal context)
– between speakers
• Flexibility in terms of classifiers (a BLSTM sketch follows)
– HMM approaches (coupled HMMs, hierarchical HMMs)
– Recurrent neural networks and their extensions (BLSTM)
• Multimodal Fusion
– face, voice, head and hand movement cues
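
A minimal BLSTM sketch in PyTorch, emitting one emotion decision per frame so that each decision can use both past and future context (the feature dimension, hidden size, and fused audio-visual setup are illustrative assumptions):

```python
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    """Bidirectional LSTM that emits an emotion label per frame,
    so each decision sees both past and future context."""
    def __init__(self, feat_dim=30, hidden=64, num_classes=4):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_classes)  # forward + backward states

    def forward(self, x):             # x: (batch, time, feat_dim)
        h, _ = self.blstm(x)
        return self.out(h)            # (batch, time, num_classes) logits

# Hypothetical fused audio-visual frame features: batch of 8, 100 frames, 30 dims.
model = BLSTMTagger()
logits = model(torch.randn(8, 100, 30))
print(logits.shape)  # torch.Size([8, 100, 4])
```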


Emotion Recognition
Context-Sensitive Multimodal


Tracking Trends of Continuous Emotions

• Estimating continuous emotional curves through time
– using audio-visual information


Tracking Trends of Continuous Emotions

• Gaussian mixture model (GMM)-based mapping*
– continuous underlying emotions: x_t
– continuous observed body language (and prosody): y_t
– train a joint GMM for (x_t, y_t)
• Iterative process (through EM)
– converges to the maximum likelihood (MLE) mapping
• Use derivatives to take temporal context into account
– smoother emotional trajectory estimates

* T. Toda, A. W. Black, K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model"
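
A compact sketch of the joint-GMM mapping idea on toy synthetic data (this frame-wise MMSE estimate is a simplification: the cited method also models derivative features and solves for the maximum-likelihood trajectory):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: a scalar "emotion" x drives noisy 2-D "observations" y (made up).
rng = np.random.default_rng(0)
x = 0.1 * np.cumsum(rng.normal(size=(500, 1)), axis=0)
y = np.hstack([np.sin(x), x ** 2]) + 0.05 * rng.normal(size=(500, 2))

dx = 1  # dimensionality of x in the joint vector z_t = [x_t, y_t]
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(np.hstack([x, y]))

def map_y_to_x(y_t):
    """MMSE estimate E[x_t | y_t] via per-component conditional Gaussians."""
    post = np.zeros(gmm.n_components)
    cond = np.zeros(gmm.n_components)
    for m in range(gmm.n_components):
        mu, S = gmm.means_[m], gmm.covariances_[m]
        d = y_t - mu[dx:]
        Syy = S[dx:, dx:]
        # responsibility of component m given y_t alone (constants cancel)
        post[m] = gmm.weights_[m] * np.exp(-0.5 * d @ np.linalg.solve(Syy, d)) \
                  / np.sqrt(np.linalg.det(Syy))
        # conditional mean of x given y under component m
        cond[m] = (mu[:dx] + S[:dx, dx:] @ np.linalg.solve(Syy, d))[0]
    return post @ cond / post.sum()

print("estimate:", map_y_to_x(y[10]), "true:", x[10, 0])
```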


Some Tracking Results

• We are better at tracking trends than absolute values
• Promising performance for activation and dominance

* "Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues," A. Metallinou, A. Katsamanis, Y. Wang and S. Narayanan


Human Behavior Modeling

• Broad area of human behaviors/internal states
– emotion at the core
– cognitive planning
– social behaviors
– subjective description of human behaviors
• Behavioral signal processing (BSP)
– recognize abstract human states of interest to psychologists
– quantify important human behavior dynamics objectively
– provide engineering tools for human behavior analysis


Entrainment

• Natural coordination of behaviors between interacting partners
• A reliable human "representation" is hard to achieve
• Completely "signal-derived" unsupervised method, inspired by psychological qualitative descriptions
• Measure (from the slide's diagram): (1) construct a PCA vocal-characteristic space from the husband's features; (2) project the wife's features onto the constructed PCA space; (3) compute similarity measures (entrainment)

Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian Baucom, Panayiotis Georgiou and Shrikanth Narayanan, "An Analysis of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions," in Proceedings of Interspeech, Florence, Italy, 2011
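
A toy sketch of steps (1) through (3). The similarity measure below (the fraction of one speaker's variance captured by the other's principal subspace) is one plausible choice; the cited paper defines its own measures, and the data here are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def entrainment(feats_a, feats_b, n_components=3):
    """(1) Build a PCA space from speaker A's frame-level vocal features,
    (2) project speaker B into it, (3) return the fraction of B's variance
    captured by A's principal subspace (higher = more similar structure)."""
    scaler = StandardScaler().fit(feats_a)
    pca = PCA(n_components=n_components).fit(scaler.transform(feats_a))
    zb = scaler.transform(feats_b)
    zb = zb - zb.mean(axis=0)
    recon = zb @ pca.components_.T @ pca.components_  # projection onto A's subspace
    return (recon ** 2).sum() / (zb ** 2).sum()

# Synthetic tracks: the "wife" shares the "husband's" low-dimensional vocal
# structure; the "stranger" does not (all names and numbers hypothetical).
rng = np.random.default_rng(1)
mix = rng.normal(size=(3, 8))
husband = rng.normal(size=(300, 3)) @ mix + 0.1 * rng.normal(size=(300, 8))
wife = rng.normal(size=(250, 3)) @ mix + 0.1 * rng.normal(size=(250, 8))
stranger = rng.normal(size=(250, 8))
print(entrainment(husband, wife), entrainment(husband, stranger))  # high vs. ~0.4
```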


Future Work:
Issues in need of further research

• Human behavior representation
– How do we best "describe" and "annotate" subjective human behaviors? (or can we?)
• Informative feature extraction
– How do we automatically extract the most "informative" features?
• Machine learning frameworks
– What would be the most appropriate framework (incorporating context, multiple modalities, lexical content, interaction)?
• Generalizing to model abstract human behaviors
– Emotion is at the core
– Provide psychologists with objective methods
