Rachel Study

Affective Computing:

Computational Approaches in Emotion Recognition 

Abe Kazemzadeh, Chi-Chun (Jeremy) Lee, Angeliki Metallinou

SAIL lab: sail.usc.edu 

Univ. of Southern California

Emotional Speech 

• Speech conveys rich emotional information

Multimodality 

• Emotions are often expressed multimodally 

– e.g., sarcasm: conflicting multimodal cues

Complex Representation of Emotion 

• Emotions have variable intensity and clarity 

• Categorical descriptions may not give the full picture

Emotion Evolution 

• Emotions generally happen in context 

– Of a situation

– Of a conversation topic, e.g., lost luggage

– Of an emotional history, e.g., speaker was angry until now

Motivation 

• Study of human emotions quantitatively 

• Human-computer interface 

– Education 

– Entertainment 

– Dialog system 

– Personalized application/software 

– Virtual agent 

– … 

• Behavioral informatics

Outline 

• Emotional Representations 

• Collecting Emotional Databases 

• Multimodal Feature Extraction 

• Methods for Emotion Recognition 

• Beyond Recognizing Emotions 

• Conclusions and Open Questions

Emotion Representations 

Categorical and Dimensional 

• Categorical Representations 

– Description using discrete categories, e.g., angry, happy, sad

– Six ‘basic’ emotions *: anger, disgust, fear, happiness, sadness, surprise

– Choice of affective states could be application driven, e.g., interest, frustration...

• Dimensional Representations ** 

– Activation, Valence, Dominance 

– Description of attributes (dimensions) of an emotion 

* P. Ekman (1999), “Basic Emotions”, in Handbook of Cognition and Emotion 

** H. Schlosberg, “Three dimensions of emotion”, in Psychological Review


Continuous Dimensional Representations

• Feeltrace tool for continuous annotations *

• Agreement on the trends of emotional curves

• Rather than absolute values

• Easier to rate emotions in relative terms **

* Feeltrace: an instrument for recording perceived emotion in real time, Cowie et al.

** Ranking-based emotion recognition for music organization and retrieval, Yang and Chen
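The point about trends versus absolute values can be illustrated with a small sketch. It assumes two hypothetical annotators whose activation traces have the same shape but a constant offset; the data and function names are illustrative only and are not taken from Feeltrace or any database. Correlation (a trend measure) is high even though the absolute gap is large.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length annotation curves."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def mean_abs_diff(a, b):
    """Average absolute disagreement between two curves."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical activation traces: rater_2 follows rater_1 with a +0.4 offset.
rater_1 = [0.1, 0.2, 0.4, 0.6, 0.5, 0.3]
rater_2 = [0.5, 0.6, 0.8, 1.0, 0.9, 0.7]

trend_agreement = pearson(rater_1, rater_2)      # close to 1: same trend
absolute_gap = mean_abs_diff(rater_1, rater_2)   # close to 0.4: different values
```

Under these assumptions the two raters agree almost perfectly on the trend while disagreeing on every absolute value.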


Challenges 

• Emotional descriptions are subjective 

• perceptual differences among individuals

• vague or subtle emotional expressions (real life emotions)

• Ground truth for recognition task may be ambiguous 

• Level of detail of emotional descriptions 

• How many emotional categories need to be considered? 

• How many levels of valence and activation?

Natural Language Descriptions 

EMO20Q 

Q: Do you feel this emotion at Disneyland? 

A: no 

Q: Do you feel this emotion when you run over a dog? 

A: possibly, yes. 

Q: is it remorse? 

A: no 

Q: Do you feel this when someone close dies? 

A: Not necessarily, but you could I suppose 

Q: When stealing something from a friend do you feel like this? 

A: I think so, but I don't usually steal stuff though. 

Q: There is a sound that does not let you sleep at night at your apartment do you feel like this in reaction to this noise? 

A: yes 

Q: is it annoyed? 

A: no 

Q: You are walking through South Central very late with your very expensive laptop and you see a stranger quickly moving towards you, do you feel like this when that happens?

A: yes, getting closer. 

Q: Fear? 

A: no but close 

Q: Nervousness? 

A: no, that's a near synonym but I think it's slightly different 

Q: Do you feel like this when there is a big event coming up and you "cant wait" for it to happen 

A: no, actually the opposite... you don't want it to happen. 

Q: how about anxious? 

A: yes, it's anxious... I think i'll count it, but that wasn't the exact word. Do you know it?

Data Collection 

• IEMOCAP * and CreativeIT ** 

– Multimodal emotional databases collected by SAIL 

• Use of improvisations and theatrical techniques 

– Elicitation of naturalistic emotions 

– Dyadic settings 

• Collected data 

– Detailed MoCap information of face or body 

– Microphones and cameras 

– Dialog transcriptions 

* IEMOCAP: Interactive emotional dyadic motion capture database, Busso et al.

** The USC CreativeIT database: A multimodal database of theatrical improvisation, A. Metallinou, C.-C. Lee, C. Busso, S. Carnicke, S. Narayanan

A clip from CreativeIT database 

• A scene from Chekhov’s play ‘Uncle Vanya’

Feature Extraction

Speech Production And Perception 

(Gray's Anatomy via wikipedia.org)

(SAIL Realtime MRI Corpus) 

MRI Recordings

Speech Processing: 

Spectrogram 

(Rob Hagiwara, http://home.cc.umanitoba.ca/~robh/howto.html)

Text Features 

• Bag of words (unigrams) 

• N-gram language models 

• Emotion dictionaries 

• Lattices 

• Orthography (punctuation, capitalization, emoticons) 

• Wordnet 

• Syntax 

• Semantic roles 

• World knowledge
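A few of the text features listed above (bag of words, emotion dictionaries, orthography) can be sketched as follows. The lexicon, the feature names, and the example utterance are hypothetical placeholders; a real system would load a published emotion dictionary and use a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping words to emotion labels (an assumption).
EMOTION_LEXICON = {
    "lost": "sad",
    "terrible": "angry",
    "great": "happy",
    "thanks": "happy",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """Unigram counts."""
    return Counter(tokenize(text))

def lexicon_counts(text):
    """Count tokens per emotion label found in the lexicon."""
    return Counter(EMOTION_LEXICON[t] for t in tokenize(text) if t in EMOTION_LEXICON)

def orthography_features(text):
    """Simple punctuation and capitalization cues."""
    return {
        "exclamations": text.count("!"),
        "all_caps_words": sum(1 for w in text.split() if len(w) > 1 and w.isupper()),
    }

utterance = "You LOST my luggage! This is terrible, terrible service!"
unigrams = bag_of_words(utterance)
emotion_hits = lexicon_counts(utterance)
orthography = orthography_features(utterance)
```

These counts would typically be concatenated into one feature vector per utterance before classification.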

ASR: HMM 

(http://en.wikipedia.org/wiki/Hidden_Markov_model)

(Georgiou et al., ACII 2011) 

ASR: Lattice

Facial Feature Extraction 

• Facial expressions convey emotional information 

• Facial Action Coding System (FACS) and Action Units (AU) * 

• Extract Facial Features 

– FACS based approaches 

– Data-driven approaches 

– Statistical functionals over low level face features

*Facial Action Coding System Manual, P. Ekman and W. Friesen
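As a rough illustration of "statistical functionals over low level features", the sketch below summarizes per-frame tracks into one fixed-length vector per utterance. The track names (`mouth_opening`, `eyebrow_raise`), the values, and the choice of functionals are assumptions made for the example, not the feature set used in the tutorial.

```python
from statistics import mean, pstdev

FUNCTIONAL_NAMES = ("mean", "std", "min", "max", "range")

def functionals(track):
    """Summarize one low-level feature track over an utterance."""
    return {
        "mean": mean(track),
        "std": pstdev(track),
        "min": min(track),
        "max": max(track),
        "range": max(track) - min(track),
    }

def utterance_vector(tracks):
    """Concatenate the functionals of every track (sorted by name)."""
    vector = []
    for name in sorted(tracks):
        stats = functionals(tracks[name])
        vector.extend(stats[k] for k in FUNCTIONAL_NAMES)
    return vector

# Hypothetical per-frame marker distances (arbitrary units).
tracks = {
    "mouth_opening": [2.0, 4.0, 6.0, 4.0],
    "eyebrow_raise": [1.0, 1.0, 3.0, 3.0],
}
features = utterance_vector(tracks)  # 2 tracks x 5 functionals = 10 values
```

Whatever the utterance length, the resulting vector has the same dimensionality, which is what makes it usable with standard classifiers.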

Body Language Feature Extraction 

• Body language expresses rich emotional information * 

– Body movement, gestures and posture 

– Relative behavior, e.g., approach/avoidance, looking/turning away, touching

• Extract detailed features from MoCap

* The new handbook of methods in nonverbal behavior research, J. Harrigan, R. Rosenthal and K. Scherer

Examples of Features 

* Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues, A. Metallinou, A. Katsamanis, Y. Wang and S. Narayanan

Emotion Recognition

Emotion Recognition 

turn by turn recognition 

• Recognition Task 

– recognize emotion in IEMOCAP database 

• Extracted Features 

– audio features (384 dimensions) 

• Emotion Representation 

– 4 categorical emotional labels: Angry, Happy, Sad, Neutral 

• Technological Difficulties 

– multiclass emotion labels classification 

– audio features only 

– database specific

Hierarchical Tree Classification 

• Easily adaptable to other databases (AIBO databases) 

• Flexible framework 

• Exploit expert knowledge 

Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee and Shrikanth S. Narayanan, Emotion recognition using a hierarchical binary decision tree approach (2011), in: Speech Communication, 53:9-10(1162-1171)
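The general shape of a hierarchical binary decision tree can be sketched as a cascade of binary classifiers, each splitting the remaining classes in two. Everything specific here is an assumption for illustration: the order of the splits, the feature names (`energy`, `pitch_var`, `speech_rate`), the thresholds, and the use of simple threshold rules in place of trained classifiers. It is not the configuration from the cited paper.

```python
def make_threshold_classifier(feature, threshold):
    """Stand-in for a trained binary classifier (hypothetical)."""
    return lambda x: x[feature] > threshold

class Node:
    """One binary decision; each branch is another Node or a final label."""

    def __init__(self, classifier, if_true, if_false):
        self.classifier = classifier
        self.if_true = if_true
        self.if_false = if_false

    def predict(self, x):
        branch = self.if_true if self.classifier(x) else self.if_false
        return branch.predict(x) if isinstance(branch, Node) else branch

# Hypothetical tree: split high vs. low arousal first, then resolve each side.
tree = Node(
    make_threshold_classifier("energy", 0.6),
    if_true=Node(make_threshold_classifier("pitch_var", 0.5), "Happy", "Angry"),
    if_false=Node(make_threshold_classifier("speech_rate", 0.4), "Neutral", "Sad"),
)

loud_turn = {"energy": 0.9, "pitch_var": 0.2, "speech_rate": 0.5}
quiet_turn = {"energy": 0.2, "pitch_var": 0.1, "speech_rate": 0.1}
loud_label = tree.predict(loud_turn)
quiet_label = tree.predict(quiet_turn)
```

Because each node is independent, a node's classifier or the tree layout can be swapped when moving to another database, which is one way to read the "flexible framework" bullet.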


Context-Sensitive Multimodal 

• Considering temporal emotional context 

– When classifying the emotion of current observation 

• Hierarchical Framework to model dynamics

– within emotions (e.g., emotional utterance)

– between emotions, during a conversation (temporal context)

– between speakers

• Flexibility in terms of classifiers

– HMM approaches (coupled HMMs, hierarchical HMMs)

– Recurrent Neural Networks and their extensions (BLSTM)

• Multimodal Fusion 

– Face, voice, head and hand movement cues
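One simple way to use temporal context between turns is Viterbi decoding over per-turn classifier outputs with an HMM-style transition model. The sketch below is a minimal illustration under stated assumptions: the per-turn posteriors are made up, they are used directly as emission scores, and a single hypothetical `stay_prob` parameter replaces a learned transition matrix. It is a much simpler stand-in for the coupled/hierarchical HMM and BLSTM models named above.

```python
from math import log

EMOTIONS = ["Angry", "Happy", "Sad", "Neutral"]

def viterbi(turn_posteriors, stay_prob=0.7):
    """Most likely emotion sequence given per-turn scores (all must be > 0)."""
    n = len(EMOTIONS)
    switch_prob = (1.0 - stay_prob) / (n - 1)

    def transition(i, j):
        return log(stay_prob if i == j else switch_prob)

    scores = [log(p) for p in turn_posteriors[0]]
    backpointers = []
    for posterior in turn_posteriors[1:]:
        new_scores, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: scores[i] + transition(i, j))
            new_scores.append(scores[best_i] + transition(best_i, j) + log(posterior[j]))
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [EMOTIONS[s] for s in reversed(path)]

# Hypothetical classifier outputs for three consecutive turns.
turns = [
    [0.70, 0.10, 0.10, 0.10],
    [0.35, 0.10, 0.15, 0.40],  # taken alone, this turn looks Neutral
    [0.70, 0.10, 0.10, 0.10],
]
decoded = viterbi(turns)
```

In this toy case the ambiguous middle turn is relabeled to match its angry neighbors, which is the intended effect of modeling emotional history.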


Tracking trends of continuous emotions

• Estimating continuous emotional curves through time 

– Using audio-visual information

Tracking trends of continuous emotions

• Gaussian Mixture Model-based mapping * 

– Continuous underlying emotions: x_t

– Continuous observed body language (and prosody): y_t

– Train a joint GMM for (x_t, y_t)

• Iterative process (through EM)

– Converges to the maximum likelihood mapping (MLE)

• Use derivatives to take into account temporal context 

– Smoother emotional trajectory estimates 

* T. Toda, A. W. Black, K. Tokuda, Statistical mapping between articulatory movements and acoustic spectrum using a gaussian mixture model
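The core of a GMM-based mapping can be sketched for the scalar case. Given a joint GMM over (x_t, y_t), the frame-wise minimum mean-square-error estimate is E[x | y] = sum_k p(k | y) * (mu_x_k + cov_xy_k / var_y_k * (y - mu_y_k)). Note that this is the simpler frame-by-frame estimate, not the iterative EM-based maximum likelihood trajectory mapping with derivatives described on the slide, and the component parameters below are invented rather than trained.

```python
from math import exp, pi, sqrt

def gaussian_pdf(value, mean, var):
    """Density of a univariate Gaussian."""
    return exp(-0.5 * (value - mean) ** 2 / var) / sqrt(2.0 * pi * var)

def map_observation(y, components):
    """Estimate E[x | y] under a joint GMM over scalar (x, y)."""
    weighted = [c["weight"] * gaussian_pdf(y, c["mean_y"], c["var_y"]) for c in components]
    total = sum(weighted)
    estimate = 0.0
    for c, w in zip(components, weighted):
        responsibility = w / total  # p(k | y)
        conditional_mean = c["mean_x"] + c["cov_xy"] / c["var_y"] * (y - c["mean_y"])
        estimate += responsibility * conditional_mean
    return estimate

# Hypothetical two-component joint GMM (x: emotion attribute, y: observed cue).
components = [
    {"weight": 0.5, "mean_x": -0.5, "mean_y": 1.0, "var_y": 0.25, "cov_xy": 0.1},
    {"weight": 0.5, "mean_x": 0.5, "mean_y": 3.0, "var_y": 0.25, "cov_xy": 0.1},
]

# Map a short sequence of observations to an estimated emotion curve.
observations = [1.0, 2.0, 3.0]
estimated_curve = [map_observation(y, components) for y in observations]
```

Extending the observation and target vectors with their time derivatives, as the slide suggests, is what yields smoother trajectories in the full method.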

Some Tracking Results 

• We are better at tracking trends than absolute values 

• Promising performance for activation and dominance 

* Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues, A. Metallinou, A. Katsamanis, Y. Wang and S. Narayanan

Human Behaviors Modeling 

• Broad area of human behaviors/internal states 

– Emotion at core 

– Cognitive planning 

– Social behaviors 

– Subjective description of human behaviors 

• Behavior signal processing (BSP) 

– Recognize abstract human states of interest to psychologists

– Quantify important human behavior dynamics objectively

– Provide engineering tools for human behavior analysis

Entrainment 

• Natural coordination of behaviors between interacting partners

• Reliable human “representation” is hard to achieve

• Completely “signal-derived” unsupervised method inspired by qualitative psychological descriptions

[Figure: PCA-based entrainment pipeline. Husband: (1) constructing PCA vocal characteristic space. Wife: (2) projecting onto constructed PCA space. (3) Compute similarity measures (entrainment).]

Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian Baucom, Panayiotis Georgiou and Shrikanth Narayanan, An Analysis of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions, in: Proceedings of Interspeech, Florence, Italy, 2011
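The three steps (construct a PCA space from one speaker, project the other speaker onto it, compute a similarity) can be sketched in a few lines. This is a toy version under stated assumptions: the two-dimensional feature rows are invented, only the first principal component is used, and the similarity is the fraction of the second speaker's variance captured along the first speaker's component. The measures in the cited paper may be defined differently.

```python
from math import sqrt

def covariance(rows):
    """Covariance matrix of mean-centered feature rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(dims)]
            for i in range(dims)]

def first_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix via power iteration."""
    v = [1.0 / (i + 1) for i in range(len(cov))]
    for _ in range(iterations):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def variance_preserved(reference_rows, other_rows):
    """Share of other_rows' variance lying along reference_rows' first PC."""
    axis = first_component(covariance(reference_rows))          # step (1)
    cov_other = covariance(other_rows)
    dims = len(axis)
    projected = sum(axis[i] * cov_other[i][j] * axis[j]          # step (2)
                    for i in range(dims) for j in range(dims))
    total = sum(cov_other[i][i] for i in range(dims))
    return projected / total                                     # step (3)

# Hypothetical (pitch, energy) features per speaking turn.
husband = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]
wife_similar = [[2.0, 2.2], [3.0, 2.9], [4.0, 4.1], [5.0, 5.0]]
wife_different = [[1.0, 4.0], [2.0, 3.1], [3.0, 1.9], [4.0, 1.0]]

high_similarity = variance_preserved(husband, wife_similar)
low_similarity = variance_preserved(husband, wife_different)
```

The measure needs no emotion labels, in line with the "signal-derived", unsupervised framing of the slide.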

Future Work: issues in need of further research

• Human behavior representation

– How do we best “describe” and “annotate” subjective human behaviors? (Or can we?)

• Informative feature extraction

– How do we automatically extract the most “informative” features?

• Machine learning framework

– What would be the most appropriate framework (incorporating context, multimodality, lexical content, interaction)?

• Generalize it to model abstract human behaviors

– Emotion is at the core

– Provide psychologists with an objective method
