Rachel Study
Affective Computing:<br />
Computational Approaches in Emotion Recognition<br />
Abe Kazemzadeh, Chi-Chun (Jeremy) Lee,<br />
Angeliki Metallinou<br />
SAIL lab: sail.usc.edu<br />
Univ. of Southern California
Emotional Speech<br />
• Speech conveys rich emotional information
Multimodality<br />
• Emotions are often expressed multimodally<br />
– e.g., sarcasm: conflicting multimodal cues
Complex Representation of Emotion<br />
• Emotions have variable intensity and clarity<br />
• Categorical descriptions may not give the full picture
Emotion Evolution<br />
• Emotions generally happen in context<br />
• Of a situation<br />
• Of a conversation topic, e.g., lost luggage<br />
• Of an emotional history, e.g., speaker was angry until now
Motivation<br />
• Quantitative study of human emotions<br />
• Human-computer interface<br />
– Education<br />
– Entertainment<br />
– Dialog system<br />
– Personalized application/software<br />
– Virtual agent<br />
– …<br />
• Behavioral informatics
Outline<br />
• Emotional Representations<br />
• Collecting Emotional Databases<br />
• Multimodal Feature Extraction<br />
• Methods for Emotion Recognition<br />
• Beyond Recognizing Emotions<br />
• Conclusions and Open Questions
Emotion Representations<br />
Categorical and Dimensional<br />
• Categorical Representations<br />
– Description using discrete categories, e.g., angry, happy, sad, etc.<br />
– Six ‘basic’ emotions *<br />
anger, disgust, fear, happiness, sadness, surprise<br />
– Choice of affective states could be application driven, e.g.,<br />
interest, frustration...<br />
• Dimensional Representations **<br />
– Activation, Valence, Dominance<br />
– Description of attributes (dimensions) of an emotion<br />
* P. Ekman (1999), “Basic Emotions”, in Handbook of Cognition and Emotion<br />
** H. Schlosberg, “Three dimensions of emotion”, in Psychological Review
Emotion Representations<br />
Continuous Dimensional Representations<br />
• Feeltrace Tool for continuous<br />
annotations*<br />
• Agreement on the trends of<br />
emotional curves<br />
• Rather than absolute values<br />
• Easier to rate emotions in relative<br />
terms **<br />
* Feeltrace: an instrument for recording perceived<br />
emotion in real time, Cowie et al.<br />
**Ranking-based emotion recognition for music<br />
organization and retrieval, Yang and Chen
Emotion Representations<br />
Challenges<br />
• Emotional descriptions are subjective<br />
• perceptual differences among individuals<br />
• vague or subtle emotional expressions (real-life emotions)<br />
• Ground truth for recognition task may be ambiguous<br />
• Level of detail of emotional descriptions<br />
• How many emotional categories need to be considered?<br />
• How many levels of valence and activation?
Natural Language Descriptions<br />
EMO20Q<br />
Q: Do you feel this emotion at Disneyland?<br />
A: no<br />
Q: Do you feel this emotion when you run over a dog?<br />
A: possibly, yes.<br />
Q: is it remorse?<br />
A: no<br />
Q: Do you feel this when someone close dies?<br />
A: Not necessarily, but you could I suppose<br />
Q: When stealing something from a friend do you feel like this?<br />
A: I think so, but I don't usually steal stuff though.<br />
Q: There is a sound that does not let you sleep at night at your apartment do you feel like this in reaction to this noise?<br />
A: yes<br />
Q: is it annoyed?<br />
A: no<br />
Q: You are walking through South Central very late with your very expensive laptop and you see a stranger quickly<br />
moving towards you, do you feel like this when that happens?<br />
A: yes, getting closer.<br />
Q: Fear?<br />
A: no but close<br />
Q: Nervousness?<br />
A: no, that's a near synonym but I think it's slightly different<br />
Q: Do you feel like this when there is a big event coming up and you "cant wait" for it to happen<br />
A: no, actually the opposite... you don't want it to happen.<br />
Q: how about anxious?<br />
A: yes, it's anxious... I think i'll count it, but that wasn't the exact word. Do you know it?
Data Collection<br />
• IEMOCAP * and CreativeIT **<br />
– Multimodal emotional databases collected by SAIL<br />
• Use of improvisations and theatrical techniques<br />
– Elicitation of naturalistic emotions<br />
– Dyadic settings<br />
• Collected data<br />
– Detailed MoCap information of face or body<br />
– Microphones and cameras<br />
– Dialog transcriptions<br />
* IEMOCAP: Interactive emotional dyadic motion capture database, Busso et al.<br />
** The USC CreativeIT database: A multimodal database of theatrical improvisation, A. Metallinou, C.-C.<br />
Lee, C. Busso, S. Carnicke, S. Narayanan
A clip from CreativeIT database<br />
• A scene from Chekhov’s play ‘Uncle Vanya’
Feature Extraction
Speech Production And Perception<br />
(Gray's Anatomy via wikipedia.org)
(SAIL Realtime MRI Corpus)<br />
MRI Recordings
Speech Processing:<br />
Spectrogram<br />
(Rob Hagiwara, http://home.cc.umanitoba.ca/~robh/howto.html)
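A spectrogram like the one on this slide can be sketched as a short-time Fourier transform: the signal is cut into overlapping windowed frames and the magnitude spectrum of each frame is computed. The snippet below is a minimal illustration with NumPy only; the frame length, hop size and the synthetic 1 kHz tone are assumptions chosen for the example, not settings taken from the slides.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x frequency bins) log-magnitude spectrogram in dB."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)

# Synthetic stand-in for speech (assumption): a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))

# The strongest frequency bin should sit at 1 kHz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
```

Each row of `spec` is one time frame and each column one frequency bin, which is the time-frequency picture a spectrogram plot displays.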
Text Features<br />
• Bag of words (unigrams)<br />
• N-gram language models<br />
• Emotion dictionaries<br />
• Lattices<br />
• Orthography (punctuation, capitalization, emoticons)<br />
• Wordnet<br />
• Syntax<br />
• Semantic roles<br />
• World knowledge
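The first two items in this list, bag of words and n-grams, can be illustrated in a few lines. This is a minimal sketch using only the Python standard library; the vocabulary and the example utterance are made up for illustration, and whitespace tokenization is a simplifying assumption.

```python
from collections import Counter

def tokenize(utterance):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return utterance.lower().split()

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs, ignoring word order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Hypothetical utterance and vocabulary, for illustration only.
tokens = tokenize("I am not happy you lost my luggage")
vocabulary = ["lost", "luggage", "happy", "angry", "not"]
unigram_features = bag_of_words(tokens, vocabulary)
bigrams = ngrams(tokens, 2)
```

The unigram vector records that "happy" occurs but loses the preceding "not"; the bigram `("not", "happy")` keeps that local context, which is one reason n-gram models appear alongside unigrams in the list.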
ASR: HMM<br />
(http://en.wikipedia.org/wiki/Hidden_Markov_model)
(Georgiou et al., ACII 2011)<br />
ASR: Lattice
Facial Feature Extraction<br />
• Facial expressions convey emotional information<br />
• Facial Action Coding System (FACS) and Action Units (AU) *<br />
• Extract Facial Features<br />
– FACS based approaches<br />
– Data-driven approaches<br />
– Statistical functionals over<br />
low level face features<br />
*Facial Action Coding System Manual, P. Ekman and W. Friesen
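"Statistical functionals over low level face features" can be read as follows: frame-level measurements of varying length are summarized into one fixed-length vector per utterance. The sketch below assumes a small set of functionals (mean, standard deviation, minimum, maximum) and a made-up two-dimensional feature track; the functionals actually used are not listed on the slide.

```python
import numpy as np

def functionals(frames):
    """Summarize a (num_frames x num_features) array as one fixed-length vector.

    The output concatenates mean, std, min and max of every feature column,
    so its length is 4 * num_features regardless of the number of frames.
    """
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

# Hypothetical low-level track: 3 frames, 2 features per frame.
track = np.array([[0.0, 1.0],
                  [2.0, 3.0],
                  [4.0, 5.0]])
summary = functionals(track)
```

Because the output length does not depend on the number of frames, utterances of different durations can be fed to the same classifier.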
Body Language Feature Extraction<br />
• Body language expresses rich emotional information *<br />
– Body movement, gestures and posture<br />
– Relative behavior, e.g., approach/avoidance,<br />
looking/turning away, touching<br />
• Extract detailed features from MoCap<br />
* The new handbook of methods in nonverbal behavior research, J. Harrigan, R. Rosenthal and K.<br />
Scherer
Examples of Features<br />
* Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues, A.<br />
Metallinou, A. Katsamanis, Y. Wang and S. Narayanan
Emotion Recognition
Emotion Recognition<br />
turn by turn recognition<br />
• Recognition Task<br />
– recognize emotion in IEMOCAP database<br />
• Extracted Features<br />
– audio features (384 dimensions)<br />
• Emotion Representation<br />
– 4 categorical emotional labels: Angry, Happy, Sad, Neutral<br />
• Technological Difficulties<br />
– multiclass emotion labels classification<br />
– audio features only<br />
– database specific
Hierarchical Tree Classification<br />
• Easily adaptable to other databases (AIBO databases)<br />
• Flexible framework<br />
• Exploit expert knowledge<br />
Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee and Shrikanth S. Narayanan, Emotion recognition using a hierarchical binary<br />
decision tree approach (2011), in: Speech Communication, 53:9-10(1162-1171)
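The structure of a hierarchical binary decision tree can be sketched as nested binary tests, each splitting the remaining classes in two until a single label is left. The tree below is purely illustrative: the feature names, thresholds and order of the splits are assumptions, and in the cited work each node would be a trained binary classifier rather than a hand-set rule.

```python
def classify(node, features):
    """Walk the tree until a leaf (a string label) is reached."""
    while not isinstance(node, str):
        test, if_true, if_false = node
        node = if_true if test(features) else if_false
    return node

# Hypothetical tree over the four labels used in the recognition task.
# Each node is (binary test, subtree if true, subtree if false).
tree = (
    lambda f: f["energy"] > 0.6,                          # high vs. low activation
    (lambda f: f["pitch_slope"] > 0.0, "Happy", "Angry"),
    (lambda f: f["energy"] < 0.3, "Sad", "Neutral"),
)

label_a = classify(tree, {"energy": 0.8, "pitch_slope": -0.2})
label_b = classify(tree, {"energy": 0.1, "pitch_slope": 0.0})
```

Keeping the tree as data makes the "flexible framework" point concrete: adapting to another database means swapping the node tests or rearranging the splits, not rewriting the traversal.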
Emotion Recognition<br />
Context-Sensitive Multimodal<br />
• Considering temporal emotional context<br />
– When classifying the emotion of current observation<br />
• Hierarchical Framework to model dynamics<br />
– within emotions (e.g., emotional utterance)<br />
– between emotions, during a conversation (temporal context)<br />
– between speakers<br />
• Flexibility in terms of classifiers<br />
– HMM approaches (coupled HMMs, hierarchical HMMs)<br />
– Recurrent Neural Networks and their extensions (BLSTM)<br />
• Multimodal Fusion<br />
– Face, voice, head and hand movement cues
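To illustrate how temporal context can change a decision, the sketch below runs Viterbi decoding over a short sequence of utterances. It is a simplified stand-in for the HMM approaches named above: the two-state label set, the "sticky" transition matrix and the per-utterance scores are all made-up numbers, and treating classifier scores as observation likelihoods is an assumption made for brevity.

```python
import numpy as np

def viterbi(log_obs, log_trans, log_prior):
    """Most likely state sequence given per-step log scores and transitions."""
    n_steps, n_states = log_obs.shape
    score = log_prior + log_obs[0]
    back = np.zeros((n_steps, n_states), dtype=int)
    for t in range(1, n_steps):
        cand = score[:, None] + log_trans        # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)            # best predecessor of each j
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):          # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = ["Angry", "Neutral"]
# Hypothetical per-utterance scores; utterance 3 is ambiguous in isolation.
obs = np.array([[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.9, 0.1]])
# Hypothetical transitions: emotions tend to persist between utterances.
trans = np.array([[0.9, 0.1], [0.1, 0.9]])

path = viterbi(np.log(obs), np.log(trans), np.log([0.5, 0.5]))
decoded = [labels[i] for i in path]
isolated = [labels[i] for i in obs.argmax(axis=1)]
```

Classified in isolation, the third utterance comes out as "Neutral"; with context, the decoder keeps it as "Angry" because a brief switch and switch back is less likely than a sustained emotion.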
Tracking trends of<br />
continuous emotions<br />
• Estimating continuous emotional curves through time<br />
– Using audio-visual information
Tracking trends of<br />
continuous emotions<br />
• Gaussian Mixture Model-based mapping *<br />
– Continuous underlying emotions x_t<br />
– Continuous observed body language (and prosody): y_t<br />
– Train a joint GMM for (x_t, y_t)<br />
• Iterative process (through EM)<br />
– Converges to the maximum likelihood mapping (MLE)<br />
• Use derivatives to take into account temporal context<br />
– Smoother emotional trajectory estimates<br />
* T. Toda, A. W. Black, K. Tokuda, Statistical mapping between articulatory movements and acoustic<br />
spectrum using a gaussian mixture model
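One common form of GMM-based mapping is the conditional expectation E[x | y] under the joint model: each mixture component contributes a linear regression from y to x, weighted by how well that component explains the observation. The sketch below implements this form with NumPy. It assumes the joint GMM parameters are already available (the values here are made up, not trained with EM), and it omits the iterative maximum likelihood step and the derivative features mentioned on the slide.

```python
import numpy as np

def gaussian_pdf(y, mean, cov):
    """Density of a multivariate normal at y."""
    diff = y - mean
    norm = np.sqrt((2 * np.pi) ** len(mean) * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def gmm_map(y, weights, means, covs, dx):
    """E[x | y] under a joint GMM over z = [x, y]; x is the first dx dims."""
    resp, cond_means = [], []
    for w, mu, cov in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        cov_xy, cov_yy = cov[:dx, dx:], cov[dx:, dx:]
        # How well this component explains the observed y.
        resp.append(w * gaussian_pdf(y, mu_y, cov_yy))
        # Per-component linear regression from y to x.
        cond_means.append(mu_x + cov_xy @ np.linalg.solve(cov_yy, y - mu_y))
    resp = np.array(resp) / np.sum(resp)
    return resp @ np.array(cond_means)

# Hypothetical 2-component joint GMM over (x, y), both one-dimensional.
weights = np.array([0.5, 0.5])
means = np.array([[-1.0, -1.0], [1.0, 1.0]])
covs = np.array([np.eye(2), np.eye(2)])

x_hat = gmm_map(np.array([1.0]), weights, means, covs, dx=1)
```

With these parameters an observation of y = 1 is explained mostly by the second component, so the estimate of x is pulled toward that component's mean of 1 without reaching it.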
Some Tracking Results<br />
• We are better at tracking trends than absolute values<br />
• Promising performance for activation and dominance<br />
* Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues, A.<br />
Metallinou, A. Katsamanis, Y. Wang and S. Narayanan
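The difference between tracking a trend and tracking absolute values can be made concrete by comparing two scores on the same estimate. The sketch below uses Pearson correlation as a trend measure and root-mean-square error as an absolute measure; this choice of metrics and the curves themselves are assumptions for illustration, not results from the cited work.

```python
import numpy as np

# Hypothetical annotated curve and an estimate with the right shape
# but a constant offset.
reference = np.array([0.1, 0.3, 0.5, 0.4, 0.2])
estimate = reference + 0.5

# Trend agreement: insensitive to offset and scale.
correlation = np.corrcoef(reference, estimate)[0, 1]

# Absolute agreement: penalizes the offset in full.
rmse = np.sqrt(np.mean((reference - estimate) ** 2))
```

The estimate follows every rise and fall of the reference, so the correlation is essentially perfect, while the error in absolute value equals the full offset.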
Human Behaviors Modeling<br />
• Broad area of human behaviors/internal states<br />
– Emotion at core<br />
– Cognitive planning<br />
– Social behaviors<br />
– Subjective description of human behaviors<br />
• Behavior signal processing (BSP)<br />
– Recognize abstract human states of interest to<br />
psychologists<br />
– Quantify important human behavior dynamics objectively<br />
– Provide engineering tools for human behavior analysis
Entrainment<br />
• Natural coordination of behaviors between interacting<br />
partners<br />
• Reliable human “representation” is hard to achieve<br />
• Completely “signal-derived” unsupervised method inspired<br />
by qualitative psychological descriptions<br />
• Husband: (1) construct the PCA vocal characteristic space<br />
• Wife: (2) project onto the constructed PCA space<br />
• (3) Compute similarity measures (entrainment)<br />
Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian Baucom, Panayiotis Georgiou and Shrikanth Narayanan, An Analysis<br />
of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions, in: Proceedings of Interspeech,<br />
Florence, Italy, 2011
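The three steps in the diagram can be sketched with NumPy: build a PCA space from one speaker's vocal features, project the other speaker's features onto it, and score how much of their variance that space retains. The "variance preserved" ratio used here is one plausible similarity measure, offered as an assumption rather than the exact measure of the cited paper; the feature matrices are random stand-ins, and centering each speaker on their own mean is a further simplification.

```python
import numpy as np

def pca_basis(features, n_components):
    """(1) Principal directions of one speaker's (frames x dims) features."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def variance_preserved(features, basis):
    """(2)+(3) Project onto the basis and return the retained variance ratio."""
    centered = features - features.mean(axis=0)
    projected = centered @ basis.T
    return float(np.sum(projected ** 2) / np.sum(centered ** 2))

# Random stand-ins for vocal features (assumption): 200 frames, 5 dimensions.
rng = np.random.default_rng(0)
scales = np.array([3.0, 1.0, 0.3, 0.3, 0.3])
husband = rng.normal(size=(200, 5)) * scales
wife_similar = rng.normal(size=(200, 5)) * scales          # same variance pattern
wife_different = rng.normal(size=(200, 5)) * scales[::-1]  # reversed pattern

basis = pca_basis(husband, n_components=2)
similar = variance_preserved(wife_similar, basis)
different = variance_preserved(wife_different, basis)
```

A ratio near 1 means the second speaker's vocal variation lies mostly within the first speaker's space; a ratio near 0 means it lies mostly outside it. No labels are involved at any step, which matches the "signal-derived", unsupervised framing of the slide.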
Future Work:<br />
issues in need of further research<br />
• Human behavior representation<br />
– How do we best “describe”, “annotate” subjective human<br />
behaviors? (or can we ?)<br />
• Informative feature extraction<br />
– How do we automatically extract the most “informative”<br />
features?<br />
• Machine learning framework<br />
– What would be the most appropriate framework<br />
(incorporating context, multiple modalities, lexical content,<br />
interaction)?<br />
• Generalize it to model abstract human behaviors<br />
– Emotion is at core<br />
– Provide psychologists with an objective method