Emotion Recognition Agents for Speech Signal

System Structure. The ER system is part of a new-generation computerized call center that integrates databases, decision support systems, and different media such as voice messages, e-mail messages, and a WWW server into one information space. The system consists of three processes: a wave file monitor, a voice mail center, and a message prioritizer. The wave file monitor periodically reads the contents of the voice message directory, compares it to the list of processed messages, and, if a new message is detected, processes the message and creates a summary file and an emotion description file. The summary file contains the following information: five numbers that describe the distribution of emotions, the length of the message, and the percentage of silence in the message. The emotion description file stores data describing the emotional content of each 1-3 second chunk of the message. The prioritizer is a process that reads the summary files of processed messages, sorts them taking into account their emotional content, length, and some other criteria, and suggests an assignment of agents to return the calls. Finally, it generates a web page that lists all current assignments. The voice mail center is an additional tool that helps operators and supervisors visualize the emotional content of voice messages.

5. Conclusion

We have explored how well people and computers recognize emotions in speech. Several conclusions can be drawn from the above results. First, decoding emotions in speech is a complex process that is influenced by the cultural, social, and intellectual characteristics of the subjects. People are not perfect at decoding even such manifest emotions as anger and happiness. Second, anger is the most recognizable and the easiest emotion to portray. It is also the most important emotion for business. But anger has numerous variants (for example, hot anger, cold anger, etc.) that can introduce variability into the acoustic features and dramatically affect recognition accuracy. Third, pattern recognition techniques based on neural networks proved to be useful for emotion recognition in speech and for creating customer relationship management systems.

Notes

1. Each utterance was recorded using a close-talk microphone. The first 100 utterances were recorded at 22 kHz/8 bit and the remaining 600 utterances at 22 kHz/16 bit.

2. The rows and the columns represent true and evaluated categories, respectively. For example, the second row says that 11.9% of the utterances portrayed as happy were evaluated as normal (unemotional), 61.4% as happy, 10.1% as angry, 4.1% as sad, and 12.5% as afraid.

3. The speaking rate was calculated as the inverse of the average length of the voiced parts of the utterance. For all other parameters we calculated the following statistics: mean, standard deviation, minimum, maximum, and range. Additionally, for F0 the slope was calculated as a linear regression over the voiced part of speech, i.e. the line that best fits the pitch contour. We also calculated the relative voiced energy. Altogether, we estimated 43 features for each utterance.

4. We ran RELIEF-F on the s70 data set, varying the number of nearest neighbors from 1 to 12, and ordered the features according to their sum of ranks.
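To make Note 3 concrete, the following sketch computes the per-contour statistics, the F0 regression slope, the relative voiced energy, and the speaking rate for a single utterance. It is only a minimal illustration, not the authors' code: the use of NumPy, the function names, and the assumption that frame-level F0 and energy contours are already available (with F0 = 0 marking unvoiced frames) are assumptions made here.

    # Minimal sketch of the per-utterance statistics described in Note 3.
    # Assumes frame-level F0 and energy contours were already extracted by a
    # separate pitch tracker; frames with F0 == 0 are treated as unvoiced.
    import numpy as np

    def contour_stats(x):
        """Mean, standard deviation, minimum, maximum, and range of a contour."""
        x = np.asarray(x, dtype=float)
        return {"mean": x.mean(), "std": x.std(),
                "min": x.min(), "max": x.max(), "range": np.ptp(x)}

    def f0_slope(times_sec, f0_hz):
        """Slope of a linear regression fitted to the voiced part of the pitch contour."""
        t, f0 = np.asarray(times_sec, dtype=float), np.asarray(f0_hz, dtype=float)
        voiced = f0 > 0
        slope, _intercept = np.polyfit(t[voiced], f0[voiced], deg=1)
        return slope

    def relative_voiced_energy(energy, f0_hz):
        """Fraction of the total frame energy that falls into voiced frames."""
        e, f0 = np.asarray(energy, dtype=float), np.asarray(f0_hz, dtype=float)
        return e[f0 > 0].sum() / e.sum()

    def speaking_rate(voiced_segment_lengths_sec):
        """Inverse of the average length of the voiced segments of the utterance."""
        return 1.0 / float(np.mean(voiced_segment_lengths_sec))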

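Note 4 describes a rank-aggregation step on top of RELIEF-F. The sketch below shows only that aggregation, under stated assumptions: relieff_scores is a hypothetical stand-in for an actual RELIEF-F implementation (one relevance score per feature), and the use of SciPy's rankdata is a choice made here, not the authors'.

    # Sketch of the feature-ranking step in Note 4: run RELIEF-F with the number of
    # nearest neighbors varied from 1 to 12 and order features by their sum of ranks.
    # relieff_scores is a hypothetical callable standing in for a real RELIEF-F scorer.
    import numpy as np
    from scipy.stats import rankdata

    def rank_features(X, y, relieff_scores, k_range=range(1, 13)):
        """Return feature indices ordered by their sum of ranks over the k_range runs."""
        rank_sums = np.zeros(X.shape[1])
        for k in k_range:
            scores = relieff_scores(X, y, n_neighbors=k)   # one relevance score per feature
            # Rank 1 goes to the most relevant feature (highest score) in this run.
            rank_sums += rankdata(-np.asarray(scores), method="average")
        return np.argsort(rank_sums)   # lowest rank sum (most relevant overall) first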
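Returning to the System Structure description at the beginning of the section, the polling cycle of the wave file monitor and the layout of the summary file can be sketched as follows. This is an illustration of the described behaviour only: the directory names, the JSON summary format, the 30-second poll interval, and the classify_chunks stub are assumptions made here, not the system's actual interfaces.

    # Illustrative sketch of the wave-file monitor described under "System Structure".
    # Directory names, the JSON layout, and classify_chunks are hypothetical.
    import json
    import time
    from pathlib import Path

    EMOTIONS = ["normal", "happy", "angry", "sad", "afraid"]  # the five categories used in the study
    VOICE_DIR = Path("voicemail")      # hypothetical incoming-message directory
    SUMMARY_DIR = Path("summaries")    # hypothetical output directory

    def classify_chunks(wav_path):
        """Stand-in for the ER classifier: label each 1-3 second chunk of the message.

        A real implementation would run the trained recognizer here; this stub only
        fixes the return shape: (chunk_labels, message_length_sec, silence_pct).
        """
        return ["normal"], 0.0, 0.0

    def write_summary(wav_path, chunk_labels, length_sec, silence_pct):
        """Write the summary file: emotion distribution, message length, share of silence."""
        total = max(len(chunk_labels), 1)
        summary = {
            "distribution": {e: chunk_labels.count(e) / total for e in EMOTIONS},  # five numbers
            "length_sec": length_sec,
            "silence_pct": silence_pct,
        }
        (SUMMARY_DIR / (wav_path.stem + ".json")).write_text(json.dumps(summary))

    def monitor(poll_interval_sec=30):
        """Periodically scan the voice-message directory and process new messages."""
        SUMMARY_DIR.mkdir(exist_ok=True)
        processed = set()
        while True:
            for wav in sorted(VOICE_DIR.glob("*.wav")):
                if wav.name not in processed:
                    labels, length_sec, silence_pct = classify_chunks(wav)
                    write_summary(wav, labels, length_sec, silence_pct)
                    processed.add(wav.name)
            time.sleep(poll_interval_sec)

The prioritizer would then read these summary files, sort the messages by emotional content, length, and the other criteria mentioned above, and publish the resulting agent assignments as a web page.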
