12.07.2015 Views

View - ResearchGate

View - ResearchGate

View - ResearchGate

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Emotion Recognition Agents for Speech Signal 813.4 Computer RecognitionTo recognize emotions in speech we tried the following approaches: K-nearest neighbors, neural networks, ensembles of neural network classifiers,and set of experts. In general, the approach that is based on ensembles ofneural network recognizers outperformed the others, and it was chosen forimplementation at the next stage. We summarize below the results obtainedwith the different techniques.K-nearest neighbors. We used 70% of the s70 data set as database ofcases for comparison and 30% as test set. We ran the algorithm for K = 1to 15 and for number of features 8, 10, and 14. The best average accuracy ofrecognition (∼55%) can be reached using 8 features, but the average accuracyfor anger is much higher (∼65%) for 10- and 14-feature sets. All recognizersperformed very poor for fear (about 5–10%).Neural networks. We used a two-layer backpropagation neural networkarchitecture with a 8-, 10- or 14-element input vector, 10 or 20 nodes in thehidden sigmoid layer and five nodes in the output linear layer. To train andtest our algorithms we used the data sets s70, s80 and s90, randomly split intotraining (70% of utterances) and test (30%) subsets. We created several neuralnetwork classifiers trained with different initial weight matrices. This approachapplied to the s70 data set and the 8-feature set gave an average accuracy ofabout 65% with the following distribution for emotion categories: normal stateis 55–65%, happiness is 60–70%, anger is 60–80%, sadness is 60–70%, andfear is 25–50%.Ensembles of neural network classifiers. We used ensemble 7 sizes from7 to 15 classifiers. Results for ensembles of 15 neural networks, the s70 dataset, all three sets of features, and both neural network architectures (10 and 20neurons in the hidden layer) were the following. The accuracy for happinessremained the same (∼65%) for the different sets of features and architectures.The accuracy for fear was relatively low (35–53%). The accuracy for angerstarted at 73% for the 8-feature set and increased to 81% for the 14-feature set.The accuracy for sadness varied from 73% to 83% and achieved its maximumfor the 10-feature set. The average total accuracy was about 70%.Set of experts. This approach is based on the following idea. Instead oftraining a neural network to recognize all emotions, we can train a set of specialistsor experts 8 that can recognize only one emotion and then combine theirresults to classify a given sample. The average accuracy of emotion recognitionfor this approach was about 70% except for fear, which was ∼44% for the10-neuron, and ∼56% for the 20-neuron architecture. The accuracy of non-

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!