This paper identifies a robust spectral feature for speech emotion recognition using a Deep Neural Network (DNN) with six fully connected layers. We used a three-class subset (angry, neutral, sad) of the German Berlin Database of Emotional Speech, containing 271 labeled recordings with a total length of 783 seconds. The data were divided into training (80 %), validation (10 %), and test (10 %) sets. The DNN is trained with stochastic gradient descent and uses batch normalization. Fourteen input features, extracted with the LibROSA library, were evaluated and compared with one another. The experiments show that Mel-frequency cepstral coefficients (MFCC), which achieved 100 percent accuracy, are a reliable feature for speech emotion recognition.
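The 80/10/10 partition of the 271 recordings can be sketched as follows. This is a minimal illustration, assuming a shuffled split with a fixed seed and rounding to the nearest whole recording. The abstract does not state how recordings were assigned to sets (for example, whether the split was stratified by emotion or by speaker), and the helper name `split_80_10_10` is hypothetical.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/validation/test lists (80/10/10).

    Hypothetical helper: the paper does not specify its splitting procedure.
    """
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n = len(items)
    n_train = round(0.8 * n)  # 80 % for training
    n_val = round(0.1 * n)    # 10 % for validation
    # The test set takes the remainder, so no recording is lost to rounding.
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# 271 placeholder recording IDs standing in for the three-class subset.
train, val, test = split_80_10_10(range(271))
print(len(train), len(val), len(test))  # 217 27 27
```

Under this rounding the test set holds 27 recordings, so each misclassified test recording would lower test accuracy by roughly 3.7 percentage points.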
Shoiynbek, Aisultan; Kozhakhmet, Kanat; Sultanova, Nazerke; and Zhumaliyeva, Rakhima
"The Robust Spectral Audio Features for Speech Emotion Recognition,"
Applied Mathematics & Information Sciences: Vol. 13, Article 21.
Available at: https://dc.naturalspublishing.com/amis/vol13/iss5/21