How to categorize emotional speech signals with respect to the speaker's degree of emotional intensity
Automatic classification of the emotional content of speech signals has recently become an active research topic. Work in this field is mainly concerned with improving the correct classification rate (CCR) of the proposed techniques. A review of the literature, however, shows no notable research on identifying parameters related to the intensity of emotions. In this article, we investigate which features are appropriate for recognizing emotional speech utterances according to their intensity. To this end, four emotional classes of the Berlin Emotional Speech database (happiness, anger, fear, and boredom) are evaluated at high and low intensity levels. Using different classifiers, a CCR of about 70% is obtained. Moreover, a 10-fold cross-validation procedure is used to improve the consistency of the results.
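The evaluation protocol described above — training a classifier on intensity-labeled feature vectors and averaging the CCR over 10 cross-validation folds — can be illustrated with a minimal sketch. The features, their values, and the nearest-centroid classifier below are all invented stand-ins for illustration; they are not the features or classifiers used in the article.

```python
# Hypothetical sketch: 10-fold cross-validated high/low intensity
# classification with a nearest-centroid classifier on synthetic
# "prosodic" features. All values here are invented for illustration.
import random
import statistics

random.seed(0)

def make_sample(intensity):
    # Stand-ins for prosodic cues (e.g. energy, pitch range); purely synthetic.
    base = 1.0 if intensity == "high" else 0.0
    return ([base + random.gauss(0, 0.4), base + random.gauss(0, 0.4)], intensity)

data = [make_sample("high") for _ in range(50)] + \
       [make_sample("low") for _ in range(50)]
random.shuffle(data)

def centroid(samples):
    # Mean feature vector of a set of samples.
    xs = [s[0] for s in samples]
    return [sum(col) / len(col) for col in zip(*xs)]

def predict(x, centroids):
    # Assign the label whose centroid is nearest in squared Euclidean distance.
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda lab: dist(x, centroids[lab]))

# 10-fold cross-validation: train on 9 folds, test on the held-out fold.
k = 10
fold_size = len(data) // k
accuracies = []
for i in range(k):
    test = data[i * fold_size:(i + 1) * fold_size]
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]
    cents = {lab: centroid([s for s in train if s[1] == lab])
             for lab in ("high", "low")}
    correct = sum(1 for x, lab in test if predict(x, cents) == lab)
    accuracies.append(correct / len(test))

ccr = statistics.mean(accuracies)
print(f"mean CCR over {k} folds: {ccr:.2f}")
```

Averaging the per-fold CCR in this way is what makes the reported figure less sensitive to any single train/test split, which is the stated purpose of the cross-validation step.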