How to categorize emotional speech signals with respect to the speaker's degree of emotional intensity

Automatic classification of the emotional content of speech signals has recently become an important area of research. Work in this field mainly focuses on improving the correct classification rate (CCR) achieved by the proposed techniques. However, a literature review shows that there is no notable research on finding appropriate parameters related to the intensity of emotions. In this article, we investigate which features are suitable for recognizing emotional speech utterances according to their intensity. To this end, four emotional classes of the Berlin Emotional Speech database (happiness, anger, fear, and boredom) are evaluated at high and low intensity levels. Using different classifiers, a CCR of about 70% is obtained. Moreover, a 10-fold cross-validation procedure is used to enhance the consistency of the results.
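As a rough illustration of how a CCR can be estimated with 10-fold cross-validation, the sketch below pools the correct predictions over all held-out folds and divides by the number of utterances. It is not the method used in the article: the nearest-centroid classifier, the two-dimensional toy features, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Return the mean feature vector of each class (stand-in classifier)."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[c]))
    return min(centroids, key=sq_dist)

def cross_validated_ccr(X, y, k=10, seed=0):
    """CCR = correctly classified held-out samples / total samples."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Toy data (hypothetical): 20 "low" and 20 "high" intensity feature vectors.
X_demo = [[0.01 * i, 0.02 * i] for i in range(20)] + \
         [[10 + 0.01 * i, 10 + 0.02 * i] for i in range(20)]
y_demo = ["low"] * 20 + ["high"] * 20
ccr_demo = cross_validated_ccr(X_demo, y_demo, k=10)
```

Each utterance is tested exactly once, by a model that never saw it during training, so the pooled rate is less sensitive to one lucky or unlucky split than a single train/test partition would be.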

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK