Investigation of the Relation between Emotional State and Acoustic Parameters in the Context of Language

Acoustic analysis is the most fundamental method used for speech emotion recognition. Speech recordings are digitized with signal processing methods, and various acoustic features of speech are obtained through acoustic analysis. The relationship between acoustic features and emotion has been investigated in many studies; however, these studies have mostly focused on emotion recognition accuracy or on the effects of emotions on acoustic features. The effect of the spoken language on speech emotion recognition has been investigated in only a limited number of studies. The purpose of this study is to investigate how the relationship between acoustic features and emotions varies with the spoken language. For this purpose, three emotional states (anger, fear, and neutral) in three different spoken languages (English, German, and Italian) were used. In these data sets, the change in acoustic features according to the spoken language was investigated statistically. According to the results obtained, the effect of anger on the acoustic features does not change with the spoken language. For fear, the effect of the spoken language shows high similarity between Italian and German, but low similarity with English.
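The abstract does not name the tools or statistical tests used. As a minimal illustrative sketch, assuming Python with librosa and SciPy (neither is stated in the paper), the two steps it describes, extracting acoustic features from speech recordings and statistically comparing a feature across languages, might look like the following; the file path and per-language sample values are hypothetical, and the nonparametric test is shown only as one plausible choice.

```python
import numpy as np
import librosa
from scipy.stats import kruskal

def extract_features(wav_path):
    """Extract a few acoustic features commonly used in speech emotion studies."""
    y, sr = librosa.load(wav_path, sr=None)
    # Fundamental frequency (F0) contour estimated with the pYIN algorithm
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=600.0, sr=sr)
    f0 = f0[~np.isnan(f0)]                      # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]           # short-time energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return {
        "f0_mean": float(np.mean(f0)) if f0.size else 0.0,
        "f0_std": float(np.std(f0)) if f0.size else 0.0,
        "energy_mean": float(np.mean(rms)),
        "mfcc1_mean": float(np.mean(mfcc[0])),
    }

# Hypothetical F0 means (Hz) for one emotion, one value per utterance,
# grouped by spoken language.
english = [210.3, 198.7, 221.4, 205.0]
german = [215.1, 202.9, 219.8, 208.6]
italian = [212.4, 200.2, 224.0, 206.3]

# Nonparametric test: does the feature's distribution differ across languages?
stat, p = kruskal(english, german, italian)
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3f}")
```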
