EMOTION RECOGNITION STUDY ACROSS DIFFERENT DATA SETS

Speech is the most important means of communication between people. Through speech, people convey not only their thoughts but also their emotions. From a person's speech we can also infer their thoughts, emotions, gender, and age. In this study, a new emotion data set called EmoSTAR is presented and cross-corpus tests are performed with the Berlin Emotion Data Set. In these cross tests, one data set is used for training and the other for testing. The performance of feature selectors is also examined. Feature extraction is carried out with the openSMILE Emobase and Emo_large configurations, with the number of MFCCs increased from 12 to 24 and Harmonics-to-Noise Ratio (HNR) features added. Feature selection and classification are performed with the Weka tool. EmoSTAR is still under development to include more emotion types and samples.
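
The pipeline summarized above (openSMILE feature extraction, feature selection, and a classifier trained on one corpus and tested on the other) can be illustrated with a minimal sketch. This is not the authors' exact setup: it assumes the openSMILE features have already been consolidated into one per-utterance CSV table per corpus with an "emotion" label column (the file names emostar_features.csv and emodb_features.csv are hypothetical), and it substitutes scikit-learn for the Weka feature selection and classification used in the study.

# Minimal cross-corpus sketch: train on one emotion corpus, test on the other.
# File names and the "emotion" column are assumptions; the study itself uses
# openSMILE for feature extraction and Weka for selection and classification.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def load_corpus(path):
    """Return (features, labels) from a per-utterance feature table."""
    df = pd.read_csv(path)
    return df.drop(columns=["emotion"]).to_numpy(), df["emotion"].to_numpy()

# Cross-corpus testing requires the two corpora to share a common label set.
X_train, y_train = load_corpus("emostar_features.csv")  # training corpus
X_test, y_test = load_corpus("emodb_features.csv")      # unseen test corpus

# Feature selection and the classifier are fitted on the training corpus only.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=200),  # keep the 200 highest-scoring features
    SVC(kernel="linear"),
)
model.fit(X_train, y_train)
print("Cross-corpus accuracy: %.3f" % model.score(X_test, y_test))

The essential point of the cross-corpus protocol is that scaling, feature selection, and the classifier are fitted on the training corpus only, so the second corpus remains completely unseen until evaluation.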
