Determination of speaker age and gender with an SVM classifier based on GMM supervectors

This study proposes a system that automatically classifies speakers by age and/or gender. The inputs are telephone conversations made over mobile and landline connections in indoor and outdoor environments. The system aims to classify speakers into three classes by gender (male, female, child), four classes by age (child, youth, adult, senior), and seven classes by both attributes together. To this end, Gaussian mixture models (GMMs) are built from Mel-frequency cepstral coefficients (MFCCs) extracted only from the voiced parts of the conversations, converted into supervectors, and fed to a support vector machine (SVM) classifier. Signal energy is used to detect the voiced parts, and the GMMs are trained by adapting a universal background model (UBM) that was itself trained on a large database. GMMs with different numbers of components are also tested on conversations of different lengths to investigate how the number of GMM components and the speech duration affect age and gender identification. The highest classification rates are obtained when 16-second utterances are modeled with 64-component GMMs: 92.42% for the gender category, 60.10% for the age category, and 60.02% for the combined age and gender category.
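
The supervector step named in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians, mean-only MAP adaptation of the UBM, and a relevance factor of 16; the abstract states none of these details, and all function names and toy values below are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to the frames of one utterance.

    relevance=16.0 is an assumed value, not one reported in the abstract.
    """
    n_comp, dim = len(means), len(means[0])
    occupancy = [0.0] * n_comp                           # soft frame counts per component
    weighted_sum = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted frame sums
    for x in frames:
        log_post = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(log_post)                              # subtract max for numerical stability
        post = [math.exp(lp - top) for lp in log_post]
        total = sum(post)
        for k in range(n_comp):
            gamma = post[k] / total
            occupancy[k] += gamma
            for d in range(dim):
                weighted_sum[k][d] += gamma * x[d]
    # Interpolate between the data mean and the UBM mean, component by component.
    return [
        [
            (weighted_sum[k][d] + relevance * means[k][d]) / (occupancy[k] + relevance)
            for d in range(dim)
        ]
        for k in range(n_comp)
    ]


def to_supervector(adapted_means):
    """Concatenate the adapted component means into one fixed-length vector."""
    return [value for mean in adapted_means for value in mean]


# Toy UBM with 2 components over 2-dimensional features. The study uses MFCC
# features and reports its best results with 64 components; these numbers are
# illustrative only.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
frames = [[1.0, 1.0]] * 16   # stand-in for the voiced frames of one utterance

supervector = to_supervector(map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars))
print(len(supervector))      # 2 components x 2 dimensions = 4
```

Every utterance yields a vector of the same length regardless of its duration, which is what allows a standard SVM to be trained on these vectors. Scaling the means by component weights and variances before stacking is a common variant that this sketch omits.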
