PERFORMING ACCURATE SPEAKER RECOGNITION BY USE OF SVM AND CEPSTRAL FEATURES

Speaker recognition from voice recordings is an active research area for which many applications have been proposed. In this study, speaker recognition is performed on cepstral features extracted from raw voice recordings. Several prominent cepstral feature extraction methods, namely LPC, LPCC, MFCC, PLP and RASTA-PLP, are used, and their contribution to the performance of the applied method is investigated. The extracted features are then classified with an SVM to complete the speaker recognition task. The results show that cepstral feature extraction methods such as LPCC and MFCC, combined with SVM classification, reach around 97% accuracy.
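
The general shape of such a pipeline (frame-level cepstral features fed to an SVM) can be sketched as follows. This is only an illustrative sketch, not the study's implementation: the abstract does not specify the data set, feature dimensions, kernel or toolkit, so everything below is an assumption. A plain real cepstrum stands in for LPCC or MFCC, the "speakers" are synthetic noise signals shaped by hypothetical two-pole resonators, and the classifier is scikit-learn's `SVC` with an RBF kernel.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def real_cepstrum(frame, n_coeffs=13):
    """Return the first n_coeffs real-cepstrum coefficients of one frame."""
    windowed = frame * np.hamming(len(frame))
    log_magnitude = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_magnitude)
    # Skip c[0], which mostly reflects overall frame energy.
    return cepstrum[1:n_coeffs + 1]


def synthetic_speaker(rng, pole_freq, n_frames=60, frame_len=400, sr=16000):
    """Hypothetical stand-in for a recording: white noise through a 2-pole resonator."""
    r = 0.95  # pole radius (assumed); below 1, so the filter is stable
    theta = 2 * np.pi * pole_freq / sr
    a1, a2 = -2 * r * np.cos(theta), r * r
    excitation = rng.standard_normal(n_frames * frame_len)
    signal = np.zeros_like(excitation)
    prev1 = prev2 = 0.0
    for n, e in enumerate(excitation):
        out = e - a1 * prev1 - a2 * prev2
        signal[n] = out
        prev2, prev1 = prev1, out
    return signal.reshape(n_frames, frame_len)


def run_demo():
    """Train an SVM on cepstral features of three synthetic 'speakers'."""
    rng = np.random.default_rng(0)
    resonances = {0: 500.0, 1: 1200.0, 2: 2500.0}  # assumed values, in Hz
    features, labels = [], []
    for label, freq in resonances.items():
        for frame in synthetic_speaker(rng, freq):
            features.append(real_cepstrum(frame))
            labels.append(label)
    X, y = np.array(features), np.array(labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    # Standardize features, then fit an RBF-kernel SVM (kernel choice is assumed).
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    print(f"held-out frame accuracy: {run_demo():.3f}")
```

With real recordings, `synthetic_speaker` would be replaced by framing of the actual audio, and `real_cepstrum` by whichever of the feature extraction methods named above is being evaluated. The accuracy this toy example prints says nothing about the accuracy reported in the study.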
