Ángel David PEDROZA RAMÍREZ, Aldonso BECERRA SÁNCHEZ, José de Jesús VILLA HERNÁNDEZ, José Ismael DE LA ROSA VARGAS

Limited-data automatic speaker verification algorithm using band-limited phase-only correlation function

In this paper, a new method for automatic speaker verification based on the band-limited phase-only correlation (BLPOC) function is proposed. The aim of this study is to validate the use of the BLPOC function as a new limited-data automatic speaker verification technique. Although some speaker verification techniques have high accuracy, their efficiency usually depends on the extraction of complex theoretical information from speech signals and on the amount of data available for training the algorithms. The BLPOC function is a high-accuracy biometric technique traditionally implemented in human identification by fingerprints (through image matching). When applying the BLPOC function to automatic speaker verification through the proposed algorithms (under limited-data conditions), a 98.24% true acceptance rate (TAR) and an 87.17% true rejection rate (TRR) were obtained on a custom database, and a 93.75% TAR and a 67.05% TRR on the ELSDSR database. The proposed algorithm is a theoretically simple method for automatic speaker verification whose main advantage is that it can provide identification under limited-data conditions. In this sense, the BLPOC function could be applicable to other limited-data biometric identification tasks based on sound signals.
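
As an illustration of the core idea, the following is a minimal sketch of a one-dimensional BLPOC computation, assuming the standard definition of the function (phase of the cross-power spectrum, restricted to a low-frequency band, followed by an inverse DFT over that band). It is not the authors' algorithm: the abstract does not specify preprocessing, feature representation, or decision thresholds, so the function name `blpoc`, the `band_fraction` parameter, and the stabilizing constant `eps` are assumptions made here for illustration only.

```python
import numpy as np


def blpoc(f, g, band_fraction=0.5, eps=1e-12):
    """Band-limited phase-only correlation of two equal-length 1-D signals.

    Hypothetical helper: band_fraction (share of the half-spectrum kept)
    and eps (guard against division by zero) are illustrative choices.
    Returns the correlation with zero shift at the center index.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    if f.ndim != 1 or f.shape != g.shape:
        raise ValueError("f and g must be 1-D arrays of equal length")

    n = f.size
    # Centered spectra: zero frequency sits at index n // 2.
    F = np.fft.fftshift(np.fft.fft(f))
    G = np.fft.fftshift(np.fft.fft(g))

    # Normalized cross-power spectrum: keep the phase, discard the magnitude.
    cross = F * np.conj(G)
    R = cross / (np.abs(cross) + eps)

    # Keep only the 2K + 1 bins centered on zero frequency.
    center = n // 2
    K = min(max(1, int(band_fraction * (n // 2))), (n - 1) // 2)
    band = R[center - K:center + K + 1]

    # Inverse DFT over the band; the result is real for real inputs.
    r = np.fft.ifft(np.fft.ifftshift(band))
    return np.real(np.fft.fftshift(r))


def blpoc_score(f, g, band_fraction=0.5):
    """Peak of the BLPOC function, usable as a similarity score."""
    return float(np.max(blpoc(f, g, band_fraction)))
```

In a verification setting, a score such as `blpoc_score(enrolled, probe)` would be compared against a threshold chosen on held-out data: identical signals give a peak of approximately 1, while unrelated signals give a much lower peak.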

