Extracting accent information from Urdu speech for forensic speaker recognition

This paper presents a new method for extracting accent information from Urdu speech signals. Accent is used in speaker recognition systems, especially in forensic cases, and plays a vital role in discriminating between people of different groups, communities, and origins owing to their different speaking styles. The proposed method is based on a Gaussian mixture model-universal background model (GMM-UBM), mel-frequency cepstral coefficients (MFCCs), and a data augmentation (DA) process. The DA process appends features to the base MFCC features and improves the accent extraction and forensic speaker recognition performance of the GMM-UBM. Experiments are performed on an Urdu forensic speaker corpus. The experimental results show that the proposed method improves the equal error rate and the accuracy of the GMM-UBM by 2.5% and 3.7%, respectively.
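The pipeline summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: random features stand in for real MFCCs, the DA step is illustrated by appending first-order delta features (one common choice; the paper's exact augmentation may differ), and the mixture sizes and relevance factor are placeholder values.

```python
# Hedged sketch of a GMM-UBM verification pipeline with feature appending.
# Assumptions: synthetic features replace real MFCCs; deltas stand in for
# the paper's data augmentation; n_components=8 and r=16 are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def append_deltas(feats):
    """Append first-order deltas to base features (frames x dims)."""
    deltas = np.vstack([feats[:1], np.diff(feats, axis=0)])
    return np.hstack([feats, deltas])

rng = np.random.default_rng(0)
ubm_feats = append_deltas(rng.normal(size=(500, 13)))           # background data
spk_feats = append_deltas(rng.normal(0.5, 1.0, (100, 13)))      # enrolment data
test_feats = append_deltas(rng.normal(0.5, 1.0, (50, 13)))      # test utterance

# 1) Train the universal background model on pooled background data.
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(ubm_feats)

# 2) MAP-adapt only the means toward the speaker's data
#    (relevance factor r controls how far the means move).
r = 16.0
resp = ubm.predict_proba(spk_feats)                 # frame-level posteriors
n_k = resp.sum(axis=0)                              # soft counts per mixture
ex_k = resp.T @ spk_feats / np.maximum(n_k[:, None], 1e-8)
alpha = (n_k / (n_k + r))[:, None]
spk = GaussianMixture(n_components=8, covariance_type="diag")
spk.weights_, spk.covariances_ = ubm.weights_, ubm.covariances_
spk.means_ = alpha * ex_k + (1 - alpha) * ubm.means_
spk.precisions_cholesky_ = ubm.precisions_cholesky_

# 3) Verification score: average per-frame log-likelihood ratio.
llr = spk.score(test_feats) - ubm.score(test_feats)
print(f"LLR = {llr:.3f}")  # a positive LLR favors the claimed speaker
```

Appending extra features (here, deltas) widens each frame vector before UBM training, which is the role the DA step plays in the method described above.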
