Extracting accent information from Urdu speech for forensic speaker recognition
This paper presents a new method for extracting accent information from Urdu speech signals. Accent is used in speaker recognition systems, especially in forensic cases, and plays a vital role in discriminating between people of different groups, communities, and origins through their different speaking styles. The proposed method is based on a Gaussian mixture model-universal background model (GMM-UBM), mel-frequency cepstral coefficients (MFCC), and a data augmentation (DA) process. The DA process appends features to the base MFCC features and improves the accent extraction and forensic speaker recognition performance of the GMM-UBM. Experiments are performed on an Urdu forensic speaker corpus. The experimental results show that the proposed method improves the equal error rate and the accuracy of the GMM-UBM by 2.5% and 3.7%, respectively.
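The equal error rate (EER) mentioned above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure can be estimated from verification scores; it is not the authors' evaluation code. It assumes that a higher score means the trial is more likely to be a genuine (same-speaker) match, and the function name `equal_error_rate` and the toy scores are hypothetical, chosen only for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of scores (illustrative sketch).

    genuine  -- scores of same-speaker (target) trials
    impostor -- scores of different-speaker (non-target) trials
    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the candidate threshold where the false
    acceptance and false rejection rates are closest to each other.
    """
    best = None
    # Every observed score is tried as a candidate decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (assumed values, not taken from the paper's corpus).
genuine_scores = [2.0, 1.5, 0.9, 0.3]
impostor_scores = [1.0, 0.2, -0.5, -1.2]

eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {threshold}")  # EER = 0.25 at threshold 0.9
```

With real systems the score lists are much larger and the two error curves are often interpolated; the exhaustive threshold sweep here is kept deliberately simple.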