Identifying acquisition devices from recorded speech signals using wavelet-based features
Speech characteristics have played a critical role in media forensics, particularly in the investigation of evidence. This study proposes two wavelet-based feature extraction methods for identifying acquisition devices from recorded speech: discrete wavelet-based coefficients (DWBCs) and wavelet packet-based coefficients, both of which are built on multiresolution analysis. The ability of these features to capture the characteristics of acquisition devices is compared with that of conventional mel frequency cepstral coefficients and subband-based coefficients. In the experiments, support vector machines were trained and tested on recordings from 14 different audio acquisition devices. The experimental results showed that DWBCs can be used effectively for source audio acquisition device identification.
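The abstract does not spell out how the DWBCs are computed, so the following is only a rough illustration of the kind of multiresolution analysis such features rest on, not the authors' method. It assumes a Haar wavelet, a three-level decomposition, and log subband energy as the per-band summary; the function names `haar_dwt_step` and `subband_log_energies` are hypothetical.

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass branch
    detail = (even - odd) / np.sqrt(2.0)   # high-pass branch
    return approx, detail

def subband_log_energies(frame, levels=3):
    """Log mean energy of each detail band, then of the final approximation band.

    Illustrative feature only; the paper's DWBC definition may differ.
    """
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        # Each level splits the current low band in two (multiresolution analysis).
        approx, detail = haar_dwt_step(approx)
        feats.append(np.log(np.mean(detail ** 2) + 1e-12))
    feats.append(np.log(np.mean(approx ** 2) + 1e-12))
    return np.array(feats)
```

Under these assumptions, one such vector would be computed per speech frame, and the resulting vectors would then be passed to a classifier such as a support vector machine.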