Identifying acquisition devices from recorded speech signals using wavelet-based features

Speech characteristics play a critical role in media forensics, particularly in the investigation of evidence. This study proposes two wavelet-based feature extraction methods for identifying acquisition devices from recorded speech: discrete wavelet-based coefficients (DWBCs) and wavelet packet-based coefficients, both of which rest mainly on multiresolution analysis. The ability of these features to capture the characteristics of acquisition devices is compared with that of conventional mel-frequency cepstral coefficients and subband-based coefficients. In the experiments, recordings from 14 different audio acquisition devices were used to train and test support vector machine classifiers. The experimental results showed that DWBCs can be used effectively for source audio acquisition device identification.
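
To make the idea of multiresolution wavelet features concrete, the following is a minimal sketch of one way to turn a speech frame into a small feature vector. It is not the DWBC definition used in the study, which this abstract does not give. It assumes a Haar wavelet, a three-level decomposition, and the log mean energy of each subband as the feature; the function names `haar_dwt_step` and `wavelet_log_energies` are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_log_energies(frame, levels=3, eps=1e-12):
    """Log mean energy of each detail subband, then of the final approximation.

    Assumption: log subband energy stands in for the paper's coefficients.
    """
    if len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be divisible by 2**levels")
    features = []
    approx = list(frame)
    for _ in range(levels):
        # Split the current approximation into a coarser approximation
        # (low-pass half) and a detail band (high-pass half).
        approx, detail = haar_dwt_step(approx)
        features.append(math.log(sum(d * d for d in detail) / len(detail) + eps))
    features.append(math.log(sum(a * a for a in approx) / len(approx) + eps))
    return features


# Illustrative input: a 64-sample synthetic sinusoid, not real speech.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
feats = wavelet_log_energies(frame, levels=3)
print(len(feats))  # one value per detail level plus one for the approximation
```

In a full pipeline, vectors like `feats` would be computed per frame and passed to a classifier such as a support vector machine. A wavelet packet variant would split the detail bands as well as the approximation at each level.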

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK