Source microphone identification from speech recordings based on a Gaussian mixture model

Microphone identification is a branch of media forensics that investigates whether the source microphone can be identified from speech recordings. The main aim of this study is to determine which of several feature extraction techniques is best suited to source microphone identification systems. We perform microphone identification experiments with 16 different microphones using 3 datasets. To improve the results on these datasets, we also investigate the key parameters that may affect microphone identification performance. Our experimental results show that the proposed method achieves a closed-set identification rate comparable to those reported in existing studies.
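
For orientation, closed-set identification with Gaussian mixture models is commonly formulated as below. This is the standard textbook formulation, given here as an assumption about the general approach rather than the exact configuration used in this study. The symbols are introduced for illustration only: M is the number of candidate microphones, \lambda_m is the mixture model trained for microphone m, K is the number of mixture components, and X = \{x_1, \dots, x_T\} is the sequence of feature vectors extracted from a test recording.

```latex
% Likelihood of one feature vector under the model of microphone m
p(x \mid \lambda_m) = \sum_{k=1}^{K} w_{m,k}\, \mathcal{N}\!\left(x;\, \mu_{m,k},\, \Sigma_{m,k}\right),
\qquad \sum_{k=1}^{K} w_{m,k} = 1

% Closed-set decision: choose the microphone whose model best explains X
\hat{m} = \arg\max_{1 \le m \le M} \; \sum_{t=1}^{T} \log p(x_t \mid \lambda_m)
```

In a closed-set setting, every test recording is assumed to come from one of the M enrolled microphones, so the decision rule always returns one of them and the identification rate is the fraction of test recordings assigned to the correct microphone.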
