Özkan ARSLAN, Erkan Zeki ENGİN

Speech enhancement using adaptive thresholding based on gamma distribution of Teager energy operated intrinsic mode functions

This paper introduces a new speech enhancement algorithm based on adaptive thresholding of the intrinsic mode functions (IMFs) that empirical mode decomposition (EMD) extracts from frames of the noisy signal. The adaptive threshold values are estimated from gamma statistical models of the Teager energy operated IMFs of the noisy speech and of the estimated noise, using the symmetric Kullback–Leibler divergence. The enhanced speech signal is obtained by applying a semisoft thresholding function to the IMF coefficients of the noisy speech. The method is tested on the NOIZEUS speech database and compared with wavelet-shrinkage and EMD-shrinkage methods in terms of segmental SNR (SegSNR) improvement, weighted spectral slope (WSS) distance, and perceptual evaluation of speech quality (PESQ) score. Experimental results show that the proposed method achieves a higher SegSNR improvement (in dB), a lower WSS distance, and higher PESQ scores than both the wavelet-shrinkage and EMD-shrinkage methods. It also outperforms traditional threshold-based speech enhancement approaches at input SNR levels ranging from high to low.
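The method applies the Teager energy operator (TEO) to each IMF before any statistical modelling. As a minimal sketch, the discrete Teager-Kaiser operator can be applied to a generic sample sequence as shown below. The function name `teager_energy` and the edge handling, which repeats the nearest interior value at the two end samples, are assumptions for illustration and are not taken from the paper.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Interior samples use the standard three-point formula. The first and last
    outputs repeat their nearest interior value (an assumed edge rule).
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    inner = [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [inner[0]] + inner + [inner[-1]]


# Usage: for a pure tone A*cos(w*n + phi), psi[n] equals A**2 * sin(w)**2,
# so the operator responds to both amplitude and frequency.
A, w, phi = 2.0, 0.3, 0.1
tone = [A * math.cos(w * n + phi) for n in range(64)]
psi = teager_energy(tone)
print(round(psi[10], 4))  # 0.3493
```

For general signals the output can be negative (for example, `[1.0, 0.0, 1.0]` gives -1 at the centre). This matters when the TEO output is later fitted with a distribution defined only on positive values.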
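The threshold estimation step fits gamma models to the Teager energy operated IMFs of the noisy speech and of the estimated noise, then compares the two models with the symmetric Kullback–Leibler divergence. The sketch below computes that divergence under several assumptions. The gamma parameters are fitted by the method of moments on non-negative values; TEO magnitudes are a natural choice here because the raw TEO can be negative. The symmetric divergence is taken as the sum of both KL directions, although some authors use the average instead. The helper names are hypothetical. The abstract does not say how the divergence is mapped to threshold values, so this sketch omits that step.

```python
import math


def digamma(x):
    """Digamma function: upward recurrence, then an asymptotic series (x > 0)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def fit_gamma(samples):
    """Method-of-moments gamma fit; returns (shape k, scale theta)."""
    if any(v < 0 for v in samples):
        raise ValueError("gamma model needs non-negative samples (e.g. TEO magnitudes)")
    n = len(samples)
    mean = sum(samples) / n
    var = sum((v - mean) ** 2 for v in samples) / n
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance")
    return mean * mean / var, var / mean


def kl_gamma(k_p, theta_p, k_q, theta_q):
    """KL(P || Q) for gamma distributions in shape/scale form."""
    return ((k_p - k_q) * digamma(k_p) - math.lgamma(k_p) + math.lgamma(k_q)
            + k_q * (math.log(theta_q) - math.log(theta_p))
            + k_p * (theta_p - theta_q) / theta_q)


def symmetric_kl_gamma(p, q):
    """Symmetric (Jeffreys) divergence: KL(P||Q) + KL(Q||P)."""
    return kl_gamma(*p, *q) + kl_gamma(*q, *p)


# Usage with made-up magnitudes standing in for TEO-operated IMF values.
noisy_model = fit_gamma([0.8, 1.9, 3.1, 4.2, 5.0])
noise_model = fit_gamma([0.2, 0.5, 0.4, 0.9, 0.3])
print(symmetric_kl_gamma(noisy_model, noise_model))
```

A large divergence suggests that an IMF's energy distribution differs clearly from the noise model. A small divergence suggests that the IMF is dominated by noise.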
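The enhanced signal is obtained by applying a semisoft thresholding function to the IMF coefficients. The sketch below uses the widely used two-threshold semisoft ("firm") shrinkage rule, which may differ from the paper's exact variant. In the paper the thresholds are estimated adaptively for each IMF. Here `lam1` and `lam2` are hypothetical inputs, and `threshold_imf` is an illustrative helper name.

```python
import math


def semisoft_threshold(c, lam1, lam2):
    """Two-threshold semisoft (firm) shrinkage of one coefficient.

    |c| <= lam1        -> 0 (treated as noise)
    lam1 < |c| <= lam2 -> linearly rescaled toward lam2, sign preserved
    |c| > lam2         -> kept unchanged
    """
    if not 0.0 <= lam1 < lam2:
        raise ValueError("require 0 <= lam1 < lam2")
    mag = abs(c)
    if mag <= lam1:
        return 0.0
    if mag > lam2:
        return c
    return math.copysign(lam2 * (mag - lam1) / (lam2 - lam1), c)


def threshold_imf(imf, lam1, lam2):
    """Apply the rule sample by sample to one IMF (a list of floats)."""
    return [semisoft_threshold(c, lam1, lam2) for c in imf]


print(threshold_imf([0.5, -1.5, 1.8, 3.0], lam1=1.0, lam2=2.0))
```

The rule is continuous at both thresholds, so it avoids the jump that hard thresholding produces. Unlike soft thresholding, it does not bias large coefficients.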
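SegSNR improvement is one of the reported evaluation measures. A common way to compute segmental SNR is to average per-frame SNRs. Each frame is clamped to a fixed range, often [-10, 35] dB, so that silent or near-perfect frames do not dominate the average. The improvement is then the SegSNR of the enhanced signal minus that of the noisy input, with both measured against the clean reference. The frame length, hop size, and clamping limits below are assumed defaults, not the paper's evaluation settings.

```python
import math


def segmental_snr(clean, processed, frame_len=256, hop=256,
                  floor_db=-10.0, ceil_db=35.0, eps=1e-10):
    """Mean of per-frame SNRs in dB, each clamped to [floor_db, ceil_db]."""
    n = min(len(clean), len(processed))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        y = processed[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, y))
        snr = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


def segsnr_improvement(clean, noisy, enhanced, **kwargs):
    """SegSNR gain (dB) of the enhanced signal over the noisy input."""
    return (segmental_snr(clean, enhanced, **kwargs)
            - segmental_snr(clean, noisy, **kwargs))


# Synthetic check: the residual amplitude drops from 0.3 to 0.1 (energy / 9).
clean = [math.sin(0.05 * n) for n in range(1024)]
noisy = [c + 0.3 * (-1) ** n for n, c in enumerate(clean)]
enhanced = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
gain = segsnr_improvement(clean, noisy, enhanced)
print(round(gain, 2))  # 9.54, i.e. 10*log10(9)
```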

