Nonlocal means estimation of intrinsic mode functions for speech enhancement

This paper introduces a new approach to speech enhancement that combines nonlocal means (NLM) estimation with empirical mode decomposition (EMD). NLM, a patch-based denoising method, is widely used for two-dimensional signals such as images, and its use for one-dimensional signals has recently attracted growing attention. NLM is effective at removing low-frequency noise because it exploits nonlocal similarities among the samples of a signal, but it suffers from under-averaging in high-frequency regions. Because the temporal and spectral characteristics of speech change markedly over time, NLM applied directly to a speech signal is not as effective at removing noise as it is in image denoising. To address this issue, the speech signal is first decomposed into oscillatory components called intrinsic mode functions (IMFs) by a temporal decomposition technique known as the sifting process. Each IMF represents signal information at a certain scale or frequency band, and the IMFs do not exhibit abrupt changes in power spectrum over time. Each decomposed IMF is then processed by NLM estimation, which exploits the nonlocal similarities within it, to obtain the enhanced speech. The performance of the proposed method is evaluated for white, factory, and babble noise at different signal-to-noise ratios. Simulation results show that it performs better in terms of both subjective and objective quality measures.
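The NLM step applied to each IMF can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `nlm_1d` and `enhance_from_imfs`, the parameters `patch_half`, `search_half`, and `h`, the edge clamping, and the weight normalisation are all assumptions chosen for illustration, and the IMFs are assumed to have been computed beforehand by a separate sifting routine.

```python
import math


def nlm_1d(x, patch_half=2, search_half=10, h=0.5):
    """Nonlocal means estimate of a 1-D sequence (illustrative sketch).

    Each output sample is a weighted average of the samples in a search
    window, weighted by how similar their surrounding patches are.
    All parameter names and defaults are hypothetical.
    """
    n = len(x)
    patch_len = 2 * patch_half + 1

    def sample(i):
        # Clamp indices at the signal edges (an assumed boundary rule).
        return x[min(max(i, 0), n - 1)]

    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        lo = max(0, i - search_half)
        hi = min(n, i + search_half + 1)
        for j in range(lo, hi):
            # Squared Euclidean distance between the patches at i and j.
            d2 = sum(
                (sample(i + k) - sample(j + k)) ** 2
                for k in range(-patch_half, patch_half + 1)
            )
            # Gaussian similarity weight; h controls the smoothing strength.
            w = math.exp(-d2 / (patch_len * h * h))
            num += w * x[j]
            den += w
        out.append(num / den)  # den >= 1 because j == i gives w == 1
    return out


def enhance_from_imfs(imfs, **nlm_kwargs):
    """Denoise each IMF with nlm_1d and sum the results sample by sample.

    `imfs` is assumed to be a list of equal-length sequences produced by
    a separate sifting routine, which is not shown here.
    """
    denoised = [nlm_1d(imf, **nlm_kwargs) for imf in imfs]
    return [sum(values) for values in zip(*denoised)]
```

In this sketch a larger `h` averages more aggressively, so a real system would likely choose `h` per IMF according to the estimated noise level in that band; the abstract does not specify how the parameters are selected.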
