Deep Q-network-based noise suppression for robust speech recognition

Tae-Jun PARK, Joon-Hyuk CHANG

This study develops a deep Q-network (DQN)-based noise suppression method for robust speech recognition under ambient noise. We design a reinforcement learning (RL) algorithm that combines DQN training with a deep neural network (DNN) so that RL can operate in complex, high-dimensional environments such as speech recognition. Specifically, the DQN is trained to choose the best action, namely a quantized noise-suppression gain, from observations of the noisy speech signal, with a reward that incorporates both the word error rate (WER) and an objective speech quality measure. Experiments demonstrate that the proposed algorithm improves speech recognition in various noisy conditions while reducing the computational burden compared with a DNN-based noise suppression method.
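The core loop the abstract describes can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the feature dimension, network sizes, the gain grid, and the reward weight `alpha` are all assumptions, and the WER/quality values are stand-ins for measurements a real system would compute from a recognizer and an objective quality metric.

```python
import numpy as np

rng = np.random.default_rng(0)

GAINS = np.linspace(0.0, 1.0, 11)   # action set: quantized noise-suppression gains
FEAT_DIM = 8                        # toy dimension for noisy-speech features
HID = 16                            # hidden width of the toy Q-network

# One-hidden-layer Q-network: noisy-speech features -> Q-value per gain
W1 = rng.normal(0.0, 0.1, (FEAT_DIM, HID))
W2 = rng.normal(0.0, 0.1, (HID, len(GAINS)))

def q_values(x):
    """Q(s, a) for every quantized gain, given a feature vector x."""
    h = np.maximum(0.0, x @ W1)     # ReLU hidden layer
    return h @ W2

def select_gain(x, epsilon=0.1):
    """Epsilon-greedy selection over the quantized gains (returns action index)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(GAINS)))
    return int(np.argmax(q_values(x)))

def reward(wer, quality, alpha=0.5):
    """Combined reward: lower WER and higher objective quality are both rewarded.
    alpha is an illustrative mixing weight, not a value from the paper."""
    return -alpha * wer + (1.0 - alpha) * quality

# One step of the decision loop with placeholder inputs
x = rng.normal(size=FEAT_DIM)       # stand-in for observed noisy-speech features
a = select_gain(x, epsilon=0.0)     # greedy action index
print(GAINS[a], reward(wer=0.2, quality=0.8))
```

In a full DQN, the `(state, action, reward, next state)` tuples from this loop would be stored in a replay buffer and used to regress the Q-network toward the Bellman target; only the action-selection and reward-shaping step is shown here.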

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK