Bandwidth Expansion of Narrowband Speech Using Artificial Neural Networks and Quality Enhancement

In traditional telephone systems, human speech, which has components up to 7-8 kHz, is passed through a lowpass filter with a bandwidth of 3.4 kHz and sampled at 8 kHz before transmission. Even when coding and similar operations introduce no loss at all, filtering out the high-frequency region causes a loss of quality. This loss has little effect on intelligibility, but it leads to a perceptible degradation in speech quality. In this work, the spectral envelope of the filtered-out high-frequency region is estimated from the low-frequency region of the spectral envelope. Artificial neural networks are used for this operation, which can also be described as spectral envelope extension. The extended spectral envelope is then used to regenerate the high-frequency region with the source-filter model, with the aim of improving speech quality. The method was developed for telephone-quality speech, but it may also be used for speech with a lower bandwidth (1.8 kHz).
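
The band limitation described above can be illustrated with a minimal sketch. It is not the authors' processing chain: it assumes a 16 kHz wideband input, replaces the real channel filter with an idealized brick-wall FFT filter, and uses a hypothetical function name, `simulate_telephone_band`. It only shows which part of the spectrum a bandwidth-extension method has to estimate.

```python
import numpy as np

def simulate_telephone_band(x, fs_in=16000, cutoff_hz=3400.0, fs_out=8000):
    """Idealized narrowband channel (assumption: brick-wall lowpass, then decimation)."""
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    x = np.asarray(x, dtype=float)

    # Zero every frequency bin above the cutoff; this is the lost high band.
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_in)
    spectrum[freqs > cutoff_hz] = 0.0
    lowpassed = np.fft.irfft(spectrum, n=len(x))

    # Nothing above cutoff_hz (< fs_out / 2) remains, so plain decimation does not alias.
    step = fs_in // fs_out
    return lowpassed[::step]

# Toy wideband signal: one component inside the telephone band, one outside it.
fs = 16000
t = np.arange(fs) / fs
wideband = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
narrowband = simulate_telephone_band(wideband)
print(len(wideband), len(narrowband))
```

In this toy case the 6 kHz component is removed entirely and only the 1 kHz component survives at the 8 kHz sampling rate. The removed region is what the envelope extension and source-filter resynthesis aim to reconstruct.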
