Speech Pre-Processing for Pitch and Pitch-Cycle Evolutions Smoothing

In low bit rate speech coders, pitch is usually transmitted once per frame and, when needed, intermediate pitch values are obtained by interpolating between two adjacent transmitted values. Although pitch usually evolves slowly, it sometimes varies irregularly, so the estimated pitch can differ from the real one. In addition, some speech coders, e.g., waveform interpolation coders, rely on smooth pitch-cycle evolutions to extract speech model parameters in the analysis stage. The non-stationary characteristics of speech may therefore lead to inaccurate parameter estimates, which degrades the quality of the synthesised speech. We propose a pre-processor that modifies the residual speech signal to provide smooth pitch variations and pitch-cycle evolutions without distorting perceptual speech quality. As a result, the pitch and the voicing level can be determined more accurately.
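As an illustration of the interpolation step mentioned above, the following is a minimal sketch that assumes plain linear interpolation over equally spaced subframes. The function name `interpolate_pitch`, its parameters, and the choice of linear interpolation are assumptions made for this example and are not taken from the paper.

```python
def interpolate_pitch(prev_pitch, next_pitch, num_subframes):
    """Linearly interpolate between two frame-level pitch values.

    Hypothetical helper: prev_pitch and next_pitch are the pitch values
    transmitted for two adjacent frames (e.g. in samples). One value is
    returned per subframe, and the last one equals next_pitch.
    """
    if num_subframes < 1:
        raise ValueError("num_subframes must be at least 1")
    step = (next_pitch - prev_pitch) / num_subframes
    return [prev_pitch + step * (k + 1) for k in range(num_subframes)]


# Pitch rises from 40 to 48 samples across a frame of four subframes.
track = interpolate_pitch(40.0, 48.0, 4)
print(track)  # [42.0, 44.0, 46.0, 48.0]
```

Such a track follows the true pitch only when the pitch evolves smoothly between the two transmitted values, which is the condition the proposed pre-processor aims to establish.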
