Compressing English Speech Data with Hybrid Methods without Data Loss

Understanding how speech is produced is essential to coding the speech signal successfully. Speech coding supports a range of applications, from authenticating audio files to linking a speech recording to the data acquisition device (e.g. the microphone) that captured it. It is also vital to the acquisition, analysis and evaluation of sound, and to the forensic investigation of criminal events. Speech and other sounds recorded as audio files play an important role in crime detection, and collecting, processing, analysing and evaluating them requires the audio to be compressed without data loss. Because voice-changing software is now widely available, the number of recorded speech files and their correct interpretation play an important role in establishing whether a recording is original. Work on an unintelligible speech recording applies techniques such as signal processing, noise extraction and filtering to enhance the speech and make it intelligible, to determine whether the recording has been manipulated, whether it is original, and whether material has been added or removed; the sounds are then coded, the code is decoded, and the decoded sounds are transcribed. This study first describes what sound coding is, its purposes and areas of use, and how it is classified according to its features and techniques. Speech coding was then performed on a real English audio dataset of approximately 100,000 voice recordings. Waveform coders, vocoders and hybrid methods were applied, and the success of each method was measured on the system we built. The hybrid models gave better results than the others. These results will serve as a reference for our future work.
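The abstract does not specify the coders used, so the following is only a minimal sketch of what "compression without data loss" means for a waveform-style coder, not the method of this study. It assumes a first-order differential (DPCM-style) predictor followed by DEFLATE from Python's standard `zlib` module; the function names and the synthetic test signal are hypothetical and chosen purely for illustration.

```python
import math
import struct
import zlib


def make_test_signal(n=8000, rate=8000):
    """Synthetic 16-bit test signal: the sum of two sinusoids (illustrative only)."""
    samples = []
    for i in range(n):
        t = i / rate
        value = (8000 * math.sin(2 * math.pi * 200 * t)
                 + 3000 * math.sin(2 * math.pi * 450 * t))
        samples.append(int(round(value)))
    return samples


def encode(samples):
    """Store each sample as its difference from the previous one, then DEFLATE."""
    residuals = []
    previous = 0
    for s in samples:
        residuals.append(s - previous)  # integer residual, so no rounding loss
        previous = s
    raw = struct.pack(f"<{len(residuals)}i", *residuals)
    return zlib.compress(raw, 9)


def decode(blob):
    """Invert encode(): inflate, then accumulate the residuals."""
    raw = zlib.decompress(blob)
    count = len(raw) // 4  # each residual was packed as a 32-bit integer
    residuals = struct.unpack(f"<{count}i", raw)
    samples = []
    previous = 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples


if __name__ == "__main__":
    original = make_test_signal()
    compressed = encode(original)
    restored = decode(compressed)
    print("lossless:", restored == original)
    print("raw 16-bit size:", 2 * len(original), "bytes")
    print("compressed size:", len(compressed), "bytes")
```

Because the residuals are exact integers, decoding reproduces every sample bit for bit, which is the property a forensic workflow would need. A vocoder or a typical hybrid coder, by contrast, is normally lossy, so a lossless pipeline would have to store the residual between the model output and the original signal as well.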
