Radek MARTINEK, Marcel FAJKUS, Jiri KOZIOREK, Jan NEDOMA, Zdenek MACHACEK, Josef KROCIL

Improved method of heuristic classification of vowels from an acoustic signal

This paper describes an improved methodology for the classification of the vowels /a, a:/, /ε, ε:/, /i, i:/, /o, o:/, and /u, u:/ (vowel symbols according to the IPA, i.e. the International Phonetic Alphabet). The aim is to develop an improved method enabling the automatic allocation of vowel symbols to the corresponding time segments of acoustic recordings of an undisturbed speech signal. The combined classification method is based on finding the frequencies of the first two local maxima (formants) in a smoothed linear predictive amplitude spectrum (LPC, linear predictive coding) and the zero-crossing values of each speech-active, voiced short-term segment of the recording. Based on these monitored values, simple heuristic conditions are arranged for the classification of the respective vowel. The algorithm was implemented in the MATLAB environment, and its Graphical User Interface (GUI) was used for user interaction. The success rate of vowel classification was verified using recordings of forty speakers (twenty men and twenty women), where each speaker pronounced the vowels repeatedly, separated by short pauses. The vowel recognition success rate is evaluated based on the results obtained with the proposed method.
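The two per-segment features named in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy purely for illustration (the authors report a MATLAB implementation), and it is not the authors' code: the function names, the LPC order, the Hamming window, and the FFT size are all assumptions. The heuristic vowel conditions themselves are not given in the abstract, so they are left out.

```python
import numpy as np


def zero_crossings(frame):
    """Count sign changes between consecutive samples of one segment."""
    signs = np.signbit(frame)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC solved with the Levinson-Durbin recursion.

    Returns a = [1, a1, ..., ap], the coefficients of the prediction-error
    filter A(z). The Hamming window is an assumed (common) choice.
    """
    w = frame * np.hamming(len(frame))
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def first_two_formants(frame, fs, order=10, nfft=1024):
    """Frequencies (Hz) of the first two local maxima of the LPC spectrum.

    The smoothed amplitude spectrum is taken as 1/|A(e^{jw})| on an FFT grid.
    order and nfft are illustrative defaults, not values from the paper.
    """
    a = lpc_coefficients(frame, order)
    spectrum = 1.0 / np.abs(np.fft.rfft(a, nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]]
    return [float(freqs[i]) for i in peaks[:2]]
```

For a voiced segment `frame` sampled at `fs` Hz, `first_two_formants(frame, fs)` and `zero_crossings(frame)` give the values on which heuristic conditions of the kind described above could be formulated. The function may return fewer than two frequencies when the spectrum has fewer than two interior peaks.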
