Classification of Vowel and Consonant Letters in Turkish with Deep Neural Networks and Text-Voice Synchronization at the Syllable Level

In this study, syllable-level text-voice synchronization was carried out, taking the grammatical structure of Turkish into account, so that the sound and the text are highlighted simultaneously. To this end, the aim was to classify the vowels and consonants of Turkish within a word. Two different Artificial Neural Network (ANN) models were used for this classification, and Mel-Frequency Cepstral Coefficients (MFCC) were used to extract features from the voice data. The ANNs were observed to give their best results with deep learning. Tests were run with different numbers of coefficients in feature extraction. In the first stage of the study, a set of recordings of the Turkish vowels and consonants was collected. Features were then extracted from these recordings and prepared for training the networks. The best network structure and parameters were selected after training and testing with different parameter settings; in this training, the networks were asked to distinguish vowels from consonants. The vowel-consonant distinction was then made on 10 predetermined vectors of words and phrases. In deep learning training carried out in MathWorks MATLAB, the Layer-Recurrent Neural Network and the Pattern Recognition Network achieved average success rates of 97.43% and 98.04%, respectively. Because the Pattern Recognition Network achieved 98.82% success in recognizing vowels and 97.27% in recognizing consonants, this model was chosen for vowel-consonant classification. After classification, timing files were created by determining the transition times of the vowels in each word. In the last step, an interface was built on the C# .NET platform for the synchronization process, and a syllabification algorithm was developed within this interface to highlight the text syllable by syllable. Thus, the desired high precision was achieved in highlighting the words simultaneously with the voice.
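The MFCC feature-extraction step can be illustrated with a minimal, dependency-free sketch in Python. It is not the authors' MATLAB implementation. The following settings are common defaults assumed here:

- 25 ms frames (400 samples at 16 kHz) with a 10 ms hop
- a 512-point transform
- 26 triangular mel filters
- a pre-emphasis factor of 0.97
- the HTK-style mel formula

A naive DFT stands in for an FFT to keep the block self-contained. The `n_coeffs` parameter is where a different number of coefficients would be tried.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, each over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):  # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2 (naive O(n^2) DFT, fine for a sketch)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft) for n, x in enumerate(padded))) ** 2 / n_fft
        for k in range(n_fft // 2 + 1)
    ]


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs outputs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * i * (m + 0.5) / n) for m, v in enumerate(values))
            for i in range(n_coeffs)]


def mfcc(signal, sample_rate=16000, n_coeffs=13, frame_len=400, hop=160,
         n_fft=512, n_filters=26, pre_emphasis=0.97):
    """Return one list of n_coeffs MFCCs per frame (frame_len must be <= n_fft)."""
    if len(signal) < frame_len:
        return []
    # 1. Pre-emphasis boosts high frequencies.
    emph = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1] for i in range(1, len(signal))]
    # 2. Hamming window applied to each frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(emph[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)  # 3. power spectrum
        # 4. Log mel-filterbank energies (floored to avoid log(0)).
        log_e = [math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-10)) for filt in bank]
        features.append(dct_ii(log_e, n_coeffs))  # 5. DCT -> cepstral coefficients
    return features


tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(800)]  # 50 ms, 440 Hz
feats = mfcc(tone, n_coeffs=13)
print(len(feats), len(feats[0]))  # 3 13
```

Each row of `feats` is one frame's feature vector. Such rows are the kind of input a vowel/consonant classifier would be trained on.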
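Every Turkish syllable contains exactly one vowel. As a result, a word's vowel-consonant labels are enough to place its syllable boundaries. The sketch below shows one way to do this in Python using the general Turkish spelling rule: the consonant just before a vowel opens that vowel's syllable, and any other consonants between two vowels close the preceding syllable.

This is an assumption about how such an algorithm can work, not the authors' C# code. The function names and the vowel set (which includes circumflexed loanword vowels) are illustrative.

```python
# Vowels of the Turkish alphabet plus circumflexed forms used in some loanwords.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def vc_labels(word):
    """Label each letter 'V' (vowel) or 'C' (consonant), mimicking a classifier's output."""
    return "".join("V" if ch in TURKISH_VOWELS else "C" for ch in word)


def syllable_starts(labels):
    """Return the start index of each syllable, given a 'V'/'C' label string."""
    vowels = [i for i, label in enumerate(labels) if label == "V"]
    if not vowels:
        return []
    starts = [0]  # leading consonants belong to the first syllable
    for prev, nxt in zip(vowels, vowels[1:]):
        gap = nxt - prev - 1  # consonants between two consecutive vowels
        # One or more consonants: the last one moves to the next syllable.
        # No consonant (e.g. "sa-at"): the boundary falls between the vowels.
        starts.append(nxt - 1 if gap > 0 else nxt)
    return starts


def syllabify(word):
    """Split a word into syllables using the V/C boundary rule above."""
    starts = syllable_starts(vc_labels(word))
    if not starts:
        return [word] if word else []
    bounds = starts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


for w in ["merhaba", "saat", "türkçe", "okul"]:
    print(w, "->", "-".join(syllabify(w)))
# merhaba -> mer-ha-ba
# saat -> sa-at
# türkçe -> türk-çe
# okul -> o-kul
```

`syllable_starts` works on the label string rather than on letters. This means a V/C sequence produced by a classifier can be fed to it directly.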
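The step from classifier output to timing files can be sketched as follows. It rests on three assumptions:

- the classifier emits one 'V' or 'C' label per audio frame;
- frames are 10 ms apart;
- each syllable is highlighted at the onset of its vowel.

Because a Turkish syllable carries exactly one vowel, the k-th vowel segment in the audio is paired with the k-th syllable of the text. The function names, the `min_frames` filter and the tab-separated timing-line format are hypothetical.

```python
from typing import List, Tuple


def vowel_segments(frame_labels: str, hop_ms: int = 10, min_frames: int = 1) -> List[Tuple[int, int]]:
    """Collapse per-frame 'V'/'C' labels into (start_ms, end_ms) vowel segments.

    Runs shorter than min_frames are dropped as likely misclassifications.
    """
    segments = []
    start = None
    for i, label in enumerate(frame_labels + "C"):  # trailing 'C' closes a final vowel run
        if label == "V":
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start * hop_ms, i * hop_ms))
            start = None
    return segments


def highlight_schedule(syllables: List[str], segments: List[Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Pair the k-th syllable with the onset of the k-th vowel segment."""
    if len(syllables) != len(segments):
        raise ValueError(f"{len(syllables)} syllables but {len(segments)} vowel segments")
    return [(text, start_ms) for text, (start_ms, _end_ms) in zip(syllables, segments)]


def timing_lines(schedule: List[Tuple[str, int]]) -> List[str]:
    """Format the schedule as hypothetical 'start_ms<TAB>syllable' lines."""
    return [f"{start_ms}\t{text}" for text, start_ms in schedule]


frames = "CCVVVCCCVVCVVVC"  # made-up frame predictions for "merhaba"
schedule = highlight_schedule(["mer", "ha", "ba"], vowel_segments(frames))
print(timing_lines(schedule))  # ['20\tmer', '80\tha', '110\tba']
```

Raising an error on a count mismatch makes a missed or spurious vowel visible instead of silently shifting every later highlight. A larger `min_frames` is one simple way to suppress single-frame misclassifications.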
