A detailed survey of Turkish automatic speech recognition

Automatic speech recognition (ASR) systems have improved significantly in both the underlying technology and the software used. Despite these advances, an important gap remains between the recognition performance of humans and machines. This work surveys studies in Turkish speech recognition: the progress made in recent years, the language-specific constraints, the performance achieved by the applications developed to date, and a general scheme for researchers who wish to develop an ASR system for Turkish. To this end, a comprehensive review of the literature and of existing speech recognition systems for Turkish was prepared. ASR systems for the languages of developed countries have progressed significantly, but this work observed that the same is not true for Turkish. The lack of labor force and material resources remains the biggest obstacle to developing better systems. Concrete studies are therefore needed to achieve a Turkish ASR system with human-level performance.
