DEVELOPMENT OF A LARGE-VOCABULARY TEST CORPUS FOR TURKISH SPEECH RECOGNITION SYSTEMS AND A NEW TEST PROCEDURE

The central challenge in automatic speech recognition (ASR) is not building a domain-specific system but building one with a large vocabulary. ASR systems designed for a large vocabulary should, in turn, be evaluated on a test dataset with a large vocabulary. For this reason, an ASR test corpus was prepared in this study. The corpus contains speech recordings from 20 different domains, together with text files holding the corresponding transcriptions. The proposed test procedure was also applied to several large-vocabulary Turkish ASR systems, and the resulting word error rates (WER) ranged from 14% to 21%. The large-vocabulary test corpus and test procedure are intended to guide future studies toward a clearer assessment of ASR system performance.

Keywords:

Speech recognition, Turkish speech recognition, speech corpus, Turkish speech corpus, test corpus
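The abstract reports word error rates between 14% and 21% but does not state how they are scored. The minimal Python sketch below assumes the conventional definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum edit alignment and N is the number of reference words. Several choices here are illustrative assumptions, not the authors' published procedure: the Turkish-aware lowercasing, whitespace tokenization, pooling errors per domain before dividing, the function names, and the example domains and sentences.

```python
from typing import Dict, List, Tuple


def normalize_tr(text: str) -> List[str]:
    """Assumed normalization: Turkish-aware lowercasing, then whitespace tokenization.

    Python's str.lower() maps 'I' to 'i' and 'İ' to 'i' plus a combining dot,
    so the Turkish dotted/dotless capitals are mapped explicitly first.
    """
    text = text.replace("I", "ı").replace("İ", "i").lower()
    return text.split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance over words)."""
    # prev[j]: distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def wer(ref: str, hyp: str) -> float:
    """WER of a single utterance: edit distance divided by reference length."""
    r, h = normalize_tr(ref), normalize_tr(hyp)
    if not r:
        raise ValueError("reference transcription is empty")
    return word_edit_distance(r, h) / len(r)


def corpus_wer(pairs: List[Tuple[str, str, str]]) -> Tuple[Dict[str, float], float]:
    """pairs: (domain, reference, hypothesis). Returns per-domain WER and overall WER.

    Errors and reference word counts are summed before dividing, so longer
    utterances carry more weight than in a plain average of per-utterance WERs.
    """
    errors: Dict[str, int] = {}
    words: Dict[str, int] = {}
    for domain, ref, hyp in pairs:
        r, h = normalize_tr(ref), normalize_tr(hyp)
        errors[domain] = errors.get(domain, 0) + word_edit_distance(r, h)
        words[domain] = words.get(domain, 0) + len(r)
    per_domain = {d: errors[d] / words[d] for d in errors}
    overall = sum(errors.values()) / sum(words.values())
    return per_domain, overall


if __name__ == "__main__":
    # Hypothetical (domain, reference, ASR output) triples for illustration only.
    pairs = [
        ("spor", "Maç yarın akşam başlıyor", "maç yarın akşam başlıyor"),
        ("ekonomi", "Faiz oranları yükseldi", "faiz oranı yükseldi"),
    ]
    per_domain, overall = corpus_wer(pairs)
    for domain, score in per_domain.items():
        print(f"{domain}: {score:.2%}")
    print(f"overall: {overall:.2%}")
```

Reporting a per-domain breakdown next to the pooled figure is one way a 20-domain corpus can show whether a system's errors concentrate in particular domains. It is offered here as a design option, not as the paper's method.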

