Tuncay Şentürk, Eşref Adalı

Türkçe Metin Seslendirme (Turkish Text-to-Speech)

The main aim of this study is to convert Turkish text into human speech and to develop a "Turkish Text-to-Speech" system. Three different methods were examined and implemented while developing this system, and their intelligibility was measured statistically. First, the "diphone concatenation method" was applied. Although its intelligibility was not low, the results were far from natural. Consequently, as hardware costs had also fallen, the "syllable concatenation method", which is more widely accepted under today's conditions than diphone concatenation, was developed. The improvement in intelligibility and sound quality was demonstrated statistically. Finally, by modifying the duration and intensity of the sounds, successful results were also obtained for stress and intonation. To determine and compare the intelligibility of each method separately, a set of predetermined sentences was played to people in different age groups; based on their answers, a score out of 100 was computed with a fixed formula and the results were presented in a matrix. In this study, different Turkish speech synthesis methods were compared and a quality analysis was carried out through user experiments. The comparative examination of speech synthesis methods, and their presentation in a form (XML) that allows the structures to be manipulated, are the main contributions of this article. To prepare the audio files required for all of the work, the sound database of the Turkish Language Association (Türk Dil Kurumu) was used first. Later, a program was written that uses the MBROLA libraries so that all audio files could be generated automatically.
An amplitude balancing algorithm was applied to these audio files; intelligibility was increased by minimizing the differences between the highest and lowest amplitude levels across the files. The program was built from loosely coupled components (text to XML, and XML to speech), and a user interface was prepared using these components. Finally, a text editing program was prepared that visually impaired users can operate without needing a screen display.

Turkish Text to Speech Synthesizer

The main purpose of this study is to develop a "Turkish Text-to-Speech Synthesizer" system that converts text written in Turkish into human voice. Three different methods were examined and implemented for this system, and their clarity was measured statistically. First, the diphone concatenation method was applied. While the words were understandable, the results were far from natural. Thus, given the reduction in hardware costs, the "syllable concatenation method", which is more widely accepted under today's conditions, was developed. This method was shown statistically to improve both clarity and sound quality. Finally, by changing the amplitude and duration of the sounds, more successful results were obtained for intonation. To determine and compare the clarity of all methods, a set of sentences was played to listeners from different age groups; their answers were converted by a formula into a score from 0 to 100, and the results were presented in a matrix. The Turkish Language Association's (TDK) database was used to prepare the necessary audio files at the beginning of this study. Then, a software program was developed that uses the MBROLA library to create all the sound files automatically. An amplitude balancing algorithm was applied to these audio files, and clarity was increased by reducing the differences between the maximum and minimum amplitude levels across the sound files. The system was built from loosely coupled components (text to XML and XML to speech), and a graphical user interface was developed using these components. Finally, a text editing program was developed to help visually impaired users edit text without the need for a screen image.
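The abstract does not specify the amplitude balancing algorithm, so the following is only a minimal sketch of one plausible approach: scaling every recorded unit to a common peak level before concatenation. The function names, the `target_peak` value, and the toy sample data are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def peak(samples: List[float]) -> float:
    """Return the largest absolute sample value (0.0 for an empty unit)."""
    return max((abs(s) for s in samples), default=0.0)


def balance_amplitudes(
    units: Dict[str, List[float]], target_peak: float = 0.8
) -> Dict[str, List[float]]:
    """Scale each unit so that its peak equals target_peak.

    Assumption: samples are floats in [-1.0, 1.0]. Silent units
    (peak of zero) are copied unchanged to avoid dividing by zero.
    """
    balanced = {}
    for name, samples in units.items():
        p = peak(samples)
        gain = target_peak / p if p > 0 else 1.0
        balanced[name] = [s * gain for s in samples]
    return balanced


def concatenate(units: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Join the named units end to end, as in concatenative synthesis."""
    out: List[float] = []
    for name in order:
        out.extend(units[name])
    return out


# Hypothetical syllable recordings with very different loudness.
units = {
    "mer": [0.10, -0.20, 0.15],
    "ha": [0.50, -0.90, 0.40],
    "ba": [0.05, 0.02, -0.04],
}
balanced = balance_amplitudes(units)
word = concatenate(balanced, ["mer", "ha", "ba"])
```

After balancing, every unit has the same peak level, so a quiet syllable no longer sits next to a much louder one in the concatenated output. A level measure based on RMS energy rather than peak value would be an equally reasonable choice; the abstract does not say which one was used.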

