Murat ÇAVDARLI, Ömer ESKİDERE, Figen ERTAŞ

A comparison of training algorithms in speaker identification

In this study, training algorithms are compared in a speaker identification system based on the Gaussian mixture model (GMM). The expectation-maximization (EM) algorithm is widely used to estimate GMM training parameters. In this article, the k-means and Linde, Buzo, Gray (LBG) training algorithms, which are normally used to estimate vector quantization training parameters, are applied to the GMM. The speaker identification performance of the EM, k-means and LBG training algorithms is compared using the TIMIT and NTIMIT databases. In addition, ideal initial values for these databases are determined for the EM and k-means algorithms, both of which are sensitive to model initialization.

Keywords:

EM, LBG, k-means, speaker identification, Gaussian mixture model

