TİMUR DÜZENLİ, HATİCE DOĞAN, NALAN ÖZKURT

Discrete Wavelet Transform Based Feature Extraction for Speech/Music Discrimination

This study proposes a feature set based on the discrete wavelet transform for discriminating speech from music. The features are the means and variances of the discrete wavelet coefficients and the rates of change between wavelet subbands. Because wavelets represent these signals well, high classification accuracy is obtained even with windows as short as 0.5 seconds. The data set consists of recordings from internet radio stations, covering a variety of male and female speakers and musical pieces from different genres. Among the members of the Daubechies family, the Daubechies-8 wavelet gave the best performance, taking into account its number of vanishing moments and its orthogonality. After feature extraction, principal component analysis is applied to eliminate correlated features. Classification is performed with both artificial neural networks and support vector machines, and the results show that the proposed feature set outperforms traditional features.
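The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: a 4-tap Daubechies filter stands in for Daubechies-8 so the coefficients can be written out exactly, the signal is extended periodically, the "mean" is taken over absolute coefficient values, the inter-subband change is approximated by the ratio of adjacent subband means, and the function names (`dwt_step`, `wavelet_features`) and the three-level decomposition are hypothetical.

```python
import math

# 4-tap Daubechies low-pass filter: an assumed, shorter stand-in for Daubechies-8.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
LOW = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# Quadrature mirror high-pass filter: g[k] = (-1)^k * h[L-1-k].
HIGH = [((-1) ** k) * LOW[len(LOW) - 1 - k] for k in range(len(LOW))]


def dwt_step(x):
    """One analysis level with periodic extension; returns (approx, detail).

    len(x) must be even.
    """
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):  # filter, then keep every second output sample
        approx.append(sum(LOW[k] * x[(i + k) % n] for k in range(len(LOW))))
        detail.append(sum(HIGH[k] * x[(i + k) % n] for k in range(len(HIGH))))
    return approx, detail


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def wavelet_features(window, levels=3):
    """Per-subband mean(|c|), variance, and ratios of adjacent subband means.

    len(window) must be divisible by 2**levels.
    Returns a flat list of 3 * levels + 2 values.
    """
    subbands = []
    approx = list(window)
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        subbands.append(detail)  # detail bands, finest scale first
    subbands.append(approx)      # final approximation band

    means = [_mean([abs(c) for c in band]) for band in subbands]
    variances = [_variance(band) for band in subbands]
    eps = 1e-12  # guards against division by zero in silent windows
    ratios = [means[i] / (means[i + 1] + eps) for i in range(len(means) - 1)]
    return means + variances + ratios


if __name__ == "__main__":
    # Assumed 16 kHz sampling rate, so a 0.5 s window holds 8000 samples.
    rate = 16000
    window = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate // 2)]
    print(len(wavelet_features(window, levels=3)))
```

In practice a wavelet library would normally supply the Daubechies-8 decomposition, and the resulting feature vectors would then be decorrelated with principal component analysis before being passed to the neural network or support vector machine classifier.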
