Recognizing Musical Notation Using Convolutional Neural Networks

Musical scores are essential to music theory and its development. Musical notation was developed by the Greeks around 521 BCE, and for centuries music has been preserved in written form, whether as the composer's manuscript or as a later written copy. As a result, there is a gap between modern music technology and older scores, and keeping scores only as images has raised a range of problems for music information retrieval (MIR). Music notation recognition is an application of optical character recognition (OCR) that recognizes musical scores and converts them into a format that can be edited or played on a computer, such as MIDI (for playback) or MusicXML (for page layout). In this paper, we introduce a Convolutional Neural Network (CNN) based framework for recognizing musical notation in images. We use a popular pre-trained CNN, namely ResNet-101, to extract global features from note and rest images. A Support Vector Machine (SVM) is then trained on these features and used for classification. ResNet-101 is one of the state-of-the-art pre-trained networks for image recognition and was trained on more than a million images. A multiclass SVM with a fast linear solver is likewise a powerful classifier. We evaluated the proposed approach on a dataset derived from RecordLabel (Attwenger, 2015) and OMR-dataset and then labeled manually according to music theory. As a result, we can separate notes from rests with an average accuracy of 99.02%. We can also classify five different note types. To our knowledge, this is the first time that ResNet-101 and an SVM have been combined for musical notation recognition, and the results are very promising.
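The two-stage idea in the abstract (fixed feature vectors from a pre-trained CNN, followed by a multiclass linear SVM) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors below are small synthetic clusters standing in for ResNet-101 features, the five classes are unnamed placeholders for the five note types, and the SVM is a plain one-vs-rest hinge-loss model trained by stochastic subgradient descent rather than the fast linear solver the abstract mentions. All function names and hyperparameters are assumptions made for illustration.

```python
import random


def make_synthetic_features(num_per_class, num_classes, dim, rng):
    """Hypothetical stand-in for CNN features: one noisy cluster per class."""
    features, labels = [], []
    for c in range(num_classes):
        for _ in range(num_per_class):
            # Coordinates with j % num_classes == c are "active" for class c.
            x = [(2.0 if j % num_classes == c else 0.0) + rng.gauss(0.0, 0.3)
                 for j in range(dim)]
            features.append(x)
            labels.append(c)
    return features, labels


def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_rest_svm(features, labels, num_classes,
                          epochs=20, lr=0.05, lam=0.001, seed=0):
    """Train one linear SVM per class (that class vs. all others)."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = features[i]
            for c in range(num_classes):
                y = 1.0 if labels[i] == c else -1.0
                w = weights[c]
                violated = y * score(w, biases[c], x) < 1.0
                for j in range(dim):
                    # L2 shrinkage, plus a hinge-loss step if the margin is violated.
                    w[j] -= lr * lam * w[j]
                    if violated:
                        w[j] += lr * y * x[j]
                if violated:
                    biases[c] += lr * y
    return weights, biases


def predict(weights, biases, x):
    """Pick the class whose one-vs-rest classifier scores highest."""
    scores = [score(w, b, x) for w, b in zip(weights, biases)]
    return scores.index(max(scores))


rng = random.Random(42)
train_x, train_y = make_synthetic_features(40, 5, 10, rng)
test_x, test_y = make_synthetic_features(20, 5, 10, rng)
weights, biases = train_one_vs_rest_svm(train_x, train_y, num_classes=5)
predictions = [predict(weights, biases, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"accuracy on synthetic features: {accuracy:.2%}")
```

In a real pipeline the synthetic generator would be replaced by feature vectors taken from the pre-trained network; the accuracy printed here reflects only the toy data and says nothing about the 99.02% reported above.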

___

Attwenger, P. (2015). RecordLabel, http://homepage.univie.ac.at/a1200595/recordlabel/
Bainbridge, D., & Bell, T. (2001). The challenge of optical music recognition. Comput. Humanit., 35, 95–121, doi:10.1023/A:1002485918032.
Calvo-Zaragoza, J., & Rizo, D. (2018). End-to-End Neural Optical Music Recognition of Monophonic Scores, Appl. Sci., 8, 606, doi:10.3390/app8040606.
Casey, M., & Veltkamp, R., & Goto, M., & Leman, M., & Rhodes, C., & Slaney, M. (2008). Content-Based Music Information Retrieval: Current Directions and Future Challenges. In Proc. of IEEE, 668–696, doi:10.1109/JPROC.2008.916370.
Cho, K., & van Merrienboer, B., & Gulcehre, C., & Bahdanau, D., & Bougares, F., & Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, arXiv 2014, arXiv:1406.1078.
Dai, J., & Li, Y., & He, K., & Sun, J. (2016). R-FCN: Object Detection via Region-based Fully Convolutional Networks, arXiv 2016, arXiv:1605.06409.
Girshick, R. (2015). Fast R-CNN. arXiv 2015, arXiv:1504.08083.
Good, M., & Actor, G. (2003). Using MusicXML for file interchange. International Conference on WEB Delivering of Music, 15–17, doi:10.1109/WDM.2003.1233890.
Hajič, J., & Pecina, P. (2017). The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. IAPR International Conference on Document Analysis and Recognition (ICDAR), 39–46, doi:10.1109/ICDAR.2017.16.
Hajič, J., & Dorfer, M., & Widmer, G., & Pecina, P. (2018). Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets. International Society for Music Information Retrieval Conference, 23–27.
He, K., & Zhang, X., & Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition, arXiv 2015, arXiv:1512.03385.
LeCun, Y., & Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444, doi:10.1038/nature14539.
Lin, T.Y., & Goyal, P., & Girshick, R., & He, K., & Dollár, P. (2017). Focal Loss for Dense Object Detection, arXiv 2017, arXiv:1708.02002.
Liu, W., & Anguelov, D., & Erhan, D., & Szegedy, C., & Reed, S., & Fu, C.Y., & Berg, A.C. (2016). SSD: Single Shot MultiBox Detector. European Conference on Computer Vision; Springer: Cham, Switzerland, 21–37, doi:10.1007/978-3-319-46448-0_2.
Pacha, A., & Hajič, J., & Calvo-Zaragoza, J. (2018). A Baseline for General Music Object Detection with Deep Learning, Appl. Sci., 8, 1488, doi:10.3390/app8091488.
Rebelo, A., & Fujinaga, I., & Paszkiewicz, F., & Marcal, A.R.S., & Guedes, C., & Cardoso, J.S. (2012). Optical music recognition: State-of-the-art and open issues. Int. J. Multimed. Inf. Retr., 1, 173–190, doi:10.1007/s13735-012-0004-6.
Redmon, J., & Divvala, S., & Girshick, R., & Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection, arXiv 2015, arXiv:1506.02640.
Ren, S., & He, K., & Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497.
ResNet. (2015). https://towardsdatascience.com/review-resnet-winner-of-ilsvrc-2015-image-classification-localization-detection-e39402bfa5d8
Ronneberger, O., & Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv 2015, arXiv:1505.04597.
Sutskever, I., & Vinyals, O., & Le, Q.V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q., Eds., 3104–3112.
Tuggener, L., & Elezi, I., & Schmidhuber, J., & Pelillo, M., & Stadelmann, T. (2018). DeepScores—A Dataset for Segmentation, Detection and Classification of Tiny Objects. arXiv 2018, arXiv:1804.00525.
Tuggener, L., & Elezi, I., & Schmidhuber, J., & Stadelmann, T. (2018b). Deep Watershed Detector for Music Object Recognition, arXiv 2018, arXiv:1805.10548.
Van der Wel, E., & Ullrich, K. (2017). Optical Music Recognition with Convolutional Sequence-to-Sequence Models, arXiv 2017, arXiv:1707.04877.