Fatih MERT, Muhammed Ali AYDIN, Zeynep ORMAN

Automatic Classification of Quran Verses Using Genetic Algorithm for Feature Selection

Text classification, also known as text tagging, is the process of assigning a given text to organized groups. Using Natural Language Processing methods, text classifiers can automatically analyze a text and then assign it one or more predefined tags or categories based on its content. For a verse of the Holy Qur'an, the main purpose of labeling is to determine the theme to which the verse relates. However, current approaches to verse labeling depend primarily on the availability of scholars with deep expertise in the Arabic language and Qur'anic exegesis. This study proposes automating the task of labeling Qur'anic verses using text classification algorithms. In the experiments carried out with the classification algorithms, the 15 predefined categories to which the English translations of the verses belong were used as features. Unlike similar studies in the literature, a Genetic Algorithm was used in the feature selection stage, with the aim that this intermediate step would improve the final performance. Finally, the performance of the classification models is compared using various evaluation metrics.

Keywords:

Quran, Text Classification, Genetic Algorithm, Machine Learning, Feature Selection
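
The abstract does not specify how the Genetic Algorithm was configured, so the following is only a minimal sketch of genetic feature selection for text classification, not the authors' implementation. Everything in it is an assumption made for illustration: the toy sentences, the two theme labels (`worship`, `trade`), the 1-nearest-neighbour classifier, the fitness function (leave-one-out accuracy minus a small penalty on the number of selected words), and all parameter values.

```python
import random

# Toy corpus (hypothetical sentences and theme labels, not the paper's data set).
DOCS = [
    ("pray and give charity to the poor", "worship"),
    ("establish prayer and bow with those who pray", "worship"),
    ("fasting is prescribed for you", "worship"),
    ("pray at night and fast by day", "worship"),
    ("give full measure and weigh with justice", "trade"),
    ("do not devour usury in trade", "trade"),
    ("write down the debt when you trade", "trade"),
    ("trade with justice and fair measure", "trade"),
]
VOCAB = sorted({w for text, _ in DOCS for w in text.split()})
TOKENS = [set(text.split()) for text, _ in DOCS]
LABELS = [label for _, label in DOCS]


def loo_accuracy(mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    compares documents by the number of shared *selected* words."""
    selected = {w for w, bit in zip(VOCAB, mask) if bit}
    correct = 0
    for i, doc in enumerate(TOKENS):
        best_j, best_sim = None, -1
        for j, other in enumerate(TOKENS):
            if j == i:
                continue
            sim = len(doc & other & selected)
            if sim > best_sim:  # ties keep the earliest neighbour
                best_j, best_sim = j, sim
        correct += LABELS[best_j] == LABELS[i]
    return correct / len(TOKENS)


def fitness(mask):
    # Reward accuracy, lightly penalise large feature subsets.
    return loo_accuracy(mask) - 0.01 * sum(mask) / len(mask)


def genetic_feature_selection(pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Return (best_mask, history); history holds the best fitness seen
    at the start of each generation."""
    rng = random.Random(seed)
    n = len(VOCAB)
    # Each chromosome is a bit mask over the vocabulary: 1 = keep the word.
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_gen = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


best_mask, history = genetic_feature_selection()
selected_words = [w for w, bit in zip(VOCAB, best_mask) if bit]
print("selected words:", selected_words)
print("leave-one-out accuracy:", loo_accuracy(best_mask))
```

In a realistic setting the fitness function would wrap whichever classifier is being evaluated, and accuracy would be estimated on held-out data rather than on a handful of sentences. Because the two best chromosomes are copied unchanged into each new generation, the best fitness in `history` never decreases.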

