Mehmet ÖZÇALICI, Aslı Boru İPEK, Ayşe Tuğba DOSDOĞRU, Mustafa GÖÇKEN

Classification of Scientific Articles Published in Ankara University Journal of the Faculty of Political Science with Hybrid Methods

The number of studies in technology, the social sciences, and other fields is growing rapidly, and so is the number of articles published in journals. Classifying a journal's articles manually is very time-consuming. Document-level classification has therefore long been an important research topic, given the large volume of text documents across application areas. It requires analyzing unstructured text and designing suitable classification methods. Because data volumes are growing so quickly, these methods must be powerful, and researchers continue to develop new methods and algorithms. Their success depends on many factors, including the language, the structure of the data, and the length of the texts to be analyzed. In this study, scientific articles published in Ankara University Journal of the Faculty of Political Science are classified using hybrid methods based on support vector machines (SVM), the k-nearest neighbor algorithm (KNN), decision trees (DT), and genetic algorithms (GA). The proposed methods are also compared on different data sets. The results show that the proposed GA-based methods can be used successfully for document classification, achieving an accuracy of at least 82.5%.
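
The abstract does not say how the GA is combined with the base classifiers. One common pattern, assumed here purely for illustration, is a wrapper: the GA searches over binary feature masks (which vocabulary terms to keep), and each mask is scored by the leave-one-out accuracy of a KNN classifier on term-frequency vectors. The toy corpus, labels, function names (`ga_select`, `fitness`, and so on), and parameters (population size, mutation rate, k) below are hypothetical and are not taken from the study.

```python
import math
import random
from collections import Counter

# Sketch only: this GA + KNN wrapper is an assumed design, not the paper's method.


def tokenize(text):
    # Lowercase whitespace tokenizer; a real Turkish pipeline would also
    # apply stemming and stop-word removal (omitted here).
    return text.lower().split()


def build_vocab(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def tf_vector(doc, vocab):
    # Raw term-frequency vector over a fixed vocabulary; unknown words are ignored.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def cosine(u, v, mask):
    # Cosine similarity computed only over the features switched on in `mask`.
    dot = sum(a * b for a, b, m in zip(u, v, mask) if m)
    nu = math.sqrt(sum(a * a for a, m in zip(u, mask) if m))
    nv = math.sqrt(sum(b * b for b, m in zip(v, mask) if m))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(x, X, y, mask, k=3, skip=None):
    # Majority vote among the k most similar training documents;
    # index `skip` is left out, which enables leave-one-out evaluation.
    scored = sorted(
        ((cosine(x, xi, mask), yi) for i, (xi, yi) in enumerate(zip(X, y)) if i != skip),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def fitness(mask, X, y, k=3):
    # Leave-one-out KNN accuracy using only the selected features.
    if not any(mask):
        return 0.0
    hits = sum(knn_predict(X[i], X, y, mask, k, skip=i) == y[i] for i in range(len(X)))
    return hits / len(X)


def tournament(scored, rng):
    # Binary tournament selection: the fitter of two random individuals wins.
    a, b = rng.sample(scored, 2)
    return a[1] if a[0] >= b[0] else b[1]


def ga_select(X, y, k=3, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Random masks plus the full feature set as one starting individual.
    pop = [[True] * n] + [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = [(fitness(m, X, y, k), m) for m in pop]
        children = [list(max(scored, key=lambda s: s[0])[1])]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            cut = rng.randrange(1, n)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(m, X, y, k))


# Hypothetical toy corpus (not data from the study).
docs = [
    "inflation interest rate central bank policy",
    "budget deficit tax revenue fiscal policy",
    "exchange rate inflation monetary policy",
    "tax reform public budget spending",
    "constitution court ruling rights",
    "parliament law constitutional amendment",
    "court judges legal rights ruling",
    "election law parliament constitution",
]
labels = ["economics"] * 4 + ["law"] * 4
vocab = build_vocab(docs)
X = [tf_vector(d, vocab) for d in docs]
best_mask = ga_select(X, labels)
print("selected terms:", [t for t, m in zip(vocab, best_mask) if m])
print("LOO accuracy:", fitness(best_mask, X, labels))
```

Because the full vocabulary is one of the starting individuals and the best individual is always carried into the next generation, the returned mask scores at least as well on this leave-one-out fitness as using every term. With only eight documents, that score says little about accuracy on unseen articles.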
