Ercüment GÜVENÇ, Murat SAKAL, Gürcan ÇETİN, Osman ÖZKARACA

Classification of Students' Course Qualifications Using Machine Learning Techniques

This study aims to predict students' academic success and to address the areas in which they fall short. It was conducted with students taking the Introduction to Information Systems Engineering course. Machine learning methods were applied to examine how the students' computer knowledge level at the start of the semester affects the grade they achieve at the end of the semester, with the goal of improving the quality of education. When the dataset collected from the participating students was split into training and test sets, the results were not meaningful because there was too little data. For this reason, the Synthetic Minority Over-sampling Technique (SMOTE) was chosen as the data augmentation technique to improve the performance of the machine learning algorithms. After augmentation, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression (LR), random forest (RF), decision tree (DT), and Naive Bayes models were applied to the dataset, and the KNN model gave the best result. This model classified test data consisting of 300 students, independent of the training set, with an accuracy of 97.66%.

Keywords:

Machine learning, Classification, Success prediction, K-nearest neighbor, Synthetic minority over-sampling technique (SMOTE)
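
The abstract names two techniques, SMOTE oversampling followed by KNN classification, without giving implementation details. The following is a minimal pure-Python sketch of how those two steps can fit together, not the authors' implementation. The feature pairs, class labels, function names (`smote`, `knn_predict`), and the values of `k` are all hypothetical and chosen only for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def knn_predict(train_X, train_y, point, k=3):
    """Majority vote among the k training samples closest to point."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda item: euclidean(item[0], point)
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical toy data: (computer-knowledge score, weekly study hours).
passed = [(7.0, 6.0), (8.0, 5.5), (7.5, 7.0), (9.0, 6.5), (8.5, 8.0), (6.5, 7.5)]
failed = [(2.0, 1.5), (3.0, 2.0), (2.5, 3.0)]

# Oversample the minority class until both classes have the same size.
synthetic_failed = smote(failed, n_new=len(passed) - len(failed), k=2)
train_X = passed + failed + synthetic_failed
train_y = ["pass"] * len(passed) + ["fail"] * (len(failed) + len(synthetic_failed))

print(knn_predict(train_X, train_y, (2.8, 2.2)))
print(knn_predict(train_X, train_y, (8.2, 6.8)))
```

In this sketch every synthetic sample lies on a line segment between two real minority samples, which is the core idea of SMOTE. A real experiment would also hold out the test data before oversampling, so that synthetic points never leak into evaluation.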
