Using Learning Management System Log Records to Predict Academic Achievement

As digitalization in education grows worldwide and in our country, the use of Learning Management Systems (LMS) is becoming more widespread. Students generate a considerable amount of data through their interactions with these environments, and artificial intelligence algorithms can be applied to this data to build models aimed at understanding the learning process. Any kind of data from the teaching and learning environment can feed such models; log (interaction) data from within an LMS, which measures variables such as time spent on learning and frequency of access to course content, offers particularly rich opportunities for understanding the learning process. In this study, the KNN, Naive Bayes, SVM, CART, and C5.0 classification algorithms were applied, for the purpose of predicting academic achievement, to log data obtained from the Moodle-based LMS used over a 10-week period by 93 students enrolled in the Basic Computer Applications course offered in the Spring 2020 semester. The log files were transformed, for each student, into attributes describing interaction with the course environment: number of logins, number of visits to past topics, total and average number of views, total and average session duration, number of assignment material downloads, number of assignment attempts, time spent on assignments, exam-focused study, number of messages sent to the instructor, time spent on video pages, and number of uploaded assignments. Because the resulting dataset was imbalanced, the classification algorithms were also applied to three additional datasets produced by over-sampling, by using SMOTE to bring the class sizes closer together, and by over-sampling with SMOTE. All datasets yielded classification success above 80%. Setting aside the KNN algorithm, which reached 100% on the SMOTE over-sampled dataset because the negative class had few examples and SMOTE generated similar variations of them, the highest classification success, 97%, was obtained with the CART and SVM algorithms on that dataset. On the other hand, under random sub-sampling, whose results can be considered more reliable, the Naive Bayes algorithm showed the highest success. In conclusion, LMS log records were found to be usable for predicting academic achievement, and the findings are discussed in light of the related literature.
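The transformation of raw log records into per-student attributes can be illustrated with a minimal sketch. The log layout, the event names, and the 30-minute inactivity threshold used to delimit sessions are all assumptions made for illustration; they are not taken from the study or from the Moodle export format.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log rows: (student_id, ISO timestamp, event name).
# Field layout and event names are assumptions, not the Moodle export format.
LOG_ROWS = [
    ("s1", "2020-03-02T10:00:00", "login"),
    ("s1", "2020-03-02T10:05:00", "page_view"),
    ("s1", "2020-03-02T10:20:00", "assignment_upload"),
    ("s1", "2020-03-04T09:00:00", "login"),
    ("s1", "2020-03-04T09:10:00", "page_view"),
    ("s2", "2020-03-03T14:00:00", "login"),
    ("s2", "2020-03-03T14:20:00", "page_view"),
]

# Assumed inactivity threshold that separates two sessions.
SESSION_GAP = timedelta(minutes=30)


def extract_features(rows):
    """Aggregate raw events into one feature dictionary per student."""
    events = defaultdict(list)
    for student, stamp, name in rows:
        events[student].append((datetime.fromisoformat(stamp), name))

    features = {}
    for student, items in events.items():
        items.sort()  # chronological order
        # Start a new session whenever the gap between events exceeds SESSION_GAP.
        sessions = [[items[0][0], items[0][0]]]
        for (prev, _), (cur, _) in zip(items, items[1:]):
            if cur - prev > SESSION_GAP:
                sessions.append([cur, cur])
            else:
                sessions[-1][1] = cur
        total_minutes = sum(
            (end - start).total_seconds() for start, end in sessions
        ) / 60
        features[student] = {
            "logins": sum(1 for _, n in items if n == "login"),
            "page_views": sum(1 for _, n in items if n == "page_view"),
            "uploaded_assignments": sum(
                1 for _, n in items if n == "assignment_upload"
            ),
            "total_session_minutes": total_minutes,
            "avg_session_minutes": total_minutes / len(sessions),
        }
    return features


if __name__ == "__main__":
    for student, row in extract_features(LOG_ROWS).items():
        print(student, row)
```

Each resulting dictionary corresponds to one row of a classification dataset; the remaining attributes listed above would be derived in the same way from the relevant event types.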

Keywords:

Data Mining, Classification, Learning Management Systems, Academic Achievement

Using Learning Management System Logs to Predict Undergraduate Students’ Academic Performance

Digitalization in education has encouraged higher education institutions to adopt Learning Management Systems (LMS). While using an LMS, students unintentionally generate large volumes of data known as logs, which can be used to build artificial intelligence models that predict educational variables. Unlike ordinary web server logs, LMS log reports include information on students’ specific interactions with the learning content and materials, allowing variables to be created from their interactions with the course. Five classification algorithms (KNN, Naive Bayes, SVM, CART, and C5.0) were applied to a dataset created from the Moodle LMS log reports of a 10-week “Basic Computer Applications” course in which 93 undergraduate students were registered. The log records for each student were transformed into a set of attributes that included the number of logins, material downloads, assignment attempts, uploaded assignments, messages sent to the instructor, course page views, total time spent on the course page, average session time, total time spent on assignments and exams, and total time spent on video pages. Because the original dataset was imbalanced, over-sampling and SMOTE (Synthetic Minority Over-sampling Technique) were used to create three additional datasets alongside the imbalanced one. The results showed that all classification performances were above 80% on every dataset. If the KNN algorithm is set aside because its extraordinarily high performance stems from the similar variations of the negative class generated by SMOTE, the CART and SVM algorithms were the most successful classifiers of students’ academic performance, with 97% accuracy. On the other hand, under random sub-sampling, which can be considered more reliable, the Naive Bayes (NB) algorithm was the most successful classifier. The findings of this study demonstrate that LMS logs can be used with classification algorithms to predict students’ academic performance.
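The idea behind SMOTE can be sketched in a few lines: each synthetic example is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_like`, its parameters, and the toy data below are hypothetical, and this is a simplified illustration rather than the implementation used in the study.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the minority class, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


if __name__ == "__main__":
    # Toy minority class with two made-up features per student.
    minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 1.0), (1.2, 1.8)]
    for point in smote_like(minority, n_new=6):
        print(point)
```

Because every synthetic point lies on a line segment between two existing minority samples, a small minority class yields many near-duplicates. This is consistent with the caution expressed above about the KNN result: a nearest-neighbour classifier can look unrealistically accurate when such close variations appear in both the training and the test data.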

Keywords:

data mining, classification, learning management systems, academic performance

