A Machine Learning Based Approach to Enhance MOOC Users’ Classification
In the early 2010s, education, and e-learning in particular, was transformed by the emergence of Massive Open Online Courses (MOOCs). Increasingly offered by universities and training centers around the world, MOOCs have become a valuable resource for students and anyone else seeking to supplement their initial training with free distance courses in any subject area. Despite their large enrollment numbers, MOOCs suffer from dropout rates of up to 90%. Such rates undermine the efforts moderators invest in making this pedagogical model succeed, and they degrade both the learners’ experience and their supervision. To address this problem and help instructors target their interventions, we present a solution that classifies MOOC learners into three distinct classes. The proposed approach combines filter methods, which select the most relevant attributes, with ensemble methods built on machine learning algorithms. The approach was validated on four MOOCs from Stanford University. To demonstrate the model’s performance (92.2%), we compared it with other algorithms on several performance measures.
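The abstract names filter-based attribute selection and ensemble learning, but it does not say which filter, which base learners, or which combination rule is used. As a minimal sketch under those gaps, the Python below assumes a correlation-ratio filter and a majority-vote ensemble of three simple classifiers, run on synthetic data with three hypothetical classes labelled 0, 1 and 2. Every name, parameter, and data value here is illustrative and none is taken from the paper.

```python
import random
from collections import Counter
from statistics import mean


def make_data(n_per_class=60, seed=0):
    """Synthetic learners: features 0-1 depend on the class, features 2-4 are noise."""
    rng = random.Random(seed)
    rows = []
    for label in range(3):  # three hypothetical classes: 0, 1, 2
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
            rows.append((informative + noise, label))
    rng.shuffle(rows)
    return [x for x, _ in rows], [lab for _, lab in rows]


def filter_scores(X, y):
    """Filter step (assumed): between-class / total variance for each feature."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = mean(col)
        total = sum((v - overall) ** 2 for v in col)
        between = 0.0
        for label in set(y):
            group = [v for v, lab in zip(col, y) if lab == label]
            between += len(group) * (mean(group) - overall) ** 2
        scores.append(between / total if total > 0 else 0.0)
    return scores


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestCentroid:
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            members = [r for r, lab in zip(X, y) if lab == label]
            self.centroids[label] = [mean(c) for c in zip(*members)]
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))


class KNearest:
    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        nearest = sorted(zip(self.X, self.y), key=lambda p: sq_dist(x, p[0]))[: self.k]
        return Counter(lab for _, lab in nearest).most_common(1)[0][0]


class MajorityVote:
    """Ensemble step (assumed): each base model casts one vote per sample."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        for model in self.models:
            model.fit(X, y)
        return self

    def predict(self, X):
        return [
            Counter(m.predict_one(x) for m in self.models).most_common(1)[0][0]
            for x in X
        ]


def run_demo(k=2):
    X, y = make_data()
    split = int(0.75 * len(X))
    X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]
    # Rank features on the training split only, then keep the k best.
    scores = filter_scores(X_train, y_train)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected = sorted(ranked[:k])
    model = MajorityVote([NearestCentroid(), KNearest(1), KNearest(5)])
    model.fit(project(X_train, selected), y_train)
    predictions = model.predict(project(X_test, selected))
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return selected, accuracy


if __name__ == "__main__":
    selected, accuracy = run_demo()
    print("selected features:", selected)
    print("held-out accuracy:", round(accuracy, 3))
```

The filter scores are computed on the training split alone, so the held-out samples play no part in choosing attributes. A stacking ensemble, in which a meta-learner is trained on the base models' predictions, would slot into the same place as `MajorityVote`.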