An Explainable Machine Learning Approach to Predicting and Understanding Dropouts in MOOCs

Purpose: The purpose of this study is to predict dropouts in two runs of the same MOOC using an explainable machine learning approach. With the explainable approach, we aim to enable the interpretation of the black-box predictive models from a pedagogical perspective and to produce actionable insights for related educational interventions. The similarities and differences in feature importance between the two predictive models were also examined.

Design/Methodology/Approach: This is a quantitative study performed on a large public dataset containing activity logs from a MOOC. In total, 21 features were generated and standardized before the analysis. A multi-layer perceptron neural network was used as the black-box machine learning algorithm to build the predictive models. Model performance was evaluated using the accuracy and AUC metrics. SHAP was used to obtain explainable results about the effects of different features on students’ success or failure.

Findings: According to the results, the predictive models were quite accurate, demonstrating the capacity of the generated features to capture student engagement. With the SHAP approach, reasons for dropout were identified for the whole class as well as for individual students. While disengagement from assignments and courseware mostly drove dropouts in both course runs, interaction with videos (the main teaching component) showed limited predictive power. In total, six features were strong predictors common to both runs, while the remaining four strong predictors belonged to only one run. Moreover, using waterfall plots, the reasons behind the predictions for two randomly chosen students were explored. The results showed that dropout might be explained by different factors for each student, and the variables associated with an individual student’s dropout might differ from those identified in the predictions for the whole course.

Highlights: This study illustrated the use of an explainable machine learning approach called SHAP to interpret the underlying reasons for dropout predictions. Such explainable approaches offer a promising direction for creating timely class-wide interventions as well as for providing personalized support tailored to specific students. Moreover, this study provides strong evidence that transferring predictive models between different contexts is less likely to be successful.
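To make the described pipeline concrete, the sketch below shows one plausible implementation of the approach summarized above, assuming scikit-learn and the shap package. The synthetic data, the network architecture, the sample sizes, and the use of KernelExplainer are illustrative assumptions, not the study's actual configuration; only the overall steps (21 standardized features, an MLP classifier, accuracy/AUC evaluation, SHAP explanations with a waterfall plot for an individual student) follow the text.

```python
# A minimal sketch of the pipeline described in the abstract, under the
# assumptions stated above. Data and labels here are synthetic placeholders.
import numpy as np
import shap
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical engagement features (the study generated 21 such features).
n_students, n_features = 1000, 21
X = rng.random((n_students, n_features))
# Synthetic dropout labels loosely tied to the first feature, for illustration.
y = (X[:, 0] + 0.3 * rng.standard_normal(n_students) < 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize the features before model fitting, as in the study.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Black-box predictive model: a multi-layer perceptron classifier.
# The hidden-layer sizes are an assumption, not the study's setting.
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
model.fit(X_train_s, y_train)

# Evaluate with the same metrics reported in the paper: accuracy and AUC.
proba = model.predict_proba(X_test_s)[:, 1]
print("Accuracy:", accuracy_score(y_test, model.predict(X_test_s)))
print("AUC:", roc_auc_score(y_test, proba))

# Model-agnostic SHAP explanations: a background sample plus KernelExplainer.
background = shap.sample(X_train_s, 100, random_state=0)
explainer = shap.KernelExplainer(lambda d: model.predict_proba(d)[:, 1], background)
shap_values = explainer.shap_values(X_test_s[:50])

# Class-level view: mean absolute SHAP value per feature (global importance).
global_importance = np.abs(shap_values).mean(axis=0)
print("Most influential feature index:", int(np.argmax(global_importance)))

# Student-level view: a waterfall plot explaining one student's prediction.
shap.plots.waterfall(
    shap.Explanation(
        values=shap_values[0],
        base_values=explainer.expected_value,
        data=X_test_s[0],
    )
)
```

The two views at the end mirror the abstract's distinction between class-wide and individual explanations: aggregated absolute SHAP values rank features for the whole cohort, while the waterfall plot decomposes a single student's predicted dropout risk into per-feature contributions.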
