Automatic story and item generation for reading comprehension assessments with transformers

Reading comprehension is one of the essential skills for students as they transition from learning to read to reading to learn. Over the last decade, the increased use of digital learning materials for promoting literacy skills (e.g., oral fluency and reading comprehension) in K-12 classrooms has been a boon for teachers. However, instant access to reading materials, as well as to relevant assessment tools for evaluating students’ comprehension skills, remains a problem. Because high-quality reading materials and assessments are primarily available through commercial literacy programs and websites, teachers must spend many hours searching for materials suitable for their students. This study proposes a promising solution to this problem based on an artificial intelligence (AI) approach. We demonstrate how advanced language models (e.g., OpenAI’s GPT-2 and Google’s T5) can be used to automatically generate reading passages and comprehension items. Our preliminary findings suggest that, with additional training and fine-tuning, open-source language models could support the instruction and assessment of reading comprehension skills in the classroom. For both automatic story generation and automatic item generation, the language models performed reasonably well; however, their outputs still require human evaluation and further adjustment before they can be shared with students. Practical implications of the findings and future research directions are discussed.
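
The following is a minimal sketch of the two-step pipeline the abstract describes, using the Hugging Face transformers library: GPT-2 generates a reading passage via nucleus sampling (Holtzman et al., 2019), and a T5 model fine-tuned for question generation produces a comprehension item from that passage. The model checkpoints, prompt, and decoding settings below are illustrative assumptions, not the authors’ exact configuration.

```python
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# Step 1: automatic story generation with GPT-2.
story_tok = GPT2Tokenizer.from_pretrained("gpt2")
story_lm = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Once upon a time, a curious fox found an old map in the forest."
inputs = story_tok(prompt, return_tensors="pt")
story_ids = story_lm.generate(
    **inputs,
    max_length=200,
    do_sample=True,       # sample rather than decode greedily
    top_p=0.95,           # nucleus sampling threshold (assumed value)
    temperature=0.8,      # assumed value
    pad_token_id=story_tok.eos_token_id,
)
story = story_tok.decode(story_ids[0], skip_special_tokens=True)

# Step 2: automatic item generation with T5. The checkpoint below is a
# community question-generation model and serves only as a placeholder.
qg_tok = T5Tokenizer.from_pretrained("valhalla/t5-base-qg-hl")
qg_lm = T5ForConditionalGeneration.from_pretrained("valhalla/t5-base-qg-hl")

qg_inputs = qg_tok(
    "generate question: " + story,
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
question_ids = qg_lm.generate(**qg_inputs, max_length=64, num_beams=4)
question = qg_tok.decode(question_ids[0], skip_special_tokens=True)

print(story)
print(question)
```

In practice, fine-tuning both models on grade-appropriate stories and item banks, and screening the generated passages and questions for readability and content, would precede any classroom use, consistent with the human evaluation step the abstract calls for.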

