The Continuity of Students’ Disengaged Responding in Low-stakes Assessments: Evidence from Response Times

Several studies have examined disengaged test respondents, and others have analyzed disengaged survey respondents separately. In many large-scale assessments, students answer questionnaire and test items in succession. This study examines the percentage of students whose disengaged responding persists across sections of a low-stakes assessment, as well as the effects on calculated scores of filtering students based on their responding behavior. Data for this study came from the 2015 administration of PISA. For the analysis, frequencies and percentages of engaged students in each session were first calculated from students' response times. To investigate the impact of filtering disengaged respondents on parameter estimation, three groups were created: students engaged in both measures, students engaged only in the test, and students engaged only in the questionnaire. Several validity checks were then performed on each group to verify the accuracy of the classifications and to assess the impact of filtering student groups based on their responding behavior. The results indicate that students who are disengaged on the test tend to continue this behavior when responding to the questionnaire items in PISA; moreover, the effect sizes show that the rate at which disengaged responding persists is non-negligible. In addition, removing students who were disengaged in both measures led to higher or nearly identical performance estimates compared with the other groups. Researchers analyzing datasets that include both achievement tests and survey items are therefore advised to review disengaged responses and to filter out students who show persistent disengaged responding before conducting further statistical analyses.
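
To make the classification step concrete, the sketch below illustrates one way disengaged responding could be flagged from item-level response times. It is a minimal sketch in R, assuming a hypothetical data frame `responses` with columns `student_id`, `item_id`, and `rt_seconds`, and it uses a simple 10%-of-median-response-time rule to flag rapid responses; the thresholds, engagement cutoffs, and software actually used in the study may differ.

```r
# Minimal sketch: flagging disengaged (rapid) responses from response times.
# Assumes a data frame `responses` with columns student_id, item_id, rt_seconds.
# The 10%-of-median rule and the 90% engagement cutoff are illustrative
# heuristics, not necessarily the rules applied in this study.

library(dplyr)

flag_engagement <- function(responses, threshold_prop = 0.10, min_engaged_prop = 0.90) {
  responses %>%
    group_by(item_id) %>%
    mutate(rt_threshold = threshold_prop * median(rt_seconds, na.rm = TRUE),
           rapid = rt_seconds < rt_threshold) %>%        # flag rapid (disengaged) responses per item
    ungroup() %>%
    group_by(student_id) %>%
    summarise(prop_engaged = mean(!rapid, na.rm = TRUE)) %>%  # response-time effort index per student
    mutate(engaged = prop_engaged >= min_engaged_prop)        # classify the student as engaged or not
}

# Example use: classify students separately for test and questionnaire items,
# then cross-tabulate the flags to see how many students remain disengaged
# across both sections (hypothetical data frames test_responses, questionnaire_responses).
# test_flags  <- flag_engagement(test_responses)
# quest_flags <- flag_engagement(questionnaire_responses)
# table(test = test_flags$engaged, questionnaire = quest_flags$engaged)
```

Cross-tabulating the section-level flags, as in the commented example, is one straightforward way to quantify how many students show continuous disengaged responding across the test and the questionnaire.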
