Examining the Discrimination of Binary Scored Test Items with ROC Analysis

This study argues that ROC analysis, which is used to determine how well medical diagnostic tests differentiate between patients and non-patients, can also be used to examine the discrimination of binary scored items in cognitive tests. To gather evidence for this claim, the 2x2 contingency table used in ROC analysis was adapted to the logic of item discrimination. The article proposes that the area under the ROC curve (AUC), obtained from the sensitivity and specificity values calculated with the adapted contingency table, can be treated as a measure of item discrimination. Statistical analyses of simulated data showed that the AUC values were positively and highly correlated with the items' D, rbis, and a parameter values, and that AUC values obtained from samples of different sizes were consistent. In addition, ROC analysis was more stable under range restriction than the other methods. It was therefore concluded that very large groups are not needed to examine item discrimination with the proposed method.
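
The abstract does not spell out how the 2x2 contingency table was adapted, so the following Python sketch shows only one plausible reading and should not be taken as the authors' procedure. It assumes that the item response (1 = correct, 0 = incorrect) plays the role of the "patient / non-patient" state and that the total test score plays the role of the diagnostic test, with every observed total score used as a cut-off. The function names `roc_points` and `auc` and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(item: Sequence[int], total: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per total-score cut-off.

    Assumed layout of the adapted 2x2 table at a given cut-off:
      true positive  = answered the item correctly,   total score >= cut-off
      false positive = answered the item incorrectly, total score >= cut-off
    """
    n_pos = sum(item)              # examinees who answered the item correctly
    n_neg = len(item) - n_pos      # examinees who answered it incorrectly
    if n_pos == 0 or n_neg == 0:
        raise ValueError("the item needs both correct and incorrect responses")

    points = [(0.0, 0.0)]          # cut-off above every observed score
    for cut in sorted(set(total), reverse=True):
        tp = sum(1 for x, t in zip(item, total) if x == 1 and t >= cut)
        fp = sum(1 for x, t in zip(item, total) if x == 0 and t >= cut)
        sensitivity = tp / n_pos
        specificity = (n_neg - fp) / n_neg
        points.append((1.0 - specificity, sensitivity))
    return points                  # ends at (1, 1) for the lowest cut-off


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under ROC points ordered by false-positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical responses of eight examinees to one item, with their total scores.
item = [0, 0, 1, 0, 1, 1, 1, 1]
total = [2, 3, 3, 4, 5, 6, 7, 8]
item_auc = auc(roc_points(item, total))  # approximately 0.9
```

Under this reading, an AUC near 0.5 indicates an item that does not separate high and low scorers, while values approaching 1 indicate strong discrimination. The sketch leaves the item inside the total score; subtracting it first (a rest score) is a common alternative, but the abstract gives no indication of which choice was made.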
