Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures


ABSTRACT: The Type I error rates and power of the IRT likelihood ratio test and the cumulative logit ordinal logistic regression procedure in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample size ratios, 3 DIF magnitudes, and 3 DIF conditions) were generated, and each condition was replicated 200 times. In general, the Type I error rates of both procedures were well controlled across all simulation conditions. The power of the IRT likelihood ratio test was high under medium or large sample sizes combined with moderate or large DIF magnitudes, and it increased as sample size or DIF magnitude increased. In contrast, the power of the ordinal logistic regression procedure was unacceptably low under all conditions except the combination of large sample size and large DIF magnitude.
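Both procedures compared here rest on the same likelihood ratio logic: fit a compact model that predicts the item response from the matching variable alone, fit an augmented model that adds a group term, and refer twice the log-likelihood difference to a chi-square distribution. The sketch below is illustrative only, not the study's actual design: it uses a binary item for brevity (the article studies polytomous items under a cumulative logit model), simulated data, latent ability as a stand-in for the usual total-score matching variable, and a hypothetical `fit_logistic` helper.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def fit_logistic(X, y, iters=50):
    """Hypothetical helper: Newton-Raphson logistic fit; returns the
    maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Simulated example: one binary item with uniform DIF against the focal group.
n = 2000
group = rng.integers(0, 2, n).astype(float)   # 0 = reference, 1 = focal
theta = rng.normal(0.0, 1.0, n)               # ability (proxy for total score)
logit = 1.2 * theta - 0.5 * group             # -0.5 logit shift = uniform DIF
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

ones = np.ones(n)
ll_compact = fit_logistic(np.column_stack([ones, theta]), y)         # no group term
ll_augmented = fit_logistic(np.column_stack([ones, theta, group]), y)  # + group term

G2 = 2.0 * (ll_augmented - ll_compact)   # LR statistic, df = 1 here
p_value = erfc(sqrt(G2 / 2.0))           # chi-square(1) survival function
print(f"G2 = {G2:.2f}, p = {p_value:.4f}")
```

With the simulated DIF of 0.5 logits and 2,000 examinees, the test should reject at the .05 level; testing for nonuniform DIF would add a theta-by-group interaction term and a second degree of freedom.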
