MIMIC, SIBTEST, Lojistik Regresyon ve Mantel-Haenszel Yöntemleriyle Gerçekleştirilen DMF ve Yanlılık Çalışması

Bu çalışmanın amacı değişen madde fonksiyonu (DMF) belirleme yöntemlerinden MIMIC, SIBTEST, Lojistik Regresyon ve Mantel-Haenszel yöntemlerinin benzerlik ve farklılıklarını belirlemektir. Ayrıca uzman görüşleri ile yöntemlere göre elde edilen sonuçların tutarlılığı incelenmiştir. Araştırma yaklaşık 340.000 öğrencinin yer aldığı veri setinden 300, 600, 1000, 1200 ve 2000 kişilik alt örneklemler seçilerek yürütülmüştür. Dört ayrı yöntem için, örneklemlere göre DMF içeren ortak maddelere bakıldığında; 300 kişilik örneklemde 2. maddenin; 600 kişilik örneklemde 13. maddenin; 1200 kişilik örneklemde 19. maddenin ve 2000 kişilik örneklemde 2, 3 ve 4. maddelerin DMF içerdiği, 1000 kişilik örneklemde ise ortak DMF içeren madde olmadığı görülmüştür. Bu bakımdan büyük örneklemde, yöntemlerin birbiriyle daha uyumlu sonuçlar verdiği söylenebilir. Uzman kanılarına göre yanlı olduğu belirlenen 2, 5, 6 ve 12. maddeler istatistiksel analizlerin sonuçlarıyla karşılaştırıldığında bu maddelerin farklı yöntem veya farklı örneklemlerde DMF gösterdiği görülmüştür. Ayrıca uzman kanıları ile analiz sonuçlarının birbirleriyle tutarlı olduğu söylenebilir.

A DIF and Bias Study by using MIMIC, SIBTEST, Logistic Regression and Mantel-Haenszel Methods

The aim of this study was to identify the similarities and differences among four methods for detecting differential item functioning (DIF): MIMIC, SIBTEST, Logistic Regression and Mantel-Haenszel. The consistency between expert opinions and the results obtained with these methods was also examined. The study was carried out with subsamples of 300, 600, 1000, 1200 and 2000 examinees selected from a dataset of approximately 340,000 students. Items flagged as containing DIF by all four methods were examined for each sample size: item 2 in the sample of 300, item 13 in the sample of 600, no item in the sample of 1000, item 19 in the sample of 1200, and items 2, 3 and 4 in the sample of 2000. In this respect, it can be said that the methods yielded more consistent results with one another in the large sample. When items 2, 5, 6 and 12, which were judged to be biased according to expert opinion, were compared with the results of the statistical analyses, these items were found to show DIF under different methods or in different samples. In addition, expert opinions appear to be consistent with the results of the analyses.
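
As an illustration of one of the four methods named above, the following is a minimal sketch of the Mantel-Haenszel common odds ratio and its conversion to the ETS delta scale. It is not the authors' implementation: the function names (`build_tables`, `mh_odds_ratio`, `mh_delta`), the `"ref"`/`"focal"` group labels and the toy data are hypothetical, and examinees are assumed to be matched on their total test score.

```python
import math
from collections import defaultdict


def build_tables(responses, groups, item):
    """Build one 2x2 table (a, b, c, d) per total-score stratum.

    responses: list of 0/1 item-score lists, one per examinee (hypothetical layout)
    groups:    list of "ref" / "focal" labels, one per examinee
    item:      index of the studied item
    a, b = reference group correct / incorrect
    c, d = focal group correct / incorrect
    """
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for resp, grp in zip(responses, groups):
        cell = strata[sum(resp)]          # match examinees on total score
        correct = resp[item] == 1
        if grp == "ref":
            cell[0 if correct else 1] += 1
        else:
            cell[2 if correct else 3] += 1
    # Return strata in order of total score for reproducibility
    return [tuple(strata[score]) for score in sorted(strata)]


def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio across strata.

    alpha = sum(a*d/n) / sum(b*c/n); returns None if it is undefined.
    """
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        return None
    return num / den


def mh_delta(alpha):
    """ETS delta scale: negative values indicate an item favoring the reference group."""
    return -2.35 * math.log(alpha)


# Toy usage with a single made-up stratum
alpha = mh_odds_ratio([(10, 5, 5, 10)])
delta = mh_delta(alpha)
```

An odds ratio near 1 (delta near 0) suggests no DIF for the studied item; how large a departure counts as DIF depends on the significance test and effect-size thresholds chosen, which this sketch does not cover.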

___

Hacettepe Üniversitesi Eğitim Fakültesi Dergisi
  • Established: 1986
  • Publisher: Hacettepe Üniversitesi Eğitim Fakültesi Dekanlığı