Investigation of Equating Error in Tests with Differential Item Functioning

This study aims to examine the effect on equating error of the number of DIF items in the forms to be equated and of how those DIF items are distributed across the forms. The mean-mean, mean-standard deviation, Haebara, and Stocking-Lord methods were used as equating methods under the common-item equivalent groups design. The study included six simulation conditions, which were compared according to the number of DIF items and their distribution across the tests. The results showed that adding DIF items to the tests being equated increased the errors obtained by all equating methods. Across the conditions, the change in error was smallest for the characteristic curve transformation methods (Haebara and Stocking-Lord) and largest for the moment methods (mean-mean and mean-standard deviation).
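The moment methods compared above can be illustrated with a small sketch. This is not the study's simulation; the anchor-item parameter values are hypothetical, and a 2PL calibration of the same common items on each form is assumed. It computes the linking constants A and B for the mean-mean and mean-standard deviation (mean-sigma) methods.

```python
from statistics import mean, pstdev

# Hypothetical anchor-item parameter estimates (2PL) for the same common
# items calibrated separately on the new form (X) and the old form (Y).
a_x = [1.0, 1.2, 0.8]   # discriminations, new-form calibration
b_x = [-1.0, 0.0, 1.0]  # difficulties, new-form calibration
a_y = [0.9, 1.1, 0.7]   # discriminations, old-form calibration
b_y = [-0.8, 0.2, 1.2]  # difficulties, old-form calibration

# Linking places the new-form scale onto the old-form scale via
# theta_Y = A * theta_X + B, so that b_Y = A * b_X + B and a_Y = a_X / A.

# Mean-mean method: A from the ratio of mean discriminations.
A_mm = mean(a_x) / mean(a_y)
B_mm = mean(b_y) - A_mm * mean(b_x)

# Mean-standard deviation (mean-sigma) method: A from the ratio of the
# standard deviations of the anchor difficulties.
A_ms = pstdev(b_y) / pstdev(b_x)
B_ms = mean(b_y) - A_ms * mean(b_x)

print(f"mean-mean:  A = {A_mm:.4f}, B = {B_mm:.4f}")
print(f"mean-sigma: A = {A_ms:.4f}, B = {B_ms:.4f}")
```

A DIF item in the anchor shifts its b (or a) estimate on one form only, which distorts these means and standard deviations directly; this is one way to see why the moment methods, which use only those summary statistics, are more sensitive to anchor DIF than the characteristic curve methods.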
