Investigation of Group Invariance in Test Equating Under Different Simulation Conditions

Purpose: This study examined the impact of differential item functioning (DIF) in anchor items on group invariance in test equating across different sample sizes. Within this scope, the factors chosen for investigating group invariance in test equating were sample size, the relative sample sizes of the subgroups, the form of DIF (whether the DIF was the same or differential across test forms), the number of anchor items exhibiting DIF, the direction of DIF, and the mean difference in subpopulation ability levels. Research Methods: The study employed item response theory (IRT) true score equating under an equivalent groups anchor test design, and the root expected mean square difference (REMSD) index was used to assess group invariance in test equating. The study was designed as a comparison of equating results across 96 simulation conditions. R and SPSS were used for the analyses, and 100 replications were performed for each condition. The effect of each factor on group invariance in test equating was evaluated by averaging the REMSD values, and ANOVA was performed to determine whether each factor had a significant effect. Findings: The findings showed that differential form DIF was the factor with the most prominent impact on group invariance in test equating. Implications for Research and Practice: Group invariance was affected by the DIF-related factors only when the DIF in the anchor items was differential across test forms.
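To make the invariance index concrete, the sketch below shows one way the REMSD of Dorans and Holland (2000) can be computed in R from subgroup and whole-population equating functions. This is a minimal sketch, not the authors' code: the function name, argument layout, and the standardization by the reference-form standard deviation are illustrative assumptions.

    # Minimal REMSD sketch, assuming equating functions are already estimated.
    #   eq_sub : matrix of subgroup equating functions (score points x subgroups)
    #   eq_pop : whole-population equating function at the same score points
    #   w      : subgroup proportions (summing to 1)
    #   f      : relative frequency of each score point in the population
    #   sd_y   : standard deviation of scores on the reference form
    remsd <- function(eq_sub, eq_pop, w, f, sd_y) {
      sq_diff  <- (eq_sub - eq_pop)^2   # squared subgroup-vs-population gaps
      by_score <- sq_diff %*% w         # weighted average over subgroups per score
      sqrt(sum(f * by_score)) / sd_y    # expectation over scores, then standardize
    }

    # Toy usage: 41 score points, two equally weighted subgroups
    x <- 0:40
    remsd(cbind(x + 0.3, x + 0.7), x + 0.5,
          w = c(0.5, 0.5), f = rep(1/41, 41), sd_y = 8)

The per-condition summary and the significance test described above could then take a form like the following; the data frame results and its column names are hypothetical stand-ins for the six simulated factors, not the authors' variable names.

    # Hypothetical layout: one row per condition x replication, with the six
    # simulation factors (coded as R factors) as columns and each
    # replication's REMSD in 'remsd'.
    cond_means <- aggregate(remsd ~ sample_size + size_ratio + dif_form +
                              dif_count + dif_direction + ability_diff,
                            data = results, FUN = mean)

    # Main-effects ANOVA on replication-level REMSD values
    fit <- aov(remsd ~ sample_size + size_ratio + dif_form +
                 dif_count + dif_direction + ability_diff, data = results)
    summary(fit)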

___

  • Angoff, W.H. (1971). Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education.
  • Angoff, W.H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
  • Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the Prueba de Aptitud Academica and the Scholastic Aptitude Test (College Board Report No. 88-2). New York: College Entrance Examination Board.
  • Atar, B. (2007). Differential item functioning analyses for mixed response data using IRT likelihood-ratio test, logistic regression, and GLLAMM procedures. Unpublished doctoral dissertation. The Florida State University.
  • Bolt, D., & Stout, W. (1996). Differential item functioning: Its multidimensional model and resulting SIBTEST detection procedure. Behaviormetrika, 23(1), 67-95.
  • Chu, K. L. (2002). Equivalent group test equating with the presence of differential item functioning. Unpublished doctoral dissertation. The Florida State University.
  • Chu, K. L., & Kamata, A. (2005). Test equating in the presence of DIF items. Journal of Applied Measurement, Special Issue: The Multilevel Measurement Model, 6(3), 342-354.
  • Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
  • Demirus, K.B. (2015). Ortak maddelerin değişen madde fonksiyonu gösterip göstermemesi durumunda test eşitlemeye etkisinin farklı yöntemlerle incelenmesi [Examining the effect of anchor items exhibiting differential item functioning on test equating with different methods]. Unpublished doctoral dissertation. Hacettepe University, Graduate School of Educational Sciences, Ankara.
  • Dorans, N.J. (2004). Using subpopulation invariance to assess test score equity. Journal of Educational Measurement, 41, 43-68.
  • Dorans, N. J. (2008). Three facets of fairness. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
  • Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37(4), 281-306.
  • Dorans, N.J., & Holland, P.W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum.
  • Dorans, N. J., Holland, P. W., Thayer, D. T., & Tateneni, K. (2003). Invariance of score linking across gender groups for three advanced placement program exams. In N. J. Dorans (Ed.), Population invariance of score linking: Theory and applications to advanced placement program examinations (ETS Research Report No. RR-03-27) (pp. 79-118). Princeton, NJ: Educational Testing Service.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
  • Han, K. T. (2008). Impact of item parameter drift on test equating and proficiency estimates. Unpublished doctoral dissertation. University of Massachusetts, Amherst.
  • Hanson, B. A., & Beguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3–24.
  • Hidalgo Montesinos, M. D., & Lopez Pina, J. A. (2002). Two-stage equating in differential item functioning detection under the graded response model with the Raju area measures and Lord statistic. Educational and Psychological Measurement, 62(1), 32.
  • Holland, P.W. (2007). A framework and history for score linking. In N.J. Dorans, M. Pommerich, & P.W. Holland (Eds.), Linking and aligning scores and scales (pp. 5-30). New York, NY: Springer.
  • Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (pp. 187–220). Westport, CT: Praeger Publishers.
  • Huang, J. (2010). Population invariance of linking functions of curriculum-based measures of math problem solving. Unpublished doctoral dissertation. University of Miami, Florida.
  • Huggins, A.C. (2012). The effect of differential item functioning on population invariance of item response theory true score equating. Unpublished doctoral dissertation. University of Miami, Florida.
  • Huggins, A. C. (2014). The effect of differential item functioning in anchor items on population invariance of equating. Educational and Psychological Measurement, 74(4), 627-658.
  • Huggins, A.C., & Penfield, R.D. (2012). An instructional NCME module on population invariance in linking and equating. Educational Measurement: Issues and Practice, 31, 27-40.
  • Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73.
  • Kim, S.H., & Cohen, A. S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29(1), 51–66.
  • Kolen, M., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer.
  • Kolen, M.J. (2004). Population invariance in equating and linking: Concept and history. Journal of Educational Measurement, 41, 3-14.
  • Lee, W., & Ban, J. (2010). A comparison of IRT linking procedures. Applied Measurement in Education, 23, 23–48.
  • Petersen, N. S., Cook, L. L., & Stocking, M. L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8, 137–156.
  • Sahin, A., & Anil, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory and Practice, 17(1), 321-335.
  • Tian, F. (1999). Detecting differential item functioning in polytomous items. Unpublished doctoral dissertation. Faculty of Education, University of Ottawa.
  • von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The chain and poststratification methods for observed-score equating and their relationship to population invariance. Journal of Educational Measurement, 41, 15-32.
  • von Davier, A. A., & Wilson, C. (2008). Investigating the population sensitivity assumption of item-response theory true-score equating across two subgroups of examinees and two test formats. Applied Psychological Measurement, 32(1), 11-26.
  • Yang, W.L. (2004). Sensitivity of linkings between AP multiple-choice scores and composite scores to geographical region: An illustration of checking for population invariance. Journal of Educational Measurement, 41, 33-41.
  • Yang, W.L., Dorans, N.J., & Tateneni, K. (2003). Sample selection effects on AP multiple-choice score to composite score scaling. In N.J. Dorans (Ed.), Population invariance of score linking: Theory and applications to advanced placement program examinations (ETS Research Report No. RR-03-27) (pp. 57-78). Princeton, NJ: Educational Testing Service.