Veriye Sonradan Model Eklemenin ve Madde Sıralamasının DMF Üzerindeki Etkileri

Impact of Retrofitting and Item Ordering on DIF

Richer diagnostic information about examinees’ cognitive strengths and weaknesses is obtained from cognitively diagnostic assessments (CDAs) when a proper cognitive diagnosis model (CDM) is used for response data analysis. To this end, researchers state that a preset cognitive model specifying the underlying hypotheses about the response data structure is needed. However, many real-data CDM applications are add-ons to simulation studies and are fitted to data obtained from non-CDAs. Such a procedure is referred to as retrofitting, and fitting CDMs to traditional test data is not uncommon. To address item/test bias, a major validity concern in CDAs, several DIF detection techniques compatible with various CDMs have recently been proposed. This study employs DIF detection techniques developed within the CTT, IRT, and CDM frameworks and compares the results to understand the extent to which the DIF flagging behavior of items is affected by retrofitting. A secondary purpose of this study is to gather evidence about test booklet effects (i.e., item ordering) on items’ psychometric properties through DIF analyses. Results indicated severe differences in DIF flagging prevalence across detection techniques employing the Wald test, Raju’s area measures, and Mantel-Haenszel statistics. The largest numbers of DIF cases were observed when the data were retrofitted to a CDM. The results further revealed that an item might be flagged as DIF in one booklet, whereas it might not be flagged in another.
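The comparison described above can be illustrated with a minimal R sketch, not the authors’ actual analysis: it pairs the Mantel-Haenszel and Raju procedures from the cited difR package with a CDM-based Wald test, here drawn from the GDINA package (an assumption; the abstract does not name the software used for the CDM analyses). The objects `resp`, `grp`, and `Q` are hypothetical placeholders for the item response matrix, the group membership vector, and the Q-matrix.

```r
library(difR)   # Mantel-Haenszel and Raju DIF methods (Magis et al., 2015)
library(GDINA)  # assumed source of a Wald-test DIF routine for CDMs

# resp: 0/1 item response matrix; grp: group labels with focal group "F";
# Q: the test's Q-matrix (all hypothetical placeholders)

# Mantel-Haenszel DIF detection (CTT framework)
mh <- difMH(Data = resp, group = grp, focal.name = "F")

# Raju's area-measure DIF detection under a 2PL IRT model
raju <- difRaju(Data = resp, group = grp, focal.name = "F", model = "2PL")

# Wald-test DIF detection under a two-group CDM (G-DINA)
wald <- dif(dat = resp, Q = Q, group = grp, method = "wald")

# Compare which items each framework flags at alpha = .05
print(mh)
print(raju)
print(wald)
```

Running each procedure on the same booklet-specific data and cross-tabulating the flagged items is, in outline, how the prevalence differences reported above could be examined.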

___

  • Ankara University. (2011). Ankara Üniversitesi Eğitim Bilimleri Fakültesi’nin YGS hakkında görüşü [Ankara University Faculty of Educational Sciences’ opinion on the YGS]. Retrieved November 30, 2015, from https://dahilibellek.wordpress.com/2011/04/12/ankara-ebf-ygs/
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Newbury Park, CA: Sage.
  • Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical methods for data analysis. Belmont, CA: Wadsworth International Group.
  • Choi, K. M., Lee, Y.-S., & Park, Y. S. (2015). What CDM can tell about what students have learned: An analysis of TIMSS eighth grade mathematics. Eurasia Journal of Mathematics, Science, & Technology Education, 11, 1563–1577.
  • de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: The Guilford Press.
  • de la Torre, J. (2009). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115-130.
  • de la Torre, J., Hong, Y., & Deng, W. (2010). Factors affecting the item parameter estimation and classification accuracy of the DINA model. Journal of Educational Measurement, 47, 227-249.
  • de la Torre, J., & Minchen, N. (2014). Cognitively diagnostic assessments and the cognitive diagnosis framework. Psicología Educativa, 20, 89-97.
  • DeCarlo, L. T. (2012). Recognizing uncertainty in the Q-matrix via a Bayesian extension of the DINA model. Applied Psychological Measurement, 36, 447-468.
  • Gierl, M. J., Alves, C., & Majeau, R. T. (2010). Using the attribute hierarchy method to make diagnostic inferences about examinees’ knowledge and skills in mathematics: An operational implementation of cognitive diagnostic assessment. International Journal of Testing, 10, 318-341. doi:10.1080/15305058.2010.509554
  • Gierl, M.J., & Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement, 6, 263-275.
  • Holland, P. W., & Thayer, D. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Hou, L., Terzi, R., & de la Torre, J. (2020). Wald test formulations in DIF detection of CDM data with the proportional reasoning test. International Journal of Assessment Tools in Education, 7(2), 145-158.
  • Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631-639.
  • Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.
  • Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.
  • Kingston, N. M., & Dorans, N. J. (1984). Item location effects and their implications for IRT equating and adaptive testing. Applied Psychological Measurement, 8(2), 147-154.
  • Li, F. (2008). Modified higher-order DINA model for detecting differential item functioning and differential attribute functioning. Unpublished doctoral dissertation, University of Georgia, USA.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
  • Ma, W., Terzi, R., & de la Torre, J. (2021). Detecting differential item functioning using multiple-group cognitive diagnosis models. Applied Psychological Measurement, 45(1), 37-53.
  • Magis, D., Béland, S., & Raîche, G. (2015). difR: Collection of methods to detect dichotomous differential item functioning (DIF). R package version 4.6.
  • Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
  • Middle East Technical University. (2011). 2011 Yılı Yükseköğretime Geçiş Sınavı hakkında ODTÜ Eğitim Fakültesi görüşü [METU Faculty of Education’s opinion on the 2011 Higher Education Transition Examination]. Retrieved November 30, 2015, from http://fedu.metu.edu.tr/sites/fedu.metu.edu.tr/files/ygs2011hkegitimfakultesigorusu_28_4_2011_v2.pdf
  • Measurement, Selection, and Placement Center. (2011). Adaya özgü soru kitapçığı [Examinee-specific question booklet]. Retrieved December 29, 2015, from http://www.osym.gov.tr/belge/1-12431/adaya-ozgu-soru-kitapcigi-21032011.html
  • R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207.
  • Rupp, A. A., & Templin, J. L. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement, 6(4), 219-262.
  • Sessoms, J., & Henson, R. A. (2018). Applications of diagnostic classification models: A literature review and critical commentary. Measurement: Interdisciplinary Research and Perspectives, 16(1), 1-17.
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
  • Terzi, R., & Sen, S. (2019). A nondiagnostic assessment for diagnostic purposes: Q-matrix validation and item-based model fit evaluation for the TIMSS 2011 assessment. SAGE Open, 9(1), 1-11.
  • Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group-mean differences: The concept of item bias. Psychological Bulletin, 99(1), 118-128.
  • Woods, C. M. (2008). Likelihood-ratio DIF testing: Effects of nonnormality. Applied Psychological Measurement, 32(7), 511-526.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223-233.