Comparison of methods for examining measurement equivalence under different conditions in terms of statistical power ratios

Validity is the most important psychometric property a measurement instrument should possess, and measurement equivalence is one source of evidence for the validity of measurement instruments. Informing researchers about methods for identifying measurement equivalence may therefore contribute to presenting complete validity evidence. The purpose of this study is to compare the statistical power ratios of methods for examining measurement equivalence based on structural equation modeling and item response theory, using simulated data sets generated by manipulating sample size, number of items, and the proportion of items exhibiting differential item functioning. To this end, each variable was manipulated at three levels. The analysis used multi-group confirmatory factor analysis, a method based on structural equation modeling, together with the likelihood ratio test and the comparison of item parameters, two methods based on item response theory. Multi-group confirmatory factor analysis (93.50%) and the likelihood ratio test (96.75%) reached their highest statistical power ratios under the condition of a 1000/1000 sample size, 40 items, and a 10% proportion of items with differential item functioning. The comparison of item parameters method (94.50%) reached its highest statistical power ratio under the condition of a 1000/1000 sample size, 20 items, and proportions of items with differential item functioning of 10% and 20%.
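The design summarized above lends itself to a small Monte Carlo sketch. The Python snippet below is not the authors' simulation code: the parameter ranges, the size of the uniform-DIF difficulty shift, and the `simulate_2pl` helper are all illustrative assumptions. It generates dichotomous two-parameter logistic (2PL) responses for a reference and a focal group of equal ability, injecting uniform DIF into a chosen proportion of items, mirroring the manipulated factors of sample size, item count, and DIF ratio.

```python
# A minimal sketch, not the authors' simulation code: two-group 2PL data
# generation with uniform DIF injected into a chosen proportion of items.
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(n_ref=1000, n_focal=1000, n_items=40, dif_ratio=0.10,
                 dif_shift=0.5):
    """Return (responses, group, dif_items) for one simulated data set."""
    a = rng.uniform(0.8, 2.0, n_items)   # discrimination parameters (assumed range)
    b = rng.normal(0.0, 1.0, n_items)    # difficulty parameters (assumed distribution)
    n_dif = int(round(dif_ratio * n_items))
    dif_items = rng.choice(n_items, size=n_dif, replace=False)

    n_total = n_ref + n_focal
    theta = rng.normal(0.0, 1.0, n_total)        # equal ability distributions
    group = np.repeat([0, 1], [n_ref, n_focal])  # 0 = reference, 1 = focal

    # Uniform DIF: DIF items are harder for the focal group by dif_shift logits.
    b_matrix = np.tile(b, (n_total, 1))
    focal_rows = np.where(group == 1)[0]
    b_matrix[np.ix_(focal_rows, dif_items)] += dif_shift

    # 2PL response probabilities and Bernoulli draws.
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b_matrix)))
    return (rng.random(p.shape) < p).astype(int), group, dif_items
```

Varying `n_ref`/`n_focal`, `n_items`, and `dif_ratio` across three levels each reproduces the factorial structure of the study's conditions.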

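Statistical power can then be estimated as the proportion of true DIF items flagged at α = .05 across replications. The study compared full IRT models via the likelihood ratio test; fitting those is beyond a short sketch, so the snippet below substitutes a logistic-regression likelihood-ratio test with the rest score as an ability proxy, a recognized DIF screening approach but not the exact procedure used in the study. It assumes the `simulate_2pl` helper from the previous sketch.

```python
# A simplified power-estimation sketch: rejection rate of a likelihood-ratio
# DIF test over Monte Carlo replications. Logistic regression on the rest
# score stands in for full IRT model comparison.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

ALPHA = 0.05  # nominal Type I error rate

def lr_dif_pvalue(item, responses, group):
    """LR test: does a group term improve fit for this item (uniform DIF)?"""
    y = responses[:, item]
    rest = (responses.sum(axis=1) - y).astype(float)  # ability proxy
    x0 = sm.add_constant(rest)                        # compact model: no DIF
    x1 = np.column_stack([x0, group])                 # augmented model: + group
    ll0 = sm.Logit(y, x0).fit(disp=0).llf
    ll1 = sm.Logit(y, x1).fit(disp=0).llf
    return chi2.sf(2.0 * (ll1 - ll0), df=1)           # 1 df: the group term

def estimate_power(n_reps=100, **sim_kwargs):
    """Share of true DIF items flagged at ALPHA across replications."""
    hits, total = 0, 0
    for _ in range(n_reps):
        responses, group, dif_items = simulate_2pl(**sim_kwargs)
        for item in dif_items:
            hits += lr_dif_pvalue(item, responses, group) < ALPHA
            total += 1
    return hits / total

# One of the study's conditions: 1000/1000 examinees, 40 items, 10% DIF.
print(estimate_power(n_reps=20, n_items=40, dif_ratio=0.10))
```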