Comparison of methods for examining measurement equivalence under different conditions in terms of statistical power ratios

Validity is the most important psychometric property a measurement instrument should possess, and measurement equivalence is one source of evidence for the validity of measurement instruments. Informing researchers about methods for identifying measurement equivalence may therefore contribute to presenting complete validity evidence. The purpose of this study is to compare the statistical power ratios of methods for examining measurement equivalence based on structural equation modeling and item response theory, using simulated data sets generated by manipulating sample size, number of items, and the proportion of items exhibiting differential item functioning. To this end, each variable was manipulated at three levels. The analysis used multi-group confirmatory factor analysis, a method based on structural equation modeling, together with the likelihood ratio test and the comparison of item parameters, two methods based on item response theory. Multi-group confirmatory factor analysis (93.50%) and the likelihood ratio test (96.75%) reached their highest statistical power ratios under the condition of a 1000/1000 sample size, 40 items, and a 10% proportion of items with differential item functioning. The comparison of item parameters method (94.50%) reached its highest statistical power ratio under the condition of a 1000/1000 sample size, 20 items, and proportions of items with differential item functioning of 10% and 20%.
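The design summarized above lends itself to a small Monte Carlo sketch. The Python snippet below is not the authors' simulation code: the parameter ranges, the size of the uniform-DIF difficulty shift, and the `simulate_2pl` helper are all illustrative assumptions. It generates dichotomous two-parameter logistic (2PL) responses for a reference and a focal group of equal ability, injecting uniform DIF into a chosen proportion of items, mirroring the manipulated factors of sample size, item count, and DIF ratio.

```python
# A minimal sketch, not the authors' simulation code: two-group 2PL data
# generation with uniform DIF injected into a chosen proportion of items.
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(n_ref=1000, n_focal=1000, n_items=40, dif_ratio=0.10,
                 dif_shift=0.5):
    """Return (responses, group, dif_items) for one simulated data set."""
    a = rng.uniform(0.8, 2.0, n_items)   # discrimination parameters (assumed range)
    b = rng.normal(0.0, 1.0, n_items)    # difficulty parameters (assumed distribution)
    n_dif = int(round(dif_ratio * n_items))
    dif_items = rng.choice(n_items, size=n_dif, replace=False)

    n_total = n_ref + n_focal
    theta = rng.normal(0.0, 1.0, n_total)        # equal ability distributions
    group = np.repeat([0, 1], [n_ref, n_focal])  # 0 = reference, 1 = focal

    # Uniform DIF: DIF items are harder for the focal group by dif_shift logits.
    b_matrix = np.tile(b, (n_total, 1))
    focal_rows = np.where(group == 1)[0]
    b_matrix[np.ix_(focal_rows, dif_items)] += dif_shift

    # 2PL response probabilities and Bernoulli draws.
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b_matrix)))
    return (rng.random(p.shape) < p).astype(int), group, dif_items
```

Varying `n_ref`/`n_focal`, `n_items`, and `dif_ratio` across three levels each reproduces the factorial structure of the study's conditions.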

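Statistical power can then be estimated as the proportion of true DIF items flagged at α = .05 across replications. The study compared full IRT models via the likelihood ratio test; fitting those is beyond a short sketch, so the snippet below substitutes a logistic-regression likelihood-ratio test with the rest score as an ability proxy, a recognized DIF screening approach but not the exact procedure used in the study. It assumes the `simulate_2pl` helper from the previous sketch.

```python
# A simplified power-estimation sketch: rejection rate of a likelihood-ratio
# DIF test over Monte Carlo replications. Logistic regression on the rest
# score stands in for full IRT model comparison.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

ALPHA = 0.05  # nominal Type I error rate

def lr_dif_pvalue(item, responses, group):
    """LR test: does a group term improve fit for this item (uniform DIF)?"""
    y = responses[:, item]
    rest = (responses.sum(axis=1) - y).astype(float)  # ability proxy
    x0 = sm.add_constant(rest)                        # compact model: no DIF
    x1 = np.column_stack([x0, group])                 # augmented model: + group
    ll0 = sm.Logit(y, x0).fit(disp=0).llf
    ll1 = sm.Logit(y, x1).fit(disp=0).llf
    return chi2.sf(2.0 * (ll1 - ll0), df=1)           # 1 df: the group term

def estimate_power(n_reps=100, **sim_kwargs):
    """Share of true DIF items flagged at ALPHA across replications."""
    hits, total = 0, 0
    for _ in range(n_reps):
        responses, group, dif_items = simulate_2pl(**sim_kwargs)
        for item in dif_items:
            hits += lr_dif_pvalue(item, responses, group) < ALPHA
            total += 1
    return hits / total

# One of the study's conditions: 1000/1000 examinees, 40 items, 10% DIF.
print(estimate_power(n_reps=20, n_items=40, dif_ratio=0.10))
```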