Investigation of the Effects of the Number of Categories on Psychometric Properties According to the Mokken Homogeneity Model

The aim of this research was to examine, using a nonparametric item response theory (NIRT) model, the effects of the number of categories of polytomous items on psychometric properties. To this end, data sets were simulated for two sample sizes (100 and 500), three sample distribution shapes (normal, positively skewed, and negatively skewed), two test lengths (10 and 30 items), and three numbers of categories (three, five, and seven). The effects of the number of categories on the psychometric properties of polytomous items were analyzed with the Mokken Homogeneity Model (MHM), one of the NIRT models. The study was designed as basic research. R 3.4.0 (run in RStudio) was used to generate and analyze the data sets, and the MHM analyses were conducted with the mokken package. Scaling with the MHM revealed no specific pattern in item fit to the MHM as the number of categories changed. In general, the number of categories had no effect on reliability estimates in either short or long tests. Under the test conditions examined, the tests showed weak fit to the MHM.
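The design described above crosses 2 sample sizes × 3 distribution shapes × 2 test lengths × 3 numbers of categories, giving 36 conditions. The Python sketch below illustrates a single replication of such a design and computes, for each condition, the scale-level scalability coefficient H (the sum of inter-item covariances divided by the sum of their maxima given the item marginals, the scale coefficient that the mokken package's `coefH` reports) together with Cronbach's alpha as one example reliability estimate. It is not the authors' procedure: the graded response generating model, the parameter ranges, the standardized chi-square(3) draw used for the skewed trait distributions, the seed, and the helper names (`simulate_theta`, `simulate_grm`, `coef_h`, `cronbach_alpha`) are all hypothetical assumptions.

```python
import numpy as np


def simulate_theta(n, shape, rng):
    """Draw n latent trait values with mean 0 and SD 1.

    Assumption: skewed shapes use a standardized chi-square(3) draw
    (negated for the left-skewed condition); the source does not say
    how its skewed samples were generated.
    """
    if shape == "normal":
        return rng.standard_normal(n)
    z = (rng.chisquare(3, n) - 3.0) / np.sqrt(6.0)  # chi2(3): mean 3, variance 6
    return z if shape == "right" else -z


def simulate_grm(theta, n_items, n_cat, rng):
    """Hypothetical generator: scores 0..n_cat-1 under a graded response model."""
    a = rng.uniform(0.8, 2.0, n_items)  # assumed discrimination range
    b = np.sort(rng.normal(0.0, 1.0, (n_items, n_cat - 1)), axis=1)  # ordered thresholds
    # P(X >= k) for k = 1..n_cat-1, shape (persons, items, n_cat-1); decreasing in k
    p_ge = 1.0 / (1.0 + np.exp(-a[None, :, None] * (theta[:, None, None] - b[None, :, :])))
    u = rng.uniform(size=(theta.size, n_items, 1))
    return (u < p_ge).sum(axis=2)  # number of thresholds passed = item score


def coef_h(x):
    """Scale-level H: summed inter-item covariances over their maxima given the marginals."""
    x = np.asarray(x, dtype=float)
    cov = np.cov(x, rowvar=False)
    # Sorting every column independently gives the comonotone arrangement,
    # which attains the maximum covariance for each item pair.
    cov_max = np.cov(np.sort(x, axis=0), rowvar=False)
    iu = np.triu_indices(x.shape[1], k=1)
    return cov[iu].sum() / cov_max[iu].sum()


def cronbach_alpha(x):
    """Cronbach's alpha, used here as one example reliability estimate."""
    x = np.asarray(x, dtype=float)
    j = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return j / (j - 1) * (1.0 - item_var / total_var)


rng = np.random.default_rng(2017)  # arbitrary seed
for n in (100, 500):
    for shape in ("normal", "right", "left"):
        for n_items in (10, 30):
            for n_cat in (3, 5, 7):
                x = simulate_grm(simulate_theta(n, shape, rng), n_items, n_cat, rng)
                print(n, shape, n_items, n_cat,
                      round(coef_h(x), 3), round(cronbach_alpha(x), 3))
```

By Mokken's conventional rule of thumb, H < 0.3 means the items do not form a scale, 0.3 ≤ H < 0.4 a weak scale, 0.4 ≤ H < 0.5 a medium scale, and H ≥ 0.5 a strong scale. A full simulation study would repeat each condition many times and summarize the results across replications rather than rely on one draw.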
