The Effect of Item Weighting on Reliability and Validity

The purpose of this study is to examine the effect of an item weighting method developed by the researchers on the construct validity of a test. For this purpose, a Monte Carlo simulation study was carried out, with test length, average factor loading, and sample size as simulation conditions. The item weighting method is defined as follows: if an individual's average score (the individual's test score divided by the number of items) plus the item difficulty index is 1 or greater, the item reliability index is added to the individual's dichotomous (1 or 0) item score; otherwise, the item score is left unchanged. The results show that the weighting method contributes to construct validity: according to the confirmatory factor analysis results, both the comparative fit index (CFI) and the root mean square error of approximation (RMSEA) values improved. Based on these findings, the weighting method used in this research can be recommended.
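The weighting rule described above can be sketched as a short function. This is a minimal illustration of the rule as stated in the abstract; the function and variable names are assumptions, not the authors' actual implementation.

```python
# A minimal sketch of the item weighting rule described in the abstract.
# Names are illustrative assumptions, not the authors' code.

def weight_item_scores(scores, difficulty, reliability):
    """Weight dichotomous (0/1) item scores.

    scores      -- list of persons, each a list of 0/1 item scores
    difficulty  -- item difficulty index for each item
    reliability -- item reliability index for each item
    """
    n_items = len(difficulty)
    weighted = []
    for person in scores:
        avg = sum(person) / n_items  # person's average score
        row = []
        for x, d, r in zip(person, difficulty, reliability):
            # If average score + item difficulty is 1 or greater, add the
            # item reliability index to the 0/1 score; otherwise keep it.
            row.append(x + r if avg + d >= 1 else x)
        weighted.append(row)
    return weighted
```

For example, a person with an average score of 0.5 who answered an item of difficulty 0.5 and reliability 0.3 correctly would receive 1.3 for that item rather than 1, while scores on items where the threshold is not reached stay at their original 0/1 values.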
