An Investigation of the Consistency of G and Phi Coefficients Obtained from Generalizability Theory Alternative Decision Studies for Scenarios and Actual Cases

This study compared the G and Phi coefficients projected by alternative D-study (decision study) scenarios for different levels of the same facets with the G and Phi coefficients obtained when those same levels were actually implemented. The data came from two consecutive years of the special ability selection examination used to select students for a music teacher education program. The dimensions common to both examinations (dual-sound, melody, and rhythm) were scored by three raters in one year and by four raters in the other. Because the examination consists of subtests, a multivariate G-theory design ($b^{\bullet} \times g^{\circ} \times p^{\bullet}$) was used, with individuals (b), tasks (g), and raters (p) as sources of variation (facets). G and Phi coefficients for the four-rater condition were projected from the three-rater data, and coefficients for the three-rater condition were projected from the four-rater data; the projected coefficients were then compared with those actually obtained. The G and Phi coefficients projected by the scenario of increasing the number of raters were higher than the actual values, whereas those projected by the scenario of decreasing the number of raters were lower than the actual values.
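How a D-study projects coefficients to a different number of raters can be illustrated with a minimal sketch. It assumes a simplified univariate crossed individual-by-rater design, not the multivariate design used in the study, and the variance components are hypothetical values chosen for illustration, not results from the examination data. The function name `d_study` and its parameters are likewise illustrative.

```python
def d_study(var_b, var_p, var_bp, n_raters):
    """Project G and Phi for a crossed individual x rater design.

    var_b  : variance component for individuals (object of measurement)
    var_p  : variance component for raters
    var_bp : individual-by-rater interaction confounded with residual error
    """
    # Relative error uses only the components that change rank order.
    rel_error = var_bp / n_raters
    # Absolute error also includes the rater main effect.
    abs_error = (var_p + var_bp) / n_raters
    g = var_b / (var_b + rel_error)
    phi = var_b / (var_b + abs_error)
    return g, phi


# Hypothetical variance components (assumed, not taken from the study).
components = {"var_b": 4.0, "var_p": 0.5, "var_bp": 1.5}

g3, phi3 = d_study(n_raters=3, **components)
g4, phi4 = d_study(n_raters=4, **components)

for n, (g, phi) in ((3, (g3, phi3)), (4, (g4, phi4))):
    print(f"n_raters={n}: G={g:.3f}, Phi={phi:.3f}")
```

In this sketch, adding a rater always raises both projected coefficients, because the variance components are held fixed. The study's comparison asks whether such projections match the coefficients estimated from data actually collected with that number of raters.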
