Hüseyin Hüsnü YILDIRIM

Findings from an empirical vertical scaling study with BILOG-MG

(Turkish title: BILOG-MG ile empirik bir dikey ölçekleme çalışmasından bulgular)

This study compared two procedures for vertical scaling in the Item Response Theory (IRT) context: fixed estimation and simultaneous estimation. The results favored simultaneous estimation over fixed estimation, especially when there were few anchor items. However, the results also showed that, under the 3-parameter IRT model, using expected a posteriori (EAP) estimates of ability may distort the vertically scaled results obtained through simultaneous estimation. Overall, the results of this empirical study suggest that, for large-scale tests that aim to monitor development across grade levels, simultaneous estimation with the 2-parameter or 3-parameter IRT model is a reasonable choice.
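For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the 3-parameter logistic (3PL) item response function and an EAP ability estimate computed on a fixed quadrature grid. It is not BILOG-MG's implementation: the function names, the grid, the standard normal prior, the item parameters, and the omission of any scaling constant are all assumptions made for illustration.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """3PL probability of a correct response; c = 0 reduces it to the 2PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_estimate(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate: posterior mean of theta on an evenly spaced grid.

    responses: list of 0/1 item scores.
    items: list of (a, b, c) tuples (discrimination, difficulty, guessing).
    A standard normal prior is assumed here for illustration.
    """
    step = (hi - lo) / (n_points - 1)
    num = 0.0
    den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        # Unnormalized N(0, 1) prior density at this grid point.
        weight = math.exp(-0.5 * theta * theta)
        # Multiply in the likelihood of the observed response pattern.
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            weight *= p if u == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den


# Hypothetical item parameters (a, b, c); not taken from the study.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
theta_all_right = eap_estimate([1, 1, 1], items)
theta_all_wrong = eap_estimate([0, 0, 0], items)
```

Because the EAP estimate is a posterior mean, it is pulled toward the prior mean, and the pull is stronger when a test has few items. With `c > 0`, the probability of a correct response never falls below `c`, so correct answers carry less information about low abilities than they would under the 2PL model.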
