A Comparison of Kernel Equating Methods under the Non-Equivalent Groups with Anchor Test Design

Problem Statement: Equating can be defined as a statistical process that adjusts for differences between test forms built to similar content and difficulty levels, so that scores obtained from these forms can be used interchangeably (Kolen, 1988). Test equating methods have attracted the attention of psychometricians for about 100 years, and new methods continue to be developed. Equating methods include those based on equipercentile equating, linear equating methods, IRT observed-score and true-score equating, van der Linden's local equating, the Levine nonlinear method, and a newer approach, Kernel equating (von Davier, 2013). Kernel equating can be used under the single group, equivalent groups, and non-equivalent groups with anchor test designs (von Davier et al., 2004). The non-equivalent groups with anchor test (NEAT) design is used when, for test security reasons, a test has to be administered more than once. In the NEAT design, common items appear in both forms, and the equating relationship between the forms is established through these common items (Kolen & Brennan, 2014). Kernel equating encompasses both linear and equipercentile equating methods. Under the NEAT design, the available methods are chained equating (linear and equipercentile), post-stratification equating (equipercentile and linear), and Levine observed-score linear equating. Several studies have compared Kernel equating, as a newer approach, with traditional equating methods and with Item Response Theory equating methods. The present study, in turn, equates TIMSS science data, in which Turkey also took part, using Kernel equating methods.

Purpose of the Study: The purpose of this study was to equate Booklets 1 and 14 of the TIMSS science data using the chained and post-stratification Kernel equating methods and to identify the best-performing equating method.

Method: When the TIMSS 2015 study was conducted, there were 1,108,572 fourth-grade and 1,187,893 eighth-grade students in Turkey. Of these, 6,456 fourth-grade and 6,079 eighth-grade students took part in the TIMSS administration. The sample of this study consisted of 865 students who received Booklets 1 and 14, drawn from the eighth-grade students who participated in the TIMSS administration in Turkey. The data set consisted of the response patterns of eighth-grade students in Turkey on the TIMSS 2015 science literacy items. Of the 14 booklets in the TIMSS administration, the items in Booklets 1 and 14 were used. Booklet 1 contains 39 items and Booklet 14 contains 38 items. To prepare the data for analysis, incorrect and missing responses were coded 0, and partial-credit and correct responses were all coded 1. In the first stage of the analysis, the booklets were equated using the Kernel chained and Kernel post-stratification equipercentile and linear equating methods. The equating methods were then evaluated against the following criteria: difference that matters (DTM), percent relative error (PRE), standard error of equating (SEE), standard error of equating difference (SEED), and root mean squared difference (RMSD).

Findings: When the booklets were equated with the Kernel chained equipercentile, chained linear, post-stratification linear, and post-stratification equipercentile methods, PRE values were obtained first. The data fit better under the KE chained equipercentile and post-stratification equipercentile methods. A comparison of the equating methods showed that the equipercentile and linear methods produced similar results and that the differences between them were smaller than the DTM. The SEE values of the methods were close to one another in the middle of the score scale. At the extremes of the scale, the Kernel equipercentile methods had low standard errors, whereas the linear methods had high ones. The SEED values showed that the differences between the equating methods were smaller than the DTM and remained within the ±2 SEED band. The RMSD coefficient was computed to assess the random error in each equating method. The post-stratification equating method contained the least random error, and the chained linear equating method contained the most.

Conclusions and Recommendations: When the Kernel equating methods were compared in terms of mean SEE, the linear methods had a higher mean SEE than the equipercentile methods. This finding is not consistent with those of Choi (2009) and Liou et al. (1997). The discrepancy may stem from the use of simulated data or large samples in those studies. In addition, the KE linear methods yielded higher standard errors at extreme scores than at mid-range scores, which supports the existing literature. The comparison of RMSD coefficients showed that the post-stratification equating method contained the least random error and the chained linear equating method the most. Based on these results, future studies could apply different equating methods, evaluate them with different criteria, and compare their results with those of this study.
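The scoring rule described in the method (incorrect and missing responses coded 0; partial-credit and correct responses coded 1) could be sketched roughly as follows. This is only an illustration under stated assumptions: the raw coding used here (None or NaN for missing, 0 for incorrect, positive values for partial or full credit) and the function names are hypothetical and are not taken from the TIMSS data files or from the study's own scripts.

```python
from math import isnan

def dichotomize(score):
    """Recode one raw item score to 0/1.

    Assumed raw coding (hypothetical): None or NaN = missing,
    0 = incorrect, any positive value = partial or full credit.
    """
    if score is None or isnan(score):
        return 0  # missing responses are treated as incorrect
    return 1 if score > 0 else 0  # partial and full credit both become 1

def total_scores(response_matrix):
    """Sum the dichotomized item scores for each examinee (one row each)."""
    return [sum(dichotomize(s) for s in row) for row in response_matrix]

# Two made-up examinees answering four items.
demo = [
    [0, 1, 2, None],
    [2, 2, 1, float("nan")],
]
scores = total_scores(demo)
```

The resulting total scores per booklet are the kind of input an observed-score equating procedure would then work from.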

Keywords:

equipercentile, equating, linear, SEED, SEE

A Comparison of Kernel Equating Methods Based on the NEAT Design

Problem Statement: Equating can be defined as a statistical process that adjusts for differences between test forms with similar content and difficulty so that the scores obtained from these forms can be used interchangeably. The literature contains many equating methods, one of which is Kernel equating. The Trends in International Mathematics and Science Study (TIMSS) assesses the knowledge and skills that fourth- and eighth-grade students have acquired in mathematics and science. TIMSS has different test forms, and these forms are equated through common items. Purpose of the Study: This research aimed to compare the equated scores produced by the Kernel equating (KE) chained and post-stratification methods, each in its equipercentile and linear form, under the NEAT design. Methodology: TIMSS science data were used in this study. The study sample consisted of 865 eighth-grade examinees who were given Booklets 1 and 14 during the TIMSS administration in Turkey. There were 39 items in Booklet 1 and 38 items in Booklet 14. First, descriptive statistics were calculated, and the two booklets were then equated under the NEAT design using the Kernel chained and Kernel post-stratification equipercentile and linear equating methods. Second, the equating methods were evaluated against the DTM, PRE, SEE, SEED, and RMSD criteria. Findings and Results: The results of the equipercentile and linear equating methods were consistent with each other, except at the high end of the score scale. PRE values showed that the KE equipercentile equating methods matched the discrete target distribution Y more closely, and the distribution of SEED showed that the KE equipercentile and linear methods did not differ from each other by more than the DTM.
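As a rough illustration of two of the criteria named above, the sketch below computes a percent relative error (PRE) and a difference-that-matters (DTM) check. It assumes the definition of PRE commonly given in the kernel equating literature, PRE(p) = 100 * (mu_p(e_Y(X)) - mu_p(Y)) / mu_p(Y), where mu_p is the p-th moment, and a DTM of 0.5 raw-score points, which is a widely used convention but is not stated in the abstract. All function names and the toy distributions are hypothetical.

```python
def moment(scores, probs, p):
    """p-th moment of a discrete score distribution."""
    return sum(prob * score ** p for score, prob in zip(scores, probs))

def pre(equated, r, y, s, p):
    """Percent relative error in the p-th moment.

    equated: equated scores e_Y(x_j) for each score x_j on form X
    r:       score probabilities on form X
    y, s:    scores and score probabilities on the target form Y
    """
    mu_y = moment(y, s, p)
    return 100.0 * (moment(equated, r, p) - mu_y) / mu_y

def within_dtm(eq_a, eq_b, dtm=0.5):
    """Flag score points where two equating functions differ by less than the DTM.

    dtm=0.5 is an assumed default, not a value reported in the source.
    """
    return [abs(a - b) < dtm for a, b in zip(eq_a, eq_b)]

# Toy example: a four-point score scale with made-up probabilities.
x = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.4, 0.1]
pre_first_moment = pre([v + 0.15 for v in x], probs, x, probs, 1)
flags = within_dtm([1.0, 2.0], [1.2, 2.7])
```

A PRE close to zero for the first several moments indicates that the equated scores reproduce the moments of the target distribution Y; the DTM check flags score points where two methods give practically equivalent results.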

Keywords:

Equating, equipercentile, linear, RMSD, SEED

___

Andersson, B., Bränberg, K., & Wiberg, M. (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1-25.
Akin-Arikan, Ç. (2017). Kernel Eşitleme ve Madde Tepki Kuramına Dayalı Eşitleme Yöntemlerinin Karşılaştırılması [Comparison of kernel equating and item response theory equating methods] (Unpublished doctoral dissertation). Hacettepe University, Ankara, Turkey.
Akin-Arikan, Ç. & Gelbal, S. (2018). A Comparison of Traditional and Kernel Equating Methods. International Journal of Assessment Tools in Education, 5(3), 417-427. doi: 10.21449/ijate.409826
Choi, S. I. (2009). A Comparison of Kernel Equating and Traditional Equipercentile Equating Methods and the Parametric Bootstrap Methods for Estimating Standard Errors in Equipercentile Equating (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, United States.
Godfrey, K. E. (2007). A comparison of Kernel equating and IRT true score equating methods (Unpublished doctoral dissertation). The University of North Carolina, United States.
Grant, M. C., Zhang, L., & Damiano, M. (2009). An Evaluation of Kernel Equating: Parallel Equating with Classical Methods in the SAT Subject Tests™ Program (ETS RR-09-06). Princeton, NJ: Educational Testing Service.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (pp. 187–220). Westport, CT: Praeger Publishers.
Holland, P., von Davier, A., Sinharay, S., & Han, N. (2006). Testing the untestable assumptions of the chain and post-stratification equating methods for the NEAT design (ETS RR-06-17). Princeton, NJ: Educational Testing Service.
Kolen, M. J. (1988). Traditional equating methodology. Educational Measurement: Issues and Practice, 7(4), 29-37. doi: 10.1111/j.1745-3992.1988.tb00843.x
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. New York: Springer.
Lee, Y. H., & von Davier, A. A. (2010). Equating through alternative kernels. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 159-173). New York, NY: Springer.
Liou, M., Cheng, P. E., & Johnson, E. G. (1997). Standard errors of the Kernel equating methods under the common-item design. Applied Psychological Measurement, 21(4), 349-369. doi: 10.1177/01466216970214005
Liu, J., & Low, A. C. (2007). An Exploration of Kernel Equating Using SAT® Data: Equating to a Similar Population and to a Distant Population (ETS RR-07-17). Princeton, NJ: Educational Testing Service.
MEB (2016). TIMSS 2015 Uluslararası Matematik ve Fen Eğilimleri Araştırması: TIMSS 2015 Ulusal Matematik ve Fen Bilimleri Ön Raporu 4. ve 8. Sınıflar. Ankara: MEB Ölçme, Değerlendirme ve Sınav Hizmetleri Genel Müdürlüğü. Retrieved March 15, 2018, from http://timss.meb.gov.tr/wp-content/uploads/TIMSS_2015_Ulusal_Rapor.pdf
Meng, Y. (2012). Comparison of Kernel Equating and Item Response Theory Equating Methods (Unpublished doctoral dissertation). University of Massachusetts Amherst, United States.
Mao, X. (2006). An investigation of the accuracy of the estimates of standard errors for the Kernel equating functions (Unpublished doctoral dissertation). University of Iowa, Iowa City, United States.
Mao, X., von Davier, A. A., & Rupp, S. (2006). Comparisons of the Kernel equating method with the traditional equating methods on PRAXIS™ data (ETS RR-06-30). Princeton, NJ: Educational Testing Service.
Moses, T., & Holland, P. (2007). Kernel and traditional equipercentile equating with degrees of presmoothing (ETS RR-07-15). Princeton, NJ: Educational Testing Service.
Mullis, I. V. S., Cotter, K. E., Centurino, V. A. S., Fishbein, B. G., & Liu, J. (2016). Using Scale Anchoring to Interpret the TIMSS 2015 Achievement Scales. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and Procedures in TIMSS 2015 (pp. 14.1-14.47). Retrieved from Boston College, TIMSS & PIRLS International Study Center website: http://timss.bc.edu/publications/timss/2015-methods/chapter-14.html
Norman Dvorak, R. K. (2009). A comparison of Kernel equating to the test characteristic curve methods (Unpublished doctoral dissertation). University of Nebraska, Lincoln, United States.
R Core Team. (2017). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria.
Ricker, K., & von Davier, A. A. (2007). The impact of anchor test length on equating results in a nonequivalent groups design (ETS RR-07-44). Princeton, NJ: Educational Testing Service.
von Davier, A., Holland, P. W., & Thayer, D. T. (2004). The Kernel method of equating. New York, NY: Springer.
von Davier, A. A., Holland, P. W., Livingston, S. A., Casabianca, J., Grant, M. C., & Martin, K. (2006). An evaluation of the Kernel equating method: A special study with pseudotests constructed from real test data (ETS RR-06-02). Princeton, NJ: Educational Testing Service.