A Statistical Comparison of Norm-Referenced Assessment Systems Used in Higher Education in Turkey

The purpose of this study is to identify the different norm-referenced assessment systems used in Turkish higher education and to compare them empirically. First, the norm-referenced assessment regulations of approximately 70 state universities in Turkey were examined, and the universities were classified into four groups according to the system they use: T-score conversion only, which is the most common method; T-score conversion combined with percentiles; T-score conversion combined with percentiles and standard deviation; and a system based on standard deviation only. The algorithms of two universities that apply T-score conversion and of three universities that apply the other systems were then selected and used to convert the end-of-term raw achievement scores of 19,574 students at a state university, for each course, into letter grades and grade points on the 4-point scale. The resulting grades were compared both with those produced by the criterion-referenced (absolute) system currently in use and with one another. A paired t-test was used to test the difference between criterion-referenced and norm-referenced assessment, and one-way analysis of variance was used to test the differences among the norm-referenced systems. The findings show that letter grades calculated with norm-referenced assessment were statistically significantly higher than those calculated with criterion-referenced assessment, and that the letter grades produced by the different universities' norm-referenced systems differed significantly from one another. At the end of the study, the consequences of both differences are discussed from the perspective of students and instructors.

Keywords:

Norm-referenced assessment, criterion-referenced assessment, assessment in higher education, grading system
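
The T-score conversion named in the abstract can be illustrated with a minimal sketch. It assumes the standard definition T = 50 + 10z, where z is a student's standardized score within the class. The function names, the use of the population standard deviation, and especially the cut-off table are hypothetical and are not taken from any university's regulation or from the study itself.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Convert raw course scores to T-scores using T = 50 + 10 * z."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # assumption: population SD of the class
    if sigma == 0:
        # All students have the same score, so everyone sits at the mean.
        return [50.0 for _ in raw_scores]
    return [50 + 10 * (x - mu) / sigma for x in raw_scores]


# Hypothetical cut-offs (lower T bound, letter, grade points) for illustration only.
HYPOTHETICAL_CUTOFFS = [
    (65, "AA", 4.0),
    (57, "BB", 3.0),
    (50, "CC", 2.0),
    (40, "DD", 1.0),
]


def letter_grade(t):
    """Map a T-score to a (letter, grade points) pair using the table above."""
    for lower, letter, points in HYPOTHETICAL_CUTOFFS:
        if t >= lower:
            return letter, points
    return "FF", 0.0


raw = [35, 48, 52, 60, 75]  # made-up raw scores for one course
ts = t_scores(raw)
grades = [letter_grade(t) for t in ts]
```

Because the conversion is relative to the class mean and spread, the same raw score can map to different letter grades in different classes, which is the property the study compares against criterion-referenced grading.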

