Raters’ Knowledge of Students’ Proficiency Levels as a Source of Measurement Error in Oral Assessments
Although oral exams are widely conducted, there has been an ongoing debate about the reliability of their resulting scores, owing to the involvement of human raters and the factors underlying differences in their scoring. This quasi-experimental study investigated the possible effect(s) of raters' prior knowledge of students' proficiency levels on rater scorings in oral interview assessments. The study was carried out in a pre- and post-test design with 15 EFL instructors who also served as raters in oral assessments at a Turkish state university. In both the pre- and post-tests, the raters assigned scores to the same video-recorded oral interview performances of 12 students from three different proficiency levels. While rating the performances, the raters also provided concurrent verbal reports about their thought processes. The raters were not informed about the students' proficiency levels in the pre-test, whereas this information was provided, orally and in writing, in the post-test. According to the findings, the majority of the Total Scores were lower or higher in the post-test than in the pre-test. The thematic analysis of the raters' video-recorded verbal reports revealed that most of the raters referred to the proficiency levels of the students while assigning scores in the post-test. The findings of the study suggest that, besides factors such as the accent, nationality, and gender of the test-takers and the assessors, raters' prior knowledge of students' proficiency levels could be a variable that needs to be controlled for more reliable test results.