Investigating the Effect of Rater Training on Differential Rater Function in Assessing Academic Writing Skills of Higher Education Students

Abstract

This study examined the effect of rater training on differential rater functioning (rater error) in the assessment of higher education students' academic writing skills. The study was conducted with a pre-test/post-test control group quasi-experimental design. The study group consisted of 45 raters, of whom 22 were in the experimental group and 23 in the control group. The raters were pre-service teachers who had not previously participated in any rater training, and it was verified that they had similar levels of assessment experience. Data were collected using an analytic rubric developed by the researchers and an opinion-based writing task prepared by the International English Language Testing System (IELTS). Within the scope of the research, 39 compositions written by students in a foreign language (English) were assessed. The data were analyzed with the many-facet Rasch model under a fully crossed design. The findings revealed that the rater training was effective in reducing differential rater functioning, and suggestions based on these results were presented.
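For readers unfamiliar with the model, the many-facet Rasch model (Linacre, 1994) extends the Rasch model with additional facets such as raters, so that rater severity can be estimated on the same logit scale as examinee ability. A standard three-facet rating scale formulation — given here only as a sketch of this class of models, since the abstract does not specify the exact parameterization the authors used — is

$$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \beta_i - \lambda_j - \tau_k$$

where P_nijk is the probability that examinee n receives category k rather than k-1 from rater j on rubric criterion i, θ_n is the examinee's writing ability, β_i is the difficulty of criterion i, λ_j is the severity of rater j, and τ_k is the threshold between categories k-1 and k. In a fully crossed design, every rater scores every composition on every criterion, which links all raters to a common scale; differential rater functioning then appears as significant rater-by-group interaction (bias) terms — the effect the rater training is intended to reduce.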

References

Aryadoust, V. (2016). Understanding the growth of ESL paragraph writing skills and its relationships with linguistic features. Educational Psychology, 36(10), 1742-1770. https://doi.org/10.1080/01443410.2014.950946

Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a generic approach in automated essay scoring. Journal of Technology, Learning, and Assessment, 10(3), 1-16. Retrieved from https://ejournals.bc.edu/ojs/index.php/jtla/article/view/1603

Baştürk, M. (2012). İkinci dil öğrenme algılarının belirlenmesi: Balıkesir örneği [Determining second language learning perceptions: The case of Balıkesir]. Balikesir University Journal of Social Sciences Institute, 15(28-1), 251-270. Retrieved from http://dspace.balikesir.edu.tr/xmlui/handle/20.500.12462/4594

Bayat, N. (2014). Öğretmen adaylarının eleştirel düşünme düzeyleri ile akademik yazma başarıları arasındaki ilişki [The relationship between pre-service teachers' critical thinking levels and their academic writing achievement]. Eğitim ve Bilim, 39(173), 155-168. Retrieved from http://eb.ted.org.tr/index.php/EB/article/view/2333

Bernardin, H. J., & Pence, E. C. (1980). Effects of rater training: Creating new response sets and decreasing accuracy. Journal of Applied Psychology, 65(1), 60-66. https://doi.org/10.1037/0021-9010.65.1.60

Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6(2), 205-212. https://doi.org/10.5465/amr.1981.4287782

Bijani, H. (2018). Investigating the validity of oral assessment rater training program: A mixed-methods study of raters’ perceptions and attitudes before and after training. Cogent Education, 5(1), 1-20. https://doi.org/10.1080/2331186X.2018.1460901

Bitchener, J., Young, S., & Cameron, D. (2005). The effect of different types of corrective feedback on ESL student writing. Journal of Second Language Writing, 14(3), 191-205. https://doi.org/10.1016/j.jslw.2005.08.001

Bond, T., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences. New York and London: Routledge. https://doi.org/10.4324/9781315814698

Brennan, R. L., Gao, X., & Colton, D. A. (1995). Generalizability analyses of Work Keys listening and writing tests. Educational and Psychological Measurement, 55(2), 157-176. https://doi.org/10.1177/0013164495055002001

Brijmohan, A. (2016). A many-facet Rasch measurement analysis to explore rater effects and rater training in medical school admissions (Doctoral dissertation). Retrieved from http://www.proquest.com/

Brookhart, S. M. (2013). How to create and use rubrics for formative assessment and grading. Alexandria, Virginia: ASCD.

Brown, H. D. (2007). Teaching by principles: An interactive approach to language pedagogy. New York: Pearson Education.

Brown, J. D., & Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly, 32(4), 653-675. https://doi.org/10.2307/3587999

Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., & Harris, M. D. (1998). Automated scoring using a hybrid feature identification technique. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Quebec, Canada. https://doi.org/10.3115/980845.980879

Büyüköztürk, Ş. (2011). Deneysel desenler: Öntest-sontest kontrol grubu desen ve veri analizi [Experimental designs: Pretest-posttest control group design and data analysis]. Ankara: Pegem Akademi Yayıncılık.

Carter, C., Bishop, J. L., & Kravits, S. L. (2002). Keys to college studying: Becoming a lifelong learner. New Jersey: Prentice Hall.

Çekici, Y. E. (2018). Türkçe'nin yabancı dil olarak öğretiminde kullanılan ders kitaplarında yazma görevleri: Yedi iklim ve İstanbul üzerine karşılaştırmalı bir inceleme [Writing tasks in textbooks used in teaching Turkish as a foreign language: A comparative study of Yedi İklim and İstanbul]. Gaziantep Üniversitesi Eğitim Bilimleri Dergisi, 2(1), 1-10. Retrieved from http://dergipark.gov.tr/http-dergipark-gov-tr-journal-1517-dashboard/issue/36422/367409

Chen, W. H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289. https://doi.org/10.3102/10769986022003265

Congdon, P., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement, 37(2), 163-178. https://doi.org/10.1111/j.1745-3984.2000.tb01081.x

Cumming, A. (2013). Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10(1), 1–8. https://doi.org/10.1080/15434303.2011.622016

Cumming, A. (2014). Assessing integrated skills. In A. Kunnan (Vol. Ed.), The companion to language assessment: Vol. 1, (pp. 216–229). Oxford, United Kingdom: Wiley-Blackwell. https://doi.org/10.1002/9781118411360.wbcla131

Dunbar, N. E., Brooks, C. F., & Miller, T. K. (2006). Oral communication skills in higher education: Using a performance-based evaluation rubric to assess communication skills. Innovative Higher Education, 31(2), 115-128. https://doi.org/10.1007/s10755-006-9012-x

Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement. New Jersey: Prentice Hall Press.

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185. https://doi.org/10.1177/0265532207086780

Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt: Peter Lang.

Ellis, R., Johnson, K. E., & Papajohn, D. (2002). Concept mapping for rater training. TESOL Quarterly, 36(2), 219-233. https://doi.org/10.2307/3588333

Engelhard, G., Jr., & Myford, C. M. (2003). Monitoring faculty consultant performance in the Advanced Placement English Literature and Composition Program with a many-faceted Rasch model. ETS Research Report Series, i-60. https://doi.org/10.1002/j.2333-8504.2003.tb01893.x

Engelhard, G. (2002). Monitoring raters in performance assessments. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for all students: Development, implementation, and analysis (pp. 261-287). Mahwah, NJ: Lawrence Erlbaum Associates.

Esfandiari, R. (2015). Rater errors among peer-assessors: Applying the many-facet Rasch measurement model. Iranian Journal of Applied Linguistics, 18(2), 77-107. https://doi.org/10.18869/acadpub.ijal.18.2.77

Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1(1), 1-16. Retrieved from http://www.ijlt.ir/portal/files/401-2011-01-01.pdf

Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79-101. Retrieved from https://jalt-publications.org/files/pdf-article/jj2012a-art4.pdf

Farrokhi, F., Esfandiari, R., & Vaez Dalili, M. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World Applied Sciences Journal, 15(11), 76-83. Retrieved from https://pdfs.semanticscholar.org/dd21/ba5683dde8b616374876b0c53da376c10ca9.pdf

Feldman, M., Lazzara, E. H., Vanderbilt, A. A., & DiazGranados, D. (2012). Rater training to support high-stakes simulation-based assessments. Journal of Continuing Education in the Health Professions, 32(4), 279-286. https://doi.org/10.1002/chp.21156

Gillett, A., Hammond, A., & Martala, M. (2009). Successful academic writing. New York: Pearson Longman.

Göçer, A. (2010). Türkçe öğretiminde yazma eğitimi [Writing education in Turkish teaching]. Uluslararası Sosyal Araştırmalar Dergisi, 3(12), 178-195. Retrieved from http://www.sosyalarastirmalar.com/cilt3/sayi12pdf/gocer_ali.pdf

Goodrich, H. (1997). Understanding rubrics: The dictionary may define "rubric," but these models provide more clarity. Educational Leadership, 54(4), 14-17.

Gronlund, N. E. (1977). Constructing achievement tests. New Jersey: Prentice-Hall Press.

Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), 265-275. https://doi.org/10.1037/0033-2909.103.2.265

Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Boston: Allyn & Bacon.

Hauenstein, N. M., & McCusker, M. E. (2017). Rater training: Understanding effects of training content, practice ratings, and feedback. International Journal of Selection and Assessment, 25(3), 253-266. https://doi.org/10.1111/ijsa.12177

Howitt, D., & Cramer, D. (2008). Introduction to statistics in psychology. Harlow: Pearson Education.

Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.

IELTS (n.d.). Prepare for IELTS. Retrieved from https://takeielts.britishcouncil.org/prepare-test/free-sample-tests/writing-sample-test-1-academic/writing-task-2

İlhan, M. (2015). Standart ve SOLO taksonomisine dayalı rubrikler ile puanlanan açık uçlu matematik sorularında puanlayıcı etkilerinin çok yüzeyli Rasch modeli ile incelenmesi [An investigation of rater effects through the many-facet Rasch model in open-ended mathematics questions scored with standard and SOLO taxonomy-based rubrics] (Doctoral dissertation). Retrieved from https://tez.yok.gov.tr

İlhan, M., & Çetin, B. (2014). Performans değerlendirmeye karışan puanlayıcı etkilerini azaltmanın yollarından biri olarak puanlayıcı eğitimleri: Kuramsal bir analiz [Rater training as one way of reducing rater effects in performance assessment: A theoretical analysis]. Journal of European Education, 4(2), 29-38. https://doi.org/10.18656/jee.77087

Jin, K. Y., & Wang, W. C. (2017). Assessment of differential rater functioning in latent classes with new mixture facets models. Multivariate Behavioral Research, 52(3), 391-402. https://doi.org/10.1080/00273171.2017.1299615

Johnson, R. L., Penny, J. A., & Gordon, B. (2008). Assessing performance: Designing, scoring, and validating performance tasks. New York: Guilford Press.

Kassim, N. L. A. (2007). Exploring rater judging behaviour using the many-facet Rasch model. Paper presented at the Second Biennial International Conference on Teaching and Learning of English in Asia: Exploring New Frontiers (TELiA2), Universiti Utara Malaysia. Retrieved from http://repo.uum.edu.my/3212/

Kassim, N. L. A. (2011). Judging behaviour and rater errors: An application of the many-facet Rasch model. GEMA Online Journal of Language Studies, 11(3), 179-197. Retrieved from http://ejournals.ukm.my/gema/article/view/49

Kim, Y., Park, I., & Kang, M. (2012). Examining rater effects of the TGMD-2 on children with intellectual disability. Adapted Physical Activity Quarterly, 29(4), 346-365. https://doi.org/10.1123/apaq.29.4.346

Kim, Y. K. (2009). Combining constructed response items and multiple choice items using a hierarchical rater model (Doctoral dissertation). Retrieved from http://www.proquest.com/

Kondo, Y. (2010). Examination of rater training effect and rater eligibility in L2 performance assessment. Journal of Pan-Pacific Association of Applied Linguistics, 14(2), 1-23. Retrieved from https://eric.ed.gov/?id=EJ920513

Kubiszyn, T., & Borich, G. (2013). Educational testing and measurement. New Jersey: John Wiley & Sons Incorporated.

Kutlu, Ö., Doğan, C. D., & Karakaya, İ. (2014). Öğrenci başarısının belirlenmesi: Performansa ve portfolyoya dayalı durum belirleme [Determining student achievement: Performance- and portfolio-based assessment]. Ankara: Pegem Akademi Yayıncılık.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563-575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284. Retrieved from https://www.rasch.org/rmt/rmt71h.htm

Linacre, J. M. (1994). Many-facet Rasch measurement. Chicago: Mesa Press.

Linacre, J. M. (1996). Generalizability theory and many-facet Rasch measurement. Objective measurement: Theory into practice, 3, 85-98. Retrieved from https://files.eric.ed.gov/fulltext/ED364573.pdf

Linacre, J. M. (2017). A user's guide to FACETS: Rasch-model computer programs. Chicago: MESA.

Liu, J., & Xie, L. (2014). Examining rater effects in a WDCT pragmatics test. Iranian Journal of Language Testing, 4(1), 50-65. Retrieved from https://cdn.ov2.com/content/ijlte_1_ov2_com/wp-content_138/uploads/2019/07/422-2014-4-1.pdf

Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54-71. https://doi.org/10.1177/026553229501200104

Lunz, M. E., Wright, B. D., & Linacre, J. M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331-345. https://doi.org/10.1207/s15324818ame0304_3

May, G. L. (2008). The effect of rater training on reducing social style bias in peer evaluation. Business Communication Quarterly, 71(3), 297-313. https://doi.org/10.1177/1080569908321431

McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Erlbaum.

McNamara, T. F. (1996). Measuring second language performance. New York: Longman.

Moore, B. B. (2009). Consideration of rater effects and rater design via signal detection theory (Doctoral dissertation). Retrieved from http://www.proquest.com/

Moser, K., Kemter, V., Wachsmann, K., Köver, N. Z., & Soucek, R. (2016). Evaluating rater training with double-pretest one-posttest designs: An analysis of testing effects and the moderating role of rater self-efficacy. The International Journal of Human Resource Management, 1-23. https://doi.org/10.1080/09585192.2016.1254102

Moskal, B. M. (2000). Scoring rubrics: What, when and how? Practical Assessment, Research & Evaluation, 7(3). Retrieved from http://pareonline.net/htm/v7n3.htm

Murphy, K. R., & Balzer, W. K. (1989). Rater errors and rating accuracy. Journal of Applied Psychology, 74(4), 619-624. https://doi.org/10.1037/0021-9010.74.4.619

Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422. Retrieved from http://psycnet.apa.org/record/2003-09517-007

Oosterhof, A. (2003). Developing and using classroom assessments. New Jersey: Merrill-Prentice Hall Press.

Osburn, H. G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5(3), 343-355. https://doi.org/10.1037/1082-989X.5.3.343

Romagnano, L. (2001). The myth of objectivity in mathematics assessment. Mathematics Teacher, 94(1), 31-37. Retrieved from http://peterliljedahl.com/wp-content/uploads/Myth-of-Objectivity2.pdf

Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25(4), 465-493. https://doi.org/10.1177/0265532208094273

Selden, S., Sherrier, T., & Wooters, R. (2012). Experimental study comparing a traditional approach to performance appraisal training to a whole‐brain training method at CB Fleet Laboratories. Human Resource Development Quarterly, 23(1), 9-34. https://doi.org/10.1002/hrdq.21123

Shale, D. (1996). Essay reliability: Form and meaning. In E. White, W. Lutz, & S. Kamusikiri (Eds.), Assessment of writing: Politics, policies, practices (pp. 76-96). New York: Modern Language Association.

Stamoulis, D. T., & Hauenstein, N. M. A. (1993). Rater training and rating accuracy: Training for dimensional accuracy versus training for ratee differentiation. Journal of Applied Psychology, 78(6), 994-1003. https://doi.org/10.1037/0021-9010.78.6.994

Storch, N., & Tapper, J. (2009). The impact of an EAP course on postgraduate writing. Journal of English for Academic Purposes, 8, 207-223. https://doi.org/10.1016/j.jeap.2009.03.001

Sulsky, L. M., & Day, D. V. (1992). Frame-of-reference training and cognitive categorization: An empirical investigation of rater memory issues. Journal of Applied Psychology, 77(4), 501-510. https://doi.org/10.1037/0021-9010.77.4.501

Van Dyke, N. (2008). Self- and peer-assessment disparities in university ranking schemes. Higher Education in Europe, 33(2/3), 285-293. https://doi.org/10.1080/03797720802254114

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263-287. https://doi.org/10.1177/026553229801500205

Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511732997

Wesolowski, B. C., Wind, S. A., & Engelhard, G., Jr. (2015). Rater fairness in music performance assessment: Evaluating model-data fit and differential rater functioning. Musicae Scientiae, 19(2), 147-170. https://doi.org/10.1177/1029864915589014

Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197-210. https://doi.org/10.1177/0748175612440286

Wind, S. A., & Guo, W. (2019). Exploring the combined effects of rater misfit and differential rater functioning in performance assessments. Educational and Psychological Measurement, 79(5), 962-987. https://doi.org/10.1177/0013164419834613

Woehr, D. J., & Huffcutt, A. I. (1994). Rater training for performance appraisal: A quantitative review. Journal of Occupational and Organizational Psychology, 67(3), 189-205. https://doi.org/10.1111/j.2044-8325.1994.tb00562.x

Wolfe, E. W., & McVay, A. (2012). Application of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(3), 31-37. https://doi.org/10.1111/j.1745-3992.2012.00241.x

Yan, X. (2014). An examination of rater performance on a local oral English proficiency test: A mixed-methods approach. Language Testing, 31(4), 501-527. https://doi.org/10.1177/0265532214536171

Zedeck, S., & Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purpose of the appraisal. Journal of Applied Psychology, 67(6), 752-758. https://doi.org/10.1037/0021-9010.67.6.752

Zwiers, J. (2008). Building academic language: Essential practices for content classrooms. San Francisco: Jossey-Bass.

Cite

Şata, M., & Karakaya, İ. (2021). Investigating the effect of rater training on differential rater function in assessing academic writing skills of higher education students. Journal of Measurement and Evaluation in Education and Psychology, 12(2), 163-181. https://doi.org/10.21031/epod.842094