An Empirical Study for the Statistical Adjustment of Rater Bias
Mustafa İlhan
This study investigated the effectiveness of statistical adjustments applied to rater bias in many-facet Rasch analysis. A dataset that contained no rater × examinee bias was first modified to introduce such bias. A bias adjustment was then applied to the biased data file, and its effectiveness was examined by comparing the results from three datasets: the one without bias, the one with bias, and the one to which the bias adjustment had been applied. The differences that rater × examinee bias produced in examinees’ ability estimates, item difficulty indices, and measures of rater severity and leniency were, to a large extent, eliminated by the bias adjustment. This result indicates that bias adjustment using many-facet Rasch analysis is a viable way to control rater bias.
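
For orientation, a many-facet Rasch model with a rater × examinee interaction term is commonly written in the rating-scale form sketched below. The abstract does not state the exact specification used in the study, so the symbols and the sign convention here are assumptions chosen for illustration: $\theta_n$ is the ability of examinee $n$, $\delta_i$ the difficulty of item $i$, $\alpha_j$ the severity of rater $j$, $\tau_k$ the threshold between categories $k-1$ and $k$, and $\phi_{nj}$ the bias term for rater $j$ scoring examinee $n$.

```latex
% Illustrative only: a common rating-scale formulation, not necessarily
% the specification used in the study. P_{nijk} is the probability that
% rater j assigns examinee n a score in category k on item i.
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \alpha_j - \tau_k - \phi_{nj}
\]
% Setting \phi_{nj} = 0 for every rater-examinee pair recovers the
% model without an interaction (bias) term.
```

Under this sign convention, a positive $\phi_{nj}$ means rater $j$ scores examinee $n$ more severely than the rater's overall severity $\alpha_j$ would predict, and a negative value means more leniently. The equation only shows where a rater × examinee bias term would enter the model; it does not describe how the adjustment in the study was carried out.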