Issues and Remedies in Composite Scoring: The Case of a Joint EGP-ESP Test

A comparison of the high-stakes ESP tests administered in recent years in Iran for admission to master's and PhD programs reveals considerable uncertainty about the scoring systems used at large scale for these competitive exams. Whereas the modular PhD entrance exam comprised a prerequisite EGP module followed by a Specific Purpose module, candidates for the master's-level counterpart sat a joint, or blended, EGP-ESP subtest that was scored and reported as a single composite percentile and interpreted, alongside the other knowledge subtests, against a common national norm. In this article, we address the issues associated with these scoring models and the problems they create for the accountability system. A minimal sketch of such composite-percentile reporting is given below.
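The sketch below is not the operational scoring algorithm used for the Iranian entrance examinations; it is only an illustration, under assumed equal section weights and a hypothetical norm sample, of how a blended EGP-ESP subtest can be collapsed into one composite percentile, and of why that single number can mask very different EGP/ESP profiles.

```python
# Hedged sketch: composite-percentile reporting for a joint EGP-ESP subtest.
# Weights, the norm sample, and all function names are illustrative assumptions,
# not the actual national scoring procedure.
from bisect import bisect_left
from typing import Sequence


def composite_score(egp_correct: int, esp_correct: int,
                    w_egp: float = 0.5, w_esp: float = 0.5) -> float:
    """Collapse the two sections into one number; the blend discards the
    EGP/ESP profile that a modular (two-score) report would preserve."""
    return w_egp * egp_correct + w_esp * esp_correct


def percentile_rank(score: float, norm_group: Sequence[float]) -> float:
    """Percentile of `score` within a national norm sample of composites."""
    ranked = sorted(norm_group)
    below = bisect_left(ranked, score)
    return 100.0 * below / len(ranked)


if __name__ == "__main__":
    # Hypothetical norm sample of (EGP, ESP) raw scores from other candidates.
    norm = [composite_score(e, s) for e, s in [(10, 5), (18, 12), (25, 20),
                                               (30, 8), (12, 22), (28, 27)]]
    candidate = composite_score(egp_correct=25, esp_correct=9)
    print(f"composite = {candidate:.1f}, "
          f"national percentile ~ {percentile_rank(candidate, norm):.0f}")
    # Note: candidates with very different profiles (e.g., 30/8 vs. 12/22)
    # can receive nearly identical composites, which is one of the
    # interpretability problems the article raises.
```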
