Testing Multistage Testing Configurations: Post-Hoc vs. Hybrid Simulations

Due to their low cost, Monte Carlo (MC) simulations have been conducted extensively in the area of educational measurement. However, the results derived from MC studies may not always generalize to operational studies. The purpose of this study was to provide a methodological discussion of the different types of simulation methods, to run post-hoc and hybrid simulations in a multistage testing (MST) environment, and to discuss the findings and interpretations derived from each simulation method. Real data collected via paper-and-pencil administrations from 652 students were used to test different MST design configurations under different test lengths. The three test lengths were 24, 36, and 48 items, and the six MST designs were the 1-2, 1-3, 1-2-2, 1-3-3, 1-4-4, and 1-5-5 designs. Both post-hoc and hybrid simulations were run with the same group of students, and all analyses were completed in R. The results indicated that in terms of absolute bias, RMSE, and Pearson correlations, post-hoc and hybrid simulations generally produced comparable outcomes. Regardless of the simulation method, the 1-5-5 MST design was the most highly recommended design. Another finding was that, across all simulations, outcomes generally improved as test length increased. The study concludes with advantages and disadvantages of each method, recommendations for practitioners, and limitations for future research.
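The three evaluation criteria named above (absolute bias, RMSE, and the Pearson correlation between true and estimated abilities) can be sketched as follows. The study's analyses were done in R; this is an illustrative Python sketch, assuming NumPy and hypothetical arrays of generating and estimated thetas, not the authors' code.

```python
import numpy as np

def evaluate_recovery(theta_true, theta_est):
    """Compare estimated abilities against the generating (true) abilities.

    Returns the three criteria used to compare simulation methods:
    mean absolute bias, RMSE, and the Pearson correlation.
    """
    theta_true = np.asarray(theta_true, dtype=float)
    theta_est = np.asarray(theta_est, dtype=float)
    abs_bias = np.mean(np.abs(theta_est - theta_true))
    rmse = np.sqrt(np.mean((theta_est - theta_true) ** 2))
    pearson_r = np.corrcoef(theta_true, theta_est)[0, 1]
    return abs_bias, rmse, pearson_r

# Illustrative use: abilities for 652 hypothetical examinees,
# with estimates that recover the truth up to small noise.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, 652)
theta_hat = theta + rng.normal(0.0, 0.3, 652)
bias, rmse, r = evaluate_recovery(theta, theta_hat)
```

In a post-hoc or hybrid MST simulation, `theta_hat` would come from scoring the routed item responses under each design, and these criteria would be compared across designs and test lengths.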

___

  • Barnard, J. J. (2018). From simulation to implementation: Two CAT case studies. Practical Assessment, Research & Evaluation, 23(14), 2. Retrieved from: http://pareonline.net/getvn.asp?v=23&n=14
  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
  • Brennan, R. L. (2011). Generalizability theory and classical test theory. Applied Measurement in Education, 24(1), 1-21.
  • Bulut, O., & Kan, A. (2012) Application of computerized adaptive testing to entrance examination for graduate studies in Turkey. Egitim Arastirmalari-Eurasian Journal of Educational Research, 49, 61-80.
  • Buuren, S. V., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3). doi: 10.18637/jss.v045.i03. Retrieved from https://www.jstatsoft.org/article/view/v045i03
  • Davey, T., & Y.H. Lee. (2011). Potential impact of context effects on the scoring and equating of the multistage GRE Revised General Test. (GRE Board Research Report 08-01). Princeton, NJ: Educational Testing Service.
  • de Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. NY: The Guilford Press.
  • Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36-49.
  • Han, K. T., Dimitrov, D. M., & Al-Mashary, F. (in press). Developing multistage tests using D-scoring method. Educational and Psychological Measurement. doi: 10.1177/0013164419841428
  • Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26(2), 44-52.
  • ILOG. (2006). ILOG CPLEX 10.0 [User’s manual]. Paris, France: ILOG SA.
  • Kalender, I., & Berberoglu, G. (2017). Can computerized adaptive testing work in students’ admission to higher education programs in Turkey? Kuram ve Uygulamada Eğitim Bilimleri, 17(2), 573-596. doi: 10.12738/estp.2017.2.0280. Retrieved from http://www.estp.com.tr/wp-content/uploads/2017/02/2017.2.0280.pdf
  • Liu, O. L., Bridgeman, B., Gu, L., Xu, J., & Kong, N. (2015). Investigation of response changes in the GRE revised general test. Educational and psychological measurement, 75(6), 1002-1020.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
  • Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing. Research Report RR-2011-12. New York: The College Board.
  • Luecht, R. M. (2000, April). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New Orleans, LA.
  • Luecht, R. M., & Nungester, R. J. (1998). Some Practical Examples of Computer‐Adaptive Sequential Testing. Journal of Educational Measurement, 35(3), 229-249.
  • Nydick, S. W., & Weiss, D. J. (2009). A hybrid simulation procedure for the development of CATs. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. Retrieved from www.psych.umn.edu/psylabs/CATCentral/
  • Patsula, L. N. (1999). A comparison of computerized adaptive testing and multistage testing (Order No. 9950199). Available from ProQuest Dissertations & Theses Global. (304514969)
  • R Development Core Team. (2013). R: A language and environment for statistical computing, reference index (Version 2.2.1). Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org
  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute of Educational Research.
  • Sari, H. I., & Raborn, A. (2018). What Information Works Best?: A Comparison of Routing Methods. Applied Psychological Measurement, 42(6), 499-515.
  • Sari, H. I., & Huggins-Manley, A. C. (2017). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized adaptive multistage testing. Educational Sciences: Theory and Practice, 17, 1759-1781.
  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 101-133). Hillsdale, NJ: Lawrence Erlbaum.
  • Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from https://pareonline.net/getvn.asp?v=16&n=1
  • Weiss, D. J., & Gibbons, R. D. (2007). Computerized adaptive testing with the bifactor model. In D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. Retrieved from http://iacat.org/sites/default/files/biblio/cat07weiss%26gibbons.pdf
  • Yan, D., von Davier, A. A., & Lewis, C. (2014). Computerized multistage testing: Theory and applications. Boca Raton, FL: CRC Press.
  • Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment (Order No. 3136800).