Semirhan GÖKÇE, Cees A. W. GLAS

Can TIMSS Mathematics Assessments be Implemented as Computerized Adaptive Test?

In recent years, there has been growing interest in, and extensive use of, computerized adaptive testing (CAT), especially in large-scale assessments. Numerous simulation studies have been conducted on both real and simulated data sets to determine the optimum conditions and develop CAT versions. As one of the most popular large-scale assessment programs, the Trends in International Mathematics and Science Study (TIMSS) has been administered as a paper-and-pencil test to monitor student achievement in mathematics and science at the fourth and eighth grade levels since 1995. The purpose of this study is to investigate the optimum CAT algorithm for TIMSS eighth grade mathematics assessments. Since Turkey and the USA participated in the 2007, 2011 and 2015 administrations, their data were combined, and 393 items were calibrated on the same scale using the marginal maximum likelihood estimation method. With this item pool, several scenarios were proposed and tested to determine not only the optimum starting rule, ability estimation method and test termination rule but also the efficiency of the exposure control method. The results indicated that estimating abilities with the expected a posteriori method after 6 random items and terminating the fixed-length test after 20 items seemed to be the optimum algorithm for TIMSS eighth grade mathematics assessments. It was also found that item exposure control was of primary importance for the effective use of the item pool. This study has implications for both national and international large-scale test developers in determining the optimum CAT algorithm and its consequences compared with paper-and-pencil versions.
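The configuration reported above (6 random starting items, expected a posteriori estimation, a fixed length of 20 items) can be illustrated with a minimal simulation sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: a dichotomous two-parameter logistic (2PL) model, a standard normal prior evaluated on a fixed grid, maximum Fisher information item selection after the random start, randomly generated item parameters standing in for the calibrated TIMSS pool, and no exposure control. The function names `p_correct`, `eap`, `information` and `run_cat` are hypothetical.

```python
import math
import random


def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap(responses, grid):
    """Expected a posteriori ability estimate with a standard normal prior.

    responses: list of (a, b, u) tuples, where u is 1 (correct) or 0 (incorrect).
    grid: ability values at which the posterior is evaluated.
    """
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for a, b, u in responses:
            p = p_correct(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def run_cat(true_theta, pool, rng, n_random=6, test_length=20):
    """Simulate one fixed-length CAT: random start, then adaptive selection."""
    grid = [-4.0 + 0.1 * k for k in range(81)]  # ability grid from -4 to 4
    available = list(range(len(pool)))
    responses, administered = [], []
    theta_hat = 0.0
    for step in range(test_length):
        if step < n_random:
            item = rng.choice(available)  # starting rule: random items
        else:
            # Assumed selection rule: most informative remaining item
            item = max(available, key=lambda i: information(theta_hat, *pool[i]))
        available.remove(item)
        a, b = pool[item]
        u = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        responses.append((a, b, u))
        administered.append(item)
        if step + 1 >= n_random:
            theta_hat = eap(responses, grid)  # first estimate after the random start
    return theta_hat, administered


rng = random.Random(0)
# Simulated (a, b) parameters; only the pool size of 393 comes from the study.
pool = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0)) for _ in range(393)]
theta_hat, administered = run_cat(1.0, pool, rng)
print(f"estimated ability: {theta_hat:.2f} after {len(administered)} items")
```

Because selection here always picks the most informative remaining item, a small subset of highly discriminating items would tend to be administered far more often than the rest across many simulated examinees, which is the kind of uneven pool usage that exposure control is meant to address.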
