Investigation of Item Selection Methods According to Test Termination Rules in CAT Applications

In this study, computerized adaptive testing (CAT) item selection methods were investigated with respect to ability estimation methods and test termination rules. For this purpose, an item pool of 250 items and 2,000 examinees (M = 0, SD = 1) were simulated. A total of thirty CAT conditions were created by crossing item selection methods (Maximum Fisher Information, a-stratification, Likelihood Weight Information Criterion, Gradual Information Ratio, and Kullback-Leibler), ability estimation methods (Maximum Likelihood Estimation and Expected a Posteriori), and test termination rules (fixed length of 40 items, SE < .20, and SE < .40). Under the fixed-length stopping rule, the SE values obtained with the Maximum Likelihood Estimation method were higher than those obtained with the Expected a Posteriori ability estimation method. When abilities were estimated with Maximum Likelihood, the a-stratification item selection method yielded the highest SE values for test lengths shorter than 30 items, whereas the Kullback-Leibler item selection method yielded the highest SE values for test lengths longer than 30 items. With the Expected a Posteriori ability estimation method, the a-stratification item selection method produced the highest SE values at all test lengths. In the conditions where the test termination rule was SE < .20 and the Maximum Likelihood Estimation method was used, the lowest and highest average numbers of administered items were obtained with the Gradual Information Ratio and Maximum Fisher Information item selection methods, respectively. When the termination rule was SE < .20 and the Expected a Posteriori ability estimation method was used, the lowest average number of items was obtained with Kullback-Leibler and the highest with the Likelihood Weight Information Criterion item selection method. In the conditions where the test termination rule was SE < .40 and the ability estimation method was Maximum Likelihood Estimation, the maximum and minimum numbers of items were obtained with the Maximum Fisher Information and Kullback-Leibler item selection methods, respectively. When the Expected a Posteriori ability estimation method was used, the maximum and minimum numbers of items were obtained with the Maximum Fisher Information and a-stratification item selection methods, respectively. Under both variable-length stopping rules (SE < .20 and SE < .40), the average number of administered items was highest across all item selection methods when the Maximum Likelihood Estimation method was used.
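
To make the simulation design concrete, the sketch below implements one of the thirty cells in Python: Maximum Fisher Information (MFI) item selection, Expected a Posteriori (EAP) scoring, and an SE < .20 termination rule. It is a minimal sketch assuming a 2PL item bank with illustrative parameter distributions; the model choice, the quadrature grid, and all names (p_correct, fisher_info, eap_estimate, run_cat) are assumptions made for this example, not the implementation used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2PL item bank: 250 items with discrimination a and difficulty b.
N_ITEMS, N_PERSONS = 250, 2000
a = rng.lognormal(mean=0.0, sigma=0.3, size=N_ITEMS)        # discriminations (assumed)
b = rng.normal(0.0, 1.0, size=N_ITEMS)                      # difficulties (assumed)
theta_true = rng.normal(0.0, 1.0, size=N_PERSONS)           # abilities, M = 0, SD = 1

def p_correct(theta, a_j, b_j):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a_j * (theta - b_j)))

def fisher_info(theta, a_j, b_j):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = p_correct(theta, a_j, b_j)
    return a_j ** 2 * p * (1.0 - p)

QUAD = np.linspace(-4.0, 4.0, 81)                            # quadrature grid for EAP
PRIOR = np.exp(-0.5 * QUAD ** 2)                             # standard normal prior
PRIOR /= PRIOR.sum()

def eap_estimate(responses, items):
    """Expected a Posteriori estimate and posterior SD (used here as the SE)."""
    likelihood = np.ones_like(QUAD)
    for u, j in zip(responses, items):
        p = p_correct(QUAD, a[j], b[j])
        likelihood *= p if u == 1 else (1.0 - p)
    posterior = likelihood * PRIOR
    posterior /= posterior.sum()
    est = np.sum(QUAD * posterior)
    se = np.sqrt(np.sum((QUAD - est) ** 2 * posterior))
    return est, se

def run_cat(theta, max_items=40, se_stop=0.20):
    """One adaptive test: MFI item selection, EAP scoring, SE or length stop."""
    items, responses = [], []
    est, se = 0.0, np.inf
    while len(items) < max_items and se > se_stop:
        info = fisher_info(est, a, b)
        info[items] = -np.inf                                # do not reuse items
        j = int(np.argmax(info))                             # maximum Fisher information
        u = int(rng.random() < p_correct(theta, a[j], b[j])) # simulated response
        items.append(j)
        responses.append(u)
        est, se = eap_estimate(responses, items)
    return est, se, len(items)

# One cell of the design: EAP scoring, MFI selection, SE < .20 stopping rule
# (run on the first 100 simulees to keep the example fast).
results = [run_cat(t, max_items=40, se_stop=0.20) for t in theta_true[:100]]
print("average test length:", np.mean([r[2] for r in results]))
```

Setting se_stop to 0 turns the same routine into the fixed 40-item condition, replacing eap_estimate with a maximum-likelihood routine gives the MLE cells, and the remaining item selection methods (a-stratification, Likelihood Weight Information Criterion, Gradual Information Ratio, Kullback-Leibler) differ only in the rule used to pick the next item inside run_cat.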

___

  • Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431-444. doi: 10.1177/014662168200600405
  • Bulut, O., & Sünbül, Ö. (2017). R programlama dili ile madde tepki kuramında monte carlo simülasyon çalışmaları [Monte Carlo simulation studies in item response theory with the R programming language]. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266-287. doi: 10.21031/epod.305821
  • Chang, H.-H., Qian, J., & Ying, Z. (2001). a-stratified multistage adaptive testing with b blocking. Applied Psychological Measurement, 25(4), 333-341. doi: 10.1177/01466210122032181
  • Chang, H.-H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20(3), 213-229. doi: 10.1177/014662169602000303
  • Chang, H.-H., & Ying, Z. (1999). a-stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23(3), 211-222. doi: 10.1177/01466219922031338
  • Chen, S.-Y., Ankenmann, R. D., & Chang, H. H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24(3), 241-255. doi: 10.1177/01466210022031705
  • Costa, D., Karino, C., Moura, F., & Andrade, D. (2009, June). A comparison of three methods of item selection for computerized adaptive testing. Paper session presented at the meeting of the 2009 GMAC Conference on Computerized Adaptive Testing. Retrieved from www.psych.umn.edu/psylabs/CATCentral/
  • Deng, H., & Chang, H. H. (2001, April). A-stratified computerized adaptive testing with unequal item exposure across strata. Paper session presented at the American Educational Research Association Annual Meeting 2001. Retrieved from https://www.learntechlib.org/p/93050/
  • Deng, H., Ansley, T., & Chang, H. H. (2010). Stratified and maximum information item selection procedures in computer adaptive testing. Journal of Educational Measurement, 47(2), 202-226. doi: 10.1111/j.1745-3984.2010.00109.x
  • Eggen, T. H. J. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23(3), 249-261. doi: 10.1177/01466219922031365
  • Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36-49. doi: 10.1111/emip.12111
  • Han, K. (2009). Gradual maximum information ratio approach to item selection in computerized adaptive testing (Graduate Management Admission Council Research Reports No. 09-07). USA.
  • Han, K. (2010). Comparison of non-Fisher information item selection criteria in fixed length computerized adaptive testing. Paper session presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
  • Han, K. (2012). SimulCAT: Windows software for simulating computerized adaptive test administration. Applied Psychological Measurement, 36(1), 64-66. doi: 10.1177/0146621611414407
  • Harwell, M. R., Stone, C. A., Hsu, T., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. doi: 10.1177/014662169602000201
  • Kingsbury, G. G., & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2(4), 359-375.
  • Linda, T. (1996, April). A comparison of the traditional maximum information method and the global information method in CAT item selection. Paper session presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
  • Orcutt, V. L. (2002, February). Computerized adaptive testing: Some issues in development. Paper session presented at the annual meeting of the Educational Research Exchange, Denton, TX.
  • Ree, M. J., & Jensen, H. E. (1983). Effects of sample size on linear equating of item characteristic curve parameters. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 135-146). London: Academic Press. doi: 10.1016/B978-0-12-742780-5.50017-2
  • Şahin, A., & Özbaşı, D. (2017). Effects of content balancing and item selection method on ability estimation in computerized adaptive testing. Eurasian Journal of Educational Research, 17(69), 21-36. Retrieved from http://dergipark.org.tr/ejer/issue/42462/511414
  • Sireci, S. (2003). Computerized adaptive testing: An introduction. In Wall & Walz (Eds.), Measuring up: Assessment issues for teachers, counselors and administrators (pp. 684-694). USA: CAPS Press.
  • Slater, S. C. (2001). Pretest item calibration within the computerized adaptive testing environment (Unpublished doctoral dissertation, University of Massachusetts Amherst). Retrieved from https://elibrary.ru/item.asp?id=5337539
  • Stocking, M. L. (1992). Controlling item exposure rates in a realistic adaptive testing paradigm (Research Report No. 93-2). Princeton, NJ: Educational Testing Service. doi: 10.1002/j.2333-8504.1993.tb01513.x
  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 101-133). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
  • Urry, V. W. (1977). Tailored testing: A successful application of latent trait theory. Journal of Educational Measurement, 14(2), 181-196.
  • Van Der Linden, W. J., & Glas, C. A. W. (2010). Elements of adaptive testing. Statistics for Social and Behavioral Sciences. New York, NY: Springer.
  • Veerkamp, W. J. J., & Berger, M. P. F. (1997). Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22(2), 203-226. doi: 10.3102/10769986022002203
  • Veldkamp, B. P. (2012). Ensuring the future of computerized adaptive testing. In T. J. H. M. Eggen & B. P. Veldkamp (Eds.), Psychometrics in practice at RCEC (pp. 35-46). Netherlands: RCEC, Cito.
  • Wainer, H. (Ed.) (2000). Computerized adaptive testing: A primer. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Wang, T., & Vispoel, W. P. (1998). Properties of ability estimation methods in computerized adaptive testing. Journal of Educational Measurement, 35(2), 109-135.
  • Weiss, D. J. (1983). Latent trait theory and adaptive testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 5-7). New York, NY: Academic Press.
  • Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375. doi: 10.1111/j.1745-3984.1984.tb01040.x
  • Weissman, A. (2003, April). Assessing the efficiency of item selection in computerized adaptive testing. Paper session presented at the annual meeting of the American Educational Research Association. Chicago, IL.
  • Wen, H., Chang, H., & Hau, K. (2001, April). Adaptation of a-stratified method in variable length computerized adaptive testing. Paper session presented at the American Educational Research Association Annual Meeting, Seattle, WA.
  • Yi, Q., & Chang, H. (2003). a-stratified CAT design with content blocking. British Journal of Mathematical and Statistical Psychology, 56(2), 359-378. doi: 10.1348/000711003770480084