Sümeyra SOYSAL, Esin YILMAZ KOĞAR

Item parameter recovery via traditional 2PL, Testlet and Bi-factor models for Testlet-Based tests

A testlet is a set of items based on a common stimulus. When testlets are used in a test, the local independence assumption may be violated, and in that case it is not appropriate to apply traditional item response theory models to the test. One of the most frequently used models for testlets is the testlet response theory (TRT) model. The bi-factor model and the traditional 2PL model are also used for testlet-based tests. This study examines how well these three calibration models recover item parameters from data generated under different conditions and compares the models' performance. For this purpose, data were generated by varying three factors: sample size (500, 1000, and 2000), testlet variance (.25, .50, and 1), and testlet size (4 and 10). The number of items in the test was fixed at 40, and 100 replications were run under each condition. The TRT model gave less biased results than the other two models, but the results of the bi-factor and TRT models became more similar as the sample size increased. Among the conditions examined, sample size had the greatest effect on parameter recovery.
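The simulation design described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes the three factors are fully crossed, that every item belongs to a testlet, and that responses follow a common two-parameter testlet parameterization in which the logit is a_i(θ_j − b_i − γ_jd(i)). The generating distributions for a, b, and θ, and the names `design_grid`, `simulate_testlet_data`, and `bias`, are hypothetical choices made only for this example.

```python
import itertools
import math
import random

N_ITEMS = 40                           # fixed test length (from the abstract)
SAMPLE_SIZES = (500, 1000, 2000)       # from the abstract
TESTLET_VARIANCES = (0.25, 0.50, 1.0)  # from the abstract
TESTLET_SIZES = (4, 10)                # items per testlet (from the abstract)


def design_grid():
    """All factor combinations, assuming a fully crossed design."""
    return list(itertools.product(SAMPLE_SIZES, TESTLET_VARIANCES, TESTLET_SIZES))


def simulate_testlet_data(n_persons, testlet_var, testlet_size, rng):
    """Generate 0/1 responses under an assumed 2PL testlet model."""
    n_testlets = N_ITEMS // testlet_size
    # Assumed generating distributions; the abstract does not state them.
    a = [rng.uniform(0.8, 2.0) for _ in range(N_ITEMS)]  # discriminations
    b = [rng.gauss(0.0, 1.0) for _ in range(N_ITEMS)]    # difficulties
    gamma_sd = math.sqrt(testlet_var)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # person ability
        # One person-specific random effect per testlet.
        gamma = [rng.gauss(0.0, gamma_sd) for _ in range(n_testlets)]
        row = []
        for i in range(N_ITEMS):
            d = i // testlet_size  # index of the testlet containing item i
            logit = a[i] * (theta - b[i] - gamma[d])
            p = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return a, b, data


def bias(estimates, truth):
    """Mean signed difference between estimated and generating values."""
    return sum(e - t for e, t in zip(estimates, truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(2024)
    for n, var, size in design_grid():
        a, b, data = simulate_testlet_data(n, var, size, rng)
        # Each data set would then be calibrated with the 2PL, TRT, and
        # bi-factor models, and the estimates compared with a and b.
        print(n, var, size, len(data))
```

In a full study, each condition would be replicated 100 times and recovery statistics such as bias would be averaged over replications; the calibration step is left out here because it depends on the estimation software used.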

Keywords:

Local dependence, Testlet model, Bi-factor model, Random effect parameter, Parameter recovery
