Item Parameter Estimation for Dichotomous Items Based on Item Response Theory: Comparison of BILOG-MG, Mplus and R (ltm)

The aim of this study is twofold. The first aim is to investigate the effect of sample size and test length on the estimation of item parameters and their standard errors for the two-parameter item response theory (IRT) model. The second is to provide information about the performance of the Mplus, BILOG-MG and R (ltm) programs in terms of parameter estimation under these conditions. Simulated data were used in this study. The examinee responses were generated using the open-source program R. After the data sets were obtained, the parameters were estimated in BILOG-MG, Mplus and R (ltm). The accuracy of the item parameter and ability estimates was evaluated under six conditions that differed in the numbers of items and examinees. Based on the resulting bias and root mean square error (RMSE) values, it can be concluded that Mplus is unbiased relative to BILOG-MG and R (ltm), whereas BILOG-MG estimates parameters and standard errors closer to the true values than Mplus and R (ltm).
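The simulation-and-evaluation design described above can be illustrated with a minimal sketch. It is written in Python rather than R purely for illustration and is not the authors' code. It generates dichotomous responses under a two-parameter logistic model and computes bias and RMSE using their common textbook definitions. The sample size, test length, parameter distributions, seed, and function names are all assumptions, and the "estimates" are placeholders standing in for the output of a calibration program.

```python
import math
import random


def simulate_2pl(a, b, theta, rng):
    """Return a persons-by-items list of 0/1 responses under the 2PL model."""
    data = []
    for t in theta:
        row = []
        for a_j, b_j in zip(a, b):
            # Probability of a correct response: logistic in a_j * (theta - b_j)
            p = 1.0 / (1.0 + math.exp(-a_j * (t - b_j)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def bias(estimates, true_values):
    """Mean signed difference between estimates and true values."""
    diffs = [e - t for e, t in zip(estimates, true_values)]
    return sum(diffs) / len(diffs)


def rmse(estimates, true_values):
    """Root mean square error between estimates and true values."""
    sq = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical condition: 10 items, 500 examinees (not taken from the study).
rng = random.Random(2024)
n_items, n_persons = 10, 500
a_true = [rng.uniform(0.5, 2.0) for _ in range(n_items)]  # discriminations
b_true = [rng.gauss(0.0, 1.0) for _ in range(n_items)]    # difficulties
theta = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]   # abilities

responses = simulate_2pl(a_true, b_true, theta, rng)

# Placeholder estimates: in practice these would come from BILOG-MG, Mplus,
# or ltm. Here each difficulty is simply shifted by 0.1 for demonstration.
b_hat = [b + 0.1 for b in b_true]
print("bias:", bias(b_hat, b_true))
print("RMSE:", rmse(b_hat, b_true))
```

In a full study of this kind, the generation step would typically be repeated over many replications per condition, and the bias and RMSE values would be averaged across replications before the programs are compared.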
