A Study on the Identification of Latent Classes Using Mixture Item Response Theory Models: TIMSS 2015 Case

This study examined the existence of latent classes in TIMSS 2015 data. Responses of students from three countries (Singapore, Turkey, and South Africa) to 18 multiple-choice items in the science subtest were analyzed using Mixture Item Response Theory (MixIRT) models (Rasch, 1PL, 2PL, and 3PL). Based on the findings, it was concluded that the data obtained from the TIMSS 2015 8th-grade science subtest have a heterogeneous structure consisting of two latent classes. An examination of the item difficulty parameters in the two classes showed that, for Singapore, the items were considerably easy for the students in Class 1 and easy for the students in Class 2; for Turkey, the items were easy for the students in Class 1 and difficult for the students in Class 2; and for South Africa, the items were slightly easy for the students in Class 1 and considerably difficult for the students in Class 2. The findings were discussed in the context of the assumption of parameter invariance and test validity.
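
The kind of two-class structure described above can be illustrated with a small simulation. The sketch below is hypothetical code, not the authors' analysis (which would typically be carried out in dedicated software such as Mplus): it simulates dichotomous responses from two latent classes whose Rasch item difficulties are ordered in opposite directions, fits one- and two-class mixture Rasch models with an EM algorithm over a fixed N(0, 1) ability grid, and compares the solutions by BIC. All names, sample sizes, and starting values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(7)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- simulate two latent classes whose item difficulty orderings differ ---
# (hypothetical data standing in for the 18 TIMSS science items)
n_per_class, n_items = 400, 18
b1 = np.linspace(-1.5, 1.5, n_items)
b_true = np.array([b1, -b1])              # class 2 reverses the ordering
X_parts, true_class = [], []
for g in range(2):
    theta = rng.normal(0.0, 1.0, n_per_class)
    p = sigmoid(theta[:, None] - b_true[g][None, :])
    X_parts.append((rng.random((n_per_class, n_items)) < p).astype(float))
    true_class += [g] * n_per_class
X = np.vstack(X_parts)
true_class = np.array(true_class)

# --- fit a G-class mixture Rasch model by EM over a fixed theta grid ---
def fit_mix_rasch(X, G, n_iter=150, seed=0):
    r = np.random.default_rng(seed)
    n, I = X.shape
    nodes = np.linspace(-4.0, 4.0, 21)             # ability quadrature grid
    wq = np.exp(-0.5 * nodes**2); wq /= wq.sum()   # N(0, 1) grid weights
    b = r.normal(0.0, 0.5, (G, I))                 # per-class difficulties
    pi = np.full(G, 1.0 / G)                       # class proportions
    loglik = -np.inf
    for _ in range(n_iter):
        # E-step: joint posterior over (class, theta node) for each person
        logp = np.empty((n, G, nodes.size))
        for g in range(G):
            p = sigmoid(nodes[:, None] - b[g][None, :])            # (Q, I)
            logp[:, g, :] = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
        logp += np.log(pi)[None, :, None] + np.log(wq)[None, None, :]
        m = logp.max(axis=(1, 2), keepdims=True)
        loglik = (m.squeeze() + np.log(np.exp(logp - m).sum(axis=(1, 2)))).sum()
        post = np.exp(logp - m)
        post /= post.sum(axis=(1, 2), keepdims=True)
        # M-step: proportions in closed form, difficulties by Newton steps
        pi = post.sum(axis=(0, 2)) / n
        for g in range(G):
            resp = post[:, g, :].sum(axis=1)        # class responsibilities
            Wq = post[:, g, :].sum(axis=0)          # expected node weights
            exp_score = resp @ X                    # expected correct counts
            for _ in range(3):
                p = sigmoid(nodes[:, None] - b[g][None, :])
                b[g] += (Wq @ p - exp_score) / (Wq @ (p * (1 - p)))
    k = G * I + (G - 1)                             # free parameter count
    bic = -2.0 * loglik + k * np.log(n)
    assign = post.sum(axis=2).argmax(axis=1)
    return b, pi, assign, loglik, bic

_, _, _, _, bic1 = fit_mix_rasch(X, G=1)
best = min((fit_mix_rasch(X, G=2, seed=s) for s in range(3)),
           key=lambda fit: -fit[3])                 # best of 3 random starts
b_hat, pi_hat, assign, _, bic2 = best
acc = max(np.mean(assign == true_class), np.mean(assign != true_class))
print(f"BIC 1-class: {bic1:.1f}  BIC 2-class: {bic2:.1f}  recovery: {acc:.2f}")
```

In a real analysis the class-specific difficulty estimates (`b_hat[0]` and `b_hat[1]` here) would be inspected item by item, as in the country-level comparisons reported above, and the number of classes would be chosen by comparing information criteria (AIC, BIC) across candidate models.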

___

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
  • Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). New York, NY: Springer.
  • Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42, 133–148.
  • Cook, L. (2006). Practical considerations in linking scores on adapted tests. Keynote address at the 5th International Meeting of the International Test Commission, Brussels, Belgium.
  • de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford Press.
  • de Ayala, R. J. & Santiago, S. Y. (2017). An introduction to mixture item response theory models. Journal of School Psychology, 60, 25-40.
  • DeMars, C. (2010). Item response theory. New York: Oxford University Press.
  • DeMars, C. E., & Lau, A. (2011). Differential item functioning detection with latent classes: How accurately can we detect who is responding differentially? Educational and Psychological Measurement, 71, 597–616.
  • Dorans, N. J., & Kingston, N. M. (1985). The effects of violations of unidimensionality on the estimation of item and ability parameters and on item response theory equating of the GRE verbal scale. Journal of Educational Measurement, 22(4), 249-262.
  • Eid, M., & Rauber, M. (2000). Detecting measurement invariance in organizational surveys. European Journal of Psychological Assessment, 16(1), 20–30.
  • Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. New Jersey: Lawrence Erlbaum Associates, Inc.
  • Embretson, S. E. (2007). Mixed Rasch models for measurement in cognitive psychology. In M. von Davier, & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: extensions and applications (pp. 235-253). New York: Springer Verlag.
  • Finch, W. H., & French, B. F. (2012). Parameter estimation with mixture item response theory models: A Monte Carlo comparison of maximum likelihood and Bayesian methods. Journal of Modern Applied Statistical Methods, 11(1), 167-178.
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th ed.). New York: McGraw-Hill.
  • Glück, J., & Spiel, C. (2007). Studying development via item response models: A wide range of potential uses. In M. von Davier, & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions and applications (pp. 281-292). New York: Springer Verlag.
  • Goldstein, H. (2004). International comparisons of student attainment: some issues arising from the PISA study. Assessment in Education: Principles, Policy & Practice, 11(3), 319–330. doi:10.1080/0969594042000304618.
  • Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69–86.
  • Hambleton, R. K., Swaminathan, H. & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park, CA: Sage.
  • Harris, D. (1989). Comparison of 1-, 2-, and 3-parameter IRT models. Educational Measurement: Issues and Practice, 8(1), 35–41.
  • Hong, S. (2007). Mixed Rasch modeling of the self-rating depression scale. Educational and Psychological Measurement, 67, 280-299.
  • Kreiner, S., & Christensen, K. B. (2007). Validity and objectivity in health-related scales: Analysis by graphical loglinear Rasch models. In M. von Davier, & C. H. Carstensen (Eds.). Multivariate and mixture distribution Rasch models: Extensions and applications (pp. 329-346). New York: Springer Verlag.
  • Kreiner, S., & Christensen, K. B. (2014). Analyses of model fit and robustness: A new look at the PISA scaling model underlying ranking of countries according to Reading literacy. Psychometrika, 79(2), 210–231.
  • Kutscher, T., Eid, M., & Crayen, C. (2019). Sample size requirements for applying mixed Polytomous item response models: Results of a Monte Carlo simulation study. Frontiers in Psychology, 10, 2494.
  • Li, F., Cohen, A. S., Kim, S.-H., & Cho, S.-J. (2009). Model selection methods for mixture dichotomous IRT models. Applied Psychological Measurement, 33, 353-373.
  • Liu, H., Liu, Y., & Li, M. (2018). Analysis of process data of PISA 2012 computer-based problem solving: Application of the modified multilevel mixture IRT model. Frontiers in Psychology, 9.
  • Maij-de Meij, A. M., Kelderman, H., & van der Flier, H. (2010). Improvement in detection of differential item functioning using a mixture item response theory model. Multivariate Behavioral Research, 45, 975-999.
  • Martin, M. O., Mullis, I. V. S., & Hooper, M. (2016). Methods and procedures in TIMSS 2015. Chestnut Hill, MA: Boston College, TIMSS & PIRLS International Study Center.
  • McLachlan, G., & Peel, D. (2000). Finite mixture models. New York, NY: John Wiley.
  • Meiser, T., & Machunsky, M. (2008). The personal structure of personal need for structure: A mixture-distribution Rasch analysis. European Journal of Psychological Assessment, 24(1), 27–34.
  • Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.
  • Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23,13–23.
  • Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.
  • Mislevy, R. J., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55, 195-215.
  • Mislevy, R., & Huang, C., W. (2007). Measurement models as narrative structures. In M. von Davier, & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions and applications (pp. 15-35). New York: Springer Verlag.
  • Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (Eighth Edition). Los Angeles, CA: Muthén & Muthén.
  • Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.
  • Oliveri, M. E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53(3), 315–333.
  • Oliveri, M. E., Ercikan, K., Zumbo, B. D., & Lawless, R. (2014). Uncovering substantive patterns in student responses in international large-scale assessments—Comparing a latent class to a manifest DIF approach. International Journal of Testing, 14(3), 265-287.
  • Oliveri, M. E., & von Davier, M. (2014). Toward increasing fairness in score scale calibrations employed in international large-scale assessments. International Journal of Testing, 14(1), 1–21. doi:10.1080/15305058.2013.825265.
  • Park, Y. S., Lee, Y.-S., & Xing, K. (2016). Investigating the impact of item parameter drift for item response theory models with mixture distributions. Frontiers in Psychology, https://doi.org/10.3389/fpsyg.2016.
  • Preinerstorfer, D., & Formann, A. K. (2012). Parameter recovery and model selection in mixed Rasch models. British Journal of Mathematical and Statistical Psychology, 65, 251–262.
  • Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
  • Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271-282.
  • Rost, J., & von Davier, M. (1993). Measuring different traits in different populations with the same items. In R. Steyer, K. F. Wender, & K. F. Widaman (Eds.), Psychometric Methodology: Proceedings of the 7th European Meeting of the Psychometric Society in Trier (pp. 446-450).
  • Rost, J., Carstensen, C., & von Davier, M. (1997). Applying the mixed Rasch model to personality questionnaires. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 324-332). New York, NY: Waxmann.
  • Samuelsen, K. (2005). Examining differential item functioning from a latent class perspective. Doctoral Dissertation. Available from ProQuest Dissertations and Theses database. (UMI No. 3175148)
  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
  • Sen, S., Cohen, A. S., & Kim, S. H. (2016). The impact of non-normality on extraction of spurious latent classes in mixture IRT models. Applied Psychological Measurement, 40, 98-113. doi:10.1177/0146621615605080.
  • Sen, S. (2018). Spurious latent class problem in the mixed Rasch model: A comparison of three maximum likelihood estimation methods under different ability distributions. International Journal of Testing, 18(1), 71-100. doi:10.1080/15305058.2017.1312408.
  • Toker, T. (2016). A comparison of latent class analysis and the mixed Rasch model: A cross-cultural comparison of 8th grade mathematics achievement in the Fourth International Mathematics and Science Study (TIMSS-2011). Doctoral Dissertation, The Faculty of the Morgridge College of Education, University of Denver, USA.
  • von Davier, M., & Yamamoto, K. (2007). Mixture-distribution and hybrid Rasch models. In M. von Davier, & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions and applications (pp. 99-115). New York: Springer Verlag.
  • Xu, Y. (2009). Measuring change in jurisdiction achievement over time: Equating issues in current international assessment programs. Doctoral Dissertation, University of Toronto.