PISA 2015 Reading Test Item Parameters Across Language Groups: A Measurement Invariance Study with Binary Variables

Large-scale international assessments such as PISA can give countries feedback on their education systems. Measurement invariance is an active research area for these assessments, and cross-cultural and linguistic comparability in particular has attracted attention. PISA questions are prepared in English, and students in many countries answer translated forms. The purpose of this study is therefore to investigate whether measurement invariance holds across native and non-native English speaker groups in the PISA 2015 reading skills subtest. The sample included students from Canada, the USA, and the UK as the native speaker group and students from Japan, Thailand, and Turkey as the non-native speaker group. Measurement invariance analyses that take into account the binary structure of the data for these two groups indicated that eight of the twenty-eight items in the PISA 2015 reading skills test had possible limitations in equivalence.
