A Comparison of Covariates, Equating Designs, and Methods in Equating TIMSS 2019 Science Tests

This research compared the equated scores produced by methods based on classical test theory (CTT) and kernel equating under the non-equivalent groups with covariates (NEC) design and the non-equivalent groups with anchor test (NEAT) design. TIMSS 2019 science test scores were equated with the Tucker, Levine true score, Levine observed score, and equipercentile (pre-smoothing and post-smoothing) methods in CTT, and with the linear and equipercentile methods in kernel equating. The covariates in the NEC design were “home resources for learning,” “student confidence in science and mathematics,” “like learning science,” “instructional clarity in science lessons,” “math achievement,” “sex,” and “speaking the language of the test at home.” The equating results under NEC were compared with those under NEAT and the equivalent groups (EG) design. The participants comprised 1699 4th-grade students who took part in e-TIMSS 2019 in Canada, Singapore, and Chile. Results were analyzed in terms of equating errors and differences between equated scores. The research concluded that math achievement and home resources for learning could be used as covariates in NEC to equate the science test when equating could not be done under NEAT. However, when the other variables were used as covariates in NEC, the equated scores were very similar to those under EG. In linear equating, Tucker (CTT) and post-stratification (kernel) yielded similar equated scores, and both differed to a similar degree from kernel linear equating under EG. In equipercentile equating, the equated scores obtained from post-smoothing (CTT) and EG were close to each other but differed slightly from those of post-stratification.
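As a rough illustration of what a linear equating function does, the sketch below maps scores on one form onto the scale of another by matching means and standard deviations. It is only the basic linear transformation for the equivalent groups design, not the Tucker, Levine, or kernel procedures used in the study. The function name `linear_equate` and the score lists are hypothetical and are not taken from the TIMSS data.

```python
from statistics import mean, pstdev


def linear_equate(x_scores, y_scores):
    """Return a function mapping Form X scores onto the Form Y scale.

    Basic linear equating for equivalent groups (an illustrative
    assumption, not the study's procedure):
        l_Y(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)
    """
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)
    if sd_x == 0:
        raise ValueError("Form X scores have zero variance")
    slope = sd_y / sd_x

    def to_y_scale(x):
        # Keep the standardized deviation from the mean equal on both forms.
        return mu_y + slope * (x - mu_x)

    return to_y_scale


# Hypothetical raw scores from two randomly equivalent groups.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

equate = linear_equate(form_x, form_y)
print(equate(14))  # a Form X score at the mean maps to the Form Y mean
```

Under NEAT or NEC the groups are not equivalent, so the means and standard deviations would first have to be estimated for a common (synthetic) population, using the anchor test or the covariates, before a transformation of this form is applied.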




APA: Sezer Başaran, E., Mutluer, C., & Çakan, M. (2023). A Comparison of Covariates, Equating Designs, and Methods in Equating TIMSS 2019 Science Tests. Participatory Educational Research, 10(5), 41-63. DOI: 10.17275/per.
Participatory Educational Research
  • ISSN: 2148-6123
  • Publication frequency: 4 issues per year
  • Publisher: Özgen KORKMAZ

