An Investigation of Group Invariance in Test Equating According to Gender

This study investigates whether the group invariance condition holds for two linear equating methods: Tucker and Levine observed score equating. Booklets 4 and 6 of the PISA 2012 Mathematics subtest were used. The booklets were equated for the whole group and for the gender subgroups, and WMSE values and group invariance were then calculated for each condition. Group invariance was evaluated with the REMSD and RMSD(x) indexes. Comparing WMSE values across the equating methods showed that Tucker observed score equating produced the lowest error for both the whole group and the gender subgroups. The RMSD and REMSD values obtained for the gender subgroups were smaller than the criterion value for the Tucker method but greater than the criterion value for the Levine method. The group invariance condition was therefore met for Tucker observed score equating but not for Levine observed score equating.
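As an illustration of the two invariance indexes named above, the following is a minimal sketch that assumes their usual definitions in the population invariance literature: RMSD(x) is the weighted root mean square difference between the subgroup and whole-group equated scores at score point x, divided by the standard deviation of the reference form, and REMSD averages the squared differences over the whole-group score distribution before taking the root. The function names, the subgroup weights, and all numbers are hypothetical; they are not the study's data, and the abstract does not state the exact computation or the criterion value that was used.

```python
import math


def rmsd(sub_equated, total_equated, weights, sd_y):
    """RMSD(x) for every score point x.

    sub_equated   -- one list of equated scores per subgroup
    total_equated -- equated scores from the whole-group function
    weights       -- subgroup proportions (assumed to sum to 1)
    sd_y          -- standard deviation of the reference form scores
    """
    values = []
    for i, total in enumerate(total_equated):
        squared = sum(w * (sub[i] - total) ** 2
                      for w, sub in zip(weights, sub_equated))
        values.append(math.sqrt(squared) / sd_y)
    return values


def remsd(sub_equated, total_equated, weights, score_probs, sd_y):
    """REMSD: a single summary value over all score points.

    score_probs -- relative frequency of each score point in the
                   whole group (assumed to sum to 1)
    """
    expected = 0.0
    for i, (total, prob) in enumerate(zip(total_equated, score_probs)):
        squared = sum(w * (sub[i] - total) ** 2
                      for w, sub in zip(weights, sub_equated))
        expected += prob * squared
    return math.sqrt(expected) / sd_y


# Hypothetical equated scores at three score points (not study data).
total_fn = [0.0, 1.0, 2.0]    # whole-group equating function
female_fn = [0.5, 1.5, 2.5]   # subgroup function, shifted up by 0.5
male_fn = [-0.5, 0.5, 1.5]    # subgroup function, shifted down by 0.5
group_weights = [0.5, 0.5]
probs = [0.25, 0.5, 0.25]
sd_reference = 2.0

rmsd_values = rmsd([female_fn, male_fn], total_fn, group_weights, sd_reference)
remsd_value = remsd([female_fn, male_fn], total_fn, group_weights, probs,
                    sd_reference)
print(rmsd_values, remsd_value)
```

Under this sketch, each index would then be compared with a chosen criterion value; an index below the criterion suggests that the subgroup equating functions are close enough to the whole-group function for invariance to be considered met.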

