Gender differential item functioning in mathematics in four international jurisdictions


Past research has shown a male advantage over females in mathematics, as well as gender differences in mathematics performance by item type. However, most of this research has examined gender differences within a single cultural group. This study examines gender differential item functioning (DIF) across four jurisdictions that took part in a large-scale international assessment: Canada, Shanghai (China), Finland, and Turkey. The findings from each jurisdiction were consistent with previous research: selected-response item formats tended to favour males, while constructed-response formats tended to favour females. Across jurisdictions, similar proportions of items favoured each gender, and Finland had the greatest number of items exhibiting DIF, both for items favouring males and for items favouring females.
