A Study on Detecting Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

Problem Statement: An item shows bias when examinees from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to it despite having the same skill levels. Tests and items must be free of bias to ensure the accuracy of decisions made on the basis of test scores; items should therefore be screened for bias during test development and adaptation. Items used in testing programs such as the Programme for International Student Assessment (PISA), whose results inform educational policies throughout the participating countries, should likewise be reviewed for bias. This study examines whether the items of the PISA 2006 science literacy test administered in Turkey show bias.

Purpose of the Study: The aim of this study is to analyze the measurement equivalence of the PISA 2006 science literacy test in the Turkish and American groups in terms of structural invariance, and to determine whether the science literacy items show inter-cultural bias.
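To make the notion of DIF concrete, the sketch below illustrates one widely used screening technique, the logistic regression DIF procedure, on simulated data: the item response is modeled from a matching ability score, group membership, and their interaction, and likelihood-ratio tests on the group and interaction terms flag uniform and non-uniform DIF, respectively. This is a generic illustration, not the analysis reported in the study; the simulated data, variable names, and the statsmodels dependency are all assumptions.

import numpy as np
import statsmodels.api as sm

# Simulated example: 2,000 examinees, one dichotomous item with uniform DIF
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)        # 0 = reference group, 1 = focal group
ability = rng.normal(0.0, 1.0, n)    # matching variable (e.g., total test score)
# The item is 0.5 logits harder for the focal group at every ability level
p = 1.0 / (1.0 + np.exp(-(0.8 * ability - 0.5 * group)))
correct = rng.binomial(1, p)

def fit(*cols):
    # Logistic regression of the item response on the given predictors
    return sm.Logit(correct, sm.add_constant(np.column_stack(cols))).fit(disp=0)

m1 = fit(ability)                            # ability only
m2 = fit(ability, group)                     # + group term: uniform DIF
m3 = fit(ability, group, ability * group)    # + interaction: non-uniform DIF

# Likelihood-ratio chi-square statistics (1 df each) between nested models
print("uniform DIF:     chi2(1) = %.2f" % (2 * (m2.llf - m1.llf)))
print("non-uniform DIF: chi2(1) = %.2f" % (2 * (m3.llf - m2.llf)))

In practice the same three nested models are fitted for each item in turn, with the remaining items supplying the matching score, and items whose chi-square statistics exceed the critical value are flagged for substantive review.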
Keywords:

