Investigation of Scale Transformation Methods in True Score Equating Based on Item Response Theory

This study compared the equating errors of scale transformation methods (mean-mean (MM), mean-sigma (MS), Stocking-Lord (SL), and Haebara (HB)) in true score equating based on Item Response Theory (IRT) under different conditions. To compare the errors of the methods, 7,200 dichotomous data sets consistent with the two- and three-parameter logistic models (2PLM and 3PLM) were generated with 50 replications under the conditions of sample size (500; 1,000; 3,000; 10,000), test length (40, 50, 80), common-item ratio (20%, 30%, 40%), parameter estimation model (2PLM and 3PLM), and group ability distributions (similar: N(0,1) and N(0,1); different: N(0,1) and N(0.5,1)). The common-item nonequivalent groups (NEAT) design was used as the data collection design. R software was used for data generation and analysis. Findings were evaluated using the equating error (RMSD) criterion. Across all conditions, the RMSD values of the SL method were higher than those of the other methods, while the MM and MS methods produced similar RMSD values. In addition, when the RMSD values of the scale transformation methods were compared, similar results were obtained with 2PLM and 3PLM; except for the SL method, the equating errors of the methods decreased as sample size and test length increased; and the methods yielded lower RMSD values when the common-item ratio was 40% and the group ability distributions were similar.

Investigation of Scale Transformation Methods in True Score Equating Based on Item Response Theory

This study aimed to compare the equating errors of scale transformation methods (mean-mean (MM), mean-sigma (MS), Haebara (HB), and Stocking-Lord (SL)) in true score equating based on item response theory (IRT) under different conditions. In line with the purpose of the study, 7,200 dichotomous data sets consistent with the two- and three-parameter logistic models were generated with 50 replications under the conditions of sample size (500; 1,000; 3,000; 10,000), test length (40, 50, 80), common-item ratio (20%, 30%, 40%), parameter estimation model (two- and three-parameter logistic models (2PLM and 3PLM)), and group ability distributions (similar (N(0,1) and N(0,1)) or different (N(0,1) and N(0.5,1))). The common-item nonequivalent groups (NEAT) equating design was used. R software was used for data generation and analyses. Results were evaluated using the equating error (RMSD) criterion. Considering all conditions, the RMSD values of the SL method were higher than those of the other methods, while the MM and MS methods produced similar RMSD values. In addition, when the RMSD values of the scale transformation methods were compared, similar results were obtained with 2PLM and 3PLM; the equating errors of the methods other than SL decreased as sample size and test length increased; and the methods had lower RMSD values when the common-item ratio was 40% and the group ability distributions were similar.
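All four scale transformation methods estimate the slope A and intercept B of the linear transformation between the two IRT ability scales (θ_old = A·θ_new + B) from the common items: MM and MS use moments of the item parameter estimates, while HB and SL minimize characteristic-curve criteria. The study carried out its analyses in R; the sketch below is a minimal Python illustration of the two moment methods and the RMSD criterion, with function names and inputs chosen for this example only.

```python
import numpy as np

def mean_sigma(b_new, b_old):
    """Mean-sigma coefficients for theta_old = A*theta_new + B,
    from common-item difficulty (b) estimates on each scale."""
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

def mean_mean(a_new, a_old, b_new, b_old):
    """Mean-mean coefficients: slope from mean discriminations
    (a_old = a_new / A), intercept from mean difficulties."""
    A = np.mean(a_new) / np.mean(a_old)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

def rmsd(estimated, criterion):
    """Root mean square difference between an estimated equating
    relationship and the criterion equating, over score points."""
    estimated = np.asarray(estimated, dtype=float)
    criterion = np.asarray(criterion, dtype=float)
    return float(np.sqrt(np.mean((estimated - criterion) ** 2)))
```

With error-free parameters generated under a known transformation, both moment methods recover A and B exactly; in the simulation, estimation error in a and b is what differentiates the methods' RMSD values.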

___

  • Aksekioğlu, B. (2017). Madde tepki kuramına dayalı test eşitleme yöntemlerinin karşılaştırılması: PISA 2012 Fen testi örneği (Tez Numarası: 454879) [Yüksek lisans tezi, Akdeniz Üniversitesi]. Yükseköğretim Kurulu Ulusal Tez Merkezi. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). American Council on Education.
  • Angoff, W. H. (1984). Scales, norms and equivalent scores. Educational Testing Service.
  • Arai, S., and Mayekawa, S. (2011). A comparison of equating methods and linking designs for developing an item pool under item response theory. Behaviormetrika, 38(1), 1-16. https://link.springer.com/article/10.2333/bhmk.38.1
  • Babcock, B., Albano, A., and Raymond, M. (2012). Nominal weights mean equating: A method for very small samples. Educational and Psychological Measurement, 72(4), 608–628. https://doi.org/10.1177/0013164411428609
  • Baker, F. B., and Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28(2), 147-162. http://www.jstor.org/stable/1434796
  • Barnard, J. J. (1996). In search of equity in educational measurement: traditional versus modern equating methods. [Paper presentation]. ASEESA’s National Conference at the HSRC Conference Center 1996 Annual Meeting, Pretoria, South Africa.
  • Bastari, B. (2000). Linking multiple-choice and constructed-response items to a common proficiency scale [Doctoral dissertation, University of Massachusetts Amherst]. https://scholarworks.umass.edu/dissertations_1/5557
  • Brossman, B. G., and Lee. W-C. (2013). Observed score and true score equating procedures for multidimensional item response theory. Applied Psychological Measurement, 37(6), 460–481. https://doi.org/10.1177/0146621613484083
  • Budescu, D. (1985). Efficiency of linear equating as a function of the length of the anchor test. Journal of Educational Measurement, 22(1), 13-20. https://www.jstor.org/stable/1434562
  • Caldwell, L. J. (1984). A comparison of equating error in linear and Rasch model test equating method [Unpublished doctoral dissertation]. Florida State University.
  • Cao, Y. (2008). Mixed-format test equating: Effects of test dimensionality and common-item sets (Publication No. 3341415) [Doctoral dissertation, University of Maryland-Maryland]. ProQuest Dissertations and Theses Global.
  • Chen, H. W. (2001). Calibration of the ITBS test battery to the complete test battery: A comparison of five linking methods (Publication No. 3009576) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations and Theses Global.
  • Cho, Y. (2007). Comparison of bootstrap standard errors of equating using IRT and equipercentile methods with polytomously-scored items under the common-item-nonequivalent-group design (Publication No. 3301690) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations and Theses Global.
  • Chon, K. H., Lee, W. C., and Ansley, T. N. (2007). Assessing IRT model-data fit for mixed format tests (No.26). Center for Advanced Studies in Measurement and Assessment. https://www.semanticscholar.org/paper/Number-26-Assessing-IRT-Model-Data-Fit-for-Mixed-%E2%88%97-Chon-Lee/49c57e474a54beed3010ab0f2af64985ce6ddb50
  • Chu, K-L. (2002). Equivalent group test equating with the presence of differential item functioning (Publication No. 3065477) [Doctoral dissertation, The Florida State University]. ProQuest Dissertations and Theses Global.
  • Cook, L. L., and Eignor, D. R. (1991). An NCME instructional module on IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 37-45. https://eric.ed.gov/?id=EJ436860
  • Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory. Harcourt Brace Jovanovich College.
  • Cui, Z., and Kolen, M. J. (2008). Comparison of parametric and nonparametric bootstrap methods for estimating random error in equipercentile equating. Applied Psychological Measurement, 32(4), 334-347. https://doi.org/10.1177/0146621607300854
  • Dorans, N. J. (1990). Equating methods and sampling designs. Applied Measurement in Education, 3(1), 3-17. https://doi.org/10.1207/s15324818ame0301_2
  • Dorans, N. J., and Holland P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37(4), 281-306. https://doi.org/10.1111/j.1745-3984.2000.tb01088.x
  • Dorans, N. J., Moses, T. P., and Eignor, D. R. (2010). Principles and practices of test score equating (No.41). Educational Testing Service. https://www.ets.org/research/policy_research_reports/publications/report/2010/ilrs
  • Drasgow, F., Levine, M. V., Tsien, S., Williams, B., and Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19(2), 143-165. https://doi.org/10.1177/014662169501900203
  • Eid, G. K. (2005). The effects of sample size on the equating of test items. Education, 126(1), 165-180. https://www.thefreelibrary.com/The+effects+of+sample+size+on+the+equating+of+test+items.-a0136846803
  • Felan, G. D. (2002, February 14-16). Test equating: Mean, linear, equipercentile and item response theory. [Paper presentation]. Southwest Educational Research Association 2002 Annual Meeting, Austin, TX, United States.
  • Fraenkel, J. R., Wallen, N. E., and Hyun, H. H. (2012). How to design and evaluate research in education. McGraw-Hill.
  • Godfrey, K. E. (2007). A comparison of Kernel equating and IRT true score equating methods (Publication No. 3273329) [Doctoral dissertation, The University of North Carolina-Chapel Hill]. ProQuest Dissertations and Theses Global.
  • González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59(7), 1-30. https://www.jstatsoft.org/index
  • Gök, B., and Kelecioğlu, H. (2014). Denk olmayan gruplarda ortak madde deseni kullanılarak madde tepki kuramına dayalı eşitleme yöntemlerinin karşılaştırılması. Mersin Üniversitesi Eğitim Fakültesi Dergisi, 10(1), 120-136. https://dergipark.org.tr/tr/pub/mersinefd/issue/17393/181786
  • Gül, E., Doğan-Gül, Ç., Çokluk-Bökeoğlu, Ö., and Özkan, M. (2017). Temel eğitimden ortaöğretime geçiş matematik alt testi asıl sınav ve mazeret sınavlarının madde tepki kuramına göre eşitlenmesi. Abant İzzet Baysal Üniversitesi Eğitim Fakültesi Dergisi, 17(4), 1900-1915. https://dergipark.org.tr/tr/pub/aibuefd/issue/32772/363973
  • Gündüz, T. (2015). Test eşitlemede Madde Tepki Kuramına dayalı yetenek parametresine yönelik ölçek dönüştürme yöntemlerinin karşılaştırılması (Tez Numarası: 429524) [Yüksek lisans tezi, Gazi Üniversitesi]. Yükseköğretim Kurulu Ulusal Tez Merkezi. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144-149. https://doi.org/10.4992/psycholres1954.22.144
  • Hagge, S. L. (2010). The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups (Publication No. 3422144) [Doctoral dissertation, University of Iowa, Iowa]. ProQuest Dissertations and Theses Global.
  • Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
  • Han, K. T. (2008). Impact of item parameter drift on test equating and proficiency estimates. (Publication No. 3325324) [Doctoral dissertation, University of Massachusetts, Amherst]. ProQuest Dissertations and Theses Global.
  • Han, T., Kolen, M. J., and Pohlmann, J. (1997). A comparison among IRT true- and observed-score equating and traditional equipercentile equating. Applied Measurement in Education, 10(2), 105-121. https://doi.org/10.1207/s15324818ame1002_1
  • Hanson, B. A., and Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3-24. https://doi.org/10.1177/0146621602026001001
  • Harris, D. J., and Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6(3), 195–240. https://doi.org/10.1207/s15324818ame0603_3
  • Harris, D. J., and Kolen, M. J. (1986). Effect of examinee group on equating relationships. Applied Psychological Measurement, 10(1), 35-43. https://doi.org/10.1177/014662168601000103
  • Harwell, M., Stone, C. A., Hsu, T. C., and Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000201
  • He, Q. (2010). Maintaining standards in on-demand testing using item response theory (No.10/4724). Office of Qualifications and Examinations Regulation. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/att achment_data/file/605861/0210_QingpingHe_Maintaining-standards.pdf
  • He, Y. (2011). Evaluating equating properties for mixed-format tests (Publication No. 3461151) [Doctoral dissertation, University of Iowa, Iowa City]. ProQuest Dissertations and Theses Global.
  • Hills, J. R., Subhiyah, R. G., and Hirsch, T. M. (1988). Equating minimum-competency tests: Comparisons of methods. Journal of Educational Measurement, 25(3), 221-231. https://www.jstor.org/stable/1434501
  • Holland, P. W., and Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187-220). Praeger.
  • Holland, P. W., Dorans, N. J., and Petersen, N. S. (2007). Equating test scores. In C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (pp. 169-203). Elsevier. https://doi.org/10.1016/S0169-7161(06)26006-1
  • Hu, H., Rogers, T. W., and Vukmirovic, Z. (2008). Investigation of IRT-based equating methods in the presence of outlier common items. Applied Psychological Measurement, 32(4), 311-333. https://doi.org/10.1177/0146621606292215
  • Ironson, G. H. (1983). Using item response theory to measure bias. In R. K. Hambleton (Ed.), Applications of item response theory (2nd ed., pp. 155-174). Educational Research Institute of British Columbia.
  • Kang, T., and Petersen, N. S. (2009, April 14-16). Linking item parameters to a base scale [Paper presentation]. National Council on Measurement in Education 2009 Annual Meeting, San Diego, CA, United States.
  • Karkee, T. B., and Wright, K. R. (2004, April 12-16). Evaluation of linking methods for placing three parameter logistic item parameter estimates onto a one-parameter scale [Paper presentation]. American Educational Research Association 2004 Annual Meeting, San Diego, CA, United States.
  • Kaskowitz, G. S., and De Ayala, R. J. (2001). The effect of error in item parameter estimates on the test response function method of linking. Applied Psychological Measurement, 25(1), 39-52. https://doi.org/10.1177/01466216010251003
  • Kilmen, S. (2010). Madde tepki kuramına dayalı test eşitleme yöntemlerinden kestirilen eşitleme hatalarının örneklem büyüklüğü ve yetenek dağılımına göre karşılaştırılması (Tez Numarası: 279926) [Yüksek lisans tezi, Ankara Üniversitesi]. Yükseköğretim Kurulu Ulusal Tez Merkezi. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • Kim, S., and Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2), 131-143. https://doi.org/10.1177/01466216980222003
  • Kim, S., and Hanson, B. A. (2002). Test equating under the multiple-choice model. Applied Psychological Measurement, 26(3), 255-270. https://doi.org/10.1177/0146621602026003002
  • Kim, S., and Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19(4), 357-381. https://doi.org/10.1207/s15324818ame1904_7
  • Kim, S., and Lee, W. C. (2004). IRT scale linking methods for mixed-format tests (No.5). American College Testing. https://www.act.org/content/dam/act/unsecured/documents/ACT_RR2004-5.pdf
  • Kim, S., and Lee, W. (2006). An extension of four IRT linking methods for mixed-format tests. Journal of Educational Measurement, 43(1), 53-76. https://www.jstor.org/stable/20461809
  • Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18(1), 1-11. https://www.jstor.org/stable/1434813
  • Kolen, M. J. (1985). Standard errors of Tucker equating. Applied Psychological Measurement, 9(2), 209-223. https://doi.org/10.1177/014662168500900209
  • Kolen, M. J. (1988). An NCME instructional module on traditional equating methodology. Educational Measurement: Issues and Practice, 7, 29-36. https://eric.ed.gov/?id=EJ388096
  • Kolen, M. J. (2007). Data collection designs and linking procedures. In N. J. Dorans, M. Pommerich, and P. W. Holland (Eds.), Linking and aligning scores and scales (2nd ed., pp. 31-55). Springer. https://doi.org/10.1007/978-0-387-49771-6_3
  • Kolen, M. J., and Brennan, R. L. (1995). Test equating: Methods and practices. Springer-Verlag.
  • Kolen, M. J., and Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. Springer.
  • Lee, G., and Fitzpatrick, A. R. (2008). A new approach to test score equating using item response theory with fixed c-parameters. Asia Pacific Education Review, 9(3), 248-261. https://www.springer.com/journal/12564
  • Lee, W. C., and Ban, J. C. (2009). Comparison of IRT linking procedures. Applied Measurement in Education, 23(1), 23-48. https://doi.org/10.1080/08957340903423537
  • Lee, Y. S. (2007). A comparison of methods for nonparametric estimation of item characteristic curves for binary items. Applied Psychological Measurement, 31(2), 121-134. https://doi.org/10.1177/0146621606290248
  • Li, D. (2009). Developing a common scale for testlet model parameter estimates under the common-item nonequivalent groups design (Publication No. 3359398) [Doctoral dissertation, University of Maryland, Maryland]. ProQuest Dissertations and Theses Global.
  • Li, Y. H., and Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24 (2), 115-138. https://doi.org/10.1177/01466216000242002
  • Liou, M., Cheng, P. E., and Johnson, E. G. (1997). Standard errors of the Kernel equating methods under the common-item design. Applied Psychological Measurement, 21 (4), 349-369. https://doi.org/10.1177/01466216970214005
  • Livingston, S. A., and Kim, S. (2010). Random-Groups equating with samples of 50 to 400 test takers. Journal of Educational Measurement, 47(2), 175–185. https://www.jstor.org/stable/20778946
  • Lord, F. M. (1983). Statistical bias in maximum likelihood estimators of item parameters. Psychometrika, 48(3), 477-482. https://doi.org/10.1007/BF02293684
  • Lord, F. M., and Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score equatings. Applied Psychological Measurement, 8(4), 453-461. https://doi.org/10.1177/014662168400800409
  • Loyd, B. H., and Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179-193. https://www.jstor.org/stable/1434833
  • Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14(2), 139-160. http://www.jstor.org/stable/1434012
  • Meng, Y. (2012). Comparison of Kernel equating and item response theory equating methods (Publication No. 3518262) [Doctoral dissertation, University of Massachusetts, Amherst]. ProQuest Dissertations and Theses Global.
  • Michaelides, M. P. (2003, April 21-25). Sensitivity of IRT equating to the behavior of test equating items [Paper presentation]. American Educational Research Association 2003 Annual Meeting, Chicago, Illinois, United States.
  • Mohandas, R. (1996). Test equating, problems and solutions: Equating English test forms for the Indonesian junior secondary school final examination administered in 1994 [Doctoral dissertation, Flinders University]. https://flinders-primo.hosted.exlibrisgroup.com/primo-explore/search?vid=FUL&lang=en_US
  • Ngudgratoke, S. (2009). An investigation of using collateral information to reduce equating biases of the post-stratification equating method (Publication No. 3381312) [Doctoral dissertation, Michigan State University-Michigan]. ProQuest Dissertations and Theses Global.
  • Norman-Dvorak, R. K. (2009). A comparison of Kernel equating to the test characteristic curve methods (Publication No. 3350452) [Doctoral dissertation, University of Nebraska-Lincoln]. ProQuest Dissertations and Theses Global.
  • Nozawa, Y. (2008). Comparison of parametric and nonparametric IRT equating methods under the common-item nonequivalent groups design (Publication No. 3347237) [Doctoral dissertation, The University of Iowa-Iowa City]. ProQuest Dissertations and Theses Global.
  • Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51(1), 1-23. https://www.researchgate.net/publication/241025868
  • Partchev, I. (2016). Package "irtoys" (Version 0.2.0). https://cran.r-project.org/web/packages/irtoys/irtoys.pdf
  • Petersen, N. S., Cook, L. L., and Stocking, M. L. (1983). IRT versus conventional equating methods: a comparative study of scale stability. Journal of Educational Statistics, 8(2), 137-156. https://www.jstor.org/stable/1164922
  • Petersen, N. S., Kolen, M. J., and Hoover, H. D. (1993). Scaling, norming and equating. In R. L. Linn (Ed.), Educational measurement (2nd ed., pp. 221-262). The Oryx Press.
  • Rizopoulos, D. (2015). Package "ltm". https://cran.r-project.org/web/packages/ltm/ltm.pdf
  • Ryan, J., and Brockmann, F. (2009). A practitioner's introduction to equating. https://files.eric.ed.gov/fulltext/ED544690.pdf
  • Sarkar, D. (2017). Package "lattice". https://cran.r-project.org/web/packages/lattice/lattice.pdf
  • Skaggs, G. (1990). To match or not to match samples on ability for equating: A discussion of five articles. Applied Measurement in Education, 3(1), 105-113. https://doi.org/10.1207/s15324818ame0301_8
  • Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42(2),309–330. https://doi.org/10.1111/j.1745-3984.2005.00018.x
  • Skaggs, G., and Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495-529. https://doi.org/10.3102/00346543056004495
  • Spence, P. D. (1996). The effect of multidimensionality on unidimensional equating with item response theory (Publication No. 9703612) [Doctoral dissertation, University of Florida]. ProQuest Dissertations and Theses Global.
  • Speron, E. (2009). A comparison of metric linking procedures in item response theory (Publication No. 3370885) [Doctoral dissertation, University of Illinois]. ProQuest Dissertations and Theses Global.
  • Stocking, M. L., and Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201-210. https://doi.org/10.1177/014662168300700208
  • Qu, Y. (2007). The effect of weighting in Kernel equating using counter-balanced designs (Publication No. 3282191) [Doctoral dissertation, Michigan State University]. ProQuest Dissertations and Theses Global.
  • Sinharay, S., and Holland, P. W. (2008). The missing data assumptions of the nonequivalent groups with anchor test (NEAT) design and their implications for test equating (No.09-16). Educational Testing Service. https://files.eric.ed.gov/fulltext/ED507841.pdf
  • Tate, R. (2000). Performance of a proposed method for the linking of mixed-format tests with constructed response and multiple choice items. Journal of Educational Measurement, 37(4), 329-346. http://www.jstor.org/stable/1435244
  • Tsai, T. H. (1997, March 24-27). Estimating minimum sample sizes in random groups equating [Paper presentation]. National Council on Measurement in Education 1997 Annual Meeting, Chicago, Illinois, United States.
  • Tsai, T. H., Hanson, B. A., Kolen, M. J., and Forsyth, R. A. (2001). A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design. Applied Measurement in Education, 14(1), 17-30. https://doi.org/10.1207/S15324818AME1401_03
  • Uysal, İ. (2014). Madde Tepki Kuramı’na dayalı test eşitleme yöntemlerinin karma modeller üzerinde karşılaştırılması (Tez Numarası: 370226) [Yüksek lisans tezi, Abant İzzet Baysal Üniversitesi]. Yükseköğretim Kurulu Ulusal Tez Merkezi. https://tez.yok.gov.tr/UlusalTezMerkezi/
  • von Davier, A. A. (2008). New results on the linear equating methods for the nonequivalent groups design. Journal of Educational and Behavioral Statistics, 33(2), 186-203. https://www.jstor.org/stable/20172112
  • von Davier, A. A. (2010). Statistical models for test equating, scaling and linking. Springer.
  • von Davier, A. A., and Wilson, C. (2007). IRT true-score test equating: A guide through assumptions and applications. Educational and Psychological Measurement, 67(6), 940-957. https://doi.org/10.1177/0013164407301543
  • Walker, M. E., and Kim, S. (2010, April). Linking mixed-format tests using multiple choice anchors [Paper presentation]. National Council on Measurement in Education 2010 Annual Meeting, San Diego, CA, United States.
  • Wang, X. (2012). Effect of sample size on IRT equating of unidimensional tests in common item non-equivalent group design: A Monte Carlo simulation study [Doctoral dissertation, Virginia Polytechnic Institute and State University]. https://vtechworks.lib.vt.edu/handle/10919/37555
  • Way, W. D., and Tang, K. L. (1991, April 3-7). A comparison of four logistic model equating methods [Paper presentation]. American Educational Research Association 1991 Annual Meeting, Chicago, Illinois, United States.
  • Weeks, J. P. (2010). Plink: An R package for linking mixed-format tests using IRTbased methods. Journal of Statistical Software, 35(12), 1-33. https://www.jstatsoft.org/article/view/v035i12
  • Wu, N., Huang, C-Y., Huh, N., and Harris, D. (2009, April 12-13). Robustness in using multiple choice items as an external anchor for constructed-response test equating [Paper presentation]. National Council on Measurement in Education 2009 Annual Meeting, San Diego, CA, United States.
  • Yang, W. L. (1997). The effects of content homogeneity and equating method on the accuracy of common item test equating (Publication No. 9839718) [Doctoral dissertation, Michigan State University-Michigan]. ProQuest Dissertations and Theses Global.
  • Yang, W. L., and Houang, R. T. (1996, April, 11-13). The effect of anchor length and equating method on the accuracy of test equating: comparisons of linear and IRT-based equating using an anchor-item design. [Paper presentation]. American Educational Research Association 1996 Annual Meeting, New York City, New York, United States.
  • Zeng, L. (1991). Standard errors of linear equating for the single-group design. (No.91-4). American College Testing. https://www.act.org/content/dam/act/unsecured/documents/ACT_RR91-04.pdf
  • Zhao, Y. (2008). Approaches for addressing the fit of item response theory models to educational test data (Publication No. 3337019) [Doctoral dissertation, University of Massachusetts Amherst]. ProQuest Dissertations and Theses Global.