Investigating Consequences of Using Item Pre-knowledge in Computerized Multistage Testing

The goal of this study is to determine how ability estimates are affected when test-takers use item pre-knowledge in computerized multistage testing (c-MST), and to urge practitioners to take additional precautions to increase test security. To investigate the statistical consequences of item pre-knowledge use in the c-MST, three different cheating (item theft) scenarios were simulated in addition to the baseline (null) condition, i.e., no pre-knowledge usage. The findings were compared under 30-item and 60-item test length conditions with a 1-3-3 c-MST panel design. The abilities of the thirty cheaters were generated from a normal distribution, and interim and final ability estimates were computed with expected a posteriori (EAP) estimation. The simulation results were evaluated with two sets of statistics: (a) overall results, namely mean bias, root mean square error (RMSE), and the correlation between true and estimated thetas, and (b) conditional results, namely conditional absolute bias and conditional root mean square error. Using item pre-knowledge severely affected the estimated thetas, and the results worsened as the number of compromised items (items shared after the test) increased. It was concluded that item sharing and/or test cheating seriously damages test scores, test usage, and score interpretations.
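The overall evaluation criteria named in the abstract (mean bias, RMSE, and the correlation between true and estimated thetas) can be illustrated with a minimal Python sketch. This is not the study's own code: the function name `overall_recovery_stats`, the sign convention (estimated minus true, so positive bias means ability is overestimated), and the three example values are assumptions made only for illustration.

```python
import math


def overall_recovery_stats(true_thetas, estimated_thetas):
    """Return (mean bias, RMSE, Pearson correlation) for paired thetas.

    Assumption: error is defined as estimated minus true theta.
    """
    n = len(true_thetas)
    if n != len(estimated_thetas) or n < 2:
        raise ValueError("need two equal-length sequences with at least two examinees")

    # Signed estimation error for each simulated examinee.
    errors = [est - true for est, true in zip(estimated_thetas, true_thetas)]

    mean_bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation between true and estimated thetas.
    mean_true = sum(true_thetas) / n
    mean_est = sum(estimated_thetas) / n
    cov = sum((t - mean_true) * (e - mean_est)
              for t, e in zip(true_thetas, estimated_thetas))
    var_true = sum((t - mean_true) ** 2 for t in true_thetas)
    var_est = sum((e - mean_est) ** 2 for e in estimated_thetas)
    correlation = cov / math.sqrt(var_true * var_est)

    return mean_bias, rmse, correlation


# Hypothetical values, not data from the study: every estimate is 0.5 too high.
true_thetas = [-1.0, 0.0, 1.0]
estimated_thetas = [-0.5, 0.5, 1.5]
bias, rmse, r = overall_recovery_stats(true_thetas, estimated_thetas)
print(bias, rmse, r)
```

In this toy case the uniform shift gives a bias and RMSE of 0.5 while the correlation stays at 1, which shows why the three criteria are reported together: correlation alone would not reveal a systematic inflation of ability estimates.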

