Comparison of Different Termination Rules in Computerized Adaptive Testing Applications in Terms of Measurement Precision and Test Length

Unlike conventional tests, adaptive tests involve a test algorithm. The test algorithm has three parts: starting the test, continuing the test, and terminating the test. The aim of this study is to examine how different termination rules affect measurement precision and test length in computerized adaptive testing (CAT) applications, and to compare the rules with one another. The research was conducted as a simulation study. Five termination rules were used: fixed length, standard error, standard error with a minimum number of items, theta convergence, and theta convergence with a minimum number of items. Each termination rule was applied under different conditions, and a total of 12 conditions were compared. Because they play an important role in the CAT algorithm, two item pool sizes (250 and 500 items) and two ability estimation methods (Maximum Likelihood Estimation and Expected a Posteriori) were also included in the comparison of termination rules. For each CAT application, RMSE, bias, and fidelity values were computed as indicators of measurement precision, and test lengths were obtained and compared. In general, the 20-item fixed-length, 0.220 standard error, and 0.02 theta convergence termination conditions yielded low RMSE and bias values, whereas the fidelity coefficients were not substantially affected. Adding a minimum-items requirement also gave better measurement precision for some termination conditions. Mean test length was found to vary inversely with RMSE. Under the same termination conditions, increasing the item pool size generally produced lower RMSE and bias values. Using the Expected a Posteriori method for ability estimation was found to lower RMSE and bias values.
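The five rule families described above can be expressed as a single stopping check that runs after each administered item. This is a minimal sketch, not the authors' simulation code. The thresholds (20 items, 0.220, 0.02) are the values named in the abstract; the other conditions among the 12 are not listed here. `TerminationRule`, `should_stop`, and the `min_items` parameter are hypothetical names, and theta convergence is assumed to mean that the absolute change between the two most recent ability estimates falls below the threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TerminationRule:
    """Hypothetical container for one termination condition.

    kind: "fixed" (fixed length), "se" (standard error) or
          "theta_conv" (theta convergence)
    min_items: set for the "minimum items" variants; None otherwise
    """
    kind: str
    fixed_length: int = 20        # 20-item fixed-length condition
    se_threshold: float = 0.220   # stop once SE(theta) <= 0.220
    conv_threshold: float = 0.02  # stop once |theta change| < 0.02 (assumed definition)
    min_items: Optional[int] = None


def should_stop(rule: TerminationRule,
                thetas: Sequence[float],
                ses: Sequence[float],
                items_left_in_pool: int) -> bool:
    """Decide whether to end the test after len(thetas) items.

    thetas[i] and ses[i] are the ability estimate and its standard
    error after the (i + 1)-th administered item.
    """
    n = len(thetas)
    if items_left_in_pool == 0:
        return True   # pool exhausted: nothing left to administer
    if rule.min_items is not None and n < rule.min_items:
        return False  # "minimum items" variants never stop earlier
    if rule.kind == "fixed":
        return n >= rule.fixed_length
    if rule.kind == "se":
        return n >= 1 and ses[-1] <= rule.se_threshold
    if rule.kind == "theta_conv":
        return n >= 2 and abs(thetas[-1] - thetas[-2]) < rule.conv_threshold
    raise ValueError(f"unknown termination rule: {rule.kind}")


# Standard-error rule: the last SE (0.21) is at or below 0.220, so the test stops.
print(should_stop(TerminationRule("se"), [0.4, 0.6, 0.55], [0.45, 0.30, 0.21], 497))  # True
# Theta convergence with a hypothetical 10-item minimum: only 3 items so far.
print(should_stop(TerminationRule("theta_conv", min_items=10),
                  [0.40, 0.60, 0.61], [0.5, 0.4, 0.35], 497))  # False
```

The pool-exhaustion check is included as a safeguard because a variable-length rule might otherwise never trigger for some examinees; whether the study used such a safeguard is not stated in the abstract.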

Comparison of Different Test Termination Rules in Terms of Measurement Precision and Test Length in Computerized Adaptive Testing

Unlike conventional tests, adaptive tests rely on a test algorithm. The algorithm consists of three parts: starting, continuing, and terminating the test. The aim of this study is to examine the effect of different termination rules on measurement precision and test length in computerized adaptive testing (CAT). The research was conducted as a simulation study. Five termination rules were used: fixed length, standard error, standard error with a minimum number of items, theta convergence, and theta convergence with a minimum number of items. Each termination rule was applied under different conditions, and a total of 12 conditions were compared. Because they are critical components of the CAT algorithm, two item pool sizes (250 and 500 items) and two ability estimation methods (Maximum Likelihood Estimation and Expected a Posteriori) were also included in the comparison of termination rules. For each CAT implementation, RMSE, bias, and fidelity values were calculated as measures of measurement precision, and test lengths were obtained and compared. The 20-item fixed-length, 0.220 standard error, and 0.02 theta convergence termination conditions yielded small RMSE and bias values, whereas the fidelity coefficients were not significantly affected. Adding a minimum-items requirement improved measurement precision for some termination conditions. Test length was observed to be negatively correlated with RMSE. Under the same termination conditions, a larger item pool generally yielded smaller RMSE and bias values. No significant change was observed when the effect of the starting rules was evaluated. Using the Expected a Posteriori method for ability estimation lowered the RMSE and bias values.
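The three precision criteria mentioned above can be computed from the simulated true abilities and the final CAT estimates. The sketch below assumes the definitions commonly used in CAT simulation studies: bias as the mean of (estimate minus true θ), RMSE as the square root of the mean squared difference, and fidelity as the Pearson correlation between true and estimated θ. The function names are illustrative, and the toy data are made up; they are not results from the study.

```python
import math
from typing import Sequence


def _check(true_theta: Sequence[float], est_theta: Sequence[float]) -> int:
    """Validate inputs and return the number of simulees."""
    if len(true_theta) != len(est_theta) or not true_theta:
        raise ValueError("need two non-empty sequences of equal length")
    return len(true_theta)


def bias(true_theta: Sequence[float], est_theta: Sequence[float]) -> float:
    """Mean signed error: average of (estimate - true theta)."""
    n = _check(true_theta, est_theta)
    return sum(e - t for t, e in zip(true_theta, est_theta)) / n


def rmse(true_theta: Sequence[float], est_theta: Sequence[float]) -> float:
    """Root mean squared difference between estimated and true theta."""
    n = _check(true_theta, est_theta)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_theta, est_theta)) / n)


def fidelity(true_theta: Sequence[float], est_theta: Sequence[float]) -> float:
    """Pearson correlation between true and estimated theta (assumed definition)."""
    n = _check(true_theta, est_theta)
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    return cov / math.sqrt(var_t * var_e)


# Toy example with three simulees (made-up values).
true_theta = [-1.0, 0.0, 1.0]
est_theta = [-0.8, 0.1, 1.3]
print(round(bias(true_theta, est_theta), 3))      # 0.2
print(round(rmse(true_theta, est_theta), 3))      # 0.216
print(round(fidelity(true_theta, est_theta), 3))  # 0.997
```

Because bias averages signed errors, positive and negative errors can cancel, which is why it is usually reported alongside RMSE rather than on its own.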
