Computer Adaptive Multistage Testing: Practical Issues, Challenges and Principles

The purpose of many tests in educational and psychological measurement is to estimate test takers' latent trait scores from the responses they give to a set of items. For many years this was done with traditional methods (paper-and-pencil tests). Compared to other test administration models (e.g., adaptive testing), however, traditional methods are widely criticized for their lower measurement accuracy and longer test lengths. Adaptive testing was proposed to overcome these problems, and two adaptive approaches are popular today: computerized adaptive testing (CAT), which adapts at the item level, and computer adaptive multistage testing (ca-MST), which adapts at the module level. The former is a well-known approach that has been predominantly used in this field; with more than a hundred years of history behind it, many aspects of CAT are familiar even to researchers outside educational and psychological measurement. The same is not true of the latter: because ca-MST is relatively new, many researchers are not familiar with its features. The purpose of this study is to closely examine the characteristics of ca-MST, including its working principle, the adaptation procedure known as the routing method, test assembly, and scoring, together with its differences from other test designs and its advantages and disadvantages, with the aim of drawing researchers' attention to ca-MST and encouraging them to contribute to research in this area. Books, software, and directions for future work on ca-MST are also discussed.
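
To make the module-level adaptation concrete, the sketch below simulates one examinee passing through a hypothetical two-stage ca-MST panel in Python. It assumes a simple 1-2 panel design (one medium-difficulty routing module feeding either an easy or a hard second-stage module), Rasch (1PL) response probabilities, number-correct routing, and a crude grid-search maximum-likelihood scoring step; the item difficulties and the cut score are illustrative inventions, not values taken from the study.

    import math
    import random

    def p_correct(theta, b):
        # Rasch (1PL) probability of answering an item of difficulty b correctly.
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def administer(theta, difficulties, rng):
        # Simulate dichotomous (0/1) responses for an examinee with ability theta.
        return [1 if rng.random() < p_correct(theta, b) else 0 for b in difficulties]

    def ml_theta(responses, difficulties):
        # Crude maximum-likelihood ability estimate via grid search over [-4, 4].
        grid = [g / 10.0 for g in range(-40, 41)]
        def loglik(theta):
            return sum(
                math.log(p) if u == 1 else math.log(1.0 - p)
                for u, p in zip(responses, (p_correct(theta, b) for b in difficulties))
            )
        return max(grid, key=loglik)

    # Hypothetical 1-2 panel: one routing module, two second-stage modules.
    ROUTING = [-1.0, -0.5, 0.0, 0.5, 1.0]   # medium difficulty
    EASY    = [-2.0, -1.5, -1.0, -0.5, 0.0]
    HARD    = [ 0.0,  0.5,  1.0,  1.5,  2.0]
    CUT     = 3                             # illustrative number-correct cut score

    rng = random.Random(42)
    true_theta = 0.8

    stage1 = administer(true_theta, ROUTING, rng)
    second = HARD if sum(stage1) >= CUT else EASY   # module-level adaptation
    stage2 = administer(true_theta, second, rng)

    print("routed to:", "hard" if second is HARD else "easy")
    print("theta estimate:", ml_theta(stage1 + stage2, ROUTING + second))

Operational ca-MST programs often route on IRT information or provisional ability estimates rather than a raw number-correct score, and assemble modules under content constraints; the sketch only illustrates the basic route-then-score logic that distinguishes ca-MST from item-level CAT.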
