Investigating Invariant Item Ordering in Intelligence Tests: Mokken Scale Analysis of KBIT-2

The Kaufman Brief Intelligence Test – Second Edition (KBIT-2) is designed to measure verbal and nonverbal abilities in individuals from 4 years 0 months to 90 years 11 months of age. This study examines the advantages of using Mokken Scale Analysis (MSA) with intelligence tests and the hierarchical ordering of the items in the Turkish form of the KBIT-2. For each of the three subtests, item parameters were estimated, dimensionality was tested, and the Invariant Item Ordering (IIO) assumptions were evaluated. A total of 2850 children, adolescents, and adults participated in the study. Participants' ages ranged from 48 months (4 years 0 months) to 539 months (44 years 11 months). The Automated Item Selection Procedure (AISP) was applied to assess unidimensionality under three lower bounds: 0.30, 0.40, and 0.55. The items of all three subtests formed a unidimensional scale. The Backward Item Selection (BIS) procedure identified items that violated the IIO criteria: seven in the Matrices subtest, 17 in the Verbal Knowledge subtest, and six in the Riddles subtest. Reliability values obtained with MSA show that all three KBIT-2 subtests have a high degree of internal consistency. However, caution is warranted when the IIO assumptions do not hold for an intelligence scale in its original form.
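
The lower bounds named above (0.30, 0.40, 0.55) are thresholds on Loevinger's scalability coefficient H, the quantity on which Mokken scaling is built. The following is a minimal illustrative sketch in Python, not the study's analysis and not the full AISP: it computes only the overall scale H for dichotomously scored items as the ratio of summed inter-item covariances to their maxima given the item marginals. The function name `scale_h` and the toy data are hypothetical.

```python
from itertools import combinations


def scale_h(data):
    """Loevinger's H for a whole scale of dichotomous (0/1) items.

    data: list of respondent rows, each a list of 0/1 item scores.
    H = sum of inter-item covariances / sum of their maxima, where the
    maximum covariance given the marginals is min(p_i, p_j) - p_i * p_j.
    """
    n = len(data)
    k = len(data[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[i] for row in data) / n for i in range(k)]

    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        # Proportion scoring 1 on both items of the pair.
        p_ij = sum(row[i] * row[j] for row in data) / n
        observed += p_ij - p[i] * p[j]
        maximum += min(p[i], p[j]) - p[i] * p[j]

    if maximum == 0:
        raise ValueError("H is undefined when items have no variance")
    return observed / maximum


# Hypothetical toy data: a perfect Guttman pattern (no Guttman errors).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

h = scale_h(toy)
# Compare the scale H with the three lower bounds used in the abstract.
for lower_bound in (0.30, 0.40, 0.55):
    print(f"H = {h:.2f} meets lower bound {lower_bound:.2f}: {h >= lower_bound}")
```

In an actual AISP run the lower bound is applied item by item as items are added to a scale, so this scale-level check is only a simplified view of what the threshold means.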

