Higher Education End-of-Course Evaluations: Assessing the Psychometric Properties Utilizing Exploratory Factor Analysis and Rasch Modeling Approaches

This paper offers a critical assessment of the psychometric properties of a standard higher education end-of-course evaluation. Using both exploratory factor analysis (EFA) and Rasch modeling, the authors conduct (a) an overall assessment of dimensionality using EFA, (b) a secondary assessment of dimensionality using a principal components analysis (PCA) of the residuals obtained when the items are fit to the Rasch model, and (c) an assessment of item-level properties using the item-level statistics produced when the items are fit to the Rasch model. The results support the use of the scale as a supplement in high-stakes decision making such as tenure. However, the imprecise targeting of item difficulty to person ability, combined with the low person separation index, makes rank-ordering professors according to minuscule differences in overall subscale scores a highly questionable practice.
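
For readers unfamiliar with the person separation index mentioned above, the following is a minimal sketch of how such an index is conventionally computed from Rasch person measures and their standard errors. It is not the authors' analysis: the function name `person_separation`, the use of the sample variance, and the toy numbers are all assumptions made for illustration, and dedicated Rasch software may use slightly different variance conventions.

```python
import math

def person_separation(measures, std_errors):
    """Return (separation, reliability) for a set of person measures.

    measures   : person ability estimates in logits (hypothetical input)
    std_errors : standard error of each estimate, same order as measures
    """
    n = len(measures)
    mean = sum(measures) / n
    # Observed variance of the person measures (sample variance, an assumption).
    observed_var = sum((m - mean) ** 2 for m in measures) / (n - 1)
    # Average error variance: mean of the squared standard errors.
    error_var = sum(se ** 2 for se in std_errors) / n
    # "True" variance is what remains after removing measurement error.
    true_var = max(observed_var - error_var, 0.0)
    separation = math.sqrt(true_var / error_var)  # true SD in error-SD units
    reliability = true_var / observed_var         # share of variance not due to error
    return separation, reliability

# Toy data, invented purely for illustration.
sep, rel = person_separation([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
print(round(sep, 3), round(rel, 3))
```

Under these definitions the two quantities are linked by separation = sqrt(reliability / (1 - reliability)). A low separation therefore means the spread of person measures is small relative to their measurement error, which is why fine-grained rank-ordering becomes hard to defend.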
