Effect of influential cases on factor analysis results

In regression analysis, screening for outliers is one way to examine the normality of the data. However, outliers do not always affect regression results. Cases that actually alter the regression results, typically cases with large residuals, are referred to as influential cases. Detecting them is important because they can lead to erroneous conclusions. The influence of such cases has received considerable attention in the regression literature but far less in factor analysis. The aim of this paper is to show how influential cases, detected with the Forward Search algorithm, affect factor analysis results. Data were collected with the Self-Regulation Scale (SRS) from 686 university students aged 17 to 30. Removing the influential cases changed the observed correlation matrix of the SRS items, the factorability results, the number of dimensions extracted, the CFA fit indices, and the magnitudes of the factor loadings and their associated errors. These results are discussed in light of the related literature, and researchers are recommended to consider the effect of influential cases when applying factor analysis.
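The idea behind the Forward Search can be illustrated with a minimal Python sketch. It assumes NumPy and uses plain Mahalanobis distances, as in the multivariate forward search, rather than the factor-model-based distances of the factor-analytic variant. It also starts from a simplified median/MAD-based subset. The function `forward_search`, the helper `n_eigen_above_one`, the default starting size `3 * p`, and the simulated one-factor data are hypothetical choices made for illustration. They are not the study's procedure, software, or data. The search grows a "clean" subset one unit at a time and refits the mean and covariance at each step, so the units that join last point to potentially influential cases. The example then applies the eigenvalue-greater-than-one rule, again only as an illustration, to show how removing those cases can change the number of dimensions the correlation matrix suggests.

```python
import numpy as np


def forward_search(X, m0=None):
    """Minimal forward search on Mahalanobis distances (illustrative sketch).

    Returns entry_size, where entry_size[i] is the subset size from which unit i
    stays in the subset to the end, and min_outside, the smallest Mahalanobis
    distance among units outside the subset at each step.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    m0 = m0 or 3 * p  # assumed default size of the initial subset
    # Simplified (assumed) start: units closest to the coordinatewise median,
    # each column scaled by its median absolute deviation (std if MAD is 0).
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    scale = np.where(mad > 0, mad, X.std(axis=0))
    subset = np.argsort(np.sum(((X - med) / scale) ** 2, axis=1))[:m0]

    member = np.zeros((n - m0 + 1, n), dtype=bool)  # row k: subset of size m0 + k
    member[0, subset] = True
    min_outside = []
    for k, m in enumerate(range(m0, n), start=1):
        # Refit mean and covariance on the current subset of size m.
        mu = X[subset].mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(X[subset], rowvar=False))
        diff = X - mu
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances
        outside = np.setdiff1d(np.arange(n), subset)
        min_outside.append(np.sqrt(d2[outside].min()))
        # Next subset: the m + 1 closest units (units may also leave).
        subset = np.argsort(d2)[: m + 1]
        member[k, subset] = True

    # Entry step = first step after which a unit is never dropped again.
    stays = np.logical_and.accumulate(member[::-1], axis=0)[::-1]
    entry_size = m0 + np.argmax(stays, axis=0)
    return entry_size, np.array(min_outside)


def n_eigen_above_one(data):
    # Hypothetical helper: eigenvalue-greater-than-one count, for illustration only.
    return int(np.sum(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)) > 1))


# Simulated one-factor data (loadings 0.7) with five planted cases that
# contradict the common factor; these are not the study's SRS data.
rng = np.random.default_rng(0)
n, p = 200, 6
factor = rng.standard_normal((n, 1))
X = 0.7 * factor + np.sqrt(1 - 0.49) * rng.standard_normal((n, p))
X[:5] += 6 * np.array([1, -1, 1, -1, 1, -1])

entry_size, min_outside = forward_search(X)
flagged = np.argsort(entry_size)[-5:]  # the last five units to join the subset
kept = np.setdiff1d(np.arange(n), flagged)

print(sorted(flagged.tolist()))
print(n_eigen_above_one(X), n_eigen_above_one(X[kept]))
```

Here the number of flagged cases is known because they were planted. With real data, forward search users typically inspect a plot of `min_outside` against subset size and look for a sharp rise near the end of the search. Likert-type items such as the SRS items can produce a zero MAD, which is why the sketch falls back to the standard deviation for the starting subset.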
