Using fast minimum covariance determinant estimators for factor analysis in the presence of outliers

Factor analysis, a multivariate method, is used for data reduction, for identifying relationships between variables, and as a classification method. When analysing data sets with many variables, researchers may encounter unusual observations, called outliers, that can have harmful effects on the results. Outliers are often treated as erroneous observations; they may carry important information about the data set, but they can also cause model misspecification, biased parameter estimates, and incorrect analysis results. The aim of this study is to use a factor analysis method whose parameter estimates are not biased by outliers. To this end, fast minimum covariance determinant estimators were chosen from among the robust location and scale estimators that reduce the effect of outliers, and a factor analysis application was carried out. As a result, robust factor analysis estimates were obtained that reduce the effect of outliers and fit the majority of the data, with higher explained variance and with variables grouped more meaningfully on the factors.
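The general idea of replacing the classical covariance matrix with a fast minimum covariance determinant (MCD) estimate before extracting factors can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes scikit-learn's `MinCovDet` (which implements the FastMCD algorithm), uses principal-component extraction from the correlation matrix as a stand-in for whatever extraction method the study applied, and runs on synthetic data. The helper names `cov_to_corr` and `pc_loadings` are hypothetical.

```python
import numpy as np
from sklearn.covariance import MinCovDet


def cov_to_corr(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def pc_loadings(corr, n_factors):
    """Principal-component factor extraction from a correlation matrix.

    Returns the loadings (variables x factors) and the share of total
    variance explained by the retained factors.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]  # indices of the largest ones
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order].sum() / corr.shape[0]
    return loadings, explained


# Synthetic data (illustrative only): two latent factors, six variables.
rng = np.random.default_rng(0)
n, n_out = 200, 20
true_loadings = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
factors = rng.normal(size=(n, 2))
X = factors @ true_loadings.T + 0.6 * rng.normal(size=(n, 6))
# Contaminate the first 20 rows with a shifted cluster of outliers.
X[:n_out] = rng.normal(loc=8.0, scale=1.0, size=(n_out, 6))

# Classical estimate: sample correlation matrix, sensitive to the outliers.
classical_loadings, classical_explained = pc_loadings(
    np.corrcoef(X, rowvar=False), 2
)

# Robust estimate: reweighted MCD covariance, computed by FastMCD.
mcd = MinCovDet(random_state=0).fit(X)
robust_loadings, robust_explained = pc_loadings(cov_to_corr(mcd.covariance_), 2)

# Rows outside the final MCD support are the ones treated as outlying.
flagged = np.flatnonzero(~mcd.support_)

print("explained variance (classical):", round(classical_explained, 3))
print("explained variance (robust):   ", round(robust_explained, 3))
print("number of flagged observations:", flagged.size)
```

How the two sets of loadings and explained-variance shares compare depends on the data and on the kind of contamination, so the figures printed here say nothing about the results reported in the study.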
