Robust Principal Component Analysis based on Fuzzy Coded Data

Principal component analysis, like many classical statistical methods, is severely affected by outliers in the dataset. For this reason, researchers tend to use alternative methods when outliers are present, and fuzzy and robust approaches are the leading choices among them. In this study, a new approach to robust fuzzy principal component analysis is proposed. The approach combines the strengths of robust and fuzzy methods within the framework of principal component analysis. The performance of the proposed approach, called robust principal component analysis based on fuzzy coded data, is examined on a set of artificial datasets generated under three different scenarios and on a real dataset, to observe how it is affected by increases in sample size and changes in the rate of outliers. The findings show that, in the presence of outliers, the proposed approach gives better results than classical and robust principal component analysis.
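The sensitivity of classical principal component analysis to outliers can be illustrated with a small two-dimensional sketch. This is not the approach proposed in the study. It only shows, on synthetic data invented for the illustration, how a handful of outlying points can rotate the first principal axis. The function name `first_pc_angle`, the sample sizes, and the distribution parameters are all assumptions made for this example.

```python
import math
import random


def first_pc_angle(points):
    """Angle (radians) between the x-axis and the first principal axis of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a symmetric 2x2 matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Hypothetical clean data: most of the variance lies along the x-axis.
clean = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)) for _ in range(100)]

# Hypothetical contamination: 10 points placed far away along the y-axis.
outliers = [(rng.gauss(0.0, 0.5), rng.gauss(30.0, 0.5)) for _ in range(10)]
contaminated = clean + outliers

# |cos(angle)| is close to 1 when the first axis follows the x-axis
# and close to 0 when it has been pulled toward the y-axis.
clean_alignment = abs(math.cos(first_pc_angle(clean)))
contaminated_alignment = abs(math.cos(first_pc_angle(contaminated)))

print(f"alignment with x-axis, clean data:        {clean_alignment:.3f}")
print(f"alignment with x-axis, contaminated data: {contaminated_alignment:.3f}")
```

In this sketch roughly 9% contamination is enough to turn the first principal axis away from the direction of the bulk of the data, which is the kind of distortion that motivates robust and fuzzy alternatives.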
