Determining The Number of Principal Components with Schur's Theorem in Principal Component Analysis

Principal Component Analysis (PCA) reduces the dimensionality of a dataset while limiting information loss. It does so by successively constructing uncorrelated variables, each of which maximizes the remaining variance. The accepted criterion for evaluating the performance of the j-th principal component (PC) is λ_j/tr(S), where λ_j is the j-th eigenvalue of the covariance matrix S and tr(S) denotes its trace. The standard procedure is to retain as many PCs as are needed to reach a predetermined percentage of the total variance. In this study, the diagonal elements of the covariance matrix are used instead of the eigenvalues to determine how many PCs must be considered to reach that threshold. To this end, an approach based on Schur's theorem, one of the important theorems of majorization theory, is proposed. In the tests performed, this approach lowered the computational cost.
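The abstract does not spell out the procedure, so the following is only a minimal sketch of how the diagonal of S could stand in for the eigenvalues. It assumes the relevant form of Schur's theorem: for a symmetric matrix, the sum of the k largest diagonal elements never exceeds the sum of the k largest eigenvalues. Under that assumption, once the k largest diagonal elements reach the chosen share of tr(S), the first k PCs reach it as well. The function names, the 0.90 default threshold, and the 2x2 example matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def count_to_threshold(values, threshold):
    """Smallest k such that the k largest values reach `threshold` of their sum."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    share = np.cumsum(ordered) / ordered.sum()
    # Small tolerance guards against round-off when threshold is close to 1.
    reached = share >= threshold - 1e-12
    return int(np.argmax(reached)) + 1  # index of the first True, plus one

def k_from_diagonal(S, threshold=0.90):
    """Count based only on the diagonal of S (no eigendecomposition)."""
    return count_to_threshold(np.diag(S), threshold)

def k_from_eigenvalues(S, threshold=0.90):
    """Conventional count based on the eigenvalues of S, for comparison."""
    return count_to_threshold(np.linalg.eigvalsh(S), threshold)

# Illustrative covariance matrix: eigenvalues are 3 and 1, diagonal is (2, 2).
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(k_from_diagonal(S, 0.70), k_from_eigenvalues(S, 0.70))  # prints: 2 1
```

In this sketch the diagonal-based count is an upper bound on the eigenvalue-based count rather than a replacement for it: it can overshoot, as in the example above, but it only needs a sort of the p diagonal entries instead of an eigendecomposition of the p x p matrix.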

Bitlis Eren Üniversitesi Fen Bilimleri Dergisi
  • Publication frequency: 4 issues per year
  • Established: 2012
  • Publisher: Bitlis Eren Üniversitesi Rektörlüğü