Which Test Is More Reliable for Testing the Statistical Significance of Canonical Correlation Coefficients?

In this study, the Wilks’ Λ (W), Hotelling-Lawley Trace (H), and Pillai’s Trace (P) tests, which are used to test the statistical significance of canonical correlation coefficients, were compared in terms of actual type I error rate. Across 10,000 simulation experiments, when samples were drawn from multivariate distributions that were normal or that deviated slightly or moderately from normality, the W test preserved the actual type I error rate in all cases. However, when the deviation from normality was extreme, the actual type I error rates for the W test exceeded the upper limit of Bradley’s criterion (4.50-5.50%) in almost all cases. The H and P tests, by contrast, generally produced actual type I error rates that fell outside Bradley’s limits.

Keywords:

Wilks’ Λ, Hotelling-Lawley Trace, Pillai’s Trace, type I error rate, Monte Carlo simulation
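
As a rough illustration of the quantities compared in the abstract, the sketch below computes the three statistics from a set of sample canonical correlations using their textbook definitions, and places an empirical type I error rate relative to the 4.50-5.50% band quoted above. The function names, the example correlations, and the rejection counts are hypothetical and are not taken from the study; the study's own simulation design and significance approximations are not reproduced here.

```python
import math


def overall_statistics(canonical_corrs):
    """Wilks' Lambda (W), Hotelling-Lawley trace (H) and Pillai's trace (P),
    computed from sample canonical correlations r_1, ..., r_s."""
    r2 = [r * r for r in canonical_corrs]
    if any(v >= 1.0 for v in r2):
        raise ValueError("canonical correlations must be below 1 in absolute value")
    return {
        "W": math.prod(1.0 - v for v in r2),   # product of (1 - r_i^2)
        "H": sum(v / (1.0 - v) for v in r2),   # sum of r_i^2 / (1 - r_i^2)
        "P": sum(r2),                          # sum of r_i^2
    }


def bradley_position(rejections, replications, alpha=0.05, tolerance=0.10):
    """Empirical type I error rate and its position relative to the band
    alpha * (1 -/+ tolerance); the defaults give 4.50%-5.50%."""
    rate = rejections / replications
    lower = alpha * (1.0 - tolerance)
    upper = alpha * (1.0 + tolerance)
    if rate < lower:
        return rate, "below"
    if rate > upper:
        return rate, "above"
    return rate, "within"


if __name__ == "__main__":
    # Hypothetical canonical correlations for illustration only.
    print(overall_statistics([0.6, 0.3]))
    # Hypothetical outcome: 512 rejections of a true null in 10,000 replications.
    print(bradley_position(512, 10_000))
```

In a simulation of this kind, `rejections` would be the number of replications in which a true null hypothesis was rejected at the nominal level, counted separately for each of the three tests.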

___

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis (2nd edition). John Wiley and Sons.
Anderson, T. W. (1999). Asymptotic theory for canonical correlation analysis. Journal of Multivariate Analysis, 70(1), 1-29.
Andrew, G., Arora, R., Bilmes, J. & Livescu, K. (2013). Deep canonical correlation analysis. In International conference on machine learning (pp. 1247-1255).
Baggaley, A. R. (1981). Multivariate analysis: an introduction for consumers of behavioral research. Evaluation Review, 5, 123-131.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31(2), 144-152.
Carroll, J. D. (1968). Generalization of canonical correlation analysis to three or more sets of variables. Proceedings of the 76th Annual Convention of the American Psychological Association, 3, 227-228.
Gauch, H. G. & Wentworth, T. R. (1976). Canonical correlation analysis as an ordination technique. Vegetatio, 33(1), 17-22.
Ferreira, M. A. & Purcell, S. M. (2009). A multivariate test of association. Bioinformatics, 25(1), 132-133.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521-532.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321-377.
Hotelling, H. (1951). A generalized T-test and measure of multivariate dispersion. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (pp. 23-41). Berkeley, CA, USA.
Kerlinger, F. N. & Pedhazur, E. J. (1973). Multiple regression in behavioral research. New York, NY: Holt Rinehart & Winston.
Knapp, T. R. (1978). Canonical correlation analysis: A general parametric significance-testing system. Psychological Bulletin, 85(2), 410.
Lawley, D. N. (1938). A generalization of Fisher’s z test. Biometrika, 30(1-2), 180-187.
Meloun, M. & Militky, J. (2011). Statistical data analysis: A practical guide. Woodhead Publishing, Limited.
Pillai, K. C. S. (1955). Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, 26(1), 117-121.
R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL http://www.R-project.org/
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. 2nd ed. New York: John Wiley & Sons.
Sharma, S. (1996). Applied Multivariate Techniques: Canonical Correlation, 391-418. John Wiley and Sons Inc., USA.
Stewart, D. & Love, W. (1968). A general canonical correlation index. Psychological bulletin, 70(3p1), 160.
Takane, Y., Yanai, H. & Hwang, H. (2006). An improved method for generalized constrained canonical correlation analysis. Computational statistics & data analysis, 50(1), 221-241.
Tang, C. S. & Ferreira, M. A. (2012). A gene-based test of association using canonical correlation analysis. Bioinformatics, 28(6), 845-850.
Thompson, B. (1984). Canonical correlation analysis uses and interpretations. Newbury Park, CA: Sage.
Vale, D. C. & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.
Van De Velden, M. & Bijmolt, T. H. (2006). Generalized canonical correlation analysis of matrices with missing rows: a simulation study. Psychometrika, 71(2), 323-331.
Waller, N. G. (2016). Fungible correlation matrices: A method for generating nonsingular, singular, and improper correlation matrices for Monte Carlo research. Multivariate behavioral research, 51(4), 554-568.
Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24(3-4), 471-494.
Yanai, H. & Takane, Y. (1992). Canonical correlation analysis with linear constraints. Linear algebra and its applications, 176, 75-89.