On some Diagonalized and Regularized Hotelling’s T^2 Tests of Location for High Dimensional Data

Hotelling’s T^2 test is a widely used hypothesis test for a location parameter in R^p. The test is efficient when the data are normally distributed, the ratio of sample size to dimension diverges, and the data contain no outliers. However, it cannot be implemented in practice when the dimension exceeds the sample size, because the sample covariance matrix is then singular. As a remedy, diagonalized and regularized Hotelling’s T^2 tests have been proposed. In this paper, the powers of the regularized and diagonalized tests are compared with that of the usual Hotelling’s T^2 test in low dimension, where the usual test performs much better. It is also observed that the diagonalized Hotelling’s T^2 test may have low power for mixture distributions. Based on the comparative performance of the regularized and diagonalized tests, robust versions of both are proposed for high-dimensional data in the presence of outliers. The powers of these tests are compared using simulated as well as real datasets.
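
To make the distinction between the three statistics concrete, the following is a minimal one-sample sketch in plain Python. It is illustrative only: the function names are hypothetical, and the regularized form (replacing S by S + lam * I, a ridge-type choice) is an assumption, since the abstract does not state the exact definitions used in the paper. Only the test statistics are computed, not their null distributions or p-values.

```python
def mean_vector(X):
    """Column means of an n x p data matrix given as a list of rows."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def sample_covariance(X):
    """Unbiased p x p sample covariance matrix S."""
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in X) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[piv][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - tail) / M[k][k]
    return x


def _shift(X, mu0):
    """d = xbar - mu0, the deviation of the sample mean from the null value."""
    return [a - b for a, b in zip(mean_vector(X), mu0)]


def t2_classical(X, mu0):
    """n * d' S^{-1} d; fails when S is singular (e.g. p > n - 1)."""
    d = _shift(X, mu0)
    z = solve(sample_covariance(X), d)
    return len(X) * sum(di * zi for di, zi in zip(d, z))


def t2_diagonalized(X, mu0):
    """Replace S by diag(S): n * sum_j d_j^2 / s_jj."""
    S = sample_covariance(X)
    d = _shift(X, mu0)
    return len(X) * sum(d[j] ** 2 / S[j][j] for j in range(len(d)))


def t2_regularized(X, mu0, lam):
    """Assumed ridge-type form: replace S by S + lam * I with lam > 0."""
    S = sample_covariance(X)
    for j in range(len(S)):
        S[j][j] += lam
    d = _shift(X, mu0)
    z = solve(S, d)
    return len(X) * sum(di * zi for di, zi in zip(d, z))
```

For example, with the two observations (0, 1, 2) and (2, 5, 4) in R^3 and a null location of zero, `t2_classical` raises an error because S has rank one, whereas `t2_diagonalized` evaluates to 12.25 and `t2_regularized` remains computable for any lam > 0.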
