On some Diagonalized and Regularized Hotelling’s T^2 Tests of Location for High Dimensional Data

A widely used statistical test of hypothesis for a multivariate location parameter is Hotelling’s T^2 test. This test is efficient if the data are normally distributed, the ratio of sample size to dimension diverges, and there are no outliers in the data. However, it is practically impossible to implement when the dimension is greater than the sample size. As a remedial measure, diagonalized and regularized Hotelling’s T^2 tests were proposed. In this paper, the powers of the regularized and diagonalized Hotelling’s T^2 tests are compared with that of the usual Hotelling’s T^2 test in low dimension, where the usual Hotelling’s T^2 test performs much better. It is observed that the diagonalized Hotelling’s T^2 test may have low power for mixture distributions. Because the regularized and diagonalized Hotelling’s T^2 tests perform comparably, robust versions of both tests are proposed for high-dimensional data in the presence of outliers. The powers of these tests are compared using simulated as well as real datasets.
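
As a rough illustration of the three statistics named above, the following Python sketch computes one-sample versions under textbook definitions. These definitions are assumptions and may differ from the exact forms used in the paper: the classical statistic uses the inverse of the sample covariance matrix S, the diagonalized one keeps only the diagonal of S, and the regularized one uses a ridge-type estimate S + lam * I. The function name `location_statistics` and the parameter `lam` are hypothetical. The sketch also shows why the classical test breaks down in high dimension: when the dimension is at least the sample size, S is singular and cannot be inverted.

```python
import numpy as np


def location_statistics(X, mu0, lam=1.0):
    """Return (classical, diagonalized, regularized) T^2-type statistics.

    X   : (n, p) data matrix, one observation per row.
    mu0 : hypothesized location vector of length p.
    lam : ridge parameter for the regularized statistic (assumed form S + lam*I).

    The classical statistic is returned as None when p >= n, because the
    sample covariance matrix is then singular.
    """
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    diff = X.mean(axis=0) - mu0
    # Sample covariance with divisor n - 1; atleast_2d handles p == 1.
    S = np.atleast_2d(np.cov(X, rowvar=False))

    classical = None
    if n > p:
        classical = float(n * diff @ np.linalg.solve(S, diff))

    # Diagonalized version: ignore all off-diagonal covariances.
    diagonalized = float(n * np.sum(diff**2 / np.diag(S)))

    # Regularized version: shrink S toward the identity before inverting.
    regularized = float(n * diff @ np.linalg.solve(S + lam * np.eye(p), diff))

    return classical, diagonalized, regularized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_high = rng.normal(size=(5, 10))  # more dimensions than observations
    print(location_statistics(X_high, np.zeros(10)))
```

In the high-dimensional call the first entry is `None`, while the diagonalized and regularized statistics remain computable. Turning any of these statistics into a test still requires a null distribution or a permutation scheme, which this sketch does not cover.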