Review Articles

High-dimensional proportionality test of two covariance matrices and its application to gene expression data

Long Feng,

School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, People’s Republic of China

Xiaoxu Zhang,

School of Mathematics and Statistics and KLAS, Northeast Normal University, Changchun, People’s Republic of China

Binghui Liu

School of Mathematics and Statistics and KLAS, Northeast Normal University, Changchun, People’s Republic of China

Pages 161-174 | Received 04 Nov. 2020, Accepted 09 Sep. 2021, Published online: 06 Oct. 2021

With the development of modern science and technology, high-dimensional data arise increasingly often in applications. Because high dimension can increase the complexity of the covariance structure, comparing covariance matrices across populations is an important problem in high-dimensional data analysis. In this article, we consider testing the proportionality of two high-dimensional covariance matrices, where the data dimension is potentially much larger than the sample sizes, or even larger than the squares of the sample sizes. We devise a novel high-dimensional spatial rank test that has substantially higher power than many popular existing tests, especially for data generated from heavy-tailed distributions. The asymptotic normality of the proposed test statistic is established under the family of elliptically symmetric distributions, which is more general than the normal family and includes numerous commonly used heavy-tailed distributions. Extensive numerical experiments demonstrate the superiority of the proposed test in terms of both empirical size and power. Finally, a real data analysis demonstrates the practicality of the proposed test for high-dimensional gene expression data.
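For orientation, the testing problem and the basic objects named in the abstract can be written out as follows. This is a sketch using standard textbook definitions; the symbols ($\Sigma_k$, $c$, $U$, $R_{ki}$, $n_k$, $p$) are assumptions chosen for illustration and need not match the notation or the exact test statistic of the article.

```latex
% Two independent samples X_{k1}, ..., X_{k n_k} in R^p, k = 1, 2,
% with covariance (scatter) matrices \Sigma_1 and \Sigma_2.
% Proportionality hypothesis with an unknown positive constant c:
\[
  H_0:\ \Sigma_1 = c\,\Sigma_2 \ \text{ for some } c > 0
  \qquad \text{versus} \qquad
  H_1:\ \Sigma_1 \neq c\,\Sigma_2 \ \text{ for all } c > 0 .
\]
% Spatial sign function on R^p:
\[
  U(x) =
  \begin{cases}
    x / \lVert x \rVert, & x \neq 0,\\
    0, & x = 0 .
  \end{cases}
\]
% Spatial rank of the i-th observation within sample k:
\[
  R_{ki} = \frac{1}{n_k} \sum_{j=1}^{n_k} U\!\left(X_{ki} - X_{kj}\right),
  \qquad i = 1, \dots, n_k .
\]
```

Since $U(ax) = U(x)$ for every $a > 0$, the spatial ranks of a sample are unchanged when all of its observations are rescaled by a common positive factor, which is consistent with a null hypothesis in which the constant $c$ is left unspecified.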


To cite this article: Long Feng, Xiaoxu Zhang & Binghui Liu (2021): High-dimensional
proportionality test of two covariance matrices and its application to gene expression data,
Statistical Theory and Related Fields, DOI: 10.1080/24754269.2021.1984373
To link to this article: https://doi.org/10.1080/24754269.2021.1984373