Review Articles

A distribution-free test of independence based on a modified mean variance index

Weidong Ma,

Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China

Fei Ye,

School of Statistics, Capital University of Economics and Business, Beijing, People's Republic of China

Jingsong Xiao,

Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China

Ying Yang

Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China

Received 23 Feb. 2022, Accepted 03 Apr. 2023, Published online: 28 Apr. 2023

Cui and Zhong (2019; Computational Statistics & Data Analysis, 139, 117–133) proposed a test based on the mean variance (MV) index for testing independence between a categorical random variable Y with R categories and a continuous random variable X. They proved that the MV test statistic is asymptotically normal when R diverges to infinity, which gives the MV test several merits, including greater convenience for independence testing when R is large. This paper considers a new test, called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem for martingale differences is used to show that the asymptotic null distribution of the standardized IPC test statistic is also normal when R diverges, so the IPC test shares many of the merits of the MV test. As an application of this theoretical finding, the IPC test is extended to test independence between continuous random variables. The finite sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.
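For orientation, the MV index that the IPC statistic modifies can be sketched in a few lines. The sketch below assumes the usual sample form of the MV index from Cui, Li and Zhong (2015), namely (1/n) Σ_r Σ_j p̂_r [F̂_r(X_j) − F̂(X_j)]², where F̂ is the pooled empirical distribution function and F̂_r is the empirical distribution function within category r. It is not the IPC statistic, whose exact form is not given in this abstract. The p-value is calibrated by permutation, which is a stand-in chosen for simplicity and not the asymptotic normal calibration studied in the paper. The function names `ecdf_at`, `mv_index` and `permutation_pvalue` are hypothetical.

```python
import random
from bisect import bisect_right


def ecdf_at(sorted_sample, points):
    """Empirical CDF of sorted_sample, evaluated at each value in points."""
    n = len(sorted_sample)
    return [bisect_right(sorted_sample, x) / n for x in points]


def mv_index(x, y):
    """Sample MV index: (1/n) * sum_r sum_j p_r * (F_r(X_j) - F(X_j))**2."""
    n = len(x)
    f_all = ecdf_at(sorted(x), x)  # pooled empirical CDF at each X_j
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    total = 0.0
    for members in groups.values():
        p_r = len(members) / n  # estimated category probability
        f_r = ecdf_at(sorted(members), x)  # within-category CDF at each X_j
        total += p_r * sum((a - b) ** 2 for a, b in zip(f_r, f_all))
    return total / n


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for the MV index (an assumed calibration)."""
    rng = random.Random(seed)
    observed = mv_index(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between X and Y
        if mv_index(x, y_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [float(i) for i in range(40)]
    y = [0] * 20 + [1] * 20  # X fully separates the two categories
    print(mv_index(x, y), permutation_pvalue(x, y))
```

Because the statistic depends on X only through empirical distribution functions, it is unchanged by strictly increasing transformations of X, which is the sense in which tests of this kind are distribution-free.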

References

  • Csörgö, S. (1985). Testing for independence by the empirical characteristic function. Journal of Multivariate Analysis, 16(3), 290–299. https://doi.org/10.1016/0047-259X(85)90022-3
  • Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641. https://doi.org/10.1080/01621459.2014.920256
  • Cui, H., & Zhong, W. (2018). A distribution-free test of independence and its application to variable selection. Available at arXiv:1801.10559.
  • Cui, H., & Zhong, W. (2019). A distribution-free test of independence based on mean variance index. Computational Statistics & Data Analysis, 139, 117–133. https://doi.org/10.1016/j.csda.2019.05.004
  • Dvoretzky, A., Kiefer, J., & Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3), 642–669. https://doi.org/10.1214/aoms/1177728174
  • Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In S. Jain, H. U. Simon, & E. Tomita (Eds.), Algorithmic learning theory (pp. 63–77). Springer Berlin Heidelberg.
  • Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2007). A kernel statistical test of independence. In Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS'07) (pp. 585–592). Curran Associates Inc.
  • Hall, P., & Heyde, C. C. (1980). Martingale limit theory and its application (Probability and Mathematical Statistics). Academic Press.
  • Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., Hirsch, M. S., & Merigan, T. C. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335(15), 1081–1090. https://doi.org/10.1056/NEJM199610103351501
  • He, S., Ma, S., & Xu, W. (2019). A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Computational Statistics & Data Analysis, 137, 155–169. https://doi.org/10.1016/j.csda.2019.02.003
  • Heller, R., Heller, Y., & Gorfine, M. (2012). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2), 503–510. https://doi.org/10.1093/biomet/ass070
  • Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent distribution-free k-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29), 1–54.
  • Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathematical Statistics, 19(4), 546–557. https://doi.org/10.1214/aoms/1177730150
  • Jiang, B., Ye, C., & Liu, J. S. (2015). Nonparametric k-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510), 642–653. https://doi.org/10.1080/01621459.2014.920257
  • Lu, W., Zhang, H. H., & Zeng, D. (2013). Variable selection for optimal treatment decision. Statistical Methods in Medical Research, 22(5), 493–504. https://doi.org/10.1177/0962280211428383
  • Ma, W., Xiao, J., Yang, Y., & Ye, F. (2022). Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index. Journal of Statistical Computation and Simulation, 92(15), 3222–3248. https://doi.org/10.1080/00949655.2022.2062358
  • Mai, Q., & Zou, H. (2015a). Sparse semiparametric discriminant analysis. Journal of Multivariate Analysis, 135, 175–188. https://doi.org/10.1016/j.jmva.2014.12.009
  • Mai, Q., & Zou, H. (2015b). The fused Kolmogorov filter: A nonparametric model-free screening method. The Annals of Statistics, 43(4), 1471–1497. https://doi.org/10.1214/14-AOS1303
  • Ni, L., & Fang, F. (2016). Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification. Journal of Nonparametric Statistics, 28(3), 515–530. https://doi.org/10.1080/10485252.2016.1167206
  • Ni, L., Fang, F., & Shao, J. (2020). Feature screening for ultrahigh dimensional categorical data with covariates missing at random. Computational Statistics & Data Analysis, 142, Article 106824. https://doi.org/10.1016/j.csda.2019.106824
  • Ni, L., Fang, F., & Wan, F. (2017). Adjusted Pearson chi-square feature screening for multi-classification with ultrahigh dimensional data. Metrika, 80(6–8), 805–828. https://doi.org/10.1007/s00184-017-0629-9
  • Pfister, N., Bühlmann, P., Schölkopf, B., & Peters, J. (2018). Kernel-based tests for joint independence. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 80(1), 5–31. https://doi.org/10.1111/rssb.12235
  • Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. The Annals of Statistics, 3(1), 1–14. https://doi.org/10.1214/aos/1176342996
  • Scholz, F.-W., & Stephens, M. A. (1987). k-sample Anderson–Darling tests. Journal of the American Statistical Association, 82(399), 918–924. https://doi.org/10.2307/2288805
  • Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312
  • Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505
  • Tsiatis, A. A., Davidian, M., Zhang, M., & Lu, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statistics in Medicine, 27(23), 4658–4677. https://doi.org/10.1002/sim.3113
  • Xu, K., Shen, Z., Huang, X., & Cheng, Q. (2020). Projection correlation between scalar and vector variables and its use in feature screening with multi-response data. Journal of Statistical Computation and Simulation, 90(11), 1923–1942. https://doi.org/10.1080/00949655.2020.1753057
  • Yan, X., Tang, N., Xie, J., Ding, X., & Wang, Z. (2018). Fused mean-variance filter for feature screening. Computational Statistics & Data Analysis, 122, 18–32. https://doi.org/10.1016/j.csda.2017.10.008
  • Zhang, M., Tsiatis, A. A., & Davidian, M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64(3), 707–715. https://doi.org/10.1111/j.1541-0420.2007.00976.x
  • Zhang, Y., Chen, C., & Zhu, L. (2022). Sliced independence test. Statistica Sinica, 32(Special online issue), 2477–2496. https://doi.org/10.5705/ss.202021.0203
  • Zhong, W., Wang, J., & Chen, X. (2021). Censored mean variance sure independence screening for ultrahigh dimensional survival data. Computational Statistics & Data Analysis, 159, Article 107206. https://doi.org/10.1016/j.csda.2021.107206
  • Zhou, N., Guo, X., & Zhu, L. (2020). A projection-based model checking for heterogeneous treatment effect. Available at arXiv:2009.10900.
  • Zhou, Y., & Zhu, L. (2018). Model-free feature screening for ultrahigh dimensional data through a modified Blum-Kiefer-Rosenblatt correlation. Statistica Sinica, 28(3), 1351–1370. https://doi.org/10.5705/ss.202016.0264
  • Zhu, L., Xu, K., Li, R., & Zhong, W. (2017). Projection correlation between two random vectors. Biometrika, 104(4), 829–843. https://doi.org/10.1093/biomet/asx043

To cite this article: Weidong Ma, Fei Ye, Jingsong Xiao & Ying Yang (2023) A distribution-free test of independence based on a modified mean variance index, Statistical Theory and Related Fields, 7:3, 235-259, DOI: 10.1080/24754269.2023.2201101

To link to this article: https://doi.org/10.1080/24754269.2023.2201101