A distribution-free test of independence based on a modified mean variance index

ISSN 2475-4269

CN 31-2182/O1

Fei Ye,

School of Statistics, Capital University of Economics and Business, Beijing, People's Republic of China

Jingsong Xiao,

Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China

Ying Yang

Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China

Received 23 Feb. 2022, Accepted 03 Apr. 2023, Published online: 28 Apr. 2023

Abstract

Cui and Zhong (2019) (Computational Statistics & Data Analysis, 139, 117–133) proposed a test based on the mean variance (MV) index for testing independence between a categorical random variable Y with R categories and a continuous random variable X. They ingeniously proved the asymptotic normality of the MV test statistic when R diverges to infinity, which gives the MV test several merits, including making independence testing more convenient when R is large. This paper considers a new test, called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem for martingale differences is used to show that the asymptotic null distribution of the standardized IPC test statistic is also normal when R diverges, so the IPC test shares many of the merits of the MV test. As an application of this theoretical result, the IPC test is extended to test independence between continuous random variables. The finite-sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.
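
As a rough illustration of the kind of statistic discussed above, the following sketch computes the sample MV index in the form given by Cui, Li, and Zhong (2015), namely (1/n) Σ_r Σ_i p̂_r [F̂_r(X_i) − F̂(X_i)]², where F̂_r is the empirical distribution function of X within category r and F̂ is the pooled one. It is not the IPC statistic of this paper, whose exact form is not given in the abstract. The function names `mv_index` and `permutation_pvalue` are hypothetical, and the permutation calibration is a simple stand-in assumed here for illustration, not the asymptotic normal calibration that the paper establishes.

```python
import random
from bisect import bisect_right


def mv_index(x, y):
    """Sample MV index: (1/n) * sum_r sum_i p_r * (F_r(x_i) - F(x_i))**2."""
    n = len(x)
    pooled = sorted(x)
    # Group the continuous observations by category label.
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    total = 0.0
    for values in groups.values():
        values.sort()
        p_r = len(values) / n  # estimated category probability
        for xi in x:
            f_r = bisect_right(values, xi) / len(values)  # within-category ECDF
            f = bisect_right(pooled, xi) / n              # pooled ECDF
            total += p_r * (f_r - f) ** 2
    return total / n


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the MV index (an assumed calibration, see lead-in)."""
    rng = random.Random(seed)
    observed = mv_index(x, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling labels breaks any dependence on x
        if mv_index(x, labels) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: the category label shifts the location of x.
    rng = random.Random(42)
    y = [rng.randrange(3) for _ in range(90)]
    x = [rng.gauss(float(label), 1.0) for label in y]
    print(mv_index(x, y), permutation_pvalue(x, y, n_perm=200))
```

Because the index depends on X only through empirical distribution functions, it is unchanged by strictly increasing transformations of X, which is the sense in which tests of this type are distribution-free.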

References

Csörgö, S. (1985). Testing for independence by the empirical characteristic function. Journal of Multivariate Analysis, 16(3), 290–299. https://doi.org/10.1016/0047-259X(85)90022-3
Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641. https://doi.org/10.1080/01621459.2014.920256
Cui, H., & Zhong, W. (2018). A distribution-free test of independence and its application to variable selection. Available at arXiv:1801.10559.
Cui, H., & Zhong, W. (2019). A distribution-free test of independence based on mean variance index. Computational Statistics & Data Analysis, 139, 117–133. https://doi.org/10.1016/j.csda.2019.05.004
Dvoretzky, A., Kiefer, J., & Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3), 642–669. https://doi.org/10.1214/aoms/1177728174
Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In S. Jain, H. U. Simon, & E. Tomita (Eds.), Algorithmic learning theory (pp. 63–77). Springer Berlin Heidelberg.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2007). A kernel statistical test of independence. In Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS'07) (pp. 585–592). Curran Associates Inc.
Hall, P., & Heyde, C. C. (1980). Martingale limit theory and its application. Probability and Mathematical Statistics. Academic Press.
Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., Hirsch, M. S., & Merigan, T. C. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335(15), 1081–1090. https://doi.org/10.1056/NEJM199610103351501
He, S., Ma, S., & Xu, W. (2019). A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Computational Statistics & Data Analysis, 137, 155–169. https://doi.org/10.1016/j.csda.2019.02.003
Heller, R., Heller, Y., & Gorfine, M. (2012). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2), 503–510. https://doi.org/10.1093/biomet/ass070
Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent distribution-free k-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29), 1–54.
Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathematical Statistics, 19(4), 546–557. https://doi.org/10.1214/aoms/1177730150
Jiang, B., Ye, C., & Liu, J. S. (2015). Nonparametric k-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510), 642–653. https://doi.org/10.1080/01621459.2014.920257
Lu, W., Zhang, H. H., & Zeng, D. (2013). Variable selection for optimal treatment decision. Statistical Methods in Medical Research, 22(5), 493–504. https://doi.org/10.1177/0962280211428383
Ma, W., Xiao, J., Yang, Y., & Ye, F. (2022). Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index. Journal of Statistical Computation and Simulation, 92(15), 3222–3248. https://doi.org/10.1080/00949655.2022.2062358
Mai, Q., & Zou, H. (2015a). Sparse semiparametric discriminant analysis. Journal of Multivariate Analysis, 135, 175–188. https://doi.org/10.1016/j.jmva.2014.12.009
Mai, Q., & Zou, H. (2015b). The fused Kolmogorov filter: A nonparametric model-free screening method. The Annals of Statistics, 43(4), 1471–1497. https://doi.org/10.1214/14-AOS1303
Ni, L., & Fang, F. (2016). Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification. Journal of Nonparametric Statistics, 28(3), 515–530. https://doi.org/10.1080/10485252.2016.1167206
Ni, L., Fang, F., & Shao, J. (2020). Feature screening for ultrahigh dimensional categorical data with covariates missing at random. Computational Statistics & Data Analysis, 142, Article 106824. https://doi.org/10.1016/j.csda.2019.106824
Ni, L., Fang, F., & Wan, F. (2017). Adjusted Pearson chi-square feature screening for multi-classification with ultrahigh dimensional data. Metrika, 80(6–8), 805–828. https://doi.org/10.1007/s00184-017-0629-9
Pfister, N., Bühlmann, P., Schölkopf, B., & Peters, J. (2018). Kernel-based tests for joint independence. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 80(1), 5–31. https://doi.org/10.1111/rssb.12235
Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. The Annals of Statistics, 3(1), 1–14. https://doi.org/10.1214/aos/1176342996
Scholz, F.-W., & Stephens, M. A. (1987). k-sample Anderson–Darling tests. Journal of the American Statistical Association, 82(399), 918–924. https://doi.org/10.2307/2288805
Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505
Tsiatis, A. A., Davidian, M., Zhang, M., & Lu, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statistics in Medicine, 27(23), 4658–4677. https://doi.org/10.1002/sim.3113
Xu, K., Shen, Z., Huang, X., & Cheng, Q. (2020). Projection correlation between scalar and vector variables and its use in feature screening with multi-response data. Journal of Statistical Computation and Simulation, 90(11), 1923–1942. https://doi.org/10.1080/00949655.2020.1753057
Yan, X., Tang, N., Xie, J., Ding, X., & Wang, Z. (2018). Fused mean-variance filter for feature screening. Computational Statistics & Data Analysis, 122, 18–32. https://doi.org/10.1016/j.csda.2017.10.008
Zhang, M., Tsiatis, A. A., & Davidian, M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64(3), 707–715. https://doi.org/10.1111/j.1541-0420.2007.00976.x
Zhang, Y., Chen, C., & Zhu, L. (2022). Sliced independence test. Statistica Sinica, 32(Special online issue), 2477–2496. https://doi.org/10.5705/ss.202021.0203
Zhong, W., Wang, J., & Chen, X. (2021). Censored mean variance sure independence screening for ultrahigh dimensional survival data. Computational Statistics & Data Analysis, 159, Article 107206. https://doi.org/10.1016/j.csda.2021.107206
Zhou, N., Guo, X., & Zhu, L. (2020). A projection-based model checking for heterogeneous treatment effect. Available at arXiv:2009.10900.
Zhou, Y., & Zhu, L. (2018). Model-free feature screening for ultrahigh dimensional data through a modified Blum-Kiefer-Rosenblatt correlation. Statistica Sinica, 28(3), 1351–1370. https://doi.org/10.5705/ss.202016.0264
Zhu, L., Xu, K., Li, R., & Zhong, W. (2017). Projection correlation between two random vectors. Biometrika, 104(4), 829–843. https://doi.org/10.1093/biomet/asx043

To cite this article: Weidong Ma, Fei Ye, Jingsong Xiao & Ying Yang (2023) A distribution-free test of independence based on a modified mean variance index, Statistical Theory and Related Fields, 7:3, 235-259, DOI: 10.1080/24754269.2023.2201101

To link to this article: https://doi.org/10.1080/24754269.2023.2201101
