Review Articles

CorrDA: correlation-matrix driven discriminant analysis

Feifei Yan,

School of Mathematics and Statistics, Zhoukou Normal University, Zhoukou, People's Republic of China

Yingjie Zhang,

School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, People's Republic of China

Jing Ning,

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

Hai Shu,

Department of Biostatistics, School of Global Public Health, New York University, New York, NY, USA

Ziqi Chen

School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, People's Republic of China

zqchen@fem.ecnu.edu.cn

Received 17 Feb. 2025, Accepted 25 Mar. 2026, Published online: 07 Apr. 2026

This article introduces an approach that uses correlation-matrix information from training samples to construct a classification rule for testing samples. Traditional discriminant analysis methods that rely solely on mean vectors tend to perform poorly when the training-sample means are not representative of the testing samples. To address this limitation, we propose a new method, Correlation-matrix driven Discriminant Analysis (CorrDA). The correlation matrices of the different classes in the training samples capture patterns of dependence that distinguish the classes. CorrDA combines the Bayes classifier with mixture models to incorporate this correlation-matrix information, thereby improving classification performance on the testing data. An analysis of COVID-19 datasets and extensive simulation studies provide empirical evidence of CorrDA's superior performance.
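
The idea of classifying by correlation structure rather than by means can be illustrated with a minimal sketch. This is not the authors' CorrDA algorithm: it assumes each class is a zero-mean Gaussian on standardized features with its own correlation matrix, applies the Bayes rule directly, and omits the mixture-model component mentioned in the abstract. The function names (`fit_class_correlations`, `log_density`, `predict`) and the simulated two-class data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_class_correlations(X, y):
    """Estimate one correlation matrix and one prior per class (illustrative)."""
    classes = np.unique(y)
    corr = {k: np.corrcoef(X[y == k], rowvar=False) for k in classes}
    prior = {k: np.mean(y == k) for k in classes}
    return classes, corr, prior

def log_density(z, R):
    """Log-density of N(0, R) at z, up to a constant shared by all classes."""
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (logdet + z @ np.linalg.solve(R, z))

def predict(Z, classes, corr, prior):
    """Bayes rule: assign each row of Z to the class with the largest
    log prior + log density. Z is assumed to be standardized already."""
    scores = np.array([
        [np.log(prior[k]) + log_density(z, corr[k]) for k in classes]
        for z in Z
    ])
    return classes[np.argmax(scores, axis=1)]

# Simulated example (assumption): two classes with identical zero means
# and unit variances, differing only in the sign of the correlation.
rng = np.random.default_rng(0)
R0 = np.array([[1.0, 0.8], [0.8, 1.0]])
R1 = np.array([[1.0, -0.8], [-0.8, 1.0]])

def sample(n):
    X = np.vstack([
        rng.multivariate_normal(np.zeros(2), R0, size=n),
        rng.multivariate_normal(np.zeros(2), R1, size=n),
    ])
    y = np.repeat([0, 1], n)
    return X, y

X_train, y_train = sample(500)
X_test, y_test = sample(1000)

classes, corr, prior = fit_class_correlations(X_train, y_train)
y_pred = predict(X_test, classes, corr, prior)
accuracy = np.mean(y_pred == y_test)
print(f"test accuracy: {accuracy:.3f}")
```

In this simulated setting the class means coincide, so a rule based only on mean vectors has nothing to separate the classes with, while the correlation-based rule can still discriminate between them.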



To cite this article: Feifei Yan, Yingjie Zhang, Jing Ning, Hai Shu & Ziqi Chen (07 Apr 2026): CorrDA: correlation-matrix driven discriminant analysis, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2026.2652551

To link to this article: https://doi.org/10.1080/24754269.2026.2652551