Review Articles

Variable selection and subgroup analysis for high-dimensional censored data

Yu Zhang,

Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, People's Republic of China

Jiangli Wang,

School of Artificial Intelligence and Big Data, Hefei University, Hefei, People's Republic of China

Weiping Zhang

Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, People's Republic of China

zwp@ustc.edu.cn

Received 24 Jul. 2023, Accepted 29 Feb. 2024, Published online: 13 Mar. 2024

This paper proposes a penalized method for high-dimensional variable selection and subgroup identification in the Tobit model. Building on the convex reparameterization of the Tobit negative log-likelihood due to Olsen (1978), we develop an efficient algorithm for minimizing the objective function by combining the alternating direction method of multipliers (ADMM) with generalised coordinate descent (GCD). We also establish the oracle properties of the proposed estimator under mild regularity conditions. Extensive simulations and an empirical data study demonstrate the performance of the proposed approach.
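As a rough illustration of the reparameterization mentioned in the abstract, the sketch below evaluates the unpenalized Tobit negative log-likelihood in the variables delta = beta / sigma and gamma = 1 / sigma. It assumes the standard Tobit model with left-censoring at zero, y_i = max(0, x_i' beta + eps_i) with Gaussian errors; the function names `log_norm_cdf` and `tobit_nll` are hypothetical, and the penalty and subgroup-fusion terms of the proposed method are not included.

```python
import math


def log_norm_cdf(z):
    """log Phi(z) for the standard normal CDF, via the complementary error function.

    Fine for moderate z; for very negative z the erfc value underflows to 0.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_nll(delta, gamma, X, y):
    """Tobit negative log-likelihood, up to an additive constant.

    Assumed parameterization: delta = beta / sigma, gamma = 1 / sigma,
    with responses left-censored at zero.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(x_ij * d_j for x_ij, d_j in zip(x_i, delta))  # x_i' delta
        if y_i > 0:
            # Uncensored observation: squared term of a linear function, minus log(gamma).
            total += 0.5 * (gamma * y_i - eta) ** 2 - math.log(gamma)
        else:
            # Censored observation: -log Phi(-x_i' delta).
            total -= log_norm_cdf(-eta)
    return total


# Toy data (made up for illustration): an intercept plus one covariate.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.0]]
y = [1.3, 0.0, 2.7, 0.0]
value = tobit_nll([0.2, 0.8], 1.5, X, y)
```

Each summand is convex in (delta, gamma): the uncensored terms are a squared linear function minus log(gamma), and the censored terms are the negative logarithm of a log-concave function evaluated at a linear function. This joint convexity is what makes the parameterization convenient for coordinate-wise and ADMM-type solvers.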

References

  • Alhamzawi, A. (2020). A new Bayesian elastic net for Tobit regression. Journal of Physics: Conference Series, 1664(1), 012047.
  • Alhamzawi, R. (2016). Bayesian elastic net Tobit quantile regression. Communications in Statistics-Simulation and Computation, 45(7), 2409–2427. https://doi.org/10.1080/03610918.2014.904341
  • Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica: Journal of the Econometric Society, 41(6), 997–1016. https://doi.org/10.2307/1914031
  • Amemiya, T. (1984). Tobit models: A survey. Journal of Econometrics, 24(1–2), 3–61. https://doi.org/10.1016/0304-4076(84)90074-5
  • Bondell, H. D., & Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64(1), 115–123. https://doi.org/10.1111/biom.2008.64.issue-1
  • Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122. https://doi.org/10.1561/2200000016
  • Bradic, J., Fan, J., & Jiang, J. (2011). Regularization for Cox's proportional hazards model with NP-dimensionality. Annals of Statistics, 39(6), 3092. https://doi.org/10.1214/11-AOS911
  • Dagne, G. A. (2016). A growth mixture Tobit model: Application to AIDS studies. Journal of Applied Statistics, 43(7), 1174–1185. https://doi.org/10.1080/02664763.2015.1092114
  • Everitt, B. (2013). Finite mixture distributions. Springer Science & Business Media.
  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273
  • Fan, J., & Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20(1), 101–148.
  • Fan, Y., & Tang, C. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), 531–552. https://doi.org/10.1111/rssb.12001
  • Gandhi, R. T., Tashima, K. T., Smeaton, L. M., Vu, V., Ritz, J., Andrade, A., Eron, J. J., Hogg, E., & Fichtenbaum, C. J. (2020). Long-term outcomes in a large randomized trial of HIV-1 salvage therapy: 96-week results of AIDS Clinical Trials Group A5241 (OPTIONS). The Journal of Infectious Diseases, 221(9), 1407–1415. https://doi.org/10.1093/infdis/jiz281
  • Jacobson, T., & Zou, H. (2023). High-dimensional censored regression via the penalized Tobit likelihood. Journal of Business & Economic Statistics, 42(1), 286–297. https://doi.org/10.1080/07350015.2023.2182309
  • Johnson, B. A. (2009). On lasso for censored data. Electronic Journal of Statistics, 3, 485–506. https://doi.org/10.1214/08-EJS322
  • Ma, S., & Huang, J. (2017). A concave pairwise fusion approach to subgroup analysis. Journal of the American Statistical Association, 112(517), 410–423. https://doi.org/10.1080/01621459.2016.1148039
  • Ma, S., Huang, J., Zhang, Z., & Liu, M. (2019). Exploration of heterogeneous treatment effects via concave fusion. The International Journal of Biostatistics, 16(1), 20180026. https://doi.org/10.1515/ijb-2018-0026
  • Müller, P., & van de Geer, S. (2016). Censored linear model in high dimensions: Penalised linear regression on high-dimensional data with left-censored response variable. Test, 25(1), 75–92. https://doi.org/10.1007/s11749-015-0441-7
  • Olsen, R. J. (1978). Note on the uniqueness of the maximum likelihood estimator for the Tobit model. Econometrica: Journal of the Econometric Society, 46(5), 1211–1215. https://doi.org/10.2307/1911445
  • Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25(3), 303–325. https://doi.org/10.1016/0304-4076(84)90004-6
  • Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846–850. https://doi.org/10.1080/01621459.1971.10482356
  • Shafer, R. W. (2006). Rationale and uses of a public HIV drug-resistance database. The Journal of Infectious Diseases, 194(s1), S51–S58. https://doi.org/10.1086/jid.2006.194.issue-s1
  • Shen, J., & He, X. (2015). Inference for subgroup analysis with a structured logistic-normal mixture model. Journal of the American Statistical Association, 110(509), 303–312. https://doi.org/10.1080/01621459.2014.894763
  • Soret, P., Avalos, M., Wittkop, L., Commenges, D., & Thiébaut, R. (2018). Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors. BMC Medical Research Methodology, 18(1), 1–13. https://doi.org/10.1186/s12874-018-0609-4
  • Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica: Journal of the Econometric Society, 26(1), 24–36. https://doi.org/10.2307/1907382
  • Tseng, P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3), 475–494. https://doi.org/10.1023/A:1017501703105
  • Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 671–683. https://doi.org/10.1111/j.1467-9868.2008.00693.x
  • Wang, H., Li, R., & Tsai, C. L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94(3), 553–568. https://doi.org/10.1093/biomet/asm053
  • Wang, X., Zhu, Z., & Zhang, H. H. (2019). Spatial automatic subgroup analysis for areal data with repeated measures. arXiv:1906.01853.
  • Yan, X., Yin, G., & Zhao, X. (2021). Subgroup analysis in censored linear regression. Statistica Sinica, 31(2), 1027–1054.
  • Yang, Y., & Zou, H. (2013). An efficient algorithm for computing the HHSVM and its generalizations. Journal of Computational and Graphical Statistics, 22(2), 396–415. https://doi.org/10.1080/10618600.2012.680324
  • Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894–942.
  • Zhao, P., & Yu, B. (2006). On model selection consistency of Lasso. The Journal of Machine Learning Research, 7(90), 2541–2563.
  • Zhou, X., & Liu, G. (2016). LAD-lasso variable selection for doubly censored median regression models. Communications in Statistics-Theory and Methods, 45(12), 3658–3667. https://doi.org/10.1080/03610926.2014.904357
  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429. https://doi.org/10.1198/016214506000000735

To cite this article: Yu Zhang, Jiangli Wang & Weiping Zhang (13 Mar 2024): Variable selection and subgroup analysis for high-dimensional censored data, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2024.2327113

To link to this article: https://doi.org/10.1080/24754269.2024.2327113