Review Articles

Sequential profile Lasso for ultra-high-dimensional partially linear models

Yujie Li,

College of Applied Sciences, Beijing University of Technology, Beijing, China; Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, China

Gaorong Li,

Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, China

ligaorong@bjut.edu.cn; ligaorong@gmail.com

Tiejun Tong

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong

Pages 234-245 | Received 12 Oct. 2017, Accepted 21 Oct. 2017, Published online: 09 Nov. 2017
ABSTRACT

In this paper, we study ultra-high-dimensional partially linear models in which the dimension of the linear predictors grows exponentially with the sample size. For variable screening, we propose a sequential profile Lasso (SPLasso) method and show that it possesses the screening property: SPLasso detects all relevant predictors with probability tending to one, even though the models involve both parametric and nonparametric parts. To select the best subset among the candidate models generated by SPLasso, we apply the extended Bayesian information criterion (EBIC) to choose the final model. We also conduct simulation studies and analyze a real data example to assess the performance of the proposed method and compare it with an existing method.
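To make the final model-selection step concrete, the sketch below shows the EBIC in its standard form (BIC plus an extra penalty on model size that grows with the number of candidate predictors), applied to a nested sequence of candidate models. This is a minimal illustration, not the authors' implementation: the profiling of the nonparametric part and the sequential Lasso path are omitted, and the data, candidate path, and helper names (`ebic`, `rss_ols`) are hypothetical.

```python
import numpy as np

def ebic(rss, n, k, p, gamma=0.5):
    """Extended BIC for a model with k predictors chosen from p candidates,
    fitted on n observations with residual sum of squares rss."""
    return n * np.log(rss / n) + k * np.log(n) + 2 * gamma * k * np.log(p)

rng = np.random.default_rng(0)
n, p = 200, 1000                      # p enters only through the penalty term
X = rng.standard_normal((n, 3))       # columns 0 and 1 relevant, column 2 is noise
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

def rss_ols(cols):
    # residual sum of squares of an OLS fit on the selected columns
    beta, *_ = np.linalg.lstsq(X[:, list(cols)], y, rcond=None)
    r = y - X[:, list(cols)] @ beta
    return r @ r

path = [(0,), (0, 1), (0, 1, 2)]      # mimic a nested screening path
best = min(path, key=lambda cols: ebic(rss_ols(cols), n, len(cols), p))
print(best)                           # the true support (0, 1) should win
```

Because the extra EBIC penalty scales with log(p), adding a spurious predictor must reduce the residual sum of squares substantially before the larger model is preferred, which is what controls false selections when p greatly exceeds n.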
