Variable selection in finite mixture of median regression models using skew-normal distribution

ISSN 2475-4269

CN 31-2182/O1

Yuanyuan Ju

Faculty of Science, Kunming University of Science and Technology, Kunming, People's Republic of China

Liucang Wu

Faculty of Science, Kunming University of Science and Technology, Kunming, People's Republic of China

wuliucang@163.com

Received 18 Apr. 2021; Accepted 25 Jul. 2022; Published online 06 Aug. 2022

Abstract

A regression model with skew-normal errors provides a useful extension of traditional normal regression models when the data involve asymmetric outcomes. Moreover, data that arise from a heterogeneous population can be analysed efficiently by a finite mixture of regression models. These observations motivate us to propose a novel finite mixture of median regression models, based on a mixture of skew-normal distributions, to explore asymmetric data from several subpopulations. With an appropriate choice of the tuning parameters, we establish the theoretical properties of the proposed procedure, including the consistency of the variable selection method and the oracle property in estimation. An effective nonparametric clustering method is applied to select the number of components, and an efficient EM algorithm is developed for the numerical computations. Simulation studies and a real data set are used to illustrate the performance of the proposed methodologies.
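The abstract describes the model only in words. As a rough illustration, the following standard-library Python sketch evaluates one plausible form of a K-component mixture of median regressions with skew-normal errors and computes the E-step posterior component probabilities for a single observation. It assumes Azzalini's skew-normal density (2/σ) φ(z) Φ(λz) and a hypothetical reparameterization in which each component's location is shifted so that the conditional median of y equals x^T β_j. It is not the authors' implementation: their exact parameterization, the penalized M-step, the tuning-parameter selection and the clustering step for choosing K are not reproduced, and all function names are illustrative.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def owens_t(h, a, n=200):
    """Owen's T(h, a) = (1/2pi) * int_0^a exp(-h^2 (1 + x^2) / 2) / (1 + x^2) dx,
    approximated by composite Simpson's rule with n (even) subintervals."""
    if a == 0.0:
        return 0.0
    step = a / n  # negative when a < 0, which yields T(h, -a) = -T(h, a)
    total = 0.0
    for k in range(n + 1):
        x = k * step
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += w * math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return total * step / 3.0 / (2.0 * math.pi)


def sn_cdf(z, lam):
    """CDF of the standard skew-normal SN(0, 1, lam): Phi(z) - 2 T(z, lam)."""
    return norm_cdf(z) - 2.0 * owens_t(z, lam)


def sn_median(lam, tol=1e-10):
    """Median of SN(0, 1, lam), found by bisection (no closed form exists)."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, lam) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def median_sn_pdf(y, median, sigma, lam, m_lam):
    """Density of y = mu + sigma * Z with Z ~ SN(0, 1, lam) and
    mu = median - sigma * m_lam, so the median of y equals `median`
    (hypothetical median-centred parameterization)."""
    mu = median - sigma * m_lam
    z = (y - mu) / sigma
    return 2.0 / sigma * norm_pdf(z) * norm_cdf(lam * z)


def responsibilities(y, x, weights, betas, sigmas, lams):
    """E-step posterior component probabilities for one observation (y, x)."""
    dens = []
    for pi_j, beta, sigma, lam in zip(weights, betas, sigmas, lams):
        median = sum(b * xi for b, xi in zip(beta, x))  # x^T beta_j
        dens.append(pi_j * median_sn_pdf(y, median, sigma, lam, sn_median(lam)))
    total = sum(dens)
    return [d / total for d in dens]


if __name__ == "__main__":
    tau = responsibilities(
        y=0.5, x=[1.0, 0.5],
        weights=[0.5, 0.5], betas=[[0.0, 1.0], [5.0, -1.0]],
        sigmas=[1.0, 1.0], lams=[2.0, -2.0],
    )
    print(tau)  # the first component (median 0.5) should dominate
```

Because each component's median is pinned to x^T β_j, the skewness parameter λ_j only changes the shape of the error distribution around that median, so the regression coefficients keep a median interpretation. The standard skew-normal median has no closed form, which is why the sketch solves Φ(z) - 2T(z, λ) = 0.5 numerically, with T denoting Owen's T function.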

To cite this article: Xin Zeng, Yuanyuan Ju & Liucang Wu (2023) Variable selection in finite mixture of median regression models using skew-normal distribution, Statistical Theory and Related Fields, 7:1, 30-48, DOI: 10.1080/24754269.2022.2107974

To link to this article: https://doi.org/10.1080/24754269.2022.2107974
