Model averaging for generalized linear models in fragmentary data prediction

ISSN 2475-4269

CN 31-2182/O1

Chaoxia Yuan (chaoxiayuan@163.com)

Yang Wu

KLATASDS – MOE, School of Statistics, East China Normal University, Shanghai, People's Republic of China

Fang Fang

KLATASDS – MOE, School of Statistics, East China Normal University, Shanghai, People's Republic of China

Received 01 Feb. 2022, Accepted 18 Jul. 2022, Published online: 30 Jul. 2022

Abstract

Fragmentary data are increasingly common in many areas, which poses substantial challenges for researchers and data analysts. Most existing methods for fragmentary data assume a continuous response, whereas in many applications the response variable is discrete. In this paper, we propose a model averaging method for generalized linear models in fragmentary data prediction. The candidate models are fitted based on different combinations of covariate availability and sample size. The optimal weight is selected by minimizing the Kullback–Leibler loss on the complete cases, and its asymptotic optimality is established. Empirical evidence from a simulation study and a real data analysis of Alzheimer's disease is presented.
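
The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the simulated data, the missingness pattern, the helper names (`fit_logistic`, `design`, `kl_loss`, `kl_grad`), the choice to average linear predictors, and the exponentiated-gradient weight search are all assumptions made for illustration. The paper's actual criterion may include penalty terms or cross-validation that are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated binary response with three covariates (hypothetical data).
n = 600
X = rng.normal(size=(n, 3))
eta_true = 0.5 + 1.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

# Fragmentary availability: x1 always observed, x2 and x3 only for some subjects.
obs = np.ones((n, 3), dtype=bool)
obs[:, 1] = rng.random(n) < 0.7
obs[:, 2] = rng.random(n) < 0.6

# Each candidate uses one covariate set and every subject for whom that set is available.
candidates = [(0,), (0, 1), (0, 2), (0, 1, 2)]
cc = obs.all(axis=1)  # complete cases: all covariates observed

def design(rows, cols):
    """Design matrix (intercept + selected covariates) for the selected rows."""
    return np.column_stack([np.ones(rows.sum()), X[rows][:, list(cols)]])

eta_cc = []
for cols in candidates:
    rows = obs[:, list(cols)].all(axis=1)
    beta = fit_logistic(design(rows, cols), y[rows])
    eta_cc.append(design(cc, cols) @ beta)  # candidate's linear predictor on the complete cases
eta_cc = np.column_stack(eta_cc)            # shape: (number of complete cases, number of candidates)
y_cc = y[cc]

def kl_loss(w):
    """Plug-in KL-type loss on the complete cases (negative Bernoulli log-likelihood)."""
    eta = eta_cc @ w
    return np.sum(np.logaddexp(0.0, eta) - y_cc * eta)

def kl_grad(w):
    eta = eta_cc @ w
    p = 1.0 / (1.0 + np.exp(-eta))
    return eta_cc.T @ (p - y_cc)

# Exponentiated-gradient search over the weight simplex, keeping the best iterate.
K = len(candidates)
w = np.full(K, 1.0 / K)
best_w, best_loss = w.copy(), kl_loss(w)
step = 1.0 / len(y_cc)
for _ in range(2000):
    w = w * np.exp(-step * kl_grad(w))
    w /= w.sum()
    loss = kl_loss(w)
    if loss < best_loss:
        best_w, best_loss = w.copy(), loss

print(dict(zip(candidates, np.round(best_w, 3))))
```

Averaging on the linear-predictor scale keeps the loss convex in the weights, which is why a simple simplex-constrained descent is enough for this sketch. Because the loss is evaluated in-sample on the complete cases, the resulting weights are only illustrative.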

To cite this article: Chaoxia Yuan, Yang Wu & Fang Fang (2022) Model averaging for generalized linear models in fragmentary data prediction, Statistical Theory and Related Fields, 6:4, 344-352, DOI: 10.1080/24754269.2022.2105486
To link to this article: https://doi.org/10.1080/24754269.2022.2105486
