Review Articles

Optimal model averaging estimator for multinomial logit models

Rongjie Jiang,

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China

Liuming Wang,

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China

Yang Bai

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China

Received 29 Mar. 2020, Accepted 10 Jan. 2022, Published online: 17 Feb. 2022

In this paper, we study optimal model averaging estimators of the regression coefficients in the multinomial logit model, which is widely used across many scientific fields. A weight choice criterion based on the Kullback–Leibler (KL) loss is developed to determine the averaging weights. Under certain regularity conditions, we prove that the resulting model averaging estimators are asymptotically optimal. When the true model is one of the candidate models, the averaged estimators are consistent. Simulation studies suggest that the proposed method outperforms commonly used model selection criteria, existing model averaging methods, and some other related methods in terms of both KL loss and mean squared forecast error. Finally, the website phishing dataset is used to illustrate the proposed method.
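The abstract does not spell out the weight choice criterion, so what follows is only a minimal numerical sketch of KL-type model averaging for multinomial logit models under stated assumptions, not the authors' implementation. It assumes (hypothetically) nested candidate models, a criterion of the form -2 Σ_i Σ_j y_ij log p̄_ij(w) + λ Σ_m w_m k_m, where p̄_ij(w) = Σ_m w_m p̂_ij^(m) is the weighted probability of category j for observation i, k_m is the number of parameters in candidate m and λ = 2, and weights restricted to the unit simplex. The helper names (`mnl_probs`, `fit_mnl`, `kl_criterion`, `choose_weights`), the simulated data, the plain gradient-descent fit and the exponentiated-gradient weight search are all illustrative choices.

```python
import numpy as np


def mnl_probs(X, B):
    """Category probabilities of a multinomial logit model.

    Category 0 is the baseline; B has one column per non-baseline category.
    """
    eta = np.column_stack([np.zeros(X.shape[0]), X @ B])
    eta -= eta.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


def fit_mnl(X, Y, iters=2000, lr=0.5):
    """Approximate maximum likelihood fit by plain gradient descent (illustrative only)."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1] - 1))
    for _ in range(iters):
        P = mnl_probs(X, B)
        # Gradient of the mean negative log-likelihood w.r.t. the non-baseline coefficients
        B -= lr * (X.T @ (P[:, 1:] - Y[:, 1:]) / n)
    return B


def kl_criterion(w, probs, Y, k, lam):
    """Assumed criterion: -2 * log-likelihood of averaged probabilities + lam * sum_m w_m k_m."""
    p_bar = np.tensordot(w, probs, axes=1)  # (n, J) weighted category probabilities
    return -2.0 * np.sum(Y * np.log(p_bar)) + lam * np.dot(w, k)


def choose_weights(probs, Y, k, lam=2.0, iters=3000, step=0.5):
    """Minimise the criterion over the weight simplex by exponentiated gradient.

    Returns the best iterate visited, so the result is never worse than equal weights.
    """
    M, n, _ = probs.shape
    w = np.full(M, 1.0 / M)
    best_w, best_val = w.copy(), kl_criterion(w, probs, Y, k, lam)
    for _ in range(iters):
        p_bar = np.tensordot(w, probs, axes=1)
        # Partial derivatives of the criterion, divided by n to keep the step scale-free
        grad = (-2.0 * np.sum(Y * probs / p_bar, axis=(1, 2)) + lam * k) / n
        w = w * np.exp(-step * (grad - grad.min()))  # shift keeps exponents <= 0
        w /= w.sum()
        val = kl_criterion(w, probs, Y, k, lam)
        if val < best_val:
            best_w, best_val = w.copy(), val
    return best_w


# Hypothetical simulated data: intercept + 3 covariates, J = 3 categories
rng = np.random.default_rng(0)
n, J = 400, 3
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
B_true = np.array([[0.2, -0.3], [1.0, -0.5], [0.5, 0.8], [0.0, 0.0]])
labels = np.array([rng.choice(J, p=row) for row in mnl_probs(X_full, B_true)])
Y = np.eye(J)[labels]  # one-hot responses

# Hypothetical nested candidates: the first 2, 3 and 4 columns of X_full
cols = [2, 3, 4]
probs = np.stack([mnl_probs(X_full[:, :c], fit_mnl(X_full[:, :c], Y)) for c in cols])
k = np.array([c * (J - 1) for c in cols], dtype=float)  # parameters per candidate
w_hat = choose_weights(probs, Y, k)
print(np.round(w_hat, 3))
```

Replacing λ = 2 with λ = log n would give a BIC-type penalty instead of an AIC-type one. Exponentiated gradient was chosen here only because it keeps the weights non-negative and summing to one without a separate projection step.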

To cite this article: Rongjie Jiang, Liming Wang & Yang Bai (2022): Optimal model averaging estimator for multinomial logit models, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2022.2037204