Review Articles

Two-stage least squares model averaging for instrumental variable models with exogenous variables

Wenjun Shen

College of Mathematics and Statistics, Chongqing University, Chongqing, People's Republic of China

Xiaochao Xia

College of Mathematics and Statistics, Chongqing University, Chongqing, People's Republic of China; Key Laboratory of Nonlinear Analysis and Its Applications (Ministry of Education), Chongqing University, Chongqing, People's Republic of China

xxc@cqu.edu.cn

Received 21 Feb. 2025, Accepted 12 Feb. 2026, Published online: 07 Mar. 2026

Instrumental variable (IV) methods are widely used to address unmeasured confounding in structural equation models. In this paper, we focus on settings with a possibly large number of instruments that are only weakly correlated with the endogenous variable. Specifically, we propose a novel two-stage least squares (2SLS) model averaging approach to estimate the coefficient of an endogenous variable. Unlike the existing literature, our model averaging estimator allows multiple exogenous variables to be included in both stages simultaneously. On the theoretical side, we study the consistency and asymptotic distributions of the estimated weights and of the proposed model averaging estimator. Importantly, we find that the proposed model averaging estimator has an asymptotic bias when the endogenous variable and the exogenous variables are correlated. We therefore construct a debiased estimator and establish its consistency and asymptotic normality, which enables statistical inference. Furthermore, we give an equivalent interpretation of the debiased estimator based on an alternative construction. Finally, numerical simulations and a real data analysis illustrate our proposal.
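To make the setting in the abstract concrete, the following is a minimal sketch of textbook 2SLS in which the exogenous variables enter both stages, followed by a simple average of the 2SLS estimates obtained from several candidate instrument sets. It is not the estimator proposed in the article: the function names, the simulated data-generating process, the nested candidate sets, and the equal default weights are all assumptions made for illustration, and the article's data-driven weight choice and debiasing step are not reproduced here.

```python
import numpy as np


def two_stage_least_squares(y, d, Z, X):
    """Textbook 2SLS estimate of the coefficient on the endogenous variable d.

    y: outcome (n,), d: endogenous regressor (n,),
    Z: instruments (n, k), X: exogenous variables (n, p).
    The exogenous variables X are included in BOTH stages.
    """
    n = len(y)
    ones = np.ones((n, 1))
    # Stage 1: regress d on an intercept, the instruments and X.
    W1 = np.hstack([ones, Z, X])
    coef1, *_ = np.linalg.lstsq(W1, d, rcond=None)
    d_hat = W1 @ coef1
    # Stage 2: regress y on the fitted d, an intercept and X.
    W2 = np.column_stack([d_hat, ones, X])
    coef2, *_ = np.linalg.lstsq(W2, y, rcond=None)
    return float(coef2[0])


def averaged_2sls(y, d, Z, X, candidate_sets, weights=None):
    """Weighted average of 2SLS estimates over candidate instrument sets.

    Equal weights are an illustrative default (an assumption), not the
    weight choice studied in the article.
    """
    estimates = np.array(
        [two_stage_least_squares(y, d, Z[:, idx], X) for idx in candidate_sets]
    )
    if weights is None:
        weights = np.full(len(candidate_sets), 1.0 / len(candidate_sets))
    weights = np.asarray(weights, dtype=float)
    return float(weights @ estimates)


def simulate(n=5000, beta=1.5, seed=0):
    """Hypothetical data: an unobserved confounder u drives both d and y."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 4))   # instruments
    X = rng.normal(size=(n, 2))   # exogenous variables
    u = rng.normal(size=n)        # unmeasured confounder
    d = Z @ np.ones(4) + X @ np.array([0.5, -0.5]) + u + rng.normal(size=n)
    y = beta * d + X @ np.array([1.0, 2.0]) + u + rng.normal(size=n)
    return y, d, Z, X


y, d, Z, X = simulate()
nested_sets = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]  # assumed candidate models
print("averaged 2SLS estimate:", averaged_2sls(y, d, Z, X, nested_sets))
```

In this simulated design the instruments are strong and valid, so each candidate model recovers a value close to the true coefficient; the weak-instrument and many-instrument regimes targeted by the article are exactly where a careful weight choice would matter.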



To cite this article: Wenjun Shen & Xiaochao Xia (07 Mar 2026): Two-stage least squares model averaging for instrumental variable models with exogenous variables, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2026.2635747

To link to this article: https://doi.org/10.1080/24754269.2026.2635747