Empirical likelihood inference and goodness-of-fit test for logistic regression model under two-phase case-control sampling

ISSN 2475-4269

CN 31-2182/O1

Yukun Liu

KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, People’s Republic of China

Jing Qin

National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA

Pages 265–276 | Received 02 Mar. 2021, Accepted 18 Jun. 2021, Published online: 08 Jul. 2021

Abstract

Because of its cost-effectiveness and high efficiency, two-phase case-control sampling is widely used in epidemiological studies. We develop a semi-parametric empirical likelihood approach to two-phase case-control data under the logistic regression model. We show that the maximum empirical likelihood estimator is asymptotically normal and that the empirical likelihood ratio asymptotically follows a central chi-square distribution. The maximum empirical likelihood estimator coincides with the maximum likelihood estimator of Breslow and Holubkov (1997); nevertheless, the limiting distribution of the likelihood ratio and the likelihood-ratio-based confidence interval and test are all new. Furthermore, we construct new Kolmogorov–Smirnov-type goodness-of-fit tests for assessing the validity of the underlying logistic regression model. Our simulation results and a real-data application show that the likelihood-ratio-based interval and test have certain merits over their Wald-type counterparts and that the proposed goodness-of-fit test is valid.

References

Anderson, J. A. (1972). Separate sample logistic discrimination. Biometrika, 59(1), 19–35. https://doi.org/10.1093/biomet/59.1.19
Anderson, J. A. (1979). Multivariate logistic compounds. Biometrika, 66, 17–26. https://doi.org/10.2307/2335237
Breslow, N. E., & Cain, K. C. (1988). Logistic regression for two-stage case-control data. Biometrika, 75(1), 11–20. https://doi.org/10.1093/biomet/75.1.11
Breslow, N. E., & Chatterjee, N. (1999). Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis. Applied Statistics, 48(4), 457–468. https://doi.org/10.1111/1467-9876.00165
Breslow, N., & Day, N. E. (1980). Statistical methods in cancer research. Volume 1. The analysis of case-control studies. IARC Scientific Publications.
Breslow, N. E., & Holubkov, R. (1997). Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling. Journal of the Royal Statistical Society Series B, 59(2), 447–461. https://doi.org/10.1111/rssb.1997.59.issue-2
Breslow, N. E., Lumley, T., Ballantyne, C. M., Chambless, L. E., & Kulich, M. (2009). Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: Applications in epidemiology. Statistics in Biosciences, 1(1), 32–49. https://doi.org/10.1007/s12561-009-9001-6
Cai, S., Chen, J., & Zidek, J. V. (2017). Hypothesis testing in the presence of multiple samples under density ratio models. Statistica Sinica, 27, 761–783. https://doi.org/10.5705/ss.2014.168
Chen, J., & Liu, Y. (2013). Quantile and quantile-function estimations under density ratio model. Annals of Statistics, 41(3), 1669–1692. https://doi.org/10.1214/13-AOS1129
Chen, J., & Qin, J. (1993). Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika, 80(1), 107–116. https://doi.org/10.1093/biomet/80.1.107
Chen, J., Sitter, R. R., & Wu, C. (2002). Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys. Biometrika, 89(1), 230–237. https://doi.org/10.1093/biomet/89.1.230
D'Angio, G. J., Breslow, N., Beckwith, J. B., Evans, A., Baum, H., Fernbach, D., Hrabovsky, E., Jones, B., & Kelalis, P. (1989). Treatment of Wilms' tumour. Results of the third national Wilms' tumor study. Cancer, 64(2), 349–360. https://doi.org/10.1002/(ISSN)1097-0142
Diao, G., Ning, J., & Qin, J. (2012). Maximum likelihood estimation for semiparametric density ratio model. The International Journal of Biostatistics, 8(1). https://doi.org/10.1515/1557-4679.1372
DiCiccio, T., Hall, P., & Romano, J. (1991). Empirical likelihood is Bartlett-correctable. The Annals of Statistics, 19(2), 1053–1061. https://doi.org/10.1214/aos/1176348137
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26. https://doi.org/10.1214/aos/1176344552
Farewell, V. (1979). Some results on the estimation of logistic models based on retrospective data. Biometrika, 66(1), 27–32. https://doi.org/10.1093/biomet/66.1.27
Flanders, W. D., & Greenland, S. (1991). Analytic methods for two-stage case-control studies and other stratified designs. Statistics in Medicine, 10(5), 739–747. https://doi.org/10.1002/(ISSN)1097-0258
Green, D. M., Breslow, N. E., Beckwith, J. B., Finklestein, J. Z., Grundy, P. E., Thomas, P. R., Kim, T., Shochat, S. J., Haase, G. M., Ritchey, M. L., Kelalis, P. P., & D'Angio, G. J. (1998). Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms' tumor: A report from the national Wilms' tumor study group. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 16(1), 237–245. https://doi.org/10.1200/JCO.1998.16.1.237
Kitamura, Y. (2006). Empirical likelihood methods in econometrics: theory and practice. Discussion Paper 1569. Cowles Foundation.
Lawless, J. F., Kalbfleisch, J. D., & Wild, C. J. (1999). Semiparametric methods for response-selective and missing data problems in regression. Journal of the Royal Statistical Society Series B, 61(2), 413–438. https://doi.org/10.1111/rssb.1999.61.issue-2
Liu, Y., & Chen, J. (2010). Adjusted empirical likelihood with high-order precision. The Annals of Statistics, 38(3), 1341–1362. https://doi.org/10.1214/09-AOS750
Luo, X., & Tsai, W. Y. (2012). A proportional likelihood ratio model. Biometrika, 99(1), 211–222. https://doi.org/10.1093/biomet/asr060
Neyman, J. (1938). Contribution to the theory of sampling human populations. Journal of the American Statistical Association, 33(201), 101–116. https://doi.org/10.1080/01621459.1938.10503378
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237–249. https://doi.org/10.1093/biomet/75.2.237
Owen, A. B. (2001). Empirical likelihood. Chapman and Hall.
Prentice, R. L., & Pyke, R. (1979). Logistic disease incidence models and case-control studies. Biometrika, 66(3), 403–411. https://doi.org/10.1093/biomet/66.3.403
Qin, J. (1998). Inferences for case-control and semiparametric two-sample density ratio models. Biometrika, 85(3), 619–630. https://doi.org/10.1093/biomet/85.3.619
Qin, J., & Zhang, B. (1997). A goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 84(3), 609–618. https://doi.org/10.1093/biomet/84.3.609
Qin, J., & Zhang, B. (2005). Density estimation under a two-sample semiparametric model. Journal of Nonparametric Statistics, 17(6), 665–683. https://doi.org/10.1080/10485250500039346
Qin, J., Zhang, H., Li, P., Albanes, D., & Yu, K. (2015). Using covariate-specific disease prevalence information to increase the power of case-control studies. Biometrika, 102(1), 169–180. https://doi.org/10.1093/biomet/asu048
Saegusa, T., & Wellner, J. A. (2013). Weighted likelihood estimation under two-phase sampling. Annals of Statistics, 41(1), 269–295. https://doi.org/10.1214/12-AOS1073
Schaid, D. J., Jenkins, G. D., Ingle, J. N., & Weinshilboum, R. M. (2013). Two-phase designs to follow-up genome-wide association signals with DNA resequencing studies. Genetic Epidemiology, 37(3), 229–238. https://doi.org/10.1002/gepi.2013.37.issue-3
Schill, W., Jöckel, K. H., Drescher, K., & Timm, J. (1993). Logistic analysis in case-control studies under validation sampling. Biometrika, 80(2), 339–352. https://doi.org/10.1093/biomet/80.2.339
Scott, A. J., & Wild, C. J. (1997). Fitting regression models to case-control data by maximum likelihood. Biometrika, 84(1), 57–71. https://doi.org/10.1093/biomet/84.1.57
Thomas, D. C., Yang, Z., & Yang, F. (2013). Two-phase and family-based designs for next-generation sequencing studies. Frontiers in Genetics, 4, 276. https://doi.org/10.3389/fgene.2013.00276
Walker, A. M. (1982). Anamorphic analysis: sampling and estimation for covariate effects when both exposure and disease are known. Biometrics, 38(4), 1025–1032. https://doi.org/10.2307/2529883
White, J. E. (1982). A two stage design for the study of the relationship between a rare exposure and a rare disease. American Journal of Epidemiology, 115(1), 119–128. https://doi.org/10.1093/oxfordjournals.aje.a113266
Wu, C., & Thompson, M. E. (2020). Sampling theory and practice. Springer.
Zhao, P., & Wu, C. (2019). Some theoretical and practical aspects of empirical likelihood methods for complex surveys. International Statistical Review, 87(1), S239–S256. https://doi.org/10.1111/insr.v87.S1
Zhou, H., Song, R., Wu, Y., & Qin, J. (2011). Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics, 67(1), 194–202. https://doi.org/10.1111/j.1541-0420.2010.01446.x

To cite this article: Zhen Sheng, Yukun Liu & Jing Qin (2022) Empirical likelihood inference and goodness-of-fit test for logistic regression model under two-phase case-control sampling, Statistical Theory and Related Fields, 6:4, 265-276, DOI: 10.1080/24754269.2021.1946373

To link to this article: https://doi.org/10.1080/24754269.2021.1946373
