Review Articles

Sample size and power analysis for stepped wedge cluster randomised trials with binary outcomes

Jijia Wang (a), Jing Cao (b), Song Zhang (c), Chul Ahn (c)

a Department of Applied Clinical Research, UT Southwestern Medical Center, Dallas, TX, USA

b Department of Statistical Science, Southern Methodist University, Dallas, TX, USA

c Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA

Pages 162-169 | Received 09 Jul. 2020, Accepted 12 Mar. 2021, Published online: 06 Apr. 2021


In stepped wedge cluster randomised trials (SW-CRTs), clusters of subjects are randomly assigned to sequences, each of which specifies the order in which treatments are received. Unlike conventional cluster randomised trials, all clusters in an SW-CRT start under control and gradually transition to the intervention according to their randomly assigned sequences. This feature mitigates the ethical concern of withholding an effective treatment and reduces the logistical burden of implementing the intervention in multiple clusters simultaneously. It also presents challenges for experimental design and data analysis, namely missing data due to prolonged follow-up and complicated correlation structures that involve both between-subject and longitudinal correlations. In this study, based on the generalised estimating equation (GEE) approach, we present a closed-form sample size formula for SW-CRTs with a binary outcome. The formula is flexible enough to account for unbalanced randomisation, missing data, and arbitrary correlation structures. We also present a correction for the underestimation of variance by the GEE estimator when the sample size is small. Simulation studies and an application to a real clinical trial are presented.
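The design feature described above, in which every cluster starts under control and crosses over to the intervention according to its assigned sequence, can be illustrated with a minimal Python sketch. It is not the sample size formula of the article. The function names `stepped_wedge_design` and `assign_clusters` are hypothetical, and the layout assumes one baseline period followed by one crossover step per sequence, which is only one of several possible stepped wedge layouts. The `clusters_per_sequence` argument allows unbalanced randomisation.

```python
import random


def stepped_wedge_design(n_sequences):
    """Treatment indicators for a standard stepped wedge layout (assumed here).

    Returns one row per sequence and n_sequences + 1 columns (periods).
    0 = control, 1 = intervention. Every sequence starts under control in
    period 0, and sequence s crosses over at period s + 1.
    """
    n_periods = n_sequences + 1
    return [
        [1 if period > seq else 0 for period in range(n_periods)]
        for seq in range(n_sequences)
    ]


def assign_clusters(cluster_ids, clusters_per_sequence, seed=None):
    """Randomly assign clusters to sequences (hypothetical helper).

    clusters_per_sequence[s] is the number of clusters allocated to
    sequence s; unequal counts give an unbalanced randomisation.
    """
    cluster_ids = list(cluster_ids)
    if sum(clusters_per_sequence) != len(cluster_ids):
        raise ValueError("allocation counts must sum to the number of clusters")
    rng = random.Random(seed)
    rng.shuffle(cluster_ids)
    assignment = {}
    start = 0
    for seq, count in enumerate(clusters_per_sequence):
        for cluster in cluster_ids[start:start + count]:
            assignment[cluster] = seq
        start += count
    return assignment


# Three sequences, four periods, six clusters allocated 1:2:3.
design = stepped_wedge_design(3)
assignment = assign_clusters(range(6), [1, 2, 3], seed=2021)
for cluster in sorted(assignment):
    print(cluster, design[assignment[cluster]])
```

Each printed row is the treatment schedule of one cluster over the study periods. A row never switches back from 1 to 0, which reflects the one-directional transition from control to intervention.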

