Review Articles

Calibration bands for mean estimates within the exponential dispersion family

Łukasz Delong,

Department of Statistics and Econometrics, Faculty of Economic Sciences, University of Warsaw, Poland

Selim Gatti,

RiskLab, Department of Mathematics, ETH Zurich, Zürich, Switzerland

selim.gatti@math.ethz.ch

Mario V. Wüthrich

RiskLab, Department of Mathematics, ETH Zurich, Zürich, Switzerland

Received 25 Mar. 2025; Accepted 11 Jan. 2026; Published online: 04 Feb. 2026.
Abstract

A statistical model is said to be calibrated if the resulting mean estimates perfectly match the true means of the underlying responses. Aiming for calibration is often not achievable in practice, as one has to deal with finite samples of noisy observations. A weaker notion is auto-calibration: a model is auto-calibrated if the expected value of the responses, conditional on receiving the same mean estimate, equals that estimate. Testing for auto-calibration has only recently been considered in the literature, and we propose a new approach based on calibration bands. Calibration bands are sets of lower and upper bounds such that the probability that the true means lie simultaneously inside those bounds exceeds a given confidence level. Such bands were constructed by Yang and Barber ((2019). Contraction and uniform convergence of isotonic regression. Electronic Journal of Statistics, 13(1), 646–677. https://doi.org/10.1214/18-EJS1520) for sub-Gaussian distributions. Dimitriadis et al. ((2023). Honest calibration assessment for binary outcome predictions. Biometrika, 110(3), 663–680. https://doi.org/10.1093/biomet/asac068) then introduced narrower bands for the Bernoulli distribution. We use the same idea to extend the construction to the entire exponential dispersion family, which contains, for example, the binomial, Poisson, negative binomial, gamma, and normal distributions. Moreover, we show that the obtained calibration bands allow us to construct various tests for calibration and auto-calibration, respectively. As the construction of the bands does not rely on asymptotic results, our tests can be used for any sample size.
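To make the notion of auto-calibration concrete, the following is a minimal, hypothetical sketch (not taken from the paper): on simulated Poisson data, a piecewise-constant predictor that outputs the average true mean within each bucket is auto-calibrated by construction, so the average observed response within each group sharing the same estimate should approximately match that estimate. All names and the bucketing scheme here are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of auto-calibration, E[Y | estimate] = estimate,
# on simulated Poisson data; not the construction used in the paper.
rng = np.random.default_rng(0)

n = 200_000
mu = rng.uniform(0.5, 2.0, size=n)   # true means
y = rng.poisson(mu)                  # noisy Poisson responses

# Piecewise-constant predictor: discretize the true mean into 5 buckets
# and predict the bucket average, so the predictor is auto-calibrated
# by construction.
edges = np.linspace(0.5, 2.0, 6)
bucket = np.clip(np.digitize(mu, edges) - 1, 0, 4)
estimate = np.array([mu[bucket == b].mean() for b in range(5)])

# Empirical check: within each group sharing the same estimate, the
# average observed response should be close to that estimate (up to
# sampling noise of order sqrt(mean / group size)).
for b in range(5):
    group_mean = y[bucket == b].mean()
    print(f"estimate {estimate[b]:.3f}  vs  group mean {group_mean:.3f}")
```

With finite samples the group means only match the estimates up to noise, which is exactly why formal tests such as the calibration bands proposed in the article are needed rather than eyeballing such a comparison.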


References

  • Barlow, R. E., Bartholomew, D. J., Bremner, J. M., & Brunk, H. D. (1972). Statistical inference under order restrictions: The theory and application of isotonic regression. Wiley.
  • Barndorff-Nielsen, O. (2014). Information and exponential families: In statistical theory. Wiley.
  • Delong, Ł., & Wüthrich, M. V. (2025). Isotonic regression for variance estimation and its role in mean estimation and model validation. North American Actuarial Journal, 29(3), 563–591. https://doi.org/10.1080/10920277.2024.2421221
  • Denuit, M., Charpentier, A., & Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics, 101(B), 485–497.
  • Denuit, M., Huyghe, J., Trufin, J., & Verdebout, T. (2024). Testing for auto-calibration with Lorenz and concentration curves. Insurance: Mathematics and Economics, 117, 130–139.
  • Denuit, M., & Trufin, J. (2023). Model selection with Pearson's correlation, concentration and Lorenz curves under auto-calibration. European Actuarial Journal, 13(2), 871–878. https://doi.org/10.1007/s13385-023-00353-5
  • Dimitriadis, T., Dümbgen, L., Henzi, A., Puke, M., & Ziegel, J. (2023). Honest calibration assessment for binary outcome predictions. Biometrika, 110(3), 663–680. https://doi.org/10.1093/biomet/asac068
  • Dunn, P. K., & Smyth, G. K. (2018). Generalized linear models with examples in R. Springer.
  • Dunn, P. K., & Smyth, G. K. (2022). GLMsData R package vignette (Reference manual, version 1, packaged 2022-08-22) [Computer software manual].
  • Dutang, C., & Charpentier, A. (2018). CASdatasets R package vignette (Reference manual, version 1.0-8, packaged 2018-05-20) [Computer software manual].
  • Fisher, R. A. (1935). The fiducial argument in statistical inference. Annals of Eugenics, 6(4), 391–398. https://doi.org/10.1111/ahg.1935.6.issue-4
  • Fisher, R. A. (1973). Statistical methods and scientific inference. Hafner Press.
  • Fissler, T., Lorentzen, C., & Mayer, M. (2022). Model comparison and calibration assessment: User guide for consistent scoring functions in machine learning and actuarial practice. Preprint, arXiv:2202.12780 [stat.ML].
  • Gneiting, T., & Resin, J. (2023). Regression diagnostics meets forecast evaluation: Conditional calibration, reliability diagrams, and coefficient of determination. Electronic Journal of Statistics, 17(2), 3226–3286. https://doi.org/10.1214/23-EJS2180
  • Henzi, A., Mösching, A., & Dümbgen, L. (2022). Accelerating the pool-adjacent-violators algorithm for isotonic distributional regression. Methodology and Computing in Applied Probability, 24(4), 2633–2645. https://doi.org/10.1007/s11009-022-09937-2
  • Hosmer, D. W., & Lemeshow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Communications in Statistics - Theory and Methods, 9(10), 1043–1069. https://doi.org/10.1080/03610928008827941
  • Jørgensen, B. (1986). Some properties of exponential dispersion models. Scandinavian Journal of Statistics, 13(3), 187–197.
  • Jørgensen, B. (1997). The theory of dispersion models. Chapman and Hall.
  • Krüger, F., & Ziegel, J. F. (2021). Generic conditions for forecast dominance. Journal of Business & Economic Statistics, 39(4), 972–983. https://doi.org/10.1080/07350015.2020.1741376
  • McCullagh, P., & Nelder, J. A. (1983). Generalized linear models. Chapman and Hall.
  • Pedersen, J. G. (1978). Fiducial inference. International Statistical Review, 46(2), 147–170. https://doi.org/10.2307/1402811
  • Pohle, M. O. (2020). The Murphy decomposition and the calibration-resolution principle: A new perspective on forecast evaluation. Preprint, arXiv:2005.01835 [stat.ME].
  • Rao, M. M., & Swift, R. J. (2006). Probability theory with applications. Springer.
  • R Core Team. (2021). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
  • Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. Wiley.
  • Shaked, M., & Shanthikumar, J. G. (2007). Stochastic orders. Springer.
  • Sprott, D. A. (2000). Statistical inference in science. Springer.
  • Veronese, P., & Melilli, E. (2015). Fiducial and confidence distributions for real exponential families. Scandinavian Journal of Statistics, 42(2), 471–484. https://doi.org/10.1111/sjos.v42.2
  • Wüthrich, M. V. (2024). Auto-calibration tests for discrete finite regression functions. Preprint, arXiv:2408.05993 [math.ST].
  • Wüthrich, M. V., & Merz, M. (2023). Statistical foundations of actuarial learning and its applications. Springer.
  • Wüthrich, M. V., & Ziegel, J. (2024). Isotonic recalibration under a low signal-to-noise ratio. Scandinavian Actuarial Journal, 2024(3), 279–299. https://doi.org/10.1080/03461238.2023.2246743
  • Yang, F., & Barber, R. F. (2019). Contraction and uniform convergence of isotonic regression. Electronic Journal of Statistics, 13(1), 646–677. https://doi.org/10.1214/18-EJS1520

To cite this article: Łukasz Delong, Selim Gatti & Mario V. Wüthrich (04 Feb 2026): Calibration bands for mean estimates within the exponential dispersion family, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2026.2620835
To link to this article: https://doi.org/10.1080/24754269.2026.2620835