Review Articles

Calibration bands for mean estimates within the exponential dispersion family

Łukasz Delong,

Department of Statistics and Econometrics, Faculty of Economic Sciences, University of Warsaw, Poland

Selim Gatti,

RiskLab, Department of Mathematics, ETH Zurich, Zürich, Switzerland

selim.gatti@math.ethz.ch

Mario V. Wüthrich

RiskLab, Department of Mathematics, ETH Zurich, Zürich, Switzerland

Received 25 Mar. 2025; Accepted 11 Jan. 2026; Published online: 04 Feb. 2026.
Abstract

A statistical model is said to be calibrated if the resulting mean estimates perfectly match the true means of the underlying responses. Aiming for calibration is often not achievable in practice, as one has to deal with finite samples of noisy observations. A weaker notion is auto-calibration: a model is auto-calibrated if the expected value of the responses, conditional on receiving the same mean estimate, equals that estimate. Testing for auto-calibration has only recently been considered in the literature, and we propose a new approach based on calibration bands. Calibration bands are sets of lower and upper bounds such that the probability that the true means lie simultaneously inside those bounds exceeds a given confidence level. Such bands were constructed by Yang and Barber ((2019). Contraction and uniform convergence of isotonic regression. Electronic Journal of Statistics, 13(1), 646–677. https://doi.org/10.1214/18-EJS1520) for sub-Gaussian distributions. Dimitriadis et al. ((2023). Honest calibration assessment for binary outcome predictions. Biometrika, 110(3), 663–680. https://doi.org/10.1093/biomet/asac068) then introduced narrower bands for the Bernoulli distribution. We use the same idea to extend the construction to the entire exponential dispersion family, which contains, for example, the binomial, Poisson, negative binomial, gamma, and normal distributions. Moreover, we show that the obtained calibration bands allow us to construct various tests for calibration and auto-calibration, respectively. As the construction of the bands does not rely on asymptotic results, our tests can be used for any sample size.
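To make the notion of auto-calibration concrete, the following is a minimal, hypothetical sketch (not taken from the paper): on simulated Poisson data, a piecewise-constant predictor that outputs the average true mean within each bucket is auto-calibrated by construction, so the average observed response within each group sharing the same estimate should approximately match that estimate. All names and the bucketing scheme here are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of auto-calibration, E[Y | estimate] = estimate,
# on simulated Poisson data; not the construction used in the paper.
rng = np.random.default_rng(0)

n = 200_000
mu = rng.uniform(0.5, 2.0, size=n)   # true means
y = rng.poisson(mu)                  # noisy Poisson responses

# Piecewise-constant predictor: discretize the true mean into 5 buckets
# and predict the bucket average, so the predictor is auto-calibrated
# by construction.
edges = np.linspace(0.5, 2.0, 6)
bucket = np.clip(np.digitize(mu, edges) - 1, 0, 4)
estimate = np.array([mu[bucket == b].mean() for b in range(5)])

# Empirical check: within each group sharing the same estimate, the
# average observed response should be close to that estimate (up to
# sampling noise of order sqrt(mean / group size)).
for b in range(5):
    group_mean = y[bucket == b].mean()
    print(f"estimate {estimate[b]:.3f}  vs  group mean {group_mean:.3f}")
```

With finite samples the group means only match the estimates up to noise, which is exactly why formal tests such as the calibration bands proposed in the article are needed rather than eyeballing such a comparison.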


References

  • Barlow, R. E., Bartholomew, D. J., Bremner, J. M., & Brunk, H. D. (1972). Statistical inference under order restrictions: The theory and application of isotonic regression. Wiley.
  • Barndorff-Nielsen, O. (2014). Information and exponential families: In statistical theory. Wiley.
  • Delong, Ł., & Wüthrich, M. V. (2025). Isotonic regression for variance estimation and its role in mean estimation and model validation. North American Actuarial Journal, 29(3), 563–591. https://doi.org/10.1080/10920277.2024.2421221
  • Denuit, M., Charpentier, A., & Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics, 101(B), 485–497.
  • Denuit, M., Huyghe, J., Trufin, J., & Verdebout, T. (2024). Testing for auto-calibration with Lorenz and concentration curves. Insurance: Mathematics and Economics, 117, 130–139.
  • Denuit, M., & Trufin, J. (2023). Model selection with Pearson's correlation, concentration and Lorenz curves under auto-calibration. European Actuarial Journal, 13(2), 871–878. https://doi.org/10.1007/s13385-023-00353-5
  • Dimitriadis, T., Dümbgen, L., Henzi, A., Puke, M., & Ziegel, J. (2023). Honest calibration assessment for binary outcome predictions. Biometrika, 110(3), 663–680. https://doi.org/10.1093/biomet/asac068
  • Dunn, P. K., & Smyth, G. K. (2018). Generalized linear models with examples in R. Springer.
  • Dunn, P. K., & Smyth, G. K. (2022). GLMsData R package vignette (Reference manual, version 1, packaged 2022-08-22) [Computer software manual].
  • Dutang, C., & Charpentier, A. (2018). CASdatasets R package vignette (Reference manual, version 1.0-8, packaged 2018-05-20) [Computer software manual].
  • Fisher, R. A. (1935). The fiducial argument in statistical inference. Annals of Eugenics, 6(4), 391–398. https://doi.org/10.1111/ahg.1935.6.issue-4
  • Fisher, R. A. (1973). Statistical methods and scientific inference. Hafner Press.
  • Fissler, T., Lorentzen, C., & Mayer, M. (2022). Model comparison and calibration assessment: User guide for consistent scoring functions in machine learning and actuarial practice. Preprint, arXiv:2202.12780 [stat.ML].
  • Gneiting, T., & Resin, J. (2023). Regression diagnostics meets forecast evaluation: Conditional calibration, reliability diagrams, and coefficient of determination. Electronic Journal of Statistics, 17(2), 3226–3286. https://doi.org/10.1214/23-EJS2180
  • Henzi, A., Mösching, A., & Dümbgen, L. (2022). Accelerating the pool-adjacent-violators algorithm for isotonic distributional regression. Methodology and Computing in Applied Probability, 24(4), 2633–2645. https://doi.org/10.1007/s11009-022-09937-2
  • Hosmer, D. W., & Lemeshow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Communications in Statistics - Theory and Methods, 9(10), 1043–1069. https://doi.org/10.1080/03610928008827941
  • Jørgensen, B. (1986). Some properties of exponential dispersion models. Scandinavian Journal of Statistics, 13(3), 187–197.
  • Jørgensen, B. (1997). The theory of dispersion models. Chapman and Hall.
  • Krüger, F., & Ziegel, J. F. (2021). Generic conditions for forecast dominance. Journal of Business & Economic Statistics, 39(4), 972–983. https://doi.org/10.1080/07350015.2020.1741376
  • McCullagh, P., & Nelder, J. A. (1983). Generalized linear models. Chapman and Hall.
  • Pedersen, J. G. (1978). Fiducial inference. International Statistical Review, 46(2), 147–170. https://doi.org/10.2307/1402811
  • Pohle, M. O. (2020). The Murphy decomposition and the calibration-resolution principle: A new perspective on forecast evaluation. Preprint, arXiv:2005.01835 [stat.ME].
  • Rao, M. M., & Swift, R. J. (2006). Probability theory with applications. Springer.
  • R Core Team. (2021). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
  • Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. Wiley.
  • Shaked, M., & Shanthikumar, J. G. (2007). Stochastic orders. Springer.
  • Sprott, D. A. (2000). Statistical inference in science. Springer.
  • Veronese, P., & Melilli, E. (2015). Fiducial and confidence distributions for real exponential families. Scandinavian Journal of Statistics, 42(2), 471–484. https://doi.org/10.1111/sjos.v42.2
  • Wüthrich, M. V. (2024). Auto-calibration tests for discrete finite regression functions. Preprint, arXiv:2408.05993 [math.ST].
  • Wüthrich, M. V., & Merz, M. (2023). Statistical foundations of actuarial learning and its applications. Springer.
  • Wüthrich, M. V., & Ziegel, J. (2024). Isotonic recalibration under a low signal-to-noise ratio. Scandinavian Actuarial Journal, 2024(3), 279–299. https://doi.org/10.1080/03461238.2023.2246743
  • Yang, F., & Barber, R. F. (2019). Contraction and uniform convergence of isotonic regression. Electronic Journal of Statistics, 13(1), 646–677. https://doi.org/10.1214/18-EJS1520

To cite this article: Łukasz Delong, Selim Gatti & Mario V. Wüthrich (04 Feb 2026): Calibration bands for mean estimates within the exponential dispersion family, Statistical Theory and Related Fields, DOI: 10.1080/24754269.2026.2620835
To link to this article: https://doi.org/10.1080/24754269.2026.2620835