References
- Bernardo, J., & Smith, A. F. M. (2000). Bayesian theory. John Wiley & Sons.
- Biau, G., Devroye, L., & Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research, 9(66), 2015–2033. https://doi.org/10.1145/1390681.1442799
- Bierens, H. (2005). Introduction to the mathematical and statistical foundations of econometrics. Cambridge University Press.
- Billingsley, P. (2012). Probability and measure. Wiley.
- Breiman, L. (2001a). Stacked regressions. Machine Learning, 24(1), 49–64. https://doi.org/10.1007/BF00117832
- Breiman, L. (2001b). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
- Caruana, R., Karampatziakis, N., & Yessenalina, A. (2008). An empirical evaluation of supervised learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008.
- Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge University Press.
- Clarke, B. (2007). Information optimality and Bayesian modelling. Journal of Econometrics, 138(2), 405–429. https://doi.org/10.1016/j.jeconom.2006.05.003
- Dawid, A. P. (1984). Statistical theory: The prequential approach (with discussion). Journal of the Royal Statistical Society, Series A, 147(2), 278–292. https://doi.org/10.2307/2981683
- Dawid, A. P. (1992). Prequential data analysis. In M. Ghosh & P. K. Pathak (Eds.), Current issues in statistical inference: Essays in honor of D. Basu (pp. 113–126). IMS Lecture Notes Monograph Ser. 17. Institute of Mathematical Statistics.
- Dawid, A. P. (2010). Fundamentals of prequential analysis. http://www3.stat.sinica.edu.tw/2013frontiers/presentation/29.pdf
- Dawid, A. P., & Vovk, V. G. (1999). Prequential probability: Principles and properties. Bernoulli, 5(1), 125–162. https://doi.org/10.2307/3318616
- De Blasi, P. (2013). Discussion on article ‘Bayesian inference with misspecified models’ by Stephen G. Walker. Journal of Statistical Planning and Inference, 143(10), 1634–1637. https://doi.org/10.1016/j.jspi.2013.05.015
- Diaconis, P., Goel, S., & Holmes, S. (2008). Horseshoes in multidimensional scaling and local kernel methods. The Annals of Applied Statistics, 2(3), 777–807. https://doi.org/10.1214/08-AOAS165
- Eck, D., & Crawford, F. (2019). Efficient and minimal length parametric conformal prediction regions. https://arxiv.org/pdf/1905.03657.pdf
- Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of online learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. https://doi.org/10.1006/jcss.1997.1504
- Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 337–407. https://doi.org/10.1214/aos/1016218223
- Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350), 320–328. https://doi.org/10.1080/01621459.1975.10479865
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of statistical learning (2nd ed.). Springer.
- Hoff, P., & Wakefield, J. (2013). Bayesian sandwich posteriors for pseudo-true parameters: A discussion of ‘Bayesian inference with misspecified models’ by Stephen Walker. Journal of Statistical Planning and Inference, 143(10), 1638–1642. https://doi.org/10.1016/j.jspi.2013.05.014
- Kimeldorf, G., & Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1), 82–95. https://doi.org/10.1016/0022-247X(71)90184-3
- Le, T. M., & Clarke, B. (2016a). Using the Bayesian Shtarkov solution for predictions. Computational Statistics & Data Analysis, 104(9), 183–196. https://doi.org/10.1016/j.csda.2016.06.018
- Le, T. M., & Clarke, B. (2016b). A Bayes interpretation of stacking for M-complete and M-open settings. Bayesian Analysis, 12(3), 807–829. https://doi.org/10.1214/16-BA1023
- Le, T. M., & Clarke, B. (2018). On the interpretation of ensemble classifiers in terms of Bayes classifiers. Journal of Classification, 35(2), 198–229. https://doi.org/10.1007/s00357-018-9257-y
- Le, T. M., & Clarke, B. (2020). In praise of partially interpretable predictors. Statistical Analysis and Data Mining, 13(2), 113–133. https://doi.org/10.1002/sam.v13.2
- Liyang, Z., & Lee, Y. (2013). Eigen-analysis of nonlinear PCA with polynomial kernels. Statistical Analysis and Data Mining, 6(6), 529–544. https://doi.org/10.1002/sam.11211
- Mease, D., & Wyner, A. (2008). Evidence contrary to the statistical view of boosting. The Journal of Machine Learning Research, 9(6), 131–156. https://doi.org/10.1145/1390681.1390687
- Newey, W., & McFadden, D. (1994). Large sample estimation and hypothesis testing. Elsevier Science.
- O'Hagan, A. (2013). Bayesian inference with misspecified models: Inference about what? Journal of Statistical Planning and Inference, 143(10), 1643–1648. https://doi.org/10.1016/j.jspi.2013.05.016
- Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society of London A, 186, 343–414. https://doi.org/10.1098/rsta.1895.0010
- Polson, N. G., Scott, J. G., & Windle, J. (2014). The Bayesian bridge. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 713–733. https://doi.org/10.1111/rssb.2014.76.issue-4
- Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227. https://doi.org/10.1007/BF00116037
- Schölkopf, B., & Smola, A. (2002). Learning with kernels. MIT Press.
- Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. The Journal of Machine Learning Research, 9(12), 371–421.
- Shi, T., Belkin, M., & Yu, B. (2008). Data spectroscopy: Learning mixture models using eigenspaces of convolution operators. In Andrew McCallum & Sam Roweis (Eds.), Proceedings of the 25th Annual International Conference on Machine Learning (pp. 936–953). University of Cambridge.
- Shtarkov, Y. (1987). Universal sequential coding of single messages. Problems in Information Transmission, 23(3), 3–17.
- Steyn, H. (1960). On regression properties of multivariate probability functions of Pearson's types. Proceedings of the Royal Academy of Sciences, 63, 302–311. https://doi.org/10.1016/S1385-7258(60)50038-2
- Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1(June 2001), 211–244. https://doi.org/10.1162/15324430152748236
- Van de Geer, S. (2000). Applications of empirical process theory. Cambridge University Press.
- Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic learning in a random world. Springer.
- Vovk, V., Petej, I., Nouretdinov, I., Manokhin, V., & Gammerman, A. (2020). Computationally efficient versions of conformal predictive distributions. Neurocomputing, 397(July 2020), 292–308. https://doi.org/10.1016/j.neucom.2019.10.110
- Walker, S. G. (2013). Bayesian inference with misspecified models, with discussion and rejoinder. Journal of Statistical Planning and Inference, 143(10), 1621–1633. https://doi.org/10.1016/j.jspi.2013.05.013
- Williamson, R. E. (1956). Multiply monotone functions and their Laplace transforms. Duke Mathematical Journal, 23(2), 189–207. https://doi.org/10.1215/S0012-7094-56-02317-1
- Wyner, A., Olson, M., & Bleich, J. (2017). Explaining the success of AdaBoost and random forests as interpolating classifiers. Journal of Machine Learning Research, 18(May 2017), 1–33.
- Xie, Q., & Barron, A. R. (2000). Asymptotic minimax regret for data compression, gambling, and prediction. IEEE Transactions on Information Theory, 46(2), 431–445. https://doi.org/10.1109/18.825803