Review Articles

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers

Yaping Wang (a), Zhicheng Peng (a, b), Riquan Zhang (a), Qian Xiao (c)

a KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, People's Republic of China
b Ant Group, Hangzhou, People's Republic of China
c Department of Statistics, University of Georgia, Athens, GA, USA

qian.xiao@uga.edu

Pages 122-133 | Received 03 Oct 2020, Accepted 10 Mar 2021, Published online 12 Apr 2021

ABSTRACT
