Towards Thompson Sampling for Complex Bayesian Reasoning

Thompson Sampling (TS) is a state-of-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS is wellexplored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex, problems as well, beyond the bandit problem, if suitable Bayesian structures can be found.

The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state-of-art relies on a relatively myopic perspective on the problem. These includes stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, they rely on carefully engineered rules. In all brevity, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.

Description

Paper III, IV, and VI are not available as a part of the dissertation due to the copyright.

Has parts

Paper I: Glimsdal, S. & Granmo, O.-C. (2018). A Bayesian network based solution scheme for the constrained Stochastic On-line Equi-Partitioning Problem. Applied Intelligence, 48, 3735–3747. doi: https://doi.org/10.1007/s10489-018-1172-8. Author´s accepted manuscript. Full-text is not available in AURA as a separate file.

Paper II: Glimsdal, S. & Granmo, O.-C. (2019). Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems. Journal of Machine Learning Research, 20(52). http://jmlr.org/papers/v20/18-263.html Published version. Full-text is not available in AURA as a separate file.

Paper III: Glimsdal, S. & Granmo, O.-C. (2015). Thompson Sampling Guided Stochastic Searching on the Line for Non-Stationary Adversarial Learning. In 2015 IEEE 14th International Conference on Machine Learning and Applications, 687-692. doi: https://doi.org/10.1109/ICMLA.2015.203. Published version. Full-text is not available in AURA as a separate file.

Paper IV: Glimsdal, S. & Granmo, O.-C. (2013). Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game. Applied Intelligence, 38, 479–488. doi: https://doi.org/10.1007/s10489-012-0346-z. Published version. Full-text is not available in AURA as a separate file.

Paper V: Glimsdal, S. & Granmo, O.-C. (2013). Gaussian Process Based Optimistic Knapsack Sampling with Applications to Stochastic Resource Allocation. In M. Glass, S. Hettiarachchi & R. Finkbine (Eds.), Proceedings of the 24th Midwest Artificial Intelligence and Cognitive Science Conference (1348, p. 43-50). http://ceur-ws.org/Vol-1348/. Published version. Full-text is not available in AURA as a separate file.

Paper VI: Glimsdal, S. & Granmo, O.-C. (2019). Thompson Sampling Based Active Learning in Probabilistic Programs with Application to Travel Time Estimation. In F. Wotawa, G. Friedrich, I. Pill, R. Koitz-Hristov & M. Ali (Eds), Advances and Trends in Artificial Intelligence. From Theory to Practice (11606, p. 71-78). Cham: Springer. doi: https://doi.org/10.1007/978-3-030-22999-3_7. Published version. Full-text is not available in AURA as a separate file.