Solving dynamic bandit problems and decentralized games using the kalman bayesian learning automaton
Master thesis
Åpne
Permanent lenke
http://hdl.handle.net/11250/137504Utgivelsesdato
2010Metadata
Vis full innførselSamlinger
Sammendrag
Multi-armed bandit problems have been subject to a lot of research in computer science because it captures
the fundamental dilemma of exploration versus exploitation in reinforcement learning. The goal of
a bandit problem is to determine the optimal balance between the gain of new information (exploration)
and immediate reward maximization (exploitation). Dynamic bandit problems are especially challenging
because they involve changing environments. Combined with game theory, where one analyze the
behavior of agents in multi-agent settings, bandit problems serves as a framework for benchmarking the
applicability of learning algorithms in various situations.
In this thesis, we investigate a novel approach to the multi-armed bandit problem, the Kalman Bayesian
Learning Automaton, an algorithm which applies concepts from Kalman filtering, a powerful technique
for probabilistic reasoning over time. To determine the effectiveness of such an approach we have conducted
an empirical study of the Kalman Bayesian Learning Automaton in multi-armed dynamic bandit
problems and selected games from game theory. Specifically, we evaluate the performance of the Kalman
Bayesian Learning Automaton in randomly changing environments, switching environments, the Goore
game, the Prisoners Dilemma and zero-sum games. The scalability and robustness of the algorithm are
also examined.
Indeed, we reveal that the strength of the Kalman Bayesian Learning Automatons lies in its excellent
tracking abilities, and are among the top performers in all experiments. Unfortunately, it is dependent on
tuning of parameters. We believe further work on the approach could solve the parameter problem, but
even with the need to tune parameters we consider the Kalman Bayesian Learning Automaton a strong
solution to dynamic multi-armed bandit problems and definitely has the potential to be applied in various
applications and multi-agent settings.
Beskrivelse
Masteroppgave i informasjons- og kommunikasjonsteknologi 2010 – Universitetet i Agder, Grimstad