Solving dynamic bandit problems and decentralized games using the kalman bayesian learning automaton

2010

Multi-armed bandit problems have been subject to a lot of research in computer science because it captures

the fundamental dilemma of exploration versus exploitation in reinforcement learning. The goal of

a bandit problem is to determine the optimal balance between the gain of new information (exploration)

and immediate reward maximization (exploitation). Dynamic bandit problems are especially challenging

because they involve changing environments. Combined with game theory, where one analyze the

behavior of agents in multi-agent settings, bandit problems serves as a framework for benchmarking the

applicability of learning algorithms in various situations.

In this thesis, we investigate a novel approach to the multi-armed bandit problem, the Kalman Bayesian

Learning Automaton, an algorithm which applies concepts from Kalman filtering, a powerful technique

for probabilistic reasoning over time. To determine the effectiveness of such an approach we have conducted

an empirical study of the Kalman Bayesian Learning Automaton in multi-armed dynamic bandit

problems and selected games from game theory. Specifically, we evaluate the performance of the Kalman

Bayesian Learning Automaton in randomly changing environments, switching environments, the Goore

game, the Prisoners Dilemma and zero-sum games. The scalability and robustness of the algorithm are

also examined.

Indeed, we reveal that the strength of the Kalman Bayesian Learning Automatons lies in its excellent

tracking abilities, and are among the top performers in all experiments. Unfortunately, it is dependent on

tuning of parameters. We believe further work on the approach could solve the parameter problem, but

even with the need to tune parameters we consider the Kalman Bayesian Learning Automaton a strong

solution to dynamic multi-armed bandit problems and definitely has the potential to be applied in various

applications and multi-agent settings.

Masteroppgave i informasjons- og kommunikasjonsteknologi 2010 – Universitetet i Agder, Grimstad

University of Agder