Solving dynamic bandit problems and decentralized games using the kalman bayesian learning automaton

Berg, Stian

dc.contributor.author	Berg, Stian
dc.date.accessioned	2010-12-06T13:52:28Z
dc.date.available	2010-12-06T13:52:28Z
dc.date.issued	2010
dc.identifier.uri	http://hdl.handle.net/11250/137504
dc.description	Masteroppgave i informasjons- og kommunikasjonsteknologi 2010 – Universitetet i Agder, Grimstad	en_US
dc.description.abstract	Multi-armed bandit problems have been subject to a lot of research in computer science because it captures the fundamental dilemma of exploration versus exploitation in reinforcement learning. The goal of a bandit problem is to determine the optimal balance between the gain of new information (exploration) and immediate reward maximization (exploitation). Dynamic bandit problems are especially challenging because they involve changing environments. Combined with game theory, where one analyze the behavior of agents in multi-agent settings, bandit problems serves as a framework for benchmarking the applicability of learning algorithms in various situations. In this thesis, we investigate a novel approach to the multi-armed bandit problem, the Kalman Bayesian Learning Automaton, an algorithm which applies concepts from Kalman filtering, a powerful technique for probabilistic reasoning over time. To determine the effectiveness of such an approach we have conducted an empirical study of the Kalman Bayesian Learning Automaton in multi-armed dynamic bandit problems and selected games from game theory. Specifically, we evaluate the performance of the Kalman Bayesian Learning Automaton in randomly changing environments, switching environments, the Goore game, the Prisoners Dilemma and zero-sum games. The scalability and robustness of the algorithm are also examined. Indeed, we reveal that the strength of the Kalman Bayesian Learning Automatons lies in its excellent tracking abilities, and are among the top performers in all experiments. Unfortunately, it is dependent on tuning of parameters. We believe further work on the approach could solve the parameter problem, but even with the need to tune parameters we consider the Kalman Bayesian Learning Automaton a strong solution to dynamic multi-armed bandit problems and definitely has the potential to be applied in various applications and multi-agent settings.	en_US
dc.language.iso	eng	en_US
dc.publisher	University of Agder	en_US
dc.title	Solving dynamic bandit problems and decentralized games using the kalman bayesian learning automaton	en_US
dc.type	Master thesis	en_US
dc.source.pagenumber	129	en_US

Tilhørende fil(er)

Filnavn:: Berg.pdf
Størrelse:: 897.0Kb
Format:: PDF

Åpne

Denne innførselen finnes i følgende samling(er)

Master's theses in Information and Communication Technology [505]
MM500, IKT590, IKT591

Vis enkel innførsel