
dc.contributor.author: Brådland, Terje
dc.contributor.author: Norheim, Thomas
dc.date.accessioned: 2010-01-19T13:11:13Z
dc.date.available: 2009
dc.date.issued: 2009
dc.identifier.uri: http://hdl.handle.net/11250/137478
dc.description: Master's thesis in information and communication technology 2009 – University of Agder, Grimstad
dc.description.abstract: The two-armed bandit problem is a classical optimization problem where a player sequentially selects and pulls one of two arms attached to a gambling machine, and each arm pull results in either a reward or a penalty to the player. Each arm is associated with a certain reward probability which is unknown to the player, and the player must sequentially select and play an arm, receiving a reward or a penalty, in order to discover its true reward probability. The overall goal for the player is reward maximization, and the player needs to balance exploiting existing knowledge against obtaining new knowledge by trying different arms. In the long run it may be beneficial to risk short-term loss to gain greater certainty about the reward probability associated with each arm. The problem as described above is the simplest case of the more general k-armed bandit problem and is concerned with the exploration vs. exploitation dilemma. A wide range of schemes has been proposed to solve the problem, and in this thesis we focus our attention on a series of schemes collectively referred to as the Bayesian Learning Automaton family, originally introduced by O.-C. Granmo with the Bayesian Learning Automaton designed for Bernoulli distributed rewards. The Bayesian Learning Automata rely upon Bayesian statistics and the use of conjugate prior distributions, together with sets of updating rules for the hyperparameters associated with these conjugate prior distributions. These updating rules make it possible to apply Bayesian statistics in an automaton in a simple and highly computationally efficient manner. In this thesis we extend the Bayesian Learning Automaton family with three new members designed for Poisson and normally distributed rewards. We empirically evaluate both the Bayesian Learning Automaton family and competing schemes on the general k-armed bandit problem with Bernoulli, Poisson and normally distributed rewards. We also empirically evaluate these players in the Goore game, the iterated prisoner's dilemma and a two-player zero-sum game from game theory. Through extensive experiments we show that the Bayesian Learning Automaton family overall outperforms all other comparable learning schemes in the k-armed bandit problem, and the Bayesian Learning Automata are among the top performers in all the games they are introduced to in this thesis. Thus, we believe that the Bayesian Learning Automaton family is an important addition to the field of bandit playing algorithms and an important area for further research.
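
To make the mechanism described in the abstract concrete, below is a minimal sketch of a Bernoulli-reward bandit player that keeps a Beta conjugate prior per arm and updates the hyperparameters with simple counter increments, in the spirit of the updating rules the abstract describes. The class name BernoulliBLA, the uniform Beta(1, 1) prior, and the posterior-sampling arm selection are illustrative assumptions, not the thesis's exact scheme.

```python
# Sketch of a Bernoulli bandit player with Beta conjugate priors.
# Assumptions: Beta(1, 1) initial hyperparameters and posterior-
# sampling arm selection; the thesis's actual updating rules and
# selection policy may differ.
import random

class BernoulliBLA:
    def __init__(self, n_arms):
        # Beta(alpha, beta) hyperparameters per arm; Beta(1, 1) is
        # a uniform prior over each arm's unknown reward probability.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select_arm(self):
        # Draw one sample from each arm's posterior and play the arm
        # with the largest sample; the balance between exploration
        # and exploitation emerges from the posterior uncertainty.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugacy reduces the Bayesian posterior update to two
        # counter increments: reward is 1, penalty is 0.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

# Usage: a two-armed bandit with hidden reward probabilities.
if __name__ == "__main__":
    true_probs = [0.3, 0.7]
    player = BernoulliBLA(n_arms=2)
    total = 0
    for _ in range(1000):
        arm = player.select_arm()
        reward = 1 if random.random() < true_probs[arm] else 0
        player.update(arm, reward)
        total += reward
    print("total reward:", total)
```

The counter-increment update is what makes the approach computationally cheap: no numerical integration is needed, only two additions per play, which is the efficiency property the abstract highlights.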
dc.format.extent: 1219594 bytes
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: University of Agder
dc.title: Empirical Evaluation of the Bayesian Learning Automaton Family
dc.type: Master thesis
dc.source.pagenumber: 127, 16

