Empirical Evaluation of the Bayesian Learning Automaton Family
Master thesis
Permanent lenke
http://hdl.handle.net/11250/137478Utgivelsesdato
2009Metadata
Vis full innførselSamlinger
Sammendrag
The two-armed bandit problem is a classical optimization problem where a player sequentially
selects and pulls one of two arms attached to a gambling machine, and each arm pull results in
either a reward or penalty to the player. Each arm is associated with a certain reward probability
which is unknown to the player, and the player needs to sequentially select and play an arm
and receive a reward or a penalty in order to discover its true reward probability. The overall
goal for the player is reward maximization, and the player needs to balance between exploiting
existing knowledge or obtaining new knowledge by trying different arms. In the long run it
may be beneficial to risk short term loss to gain greater certainty about the reward probability
associated with each arm.
The problem as described above is the simplest case of the more general k-armed bandit problem
and is concerned with the exploration vs. exploitation dilemma. A wide range of schemes
has been proposed to solve the problem, and in this thesis we focus our attention on a series
of schemes collectively referred to as the Bayesian Learning Automaton family, originally introduced
by O-C. Granmo with the Bayesian Learning Automaton deigned for Bernoulli distributed
reward.
The Bayesian Learning Automata rely upon Bayesian statistics and the use of conjugate prior
distributions with sets of updating rules of hyperparameters associated with these conjugate
prior distributions. These updating rules make it possible to apply Bayesian statistics in an
automaton in a simple and highly computationally efficient manner.
In this thesis we extend the Bayesian Learning Automaton family with three new members
designed for Poisson and normally distributed reward.
We empirically evaluate both the Bayesian Learning Automaton family and competing schemes
in the general k-armed bandit problem with Bernoulli, Poisson and normally distributed reward.
We also empirically evaluate these players in the Goore game, iterative prisoners’ dilemma and
two-player zero-sum game from game theory.
Through extensive experiments we show that the Bayesian Learning Automaton family overall
outperforms all other comparable learning schemes in the k-armed bandit problem, and the
Bayesian Learning Automata are among the top performers in all the the games they are introduced
to in this thesis. Thus, we believe that the Bayesian Learning Automaton family is an
important addition to the field of bandit playing algorithms and is an important area for further
research.
Beskrivelse
Masteroppgave i informasjons- og kommunikasjonsteknologi 2009 – Universitetet i Agder, Grimstad