Playing Axis & Allies Revised Using Learning Automata
Master thesis
Permanent link
http://hdl.handle.net/11250/137076
Publication date
2009
Abstract
The Artificial Intelligence (AI) of opponents in computer games in general, and
in strategy games in particular, has been plagued with performance problems
of many kinds since they first appeared. Not the least of these problems is the
fact that their designs often rely on predefined ways to play
the game, making these opponents predictable and dull to a seasoned player.
In this thesis, we propose using Learning Automata (LA) to create opponents
that are able to adapt to any game situation and find a good response, much in
the way a player would - by looking ahead in time to see what could happen in
the game beyond the immediate next move.
As a suitable environment for these LA, we have chosen the game Axis &
Allies Revised. A turn-based war game emulating the Second World War, it has
many layers of complexity for the LA to struggle with: multiple moves per turn,
random combat outcomes, and highly complex rules. To play this game well,
the artificial opponent would need not only to coordinate all its units into the best
combined move each turn, but also to avoid making moves in the present
that it would be punished for during the following turns.
To solve these problems, we propose a two-step solution: First, each unit
will be assigned its own independent LA. Second, for each possible action
that this unit can select in the immediately following turn, another independent LA
will be assigned. This process can then be repeated until a sufficient depth into
future moves has been achieved. Each tier of LA in this structure will receive
its feedback not from its immediate surroundings, but from the status of the
next LA down the tree.
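The tree structure described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class `LANode`, the helper names, the action labels, and in particular the definition of a child's "status" as its best estimated reward rate compared against a threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LANode:
    """One automaton in the tree (hypothetical structure, for illustration)."""
    actions: list
    children: dict = field(default_factory=dict)  # action -> LANode one tier down
    rewards: dict = field(default_factory=dict)   # action -> number of rewards
    trials: dict = field(default_factory=dict)    # action -> number of trials

    def estimate(self, action):
        # Estimated reward rate of an action; 0.5 is an assumed neutral
        # value for actions that have never been tried.
        n = self.trials.get(action, 0)
        return self.rewards.get(action, 0) / n if n else 0.5

    def status(self):
        # ASSUMPTION: "status" is taken to be the best estimated reward rate.
        return max(self.estimate(a) for a in self.actions)

    def record(self, action, rewarded):
        self.trials[action] = self.trials.get(action, 0) + 1
        self.rewards[action] = self.rewards.get(action, 0) + int(rewarded)


def build_tree(actions, depth):
    """Assign one LA per possible action, repeated down to the given depth."""
    node = LANode(actions=list(actions))
    if depth > 1:
        node.children = {a: build_tree(actions, depth - 1) for a in actions}
    return node


def count_nodes(node):
    """Total number of automata in the tree rooted at `node`."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


def feedback_for(node, action, env_rewarded, threshold=0.5):
    """Feedback for `action`: taken from the child LA if one exists,
    otherwise (at the deepest tier) from the environment."""
    child = node.children.get(action)
    if child is None:
        return env_rewarded
    return child.status() > threshold
```

With two actions and a depth of three, `build_tree` creates 1 + 2 + 4 = 7 automata for a single unit, which hints at how quickly the structure grows with the number of actions and the look-ahead depth.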
In this thesis we lay the foundation for such a solution by implementing
the method on a smaller scale, and by carefully testing its performance in a
controlled environment. We find which approaches give the best results, which
perform only under certain conditions, and which are suitable for expansion
to a larger scale.
The three types of LA chosen for our testing cover most schools of reinforcement
learning: the Tsetlin Automaton, with its simple, state-based structure;
the Linear Reward-Inaction Automaton, with its linear updating scheme; and
finally the Bayesian Learning Automaton, which shapes conjugate distributions in order
to determine the optimal action. Each has its own unique strengths and
weaknesses, which are recorded in this thesis.
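To make the three update schemes concrete, the following Python sketch shows each automaton in its common textbook form. It is not the code used in the thesis: the class names, the two-action restriction of the Tsetlin automaton, the learning rate parameter `rate`, and the uniform Beta(1, 1) priors are assumptions.

```python
import random


class TsetlinAutomaton:
    """Two-action Tsetlin automaton with n states per action.
    States 1..n select action 0, states n+1..2n select action 1."""

    def __init__(self, n):
        self.n = n
        self.state = n  # start at the boundary state of action 0

    def action(self):
        return 0 if self.state <= self.n else 1

    def update(self, rewarded):
        if self.action() == 0:
            # Reward moves deeper (towards 1), penalty moves towards action 1.
            self.state = max(self.state + (-1 if rewarded else 1), 1)
        else:
            # Reward moves deeper (towards 2n), penalty moves towards action 0.
            self.state = min(self.state + (1 if rewarded else -1), 2 * self.n)


class LinearRewardInaction:
    """L_RI scheme: probabilities change on reward, stay put on penalty."""

    def __init__(self, n_actions, rate):
        self.p = [1.0 / n_actions] * n_actions
        self.rate = rate  # assumed learning rate in (0, 1)

    def action(self, rng=random):
        return rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, chosen, rewarded):
        if not rewarded:
            return  # "inaction": penalties are ignored
        self.p = [(1 - self.rate) * q for q in self.p]
        self.p[chosen] += self.rate  # p_i <- p_i + rate * (1 - p_i)


class BayesianLearningAutomaton:
    """Keeps a Beta(alpha, beta) posterior per action, samples from each,
    and plays the action with the largest sample."""

    def __init__(self, n_actions):
        self.alpha = [1] * n_actions  # assumed uniform prior
        self.beta = [1] * n_actions

    def action(self, rng=random):
        samples = [rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, chosen, rewarded):
        if rewarded:
            self.alpha[chosen] += 1
        else:
            self.beta[chosen] += 1
```

The sketch also shows why the tuning effort differs between the schemes: the Tsetlin automaton needs a state count `n`, the Linear Reward-Inaction automaton needs a learning rate, and the Bayesian variant as written here has no tunable parameter beyond its prior.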
Through thorough testing and careful tuning of these automata, we conclude
that while LA may in fact have the potential to perform well in almost any type
of scenario, using them would still be impractical given the time they spend deciding
on a move. The decision-making speed of our LA varies, and so does their
performance, even in our small-scale testing.
Nevertheless, we believe that our results should give some insight into the
possibilities and benefits, both in performance and design simplicity, of using
LA as the decision maker for artificial players.
Description
Master's thesis in information and communication technology 2009 – University of Agder, Grimstad