Interconnected learning automata playing Iterated Prisoner's Dilemma
Abstract
There exist several examples of situations where interaction between entities is essential in order to reach a common goal, for instance negotiations. Problems arise when the parties of a case are in disagreement with each other and they all want the best for themselves. Our thesis focuses around one such case. Analyzing the results of its own actions, an entity can learn what kind of adversaries it faces. For instance they may be cooperative, hostile, ignorant or irrational. The situation becomes even more complex when the opposing parties change its behaviour while trying to reach a goal. In this case one has to detect and adapt the change of behaviour.
A “Learning Automaton” (LA) is a learning entity that learns the optimal action to use from its set of possible actions. It does this by performing actions toward an environment and analyzes the resulting response.
In our thesis the environment is the game of “Iterated Prisoner’s Dilemma” (IPD) which is a non-cooperative, non-zerosum game. This game represents a case where two prisoners are held in prison of a common theft. They are given the option to either defect and turn against the other prisoner by telling the truth or to cooperate with the other prisoner by holding the truth back. The game is played over several iterations where actions performed in the past may affect the present choice of actions.
There is no set optimal strategy in this game. The optimal action depends on the strategy of the opponent. If the opponent should switch strategy during play, we get a nonstationary environment. This makes the situation even more challenging, since the learner will have to adapt toward new strategies when the opponent switches. State-of-the art technologies do not handle such nonstationary environment well. However, nonstationary environments are in accordance with most real-life situations; environments are seldom static. The main aim of our thesis is to design systems of Learning Automata that handle IPD games where the opponent switches strategy during the game.
It is our hypothesis that “Interconnected Learning Automata” (ILA) will play better in such an environment than other existing automata. Interconnected learning automata are several learning entities connected in a network to achieve a common task. If the learner also remembers previous moves, we believe the learner will be able to adapt to rapid changes in the environment quickly.
To evaluate this we have developed three new players. One player remembers the moves performed in recent iterations in order analyze the opponent more effectively. This player is called SLASH. The second player is called the Hierarchical player where a learner has a selection of predefined strategies as its action set. The third player is called I-SLASH which consists of several SLASH players arranged in a hierarchy like the Hierarchical player.
A prototype was developed to evaluate the new players in a nonstationary environment. Several tests where defined to evaluate how well the new players perform compared to other well known players. There have also been tests evaluating how well the new players play comparing to the optimal play.
Tests show that the Hierarchical player does well, but will always be limited to its set of available strategies. The Hierarchical player also has a slow learning rate because every strategy will have to be tested in turn. SLASH shows a remarkable ability to learn to play close to optimal in most cases. In the tests performed SLASH does well overall, but is beaten by its subspecies I-SLASH. I-SLASH has a higher learning rate than SLASH and receives on average a better score in most of the cases we have tested.
Description
Masteroppgave i informasjons- og kommunikasjonsteknologi 2004 - Høgskolen i Agder, Grimstad
Publisher
Høgskolen i AgderAgder University College