Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game

Granmo, Ole-Christoffer; Glimsdal, Sondre

dc.contributor.author	Granmo, Ole-Christoffer
dc.contributor.author	Glimsdal, Sondre
dc.date.accessioned	2012-09-04T11:41:22Z
dc.date.available	2012-09-04T11:41:22Z
dc.date.issued	2012
dc.identifier.citation	Granmo, O.-C., & Glimsdal, S. (2012). Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game. Applied Intelligence, 1-10. doi: 10.1007/s10489-012-0346-z	no_NO
dc.identifier.issn	0924-669X
dc.identifier.uri	http://hdl.handle.net/11250/137969
dc.description	Published version of an article in the journal: Applied Intelligence. Also available from the publisher at: http://dx.doi.org/10.1007/s10489-012-0346-z	no_NO
dc.description.abstract	The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. Bandit problems are particularly fascinating because a large class of real-world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game, in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling conjugate priors and on random sampling from these posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these theoretical results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly reliable feedback obtained as exploration gradually turns into exploitation in bandit-problem-based learning. Extensive experiments, involving QoS control in simulated wireless sensor networks, demonstrate that the accelerated learning allows us to combine the benefit of conservative learning, namely high accuracy, with the benefit of hurried learning, namely fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, where one has to trade off accuracy against speed. As an additional benefit, performance also becomes more stable. 
We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit based decentralized decision making.	no_NO
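The abstract describes decision makers that keep conjugate priors over their two arms and act by sampling from the resulting posteriors. The following is a minimal sketch of that general idea, assuming Bernoulli rewards, Beta(1, 1) priors per arm, and plain Thompson sampling; it does NOT implement the paper's variance-based acceleration. In the Goore Game, each player votes "yes" or "no", and a referee rewards every player independently with a probability G(q) that depends only on the fraction q of "yes" votes. The referee function `goore_fitness`, its `target` and `width` parameters, and all class and function names below are hypothetical, chosen only for illustration.

```python
import math
import random


def goore_fitness(fraction_yes, target=0.3, width=0.1):
    """Hypothetical unimodal referee function G(q), peaking at `target`.

    Scaled to a maximum of 0.9 so that rewards remain stochastic.
    """
    return 0.9 * math.exp(-((fraction_yes - target) / width) ** 2)


class BetaBernoulliPlayer:
    """One Goore Game player with two arms: 0 = vote "no", 1 = vote "yes"."""

    def __init__(self, rng):
        self.rng = rng
        # Beta(alpha, beta) posterior over each arm's reward probability,
        # starting from the uniform prior Beta(1, 1) (an assumption).
        self.alpha = [1.0, 1.0]
        self.beta = [1.0, 1.0]
        self.last_arm = None

    def choose(self):
        # Thompson sampling: draw one sample per arm, play the larger one.
        samples = [self.rng.betavariate(self.alpha[a], self.beta[a]) for a in (0, 1)]
        self.last_arm = 1 if samples[1] > samples[0] else 0
        return self.last_arm

    def update(self, rewarded):
        # Conjugate update of the played arm only:
        # a reward increments alpha, a penalty increments beta.
        if rewarded:
            self.alpha[self.last_arm] += 1
        else:
            self.beta[self.last_arm] += 1


def play_goore_game(num_players=10, rounds=5000, target=0.3, seed=1):
    """Run the game and return the final fraction of "yes" votes."""
    rng = random.Random(seed)
    players = [BetaBernoulliPlayer(rng) for _ in range(num_players)]
    fraction_yes = 0.0
    for _ in range(rounds):
        votes = [p.choose() for p in players]
        fraction_yes = sum(votes) / num_players
        g = goore_fitness(fraction_yes, target)
        # The referee rewards each player independently with probability G(q).
        for p in players:
            p.update(rng.random() < g)
    return fraction_yes


if __name__ == "__main__":
    print(play_goore_game())
```

Because each player only ever sees its own binary reward, the scheme is fully decentralized. With N players the "yes" fraction can only take multiples of 1/N, so the best reachable outcome is the multiple closest to the peak of G. The accelerated scheme in the paper additionally exploits the shrinking reward variance as exploration turns into exploitation, which this sketch omits.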
dc.language.iso	eng	no_NO
dc.publisher	Springer	no_NO
dc.subject	bandit problems	no_NO
dc.subject	Goore Game	no_NO
dc.subject	Bayesian learning	no_NO
dc.subject	decentralized decision making	no_NO
dc.subject	quality of service control	no_NO
dc.subject	wireless sensor networks	no_NO
dc.title	Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game	no_NO
dc.type	Journal article	no_NO
dc.type	Peer reviewed	no_NO
dc.subject.nsi	VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413	no_NO
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551	no_NO
dc.source.pagenumber	1-10	no_NO
dc.source.journal	Applied Intelligence	no_NO

Associated file(s)

Filename: Granmo_2012_Accelerated.pdf
Size: 504.7 KB
Format: PDF

This item appears in the following collection(s)

Scientific Publications in Information and Communication Technology [710]