Vis enkel innførsel

dc.contributor.authorGranmo, Ole-Christoffer
dc.contributor.authorGlimsdal, Sondre
dc.date.accessioned2011-11-15T13:10:45Z
dc.date.available2011-11-15T13:10:45Z
dc.date.issued2011
dc.identifier.citationGranmo, O.-C., & Glimsdal, S. (2011). A two-armed bandit based scheme for accelerated decentralized learning. In K. Mehrotra, C. Mohan, J. Oh, P. Varshney & M. Ali (Eds.), Modern Approaches in Applied Intelligence (Vol. 6704, pp. 532-541): Springer Berlin / Heidelberg.no_NO
dc.identifier.isbn978-3-642-21826-2
dc.identifier.urihttp://hdl.handle.net/11250/137867
dc.descriptionPublished version of a chapter from the book: Modern Approaches in Applied Intelligence. Also available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-21827-9_54no_NO
dc.description.abstractThe two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Bandit problems are particularly fascinating because a large class of real world problems, including routing, QoS control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyper parameters of sibling conjugate priors, and on random sampling from these posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these theoretical results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly more reliable feedback that is obtained as exploration gradually turns into exploitation in bandit problem based learning. Extensive experiments demonstrate that the accelerated learning allows us to combine the benefits of conservative learning, which is high accuracy, with the benefits of hurried learning, which is fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, where one has to trade off accuracy with speed. We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit based decentralized decision making.no_NO
dc.language.isoengno_NO
dc.publisherSpringer Berlin/Heidelbergno_NO
dc.relation.ispartofseriesLecture Notes in Computer Science;6704
dc.titleA two-armed bandit based scheme for accelerated decentralized learningno_NO
dc.typeChapterno_NO
dc.typePeer reviewedno_NO
dc.subject.nsiVDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413no_NO
dc.subject.nsiVDP::Technology: 500::Information and communication technology: 550::Computer technology: 551no_NO
dc.source.pagenumber532-541no_NO


Tilhørende fil(er)

Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel