Show simple item record

dc.contributor.author: Granmo, Ole-Christoffer
dc.contributor.author: Berg, Stian
dc.date.accessioned: 2011-02-25T12:03:29Z
dc.date.available: 2011-02-25T12:03:29Z
dc.date.issued: 2010
dc.identifier.citation: Granmo, O.-C., & Berg, S. (2010). Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters. In N. García-Pedrajas, F. Herrera, C. Fyfe, J. Benítez & M. Ali (Eds.), Trends in Applied Intelligent Systems (Vol. 6098, pp. 199-208): Springer Berlin / Heidelberg. [en_US]
dc.identifier.isbn: 978-3-642-13032-
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/11250/137863
dc.description: Published version of an article from Lecture Notes in Computer Science. Also available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-13033-5_21 [en_US]
dc.description.abstract: The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary normally distributed rewards. The scheme is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyper parameters of sibling Kalman Filters, and on random sampling from these posteriors. Furthermore, it is able to track the better actions, thus supporting non-stationary bandit problems. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms, not only in non-stationary environments, but in stationary environments also. Furthermore, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for obtaining improved novel solutions. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Springer [en_US]
dc.relation.ispartofseries: Lecture Notes in Computer Science ; 6098
dc.title: Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters [en_US]
dc.type: Chapter [en_US]
dc.type: Peer reviewed
dc.subject.nsi: VDP::Mathematics and natural science: 400::Information and communication science: 420::Knowledge based systems: 425 [en_US]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [en_US]
dc.source.pagenumber: 199-208 [en_US]
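
The abstract describes keeping one Kalman filter per arm, updating its hyperparameters, and choosing arms by random sampling from the resulting posteriors. The following is a minimal sketch of that general idea, not the authors' published algorithm. It assumes each arm's mean reward drifts as a Gaussian random walk, and the names `KalmanArm`, `play`, `obs_var`, and `transition_var` are hypothetical, introduced only for illustration.

```python
import math
import random


class KalmanArm:
    """Gaussian belief (mean, variance) about one arm's current mean reward."""

    def __init__(self, prior_mean=0.0, prior_var=1.0):
        self.mean = prior_mean
        self.var = prior_var

    def sample(self, rng):
        # Draw one plausible mean reward from the current posterior.
        return rng.gauss(self.mean, math.sqrt(self.var))

    def drift(self, transition_var):
        # Prediction step: the true mean may have moved, so uncertainty grows.
        self.var += transition_var

    def observe(self, reward, obs_var):
        # Correction step: scalar Kalman update with a noisy reward.
        gain = self.var / (self.var + obs_var)
        self.mean += gain * (reward - self.mean)
        self.var *= 1.0 - gain


def play(true_means, steps, obs_var=1.0, transition_var=0.01, seed=0):
    """Simulate a drifting Gaussian bandit (an assumed environment model)."""
    rng = random.Random(seed)
    arms = [KalmanArm(0.0, 100.0) for _ in true_means]  # vague priors
    means = list(true_means)
    total = 0.0
    for _ in range(steps):
        # Assumed environment: each arm's mean follows a random walk.
        means = [m + rng.gauss(0.0, math.sqrt(transition_var)) for m in means]
        for arm in arms:
            arm.drift(transition_var)
        # Sample once from every posterior and pull the arm with the largest draw.
        choice = max(range(len(arms)), key=lambda i: arms[i].sample(rng))
        reward = rng.gauss(means[choice], math.sqrt(obs_var))
        arms[choice].observe(reward, obs_var)
        total += reward
    return arms, total
```

In this sketch, arms that go unpulled only accumulate variance through `drift`, so they are eventually sampled again. This is one way to picture how such a scheme could keep tracking the better actions when rewards change.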

