
dc.contributor.author: Zhang, Xuan
dc.contributor.author: Oommen, B. John
dc.contributor.author: Granmo, Ole-Christoffer
dc.date.accessioned: 2012-01-05T08:17:46Z
dc.date.available: 2012-01-05T08:17:46Z
dc.date.issued: 2011
dc.identifier.citation: Zhang, X., Oommen, B. J., & Granmo, O.-C. (2011). Generalized Bayesian pursuit: A novel scheme for multi-armed Bernoulli bandit problems. In L. Iliadis, I. Maglogiannis & H. Papadopoulos (Eds.), Artificial Intelligence Applications and Innovations (Vol. 364, pp. 122-131): Springer. [no_NO]
dc.identifier.isbn: 978-3-642-23959-5
dc.identifier.uri: http://hdl.handle.net/11250/137913
dc.description: Published version of a chapter in the book: IFIP Advances in Information and Communication Technology. Also available from the publisher at: http://dx.doi.org/10.1007/978-3-642-23960-1_16 [no_NO]
dc.description.abstract: Over the last decades, a myriad of approaches to the multi-armed bandit problem have appeared in several different fields. The current top-performing algorithms from the field of Learning Automata reside in the Pursuit family, while UCB-Tuned and the ε-greedy class of algorithms can be seen as state-of-the-art regret-minimizing algorithms. Recently, however, the Bayesian Learning Automaton (BLA) outperformed all of these, and other schemes, in a wide range of experiments. Although the two approaches are seemingly incompatible, in this paper we integrate the foundational learning principles motivating the design of the BLA with the principles of the so-called Generalized Pursuit algorithm (GPST), leading to the Generalized Bayesian Pursuit algorithm (GBPST). As in the BLA, the estimates are truly Bayesian in nature. However, instead of basing exploration upon direct sampling from the estimates, GBPST explores by means of the arm selection probability vector of GPST. Further, as in the GPST, in the interest of higher rates of learning, a set of arms that are currently perceived as being optimal is pursued to minimize the probability of pursuing a wrong arm. It turns out that GBPST is superior to GPST and that, when its learning speed is controlled, it even performs better than the BLA. We thus believe that GBPST constitutes a new avenue of research, in which the performance benefits of the GPST and the BLA are mutually augmented, opening up for improved performance in a number of applications that are currently being tested. [no_NO]
dc.language.iso: eng [no_NO]
dc.publisher: Springer [no_NO]
dc.relation.ispartofseries: IFIP Advances in Information and Communication Technology;364
dc.subject: Bandit problems, estimator algorithms, general Bayesian pursuit algorithm, Beta distribution, conjugate priors [no_NO]
dc.title: Generalized Bayesian pursuit: A novel scheme for multi-armed Bernoulli bandit problems [no_NO]
dc.type: Chapter [no_NO]
dc.type: Peer reviewed [no_NO]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [no_NO]
dc.subject.nsi: VDP::Mathematics and natural science: 400::Information and communication science: 420::Algorithms and computability theory: 422 [no_NO]
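
Illustrative sketch (not part of the repository record): the abstract describes combining Bayesian estimates, which the subject keywords associate with Beta distributions and conjugate priors, with exploration driven by an arm selection probability vector that pursues a set of currently favored arms. The Python below is a minimal sketch of that general idea under loudly stated assumptions. It is not the GBPST update rule from the chapter, whose details the record does not give. The function names, the use of the posterior mean as the estimate, the uniform Beta(1, 1) prior, and the specific probability update are all hypothetical choices made for illustration.

```python
import random


def update_beta(a, b, reward):
    """Conjugate update of a Beta(a, b) posterior after one Bernoulli outcome."""
    return (a + 1, b) if reward else (a, b + 1)


def pursuit_step(p, estimates, chosen, rate):
    """Hypothetical pursuit-style update of the arm selection vector.

    Assumption: every arm whose estimate is at least that of the chosen arm
    is "pursued". All probabilities shrink by (1 - rate), and the freed mass
    `rate` is shared equally among the pursued arms, so the vector still
    sums to 1.
    """
    pursued = {j for j, e in enumerate(estimates) if e >= estimates[chosen]}
    share = rate / len(pursued)  # the chosen arm is always pursued, so len >= 1
    return [(1 - rate) * pj + (share if j in pursued else 0.0)
            for j, pj in enumerate(p)]


def run(true_probs, steps=2000, rate=0.05, seed=0):
    """Simulate the sketch on a Bernoulli bandit with the given reward probabilities."""
    rng = random.Random(seed)
    n = len(true_probs)
    params = [(1, 1)] * n      # assumed uniform Beta(1, 1) prior per arm
    p = [1.0 / n] * n          # start from a uniform selection vector
    for _ in range(steps):
        arm = rng.choices(range(n), weights=p)[0]       # explore via the vector
        reward = rng.random() < true_probs[arm]         # Bernoulli feedback
        params[arm] = update_beta(*params[arm], reward)
        # Assumption: use the posterior mean a / (a + b) as the arm estimate.
        estimates = [a / (a + b) for a, b in params]
        p = pursuit_step(p, estimates, arm, rate)
    return p, params
```

In this sketch the `rate` parameter plays the role of a learning speed: larger values move probability mass toward the pursued arms faster, at the cost of less exploration.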

