Granmo, Ole-Christoffer; Berg, Stian (Lecture Notes in Computer Science ; 6098, Chapter; Peer reviewed, 2010)
The multi-armed bandit problem is a classical optimization problem in which an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions ...
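
To make the problem setting concrete, the following is a minimal sketch of the interaction loop described in the abstract: an agent pulls one arm per step and receives a random reward. It is an illustration only and does not reproduce the chapter's method. The names `GaussianBandit` and `play_uniformly` are hypothetical, the Gaussian reward model is an assumption (the abstract is truncated before it characterizes the reward distributions), and the agent here simply picks arms uniformly at random.

```python
import random


class GaussianBandit:
    """Hypothetical environment: each arm pays a Gaussian-distributed reward.

    The Gaussian reward model is an assumption made for this sketch.
    """

    def __init__(self, means, stddev=1.0, seed=None):
        self.means = list(means)      # one mean reward per arm (unknown to the agent)
        self.stddev = stddev          # shared reward noise level
        self._rng = random.Random(seed)

    @property
    def num_arms(self):
        return len(self.means)

    def pull(self, arm):
        """Pull one arm and return a single random reward."""
        return self._rng.gauss(self.means[arm], self.stddev)


def play_uniformly(bandit, num_pulls, seed=None):
    """Pull arms sequentially, choosing each arm uniformly at random.

    Returns per-arm pull counts and per-arm reward totals. A real bandit
    strategy would use these statistics to favor the better arms.
    """
    rng = random.Random(seed)
    pulls = [0] * bandit.num_arms
    totals = [0.0] * bandit.num_arms
    for _ in range(num_pulls):
        arm = rng.randrange(bandit.num_arms)
        reward = bandit.pull(arm)
        pulls[arm] += 1
        totals[arm] += reward
    return pulls, totals


if __name__ == "__main__":
    bandit = GaussianBandit(means=[0.2, 0.5, 0.8], seed=0)
    pulls, totals = play_uniformly(bandit, num_pulls=1000, seed=1)
    for arm, (n, total) in enumerate(zip(pulls, totals)):
        average = total / n if n else float("nan")
        print(f"arm {arm}: pulled {n} times, average reward {average:.3f}")
```

Uniform random selection serves only as a baseline here: it gathers information about every arm but never exploits what it has learned, which is the trade-off a bandit strategy has to manage.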