Efficient gaussian process based optimistic knapsack sampling with applications to stochastic resource allocation
Abstract
The stochastic non-linear fractional knapsack problem is a challeng-
ing optimization problem with numerous applications, including resource
allocation. The goal is to nd the most valuable mix of materials that
ts within a knapsack of xed capacity. When the value functions of the
involved materials are fully known and di erentiable, the most valuable
mixture can be found by direct application of Lagrange multipliers. How-
ever, in many real-world applications, such as web polling, information
about material value is uncertain, and in many cases missing altogether.
Surprisingly, without prior information about material value, the recently
proposed Learning Automata Knapsack Game (LAKG) and Hierarchy of
Twofold Resource Allocation Automata (H-TRAA) o ers arbitrarily ac-
curate convergence towards the optimal solution, simply by interacting
with the knapsack on-line. This paper introduces Gaussian Process based
Optimistic Knapsack Sampling (GPOKS) a novel model-based reinforce-
ment learning scheme for solving stochastic fractional knapsack problems,
founded on Gaussian Process (GP) enabled Optimistic Thompson Sam-
pling (OTS). Not only does this scheme converge signi cantly faster than
LAKG, GPOKS also incorporates GP based learning of the material val-
ues themselves, forming the basis for OTS supported balancing between
exploration and exploitation. Using resource allocation in web polling as a
proof-of-concept application, our empirical results show that GPOKS con-
sistently outperforms LAKG and H-TRAA, the current top-performers,
under a wide variety of parameter settings.
Description
Masteroppgave i informasjons- og kommunikasjonsteknologi IKT590 2013 – Universitetet i Agder, Grimstad