Efficient gaussian process based optimistic knapsack sampling with applications to stochastic resource allocation

2013

The stochastic non-linear fractional knapsack problem is a challeng-

ing optimization problem with numerous applications, including resource

allocation. The goal is to nd the most valuable mix of materials that

ts within a knapsack of xed capacity. When the value functions of the

involved materials are fully known and di erentiable, the most valuable

mixture can be found by direct application of Lagrange multipliers. How-

ever, in many real-world applications, such as web polling, information

about material value is uncertain, and in many cases missing altogether.

Surprisingly, without prior information about material value, the recently

proposed Learning Automata Knapsack Game (LAKG) and Hierarchy of

Twofold Resource Allocation Automata (H-TRAA) o ers arbitrarily ac-

curate convergence towards the optimal solution, simply by interacting

with the knapsack on-line. This paper introduces Gaussian Process based

Optimistic Knapsack Sampling (GPOKS) a novel model-based reinforce-

ment learning scheme for solving stochastic fractional knapsack problems,

founded on Gaussian Process (GP) enabled Optimistic Thompson Sam-

pling (OTS). Not only does this scheme converge signi cantly faster than

LAKG, GPOKS also incorporates GP based learning of the material val-

ues themselves, forming the basis for OTS supported balancing between

exploration and exploitation. Using resource allocation in web polling as a

proof-of-concept application, our empirical results show that GPOKS con-

sistently outperforms LAKG and H-TRAA, the current top-performers,

under a wide variety of parameter settings.

Masteroppgave i informasjons- og kommunikasjonsteknologi IKT590 2013 – Universitetet i Agder, Grimstad

Universitetet i Agder / University of Agder