dc.contributor.author | Andersen, Per-Arne | |
dc.contributor.author | Goodwin, Morten | |
dc.contributor.author | Granmo, Ole-Christoffer | |
dc.date.accessioned | 2019-05-02T06:20:40Z | |
dc.date.available | 2019-05-02T06:20:40Z | |
dc.date.created | 2019-02-01T08:38:46Z | |
dc.date.issued | 2018 | |
dc.identifier.citation | Lecture Notes in Computer Science. 2018, LNCS (11311), 143-155. | nb_NO |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | http://hdl.handle.net/11250/2596208 | |
dc.description.abstract | Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. There are several challenges in the current state-of-the-art reinforcement learning algorithms that prevent them from converging towards the global optima. It is likely that the solution to these problems lies in short- and long-term planning, exploration, and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms as they provide a flexible, reproducible, and easy-to-control environment. Regardless, few games feature a state-space where results in exploration, memory, and planning are easily perceived. This paper presents The Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE in partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration. | nb_NO
dc.language.iso | eng | nb_NO |
dc.subject | Maskinlæring | nb_NO |
dc.subject | Machine learning | nb_NO |
dc.subject | Deep learning | nb_NO |
dc.title | The Dreaming Variational Autoencoder for Reinforcement Learning Environments | nb_NO |
dc.type | Journal article | nb_NO |
dc.type | Peer reviewed | nb_NO |
dc.description.version | acceptedVersion | nb_NO |
dc.subject.nsi | VDP::Datateknologi: 551 | nb_NO |
dc.subject.nsi | VDP::Computer technology: 551 | nb_NO |
dc.source.pagenumber | 143-155 | nb_NO |
dc.source.volume | LNCS | nb_NO |
dc.source.journal | Lecture Notes in Computer Science | nb_NO |
dc.source.issue | 11311 | nb_NO |
dc.identifier.doi | https://doi.org/10.1007/978-3-030-04191-5_11 | |
dc.identifier.cristin | 1671831 | |
dc.description.localcode | Nivå1 | nb_NO |
cristin.unitcode | 201,15,4,0 | |
cristin.unitname | Institutt for informasjons- og kommunikasjonsteknologi | |
cristin.ispublished | true | |
cristin.fulltext | postprint | |
cristin.qualitycode | 1 | |