Advancements in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications
Original version
Andersen, P.-A. (2022). Advancements in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications [Doctoral dissertation]. University of Agder.

Abstract
Deep reinforcement learning (RL) has attracted considerable attention from industry and academia alike, largely because of its success in solving intricate video games and industrial applications. Recent advancements in hardware and computing have exponentially increased the available computational power, facilitating the training of deep neural networks. These networks can learn the RL behavior policy from high-dimensional data and perform significantly better than exact tabular solutions, albeit at the cost of considerably more computing resources.
Games are among the most widely used applications for assessing the behavioral properties and planning efficiency of RL algorithms. They can provide the data structure and volume required to train deep learning models. Specially crafted games can express real-world industry applications, reducing setup costs while drastically increasing reproducibility. RL can improve efficiency in industrial applications where expert systems dominate the scene, reducing manual and potentially dangerous labor. The problem with applied industrial reinforcement learning is that traditional methods learn by trial and error. Because of this, RL agents risk encountering catastrophic events during learning, which can cause damage to humans or equipment. Therefore, using games to train and study safe RL agents is appealing.
Real-Time Strategy (RTS) games are especially captivating because of their high-dimensional state and action spaces. Furthermore, RTS games share many attributes with industrial and real-world applications, such as simultaneous actions, imperfect information, and system stochasticity. Recent advancements show that model-free RL algorithms can reach superhuman performance in games such as StarCraft II, again at the cost of substantial computational power. The downside is that these algorithms are expensive and hard to train, making it challenging to apply the same methods to industrial applications. There are also substantial state-space complexity gaps between open-source environments, which restricts algorithm evaluation to only a subset of the tasks required for operating sufficiently in industry applications.
Has parts
Paper I: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2017). FlashRL: A Reinforcement Learning Platform for Flash Games. Proceedings of the Norwegian ICT Conference for Research and Education. https://www.ntnu.no/ojs/index.php/nikt/article/view/5341. Published version. Full-text is available in AURA as a separate file: http://hdl.handle.net/11250/2490572.
Paper II: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2017). Towards a Deep Reinforcement Learning Approach for Tower Line Wars. In M. Bramer & M. Petridis (Eds.), Artificial Intelligence XXXIV. SGAI 2017. Lecture Notes in Computer Science, 10630, 101–114. Springer. https://doi.org/10.1007/978-3-319-71078-5_8. Accepted version. Full-text is available in AURA as a separate file: https://hdl.handle.net/11250/3152620.
Paper III: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2018). Deep RTS: A Game Environment for Deep Reinforcement Learning in Real-Time Strategy Games. In Proceedings of the 2018 IEEE Conference on Computational Intelligence and Games (pp. 1-8). IEEE. https://doi.org/10.1109/CIG.2018.8490409. Accepted version. Full-text is available in AURA as a separate file: https://hdl.handle.net/11250/3152033.
Paper IV: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2018). The Dreaming Variational Autoencoder for Reinforcement Learning Environments. In M. Bramer & M. Petridis (Eds.), Artificial Intelligence XXXV. SGAI 2018. Lecture Notes in Computer Science, 11311, 143-155. Springer. https://doi.org/10.1007/978-3-030-04191-5_11. Accepted version. Full-text is available in AURA as a separate file: http://hdl.handle.net/11250/2596208.
Paper V: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2019). Towards Model-Based Reinforcement Learning for Industry-Near Environments. In M. Bramer & M. Petridis (Eds.), Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, 11927, 36-49. Springer. https://doi.org/10.1007/978-3-030-34885-4_3. Accepted version. Full-text is available in AURA as a separate file: .
Paper VI: Andersen, P. -A., Goodwin, M. & Granmo, O.-C. (2020). Increasing sample efficiency in deep reinforcement learning using generative environment modelling. Expert Systems, 38(7): e12537. https://doi.org/10.1111/exsy.12537. Published version. Full-text is available in AURA as a separate file: https://hdl.handle.net/11250/2731779.
Paper VII: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2020). Towards safe reinforcement-learning in industrial grid-warehousing. Information Sciences, 537, 467-484. https://doi.org/10.1016/j.ins.2020.06.010. Published version. Full-text is available in AURA as a separate file: https://hdl.handle.net/11250/2711224.
Paper VIII: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2020). CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning. In M. Bramer & R. Ellis (Eds.), Artificial Intelligence XXXVII (pp. 94–107). Lecture Notes in Computer Science, 12498. Springer. https://doi.org/10.1007/978-3-030-63799-6_7. Accepted version. Full-text is available in AURA as a separate file: .
Paper IX: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2020). Safer Reinforcement Learning for Agents in Industrial Grid-Warehousing. In G. Nicosia, V. Ojha, E. La Malfa, G. Jansen, V. Sciacca, P. Pardalos, G. Giuffrida & R. Umeton (Eds.), Machine Learning, Optimization, and Data Science (pp. 169-180). Lecture Notes in Computer Science, 12566. Springer. https://doi.org/10.1007/978-3-030-64580-9_14. Accepted version. Full-text is available in AURA as a separate file: .
Paper X: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2020). Interpretable Option Discovery Using Deep Q-Learning and Variational Autoencoders. In S. Yildirim Yayilgan, I. S. Bajwa & F. Sanfilippo (Eds.), Intelligent Technologies and Applications (pp: 127–138). Communications in Computer and Information Science, 1382. Springer. https://doi.org/10.1007/978-3-030-71711-7_11. Accepted version. Full-text is available in AURA as a separate file: .
Paper XI: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2021). ORACLE: End-to-End Model Based Reinforcement Learning. In M. Bramer & R. Ellis (Eds.), Artificial Intelligence XXXVIII (pp. 1–14). Lecture Notes in Computer Science, 13101. Springer. https://doi.org/10.1007/978-3-030-63799-6_7. Accepted version. Full-text is available in AURA as a separate file: .
Paper XII: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2022). CaiRL: A High-Performance Reinforcement Learning Environment Toolkit. IEEE Conference on Computational Intelligence and Games, 361-368. https://doi.org/10.1109/CoG51982.2022.9893661. Submitted version. Full-text is available in AURA as a separate file: .
Paper XIII: Andersen, P.-A., Goodwin, M. & Granmo, O.-C. (2024). Towards safe and sustainable reinforcement learning for real-time strategy games. Information Sciences, 679: 120980. https://doi.org/10.1016/j.ins.2024.120980. Submitted version. Full-text is available in AURA as a separate file: .