The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality

Omslandseter, Rebekka Olsson; Jiao, Lei; Zhang, Xuan; Yazidi, Anis; Oommen, John

Omslandseter, Rebekka Olsson; Jiao, Lei; Zhang, Xuan; Yazidi, Anis; Oommen, John

Peer reviewed, Journal article

Accepted version

Åpne

Article.pdf (2.196Mb)

Permanent lenke

https://hdl.handle.net/11250/3053563

Utgivelsesdato

2022

Metadata

Vis full innførsel

Samlinger

Originalversjon

Omslandseter, R. O., Jiao, L., Zhang, X., Yazidi, A. & Oommen, J. (2022). The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality. IEEE Transactions on Neural Networks and Learning Systems, 1-15. https://doi.org/10.1109/TNNLS.2022.3226538

Sammendrag

Since the early 1960s, the paradigm of learning automata (LA) has experienced abundant interest. Arguably, it has also served as the foundation for the phenomenon and field of reinforcement learning (RL). Over the decades, new concepts and fundamental principles have been introduced to increase the LA’s speed and accuracy. These include using probability updating functions, discretizing the probability space, and using the “Pursuit” concept. Very recently, the concept of incorporating “structure” into the ordering of the LA’s actions has improved both the speed and accuracy of the corresponding hierarchical machines, when the number of actions is large. This has led to the ϵ -optimal hierarchical continuous pursuit LA (HCPA). This article pioneers the inclusion of all the above-mentioned phenomena into a new single LA, leading to the novel hierarchical discretized pursuit LA (HDPA). Indeed, although the previously proposed HCPA is powerful, its speed has an impediment when any action probability is close to unity, because the updates of the components of the probability vector are correspondingly smaller when any action probability becomes closer to unity. We propose here, the novel HDPA, where we infuse the phenomenon of discretization into the action probability vector’s updating functionality, and which is invoked recursively at every stage of the machine’s hierarchical structure. This discretized functionality does not possess the same impediment, because discretization prohibits it. We demonstrate the HDPA’s robustness and validity by formally proving the ϵ -optimality by utilizing the moderation property. We also invoke the submartingale characteristic at every level, to prove that the action probability of the optimal action converges to unity as time goes to infinity. Apart from the new machine being ϵ -optimal, the numerical results demonstrate that the number of iterations required for convergence is significantly reduce...

Beskrivelse

Author's accepted manuscript

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting /republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.