dc.contributor.author: Omslandseter, Rebekka Olsson
dc.contributor.author: Jiao, Lei
dc.contributor.author: Zhang, Xuan
dc.contributor.author: Yazidi, Anis
dc.contributor.author: Oommen, John
dc.date.accessioned: 2023-02-23T10:38:05Z
dc.date.available: 2023-02-23T10:38:05Z
dc.date.created: 2023-01-10T12:38:27Z
dc.date.issued: 2022
dc.identifier.citation: Omslandseter, R. O., Jiao, L., Zhang, X., Yazidi, A. & Oommen, J. (2022). The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality. IEEE Transactions on Neural Networks and Learning Systems, 1-15. [en_US]
dc.identifier.issn: 2162-2388
dc.identifier.uri: https://hdl.handle.net/11250/3053563
dc.description: Author's accepted manuscript [en_US]
dc.description: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.description.abstract: Since the early 1960s, the paradigm of learning automata (LA) has experienced abundant interest. Arguably, it has also served as the foundation for the phenomenon and field of reinforcement learning (RL). Over the decades, new concepts and fundamental principles have been introduced to increase the LA's speed and accuracy. These include using probability updating functions, discretizing the probability space, and using the "Pursuit" concept. Very recently, the concept of incorporating "structure" into the ordering of the LA's actions has improved both the speed and accuracy of the corresponding hierarchical machines when the number of actions is large. This has led to the ϵ-optimal hierarchical continuous pursuit LA (HCPA). This article pioneers the inclusion of all the above-mentioned phenomena into a single new LA, leading to the novel hierarchical discretized pursuit LA (HDPA). Indeed, although the previously proposed HCPA is powerful, its speed is impeded when any action probability is close to unity, because the updates of the components of the probability vector become correspondingly smaller as any action probability approaches unity. Here we propose the novel HDPA, in which we infuse the phenomenon of discretization into the action probability vector's updating functionality, which is invoked recursively at every stage of the machine's hierarchical structure. This discretized functionality does not suffer from the same impediment, because discretization prohibits it. We demonstrate the HDPA's robustness and validity by formally proving its ϵ-optimality using the moderation property. We also invoke the submartingale characteristic at every level to prove that the action probability of the optimal action converges to unity as time goes to infinity. Apart from the new machine being ϵ-optimal, the numerical results demonstrate that the number of iterations required for convergence is significantly reduce... [en_US]
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.title: The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: acceptedVersion [en_US]
dc.rights.holder: © 2022 IEEE. [en_US]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [en_US]
dc.source.pagenumber: 1-15 [en_US]
dc.source.journal: IEEE Transactions on Neural Networks and Learning Systems [en_US]
dc.identifier.doi: https://doi.org/10.1109/TNNLS.2022.3226538
dc.identifier.cristin: 2104039
dc.relation.project: European Economic Area (EEA) and Norway Grants 2014–2021: EEA-RO-NO-2018-04 [en_US]
cristin.qualitycode: 2
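The abstract's argument for discretization (continuous updates shrink as an action probability approaches unity, while discretized updates do not) can be illustrated with generic, flat pursuit update rules. This is a minimal sketch under stated assumptions, not the HDPA or HCPA as published: it omits the hierarchical tree and reward estimation, assumes the same action is estimated best and rewarded at every step, and uses hypothetical helper names and arbitrary parameters (lam = 0.01, r = 10 actions, resolution N = 100, step size delta = 1/(rN)). In the continuous rule the gain on the pursued action is lam * (1 - p_best), which vanishes as p_best approaches 1; in the discretized rule every other action loses a fixed delta, so the pursued action reaches unity in a finite number of steps.

```python
from typing import Callable, List, Optional

Vector = List[float]


def continuous_pursuit_step(p: Vector, best: int, lam: float) -> Vector:
    # Continuous pursuit: move p a fraction lam toward the unit vector e_best.
    # The gain on p[best] equals lam * (1 - p[best]), so it shrinks as p[best] -> 1.
    return [(1.0 - lam) * pj + (lam if j == best else 0.0) for j, pj in enumerate(p)]


def discretized_pursuit_step(p: Vector, best: int, delta: float) -> Vector:
    # Discretized pursuit: every non-best action loses a fixed step delta
    # (floored at 0), and the best action absorbs the remaining probability mass.
    q = [0.0 if j == best else max(pj - delta, 0.0) for j, pj in enumerate(p)]
    q[best] = 1.0 - sum(q)
    return q


def steps_to_converge(
    step: Callable[[Vector, int, float], Vector],
    p: Vector,
    best: int,
    param: float,
    threshold: float = 0.999,
    max_steps: int = 100_000,
) -> Optional[int]:
    # Hypothetical helper: count updates until p[best] reaches the threshold,
    # assuming the same action is estimated best and rewarded at every step.
    for k in range(1, max_steps + 1):
        p = step(p, best, param)
        if p[best] >= threshold:
            return k
    return None


if __name__ == "__main__":
    r, resolution, lam = 10, 100, 0.01  # arbitrary illustrative settings
    uniform = [1.0 / r] * r
    cont = steps_to_converge(continuous_pursuit_step, uniform, 0, lam)
    disc = steps_to_converge(discretized_pursuit_step, uniform, 0, 1.0 / (r * resolution))
    print(f"continuous: {cont} steps, discretized: {disc} steps")
```

With these particular settings the discretized rule crosses the 0.999 threshold after 100 steps, while the continuous rule needs several hundred. The numbers depend entirely on the chosen lam and N and are not results from the article.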

