dc.contributor.author: Zhang, Xuan
dc.contributor.author: Jiao, Lei
dc.contributor.author: Oommen, John
dc.contributor.author: Granmo, Ole-Christoffer
dc.date.accessioned: 2020-03-21T09:14:29Z
dc.date.available: 2020-03-21T09:14:29Z
dc.date.created: 2019-05-27T15:49:50Z
dc.date.issued: 2019
dc.identifier.citation: Zhang, X., Jiao, L., Oommen, J. & Granmo, O.-C. (2019). A Conclusive Analysis of the Finite-Time Behavior of the Discretized Pursuit Learning Automaton. IEEE Transactions on Neural Networks and Learning Systems, 31(1), 284-294. doi:10.1109/TNNLS.2019.2900639 [en_US]
dc.identifier.issn: 2162-2388
dc.identifier.uri: https://hdl.handle.net/11250/2647927
dc.description: Author's accepted version (post-print). [en_US]
dc.description: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.description: Available from 20/03/2021.
dc.description.abstract: This paper deals with the finite-time analysis (FTA) of learning automata (LA), a topic on which very little work has been reported in the literature. This is in contrast to the asymptotic steady-state analysis, for which there are, probably, scores of papers. As clarified later, the FTA of Markov chains, in general, and of LA, in particular, is far more complex than the asymptotic steady-state analysis. Such an FTA provides rigid bounds for the time required for the LA to attain a given convergence accuracy. We concentrate on the FTA of the Discretized Pursuit Automaton (DPA), which is probably one of the fastest and most accurate reported LA. Although such an analysis was carried out many years ago, we record that the previous work is flawed. More specifically, the flaw lies in the wrongly "derived" monotonic behavior of the LA after a certain number of iterations. Rather, we claim that the property that should be invoked is the submartingale property. This makes the proof much more involved and deep. In this paper, we rectify the flaw and reestablish the FTA based on this submartingale phenomenon. More importantly, from the derived analysis, we are able to discover and clarify, for the first time, the underlying dilemma between the DPA's exploitation and exploration properties. We also nontrivially confirm the existence of the optimal learning rate, which yields a better comprehension of the DPA itself. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.title: A Conclusive Analysis of the Finite-Time Behavior of the Discretized Pursuit Learning Automaton [en_US]
dc.type: Journal article [en_US]
dc.type: Peer reviewed [en_US]
dc.description.version: acceptedVersion [en_US]
dc.rights.holder: © 20XX IEEE [en_US]
dc.subject.nsi: VDP::Mathematics and natural science: 400::Information and communication science: 420 [en_US]
dc.source.pagenumber: 284-294 [en_US]
dc.source.volume: 31 [en_US]
dc.source.journal: IEEE Transactions on Neural Networks and Learning Systems [en_US]
dc.source.issue: 1 [en_US]
dc.identifier.doi: 10.1109/TNNLS.2019.2900639
dc.identifier.cristin: 1700624
dc.relation.project: Universitetet i Agder: CAIR [en_US]
cristin.qualitycode: 2

