
dc.contributor.author: Yazidi, Anis
dc.contributor.author: Silvestre, Daniel
dc.contributor.author: Oommen, John
dc.date.accessioned: 2022-03-22T12:46:12Z
dc.date.available: 2022-03-22T12:46:12Z
dc.date.created: 2021-08-09T13:45:16Z
dc.date.issued: 2021
dc.identifier.citation: Yazidi, A., Silvestre, D. & Oommen, J. (2021). Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers. IEEE Transactions on Neural Networks and Learning Systems.
dc.identifier.issn: 2162-237X
dc.identifier.uri: https://hdl.handle.net/11250/2986834
dc.description.abstract: Learning automata (LA) with artificially absorbing barriers opened a completely new horizon of research in the 1980s (Oommen, 1986). These new machines yielded properties that were previously unknown. More recently, absorbing barriers have been introduced in continuous estimator algorithms so that the proofs could follow a martingale property, as opposed to monotonicity (Zhang et al., 2014; Zhang et al., 2015). However, applications of LA with artificial barriers are almost nonexistent. In that regard, this article is pioneering in that it provides effective and accurate solutions to an extremely complex application domain, namely that of solving two-person zero-sum stochastic games with incomplete information. LA have previously been used (Sastry et al., 1994) to design algorithms capable of converging to the game's Nash equilibrium under limited information. Those algorithms have focused on the case where the saddle point of the game exists in a pure strategy. However, the majority of the LA algorithms used for games are absorbing in the probability simplex space, and thus they converge to an exclusive choice of a single action. These LA are therefore unable to converge to other mixed Nash equilibria when the game possesses no saddle point for a pure strategy. The pioneering contribution of this article is that we propose an LA solution that is able to converge to an optimal mixed Nash equilibrium even though there may be no saddle point when a pure strategy is invoked. The scheme, being of the linear reward-inaction ($L_{R-I}$) paradigm, is in and of itself absorbing. However, by incorporating artificial barriers, we prevent it from being "stuck", i.e., absorbed, in pure strategies. Unlike the linear reward-ε-penalty ($L_{R-\epsilon P}$) scheme proposed by Lakshmivarahan and Narendra almost four decades ago, our new scheme achieves the same goal with much less parameter tuning and in a more elegant manner. This article includes the nontrivial proofs of the theoretical results characterizing our scheme and also contains experimental verification that confirms our theoretical findings.
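To illustrate the mechanism the abstract describes, the following is a minimal sketch (not the authors' exact algorithm) of a linear reward-inaction ($L_{R-I}$) automaton with an artificial barrier: the standard $L_{R-I}$ update drives the probability vector toward a corner of the simplex, and a cap `p_max < 1` (an illustrative parameter, as are `lam` and the redistribution rule) keeps it away from the absorbing pure strategies.

```python
import random

class BarrierLRI:
    """Sketch of an L_{R-I} learning automaton with an artificial barrier.

    The barrier p_max < 1 prevents any action probability from reaching 1,
    so the scheme is never absorbed in a pure strategy. The parameter
    values and the excess-redistribution rule are illustrative assumptions,
    not taken from the paper.
    """

    def __init__(self, n_actions, lam=0.05, p_max=0.95):
        self.p = [1.0 / n_actions] * n_actions  # start uniform on the simplex
        self.lam = lam        # learning rate
        self.p_max = p_max    # artificial barrier

    def choose(self):
        # Sample an action according to the current probability vector.
        r, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r <= acc:
                return i
        return len(self.p) - 1

    def update(self, action, rewarded):
        # L_{R-I}: update only on reward; on penalty do nothing (inaction).
        if not rewarded:
            return
        n = len(self.p)
        for j in range(n):
            if j == action:
                self.p[j] += self.lam * (1.0 - self.p[j])
            else:
                self.p[j] *= (1.0 - self.lam)
        # Artificial barrier: cap the rewarded action's probability and
        # spread the excess over the others, keeping the vector on the simplex.
        excess = self.p[action] - self.p_max
        if excess > 0:
            self.p[action] = self.p_max
            for j in range(n):
                if j != action:
                    self.p[j] += excess / (n - 1)
```

Even if one action is rewarded on every step, the capped update leaves positive probability on the other actions, which is what allows convergence to a mixed (rather than pure) strategy.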
dc.language.iso: eng
dc.publisher: IEEE
dc.title: Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers
dc.type: Journal article
dc.type: Peer reviewed
dc.description.version: acceptedVersion
dc.rights.holder: © 2021 IEEE
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550
dc.source.pagenumber: 12
dc.source.journal: IEEE Transactions on Neural Networks and Learning Systems
dc.identifier.doi: 10.1109/TNNLS.2021.3099095
dc.identifier.cristin: 1924767
dc.relation.project: Universitetet i Stavanger: CAIR
cristin.qualitycode: 2

