
dc.contributor.author: Yazidi, Anis
dc.contributor.author: Silvestre, Daniel
dc.contributor.author: Oommen, John
dc.date.accessioned: 2022-03-22T12:46:12Z
dc.date.available: 2022-03-22T12:46:12Z
dc.date.created: 2021-08-09T13:45:16Z
dc.date.issued: 2021
dc.identifier.citation: Yazidi, A., Silvestre, D. & Oommen, J. (2021). Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers. IEEE Transactions on Neural Networks and Learning Systems.
dc.identifier.issn: 2162-237X
dc.identifier.uri: https://hdl.handle.net/11250/2986834
dc.description.abstract: Learning automata (LA) with artificially absorbing barriers opened a completely new horizon of research in the 1980s (Oommen, 1986). These new machines yielded properties that were previously unknown. More recently, absorbing barriers have been introduced in continuous estimator algorithms so that the proofs could follow a martingale property, as opposed to monotonicity (Zhang et al., 2014; Zhang et al., 2015). However, applications of LA with artificial barriers are almost nonexistent. In that regard, this article is pioneering in that it provides effective and accurate solutions to an extremely complex application domain, namely that of solving two-person zero-sum stochastic games with incomplete information. LA have previously been used (Sastry et al., 1994) to design algorithms capable of converging to the game's Nash equilibrium under limited information. Those algorithms have focused on the case where the saddle point of the game exists in a pure strategy. However, the majority of the LA algorithms used for games are absorbing in the probability simplex space, and thus they converge to an exclusive choice of a single action. These LA are therefore unable to converge to other mixed Nash equilibria when the game possesses no saddle point for a pure strategy. The pioneering contribution of this article is that we propose an LA solution that is able to converge to an optimal mixed Nash equilibrium even though there may be no saddle point when a pure strategy is invoked. The scheme, being of the linear reward-inaction ($L_{R-I}$) paradigm, is in and of itself absorbing. However, by incorporating artificial barriers, we prevent it from being "stuck", i.e., absorbed, in pure strategies. Unlike the linear reward-ε-penalty ($L_{R-\epsilon P}$) scheme proposed by Lakshmivarahan and Narendra almost four decades ago, our new scheme achieves the same goal with much less parameter tuning and in a more elegant manner. This article includes the nontrivial proofs of the theoretical results characterizing our scheme and also contains experimental verification that confirms our theoretical findings.
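To illustrate the mechanism the abstract describes, the following is a minimal sketch (not the authors' exact algorithm) of a linear reward-inaction ($L_{R-I}$) automaton with an artificial barrier: the standard $L_{R-I}$ update drives the probability vector toward a corner of the simplex, and a cap `p_max < 1` (an illustrative parameter, as are `lam` and the redistribution rule) keeps it away from the absorbing pure strategies.

```python
import random

class BarrierLRI:
    """Sketch of an L_{R-I} learning automaton with an artificial barrier.

    The barrier p_max < 1 prevents any action probability from reaching 1,
    so the scheme is never absorbed in a pure strategy. The parameter
    values and the excess-redistribution rule are illustrative assumptions,
    not taken from the paper.
    """

    def __init__(self, n_actions, lam=0.05, p_max=0.95):
        self.p = [1.0 / n_actions] * n_actions  # start uniform on the simplex
        self.lam = lam        # learning rate
        self.p_max = p_max    # artificial barrier

    def choose(self):
        # Sample an action according to the current probability vector.
        r, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r <= acc:
                return i
        return len(self.p) - 1

    def update(self, action, rewarded):
        # L_{R-I}: update only on reward; on penalty do nothing (inaction).
        if not rewarded:
            return
        n = len(self.p)
        for j in range(n):
            if j == action:
                self.p[j] += self.lam * (1.0 - self.p[j])
            else:
                self.p[j] *= (1.0 - self.lam)
        # Artificial barrier: cap the rewarded action's probability and
        # spread the excess over the others, keeping the vector on the simplex.
        excess = self.p[action] - self.p_max
        if excess > 0:
            self.p[action] = self.p_max
            for j in range(n):
                if j != action:
                    self.p[j] += excess / (n - 1)
```

Even if one action is rewarded on every step, the capped update leaves positive probability on the other actions, which is what allows convergence to a mixed (rather than pure) strategy.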
dc.language.iso: eng
dc.publisher: IEEE
dc.title: Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers
dc.type: Journal article
dc.type: Peer reviewed
dc.description.version: acceptedVersion
dc.rights.holder: © 2021 IEEE
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550
dc.source.pagenumber: 12
dc.source.journal: IEEE Transactions on Neural Networks and Learning Systems
dc.identifier.doi: 10.1109/TNNLS.2021.3099095
dc.identifier.cristin: 1924767
dc.relation.project: Universitetet i Stavanger: CAIR
cristin.qualitycode: 2

