A formal proof of the ε-optimality of absorbing continuous pursuit algorithms using the theory of regular functions

Zhang, Xuan; Granmo, Ole-Christoffer; Oommen, B. John; Jiao, Lei

dc.contributor.author	Zhang, Xuan
dc.contributor.author	Granmo, Ole-Christoffer
dc.contributor.author	Oommen, B. John
dc.contributor.author	Jiao, Lei
dc.date.accessioned	2014-11-24T11:27:22Z
dc.date.available	2014-11-24T11:27:22Z
dc.date.issued	2014
dc.identifier.citation	Zhang, X., Granmo, O.-C., Oommen, B. J., & Jiao, L. (2014). A formal proof of the ε-optimality of absorbing continuous pursuit algorithms using the theory of regular functions. Applied Intelligence, 41(3), 974-985. doi: 10.1007/s10489-014-0541-1	nb_NO
dc.identifier.issn	0924-669X
dc.identifier.uri	http://hdl.handle.net/11250/226359
dc.description	Published version of an article from the journal: Applied Intelligence. Also available on Springerlink: http://dx.doi.org/10.1007/s10489-014-0541-1	nb_NO
dc.description.abstract	The most difficult part in the design and analysis of Learning Automata (LA) consists of the formal proofs of their convergence accuracies. The mathematical techniques used for the different families (Fixed Structure, Variable Structure, Discretized etc.) are quite distinct. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and within this family, the set of Pursuit algorithms have been considered to be the pioneering schemes. Informally, if the environment is stationary, their ε-optimality is defined as their ability to converge to the optimal action with an arbitrarily large probability, if the learning parameter is sufficiently small/large. The existing proofs of all the reported EAs follow the same fundamental principles, and to clarify this, in the interest of simplicity, we shall concentrate on the family of Pursuit algorithms. Recently, it has been reported Ryan and Omkar (J Appl Probab 49(3):795–805, 2012) that the previous proofs for ε-optimality of all the reported EAs have a common flaw. The flaw lies in the condition which apparently supports the so-called “monotonicity” property of the probability of selecting the optimal action, which states that after some time instant t 0, the reward probability estimates will be ordered correctly forever. The authors of the various proofs have rather offered a proof for the fact that the reward probability estimates are ordered correctly at a single point of time after t 0, which, in turn, does not guarantee the ordering forever, rendering the previous proofs incorrect. While in Ryan and Omkar (J Appl Probab 49(3):795–805, 2012), a rectified proof was presented to prove the ε-optimality of the Continuous Pursuit Algorithm (CPA), which was the pioneering EA, in this paper, a new proof is provided for the Absorbing CPA (ACPA), i.e., an algorithm which follows the CPA paradigm but which artificially has absorbing states whenever any action probability is arbitrarily close to unity. Unlike the previous flawed proofs, instead of examining the monotonicity property of the action probabilities, it rather examines their submartingale property, and then, unlike the traditional approach, invokes the theory of Regular functions to prove that the probability of converging to the optimal action can be made arbitrarily close to unity. We believe that the proof is both unique and pioneering, and adds insights into the convergence of different EAs. It can also form the basis for formally demonstrating the ε-optimality of other Estimator algorithms which are artificially rendered absorbing.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Springer	nb_NO
dc.subject	ε-optimality
dc.subject	Absorbing CPA
dc.subject	CPA
dc.subject	Pursuit algorithms
dc.title	A formal proof of the ε-optimality of absorbing continuous pursuit algorithms using the theory of regular functions	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	nb_NO
dc.source.pagenumber	974-985	nb_NO
dc.source.volume	41	nb_NO
dc.source.journal	Applied Intelligence	nb_NO
dc.source.issue	3	nb_NO
dc.identifier.doi	10.1007/s10489-014-0541-1

Tilhørende fil(er)

Filnavn:: Zhang_2014_Formal.pdf
Størrelse:: 632.5Kb
Format:: PDF

Åpne

Denne innførselen finnes i følgende samling(er)

Scientific Publications in Information and Communication Technology [710]

Vis enkel innførsel