A policy iteration algorithm for zero-sum stochastic games with mean payoff

Jean Cochet-Terrasson; Stéphane Gaubert

doi:10.1016/j.crma.2006.07.011

Game Theory/Complex Analysis

A policy iteration algorithm for zero-sum stochastic games with mean payoff
[Algorithme d'itération sur les politiques pour les jeux stochastiques à somme nulle avec gain moyen]

Présenté par : Pierre-Louis Lions

Jean Cochet-Terrasson ¹ ; Stéphane Gaubert ²

¹ CGA, 14, rue Saint Dominique, 75007 Paris, France
² INRIA, domaine de Voluceau, B.P. 105, 78153 Le Chesnay cedex, France

Comptes Rendus. Mathématique, Volume 343 (2006) no. 5, pp. 377-382.

Abstract (VO)
Résumé

We give a policy iteration algorithm to solve zero-sum stochastic games with finite state and action spaces and perfect information, when the value is defined in terms of the mean payoff per turn. This algorithm does not require any irreducibility assumption on the Markov chains determined by the strategies of the players. It is based on a discrete nonlinear analogue of the notion of reduction of a super-harmonic function.

Nous donnons un algorithme d'itération sur les politiques pour résoudre les jeux stochastiques à somme nulle, avec espaces d'état et d'action finis, en information parfaite, lorsque la valeur du jeu est définie en termes de gain moyen par tour. Cet algorithme ne demande pas que les chaînes de Markov déterminées par les stratégies des deux joueurs soient irréductibles. Il repose sur un analogue discret non-linéaire de la notion de réduite d'une fonction surharmonique.

Reçu le : 2006-07-03
Accepté le : 2006-07-12
Publié le : 2006-08-10

DOI : 10.1016/j.crma.2006.07.011

Affiliations des auteurs :

Jean Cochet-Terrasson ¹ ; Stéphane Gaubert ²

¹ CGA, 14, rue Saint Dominique, 75007 Paris, France
² INRIA, domaine de Voluceau, B.P. 105, 78153 Le Chesnay cedex, France

@article{CRMATH_2006__343_5_377_0,
     author = {Jean Cochet-Terrasson and St\'ephane Gaubert},
     title = {A policy iteration algorithm for zero-sum stochastic games with mean payoff},
     journal = {Comptes Rendus. Math\'ematique},
     pages = {377--382},
     publisher = {Elsevier},
     volume = {343},
     number = {5},
     year = {2006},
     doi = {10.1016/j.crma.2006.07.011},
     language = {en},
}

TY  - JOUR
AU  - Jean Cochet-Terrasson
AU  - Stéphane Gaubert
TI  - A policy iteration algorithm for zero-sum stochastic games with mean payoff
JO  - Comptes Rendus. Mathématique
PY  - 2006
SP  - 377
EP  - 382
VL  - 343
IS  - 5
PB  - Elsevier
DO  - 10.1016/j.crma.2006.07.011
LA  - en
ID  - CRMATH_2006__343_5_377_0
ER  -

%0 Journal Article
%A Jean Cochet-Terrasson
%A Stéphane Gaubert
%T A policy iteration algorithm for zero-sum stochastic games with mean payoff
%J Comptes Rendus. Mathématique
%D 2006
%P 377-382
%V 343
%N 5
%I Elsevier
%R 10.1016/j.crma.2006.07.011
%G en
%F CRMATH_2006__343_5_377_0

Jean Cochet-Terrasson; Stéphane Gaubert. A policy iteration algorithm for zero-sum stochastic games with mean payoff. Comptes Rendus. Mathématique, Volume 343 (2006) no. 5, pp. 377-382. doi : 10.1016/j.crma.2006.07.011. https://comptes-rendus.academie-sciences.fr/mathematique/articles/10.1016/j.crma.2006.07.011/

Bibliographie
Cité par

[1] M. Akian; S. Gaubert Spectral theorem for convex monotone homogeneous maps, and ergodic control, Nonlinear Anal., Volume 52 (2003) no. 2, pp. 637-679

[2] H. Bjorklund, S. Sandberg, S. Vorobyov, A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games, Technical Report 05, DIMACS, 2004

[3] J. Cochet-Terrasson, Algorithmes d'itération sur les politiques pour les applications monotones contractantes, Thèse, École des Mines de Paris, 2001

[4] J. Cochet-Terrasson; S. Gaubert; J. Gunawardena A constructive fixed point theorem for min–max functions, Dynamics and Stability of Systems, Volume 14 (1999) no. 4, pp. 407-433

[5] A. Condon The complexity of stochastic games, Inform. Comput., Volume 96 (1992) no. 2, pp. 203-224

[6] E.V. Denardo; B.L. Fox Multichain Markov renewal programs, SIAM J. Appl. Math., Volume 16 (1968), pp. 468-487

[7] S. Gaubert; J. Gunawardena The duality theorem for min–max functions, C. R. Acad. Sci. Paris, Ser. I, Volume 326 (1998) no. 1, pp. 43-48

[8] S. Gaubert; J. Gunawardena The Perron–Frobenius theorem for homogeneous, monotone functions, Trans. Amer. Math. Soc., Volume 356 (2004) no. 12, pp. 4931-4950

[9] A.J. Hoffman; R.M. Karp On nonterminating stochastic games, Management Sci., Volume 12 (1966) no. 5, pp. 359-370

[10] R. Howard Dynamic Programming and Markov Processes, Wiley, 1960

[11] M. Jurdziński, M. Paterson, U. Zwick, A deterministic subexponential algorithm for solving parity games, in: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA 2006), January 2006

[12] E. Kohlberg Invariant half-lines of nonexpansive piecewise-linear transformations, Math. Oper. Res., Volume 5 (1980) no. 3, pp. 366-372

[13] A.J. Lazarus; D.E. Loeb; J.G. Propp; W.R. Stromquist; D.H. Ullman Combinatorial games under auction play, Games Econom. Behav., Volume 27 (1999) no. 2, pp. 229-264

[14] Y. Peres; O. Schramm; S. Sheffield; D. Wilson Tug-of-war and the infinity Laplacian, 2006 (arXiv:) | arXiv

Cité par Sources :

Commentaires - Politique