This paper deals with finite nonzero-sum Markov games under the discounted optimality criterion over an infinite horizon. The state process evolves according to a stochastic difference equation and depends on the players' actions as well as on a random disturbance whose distribution is unknown to the players. The players observe the actions, the states, and the values of the disturbance; they then use the empirical distribution of the disturbances to estimate the true distribution and make choices based on the available information. In this context, we propose a procedure for approximating Nash equilibria of the Markov game with the true distribution of the random disturbance; the procedure converges almost surely, possibly after passing to a subsequence.
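The estimation step described above can be illustrated with a minimal sketch. It is not the paper's procedure: the two-state dynamics, the function names `empirical_distribution` and `estimated_transition`, and the distribution `true_rho` are all hypothetical, chosen only to show how observed disturbances yield an empirical distribution and, through the difference equation, an estimated transition law.

```python
from collections import Counter
import random


def empirical_distribution(samples):
    """Return the relative frequency of each observed disturbance value."""
    counts = Counter(samples)
    n = len(samples)
    return {value: count / n for value, count in counts.items()}


def estimated_transition(x, a, b, dynamics, rho_hat, states):
    """Law of the next state when the disturbance follows rho_hat."""
    probs = {y: 0.0 for y in states}
    for xi, p in rho_hat.items():
        probs[dynamics(x, a, b, xi)] += p
    return probs


# Hypothetical finite model: two states, binary actions, disturbance in {0, 1}.
STATES = (0, 1)


def dynamics(x, a, b, xi):
    # Illustrative difference equation x_{t+1} = (x_t + a_t + b_t + xi_t) mod 2.
    return (x + a + b + xi) % 2


rng = random.Random(0)
# Assumed "true" distribution; the players never see it, it only drives the simulation.
true_rho = {0: 0.7, 1: 0.3}
samples = rng.choices(list(true_rho), weights=list(true_rho.values()), k=5000)

rho_hat = empirical_distribution(samples)
q_hat = estimated_transition(0, 1, 0, dynamics, rho_hat, STATES)
```

As more disturbances are observed, `rho_hat` approaches `true_rho`, so the estimated transition law `q_hat` approaches the true one. Equilibria computed for the estimated game are the kind of object whose convergence the abstract refers to.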
Bibliographical note
Funding Information: This work was supported by Consejo Nacional de Ciencia y Tecnología (CONACYT) under grant Ciencia Frontera 2019‐87787.
© 2022 Chinese Automatic Control Society and John Wiley & Sons Australia, Ltd.
- discounted criterion
- empirical estimation
- Markov games
- Nash equilibrium