Abstract
This work deals with a class of discrete-time zero-sum Markov games whose state process $\{x_t\}$ evolves according to the equation $x_{t+1} = F(x_t, a_t, b_t, \epsilon_t)$, where $a_t$ and $b_t$ represent the actions of players 1 and 2, respectively, and $\{\epsilon_t\}$ is a sequence of independent and identically distributed random variables with unknown distribution $\theta$. Assuming a possibly unbounded payoff, and using the empirical distribution to estimate $\theta$, we introduce approximation schemes for the value of the game as well as for optimal strategies under both the discounted and the average criteria.
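The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's scheme: the dynamics `F`, the Gaussian noise, and the helper names below are hypothetical choices made only for illustration. The sketch shows how expectations with respect to the unknown distribution $\theta$ might be replaced by averages over the observed noise samples, which is what integrating against the empirical distribution amounts to.

```python
import random


def empirical_expectation(samples, g):
    """Average of g over the samples, i.e. the integral of g
    with respect to the empirical distribution of the samples."""
    return sum(g(s) for s in samples) / len(samples)


def F(x, a, b, eps):
    # Hypothetical dynamics (assumption, not from the paper):
    # a nonnegative state pushed up by player 1 and down by player 2.
    return max(x + a - b + eps, 0.0)


def estimated_next_state_mean(x, a, b, samples):
    """Plug-in estimate of E[F(x, a, b, eps)] using observed noise samples."""
    return empirical_expectation(samples, lambda eps: F(x, a, b, eps))


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for observed i.i.d. noise; the true distribution is
    # treated as unknown by the estimator.
    observed = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    print(estimated_next_state_mean(1.0, 0.5, 0.2, observed))
```

In an approximation scheme of the kind described, the same substitution would be applied inside the game's optimality equation, with the estimate refined as more noise samples are observed.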
Original language | English |
---|---|
Pages (from-to) | 694-716 |
Number of pages | 23 |
Journal | Kybernetika |
Volume | 53 |
Issue number | 4 |
State | Published - 2017 |
Bibliographical note
Funding Information: This work was partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT) under grant CB2015/254306.
Keywords
- Discounted and average criteria
- Empirical estimation
- Markov games