Abstract
The present paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when the approximation operators used satisfy the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of such operators; (iii) they have a certain continuity property. Such operators define transition probabilities on the state space of the controlled system. This has two important consequences: (a) the approximating function can be seen as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts enable us to give finite-time bounds for the performance of the AVI algorithm that depend on how accurately the operators approximate the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
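The sketch below illustrates the two consequences (a) and (b) on a toy problem. It assumes piecewise-linear interpolation on a grid as the approximation operator. This is a standard example of a positive linear operator that fixes constants and is continuous, but it is not necessarily the operator used in the paper. Its weights w_i(x) are nonnegative and sum to 1, so the approximation of v is v̂(x) = Σ_i w_i(x) v(z_i), an average of v with respect to a transition probability on the grid nodes z_i. The inventory model, with lost sales, a hypothetical demand distribution, and hypothetical costs, and all names (`hat_weights`, `NODES`, `DEMAND`, `avi`, and so on) are illustrative assumptions, not the paper's setup.

```python
import bisect

# Hypothetical inventory model (illustrative assumptions, not from the paper).
M = 10.0                                   # storage capacity
NODES = [0.0, 2.5, 5.0, 7.5, 10.0]         # interpolation grid z_i
DEMAND = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}  # demand probability mass function
ORDERS = range(0, 11)                      # admissible order quantities
ALPHA = 0.9                                # discount factor
C, H, P = 1.0, 0.5, 3.0                    # ordering, holding, shortage costs


def hat_weights(x, nodes):
    """Piecewise-linear interpolation weights w_i(x).

    They are nonnegative and sum to 1, i.e. a probability distribution
    over the grid nodes: the transition probability induced by the operator.
    """
    if x <= nodes[0]:
        return {0: 1.0}
    if x >= nodes[-1]:
        return {len(nodes) - 1: 1.0}
    j = bisect.bisect_right(nodes, x)      # nodes[j-1] <= x < nodes[j]
    lam = (x - nodes[j - 1]) / (nodes[j] - nodes[j - 1])
    return {j - 1: 1.0 - lam, j: lam}


def interpolate(values, nodes, x):
    """Approximation operator: average of the node values under w(x)."""
    return sum(w * values[i] for i, w in hat_weights(x, nodes).items())


def step_cost(x, a, d):
    """One-stage cost: ordering + holding + lost-sales penalty."""
    y = x + a
    return C * a + H * max(y - d, 0.0) + P * max(d - y, 0.0)


def bellman_at(x, values):
    """(T v̂)(x): minimum expected discounted cost, with v̂ the interpolant."""
    best = float("inf")
    for a in ORDERS:
        if x + a > M:                      # cannot exceed capacity
            break
        q = 0.0
        for d, p in DEMAND.items():
            nxt = max(x + a - d, 0.0)      # usually off-grid, so v̂ is needed
            q += p * (step_cost(x, a, d) + ALPHA * interpolate(values, NODES, nxt))
        best = min(best, q)
    return best


def avi(n_iter):
    """AVI: v_{n+1} = L(T v_n), with v stored by its values at the nodes."""
    values = [0.0] * len(NODES)
    for _ in range(n_iter):
        values = [bellman_at(x, values) for x in NODES]
    return values


if __name__ == "__main__":
    v = avi(200)
    for x, val in zip(NODES, v):
        print(f"x = {x:4.1f}  approx. optimal cost = {val:.3f}")
```

Because the weights form a probability distribution, the interpolation step is nonexpansive in the sup norm. The composed map L∘T is therefore still an α-contraction, and successive AVI iterates shrink by at least the factor `ALPHA`. The same weights, composed with the demand distribution, can be read as the perturbed transition law described in (b).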
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 261-278 |
Number of pages | 18 |
Journal | Journal of Dynamics and Games |
Volume | 3 |
Issue number | 3 |
DOIs | |
State | Published - 2016 |
Bibliographical note
Publisher Copyright: © 2016, American Institute of Mathematical Sciences.
Keywords
- Approximate value iteration algorithm
- Discounted criterion
- Markov decision processes
- Perturbed models