A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces

Óscar Vega-Amaya*, Joaquín López-Borbón

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The present paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when the approximation operators used satisfy the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of such operators; (iii) they have a certain continuity property. Such operators define transition probabilities on the state space of the controlled system. This has two important consequences: (a) one can see the approximating function as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts enable us to give finite-time bounds for the AVI algorithm's performance that depend on the operators' accuracy in approximating the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
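The structure described in the abstract can be sketched numerically. The example below is not the authors' implementation: it assumes piecewise-linear interpolation on a grid as one operator with properties (i) and (ii), and a hypothetical toy inventory model in which the demand values, costs, discount factor `alpha`, and the function names `interpolation_matrix` and `avi` are all illustrative choices. Each row of the interpolation matrix is a probability distribution over grid nodes, so the approximating function is an average of the target function, and the approximate Bellman step remains an `alpha`-contraction in the sup norm.

```python
import numpy as np

def interpolation_matrix(grid, points):
    """Weights of piecewise-linear interpolation from `grid` to `points`.

    Every row is non-negative and sums to 1, so the operator is positive,
    linear, fixes constants, and acts like a transition probability.
    Points outside the grid are mapped to the nearest end node.
    """
    grid = np.asarray(grid, dtype=float)
    points = np.asarray(points, dtype=float)
    idx = np.clip(np.searchsorted(grid, points) - 1, 0, len(grid) - 2)
    left, right = grid[idx], grid[idx + 1]
    w = np.clip((points - left) / (right - left), 0.0, 1.0)
    rows = np.arange(len(points))
    weights = np.zeros((len(points), len(grid)))
    weights[rows, idx] = 1.0 - w
    weights[rows, idx + 1] += w
    return weights

def avi(grid, actions, demand, probs, alpha=0.9, n_iter=50,
        order_cost=1.0, holding=0.5, penalty=3.0):
    """Approximate value iteration for a toy inventory model (illustrative).

    State: stock level on `grid`; action: order quantity (must include 0);
    next stock: max(stock + order - demand, 0), capped by the largest node.
    Returns the list of successive value vectors on the grid.
    """
    grid = np.asarray(grid, dtype=float)
    demand = np.asarray(demand, dtype=float)
    probs = np.asarray(probs, dtype=float)
    capacity = grid[-1]
    v = np.zeros(len(grid))
    history = [v]
    for _ in range(n_iter):
        q = np.full((len(grid), len(actions)), np.inf)
        for j, a in enumerate(actions):
            level = grid + a                      # stock after ordering
            feasible = level <= capacity
            nxt = np.maximum(level[:, None] - demand[None, :], 0.0)
            shortage = np.maximum(demand[None, :] - level[:, None], 0.0)
            stage = order_cost * a + holding * (nxt @ probs) + penalty * (shortage @ probs)
            # Expected continuation value, with v replaced by its interpolant.
            cont = np.zeros(len(grid))
            for k, p in enumerate(probs):
                cont += p * (interpolation_matrix(grid, nxt[:, k]) @ v)
            q[:, j] = np.where(feasible, stage + alpha * cont, np.inf)
        v = q.min(axis=1)                         # minimize expected discounted cost
        history.append(v)
    return history

grid = np.linspace(0.0, 10.0, 11)
history = avi(grid, actions=[0.0, 2.0, 4.0],
              demand=[0.0, 1.5, 3.0], probs=[0.3, 0.4, 0.3], n_iter=30)
gaps = [np.max(np.abs(b - a)) for a, b in zip(history, history[1:])]
print("last sup-norm gap between iterates:", gaps[-1])
```

Because the interpolation weights form a stochastic matrix, the successive gaps shrink at least geometrically at rate `alpha`; this is only the standard contraction property, not the finite-time bounds derived in the paper.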

Original language: English
Pages (from-to): 261-278
Number of pages: 18
Journal: Journal of Dynamics and Games
Volume: 3
Issue number: 3
State: Published - 2016

Bibliographical note

Publisher Copyright:
© 2016, American Institute of Mathematical Sciences.

Keywords

  • Approximate value iteration algorithm
  • Discounted criterion
  • Markov decision processes
  • Perturbed models
