Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces

M. Teresa Robles-Alcaráz, Óscar Vega-Amaya, J. Adolfo Minjárez-Sosa*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper provides finite-time performance bounds for the approximate policy iteration (API) algorithm in discounted Markov decision models. This is done for a class of approximation operators called averagers. An averager is a positive linear operator with norm equal to one that satisfies an additional continuity property. The class of averagers includes several of the usual interpolation schemes, such as linear and multilinear interpolation and kernel-based approximators, among others. The API algorithm is studied under two settings. In the first, the transition probability is completely known, and the performance bounds are given in terms of the approximation errors and the stopping error of the policy iteration algorithm. In the second, the system evolution is governed by a difference equation in which the distribution of the random disturbance is unknown. Thus, in addition to the errors introduced by the API algorithm itself, the performance bounds in this case also depend on a statistical estimation error. The results are illustrated with numerical approximations for a class of inventory systems.
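The policy iteration scheme that the paper approximates can be sketched on a toy finite-state, finite-action discounted MDP. The sketch below uses exact policy evaluation (solving a linear system) and greedy cost-minimizing improvement; the transition matrices, costs, and discount factor are hypothetical numbers chosen purely for illustration, and the averager-based approximation step studied in the paper is omitted.

```python
import numpy as np

gamma = 0.9  # discount factor (hypothetical)
n_states = 2
actions = [0, 1]

# Hypothetical toy model: P[a][s, s'] is the transition probability under
# action a; c[a][s] is the one-stage cost of taking action a in state s.
P = {
    0: np.array([[0.8, 0.2], [0.3, 0.7]]),
    1: np.array([[0.5, 0.5], [0.9, 0.1]]),
}
c = {
    0: np.array([1.0, 2.0]),
    1: np.array([1.5, 0.5]),
}

def evaluate(policy):
    # Exact policy evaluation: solve (I - gamma * P_pi) v = c_pi.
    P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
    c_pi = np.array([c[policy[s]][s] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)

def improve(v):
    # Greedy improvement: pick the cost-minimizing action w.r.t. v.
    return [min(actions, key=lambda a: c[a][s] + gamma * P[a][s] @ v)
            for s in range(n_states)]

def policy_iteration():
    policy = [0] * n_states
    while True:
        v = evaluate(policy)
        new_policy = improve(v)
        if new_policy == policy:
            return policy, v  # fixed point: policy is optimal
        policy = new_policy
```

In the paper's setting the state space is Borel, so `evaluate` cannot be solved exactly; the averager replaces the value function with an approximation built from finitely many sample points, and the bounds quantify the resulting error.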

Original language: English
Pages (from-to): 79-95
Number of pages: 17
Journal: Risk and Decision Analysis
Volume: 6
Issue number: 2
DOIs
State: Published - 2017

Bibliographical note

Publisher Copyright:
© 2017-IOS Press and the authors. All rights reserved.

Keywords

  • Markov decision processes
  • approximate policy iteration
  • density estimation
  • discounted criterion
