Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces

M. Teresa Robles-Alcaráz, Óscar Vega-Amaya, J. Adolfo Minjárez-Sosa*

*Corresponding author of this work

Research output: Contribution to journal › Article › peer review

3 Citations (Scopus)

Abstract

This paper provides finite-time performance bounds for the approximate policy iteration (API) algorithm in discounted Markov decision models. This is done for a class of approximation operators called averagers. An averager is a positive linear operator with norm one that satisfies an additional continuity property. The class of averagers includes several of the usual interpolation schemes, such as linear and multilinear approximations and kernel-based approximators, among others. The API algorithm is studied under two settings. In the first one, the transition probability is completely known and the performance bounds are given in terms of the approximation errors and the stopping error of the policy iteration algorithm. In the second setting, the system evolution is given by a difference equation in which the distribution of the random disturbance is unknown. Thus, in addition to the errors introduced by the API itself, the performance bounds in this case also depend on a statistical estimation error. The results are illustrated with numerical approximations for a class of inventory systems.
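As an informal illustration of the kind of scheme the abstract describes, the following is a minimal sketch of approximate policy iteration in which piecewise-linear interpolation on a grid plays the role of the averager, and the demand distribution is replaced by an empirical sample in the spirit of the estimation setting. The inventory model, its cost coefficients, and all numerical parameters are hypothetical placeholders and are not taken from the paper.

import numpy as np

# Minimal sketch of approximate policy iteration (API) with an "averager":
# here, piecewise-linear interpolation on a grid of inventory levels.
# All model data below (capacity, demand sample, cost coefficients) are
# hypothetical, for illustration only.

rng = np.random.default_rng(0)

GAMMA = 0.95                          # discount factor
GRID = np.linspace(0.0, 10.0, 41)     # interpolation nodes (stock level)
ACTIONS = np.linspace(0.0, 5.0, 11)   # admissible order quantities
DEMANDS = rng.uniform(0.0, 4.0, 100)  # empirical demand sample (law unknown)

def step_cost(x, a, d):
    # Holding + ordering + shortage costs for one transition (illustrative).
    y = min(x + a, GRID[-1])
    next_x = max(y - d, 0.0)
    cost = 0.5 * next_x + 1.0 * a + 4.0 * max(d - y, 0.0)
    return cost, next_x

def averager(values):
    # Linear interpolation is a positive linear operator with norm one,
    # i.e. an averager in the sense described in the abstract.
    return lambda x: np.interp(x, GRID, values)

def evaluate(policy, sweeps=40):
    # Approximate policy evaluation: successive approximations for the fixed
    # policy, projected back onto the grid by the averager after each sweep.
    v = np.zeros_like(GRID)
    for _ in range(sweeps):
        v_fun = averager(v)
        v = np.array([
            np.mean([c + GAMMA * v_fun(nx)
                     for c, nx in (step_cost(x, policy[i], d) for d in DEMANDS)])
            for i, x in enumerate(GRID)
        ])
    return v

def improve(v):
    # Greedy policy improvement against the interpolated value function.
    v_fun = averager(v)
    policy = np.empty_like(GRID)
    for i, x in enumerate(GRID):
        q = [np.mean([c + GAMMA * v_fun(nx)
                      for c, nx in (step_cost(x, a, d) for d in DEMANDS)])
             for a in ACTIONS]
        policy[i] = ACTIONS[int(np.argmin(q))]
    return policy

policy = np.zeros_like(GRID)          # initial policy: order nothing
for _ in range(5):                    # a few API iterations
    v = evaluate(policy)
    policy = improve(v)
print("approximate value at x = 0:", averager(v)(0.0))

The sketch only conveys the structure of the method (evaluate, project with the averager, improve); the paper's finite-time bounds and the precise treatment of the estimation error are given in the article itself.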

Original language: English
Pages (from-to): 79-95
Number of pages: 17
Journal: Risk and Decision Analysis
Volume: 6
Issue: 2
DOI
Status: Published - 2017

Bibliographical note

Publisher Copyright:
© 2017-IOS Press and the authors. All rights reserved.
