TY - JOUR

T1 - Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces

AU - Robles-Alcaráz, M. Teresa

AU - Vega-Amaya, Óscar

AU - Minjárez-Sosa, J. Adolfo

N1 - Publisher Copyright:
© 2017-IOS Press and the authors. All rights reserved.

PY - 2017

Y1 - 2017

N2 - This paper provides finite-time performance bounds for the approximate policy iteration (API) algorithm in discounted Markov decision models. This is done for a class of approximation operators called averagers. An averager is a positive linear operator with norm equal to one that satisfies an additional continuity property. The class of averagers includes several of the usual interpolation schemes, such as linear and multilinear approximations and kernel-based approximators, among others. The API algorithm is studied under two settings. In the first, the transition probability is completely known and the performance bounds are given in terms of the approximation errors and the stopping error of the policy iteration algorithm. In the second setting, the system evolution is given by a difference equation in which the distribution of the random disturbance is unknown. Thus, in addition to the errors introduced by the application of the API algorithm, in this case the performance bounds also depend on a statistical estimation error. The results are illustrated with numerical approximations for a class of inventory systems.

AB - This paper provides finite-time performance bounds for the approximate policy iteration (API) algorithm in discounted Markov decision models. This is done for a class of approximation operators called averagers. An averager is a positive linear operator with norm equal to one that satisfies an additional continuity property. The class of averagers includes several of the usual interpolation schemes, such as linear and multilinear approximations and kernel-based approximators, among others. The API algorithm is studied under two settings. In the first, the transition probability is completely known and the performance bounds are given in terms of the approximation errors and the stopping error of the policy iteration algorithm. In the second setting, the system evolution is given by a difference equation in which the distribution of the random disturbance is unknown. Thus, in addition to the errors introduced by the application of the API algorithm, in this case the performance bounds also depend on a statistical estimation error. The results are illustrated with numerical approximations for a class of inventory systems.

KW - Markov decision processes

KW - approximate policy iteration

KW - density estimation

KW - discounted criterion

UR - http://www.scopus.com/inward/record.url?scp=85020194814&partnerID=8YFLogxK

U2 - 10.3233/RDA-160116

DO - 10.3233/RDA-160116

M3 - Article

SN - 1569-7371

VL - 6

SP - 79

EP - 95

JO - Risk and Decision Analysis

JF - Risk and Decision Analysis

IS - 2

ER -