TY - JOUR
T1 - Markov control models with unknown random state–action-dependent discount factors
AU - Minjárez-Sosa, J. Adolfo
N1 - Publisher Copyright:
© 2015, Sociedad de Estadística e Investigación Operativa.
PY - 2015/10/1
Y1 - 2015/10/1
N2 - The paper deals with a class of discounted discrete-time Markov control models with non-constant discount factors of the form $$\tilde{\alpha}(x_{n},a_{n},\xi_{n+1})$$, where $$x_{n}$$, $$a_{n}$$, and $$\xi_{n+1}$$ are the state, the action, and a random disturbance at time $$n$$, respectively, taking values in Borel spaces. Assuming that the one-stage cost is possibly unbounded and that the distributions of $$\xi_{n}$$ are unknown, we study the corresponding optimal control problem under two settings. In the first, we assume that the random disturbance process $$\left\{ \xi_{n}\right\}$$ is formed by observable independent and identically distributed random variables, and we introduce an estimation and control procedure to construct strategies. In the second, $$\left\{ \xi_{n}\right\}$$ is assumed to be non-observable, with distributions that may change from stage to stage; in this case the problem is studied as a minimax control problem in which the controller has an opponent selecting the distribution of the corresponding random disturbance at each stage.
AB - The paper deals with a class of discounted discrete-time Markov control models with non-constant discount factors of the form $$\tilde{\alpha}(x_{n},a_{n},\xi_{n+1})$$, where $$x_{n}$$, $$a_{n}$$, and $$\xi_{n+1}$$ are the state, the action, and a random disturbance at time $$n$$, respectively, taking values in Borel spaces. Assuming that the one-stage cost is possibly unbounded and that the distributions of $$\xi_{n}$$ are unknown, we study the corresponding optimal control problem under two settings. In the first, we assume that the random disturbance process $$\left\{ \xi_{n}\right\}$$ is formed by observable independent and identically distributed random variables, and we introduce an estimation and control procedure to construct strategies. In the second, $$\left\{ \xi_{n}\right\}$$ is assumed to be non-observable, with distributions that may change from stage to stage; in this case the problem is studied as a minimax control problem in which the controller has an opponent selecting the distribution of the corresponding random disturbance at each stage.
KW - Discounted optimality
KW - Estimation and control procedures
KW - Minimax control systems
KW - Non-constant discount factors
UR - http://www.scopus.com/inward/record.url?scp=84944169998&partnerID=8YFLogxK
U2 - 10.1007/s11750-015-0360-5
DO - 10.1007/s11750-015-0360-5
M3 - Article
SN - 1134-5764
VL - 23
SP - 743
EP - 772
JO - TOP
JF - TOP
IS - 3
ER -