Optimization problem where the support of a random variable depends on the value of a decision variable


Suppose we have a dynamic program / Markov decision process whose objective contains a term like
$$v_t(s)=\max\limits_{u_1,u_2} \{r_t(s,u_1,u_2) + E[v_{t+1}(u_1+\min(u_2,\epsilon))]\},$$
where $r_t(s,u_1,u_2)$ is a reward function of the state $s$ and the decision variables $u_1, u_2$, and $\epsilon$ is a discrete random variable. Here the support/range of $\epsilon$ depends on the value of the decision variable $u_2$: for example, if $u_2 = 7$, then the support of $\epsilon$ is $\{0,1,2,\dots,7\}$.

How can we model/write this in a neat, non-problematic way? My main difficulty is formulating the problem when the support of the random variable depends on a decision variable like this. Thanks.
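For concreteness, here is a minimal computational sketch of one way to encode this: treat $\epsilon$ as having a conditional pmf $p(\cdot \mid u_2)$ whose support is $\{0,\dots,u_2\}$, and take the expectation under that conditional law. The uniform conditional pmf below is purely an illustrative assumption, as are the helper names (`eps_pmf`, `expected_value`, `bellman`); any conditional distribution with the stated support works the same way.

```python
def eps_pmf(u2):
    """Conditional pmf of eps given the decision u2.
    Illustrative assumption: uniform on {0, ..., u2}."""
    return {e: 1.0 / (u2 + 1) for e in range(u2 + 1)}

def expected_value(v_next, u1, u2):
    """E[ v_{t+1}(u1 + min(u2, eps)) ] under the conditional pmf p(. | u2)."""
    return sum(p * v_next(u1 + min(u2, e)) for e, p in eps_pmf(u2).items())

def bellman(r_t, v_next, s, feasible):
    """One Bellman step: maximize reward plus expected continuation
    over a finite feasible set of (u1, u2) pairs."""
    return max(r_t(s, u1, u2) + expected_value(v_next, u1, u2)
               for (u1, u2) in feasible)
```

For example, with the identity continuation value `v_next = lambda x: x`, `u1 = 0`, and `u2 = 7`, `expected_value` returns the mean of a uniform draw on $\{0,\dots,7\}$, namely $3.5$. The point is that the decision-dependence lives entirely inside the conditional pmf, so the Bellman operator itself stays standard.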