Consider the following stochastic dynamic program (SDP): $$ V_t(\textbf{s}_t)= \max_{a_t\in A_t(\textbf{s}_t)} \{(1-\lambda(a_t))V_{t+1}(\textbf{s}_t) + \lambda(a_t)(r_t(a_t)+V_{t+1}(\textbf{s}_t-\textbf{e}))\} $$ In words: with probability $\lambda(a_t)$ the system moves to the new state $\textbf{s}_{t+1}=\textbf{s}_t-\textbf{e}$ and the immediate reward $r_t(a_t)$ is collected, and with probability $1-\lambda(a_t)$ the state does not change. Both $\lambda$ and $r_t$ are functions of the action $a_t$ taken at time $t$. The context for this setup is customer arrivals and the revenue generated by a customer arriving in a period.
I can simplify the above SDP to the following form: $$ V_t(\textbf{s}_t)=V_{t+1}(\textbf{s}_t)+\max_{a_t\in A_t(\textbf{s}_t)} R_t(\textbf{s}_t, a_t) $$ where $$ R_t(\textbf{s}_t, a_t)=\lambda(a_t)(r_t(a_t)-\Delta V_{t+1}(\textbf{s}_t)) $$ with $\Delta V_{t+1}(\textbf{s}_t)=V_{t+1}(\textbf{s}_t)-V_{t+1}(\textbf{s}_t-\textbf{e})$.
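To make the recursion concrete, here is a toy backward-induction sketch for a small finite instance. Everything in it is made up purely for illustration: the horizon $T=3$, the arrival function $\lambda(a)=0.8/(1+a)$, the reward $r_t(a)=1+a$, the two-dimensional state (remaining capacity of two resources), and the assumption that $\textbf{e}$ is the all-ones vector. It is not my actual model, just the simplified recursion evaluated by brute force:

```python
import itertools

# Hypothetical data for illustration only
T = 3
ACTIONS = [0.0, 0.5, 1.0]   # a small discrete action set
CAPACITY = (2, 2)           # initial state s_0 (two resources)

def lam(a):
    # arrival probability as a function of the action (made up)
    return 0.8 / (1.0 + a)

def r(t, a):
    # immediate reward collected on an arrival (made up)
    return 1.0 + a

def backward_induction():
    # V[t] maps a state tuple s to V_t(s); terminal value V_T = 0
    states = list(itertools.product(range(CAPACITY[0] + 1),
                                    range(CAPACITY[1] + 1)))
    V = {T: {s: 0.0 for s in states}}
    for t in range(T - 1, -1, -1):
        V[t] = {}
        for s in states:
            if min(s) == 0:
                # some resource is exhausted: no transition possible
                V[t][s] = V[t + 1][s]
                continue
            s_minus_e = tuple(si - 1 for si in s)    # s - e, with e = (1, 1)
            dV = V[t + 1][s] - V[t + 1][s_minus_e]   # marginal value ΔV_{t+1}(s)
            # simplified recursion: V_t(s) = V_{t+1}(s) + max_a λ(a)(r_t(a) - ΔV)
            V[t][s] = V[t + 1][s] + max(lam(a) * (r(t, a) - dV) for a in ACTIONS)
    return V

V = backward_induction()
print(V[0][CAPACITY])
```

The point of the sketch is that the vector state only ever appears as a dictionary key (a tuple), so the recursion itself is unchanged; the difficulty I describe below is structural (how to analyze $\Delta V_{t+1}$ over a multidimensional state), not computational.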
I can prove that, for a given $\textbf{s}_t$, the function $R_t$ is concave in $a_t$. The challenge is that the state variable is a vector ($\textbf{s}_t$), not a scalar, and I don't know how to approach an MDP whose state is represented by a vector.
Any kind of guidance on the next step would be much appreciated.