I'm not well versed enough to evaluate the following cost function with confidence, so any confirmation would be appreciated. Given an initial state $x_0$,
$$J_\pi(x_0) = \lim_{N \to \infty} \underset{\substack{w_k \\ k=0,1,\ldots}}{E} \left\{ \sum_{k=0}^{N-1} \alpha^k g(x_k,\mu_k(x_k),w_k) \right\}$$ which is subject to the discrete-time system constraint
$$x_{k+1}=f(x_k, u_k,w_k), \qquad k=0,1,...,$$
where $x_k$ is the state, $u_k$ is the control, and $w_k$ is the random disturbance. The objective is to find the best policy $\pi = \{\mu_{0},\mu_{1},\ldots\}$, where $\mu_k : S \to C$ and $u_k = \mu_k(x_k) \in C$ for all $x_k \in S$, so that the cost is minimized. I should also mention that $w_k$, like $x_k$ and $u_k$, takes values in its own space $D$, which is a countable set. $w_k$ may depend on the current state $x_k$ and control $u_k$, but not on $k$, so previous disturbances do not affect it. Accordingly, $w_k$ is characterized by the probability distributions $P(w_k \mid x_k, u_k)$. The scalar $\alpha$ is the discount factor, and $g: S \times C \times D \to \mathbb{R}$ is the cost per stage.
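To make the definition concrete, here is a minimal Monte Carlo sketch of the truncated cost for a toy problem. Everything specific in it is an assumption made for illustration and is not part of the setup above: the state space $S = \{0,\ldots,4\}$, the disturbance space $D = \{0, 1\}$, the particular `f`, `g`, stationary policy `mu`, the distribution in `sample_w`, and the value $\alpha = 0.9$.

```python
import random

ALPHA = 0.9  # discount factor (assumed value, 0 < alpha < 1)

def f(x, u, w):
    """Hypothetical state equation x_{k+1} = f(x_k, u_k, w_k), clipped to S = {0,...,4}."""
    return max(0, min(4, x + u + w))

def g(x, u, w):
    """Hypothetical cost per stage; bounded by 17 on this toy problem."""
    return x * x + abs(u)

def mu(x):
    """Hypothetical stationary policy (mu_k = mu for all k): push the state toward 0."""
    return -1 if x > 0 else 0

def sample_w(x, u, rng):
    """Draw w_k from an assumed P(w | x_k, u_k) on D = {0, 1}; it depends on u but not on k."""
    p_one = 0.5 if u == 0 else 0.2
    return 1 if rng.random() < p_one else 0

def truncated_cost(x0, n_stages, rng):
    """One sample of sum_{k=0}^{N-1} alpha^k g(x_k, mu(x_k), w_k), with N = n_stages."""
    x, total = x0, 0.0
    for k in range(n_stages):  # k = 0, 1, ..., N-1, i.e. exactly N stages
        u = mu(x)
        w = sample_w(x, u, rng)
        total += ALPHA**k * g(x, u, w)
        x = f(x, u, w)
    return total

def estimate_J(x0, n_stages, n_runs=2000, seed=0):
    """Average many sampled trajectories to approximate the expectation for a fixed N."""
    rng = random.Random(seed)
    return sum(truncated_cost(x0, n_stages, rng) for _ in range(n_runs)) / n_runs

if __name__ == "__main__":
    print(estimate_J(2, n_stages=200))
```

On a single trajectory, going from `n_stages = N` to `n_stages = N + 1` only adds the term $\alpha^N g(x_N, \mu(x_N), w_N)$, which is at most $17\,\alpha^N$ for this toy cost and so becomes negligible as $N$ grows.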
My questions are:
1. What difference does it make to have the upper bound $N-1$ instead of $N$ when the limit $N \to \infty$ is taken anyway? Is $N-1$ written only because the index starts at $k = 0$, so that the sum covers exactly $N$ stages?
2. Is there any reason, other than $w_k$ appearing inside the summand, that the expected value is placed outside the sum, or is that only to make it look tidy? Could it have been placed inside the sum, enclosing only the cost per stage $g$?
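
For reference, the two notational points can be written out as follows. This is only a sketch under two assumptions that are not stated above: $0 < \alpha < 1$ and a bounded cost per stage, $|g| \le M$ for some constant $M$.

For the upper limit: the sum up to $N-1$ has exactly $N$ terms (stages $0$ through $N-1$), and the sum up to $N$ has one extra term, whose expectation is bounded by

$$\left| E\left\{ \alpha^{N} g(x_N,\mu_N(x_N),w_N) \right\} \right| \le \alpha^{N} M \to 0 \quad \text{as } N \to \infty .$$

For the position of the expectation: for each finite $N$, linearity of expectation gives

$$E\left\{ \sum_{k=0}^{N-1} \alpha^k g(x_k,\mu_k(x_k),w_k) \right\} = \sum_{k=0}^{N-1} \alpha^k \underset{w_0,\ldots,w_k}{E}\left\{ g(x_k,\mu_k(x_k),w_k) \right\},$$

where the expectation in the $k$-th term is over $w_0,\ldots,w_k$ and not over $w_k$ alone, because $x_k$ depends on the earlier disturbances through $x_{k+1}=f(x_k,u_k,w_k)$.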
I'm somewhat confused by the notation. You describe $w_k$ as an i.i.d. random variable conditional on the contemporaneous state and control variable, which seems odd: it suggests that the choice of $u_k$ influences the cost both through the state equation for $x_{k+1}$ and by affecting the support (and hence the mean) of the shock $w_k$. But perhaps this is standard in your field?
With that caveat in mind: