Cost function with stochastic variable


I'm not as well versed as I would like to be, so I can't confidently evaluate the following cost function, and any confirmation would be appreciated. Given an initial state $x_0$,

$$J_\pi(x_0) = \lim_{N \to \infty} \underset{\substack{w_k \\ k=0,1,\ldots}}{E} \left\{\sum_{k=0}^{N-1}\alpha^k g(x_k,\mu_k(x_k),w_k)\right\}$$ which is subject to the discrete-time system constraint

$$x_{k+1}=f(x_k, u_k,w_k), \qquad k=0,1,...,$$

where $x_k$ is the state, $u_k$ is the control, and $w_k$ is the random disturbance. The objective is to find the best policy $\pi = \{\mu_{0},\mu_{1},\ldots\}$, where $\mu_k : S \to C$ and $u_k = \mu_k(x_k) \in C$ for all $x_k \in S$, so that the cost is minimized. I should also mention that $w_k$, like $x_k$ and $u_k$, has its own space $D$, a countable set. $w_k$ may depend on the current state $x_k$ and control $u_k$, but not on $k$, and it is not affected by previous disturbances. As this implies, $w_k$ is characterized by the probability distributions $P(w_k \mid x_k , u_k )$. Also, the scalar $\alpha$ is referred to as the discount factor and $g: S \times C \times D \to \mathbb{R}$ is the cost per stage.

My questions are:
1. What difference does it make to have the upper bound $N-1$ instead of $N$ when the limit is taken as $N$ approaches infinity? Is $N-1$ written only because the sum starts at $k = 0$, so that the last term really is the $N$th stage?
2. Is there any reason, other than $w_k$ appearing inside the summand, for the expected value to be placed outside the sum, or is that just to make it look tidy? Could it have been placed inside the sum, enclosing only the cost per stage $g$?

There are 1 best solutions below

On BEST ANSWER

I'm somewhat confused by the notation. You describe $w_k$ as an i.i.d. random variable conditional on the contemporaneous state and control variable, which seems odd, since it suggests that the choice of $u_k$ affects the cost function both through the state equation for $x_{k+1}$ and by changing the support (and hence the mean) of the shock $w_k$. But perhaps this is standard in your field?

With that caveat in mind:

  1. Whether the sum runs to $N$ or to $N-1$ is immaterial in the limit, since informally $\infty = \infty-1$. (This isn't really kosher, since $\infty$ is not a number.) For $N < \infty$ the indices matter if, for example, you are manipulating the finite sum to prove convergence (which is guaranteed here for $|\alpha| < 1$, provided the cost per stage $g$ is bounded).
  2. Formally, since $E[\bullet]$ is a linear operator, it can be written either inside or outside the sum. This is because the expectations are all taken at $k=0$ (clearer notation would be $E_0[\bullet]$). If expectations were repeatedly updated, e.g. $\sum_k E_k[\bullet]$, then past shocks $w_t$, $t<k$, would be part of the information set. This would be a very different problem from the one posed, in which all shocks are unknown. Hence, for clarity's sake, it is preferable to put the expectation operator outside the sum.
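
Both points can be sketched in a couple of lines. Assume (this is not stated in the question) that the cost per stage is bounded, $|g(x,u,w)| \le M$ for all arguments, and that $0 < \alpha < 1$. Write $g_k = g(x_k,\mu_k(x_k),w_k)$ as shorthand. For point 1, the sums to $N$ and to $N-1$ differ by a single term, which vanishes in the limit:

$$\left| E\Big\{\sum_{k=0}^{N}\alpha^k g_k\Big\} - E\Big\{\sum_{k=0}^{N-1}\alpha^k g_k\Big\} \right| = \alpha^N \left| E\{g_N\} \right| \le \alpha^N M \to 0 \quad \text{as } N \to \infty.$$

For point 2, linearity over a finite sum gives

$$E\Big\{\sum_{k=0}^{N-1}\alpha^k g_k\Big\} = \sum_{k=0}^{N-1}\alpha^k E\{g_k\},$$

where each $E\{g_k\}$ is still taken over the joint distribution of $w_0,\ldots,w_k$, because $x_k$ depends on $w_0,\ldots,w_{k-1}$ through the state equation. So moving the operator inside the sum is legitimate, but it does not reduce to an expectation over $w_k$ alone.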