I am trying to understand how conditional expectation works when it is taken over a sum of function values, as in the following definition of the gain function $g$ and its properties:
The gain function of policy $\pi$ is a mapping $g^{\pi}: \mathcal{S} \rightarrow \mathbb{R}$ defined as $$ g^{\pi}(s):=\lim _{N \rightarrow \infty} \frac{1}{N} \mathbb{E}^{\pi}\left[\sum_{t=1}^{N} r\left(s_{t}, a_{t}\right) \mid s_{1}=s\right] . $$ where $\mathbb{E}^{\pi}$ indicates expectation over trajectories generated by $\pi$.
- $g^{\pi}(s)$ measures the per-step reward obtained in a steady state under $\pi$ starting from $s$.
- The limit may not exist for all policies.
- For all $\pi$ and $s$ : $$ \left|g^{\pi}(s)\right| \leq R_{\max }, $$ where $R_{\max }$ is an upper bound on the rewards.
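To make the definition concrete for myself, here is a minimal sketch of what I think the finite-$N$ quantity inside the limit computes. Everything in it is an assumption for illustration: a hypothetical two-state chain induced by some fixed policy, made-up transition probabilities `P`, and rewards `r` that depend only on the state (the action is folded into the policy). It enumerates every trajectory that starts in $s_1 = s$, builds the distribution of the summed reward, and then applies $\sum_x x \cdot P(X = x \mid s_1 = s)$ before dividing by $N$.

```python
from itertools import product
from collections import defaultdict

# Hypothetical two-state chain induced by a fixed policy (made-up numbers).
P = [[0.9, 0.1],
     [0.5, 0.5]]   # P[i][j] = Pr(s_{t+1} = j | s_t = i)
r = [1.0, 0.0]     # reward in each state; the action is folded into the policy

def return_distribution(s, N):
    """Distribution of X = sum_{t=1}^N r(s_t), conditioned on s_1 = s."""
    dist = defaultdict(float)
    # Enumerate every possible continuation (s_2, ..., s_N) of the start state.
    for tail in product(range(len(r)), repeat=N - 1):
        traj = (s,) + tail
        prob = 1.0
        for a, b in zip(traj, traj[1:]):
            prob *= P[a][b]          # probability of this trajectory given s_1 = s
        dist[sum(r[x] for x in traj)] += prob
    return dist

def average_reward(s, N):
    """(1/N) * sum_x x * P(X = x | s_1 = s), for a finite horizon N."""
    dist = return_distribution(s, N)
    return sum(x * p for x, p in dist.items()) / N

# The number of trajectories grows as 2^(N-1), so N is kept small here.
for N in (1, 5, 10, 15):
    print(N, average_reward(0, N), average_reward(1, N))
```

In this sketch the conditioning on $s_1 = s$ only enters through the probabilities of the trajectories, not through the values being summed, and for each fixed $N$ the result is a plain number. The limit is only approximated by increasing $N$.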
In this case, can I then rewrite the expression so that it looks like this:
$g^{\pi}(s):=\lim _{N \rightarrow \infty} \frac{1}{N} \sum_{x}\left[ p\left(X = \left[\sum_{t=1}^{N} r\left(s_{t}, a_{t}\right) \mid s_{1}=s\right]\right) \cdot\left[\sum_{t=1}^{N} r\left(s_{t}, a_{t}\right) \mid s_{1}=s\right]\right]$
This is based on the definition of expectation, $E[X]=\sum_{x} x \cdot P(X=x)$, where in this case $x = \left[\sum_{t=1}^{N} r\left(s_{t}, a_{t}\right) \mid s_{1}=s\right]$.
And in that case, can I then rewrite the conditional expectation so that it represents a number? In neither case am I using $s_1$ when summing up.
Nor do I really know whether complications arise from the $N$ and the $\lim$ in the expression.