tower property applied to expectations conditioned on an event; definition of the latter


Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, let $X$ be an integrable random variable on this space, and let $\mathcal{G} \subset \mathcal{F} \subset \mathcal{A}$ be sigma algebras.

Then by the tower property, we have that $\mathbb{E}[\mathbb{E}[X|\mathcal{F}]| \mathcal{G}] = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]| \mathcal{F}] = \mathbb{E}[X|\mathcal{G}]$.
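As a sanity check, the tower property can be verified numerically on a finite probability space, where a sigma algebra generated by a partition corresponds to taking probability-weighted averages over the partition blocks. The following is a minimal sketch with arbitrary (hypothetical) probabilities, values, and partitions:

```python
import numpy as np

rng = np.random.default_rng(0)

p = rng.random(8); p /= p.sum()         # probability of each of 8 sample points
X = rng.normal(size=8)                  # an integrable random variable

# F is generated by the partition {0,1},{2,3},{4,5},{6,7};
# G by the coarser partition {0,1,2,3},{4,5,6,7}, so G ⊂ F.
F_blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]
G_blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]

def cond_exp(Y, blocks):
    """E[Y | sigma(partition)]: on each block, the p-weighted average of Y."""
    out = np.empty_like(Y)
    for b in blocks:
        out[b] = np.dot(p[b], Y[b]) / p[b].sum()
    return out

lhs = cond_exp(cond_exp(X, F_blocks), G_blocks)   # E[E[X|F] | G]
rhs = cond_exp(X, G_blocks)                       # E[X|G]
print(np.allclose(lhs, rhs))                      # the tower property holds
```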

Now in some texts on Markov decision processes, with $s_{0}$ denoting the state of the system at time $0$ and $a_{0}$ the action taken at time $0$, people write things like $\sum_{a' \in \mathcal{A}} \mathbb{P}(a_{0} = a'|s_{0}=s')\mathbb{E}[X|s_{0}=s', a_{0}=a'] = \mathbb{E}[\mathbb{E}[X|s_{0}=s', a_{0}=a']|s_{0}=s'] = \mathbb{E}[X|s_{0}=s']$.

It seems the tower property is used here for the last equality. However, $(s_{0}=s', a_{0}=a')$ and $(s_{0}=s')$ are events rather than sigma algebras, so why can we apply the tower property? Recall that unlike $\mathbb{E}[X|\mathcal{G}]$, where $\mathcal{G}$ is a sigma algebra, $\mathbb{E}[X|A]$ for an event $A$ is defined in another way, namely as $\int X\,\mathbb{I}(A)\, d\mathbb{P}/\mathbb{P}(A)$.
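The two notions are linked: for an event $A$ with $0 < \mathbb{P}(A) < 1$, the number $\mathbb{E}[X|A]$ is exactly the value that the random variable $\mathbb{E}[X|\sigma(A)]$ takes on $A$, where $\sigma(A) = \{\emptyset, A, A^{c}, \Omega\}$. A minimal numerical sketch of this (hypothetical finite example):

```python
import numpy as np

rng = np.random.default_rng(1)

p = rng.random(6); p /= p.sum()      # probabilities of 6 sample points
X = rng.normal(size=6)
A = np.array([True, True, False, True, False, False])  # an event A

# E[X|A] = ∫ X 1_A dP / P(A): a single number
E_X_given_A = (X * p)[A].sum() / p[A].sum()

# E[X | sigma(A)]: a random variable, constant on each atom A, A^c
cond = np.where(A,
                (X * p)[A].sum() / p[A].sum(),
                (X * p)[~A].sum() / p[~A].sum())

# on A, the random variable E[X|sigma(A)] equals the number E[X|A]
print(np.isclose(cond[A], E_X_given_A).all())
```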

There is 1 answer below.


Suppose we are in a Markov decision process setting, where $S$ and $U$ are the spaces of states and actions/controls, respectively. Assume $U$ is discrete and the policy is stationary, i.e., $u_{t}$ depends on the value of $s_{t}$ but not on the time $t$ or on anything that happened before $t$. How should we justify the iterated expectation formula (with $s^{1}$ known) $\mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]|s_{t} = s^{1}] = \sum_{u \in U} \mathbf{P}(u_{t} = u|s_{t} = s^{1})\,\mathbf{E}[X|s_{t} = s^{1}, u_{t} = u]$ if we want to think of $s_{t} = s^{1}, u_{t} = u$ as an event?

Let $\mathcal{F}$ be the sigma algebra on the space of all possible trajectories of the MDP, i.e., the sigma algebra involved when we compute the expectation of any measurable function of the process. Let $\mathcal{A}, \mathcal{B} \subset \mathcal{F}$ be subcollections of events. Assume $\mathcal{B} = \{B_{1}, B_{2}, \dots\}$ is a discrete (countable) partition of $\Omega$. Think of $\mathcal{B}$ as asking "$u_{t} = ?$", with $B_{1}$ the event $\{u_{t} = u^{1}\} = \{\omega: u_{t}(\omega) = u^{1}\}$ for some $u^{1} \in U$, and $A_{i}$ the event $\{s_{t} = s^{i}\}$, etc. For each $A_{i} \in \mathcal{A}$ we have $\mathbf{P}[B_{j}|A_{i}]$ for every index $j$. Therefore, we can define a probability space $(\mathcal{B}, 2^{\mathcal{B}}, \mu_{A_{i}})$, where $2^{\mathcal{B}}$ denotes the power set of $\mathcal{B}$ and $\mu_{A_{i}}(\{B_{j}\}) := \mathbf{P}[B_{j}|A_{i}]$.

Consider $(*) := \mathbf{E}[\mathbf{E}[X|A_{1}, B] | A_{1}] = \mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]|s_{t} = s^{1}]$. Notice that the inner expectation is taken with respect to $\omega$ and is actually a function of $B$, which means the outer expectation is taken with respect to $B$. So $(*)$ can be viewed as $\mathbf{E}[f(B) | A_{1}]$. However, here we are unable to view $A_{1}$ as a sigma algebra. We cannot view it as an event either, because when $B \in \mathcal{B}$ is the variable, "events" are elements of the sigma algebra $2^{\mathcal{B}}$, and $A_{1} \in \mathcal{F}$ but $A_{1} \notin 2^{\mathcal{B}}$. Instead, we should read the notation $\mathbf{E}[f(B) | A_{1}]$ as $\mathbf{E}_{A_{1}}[f(B)] = \mathbf{E}_{A_{1}, M}[f(B)]$, where the subscript $M$, which is often omitted, refers to the randomness of the transitions of the Markov process, and the subscript $A_{1}$ indicates that the probability measure $\mu_{A_{1}}$ with respect to which we integrate $f(B)$ depends on $A_{1}$. This subscripting is reasonable because there are two types of randomness in the process generating $s_{t}, u_{t}$ as time $t$ goes on: $u_{t}$ is determined by our random policy based on $s_{t}$, and $s_{t+1}$ is determined by $(s_{t}, u_{t})$ together with the randomness of the transition from $(s_{t}, u_{t})$ imposed by the Markov process.

Therefore, using $\mathbf{P}(B_{i}|A_{1}) = \mathbf{P}(A_{1} \cap B_{i})/\mathbf{P}(A_{1})$ and, when combining the final sum, the fact that the $B_{i}$ partition $\Omega$ (so that $\sum_{i} 1_{A_{1}\cap B_{i}} = 1_{A_{1}}$),
$$\begin{aligned}
\mathbf{E}[f(B)|A_{1}] :&= \mathbf{E}_{A_{1}}[f(B)] = \int f(B)\, d\mu_{A_{1}}(B) = \sum_{B_{i} \in \mathcal{B}} \mu_{A_{1}}(\{B_{i}\})\, f(B_{i}) \\
&= \sum_{B_{i} \in \mathcal{B}} \mu_{A_{1}}(\{B_{i}\})\, \mathbf{E}[X|A_{1}, B_{i}] \\
&= \sum_{B_{i} \in \mathcal{B}} \mathbf{P}(B_{i}|A_{1}) \frac{1}{\mathbf{P}(A_{1} \cap B_{i})} \int 1_{A_{1}\cap B_{i}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) \\
&= \sum_{B_{i} \in \mathcal{B}} \frac{1}{\mathbf{P}(A_{1})} \int 1_{A_{1}\cap B_{i}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) \\
&= \frac{1}{\mathbf{P}(A_{1})} \int 1_{A_{1}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) = \mathbf{E}[X|A_{1}].
\end{aligned}$$
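This chain can be replayed numerically on a toy example (hypothetical sizes and probabilities), with an extra "noise" coordinate standing in for the transition randomness $M$: build $\mu_{A_{1}}$ from the conditional probabilities $\mathbf{P}(B_{i}|A_{1})$, integrate $f(B_{i}) = \mathbf{E}[X|A_{1}, B_{i}]$ against it, and compare with $\mathbf{E}[X|A_{1}]$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample points are triples (state, action, noise); the noise axis stands in
# for the Markov transition randomness M, so A_1 ∩ B_i is not a single point.
n_states, n_actions, n_noise = 3, 4, 5
P = rng.random((n_states, n_actions, n_noise))
P /= P.sum()                                   # joint law on the triples
X = rng.normal(size=(n_states, n_actions, n_noise))

s1 = 1                                         # A_1 = {s_t = s^1}
P_A1 = P[s1].sum()                             # P(A_1)

# mu_{A1}({B_i}) := P(B_i | A_1), with B_i = {u_t = u^i}
mu_A1 = P[s1].sum(axis=1) / P_A1

# f(B_i) = E[X | A_1 ∩ B_i] = ∫ 1_{A1∩Bi} X dP / P(A_1 ∩ B_i)
f = (X[s1] * P[s1]).sum(axis=1) / P[s1].sum(axis=1)

lhs = mu_A1 @ f                                # ∫ f dmu_{A1} = E_{A1}[f(B)]
rhs = (X[s1] * P[s1]).sum() / P_A1             # E[X | A_1]
print(np.isclose(lhs, rhs))
```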