I recently used the following equality in a solution to some exercise, when I realized I don't know how to formally prove it:
Let $(\Omega, \mathcal{F}, P)$ be a probability triplet and $(A_i)_{i=1}^\infty$ a partition of the sample space, then: $$E[X]=\sum_iE[X|A_i]P(A_i)$$
Looking at the proof on Wikipedia, it seemed kind of odd and I couldn't really convince myself of any of the steps formally. After reading it, I understood that I don't know what the definition of $E[X|A]$ is (where $X$ is a RV and $A$ is some event), and I couldn't find a definition anywhere on the internet.
I tried to define $E[X|A]$ in order to prove the original equality. All of my definitions relied on the measure-theoretic definition of conditional expectation, each definition somewhat or completely failing:
- $E[X|A]=E[X|\sigma(I_A)]$ - in this definition we get $E[X|A]=E[X|A^c]$, so it doesn't work (not to mention that in the original equality we would get that one side is a RV while the other side is a real number).
- $E[X|A]=E[X|\sigma(I_A)](a), a\in A$ - since $E[X|\sigma(I_A)]$ is only defined a.s., this might not be well defined, and even if it works I couldn't manage to actually prove the original equality with it.
- $E[X|A]= E^{P(\cdot|A)}[X]$ - this is the definition the Wikipedia proof suggests, but as before, I can't see how to formally prove the original equality with this definition. Also, intuitively the measure-theoretic definition of conditional expectation should be enough to define every version of conditioning we could hope for, as it is the most general definition, so it feels somewhat weird to use something completely different to define a case which should be simpler.
I would very much appreciate it if someone could provide a definition for $E[X|A]$, an explanation for why it works (as opposed to my attempts at a definition), and a proof of the original equality using the definition provided.
Thank you very much for your time.
Definition 8.1 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra, and let $X \in \mathcal{L}^{1}(\Omega, \mathcal{F}, \mathbb{P})$. The conditional expectation of $X$ given $\mathcal{G}$ is denoted by $\mathbb{E}[X \mid \mathcal{G}]$, and it is the set of all random variables $Y \in \mathcal{L}^{1}(\Omega, \mathcal{G}, \mathbb{P})$ satisfying $$ \forall G \in \mathcal{G}: \int_{G} Y d \mathbb{P}=\int_{G} X d \mathbb{P} . $$ Note that a priori it is not clear whether or not $\mathbb{E}[X \mid \mathcal{G}]$ is the empty set. However, Theorem 8.4 below guarantees that $\mathbb{E}[X \mid \mathcal{G}]$ is non-empty. Its proof is rather long and technical.
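As a concrete sanity check, the defining property can be verified directly on a finite space, where a version of $\mathbb{E}[X \mid \mathcal{G}]$ is just the $\mathbb{P}$-weighted average of $X$ over each cell of the partition generating $\mathcal{G}$. A minimal sketch in Python (the fair-die space and the odd/even partition are my own choices, purely for illustration):

```python
from itertools import combinations

# A fair six-sided die: Omega = {1,...,6}, P uniform, X(w) = w.
omega = [1, 2, 3, 4, 5, 6]
P = {w: 1 / 6 for w in omega}
X = {w: float(w) for w in omega}

# G = sigma-algebra generated by the partition {odd, even}.
cells = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]

# On a finite space, a version Y of E[X|G] is constant on each cell
# of the generating partition: the P-weighted average of X there.
Y = {}
for cell in cells:
    p_cell = sum(P[w] for w in cell)
    avg = sum(X[w] * P[w] for w in cell) / p_cell
    for w in cell:
        Y[w] = avg

# Enumerate all of G (every union of cells, including the empty union)
# and check the defining property: int_G Y dP = int_G X dP.
def sigma_algebra(cells):
    members = []
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            members.append(frozenset().union(*combo))
    return members

for G in sigma_algebra(cells):
    lhs = sum(Y[w] * P[w] for w in G)
    rhs = sum(X[w] * P[w] for w in G)
    assert abs(lhs - rhs) < 1e-12
```

Here $Y$ equals $3$ on the odd outcomes and $4$ on the even ones, and the integral identity holds on all four members of $\mathcal{G}$.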
Theorem 8.4 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra and let $X \in \mathcal{L}^{1}(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathbb{E}[X \mid \mathcal{G}] \neq \emptyset$, i.e., there exists a $Y \in \mathcal{L}^{1}(\Omega, \mathcal{G}, \mathbb{P})$ such that $$ \forall G \in \mathcal{G}: \int_{G} Y d \mathbb{P}=\int_{G} X d \mathbb{P} . $$ holds.
I do not report the proof of this theorem as it is long, hard and not insightful for our purposes.
Theorem 8.7 Let $X \in \mathcal{L}^{1}(\Omega, \mathcal{F}, \mathbb{P})$, let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra. If $\mathcal{H}$ is a sub- $\sigma$-algebra of $\mathcal{G}$, then $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}]=\mathbb{E}[X \mid \mathcal{H}]$ (tower property).
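The tower property is also easy to check mechanically on a finite space: conditioning first on a finer $\sigma$-algebra $\mathcal{G}$ and then on a coarser $\mathcal{H} \subseteq \mathcal{G}$ gives the same result as conditioning on $\mathcal{H}$ directly. A small illustrative script (the die space and the nested partitions are my own choices, not from the text):

```python
# A fair die again: P uniform on {1,...,6}, X(w) = w.
omega = range(1, 7)
P = {w: 1 / 6 for w in omega}
X = {w: float(w) for w in omega}

def cond_exp(Z, cells):
    """A version of E[Z | sigma(cells)] on a finite space:
    the P-weighted average of Z over each cell."""
    Y = {}
    for cell in cells:
        p_cell = sum(P[w] for w in cell)
        avg = sum(Z[w] * P[w] for w in cell) / p_cell
        for w in cell:
            Y[w] = avg
    return Y

fine = [{1, 2}, {3, 4}, {5, 6}]      # generates G
coarse = [{1, 2, 3, 4}, {5, 6}]      # generates H, and H is a sub-sigma-algebra of G

lhs = cond_exp(cond_exp(X, fine), coarse)   # E[ E[X|G] | H ]
rhs = cond_exp(X, coarse)                   # E[X|H]
assert all(abs(lhs[w] - rhs[w]) < 1e-12 for w in omega)
```

Both sides equal $2.5$ on $\{1,2,3,4\}$ and $5.5$ on $\{5,6\}$, as the tower property predicts.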
Again, I will not report the proof of the tower property as you can find many proofs of it on the internet, but I will now use it to prove the special case you are interested in!
One special case states that if $\left\{A_{i}\right\}_{i}$ is a finite or countable partition of the sample space, then $$ \mathbb{E}(X)=\sum_{i} \mathbb{E}\left(X \mid A_{i}\right) \mathbb{P}\left(A_{i}\right) $$ (terms with $\mathbb{P}(A_i)=0$ are omitted from the sum, since $\mathbb{E}(X \mid A_i)$ is then undefined).
Proof: First note that for each $i$ the quantity $\mathbb{E}[X|A_i] = \frac{\mathbb{E}[X \mathbb{1}_{A_i}]}{\mathbb{P}(A_i)}$, defined whenever $\mathbb{P}(A_i) > 0$, is just a number, as we are conditioning on an event rather than on a $\sigma$-algebra generated by some other random variable. It is nevertheless a special case of the definition given above, through the $\sigma$-algebra $\sigma(A_i) = \{\emptyset, A_i, A_i^c, \Omega\}$, as the computation below makes precise. Now we have, using that $\left\{A_{i}\right\}_{i}$ is a finite or countable partition of the sample space:
$$ \mathbb{E}[X] = \mathbb{E}\Big[\sum_i X \mathbb{1}_{A_i}\Big] = \sum_i \mathbb{E}[X \mathbb{1}_{A_i}] \stackrel{\mathrm{Tower \: Property}}{=} \sum_i \mathbb{E}\big[\,\mathbb{E}[X \mathbb{1}_{A_i} \mid \sigma(A_i)]\,\big] \stackrel{\mathrm{measurab. \: of \: }\mathbb{1}_{A_i}}{=} \sum_i \mathbb{E}\big[\,\mathbb{1}_{A_i}\,\mathbb{E}[X \mid \sigma(A_i)]\,\big] = \sum_i \mathbb{E}\big[\,\mathbb{1}_{A_i}\,\mathbb{E}[X \mid A_i]\,\big] = \sum_i \mathbb{E}[X \mid A_i]\,\mathbb{E}[\mathbb{1}_{A_i}] = \sum_i \mathbb{E}[X \mid A_i]\,\mathbb{P}(A_i) \: \: \: \: \: \: \: \square$$
The interchange of sum and expectation in the second equality is justified by dominated convergence, since the partial sums $\sum_{i \le n} X \mathbb{1}_{A_i}$ are dominated by $|X| \in \mathcal{L}^1$. The key step is $\mathbb{1}_{A_i}\,\mathbb{E}[X \mid \sigma(A_i)] = \mathbb{1}_{A_i}\,\mathbb{E}[X \mid A_i]$ a.s.: any version of $\mathbb{E}[X \mid \sigma(A_i)]$ is $\sigma(A_i)$-measurable, hence a.s. constant on $A_i$, and integrating the defining property over $A_i$ identifies this constant as $\mathbb{E}[X \mathbb{1}_{A_i}]/\mathbb{P}(A_i) = \mathbb{E}[X \mid A_i]$, a number, which can then be pulled out of the expectation.
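For readers who like to see the identity in action: on a finite space the formula can be checked mechanically. A short sketch (the weights, the choice $X(\omega) = \omega^2$, and the three-cell partition are arbitrary):

```python
# Finite space: Omega = {0,...,9}, P(w) proportional to w + 1.
omega = list(range(10))
Z = sum(w + 1 for w in omega)
P = {w: (w + 1) / Z for w in omega}
X = {w: float(w * w) for w in omega}     # an arbitrary random variable

# A partition of Omega into three cells.
partition = [{0, 1, 2}, {3, 4, 5, 6}, {7, 8, 9}]

lhs = sum(X[w] * P[w] for w in omega)    # E[X]

rhs = 0.0
for A in partition:
    pA = sum(P[w] for w in A)                       # P(A_i)
    e_given_A = sum(X[w] * P[w] for w in A) / pA    # E[X|A_i] = E[X 1_{A_i}] / P(A_i)
    rhs += e_given_A * pA

assert abs(lhs - rhs) < 1e-12            # E[X] = sum_i E[X|A_i] P(A_i)
```

This is of course no substitute for the proof above, but it makes the bookkeeping in the sum tangible: each term $\mathbb{E}[X \mid A_i]\,\mathbb{P}(A_i)$ contributes exactly $\mathbb{E}[X \mathbb{1}_{A_i}]$.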
I hope this helps!