I have been looking into measure theory (from a probabilist's perspective), and I have found the proof of the existence of the conditional expectation to feel a little "glossed over" in literature. As such I've tried to very, very slowly break down the steps -- and as a result I have three questions pertaining to what feels like "conceptual issues".
I will first show how I would slowly develop the proof (please correct me if I'm wrong), which will then lead to the questions at the end:
Step 1: Establish a probability space $(\Omega, \Sigma, \mathbb{P})$, and consider a $\Sigma$-measurable random variable (RV) $X$ on this space (for example, $X: \Omega \rightarrow \mathbb{R}$).
Step 2: Define a sub-$\sigma$-algebra $\mathcal{G}\subset \Sigma$, giving the measurable space $(\Omega, \mathcal{G})$.
Step 3: Define a measure $\nu$ on $(\Omega, \mathcal{G})$ by:
\begin{align*} \nu(G) = \int_G X d\mathbb{P} = \int_{\Omega}X\mathbf{1}_{G} d\mathbb{P}=\mathbb{E}[X\mathbf{1}_{G}], \quad \forall G\in\mathcal{G}. \end{align*}
Step 4: Consider now the restricted probability measure $\mathbb{P}^{\mathcal{G}}$, i.e. the restriction of $\mathbb{P}$ to the measurable space $(\Omega, \mathcal{G})$, so that $\mathbb{P}(G) = \mathbb{P}^{\mathcal{G}}(G)$ for all $G\in\mathcal{G}$. Thus, we now work in the probability space $(\Omega, \mathcal{G},\mathbb{P}^{\mathcal{G}})$.
Step 5: By construction $\nu \ll \mathbb{P}^{\mathcal{G}}$ (taking $X \geq 0$ so that $\nu$ is a genuine measure; for integrable $X$ one splits $X = X^+ - X^-$, or invokes the signed version of the theorem). We can thus invoke Radon–Nikodym, meaning that there is a unique (a.s.) $\mathcal{G}$-measurable function, $Z$, s.t.
\begin{align*} \nu(G) = \int_G Z d\mathbb{P}^{\mathcal{G}} = \int_{\Omega} Z \mathbf{1}_{G} d\mathbb{P}^{\mathcal{G}} = \mathbb{E}[Z\mathbf{1}_{G}], \quad \forall G\in\mathcal{G}. \end{align*}
Step 6: We can thus conclude the following relationships:
\begin{align*} \mathbb{E}[X\mathbf{1}_G] = \nu(G) = \int_{\Omega} Z\mathbf{1}_G d\mathbb{P}^{\mathcal{G}} = \mathbb{E}[Z\mathbf{1}_{G}], \quad \forall G\in\mathcal{G}. \end{align*}
The significance of this is that the integrand on the LHS is a $\Sigma$-measurable RV, while on the RHS it is a $\mathcal{G}$-measurable RV. Moreover, $X=Z$ (a.s.), as by Radon–Nikodym $Z$ is unique (a.s.).
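To convince myself the construction is consistent, I checked the defining identity $\nu(G) = \mathbb{E}[X\mathbf{1}_G] = \mathbb{E}[Z\mathbf{1}_G]$ on a small finite space (a toy example of my own; all names and values below are made up):

```python
from itertools import combinations

# Finite sample space with 4 outcomes and a probability measure P.
omega = [0, 1, 2, 3]
P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
X = {0: 5.0, 1: -1.0, 2: 2.0, 3: 2.0}

# Sub-sigma-algebra G generated by the partition {{0,1}, {2,3}}.
atoms = [frozenset({0, 1}), frozenset({2, 3})]

# Z = E[X | G]: constant on each atom, equal to the P-weighted average there.
Z = {}
for A in atoms:
    pA = sum(P[w] for w in A)
    mean = sum(X[w] * P[w] for w in A) / pA
    for w in A:
        Z[w] = mean

def expect(f, G):
    """E[f 1_G] with respect to P."""
    return sum(f[w] * P[w] for w in omega if w in G)

# Every G in the sigma-algebra is a union of atoms (plus the empty set);
# check nu(G) = E[X 1_G] = E[Z 1_G] for all of them.
sigma_algebra = [set()] + [set().union(*c) for r in (1, 2)
                           for c in combinations(atoms, r)]
for G in sigma_algebra:
    assert abs(expect(X, G) - expect(Z, G)) < 1e-12
```

Note that here $Z$ takes only two values (one per atom), whereas $X$ takes three, so the two are visibly not the same random variable, only their integrals over $\mathcal{G}$-sets agree.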
Questions:
Q1: As far as I can tell, this is the proof of existence for the conditional expectation. However, for me it is not immediately obvious why this should be the conditional expectation. It seems that authors simply make a claim at the end:
Therefore, $Z=\mathbb{E}[X\mid \mathcal{G}]$...
However, I don't see why that is. Especially because $\mathbb{E}[\cdot]$ does not seem to me to be just a "matter of notation": $\mathbb{E}[X] = \int Xd\mathbb{P}$ has a very precise definition! So if we declare $Z=\mathbb{E}[X\mid \mathcal{G}]$, it seems like the definition of $\nu(G)$ should consist of an iterated integral.
Q2: In the long list of equalities in Step 6, whereby we conclude $\mathbb{E}[X\mathbf{1}_G]=\mathbb{E}[Z\mathbf{1}_G]$, is the expectation taken w.r.t. the same probability measure on both sides? I.e. is it $\mathbb{P}$ on both sides? Or $\mathbb{P}$ in the case of $\mathbb{E}[X\mathbf{1}_G]$, and $\mathbb{P}^{\mathcal{G}}$ in the case of $\mathbb{E}[Z\mathbf{1}_{G}]$ ... or does it simply not matter? On this point I have seen different authors take conflicting views, and I am not sure whether it is an important issue or simply a transcription error.
Q3: What is the significance of the fact that $X$ is $\Sigma$-measurable while $Z$ is $\mathcal{G}$-measurable? I wrote above that "it is a significant conclusion", but I don't have an intuition for why this is such an important concept.
First of all, it is not true that $X$ is almost surely equal to $Y = \mathbb{E}[X | \mathcal{G}]$. This does not follow from uniqueness in the Radon–Nikodym theorem, because Radon–Nikodym gives a unique $\mathcal{G}$-measurable random variable $Y$ with $\nu(G) = \int_G Y d\mathbb{P}^{\mathcal{G}}$, but $X$ is generally not $\mathcal{G}$-measurable. Otherwise, the whole point of the conditional expectation would be missed. For example, let $\mathcal{G} = \{ \emptyset, \Omega\}$. Then $Y = \mathbb{E}[X]$ is constant, but $X$ obviously need not be constant.
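On a finite space this is easy to see concretely (a toy sketch; the values are arbitrary):

```python
# With the trivial sigma-algebra G = {emptyset, Omega}, the only
# G-measurable random variables are the constants, so E[X | G] = E[X].
P = {0: 0.25, 1: 0.25, 2: 0.5}
X = {0: 1.0, 1: 3.0, 2: 6.0}

EX = sum(X[w] * P[w] for w in P)   # ordinary expectation of X
Y = {w: EX for w in P}             # conditional expectation given {emptyset, Omega}

assert len(set(Y.values())) == 1   # Y is constant ...
assert len(set(X.values())) > 1    # ... but X is not, so X != Y pointwise
```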
Q1: The conditional expectation of $X$ with respect to a $\sigma$-algebra $\mathcal{G}$ is defined as the essentially unique $\mathcal{G}$-measurable random variable $Y$ such that for every $G \in \mathcal{G}$ we have $$ \mathbb{E}_\mathbb{P} \, X \mathbf{1}_G = \mathbb{E}_{\mathbb{P}^\mathcal{G}} \, Y \mathbf{1}_G. $$ That is exactly what you have shown.
Q2: In the answer for Q1 I used a notation indicating that on the RHS of the equation the expectation is taken w.r.t. the restricted probability $\mathbb{P}^\mathcal{G}$ and on the LHS w.r.t. $\mathbb{P}$. This makes clear that on the right-hand side $Y$ is $\mathcal{G}$-measurable, but it makes no difference whether you integrate w.r.t. $\mathbb{P}$ or $\mathbb{P}^\mathcal{G}$. For $\mathcal{G}$-measurable simple functions of the form $\sum_{i=1}^n \alpha_i \mathbf{1}_{G_i}$ it is clear that the expectations under the two measures are equal (since $\mathbb{P}$ and $\mathbb{P}^\mathcal{G}$ agree on every $G_i \in \mathcal{G}$), and this extends to arbitrary $\mathcal{G}$-measurable functions by the usual approximation argument.
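Here is the simple-function case spelled out numerically (a sketch with made-up numbers):

```python
# P and its restriction P^G agree on every set in G, so for a
# G-measurable simple function f = sum_i alpha_i 1_{G_i} the
# expectations under the two measures coincide term by term.
atoms = [frozenset({"a", "b"}), frozenset({"c"})]
P = {"a": 0.2, "b": 0.3, "c": 0.5}
PG = {A: sum(P[w] for w in A) for A in atoms}     # P^G, defined on G only

alpha = {atoms[0]: 2.0, atoms[1]: -1.0}           # f = 2*1_{ab} - 1*1_{c}

E_P  = sum(alpha[A] * sum(P[w] for w in A) for A in atoms)  # w.r.t. P
E_PG = sum(alpha[A] * PG[A] for A in atoms)                 # w.r.t. P^G
assert abs(E_P - E_PG) < 1e-12
```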
Q3: That $Y$ is $\mathcal{G}$-measurable and $X$ is $\Sigma$-measurable is the whole point of conditional expectation. In an application, the underlying $\sigma$-algebra represents the available information, in the sense that the sets in the $\sigma$-algebra are the events one can distinguish with the information at hand. I admit that it is intuitively not clear in which sense a $\sigma$-algebra represents information. However, maybe it becomes clearer in a special situation, via Doob's criterion of measurability:
So, assume that your $\sigma$-algebra $\mathcal{G}$ is actually generated by some random variable $Z$. Then $\mathcal{G}$ represents the information you have if you know the value of $Z$. Since $Y = \mathbb{E}[X | \mathcal{G}]$ is $\mathcal{G}$-measurable, Doob gives a measurable function $g$ such that $$ Y = g \circ Z. $$ That function $g$ lets you plug in the observed value of $Z$ to obtain the expected value of $X$ given this information.
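Concretely, on a finite space (a toy sketch; the names mirror the notation above, the numbers are arbitrary):

```python
# sigma(Z) partitions Omega by the level sets of Z; E[X | sigma(Z)] is
# constant on each level set, so it factors as Y = g(Z) for some g.
P = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
Z = {0: "low", 1: "low", 2: "high", 3: "high"}   # Z generates G = sigma(Z)
X = {0: 10.0, 1: 0.0, 2: 4.0, 3: 8.0}

# g maps each value z of Z to the P-weighted average of X on {Z = z}.
g = {}
for z in set(Z.values()):
    level = [w for w in P if Z[w] == z]
    g[z] = sum(X[w] * P[w] for w in level) / sum(P[w] for w in level)

Y = {w: g[Z[w]] for w in P}   # Y = g o Z is sigma(Z)-measurable

# Knowing only that Z = "low" already determines the value of Y.
assert Y[0] == Y[1] and Y[2] == Y[3]
```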