Let $X$ and $Y$ be $\mathbb{R}^n$- and $\mathbb{R}^m$-valued random variables on the probability space $(\Omega, \mathcal{F}, P)$. Further assume that $X$ is $\mathcal{F}_1$-measurable and $Y$ is $\mathcal{F}_2$-measurable, where $\mathcal{F}_1$ and $\mathcal{F}_2$ are independent sub-$\sigma$-algebras of $\mathcal{F}$.
Now, for $A \in \mathcal{B}(\mathbb{R}^n)$ and $B \in \mathcal{B}(\mathbb{R}^m)$ consider the maps \begin{align} f(x, y) &= 1_{A \times B}(x, y), \\ f(x, Y) &= 1_{A \times B}(x, Y), \\ g(x) &= E[f(x, Y)] = E[1_{A \times B}(x, Y)], \\ g(X) &= E[f(x, Y)] \big|_{x = X} = E[1_{A \times B}(x, Y)] \big|_{x = X}. \end{align} Suppose I know that $$ E[1_{A \times B}(X, Y) \mid \mathcal{F}_1] = E[1_{A \times B}(x, Y)] \big|_{x = X}. $$ Then by the definition of conditional expectation this means that for any $F \in \mathcal{F}_1$ $$\tag{1} \int_{F} 1_{A \times B}(X, Y) \, dP = \int_F E[1_{A \times B}(x, Y)] \big|_{x = X} \, dP. $$ I want to show that we also have $$ \tag{2} \int_{F} h(X, Y) \, dP = \int_F E[h(x, Y)] \big|_{x = X} \, dP $$ for all $\mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$-measurable positive step functions $h$. The latter will of course follow by linearity if we can show that $$\tag{3} \int_{F} 1_D(X, Y) \, dP = \int_F E[1_D(x, Y)] \big|_{x = X} \, dP $$ for any $D \in \mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$.
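(For concreteness, the linearity step can be spelled out: writing a positive step function as $h = \sum_{i=1}^k c_i 1_{D_i}$ with $c_i \ge 0$ and $D_i$ in the product $\sigma$-algebra, and assuming $(3)$ holds for each $D_i$, one gets

```latex
% Linearity of the integral and of expectation turns (3) into (2):
\begin{align*}
\int_F h(X, Y)\, dP
  &= \sum_{i=1}^k c_i \int_F 1_{D_i}(X, Y)\, dP
   = \sum_{i=1}^k c_i \int_F E[1_{D_i}(x, Y)] \big|_{x = X}\, dP \\
  &= \int_F \Big( \sum_{i=1}^k c_i\, E[1_{D_i}(x, Y)] \Big) \Big|_{x = X}\, dP
   = \int_F E[h(x, Y)] \big|_{x = X}\, dP .
\end{align*}
```

which is exactly $(2)$.)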
An argument I have come across says:
Both sides of $(1)$ can be extended from $\mathcal{B}(\mathbb{R}^n) \times \mathcal{B}(\mathbb{R}^m)$ to define measures on $\mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$. By linearity $(1)$ becomes $(2)$.
How exactly should one understand this argument? For instance, if we take $F = \Omega$, the LHS of $(1)$ is $P((X, Y) \in A \times B)$. The independence of $X$ and $Y$ then gives $$P((X, Y) \in A \times B) = P(X \in A)\, P(Y \in B),$$ and for the distributions we have $P_{(X, Y)}(A \times B) = P_X(A)\, P_Y(B)$, where $P_{(X, Y)}$ is a probability measure on $\mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$. Does this have any connection with the quoted argument? What about the RHS of $(1)$? And how does one obtain $(2)$ or $(3)$? A detailed demonstration would be very much appreciated.
Going from $(1)$ to $(3)$ is a standard $\pi$-$\lambda$ theorem argument. The collection $\mathcal{P}$ of measurable rectangles $A \times B$ is certainly closed under intersection, i.e. it is a $\pi$-system. Now show that the collection $\mathcal{L}$ of all sets $D$ satisfying $(3)$ is a $\lambda$-system. (The monotone convergence theorem will be useful.) You conclude that $\mathcal{L}$ contains $\sigma(\mathcal{P})$, which by definition equals $\mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$, and so you have shown that $(3)$ holds for all $D \in \mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$.
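Here is a sketch of the $\lambda$-system check, under the convention that a $\lambda$-system contains the whole space and is closed under proper differences and increasing limits ($\mathcal{L}$ is the collection of sets satisfying $(3)$, as above):

```latex
\begin{itemize}
  \item $\mathbb{R}^n \times \mathbb{R}^m \in \mathcal{L}$: it is itself a
    rectangle, so $(3)$ for it is an instance of the hypothesis $(1)$.
  \item If $D_1 \subseteq D_2$ with $D_1, D_2 \in \mathcal{L}$, then
    $1_{D_2 \setminus D_1} = 1_{D_2} - 1_{D_1}$, and all integrals involved
    are bounded by $P(F) \le 1$, so subtracting $(3)$ for $D_1$ from $(3)$
    for $D_2$ gives $(3)$ for $D_2 \setminus D_1$.
  \item If $D_k \uparrow D$ with $D_k \in \mathcal{L}$, then
    $1_{D_k} \uparrow 1_D$ pointwise. Monotone convergence on the left gives
    $\int_F 1_{D_k}(X, Y)\, dP \to \int_F 1_D(X, Y)\, dP$; on the right,
    monotone convergence applied first inside $E[\,\cdot\,]$ gives
    $E[1_{D_k}(x, Y)] \uparrow E[1_D(x, Y)]$ for every fixed $x$, and then
    once more for the outer integral $\int_F \cdots\, dP$.
    Hence $D \in \mathcal{L}$.
\end{itemize}
```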
It can also be done with the monotone class theorem.
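One version of that route, as a sketch via the functional (multiplicative systems) form of the monotone class theorem:

```latex
\begin{itemize}
  \item Let $\mathcal{H}$ be the set of bounded measurable
    $h : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ satisfying $(2)$.
  \item $\mathcal{H}$ is a vector space (linearity of the integral and of
    expectation) containing the constant $1$ and $1_{A \times B}$ for every
    rectangle, by $(1)$.
  \item If $h_k \in \mathcal{H}$ with $0 \le h_k \uparrow h$ and $h$ bounded,
    then $h \in \mathcal{H}$ by monotone convergence, applied first inside
    $E[\,\cdot\,]$ and then to the outer integral, as in the
    $\lambda$-system argument.
  \item Since the rectangles form a $\pi$-system generating
    $\mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$, the
    functional monotone class theorem gives that $\mathcal{H}$ contains all
    bounded $\mathcal{B}(\mathbb{R}^n) \otimes \mathcal{B}(\mathbb{R}^m)$-measurable
    functions; in particular $(2)$ holds for every positive step function.
\end{itemize}
```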