From Rick Durrett's book Probability: Theory and Examples:
We define the conditional expectation of $X$ given $\mathcal{G}$, $E(X | \mathcal{G})$ to be any random variable $Y$ that has
(1) $Y \in \mathcal{G}$, i.e., $Y$ is $\mathcal{G}$-measurable;
(2) for all $A \in \mathcal{G}$, $\int_{A} X \, dP = \int_{A} Y \, dP$.
And in other materials I found:
Let $(\Omega, \mathscr{F}, P)$ be a probability space and let $\mathscr{G}$ be a $\sigma$-algebra contained in $\mathscr{F}$. For any real random variable $X \in L^{1}(\Omega, \mathscr{F}, P)$, define $E(X \mid \mathscr{G})$ to be the unique random variable $Z \in L^{1}(\Omega, \mathscr{G}, P)$ such that for every bounded $\mathscr{G}$-measurable random variable $Y$, $$E(X Y)=E(Z Y).$$
The difference between the two definitions is that the first requires the test $\mathbb E\left[XY\right]=\mathbb E\left[ZY\right]$ only for $Y$ of the form $\mathbf 1_A$ with $A\in\mathcal G$, whereas the second requires it for every bounded $\mathcal G$-measurable function $Y$.
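On a finite probability space both tests can be checked directly. Here is a small numerical sketch (the space, measure, and partition are made up for illustration): $\Omega=\{0,\dots,5\}$ with uniform $P$, and $\mathcal G$ generated by the partition $\{\{0,1,2\},\{3,4,5\}\}$, so that $E(X\mid\mathcal G)$ is the block-average of $X$.

```python
import numpy as np

# Illustrative finite example: Omega = {0,...,5}, uniform P,
# G generated by the partition {{0,1,2}, {3,4,5}}.
p = np.full(6, 1 / 6)
X = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]

# E(X|G) is constant on each block: the P-weighted average of X there.
Z = np.empty(6)
for b in blocks:
    Z[b] = (p[b] * X[b]).sum() / p[b].sum()

# Durrett's test: E[X 1_A] = E[Z 1_A] for every A in G
# (here, G consists of unions of blocks).
for A in [blocks[0], blocks[1], np.arange(6)]:
    assert np.isclose((p[A] * X[A]).sum(), (p[A] * Z[A]).sum())

# Second definition's test: E[XY] = E[ZY] for a bounded
# G-measurable Y, i.e. any function constant on each block.
Y = np.where(np.arange(6) < 3, -2.0, 5.0)
assert np.isclose((p * X * Y).sum(), (p * Z * Y).sum())
```

Both assertions pass because $Y$, being $\mathcal G$-measurable on a finite space, is automatically a linear combination of indicators of blocks, which is exactly the point of the question.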
To see that the two definitions agree, all we need is the following fact: every bounded $\mathcal G$-measurable function can be approximated in the uniform norm by finite linear combinations of indicator functions $\mathbf 1_A$ with $A\in\mathcal G$ (i.e., by $\mathcal G$-measurable simple functions).
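The bridging argument can be written out explicitly (a standard sketch):

$$E[X\,\mathbf 1_A]=E[Z\,\mathbf 1_A]\quad\forall A\in\mathcal G \;\Longrightarrow\; E[X s]=E[Z s]$$

for every $\mathcal G$-measurable simple function $s=\sum_{k} c_k \mathbf 1_{A_k}$, by linearity of expectation. Now given a bounded $\mathcal G$-measurable $Y$, choose simple functions $s_n$ with $\|Y-s_n\|_\infty\to 0$. Then

$$\bigl|E[XY]-E[Xs_n]\bigr|\le \|Y-s_n\|_\infty\, E|X|\longrightarrow 0,$$

and the same estimate holds with $Z$ in place of $X$ (note $Z\in L^1$). Passing to the limit in $E[Xs_n]=E[Zs_n]$ gives $E[XY]=E[ZY]$, so Durrett's definition implies the second one; the converse is immediate by taking $Y=\mathbf 1_A$.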