Suppose that $Y_{i}\in\{0,1\}$ is a binary variable, and $X_{i}$ is some random vector in $\mathbb{R}^{d}$ . Why can we say the following:
\begin{eqnarray*} \mathbb{E}\left(Y_{i}|X_{i},f(X_{i})\right) & = & \mathbb{E}\left(Y_{i}|X_{i}\right) \end{eqnarray*}
where $f(X_i)=P(Y_{i}=1|X_{i})$?
Intuitively, once we know $X_{i}$, we also know $f(X_{i})$, since $f(X_{i})$ is just a function of $X_{i}$. Hence, conditioning on $f(X_{i})$ should provide no additional information for computing the expectation of $Y_{i}$. Note that the converse fails: knowing $f(X_{i})$ does not necessarily pin down $X_{i}$, since different values of $X_{i}$ can generate the same $f(X_{i})$.
Can someone provide a more rigorous explanation of this? I think it has to do with sigma fields.
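Here is my partial attempt at formalizing the intuition, assuming $f$ is measurable (which it should be, being a conditional probability):

\begin{eqnarray*}
\sigma\left(f(X_{i})\right) & \subseteq & \sigma\left(X_{i}\right)\\
\Rightarrow\sigma\left(X_{i},f(X_{i})\right) & = & \sigma\left(X_{i}\right)\\
\Rightarrow\mathbb{E}\left(Y_{i}|X_{i},f(X_{i})\right) & = & \mathbb{E}\left(Y_{i}|\sigma\left(X_{i},f(X_{i})\right)\right)\;=\;\mathbb{E}\left(Y_{i}|\sigma\left(X_{i}\right)\right)\;=\;\mathbb{E}\left(Y_{i}|X_{i}\right)
\end{eqnarray*}

That is, since conditional expectation depends only on the sigma-field generated by the conditioning variables, and adding $f(X_{i})$ does not enlarge that sigma-field, the two expectations agree. Is this the right idea?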
Thanks, Newbacus.
I'm new to this site, so forgive me if this question is too easy or trivial or not appropriate in some way.