I am a bit confused about how the conditional expectation works under operations. I've seen steps such as
$(E[\theta | X])^2 = E[\theta^2 | X]$
done in proofs without much explanation. My question is: under what conditions is
$h(E[\theta | X]) = E[h(\theta) | X]$
And if you could add an intuitive explanation, that would be great!
Recall that the conditional variance of $\theta$ given $X$ is defined as $$Var(\theta|X)=E[\theta^2 | X] - E[\theta | X]^2.$$ So $E[\theta^2 | X]=E[\theta|X]^2$ would imply that the distribution of $\theta$ given $X=x$ has zero variance for every value of $x$, which in turn implies $P(\theta = C_x |\: X=x) = 1$, where $C_x=E[\theta | X=x]\:$.

Now suppose that $X$ is a continuous random variable with density $f_X$. Then $$P(\theta = E[\theta | X]) = \int_\mathbb{R} P(\theta = E[\theta | X]\,|\,X=x)\,f_X(x)\,dx =\int_\mathbb{R}f_X(x)\,dx = 1$$ (a similar argument holds when $X$ is discrete). This means the identity $E[\theta^2 | X]=E[\theta|X]^2$ can only hold when $\theta$ is almost surely a function of $X$. Writing $\theta = f(X)$, we always have $$E[f(X) | X] = f(X)$$ for any function $f$, and therefore $$E[h(f(X)) | X] = h(f(X)) = h(E[f(X)|X])$$ for all functions $h$.

There might be other cases where $E[h(\theta) | X]=h(E[\theta | X])$, but as others have mentioned, this is rarely the case. One important exception is when $h$ is a linear function, since conditional expectation is linear.
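To see the gap concretely, here is a quick Monte Carlo sketch under a hypothetical toy model (my choice, not from the question): $X \sim N(0,1)$ and $\theta \mid X \sim N(X, 1)$, so $E[\theta|X]=X$ and $Var(\theta|X)=1$. Taking $h(t)=t^2$, the tower property gives $E[E[\theta^2|X]] = E[\theta^2] = 2$ while $E[(E[\theta|X])^2] = E[X^2] = 1$; the difference is exactly $E[Var(\theta|X)] = 1$. For the linear case $h(t)=2t+3$, the two sides agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy model (assumed for illustration): X ~ N(0,1), theta | X ~ N(X, 1),
# so E[theta | X] = X and Var(theta | X) = 1.
x = rng.normal(size=n)
theta = x + rng.normal(size=n)

# h(t) = t^2: compare E[theta^2] with E[(E[theta|X])^2] via the tower property.
lhs = np.mean(theta**2)  # estimates E[E[theta^2 | X]] = E[X^2] + 1, i.e. ~2
rhs = np.mean(x**2)      # estimates E[(E[theta | X])^2] = E[X^2], i.e. ~1
print(lhs, rhs)          # the gap ~1 is E[Var(theta | X)]

# h(t) = 2t + 3 (linear): E[h(theta) | X] = 2X + 3 = h(E[theta | X]),
# so the unconditional means coincide (up to Monte Carlo error).
print(np.mean(2 * theta + 3), np.mean(2 * x + 3))
```

The squared case shows the mismatch is precisely the (average) conditional variance, matching the variance-decomposition argument above; the linear case illustrates the exception at the end.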