I was trying to understand, both intuitively and rigorously, what the difference is between $E[X\mid Y]$ and $E[X\mid Y=y]$.
Let me first describe what does make sense to me. $E[X\mid Y=y]$ makes sense to me (I think). To me it means the value we expect $X$ to take on average, given that the event $Y=y$ was observed, and it is a deterministic number:
$$E[X\mid Y=y] = \sum_x{xp_{X\mid Y}(x\mid y)} = \mu_{X\mid Y=y}$$
which can be computed like any other expectation (the notation $\mu_{X\mid Y=y}$ denotes the actual real number that $E[X\mid Y=y]$ takes). So $E[X\mid Y=y]$ is a function of $y$: given $Y=y$, the conditional expectation is always $\mu_{X\mid Y=y}$, and its value is determined by which specific $y$ was observed.
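To make that computation concrete, here is a minimal Python sketch on a small hypothetical joint pmf (the numbers are invented for illustration and chosen so the conditional means come out to 11, 22 and 33, as in question 1). For each $y$ it sums $x\,p_{X,Y}(x,y)$ and divides by $p_Y(y)$, which yields one ordinary number per $y$; the helper names `marginal_y` and `cond_mean` are made up here.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p_{X,Y}(x, y), keyed by (x, y); exact fractions
# avoid floating-point noise. Probabilities sum to 1.
joint = {
    (10, 1): Fraction(1, 10), (12, 1): Fraction(1, 10),
    (20, 2): Fraction(1, 5),  (24, 2): Fraction(1, 5),
    (30, 3): Fraction(1, 5),  (36, 3): Fraction(1, 5),
}

def marginal_y(joint):
    """p_Y(y) = sum over x of p_{X,Y}(x, y)."""
    p_y = defaultdict(Fraction)  # Fraction() == 0
    for (_, y), p in joint.items():
        p_y[y] += p
    return dict(p_y)

def cond_mean(joint, y):
    """E[X | Y = y] = sum_x x * p_{X,Y}(x, y) / p_Y(y), a plain number."""
    p_y = marginal_y(joint)[y]
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

# One deterministic number per observed value y.
phi = {y: cond_mean(joint, y) for y in marginal_y(joint)}
print(phi)  # {1: Fraction(11, 1), 2: Fraction(22, 1), 3: Fraction(33, 1)}
```

Nothing random appears here: fixing $y$ fixes the answer, which matches the description of $\mu_{X\mid Y=y}$ above.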
However, $E[X\mid Y]$ makes less sense to me. I have read something similar to the following:
> Notice now that $Y$ is a random variable this time. Therefore, $E[X\mid Y]$ is a random variable.
That sort of makes sense to me: if $Y$ is random, then the expected value of $X$ given $Y$ should be random too. Still, the statement "$E[X\mid Y]$ is a random variable" is exactly the one I would like to understand more precisely; it is the part I understand least, and I would like to address it.
I feel that if I really understood what $E[X\mid Y]$ means, I should be able to answer the following questions:
1) If $E[X\mid Y]$ is random, what are the possible values it can take? Can it only take the values $\mu_{X\mid Y=y}$ for $y$ in the range of $Y$? Say $Y$ takes values in $\{1,2,3\}$ and $U_{X,Y} = \{\mu_{X\mid Y=1}= 11, \mu_{X\mid Y=2} = 22, \mu_{X\mid Y=3} = 33 \}$. Is there then any chance that $E[X\mid Y] = 123$?
2) Is the distribution of $E[X\mid Y]$ over $U_{X,Y}$ or over the values of $Y$?
3) What is the probability distribution of $E[X\mid Y]$ if we have all the information we need about the distributions $p_X(x)$ and $p_Y(y)$? Is it just the same as $p_Y(y)$? Is there a closed-form or otherwise explicit formula for it?
4) When one is asked to find the distribution of $E[X\mid Y]$, are we asked to find $\Pr[E[X\mid Y=y]]$ or $\Pr[E[X\mid Y] = \mu]$? Is there a difference between the two? Is one nonsense while the other is a valid probability distribution?
5) If we were to sketch the probability density (or mass) function of $E[X \mid Y]$, would the horizontal axis be $y$ or $\mu$? That is, would $p_{E[X\mid Y]}(k)$ be a function of $y$ or of $\mu = E[X\mid Y]$?
6) Related to the two questions above, it seems to me that writing an explicit formula for $E[X\mid Y=y]$ is easy, while for $E[X\mid Y]$ it is not (at least for me). Is the formula $E[X\mid Y] = \sum_x x\,p_{X\mid Y}(x\mid Y)$? I would guess so, but to me it is a very strange equation: we are conditioning on a random variable, saying "given $Y$", yet $Y$ is random, so it is not really given. Therefore, I can't find an expression for it that makes sense to me.
Basically, it is not clear to me what $E[X\mid Y]$ means: I don't know what its possible outcomes are, what its distribution is (in relation to $p_X(x)$, $p_Y(y)$, $p_{X,Y}(x,y)$ or anything else), nor can I write an explicit formula for it that makes sense to me. I can't even decide whether $p_{E[X \mid Y]}(k)$ is a function of $y$ or of $\mu=E[X\mid Y]$.
If $\varphi(y)=E[X\mid Y=y\,]$ is the non-random function of $y$, then $E[X\mid Y\,]$ is defined to be the random variable $\varphi(Y)$.
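In the discrete case, this definition pins down the distribution of $E[X\mid Y]$ as that of $\varphi(Y)$. Its only possible values are the numbers $\varphi(y)$ with $p_Y(y)>0$, and
$$\Pr\big(E[X\mid Y]=\mu\big)=\sum_{y:\ \varphi(y)=\mu} p_Y(y).$$
So it is $p_Y$ relabeled by $\varphi$ when $\varphi$ is one-to-one, and it merges probabilities when several $y$ share the same conditional mean. The minimal Python sketch below checks this on two small hypothetical joint pmfs (invented for illustration; `marginal_y`, `phi_table` and `pmf_of_cond_exp` are hypothetical helper names). It also checks the tower property $E\big[E[X\mid Y]\big]=E[X]$.

```python
from collections import defaultdict
from fractions import Fraction

def marginal_y(joint):
    """p_Y(y) = sum over x of p_{X,Y}(x, y)."""
    p_y = defaultdict(Fraction)
    for (_, y), p in joint.items():
        p_y[y] += p
    return dict(p_y)

def phi_table(joint):
    """The non-random function phi(y) = E[X | Y = y], stored as y -> number."""
    p_y = marginal_y(joint)
    return {
        y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]
        for y in p_y
    }

def pmf_of_cond_exp(joint):
    """pmf of the random variable E[X | Y] = phi(Y):
    P(phi(Y) = mu) is the sum of p_Y(y) over all y with phi(y) = mu."""
    p_y, phi = marginal_y(joint), phi_table(joint)
    pmf = defaultdict(Fraction)
    for y, p in p_y.items():
        pmf[phi[y]] += p
    return dict(pmf)

# Hypothetical example with phi one-to-one (conditional means 11, 22, 33).
joint = {
    (10, 1): Fraction(1, 10), (12, 1): Fraction(1, 10),
    (20, 2): Fraction(1, 5),  (24, 2): Fraction(1, 5),
    (30, 3): Fraction(1, 5),  (36, 3): Fraction(1, 5),
}
pmf = pmf_of_cond_exp(joint)  # 11 -> 1/5, 22 -> 2/5, 33 -> 2/5

# Hypothetical example where y = 2 and y = 3 share the conditional mean 22,
# so E[X | Y] has two values even though Y has three.
joint2 = {
    (10, 1): Fraction(1, 4), (12, 1): Fraction(1, 4),
    (22, 2): Fraction(1, 4),
    (21, 3): Fraction(1, 8), (23, 3): Fraction(1, 8),
}
pmf2 = pmf_of_cond_exp(joint2)  # 11 -> 1/2, 22 -> 1/2

# Tower property: E[E[X | Y]] equals E[X].
for j, m in ((joint, pmf), (joint2, pmf2)):
    e_x = sum(x * p for (x, _), p in j.items())
    assert e_x == sum(mu * p for mu, p in m.items())
```

In this picture, the horizontal axis of a plot of $p_{E[X\mid Y]}$ carries the values $\mu$, and a value such as 123 that is not some $\varphi(y)$ gets probability zero.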