Assume we have two-dimensional data $(x,y)\colon \{(a_1,b_1),\dots,(a_n,b_n)\}$, where $X, Y$ are random variables. The conditional expectation is $E(X\mid Y=b_j)=\sum_{i}a_iP(X=a_i\mid Y=b_j)$.
There is a theorem: $E(E(X\mid Y))=E(X)$, but what exactly does $E(X\mid Y)$ mean? In the formula above, $Y=b_j$ is a condition, but $X\mid Y$ on its own does not really make sense, does it?
I'll add some intuition here, since the mathematical definitions can be difficult to grasp at first. This is an attempt to help you make sense of it, as you asked.
If we perform the random experiment and observe an outcome $\omega_0$, then $E[X\mid Y]$ takes the particular value $E[X\mid Y=y_0]$, where $Y(\omega_0)=y_0$. Note that this is the expectation of $X$ given a particular value of $Y$, namely $y_0$; we are not allowed to know the outcome of $X$ precisely. Instead, we average $X$ over the set $\{\omega \mid Y(\omega)=y_0\}$, and $E[X\mid Y]$ is constant on this set. I like to think of it this way: conditioning on $Y$ reduces our ability to resolve the probability space $\Omega$, so that we can only see the distinct outcomes of $Y$ and the average value of $X$ when $Y$ takes a particular value.
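To make this concrete, here is a minimal numeric sketch (the outcome values are hypothetical, chosen for illustration) of a uniform discrete joint distribution. It computes $E[X\mid Y=y]$ by averaging $X$ over the level set $\{\omega \mid Y(\omega)=y\}$, shows that $E[X\mid Y]$ is constant on each such set, and checks that averaging it over all outcomes recovers $E[X]$, i.e. the theorem $E(E(X\mid Y))=E(X)$:

```python
# Toy sample space: four equally likely outcomes, each a pair (X, Y).
samples = [(1, 0), (3, 0), (2, 1), (6, 1)]  # hypothetical (X, Y) values

def E(vals):
    """Expectation under the uniform distribution on the outcomes."""
    return sum(vals) / len(vals)

def cond_exp(y):
    """E[X | Y = y]: average X over the set {omega : Y(omega) = y}."""
    xs = [x for (x, yy) in samples if yy == y]
    return E(xs)

# E[X | Y] is itself a random variable: it assigns to each outcome
# omega the value cond_exp(Y(omega)), so it is constant on {Y = y}.
e_x_given_y = [cond_exp(y) for (_, y) in samples]

print(cond_exp(0))                 # 2.0, constant on {Y = 0}
print(cond_exp(1))                 # 4.0, constant on {Y = 1}
print(E(e_x_given_y))              # 3.0 = E[E[X | Y]]
print(E([x for x, _ in samples]))  # 3.0 = E[X], matching the theorem
```

Note that `e_x_given_y` only resolves the two values of $Y$: all outcomes in the same level set map to the same number, which is exactly the "reduced resolution" picture above.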
Furthermore, you wouldn't really see $X\mid Y$ written by itself outside of an expectation or probability operator, or without further information about $Y$: $E[X\mid Y]$ is a random variable, but $X\mid Y$ is not.
One could argue that putting a subset $A\subset\Omega$ behind the conditional bar "$\mid$" is generally acceptable, $X\mid A$, and is a random variable. If we are going to condition on some information about $Y$, e.g. $Y=y$ or $Y\in A$, then we can think of $X\mid\{Y=y\}$ and $X\mid\{Y\in A\}$ as random variables. Note that $\{Y=y\}$ and $\{Y\in A\}$ are subsets of the sample space $\Omega$ and that $X\mid\Omega=X\mid\{Y \text{ takes any possible value}\}=X,$ i.e. if we are conditioning on all values of $Y$ then we just get back $X.$
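As a quick sanity check on the last identity, unwinding the definition of conditioning on an event for any value $a$:
$$P(X=a\mid \Omega)=\frac{P(\{X=a\}\cap\Omega)}{P(\Omega)}=\frac{P(X=a)}{1}=P(X=a),$$
so $X\mid\Omega$ has exactly the same distribution as $X$, as claimed.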