Suppose we have two joint distributions $P_{X,Y}$ and $Q_{X,Y}$ on $(X,Y)$ that are close in $L^1$ (total variation), i.e. $\|P_{X,Y}-Q_{X,Y}\|_1<\varepsilon$. Then:
- Are the distributions $P_{E[X|Y]}$ and $Q_{E[X|Y]}$ of the conditional expectation $E[X\mid Y]$ also close in $L^1$, say $\|P_{E[X|Y]}-Q_{E[X|Y]}\|_1<N\varepsilon$ for some constant $N$?
- In particular, what about the case where $Y=X+Z$ with $X\perp Z$?
- If $X,Y$ have finite mean and variance, are the means and variances of $E[X|Y]$ under the two distributions close?
Assuming they are all continuous RVs, the difference is:
\begin{align} \left|P_{E[X|Y]}(w)-Q_{E[X|Y]}(w)\right| &= \left|\int_{y\in A_P} P_Y(y) dy -\int_{y\in A_Q } Q_Y(y) dy\right| \end{align}
with $A_P=\left\{y:\int x P(X=x|Y=y) dx = w\right\},\ A_Q=\left\{y:\int x Q(X=x|Y=y) dx = w\right\}.$
I can't figure out how to make the difference between the two sets small, which makes me suspicious that conditional expectation is unfortunately not continuous in this sense.
For a counterexample to your first two questions, take $Y = X+Z$ with $X \perp Z$, as you suggest. Under $P$, let $X$ be Rademacher($p$) for some $p$ (i.e. $P(X=1) = p$, $P(X=-1)=1-p$) and let $Z$ be Rademacher(1/2). Under $Q$, let $X$ and $Z$ both be Rademacher(1/2).
Now you can check that $$P_{X,Y} = \frac{p}{2} (\delta_{(1,2)} + \delta_{(1,0)}) + \frac{1-p}{2} (\delta_{(-1,0)} +\delta_{(-1,-2)})$$ and $Q_{X,Y}$ is the same with $p=1/2$. You may compute directly that the total variation distance between $P_{X,Y}$ and $Q_{X,Y}$ is $|1-2p|$ and in particular it goes to zero as $p \to 1/2$.
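This computation is easy to verify mechanically. Here is a small sketch in exact arithmetic (the helper names `joint_atoms` and `l1_dist` are mine; I use the unnormalized $L^1$ norm from your question, under which identical measures are at distance 0 and mutually singular ones at distance 2):

```python
from fractions import Fraction

def joint_atoms(p):
    """Atoms of P_{X,Y} when X ~ Rademacher(p), Z ~ Rademacher(1/2), Y = X + Z."""
    return {(1, 2): p / 2, (1, 0): p / 2,
            (-1, 0): (1 - p) / 2, (-1, -2): (1 - p) / 2}

def l1_dist(mu, nu):
    """L^1 distance between two discrete measures given as {atom: weight} dicts."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(a, 0) - nu.get(a, 0)) for a in support)

p = Fraction(3, 5)
P = joint_atoms(p)                 # generic p
Q = joint_atoms(Fraction(1, 2))    # p = 1/2
assert l1_dist(P, Q) == abs(1 - 2 * p)   # = |1 - 2p| = 1/5 here
```

As $p \to 1/2$ the computed distance $|1-2p|$ goes to 0, matching the claim above.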
Next, under $P$, note that $$E[X \mid Y] = \begin{cases} 1, & Y=2 \\ -1, & Y = -2 \\ p - \frac{1}{2}, & Y=0 \end{cases}$$
where these events have probabilities $p/2$, $(1-p)/2$, $1/2$ respectively. Hence $$P_{E[X \mid Y]} = \frac{p}{2} \delta_1 + \frac{1-p}{2} \delta_{-1} + \frac{1}{2} \delta_{p - \frac{1}{2}}$$ and $Q_{E[X \mid Y]}$ is the same with $p=1/2$. In particular, for $p \ne 1/2$ the total variation distance between $P_{E[X \mid Y]}$ and $Q_{E[X \mid Y]}$ is always at least 1.
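The lower bound of 1 can likewise be checked in exact arithmetic: the atom of $P_{E[X\mid Y]}$ at $p - \frac12$ and the atom of $Q_{E[X\mid Y]}$ at $0$ never merge for $p \ne 1/2$, so their masses of $\frac12$ each contribute $1$ to the $L^1$ distance. A sketch (helper names are mine, same unnormalized $L^1$ convention as above):

```python
from fractions import Fraction

def condexp_atoms(p):
    """Atoms of the law of E[X|Y] when X ~ Rademacher(p), Z ~ Rademacher(1/2)."""
    return {Fraction(1): p / 2,
            Fraction(-1): (1 - p) / 2,
            p - Fraction(1, 2): Fraction(1, 2)}

def l1_dist(mu, nu):
    support = set(mu) | set(nu)
    return sum(abs(mu.get(a, 0) - nu.get(a, 0)) for a in support)

Q = condexp_atoms(Fraction(1, 2))
for p in [Fraction(51, 100), Fraction(501, 1000)]:
    # distance works out to |p - 1/2| + 1, so it stays >= 1 even as p -> 1/2
    assert l1_dist(condexp_atoms(p), Q) == abs(p - Fraction(1, 2)) + 1
```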
For the third question, as far as the means are concerned, the "conditional" part is irrelevant, so we can take $Y=0$ in all cases. Fix $n$ and suppose that $P(X=n) = 1/n$, $P(X=0) = 1-1/n$, and $Q(X=0)=1$. Then the total variation distance between $P_X$ and $Q_X$ (and likewise between $P_{X,Y}$ and $Q_{X,Y}$) is $2/n$, which goes to 0 as $n \to \infty$. But under $P$ we have $E[X \mid Y] = E[X] = 1$ for every $n$, while under $Q$ it is 0.
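A quick numerical check of this example (the helpers `l1_dist` and `mean` are mine, with the same unnormalized $L^1$ convention as in the question):

```python
def l1_dist(mu, nu):
    """L^1 distance between two discrete measures given as {atom: weight} dicts."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(a, 0) - nu.get(a, 0)) for a in support)

def mean(mu):
    """Mean of a discrete measure given as an {atom: weight} dict."""
    return sum(a * w for a, w in mu.items())

for n in [10, 100, 1000]:
    P = {n: 1 / n, 0: 1 - 1 / n}   # P(X=n) = 1/n, P(X=0) = 1 - 1/n
    Q = {0: 1.0}                   # Q is a point mass at 0
    assert abs(l1_dist(P, Q) - 2 / n) < 1e-12   # distance 2/n -> 0
    assert abs(mean(P) - 1.0) < 1e-12           # E_P[X] = 1 for every n
    assert mean(Q) == 0.0                       # E_Q[X] = 0
```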
The basic problem is that for any state space $E$, the total variation topology on the space $\mathcal{P}(E)$ of probability measures on $E$ is not able to detect the topology of $E$, but only its measurable structure. So it cannot tell whether two points of $E$ are close together or far away, but only whether or not they are the same point. This is why the weak topology is more useful for most purposes.
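To illustrate the point concretely: in the $L^1$/total variation norm, two distinct point masses $\delta_\varepsilon$ and $\delta_0$ are always at distance 2 no matter how small $\varepsilon$ is, whereas the Wasserstein-1 distance (one common metrization of weak convergence) is just $\varepsilon$. A sketch, assuming the same unnormalized $L^1$ convention as above and using the fact that $W_1(\delta_a, \delta_b) = |a-b|$:

```python
def l1_dist(mu, nu):
    """L^1 distance between two discrete measures given as {atom: weight} dicts."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(a, 0) - nu.get(a, 0)) for a in support)

def w1_diracs(a, b):
    """Wasserstein-1 distance between the Dirac masses delta_a and delta_b."""
    return abs(a - b)

for eps in [0.1, 0.01, 0.001]:
    assert l1_dist({eps: 1.0}, {0.0: 1.0}) == 2.0  # TV cannot see that eps is small
    assert w1_diracs(eps, 0.0) == eps              # W1 can
```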