This question is in the context of regression with squared error loss.
Let $(X,Y) \in \mathbb{R}^p\times\mathbb{R}$ be random variables with joint distribution ${F}$. We randomly sample a training set of $n$ i.i.d. observations $D_n = \{(X_i,Y_i): i = 1,\ldots,n\}$.
Let's assume $D_n$ is a fixed realization for now. The goal is to learn a function $m_n(X; D_n)$ from the observed training set that is stochastically "close" to $Y$, in the sense that $L_n = \mathbb{E}[(m_n(X;D_n) - Y)^2 \mid D_n]$ is small. It is well known that $m^*(X) = \mathbb{E}[Y|X] = \underset{m}{\arg\min}\, \mathbb{E}[(m(X) - Y)^2]$, and we call $L^* = \mathbb{E}[(m^*(X) - Y)^2]$ the Bayes error. Hence $L_n \geq L^*$.
Now to my question:
Now assume $D_n$ is no longer fixed (i.e. it is a random variable), so that $L_n$ is random (through its dependence on $D_n$). Suppose our procedure for learning $m_n(X;D_n)$ from the training data is consistent, in the sense that $m_n(X;D_n) \xrightarrow{P} m^*(X)$ as $n \rightarrow \infty$. Does this imply that $L_n \xrightarrow{P} L^*$?
=======================
My thoughts on approaching the problem:
Note that $g(m_n(X;D_n)) = (m_n(X;D_n) - Y)^2$ is a continuous function of $m_n(X;D_n)$, so the continuous mapping theorem gives $(m_n(X;D_n) - Y)^2 \xrightarrow{P} (m^*(X) - Y)^2$, which suggests the answer might be yes. However, $L_n$ is defined as the *conditional expectation* of $g$ given $D_n$, and convergence in probability alone does not in general justify passing a limit through an expectation (some extra condition, such as uniform integrability or a dominating bound, seems to be needed), which makes me unsure whether this argument applies.
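As an empirical sanity check (not a proof), here is a small simulation sketch of the conjecture. It uses a brute-force $k$-NN regression, which is a consistent procedure when $k \to \infty$ and $k/n \to 0$, on data where the true regression function and the Bayes error are known: $m^*(x) = \sin(2\pi x)$ and $L^* = \sigma^2$. The estimated $L_n$ (computed on a large fresh test set) should drift toward $L^*$ as $n$ grows. All function names and parameter choices below are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3  # noise standard deviation, so the Bayes error is L* = sigma**2 = 0.09

def m_star(x):
    # true regression function m*(x) = E[Y | X = x]
    return np.sin(2 * np.pi * x)

def knn_predict(x_train, y_train, x_test, k):
    # brute-force k-nearest-neighbour regression in one dimension
    d = np.abs(x_test[:, None] - x_train[None, :])
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def estimate_Ln(n, k, n_test=2000):
    # draw one realization of D_n, then estimate L_n = E[(m_n(X) - Y)^2 | D_n]
    # by Monte Carlo on an independent test sample from the same distribution
    x_tr = rng.uniform(0, 1, n)
    y_tr = m_star(x_tr) + sigma * rng.normal(size=n)
    x_te = rng.uniform(0, 1, n_test)
    y_te = m_star(x_te) + sigma * rng.normal(size=n_test)
    pred = knn_predict(x_tr, y_tr, x_te, k)
    return np.mean((pred - y_te) ** 2)

results = {}
for n in [100, 1000, 5000]:
    k = max(1, int(n ** 0.5))  # k ~ sqrt(n): k -> inf and k/n -> 0, so k-NN is consistent
    results[n] = estimate_Ln(n, k)
    print(f"n={n:5d}  k={k:3d}  estimated L_n = {results[n]:.4f}  (L* = {sigma**2:.4f})")
```

In this run the estimated $L_n$ stays above $L^*$ (up to Monte Carlo noise) and approaches it as $n$ grows, which is consistent with the conjecture, though of course a simulation cannot settle the question.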