Let $\mathcal{X}$ be the random vector describing the input distribution of the NN, and $\mathcal{Y}$ the output distribution. Let $X, Y$ be the finite samples we have that form the training set (correctly paired). Let $L$ be a convex loss function and $R$ the empirical risk. Is it true that $F = \operatorname{argmin}_{f} R(f)$ satisfies $F(x_{i}) = \mathbb{E}(\mathcal{Y} \mid \mathcal{X} = x_{i})$ for all $i \in \{1, \dots, |X|\}$?
I found a proof online for the case where $L$ is the MSE; there the result is quite straightforward, as it follows from properties of the expectation. I thought about using Jensen's inequality, but that only yields a lower bound. Lastly, I tried to argue that minimizing $\mathbb{E}(L)$ implies that $L$ evaluated at the expectation is also minimized, but is that even true?
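For reference, the MSE argument I found can be sketched as follows (writing $\mu(x) := \mathbb{E}(\mathcal{Y} \mid \mathcal{X} = x)$; the cross term vanishes by the tower property):

$$\mathbb{E}\big[(\mathcal{Y} - f(x))^2 \mid \mathcal{X} = x\big] = \mathbb{E}\big[(\mathcal{Y} - \mu(x))^2 \mid \mathcal{X} = x\big] + \big(\mu(x) - f(x)\big)^2,$$

which is minimized pointwise by $f(x) = \mu(x)$, since the first term on the right does not depend on $f$. My question is whether anything like this survives for a general convex $L$.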