One representation is $E[(y-g(x))^2]$ and the other is $\frac{1}{2m}\sum_{i=1}^m (g(x^{(i)})-y^{(i)})^2$, where $m$ is the number of training examples in the training set.
The first is from a statistical learning book, the second from machine learning. How can I show that the first expectation equals the second expression?
Let $x\sim U(S)$ be a random data sample drawn from a discrete uniform distribution over a set $S$ of size $|S|=m$. Then the probability of choosing $x$ at random is $p(x)=1/m$. Let $y_x$ be the associated label of $x$ (written $x_i$ and $y_i$ for a specific realization). Let $g(x)$ be your learned regression function. Then \begin{align*} \mathbb{E}_{x\sim U(S)}\left[ (y_x - g(x))^2 \right] &= \sum_{x\in S} (y_x - g(x))^2 p(x) \hspace{0.51in} \text{By definition of expectation}\\ &= \frac{1}{m} \sum_{x\in S} (y_x - g(x))^2 \hspace{0.59in} \text{By $p(x)=1/m$}\\ &= \frac{1}{m} \sum_{i= 1}^m (y_i - g(x_i))^2 \hspace{0.5in} \text{By enumerating $S$ as $x_1,\dots,x_m$}\\ &=: \text{MSE}_S(g) \end{align*} is the mean-squared error loss of $g$ on $S$. This should be no surprise, since "expectation" essentially means taking a weighted average with respect to some probability distribution (here, uniform).
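As a quick numerical sanity check of this equality (a sketch using NumPy; the data and the function `g` below are arbitrary illustrative choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
xs = rng.normal(size=m)        # the set S of m samples
ys = 2 * xs + 1                # associated labels y_i (illustrative)
g = lambda x: 1.9 * x + 0.8    # some learned regression function (illustrative)

# Expectation over U(S): sum over x in S of (y_x - g(x))^2 * p(x), with p(x) = 1/m
expectation = sum((y - g(x)) ** 2 * (1 / m) for x, y in zip(xs, ys))

# Mean-squared error: (1/m) * sum_i (y_i - g(x_i))^2
mse = np.mean((ys - g(xs)) ** 2)

assert np.isclose(expectation, mse)  # the two agree
```

The weighted sum under the uniform distribution and the plain average are the same computation, which is the whole content of the derivation above.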
In the machine learning case, we often set the training loss to be $$\mathcal{L}_S(g) = \frac{1}{2}\text{MSE}_S(g),$$ so that when taking the derivative of $\mathcal{L}$ with respect to the parameters of $g$, the $1/2$ cancels the $2$ that comes from differentiating the square in the objective. It's merely a notational convenience. See also this post.
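You can see the cancellation numerically. For a toy linear model $g_\theta(x)=\theta x$ (my own illustrative choice), the analytic derivative of $\frac{1}{2m}\sum_i(\theta x_i - y_i)^2$ is $\frac{1}{m}\sum_i(\theta x_i - y_i)x_i$, with no stray factor of $2$; a finite-difference check confirms it:

```python
import numpy as np

# Illustrative data for a linear model g(x) = theta * x
xs = np.array([0.5, 1.0, 2.0])
ys = np.array([1.5, 2.2, 4.1])

def half_mse(theta):
    # L(theta) = (1/2) * MSE_S = 1/(2m) * sum_i (theta*x_i - y_i)^2
    return np.mean((theta * xs - ys) ** 2) / 2

def analytic_grad(theta):
    # dL/dtheta = (1/m) * sum_i (theta*x_i - y_i) * x_i  -- the 1/2 cancelled the 2
    return np.mean((theta * xs - ys) * xs)

theta, eps = 1.7, 1e-6
numeric = (half_mse(theta + eps) - half_mse(theta - eps)) / (2 * eps)
assert np.isclose(numeric, analytic_grad(theta))
```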
In other words, the two expressions are not equal; they differ by a factor of $1/2$. But the minimizing parameters of $g$ are unchanged by this scaling.