I've got to solve the following problem:
Bob fitted a linear regression and figured out that his predicted value is 0.5 more than the actual one for 400 points of the test data set and 0.7 less than the actual one for 100 points of the test data set. Thus, there are 500 observations in total. Calculate Bob's MSE.
At the same time, Anna claims that Bob's model is wrong. She thinks that the quality of the model can be improved by shifting all the predicted values by some constant. Calculate Anna's MSE, assuming she found the lowest possible MSE under her constraint.
So I decided to calculate MSE using this formula: $$ \text{MSE} = \frac{1}{N} \sum (y_i-\hat{y}_i)^2 $$
As a result, in Bob's case, I got \begin{align} & \frac{1}{400} \sum_1^{400} (y_i - (y_i + 0.5))^2 + \frac{1}{100} \sum_1^{100} (y_i - (y_i - 0.7))^2 \\[8pt] = {} & \frac{1}{400} \cdot 400 \cdot (-0.5)^2 + \frac{1}{100} \cdot 100 \cdot (0.7)^2 = 0.74 \end{align}
In Anna's case, I thought it should be $$ \frac{1}{400} \sum_1^{400} (y_i - (y_i + 0.5 + a))^2 + \frac{1}{100} \sum_1^{100} (y_i - (y_i - 0.7 + a))^2 .$$ I took the derivative and got $ a = 0.1 $, which makes $ \text{MSE} = 0.72 $.
However, I was told that the solution is incorrect. I can't seem to figure out where I went wrong. I would really appreciate it if someone could help me with that!
You shouldn't be averaging each group separately. The MSE formula sums all the squared errors and then divides once by the total sample size, $N = 500$.
For Bob: $$ \frac{1}{500} \left( \sum_1^{400} (y_i - (y_i + 0.5))^2 + \sum_1^{100} (y_i - (y_i - 0.7))^2 \right) = \frac{1}{500} \left( 400(-0.5)^2 + 100(0.7)^2 \right) = 0.298$$
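As a quick sanity check, the calculation is easy to reproduce numerically (the variable names below are just for illustration; residuals are taken as $y - \hat{y}$):

```python
# Bob's residuals: y - y_hat is -0.5 for 400 points and +0.7 for 100 points.
residuals = [-0.5] * 400 + [0.7] * 100

# MSE: sum ALL squared errors once, divide by the total sample size N = 500.
mse_bob = sum(r ** 2 for r in residuals) / len(residuals)
print(mse_bob)  # ~0.298
```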
For Anna: Using the same approach as you, but with the correct $\frac{1}{500}$ scaling, $$ \text{MSE}(a) = \frac{1}{500} \left( 400(0.5+a)^2 + 100(0.7-a)^2 \right). $$ First verify that $\frac{d^2}{da^2}\text{MSE}(a) = 2 > 0$, so finding the critical point is indeed the way to find the minimum. Setting $\frac{d}{da} \text{MSE}(a) = 2(0.26 + a) = 0$ gives $a = -0.26$, which is minus the mean residual $\frac{1}{500}(400 \cdot (-0.5) + 100 \cdot 0.7) = -0.26$. Then Anna's MSE becomes $$ \frac{1}{500}\left(400(0.24)^2 + 100(0.96)^2\right) = 0.2304, $$ which is exactly the variance of the residuals: $0.298 - 0.26^2 = 0.2304$.
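The same check works for the shifted model. A sketch (again with illustrative variable names): shifting the predictions by $a$ turns each residual $r$ into $r - a$, and the mean of $(r - a)^2$ is minimized at $a = \bar{r}$, leaving the residual variance as the MSE:

```python
# Anna's constant shift: predictions become y_hat + a, residuals become r - a.
residuals = [-0.5] * 400 + [0.7] * 100   # r = y - y_hat
n = len(residuals)

# mean((r - a)^2) is minimized at a = mean(r); here mean(r) = -0.26.
best_a = sum(residuals) / n

# The minimized MSE is the variance of the residuals.
mse_anna = sum((r - best_a) ** 2 for r in residuals) / n
print(best_a, mse_anna)  # ~-0.26, ~0.2304
```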