Background: Masters in CS/Math.
I'm brushing up on statistics, and I see mean squared error everywhere. As a student I took it for granted, but now that I've tried to find out why it's so prevalent, the reasons I'm given are: simplicity, emphasis on outliers, and mathematical properties like differentiability.
But MSE is not the only function with those properties, so why is it used so widely?
- Are there situations where it's provably the best function to use?
- Are there situations where there are other functions that are provably better to use?
- Say I am designing my own heuristic, and I have an error I want to minimize. How do I know that squaring the error is the best way forward?
It has some other nice properties. For example, if you have a random variable $X$ and you want to find the constant $\hat{X}$ that minimizes $$ \mathbb{E} [(X - \hat{X})^2], $$ the answer is $\hat{X} = \mathbb{E}[X]$, which is quite nice.
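You can check this numerically. The sketch below (my own illustration, with an arbitrary skewed distribution) grid-searches over constants $c$ and confirms that mean squared error is minimized near the sample mean, while mean absolute error is minimized near the median instead, which is one concrete way different losses lead to different "best" estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=50_000)  # a skewed distribution, so mean != median

cs = np.linspace(0.0, 10.0, 2001)  # candidate constants c

# Grid minimizer of E[(X - c)^2]: should land next to the sample mean.
mse = [np.mean((x - c) ** 2) for c in cs]
best_sq = cs[np.argmin(mse)]

# Grid minimizer of E[|X - c|]: lands next to the sample median instead.
mae = [np.mean(np.abs(x - c)) for c in cs]
best_abs = cs[np.argmin(mae)]

print(best_sq, x.mean())      # close to each other
print(best_abs, np.median(x)) # close to each other, and different from the mean
```

Because the exponential distribution is skewed, the two minimizers are visibly different, which makes the contrast between the two losses easy to see.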
There are also very useful results relating MSE and variance. For an unbiased estimator, MSE equals variance, so the minimum-variance unbiased estimator (MVUE), if it exists, coincides with the MSE minimizer among the unbiased estimators. This allows us to reframe problems of minimizing variance as problems of minimizing MSE loss, which can be attacked with standard optimization techniques.
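A quick Monte Carlo sketch of the MSE-equals-variance point (my own toy setup, not from the answer): both the sample mean and a single observation are unbiased estimators of a normal mean, so each one's MSE matches its variance, and the lower-variance estimator is exactly the lower-MSE one:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 2.0, 25, 200_000

# reps independent samples of size n from N(mu, sigma^2)
samples = rng.normal(mu, sigma, size=(reps, n))

est_mean = samples.mean(axis=1)  # unbiased, variance sigma^2 / n
est_first = samples[:, 0]        # also unbiased, but variance sigma^2

# Monte Carlo MSE of each estimator
mse_mean = np.mean((est_mean - mu) ** 2)
mse_first = np.mean((est_first - mu) ** 2)

print(mse_mean, sigma**2 / n)  # MSE matches the estimator's variance
print(mse_first, sigma**2)     # same here, just with a larger variance
```

Comparing unbiased estimators by MSE is therefore the same as comparing them by variance, which is the reframing the paragraph above describes.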