We learned in class to use a linear model to predict a real-valued target $y$. We made the assumption that $$ y = w^T x + w_0 + \epsilon $$ where $x$ is the input vector, $w$ is the vector of linear model parameters and $\epsilon$ is a Gaussian-distributed error. The professor showed us that since $y|x$ is a Gaussian random variable, maximizing the likelihood (with respect to $w$) is equivalent to minimizing the MSE, namely minimizing
$$ \frac{1}{N} \sum_{i=1}^N (y_i - (w^Tx_i + w_0 ))^2 $$ for $w$. This is all good.
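(To spell out that step, as I understood it: with $\epsilon \sim \mathcal{N}(0,\sigma^2)$, the conditional density of $y_i$ given $x_i$ is Gaussian with mean $w^T x_i + w_0$, so the log-likelihood of the data is

$$ \log L(w, w_0) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^N \big(y_i - (w^T x_i + w_0)\big)^2, $$

and maximizing it over $(w, w_0)$ is the same as minimizing the sum of squares.)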
But now he said that minimizing the MSE expression above is also valid for a binary classification problem where $y \in \{-1,1\}$. What is the statistical justification for this? It is clear that $y = w^Tx + w_0 + \epsilon$ no longer holds, and $y|x$ is not a Gaussian random variable.
The validity comes from the close relationship between linear regression and linear discriminant analysis (LDA) in the two-class case. This result is detailed in ESL Section 4.3 and Exercise 4.2, whose solution is available online. To sum up: when there are only two classes and both classes contain the same number of observations, least-squares classification of $y\in\{-1,1\}$ by minimizing the MSE criterion produces the same decision rule as LDA.
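Here is a minimal numerical sketch of that equivalence (my own illustration, not from ESL): for two Gaussian classes with a shared covariance and equal sample sizes, the least-squares coefficient vector on $\pm 1$ targets is proportional to the LDA direction $\hat\Sigma^{-1}(\hat\mu_1 - \hat\mu_0)$, where $\hat\Sigma$ is the pooled within-class covariance, so both give the same separating hyperplane direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes with a shared covariance and equal sample sizes
n = 500
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X0 = rng.multivariate_normal(mu0, cov, size=n)
X1 = rng.multivariate_normal(mu1, cov, size=n)
X = np.vstack([X0, X1])
y = np.concatenate([-np.ones(n), np.ones(n)])

# Least-squares fit of y on [1, x], i.e. minimizing the MSE criterion
A = np.column_stack([np.ones(2 * n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls = coef[1:]  # drop the intercept, keep the direction

# LDA direction: pooled within-class covariance, then Sigma^{-1}(mu1 - mu0)
Sigma = 0.5 * (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
w_lda = np.linalg.solve(Sigma, X1.mean(axis=0) - X0.mean(axis=0))

# The two vectors should be proportional, i.e. define the same boundary
ratio = w_ls / w_lda
print(ratio)  # the components agree up to floating-point error
```

Note the direction is proportional regardless of class sizes (that is the content of ESL Exercise 4.2); it is the intercept, and hence the full decision rule, that matches LDA only in the balanced case.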