Cost Function Confusion for Ordinary Least Squares estimation in Linear Regression


Just wanted to check that my current understanding of linear regression is correct and address some confusion I have with the cost function used in OLS estimation. My current understanding is this:

Given a data set:

$\{y_i, x_{i1}, x_{i2}, ... ,x_{ip}\}_{i=1}^{N}$

The Multiple Linear Regression Model that most accurately describes the relationship between dependent variable $y$ and independent variables $x_1, x_2, ... , x_p$ is the linear function:

$y = \beta_0x_0 + \beta_1x_1 + ... + \beta_px_p = \sum_{j=0}^p\beta_jx_j$ (with $x_0 = 1$ by convention, so that $\beta_0$ is the intercept)

such that, writing $y_i = \beta_0x_{i0} + \beta_1x_{i1} + ... + \beta_px_{ip} + \epsilon_i$ for each observation $i$,

the sum of squared residuals $\sum_{i=1}^{N}\epsilon_i^2 = \sum_{i=1}^{N}(y_i - (\beta_0x_{i0} + ... + \beta_px_{ip}))^2$ is minimized.
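
For concreteness, here is a minimal numerical sketch of this setup (the synthetic data, coefficient values, and numpy-based approach are just illustrative assumptions on my part):

```python
import numpy as np

# Synthetic data set: N observations, p = 2 predictors (values chosen arbitrarily).
rng = np.random.default_rng(0)
N, p = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # x_{i0} = 1 carries the intercept
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + rng.normal(scale=0.5, size=N)            # y_i = sum_j beta_j x_{ij} + eps_i

# OLS: choose beta to minimize the sum of squared residuals ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

ssr = np.sum((y - X @ beta_hat) ** 2)  # the minimized sum of squared residuals
print(beta_hat)                        # should be close to beta_true
print(ssr)
```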

This all comes from the following sources:

https://en.wikipedia.org/wiki/Ordinary_least_squares#Matrix/vector_formulation

https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression

https://en.wikipedia.org/wiki/Linear_least_squares

My confusion comes from other sources I have looked at:

https://stackoverflow.com/questions/34148912/feature-scaling-normalization-in-multiple-regression-analysis-with-normal-equa

https://machinelearningmedium.com/2017/08/11/cost-function-of-linear-regression/

which say that the Multiple Linear Regression Model that most accurately describes the relationship between $y$ and $x_1, x_2, ..., x_p$ is the same linear function I originally defined above, but that the coefficients $\beta_0, \beta_1, ..., \beta_p$ for it are those which minimize the cost function:

$\frac{1}{2N}\sum_{i=1}^{N}(y_i - (\beta_0x_{i0}+ \beta_1x_{i1} + ... + \beta_px_{ip}))^2$

So my question is this: Is my current understanding correct, and which of these cost functions should I be using?


On BEST ANSWER

As already mentioned, all of these objective functions are equivalent up to multiplication by a positive constant, so they have the same minimizer. You can also view the sum of squares from a slightly more algebraic angle: $$ \sum_{i=1}^N (y_i - \beta_0 - \beta_1x_i)^2 = \|\mathbf{y} - X\beta\|^2, $$ (written here for a single predictor; the matrix form on the right covers the general case), so minimizing it means minimizing a squared Euclidean norm, i.e. finding an orthogonal projection of $\mathbf{y}$ onto the column space of $X$. On the other hand,
$$ \frac{1}{N}\sum_{i=1}^N (y_i - \beta_0 - \beta_1x_i)^2 = \frac{1}{N}\sum_{i=1}^N (y_i - g(x_i))^2, $$ where $g(x_i) = \beta_0 + \beta_1x_i$ is the model's estimate of the conditional mean of $y_i$ given $x_i$. This approach may be referred to as minimizing the empirical mean squared error (MSE), which is the more statistical point of view.
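
To make the projection picture concrete, here is a small sketch (the synthetic data and variable names are assumptions for illustration only) that solves the normal equations $X^\top X\beta = X^\top\mathbf{y}$ and checks that the residual is orthogonal to the columns of $X$:

```python
import numpy as np

# Illustrative synthetic data (X, y, N and the coefficients are made up for this sketch).
rng = np.random.default_rng(1)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=N)

# Normal equations X'X beta = X'y: the solution makes X beta the orthogonal
# projection of y onto the column space of X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residual = y - X @ beta_hat

# The residual is orthogonal to every column of X (up to floating-point error),
# which is exactly the geometric content of minimizing ||y - X beta||^2.
print(X.T @ residual)            # ~ zero vector
print(np.sum(residual ** 2))     # the minimized sum of squares
```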

On

Either cost function is fine. Since they are just constant multiples of each other, minimising one is equivalent to minimising the other, and both will give the same fitted $\beta$ coefficients. The extra factor of $\frac{1}{2}$ in the $\frac{1}{2N}$ version is there purely for convenience: it cancels the $2$ produced when you differentiate the square, so the gradient expressions (e.g. in gradient descent) come out cleaner.
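
As a quick numerical sanity check, here is a sketch that minimises both objectives on the same synthetic data and confirms the fitted coefficients agree (the data, the scipy-based optimiser, and all names here are assumptions made purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative synthetic data (all values chosen arbitrarily).
rng = np.random.default_rng(2)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.3, size=N)

def ssr(b):
    """Plain sum of squared residuals."""
    return np.sum((y - X @ b) ** 2)

def half_mse(b):
    """The 1/(2N)-scaled cost function from the other sources."""
    return ssr(b) / (2 * N)

b0 = np.zeros(X.shape[1])
beta_from_ssr = minimize(ssr, b0).x
beta_from_half_mse = minimize(half_mse, b0).x

# The objectives differ only by the positive constant 1/(2N),
# so the minimisers coincide up to solver tolerance.
print(np.allclose(beta_from_ssr, beta_from_half_mse, atol=1e-3))
```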