Just wanted to check that my current understanding of linear regression is correct and address some confusion I have with the cost function used in OLS estimation. My current understanding is this:
Given a data set:
$\{y_i, x_{i1}, x_{i2}, ... ,x_{ip}\}_{i=1}^{N}$
The Multiple Linear Regression Model that most accurately describes the relationship between dependent variable $y$ and independent variables $x_1, x_2, ... , x_p$ is the linear function:
$y = \beta_0x_0 + \beta_1x_1 + ... + \beta_px_p = \sum_{j=0}^p\beta_jx_j$ (with $x_0 = 1$, so $\beta_0$ is the intercept)
such that, writing each observation as $y_i = \beta_0x_{i0} + \beta_1x_{i1} + ... + \beta_px_{ip} + \epsilon_i$,
the sum of squared residuals $\sum_{i=1}^{N}\epsilon_i^2 = \sum_{i=1}^{N}(y_i - (\beta_0x_{i0} + ... + \beta_px_{ip}))^2$ is minimized.
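To make this concrete for myself, here is a minimal numpy sketch of that understanding, on synthetic data with variable names of my own choosing (and `np.linalg.lstsq` used purely as one convenient way of computing the least-squares minimizer):

```python
import numpy as np

# Purely illustrative synthetic data: N observations, p predictors
rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # x_{i0} = 1 gives the intercept
beta_true = np.array([2.0, 1.0, -0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=N)            # y_i = sum_j beta_j * x_{ij} + eps_i

# OLS: choose the betas that minimize the sum of squared residuals.
# np.linalg.lstsq computes exactly this least-squares minimizer.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
ssr = np.sum(residuals ** 2)   # the quantity being minimized
print(beta_hat, ssr)
```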
This all comes from the following sources:
https://en.wikipedia.org/wiki/Ordinary_least_squares#Matrix/vector_formulation
https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression
https://en.wikipedia.org/wiki/Linear_least_squares
My confusion comes from other sources I have looked at:
https://machinelearningmedium.com/2017/08/11/cost-function-of-linear-regression/
which say that the Multiple Linear Regression Model that most accurately describes the relationship between $y$ and $x_1, x_2, ..., x_p$ is the same linear function I defined above, but with the coefficients $\beta_0, \beta_1, ..., \beta_p$ chosen to minimize the cost function:
$\frac{1}{2N}\sum_{i=1}^{N}(y_i - (\beta_0x_{i0}+ \beta_1x_{i1} + ... + \beta_px_{ip}))^2$
So my question is this: is my current understanding correct, and which of these cost functions should I be using?
As already mentioned, all of these functions are equivalent to each other up to multiplication by a constant. Moreover, you can view the sum of squares as the slightly more algebraic approach $$ \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1x_i)^2 = \|\mathrm{y} - X\beta\|^2, $$ (written here for a single predictor), which amounts to minimizing the Euclidean norm itself, that is, finding an orthogonal projection of $\mathrm{y}$ onto the column space of $X$. Meanwhile,
$$ \frac{1}{N}\sum_{i=1}^{N} (y_i - \beta_0 - \beta_1x_i)^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - g(x_i))^2, $$ where $g(x_i) = \beta_0 + \beta_1x_i$ models the conditional mean of $y_i$ given $x_i$. This approach may be referred to as minimizing the empirical mean squared error (MSE), which is a more statistical point of view.
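To see the equivalence numerically, here is a small sketch (synthetic data and names of my own choosing, using scipy's general-purpose optimizer purely for illustration) showing that the SSR, the MSE, and the $\frac{1}{2N}$-scaled cost all lead to the same coefficients:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative synthetic data with an intercept column
rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=N)

def ssr(beta):
    """Sum of squared residuals."""
    r = y - X @ beta
    return r @ r

# The three cost functions differ only by a positive constant factor,
# so they are minimized by the same beta.
objectives = {
    "SSR":     lambda b: ssr(b),
    "MSE":     lambda b: ssr(b) / N,
    "MSE / 2": lambda b: ssr(b) / (2 * N),
}

for name, f in objectives.items():
    beta_hat = minimize(f, x0=np.zeros(X.shape[1])).x
    print(f"{name:8s} -> {np.round(beta_hat, 4)}")
```

The scaling only changes the value of the objective at the minimum, not where the minimum is, which is why any of these cost functions gives you the same fitted regression.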