In a linear regression minimization problem, why does replacing the expectation by the empirical distributions lead to OLS?


The linear regression problem can be stated in general by first considering random variables $X \in \mathbb{R}^p$ (viewed as a row vector, so that $X\beta$ is a scalar) and $Y \in \mathbb{R}$. Then, the linear regression problem is that of finding:

$$ \arg\min_{\beta \in \mathbb{R}^p} \mathbb{E}\left[(Y-X\beta)^2\right] $$

which has a closed form solution given by:

$$ \widehat{\beta} = \mathbb{E}[X^TX]^{-1}\mathbb{E}[X^TY] $$
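(For completeness, this closed form follows from the first-order condition, a standard derivation not spelled out in the question: differentiating the objective in $\beta$ and setting the gradient to zero gives)

$$ \nabla_\beta \, \mathbb{E}\left[(Y-X\beta)^2\right] = -2\,\mathbb{E}\left[X^T(Y-X\beta)\right] = 0 \;\Longrightarrow\; \mathbb{E}[X^TX]\,\beta = \mathbb{E}[X^TY], $$

which yields $\widehat{\beta}$ above whenever $\mathbb{E}[X^TX]$ is invertible.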

I read in a talk that if we replace the expectation $\mathbb{E}$ with respect to the distribution of $(X,Y)$ by the joint empirical distribution $\frac{1}{n}\sum_{i=1}^{n}\delta_{(X_i,Y_i)}$ corresponding to our dataset, then we retrieve the Ordinary Least Squares regression problem.
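The substitution can be checked numerically: under the empirical distribution, $\mathbb{E}[X^TX]$ becomes $\frac{1}{n}\sum_i X_i^T X_i = \frac{1}{n}X^TX$ (with $X$ now the $n \times p$ data matrix) and $\mathbb{E}[X^TY]$ becomes $\frac{1}{n}X^Ty$, and the $\frac{1}{n}$ factors cancel, leaving the OLS estimator. A minimal sketch with synthetic data (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))          # rows are samples of the random row vector X
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Empirical expectations: E[X^T X] -> (1/n) X^T X,  E[X^T Y] -> (1/n) X^T y
emp_XtX = (X.T @ X) / n
emp_Xty = (X.T @ y) / n
beta_emp = np.linalg.solve(emp_XtX, emp_Xty)

# The 1/n factors cancel, so this coincides with the OLS solution:
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_emp, beta_ols))   # True
```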

Can someone tell me why this is the case?


Best answer:

The empirical MSE is $$ \frac{1}{n} \sum_{i=1}^n \big( y_i - \widehat{\mathbb{E}[y_i]} \big)^2 = \frac{1}{n} \sum_{i=1}^n ( y_i - \hat{y}_i )^2, $$ where $\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^p \hat{\beta}_j x_{ij}$. The problem is thus to find the $\hat{\beta}$'s that minimize the empirical MSE. Now, note that you can multiply the objective function by $n$; this is a strictly increasing transformation, so it does not change the minimizer. Hence you get the equivalent problem $$ \arg \min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Big)^2, $$ which is exactly the OLS problem.
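The scaling argument can be verified numerically: the normal equations for the sum of squares and for the empirical MSE differ only by a factor of $n$ on both sides, so they have the same solution. A minimal sketch with an intercept column, using synthetic data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
x = rng.normal(size=(n, p))
y = 3.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=0.2, size=n)
D = np.column_stack([np.ones(n), x])   # row i is (1, x_i1, ..., x_ip)

# Minimizer of the sum of squares sum_i (y_i - beta_0 - sum_j beta_j x_ij)^2 ...
beta_sse = np.linalg.solve(D.T @ D, D.T @ y)
# ... and of the empirical MSE (the same sum times 1/n): the normal equations
# are identical because the 1/n factor appears on both sides and cancels.
beta_mse = np.linalg.solve((D.T @ D) / n, (D.T @ y) / n)

print(np.allclose(beta_sse, beta_mse))   # True
```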