The linear regression problem can be stated in general by considering random variables $X \in \mathbb{R}^p$ (a row vector) and $Y \in \mathbb{R}$. The linear regression problem is then to find:
$$ \arg\min_{\beta \in \mathbb{R}^p} \mathbb{E}\left[(Y-X\beta)^2\right] $$
which has a closed form solution given by:
$$ \widehat{\beta} = \mathbb{E}[X^TX]^{-1}\mathbb{E}[X^TY] $$
I read in a talk that if we replace the expectation $\mathbb{E}$, taken with respect to the distribution of $(X,Y)$, by the empirical (joint) distribution $\frac{1}{n}\sum_{i=1}^{n}\delta_{(X_i,Y_i)}$ corresponding to our dataset, then we recover the Ordinary Least Squares regression problem.
Can someone tell me why this is the case?
Replacing $\mathbb{E}$ by the empirical distribution turns the population objective $\mathbb{E}[(Y-X\beta)^2]$ into the sample average $\frac{1}{n}\sum_{i=1}^n (y_i - x_i\beta)^2$, i.e. the empirical MSE: $$ \frac{1}{n} \sum_{i=1}^n \big( y_i - \widehat{\mathbb{E}[y_i]} \big)^2 = \frac{1}{n} \sum_{i=1}^n ( y_i - \hat{y}_i )^2, $$ where $\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^p\hat{\beta}_j x_{ij}$. The problem is therefore to find the $\hat{\beta}$s that minimize the empirical MSE. Now, note that you can multiply the objective by $n$: this is a monotone transformation, so it does not change the minimizer. Hence you get the problem $$ \arg \min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n \Big(y_i-\beta_0 - \sum_{j=1}^p\beta_j x_{ij}\Big)^2, $$ which is exactly the OLS problem. (Here an intercept $\beta_0$ is written out explicitly, so $\beta \in \mathbb{R}^{p+1}$; absorbing it into $X$ via a constant column recovers the $\mathbb{R}^p$ formulation in the question.)
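A minimal numerical sketch of this equivalence, using numpy on synthetic data (the variable names and the simulated model are illustrative assumptions): forming the plug-in estimator by replacing $\mathbb{E}[X^TX]$ and $\mathbb{E}[X^TY]$ with their sample averages gives the same coefficients as solving the OLS least-squares problem directly, since the $\frac{1}{n}$ factors cancel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Design matrix with a constant column, so the intercept is absorbed into X.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Plug-in estimator: replace E[X^T X] and E[X^T Y] by empirical averages.
Exx = X.T @ X / n   # empirical analogue of E[X^T X]
Exy = X.T @ y / n   # empirical analogue of E[X^T Y]
beta_plugin = np.linalg.solve(Exx, Exy)

# Direct OLS fit: minimize the sum of squared residuals.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# The two coincide: the 1/n factors cancel in the normal equations.
print(np.allclose(beta_plugin, beta_ols))
```

The `np.linalg.solve` call implements $\widehat{\beta} = \mathbb{E}_n[X^TX]^{-1}\mathbb{E}_n[X^TY]$ with the empirical expectations, which is exactly the closed-form solution from the question evaluated under the empirical distribution.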