Imagine we have a sample of points $(x_1, y_1), \ldots, (x_n, y_n)$ and a model $y = \alpha + \beta x$. We know the sample points do not necessarily satisfy this equation exactly, so we include an error term $\epsilon_i$ for each point. Now we have the better model $y_i = \alpha + \beta x_i + \epsilon_i$. We want to minimize the sum of the squared error terms, so we solve the following minimization problem:
$$ \min_{\alpha,\beta} \sum_{i=1}^{n} \epsilon_i^2 =\min_{\alpha,\beta} \sum_{i=1}^{n} (y_i - \alpha -\beta x_i)^2 $$
We call $\hat{\alpha}$ and $\hat{\beta}$ the solutions of the above optimization problem, and with them we have the best-fitting model in the sense of the ordinary least squares method.
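For concreteness, here is a small sketch (with made-up data) of the standard closed-form solutions, $\hat{\beta} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$:

```python
import numpy as np

# Made-up sample data, just for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form OLS estimates:
#   beta_hat  = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
#   alpha_hat = y_bar - beta_hat * x_bar
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

# Sanity check: the fitted line passes through (x_bar, y_bar)
assert np.isclose(alpha_hat + beta_hat * x_bar, y_bar)

print(alpha_hat, beta_hat)
```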
I could not solve this minimization problem myself, but I will trust the derivation in the following Wikipedia article.
It is easy to see that the fitted line passes through the center-of-mass point $(\bar{x}, \bar{y})$: the first-order condition with respect to $\alpha$ gives $\bar{y} = \hat{\alpha} + \hat{\beta}\bar{x}$.
Now it's time for another of my stupid questions: if we know that in the true model $Y = 0$ when $X = 0$ (i.e., the intercept term is zero), isn't it very easy to create the model by just connecting the origin $(0,0)$ with $(\bar{x}, \bar{y})$?
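For anyone who wants to test the idea numerically, here is a quick sketch (with made-up data from a zero-intercept model) comparing the slope of the line through the origin and $(\bar{x}, \bar{y})$, which is $\bar{y}/\bar{x}$, against the least-squares slope when the intercept is constrained to zero, $\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2$:

```python
import numpy as np

# Made-up data from a zero-intercept model y = 2x + noise
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.5, size=x.size)

# Slope of the line connecting the origin to the mean point
slope_through_means = y.mean() / x.mean()

# Least-squares slope with the intercept fixed at zero:
#   minimize sum (y_i - beta x_i)^2  =>  beta_hat = sum(x_i y_i) / sum(x_i^2)
slope_no_intercept = np.sum(x * y) / np.sum(x ** 2)

print(slope_through_means, slope_no_intercept)
```

Both estimates should land near the true slope 2 here, but in general the two formulas are not the same, so the two lines can differ.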