I am having trouble intuitively understanding the correctness of the formula to compute the coefficient for the regression line in a linear regression.
I know the formula is
$$\frac{\sum_{i=1}^N (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^N(x_i - \bar{x})^2}$$
I have at some point gone through the proof and mechanically understood it. But intuitively I still don't see why the above formula computes the correct coefficient. In fact, intuitively I would have said the coefficient for the regression line should be the average ratio of $y_i$ and $x_i$, $(x_i, y_i)$ being the data points.
I wrote a small Jupyter-Notebook to illustrate this. I found that my naive approach is not completely wrong and in fact converges towards the correct value with more data, as long as the data scatter within a fixed interval.
So... what are the critical points that my naive approach gets wrong, and what is the intuitive explanation for why the correct formula works better?
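For concreteness, a minimal sketch of the comparison (hypothetical data, not the notebook itself; the slope 2 and the interval $[1, 5]$ are just assumptions for illustration) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 2x + noise, with x kept in a fixed interval away from zero
x = rng.uniform(1.0, 5.0, size=1000)
y = 2.0 * x + rng.normal(0.0, 0.5, size=x.size)

# Naive approach: average the per-point ratios y_i / x_i
naive = np.mean(y / x)

# Standard formula for the regression coefficient
ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(naive, ols)  # both land near 2 when x stays away from the origin
```

With $x_i$ bounded away from zero, both estimates end up close to the true slope, which matches the observation above.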
Continuing with your setup, let's assume for simplicity that $\bar x=0$ and $\bar y=0$, so the standard solution is
$$ \frac{\sum_{i=1}^N x_iy_i}{\sum_{i=1}^Nx_i^2}\;. $$
We can write this as
$$ \frac{\sum_{i=1}^N x_i^2\frac{y_i}{x_i}}{\sum_{i=1}^Nx_i^2}\;. $$
So it's actually a weighted average of the ratios $\frac{y_i}{x_i}$, with weights $x_i^2$, not as different from your proposed solution as you perhaps thought it was.
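This identity is easy to check numerically. The sketch below (with made-up centered data) computes the standard formula and the weighted average of the ratios side by side:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=0.3, size=200)
# center so that x-bar = 0 and y-bar = 0, matching the simplifying assumption
x -= x.mean()
y -= y.mean()

# Standard solution: sum(x_i * y_i) / sum(x_i^2)
ols = np.sum(x * y) / np.sum(x ** 2)

# Same thing rewritten: weighted average of the ratios y_i / x_i with weights x_i^2
w = x ** 2
weighted_avg = np.sum(w * (y / x)) / np.sum(w)

print(np.isclose(ols, weighted_avg))  # the two expressions agree
```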
The question remains why the weights $x_i^2$ in the standard solution are better than the equal weights that you propose to use. This is because, under the standard assumption that the $y_i$ all have the same additive error, the errors of values near the origin get amplified when you take the ratio $\frac{y_i}{x_i}$ with small values of $x_i$. It's intuitively clear that when you shift a data point near the origin by a certain vertical error, that changes the ratio more than if you do it with a data point further away; so the ratios for small $x_i$ are more uncertain and should carry less weight.
In fact, this can be stated more quantitatively. If you perform a linear regression with different error bars for the different data points, you find that each data point should be weighted with the inverse of its variance, that is, the inverse square of its standard deviation. Forming the ratio $\frac{y_i}{x_i}$ amplifies the error in $y_i$ by a factor $\frac1{x_i}$, so if we assume that the errors in the $y_i$ are all the same, the errors in the ratios are proportional to $\frac1{x_i}$, so the weights should be proportional to the inverse squares of those errors, that is, to $x_i^2$. So the standard formula is in fact just your formula, properly weighted.
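As a rough illustration of this effect (the true slope 2, noise level, and interval including points near the origin are all assumptions chosen for the sketch), a small simulation shows that the equally weighted average of ratios scatters much more than the properly weighted standard estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
true_slope = 2.0
naive_est, ols_est = [], []

for _ in range(500):
    # some x_i lie close to the origin, where the ratio y_i / x_i is unstable
    x = rng.uniform(0.1, 3.0, size=50)
    y = true_slope * x + rng.normal(scale=0.3, size=x.size)
    naive_est.append(np.mean(y / x))          # equal weights on all ratios
    ols_est.append(np.sum(x * y) / np.sum(x ** 2))  # weights x_i^2

# The spread of the naive estimator is visibly larger
print(np.std(naive_est), np.std(ols_est))
```

Both estimators are centered near the true slope, but the naive one pays a large variance penalty for trusting the near-origin ratios as much as the rest.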