On Wikipedia, in reference to generalized linear models, I read:
Ordinary linear regression predicts the expected value of a given unknown quantity (the response variable, a random variable) as a linear combination of a set of observed values (predictors). This implies that a constant change in a predictor leads to a constant change in the response variable (i.e. a linear-response model). This is appropriate when the response variable has a normal distribution (intuitively, when a response variable can vary essentially indefinitely in either direction with no fixed "zero value", or more generally for any quantity that only varies by a relatively small amount, e.g. human heights).
I think I understand intuitively that if the errors, after fitting with ordinary least squares, are normally distributed, then OLS was likely a good model. (It got the expectation right, and the errors are normally distributed about the mean.)
But why does the dependent variable (response variable) itself need to be normally distributed? What does it matter if the variable only varies by a small amount? I think they mean the variance is low?
It doesn't need to be normally distributed. The paragraph merely says that if the errors are normally distributed, then the OLS estimate coincides with the maximum likelihood estimate: minimizing the squared-error loss is the same as maximizing the Gaussian log-likelihood.
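To spell out that connection (a standard derivation, nothing specific to the quoted paragraph): if $y_i=x_{i,1}\beta_1+\dots+x_{i,p}\beta_p+\varepsilon_i$ with $\varepsilon_i\sim N(0,\sigma^2)$ i.i.d., the log-likelihood of the data is

$$\ell(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n\left(y_i-x_{i,1}\beta_1-\dots-x_{i,p}\beta_p\right)^2.$$

For any fixed $\sigma^2$, maximizing $\ell$ over $\beta$ means minimizing the sum of squares, which is exactly the OLS loss. So normal errors make OLS the maximum likelihood estimator, but OLS is a sensible least-squares approximation even without that assumption.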
Basically in OLS we are minimizing a quadratic loss function $L(\beta):=\sum(y_i-x_{i,1}\beta_1-\dots-x_{i,p}\beta_p)^2$. The $x_i$ don't need to be stochastic (for example, if you know $y$ is a complicated function of the $x$'s, and you know the exact values at known points $(x_i,y_i)$, but due to computational resource constraints you can only fit linear models, then it still makes sense to come up with the best approximation that minimizes the sum of squared errors at those known values).
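A quick sketch of that last point (the function $\exp(x)$ and the grid of points are just placeholders I chose for illustration): the data here are fully deterministic, yet least squares still gives a well-defined best linear approximation.

```python
import numpy as np

# Known (x, y) values of a complicated but deterministic function: y = exp(x).
# Nothing here is random -- no distributional assumption is being used.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(x)

# Design matrix with an intercept column; solve min_beta ||X beta - y||^2.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
print("best linear fit [intercept, slope]:", beta)
print("minimized sum of squared errors:", np.sum(residuals**2))
```

The solution $\hat\beta$ is simply the minimizer of the quadratic loss $L(\beta)$ above; whether that minimizer is also a maximum likelihood estimate depends on the (optional) error model you attach to it.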