Most illustrations indicate that the distance of interest in the sum of squares is the perpendicular distance between each data point and the hyperplane defined by the linear model. The issue arises when these hyperplanes do not pass through the origin. Typically, a dummy variable that is always set to 1 is introduced, and it is then concluded that, without loss of generality, we can restrict our attention to linear (i.e., not affine) hyperplanes. But I think this holds only when we regard our $n$-dimensional data as living in the augmented $(n+1)$-dimensional space (restricted to the hyperplane on which the dummy variable equals 1).
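As a small illustration of the dummy-variable step, here is a Python sketch (the helper names `lift`, `affine_value`, `linear_value` and the numbers are hypothetical). An affine form $w^\top x + b$ on $\mathbb{R}^n$ takes the same value as the linear form $\theta^\top z$ on $\mathbb{R}^{n+1}$ with $\theta = (b, w)$ and $z = (1, x)$, so the affine hyperplane is exactly the slice $x_0 = 1$ of a linear hyperplane through the origin.

```python
def lift(x):
    """Prepend the dummy coordinate x0 = 1 to an n-dimensional point."""
    return [1.0] + list(x)


def affine_value(w, b, x):
    """Affine form on R^n: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linear_value(theta, z):
    """Linear form on R^(n+1): theta . z, with no offset term."""
    return sum(ti * zi for ti, zi in zip(theta, z))


# Hypothetical affine hyperplane w . x + b = 0 in R^2.
w, b = [2.0, -1.0], 3.0
theta = [b] + w  # coefficients of the corresponding linear hyperplane in R^3: (b, w)

for x in ([0.5, 4.0], [1.0, 1.0]):
    # Both forms agree, so x lies on the affine hyperplane
    # exactly when lift(x) lies on the linear one.
    print(x, affine_value(w, b, x), linear_value(theta, lift(x)))
```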
I am reasonably confident that in these cases (i.e., when our $n$-dimensional data in the original $\mathbb{R}^n$ space requires an affine hyperplane rather than one through the origin), the distances of interest are the shortest perpendicular distances between our points and the LINEAR hyperplane defined in $\mathbb{R}^{n+1}$ using the dummy variable. I am also fairly confident that this is NOT the same as the shortest perpendicular distance between our data points and the AFFINE hyperplane in $\mathbb{R}^n$: the latter is a PROJECTION of the former onto the hyperplane in $\mathbb{R}^{n+1}$ on which our data lives (dummy variable = 1).
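Writing both distances out with the standard point-to-hyperplane distance formula makes the difference explicit. Here $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ are notation I am assuming for the normal vector and offset of the affine hyperplane $A$, and $L$ is the linear hyperplane in $\mathbb{R}^{n+1}$ obtained with the dummy coordinate $x_0$:

$$
A = \{x \in \mathbb{R}^n : w^\top x + b = 0\}, \qquad
L = \{(x_0, x) \in \mathbb{R}^{n+1} : b\,x_0 + w^\top x = 0\},
$$

$$
d_A(x) = \frac{|w^\top x + b|}{\lVert w \rVert}, \qquad
d_L\big((1, x)\big) = \frac{|w^\top x + b|}{\sqrt{b^2 + \lVert w \rVert^2}}.
$$

The two numerators are identical, so $d_L / d_A = \lVert w \rVert / \sqrt{b^2 + \lVert w \rVert^2} \le 1$, with equality exactly when $b = 0$, i.e., when the affine hyperplane already passes through the origin.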
Consider, for instance, the following example on MSE:
I think that the actual distances being minimized are those in the $(n+1)$-dimensional space (perpendicular distances between the data and the LINEAR plane in $\mathbb{R}^3$), but what the figure shows are the projections of those distances back into $\mathbb{R}^2$ (where the hyperplane is now affine). I am reasonably confident that this detail is not of great significance, since minimizing the perpendicular distances to the linear hyperplane in $\mathbb{R}^{n+1}$ yields the same result as minimizing the perpendicular distances to the affine hyperplane in $\mathbb{R}^n$. Still, I find it surprising that I have never seen it noted that these distances are in fact different (assuming I am correct).
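A minimal numerical sketch of the figure's $\mathbb{R}^2$/$\mathbb{R}^3$ setting, assuming a hypothetical line $y = 0.5x + 2$ and a hypothetical data point $(1, 4)$ (the helper `perpendicular_foot` is my own). It lifts the point to $(1, x, y)$, drops the perpendicular onto the linear plane in $\mathbb{R}^3$, and compares that distance with the perpendicular distance to the affine line in $\mathbb{R}^2$.

```python
import math


def perpendicular_foot(z, normal):
    """Foot of the perpendicular from z to the linear hyperplane {u : normal . u = 0}."""
    t = sum(n * zi for n, zi in zip(normal, z)) / sum(n * n for n in normal)
    return [zi - t * n for zi, n in zip(z, normal)]


# Hypothetical affine line y = m*x + c in R^2, written as m*x - y + c = 0.
m, c = 0.5, 2.0
# The same line as a linear plane in R^3 with lifted coordinates (x0, x, y):
# c*x0 + m*x - y = 0, which passes through the origin.
normal_3d = [c, m, -1.0]

px, py = 1.0, 4.0   # hypothetical data point
z = [1.0, px, py]   # lifted point with dummy coordinate x0 = 1

foot = perpendicular_foot(z, normal_3d)
d_linear_3d = math.dist(z, foot)                         # perpendicular distance in R^3
d_affine_2d = abs(m * px - py + c) / math.hypot(m, 1.0)  # perpendicular distance in R^2

print(f"x0 of foot:   {foot[0]:.3f}")     # 1.571, so the foot leaves the plane x0 = 1
print(f"R^3 distance: {d_linear_3d:.3f}")  # 0.655
print(f"R^2 distance: {d_affine_2d:.3f}")  # 1.342
```

The foot of the perpendicular in $\mathbb{R}^3$ has $x_0 \neq 1$, so it does not lie on the plane where the lifted data lives, and the two distances differ whenever the intercept $c$ is nonzero.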
Does anyone else agree with this perspective? I have not seen this detail addressed in any resources...