I was reading a tutorial written on Linear Regression by Avi Kak (https://engineering.purdue.edu/kak/Tutorials/RegressionTree.pdf). There is a part about geometric interpretation of linear regression on pg.19.
The optimum solution for $\vec{\beta}$ that minimizes the cost function $C(\vec{\beta})$ in Eq. (14) possesses the following geometrical interpretation: Focusing on the equation $\vec{y} = X\vec{\beta}$, the measured vector $\vec{y}$ on the left resides in a large $N$-dimensional space. On the other hand, as we vary $\vec{\beta}$ in our search for the best possible solution, the space spanned by the product $X\vec{\beta}$ will be a $(p+1)$-dimensional subspace (a hyperplane, really) in the $N$-dimensional space in which $\vec{y}$ resides. The question now is: which point in the hyperplane spanned by $X\vec{\beta}$ is the best approximation to the point $\vec{y}$, which is outside the hyperplane? For any selected value of $\vec{\beta}$, the "error" vector $\vec{y} - X\vec{\beta}$ will go from the tip of the vector $X\vec{\beta}$ to the tip of the $\vec{y}$ vector. Minimization of the cost function $C$ in Eq. (14) amounts to minimizing the norm of this difference vector.
I could not understand how to relate the $N$-dimensional space and the $(p+1)$-dimensional subspace. The $\beta$ vector defines a $(p+1)$-dimensional subspace, but I could not understand why the $N$-dimensional space contains this $(p+1)$-dimensional subspace. As I understand it, in the $(p+1)$-dimensional space each dimension corresponds to a feature, while in the $N$-dimensional space each dimension corresponds to a data point. I'm very confused about the idea. Are there any other resources that explain the idea in more detail? Or could anyone explain how these spaces relate?
The matrix $X$ is an $N\times (p+1)$ matrix. Its row space is at most of dimension $N$ and its column space is at most of dimension $p+1$. In your notes, it is assumed that $N>p+1$ and the columns of $X$ are linearly independent. So the rank of $X$ is $p+1$. A theorem in linear algebra says that the dimension of the row space and that of the column space are the same. So the row space is also of dimension $p+1$.
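A quick numerical sanity check of the rank statement, using NumPy (the random matrix below is just an illustration, not from the tutorial):

```python
import numpy as np

# A "tall" N x (p+1) design matrix with N > p+1.
N, p = 6, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((N, p + 1))

# Random Gaussian columns are linearly independent (with probability 1),
# so the rank is p+1. Row rank equals column rank, so the row space
# also has dimension p+1.
print(np.linalg.matrix_rank(X))  # 3
```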
By matrix multiplication, the vector $y=X\beta$ is an $N\times 1$ matrix. This is why they say $y$ is in a large $N$ dimensional space (since there are $N$ rows).
On the other hand, $y=X\beta$ implies that the vector $y$ is a linear combination of the column vectors of $X$, and hence in the column space of $X$, which is of dimension $p+1$.
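To make that concrete, here is a small sketch (with an arbitrary made-up matrix and coefficient vector) showing that $X\beta$ is literally the linear combination of the columns of $X$ with weights given by the entries of $\beta$:

```python
import numpy as np

# Arbitrary 4 x 3 matrix (N = 4, p + 1 = 3) and coefficient vector.
X = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [3., 0., 1.],
              [1., 1., 1.]])
beta = np.array([2., -1., 3.])

# X @ beta equals b_1*col_1 + b_2*col_2 + b_3*col_3, so it always lies
# in the column space of X -- a (p+1)-dimensional subspace of R^N.
combo = beta[0] * X[:, 0] + beta[1] * X[:, 1] + beta[2] * X[:, 2]
print(np.allclose(X @ beta, combo))  # True
```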
Here is an example.
Let $$ X= \begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1\\ 1 & 1 & 1 \end{pmatrix} $$ $N=4$ and $p=2$.
On the one hand, for any $\beta=(b_1,b_2,b_3)^T$, $y=X\beta$ is a vector in the large space $\mathbb{R}^4$, which is of dimension $N=4$. On the other hand, it is also a vector in a subspace (of $\mathbb{R}^4$) of dimension $p+1=3$, namely the space spanned by the column vectors of $X$.
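Tying this back to the geometric picture in the tutorial: minimizing $\|y - X\beta\|$ picks out the orthogonal projection of $y$ onto the column space of $X$, so the residual vector is perpendicular to every column. A sketch with the example matrix above (the vector $y$ is an arbitrary choice for illustration):

```python
import numpy as np

X = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.],
              [1., 1., 1.]])          # N = 4, p + 1 = 3
y = np.array([1., 2., 3., 4.])        # an arbitrary point of R^4 (assumption)

# The least-squares beta minimizes ||y - X beta||; the fitted vector
# X @ beta_hat is then the orthogonal projection of y onto the
# 3-dimensional column space of X inside R^4.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta_hat

# The "error" vector is orthogonal to every column of X (normal equations):
print(np.allclose(X.T @ residual, 0))  # True
```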