I am confused by the language of my math textbook, where it says that $\hat y$ equals the sum from $i=1$ to $n$ of $\alpha \times$ basis function. What are basis functions? And why does it say there are more data points than basis functions? I know how to derive $\alpha$ using linear algebra and calculus when $mx + c$ is the equation of the regression line.
I am also a bit confused about how we know that the shortest distance between an actual data point and $\hat y$ is perpendicular to the line of $\hat y$. I would be very grateful if anyone could help me with this.
A model which is linear with respect to its parameters can be written as $$y=\sum_{k=1}^n a_k\,f_k(x)$$ where the $f_k(x)$ are the $n$ basis functions.
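To see why more data points than basis functions are wanted, here is a short sketch in matrix form. The number of data points $m$, the matrix $F$ and the vectors $a$, $y$ are notation introduced here for illustration and are not from the textbook. Evaluating the model at each data point $(x_i,y_i)$ gives $$\hat y_i=\sum_{k=1}^n a_k\,f_k(x_i),\qquad i=1,\dots,m$$ that is, $\hat y=Fa$ with $F_{ik}=f_k(x_i)$. Least squares picks the $a$ that minimizes $\|y-Fa\|^2$, which leads to the normal equations $$F^{T}F\,a=F^{T}y$$ When $m>n$ and the columns of $F$ are linearly independent, $F^{T}F$ is invertible and these equations have a unique solution. With $m=n$ the model would typically pass through every point exactly, and with $m<n$ the coefficients would not be uniquely determined.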
Suppose that you want to fit data to the model $$y=a_0+a_1\sin(x)+a_2\log(x)+a_3 e^{-\pi x}$$ Here the basis functions are $1$, $\sin(x)$, $\log(x)$ and $e^{-\pi x}$. Define $t_i=\sin(x_i)$, $u_i=\log(x_i)$, $v_i=e^{-\pi x_i}$. The model is then just $$y=a_0+a_1t+a_2u+a_3v$$ which corresponds to a multiple linear regression.
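As a rough illustration, the fit above could be sketched in Python with NumPy. The helper names `design_matrix` and `fit`, the sample points and the "true" coefficients are all made up for this example. The data are synthetic and noise-free, so the recovered coefficients should be close to the ones used to generate them.

```python
import numpy as np

def design_matrix(x):
    """One column per basis function: 1, sin(x), log(x), exp(-pi x)."""
    x = np.asarray(x, dtype=float)  # x must be positive because of log(x)
    return np.column_stack([
        np.ones_like(x),        # multiplies a_0
        np.sin(x),              # t, multiplies a_1
        np.log(x),              # u, multiplies a_2
        np.exp(-np.pi * x),     # v, multiplies a_3
    ])

def fit(x, y):
    """Least-squares estimate of (a_0, a_1, a_2, a_3)."""
    F = design_matrix(x)
    coeffs, *_ = np.linalg.lstsq(F, np.asarray(y, dtype=float), rcond=None)
    return coeffs

# Synthetic example: 50 data points, 4 basis functions (assumed values).
x = np.linspace(0.1, 3.0, 50)
true_a = np.array([1.0, 2.0, -0.5, 3.0])
y = design_matrix(x) @ true_a

a_hat = fit(x, y)
print(a_hat)
```

The model is nonlinear in $x$ but linear in the coefficients, which is why an ordinary linear least-squares solver is enough once the basis functions have been evaluated at the data points.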