The derivation of the well-known projection formula $\operatorname{proj}_{\vec{b}}(\vec{a})=\frac{\vec{a}\cdot \vec{b}}{\vec{b}\cdot \vec{b}}\vec{b}$ relies entirely on geometry. We treat vectors as arrows in ordinary Cartesian space and use the law of cosines (among other things).
But now suppose we have the vector space of all polynomials of degree less than $N$, with the inner product $$[a(x),b(x)]=\int_{-1}^1 a(x)\,b(x)\,\mathrm{d}x$$ where $a(x)$ and $b(x)$ are polynomials in this space.
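To make this inner product concrete, here is a minimal sketch that computes it exactly. It assumes (our own convention, not the question's) that a polynomial is stored as a list of coefficients from lowest to highest degree, and it uses the fact that $\int_{-1}^1 x^k\,\mathrm{d}x$ is $\frac{2}{k+1}$ for even $k$ and $0$ for odd $k$. The helper names `poly_mul` and `inner` are hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += Fraction(ai) * Fraction(bj)
    return out


def inner(a, b):
    """Exact [a, b] = integral of a(x) b(x) over [-1, 1]."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(a, b)):
        if k % 2 == 0:  # odd powers of x integrate to 0 on the symmetric interval
            total += c * Fraction(2, k + 1)
    return total


print(inner([1], [1]))        # 2    ([1, 1])
print(inner([0, 1], [0, 1]))  # 2/3  ([x, x])
print(inner([1], [0, 1]))     # 0    ([1, x])
```

Exact `Fraction` arithmetic is used so that "orthogonal" means an inner product of exactly $0$, with no floating-point noise.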
Okay, easy enough. But now things get complicated, because the idea of a projection is a bit different. When we write $\operatorname{proj}_{b(x)}a(x)$, we mean: "write $a(x)$ as a linear combination of $b(x)$ and 'something else', and return only the part along $b(x)$." Obviously this "something else" is a polynomial that is orthogonal to $b(x)$....
....except that this isn't obvious at all. Why should an inner product of $0$ between $b(x)$ and "something else" mean that they share no "common directionality"? Even worse, what does "directionality" even mean here?
So my question is: how can we derive the projection formula without referring to geometry?
Thank you very much
It's basically the same idea as with vectors in $\mathbb{R}^n$: minimize the residual error. To project $a$ onto $b$ means to find the vector in the span of $b$ that is closest to $a$, i.e. the one that leaves the smallest residual error.
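Since the projection lies in the span of $b$, it is a scalar multiple $\alpha b$. Measuring size with the norm induced by the inner product, $\|v\|^2=\langle v,v\rangle$, the squared residual error is $\|a-\alpha b\|^2$. To minimize it, expand using the bilinearity and symmetry of the inner product and set the derivative to zero: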
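$$ f(\alpha) = \|a-\alpha b\|^2 = \langle a,a\rangle - 2\alpha\langle a,b\rangle + \alpha^2\langle b,b\rangle $$
$$ f'(\alpha) = -2\langle a,b\rangle + 2\alpha\langle b,b\rangle = 0 \Rightarrow \alpha = \frac{\langle a,b \rangle}{\|b\|^2} $$
Since $\langle b,b\rangle>0$ for $b\neq 0$, $f$ is an upward-opening parabola in $\alpha$, so this critical point is its minimum.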
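As a worked example in the polynomial space from the question (the choice $a(x)=x^2$, $b(x)=1$ is ours, for illustration only), let $[a,b]=\int_{-1}^1 ab\,\mathrm{d}x$ play the role of $\langle a,b\rangle$:

$$ \langle x^2, 1\rangle = \int_{-1}^1 x^2\,\mathrm{d}x = \frac{2}{3}, \qquad \|1\|^2 = \int_{-1}^1 1\,\mathrm{d}x = 2, \qquad \alpha = \frac{2/3}{2} = \frac{1}{3} $$

so $\operatorname{proj}_{1}(x^2) = \tfrac{1}{3}$, and the residual $x^2-\tfrac{1}{3}$ satisfies $\langle x^2-\tfrac{1}{3},\,1\rangle = \tfrac{2}{3}-\tfrac{1}{3}\cdot 2 = 0$.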
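Note that the residual vector $a-\alpha b$ is then orthogonal to $b$, since $\langle a-\alpha b,\, b\rangle = \langle a,b\rangle - \alpha\langle b,b\rangle = 0$ for this $\alpha$. So the orthogonality of the "something else" is not an extra assumption: it falls out of minimizing the residual.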
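The derivation uses nothing but the bilinearity, symmetry and positivity of the inner product, so one routine can project in both $\mathbb{R}^n$ and the polynomial space. This is a minimal sketch, assuming polynomials are coefficient lists (lowest degree first) and using exact `Fraction` arithmetic; the names `dot`, `poly_inner`, `project`, `squared_error` and `f_poly` are hypothetical helpers, not from the question.

```python
from fractions import Fraction


def dot(u, v):
    """Ordinary dot product on R^n (vectors as equal-length lists)."""
    return sum((Fraction(x) * Fraction(y) for x, y in zip(u, v)), Fraction(0))


def poly_inner(a, b):
    """[a, b] = integral of a(x) b(x) over [-1, 1]; coefficients lowest degree first."""
    total = Fraction(0)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k % 2 == 0:  # odd powers of x integrate to 0 on [-1, 1]
                total += Fraction(ai) * Fraction(bj) * Fraction(2, k + 1)
    return total


def project(a, b, inner):
    """Return (alpha, residual) with alpha = <a,b>/<b,b> and residual = a - alpha*b.

    Only the supplied inner product is used; no angles or arrows are involved.
    """
    alpha = inner(a, b) / inner(b, b)
    n = max(len(a), len(b))
    a_pad = list(a) + [0] * (n - len(a))  # pad the shorter list with zero coefficients
    b_pad = list(b) + [0] * (n - len(b))
    residual = [Fraction(x) - alpha * Fraction(y) for x, y in zip(a_pad, b_pad)]
    return alpha, residual


def squared_error(a, b, t, inner):
    """f(t) = ||a - t*b||^2, expanded by bilinearity and symmetry."""
    return inner(a, a) - 2 * t * inner(a, b) + t**2 * inner(b, b)


# Polynomial space: project x^2 onto the constant polynomial 1.
alpha, r = project([0, 0, 1], [1], poly_inner)
print(alpha)               # 1/3
print(poly_inner(r, [1]))  # 0, so the residual x^2 - 1/3 is orthogonal to 1


def f_poly(t):
    """f(t) for a = x^2, b = 1 in the polynomial space."""
    return squared_error([0, 0, 1], [1], t, poly_inner)


# Nudging alpha either way increases f, as the derivation predicts.
step = Fraction(1, 10)
print(f_poly(alpha) < f_poly(alpha + step), f_poly(alpha) < f_poly(alpha - step))  # True True

# R^2 with the ordinary dot product gives the familiar formula.
alpha2, r2 = project([3, 4], [1, 0], dot)
print(alpha2, dot(r2, [1, 0]))  # 3 0
```

Passing `inner` as an argument mirrors the point of the answer: the projection formula depends only on the inner product, not on any picture of arrows.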