I'm trying to understand how the B-Spline definition is constructed. That is, where did the knot vector and the basis functions and their recursive definition come from.
The definition can be seen here: http://web.cse.ohio-state.edu/~tamaldey/course/784/note5.pdf
There it's also shown how the curve function is got. But not how the reasoning went in building the theory.
In a nutshell, when people (engineers) started looking at tools to define freeform curves and surfaces, they understood the potential of approximation methods (as opposed to interpolation), which use control points close to the curve rather than exactly on the curve.
The general principle is to define as set of weighting functions $w_k(t)$ and compute the weighted average of the coordinates of the control points.
$$P=\frac{\sum_{k=0}^n w_k(t)P_k}{\sum_{k=0}^n w_k(t)}.$$
The weight functions $w_k$ are chosen so that they are smooth and dominate the other weights in some part of the range of $t$, so that the curve is "attracted" by the corresponding control point.
Pierre Bézier pioneered this work using the Bernstein polynomials ($t^k(1-t)^{n-k}$ with a coefficient) as the weights.
As these polynomials are global (the weights are not zero inside the interval), the control points have long-range influence and the approach isn't practical for a large number of them.
So the next approaches started using piecewise polynomials, nonzero in a subinterval and zero elsewhere. (The situation is similar with Lagrangian interpolation, which shouldn't be used for large $n$, and replaced by piecewise polynomial schemes such as cubic splines.)
The limits of the polynomial pieces define the knot vector. The weights functions on $n$ points are defined recursively as a linear combination of weight functions of subsets of size $n-1$. (This reminds of the recursive scheme used in the Neville algorithm for Lagrangian interpolation.)