Recently I've been looking at Bezier curves and trying to understand how they work. I know that a general Bezier curve is given by the equation
$$ \vec{\mathbf{B}}(t) = \sum_{k=0}^n{b_{k,\ n}(t)\vec{\mathbf{P}}}_k $$ where $b_{k,\ n}(t)$ are the Bernstein Basis polynomials $$ b_{k,\ n}(t) = {n \choose k}t^k(1 - t)^{n-k}. $$ On an intuitive level I understand why this construct creates such a smooth curve. Basically, what's happening is as $t$ ranges from $0$ to $1$, it ranges over the maxima of the Bernstein polynomials. This causes the different points in the sum to receive different weight values, and when $t = k/n$, the point with the greatest weight is $\vec{\mathbf{P}}_k$, so the curve tends towards that point. This is what causes Bezier curves to be so smooth.
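To make the weighting picture concrete, here is a minimal Python sketch. The helper names `bernstein` and `bezier_point` and the sample control points are assumptions made for illustration, not taken from any library. It evaluates the Bernstein weights at $t = k/n$, reports which control point carries the largest weight there, and prints the sum of the weights.

```python
from math import comb

def bernstein(k, n, t):
    """Bernstein basis polynomial b_{k,n}(t) = C(n,k) t^k (1-t)^(n-k)."""
    return comb(n, k) * t**k * (1 - t)**(n - k)

def bezier_point(points, t):
    """Evaluate the Bezier curve defined by 2D control points at parameter t."""
    n = len(points) - 1
    weights = [bernstein(k, n, t) for k in range(n + 1)]
    x = sum(w * px for w, (px, _) in zip(weights, points))
    y = sum(w * py for w, (_, py) in zip(weights, points))
    return (x, y)

# Sample control points, chosen only for illustration.
points = [(0, 0), (1, 3), (3, 2), (4, 5), (5, 0)]
n = len(points) - 1

for k in range(n + 1):
    t = k / n
    weights = [bernstein(j, n, t) for j in range(n + 1)]
    heaviest = max(range(n + 1), key=lambda j: weights[j])
    print(f"t = {t:.2f}: heaviest weight on P_{heaviest}, "
          f"sum of weights = {sum(weights):.6f}, curve point = {bezier_point(points, t)}")
```

Note that the curve only touches the first and last control points; the interior points pull the curve toward themselves without being reached.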
Now I thought, could I use this intuitive understanding of a Bezier curve to construct other types of Bezier curves? The main thing I had in mind was to make a "Bezier Curve" that passed through all the control points. To do this, instead of using Bernstein basis polynomials, I created my own polynomials:
$$
P_{r,\ n}(t) = \begin{cases}
(-1)^n\left(\frac{2}{r}\right)^{2n}t^n(t - r)^n, & 0 < t < r \\
0, & \text{otherwise}
\end{cases}
$$
These polynomials have a maximum value of $1$ at $t = r/2$, have roots of multiplicity $n$ at $0$ and $r$, and are $C^\infty$ continuous. I thought that if I defined a Bezier curve as
$$
\vec{\mathbf{B}}(t) = \sum_{k=0}^n{P_{r,\ n}\left(t + \frac{k -1}{n}\right)\vec{\mathbf{P}}_k}
$$
with $r = \frac{2}{n-1}$, then the curve would smoothly interpolate between each of the points. The result was less than satisfactory.

That "curve" was formed with the points $P_0 = \{0,\ 0\}$, $P_1 = \{1,\ 3\}$, $P_2 = \{3,\ 2\}$, $P_3 = \{4,\ 5\}$, and $P_4 = \{5,\ 0\}$.
Okay, so maybe the Bernstein basis polynomials form a smooth curve because they aren't 0 everywhere other than the interval $(0, r)$. So I edited the polynomials: $$ P_{r,\ n}(x) = \begin{cases} (-1)^n\left(\frac{2}{r}\right)^{2n}x^n(x-r)^n, & x \in [0, r] \\ (-1)^n\left(\frac{2}{r}\right)^{2n}2^{-\left|\left\lfloor\frac{x}{r}\right\rfloor\right|}\left(x-2r\left\lfloor\frac{2x}{r}\right\rfloor\right)^n\left(x - 2r\left\lfloor\frac{2x}{r}\right\rfloor-\frac{r}{2}\right)^n, & \text{otherwise} \end{cases} $$
The following picture illustrates $P_{\frac12,\ 4}(x)$ on the interval $[0, 1]$ (note the x-axis is scaled by $1000$).

What does the curve look like now?

Yikes. I guess that wasn't the solution either.
The last thing I wanted to try was to scale the Bernstein basis polynomials so their maximum was at $y=1$.
$b_{v,\ n}(t)$ has local maximum at $x = \frac{v}{n}$, $y = {n\choose v}\left(\frac{v}{n}\right)^v(1-\frac{v}{n})^{n-v}$. So if we want to scale the Bernstein basis polynomials, we just have to scale them by the inverse of $y$. Define $$ \vec{\mathbf{B}}(t) = \sum_{k=0}^n{C_{k,\ n}b_{k,\ n}(t)\vec{\mathbf{P}}_k} $$ with $$ C_{k,\ n} = \begin{cases} 1, & \text{if $k = 0$}\\ \left[{n\choose k}\left(\frac{k}{n}\right)\right]^{-n}(1-\frac{k}{n})^{k-n}, & \text{otherwise} \end{cases} $$
What does our curve look like now?

What on earth? That's not even close!
So, is my initial intuition wrong or incomplete? What is it about the Bernstein basis polynomials that causes the Bezier curve to be so smooth?
As an addendum, why aren't any of my curves (except the last one) continuous, even though the basis polynomials are $C^\infty$ continuous?
EDIT: Come to think of it, the second curve I created kind of reminds me of some sort of Kochanek-Bartels spline variant with weird $t$, $b$, and $c$ parameters. Did I accidentally stumble across one?
Your basic idea is correct. There's nothing very special about the Bernstein polynomials, and in fact there are other alternatives that are sometimes used.
The fundamental idea is that we are dealing with "blended curves" of the form $$ \mathbf{C}(t) = \sum_{i=0}^m\phi_i(t)\mathbf{P}_i $$ where $\phi_0, \ldots, \phi_m$ are "blending" functions and $\mathbf{P}_0, \ldots, \mathbf{P}_m$ are control points. Note that we are forming a linear combination of points here, and this makes sense (that is, the result does not depend on where the coordinate origin is) only if the coefficients add up to one: $\sum_{i=0}^m\phi_i(t) = 1$ for every $t$.
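Here is a small Python sketch of why the sum-to-one condition matters. The helper names and the doubled weights are hypothetical, introduced only for illustration. If the weights sum to one, translating all the control points translates the blended point by the same amount; if they do not, the blended point moves by a different amount, so the "curve" depends on where the origin is.

```python
from math import comb

def bernstein(k, n, t):
    """Bernstein basis polynomial b_{k,n}(t)."""
    return comb(n, k) * t**k * (1 - t)**(n - k)

def blend(weights, points):
    """Form sum_i weights[i] * points[i] for 2D points."""
    x = sum(w * px for w, (px, _) in zip(weights, points))
    y = sum(w * py for w, (_, py) in zip(weights, points))
    return (x, y)

def translate(points, dx, dy):
    """Shift every point by (dx, dy)."""
    return [(px + dx, py + dy) for px, py in points]

points = [(0, 0), (1, 3), (3, 2), (4, 5), (5, 0)]
n = len(points) - 1
t = 0.3

good = [bernstein(k, n, t) for k in range(n + 1)]  # sums to 1
bad = [2.0 * w for w in good]                      # hypothetical weights summing to 2

moved = translate(points, 10.0, -7.0)

# Displacement of the blended point when all control points are shifted by (10, -7).
good_before, good_after = blend(good, points), blend(good, moved)
bad_before, bad_after = blend(bad, points), blend(bad, moved)
good_shift = (good_after[0] - good_before[0], good_after[1] - good_before[1])
bad_shift = (bad_after[0] - bad_before[0], bad_after[1] - bad_before[1])

print("weights summing to 1 move the point by", good_shift)
print("weights summing to 2 move the point by", bad_shift)
```

In general the displacement is the shift multiplied by the sum of the weights, which is why only weights summing to one give a curve that is attached to the control points rather than to the coordinate system.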
The properties of the curve are heavily dependent on the properties of the blending functions, of course. See these notes for some further info.
If you want your curve to pass through all the control points, you can use Lagrange polynomials as blending functions. For the cubic case, with the control points attained at $t = 0, \tfrac13, \tfrac23, 1$, the Lagrange polynomials are: \begin{align*} \phi_0(t) & = -\tfrac12 (3t-1)(3t-2)(t-1) = -\tfrac92 t^3 + 9t^2 - \tfrac{11}2 t + 1 \\ \phi_1(t) & = \phantom{-}\tfrac92 t(3t-2)(t-1) = \phantom{-}\tfrac{27}2 t^3 - \tfrac{45}2 t^2 + 9t \\ \phi_2(t) & = -\tfrac92 t(3t-1)(t-1) = -\tfrac{27}2 t^3 + 18t^2 - \tfrac92 t \\ \phi_3(t) & = \phantom{-}\tfrac12 t(3t-1)(3t-2) = \phantom{-}\tfrac92 t^3 - \tfrac92 t^2 + t \end{align*}
If you graph these, you'll see why they work. Lagrange polynomials sum to one, but they are negative in some places. This means you lose the convex hull property of Bezier curves. Also, interactive control is less intuitive: if you move a control point in some direction, some parts of the curve will move in the opposite direction.
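The properties just described can be checked numerically. The following is a minimal Python sketch, assuming equally spaced nodes $0, \tfrac13, \tfrac23, 1$ and four sample control points; the function names `lagrange_basis` and `lagrange_curve_point` are my own, not from a library. It builds the cubic Lagrange blending functions from their product definition rather than from the expanded polynomials.

```python
def lagrange_basis(i, nodes, t):
    """Lagrange basis polynomial: equals 1 at nodes[i] and 0 at every other node."""
    value = 1.0
    for j, node in enumerate(nodes):
        if j != i:
            value *= (t - node) / (nodes[i] - node)
    return value

def lagrange_curve_point(points, nodes, t):
    """Blend 2D control points with Lagrange basis functions at parameter t."""
    phi = [lagrange_basis(i, nodes, t) for i in range(len(nodes))]
    x = sum(f * px for f, (px, _) in zip(phi, points))
    y = sum(f * py for f, (_, py) in zip(phi, points))
    return (x, y)

nodes = [0.0, 1 / 3, 2 / 3, 1.0]
points = [(0, 0), (1, 3), (3, 2), (4, 5)]  # sample control points, for illustration only

# At each node the curve should coincide with the matching control point.
for node, p in zip(nodes, points):
    print(f"t = {node:.3f}: curve = {lagrange_curve_point(points, nodes, node)}, control point = {p}")

# Between nodes the weights still sum to one, but some of them are negative.
phi_mid = [lagrange_basis(i, nodes, 0.5) for i in range(len(nodes))]
print("weights at t = 0.5:", phi_mid, "sum =", sum(phi_mid))
```

The negative weights at $t = \tfrac12$ are what allow the curve to leave the convex hull of its control points.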
You can use rational functions (rather than polynomials) as blending functions. That's what rational Bezier curves are.
You can use (piecewise polynomial) spline functions for blending; that gives you parametric spline curves.
You can use trigonometric functions for blending, which gives you so-called "trigonometric Bezier" curves.
Finally, there are blending functions that make use of hyperbolic sines and cosines. These lead to the construction of so-called "splines under tension". See papers from the 1970s by Cline, Schweikert, Späth, and others.
And so on.
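As one concrete instance of the alternatives listed above, here is a minimal Python sketch of rational blending functions, $\phi_i(t) = w_i b_{i,n}(t) / \sum_j w_j b_{j,n}(t)$, which sum to one by construction. The function name `rational_bezier_point` is an assumption made for illustration; the control points and weights are the standard choice for representing a quarter of the unit circle as a rational quadratic.

```python
from math import comb, sqrt

def bernstein(k, n, t):
    """Bernstein basis polynomial b_{k,n}(t)."""
    return comb(n, k) * t**k * (1 - t)**(n - k)

def rational_bezier_point(points, weights, t):
    """Rational Bezier point using blending functions w_i b_i(t) / sum_j w_j b_j(t)."""
    n = len(points) - 1
    raw = [w * bernstein(k, n, t) for k, w in enumerate(weights)]
    total = sum(raw)
    phi = [r / total for r in raw]  # normalized, so the blending functions sum to one
    x = sum(f * px for f, (px, _) in zip(phi, points))
    y = sum(f * py for f, (_, py) in zip(phi, points))
    return (x, y)

# Quarter of the unit circle from (1, 0) to (0, 1) as a rational quadratic.
points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights = [1.0, sqrt(2) / 2, 1.0]

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, y = rational_bezier_point(points, weights, t)
    print(f"t = {t:.2f}: ({x:.4f}, {y:.4f}), distance from origin = {sqrt(x * x + y * y):.6f}")
```

A polynomial Bezier curve cannot reproduce a circular arc exactly, so this is a case where changing the blending functions buys something the Bernstein polynomials alone cannot provide.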