Define a cost function :
$$L(x,w) = \frac{1}{2} \sum_{i=1}^N\left(y^{(i)} - w^T\phi\left(x^{(i)}\right)\right)^2 + \frac{\lambda}{2}\sum_{j=1}^M w_j^2$$
Show that minimizing $L$ is equivalent to minimizing
$$\hat L(x,w) = \frac{1}{2} \sum_{i=1}^N\left(y^{(i)} - w^T\phi\left(x^{(i)}\right)\right)^2$$
subject to the constraint
$$\sum_{j=1}^M w_j^2 \leq e$$
What is the significance of $e$?
So my poor understanding of Lagrange multipliers leads me to do the following:
Set
$$ g(x, w) = \sum_{j=1}^M w_j^2 - e $$
so
$$\mathcal{L}(x,w,\lambda) = \hat L(x,w) + \lambda \left(\sum_{j=1}^M w_j^2 - e\right) = \frac{1}{2} \sum_{i=1}^N\left(y^{(i)} - w^T\phi\left(x^{(i)}\right)\right)^2 + \lambda \left(\sum_{j=1}^M w_j^2 - e\right)$$
So $\mathcal{L}$ appears very similar to $L$. I think this is where the $e$ comes into play. If the two functions are to be identical (which I don't think is the point), then
$$e = \sum_{j=1}^M \frac{w_j^2}{2}$$
but looking at the original constraint
$$\sum_{j=1}^M w_j^2 \leq e \implies \sum_{j=1}^M w_j^2 - \sum_{j=1}^M \frac{w_j^2}{2} \leq 0$$
which is violated unless $w = 0$ (assuming the weights are real numbers).
I just don't see why $e$ is so special. Nor do I see what the purpose of the exercise is. Please help me see what I'm supposed to.
You basically have it. The question is asking you to show that imposing a constraint on $\vert \vert w \vert \vert_2^2$ is equivalent to penalizing $\vert \vert w \vert \vert_2^2$ in the objective. That means showing that for each $\lambda \geq 0$ there is some $e \geq 0$ such that the two optimization problems return the same optimal $w$; it does not mean that the two functions are identical.
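To make the "for each $\lambda$ there is an $e$" direction concrete, here is one way the argument can be sketched (using the notation from the question; the choice of $e$ is the key step):

$$\textbf{Claim. } \text{Let } w^\star \text{ minimize } L(x,w) \text{ for a fixed } \lambda \geq 0, \text{ and set } e = \sum_{j=1}^M (w^\star_j)^2.$$

Then $w^\star$ also minimizes $\hat L(x,w)$ subject to $\sum_{j=1}^M w_j^2 \leq e$. Indeed, suppose some feasible $w$ had $\hat L(x,w) < \hat L(x,w^\star)$. Feasibility gives $\frac{\lambda}{2}\sum_j w_j^2 \leq \frac{\lambda}{2} e = \frac{\lambda}{2}\sum_j (w^\star_j)^2$, so

$$L(x,w) = \hat L(x,w) + \frac{\lambda}{2}\sum_{j=1}^M w_j^2 < \hat L(x,w^\star) + \frac{\lambda}{2}\sum_{j=1}^M (w^\star_j)^2 = L(x,w^\star),$$

contradicting the optimality of $w^\star$ for $L$. So the significance of $e$ is exactly that it is the squared norm of the penalized minimizer: each $\lambda$ picks out a ball radius, not a single universal constant.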
In fact, if we assign a price of $\mu$ to the constraint, then the second objective comes out $\mu e$ cheaper, since moving the constraint into the objective means adding "$+\mu(w^\top w - e)$".
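You can also check the equivalence numerically. The sketch below (my own toy setup, with $\phi$ taken as the identity and an arbitrary $\lambda$) solves the penalized problem in closed form, sets $e = \vert\vert w^\star \vert\vert_2^2$, and then solves the constrained problem by projected gradient descent; the two solutions agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N samples, M basis functions (phi is just the raw features here).
N, M = 50, 5
Phi = rng.normal(size=(N, M))
y = Phi @ rng.normal(size=M) + 0.1 * rng.normal(size=N)

lam = 2.0  # penalty strength lambda, chosen arbitrarily for the demo

# Penalized (ridge) minimizer in closed form: w = (Phi^T Phi + lam I)^{-1} Phi^T y
w_pen = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

# Choose e as the squared norm of that minimizer ...
e = w_pen @ w_pen

# ... and solve  min (1/2)||y - Phi w||^2  s.t.  ||w||^2 <= e
# by projected gradient descent: gradient step, then project onto
# the L2 ball of radius sqrt(e).
step = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi)[-1]  # 1 / Lipschitz constant
w = np.zeros(M)
for _ in range(20000):
    w = w - step * Phi.T @ (Phi @ w - y)   # gradient step on the unpenalized loss
    norm = np.sqrt(w @ w)
    if norm > np.sqrt(e):                  # project back onto the feasible ball
        w *= np.sqrt(e) / norm

print(np.max(np.abs(w - w_pen)))  # tiny: the two solutions coincide
```

Note that the constraint is active at the optimum (the constrained solution sits on the boundary $\vert\vert w \vert\vert_2^2 = e$), which is exactly the complementary-slackness picture from the Lagrangian above.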