In class we were told to assume that the following two optimization problems are equivalent:
$$\min_w\max_{\delta:\|\delta\|_F\leq\epsilon}\|(X+\delta)w-y\|_2^2$$
$$\min_{w}\|Xw-y\|_2^2+\lambda\|w\|_2^2$$
for an appropriately chosen $\lambda$.
Here $X,\delta\in\mathbb{R}^{m\times n}$, $w\in\mathbb{R}^{n\times 1}$, and $y\in\mathbb{R}^{m\times 1}$.
Can someone please explain why this is true? A reference paper pointing this out would also be appreciated.
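
To make the setup concrete, here is a minimal numerical sketch of the inner maximization for a fixed $w$. It is not from the class: it assumes that the worst-case perturbation is the rank-one matrix $\delta^\star=\epsilon\,u w^T/\|w\|_2$ with $u=(Xw-y)/\|Xw-y\|_2$, and checks that guess against random feasible perturbations. The function name `worst_case_residual` and the problem sizes are made up for illustration.

```python
import numpy as np

def worst_case_residual(X, y, w, eps):
    """Residual norm under the assumed worst-case rank-one perturbation."""
    r = X @ w - y
    u = r / np.linalg.norm(r)  # unit vector along the residual (assumes r != 0)
    delta = eps * np.outer(u, w) / np.linalg.norm(w)  # Frobenius norm equals eps
    return np.linalg.norm((X + delta) @ w - y), delta

rng = np.random.default_rng(0)
m, n, eps = 6, 4, 0.3  # arbitrary illustrative sizes
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
w = rng.normal(size=n)

worst, delta_star = worst_case_residual(X, y, w, eps)

# Candidate closed form for the inner maximum (unsquared).
closed_form = np.linalg.norm(X @ w - y) + eps * np.linalg.norm(w)

# Random perturbations on the boundary ||delta||_F = eps should not exceed it.
best_random = 0.0
for _ in range(2000):
    d = rng.normal(size=(m, n))
    d *= eps / np.linalg.norm(d)  # default matrix norm in NumPy is Frobenius
    best_random = max(best_random, np.linalg.norm((X + d) @ w - y))

print(worst, closed_form, best_random)
```

If this check is right, the inner maximum equals $(\|Xw-y\|_2+\epsilon\|w\|_2)^2$, which is not literally the ridge objective, so I assume "equivalent" means the two problems share minimizers for a suitable $\lambda$.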