Why does gradient descent on the squared loss yield the minimum $\ell_2$-norm solution in the compressed sensing problem?


Why is gradient descent on the problem $$ \min_\mathbf{x} \|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2, $$ initialized at $\mathbf{x}_0=\mathbf{0}$ and run with an extremely small step size, equivalent to solving $$ \min_\mathbf{x} \|\mathbf{x}\|_2^2 \quad \text{s.t.} \quad \mathbf{y}=\mathbf{A}\mathbf{x}, $$ where $\mathbf{x}$ and $\mathbf{y}$ are vectors and $\mathbf{A}$ is a fat (wide, underdetermined) matrix?
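For concreteness, here is a quick numerical check of the claim (a minimal sketch in NumPy; the dimensions, random data, step size, and iteration count below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
m, n = 5, 20                    # fat matrix: fewer rows than columns
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Gradient descent on ||y - Ax||_2^2, initialized at x = 0,
# with a small constant step size.
x = np.zeros(n)
step = 1e-3
for _ in range(200_000):
    x -= step * 2 * A.T @ (A @ x - y)  # gradient of ||y - Ax||_2^2

# Minimum l2-norm solution of y = Ax, via the pseudoinverse.
x_min_norm = np.linalg.pinv(A) @ y

print(np.linalg.norm(x - x_min_norm))  # numerically zero
```

The printed distance comes out numerically zero, i.e. the gradient descent limit coincides with the minimum-norm (pseudoinverse) solution, which is exactly the equivalence in question.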

This seems to be a well-known result, but I cannot see why it holds. Could anyone help me with that?