My textbook, Deep Learning by Goodfellow, Bengio, and Courville, says the following in a section on constrained optimization:
Sometimes we wish not only to maximize or minimize a function $f(\mathbf{x})$ over all possible values of $\mathbf{x}$. Instead we may wish to find the maximal or minimal value of $f(\mathbf{x})$ for values of $\mathbf{x}$ in some set $\mathbb{S}$. This is known as constrained optimization. Points $\mathbf{x}$ that lie within the set $\mathbb{S}$ are called feasible points in constrained optimization terminology.
We often wish to find a solution that is small in some sense. A common approach in such situations is to impose a norm constraint, such as $||\mathbf{x}|| \le 1$.
What I don't understand is this: if we wish to find a solution that is small in some sense, why would we impose a norm constraint on the input rather than on the output? After all, that is not necessarily how a function works: putting a small input into a function does not guarantee that the output will also be small. This obviously depends on the function itself, and it seems like a very elementary fallacy, something a student learning about functions for the first time would notice. So why would we not, in some way, impose a norm constraint on the output instead? For instance, by iteratively selecting "small values" for the input and discarding any input whose output violates the imposed norm constraint, until we get an input whose output lies within that constraint?
I would greatly appreciate it if people could please take the time to clarify this.
By "solution", they mean $\mathbf{x}$; they are not referring to $f(\mathbf{x})$.

One example: we might want $\|\mathbf{x}\|_1$ to be small, as that encourages the solution to be sparse, which can make it more interpretable.
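To make this concrete, here is a small sketch (my own illustration, not from the book) of minimizing $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{t}\|^2$ subject to $\|\mathbf{x}\|_1 \le 1$ using projected gradient descent, with the L1-ball projection algorithm of Duchi et al. The target point `t = (1.5, 0.3)` is arbitrary; the point of the example is that the constraint drives the small coordinate of the solution $\mathbf{x}$ exactly to zero, i.e. the *solution* is sparse, regardless of the value $f(\mathbf{x})$ takes there:

```python
# Minimize f(x) = ||x - t||^2 subject to ||x||_1 <= 1 via
# projected gradient descent (pure Python, no dependencies).

def project_l1(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball of the given radius
    (Duchi et al.'s sort-and-threshold algorithm)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - radius) / i
        if ui > t:          # largest i satisfying this gives the threshold
            theta = t
    sign = lambda x: (x > 0) - (x < 0)
    # Soft-threshold each coordinate; small ones become exactly zero.
    return [sign(x) * max(abs(x) - theta, 0.0) for x in v]

def solve(target, steps=100, lr=0.5):
    """Projected gradient descent on ||x - target||^2 over the L1 ball."""
    x = [0.0] * len(target)
    for _ in range(steps):
        grad = [2 * (xi - ti) for xi, ti in zip(x, target)]
        x = project_l1([xi - lr * gi for xi, gi in zip(x, grad)])
    return x

# The unconstrained minimizer is t = (1.5, 0.3), which is infeasible
# (L1 norm 1.8 > 1). The constrained solution zeros out the small
# coordinate entirely:
x = solve([1.5, 0.3])
print(x)  # -> [1.0, 0.0]: a sparse feasible point
```

Note the constraint here acts on the *argument* $\mathbf{x}$, exactly as in the question: we are asking for the best feasible $\mathbf{x}$, and the L1 geometry (corners on the axes) is what produces the exact zeros.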