In recent years, particularly in CS and machine learning, I have noticed that many authors write things like the following (see equation (4) for one of thousands of examples):
Let $f$ be a function/hypothesis/policy in some function space $\mathcal{F}$; the goal of this "learning problem" is then to find $f^\star$ such that $f^\star = \operatorname{argmin}_{f\in \mathcal{F}} \; L(f)$, where $L$ is some loss function.
Is it good practice to write an optimization problem this way?
Does it even make sense? I understand what a minimum means with respect to parameters, e.g., $1 < 2$. But I don't understand what a minimum means with respect to functions, since the space of functions is not linearly ordered: it does not make sense to write $\sin < \cos$.
Wouldn't it be clearer, if not more correct, to write it as a minimization problem over the space of parameters associated with the function instead?
All the algorithms for solving the optimization problem involve the parameters associated with a function, as opposed to constructing the function directly.
For instance, gradient descent is written as
$w_{k+1} = w_k - \eta \nabla L(w_k), w_k \in \mathbb{R}^n$
and not
$f_{k+1} = f_{k} - \eta \nabla L(f_k), \quad f_k \in \mathcal{F}$
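To make the contrast concrete, here is a minimal sketch (my own toy example, not from any paper) of what the parameter-space update $w_{k+1} = w_k - \eta \nabla L(w_k)$ looks like in practice: the class $\mathcal{F} = \{f_w(x) = w_0 x + w_1\}$ is parametrized by $w \in \mathbb{R}^2$, so minimizing $L$ over $\mathcal{F}$ becomes minimizing over $\mathbb{R}^2$.

```python
import numpy as np

# Toy data from a known linear target f(x) = 2x + 1 (illustrative choice).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 1.0

def L(w):
    # Mean squared error of the hypothesis f_w(x) = w[0]*x + w[1].
    return np.mean((w[0] * x + w[1] - y) ** 2)

def grad_L(w):
    r = w[0] * x + w[1] - y  # residuals
    return np.array([2 * np.mean(r * x), 2 * np.mean(r)])

# Gradient descent over the parameters w, not over functions directly:
# w_{k+1} = w_k - eta * grad L(w_k)
w = np.zeros(2)
eta = 0.5
for _ in range(500):
    w = w - eta * grad_L(w)

print(w)  # approaches [2.0, 1.0]
```

The iteration only ever touches the vector $w$; the "function" $f_w$ is recovered implicitly through the parametrization.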
Can anyone chime in on whether writing a minimization problem in terms of the function is good practice?
The $L$ in this context is a map from the function space $\mathcal{F}$ to the real numbers $\Bbb{R}$; symbolically, $L: \mathcal{F} \to \Bbb{R}$. You are right that the function space $\mathcal{F}$ has no ordering defined on it, but the real numbers do, and this is the ordering we are making use of! What we are doing is this: for each function $f$ in the function space $\mathcal{F}$, we consider the real number $L(f)$, and then we ask whether there is a specific function $f^*\in \mathcal{F}$ that minimizes $L$.
Said more concisely, two questions have to be answered. First, does there exist an $f^* \in \mathcal{F}$ such that for all $f \in \mathcal{F}$ we have $L(f^*)\leq L(f)$? Second, if such an $f^*$ exists, how can we find it? (From your specific quote, it seems the authors are only interested in the second question.)
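The point that the ordering lives in $\Bbb{R}$, not in $\mathcal{F}$, can be illustrated with a deliberately tiny (hypothetical) finite function class, where the argmin can be found by brute force:

```python
import math

# A hypothetical finite function class F and a loss L: F -> R.
# L(f) is the mean squared error of f against samples of sin on [0, pi]
# (an illustrative choice of loss, not from the answer above).
xs = [i * math.pi / 20 for i in range(21)]
targets = [math.sin(x) for x in xs]

F = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}

def L(f):
    return sum((f(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)

# The functions themselves are never compared with "<"; only the real
# numbers L(f) are, which is why the argmin over F makes sense.
f_star_name = min(F, key=lambda name: L(F[name]))
print(f_star_name)  # "sin"
```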
Yes, this is indeed a correct way to pose a minimization problem. The confusion is that you're thinking of $f$ as the function and something else (possibly elements of the domain of $f$) as the parameters. But in this context, $L: \mathcal{F} \to \Bbb{R}$ is the function we wish to minimize, and the elements $f \in \mathcal{F}$ are the "parameters" (for $L$).
As mentioned in the comments, the subject of Calculus of Variations is full of such problems, one of the simplest to state is the following:
Given two points $p,q$ in the plane $\Bbb{R}^2$, find the curve $\gamma$ of shortest length joining the two points. Here, our "parameters" form a certain function space $\mathcal{F}$ (the curves whose initial and final points are $p$ and $q$, respectively), and we have a map $L: \mathcal{F} \to \mathbb{R}$ which assigns to each curve $\gamma$ its length $L(\gamma)$. (Of course this needs to be formulated more precisely, but the general question should be easy enough to understand.)
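To see this length functional $L$ in action numerically, here is a sketch (my own hypothetical one-parameter family, not part of the classical statement) of curves joining $p=(0,0)$ and $q=(1,1)$, written as graphs $y_a(x) = x + a\sin(\pi x)$, where $a = 0$ is the straight line:

```python
import math

def length(a, n=10_000):
    # L(gamma_a) = integral over [0,1] of sqrt(1 + y_a'(x)^2) dx,
    # approximated with the midpoint rule.
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        dy = 1.0 + a * math.pi * math.cos(math.pi * x)  # y_a'(x)
        total += math.sqrt(1.0 + dy * dy) * h
    return total

for a in (0.0, 0.1, 0.3):
    print(a, length(a))
# the straight line (a = 0) has the smallest length, sqrt(2) ≈ 1.414
```

Each curve $\gamma_a$ is a "point" in $\mathcal{F}$, and $L(\gamma_a)$ is the real number we compare: exactly the structure of the learning problem in the question.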