Let's say we have a functional $J[f] = \int_a^b L(x, f, f')\, dx$ that we are trying to minimize. I'm trying to work out the best way to do this numerically, with something like gradient descent.
My idea goes like this:
- Let $f_0 = [f(x_1), \ldots, f(x_n)]^T$ be a vector containing my current guess for the function (due to computational constraints, the function is represented by its values at a set of points between $a$ and $b$, e.g. $x_1=a$, $x_2 = a+\epsilon$, $x_3 = a + 2\epsilon$, etc.)
- Compute the direction $D=[D_1, \ldots, D_n]$ to step in by either
  - $D=\left. \frac{\delta J}{\delta f} \right|_{f=f_0}$ evaluated at the grid points $x_1, \ldots, x_n$. I.e., the variational derivative evaluated at our current guess for the function.
  - $D=\left. \nabla_f J \right|_{f=f_0}$, which is the gradient of $J$ with $f$ treated as an $n$-dimensional vector. This gradient can be estimated numerically by perturbing each element of the $f_0$ vector.
- Update the current guess to be $f_0 \leftarrow f_0 - \eta D$, where $\eta$ is some suitable step size.
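To make the two candidate directions concrete, here's a sketch for an assumed toy integrand $L = \frac{1}{2}(f')^2$ (my choice, for illustration), where the Euler–Lagrange formula gives $\frac{\delta J}{\delta f} = \frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'} = -f''$ in closed form. One subtlety worth knowing up front: the finite-dimensional gradient of the discretized $J$ approximates $h \cdot \frac{\delta J}{\delta f}$, not the variational derivative itself, because the quadrature weight $h$ enters.

```python
import numpy as np

# Assumed toy integrand (for illustration): L = (1/2) f'^2,
# so the Euler-Lagrange formula gives  delta J / delta f = -f''.

n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f0 = np.sin(np.pi * x)                      # current guess

def J(f):
    fp = np.gradient(f, h)                  # finite-difference f'
    return 0.5 * np.sum(fp**2) * h          # Riemann-sum quadrature

# Method 1: variational derivative on the grid, delta J / delta f = -f''
D_var = -np.gradient(np.gradient(f0, h), h)

# Method 2: finite-dimensional gradient by perturbing each component
eps = 1e-6
D_grad = np.zeros(n)
for i in range(n):
    f_plus, f_minus = f0.copy(), f0.copy()
    f_plus[i] += eps
    f_minus[i] -= eps
    D_grad[i] = (J(f_plus) - J(f_minus)) / (2 * eps)

# Away from the boundary the two directions agree up to the quadrature
# weight h:  D_grad ≈ h * D_var
```

So both point the same way; they just differ by a mesh-dependent scaling (and by boundary-stencil effects near the endpoints), which matters when choosing the step size $\eta$.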
My question is, which of these two methods (if any) can accomplish this? Or is there a better way?
Note: when evaluating $J[f_0]$ numerically, I'd probably use something like spline interpolation to get $f'$, and a Riemann sum to compute the integral.
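Putting the pieces together, here's a minimal descent loop for an assumed toy problem (my choice): $J[f]=\int_0^1 \frac{1}{2}(f')^2\,dx$ with the endpoints pinned at $f(0)=0$, $f(1)=1$, so the minimizer is the straight line $f(x)=x$. It steps along the variational derivative, which costs $O(n)$ per iteration versus $O(n^2)$ for perturbing every component; for this $L$ the update is an explicit heat-equation step, so stability requires $\eta \le h^2/2$.

```python
import numpy as np

# Assumed toy problem: J[f] = ∫_0^1 (1/2) f'^2 dx with f(0)=0, f(1)=1 fixed.
# The continuous minimizer is the straight line f(x) = x.

n = 51
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.sin(0.5 * np.pi * x)        # initial guess matching the boundary values

eta = 0.4 * h**2                   # stability bound for this L: eta <= h^2 / 2
for _ in range(5000):
    D = np.zeros_like(f)
    # variational derivative: delta J / delta f = -f'' (second difference)
    D[1:-1] = -(f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2
    f -= eta * D                   # D is zero at the pinned endpoints
```

The severe step-size restriction is exactly why plain gradient descent on functionals like this converges slowly; it's the motivation for the Newton-type methods discussed in the answer below.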
Honestly, I don't know of any good books on how to do this well, but you're really working in the area of PDE-constrained optimization. Even though you don't have constraints, you're working with function spaces, and that community deals with this issue. If you do want a book, try Optimization with PDE Constraints by Hinze, Pinnau, Ulbrich, and Ulbrich.
Typically, the issue with optimizing in a Hilbert or Banach space is that you're going to have to discretize everything at some point. The question is whether to do that before or after deriving the optimality conditions and gradients. This is the infamous discretize-then-optimize versus optimize-then-discretize debate, which is discussed in chapter 3 of the above book.
As someone who has dealt with this a lot: just discretize your problem first and be done with it. There are some technical results saying this can lead to suboptimal convergence rates. At the end of the day, though, getting the other approach to work properly is difficult, and you'll constantly be adjusting the inner product to avoid errors in the gradient calculation. I've seen teams struggle with this for years. Just discretize and be happy. No, I don't care if my constants are mesh independent.
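As a concrete instance of discretize-first: sample $f$ on a grid, approximate $J$ with finite differences and a Riemann sum, and hand the resulting ordinary function of a vector to an off-the-shelf optimizer. The toy functional and boundary values below are my own assumptions for illustration; SciPy's L-BFGS-B does the rest.

```python
import numpy as np
from scipy.optimize import minimize

# Discretize-then-optimize sketch. Assumed toy functional:
# J[f] = ∫_0^1 (1/2) f'^2 dx with f(0)=0, f(1)=1, whose discrete
# minimizer is the straight line f(x) = x.

n = 41
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]

def J_disc(f_interior):
    f = np.concatenate(([0.0], f_interior, [1.0]))  # pin the boundary values
    fp = np.diff(f) / h                             # forward-difference f'
    return 0.5 * np.sum(fp**2) * h                  # Riemann sum

res = minimize(J_disc, np.zeros(n - 2), method="L-BFGS-B")
f_opt = np.concatenate(([0.0], res.x, [1.0]))
```

Note that only the interior values are optimized; baking the boundary conditions into `J_disc` is simpler than adding constraints.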
That said, the discretization approach really dictates how you should approach things. The vogue thing to do in this community is to use a finite element method, and that's fine. The way it works is to pick basis functions $f_1, \ldots, f_m \in C^{1}(\Omega)$ and define the operator $B:\mathbb{R}^m\rightarrow C^{1}(\Omega)$. Technically, you could probably get away with $L^2(\Omega)$, but don't make life difficult. Anyway, then we can define something like
$$ B \alpha = \sum\limits_{i=1}^m f_i \alpha_i $$
and
$$ B^\prime \alpha = \sum\limits_{i=1}^m f_i^\prime \alpha_i $$
Essentially, $B$ maps the finite element coefficients to the function space. Likely, you'll need the adjoint operator, which in this case is
$$ B^{*} f = \begin{bmatrix}\langle f,f_1\rangle\\\vdots\\\langle f,f_m\rangle\end{bmatrix} $$
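For concreteness, here's one possible choice of basis (my assumption; the construction above leaves the basis abstract): piecewise-linear "hat" functions on a uniform grid, with $B$ and $B^*$ realized on a quadrature grid.

```python
import numpy as np

# Assumed concrete basis: piecewise-linear "hat" functions on [a, b].
# B maps coefficients to (samples of) a function; B* maps a function to
# the vector of L2 inner products against the basis.

a, b, m = 0.0, 1.0, 11
nodes = np.linspace(a, b, m)
hloc = nodes[1] - nodes[0]

xq = np.linspace(a, b, 2001)                     # quadrature grid
w = (b - a) / (len(xq) - 1)                      # uniform quadrature weight

def hat(i, x):
    """i-th hat function: 1 at nodes[i], falling linearly to 0 at its neighbors."""
    return np.clip(1.0 - np.abs(x - nodes[i]) / hloc, 0.0, None)

Phi = np.array([hat(i, xq) for i in range(m)])   # basis sampled on the grid

def B(alpha):
    """B alpha = sum_i alpha_i f_i, evaluated on the quadrature grid."""
    return Phi.T @ alpha

def B_star(f_vals):
    """B* f = [<f, f_1>, ..., <f, f_m>]^T, by numerical quadrature."""
    return Phi @ (f_vals * w)
```

A quick sanity check on such a pair is the adjoint identity $\langle B\alpha, f\rangle = \langle \alpha, B^* f\rangle$, which holds here exactly at the discrete level because both sides use the same quadrature.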
Then, in your case, you'll optimize
$$ J(\alpha) = \int_a^b L(x,B\alpha,B^\prime\alpha)\, dx $$
At this point, you have a function on $\mathbb{R}^m$. You can use any gradient-based method, but given the problem size I'd recommend a matrix-free Newton method. There's a lot of nuance to it, but it scales appropriately. Something like Numerical Optimization by Nocedal and Wright has enough detail to get started.
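The point of "matrix-free" is that Newton-type solvers only ever need Hessian-vector products, never the assembled Hessian. A minimal sketch with SciPy's Newton-CG, on an assumed quadratic model $J(\alpha) = \frac12 \alpha^T H \alpha - b^T \alpha$ (stand-in for the discretized functional; the operator $H$ below is my own illustrative choice):

```python
import numpy as np
from scipy.optimize import minimize

# Matrix-free Newton sketch on an assumed quadratic model
# J(alpha) = (1/2) alpha^T H alpha - b^T alpha, with H = I + 0.1 K and
# K the 1D second-difference operator -- applied without forming a matrix.

m = 200
b_vec = np.ones(m)

def hessp(alpha, p):
    """Hessian-vector product H @ p, computed matrix-free."""
    Kp = 2.0 * p.copy()
    Kp[:-1] -= p[1:]
    Kp[1:] -= p[:-1]
    return p + 0.1 * Kp

def grad(alpha):
    return hessp(alpha, alpha) - b_vec    # H alpha - b (H is linear)

def J(alpha):
    return 0.5 * alpha @ hessp(alpha, alpha) - b_vec @ alpha

res = minimize(J, np.zeros(m), jac=grad, hessp=hessp, method="Newton-CG")
```

Only `hessp` touches the operator, so the same structure works when the products come from a finite element assembly loop or an adjoint solve rather than a closed-form stencil.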