Question
With two vectors $r(x(t), y(x(t), u(t)))$ and $u(t)$, what is the matrix of partial derivatives $$\frac{\partial r}{\partial u}\ =\ ?$$ when the total derivative is known as $$\frac{dr}{dt} = F(x(t)) (u^\ast - u(t))$$ where the RHS is a matrix vector product? By integration I obtain the vector function $r$ such that $r(t_1) = r(t_0) + \int_{t_0}^{t_1} F(x(t)) (u^\ast - u(t)) dt$.
For context, this is from a control problem. I want to feed $u(t)$ to the system using $$\frac{du}{dt} = - \left(\frac{\partial r}{\partial u}\right)^T Q r,$$ which is why I try to determine $\frac{\partial r}{\partial u}$.
Update: With $r \in \mathbb{R}^{n' \times 1}, u \in \mathbb{R}^{m \times 1}$ and a matrix function $F: \mathbb{R}^{n' \times 1} \rightarrow \mathbb{R}^{n' \times m}$, I will most likely have to form the derivative $\frac{\partial F}{\partial u}$ of a matrix w.r.t. a vector to answer this question. That would intermediately introduce tensors of higher order. Potentially, my question is related to the question asked here. The accepted answer suggests working with differentials to avoid higher order tensors, but I don’t know how to apply differentials to my question given the integral in the expression for $r$.
Details
Let me add more information on this question, which emerges from a control problem.
Background
I have a dynamical system $$\frac{d}{dt} \begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}F(x(t)) \ u^\ast\\F(x(t))\ u(t)\end{bmatrix}$$ with a state of size $\mathbb{R}^{n}$, an input vector $u(t)$ and unknown constant vector $u^\ast$ ($u, u^\ast \in \mathbb{R}^m$) and a function matrix $F(x(t)) \in \mathbb{R}^{(n/2) \times m}$. Starting from an initial guess $u(t_0) = u_0$ I want to improve $u(t)$ such that the vector $$r(t) = x(t) - y(t)$$ converges to zero for $t \rightarrow \infty$ ($r, x, y \in \mathbb{R}^{n/2}$).
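As a sanity check on these dynamics, here is a minimal forward-Euler sketch, assuming an illustrative $F(x)$ and made-up numbers (the matrix $F$, $u^\ast$, and the initial states are all hypothetical choices, not part of the problem statement). It verifies the immediate consequence that if $u(t) = u^\ast$, then $\frac{dr}{dt} = F(x)(u^\ast - u) = 0$ and $r$ stays at its initial value:

```python
def F(x):
    # hypothetical state-dependent matrix with n/2 = m = 2, just for illustration
    return [[1.0, x[0]], [x[1], 1.0]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def simulate(u, u_star, x0, y0, dt=1e-3, steps=5000):
    x, y = list(x0), list(y0)
    for _ in range(steps):
        Fx = F(x)
        dx = matvec(Fx, u_star)   # dx/dt = F(x) u*
        dy = matvec(Fx, u)        # dy/dt = F(x) u, same F(x(t)) in both rows
        x = [xi + dt * di for xi, di in zip(x, dx)]
        y = [yi + dt * di for yi, di in zip(y, dy)]
    return [xi - yi for xi, yi in zip(x, y)]   # r = x - y

u_star = [0.3, -0.2]
# With u(t) = u*, r(t) should remain at r(0) = [0.5, 0.5]
r = simulate(u_star, u_star, x0=[1.0, 0.5], y0=[0.5, 0.0])
```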
Aim: The input $u(t)$ is supposed to be an estimate of the unknown $u^\ast$. I try to find a control law $u(t)$ in the form of an ODE $\frac{du}{dt} = \dots$ that ensures the convergence of $r(t) \rightarrow 0$ starting with an initial guess $u(t_0) = u_0$ for $u^\ast$. I am looking for a representation $\frac{du}{dt} = \dots$ such that I can feed $u(t) = u_0 + \int_{t_0}^t \frac{du}{d\tau} d\tau$ into the system.
My attempt
In order to minimize $r(t)$ I define a loss function $$\mathcal{L}(r(t)) = \frac{1}{2} r(t)^T Q r(t)$$ with constant symmetric positive definite $Q > 0$ and $r, x, y \in \mathbb{R}^{n/2}$. From the system dynamics I know that $r$ evolves over time: $$\frac{dr}{dt} = F(x(t)) (u^\ast - u(t))$$ with $u^\ast, u(t) \in \mathbb{R}^m$, unknown constant $u^\ast$ and a function matrix $F(x(t)): \mathbb{R}^{n/2} \rightarrow \mathbb{R}^{(n/2) \times m}$.
In order to get convergence of $r(t) \rightarrow 0$ over time, I want to update $u(t)$ using the gradient flow of the loss function: $$\frac{du}{dt} = - \nabla_u \mathcal{L}.$$
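To illustrate the gradient-flow idea in isolation, here is a toy sketch with a *static* quadratic loss $\mathcal{L}(u) = \frac{1}{2}(u-a)^T Q (u-a)$, where $a$ and the diagonal $Q$ are made-up values (in the actual problem $\mathcal{L}$ depends on $u$ only through $r$, which is exactly the difficulty raised below):

```python
def grad_L(u, a, Q_diag):
    # nabla_u L = Q (u - a) for diagonal positive definite Q
    return [q * (ui - ai) for q, ui, ai in zip(Q_diag, u, a)]

a = [1.0, -2.0]       # minimizer of the toy loss (illustrative)
Q_diag = [2.0, 1.0]   # diagonal of a positive definite Q (illustrative)
u = [0.0, 0.0]        # initial guess u(t0)
dt = 1e-2
for _ in range(2000):  # Euler integration of du/dt = -grad_u L
    g = grad_L(u, a, Q_diag)
    u = [ui - dt * gi for ui, gi in zip(u, g)]
# u converges to the minimizer a
```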
When trying to calculate the (transposed) gradient of $\mathcal{L}$ w.r.t. $u$ $$\left(\nabla_u \mathcal{L}\right)^T = \frac{\partial \mathcal{L}}{\partial u} = r^T Q \frac{\partial r}{\partial u},$$ I encounter the problem that I only know $\frac{dr}{dt}$ and not $r$ as a function $r(u)$ itself. So I am not sure how to calculate the partial derivatives $\frac{\partial r}{\partial u}$: $$\frac{\partial r}{\partial u} \stackrel{?}{=} \frac{\partial}{\partial u} \int_{t_0}^{t_1} F(x(t)) (u^\ast - u(t)) dt \stackrel{?}{=} - \int_{t_0}^{t_1} F(x(t)) \frac{\partial}{\partial u} u(t) dt \stackrel{?}{=} - \int_{t_0}^{t_1} F(x(t)) dt\ =\ ?$$
I am not sure what the last integral evaluates to. Maybe $F(x(t))$? If so, how? Or are there more intelligent ways to obtain $\frac{\partial r}{\partial u}$, or even to achieve $r(t) \rightarrow 0$? I would be grateful for any help.
The final step is then to get the control law $$\frac{du}{dt} = - \nabla_u \mathcal{L} = - \left(\frac{\partial r}{\partial u}\right)^T Q r,$$ for which I need to know $\frac{\partial r}{\partial u}$.
Last remarks
Unfortunately, in CS we often use mathematics quite sloppily, but I am now trying to teach myself to be mathematically rigorous. Is there anything I can change to make my approach more mathematically precise? All suggestions are welcome!
Answer
I assume you are mainly interested in driving $r(t)$ to zero, and not necessarily in your intended solution exactly. One can still use $\mathcal{L}(r)$ as a loss function, or, as it is more commonly called in control theory, a Lyapunov function. For this one wants $\dot{\mathcal{L}}$ (shorthand for the time derivative of $\mathcal{L}(r)$) to be negative definite, or negative semi-definite together with some other argument showing that $\mathcal{L}(r)$ eventually goes to zero. Evaluating $\dot{\mathcal{L}}$ yields
$$ \dot{\mathcal{L}} = r^\top\!(t)\,Q\,F(x(t))\,(u^* - u(t)), \tag{1} $$
assuming that $Q$ is a symmetric matrix ($Q = Q^\top$). Additionally, $\mathcal{L}(r)$ is only really a sensible Lyapunov function if $Q$ is also positive definite.
An initial guess for $u(t)$, which might make $(1)$ at least negative semi-definite, could be to use
\begin{align} u(t) &= -\nabla_u \dot{\mathcal{L}}, \\ &= F^\top\!(x(t))\,Q\,r(t), \end{align}
or one could generalize it a little more by adding a positive "gain"
$$ u(t) = \Gamma\,F^\top\!(x(t))\,Q\,r(t), \tag{2} $$
with $\Gamma$ a positive definite matrix in $\mathbb{R}^{m \times m}$. Note that choosing $\Gamma$ as the identity matrix recovers the initial expression.
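A closed-loop sketch of control law $(2)$ under strong simplifying assumptions, namely $F(x) = I$, $Q = I$, $\Gamma = I$ and a made-up constant $u^*$ (none of these choices come from the original problem). With a nonzero constant $u^*$, $r$ converges to $u^*$ rather than to zero, which matches the boundedness (but not convergence) argument below:

```python
def step(r, u_star, dt):
    # u = Gamma F^T Q r reduces to u = r under the assumptions above
    u = r
    dr = [us - ui for us, ui in zip(u_star, u)]   # dr/dt = F (u* - u)
    return [ri + dt * di for ri, di in zip(r, dr)]

u_star = [0.3, -0.2]   # illustrative constant input
r = [2.0, -1.0]        # illustrative initial error
dt = 1e-3
for _ in range(20000):  # integrate to t = 20; r approaches u*
    r = step(r, u_star, dt)
L = 0.5 * sum(ri * ri for ri in r)  # L = 0.5 r^T Q r, bounded but nonzero
```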
Plugging $(2)$ into $(1)$ and, for shorter notation, dropping the explicit time dependence of the variables yields
$$ \dot{\mathcal{L}} = r^\top Q\,F(x)\,u^* - r^\top Q\,F(x)\,\Gamma\,F^\top\!(x)\,Q\,r, \tag{3} $$
whose second term is negative semi-definite, since the matrix $Q\,F(x)\,\Gamma\,F^\top\!(x)\,Q$ is positive semi-definite. The first term of $(3)$ scales only linearly in $r$, but the second, negative semi-definite, term scales quadratically. Therefore, when using $(2)$ and assuming that $u^*$ is bounded, it can be shown that $\mathcal{L}(r)$ remains bounded, provided this positive semi-definite matrix satisfies a certain condition. Such a condition could, for example, be something similar to persistence of excitation: for every $t$ one can find $\delta,\alpha_0,\alpha_1 > 0$ such that
$$ \alpha_1\,I \geq \int_t^{t+\delta} Q\,F(x(\tau))\,\Gamma\,F^\top\!(x(\tau))\,Q\,d\tau \geq \alpha_0\,I, \tag{4} $$
with $I$ the identity matrix in $\mathbb{R}^{(n/2) \times (n/2)}$, matching the dimension of $Q$.
However, one can do better than a bounded $\mathcal{L}$, and thus bounded $r$, by also estimating $u^*$ and adding the estimate to $(2)$, thus instead using
$$ u(t) = \Gamma\,F^\top\!(x(t))\,Q\,r(t) + \hat{u}^*\!(t), \tag{5} $$
so that $u^* - u(t) = (u^* - \hat{u}^*\!(t)) - \Gamma\,F^\top\!(x(t))\,Q\,r(t)$ cancels $u^*$ as the estimate converges,
with $\hat{u}^*\!(t)$ the estimate of $u^*$. For this I assume that $x(t)$ and $F(x(t))$ are known. However, one needs $\dot{x}(t)$ and $F(x(t))$ to relate them directly to $u^*$, and it is usually not feasible to differentiate $x(t)$ directly. One can nevertheless obtain such a direct linear relationship for $u^*$ with low-pass and high-pass filters, for example with the transfer functions with bandwidth $\omega$
$$ L(s) = \frac{\omega}{s + \omega}, \quad H(s) = \frac{s}{s + \omega}. $$
Namely, such a first-order high-pass filter is mathematically equivalent to differentiating the output of the low-pass filter with respect to time (up to division by the gain $\omega$). By using the initial differential equation
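This equivalence can be checked with a small discrete sketch. It realizes the low-pass filter as the state $z$ with $\dot{z} = \omega(x - z)$ and uses the identity $H(s) = s/(s+\omega) = 1 - L(s)$ for the high-pass output; the input signal and $\omega$ are arbitrary choices:

```python
import math

w = 2.0      # filter bandwidth (arbitrary)
dt = 1e-3
z = 0.0      # low-pass state: dz/dt = w * (x - z)
max_err = 0.0
for k in range(10000):
    x = math.sin(0.01 * k)   # arbitrary input signal
    dz = w * (x - z)         # low-pass dynamics
    hp = x - z               # high-pass output via H = 1 - L
    # high-pass output should equal (1/w) * d/dt(low-pass output)
    max_err = max(max_err, abs(hp - dz / w))
    z += dt * dz
```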
$$ \dot{x}(t) = F(x(t))\,u^* $$
and applying the filter $1/(s + \omega)$ (the low-pass filter scaled by $1/\omega$) to both sides thus yields
$$ \frac{s}{s + \omega} x(t) = \frac{1}{s + \omega} \left(F(x(t))\,u^*\right), $$
where multiplication by a transfer function means that the time-domain signal is filtered by it. Transfer functions are linear and time-invariant, which allows one to factor out the constant $u^*$. This gives the following direct linear relationship for $u^*$:
\begin{align} z &= \frac{s}{s + \omega} x(t), \\ \phi^\top &= \frac{1}{s + \omega} F(x(t)), \\ z &= \phi^\top u^*. \end{align}
Here I used $z$ and $\phi$ to match the notation used in Ioannou, P.A. and Sun, J. (2012), Robust Adaptive Control, Courier Corporation, and Ioannou, P. and Fidan, B. (2006), Adaptive Control Tutorial, Society for Industrial and Applied Mathematics.
Using such a formulation it is possible to obtain a $\hat{u}^*\!(t)$ that converges to $u^*$, for example using recursive least squares. Such estimators also require certain conditions to hold in order to guarantee convergence, similar to the persistence of excitation mentioned previously.
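A minimal recursive-least-squares sketch for estimating $u^*$ from the relation $z = \phi^\top u^*$; the regressor sequence here is synthetic and chosen to be persistently exciting, and all numerical values are illustrative:

```python
import math

u_star = [0.3, -0.2]               # "true" constant, to be recovered
theta = [0.0, 0.0]                 # estimate of u*
P = [[100.0, 0.0], [0.0, 100.0]]   # large initial covariance

for k in range(200):
    phi = [math.sin(0.1 * k), math.cos(0.3 * k)]  # exciting regressor
    z = sum(p * u for p, u in zip(phi, u_star))   # measurement z = phi^T u*
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    K = [Pphi[0] / denom, Pphi[1] / denom]        # RLS gain
    err = z - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
    # covariance update P <- P - K (phi^T P), forgetting factor 1
    P = [[P[0][0] - K[0] * Pphi[0], P[0][1] - K[0] * Pphi[1]],
         [P[1][0] - K[1] * Pphi[0], P[1][1] - K[1] * Pphi[1]]]
# theta converges to u_star on this noiseless, exciting data
```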
Once one has shown that $\hat{u}^*\!(t)$ converges to $u^*$ and that $(4)$ (or something similar) is satisfied, one should be able to show that $(5)$ drives $\mathcal{L}(r)$ to zero. I did gloss over some of the steps, so to prove this in a completely rigorous way one should work through them with a little more attention to detail. However, I hope this gives you some insight into how one might choose a $u(t)$ that drives $r(t)$ to zero, and how one might prove that it does. Note also that the high- and low-pass filters add internal states to the controller, but I suspect this can be reduced.