Trivial question on derivative of quadratic form of vector-valued function

This seems like a trivial question but I am currently stuck and cannot see what I am doing wrong.

So let us consider a function $f : \mathbb{R}^d \rightarrow \mathbb{R}^d$.

I want to compute the derivative w.r.t. $x \in \mathbb{R}^d$ of an expression that contains a quadratic form of $f(x)$

$$I = f(x)^{\top} C f(x) . $$

Here $C$ is a $d\times d$ matrix.

Taking the derivative w.r.t. the vector $x$, I get

$$ \frac{\partial I}{\partial x} = 2C f(x) \cdot \nabla f(x), $$ where $\nabla f(x)$ denotes the Jacobian of $f$ which will be a $d \times d$ matrix.

Now my problem is that the dimensions of the matrices in the last expression do not match: We have

  • $C: d\times d$,
  • $f(x): d\times 1$, and
  • $\nabla f(x): d \times d$.

So the dimensions of the last two factors do not match. What am I doing wrong? Is the correct derivative $$ \frac{\partial I}{\partial x} = \nabla f(x) \, 2 C f(x) , $$ or $$ \frac{\partial I}{\partial x} = ( 2 C f(x) )^{\top} \cdot \nabla f(x) \, ? $$

Best answer

I found a related answer here: https://math.stackexchange.com/a/3128040/527323

Given a differentiable vector field $\mathrm f : \mathbb R^d \to \mathbb R^d$ and a matrix $\mathrm C \in \mathbb R^{d \times d}$, let function $F : \mathbb R^d \to \mathbb R$ be defined by

$$F (\mathrm x) := \langle \mathrm f (\mathrm x), \mathrm C \mathrm f (\mathrm x) \rangle$$

whose directional derivative in the direction of $\mathrm y \in \mathbb R^d$ at $\mathrm x \in \mathbb R^d$ is

$$D_{\mathrm y} F (\mathrm x) := \lim_{h \to 0} \frac{F (\mathrm x + h \mathrm y) - F (\mathrm x)}{h} = \cdots = \langle \mathrm y, \mathrm J_{\mathrm f}^\top (\mathrm x) \, \mathrm C \, \mathrm f (\mathrm x) \rangle + \langle \mathrm J_{\mathrm f}^\top (\mathrm x) \, \mathrm C^\top \mathrm f (\mathrm x) , \mathrm y \rangle$$

where matrix $\mathrm J_{\mathrm f} (\mathrm x)$ is the Jacobian of vector field $\mathrm f$ at $\mathrm x \in \mathbb R^d$.
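
For readers who want the steps hidden behind the $\cdots$, here is a sketch. It only assumes that $\mathrm f$ is differentiable at $\mathrm x$, so that $\mathrm f (\mathrm x + h \mathrm y) = \mathrm f (\mathrm x) + h \, \mathrm J_{\mathrm f} (\mathrm x) \, \mathrm y + o(h)$. Writing $\mathrm f$ and $\mathrm J_{\mathrm f}$ for $\mathrm f (\mathrm x)$ and $\mathrm J_{\mathrm f} (\mathrm x)$:

$$\begin{aligned} F (\mathrm x + h \mathrm y) &= \langle \mathrm f + h \, \mathrm J_{\mathrm f} \mathrm y, \, \mathrm C \left( \mathrm f + h \, \mathrm J_{\mathrm f} \mathrm y \right) \rangle + o(h) \\ &= F (\mathrm x) + h \, \langle \mathrm J_{\mathrm f} \mathrm y, \mathrm C \, \mathrm f \rangle + h \, \langle \mathrm f, \mathrm C \, \mathrm J_{\mathrm f} \mathrm y \rangle + o(h) \\ &= F (\mathrm x) + h \, \langle \mathrm y, \mathrm J_{\mathrm f}^\top \mathrm C \, \mathrm f \rangle + h \, \langle \mathrm J_{\mathrm f}^\top \mathrm C^\top \mathrm f, \mathrm y \rangle + o(h) \end{aligned}$$

Subtracting $F (\mathrm x)$, dividing by $h$, and letting $h \to 0$ leaves exactly the two inner products above. The last step uses only $\langle \mathrm A \mathrm u, \mathrm v \rangle = \langle \mathrm u, \mathrm A^\top \mathrm v \rangle$.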

Thus, the gradient of $F$ is

$$\nabla_{\mathrm x} F (\mathrm x) = \mathrm J_{\mathrm f}^\top (\mathrm x) \left( \mathrm C + \mathrm C^\top \right) \mathrm f (\mathrm x)$$
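
If it helps, the formula can be checked numerically. The snippet below is only a minimal sketch in Python with NumPy. The vector field `f`, its hand-coded Jacobian `jacobian_f`, and the helper names `F`, `grad_F`, and `grad_F_fd` are all made up for illustration and are not part of the question. It compares $\mathrm J_{\mathrm f}^\top (\mathrm C + \mathrm C^\top) \mathrm f$ against central finite differences for a random, non-symmetric $\mathrm C$ with $d = 3$.

```python
import numpy as np

def f(x):
    # Hypothetical smooth vector field R^3 -> R^3, chosen only for illustration.
    return np.array([np.sin(x[0]) + x[1] ** 2,
                     x[0] * x[2],
                     np.exp(x[1]) - x[2]])

def jacobian_f(x):
    # Analytic Jacobian of f: entry (i, j) is d f_i / d x_j.
    return np.array([[np.cos(x[0]), 2.0 * x[1], 0.0],
                     [x[2], 0.0, x[0]],
                     [0.0, np.exp(x[1]), -1.0]])

def F(x, C):
    # Quadratic form F(x) = f(x)^T C f(x), a scalar.
    fx = f(x)
    return fx @ C @ fx

def grad_F(x, C):
    # Gradient from the answer: J_f(x)^T (C + C^T) f(x), a vector of length d.
    return jacobian_f(x).T @ (C + C.T) @ f(x)

def grad_F_fd(x, C, h=1e-6):
    # Central finite-difference approximation of the gradient, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (F(x + e, C) - F(x - e, C)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
C = rng.standard_normal((3, 3))   # deliberately not symmetric
x = rng.standard_normal(3)

analytic = grad_F(x, C)
numeric = grad_F_fd(x, C)
print("analytic:", analytic)
print("numeric: ", numeric)
print("agree:", np.allclose(analytic, numeric, atol=1e-5))
```

Regarding the dimensions in the question: when $\mathrm C$ is symmetric the gradient reduces to $2 \, \mathrm J_{\mathrm f}^\top (\mathrm x) \, \mathrm C \, \mathrm f (\mathrm x)$, which is $(d \times d)(d \times d)(d \times 1) = d \times 1$. The row-vector candidate $(2 \mathrm C \mathrm f (\mathrm x))^\top \mathrm J_{\mathrm f} (\mathrm x)$ is the transpose of this, so for symmetric $\mathrm C$ the two agree up to the choice of row versus column convention. The missing piece in the original attempt is the transpose on the Jacobian.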