Derivative of vector/matrix


I have to calculate the gradient of:

$ F(x) = 1/2 * x^T * H *x$

where $H$ is a constant symmetric $n \times n$ matrix and $x$ is an $n \times 1$ vector. My question: do I have to multiply out the expression to get a scalar, or do I need to use a chain rule? If I multiply it out (I did it with a simpler 3x3 matrix), the result is $\operatorname{grad} F(x) = H*x$, which should be correct.

I need to get used to this kind of vector/matrix differentiation, but I can't always multiply things out (as I did for the simpler example). So: is there a better way to do such things :)? Thanks for your help.


There are 2 best solutions below

0
On

You have two ways to tackle the problem:

  1. Immediately go to the vector coordinates and compute the partial derivatives.
  2. Stay at the vector level and compute the Fréchet derivative. From there you can retrieve the partial derivatives if you want them.
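For comparison, here is a sketch of what the first route looks like for this particular $F$ (the index names $j,k$ are just my choice of notation). Writing the quadratic form in coordinates, $$F(x)=\frac{1}{2}\sum_{j=1}^n\sum_{k=1}^n H_{jk}\,x_j x_k,$$ the product rule applied to each term $x_j x_k$ gives $$\frac{\partial F}{\partial x_i}(x)=\frac{1}{2}\sum_{k=1}^n H_{ik}\,x_k+\frac{1}{2}\sum_{j=1}^n H_{ji}\,x_j=\sum_{j=1}^n H_{ij}\,x_j=(H*x)_i,$$ where the second equality uses the symmetry $H_{ji}=H_{ij}$.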

Personally, I prefer the second technique because (1) it avoids the confusion of juggling many indices and (2) it applies in any normed vector space.

I continue with the second technique from here.

The goal is to use the chain rule with the appropriate maps. Consider the bilinear map $$\begin{array}{l|rcl} B : & V \times V & \longrightarrow & \mathbb R \\ & (u,v) & \longmapsto & u^T * H *v \end{array}$$ where $V = \mathbb R^n$. The derivative of $B$ at the point $(u,v)$ is the linear map $$B^\prime(u,v).(h,k)=h^T * H * v + u^T * H * k.$$

Now the important point is to notice that $$F(x)=\frac{1}{2}B(x,x).$$ Hence, applying the chain rule to the composition of $x \mapsto (x,x)$ with $B$, you get $$F^\prime(x).h=\frac{1}{2}h^T * H *x + \frac{1}{2}x^T * H *h,$$ which is equal to $$\color{red}{F^\prime(x).h=x^T*H* h}$$ because $H$ is assumed symmetric: $h^T*H*x$ is a scalar, so it equals its transpose $x^T*H^T*h=x^T*H*h$. Since $x^T*H*h=(H*x)^T*h$, the gradient is therefore $\nabla F(x)=H*x$.

From there, you can retrieve the partial derivatives. Denote by $(e_1, \dots, e_n)$ the canonical basis of $\mathbb R^n$. You have $$\color{red}{\frac{\partial F}{\partial x_i}(x)=F^\prime(x).e_i=\nabla F(x)^T*e_i=x^T*(H*e_i)=\sum_{j=1}^n x_j H_{ji}}$$
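If you want to convince yourself numerically, the formulas above can be checked with a small finite-difference sketch. This is only an illustration: NumPy, the dimension `n = 4`, the random symmetric `H`, and the helper name `directional_derivative` are all my own assumptions, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # illustrative dimension (assumption)
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                       # symmetrize to match the hypothesis on H
x = rng.standard_normal(n)
h = rng.standard_normal(n)

def F(v):
    """F(v) = 1/2 * v^T * H * v."""
    return 0.5 * v @ H @ v

def directional_derivative(v, direction, t=1e-6):
    """Central-difference approximation of F'(v).direction (hypothetical helper)."""
    return (F(v + t * direction) - F(v - t * direction)) / (2 * t)

# F'(x).h compared with x^T * H * h
numeric_dir = directional_derivative(x, h)
exact_dir = x @ H @ h

# F'(x).e_i compared with the i-th component of H * x
basis = np.eye(n)
numeric_partials = np.array(
    [directional_derivative(x, basis[i]) for i in range(n)]
)
exact_partials = H @ x

print(np.isclose(numeric_dir, exact_dir, atol=1e-6))
print(np.allclose(numeric_partials, exact_partials, atol=1e-6))
```

Because $F$ is quadratic, the central difference has no truncation error, so any mismatch comes from floating-point rounding only.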

3
On

You should always identify where your objects live when you deal with differentiation problems. Here, taking $n=3$ as in your example, we have $$\begin{array}{ccccc} F & : & \mathbb{R}^3 & \longrightarrow & \mathbb{R}\\ & & x & \longmapsto & \frac{1}{2}x^THx \end{array}.$$ So multiplication here is matrix multiplication. Now, $\mathbb{R}^3$ is a pre-Hilbert space: it is equipped with the dot product $\left(x,y\right)\mapsto\langle x,y\rangle:=x^Ty$ for all $x,y\in\mathbb{R}^3$, so we can rewrite $$F(x)=\frac{1}{2}\langle x,Hx\rangle.$$ Now, Leibniz's rule applies to multilinear maps in general; in particular, $\langle.,.\rangle$ is a bilinear form on $\mathbb{R}^3$, and thus the differential of the map $x\mapsto \langle x,Hx\rangle$ at $x$ is the (bounded linear) map $$\xi\longmapsto \langle \xi,Hx\rangle+\langle x,H\xi\rangle.$$ If you do not know this formula for general multilinear maps, you can easily derive it in your case: $$\langle x+\xi,H(x+\xi)\rangle=\langle x,Hx\rangle+\langle \xi,Hx\rangle+\langle x,H\xi\rangle+\langle \xi,H\xi\rangle$$ and $|\langle \xi,H\xi\rangle|\leq 3\max_{1\leq i,j\leq3}|H_{ij}|\,|\xi|^2$, so the last term is $o(|\xi|)$ and does not contribute to the differential.
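The expansion above says that, once the linear part is subtracted, what is left is exactly the quadratic term $\langle \xi,H\xi\rangle$. A minimal numerical sketch of that statement follows; NumPy, the random data, and the names `g`, `dg` and `checks` are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((3, 3))         # any 3x3 matrix works for this expansion
x = rng.standard_normal(3)
xi = rng.standard_normal(3)

def g(v):
    """g(v) = <v, H v>."""
    return v @ H @ v

def dg(point, direction):
    """Candidate differential of g at `point` applied to `direction`."""
    return direction @ H @ point + point @ H @ direction

checks = []
for t in (1e-1, 1e-2, 1e-3):
    step = t * xi
    remainder = g(x + step) - g(x) - dg(x, step)
    quadratic = step @ H @ step         # <step, H step>
    checks.append((remainder, quadratic))
    # remainder / |step| shrinks proportionally to t
    print(t, remainder, quadratic, abs(remainder) / np.linalg.norm(step))
```

The remainder and the quadratic term agree up to rounding, and the last printed column shrinks with `t`, which is what it means for the remainder to be negligible compared with $|\xi|$.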

Finally, as you talk about the gradient in your question, remember that the gradient is defined in general as the unique vector $\nabla F(x)\in\mathbb{R}^3$ such that $$\mathrm{d}F(x)[\xi]=\langle\nabla F(x),\xi\rangle\quad\quad\quad\forall\xi\in\mathbb{R}^3.$$ Hence, as $\langle.,.\rangle$ is bilinear and symmetric, and as $\langle x,Hy\rangle=\langle H^Tx,y\rangle$ for all $x,y\in\mathbb{R}^3$, you can write $$\langle\nabla F(x),\xi\rangle=\mathrm{d}F(x)[\xi]=\frac{1}{2}\langle \xi,Hx\rangle+\frac{1}{2}\langle x,H\xi\rangle=\frac{1}{2}\langle Hx,\xi\rangle+\frac{1}{2}\langle H^Tx,\xi\rangle$$ $$=\langle \frac{1}{2}(H+H^T)x,\xi\rangle=\langle Hx,\xi\rangle\quad\quad\quad\forall\xi\in\mathbb{R}^3$$ since $H$ is assumed symmetric here, whence $$\nabla F(x)=Hx\quad\quad\quad\forall x\in\mathbb{R}^3.$$
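Note that the computation above gives $\frac{1}{2}(H+H^T)x$ before symmetry is used, so that is the gradient for a general $H$. Here is a small sketch that compares both formulas with a finite-difference gradient; NumPy, the random matrices, and the helper `numerical_gradient` are assumptions of mine for illustration only.

```python
import numpy as np

def numerical_gradient(f, x, t=1e-6):
    """Central-difference gradient of a scalar function f at x (hypothetical helper)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        grad[i] = (f(x + t * e) - f(x - t * e)) / (2 * t)
    return grad

rng = np.random.default_rng(2)
H = rng.standard_normal((3, 3))         # deliberately NOT symmetric
x = rng.standard_normal(3)

def F(v):
    """F(v) = 1/2 <v, H v> with the general H."""
    return 0.5 * v @ H @ v

# General case: gradient is 1/2 (H + H^T) x
grad_general = 0.5 * (H + H.T) @ x
grad_numeric = numerical_gradient(F, x)

# Symmetric case: the same formula collapses to H_sym x
H_sym = 0.5 * (H + H.T)

def F_sym(v):
    """F_sym(v) = 1/2 <v, H_sym v> with a symmetric matrix."""
    return 0.5 * v @ H_sym @ v

grad_sym_numeric = numerical_gradient(F_sym, x)

print(np.allclose(grad_numeric, grad_general, atol=1e-6))
print(np.allclose(grad_sym_numeric, H_sym @ x, atol=1e-6))
```

For a non-symmetric $H$, the plain product $Hx$ generally differs from the gradient, which is why the symmetry hypothesis matters in the last step.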