Gradient of a function of several variables


I consider the function $q : \mathbb{R}^p\to\mathbb{R}$ defined by $q(b) = b^{t}Ab$, where $A\in\mathcal{M}_{p}(\mathbb{R})$, and I would like to compute its gradient. For this I consider the partial derivative

$$ \frac{\partial q}{\partial b_i} = \lim_{h\to 0}\frac{q(b+he_i) - q(b)}{h} = \lim_{h\to 0}\frac{h(e_i^{t}Ab + b^{t}Ae_{i}) + h^2q(e_i)}{h} $$

where $e_i$ is the $i$-th vector of the canonical basis of $\mathbb{R}^p$ and $h\in\mathbb{R}$. The term $hq(e_i)$ goes to zero as $h\to 0$, which leaves this term:

$$ \frac{h(e_i^{t}Ab + b^{t}Ae_{i})}{h} =e_i^{t}Ab + b^{t}Ae_{i} $$

The conclusion is almost here, but I would like to find a more readable form for this term. I have tried to use the transpose, but I cannot see a better way to write it. Do you have any ideas? If I found such a form, the conclusion will be

Thank you very much!

Best answer

To continue from where you left off, $$\mathbf{b}^\intercal A \mathbf{e}_i=\left(\mathbf{b}^\intercal A \mathbf{e}_i\right)^\intercal=\mathbf{e}_i^\intercal A^\intercal \mathbf{b}$$ because $\mathbf{b}^\intercal A \mathbf{e}_i$ is just a scalar (a $1\times 1$ matrix), and the transpose of a scalar is itself. Therefore,

$$\mathbf{e}_i^\intercal A \mathbf{b}+\mathbf{b}^\intercal A \mathbf{e}_i=\mathbf{e}_i^\intercal A \mathbf{b}+\mathbf{e}_i^\intercal A^\intercal \mathbf{b}=\mathbf{e}_i^\intercal \left(A+A^\intercal \right)\mathbf{b}$$

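The right-hand side is the $i$-th component of $\left(A+A^\intercal \right)\mathbf{b}$, and this holds for every $i$. So, $$\nabla q(\mathbf{b})=\left(A+A^\intercal \right)\mathbf{b}$$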
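As a quick sanity check, here is a minimal sketch that compares $\left(A+A^\intercal\right)\mathbf{b}$ with a finite-difference approximation of the partial derivatives. The helper names (`quad_form`, `grad_formula`, `grad_numeric`), the random test matrix, and the step size `h` are illustrative assumptions, not part of the question.

```python
import random

def quad_form(A, b):
    """Return q(b) = b^T A b for a square matrix A given as a list of rows."""
    n = len(b)
    return sum(b[i] * A[i][j] * b[j] for i in range(n) for j in range(n))

def grad_formula(A, b):
    """Return (A + A^T) b, the closed-form gradient derived above."""
    n = len(b)
    return [sum((A[i][j] + A[j][i]) * b[j] for j in range(n)) for i in range(n)]

def grad_numeric(A, b, h=1e-6):
    """Approximate each partial derivative with a central difference along e_i."""
    grad = []
    for i in range(len(b)):
        plus = b[:i] + [b[i] + h] + b[i + 1:]
        minus = b[:i] + [b[i] - h] + b[i + 1:]
        grad.append((quad_form(A, plus) - quad_form(A, minus)) / (2 * h))
    return grad

# Hypothetical test data: a random matrix, which is not symmetric in general.
random.seed(0)
p = 4
A = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(p)]
b = [random.uniform(-1, 1) for _ in range(p)]

exact = grad_formula(A, b)
approx = grad_numeric(A, b)
max_error = max(abs(e - a) for e, a in zip(exact, approx))
print("largest componentwise difference:", max_error)
```

The printed difference should be tiny, limited only by floating-point rounding, because a central difference has no truncation error on a quadratic function.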

If instead we defined $q(\mathbf{b})=\frac{1}{2}\mathbf{b}^\intercal A \mathbf{b}$, we would get $$\nabla q(\mathbf{b})=\left(\frac{A+A^\intercal}{2}\right)\mathbf{b}$$

This should look familiar: $\frac{A+A^\intercal}{2}$ is the symmetric part of $A$. In fact, if $A$ is symmetric, then $\frac{A+A^\intercal}{2}=A$ and thus $$\nabla q(\mathbf{b})=A\mathbf{b}$$

This suggests that the value of the quadratic form depends only on the symmetric part of $A$, which actually turns out to be true. This is because if we write $A=B+C$ where $B=\frac{A+A^\intercal}{2}$ is symmetric and $C=\frac{A-A^\intercal}{2}$ is antisymmetric, $$\mathbf{x}^\intercal C\mathbf{x}=\left(\mathbf{x}^\intercal C\mathbf{x}\right)^\intercal=\mathbf{x}^\intercal C^\intercal\mathbf{x}=-\mathbf{x}^\intercal C\mathbf{x}$$ which implies $\mathbf{x}^\intercal C\mathbf{x}=0$, meaning $$\mathbf{x}^\intercal A\mathbf{x}=\mathbf{x}^\intercal B\mathbf{x}+\mathbf{x}^\intercal C\mathbf{x}=\mathbf{x}^\intercal B\mathbf{x}$$
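The same claim can be checked numerically. The sketch below assumes the standard split into symmetric part $(A+A^\intercal)/2$ and antisymmetric part $(A-A^\intercal)/2$; the helper `quad_form` and the random test data are hypothetical and only serve as an illustration.

```python
import random

def quad_form(M, x):
    """Return x^T M x for a square matrix M given as a list of rows."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical test data: a random (generally non-symmetric) matrix and vector.
random.seed(1)
p = 3
A = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(p)]
x = [random.uniform(-1, 1) for _ in range(p)]

# Symmetric part B and antisymmetric part C, so that A = B + C.
B = [[(A[i][j] + A[j][i]) / 2 for j in range(p)] for i in range(p)]
C = [[(A[i][j] - A[j][i]) / 2 for j in range(p)] for i in range(p)]

antisym_value = quad_form(C, x)                   # expected to vanish up to rounding
gap = abs(quad_form(A, x) - quad_form(B, x))      # expected to vanish up to rounding
print("x^T C x =", antisym_value)
print("|x^T A x - x^T B x| =", gap)
```

Both printed values should be zero up to floating-point rounding, which matches the argument that the antisymmetric part contributes nothing to the quadratic form.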