Gradient and Hessian of the product of two quadratic forms


Let $Q, R \in \mathbb{R}^{n\times n}$ with $Q, R \succ 0$, and let $g : \mathbb{R}^{n} \to \mathbb{R}$ be defined by

$$g\left(\boldsymbol{x}\right)=\left(\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}\right)\left(\frac{1}{2}\boldsymbol{x}^{T}R\boldsymbol{x}\right)$$

I want to find the gradient and the Hessian of $g\left(\boldsymbol{x}\right)$.


What I tried so far

I tried to find the gradient and the Hessian using the usual derivative rules and got the following:

[image of the original hand calculation]

But while checking my work, I noticed that the second element of the last row of my Hessian calculation contains the term $2R\boldsymbol{x}\cdot Q\boldsymbol{x}$, a column vector times another column vector, which is obviously a mistake. In my calculation I applied the product rule to gradients, but I am not sure that rule is valid in matrix calculus. So, how can I calculate the gradient and the Hessian of $g\left(\boldsymbol{x}\right)$?



I would suggest the following: $g(x)$ can be rewritten as $$ g(x)=\left(\frac{1}{2}\sum_{i,j=1}^nQ_{ij}x_ix_j\right)\left(\frac{1}{2}\sum_{i,j=1}^nR_{ij}x_ix_j\right) $$ where $Q_{ij}$ and $R_{ij}$ are the entries of the matrices $Q$ and $R$. Now you can proceed much as you did, taking the derivative with respect to $x_k$, $k=1,\ldots,n$: $$ \frac{\partial g}{\partial x_k}=\left(\frac{1}{2}\frac{\partial}{\partial x_k}\sum_{i,j=1}^nQ_{ij}x_ix_j\right)\left(\frac{1}{2}\sum_{i,j=1}^nR_{ij}x_ix_j\right)+\left(\frac{1}{2}\sum_{i,j=1}^nQ_{ij}x_ix_j\right)\left(\frac{1}{2}\frac{\partial}{\partial x_k}\sum_{i,j=1}^nR_{ij}x_ix_j\right). $$

Can you take it from here?
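Since $Q$ and $R$ are symmetric, the bracketed derivatives evaluate to $\frac{\partial}{\partial x_k}\sum_{i,j}Q_{ij}x_ix_j = 2(Qx)_k$ and $2(Rx)_k$. As a sanity check on the resulting gradient, here is a minimal NumPy sketch (the random SPD matrices and step size are illustrative choices, not part of the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random symmetric positive definite Q and R.
A = rng.standard_normal((n, n)); Q = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); R = B @ B.T + n * np.eye(n)
x = rng.standard_normal(n)

g = lambda z: 0.5 * (z @ Q @ z) * 0.5 * (z @ R @ z)

# With alpha = x'Qx/2 and beta = x'Rx/2, the component formula gives
# dg/dx_k = beta * (Qx)_k + alpha * (Rx)_k.
alpha, beta = 0.5 * x @ Q @ x, 0.5 * x @ R @ x
grad_analytic = beta * (Q @ x) + alpha * (R @ x)

# Central finite differences along each coordinate direction.
h = 1e-6
grad_fd = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h)
                    for e in np.eye(n)])

print(np.allclose(grad_analytic, grad_fd, rtol=1e-5))  # True
```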


The individual terms are easy to handle: $$\eqalign{ \def\LR#1{\left(#1\right)} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\p{\partial} \alpha &= \tfrac{1}{2}x^TQx, \qquad \frac{\p\alpha}{\p x} = Qx, \qquad \frac{\p^2\alpha}{\p x\,\p x^T} = Q \\ \beta &= \tfrac{1}{2}x^TRx, \qquad \frac{\p\beta}{\p x} = Rx, \qquad \frac{\p^2\beta}{\p x\,\p x^T} = R \\ }$$ The calculation for their product is straightforward: $$\eqalign{ \pi &= \alpha\beta \\ \\ \frac{\p\pi}{\p x} &= \beta\frac{\p\alpha}{\p x} + \alpha\frac{\p\beta}{\p x} \\ &= \beta Qx \;+\; \alpha Rx \\ \\ \frac{\p^2\pi}{\p x\,\p x^T} &= \beta\fracLR{\p^2\alpha}{\p x\,\p x^T} + \fracLR{\p\beta}{\p x} \fracLR{\p\alpha}{\p x^T} + \fracLR{\p\alpha}{\p x} \fracLR{\p\beta}{\p x^T} + \alpha\fracLR{\p^2\beta}{\p x\,\p x^T} \\ &= \beta Q \;+\; Rxx^TQ \;+\; Qxx^TR \;+\; \alpha R \\ }$$
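These closed forms can be checked numerically against finite differences; a minimal NumPy sketch (the random SPD matrices, step size, and tolerances are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); Q = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); R = B @ B.T + n * np.eye(n)
x = rng.standard_normal(n)

alpha = 0.5 * x @ Q @ x
beta  = 0.5 * x @ R @ x

# Closed forms from the answer above.
grad = beta * (Q @ x) + alpha * (R @ x)
hess = beta * Q + np.outer(R @ x, Q @ x) + np.outer(Q @ x, R @ x) + alpha * R

# Finite-difference checks: central differences for the gradient,
# second-order central differences for the Hessian.
g = lambda z: 0.25 * (z @ Q @ z) * (z @ R @ z)
h = 1e-4
I = np.eye(n)
grad_fd = np.array([(g(x + h*e) - g(x - h*e)) / (2*h) for e in I])
hess_fd = np.array([
    [(g(x + h*ei + h*ej) - g(x + h*ei - h*ej)
      - g(x - h*ei + h*ej) + g(x - h*ei - h*ej)) / (4*h*h)
     for ej in I] for ei in I
])

print(np.allclose(grad, grad_fd, rtol=1e-4, atol=1e-4))
print(np.allclose(hess, hess_fd, rtol=1e-4, atol=1e-4))
```

Note that the Hessian is symmetric, as it must be, since $(Qxx^TR)^T = Rxx^TQ$.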


Update

This update addresses ordering issues raised in the comments.

Differentials are often the best approach for matrix calculus problems because, unlike gradients, they satisfy a simple product rule: $$\eqalign{ d(A\star B) &= (A+dA)\star(B+dB) \;\;-\;\; A\star B \\ &= dA\star B + A\star dB \\ }$$ (discarding the second-order term $dA\star dB$), where $A$ and $B$ are each a {scalar, vector, matrix, tensor}, and $\star$ is any product which is compatible with $A$ and $B.\;$ This includes the Kronecker, Hadamard/elementwise, Frobenius/trace and dyadic/tensor products, as well as the matrix/dot product.

If and only if the product commutes, you can rearrange the product rule to $$d(A\star B) = B\star dA + A\star dB$$ The Hadamard and Frobenius products always commute. The other products are commutative only in special situations. For example, the Kronecker product commutes if either $A$ or $B$ is a scalar, and the dot product commutes if both $A$ and $B$ are real vectors.
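The commuting case is easy to demonstrate numerically for the Hadamard product; a small NumPy sketch (matrices and perturbation scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A,  B  = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
dA, dB = 1e-7 * rng.standard_normal((3, 3)), 1e-7 * rng.standard_normal((3, 3))

# Exact increment of the Hadamard product A * B (elementwise).
increment   = (A + dA) * (B + dB) - A * B
first_order = dA * B + A * dB          # product rule
commuted    = B * dA + A * dB          # valid because Hadamard commutes

print(np.allclose(first_order, commuted))             # True: identical
print(np.abs(increment - first_order).max() < 1e-12)  # True: only dA*dB remains
```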

The differential and the gradient are related and can be derived from one another, i.e. $$\frac{\p\alpha}{\p x} = Qx \quad\iff\quad d\alpha = (Qx)^Tdx = x^TQ\,dx$$ Let's examine one of the terms in the preceding Hessian calculation.
First calculate its differential, and then its gradient. $$\eqalign{ y &= \alpha(Rx) = (Rx)\alpha \qquad \big({\rm the\,scalar\star vector\,product\,commutes}\big) \\ dy &= \alpha(R\,dx) + (Rx)\,d\alpha \\ &= \alpha R\,dx \;\;\,+ Rx\;x^TQ\,dx \\ &= (\alpha R+Rx\,x^TQ)\,dx \\ \frac{\p y}{\p x} &= \alpha R+Rx\,x^TQ \\ }$$
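The resulting Jacobian can also be checked numerically; a quick NumPy sketch (names and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)); Q = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); R = B @ B.T + n * np.eye(n)
x = rng.standard_normal(n)

y = lambda z: 0.5 * (z @ Q @ z) * (R @ z)   # y = alpha * (R x)

# Jacobian read off from the differential: dy = (alpha*R + R x x'Q) dx
alpha = 0.5 * x @ Q @ x
J = alpha * R + np.outer(R @ x, x) @ Q

# Central finite differences: column k approximates dy/dx_k.
h = 1e-6
J_fd = np.column_stack([(y(x + h*e) - y(x - h*e)) / (2*h) for e in np.eye(n)])

print(np.allclose(J, J_fd, rtol=1e-5))  # True
```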