Derivative of complicated quadratic-form functions with respect to a vector


I have quadratic forms of this type:

$Q_1=v_1^TC_1(x)v_1$

$Q_2=v_2^TC_2(y)v_2$

where

$x,y,z$ are $3 \times 1$ vectors,

$J_1,J_2,C_1,C_2$ are $3 \times 3$ matrices,

$v_1=x-(J_1(x)y-z),v_2=x-(J_2(y)y-z)$

The derivatives with respect to the vectors follow from the product rule:

$\frac{dQ_1}{dx}=\frac{dv_1}{dx}^TC_1v_1+v_1^T\frac{dC_1}{dx}v_1+v_1^TC_1\frac{dv_1}{dx}$

$\frac{dQ_2}{dy}=\frac{dv_2}{dy}^TC_2v_2+v_2^T\frac{dC_2}{dy}v_2+v_2^TC_2\frac{dv_2}{dy}$

Since $Q_1$ is a scalar, the first and third terms are transposes of one another and combine:

$\frac{dv_1}{dx}^TC_1v_1+\left(v_1^TC_1\frac{dv_1}{dx}\right)^T=\frac{dv_1}{dx}^T\left(C_1+C_1^T\right)v_1$ and similarly for $\frac{dQ_2}{dy}$

If I'm not mistaken, the derivatives of $v_1,v_2$ with respect to the vectors are:

$\frac{dv_1}{dx}=I-(y^T \otimes I)\frac{d\,\mathrm{vec}(J_1)}{dx}$

$\frac{dv_2}{dy}=-\left((y^T \otimes I)\frac{d\,\mathrm{vec}(J_2)}{dy}+J_2\right)$
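These Jacobians can be sanity-checked numerically. The sketch below uses a made-up smooth map $J_1(x)$ (not from the problem) and the column-stacking vec convention, under which $\mathrm{vec}(J_1 y)=(y^T\otimes I)\,\mathrm{vec}(J_1)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical smooth matrix-valued map J1(x); any differentiable choice works
def J1(x):
    return np.outer(x, x) + np.diag(x)

y = rng.standard_normal(3)
z = rng.standard_normal(3)

def v1(x):
    return x - (J1(x) @ y - z)

def fd_jac(f, x, h=1e-6):
    """Forward finite-difference Jacobian; columns are partials w.r.t. x_k."""
    f0 = np.atleast_1d(f(x))
    return np.column_stack([(np.atleast_1d(f(x + h * e)) - f0) / h
                            for e in np.eye(len(x))])

x = rng.standard_normal(3)
dvecJ1 = fd_jac(lambda t: J1(t).reshape(-1, order="F"), x)   # 9 x 3

# Analytic: dv1/dx = I - (y^T kron I) d vec(J1)/dx
dv1_dx = np.eye(3) - np.kron(y[None, :], np.eye(3)) @ dvecJ1

err = np.max(np.abs(dv1_dx - fd_jac(v1, x)))
print(err)
```

Because both sides are built from the same finite differences of $J_1$, the discrepancy tests only the Kronecker/vec identity and should be near machine precision.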

The problem is that such vector-matrix symbolic calculations are difficult to carry out by hand, and the capabilities of software packages are limited. Assuming I mostly work with expressions of the types above: what do the derivatives of these quadratic forms with respect to the vectors look like in a simpler form?


Best Answer

$ \def\bbR#1{{\mathbb R}^{#1}} \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Start with the $v_\o$ variable, but for ease of typing we'll drop all subscripts. Lowercase letters will denote vectorized matrices, i.e. $j=\vecc{J}$ and $c=\vecc{C}$. $$\eqalign{ v &= x+z - Jy \\ dv &= dx - J\,dy - dJ\,y \\ &= dx - J\,dy - \LR{y^T\otimes I}\vecc{dJ} \\ \grad{v}{x} &= \grad{x}{x} - J\,{\gradLR yx} - \LR{y^T\otimes I}\gradLR{\vecc{J}}{x} \\ &= I - \LR{y^T\otimes I}\gradLR{j}{x} \\ }$$ Then move on to the first quadratic form $$\eqalign{ Q &= vv^T:C \\ &= \vecc{vv^T}:\vecc C \\ &= \LR{v\otimes v}:c \\ dQ &= \LR{v\otimes v}:dc \;+\; C:\LR{dv\,v^T+v\,dv^T} \\ &= \LR{v\otimes v}:dc \;+\; \LR{C+C^T}v:dv \\ &= \LR{v\otimes v}:\gradLR{c}{x}dx \;+\; \LR{C+C^T}v:\gradLR{v}{x}dx \\ &= \gradLR{c}{x}^T\LR{v\otimes v}:dx \;+\; \gradLR{v}{x}^T\LR{C+C^T}v:dx \\ \grad{Q}{x} &= \gradLR{c}{x}^T\LR{v\otimes v} \;+\; \gradLR{v}{x}^T\LR{C+C^T}v \\ &= \gradLR{c}{x}^T\LR{v\otimes v} \;+\; \LR{I - \LR{y^T\otimes I}\gradLR{j}{x}}^T\LR{C+C^T}v \\ &= \gradLR{c}{x}^T\LR{v\otimes v} \;+\; \LR{C+C^T}v \;-\; \gradLR{j}{x}^T{\LR{y\otimes I}}\LR{C+C^T}v \\ \\ }$$ In this derivation, we vectorized some of the matrix variables. This allowed us to avoid matrix-by-vector gradients (which are tensor-valued) and write the result using standard vector-by-vector gradients (which are matrix-valued).
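As a sanity check, the final gradient formula can be verified against finite differences. The maps `J(x)` and `C(x)` below are made-up smooth examples (not from the question), and vec is column-stacking, as in the derivation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical smooth maps J(x) and C(x); any differentiable choices work here
def J(x): return np.outer(x, x) + np.diag(np.sin(x))
def C(x): return np.outer(np.cos(x), x) + np.diag(x**2)   # deliberately non-symmetric

y = rng.standard_normal(3)
z = rng.standard_normal(3)

def v(x): return x + z - J(x) @ y
def Q(x): w = v(x); return w @ C(x) @ w                   # Q = v^T C v

def fd_jac(f, x, h=1e-6):
    """Forward finite-difference Jacobian; columns are partials w.r.t. x_k."""
    f0 = np.atleast_1d(f(x))
    return np.column_stack([(np.atleast_1d(f(x + h * e)) - f0) / h
                            for e in np.eye(len(x))])

x = rng.standard_normal(3)
I = np.eye(3)

dvecJ = fd_jac(lambda t: J(t).reshape(-1, order="F"), x)  # 9 x 3
dvecC = fd_jac(lambda t: C(t).reshape(-1, order="F"), x)  # 9 x 3
w = v(x)
S = C(x) + C(x).T

# grad Q = (dc/dx)^T (v kron v) + (C+C^T)v - (dj/dx)^T (y kron I)(C+C^T)v
grad = (dvecC.T @ np.kron(w, w)
        + S @ w
        - dvecJ.T @ np.kron(y.reshape(3, 1), I) @ (S @ w))

err = np.max(np.abs(grad - fd_jac(Q, x).ravel()))
print(err)
```

Here `np.kron(w, w)` is exactly $\operatorname{vec}(vv^T)$ under column-stacking, so each term of the formula maps one-to-one onto a line of code.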

We also introduced the symbols $(\otimes)$ to denote the Kronecker product and $(:)$ to denote the Frobenius product $-$ which is really just a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$ When applied to vector variables, the Frobenius product reduces to the standard dot product.

The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\ }$$
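These identities are straightforward to confirm numerically; a quick NumPy check with random $3\times3$ matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, Cm = (rng.standard_normal((3, 3)) for _ in range(3))

# Frobenius product A:B = sum_ij A_ij B_ij
frob = lambda X, Y: np.sum(X * Y)

checks = [
    np.isclose(frob(A, B), np.trace(A.T @ B)),            # A:B = Tr(A^T B)
    np.isclose(frob(A, A), np.linalg.norm(A, 'fro')**2),  # A:A = ||A||_F^2
    np.isclose(frob(A, B), frob(B, A)),                   # A:B = B:A
    np.isclose(frob(A, B), frob(A.T, B.T)),               # A:B = A^T:B^T
    np.isclose(frob(Cm, A @ B), frob(Cm @ B.T, A)),       # C:(AB) = (CB^T):A
    np.isclose(frob(Cm, A @ B), frob(A.T @ Cm, B)),       # C:(AB) = (A^T C):B
]
print(all(checks))
```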

Second Answer

Here are some steps to consider, with 3×3 matrices.

A general matrix C can be decomposed into three matrices, each derived from a single vector.

$$ C = \mathrm{diag}(\vec{d}) + \vec{c} \odot \vec{c} + [ \vec{w} \times] $$

where $\odot$ is the outer product ($\vec{a} \odot \vec{b} = \vec{a}\vec{b}^T$), ${\rm diag}(\cdot)$ creates a diagonal matrix from the vector in its argument, and $[\vec{w}\times]$ is the skew-symmetric cross-product operator matrix

$$ {\rm diag}(\vec{d}) = \begin{bmatrix} d_1 & & \\ & d_2 & \\ & & d_3 \end{bmatrix} $$

$$ \vec{c} \odot \vec{c} = \begin{bmatrix} c_1^2 & c_1 c_2 & c_1 c_3 \\ c_1 c_2 & c_2^2 & c_2 c_3 \\ c_1 c_3 & c_2 c_3 & c_3^2 \end{bmatrix}$$

$$ [\vec{w}\times] = \begin{bmatrix} 0 & -w_3 & w_2 \\ w_3 & 0 & -w_1 \\ -w_2 & w_1 & 0 \end{bmatrix}$$

The definition of each vector is

$$ \vec{w} = \pmatrix{ \frac{C_{3,2}-C_{2,3}}{2} \\ \frac{C_{1,3}-C_{3,1}}{2} \\ \frac{C_{2,1}-C_{1,2}}{2} } \\ $$

$$ \vec{d} = \pmatrix{ C_{1,1} - \frac{ (C_{2,1}+C_{1,2}) (C_{3,1}+C_{1,3})}{2 (C_{3,2}+C_{2,3}) } \\ C_{2,2} - \frac{ (C_{2,1}+C_{1,2}) (C_{3,2}+C_{2,3})}{2 (C_{3,1}+C_{1,3}) } \\ C_{3,3} - \frac{ (C_{3,1}+C_{1,3}) (C_{3,2}+C_{2,3})}{2 (C_{2,1}+C_{1,2}) } } $$

and

$$\vec{c} = \pmatrix{ \sqrt{ \frac{ (C_{2,1}+C_{1,2}) (C_{3,1}+C_{1,3})}{2 (C_{3,2}+C_{2,3}) } }\\ \sqrt{ \frac{ (C_{2,1}+C_{1,2}) (C_{3,2}+C_{2,3})}{2 (C_{3,1}+C_{1,3}) } }\\ \sqrt{ \frac{ (C_{3,1}+C_{1,3}) (C_{3,2}+C_{2,3})}{2 (C_{2,1}+C_{1,2}) } } } $$
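A quick numerical check of the decomposition: build $C$ from known $\vec{d},\vec{c},\vec{w}$ (with positive entries in $\vec{c}$ so the square roots stay real), recover the three vectors from the entries of $C$ using the formulas above, and reassemble. Note that the $\vec{d}$ formula is just $d_i = C_{i,i} - c_i^2$ written out in terms of the entries of $C$:

```python
import numpy as np

def skew(w):
    """Cross-product operator matrix [w x]."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

# Build a test matrix from known pieces (positive c keeps the square roots real)
d0 = np.array([0.5, -1.0, 2.0])
c0 = np.array([1.0, 2.0, 0.5])
w0 = np.array([0.3, -0.7, 1.1])
C = np.diag(d0) + np.outer(c0, c0) + skew(w0)

# Recover the vectors from C (1-based indices in the text -> 0-based here)
w = 0.5 * np.array([C[2, 1] - C[1, 2], C[0, 2] - C[2, 0], C[1, 0] - C[0, 1]])
s12, s13, s23 = C[1, 0] + C[0, 1], C[2, 0] + C[0, 2], C[2, 1] + C[1, 2]
c = np.sqrt([s12 * s13 / (2 * s23), s12 * s23 / (2 * s13), s13 * s23 / (2 * s12)])
d = np.diag(C) - c**2                        # d_i = C_ii - c_i^2

err = np.max(np.abs(np.diag(d) + np.outer(c, c) + skew(w) - C))
print(err)
```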

So now the derivative of $C$ can be evaluated with linear algebra in terms of the derivatives of each vector

$$\tfrac{{\rm d}}{{\rm d}t}C=\mathrm{diag}\left(\tfrac{{\rm d}}{{\rm d}t}\vec{d}\right)+\left(\tfrac{{\rm d}}{{\rm d}t}\vec{c}\right)\odot\vec{c}+\vec{c}\odot\left(\tfrac{{\rm d}}{{\rm d}t}\vec{c}\right)+\left[\left(\tfrac{{\rm d}}{{\rm d}t}\vec{w}\right)\times\right]$$

and each part maintains its structure: the diagonal term stays diagonal, the cross-product term stays skew-symmetric, and the symmetric part $\tfrac{\rm d}{{\rm d}t}\left(\vec{c}\odot\vec{c}\right) = \left(\tfrac{\rm d}{{\rm d}t}\vec{c}\right)\odot\vec{c} + \vec{c}\odot\left(\tfrac{\rm d}{{\rm d}t}\vec{c}\right)$ stays symmetric, even though each of its two terms alone is not. This structure can be used for further simplifications.
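A finite-difference check of this product rule, using made-up smooth trajectories $\vec{d}(t),\vec{c}(t),\vec{w}(t)$ and the expansion $\tfrac{\rm d}{{\rm d}t}(\vec{c}\odot\vec{c}) = \dot{\vec{c}}\odot\vec{c} + \vec{c}\odot\dot{\vec{c}}$:

```python
import numpy as np

def skew(w):
    """Cross-product operator matrix [w x]."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

# Hypothetical smooth trajectories and their hand-computed derivatives
d  = lambda t: np.array([t, t**2, np.sin(t)])
dd = lambda t: np.array([1.0, 2 * t, np.cos(t)])
c  = lambda t: np.array([1 + t, 2 - t, t**2])
dc = lambda t: np.array([1.0, -1.0, 2 * t])
w  = lambda t: np.array([np.cos(t), t, -t**2])
dw = lambda t: np.array([-np.sin(t), 1.0, -2 * t])

C = lambda t: np.diag(d(t)) + np.outer(c(t), c(t)) + skew(w(t))

t, h = 0.7, 1e-6
# dC/dt = diag(d') + c' (outer) c + c (outer) c' + [w' x]
dC = (np.diag(dd(t)) + np.outer(dc(t), c(t))
      + np.outer(c(t), dc(t)) + skew(dw(t)))

err = np.max(np.abs((C(t + h) - C(t - h)) / (2 * h) - dC))
print(err)
```

The central difference agrees with the analytic derivative to within discretization error.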