The derivative of a linear operator with respect to its argument


How can one differentiate an expression involving a linear operator with respect to its argument? I have never encountered anything like this before.

Assume $\mathcal{A}\colon \mathbb{R}^{n} \to \mathbb{R}^{p\times q}$ is a linear operator defined on $n$-dimensional real vectors. An example is the Hankel operator: given $g = [g_1 \dots g_5]^T$, the Hankel operator $\mathcal{H}$ takes $g$ as its argument and returns $\mathcal{H}(g)=\begin{bmatrix} g_1 & g_2 & g_3\\ g_2 & g_3 & g_4\\ g_3 & g_4 & g_5 \end{bmatrix}$. One might then be interested in the operator $\mathcal{A}(g) = W_1\mathcal{H}(g)W_2$ for constant matrices $W_1$ and $W_2$ of compatible dimensions.
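For concreteness, such an operator is easy to sketch numerically. In the snippet below (my own illustration, not from any cited source), `hankel_op` builds the $3\times 3$ Hankel matrix from a vector of length $5$, and the particular `W1`, `W2` are arbitrary placeholder matrices:

```python
import numpy as np

def hankel_op(g):
    """Hankel operator H: R^5 -> R^{3x3}, with H(g)[i, j] = g[i + j]."""
    return np.array([[g[i + j] for j in range(3)] for i in range(3)])

# A(g) = W1 H(g) W2 for fixed W1, W2 of compatible shapes; these
# particular matrices are arbitrary placeholders, not from the question.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))   # p = 2
W2 = rng.standard_normal((3, 4))   # q = 4

def A(g):
    return W1 @ hankel_op(g) @ W2

g = np.arange(1.0, 6.0)            # g = [1, 2, 3, 4, 5]^T
print(hankel_op(g))
# Linearity of A: A(2u + 3v) = 2 A(u) + 3 A(v)
u, v = rng.standard_normal(5), rng.standard_normal(5)
assert np.allclose(A(2 * u + 3 * v), 2 * A(u) + 3 * A(v))
```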

How can we differentiate a scalar function of $g$ involving $\mathcal{A}$?

Here is an example: $\operatorname{trace}(\mathcal{A}(g)^T\mathcal{A}(g)) + q^Tg$. To minimize this functional over $g$, for a given linear operator $\mathcal{A}$ and a constant vector $q \in \mathbb{R}^n$, one would like to set the derivative to $0$.
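Since $\operatorname{trace}(\mathcal{A}(g)^T\mathcal{A}(g)) = \langle \mathcal{A}(g), \mathcal{A}(g)\rangle_F$, the gradient of this example should be $2\mathcal{A}_{\text{adj}}(\mathcal{A}(g)) + q$, where the adjoint is defined by $\langle \mathcal{A}(g), Y\rangle_F = g^T \mathcal{A}_{\text{adj}}(Y)$. A finite-difference sanity check (my own sketch in NumPy; `Amat` is an arbitrary stand-in for any linear $\mathcal{A}$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q_dim = 5, 3, 4
# Any linear A: R^n -> R^{p x q} can be written as a (p*q) x n matrix
# acting on g; this one is a random placeholder.
Amat = rng.standard_normal((p * q_dim, n))
A = lambda g: (Amat @ g).reshape(p, q_dim)
A_adj = lambda Y: Amat.T @ Y.ravel()       # <A(g), Y>_F = <g, A_adj(Y)>
qvec = rng.standard_normal(n)

f = lambda g: np.trace(A(g).T @ A(g)) + qvec @ g
grad = lambda g: 2 * A_adj(A(g)) + qvec    # claimed gradient

# Central finite differences agree with the claimed gradient.
g0 = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(f(g0 + eps * e) - f(g0 - eps * e)) / (2 * eps)
               for e in np.eye(n)])
assert np.allclose(fd, grad(g0), atol=1e-5)
```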

I came across this in the literature: given a vector $a$ and constant matrices $Z$, $H$, and $X$, consider the scalar function $\frac{1}{2}(g-a)^T H (g-a) + \operatorname{trace}(Z^T (\mathcal{A}(g)-X)) +\frac{\rho}{2}\|\mathcal{A}(g)-X\|_F^2$ as a function of $g$, where the norm is the Frobenius norm. To minimize with respect to $g$, the author says that setting the derivative with respect to $g$ to $0$ gives

$(H+\rho M)g^* = \mathcal{A}_{\text{adj}}(\rho X -Z) + Ha$

in which $M$ is the positive semidefinite matrix defined by

$Mz = \mathcal{A}_{\text{adj}}(\mathcal{A}(z)) \;\;\forall z\in\mathbb{R}^n$
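Since $z \mapsto \mathcal{A}_{\text{adj}}(\mathcal{A}(z))$ is linear in $z$, it is represented by a matrix $M$, which can be assembled column by column as $M e_k = \mathcal{A}_{\text{adj}}(\mathcal{A}(e_k))$. A quick numerical sketch (NumPy; `Amat` is an arbitrary linear operator in matrix form, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q_dim = 5, 3, 4
Amat = rng.standard_normal((p * q_dim, n))   # arbitrary linear operator
A = lambda g: (Amat @ g).reshape(p, q_dim)
A_adj = lambda Y: Amat.T @ Y.ravel()

# Build M column by column: M e_k = A_adj(A(e_k)).
M = np.column_stack([A_adj(A(e)) for e in np.eye(n)])

assert np.allclose(M, Amat.T @ Amat)         # M is A^T A in matrix form
assert np.allclose(M, M.T)                   # symmetric
assert np.all(np.linalg.eigvalsh(M) >= -1e-12)  # positive semidefinite
z = rng.standard_normal(n)
assert np.allclose(M @ z, A_adj(A(z)))       # M z = A_adj(A(z)) for all z
```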

I do not know how to go about showing this, and I don't see where the adjoint comes from.


Consider the matrix version of the problem where $g \in {\mathbb R}^{n\times m}$; after finding a solution, the vector problem is recovered by setting $m=1$.

Let $T \in {\mathbb R}^{p\times q\times n\times m}$, and define the following quantities
$$\eqalign{
A &= T:g \cr
b &= g-a \cr
C &= A-X \cr
S &= {\rm sym}(H) \equiv \frac{1}{2}(H+H') \cr
}$$
Now the function that you saw in the literature can be written as
$$f = {\frac{1}{2}}H:bb' + Z:C + {\frac{\rho}{2}}C:C$$
The differential of the function is
$$\eqalign{
df &= H:{\rm sym}(db\,b') + Z:dC + \rho C:dC \cr
&= {\rm sym}(H):db\,b' + (\rho C + Z):dC \cr
&= Sb:db + (\rho C + Z):dA \cr
&= S(g-a):dg + (\rho A -\rho X + Z):dA \cr
&= S(g-a):dg + (\rho A -\rho X + Z):T:dg \cr
&= (Sg-Sa + \rho A:T -\rho X:T + Z:T):dg \cr
}$$
So the derivative is
$$\eqalign{
\frac{\partial f}{\partial g} &= Sg - Sa + \rho A:T -\rho X:T + Z:T \cr
&= Sg - Sa + \rho T':T:g -\rho X:T + Z:T \cr
}$$
Setting the derivative to zero yields
$$Sg + \rho T':T:g = Sa + (\rho X-Z):T$$
Now is a good time to set $m=1$, so that $T$ becomes a third-order tensor and we need to solve this equation for the vector $g$:
$$(S + \rho T':T)g = Sa + (\rho X-Z):T$$
Letting $M=T':T$ and assuming $H$ is symmetric, so that $S=H$, one obtains
$$(H + \rho M)g = T':(\rho X-Z) + Ha$$
which is as close as I can get to the literature that you've cited.
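The closed form can be sanity-checked numerically. The sketch below (NumPy; all data are random placeholders, and $H$ is taken symmetric positive definite so the problem is strictly convex) forms $g^* = (H+\rho M)^{-1}(\mathcal{A}_{\text{adj}}(\rho X - Z) + Ha)$ and verifies that it beats nearby points:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q_dim = 5, 3, 4
Amat = rng.standard_normal((p * q_dim, n))   # arbitrary linear operator
A = lambda g: (Amat @ g).reshape(p, q_dim)
A_adj = lambda Y: Amat.T @ Y.ravel()

B = rng.standard_normal((n, n))
H = B @ B.T + np.eye(n)                      # symmetric positive definite
a = rng.standard_normal(n)
Z = rng.standard_normal((p, q_dim))
X = rng.standard_normal((p, q_dim))
rho = 0.7

def f(g):
    r = A(g) - X
    return (0.5 * (g - a) @ H @ (g - a)
            + np.trace(Z.T @ r)
            + 0.5 * rho * np.sum(r ** 2))

M = Amat.T @ Amat                            # M z = A_adj(A(z))
g_star = np.linalg.solve(H + rho * M, A_adj(rho * X - Z) + H @ a)

# g_star is the global minimizer of a strictly convex quadratic,
# so it should not be beaten by any nearby perturbation.
for _ in range(5):
    assert f(g_star) <= f(g_star + 1e-3 * rng.standard_normal(n))
```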

NB: The transpose operation for the third-order tensor is such that $Z:T = T':Z$, or in index notation $Z_{ij}T_{ijk} = {T'}_{kij}Z_{ij}$, which means ${T'}_{kij}=T_{ijk}$.
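In NumPy this transpose convention is easy to verify with `einsum`:

```python
import numpy as np

rng = np.random.default_rng(4)
p, q_dim, n = 3, 4, 5
T = rng.standard_normal((p, q_dim, n))   # third-order tensor T_{ijk}
Tt = np.transpose(T, (2, 0, 1))          # T'_{kij} = T_{ijk}
Z = rng.standard_normal((p, q_dim))

lhs = np.einsum('ij,ijk->k', Z, T)       # Z : T
rhs = np.einsum('kij,ij->k', Tt, Z)      # T' : Z
assert np.allclose(lhs, rhs)
```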