Antiderivative of a linear matrix expression


Let $f:\mathbb{R}^{n\times m}\rightarrow \mathbb {R}$ be a function that takes an $n\times m$ matrix $X$ and maps it to the real line. Suppose that the derivative of $f$ with respect to one element $X_{ij}$ is $$ \frac{df}{dX_{ij}}=a^\top XE1_jb_i $$ where $a$ is an $n\times 1$ vector, $E$ is an $m\times m$ positive definite matrix, $b$ is an $n\times 1$ vector, and $1_j$ is the $j$th standard basis vector of $\mathbb{R}^m$.

My question is: What is $f$? Does such an $f$ even exist?

If $a=b$, I believe that the answer is

$$f=\frac{1}{2}a^\top X E X^\top a+\text{const.},$$

but I can't seem to generalize this result. Similarly, if $m=n=1$ it is trivial to find $f$.
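As a numerical sanity check of the $a=b$ case (not part of the original question; this sketch assumes `numpy` and uses a randomly generated symmetric positive definite $E$), one can compare the claimed partial derivatives of $f=\frac{1}{2}a^\top XEX^\top a$ against central finite differences:

```python
# Sanity check for the a = b case: with symmetric positive definite E,
# f(X) = (1/2) a^T X E X^T a should have partials
# df/dX_ij = (a^T X E 1_j) * a_i.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
a = rng.standard_normal((n, 1))
M = rng.standard_normal((m, m))
E = M @ M.T + m * np.eye(m)              # symmetric positive definite

def f(X):
    return 0.5 * float(a.T @ X @ E @ X.T @ a)

X = rng.standard_normal((n, m))
eps = 1e-6
for i in range(n):
    for j in range(m):
        dX = np.zeros((n, m)); dX[i, j] = eps
        fd = (f(X + dX) - f(X - dX)) / (2 * eps)     # central difference
        claimed = float(a.T @ X @ E[:, [j]]) * a[i, 0]
        assert abs(fd - claimed) < 1e-5
```

Since $f$ is quadratic in the entries of $X$, the central difference is exact up to roundoff, so the agreement is tight.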


3 Answers

BEST ANSWER

$\def\pdv#1#2{\frac{\partial #1}{\partial #2}}$

$1_j$ is the $j$th standard basis vector of $\mathbb{R}^m$, so $E1_j$ is the $j$th column of $E$, and the expression you have written is $$\pdv{f}{X_{ij}}=\sum_{k=1}^n \sum_{l=1}^m a_k X_{kl} E_{lj} b_i.$$ Let's treat $f$ as a multivariate function whose variables are the elements of the matrix $X$. The r.h.s. contains the term $a_i X_{ij} E_{jj} b_i$ (the summand with $k=i$ and $l=j$); we can only get this term in the partial derivative if $f=\frac{1}{2} a_i X_{ij} E_{jj} X_{ij} b_i+\ldots$ The remaining terms, $$ \sum_{(k,l)\neq (i,j)} a_k X_{kl} E_{lj} b_i, $$ do not involve $X_{ij}$, so they are the partial derivative w.r.t. $X_{ij}$ of $$\sum_{(k,l)\neq (i,j)} a_k X_{kl} E_{lj} X_{ij} b_i.$$ So we can write $$ f = \frac{1}{2} a_i X_{ij} E_{jj} X_{ij} b_i + \sum_{(k,l)\neq(i,j)} a_k X_{kl} E_{lj} X_{ij} b_i + g $$
where $g$ is not a function of $X_{ij}$.

Now let's compute $\pdv{f}{X_{pr}}$ for $(p,r)\neq(i,j)$: the term $\frac{1}{2} a_i X_{ij} E_{jj} X_{ij} b_i$ does not contain $X_{pr}$; the second term is a linear function of $X_{pr}$, so it contributes $a_p E_{rj} X_{ij} b_i$; the third term contributes $\pdv{g}{X_{pr}}$. So we must have $$ \pdv{f}{X_{pr}}= a_p E_{rj} X_{ij} b_i+ \pdv{g}{X_{pr}} = \sum_{k=1}^n \sum_{l=1}^m a_k X_{kl} E_{lr} b_p. $$ The $X_{ij}$ term on the r.h.s. has coefficient $a_i E_{jr} b_p$, and we have already established that $g$ is not a function of $X_{ij}$. So we must have $a_p E_{rj} b_i = a_i E_{jr} b_p$ for all $p$ and $r$; this is possible only if $E$ is symmetric ($E_{rj}=E_{jr}$) and $ab^\top$ is symmetric, i.e. $a$ and $b$ are proportional (after rescaling, $a=b$). So unless these conditions hold, the expression you wrote does not have an antiderivative.
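The condition $a_p E_{rj} b_i = a_i E_{jr} b_p$ can be probed numerically. Below is a small sketch (assuming `numpy`; the data are randomly generated, and `max_violation` is a helper name introduced here, not from the post):

```python
# Probe the integrability condition a_p E_rj b_i = a_i E_jr b_p derived
# above.  With symmetric E it holds when b = a but fails for generic b.
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4
M = rng.standard_normal((m, m))
E = 0.5 * (M + M.T)              # symmetric (symmetry is what matters here)

def max_violation(a, b):
    # max over all (i, j, p, r) of |a_p E_rj b_i - a_i E_jr b_p|
    worst = 0.0
    for i in range(n):
        for j in range(m):
            for p in range(n):
                for r in range(m):
                    worst = max(worst,
                                abs(a[p] * E[r, j] * b[i] - a[i] * E[j, r] * b[p]))
    return worst

a = rng.standard_normal(n)
b = rng.standard_normal(n)
assert max_violation(a, a) < 1e-12       # b = a: condition holds
assert max_violation(a, b) > 1e-6        # generic b != a: no antiderivative
```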


A better way to write the desired derivative is $$\def\b{\beta}\def\a{\alpha}\def\h{\tfrac12} \frac{\partial f}{\partial X_{ij}} = \big(ba^TXE\big)_{ij}$$ Consider the function $$f = \h{\rm Trace}\big(ba^TXEX^T\big) \qquad\implies\qquad \frac{\partial f}{\partial X_{ij}} = \h\big(ba^TXE + ab^TXE^T\big)_{ij}$$ Then a derivative of the desired form occurs if $\,b=a\,$ (recall that the positive definite $E$ satisfies $E^T=E$), as you noted.

Also as you noted, if $\,n={\tt1},\,$ then it is easy to find a function which produces the desired derivative: $i$ gets fixed at $i={\tt1}$, the vectors $(a,b)$ collapse to scalars $(\a,\b)$, and $X$ collapses to a row vector $x^T$, so that $$f = \h\,\a\b\,x^TEx \qquad\implies\qquad \frac{\partial f}{\partial x_{j}} = \a\b\,\big(x^TE\big)_{j}$$ using the symmetry of $E$.


If you replace the outer product $ba^T$ by a general matrix $A,\,$ then $$f = \h{\rm Trace}\big(AXEX^T\big) \qquad\implies\qquad \frac{\partial f}{\partial X_{ij}} = \h\big(AXE + A^TXE^T\big)_{ij}$$ If both $A\ {\rm and}\ E$ are symmetric then you get a nice anti-derivative for $$\frac{\partial f}{\partial X_{ij}} = \big(AXE\big)_{ij},$$ otherwise I don't think the anti-derivative exists.
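A finite-difference check of the trace formula above can be sketched as follows (assuming `numpy`; `A`, `E`, `X` are random and deliberately not symmetric):

```python
# Verify d/dX_ij of (1/2) Tr(A X E X^T) = (1/2)(A X E + A^T X E^T)_ij
# for general (non-symmetric) A and E, via central differences.
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 4
A = rng.standard_normal((n, n))
E = rng.standard_normal((m, m))
X = rng.standard_normal((n, m))

def f(X):
    return 0.5 * np.trace(A @ X @ E @ X.T)

G_claimed = 0.5 * (A @ X @ E + A.T @ X @ E.T)

eps = 1e-6
G_fd = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        dX = np.zeros((n, m)); dX[i, j] = eps
        G_fd[i, j] = (f(X + dX) - f(X - dX)) / (2 * eps)

assert np.allclose(G_fd, G_claimed, atol=1e-5)
```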


$ \def\k{\otimes} \def\h{\tfrac12\,} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\qif{\quad\iff\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} $Before jumping into matrix variables, consider the easier vector problem $$ \grad fx = Mx \qif f = \h x^TMx $$ The anti-derivative in this case is a $\sf Quadratic\ Form$, and it exists precisely when $M$ is a symmetric matrix.

Now take @greg's first formula, drop the indices, and vectorize it $ \LR{\,x=\vc X} $ $$\eqalign{ \grad fX &= \LR{ba^TXE} \qif \grad fx = \LR{E^T\k ba^T}x \\ }$$ This gets us back to the vector case, wherein the coefficient matrix must be symmetric.
Let's check: $$ \LR{E^T\k ba^T}^T = \LR{E\k ab^T} = \LR{E^T\k ab^T} \ne \LR{E^T\k ba^T} $$ (the middle step uses the symmetry of $E$). The $E$ factor is fine, but $ba^T$ is not symmetric, so the coefficient matrix fails to be symmetric unless $ba^T=ab^T$, i.e. unless $b$ is a multiple of $a$ (as others have pointed out).
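Both the vectorization identity and the symmetry test can be confirmed numerically. This sketch assumes `numpy` and a randomly generated symmetric $E$; since `numpy` stores arrays row-major, a column-major `vec` is taken explicitly to match the Kronecker identity:

```python
# Check vec(b a^T X E) = (E^T kron b a^T) vec(X) (column-major vec),
# and that the coefficient matrix is symmetric when b = a but not for
# a generic b (E is symmetric by construction).
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 4
a = rng.standard_normal((n, 1))
b = rng.standard_normal((n, 1))
M = rng.standard_normal((m, m))
E = 0.5 * (M + M.T)                       # symmetric
X = rng.standard_normal((n, m))

vec = lambda Z: Z.reshape(-1, order="F")  # column-major vec

K = np.kron(E.T, b @ a.T)
assert np.allclose(vec(b @ a.T @ X @ E), K @ vec(X))   # vectorization identity

K_aa = np.kron(E.T, a @ a.T)
assert np.allclose(K_aa, K_aa.T)          # b = a: symmetric coefficient matrix
assert not np.allclose(K, K.T)            # generic b != a: not symmetric
```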