Gradient of a matrix polynomial


Given the vector $b \in \mathbb{R}^h$ and the matrix $V \in \mathbb{R}^{h \times h}$, let $o : \mathbb{R}^{h \times h} \to \mathbb{R}^h$ be defined by

$$ o (U) : = V \sum_{i=0}^{n-1} U^i b $$

Compute the gradient $\nabla_U o$.


I found

$$ \nabla_U o (U) = \left[ \sum_{i=0} ^{n-1}\sum_{j=0} ^{n-1}\left[ U^jb U^{n-1-j} b \right] \right]$$

but a friend told me that's wrong, and I can't see where. Can someone please help me? Is this really wrong?

1 Answer

Best answer:

$ \def\bbR#1{{\mathbb R}^{#1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\o{{\tt1}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\Sk{\sum_{k=\o}^{n-1}} \def\Sj{\sum_{j=\o}^k} \def\A{A_{pq}} \def\E{E_{pq}} $Let me introduce some nicer variable names $$\eqalign{ A &= U \\ f &= o(A) \;=\; Vb \;+\; V\;{\Sk A^kb} \\ }$$ Let's assume that you reached the following point in your derivation
$\big(\bf NB\!:\:$ There was a mistake on the limits of your $j$-summation.$\,\big)$ $$\eqalign{ df &= V\LR{\Sk\Sj A^{k-j}\:\c{dA}\;A^{j-\o}b} \in\bbR{h} \\ }$$ The issue is finding the correct way to rearrange this differential expression into a proper gradient.
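As a sanity check, this differential can be verified numerically. The following NumPy sketch (the dimensions $h=3$, $n=4$ and the random test data are arbitrary choices for illustration, not part of the problem) compares the formula for $df$ against the actual change $f(A+dA)-f(A)$ under a small perturbation $dA$:

```python
import numpy as np

rng = np.random.default_rng(0)
h, n = 3, 4  # arbitrary small dimensions for the check
V = rng.standard_normal((h, h))
b = rng.standard_normal(h)
A = rng.standard_normal((h, h))

def f(A):
    # f(A) = V * sum_{k=0}^{n-1} A^k b
    return V @ sum(np.linalg.matrix_power(A, k) @ b for k in range(n))

# differential predicted by the formula:
# df = V * sum_{k=1}^{n-1} sum_{j=1}^{k} A^{k-j} dA A^{j-1} b
dA = 1e-6 * rng.standard_normal((h, h))
df_formula = V @ sum(
    np.linalg.matrix_power(A, k - j) @ dA @ np.linalg.matrix_power(A, j - 1) @ b
    for k in range(1, n)
    for j in range(1, k + 1)
)

df_numeric = f(A + dA) - f(A)
print(np.allclose(df_formula, df_numeric, rtol=1e-3, atol=1e-8))
```

The two quantities agree up to terms of second order in $dA$, which is exactly what a correct differential must satisfy.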

One way to do that is to use the component-wise$\;$self-gradient of the $A$ matrix $$\grad{A}{\A} = \E \:\in\:\bbR{h\times h} $$ where $\E$ is the so-called single-entry matrix whose $(p,q)$ component is equal to one and all other components are equal to zero.

This immediately yields the component-wise gradients of the function $$\eqalign{ \grad{f}{\A} &= V\LR{\Sk\Sj A^{k-j}\:\c{\E}\;A^{j-\o}b} \in\bbR{h} \\ }$$ But the full vector-by-matrix gradient is a third-order tensor. In order to calculate that, we need to introduce two special products.
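The component-wise gradient is also easy to test numerically. Here is a minimal NumPy sketch (arbitrary small dimensions and random data, as assumptions for the check) that compares $\partial f/\partial A_{pq}$ built from the single-entry matrix against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(1)
h, n = 3, 4  # arbitrary small dimensions for the check
V = rng.standard_normal((h, h))
b = rng.standard_normal(h)
A = rng.standard_normal((h, h))

def f(A):
    return V @ sum(np.linalg.matrix_power(A, k) @ b for k in range(n))

def grad_component(p, q):
    # d f / d A_pq = V * sum_{k=1}^{n-1} sum_{j=1}^{k} A^{k-j} E_pq A^{j-1} b
    E = np.zeros((h, h))
    E[p, q] = 1.0  # single-entry matrix
    return V @ sum(
        np.linalg.matrix_power(A, k - j) @ E @ np.linalg.matrix_power(A, j - 1) @ b
        for k in range(1, n)
        for j in range(1, k + 1)
    )

# central finite-difference check at one component
p, q, eps = 1, 2, 1e-6
Ep = np.zeros((h, h))
Ep[p, q] = eps
fd = (f(A + Ep) - f(A - Ep)) / (2 * eps)
print(np.allclose(grad_component(p, q), fd, rtol=1e-4, atol=1e-8))
```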

$\tt1\big)\:$The Frobenius product $(:)$ is a concise notation for the trace $$\eqalign{ X:Y &= \sum_{i=1}^m\sum_{j=1}^n X_{ij}Y_{ij} \;=\; \trace{X^TY} \\ X:X &= \|X\|^2_F \qquad \{ {\rm Frobenius\;norm} \} \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.
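A quick numerical illustration of these identities (a minimal sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((3, 4))

frob = np.sum(X * Y)            # X : Y, the elementwise sum
trace_form = np.trace(X.T @ Y)  # Tr(X^T Y)
print(np.isclose(frob, trace_form))

# for vectors (n = 1) it reduces to the ordinary dot product
x = rng.standard_normal(5)
y = rng.standard_normal(5)
print(np.isclose(np.sum(x * y), x @ y))
```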

$\tt2\big)\:$The Dyadic product $(\star)$ concatenates its arguments to create a higher-order tensor, e.g. $$\eqalign{ \def\G{{\large\Omega}} \G &= X\star Y \qiq \G_{ijkl} = X_{ij}Y_{kl} \\\\ }$$ These products can be used to rearrange the differential expression into the required form $$\eqalign{ df &= V\LR{\Sk\Sj A^{k-j}\star\LR{A^{j-\o}b}} : \c{dA} \\ \grad{f}{A} &= V\LR{\Sk\Sj A^{k-j}\star\LR{A^{j-\o}b}} \in\bbR{h\times h\times h} \\ }$$
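The third-order gradient tensor can be assembled explicitly. Component-wise, the dyadic formula reads $\left(\partial f/\partial A\right)_{ipq}=\sum_{k}\sum_{j}\left[VA^{k-j}\right]_{ip}\left[A^{j-1}b\right]_q$, which the following NumPy sketch builds with `einsum` and checks against finite differences (dimensions and data are again arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
h, n = 3, 4  # arbitrary small dimensions for the check
V = rng.standard_normal((h, h))
b = rng.standard_normal(h)
A = rng.standard_normal((h, h))

# third-order gradient tensor G[i, p, q] = d f_i / d A_pq,
# from G = V * sum_k sum_j A^{k-j} star (A^{j-1} b)
G = np.zeros((h, h, h))
for k in range(1, n):
    for j in range(1, k + 1):
        left = V @ np.linalg.matrix_power(A, k - j)   # h x h
        right = np.linalg.matrix_power(A, j - 1) @ b  # h
        G += np.einsum('ip,q->ipq', left, right)      # dyadic product

# check one (p, q) slice against a central finite difference
def f(A):
    return V @ sum(np.linalg.matrix_power(A, k) @ b for k in range(n))

p, q, eps = 0, 1, 1e-6
E = np.zeros((h, h))
E[p, q] = eps
fd = (f(A + E) - f(A - E)) / (2 * eps)
print(np.allclose(G[:, p, q], fd, rtol=1e-4, atol=1e-8))
```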

Yet another way to approach the problem is to vectorize the matrix on the RHS of the differential expression $\Big(da={\rm vec}(dA)\Big)$ and calculate the Jacobian $\,\large\LR{\grad{f}{a}},$ which is an ordinary $h\times h^2$ matrix.
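A sketch of this vectorized route: applying the identity ${\rm vec}(XYZ)=(Z^T\otimes X)\,{\rm vec}(Y)$ to each term of the differential gives the Jacobian $\partial f/\partial a=\sum_{k=1}^{n-1}\sum_{j=1}^{k}\,(A^{j-1}b)^T\otimes(VA^{k-j})$. The following NumPy sketch (arbitrary dimensions and data) builds it with Kronecker products and checks it against a finite-difference Jacobian:

```python
import numpy as np

rng = np.random.default_rng(3)
h, n = 3, 4  # arbitrary small dimensions for the check
V = rng.standard_normal((h, h))
b = rng.standard_normal(h)
A = rng.standard_normal((h, h))

def f_vec(a):
    # f as a function of a = vec(A), column-major stacking
    M = a.reshape(h, h, order='F')
    return V @ sum(np.linalg.matrix_power(M, k) @ b for k in range(n))

# closed-form Jacobian via vec(X Y Z) = (Z^T kron X) vec(Y):
# df = sum_{k,j} [ (A^{j-1} b)^T  kron  (V A^{k-j}) ] da
J = np.zeros((h, h * h))
for k in range(1, n):
    for j in range(1, k + 1):
        row = (np.linalg.matrix_power(A, j - 1) @ b).reshape(1, -1)
        J += np.kron(row, V @ np.linalg.matrix_power(A, k - j))

# central finite-difference Jacobian, one column per entry of vec(A)
a = A.flatten(order='F')
eps = 1e-6
J_fd = np.column_stack([
    (f_vec(a + eps * e) - f_vec(a - eps * e)) / (2 * eps)
    for e in np.eye(h * h)
])
print(np.allclose(J, J_fd, rtol=1e-4, atol=1e-8))
```

The third-order tensor and this $h\times h^2$ Jacobian contain the same information; the Jacobian is just the tensor with its last two indices flattened in vec order.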