Gradient of $x'\left(\sum_i x_i A_i\right)^{-1}x$


I want to compute the gradient of $$f(x)=x'\left(\sum_i x_i A_i\right)^{-1}x$$ where $x$ is an $n\times 1$ column vector and each $A_i$ is an $n\times n$ matrix.

One option is to differentiate with respect to a single $x_j$ and then reconstitute the gradient, but this gets very messy.

The other option is to use matrix calculus rules. One can write $$ C:=\sum_i x_i A_i = (x'\otimes I_n)B$$ where $B$ is a block column vector: $B=\left[A_{1},\dots,A_{n}\right]'.$
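The identity $\sum_i x_iA_i = (x'\otimes I_n)B$ is easy to confirm numerically; here is a minimal sketch (assuming numpy; the random matrices and dimension are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
A = [rng.standard_normal((n, n)) for _ in range(n)]  # the matrices A_i

# C = sum_i x_i A_i, computed directly
C_direct = sum(x[i] * A[i] for i in range(n))

# C = (x' ⊗ I_n) B, with B = [A_1; ...; A_n] stacked as an n^2 x n block column
B = np.vstack(A)
C_kron = np.kron(x[None, :], np.eye(n)) @ B

print(np.allclose(C_direct, C_kron))  # True
```

The key point is that $x'\otimes I_n = [x_1I_n,\dots,x_nI_n]$, so multiplying it against the stacked block column reproduces the weighted sum.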

Then using the rules for differentiating a matrix inverse

$$ df = 2x'C^{-1}\,dx - x'C^{-1}\left(dC\right)C^{-1}x. $$ Finally, $$ dC=(dx'\otimes I_n)B. $$ But I am stuck there. How can I move from this to an expression for $\frac{df}{dx}$?

EDIT: Corrected some mistakes that commenters noticed.


BEST ANSWER

I assume by $x'$ you mean the transpose and by $\otimes$ you mean the Kronecker product.

Your $C$ is not quite right; we should have $$ C = \bigl(x_1I_n,\; x_2I_n,\; \dotsc\bigr)\begin{pmatrix}A_1\\A_2\\\vdots\end{pmatrix} = (x'\otimes I_n)B $$ where $B = \bigl(A_1, A_2,\dotsc\bigr)'$.

Your $df$ is not quite right; we need to replace $C^{-1}$ with its symmetrization in the first term: $$ df = x'(C^{-1} + (C^{-1})')dx - x'C^{-1}(dC)C^{-1}x. $$ Otherwise you are on the right track. What we need to do is get $df$ in the form $df = y'(dx) = (dx)'y$, whence $y$ is the gradient. So let's focus on the last term: $$ x'C^{-1}(dx'\otimes I_n)BC^{-1}x. $$ What we should note is that $$ (dx'\otimes I_n)B = \sum_{i=1}^ndx_iA_i $$ so then $$ x'C^{-1}(dC)C^{-1}x = \sum_{i=1}^ndx_i\,x'C^{-1}A_iC^{-1}x. $$ The quantities $b_i = x'C^{-1}A_iC^{-1}x$ are scalars and arrange into a vector $b$. Now $$ df = x'(C^{-1} + (C^{-1})')dx - b'dx = \Bigl[(C^{-1} + (C^{-1})')x - b\Bigr]'dx $$ so the gradient is $$ (\nabla f)_i = \Bigl[(C^{-1} + (C^{-1})')x - b\Bigr]_i = \sum_{j=1}^n(C_{ij}^{-1} + C_{ji}^{-1})x_j - x'C^{-1}A_iC^{-1}x. $$
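This formula can be sanity-checked against central finite differences; a minimal sketch assuming numpy (random $A_i$, so $C$ is invertible with probability one; the dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = [rng.standard_normal((n, n)) for _ in range(n)]  # random A_i
x = rng.standard_normal(n)

def f(x):
    # f(x) = x' C^{-1} x with C = sum_i x_i A_i
    C = sum(x[i] * A[i] for i in range(n))
    return x @ np.linalg.solve(C, x)

def grad_f(x):
    # (grad f)_i = [(C^{-1} + C^{-T}) x]_i - x' C^{-1} A_i C^{-1} x
    C = sum(x[i] * A[i] for i in range(n))
    Cinv = np.linalg.inv(C)
    b = np.array([x @ Cinv @ A[i] @ Cinv @ x for i in range(n)])
    return (Cinv + Cinv.T) @ x - b

# central finite differences
eps = 1e-6
E = np.eye(n)
num = np.array([(f(x + eps * E[i]) - f(x - eps * E[i])) / (2 * eps)
                for i in range(n)])
assert np.allclose(grad_f(x), num, rtol=1e-4, atol=1e-6)
```

Note the check makes no symmetry assumption on $C$, which is why the symmetrized first term matters.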

ANOTHER ANSWER

Let $\{e_k\}$ denote the standard basis vectors and $\{\star\}$ the dyadic/tensor product.

Then define the variables $$\eqalign{ \def\A{{\cal A}} \def\M{M^{-1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \A &= \sum_k A_k\star e_k &\qiq \A_{pqr} &= \sum_k \LR{A_k}_{pq} \: \LR{e_k}_{r}= \sum_k \LR{A_k}_{pq} \; \delta_{kr} \;\equiv\; \LR{A_r}_{pq} \\ M &= \A\cdot x &\qiq dM &= \A\cdot dx \\ W &= \M &\qiq dW &= -W\:dM\:W \\ b &= W^T x &\qiq c &= Wx \\ }$$ where $\:\{b,c,e_k\}$ are vectors, $\:\{M,W\}$ are matrices, and $\,\A$ is a third-order tensor. Note that $b$ and $c$ coincide when $M$ is symmetric, but not in general.

Next, calculate the differential and gradient of the given function $$\eqalign{ f &= x^TWx \;\,=\,\; xx^T:W \\ df &= dx^TWx + x^TWdx \;-\; {xx^T}:\LR{W\:dM\:W} \\ &= \LR{Wx + W^Tx}\cdot dx \;-\; {bc^T:\A}\cdot dx \\ \grad{f}{x} &= Wx + W^Tx - bc^T:\A \\ &= \LR{W^T+W}x \;-\; \sum_k \LR{bc^T:A_k} \star e_k \\ &= \LR{W^T+W}x \;-\; \sum_k \LR{b^TA_k\,c} e_k \\ }$$ If you're more comfortable with index notation, the $\{:\}$ products can be evaluated as $$\eqalign{ f &= xx^T:W \\ &= \sum_p\sum_q \:x_p\,x_q\,W_{pq} \\ &= x^TWx \\ \\ g &= bc^T:\A \\ g_r &= \sum_p\sum_q \:b_p\,c_q\,\A_{pqr} \\ &= \sum_p\sum_q \:b_p\,c_q\,\LR{A_r}_{pq} \\ g_r &= b^T A_r\: c \qiq g = \sum_r g_r\,e_r \\ }$$
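As a numerical cross-check (numpy assumed; not part of the original derivation): with $b = W^Tx$ and $c = Wx$, the term $b^TA_k\,c = x^TWA_kWx$ matches the first answer's $x'C^{-1}A_kC^{-1}x$, so the two gradients agree even for non-symmetric $M$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = [rng.standard_normal((n, n)) for _ in range(n)]
x = rng.standard_normal(n)

M = sum(x[k] * A[k] for k in range(n))
W = np.linalg.inv(M)
b = W.T @ x   # b = W'x
c = W @ x     # c = Wx  (equals b only when M is symmetric)

# tensor-calculus gradient: (W' + W)x - sum_k (b' A_k c) e_k
grad_tensor = (W.T + W) @ x - np.array([b @ A[k] @ c for k in range(n)])

# first answer's expression, with C = M and C^{-1} = W
grad_first = (W + W.T) @ x - np.array([x @ W @ A[k] @ W @ x for k in range(n)])

assert np.allclose(grad_tensor, grad_first)
```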