Calculate the gradient and Hessian of $x_0^T(X\operatorname{Diag}(w)X^T)^{-1}x_0$ w.r.t. vector $w$ in matrix calculus?


I'm trying to calculate the Hessian matrix of the function $$ f(w) = x_0^T(XWX^T)^{-1}x_0 $$ where $w=(w_1,\cdots ,w_n)^T$, $W=\operatorname{Diag}(w)$ is the diagonal matrix with $w$ on its diagonal, $x_0$ is a column vector, and $X$ is a matrix.

To calculate the first-order derivative, do I need to first rewrite $W$ as $$\sum_i (e_i^T w)\, E_i,$$ where $e_i$ is the column vector whose $i$-th element is $1$ and the others $0$, and $E_i$ is the matrix whose $(i,i)$ element is $1$ and the others $0$? Any reference is appreciated.


BEST ANSWER

$\def\o{\operatorname}\def\t{\o{Tr}}\def\v{\o{vec}}\def\d{\o{diag}}\def\D{\o{Diag}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$For typing convenience, define the symmetric matrices $$\eqalign{ A = X\,\D(w)\,X^T \quad\qquad B = A^{-1}x_0x_0^TA^{-1} \\ }$$ and use a colon as a product notation for the trace, i.e. $$\eqalign{A:B = \t(A^TB) = B:A}$$ Write the function in terms of the new variables. Then calculate the differential and gradient. $$\eqalign{ f &= x_0x_0^T:A^{-1} \\ df &= x_0x_0^T:dA^{-1} \\ &= -x_0x_0^T:A^{-1}dA\,A^{-1} \\ &= -B:dA \\ &= -B:X\,\D(dw)\,X^T \\ &= -X^TBX:\D(dw) \\ &= -\d\left(X^TBX\right):dw \\ &= -\d\left(X^TA^{-1}x_0x_0^TA^{-1}X\right):dw \\ \p{f}{w} &= -\d\left(X^TA^{-1}x_0x_0^TA^{-1}X\right) \;\doteq\; g\quad\big({\rm gradient}\big) \\ }$$ where the $\d(M)$ function takes the main diagonal from its matrix argument and returns it as a column vector, while the $\D(v)$ function takes its vector argument and creates a diagonal matrix.
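As a sanity check (not part of the original answer), here is a minimal NumPy sketch that compares the closed-form gradient $g=-\d\left(X^TA^{-1}x_0x_0^TA^{-1}X\right)$ against central finite differences. The dimensions, the random data, and the restriction $w_k>0$ (which keeps $A$ invertible for a full-row-rank $X$) are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes and data (assumptions, not from the original post).
rng = np.random.default_rng(0)
m, n = 3, 5
X = rng.standard_normal((m, n))
x0 = rng.standard_normal(m)
w = rng.uniform(1.0, 2.0, n)   # positive weights so A = X Diag(w) X^T is invertible

def f(w):
    A = X @ np.diag(w) @ X.T
    return x0 @ np.linalg.solve(A, x0)   # f = x0^T A^{-1} x0

def grad(w):
    A = X @ np.diag(w) @ X.T
    y = X.T @ np.linalg.solve(A, x0)     # y = X^T A^{-1} x0
    return -y * y                        # g = -diag(y y^T), elementwise -y_k^2

# Central finite-difference gradient for comparison.
eps = 1e-6
I = np.eye(n)
g_fd = np.array([(f(w + eps * I[i]) - f(w - eps * I[i])) / (2 * eps)
                 for i in range(n)])

print(np.max(np.abs(grad(w) - g_fd)))   # should be tiny
```

Note that the gradient is written here via $y = X^TA^{-1}x_0$, anticipating the variable introduced for the Hessian below; it is the same vector as $\d\left(X^TA^{-1}x_0x_0^TA^{-1}X\right)$ up to the elementwise square.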

NB: The properties of the trace allow the terms in a colon product to be rearranged in many equivalent ways, e.g. $$\eqalign{ A:B &= B:A = B^T:A^T \\ CA:B &= C:BA^T = A:C^TB \\ }$$
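These rearrangement rules can be verified numerically; the shapes below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((3, 3))

# Colon product: P:Q = Tr(P^T Q)
colon = lambda P, Q: np.trace(P.T @ Q)

print(np.isclose(colon(A, B), colon(B, A)))            # A:B = B:A
print(np.isclose(colon(A, B), colon(B.T, A.T)))        # A:B = B^T:A^T
print(np.isclose(colon(C @ A, B), colon(C, B @ A.T)))  # CA:B = C:BA^T
print(np.isclose(colon(C @ A, B), colon(A, C.T @ B)))  # CA:B = A:C^T B
```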


A comment asked for the Hessian.

If you define $$\eqalign{ y &= X^TA^{-1}x_0 \\ dy &= -X^TA^{-1}X\,dW\,X^TA^{-1}x_0 \\ Y &= \D(y) \\ }$$ then you can write the gradient as $$\eqalign{ g &= -\d(yy^T) = -Yy \\ }$$ The Hessian is the gradient of this, so $$\eqalign{ dg &= -\d(dy\,y^T + y\,dy^T) \\ &= -2Y\,dy \\ &= +2Y\Big(X^TA^{-1}X\,dW\,X^TA^{-1}x_0\Big) \\ &= 2YX^TA^{-1}X\,dW\,y \\ &= 2YX^TA^{-1}XY\,dw \\ \p{g}{w} &= 2 YX^TA^{-1}XY \;\doteq\; H \qquad\big({\rm Hessian}\big) \\ }$$ Here $(Y,dW)$ are the diagonal matrices corresponding to the $(y,dw)$ vectors, so $$dW\,y = Y\,dw$$ which justifies the final substitution. The Hessian is obviously symmetric, and since $(X,Y)$ appear an even number of times, its definiteness is determined by the $A^{-1}$ term, which in turn is determined by $\D(w)$, i.e. by the signs of the individual $(w_k)$ elements; in particular, if every $w_k>0$ then $A\succ 0$ (for full-row-rank $X$) and $H\succeq 0$.
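The closed-form Hessian $H = 2YX^TA^{-1}XY$ can be checked against finite differences of the gradient in the same way; again the sizes and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
X = rng.standard_normal((m, n))
x0 = rng.standard_normal(m)
w = rng.uniform(1.0, 2.0, n)   # positive weights keep A invertible

def grad(w):
    A = X @ np.diag(w) @ X.T
    y = X.T @ np.linalg.solve(A, x0)   # y = X^T A^{-1} x0
    return -y * y                      # g = -Yy

def hess(w):
    A = X @ np.diag(w) @ X.T
    Ainv = np.linalg.inv(A)
    y = X.T @ (Ainv @ x0)
    Y = np.diag(y)
    return 2 * Y @ X.T @ Ainv @ X @ Y  # H = 2 Y X^T A^{-1} X Y

# Finite-difference Hessian: column j approximates d(grad)/d(w_j).
eps = 1e-6
I = np.eye(n)
H_fd = np.column_stack([
    (grad(w + eps * I[j]) - grad(w - eps * I[j])) / (2 * eps)
    for j in range(n)])

print(np.max(np.abs(hess(w) - H_fd)))          # should be tiny
print(np.min(np.linalg.eigvalsh(hess(w))))     # >= 0 here since every w_k > 0
```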