Likelihood differentiation with respect to matrix


I am going through a paper on likelihood estimation for panel data. In Appendix A.1, the likelihood developed earlier in the text is concentrated with respect to the parameters that have closed-form solutions.

There are two steps. In the second one, the differential of the likelihood is computed and set to zero, because we are looking for the maximum. I am new to matrix calculus and am struggling to obtain the same differential as in the paper.

I believe I am missing something from matrix calculus, hence in the following I will use simplified notation to avoid using all indices from the paper. So what I have is:

$ L(X) \propto a \cdot \ln \det X - tr(HX) $

where $H = (A+BX^{-1})^TQ(A+BX^{-1})$ is a matrix which depends on $X$; $A$, $B$ and $Q$ are constant matrices (during the differentiation) and $a$ is a scalar constant. I need to find the differential of $L$ with respect to $X$. In the paper the matrix $G_{22}$ plays the role of $X$. The matrices $X$, $H$ and $Q$ are symmetric.

Using the Matrix Cookbook I've managed to obtain the following differential:

$dL = a \cdot tr(X^{-1}dX) - tr(dHX+HdX)$

I have a problem with calculating the differential $dH$. I believe it should vanish, because the corresponding term disappears in the paper. Using $d(X^{-1}) = -X^{-1}dX\,X^{-1}$, what I've got so far is:

$dH = d(A+BX^{-1})^T \cdot Q(A+BX^{-1}) + (A+BX^{-1})^T Q \cdot d(A+BX^{-1}) = -(BX^{-1}dXX^{-1})^T Q (A+BX^{-1}) - (A+BX^{-1})^T Q BX^{-1}dXX^{-1} = -((A+BX^{-1})^TQBX^{-1}dXX^{-1})^T - (A+BX^{-1})^TQBX^{-1}dXX^{-1}$

Since $H$ and $X$ are symmetric I believe $dH$ and $dX$ are symmetric as well, but this still doesn't allow me to deduce that $dH = 0$. What am I missing?
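As a quick numerical sanity check that $dH$ does not vanish, here is a minimal numpy sketch. The matrices below are arbitrary random placeholders, not the ones from the paper; the check compares $H$ at $X$ and at $X$ perturbed along a symmetric direction $dX$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

def sym(M):
    """Symmetric part of M."""
    return (M + M.T) / 2

# Arbitrary test matrices (placeholders): X symmetric and
# well-conditioned, Q symmetric, A and B generic.
X = sym(rng.standard_normal((n, n))) + 2 * n * np.eye(n)
Q = sym(rng.standard_normal((n, n)))
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

def H(X):
    M = A + B @ np.linalg.inv(X)
    return M.T @ Q @ M

# Finite-difference directional derivative of H along a symmetric dX.
dX = sym(rng.standard_normal((n, n)))
eps = 1e-6
dH = (H(X + eps * dX) - H(X - eps * dX)) / (2 * eps)
print(np.linalg.norm(dH))  # clearly nonzero for a generic perturbation
```

So $dH \ne 0$ in general; the term must cancel against something else rather than vanish on its own.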

Bonus question: if there are any introductory books on matrix calculus that you could recommend, I would be extremely thankful.

Accepted answer:

$ \def\l{\left} \def\r{\right} \def\lr#1{\l(#1\r)} \def\LR#1{\Big(#1\Big)} \def\p{\partial} \def\vec#1{\operatorname{vec}\lr{#1}} \def\trace#1{\operatorname{Tr}\lr{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $For typing convenience, define the matrix variable $$M=A+BX^{-1} \quad\implies\quad H = M^TQM $$ and introduce the Frobenius product notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{AB^T} \\ A:A &= \big\|A\big\|^2_F \\ }$$ $\big($Note that $Q^T=Q$ implies $H^T=H.\big)$

Rewrite the differential that you're having trouble with as $$\eqalign{ \trace{X\,dH} &= X:dH \\ &= {X}:\lr{dM^TQM+M^TQ\,dM} \\ &= \lr{X+X^T}:{M^TQ\,dM} \\ &= 2QMX:dM \\ &= 2QMX:B\,dX^{-1} \\ &= -2B^TQMX:X^{-1}dX\,X^{-1} \\ &= -2X^{-1}B^TQM:dX \\ &= -2X^{-1}B^TQ\lr{A+BX^{-1}}:dX \\ }$$ Since this term enters $dL$ with a minus sign, combining it with the rest of your differential yields $$\eqalign{ dL &= aX^{-1}:dX + 2X^{-1}B^TQ\lr{A+BX^{-1}}:dX - H:dX \\ \grad{L}{X} &= aX^{-1} + 2X^{-1}B^TQ\lr{A+BX^{-1}} - H \\\\ }$$
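This gradient can be checked numerically with a central finite difference along a symmetric direction (a minimal numpy sketch; the matrices and the constant $a$ are arbitrary random placeholders). Note that $\operatorname{Tr}(X\,dH)$ enters $dL$ with a minus sign, so the middle term of the gradient carries $+2X^{-1}B^TQ(A+BX^{-1})$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = 2.5  # arbitrary scalar constant

def sym(M):
    return (M + M.T) / 2

# Arbitrary placeholders: X symmetric positive definite, Q symmetric.
X = sym(rng.standard_normal((n, n))) + 2 * n * np.eye(n)
Q = sym(rng.standard_normal((n, n)))
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

def L(X):
    Xinv = np.linalg.inv(X)
    M = A + B @ Xinv
    H = M.T @ Q @ M
    _, logdet = np.linalg.slogdet(X)
    return a * logdet - np.trace(H @ X)

# Gradient formula: a X^{-1} + 2 X^{-1} B^T Q M - H
Xinv = np.linalg.inv(X)
M = A + B @ Xinv
H = M.T @ Q @ M
G = a * Xinv + 2 * Xinv @ B.T @ Q @ M - H

# Central finite difference along a symmetric direction dX.
dX = sym(rng.standard_normal((n, n)))
eps = 1e-6
fd = (L(X + eps * dX) - L(X - eps * dX)) / (2 * eps)
print(fd, np.sum(G * dX))  # the two values should agree closely
```

The Frobenius product $G:dX$ is evaluated as the elementwise sum `np.sum(G * dX)`.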


NB: The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ A:BX &= B^TA:X = AX^T:B \\ }$$ In particular, the first property means that the product is commutative, and this makes working with the Frobenius product of matrix variables almost as simple as working with products of scalar variables except for those pesky transpose rules in the third line.
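These rearrangement rules are easy to verify numerically (a small numpy sketch; $A$, $B$, $X$ here are arbitrary random matrices, unrelated to the ones in the question):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, X = (rng.standard_normal((3, 3)) for _ in range(3))

# Frobenius product A:B = Tr(A B^T) = elementwise sum of A*B.
frob = lambda P, R: np.trace(P @ R.T)

print(np.allclose(frob(A, B), frob(B, A)))            # A:B  = B:A
print(np.allclose(frob(A, B), frob(A.T, B.T)))        # A:B  = A^T:B^T
print(np.allclose(frob(A, B @ X), frob(B.T @ A, X)))  # A:BX = B^T A : X
print(np.allclose(frob(A, B @ X), frob(A @ X.T, B)))  # A:BX = A X^T : B
```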

However, when you're dealing with symmetric matrices, you can ignore the transposes and just pay attention to the relative ordering of the variables.


As for book recommendations, the standard text is probably Magnus and Neudecker's Matrix Differential Calculus, although personally I prefer Hjorungnes's Complex-Valued Matrix Derivatives.