Matrix differentiation in maximum likelihood estimation - mixed model -


While reading a paper comparing ML and REML estimation for linear mixed models, I ran into difficulties following the steps for estimating the fixed effects. To give the full context, the paper goes as follows:

  1. The model is $Y \sim \mathcal{N}(X\beta, V)$, with $Y$ the response variable, $X$ the design matrix of the fixed effects, $\beta$ the associated fixed effects, and $V = \sum_i \tau_i K_i + \sigma_e^2\mathbb{I}_n$, where the $\tau_i$ are the variance components of the random effects, the $K_i$ their variance-covariance matrices, and $\sigma_e^2\mathbb{I}_n$ accounts for the random error.
  2. The parameters to estimate are $\beta$, the $\tau_i$ (for all $i$) and $\sigma_e^2$.
  3. The likelihood of this model is by definition proportional to $L(\beta, \tau_1, \ldots, \tau_k, \sigma_e^2) = \frac{1}{\sqrt{|V|}}\exp\left(-\frac{1}{2}(Y - X\beta)^TV^{-1}(Y-X\beta)\right)$, so no problem here.
  4. The log-likelihood then follows (up to an additive constant) as $l(\beta, \tau_1, \ldots, \tau_k, \sigma_e^2) = -\frac{1}{2}\log|V| -\frac{1}{2}(Y - X\beta)^TV^{-1}(Y-X\beta)$, again no problem there.
  5. To get the ML estimate of $\beta$, you then differentiate $l(\beta, \tau_1, \ldots, \tau_k, \sigma_e^2)$ with respect to $\beta$ and solve $\frac{\partial l}{\partial \beta} = 0$, and that is where I am having difficulties.
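To make the setup concrete, here is a minimal numerical sketch of steps 1–5 (all dimensions, names, and values are made up for illustration; one random-effect kernel $K_1$ is used). Solving $\frac{\partial l}{\partial \beta} = 0$ yields the GLS-type estimate $\hat\beta = (X^TV^{-1}X)^{-1}X^TV^{-1}Y$, which indeed gives a higher log-likelihood than any other $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3

# Toy problem (illustrative values only)
X = rng.normal(size=(n, p))            # fixed-effects design matrix
Z = rng.normal(size=(n, 2))
K1 = Z @ Z.T                           # one PSD random-effect kernel
tau1, sigma2_e = 0.5, 1.0
V = tau1 * K1 + sigma2_e * np.eye(n)   # V = sum_i tau_i K_i + sigma_e^2 I_n

beta_true = np.array([1.0, -2.0, 0.5])
Y = rng.multivariate_normal(X @ beta_true, V)

def loglik(beta):
    """l(beta) = -1/2 log|V| - 1/2 (Y - X beta)^T V^{-1} (Y - X beta)."""
    r = Y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(V, r)

# Solving dl/dbeta = 0 gives beta_hat = (X^T V^-1 X)^-1 X^T V^-1 Y
VinvX = np.linalg.solve(V, X)
beta_hat = np.linalg.solve(X.T @ VinvX, VinvX.T @ Y)
print(loglik(beta_hat) >= loglik(beta_true))   # True: beta_hat maximizes l over beta
```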

When I try to derive the solution myself, I find the following:

  • Using the formula: if $\alpha = u^TAu$ (with $\alpha$ and $u$ depending on $z$ and $A$ a constant matrix), then $\frac{\partial \alpha}{\partial z} = u^T(A+ A^T)\frac{\partial u}{\partial z}$. Setting $u = Y - X\beta$ in my example, I get

$$l(\beta, \tau_1, \ldots, \tau_k, \sigma_e^2) = -\frac{1}{2}\log|V| -\frac{1}{2}u^TV^{-1}u$$ $$ \frac{\partial l}{\partial \beta} = -\frac{1}{2}u^T\left(V^{-1}+(V^{-1})^T\right)\frac{\partial u}{\partial \beta} $$ $$ \frac{\partial l}{\partial \beta} = -\frac{1}{2}(Y - X\beta)^T\left(V^{-1}+(V^{-1})^T\right)(-X) = (Y - X\beta)^TV^{-1}X $$

But the problem is that the paper gives this as the solution: $$ \frac{\partial l}{\partial \beta} = -\frac{1}{2}(-X)^T\left(V^{-1}+(V^{-1})^T\right)(Y - X\beta) = X^TV^{-1}(Y - X\beta) $$

The "first" step of the differentiation seems to use the same formula, but with the transpose moved from $u$ to $\frac{\partial u}{\partial \beta}$, and I don't know why.
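For what it's worth, a quick numerical check (toy shapes, a random symmetric positive definite $V$; all names made up) suggests the two expressions carry the same numbers, one as a $1 \times p$ row vector and the other as a $p \times 1$ column vector:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
beta = rng.normal(size=p)

A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)       # symmetric positive definite, like the model's V
Vinv = np.linalg.inv(V)

r = Y - X @ beta                  # u = Y - X beta
mine = r @ Vinv @ X               # (Y - X beta)^T V^{-1} X   (row-vector layout)
paper = X.T @ Vinv @ r            # X^T V^{-1} (Y - X beta)   (column-vector layout)

# With V symmetric the entries coincide: the two results are transposes of each other.
print(np.allclose(mine, paper))   # True
```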

Is there something obvious I am missing ?

My calculus is not good, and we never even got to matrix differentiation in class, so I had to get the formula from the internet (here, proposition 13).

On a side note, if you have any recommendations for a good catch-up on matrix differentiation, please let me know.


There is 1 answer below.

BEST ANSWER

Here's a way to deal with the problem without resorting to fancy matrix calculus formulas.

Let $f:\beta \mapsto (Y - X\beta)^TV^{-1}(Y-X\beta)$.

Since $V$ is symmetric and positive definite, so is $V^{-1}$ and $$\begin{align} f(\beta + h)&=(Y-X\beta)^TV^{-1}(Y-X\beta)-(Y-X\beta)^TV^{-1}Xh-(Xh)^TV^{-1}(Y-X\beta)+(Xh)^TV^{-1}Xh\\ &= (Y-X\beta)^TV^{-1}(Y-X\beta)-(Y-X\beta)^TV^{-1}Xh-(V^{-1}(Y-X\beta))^TXh+(Xh)^TV^{-1}Xh\\ &=(Y-X\beta)^TV^{-1}(Y-X\beta)-(Y-X\beta)^TV^{-1}Xh-(Y-X\beta)^TV^{-1}Xh+(Xh)^TV^{-1}Xh\\ &=(Y-X\beta)^TV^{-1}(Y-X\beta)-2(Y-X\beta)^TV^{-1}Xh+h^TX^TV^{-1}Xh \end{align} $$

This means two things. First, the Hessian of $f$ (constant in $\beta$) is $2X^TV^{-1}X$, which is at least positive semi-definite (prove it), hence $f$ is convex.
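For the "prove it" step, one line suffices: for any vector $h$, $$h^T\left(X^TV^{-1}X\right)h = (Xh)^TV^{-1}(Xh) \ge 0,$$ because $V^{-1}$ is positive definite. Moreover the quadratic form is strictly positive unless $Xh = 0$, so the Hessian is positive definite whenever $X$ has full column rank, making the maximizer unique in that case.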

Moreover, the gradient of $f$ at $\beta$ is $(-2(Y-X\beta)^TV^{-1}X)^T=-2X^TV^{-1}(Y-X\beta)$.

Since $l = -\frac{1}{2}\log|V| - \frac{1}{2}f(\beta)$ and the first term does not involve $\beta$, maximizing $l$ over $\beta$ amounts to maximizing $-\frac{1}{2}f$, which is concave; it therefore reaches a global maximum exactly where its gradient is $0$, that is to say iff $$X^TV^{-1}(Y-X\beta)=0$$
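As a sanity check on toy data (all names and values made up), the gradient formula $-2X^TV^{-1}(Y-X\beta)$ can be compared against central finite differences, and the closed-form stationary point $\hat\beta = (X^TV^{-1}X)^{-1}X^TV^{-1}Y$ (assuming $X$ has full column rank) can be verified to zero the gradient:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 7, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)        # symmetric positive definite

def f(beta):
    r = Y - X @ beta
    return r @ np.linalg.solve(V, r)

def grad_f(beta):
    # gradient of f: -2 X^T V^{-1} (Y - X beta)
    return -2 * X.T @ np.linalg.solve(V, Y - X @ beta)

# Central finite-difference check of the gradient at a random point
beta0, eps = rng.normal(size=p), 1e-6
fd = np.array([(f(beta0 + eps * e) - f(beta0 - eps * e)) / (2 * eps)
               for e in np.eye(p)])
print(np.allclose(fd, grad_f(beta0), atol=1e-4))    # True

# The stationary point X^T V^{-1} (Y - X beta) = 0 in closed form
VinvX = np.linalg.solve(V, X)
beta_hat = np.linalg.solve(X.T @ VinvX, VinvX.T @ Y)
print(np.allclose(grad_f(beta_hat), 0, atol=1e-8))  # True
```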