As I was reading a paper comparing ML and REML estimation for linear mixed models, I encountered difficulties following the steps for estimating the fixed effects. To give the whole context of the question, the paper goes as follows:
- The model being $Y \sim \mathcal{N}(X\beta, V)$, with $Y$ the response variable, $X$ the design matrix of the fixed effects, $\beta$ the associated fixed effects, and $V = \sum_i \tau_iK_i + \sigma_e^2\mathbb{I}_n$, where the $\tau_i$ are the variance components of the random effects, the $K_i$ are the variance-covariance matrices of the random effects, and $\sigma_e^2\mathbb{I}_n$ accounts for the random error.
- The parameters to estimate being $\beta$, $\tau_i$ ($\forall i$) and $\sigma_e^2$
- The likelihood of such a model is by definition proportional to $L(\beta, \tau_1, ..., \tau_k, \sigma_e^2) = \frac{1}{\sqrt{|V|}}\exp(-\frac{1}{2}(Y - X\beta)^TV^{-1}(Y-X\beta))$, so no problem here
- The log-likelihood (up to an additive constant) is then $l(\beta, \tau_1, ..., \tau_k, \sigma_e^2) = -\frac{1}{2}\log|V| -\frac{1}{2}(Y - X\beta)^TV^{-1}(Y-X\beta)$, again no problem there
- To get the ML estimate of $\beta$ you then have to differentiate $l(\beta, \tau_1, ..., \tau_k, \sigma_e^2)$ with respect to $\beta$ and solve $\frac{\partial l}{\partial \beta} = 0$, and that is where I am having difficulties
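For concreteness, here is how I picture the model and its log-likelihood numerically (a toy sketch with made-up $K_i$ and parameter values, all names mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3

# Toy design matrix and two made-up random-effect covariance matrices K_i
X = rng.standard_normal((n, p))
Z1 = rng.standard_normal((n, 4))
Z2 = rng.standard_normal((n, 2))
K1, K2 = Z1 @ Z1.T, Z2 @ Z2.T  # symmetric PSD by construction

def log_lik(beta, taus, sigma2, Y):
    """l(beta, tau_1, tau_2, sigma_e^2), up to an additive constant."""
    V = taus[0] * K1 + taus[1] * K2 + sigma2 * np.eye(n)
    r = Y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(V, r)

# Simulate one response vector from the model and evaluate l at the truth
beta_true = np.array([1.0, -0.5, 2.0])
taus, sigma2 = [0.7, 0.3], 1.0
V = taus[0] * K1 + taus[1] * K2 + sigma2 * np.eye(n)
Y = rng.multivariate_normal(X @ beta_true, V)
print(log_lik(beta_true, taus, sigma2, Y))
```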
When trying myself to get the solution what I find is the following :
- Using the formula: if $\alpha = u^TAu$ then $\frac{\partial \alpha}{\partial z} = u^T(A+ A^T)\frac{\partial u}{\partial z}$ (with $u$ depending on $z$ and $A$ a constant matrix), and setting $u = Y - X\beta$ in my example, I get
$$l(\beta, \tau_1, ..., \tau_k, \sigma_e^2) = -\frac{1}{2}\log|V| -\frac{1}{2}u^TV^{-1}u$$ $$ \frac{\partial l}{\partial \beta} = -\frac{1}{2}u^T\left(V^{-1}+(V^{-1})^T\right)\frac{\partial u}{\partial \beta} $$ $$ \frac{\partial l}{\partial \beta} = -\frac{1}{2}(Y - X\beta)^T\left(V^{-1}+(V^{-1})^T\right)(-X) = (Y - X\beta)^TV^{-1}X $$ using $(V^{-1})^T = V^{-1}$, since $V$ is symmetric.
But the problem is that the paper gives this as the solution: $$ \frac{\partial l}{\partial \beta} = -\frac{1}{2}(-X)^T\left(V^{-1}+(V^{-1})^T\right)(Y - X\beta) = X^TV^{-1}(Y - X\beta) $$
The "first" step of the differentiation seems to use the same formula, but with the partial derivative and the transpose switched, and I don't know why.
Is there something obvious I am missing?
My calculus is not good and we didn't even get to matrix differentiation in class, so I had to get the formula from the internet (here, proposition 13).
On a side note, if you have any recommendations for a good catch-up on matrix differentiation, please share them.
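For what it's worth, I tried checking numerically (a sketch with random data, all names mine). The two expressions seem to contain the same numbers, one being the transpose of the other, and both match a finite-difference gradient of $l$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n))
V = A @ A.T + n * np.eye(n)   # symmetric positive definite V
Vinv = np.linalg.inv(V)
Y = rng.standard_normal(n)
beta = rng.standard_normal(p)

def l(b):
    """Log-likelihood as a function of beta alone (V held fixed)."""
    r = Y - X @ b
    return -0.5 * np.linalg.slogdet(V)[1] - 0.5 * r @ Vinv @ r

mine = (Y - X @ beta) @ Vinv @ X     # my result: a row vector (shape (p,))
paper = X.T @ Vinv @ (Y - X @ beta)  # paper's result: a column vector (shape (p,))
# With 1-D numpy arrays the row/column distinction disappears,
# so the two should coincide entry by entry.

# Central finite-difference gradient of l at beta
eps = 1e-6
fd = np.array([(l(beta + eps * e) - l(beta - eps * e)) / (2 * eps)
               for e in np.eye(p)])

print(np.allclose(mine, paper))
print(np.allclose(fd, paper, atol=1e-5))
```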
Here's a way to deal with the problem without resorting to fancy matrix calculus formulas.
Let $f:\beta \mapsto (Y - X\beta)^TV^{-1}(Y-X\beta)$.
Since $V$ is symmetric and positive definite, so is $V^{-1}$ and $$\begin{align} f(\beta + h)&=(Y-X\beta-Xh)^TV^{-1}(Y-X\beta-Xh)\\ &=(Y-X\beta)^TV^{-1}(Y-X\beta)-(Y-X\beta)^TV^{-1}Xh-(Xh)^TV^{-1}(Y-X\beta)+(Xh)^TV^{-1}Xh\\ &= (Y-X\beta)^TV^{-1}(Y-X\beta)-(Y-X\beta)^TV^{-1}Xh-(V^{-1}(Y-X\beta))^TXh+(Xh)^TV^{-1}Xh\\ &=(Y-X\beta)^TV^{-1}(Y-X\beta)-(Y-X\beta)^TV^{-1}Xh-(Y-X\beta)^TV^{-1}Xh+(Xh)^TV^{-1}Xh\\ &=(Y-X\beta)^TV^{-1}(Y-X\beta)-2(Y-X\beta)^TV^{-1}Xh+h^TX^TV^{-1}Xh \end{align} $$
This means two things. First, identifying the quadratic term with $\frac{1}{2}h^T\nabla^2 f(\beta)\,h$, the Hessian of $f$ at $\beta$ is $2X^TV^{-1}X$, which is at least positive semi-definite (prove it), hence $f$ is convex.
Second, identifying the linear term with $\nabla f(\beta)^Th$, the gradient of $f$ at $\beta$ is $(-2(Y-X\beta)^TV^{-1}X)^T=-2X^TV^{-1}(Y-X\beta)$.
$-\frac 12f$ is concave, so it attains its global maximum exactly where its gradient vanishes, that is to say iff $$X^TV^{-1}(Y-X\beta)=0$$ When $X^TV^{-1}X$ is invertible (i.e. $X$ has full column rank), this gives the usual generalized least squares estimator $\hat\beta = (X^TV^{-1}X)^{-1}X^TV^{-1}Y$.
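As a sanity check, here is a short numerical sketch (random data, all names mine): solving the normal equations $X^TV^{-1}X\beta = X^TV^{-1}Y$ yields a $\hat\beta$ at which the gradient $X^TV^{-1}(Y-X\hat\beta)$ is numerically zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n))
V = A @ A.T + n * np.eye(n)   # symmetric positive definite V
Vinv = np.linalg.inv(V)
Y = rng.standard_normal(n)

# Solve the normal equations X^T V^{-1} X beta = X^T V^{-1} Y
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)

# Gradient of -(1/2) f at beta_hat: X^T V^{-1} (Y - X beta_hat)
grad = X.T @ Vinv @ (Y - X @ beta_hat)
print(np.allclose(grad, 0))   # the gradient vanishes at beta_hat
```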