Direct solution to maximum likelihood computation problem using the derivative of multivariate Gaussian w.r.t. covariance matrix

For an application, I need to compute the maximum log-likelihood of data coming from a $d$-dimensional multivariate Gaussian random variable: $$ \textbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) $$ where the covariance matrix $\Sigma$ is a function of two scalars $\sigma, \gamma$ such that $\Sigma = \sigma V + \gamma I$, and $V$ is symmetric and independent of both $\gamma$ and $\sigma$.

I have already computed the maximum likelihood numerically using R's optim function. However, I was wondering whether I could compute the optimal $\sigma$ and $\gamma$ directly from a closed-form expression.

For this, I tried to follow the steps given in the answer provided by @greg for a similar question (Derivation of derivative of multivariate Gaussian w.r.t. covariance matrix). However, I am stuck at the step that involves expanding $\Sigma^{-1} = (\sigma V + \gamma I)^{-1}$.

My derivation is as follows. Using the same notation as in that answer, with $S = \Sigma$, $Z = X-\mu1$, and $A : B = \text{tr}(A^TB)$,

\begin{align} dL &= (S^{-1} - S^{-1} Z Z^T S^{-1}) : dS \\ &= (S^{-1} - S^{-1} Z Z^T S^{-1}) : (d\sigma V + \sigma dV + d\gamma I) \end{align}

Setting $dV=0$ and $d\gamma=0$ to get partial derivative w.r.t $\sigma$, \begin{align} dL &= (S^{-1} - S^{-1} Z Z^T S^{-1}) : (d\sigma V) \\ & = d\sigma\, \text{tr}((S^{-1} - S^{-1}Z Z^T S^{-1})^TV) \end{align}

That implies, \begin{align} \frac{dL}{d\sigma} &= \text{tr}\big((S^{-1} - S^{-1}Z Z^T S^{-1})^TV\big) \\ & = \text{tr}\Big(\big((\sigma V + \gamma I)^{-1} - (\sigma V + \gamma I)^{-1} Z Z^T (\sigma V + \gamma I)^{-1}\big)^T V\Big) \end{align}

Similarly, by setting $dV=0$ and $d\sigma=0$, we get partial derivative w.r.t $\gamma$, \begin{align} \frac{dL}{d\gamma} &= \text{tr}\big((\sigma V + \gamma I)^{-1} - (\sigma V + \gamma I)^{-1} Z Z^T (\sigma V + \gamma I)^{-1}\big) \end{align}
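
One possible way to expand $(\sigma V + \gamma I)^{-1}$ explicitly is sketched here. It relies only on $V$ being symmetric, so that it has an eigendecomposition $V = Q \Lambda Q^T$ with $Q$ orthogonal and $\Lambda = \text{diag}(\lambda_1,\dots,\lambda_d)$. The symbols $Q$, $\Lambda$, $\lambda_i$ and $w_i$ are introduced for this illustration and do not appear in the linked answer. \begin{align} S^{-1} &= (\sigma V + \gamma I)^{-1} = Q\,(\sigma \Lambda + \gamma I)^{-1} Q^T, \qquad w_i = (Q^T Z Z^T Q)_{ii} \\ \frac{dL}{d\sigma} &= \sum_{i=1}^d \frac{\lambda_i}{\sigma\lambda_i + \gamma} - \sum_{i=1}^d \frac{\lambda_i\, w_i}{(\sigma\lambda_i + \gamma)^2} \\ \frac{dL}{d\gamma} &= \sum_{i=1}^d \frac{1}{\sigma\lambda_i + \gamma} - \sum_{i=1}^d \frac{w_i}{(\sigma\lambda_i + \gamma)^2} \end{align} The transpose inside the trace can be dropped because $S^{-1}$ and $Z Z^T$ are symmetric. Written this way, both conditions are scalar rational equations in $(\sigma, \gamma)$.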

Is there a way to solve $\frac{dL}{d\sigma}=0$ and $\frac{dL}{d\gamma}=0$ to get a closed-form solution for $\gamma$ and $\sigma$?

There is 1 best solution below

You might be better off solving the gradient for $S$, i.e. $$\eqalign{ \frac{\partial L}{\partial S} &= \big(S^{-1} - S^{-1}ZZ^TS^{-1}\big) \;=\; 0 \cr S^{-1} &= S^{-1}ZZ^TS^{-1} \cr S &= ZZ^T \cr }$$ Then find the values of $(\sigma,\gamma)$ which yield (in a least-squares sense) this matrix. $$\eqalign{ \min_{\sigma,\gamma} \; \Big\|\,\sigma V + \gamma I - ZZ^T\Big\|^2_F \cr }$$ Start with an easy problem whose solution is well known. $$\eqalign{ \min_\alpha \|\alpha A-C\|^2_F \implies \alpha = \frac{A:C}{A:A} \cr }$$ Setting $\,\alpha A=\sigma V$ and $C=(ZZ^T-\gamma I),\,$ and then setting $\,\alpha A=\gamma I\,$ and $C=(ZZ^T-\sigma V)\,$ yields the scalars. $$\eqalign{ \sigma = \frac{V:(ZZ^T-\gamma I)}{V:V},\quad \gamma = \frac{I:(ZZ^T-\sigma V)}{I:I} \cr }$$ Plug the $\gamma$-expression into the $\sigma$-expression (and vice versa) to obtain $$\eqalign{ \sigma &= \frac{(I:I)(V:ZZ^T) - (V:I)(I:ZZ^T)}{(I:I)(V:V) - (V:I)(I:V)} \cr \gamma &= \frac{(V:V)(I:ZZ^T) - (I:V)(V:ZZ^T)}{(V:V)(I:I) - (I:V)(V:I)} \cr }$$ Note that the formulas are conjugate to one another under the interchange $\,I\Longleftrightarrow V$.
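
The closed-form expressions above use only Frobenius inner products, so they are easy to check numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the helper names `frob` and `fit_sigma_gamma` and the small test matrices are hypothetical, and `C` stands in for $ZZ^T$. Matrices are lists of rows; for real data a numerical library would be the more natural choice.

```python
def frob(A, B):
    """Frobenius inner product A : B = tr(A^T B), the sum of elementwise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def fit_sigma_gamma(V, C):
    """Least-squares fit of sigma*V + gamma*I to C in the Frobenius norm.

    V and C are square matrices given as lists of rows; C plays the role of Z Z^T.
    """
    d = len(V)
    I = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    II, VV, VI = frob(I, I), frob(V, V), frob(V, I)
    VC, IC = frob(V, C), frob(I, C)
    # Shared denominator (I:I)(V:V) - (V:I)(I:V); it vanishes when V is a
    # multiple of I, in which case sigma and gamma cannot be separated.
    det = II * VV - VI * VI
    if det == 0:
        raise ValueError("V is a multiple of the identity; fit is not unique")
    sigma = (II * VC - VI * IC) / det
    gamma = (VV * IC - VI * VC) / det
    return sigma, gamma


# Hypothetical check: build C exactly as 2*V + 3*I and recover the scalars.
V = [[2.0, 1.0], [1.0, 4.0]]
C = [[2 * 2.0 + 3, 2 * 1.0], [2 * 1.0, 2 * 4.0 + 3]]
sigma, gamma = fit_sigma_gamma(V, C)
print(sigma, gamma)
```

When `C` is exactly of the form $\sigma V + \gamma I$, the fit reproduces the scalars used to build it; for a general $ZZ^T$ it returns the Frobenius-norm projection onto the span of $V$ and $I$.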