How can I prove the Hessian of the log likelihood of the Generalized Error Distribution is negative definite?


I'm working with the multivariate generalized error distribution to model some data. (The parameterization I am working with follows Graham Giller's work here: https://www.researchgate.net/publication/255626258_A_Generalized_Error_Distribution) I'd like to prove that the Hessian matrix of the log likelihood with respect to the mean vector is negative semi-definite for all $\kappa$. To be fair, I don't have a definitive source that says this is indeed true, but I have plotted the log likelihood for increasing values of $\kappa$ and have yet to encounter a case where it is not concave; it gradually flattens as $\kappa\rightarrow\infty$, as one might expect from inspection of the pdf. Given that, and the distribution's close relation to other elliptically symmetric distributions with negative-definite Hessians, I feel confident the GED's Hessian should be negative definite w.r.t. the mean vector, but the results I am getting suggest that is not the case for $\kappa>1$.

The log likelihood function for $x, \mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n\times n}$, and $\kappa \in (0,\infty)$ is of the form:

$$ \ln(L(x|\mu,\Sigma,\kappa)) = - \left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^\frac{1}{2\kappa} - \frac{1}{2}\ln\left(\left|\Sigma^{-1} \right|\right) - \frac{n}{2} \ln\left(\frac{\pi\Gamma(\kappa)}{\Gamma(3\kappa)}\right) - \ln\left(\frac{\Gamma(1+n\kappa)}{\Gamma(1+\frac{n}{2})}\right) $$
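For concreteness, the $\mu$-dependent term can be sketched in Python (function and variable names are mine; `numpy` assumed available). At $\kappa=\tfrac{1}{2}$ we have $\Gamma(3\kappa)/\Gamma(\kappa)=\tfrac{1}{2}$, so the term reduces to the Gaussian quadratic form $-\tfrac{1}{2}z$:

```python
import math
import numpy as np

def ged_kernel(mu, x, sigma_inv, kappa):
    """mu-dependent term of ln L: -[Gamma(3k)/Gamma(k) * (x-mu)^T Sigma^-1 (x-mu)]^(1/(2k))."""
    c = math.exp(math.lgamma(3 * kappa) - math.lgamma(kappa))  # Gamma(3k)/Gamma(k)
    d = x - mu
    z = float(d @ sigma_inv @ d)
    return -(c * z) ** (1.0 / (2.0 * kappa))

x = np.array([3.0, 4.0])
mu = np.zeros(2)
sigma_inv = np.eye(2)  # z = 25 for this x, mu

# kappa = 1/2: Gamma(3/2)/Gamma(1/2) = 1/2, so the term is -z/2 = -12.5
print(ged_kernel(mu, x, sigma_inv, 0.5))
```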

I've calculated the gradient of $\ln(L(x))$ w.r.t. $\mu$ as follows:

$$ \begin{eqnarray*} \triangledown_{\mu}\ln(L(x)) &=& \frac{\partial}{\partial\mu}\left[- \left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^\frac{1}{2\kappa}\right]\\ &=&-\left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}\right]^{\frac{1}{2\kappa}}\frac{\partial}{\partial\mu}\left[z^{\frac{1}{2\kappa}}\right] \text{, where $z=(x-\mu)^T\Sigma^{-1}(x-\mu)$}\\ &=& -\left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}\right]^{\frac{1}{2\kappa}}\frac{1}{2\kappa}z^{\frac{1}{2\kappa}-1}\frac{\partial z}{\partial \mu} \text{, where $\frac{\partial z}{\partial \mu}=-2\Sigma^{-1}(x-\mu)$}\\ &=& \left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}\right]^{\frac{1}{2\kappa}}\frac{1}{\kappa}\Sigma^{-1}(x-\mu)\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1}\\ \end{eqnarray*} $$
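This gradient can be sanity-checked against central differences (a sketch with helper names of my own choosing; `numpy` assumed):

```python
import math
import numpy as np

def kernel(mu, x, sigma_inv, kappa):
    # mu-dependent term: -[Gamma(3k)/Gamma(k) * z]^(1/(2k))
    c = math.exp(math.lgamma(3 * kappa) - math.lgamma(kappa))
    d = x - mu
    return -(c * float(d @ sigma_inv @ d)) ** (1.0 / (2.0 * kappa))

def grad_kernel(mu, x, sigma_inv, kappa):
    # [Gamma(3k)/Gamma(k)]^(1/(2k)) * (1/k) * Sigma^-1 (x-mu) * z^(1/(2k)-1)
    c = math.exp(math.lgamma(3 * kappa) - math.lgamma(kappa))
    d = x - mu
    z = float(d @ sigma_inv @ d)
    b = 1.0 / (2.0 * kappa)
    return c ** b / kappa * (sigma_inv @ d) * z ** (b - 1.0)

x = np.array([1.0, -2.0])
mu = np.array([0.3, 0.7])
sigma_inv = np.array([[2.0, 0.5], [0.5, 1.0]])
kappa = 0.8

g = grad_kernel(mu, x, sigma_inv, kappa)
h = 1e-6
g_num = np.array([
    (kernel(mu + h * e, x, sigma_inv, kappa)
     - kernel(mu - h * e, x, sigma_inv, kappa)) / (2 * h)
    for e in np.eye(2)
])
print(np.max(np.abs(g - g_num)))  # small
```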

Ignoring the leading constants $\left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}\right]^{\frac{1}{2\kappa}}\frac{1}{ \kappa}$, I've calculated the Hessian as follows:

$$ \begin{eqnarray*} H &=& \frac{\partial}{\partial\mu}\left[\Sigma^{-1}(x-\mu)\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1}\right]\\ &=& \left[\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1}\right] J_\mu\left[\Sigma^{-1}(x-\mu)\right] + \Sigma^{-1}(x-\mu)\triangledown^T\left[\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1}\right]\\ &=& -\Sigma^{-1}\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1} + \left(\frac{1}{2\kappa}-1\right)\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-2}(-2)\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1}\\ &=& \left(2-\frac{1}{\kappa}\right)\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-2}\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1} - \Sigma^{-1}\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1} \end{eqnarray*} $$
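Before simplifying further, the closed form above (with the leading constant $\left[\frac{\Gamma(3\kappa)}{\Gamma(\kappa)}\right]^{\frac{1}{2\kappa}}\frac{1}{\kappa}$ restored) can be compared with a finite-difference Hessian of the kernel; this is a numerical sketch, with helper names of my own and `numpy` assumed:

```python
import math
import numpy as np

def kernel(mu, x, sigma_inv, kappa):
    c = math.exp(math.lgamma(3 * kappa) - math.lgamma(kappa))
    d = x - mu
    return -(c * float(d @ sigma_inv @ d)) ** (1.0 / (2.0 * kappa))

def hess_kernel(mu, x, sigma_inv, kappa):
    # (C^b / k) * [ (2 - 1/k) z^(b-2) S d d^T S  -  z^(b-1) S ],  b = 1/(2k), d = x - mu
    c = math.exp(math.lgamma(3 * kappa) - math.lgamma(kappa))
    b = 1.0 / (2.0 * kappa)
    d = x - mu
    z = float(d @ sigma_inv @ d)
    u = sigma_inv @ d
    return c ** b / kappa * ((2.0 - 1.0 / kappa) * z ** (b - 2.0) * np.outer(u, u)
                             - z ** (b - 1.0) * sigma_inv)

x = np.array([1.0, -2.0])
mu = np.array([0.3, 0.7])
sigma_inv = np.array([[2.0, 0.5], [0.5, 1.0]])
kappa = 0.8

H = hess_kernel(mu, x, sigma_inv, kappa)
h = 1e-4
E = np.eye(2)
H_num = np.array([[(kernel(mu + h*E[i] + h*E[j], x, sigma_inv, kappa)
                    - kernel(mu + h*E[i] - h*E[j], x, sigma_inv, kappa)
                    - kernel(mu - h*E[i] + h*E[j], x, sigma_inv, kappa)
                    + kernel(mu - h*E[i] - h*E[j], x, sigma_inv, kappa)) / (4 * h * h)
                   for j in range(2)] for i in range(2)])
print(np.max(np.abs(H - H_num)))  # small
```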

Factoring out $\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-2}$ from both terms, we're left with:

$$ H\varpropto \left(2-\frac{1}{\kappa}\right)\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1} - \left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\Sigma^{-1} $$

Recognizing that $\Sigma^{-1}$ is positive definite by construction, it is easy to see that the Hessian is negative semi-definite for at least all $\kappa \leq 0.5$, since then $2-\frac{1}{\kappa} \leq 0$ and both terms are negative semi-definite. It is for $\kappa > 0.5$ that I am unable to prove that the Hessian must be negative (semi-)definite. In fact, in the univariate case $n=1$, it would seem that the Hessian cannot be negative definite for any $\kappa > 1$. Assuming $n=1$:

$$ (x-\mu)(x-\mu)^T\Sigma^{-1} = (x-\mu)^T\Sigma^{-1}(x-\mu) $$

As a result,

$$ H\varpropto \left(2-\frac{1}{\kappa}\right)\Sigma^{-1} - \Sigma^{-1} $$

which must be positive definite when $(2-\frac{1}{\kappa}) > 1$, i.e., when $\kappa > 1$. This does not seem to agree with numerical calculations of $\ln(L(x))$ across a range of $\mu$ values when $\kappa > 1$, which show a function that is clearly concave w.r.t. $\mu$. I'd also note that setting $\kappa = 0.5$ is equivalent to having a multivariate Gaussian and, in that case, the $H$ I derived for the GED evaluates to $-\Sigma^{-1}$, which is the expected result for a Gaussian.
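One way to reconcile this with the plots: in one dimension with $\kappa>1$, the kernel is proportional to $-|x-\mu|^{1/\kappa}$, which is convex on either side of $\mu=x$ yet still attains its maximum there via a cusp, and this can look concave at plotting resolution. A quick check (names mine):

```python
import math

def f(mu, x=0.0, kappa=2.0):
    # 1-D kernel: -(C * (x - mu)^2)^(1/(2k)) = -C^(1/(2k)) * |x - mu|^(1/k)
    c = math.exp(math.lgamma(3 * kappa) - math.lgamma(kappa))
    return -(c * (x - mu) ** 2) ** (1.0 / (2.0 * kappa))

h = 1e-3
# Second central difference away from the peak: positive => locally convex there.
d2 = (f(1.0 + h) - 2 * f(1.0) + f(1.0 - h)) / h ** 2
print(d2 > 0)           # convex for mu != x when kappa > 1 ...
print(f(0.0) > f(1.0))  # ... yet the kernel still peaks at mu = x (a cusp)
```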

Did I make a mistake in my derivation of the Hessian matrix? I am a self-taught novice when it comes to matrix calculus, so it wouldn't be a shock if I did. Can it be proven that the Hessian is negative (semi-)definite for all $\kappa$? If not, what are the implications for parameter estimation via numerical optimization of $\ln(L(x))$? Any help here would be greatly appreciated.

There are 2 solutions below.

Solution 1:

Start with the gradient, written in simplified terms as the gradient of $-\phi$ with $\phi=a^\beta$ and the scalar $a=(\mathbf{m-x})^T \mathbf{\Lambda} (\mathbf{m-x}) > 0$ (here $\beta$ plays the role of $\frac{1}{2\kappa}$ in the question's notation, with the leading Gamma-function constant dropped): $$ \mathbf{g} = -2\beta a^{\beta-1} \mathbf{\Lambda} (\mathbf{m-x}) $$

Now using differentials, \begin{eqnarray} d\mathbf{g} &=& -2\beta a^{\beta-1} \mathbf{\Lambda} (d\mathbf{m}) - 4\beta (\beta-1) a^{\beta-2} \mathbf{\Lambda} (\mathbf{m-x}) (\mathbf{m-x})^T \mathbf{\Lambda} (d\mathbf{m}) \end{eqnarray} Thus \begin{equation} \mathbf{H} = 2 \beta a^{\beta-2} \left[ 2(1-\beta) \mathbf{u} \mathbf{u}^T - a \mathbf{\Lambda} \right] \end{equation} with $\mathbf{u} = \mathbf{\Lambda} (\mathbf{m-x})$. This is a rank-one update of the (scaled) precision matrix. When $\beta \geq 1$, i.e., $\kappa \leq \frac{1}{2}$, both bracketed terms are negative semi-definite, so the matrix is clearly negative (semi-)definite.
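A quick numerical check of this closed form for a $\beta > 1$ case (a sketch; names are mine, `numpy` assumed):

```python
import numpy as np

def hess_closed_form(m, x, lam, beta):
    # H = 2*beta*a^(beta-2) * [ 2*(1-beta)*u u^T - a*Lam ],  u = Lam (m - x)
    d = m - x
    a = float(d @ lam @ d)
    u = lam @ d
    return 2 * beta * a ** (beta - 2) * (2 * (1 - beta) * np.outer(u, u) - a * lam)

m = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])
lam = np.array([[1.5, 0.2], [0.2, 1.0]])  # a positive-definite precision matrix

H = hess_closed_form(m, x, lam, beta=1.5)  # beta > 1 corresponds to kappa < 1/2
print(np.linalg.eigvalsh(H))               # both eigenvalues negative
```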

Solution 2:

Picking up from:

$$ H \propto \left(2-\frac{1}{\kappa}\right)\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-2}\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1} - \Sigma^{-1}\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]^{\frac{1}{2\kappa}-1} $$

we can define $\Sigma^{-1} = L^TL$ via a Cholesky-type factorization and define $c = L(x-\mu)$:

$$ \begin{eqnarray*} H &\propto& \left[(x-\mu)^TL^TL(x-\mu)\right]^{\frac{1}{2\kappa}-1} \left(\frac{\left(2-\frac{1}{\kappa}\right)}{(x-\mu)^TL^TL(x-\mu)}L^TL(x-\mu)(x-\mu)^TL^TL - \Sigma^{-1}\right)\\ H &\propto& (c^Tc)^{\frac{1}{2\kappa}-1}\left(\frac{\left(2-\frac{1}{\kappa}\right)}{c^Tc}L^Tcc^TL - \Sigma^{-1}\right)\\ H &\propto& \langle c,c\rangle^{\frac{1}{2\kappa}-1} \left(\left(2-\frac{1}{\kappa}\right) L^T \frac{cc^T}{\langle c,c\rangle} L - \Sigma^{-1}\right) \end{eqnarray*} $$

$\frac{cc^T}{\langle c,c\rangle} = \frac{cc^T}{\lVert c\rVert^2}$ is the orthogonal projection matrix $P$ onto $\operatorname{span}(c)$: it satisfies $P^2 = P$ and, for any vector $w$, $0 \leq w^TPw \leq w^Tw$, with equality on the right exactly when $w \varpropto c$. Examining the quadratic form with $w = Lv$:

$$ \begin{eqnarray*} v^THv &\varpropto& \left(2-\frac{1}{\kappa}\right)w^TPw - w^Tw\\ &=& \left(1-\frac{1}{\kappa}\right)\lVert w\rVert^2 \text{ when } w \varpropto c \text{, i.e., } v \varpropto x-\mu\\ &=& -\lVert w\rVert^2 \text{ when } w \perp c \end{eqnarray*} $$

Since $w^TPw \leq \lVert w\rVert^2$, the form is bounded above by $\left(1-\frac{1}{\kappa}\right)\lVert w\rVert^2$ whenever $2-\frac{1}{\kappa} \geq 0$, and both terms are non-positive when $2-\frac{1}{\kappa} < 0$. The Hessian w.r.t. $\mu$ is therefore negative semi-definite for all $0 < \kappa \leq 1$. For $\kappa > 1$ the form is positive along $v \varpropto x-\mu$ but negative along directions with $Lv \perp c$, so the Hessian is positive definite in the univariate case and indefinite for $n \geq 2$; in either case, $\ln(L(x))$ is not concave in $\mu$ when $\kappa > 1$.
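As a final numerical sanity check (helper names mine; `numpy` assumed), evaluate the quadratic form of the proportional Hessian along $v = x-\mu$ for several $\kappa$; its sign flips at $\kappa = 1$:

```python
import numpy as np

def hess_prop(mu, x, sigma_inv, kappa):
    # H up to a positive factor: (2 - 1/k) z^(b-2) S d d^T S - z^(b-1) S,  b = 1/(2k)
    b = 1.0 / (2.0 * kappa)
    d = x - mu
    z = float(d @ sigma_inv @ d)
    u = sigma_inv @ d
    return (2.0 - 1.0 / kappa) * z ** (b - 2.0) * np.outer(u, u) - z ** (b - 1.0) * sigma_inv

x = np.array([1.0, 2.0])
mu = np.array([0.0, 0.5])
sigma_inv = np.array([[2.0, 0.3], [0.3, 1.0]])

qs = {}
for kappa in (0.5, 0.75, 1.0, 1.5, 2.0):
    v = x - mu  # the direction singled out by the projection argument
    qs[kappa] = float(v @ hess_prop(mu, x, sigma_inv, kappa) @ v)
    print(kappa, qs[kappa])  # negative for kappa < 1, ~0 at kappa = 1, positive for kappa > 1
```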