Maximization of a nasty Gaussian likelihood


I have a Gaussian likelihood function, $$p(y|x) = \mathcal{N}(y; Ax, (x^\top V x + \lambda) \otimes I)$$ where $A$, $V$, and $\lambda$ are known, and $\otimes$ is the Kronecker product (the notation indicates that the covariance is a scalar multiple of the identity matrix, the scalar being $x^\top V x + \lambda$). Note that $A$ is a rectangular matrix, say $m\times n$ with $m>n$. I would like to maximise this with respect to $x$, i.e., solve the following problem: $$x^* = \arg \max_x p(y|x) = \arg \max_x \mathcal{N}(y;Ax,(x^\top V x + \lambda) \otimes I)$$ I tried taking the derivative of the log-likelihood and setting it to zero, but I was unable to isolate $x$ and obtain an exact solution.

I wonder whether there is an exact solution and, if not, what the best numerical scheme for this problem is.

Any help is greatly appreciated. Thanks!

PS: The pseudoinverse is not the solution, according to numerical examples! Another empirical observation, from 2D simulations: as $\lambda \to \infty$ (i.e., for very large values), the pseudoinverse solution becomes more and more accurate, which hints a bit at the structure of the solution.

PS2: As $\lambda \to \infty$, the solution is the following: $$x^* = (d (A^\top A)^{-1} V + I)^{-1} (A^\top A)^{-1} A^\top y$$ where $d = \dim(y)$. I don't know how this is useful, though...
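Absent a closed form, one workable numerical scheme is to minimize the negative log-likelihood directly, warm-starting from the pseudoinverse solution. A minimal sketch, assuming a small randomly generated instance (all names, sizes, and values below are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))          # rectangular A, m > n
Vh = rng.standard_normal((n, n))
V = Vh @ Vh.T                            # symmetric positive semidefinite V
lam = 0.5
x_true = rng.standard_normal(n)
s2 = x_true @ V @ x_true + lam
y = A @ x_true + np.sqrt(s2) * rng.standard_normal(m)

def neg_log_lik(x):
    """Negative log of N(y; Ax, (x^T V x + lambda) I), up to a constant."""
    s2 = x @ V @ x + lam
    r = y - A @ x
    return 0.5 * m * np.log(s2) + 0.5 * (r @ r) / s2

x_pinv = np.linalg.pinv(A) @ y           # least-squares / pseudoinverse guess
res = minimize(neg_log_lik, x_pinv, method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-10})
x_star = res.x
print(neg_log_lik(x_star) <= neg_log_lik(x_pinv) + 1e-9)  # True
```

Nelder-Mead never returns a point worse than its starting vertex, so $x^\star$ is at least as likely as the pseudoinverse solution, consistent with the observation above that the pseudoinverse alone is not the maximizer.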

Answer:

We want to maximize the log-likelihood function $$L(x) = \ln{p(y|x)} = -\frac{n}{2}\ln(x^TVx + \lambda) - \frac{1}{2}\frac{||y - Ax||^2}{x^TVx + \lambda} - \frac{n}{2}\ln{2\pi}$$ where, throughout this answer, $n$ denotes the dimension of $y$.
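As a quick sanity check, this expression can be compared against `scipy.stats.multivariate_normal.logpdf` on random data (dimensions and values below are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n = 4                                    # n = dim(y) here
A = rng.standard_normal((n, n))
Vh = rng.standard_normal((n, n))
V = Vh @ Vh.T
lam = 0.3
x = rng.standard_normal(n)
y = rng.standard_normal(n)

s2 = x @ V @ x + lam                     # the scalar variance x^T V x + lambda
L = (-0.5 * n * np.log(s2)
     - 0.5 * np.sum((y - A @ x) ** 2) / s2
     - 0.5 * n * np.log(2 * np.pi))
L_ref = multivariate_normal(mean=A @ x, cov=s2 * np.eye(n)).logpdf(y)
print(np.isclose(L, L_ref))  # True
```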

Assume for simplicity that $A$ is square and invertible, and define the new variable $z = y - Ax$. In this parametrization $x = A^{-1}(y - z)$ and $$L(z) = -\frac{n}{2}\ln\left((A^{-1}(y - z))^TV(A^{-1}(y - z)) + \lambda\right) - \frac{1}{2}\frac{||z||^2}{(A^{-1}(y - z))^TV(A^{-1}(y - z)) + \lambda} - \frac{n}{2}\ln{2\pi}$$ For brevity, define $M(y, z) = (A^{-1}(y - z))^TV(A^{-1}(y - z)) + \lambda$. Dropping the constant, we have $$L(z) = -\frac{n}{2}\ln(M(y,z)) - \frac{1}{2}\frac{||z||^2}{M(y, z)}$$

For later use: $$\frac{d}{dz_i} M(y, z) = 2\delta_i(A^{-1})^TVA^{-1}(z - y) = 2\delta_iN(z - y)$$ where I have set $N = (A^{-1})^TVA^{-1}$ (for future brevity), $\delta_i$ is the $i$th standard basis row vector, and $V$ is assumed symmetric.
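This derivative (which relies on $V$ being symmetric) can be checked by central finite differences on random data; everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Vh = rng.standard_normal((n, n))
V = Vh @ Vh.T                            # symmetric V, as the formula requires
lam = 0.3
y = rng.standard_normal(n)
z = rng.standard_normal(n)
Ainv = np.linalg.inv(A)
N = Ainv.T @ V @ Ainv                    # N = (A^{-1})^T V A^{-1}

def M(z):
    u = Ainv @ (y - z)
    return u @ V @ u + lam

grad = 2 * N @ (z - y)                   # claimed gradient: entry i is 2 delta_i N (z - y)
eps = 1e-6
fd = np.array([(M(z + eps * np.eye(n)[i]) - M(z - eps * np.eye(n)[i])) / (2 * eps)
               for i in range(n)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```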

Then \begin{eqnarray*} \frac{d}{dz_i} L(z) &=& \frac{d}{dz_i} \left(-\frac{n}{2}\ln(M(y,z)) - \frac{1}{2}\frac{||z||^2}{M(y, z)}\right) \\ &=& -\frac{n}{2}\frac{\frac{d}{dz_i}M}{M} - \frac{1}{2}\left(\frac{2z_iM - ||z||^2\frac{d}{dz_i}M}{M^2}\right) \\ &=& \frac{-1}{M}\left(\frac{n}{2}\cdot 2\delta_iN(z - y) + \frac{1}{2}\left(\frac{2z_iM - ||z||^2\cdot 2\delta_iN(z - y)}{M}\right)\right) \\ &=& \frac{-1}{M}\left(n\,\delta_iN(z - y) + \frac{z_iM - ||z||^2\,\delta_iN(z - y)}{M}\right) \end{eqnarray*}

Setting this to zero and multiplying by $-M^2$ (note $M > 0$), we find that $$nM\delta_iN(z - y) + z_iM - ||z||^2 \delta_iN(z - y) = 0$$ Rearranging terms we obtain $$z_iM = (||z||^2 - nM)\delta_iN(z - y)$$ or, stacking the components, the vector equation $$Mz = (||z||^2 - nM)N(z - y)$$ This has no obvious closed-form solution, but it can be solved numerically for $z$, after which $x^* = A^{-1}(y - z^*)$.
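The resulting condition can be exploited numerically: minimize the negative log-likelihood in $z$ using the gradient derived above, check that the stationarity condition $Mz = (||z||^2 - nM)N(z - y)$ holds at the solution, and recover $x^* = A^{-1}(y - z^*)$. A sketch on a small random instance (all values illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))          # square invertible A, as assumed above
Vh = rng.standard_normal((n, n))
V = Vh @ Vh.T
lam = 1.0
y = rng.standard_normal(n)
Ainv = np.linalg.inv(A)
N = Ainv.T @ V @ Ainv

def M(z):
    u = Ainv @ (y - z)
    return u @ V @ u + lam

def neg_L(z):                            # -L(z) with the constant dropped
    return 0.5 * n * np.log(M(z)) + 0.5 * (z @ z) / M(z)

def neg_L_grad(z):                       # from the derivative computed above
    Mz, g = M(z), N @ (z - y)
    return (n * Mz * g + Mz * z - (z @ z) * g) / Mz ** 2

res = minimize(neg_L, np.zeros(n), jac=neg_L_grad, method="BFGS",
               options={"gtol": 1e-8})
z = res.x
lhs = M(z) * z                           # stationarity: M z = (||z||^2 - nM) N (z - y)
rhs = (z @ z - n * M(z)) * (N @ (z - y))
print(np.allclose(lhs, rhs, atol=1e-3))
x_star = Ainv @ (y - z)                  # recover the maximizer of p(y|x)
```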