Understanding derivation of ADMM update rule for graphical lasso optimization by solving quadratic matrix equation


I'm trying to understand the derivation of an ADMM update rule in some convex optimization lecture notes by Emmanuel Candes [1]. In the course of the solution (on pages 25-4 and 25-5), we need to solve the following optimization problem (which comes from the augmented Lagrangian): $$ X_k = \text{arg min}_{X} \mathcal{L}_{\mu}(X, Y_{k-1}, Z_{k-1}) = \text{arg min}_{X \succeq 0} \Big \{ -\log\det X + \frac{\mu}{2} \Big \| X + \Big ( Z_{k-1} - Y_{k-1} + \frac{1}{\mu}C \Big ) \Big \|^2_F \Big \}, $$ where $C$ is a fixed symmetric positive semidefinite covariance matrix, $Z_{k-1}$ is a Lagrange multiplier variable, $Y_{k-1}$ is the ADMM "dummy variable" for $X$, and $\mu$ is the penalty parameter (see page 25-4 of the lecture notes for more details). Note that we want $X \succeq 0$. First, let $$ \tilde{C}_{k-1} = Y_{k-1} - Z_{k-1} - \frac{1}{\mu} C. $$ We want to set the gradient of the Lagrangian to zero to solve for $X_k$, which means solving $$ \nabla_X \mathcal{L}_{\mu} = -X^{-1} + \mu X - \mu \tilde{C}_{k-1} = 0, $$ which is equivalent to solving $$ 0 = X^2 - \tilde{C}_{k-1}X - \frac{1}{\mu} I. $$
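For completeness, the equivalence of the last two displays is just right-multiplication by $X$ followed by division by $\mu$ (valid since we seek an invertible $X \succ 0$):

```latex
% Right-multiply the stationarity condition by X (valid since X \succ 0),
% then divide by \mu:
-X^{-1} + \mu X - \mu \tilde{C}_{k-1} = 0
\quad\Longrightarrow\quad
-I + \mu X^2 - \mu \tilde{C}_{k-1} X = 0
\quad\Longrightarrow\quad
X^2 - \tilde{C}_{k-1} X - \frac{1}{\mu} I = 0.
```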

Candes says at the top of page 25-4 that if the eigendecomposition of $C_{k-1}$ is $U \Lambda U^{-1}$ with $\Lambda = \text{diag}(\{\lambda_i\})$ [2], then the solution is $X^* = \mathcal{F}_{\mu}(\tilde{C}_{k-1}) = U \mathcal{F}_{\mu}(\Lambda) U^{-1}$, where $$ \mathcal{F}_{\mu}(\Lambda) = \frac{1}{2}\text{diag} \Big ( \Big \{ \lambda_i + \sqrt{\lambda_i^2 + 4/\mu} \Big \} \Big ). $$ From this he concludes that the update rule is $$ X_k = \mathcal{F}_{\mu}(\tilde{C}_{k-1}) = \mathcal{F}_{\mu}(Z_{k-1} - Y_{k-1} + \frac{1}{\mu}C). $$
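As a concrete sanity check, here is a minimal NumPy sketch of the spectral map $\mathcal{F}_{\mu}$ (the function name is my own); it assumes the input matrix is symmetric so that `eigh` applies, and verifies numerically that the output satisfies the stationarity condition $-X^{-1} + \mu X - \mu \tilde{C} = 0$ and is positive definite:

```python
import numpy as np

def F_mu(A, mu):
    """Spectral map: eigendecompose symmetric A = U diag(lam) U^T and map
    each eigenvalue lam_i to (lam_i + sqrt(lam_i^2 + 4/mu)) / 2."""
    lam, U = np.linalg.eigh(A)                    # A symmetric => eigh applies
    d = 0.5 * (lam + np.sqrt(lam**2 + 4.0 / mu))  # always positive
    return U @ np.diag(d) @ U.T

# Quick check on a symmetric matrix with one negative eigenvalue:
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
mu = 0.5
X = F_mu(A, mu)
print(np.allclose(-np.linalg.inv(X) + mu * X - mu * A, 0))  # stationarity
print(np.all(np.linalg.eigvalsh(X) > 0))                    # X positive definite
```

Both checks should print `True`, which matches the answer below: the map produces a positive definite $X$ even when the input has negative eigenvalues.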

I can roughly guess where the $\sqrt{\lambda_i^2 + 4/\mu}$ term comes from: a quadratic matrix equation of the type $AX^2+BX+K=0$ is (under some conditions) solved by $X = (-B \pm Q)/2$, where $Q = PD^{1/2}P^{-1}$ and $PDP^{-1}$ is the eigendecomposition of the discriminant $B^2-4AK$. And here, our $B^2-4AK$ is $\tilde{C}_{k-1}^2+(4/\mu)I$. But the equation for the eigenvalues seems to assume that our $B^2-4AK$ is $C^2+(4/\mu)I$, as I believe $C^2+(4/\mu)I$ has eigenvalues $\lambda_i^2 + 4/\mu$. And where does the first $\lambda_i$ in the sum (to the left of the square root) come from? At this point I'm fairly confused and hoping for some help.

Apologies if the solution is immediately obvious, I haven't taken an advanced course in linear algebra or optimization yet.

[1] https://statweb.stanford.edu/~candes/math301/Lectures/ADMM.pdf

[2] Note that there seems to be a typo in the notes here: where it is stated that "$\tilde{C}_{k-1} = U \Lambda U^{-1}$ is the eigendecomposition of $C_{k-1}$", I assume the scribe meant to write that $C_{k-1} = U \Lambda U^{-1}$ is the eigendecomposition of $C_{k-1}$. At least I hope! Perhaps this is the source of my confusion.

Best Answer

This is late, but I came across this question while reading the same notes; hopefully this will help someone.

You are correct that there is a typo in the notes, but I think the typo is that "... is the eigenvalue decomposition of $C_{k-1}$ with $\lambda_i \geq 0$" should read "is the eigenvalue decomposition of $\tilde C_{k-1}$", with no claim on the sign of $\lambda_i$. Below, I provide justification.

We are solving $-X^{-1} + \mu X - \mu \tilde C_{k-1} = 0$, with $\tilde C_{k-1} = -Z_{k-1}+Y_{k-1}-\frac{1}{\mu} C$, and we want to find a positive definite solution for $X$. In the notes, a positive semidefinite solution is sought, but the procedure below will in fact give a positive definite solution.

We begin by writing $\tilde C_{k-1} = U \Lambda U^T$ as suggested in the notes. Assume for the moment that there exists a solution of the form $X = UDU^T$, where $D$ is a diagonal matrix with positive diagonal elements. In this case, the above equation becomes $U(-D^{-1}+\mu D-\mu \Lambda)U^T = 0$, which reduces to $-D^{-1}+\mu D-\mu \Lambda=0$. Now, $D$ and $\Lambda$ are diagonal matrices, so this equation further reduces to $-d_i^{-1}+\mu d_i - \mu \lambda_i=0$ for all $i$, where $D = {\rm diag}(\{d_i\})$ and $\Lambda = {\rm diag}(\{ \lambda_i\})$. Multiplying through by $d_i/\mu$, we can equivalently solve $d_i^2-\lambda_i d_i - 1/\mu = 0$, which gives $d_i = \frac{1}{2}\big(\lambda_i + \sqrt{\lambda_i^2+4/\mu}\big)$ since we want $d_i>0$. Note that $d_i$ is positive even if $\lambda_i$ is negative, so we don't need $\tilde C_{k-1}$ to have nonnegative eigenvalues.
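The scalar step is easy to check directly; a small sketch (the values of $\mu$ and $\lambda_i$ below are arbitrary) confirming that the positive root solves $-d_i^{-1} + \mu d_i - \mu \lambda_i = 0$ even for negative $\lambda_i$:

```python
import math

mu = 2.0
for lam in (-3.0, 0.0, 1.5):  # negative eigenvalues are allowed
    d = 0.5 * (lam + math.sqrt(lam**2 + 4.0 / mu))
    assert d > 0                                     # positive root
    assert abs(-1.0 / d + mu * d - mu * lam) < 1e-9  # scalar stationarity
```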

Now, working backwards, we see that $X = UDU^T$, with $D = {\rm diag}(\{d_i\})$ as described above, will give us a positive definite solution of $-X^{-1}+\mu X-\mu \tilde C_{k-1}=0$. This is the ADMM update for $X_k$.
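Putting the pieces together, here is a short NumPy sketch of the full $X$-update under this answer's convention $\tilde C_{k-1} = -Z_{k-1}+Y_{k-1}-\frac{1}{\mu}C$. The iterates below are random symmetric placeholders rather than real ADMM iterates; the point is only to verify the stationarity condition and positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 4, 1.5

# Hypothetical iterates: C symmetric PSD, Y and Z symmetric (placeholders).
M = rng.standard_normal((n, n))
C = M @ M.T
Y = (M + M.T) / 2
Z = rng.standard_normal((n, n))
Z = (Z + Z.T) / 2

# X-update with tilde_C = -Z + Y - C/mu, via the eigendecomposition of tilde_C.
tilde_C = -Z + Y - C / mu
lam, U = np.linalg.eigh(tilde_C)
D = np.diag(0.5 * (lam + np.sqrt(lam**2 + 4.0 / mu)))
X = U @ D @ U.T

# X is positive definite and satisfies -X^{-1} + mu*X - mu*tilde_C = 0.
assert np.all(np.linalg.eigvalsh(X) > 0)
assert np.allclose(-np.linalg.inv(X) + mu * X - mu * tilde_C, 0)
```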