What is the formula for projection onto spectraplex?

A spectraplex (special case of spectrahedron) is the set of all positive semi-definite matrices whose trace is equal to one. Formally, let

$$ S = \left\{\textbf{W} \in \mathbb{R}^{d \times d} \mid \textbf{W} \succeq 0, \text{Tr}(\textbf{W})=1 \right\} $$

On page 7, lemma 2 in Dan Garber's On the regret minimization of nonconvex online gradient ascent for online PCA, it is stated that "it is well known that the projection of $\textbf{W}$ onto $S$ is given by" $$ \Pi_S[\textbf{W}]=(\lambda_1(\textbf{W})-\lambda)w'w'^T+ \sum_{i=2}^{d}\max\{0,\lambda_i(\textbf{W})-\lambda \}y_iy_i^T $$

where $(w',y_2,y_3,\cdots,y_d)$ are eigenvectors of $\textbf{W}$ corresponding to descending eigenvalues such that

$$ \lambda_1(\textbf{W})-\lambda+ \sum_{i=2}^{d}\max\{0,\lambda_i(\textbf{W})-\lambda \}=1 $$ for some $\lambda \geq 0$. Where does the above come from? Is there any reference that explains such projections?
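
As a small sanity check of the stated condition (assuming I am reading the formula correctly), take $d=2$ and $\textbf{W}=\mathrm{diag}(2,\tfrac12)$, so $\lambda_1=2$ and $\lambda_2=\tfrac12$. If $\lambda$ is large enough that the second term vanishes, then $$ \lambda_1(\textbf{W})-\lambda=1 \;\Rightarrow\; \lambda=1, \qquad \max\{0,\lambda_2(\textbf{W})-\lambda\}=\max\{0,-\tfrac12\}=0, $$ which is consistent, so the formula gives $\Pi_S[\textbf{W}]=\mathrm{diag}(1,0)$.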

There is 1 answer below (best answer):

First, note that no such $\lambda \geq 0$ exists if

$\sum_{i=1}^{d} \max(0,\lambda_{i}) < 1$,

and in particular whenever $W \succeq 0$ and $\sum_{i=1}^{d} \lambda_{i} < 1$.

For example, the method fails if $W=0$ or if $W=(1/(2d))I$. In the context of the paper you cited, it appears we can be sure that the sum of the $\lambda_{i}$ is large enough.

I believe that if you allow for negative values of $\lambda$ the method will actually produce correct results for matrices $W$ that have $\sum_{i=1}^{d} \lambda_{i}<1$.
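
For instance (a quick check of that claim, not taken from the paper): with $W=(1/(2d))I$, every $\lambda_{i}=1/(2d)$, and the condition becomes $$ \sum_{i=1}^{d}\max\{0,\lambda_{i}-\lambda\}=d\left(\tfrac{1}{2d}-\lambda\right)=1, $$ which forces $\lambda=-\tfrac{1}{2d}<0$. Each coefficient is then $\lambda_{i}-\lambda=\tfrac{1}{d}$, so the formula returns $\tfrac{1}{d}I$, which is indeed the projection of $(1/(2d))I$ onto $S$.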

In the following, we'll assume that

$\sum_{i=1}^{d} \lambda_{i} \geq 1$.

Since $W$ is symmetric, we can write it in terms of eigenvalues and normalized eigenvectors as

$W=\sum_{i=1}^{d} \lambda_{i} y^{(i)}{y^{(i)}}^{T}$

or

$W=\sum_{i=1}^{d} \lambda_{i} Y^{(i)}$

where

$Y^{(i)}=y^{(i)}{y^{(i)}}^{T}$.

We need to be specific about the matrix inner product and norm that we're projecting with respect to. Since $W$ lives in the space of symmetric matrices, the appropriate inner product is

$\langle A, B \rangle = \mbox{tr}(AB)$

and the norm is

$\| A \|_{F}=\sqrt{\langle A, A \rangle}$.

Note that with respect to this inner product, the $Y^{(i)}$ matrices are orthogonal, because the eigenvectors $y^{(i)}$ are orthogonal. For $i \neq j$,

$\langle Y^{(i)}, Y^{(j)} \rangle=\mbox{tr}(y^{(i)}{y^{(i)}}^{T}y^{(j)}{y^{(j)}}^{T})$

$\langle Y^{(i)}, Y^{(j)} \rangle=\mbox{tr}(y^{(i)}({y^{(i)}}^{T}y^{(j)}){y^{(j)}}^{T})=\mbox{tr}(y^{(i)}\,0\,{y^{(j)}}^{T})=0$.

Also,

$\| Y^{(i)} \|_{F}^{2}=\mbox{tr}(Y^{(i)}Y^{(i)})= \mbox{tr} (y^{(i)}{y^{(i)}}^{T} y^{(i)}{y^{(i)}}^{T}) =\mbox{tr}(y^{(i)}{y^{(i)}}^{T}) =\mbox{tr}({y^{(i)}}^{T}y^{(i)}) =1$.
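
A minimal numerical sketch of these two facts (an illustration only, using an arbitrary symmetric test matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
W = (A + A.T) / 2                       # arbitrary symmetric test matrix
d = W.shape[0]

lam, V = np.linalg.eigh(W)              # eigenvalues and orthonormal eigenvectors
Y = [np.outer(V[:, i], V[:, i]) for i in range(d)]   # Y^(i) = y^(i) y^(i)^T

# W is recovered from its spectral decomposition
assert np.allclose(W, sum(l * Yi for l, Yi in zip(lam, Y)))

# <Y^(i), Y^(j)> = tr(Y^(i) Y^(j)) is 1 if i == j and 0 otherwise
G = np.array([[np.trace(Yi @ Yj) for Yj in Y] for Yi in Y])
assert np.allclose(G, np.eye(d))
print("decomposition and orthonormality verified")
```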

The desired projection is

$\Pi_{S}(W)=\arg\min_{Y} \| W-Y \|_{F}^{2}$

subject to the constraints

$\mbox{tr}(Y)=1$

$Y \succeq 0$.

Write $Y$ as a linear combination of the $Y^{(i)}$ matrices plus an additional component $Y^{(0)}$ that lies in the orthogonal complement of the span of the $Y^{(i)}$.

$Y=Y^{(0)}+\sum_{i=1}^{d} \alpha_{i} Y^{(i)}$

Since $W=\sum_{i=1}^{d} \lambda_{i} Y^{(i)}$,

$W-Y=\sum_{i=1}^{d} (\lambda_{i}-\alpha_{i}) Y^{(i)} - Y^{(0)}$.

The two pieces on the right are orthogonal, because $Y^{(0)}$ is orthogonal to every $Y^{(i)}$. By the Pythagorean theorem,

$\| W-Y \|_{F}^{2}=\left\| \sum_{i=1}^{d} (\lambda_{i}-\alpha_{i}) Y^{(i)} \right\|_{F}^{2}+\left\| Y^{(0)} \right\|_{F}^{2}$.

Using the orthogonality and unit norm of the $Y^{(i)}$ matrices, we get that

$\| W-Y \|_{F}^{2}=\sum_{i=1}^{d} (\lambda_{i}-\alpha_{i})^{2}+\left\| Y^{(0)} \right\|_{F}^{2}$.

Next, note that every diagonal entry of $Y^{(0)}$ in the eigenvector basis vanishes, since ${y^{(i)}}^{T}Y^{(0)}y^{(i)}=\langle Y^{(0)}, Y^{(i)} \rangle=0$. Hence $\mbox{tr}(Y^{(0)})=0$ and the trace constraint becomes

$\mbox{tr}(Y)=\sum_{i=1}^{d} \alpha_{i}=1$.

Moreover, if $Y \succeq 0$ then $\alpha_{i}={y^{(i)}}^{T}Y y^{(i)} \geq 0$, so replacing $Y$ by $\sum_{i=1}^{d}\alpha_{i}Y^{(i)}$ keeps it feasible while removing the nonnegative term $\left\| Y^{(0)} \right\|_{F}^{2}$ from the objective. Thus it's optimal to set $Y^{(0)}=0$.

We've now reduced the problem to

$\min \sum_{i=1}^{d} (\lambda_{i}-\alpha_{i})^{2}$

subject to

$\sum_{i=1}^{d} \alpha_{i}=1$

$\alpha \geq 0$.
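
This is exactly the Euclidean projection of the eigenvalue vector $(\lambda_{1},\ldots,\lambda_{d})$ onto the probability simplex. As an aside, a standard way to compute it is the sort-and-threshold procedure sketched below (my own illustration, not code from the paper; the helper name project_simplex is made up):

```python
import numpy as np

def project_simplex(lam):
    """Solve min_alpha ||lam - alpha||^2  s.t.  alpha >= 0, sum(alpha) = 1."""
    lam = np.asarray(lam, dtype=float)
    u = np.sort(lam)[::-1]                               # sort descending
    css = np.cumsum(u)
    # largest k (0-based) with u_k - (css_k - 1)/(k+1) > 0 fixes the active set
    k = np.nonzero(u - (css - 1.0) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (css[k] - 1.0) / (k + 1)                     # theta plays the role of mu/2
    return np.maximum(lam - theta, 0.0)

print(project_simplex([2.0, 0.5]))   # -> [1. 0.]
```

The threshold theta found here is the $\mu/2$ (equivalently, the $\lambda$ in the paper's formula) appearing in the KKT argument below, and the procedure works whether or not $\sum_{i}\lambda_{i}\geq 1$; theta simply comes out negative in that case.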

It's easy to see that the equation given in the paper results from applying the KKT conditions to this problem.

If we let $\mu$ be the Lagrange multiplier for the constraint $\sum_{i=1}^{d}\alpha_{i}=1$ and $\nu_{i} \geq 0$ be the multiplier for the constraint $\alpha_{i} \geq 0$, then the $i$th component of the KKT conditions depends on whether $\alpha_{i}>0$, in which case $\nu_{i}=0$ by complementary slackness, or $\alpha_{i}=0$, in which case $\nu_{i}$ is free to be nonnegative.
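
Concretely, with the Lagrangian (sign convention chosen to match the equations below)

$$ L(\alpha,\mu,\nu)=\sum_{i=1}^{d}(\lambda_{i}-\alpha_{i})^{2}+\mu\left(\sum_{i=1}^{d}\alpha_{i}-1\right)-\sum_{i=1}^{d}\nu_{i}\alpha_{i}, \qquad \nu_{i}\geq 0, $$

stationarity reads $-2(\lambda_{i}-\alpha_{i})+\mu-\nu_{i}=0$ and complementary slackness reads $\nu_{i}\alpha_{i}=0$.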

In the case $\alpha_{i}>0$ and $\nu_{i}=0$, we have

$-2(\lambda_{i}-\alpha_{i})+\mu=0$

or

$(\lambda_{i}-\alpha_{i})=\mu/2$.

In the case where $\alpha_{i}=0$, the equation is satisfied by taking $\nu_{i}=\mu-2\lambda_{i}$, which is nonnegative exactly when $\lambda_{i} \leq \mu/2$. Next, we use the constraint

$\sum_{\alpha_{i}>0} \alpha_{i}=1$

to get

$\sum_{\alpha_{i}>0} (\lambda_{i}-\mu/2)=1$.

Adding in the terms for $\alpha_{i}=0$, we get

$\sum_{i=1}^{d} \max(0,\lambda_{i}-\mu/2)=1$.

Once $\mu$ has been determined, let

$\alpha_{i}=\max(0,\lambda_{i}-\mu/2)$ for $i=1, 2, \ldots, d$.
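
Putting the pieces together, here is a sketch of the full spectraplex projection (again my own illustration, not code from the paper): it finds $\mu/2$ by bisection on the monotone equation above and then reassembles the matrix from the eigenvectors.

```python
import numpy as np

def project_spectraplex(W, tol=1e-12):
    """Project a symmetric matrix W onto {Y : Y PSD, tr(Y) = 1} in Frobenius norm."""
    lam, V = np.linalg.eigh(W)                  # spectral decomposition of W

    # f(t) = sum_i max(0, lam_i - t) is continuous and decreasing, so we can
    # bisect for the t with f(t) = 1; t plays the role of mu/2 (the paper's lambda).
    f = lambda t: np.maximum(lam - t, 0.0).sum()
    lo, hi = lam.min() - 1.0, lam.max()         # f(lo) >= 1 and f(hi) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    alpha = np.maximum(lam - 0.5 * (lo + hi), 0.0)
    return (V * alpha) @ V.T                    # sum_i alpha_i y^(i) y^(i)^T

# sanity check: diag(2, 0.5) should project to diag(1, 0)
print(project_spectraplex(np.diag([2.0, 0.5])))
```

In exact arithmetic the bisection limit is the $\mu/2$ satisfying $\sum_{i}\max(0,\lambda_{i}-\mu/2)=1$; a sort-based routine such as project_simplex above gives the same $\alpha$ without the tolerance.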