Prove that for L all local minima are global.

39 Views Asked by At

For a vector v consider the rank-$1$ PCA loss function,

$$L(w) = ||{vv^T - ww^T}||_2^2$$

Prove that for L all local minima are global.


To prove that all local minima of the function $L(w)$ are global, we need to show that $L(w)$ has no saddle points or local maxima.

Let $w_1$ and $w_2$ be two different local minima of $L(w)$. Since $w_1$ and $w_2$ are both local minima, the gradients of $L(w)$ at these points must be zero:

$$\nabla L(w_1) = 0 \quad \text{and} \quad \nabla L(w_2) = 0.$$

Let $\Delta w = w_1 - w_2$ be the difference between these two points. Since $w_1$ and $w_2$ are both local minima, we know that $L(w_1) \leq L(w)$ and $L(w_2) \leq L(w)$ for all $w$ in some neighborhood of $w_1$ and $w_2$, respectively.

Now, consider the function $f(t) = L(w_1 - t\Delta w)$. This function gives the value of $L(w)$ along the line connecting $w_1$ and $w_2$ as $t$ varies from $0$ to $1$. Note that $f(0) = L(w_1)$ and $f(1) = L(w_2)$.

By the mean value theorem, there exists some $t^* \in (0,1)$ such that $f'(t^*) = 0$. We can calculate the derivative of $f(t)$ using the chain rule:

$$f'(t) = -2\Delta w^T(vv^T - (w_1 - t\Delta w)(w_1 - t\Delta w)^T)v.$$

Setting $t = t^*$ and using the fact that $\nabla L(w_1) = 0$, we get:

\begin{align*} L(w) &= ||vv^T - ww^T||_2^2 \\ &= ||vv^T - (w_1 + t\Delta w)(w_1 + t\Delta w)^T||_2^2 \\ &= ||vv^T - w_1w_1^T - t\Delta w(w_1^T + w_1 - t\Delta w^T)||_2^2 \\ &= ||(w_1w_1^T - vv^T) - t\Delta w(w_1^T + w_1 - t\Delta w^T)||_2^2 \\ &= ||(w_1w_1^T - vv^T)v||_2^2 - 2t(w_1w_1^T - vv^T)\Delta w(w_1^T + w_1 - t\Delta w^T)v + t^2||\Delta w(w_1^T + w_1 - t\Delta w^T)||_2^2 \\ \end{align*}

The first term does not depend on $t$, so it does not affect the location of the minimum. The third term is always nonnegative, so it also does not affect the location of the minimum. Therefore, we can focus on the middle term, which is a quadratic function of $t$.

Let $a = \Delta w(w_1^T + w_1)v$ and $b = \Delta w^T(w_1w_1^T - vv^T)\Delta w$. Then the middle term can be written as $-2ta^T\Delta w + tb$. This is a quadratic function of $t$ with leading coefficient $b > 0$, since $(w_1w_1^T - vv^T)$ is positive semidefinite and $\Delta w^T\Delta w > 0$. Therefore, this quadratic function is either always increasing or always decreasing, depending on the sign of $a$.

If $a = 0$, then the quadratic function is constant, and $L(w)$ is minimized at any point on the line connecting $w_1$ and $w_2$. If $a \neq 0$, then the quadratic function has a unique minimum at $t = \frac{a}{b}$, and $L(w)$ is minimized at $w_1 + \frac{a}{b}\Delta w$. Since $a$ is orthogonal to $\Delta w$, this point is also orthogonal to $(w_1w_1^T - vv^T)v$, and hence $L(w_1 + \frac{a}{b}\Delta w) = L(w_1)$. Therefore, any local minimum of $L(w)$ is also a global minimum.

In summary, we have shown that $L(w)$ has no saddle points or local maxima, and that any local minimum is also a global minimum. Therefore, all local minima of $L(w)$ are global.