Stationary points of $ \frac{1}{2} \|A-xx^T\|^2_F $

487 Views Asked by At

Stationary points of $ \dfrac{1}{2} \|A-xx^T\|^2_F $.

If $A$ is an $n \times n$ matrix and $x$ is an $n \times 1$ vector. How do I get the stationary points? I know that I am supposed to find the gradient and equate it to $0$, but how do I do that for a vectorized equation? I can solve this when $n = 1$, but not for a general case.

3

There are 3 best solutions below

0
On

We have $$\frac{1}{2}\|A-xx^T\|^2_F = \frac{1}{2}\text{trace}( (A-xx^T)^T(A-xx^T)) = \frac{1}{2}\|A\|^2_F - \frac{1}{2}x^T(A+A^T)x+\frac{1}{2}(\|x\|^2)^2. $$

This gives the gradient of $\frac{1}{2}\|A-xx^T\|^2_F$ as $$ -(A+A^T)x + 2\|x\|^2x. $$

And the stationary points are the $x$ which satisfy $$(A+A^T)x = 2\|x\|^2x.$$

$x = 0$ is clearly a stationary point.

Let $A+A^T = \sum_{i=1}^n \lambda_i u_iu_i^T$ be the spectral decomposition of $A+A^T$ where $u_i$'s form an orthonormal basis.

Clearly a stationary point $x \neq 0$ is an eigenvector of $A+A^T$ with eigenvector $2\|x\|^2$. So, $x = cu$ for some $u$ where $u$ is a unit norm eigenvector corresponding to (say)the eigenvalue $\lambda_i$ and $c \neq 0$. We must have $\lambda_i = 2\|x\|^2 = 2c^2$. Which is only possible if $\lambda_i >0$, in which case we have $c = \sqrt{\frac{\lambda_i}{2}}$ and $x = \sqrt{\frac{\lambda_i}{2}}u$.

So the stationary points are $0$ and all vectors of the form : $\sqrt{\frac{\lambda_i}{2}} u$, where $\lambda_i > 0$ is a positive eigenvalue of $A+A^T$ and $u$ a corresponding eigenvector with unit norm.

If $A+A^T$ does not have any positive eigenvalues $0$ is the only stationary point.

0
On

$$J=\frac12 |||A-x x^{\intercal}||^2_F=\frac12 \sum_{i=1}^n \sum_{j=1}^n (a_{ij}-x_ix_j)^2$$

$$\nabla J=0$$


$$\nabla J_k=-x_k \sum_{i=1}^n (a_{ik} - x_i x_k)-x_k\sum_{j=1}^n (a_{kj} - x_kx_j)+2x_k(a_{kk}-x_k^2)=0,\quad \forall k\in {1,\dots, n}$$

Divided by $x_k$

$$- \sum_{i=1}^n (a_{ik} - x_i x_k)-\sum_{j=1}^n (a_{kj} - x_kx_j)+2(a_{kk}- x_k^2)=0,\quad \forall k\in {1,\dots, n}$$

Separate

$$2 x_k \sum_{i=1}^n (x_i)+2a_{kk}-2x_k^2-\sum_{i=1}^n (a_{ki}+a_{ik})=0,\quad \forall k\in {1,\dots, n}$$

Neat

$$ x_k (\sum_{i=1}^n (x_i) -x_k)=\frac12(\sum_{i=1}^n (a_{ki}+a_{ik}))-a_{kk},\quad \forall k\in {1,\dots, n}$$

0
On

Let $f(x) = \frac{1}{2} \|A-xx^T\|^2_F$ and $\partial_k f$ the partial derivative of $f$ with respect to $x_k$.

By the definition of the Frobenius norm $$ f(x) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (a_{ij} - x_i x_j)^2 $$

We can split that double summation in clear way so that computing $\partial_k f$ becomes straightforward as follows

$$\begin{align} f(x) &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (a_{ij} - x_i x_j)^2 \\ &= \frac{1}{2} \left( \overbrace{\sum_{i=1 \\ i\neq k}^{n} \sum_{j=1 \\ j \neq k}^{n} (a_{ij} - x_i x_j)^2}^{\text{has no $x_k$}} + \overbrace{\sum_{i=1 \\ i\neq k}^{n} (a_{ik} - x_i x_k)^2}^{\text{has only one $x_k$}} + \overbrace{\sum_{j=1 \\ j\neq k}^{n} (a_{kj} - x_k x_j)^2}^{\text{also has only one $x_k$}} + \overbrace{(a_{kk} - x_k^2)^2}^{\text{has two $x_k$s}} \right) \end{align}$$

Therefore

$$\begin{align} \partial_k f(x) &= \frac{1}{2} \left( 0 + \sum_{i=1 \\ i\neq k}^{n} 2(a_{ik} - x_i x_k)(-x_i) + \sum_{j=1 \\ j\neq k}^{n} 2(a_{kj} - x_k x_j)(-x_j) + 2(a_{kk} - x_k^2)(-2 x_k) \right) \\ &= - \sum_{i=1 \\ i\neq k}^{n} (a_{ik} - x_i x_k) x_i - \sum_{j=1 \\ j\neq k}^{n} (a_{kj} - x_k x_j) x_j - (a_{kk} - x_k^2) 2 x_k \\ &= - \sum_{i=1}^{n} (a_{ik} - x_i x_k) x_i - \sum_{j=1}^{n} (a_{kj} - x_k x_j) x_j \\ &= - \sum_{i=1}^{n} \left( (a_{ik} - x_k x_i) x_i + (a_{ki} - x_k x_i) x_i \right) \\ &= - \sum_{i=1}^{n} (a_{ik} - x_k x_i + a_{ki} - x_k x_i) x_i \\ &= - \sum_{i=1}^{n} (a_{ik} + a_{ki} - 2 x_k x_i) x_i \\ \end{align}$$

Since $\nabla f(x) = \left[ \begin{matrix} \partial_1 f(x) & \cdots & \partial_n f(x) \end{matrix} \right]^T$ it follows that $\nabla f(x) = - (A + A^T - 2x x^T)x $

Then $\nabla f(x) = 0$ implies $x = 0$ or $A + A^T - 2x x^T = 0$.

Now suppose that $x \neq 0$. So $x x^T = \dfrac{1}{2}(A + A^T)$. That is, $x x^T$ is equal to the symmetric part of $A$. This is only feasible if $A$ meets certains conditions which I will left for you to verify. Namely if $A$ have non-negative diagonal and for all $i, j \in 1 \dots n $ it holds that $$ \underbrace{\sqrt{a_{ii} a_{jj}}}_{\text{geometric mean}} = \underbrace{\frac{a_{ij} + a_{ji}}{2}}_{\text{arithmetic mean}} $$

(Which particularly I think is a nice relation)

So if such conditions are meet, then $x = \left[ \begin{matrix} \sqrt{a_{11}} \cdots \sqrt{a_{nn}} \end{matrix} \right]^T$.

Thus the stationary points of $f(x)$ are $x=0$, and $x = \left[ \begin{matrix} \sqrt{a_{11}} \cdots \sqrt{a_{nn}} \end{matrix} \right]^T$ if $A$'s additional conditions are met.