System of linear equations with gradient descent

110 Views Asked by At

I have a misunderstanding I am trying to solve.

I have a matrix $A \in \mathbb R^{n \times d}$ and a vector $b\in \mathbb R^n$ where $rank(A) = n$ and there are infinitely many $x \in \mathbb R^d$ such that $Ax = b$.

I want to solve $Ax = b$ with gradient descent by minimizing $\|Ax-b\|_2^2$, so the iteration is

$x_{k} = x_{k - 1} - \alpha A^T(Ax_{k - 1}-b)$ where $\alpha$ is the learning rate. We can also write this differently as

$x_{k} = (I_d - \alpha A^TA)^kx_0+\alpha\sum_{j=0}^{k-1}(I_d-\alpha A^TA)^jA^Tb$

I understand all this, but I'm having trouble understanding where this converges as $k \to \infty$.

Since $A^TA$ is a $d$ by $d$ matrix of rank $n$, it has a zero eigenvalue, so $1$ is an eigenvalue of $I_d - \alpha A^TA$. Assume $\alpha$ is such that all the other eigenvalues of $I_d - \alpha A^TA$ are strictly less than $1$, then $\|I_d - \alpha A^TA\| = 1$.

On the one hand, we must require $\lim_{k \to \infty}(I_d - \alpha A^TA)^k = 0_d$ so that the series $\sum_{j=0}^{\infty}(I_d-\alpha A^TA)^j$ will converge. On the other that, if $\lim_{k \to \infty}(I_d - \alpha A^TA)^k = 0_d$ then $\lim_{k \to \infty}(I_d - \alpha A^TA)^kx_0 = \vec{0}$ and that makes no sense, as it basically means there's no significance to the initial point I'm choosing. Evidently this is wrong, as if I consider two different solutions $y_0,z_0$ as initial guesses, then $y_\infty = y_0 \neq z_0 = z_{\infty}$.

Is there an explanation for this?

1

There are 1 best solutions below

3
On

If $x\in ker\ A \setminus \{0\}$ then $(I-\alpha A^TA)x=x$, hence $(I-\alpha A^TA)^k$ does not converge to zero.

Now recall that $\mathbb R^d = kern(A) \oplus im(A^T)$. If we split $x = u+v$ with $u\in kern(A)$ and $v\in im(A^T)$, then $$ (I-\alpha A^TA)(u+v) = u + (I-\alpha A^TA)v, $$ where $u\in ker(A)$ again, and $(I-\alpha A^TA)v \in im(A^T)$. Hence the iteration does not change the part of $x_0$ that lies in $kern(A)$, but only works on the part of $x_0$ that is in its orthogonal complement.