The intuition behind eigenvalues in relation to RNN

161 Views Asked by At

I have a question about wonderful article of Andrej Karpathy Yes you should understand backprop

The question is not about back-propagation but about eigenvectors and eigenvectors.

What happens when you take one number a and start multiplying it by some other number b (i.e. $a*b*b*b*b*b*b$…)? This sequence either goes to zero if $|b| < 1$, or explodes to infinity when $|b|>1$. The same thing happens in the backward pass of an RNN, except b is a matrix and not just a number, so we have to reason about its largest eigenvalue instead.

enter image description here

Why in case of matrices he speaks about larger eigenvector? Why this analogy is correct (if integer than either <0 or >0 and if matrices than the same relates to the larrger eigenvector). I know the definition of eigenvector/eigenvalue but it seems like I miss the intuition behind them.

2

There are 2 best solutions below

0
On

The eigenvectors of a matrix are "special" to that matrix, they "belong" to that matrix. When one of these eigenvectors is acted on by the matrix, the resulting vector points in exactly the same direction. It may be streched or compressed but, it still points in exactly the same direction.

The amount the eigenvector is streched/compressed depends on the eigenvalue. If the magnitude of the eigenvalue is larger than 1, the eigenvector is stretched. If the magnitude of the eigenvalue is smaller than 1, the eigenvector is compressed.

As such, if a matrix is applied repeatedly and some eigenvector has an eigenvalue with magnitude greater than 1, the result will be to strectch the outputs so they get larger and larger (without bound).

0
On

Let be $A$ a complex valued matrix.

You want to look at $(A^n)_n$ as a sequence, let us denote $\mathrm{Spec}(A)$ its set of eigenvalues.

Proposition 1 : $\lim_{n \to +\infty} A^n = 0 \iff \mathrm{Spec}(A) \subset \{ z \in \mathbb{C} \mid \lvert z \rvert < 1 \}$

We will only prove $\impliedby$ as this is the only result we are interested in.

We can, by Dunford's decomposition, write: $A = D + N$ with $D$ diagonalizable and $N$ nilpotent such that $DN = ND$.

Now: $A^n = (D + N)^n = \sum_{k=0}^n \binom{n}{k} N^k D^{n - k}$.

Let us denote $m \in \mathbb{N}$ such that $N^m = 0$ and $N^{m - 1} \neq 0$ by nilpotency.

Then: $A^n = \sum_{k=0}^m \binom{n}{k} N^k D^{n - k}$.

Finally, as $D$ is diagonalizable, you can write $D = PUP^{-1}$ with $P$ non singular and $D^q = P U^q P^{-1}$.

As all non-null eigenvalues are written on the diagonal of $U$ and as they verify $\lvert z \rvert < 1$, $U^q \to 0$ when $q \to +\infty$.

Thus: $D^n \to 0$ also when $n \to +\infty$.

Thus, for all $k \in [[0, m]]$, $\binom{n}{k} N^k D^{n - k} \to 0$ when $n \to +\infty$, notice that $\binom{n}{k}$ when $k$ is fixed is a polynomial in $n$: $\binom{n}{k} = \dfrac{n(n - 1)\ldots(n - k + 1)}{k!} = \left(\dfrac{1}{k!} X(X - 1)\ldots(X - k + 1)\right)(n)$.

Thus: $A^n \to 0$ when $n \to +\infty$ by finite sum.

Now, it suffices to look at $\max_{z \in \mathrm{Spec(A)}} \lvert z \rvert$ to know if $A^n$ is going to vanish or explode in norm.

We can also show:

Proposition 2 : $\lim_{n \to +\infty} A^n \in \mathbb{C} \iff \mathrm{Spec}(A) \subset \{ z \in \mathbb{C} \mid \lvert z \rvert < 1 \} \cup \{ 1 \} \text{ and } \mathrm{rk}(A - I_n) = \mathrm{rk}((A - I_n)^2)$

That's why the integer analogy is so strong, at least over $\mathbb{C}$.