How to intuitively understand eigenvalue and eigenvector?


I'm learning multivariate analysis, and I took two semesters of linear algebra as a freshman.

Eigenvalues and eigenvectors are easy to calculate and the concept is not difficult to understand. I found that there are many applications of eigenvalues and eigenvectors in multivariate analysis. For example:

In principal component analysis, the proportion of total population variance due to the $k$th principal component equals $$\frac{\lambda_k}{\lambda_1+\lambda_2+\cdots+\lambda_p},$$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ are the eigenvalues of the population covariance matrix.
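
For instance, here is a minimal numpy sketch of that computation (the covariance matrix below is an arbitrary example of my own):

```python
import numpy as np

# Hypothetical example: sample covariance matrix of three variables.
S = np.array([[4.0, 2.0, 0.5],
              [2.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])

# Eigendecomposition of the (symmetric) covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(S)

# eigh returns eigenvalues in ascending order; sort them in descending order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]

# Proportion of total variance explained by each principal component:
# lambda_k / (lambda_1 + ... + lambda_p)
proportions = eigenvalues / eigenvalues.sum()
print(proportions)           # one entry per component, sums to 1
print(proportions.cumsum())  # cumulative variance explained
```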

I understand that, geometrically, multiplying an eigenvector by its eigenvalue has the same effect as multiplying it by the matrix.
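
That statement is easy to check numerically; a small sketch with an arbitrary matrix of my own:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# For every eigenpair, multiplying by the matrix and multiplying by the
# eigenvalue produce the same vector.
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    assert np.allclose(A @ v, eigenvalues[k] * v)
```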

I suspect my former understanding is too naive, which is why I cannot see the link between eigenvalues and their applications in principal components and elsewhere.

I know how to derive almost every step from the assumptions to the results mathematically. I'd like to know how to understand eigenvalues and eigenvectors intuitively or geometrically in the context of multivariate analysis (in linear algebra is also fine).

Thank you!

There are 3 answers below.

BEST ANSWER

Personally, I feel that intuition isn't something which is easily explained. Intuition in mathematics is synonymous with experience and you gain intuition by working numerous examples. With my disclaimer out of the way, let me try to present a very informal way of looking at eigenvalues and eigenvectors.

First, let us forget about principal component analysis for a little bit and ask ourselves exactly what eigenvectors and eigenvalues are. A typical introduction to spectral theory presents eigenvectors as vectors which are fixed in direction under a given linear transformation. The scaling factor of these eigenvectors is then called the eigenvalue. Under such a definition, I imagine that many students regard this as a minor curiosity, convince themselves that it must be a useful concept and then move on. It is not immediately clear, at least to me, why this should serve as such a central subject in linear algebra.

Eigenpairs are a lot like the roots of a polynomial. It is difficult to describe why the concept of a root is useful, not because there are few applications but because there are too many. If you tell me all the roots of a polynomial, then mentally I have an image of how the polynomial must look. For example, all monic cubics with three real roots look more or less the same. So one of the most central facts about the roots of a polynomial is that they ground the polynomial. A root literally roots the polynomial, limiting its shape.

Eigenvectors are much the same. If you have a line or plane which is invariant, then there is only so much you can do to the surrounding space without breaking that invariance. So in a sense eigenvectors are not important because they themselves are fixed, but rather because they limit the behavior of the linear transformation. Each eigenvector is like a skewer which helps to hold the linear transformation in place.

Very (very, very) roughly then, the eigenvalues of a linear mapping are a measure of the distortion induced by the transformation, and the eigenvectors tell you how that distortion is oriented. It is precisely this rough picture which makes PCA very useful.

Suppose you have a set of data which is distributed as an ellipsoid oriented in $3$-space. If this ellipsoid is very flat in some direction, then in a sense we can recover most of the information we want even if we ignore the thickness in that direction. This is what PCA aims to do. The eigenvectors tell you how the ellipsoid is oriented and the eigenvalues tell you how much it is stretched or flattened along each of those directions (where it is flat). If you choose to ignore the "thickness" of the ellipsoid then you are effectively discarding the eigenvector in that direction; you are projecting the ellipsoid onto the most informative directions to look at. To quote wiki:

PCA can supply the user with a lower-dimensional picture, a "shadow" of this object when viewed from its (in some sense) most informative viewpoint
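
To make this picture concrete, here is a small sketch (the data are synthetic and the variable names are my own) that generates a flat ellipsoidal cloud in $3$-space and projects it onto the two eigenvectors with the largest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ellipsoidal cloud: long in x, medium in y, very flat in z.
X = rng.normal(size=(1000, 3)) * np.array([5.0, 2.0, 0.1])

# Center the data and form the sample covariance matrix.
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# Eigenvectors give the orientation of the ellipsoid,
# eigenvalues its spread along each of those directions.
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep the two directions with the largest eigenvalues and ignore the
# "thin" direction: the 2-D "shadow" of the ellipsoid.
shadow = Xc @ eigenvectors[:, :2]
print(eigenvalues)    # one large, one medium, one tiny value
print(shadow.shape)   # (1000, 2)
```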


First let us think what a square matrix does to a vector. Consider a matrix $A \in \mathbb{R}^{n \times n}$. Let us see what the matrix $A$ acting on a vector $x$ does to this vector. By action, we mean multiplication i.e. we get a new vector $y = Ax$.

The matrix acting on a vector $x$ does two things to the vector $x$.

  1. It scales the vector.
  2. It rotates the vector.

However, for any matrix $A$, there are some favored vectors/directions. When the matrix acts on these favored vectors, the action results in nothing but a scaling of the vector; there is no rotation. These favored vectors are precisely the eigenvectors, and the amount by which each of them is stretched or compressed is the corresponding eigenvalue.
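
A small numerical illustration of this, with an arbitrary matrix of my own choosing:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

def direction(v):
    """Unit vector pointing along v."""
    return v / np.linalg.norm(v)

# A generic vector is both rotated and scaled: its direction changes.
x = np.array([1.0, 1.0])
print(direction(x), direction(A @ x))   # different directions

# An eigenvector is only scaled: its direction is unchanged (up to sign).
eigenvalues, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]
print(direction(v), direction(A @ v))   # same direction
```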

So why are these eigenvectors and eigenvalues important? Consider the eigenvector corresponding to the largest (absolute) eigenvalue. If we take a vector along this eigenvector, then the action of the matrix on it is maximal: for a symmetric matrix (such as a covariance matrix), no other vector gets stretched by the matrix as much as this eigenvector.

Hence, if a vector lies "close" to this eigendirection, then the "effect" of the matrix acting on it is "large": the matrix produces a large response along that vector. This effect is large for large (absolute) eigenvalues and small for small (absolute) eigenvalues. The directions along which this effect is strongest are therefore called the principal directions or principal eigenvectors, and the corresponding eigenvalues are called the principal values.
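
A hedged sketch of that maximal-stretch claim for a symmetric matrix (the matrix here is an arbitrary example of my own):

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric matrix (as in the covariance matrices of multivariate analysis).
A = np.array([[4.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)
top = eigenvectors[:, np.argmax(np.abs(eigenvalues))]

# Compare the stretch |Ax| for many random unit vectors with the stretch
# along the principal eigenvector: none exceeds it.
stretch_top = np.linalg.norm(A @ top)   # equals |lambda_max| since |top| = 1
for _ in range(1000):
    x = rng.normal(size=2)
    x /= np.linalg.norm(x)
    assert np.linalg.norm(A @ x) <= stretch_top + 1e-9
```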


Given a linear operator $T:V \to V$, it's natural to try to find a basis $\beta$ of $V$ so that $[T]_{\beta}$, the matrix of $T$ with respect to $\beta$, is as simple as possible. Ideally, $[T]_{\beta}$ would be diagonal. And it's easy to see that if $[T]_{\beta}$ is diagonal, then $\beta$ is a basis of eigenvectors of $T$. This is one way that we might discover the idea of eigenvectors, and recognize their significance.

Here are some details. Suppose that $\beta = (v_1,\ldots,v_n)$ is an ordered basis for $V$, and $[T]_{\beta}$ is diagonal: \begin{equation} [T]_{\beta} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}. \end{equation} The first column of $[T]_{\beta}$ is $[T(v_1)]_{\beta}$, the coordinate vector of $T(v_1)$ with respect to $\beta$. This shows that \begin{equation} [T(v_1)]_{\beta} = \begin{bmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \end{equation} And this means that \begin{align} T(v_1) &= \lambda_1 v_1 + 0 \cdot v_2 + \cdots + 0 \cdot v_n \\ &= \lambda_1 v_1. \end{align} We see that $T(v_1)$ is just a scalar multiple of $v_1$. In other words, $v_1$ is an eigenvector of $T$, with eigenvalue $\lambda_1$.

Similarly, $v_2,\ldots,v_n$ are also eigenvectors of $T$, with eigenvalues $\lambda_2,\ldots,\lambda_n$.
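
The same observation in matrix form, as a small numpy sketch (the matrix is an arbitrary diagonalizable example of my own):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of P are a basis of eigenvectors; D is the diagonal matrix [T]_beta.
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

# In the eigenvector basis the matrix of the transformation is diagonal ...
assert np.allclose(np.linalg.inv(P) @ A @ P, D)

# ... and each basis vector is just scaled by its eigenvalue.
for k in range(len(eigenvalues)):
    assert np.allclose(A @ P[:, k], eigenvalues[k] * P[:, k])
```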