The need for the Gram–Schmidt process


As far as I understand, Gram–Schmidt orthogonalization starts with a set of linearly independent vectors and produces a set of mutually orthonormal vectors that spans the same space as the starting vectors.

I have no problem understanding the algorithm, but here is the thing I fail to get. Why do I need to do all these calculations? For example, instead of doing the calculations in the example section of the Wikipedia article, why can't I just grab the two basis vectors $w_1 = (1, 0)'$ and $w_2 = (0, 1)'$? They are clearly orthonormal and span the same subspace as the original vectors $v_1 = (3, 1)'$, $v_2 = (2, 2)'$.

It is clear that I'm missing something important, but I can't see what exactly.

6 Answers

Accepted answer:

Orthonormal bases are nice because several formulas are much simpler when vectors are given with respect to an orthonormal (ON) basis.

Example: Let $\mathcal E = \{e_1, \dots, e_n\}$ be an ON basis. Then the Fourier expansion of any vector $v\in\operatorname{span}(\mathcal E)$ is just $$v = (v\cdot e_1)e_1 + (v\cdot e_2)e_2 + \cdots + (v\cdot e_n)e_n$$

Notice that there are no normalization factors and we don't need to construct a dual basis -- it's just a really simple formula.
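To make the simplicity concrete, here is a minimal pure-Python sketch of the Fourier expansion. The particular ON basis of $\mathbb{R}^3$ (a rotation of the standard basis) and the vector $v$ are my own illustrative choices, not from the answer above.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# An orthonormal basis of R^3 (assumed example: standard basis
# rotated 45 degrees in the xy-plane).
s = 1 / math.sqrt(2)
basis = [(s, s, 0.0), (-s, s, 0.0), (0.0, 0.0, 1.0)]

v = (3.0, 1.0, 2.0)

# Fourier expansion: v = sum_k (v . e_k) e_k -- each coefficient is
# a single dot product, no linear system and no dual basis needed.
coeffs = [dot(v, e) for e in basis]
reconstructed = tuple(
    sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(3)
)

print(reconstructed)  # recovers (3.0, 1.0, 2.0) up to rounding
```

With a non-orthonormal basis, finding the coefficients would instead require solving a linear system.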

In your example, of course $\{(1,0),(0,1)\}$ spans the same space as $\{(3,1),(2,2)\}$. But let me provide an example of my own: what about $\{(1.1,1.2,0.9,2.1,4),(3,-2,6,14,2),(6,6,6,3.4,11.1)\}$? There's certainly no subset of the standard basis vectors that spans the same space as these linearly independent vectors. But this is a pretty poor choice of basis because they're not orthonormal. It'd sure be nice if we had some algorithm that could produce an ON basis from them...

Answer:

If the subspace we wanted a basis for were all of $\mathbb{R}^n$, or, say, the $xy$-plane in $\mathbb{R}^3$, then we could certainly do this. The problem arises when we're looking for an orthonormal basis for a more complicated subspace.

For example, let's take our subspace to be the plane $x+y+z=0$ inside of $\mathbb{R}^3$. This subspace can be written as

$$\mathrm{span} \left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} , \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \right\}$$

Running Gram-Schmidt on this gives the basis

$$\left\{ \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} ~,~ \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ -\sqrt{\frac{2}{3}} \end{bmatrix} \right\}$$ which generates the same subspace of $\mathbb{R}^3$. However, if we just tried to take the basis $\left\{ \vec{e}_1 = [1~ 0~ 0]^T, ~ \vec{e}_2 = [ 0 ~ 1 ~ 0]^T \right\}$, we would certainly have an orthonormal basis, but of a different subspace.

TL;DR: Gram-Schmidt is designed to turn a basis into an orthonormal basis without altering the subspace that it spans.

Edit: Gram-Schmidt is also important in that it preserves the orientation of the given basis (roughly speaking, the order in which the basis elements are introduced). This is somewhat higher level and beyond the scope of an introductory course in linear algebra, but worth tucking away for later.

Answer:

Your choice of $w_1 = (1,0)$ and $w_2 = (0,1)$ fails one of the basic purposes of the Gram-Schmidt process: the result of the algorithm would not only have $\mathrm{Span}(v_1,v_2) = \mathrm{Span}(w_1,w_2)$ but also $\mathrm{Span}(v_1) = \mathrm{Span}(w_1).$
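A quick sketch makes this property visible on the questioner's own vectors: running Gram-Schmidt on $v_1 = (3,1)$, $v_2 = (2,2)$ gives a first output vector parallel to $v_1$, which $(1,0)$ is not.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

v1, v2 = (3.0, 1.0), (2.0, 2.0)

w1 = normalize(v1)  # span(w1) == span(v1), unlike the choice (1, 0)
p = tuple(c - dot(v2, w1) * e for c, e in zip(v2, w1))
w2 = normalize(p)

print(w1)            # parallel to v1
print(w2)
print(dot(w1, w2))   # ~0: the pair is orthonormal
```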

There are a few things I should mention:

(1) There are finite-dimensional vector spaces for which it is not easy to guess an orthonormal basis (or indeed any basis). Modular forms with their Petersson scalar product are an intimidating example of this.

(2) The Gram-Schmidt process is smooth in an appropriate sense, which makes it possible to use the Gram-Schmidt process to orthogonalize sections of a Euclidean bundle (a vector bundle with scalar product) and in particular to define things like the orthogonal complement of subbundles. This turns out to be important.

(3) There are also concrete and relatively elementary situations, such as QR factorization, in which Gram-Schmidt orthogonalization is useful. Maybe you are already familiar with this.

Answer:

You can also get into function spaces where there is no obvious basis to just grab. The Legendre polynomials can be constructed by starting with the functions $1$ and $x$ on the interval $x \in [-1,1]$ and using Gram-Schmidt orthogonalization to construct the higher-order ones. The second-order polynomial is constructed by removing from $x^2$ its component in the direction of $1$, for example.
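This construction can be carried out exactly with rational arithmetic. The sketch below (my own encoding: polynomials as coefficient lists, inner product $\langle p,q\rangle = \int_{-1}^{1} pq\,dx$) orthogonalizes $1, x, x^2$ without normalizing, which is the usual convention for Legendre polynomials.

```python
from fractions import Fraction as F

# Polynomials as coefficient lists: p[k] is the coefficient of x**k.
def mul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def inner(p, q):
    # <p, q> = integral of p*q over [-1, 1]; odd powers integrate to 0,
    # and the integral of x**k over [-1, 1] is 2/(k+1) for even k.
    return sum(c * F(2, k + 1) for k, c in enumerate(mul(p, q)) if k % 2 == 0)

def sub_scaled(p, q, s):
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [a - s * b for a, b in zip(p, q)]

# Gram-Schmidt (without normalization) on the monomials 1, x, x^2.
monomials = [[F(1)], [F(0), F(1)], [F(0), F(0), F(1)]]
ortho = []
for m in monomials:
    for u in ortho:
        m = sub_scaled(m, u, inner(m, u) / inner(u, u))
    ortho.append(m)

print(ortho)  # [[1], [0, 1], [-1/3, 0, 1]] -> 1, x, x^2 - 1/3
```

The third output, $x^2 - \tfrac{1}{3}$, is proportional to the Legendre polynomial $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$.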

Answer:

To give a somewhat practical example, let’s consider the following iterative sequence:

$$ x_{i+1} = \frac{A x_i}{\left |A x_i \right|},$$

where the $x_i$ are vectors and $A$ is a square matrix of matching dimension. The first vector $x_1$ shall be chosen at random¹.

When $i$ is increased, $x_i$ will align more and more with an eigenvector corresponding to the largest eigenvalue of $A$, and for all practical purposes it quickly becomes such an eigenvector. So we can use this sequence to obtain a numerical approximation of the largest eigenvalue of $A$ and a corresponding eigenvector (of length 1). This approach, known as power iteration, is actually used for this purpose.

Now, if we want to obtain the second largest eigenvalue of $A$ (and the corresponding eigenvector) in a similar way, we have to remove the influence of the first eigenvalue, as it would inevitably dominate our sequence otherwise. Hence we have to work in a subspace that is orthogonal to the eigenvector for the largest eigenvalue. Gram–Schmidt orthonormalisation provides a way to do this: We start with two random vectors $x_0$ and $y_0$ and then we do a Gram–Schmidt after every step:

$$\begin{align} x_{i+1} &= \frac{A x_i}{\left |A x_i \right|},\\ y_{i+1} &= \frac{A y_i - \left \langle A y_i, x_{i+1} \right \rangle x_{i+1}}{\left | A y_i - \left \langle A y_i, x_{i+1} \right \rangle x_{i+1} \right |} \end{align} $$

Due to this, $x_i$ will align with the first eigenvector and $y_i$ will align with the second one. We can extend this scheme to further eigenvectors if we like.

The crucial feature of the Gram–Schmidt process that we exploit here is that, for every $k$, the first $k$ output vectors span the same subspace as the first $k$ input vectors. Consequently, the $k$th output vector is orthogonal to the subspace spanned by all previous input vectors. Obviously, this would not hold for an arbitrary orthonormal basis.
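A minimal pure-Python sketch of this scheme, using an assumed example matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with eigenvalues $3$ and $1$ (my choice, symmetric so the eigenvectors are orthogonal):

```python
import math

def matvec(A, x):
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

# Eigenvalue 3 with eigenvector (1,1)/sqrt(2),
# eigenvalue 1 with eigenvector (1,-1)/sqrt(2).
A = ((2.0, 1.0), (1.0, 2.0))

x = normalize((1.0, 0.3))  # "random" starting vectors, not
y = normalize((0.2, 1.0))  # orthogonal to any eigenvector

for _ in range(100):
    x = normalize(matvec(A, x))
    ay = matvec(A, y)
    # Gram-Schmidt step: remove the component along x, then normalize.
    ay = tuple(c - dot(ay, x) * xc for c, xc in zip(ay, x))
    y = normalize(ay)

# Rayleigh quotients approximate the two largest eigenvalues.
print(dot(x, matvec(A, x)))  # ~3
print(dot(y, matvec(A, y)))  # ~1
```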

Now in many cases, you can determine the first eigenvectors more easily, but there are analogous problems, where you need to go the way described above. For example, it is used for numerical calculation of Lyapunov exponents, where $A$, $x$, and $y$ are subject to a complex temporal evolution.


¹ and thus we can assume that it is not orthogonal to any eigenvector of $A$

Answer:

If you want to represent a subspace, the standard basis will not do.

For example, take a plane defined by two arbitrary vectors in 3D: Gram-Schmidt will produce an orthonormal basis that spans that plane.