I am reading through "Numerical Linear Algebra" by Lloyd N. Trefethen and David Bau. I have reached the final part of the book, about iterative methods. They all share one feature: they use Krylov subspaces.
What I understood so far: we use the subspaces $\mathcal{K}_n = \langle b, Ab, A^2b, \dots, A^{n-1}b \rangle$ for various problems, e.g. solving linear systems $Ax=b$ whose dimension $m$ is much larger than $n$. Usually we first compute an orthonormal basis of $\mathcal{K}_n$, e.g. with the Arnoldi iteration, for numerical stability. This way we can save a lot of time and computational cost, and the iterates can converge nicely even for $n \ll m$.
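For concreteness, here is a minimal sketch (my own, not from the book's code) of the Arnoldi iteration building an orthonormal basis $Q$ of $\mathcal{K}_n$ with $n \ll m$, and checking the Arnoldi relation $AQ_n = Q_{n+1}\tilde{H}_n$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5                       # ambient dimension m, Krylov dimension n << m
A = rng.standard_normal((m, m))
b = rng.standard_normal(m)

# Arnoldi: orthonormal basis Q[:, :n] of K_n = span{b, Ab, ..., A^{n-1} b}
Q = np.zeros((m, n + 1))
H = np.zeros((n + 1, n))           # upper Hessenberg
Q[:, 0] = b / np.linalg.norm(b)
for k in range(n):
    v = A @ Q[:, k]
    for j in range(k + 1):         # modified Gram-Schmidt orthogonalization
        H[j, k] = Q[:, j] @ v
        v -= H[j, k] * Q[:, j]
    H[k + 1, k] = np.linalg.norm(v)
    Q[:, k + 1] = v / H[k + 1, k]

# Columns are orthonormal, and A Q_n = Q_{n+1} H (the Arnoldi relation)
print(np.allclose(Q.T @ Q, np.eye(n + 1)))
print(np.allclose(A @ Q[:, :n], Q @ H))
```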
My question is: what makes Krylov subspaces in particular so special? Why doesn't the same idea work well (or does it?) with any other orthonormal basis that I can likewise expand dimension by dimension?
Strictly speaking, Krylov methods do not always work well. For instance, it is possible to construct an invertible $A \in \mathbb{R}^{n \times n}$ and a nonzero $b \in \mathbb{R}^n$ such that the first $n-1$ GMRES iterates for $Ax=b$ all have the same residual norm, and the residual only drops to zero at the $n$th step. This situation turns out to be rather pathological, though (i.e. you have to construct the problem from scratch; real-world problems don't look like this).
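The standard construction of this kind (my own illustration, not necessarily the one the answer has in mind) takes $A$ to be a cyclic shift and $b$ a standard basis vector; a naive GMRES step then minimizes $\|b - AKc\|$ over the Krylov basis $K$:

```python
import numpy as np

n = 8
A = np.roll(np.eye(n), 1, axis=0)   # cyclic shift: A e_j = e_{j+1 mod n}
b = np.zeros(n)
b[0] = 1.0

residuals = []
vecs = [b]                          # Krylov vectors b, Ab, A^2 b, ...
for k in range(1, n + 1):
    K = np.column_stack(vecs[:k])   # basis of K_k
    # GMRES step: minimize ||b - A K c|| over c
    c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    residuals.append(np.linalg.norm(b - A @ K @ c))
    vecs.append(A @ vecs[-1])

# Residual stays at ||b|| = 1 for the first n-1 steps, then drops to 0
print(residuals)
```

Here $A\mathcal{K}_k = \langle e_2, \dots, e_{k+1}\rangle$ is orthogonal to $b = e_1$ for every $k < n$, so no progress is possible until the very last step.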
Getting back to more realistic situations: you can use other bases as you see fit. But by directly studying Krylov methods, which are mathematically convenient, you can determine when they perform well (by proving convergence estimates), and then split a general problem into "transform the problem into one where Krylov methods perform well" and "apply a Krylov method". That first step is called preconditioning; there are many preconditioners for Krylov methods out there.
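As a hedged illustration of that split (the test matrix, the noise level, and the choice of a Jacobi/diagonal preconditioner are all my own, picked only to make the effect visible), here is a sketch comparing the GMRES residual with and without left preconditioning $M^{-1}Ax = M^{-1}b$:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
# Test matrix: badly scaled diagonal plus a small random perturbation
D = np.diag(10.0 ** rng.uniform(0, 4, m))
A = D + 0.01 * rng.standard_normal((m, m))
b = rng.standard_normal(m)

def gmres_residual(A, b, k):
    """Relative residual of the k-th GMRES iterate, computed via an
    orthonormal Krylov basis Q and a least-squares solve."""
    Q = np.zeros((len(b), k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k - 1):
        v = A @ Q[:, j]
        v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)   # orthogonalize
        Q[:, j + 1] = v / np.linalg.norm(v)
    c, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
    return np.linalg.norm(b - A @ Q @ c) / np.linalg.norm(b)

k = 20
plain = gmres_residual(A, b, k)
# Jacobi preconditioning: M = diag(A), solve M^{-1} A x = M^{-1} b instead
Minv = np.diag(1.0 / np.diag(A))
precond = gmres_residual(Minv @ A, Minv @ b, k)
print(f"plain residual:          {plain:.2e}")
print(f"preconditioned residual: {precond:.2e}")
```

The preconditioned matrix $M^{-1}A$ is a small perturbation of the identity, so its eigenvalues cluster near $1$ and the Krylov iteration converges in a handful of steps, while the unpreconditioned spectrum is spread over several orders of magnitude.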