I have been using Krylov subspace methods for a few years, both in formal research and for fun. However, all of the ones I have come into contact with so far are iterative, handling only one or a few dimensions at a time. Do there exist Krylov subspace methods that try to parallelize by taking a larger subspace at a time, resulting in fewer iterations, each of which involves more computation but is (hopefully) more parallelizable?
The reason I'm asking is that up until now I have been solving quite large sparse systems of linear (normal) equations. Now I'm thinking of solving smaller systems while utilizing the parallel capabilities of my hardware, so I want to minimize the serial bottlenecks.
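To make concrete what I mean by "taking a larger subspace at a time", here is a minimal sketch of the kind of scheme I have in mind. It is a naive restarted approach, not any particular published method: each outer iteration builds `s` Krylov vectors from the current residual, orthonormalizes them in one go, and solves a small projected system. It assumes `A` is symmetric positive definite (as for normal equations) and uses a dense NumPy array for simplicity. The function name `s_step_solve` and the parameters `s`, `tol`, and `max_outer` are hypothetical names chosen for illustration.

```python
import numpy as np

def s_step_solve(A, b, s=4, tol=1e-10, max_outer=500):
    """Naive restarted s-dimensional Krylov projection (illustrative sketch).

    Assumes A is symmetric positive definite. Returns (x, outer), where
    outer is the number of outer iterations performed.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    outer = 0
    while outer < max_outer:
        r = b - A @ x
        if np.linalg.norm(r) <= tol * b_norm:
            break
        # Build the (normalized) monomial basis [r, A r, ..., A^(s-1) r].
        V = np.empty((b.size, s))
        v = r / np.linalg.norm(r)
        for j in range(s):
            V[:, j] = v
            v = A @ v
            v = v / np.linalg.norm(v)
        # Orthonormalize all s vectors at once.
        Q, _ = np.linalg.qr(V)
        # Galerkin projection: solve the small s-by-s system (Q^T A Q) y = Q^T r.
        H = Q.T @ (A @ Q)
        y = np.linalg.solve(H, Q.T @ r)
        x = x + Q @ y
        outer += 1
    return x, outer
```

Per outer iteration the work here is `s` matrix-vector products, one tall-skinny QR factorization, and one small dense solve. One caveat of this sketch is that the monomial basis becomes ill-conditioned as `s` grows, so it is only meant to illustrate the trade-off I am asking about, not to be a robust solver.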