Suppose I want to solve $Ax = b$ with an iterative method (I'm specifically thinking of CG, so we're minimizing $\phi(x) := \tfrac{1}{2} x^T A x - b^Tx,\, A \succ 0$). I was told to start my initial guess at $x_0 = 0$, but I don't understand why.
If I'm using CG, the gradient of $\phi$ gives the residual: $Ax_0 - b = 0 - b = -b$, so I see that this produces the "$b$" in the Krylov subspace $K_m = \operatorname{span}\{b, Ab, A^2b, \ldots, A^{m-1}b\}$. Similarly, at the second step, we get something like $Ab$.
Is this the main reason for starting at $x_0 = 0$ in this case? What's wrong with having a Krylov subspace that's "impure" (i.e., $\tilde{K} = \operatorname{span}\{Ax_0-b, A(Ax_0-b), \ldots \}$), or am I misunderstanding something?
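To make the two subspaces concrete, here is a minimal NumPy sketch of what I mean. The test matrix, the random seed, and the helper name `krylov_basis` are my own illustrative assumptions, not taken from any reference. It only builds the generating vectors for a zero start and for an arbitrary start; it does not run CG itself.

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Hypothetical helper: return the n-by-m matrix [r0, A r0, ..., A^(m-1) r0]."""
    cols = [r0]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n, m = 6, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite by construction
b = rng.standard_normal(n)

x0_zero = np.zeros(n)
x0_other = rng.standard_normal(n)  # an arbitrary nonzero starting guess

# Initial residual A x0 - b at each starting point, as in the text
g_zero = A @ x0_zero - b           # equals -b
g_other = A @ x0_other - b

# Columns -b, -Ab, -A^2 b: same span as {b, Ab, A^2 b}
K_pure = krylov_basis(A, g_zero, m)
# Columns (A x0 - b), A(A x0 - b), A^2 (A x0 - b): the "impure" version
K_impure = krylov_basis(A, g_other, m)
```

With $x_0 = 0$ the columns of `K_pure` are just $b, Ab, A^2b$ up to sign, while `K_impure` is generated by the initial residual $Ax_0 - b$ instead.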
Finally, does choosing $x_0 = 0$ matter for non-Krylov methods? For example, if we want to minimize a generic quadratic $\psi(y) := y^T B y + c^Ty + d,\, B \succeq 0$, can we do better than choosing $y_0=0$?