Proof
I would like to check my understanding of the above.
Thoughts
So for this we would have $Q, x^{(0)}, b$, and we're trying to find $x^*$.
The set of Q-conjugate vectors $d^{(i)}, \; i \in \{0, 1, \cdots , n-1\}$ forms a basis, therefore we can express the difference between the optimal point $x^*$ and the starting point $x^{(0)}$ in terms of this basis: $x^* - x^{(0)} = \sum_{i=0}^{n-1}\beta_i d^{(i)}$.
Using that for Q-conjugate vectors $d^{(i)}, d^{(j)}$ where $i \neq j$ we have $d^{(i)T}Qd^{(j)} = 0$, the product simplifies and we can factor to get things in terms of $\beta_k$.
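Concretely (writing the expansion in the book's notation, as I understand it): premultiplying $x^* - x^{(0)} = \sum_{i=0}^{n-1}\beta_i d^{(i)}$ by $d^{(k)T}Q$ makes every term but one vanish,

$$d^{(k)T}Q\left(x^* - x^{(0)}\right) = \sum_{i=0}^{n-1}\beta_i\, d^{(k)T}Qd^{(i)} = \beta_k\, d^{(k)T}Qd^{(k)},$$

so $\beta_k = \dfrac{d^{(k)T}Q\left(x^* - x^{(0)}\right)}{d^{(k)T}Qd^{(k)}}$.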
The $\beta$ terms can be expressed in terms of $g^{(0)}$, but we want to express them in terms of $g^{(k)}$ so that we can use the expression in each step.
Writing $x^{(k)} - x^{(0)}$ in terms of the basis vectors (Question 1) multiplied by $\alpha$ terms enables us to do that.
Then rearranging shows that $\beta_k = \alpha_k$ (Question 2).
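Spelling out the rearrangement (assuming the book's definitions $g^{(k)} = Qx^{(k)} - b$ and $\alpha_k = -\dfrac{g^{(k)T}d^{(k)}}{d^{(k)T}Qd^{(k)}}$):

$$\beta_k = \frac{d^{(k)T}Q\left(x^* - x^{(0)}\right)}{d^{(k)T}Qd^{(k)}} = \frac{d^{(k)T}Q\left(x^* - x^{(k)}\right)}{d^{(k)T}Qd^{(k)}} = \frac{-d^{(k)T}g^{(k)}}{d^{(k)T}Qd^{(k)}} = \alpha_k,$$

where the second equality uses $d^{(k)T}Q\left(x^{(k)} - x^{(0)}\right) = 0$ (conjugacy of $d^{(k)}$ with the earlier directions $d^{(0)}, \ldots, d^{(k-1)}$), and the third uses $Qx^* = b$, so that $Q\left(x^* - x^{(k)}\right) = -\left(Qx^{(k)} - b\right) = -g^{(k)}$.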
Question 1
There seem to be more $\beta$ terms than $\alpha$ terms: the $\beta$ coefficients run over $1, \cdots, k$ while the $\alpha$ coefficients run over $1, \cdots, k-1$.
Why is this? If so - how can we equate them? If not (and there are the same number of each), what am I missing?
Question 2
Why does this complete the proof? Is it showing that we have $k$ coefficients and that we can solve for one coefficient at each step, so that once all steps are complete we can express the optimal point $x^*$ in terms of the Q-conjugate vectors?
This is Theorem 10.1 from Chong and Zak "An Introduction to Optimization"
Edit - confusion about the number of coefficients.
From reading the above, it appeared (to me) that the $\beta$ coefficients range over $1, \ldots, n-1$ while the $\alpha$ coefficients range over $1, \ldots, k-1$.
It's previously stated, in the line "Now premultiply both sides of the ....", that $0\leq k < n$.
So if the $\beta$ coefficients can go up to $k$ while the $\alpha$ coefficients only go up to $k-1$, it seemed (to me) that there is one less $\alpha$ coefficient than $\beta$ coefficient.
Hopefully that explains the confusion I had with respect to that.


First, the algorithm produces $\alpha_0,...,\alpha_{n-1}$ (and $d_0,...,d_{n-1}$), so there are $n$ of them.
After the last iteration we have $x_n = x_0+\sum_{k=0}^{n-1}\alpha_k d_k$.
Since the $d_k$ are linearly independent ($Q$ is symmetric positive definite, I presume) we can write the (unique) solution $x_*$ to $Qx_*=b$ as $x_*=x_0+\sum_{k=0}^{n-1}\beta_k d_k$ (in fact, any $x$ can be written this way for suitable coefficients).
The theorem shows that $\beta_k = \alpha_k$, hence we have $x_* = x_n$.
Note that the above is a theoretical result; in practice numerical errors accumulate, and one can restart to reduce their impact. CG algorithms and their ilk are often used for finding quick approximate solutions.
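As a sanity check of the $x_* = x_n$ claim, here is a minimal numpy sketch of the conjugate gradient method on a small SPD system (the example $Q$ and $b$ are mine, not from the book); after exactly $n = 2$ iterations the iterate matches the exact solution:

```python
import numpy as np

# A small symmetric positive definite system Q x = b (example data).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)          # x^{(0)}
g = Q @ x - b            # gradient g^{(k)} = Q x^{(k)} - b
d = -g                   # first direction: steepest descent
alphas = []

for k in range(len(b)):  # exactly n iterations
    alpha = -(g @ d) / (d @ Q @ d)    # alpha_k from exact line search along d^{(k)}
    alphas.append(alpha)
    x = x + alpha * d                 # x^{(k+1)} = x^{(k)} + alpha_k d^{(k)}
    g = Q @ x - b
    beta = (g @ Q @ d) / (d @ Q @ d)  # makes d^{(k+1)} Q-conjugate to d^{(k)}
    d = -g + beta * d

x_star = np.linalg.solve(Q, b)
print(np.allclose(x, x_star))  # True: x^{(n)} = x^* after n steps
```

Running with higher dimensions (or perturbing the arithmetic) shows the caveat above: in exact arithmetic $n$ steps suffice, but in floating point the conjugacy degrades and a few extra or restarted iterations may be needed.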