On each iteration we will compute a diagonal preconditioning matrix $\mathbf{D}$ (we omit the subscript $n$). $\mathbf{D}$ is expected to be a rough approximation to the Hessian. In our experiments, following Martens (2010), we set $\mathbf{D}$ to the diagonal of the Fisher matrix computed over $\mathcal{A}_{n}$. To precondition, we define a normalized parameter vector $\tilde{\boldsymbol{\theta}}=\mathbf{D}^{1 / 2} \boldsymbol{\theta}$, compute the Krylov subspace in terms of $\tilde{\boldsymbol{\theta}}$, and convert back to the "canonical" coordinates. The result is the subspace spanned by the vectors $$ \left\{\left(\mathbf{D}^{-1} \mathbf{H}\right)^{k} \mathbf{D}^{-1} \mathbf{g}, 0 \leq k<K\right\} $$
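For reference, here is a sketch of why the preconditioned subspace takes this form. It assumes that $\mathbf{H}$ and $\mathbf{g}$ are the Hessian and gradient in the original coordinates $\boldsymbol{\theta}$ (the excerpt does not say so explicitly), and the tilde quantities $\tilde{\mathbf{g}}$ and $\tilde{\mathbf{H}}$ are my own notation. Since $\boldsymbol{\theta}=\mathbf{D}^{-1 / 2} \tilde{\boldsymbol{\theta}}$, the chain rule gives the gradient and Hessian in the normalized coordinates: $$ \tilde{\mathbf{g}}=\mathbf{D}^{-1 / 2} \mathbf{g}, \qquad \tilde{\mathbf{H}}=\mathbf{D}^{-1 / 2} \mathbf{H} \mathbf{D}^{-1 / 2} $$
The Krylov vectors in the normalized coordinates are $\tilde{\mathbf{H}}^{k} \tilde{\mathbf{g}}$. Mapping each one back to the canonical coordinates multiplies it by $\mathbf{D}^{-1 / 2}$, and the half-powers of $\mathbf{D}$ pair up: $$ \mathbf{D}^{-1 / 2}\left(\mathbf{D}^{-1 / 2} \mathbf{H} \mathbf{D}^{-1 / 2}\right)^{k} \mathbf{D}^{-1 / 2} \mathbf{g}=\left(\mathbf{D}^{-1} \mathbf{H}\right)^{k} \mathbf{D}^{-1} \mathbf{g}, \qquad 0 \leq k<K $$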
Can somebody explain how step 11 of Algorithm 2 is obtained? The 11th step gives the construction of the $k$th row of the Hessian matrix in the coordinates of the Krylov subspace. But how is that row the same as the expression given in step 11?
