I'm trying to understand how a very simple change of variables works for the following gradient descent iteration. \begin{align} x^{k+1} = x^k - \alpha^kD^k \nabla f(x^k) \end{align} This can supposedly be viewed as a scaled version of steepest descent, and the text shows this by setting: \begin{align} S = (D^k)^{1/2} \end{align} They then use this to consider the transformation of variables \begin{align} x = Sy \end{align} The minimization problem gets rewritten as \begin{align} &\text{minimize } h(y) \equiv f(Sy) \\ &\text{subject to } y \in \mathbb{R}^n \end{align} And finally they say the iteration can be written as follows: \begin{align} y^{k+1} = y^k - \alpha^k \nabla h(y^k) \end{align}
Where does $D^k$ go? My work so far is simply plugging in $Sy$ for $x$ and then multiplying by $S^{-1}$, which gets me the following: \begin{align} Sy^{k+1} &= Sy^k - \alpha^k D^k \nabla f(Sy^k) \\ S^{-1}(Sy^{k+1}) &= S^{-1}(Sy^k - \alpha^k D^k \nabla h(y^k)) \\ y^{k+1} &= y^k - \alpha^k S^{-1}D^k \nabla h(y^k) \\ y^{k+1} &= y^k - \alpha^k (D^k)^{-1/2}D^k \nabla h(y^k) \end{align} Where do I find the other $(D^k)^{-1/2}$ so that $D^k$ cancels to the identity?
Edit: Here is the solution for anyone who needs it:
We need $D^k = S^2$ and $\nabla h(y^k) = S\nabla f(Sy^k)$:
\begin{align} Sy^{k+1} &= Sy^k - \alpha^k S^2 \nabla f(Sy^k) \\ Sy^{k+1} &= Sy^k - \alpha^k S \nabla h(y^k) \\ S^{-1}(Sy^{k+1}) &= S^{-1}(Sy^k - \alpha^k S \nabla h(y^k)) \\ y^{k+1} &= y^k - \alpha^k \nabla h(y^k) \end{align}
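As a sanity check, here is a small worked example. It assumes a quadratic objective $f(x) = \tfrac{1}{2} x^T Q x$ with $Q$ symmetric positive definite and the scaling $D^k = Q^{-1}$; neither comes from the original problem, they are only an illustration. \begin{align} S &= (D^k)^{1/2} = Q^{-1/2} \\ h(y) &= f(Sy) = \tfrac{1}{2} y^T S Q S y = \tfrac{1}{2} y^T y \\ \nabla h(y) &= y \\ y^{k+1} &= y^k - \alpha^k y^k \end{align} Multiplying the last line by $S$ gives $x^{k+1} = x^k - \alpha^k x^k$, which agrees with the original iteration because $D^k \nabla f(x^k) = Q^{-1} Q x^k = x^k$.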
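You need to write the gradient of $h(y) = f(Sy)$ with respect to $y$:
$\nabla_y h(y) = S^T \nabla_x f(Sy) = S \nabla_x f(Sy)$, where the last equality uses that $S$ is symmetric (which holds if $D^k$ is symmetric positive definite and $S$ is its symmetric square root).
That is probably where your missing factor of $S$ comes from.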
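In case the chain-rule step is not obvious, here is a componentwise sketch. The index notation $S_{ij}$, $x_i$, $y_j$ is introduced only for this illustration. \begin{align} x_i &= \sum_j S_{ij}\, y_j \\ \frac{\partial h}{\partial y_j}(y) &= \sum_i \frac{\partial f}{\partial x_i}(Sy)\, \frac{\partial x_i}{\partial y_j} = \sum_i S_{ij}\, \frac{\partial f}{\partial x_i}(Sy) \\ &= \left( S^T \nabla_x f(Sy) \right)_j \end{align} So the gradient of $h$ picks up a factor of $S^T$, which equals $S$ when $S$ is symmetric.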