From Convex Optimization by Boyd & Vandenberghe:
Let $\Delta x_{\text{snd}} = \arg \min \left\{ \nabla f(x)^T v : \|v\| = 1 \right\}$ be the normalized steepest descent direction with respect to the norm $\|\cdot\|$.
Consider the norm $\|z\|_P = (z^T P z)^{1/2} = \| P^{1/2} z \|_2$, where $P \in S^n_{++}$, the set of symmetric positive definite matrices.
Then $\Delta x_{\text{snd}} = -(\nabla f(x)^T P^{-1} \nabla f(x))^{-1/2}P^{-1} \nabla f(x)$.
Can someone explain how $\Delta x_{\text{snd}} =-(\nabla f(x)^TP^{-1}\nabla f(x))^{-1/2}P^{-1} \nabla f(x)$?
I see that $\Delta x_{\text{snd}} = \arg \min\{\nabla f(x)^Tv : \|v\|_P = (v^TPv)^{1/2} = \|P^{1/2}v\|_2 = 1\}$
But from here I'm not seeing how the above equality is derived.
Generally, the solution to $\min_{\|x\|=1} g^T x$ is $x = -{1 \over \|g\|} g$. This follows from the Cauchy-Schwarz inequality: $g^T x \geq -\|g\|\|x\| = -\|g\|$ for all unit vectors $x$, with equality exactly when $x$ is a negative multiple of $g$.
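As a quick numerical sanity check of this step (my own illustration, not from the text), we can compare $x = -g/\|g\|$ against random unit vectors:

```python
import numpy as np

# Over the unit sphere, g^T x is minimized at x = -g / ||g||,
# with minimum value -||g|| (by Cauchy-Schwarz).
rng = np.random.default_rng(0)
g = rng.standard_normal(5)

x_star = -g / np.linalg.norm(g)
best = g @ x_star                      # equals -||g||

# Any other unit vector gives an objective value at least as large.
for _ in range(1000):
    v = rng.standard_normal(5)
    v /= np.linalg.norm(v)
    assert g @ v >= best - 1e-12
print(np.isclose(best, -np.linalg.norm(g)))
```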
The above problem can be written as $\min_{\|x\|_P = 1} g^T x = \min_{\|\sqrt{P}x\| = 1} g^T x = \min_{\|y\| = 1} g^T \sqrt{P}^{-1} y$, using the change of variables $y=\sqrt{P} x$. Since $\sqrt{P}^{-1}$ is symmetric, the objective is $(\sqrt{P}^{-1} g)^T y$, so the last problem has the form treated above.
So the solution (of the last problem) is $y=-{1 \over \|\sqrt{P}^{-1} g\|} \sqrt{P}^{-1} g$ and converting back into the 'x' representation we have $x = \sqrt{P}^{-1}y = -{1 \over \|\sqrt{P}^{-1} g\|} P^{-1} g $.
Note that $\|\sqrt{P}^{-1} g\| = \sqrt{g^T P^{-1} g}$. Substituting this and $g = \nabla f(x)$ gives exactly $\Delta x_{\text{snd}} = -(\nabla f(x)^T P^{-1} \nabla f(x))^{-1/2} P^{-1} \nabla f(x)$.
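The final formula can also be verified numerically (an illustrative sketch with a randomly generated $P$ and $g$): the direction $\Delta x = -(g^T P^{-1} g)^{-1/2} P^{-1} g$ should satisfy $\|\Delta x\|_P = 1$, attain the value $-\sqrt{g^T P^{-1} g}$, and beat any other feasible direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)            # random symmetric positive definite P
g = rng.standard_normal(n)

Pinv_g = np.linalg.solve(P, g)         # P^{-1} g
dx = -Pinv_g / np.sqrt(g @ Pinv_g)     # closed-form steepest descent direction

assert np.isclose(dx @ P @ dx, 1.0)                # ||dx||_P = 1 (feasible)
assert np.isclose(g @ dx, -np.sqrt(g @ Pinv_g))    # attains the optimal value

# No random direction on the P-unit sphere does better.
for _ in range(1000):
    v = rng.standard_normal(n)
    v /= np.sqrt(v @ P @ v)            # normalize in the P-norm
    assert g @ v >= g @ dx - 1e-10
print("ok")
```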