Steepest descent for the quadratic norm?

1.2k Views Asked by At

From Convex Optimization by Boyd & Vandenberghe:

Let $\Delta x_{\text{snd}} = \arg \min \left\{ \nabla f(x)^T v : \|v\| = 1 \right\}$ be the normalized steepest descent direction with respect to the norm $\|*\|$.

Consider the norm $ \|z\|_P = (z^T P z)^{1/2} = \| P^{1/2} z \|_2$ where $P \in S^n_{++}$ the set of positive semidefinite matrices.

Then $\Delta x_{\text{snd}} = -(\nabla f(x)^T P^{-1} \nabla f(x))^{-1/2}P^{-1} \nabla f(x)$.


Can someone explain how $\Delta x_{\text{snd}} =-(\nabla f(x)^TP^{-1}\nabla f(x))^{-1/2}P^{-1} \nabla f(x)$?

I see that $\Delta x_{\text{snd}} = \arg \min\{\nabla f(x)^Tv : \|v\|_P = (v^TPv)^{1/2} = \|P^{1/2}v\|_2 = 1\}$

But from here I'm not seeing how the above equality is derived.

3

There are 3 best solutions below

0
On BEST ANSWER

Generally, the solution to $\min_{\|x\|=1} g^T x$ is $x = -{1 \over \|g\|} g$. This follows from Cauchy Schwartz.

The above problem can be written as $\min_{\|x\|_P = 1} g^T x = \min_{\|\sqrt{P}x\| = 1} g^T x = \min_{\|y\| = 1} g^T \sqrt{P}^{-1} y $ (where the transformation $y=\sqrt{P} x$ was used).

So the solution (of the last problem) is $y=-{1 \over \|\sqrt{P}^{-1} g\|} \sqrt{P}^{-1} g$ and converting back into the 'x' representation we have $x = \sqrt{P}^{-1}y = -{1 \over \|\sqrt{P}^{-1} g\|} P^{-1} g $.

Note that $\|\sqrt{P}^{-1} g\| = \sqrt{g^TP^{-1} g}$.

0
On

My attempt: Perhaps you can finish it
$\nabla_z ||P^{1/2}z||_2 = \frac{P^{T/2}P^{1/2} z}{||P^{1/2}z||_2}$. Now you want optimize $$ \min \frac{z^T P^{T/2}P^{1/2}}{||P^{1/2}z||_2} v \quad \text{s.t.} ||v||^2_2 = 1, $$ which is equivalent to $$ \min \frac{z^T P}{||P^{1/2}z||_2} v \quad \text{s.t.} ||v||^2_2 = 1 $$ To minimize the expression $v$ should be in the null-space $ P $.

0
On

$$ \Delta x_{\textrm{nsd}} = \textrm{argmin} \{ \nabla f(x)^Tv \mid \space\space\space \vert\vert v \vert\vert_{P} \le 1 \} $$

$$ = \textrm{argmin} \{ \nabla f(x)^Tv \mid \space\space\space\vert\vert P^{1/2}v \vert\vert_{2} \le 1 \} $$

$$ = \textrm{argmin} \{ \nabla f(x)^T P^{-1/2}P^{1/2}v \mid \space\space\space \vert\vert P^{1/2}v \vert\vert_{2} \le 1 \} $$

$$ = \textrm{argmin} \{ (\nabla f(x)^T P^{-1/2})y \mid \space\space\space \vert\vert y \vert\vert_{2} \le 1 \} $$

where we set $y = P^{1/2}v$.

Now we're back in the familiar Euclidean case and so immediately conclude that the $y$ which minimizes this inner product is the unit vector pointing in the opposite direction to $\nabla f(x)^T P^{-1/2}$:

$$y^{*} = \frac{-P^{-1/2}\nabla f(x)}{\vert\vert P^{-1/2}\nabla f(x) \vert\vert_{2}} $$

$$ \implies v^{*} = \frac{-P^{-1/2}P^{-1/2}\nabla f(x)}{\vert\vert P^{-1/2}\nabla f(x) \vert\vert_{2}} $$

$$ = \frac{-P^{-1}\nabla f(x)}{\vert\vert P^{-1/2}\nabla f(x) \vert\vert_{2}} $$

$$ = \frac{-P^{-1}\nabla f(x)^{T}}{(\nabla f(x)^{T} P^{-1} \nabla f(x))^{1/2}} $$

And we're done. To get $v^{*}$ we used $y = P^{1/2}v$ so $v = P^{-1/2}y$.

That $(\nabla f(x)^{T} P^{-1} \nabla f(x))^{1/2} = \vert\vert P^{-1/2}\nabla f(x) \vert\vert_{2}$ is given in the textbook referenced in the question.