how does one perform quadratic norm steepest descent

294 Views Asked by At

When performing steepest descent to minimize a function $f(x)$, with quadratic norm, $\|Z\|_p = (Z^TPZ)^{1/2} = \|P^{1/2}z\|_2 : P \in S_{++}^n$, where $ S_{++}^n$ is the set of positive semi definite matrices, is there any good procedure for choosing the matrix $P$?

I have read that we should choose $P$ such that the sublevel sets of $P^{-1} f(x)$ are well conditioned. What exactly does this mean, in terms of intuition and procedure for determining $P$? I know that sublevel sets are $S_t = \{x : f(x)\leq t\}$.

1

There are 1 best solutions below

4
On

I think the last statement should be corrected to ``we should choose $P$ such that the sublevel sets of $f\left(P^{-1/2}x\right)$ are well conditioned''. Here's why!

Let $f:R^n\to R$ and we want to minimize $f(x)$ using the steepest descent method with respect to the quadratic norm $\|\cdot\|_P$ for some $P\in S_{++}^n$ where $\|x\|_P = \sqrt{x^TPx}$ for any $x\in R^n$.

Then the steepest descent direction at $x$ is defined by \begin{equation} v := \mathop{\mathrm{arginf}}_{y: \|y\|_P=1} y^T \nabla f(x). \end{equation}

If we let $z = P^{1/2} y$, then $\|y\|_P^2 = y^T P y = z^Tz = \|z\|_2^2$, thus \begin{equation} \inf_{y: \|y\|_P=1} y^T \nabla f(x) = \inf_{z: \|z\|_2=1} (P^{-1/2} z)^T \nabla f(x) = \inf_{z: \|z\|_2=1} z^T P^{-1/2} \nabla f(x). \end{equation} Therefore $z=-P^{-1/2}\nabla f(x)$ minimizes the quantity. Thus, \begin{equation} v := \mathop{\mathrm{arginf}}_{y: \|y\|_P=1} y^T \nabla f(x) = P^{-1/2} \left(-P^{-1/2} \nabla f(x)\right) = - P^{-1} \nabla f(x). \end{equation}

Now if we let $\tilde{f}:R^n \to R$, such that $\tilde{f}(z) = f\left(P^{-1/2}z\right)$, then the normal gradient descent direction is at $z$ \begin{equation} w := - \nabla \tilde{f} (z) = - P^{-1/2} \nabla f\left( P^{-1/2} z\right), \end{equation} thus the corresponding search direction in the original space at $x=P^{-1/2}z$ is \begin{equation} \tilde{w} := P^{-1/2} w = - P^{-1} \nabla f(x). \end{equation}

Therefore, the steepest descent direction with respect to $P$-norm is equivalent to that with respect to $2$-norm in the transformed space by $P^{1/2}$.

Now in order for the gradient descent method (which is the steepest descent method with respect to $2$-norm) to work well, the condition number of the Hessian, $\nabla^2 \tilde{f}(z)$ should be close to $1$, i.e., $\nabla^2 \tilde{f}(z)$ should be well-conditioned. The $\nabla^2 \tilde{f}(z)$ is well-conditioned is equivalent to the sublevel set of $\tilde{f}(z) = f\left(P^{-1/2}x\right)$ being well-conditioned!