Scaled Proximal Operator for Proximal Newton Method

The scaled proximal operator was introduced as an extension of the (regular) proximal operator:

$prox^H_h(x) = \arg\min_y \; h(y) + \frac{1}{2}\|y-x\|^2_H$, where $\|z\|^2_H = z^\top H z$ for a positive definite matrix $H$.

(See http://papers.nips.cc/paper/4740-proximal-newton-type-methods-for-convex-optimization.pdf for a more complete description.)
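For concreteness, here is a minimal numerical sketch (my own illustration, not from the paper) that evaluates the scaled proximal operator with a generic solver, using $h(y)=\lambda\|y\|_1$ as the test function; `scaled_prox`, `H`, and `lam` are just names I made up:

```python
import numpy as np
from scipy.optimize import minimize

def scaled_prox(h, x, H):
    """Evaluate prox^H_h(x) = argmin_y h(y) + 0.5*(y-x)^T H (y-x)
    with a generic derivative-free solver (fine for tiny toy problems)."""
    obj = lambda y: h(y) + 0.5 * (y - x) @ H @ (y - x)
    return minimize(obj, x0=x.copy(), method="Nelder-Mead",
                    options={"xatol": 1e-10, "fatol": 1e-12}).x

# Toy example: h = lam*||y||_1 with a diagonal positive definite scaling H.
lam = 0.5
h = lambda y: lam * np.abs(y).sum()
H = np.diag([2.0, 1.0])
x = np.array([1.0, -0.3])
print(scaled_prox(h, x, H))
# For diagonal H this should match coordinatewise soft-thresholding with
# per-coordinate thresholds lam/H_ii: expect approximately [0.75, 0.0].
```

For non-diagonal $H$ the operator generally has no closed form, which is why a solver (here a crude one) is needed to evaluate it at all.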

Suppose we are solving $\min_x g(x) + h(x)$, whose minimizer is $x^*$. Is the following always true for any positive definite matrix $Q$?

$\qquad\|prox^Q_h\big(x - Q^{-1} \nabla g(x)\big) - x^*\|_2 \le \|x - Q^{-1} \nabla g(x) - x^*\|_2$

Intuitively it would seem so, because the update $x - Q^{-1} \nabla g(x)$ without the proximal step ignores the $h(x)$ component of the objective, while the proximal step accounts for it. Or would it only be guaranteed to hold for $Q=\nabla^2 g(x)$?
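One way to probe this before attempting a proof is a numerical counterexample search. A sketch, assuming a quadratic $g(x)=\frac{1}{2}x^\top A x - b^\top x$ and $h=\lambda\|\cdot\|_1$ so that $x^*$ can be computed by a long proximal-gradient (ISTA) run; all names and tolerances here are my own choices:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, lam = 3, 0.1
A = rng.standard_normal((n, n)); A = A @ A.T + np.eye(n)  # g(x) = 0.5 x^T A x - b^T x
b = rng.standard_normal(n)
grad_g = lambda x: A @ x - b
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Compute x* accurately with proximal gradient (ISTA), step 1/L.
t = 1.0 / np.linalg.norm(A, 2)
x_star = np.zeros(n)
for _ in range(20000):
    x_star = soft(x_star - t * grad_g(x_star), t * lam)

def scaled_prox(u, Q):
    obj = lambda y: lam * np.abs(y).sum() + 0.5 * (y - u) @ Q @ (y - u)
    return minimize(obj, x0=u.copy(), method="Nelder-Mead",
                    options={"xatol": 1e-10, "fatol": 1e-12}).x

# Test the conjectured inequality at random x with random positive definite Q.
worst = 0.0
for _ in range(200):
    M = rng.standard_normal((n, n)); Q = M @ M.T + 0.1 * np.eye(n)
    x = rng.standard_normal(n)
    u = x - np.linalg.solve(Q, grad_g(x))
    lhs = np.linalg.norm(scaled_prox(u, Q) - x_star)
    rhs = np.linalg.norm(u - x_star)
    worst = max(worst, lhs / rhs)
print("largest observed lhs/rhs:", worst)  # a value > 1 would refute the inequality
```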

Also, I know this definitely holds for the regular proximal operator, whose nonexpansiveness property is as follows:

$\qquad \|prox_h(u) - prox_h(v)\|_2 \le \|u-v\|_2$

Choosing $ u = x-Q^{-1} \nabla g(x)$ and $v = x^*$, and using $prox_h(x^*)=x^*$, we have:

$\qquad\|prox_h\big(x - Q^{-1} \nabla g(x)\big) - x^*\|_2 \le \|x - Q^{-1} \nabla g(x) - x^*\|_2$.
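The nonexpansiveness property itself is easy to check numerically; a minimal sketch with soft-thresholding, i.e. $prox_h$ for $h=\lambda\|\cdot\|_1$ (my own toy example):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.3
# prox of lam*||.||_1 is coordinatewise soft-thresholding.
prox_h = lambda z: np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# ||prox_h(u) - prox_h(v)|| <= ||u - v|| should hold for every pair u, v.
for _ in range(10000):
    u, v = rng.standard_normal(5), rng.standard_normal(5)
    assert np.linalg.norm(prox_h(u) - prox_h(v)) <= np.linalg.norm(u - v) + 1e-12
print("no violations found")
```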

Edit: As pointed out in the comments, $prox_h(x^*)\ne x^*$ in general. Instead, the fixed-point property is $prox_h\big(x^* - \nabla g(x^*)\big) = x^*$ for the regular operator and $prox^Q_h\big(x^* - Q^{-1}\nabla g(x^*)\big) = x^*$ for the scaled one. So the equality $prox_h(x^*)=x^*$ is sadly not available, and the argument above breaks down for both $prox_h$ and $prox^Q_h$.
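The corrected fixed-point property can be verified on the same kind of toy problem as above; again a sketch, with the same assumed quadratic $g$ and $h=\lambda\|\cdot\|_1$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, lam = 3, 0.1
A = rng.standard_normal((n, n)); A = A @ A.T + np.eye(n)
b = rng.standard_normal(n)
grad_g = lambda x: A @ x - b
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# x* for min_x 0.5 x^T A x - b^T x + lam*||x||_1, via a long ISTA run.
t = 1.0 / np.linalg.norm(A, 2)
x_star = np.zeros(n)
for _ in range(20000):
    x_star = soft(x_star - t * grad_g(x_star), t * lam)

# Fixed point: prox^Q_h(x* - Q^{-1} grad g(x*)) should equal x* for any PD Q.
M = rng.standard_normal((n, n)); Q = M @ M.T + 0.1 * np.eye(n)
u = x_star - np.linalg.solve(Q, grad_g(x_star))
obj = lambda y: lam * np.abs(y).sum() + 0.5 * (y - u) @ Q @ (y - u)
y = minimize(obj, x0=u.copy(), method="Nelder-Mead",
             options={"xatol": 1e-10, "fatol": 1e-12}).x
print(np.linalg.norm(y - x_star))  # ~0 up to solver accuracy
```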