According to many resources, trust-region (TR) algorithms often suffer from poor scaling. The simplest remedy is to introduce a scaling matrix $D$ in the following way:
\begin{align} \min_d \ & f + g'd + \frac{1}{2}d'Bd \\ \text{subject to} \ & \left\lVert Dd \right\rVert \le \Delta, \end{align}
where $D$ is a diagonal matrix with positive diagonal elements.
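(If it helps intuition: substituting $\hat d = Dd$ turns the scaled subproblem into an ordinary trust-region subproblem in $\hat d$,
\begin{align} \min_{\hat d} \ & f + (D^{-1}g)'\hat d + \frac{1}{2}\hat d'\left(D^{-1}BD^{-1}\right)\hat d \\ \text{subject to} \ & \left\lVert \hat d \right\rVert \le \Delta, \end{align}
so $D$ effectively preconditions the model.)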
My question is: how does one construct such a matrix? Which properties should it fulfill?
J. J. Moré's paper Recent Developments in Algorithms and Software for Trust Region Methods provides guidance on choosing the diagonal scaling matrix.
He shows that the convergence theorems for trust region optimization still hold provided the diagonal scaling matrices $D_k$ satisfy these properties:
(4.6) $\left\lVert D_k^{-1} \right\rVert \le \sigma_2$ for some $\sigma_2\gt0$
(4.12) $\left\lVert x_k-x_m \right\rVert \le \delta_1$, $k=m,\ldots, l$, $\implies \left\lVert D_l-D_m \right\rVert \le \delta_2$ for $\delta_1,\delta_2\gt0$
If $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is the objective function, he then proposes choosing
(4.13) $D_k=\operatorname{diag}(d_{k,i})$, $d_{k,i}=\max\left(d_{k-1,i},\ \left| \partial_{ii}f(x_k) \right|^{1/2}\right)$
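A minimal sketch of update (4.13) in Python. The helper name `update_scaling` is my own, and I assume the caller can supply the diagonal second derivatives $\partial_{ii}f(x_k)$ as an array:

```python
import numpy as np

def update_scaling(d_prev, hess_diag):
    """More's rule (4.13): d_{k,i} = max(d_{k-1,i}, |f_ii(x_k)|^{1/2}).

    d_prev:    diagonal of D_{k-1} (positive entries)
    hess_diag: diagonal second derivatives of f at x_k
    """
    return np.maximum(d_prev, np.sqrt(np.abs(hess_diag)))

# Example: entries never decrease across iterations.
d0 = np.ones(3)                    # initial scaling D_0 = I
h1 = np.array([4.0, 0.25, -9.0])   # Hessian diagonal at x_1
d1 = update_scaling(d0, h1)        # -> [2.0, 1.0, 3.0]
```

Because each entry is a running maximum, the diagonal of $D_k$ stays bounded away from zero once initialized with positive values.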
I gather the reasoning is that $\left| \partial_{ii}f(x_k) \right|^{1/2}$ scales each coordinate by the square root of its local curvature, so the trust region is stretched along directions in which $f$ is flat, while taking the maximum over past iterates makes the diagonal entries nondecreasing, which keeps $\left\lVert D_k^{-1} \right\rVert$ bounded as required by (4.6).
Moré's other paper The Levenberg-Marquardt algorithm: Implementation and theory has numerical results comparing different scaling methods. The methods there are slightly different and specialized to least-squares problems (the diagonal entries come from column norms of the Jacobian rather than from second derivatives), but they would correspond to these strategies:

(Initial) keep the scaling computed at the starting point fixed: $d_{k,i}=d_{0,i}$;
(Continuous) recompute the scaling at every iterate: $d_{k,i}=\left| \partial_{ii}f(x_k) \right|^{1/2}$;
(Adaptive) recompute but never decrease, i.e. the update (4.13).
For the problems he tested, his results show that (Adaptive) performs best and that (Continuous) is worse than either (Initial) or (Adaptive).
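The three strategies differ only in how the diagonal entries are updated between iterations; a sketch, where `curv` is assumed to hold the current values $\left| \partial_{ii}f(x_k) \right|^{1/2}$ (function names are mine, not Moré's):

```python
import numpy as np

def scale_initial(d_prev, curv, d0):
    # (Initial): keep the scaling computed at the starting point.
    return d0

def scale_continuous(d_prev, curv, d0):
    # (Continuous): rescale from scratch at every iterate.
    return curv

def scale_adaptive(d_prev, curv, d0):
    # (Adaptive): rescale, but never let an entry decrease (rule 4.13).
    return np.maximum(d_prev, curv)

# Example: curvature grows in one coordinate, then shrinks there.
d0 = np.array([1.0, 1.0])
curvs = [np.array([2.0, 0.5]), np.array([0.5, 3.0])]
d_cont, d_adap = d0, d0
for c in curvs:
    d_cont = scale_continuous(d_cont, c, d0)
    d_adap = scale_adaptive(d_adap, c, d0)
# d_cont == [0.5, 3.0]; d_adap == [2.0, 3.0]
```

The adaptive rule remembers the largest curvature seen so far, so the trust region never suddenly re-expands in a direction that was previously ill-scaled, which is one plausible explanation for its better behavior in Moré's tests.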