Trust Region Radius

149 Views Asked by At

Suppose we are in $\mathbb{R}^n$. The trust region subproblem is to find a vector $\ell$ such we minimize the function $$f(\ell) = \ell^{T}g + \ell^T B \ell$$ subject to $$\|\ell\|_2 \leq \Delta$$ for some $\Delta > 0$. Suppose that $B$ is a positive definite matrix. What is $$\lim_{\Delta \to 0} \frac{\ell}{\|\ell\|_2}?$$

My attempt so far has been the following. Suppose that $\Delta$ is very small. Then $$f(\ell) = \ell^{T}g + \ell^T B \ell \approx \ell^{T}g.$$ The function approximation should be minimized when $\ell = -\frac{\Delta}{\|g\|_2}\langle g_1,\ldots,g_n\rangle$, so we can use this vector $\ell$ to calculate the limit.

However, I'm not sure how to make my approximation rigorous. I can't see how to use any inequalities since the limit above depends on the point where the minimum will occur, not the minimum itself.

1

There are 1 best solutions below

10
On BEST ANSWER

If $Δ$ is small enough the minimum will be on the sphere $\|ℓ\|=Δ$. The optimization problem with this equality constraint can be encoded in a Lagrange functional $$ L(ℓ,\mu)=g^Tℓ+ℓ^TBℓ+\mu(ℓ^Tℓ-Δ^2) $$ (make that more general with inequality constraints and KKT conditions).

The condition for an extremum point reads then $$ 0=\frac{\partial L}{\partial ℓ}=g+2Bℓ+2\muℓ $$ or $$ ℓ=-\frac12(B+\mu I)^{-1}g. $$ The equation for $\mu$ is then $$ g^T(B+\mu I)^{-2}g=4Δ^2,\\ \mu^{-2}\left(\|g\|^2-2\mu^{-1}g^TBg+3\mu^{-2}g^TB^2g\mp\ldots\right)=4Δ^2. $$ As $Δ$ tends to zero, $\mu$ as the other free variable here has to become very large so that $\mu I$ dominates $B$. To normalize the equation, set $\mu^{-1}=\frac{2Δ}{\|g\|}u$, insert and simplify to get $$ u^2\left(1-4Δ\frac{g^TBg}{\|g\|}u+12Δ^2\frac{g^TB^2g}{\|g\|^2}u^2+\dots\right)=1 $$

In first order, this has the solution $u=1+O(Δ)$, or with the next term $u=1+2Δ\frac{g^TBg}{\|g\|}+O(Δ^2)$, leading to $$ \mu=\frac{\|g\|}{2Δ}(1+ O(Δ)), $$ as you also found in your solution (the other sign would approximately maximize the objective function on the sphere). Then, moving the small radius $Δ$ into numerators, $$ ℓ=-\fracΔ{\|g\|}\left(I+\frac{2Δ}{\|g\|}B\right)^{-1}g. \implies \frac{ℓ}{\|ℓ\|}=-\frac{(I+\frac{2Δ}{\|g\|}B)^{-1}g}{\|(I+\frac{2Δ}{\|g\|}B)^{-1}g\|} $$ which in the limit $Δ\to 0$ indeed converges to $-\frac{g}{\|g\|}$.