Why is this symmetric rank one update "the symmetric solution that is closest?"


I am trying to understand symmetric rank-one (SR1) updates, and there is this line on the Wikipedia page that says...

A twice continuously differentiable function $x\mapsto f(x)$ has a gradient ($\nabla f$) and Hessian matrix $B$: The function $f$ has an expansion as a Taylor series at $x_{0}$, which can be truncated

$$ f(x_{0}+\Delta x)\approx f(x_{0})+\nabla f(x_{0})^{T}\Delta x+{\frac {1}{2}}\Delta x^{T}{B}\Delta x $$

its gradient has a Taylor-series approximation also

$$\nabla f(x_{0}+\Delta x)\approx \nabla f(x_{0})+B\Delta x $$

which is used to update $B$. The resulting secant equation, $B_{k+1}\Delta x_{k}=y_{k}$ with $y_{k}=\nabla f(x_{k}+\Delta x_{k})-\nabla f(x_{k})$, need not have a unique solution $B$. The SR1 formula computes (via a rank-1 update) the symmetric solution that is closest to the current approximate value $B_{k}$:

$$ B_{k+1}=B_{k}+{\frac {(y_{k}-B_{k}\Delta x_{k})(y_{k}-B_{k}\Delta x_{k})^{T}}{(y_{k}-B_{k}\Delta x_{k})^{T}\Delta x_{k}}} $$
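As a quick sanity check (not part of the Wikipedia excerpt; the random test Hessian and all variable names below are made up for illustration), the SR1 update can be verified numerically to satisfy the secant equation $B_{k+1}\Delta x_{k}=y_{k}$ and to preserve symmetry:

```python
# Sketch: verify the SR1 update satisfies B_{k+1} @ dx = y and stays symmetric.
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = np.eye(n)                       # current symmetric approximation B_k
A = rng.standard_normal((n, n))
H = A + A.T                         # symmetric "true" Hessian (hypothetical)
dx = rng.standard_normal(n)         # step Delta x_k
y = H @ dx                          # y_k = gradient difference; exact for a quadratic

e = y - B @ dx                      # residual y_k - B_k dx_k
B_next = B + np.outer(e, e) / (e @ dx)   # SR1 update

print(np.allclose(B_next @ dx, y))       # secant equation holds
print(np.allclose(B_next, B_next.T))     # symmetry preserved
```

The check works because $B_{k+1}\Delta x_{k}=B_{k}\Delta x_{k}+e\,(e^{\top}\Delta x_{k})/(e^{\top}\Delta x_{k})=B_{k}\Delta x_{k}+e=y_{k}$ exactly, independent of the data.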

Question

Why is the last formula above "the symmetric solution that is closest to the current approximate value $B_{k}$"?

I can trivially see that the solution is symmetric, but how do we justify constructing it like this, and why does $(y_{k}-B_{k}\Delta x_{k})^{T}\Delta x_{k}$ suddenly appear in the denominator?

2 Answers

Accepted answer

The approximate matrix updates as follows: $$ B_{k+1}=B_{k}+\Delta B_{k}. $$

Note that here $B_{k}$ approximates the inverse Hessian: $B_{k}\approx \nabla^{2}f(x_{k})^{-1}$ and $B_{k+1}\approx \nabla^{2}f(x_{k+1})^{-1}$. Then we have $$ \Delta B_{k}\approx \nabla^{2}f(x_{k+1})^{-1}-\nabla^{2}f(x_{k})^{-1}. $$

Since $\nabla^{2}f(x_{k+1})^{-1}$ and $\nabla^{2}f(x_{k})^{-1}$ are symmetric matrices, $\Delta B_{k}$ should also be symmetric. Now assume the correction has rank one; every symmetric rank-one matrix can be written as $\Delta B_{k}=\beta u u^{\top}$ with $\beta\in\mathbb{R}$, $u\in\mathbb{R}^{n}$, so the update becomes $$ B_{k+1}=B_{k}+\beta u u^{\top}. $$ Multiplying by $y_{k}$ gives $$ B_{k+1}y_{k}=B_{k}y_{k}+\beta u u^{\top}y_{k} \Rightarrow B_{k+1}y_{k}-B_{k}y_{k}=\beta(u^{\top}y_{k})u. $$

Thus $u=r\,(B_{k+1}y_{k}-B_{k}y_{k})$, where $r=1/(\beta(u^{\top}y_{k}))\in\mathbb{R}$. Substituting this expression for $u$ back into $B_{k+1}y_{k}-B_{k}y_{k}=\beta u u^{\top}y_{k}$ yields $$B_{k+1}y_{k}-B_{k}y_{k}=\beta r^{2}(B_{k+1}y_{k}-B_{k}y_{k})(B_{k+1}y_{k}-B_{k}y_{k})^{\top}y_{k}.$$

Then we have $\beta r^{2}(B_{k+1}y_{k}-B_{k}y_{k})^{\top}y_{k}=1$. Substituting this into the previous formula, we get $$B_{k+1}=B_{k}+\frac{(B_{k+1}y_{k}-B_{k}y_{k})(B_{k+1}y_{k}-B_{k}y_{k})^{\top}}{(B_{k+1}y_{k}-B_{k}y_{k})^{\top}y_{k}}.$$

To recover the update scheme in your post, run the same derivation in the direct (non-inverse) form: there the secant equation reads $B_{k+1}\Delta x_{k}=y_{k}$, so $B_{k+1}\Delta x_{k}-B_{k}\Delta x_{k}=y_{k}-B_{k}\Delta x_{k}$, and $\Delta x_{k}$ plays the role that $y_{k}$ played above.
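Since this answer works with the inverse Hessian, the secant equation it implicitly uses is $B_{k+1}y_{k}=\Delta x_{k}$. A minimal numerical sketch of that inverse form (random data; the names here are chosen for illustration, not taken from the answer):

```python
# Sketch: the same rank-one construction applied to an inverse-Hessian
# approximation H_k, which must satisfy H_{k+1} @ y = dx (inverse secant eq.).
import numpy as np

rng = np.random.default_rng(1)
n = 4
Hk = np.eye(n)                      # current inverse-Hessian approximation
dx = rng.standard_normal(n)         # step Delta x_k
y = rng.standard_normal(n)          # gradient difference y_k

e = dx - Hk @ y                     # residual in the inverse secant equation
H_next = Hk + np.outer(e, e) / (e @ y)   # rank-one symmetric correction

print(np.allclose(H_next @ y, dx))       # H_{k+1} y_k = Delta x_k holds
```

This mirrors the duality the answer exploits: swapping the roles of $y_{k}$ and $\Delta x_{k}$ turns the inverse-form update into the direct SR1 formula from the question.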


In this approach, we require the update of the Hessian matrix to be of the form $$ \mathbf{B}_{k+1}=\mathbf{B}_{k} + \mathbf{u}\mathbf{u}^T \tag{1} $$ under the constraint $$ \mathbf{B}_{k+1} \mathbf{s}_{k}=\mathbf{y}_{k} \tag{2} $$

Here $\mathbf{y}_{k}=\nabla f(\mathbf{x}_{k}+\mathbf{s}_{k})- \nabla f(\mathbf{x}_{k}) $.

Plugging $(1)$ into $(2)$ yields \begin{eqnarray*} \mathbf{B}_{k} \mathbf{s}_{k}+ \mathbf{u}\mathbf{u}^T\mathbf{s}_{k} &=& \mathbf{y}_{k} \\ (\mathbf{u}^T\mathbf{s}_{k}) \mathbf{u} &=& \mathbf{y}_{k}-\mathbf{B}_{k} \mathbf{s}_{k} = \mathbf{e}_k \tag{3} \end{eqnarray*} which basically says that we can write $\mathbf{u}=\alpha\,\mathbf{e}_k$. From the constraint $(3)$, it follows that $ \alpha^2(\mathbf{e}_k^T\mathbf{s}_{k}) = 1. $

The Hessian update is finally $$ \mathbf{B}_{k+1} =\mathbf{B}_{k} + \alpha^2 \mathbf{e}_k\mathbf{e}_k^T =\mathbf{B}_{k} + \frac{1}{\mathbf{e}_k^T\mathbf{s}_{k}} \mathbf{e}_k\mathbf{e}_k^T. $$ Note that the closeness between $\mathbf{B}_{k+1}$ and $\mathbf{B}_{k}$ is not actually used anywhere in this derivation of the SR1 update.
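A short sketch of this construction (the SPD test Hessian below is made up for the demo). One caveat worth noting: the $\mathbf{u}\mathbf{u}^T$ ansatz implicitly assumes $\mathbf{e}_k^T\mathbf{s}_{k}>0$ so that $\alpha$ is real; the general SR1 denominator $\mathbf{e}_k^T\mathbf{s}_{k}$ handles either sign. The data here is chosen so the product is guaranteed positive:

```python
# Sketch: build u = alpha * e_k with alpha^2 = 1/(e^T s) and check the
# secant equation B_{k+1} @ s = y.
import numpy as np

rng = np.random.default_rng(2)
n = 4
Bk = np.eye(n)                        # current approximation B_k
A = rng.standard_normal((n, n))
Hess = A @ A.T + n * np.eye(n)        # SPD "true" Hessian (hypothetical)
s = rng.standard_normal(n)            # step s_k
y = Hess @ s                          # y_k; exact for a quadratic model

e = y - Bk @ s                        # e_k = y_k - B_k s_k
ets = e @ s                           # e^T s = s^T (Hess - I) s > 0 here,
u = e / np.sqrt(ets)                  # since Hess - I is positive definite
B_next = Bk + np.outer(u, u)          # B_{k+1} = B_k + u u^T

print(np.allclose(B_next @ s, y))     # secant equation B_{k+1} s_k = y_k
```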