eigenvalues of bordered Hessian


$$ \newcommand{\mb}[1]{\mathbf{#1}} \newcommand{\bs}[1]{\boldsymbol{#1}} \newcommand{\mleft}{\left} \newcommand{\mright}{\right} $$

I'm interested in a general statement about the eigenvalues of the bordered Hessian. The bordered Hessian arises in optimization with equality constraints in a Lagrange-multiplier framework.

We optimize a function $f(\mb{x})$ over an $n$-dimensional vector $\mb{x}$. There are $m$ equality constraints $g_i(\mb{x}) = 0$, summarized in a vector $\mb{g}(\mb{x})$. The Lagrangian (with Lagrange multipliers $\bs{\lambda}$) is given by

$$ \begin{align} L\mleft( \mb{x},\bs{\lambda} \mright) &= f\mleft( \mb{x} \mright) + \bs{\lambda}^T \mb{g}\mleft( \mb{x} \mright) \end{align}, $$

the first-order derivatives are

$$ \begin{align} \frac{\partial}{\partial \mb{x}}L\mleft( \mb{x},\bs{\lambda} \mright) &= \frac{\partial}{\partial \mb{x}}f\mleft( \mb{x} \mright) + \bs{\lambda}^T \mleft( \frac{\partial}{\partial \mb{x}}\mb{g}\mleft( \mb{x} \mright) \mright)\\ %==================== \frac{\partial}{\partial \bs{\lambda}}L\mleft( \mb{x},\bs{\lambda} \mright) &= \mleft( \mb{g}\mleft( \mb{x} \mright) \mright)^T \end{align} $$
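To make the first-order conditions concrete: for a quadratic $f$ and linear constraints, the stationary conditions above form a single symmetric linear system in the joint vector $(\mb{x}, \bs{\lambda})$. A minimal numerical sketch (the matrices $Q$, $c$, $A$, $b$ are made-up illustrative values, not from any particular problem):

```python
import numpy as np

# Made-up quadratic example: f(x) = 0.5*x^T Q x + c^T x, with one
# linear constraint g(x) = A x - b = 0  (n = 3, m = 1).
Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])
c = np.array([-1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
n, m = Q.shape[0], A.shape[0]

# First-order conditions:  Q x + c + A^T lam = 0  and  A x - b = 0,
# i.e. one symmetric linear system in the joint vector (x, lam).
K = np.block([[Q, A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x, lam = sol[:n], sol[n:]

print("x* =", x, " lambda* =", lam)
print("dL/dx residual:", Q @ x + c + A.T @ lam)
print("constraint residual:", A @ x - b)
```

Note that the coefficient matrix $K$ of this system is exactly the bordered Hessian of the quadratic problem.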

and the bordered Hessian contains the second-order derivatives

$$ \begin{align} \bar{\mb{H}} &= \begin{pmatrix}\frac{\partial}{\partial \mb{x}}\mleft( \frac{\partial}{\partial \mb{x}}L\mleft( \mb{x},\bs{\lambda} \mright) \mright)^T & \frac{\partial}{\partial \bs{\lambda}}\mleft( \frac{\partial}{\partial \mb{x}}L\mleft( \mb{x},\bs{\lambda} \mright) \mright)^T\\\frac{\partial}{\partial \mb{x}}\mleft( \frac{\partial}{\partial \bs{\lambda}}L\mleft( \mb{x},\bs{\lambda} \mright) \mright)^T & \mb{0}_{m,m}\end{pmatrix} \end{align} $$

with

$$ \begin{align} \frac{\partial}{\partial \mb{x}}\mleft( \frac{\partial}{\partial \mb{x}}L\mleft( \mb{x},\bs{\lambda} \mright) \mright)^T &= \frac{\partial}{\partial \mb{x}}\mleft( \frac{\partial}{\partial \mb{x}}f\mleft( \mb{x} \mright) \mright)^T + \mleft( \sum\limits_{k = 1}^{m}\lambda_{k} \mleft[ \frac{\partial}{\partial x_{j}}\mleft\{ \frac{\partial}{\partial x_{i}}g_{k}\mleft( \mb{x} \mright) \mright\} \mright] \mright)^{n\times n}_{i,j} \end{align} $$
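To illustrate the multiplier-weighted second-derivative term with a nonlinear constraint, here is a small sketch (the particular $f$, $g$, and stationary point are made-up for illustration, worked out by hand): with $f(\mb{x}) = x_1 + x_2$ and $g(\mb{x}) = x_1^2 + x_2^2 - 1$, the Hessian of $f$ vanishes, so the $\lambda_k$-term is all that survives in the upper-left block.

```python
import numpy as np

# Made-up nonlinear example: f(x) = x1 + x2, g(x) = x1^2 + x2^2 - 1.
# A stationary point, worked out by hand:
lam = 1.0 / np.sqrt(2.0)
x = np.array([-1.0, -1.0]) / np.sqrt(2.0)

# First-order check: grad f + lam * grad g = (1,1) + lam * 2x = 0.
print("dL/dx:", np.array([1.0, 1.0]) + lam * 2.0 * x)

# Second-order term: Hess f = 0, so the x-x block of the bordered
# Hessian is entirely the multiplier-weighted term lam * Hess g.
Hxx = lam * 2.0 * np.eye(2)
print("d/dx (dL/dx)^T =\n", Hxx)
```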

and

$$ \begin{align} \frac{\partial}{\partial \bs{\lambda}}\mleft( \frac{\partial}{\partial \mb{x}}L\mleft( \mb{x},\bs{\lambda} \mright) \mright)^T &= \mleft( \frac{\partial}{\partial \mb{x}}\mb{g}\mleft( \mb{x} \mright) \mright)^T \end{align} $$

(note that any Hessian is symmetric).
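As a sanity check, the bordered Hessian of a toy problem can be assembled and its spectrum inspected numerically. A made-up example with $n=2$, $m=1$: $f(\mb{x}) = x_1^2 + x_2^2$ with the single linear constraint $x_1 + x_2 - 1 = 0$ (the constraint is linear, so the $\lambda$-weighted term vanishes):

```python
import numpy as np

# Made-up example with n = 2, m = 1: f(x) = x1^2 + x2^2 and the
# linear constraint g(x) = x1 + x2 - 1 = 0.  Here
# d/dx (dL/dx)^T = 2*I, and the border is dg/dx = (1, 1).
Hxx = 2.0 * np.eye(2)
Jg = np.array([[1.0, 1.0]])

Hbar = np.block([[Hxx, Jg.T],
                 [Jg, np.zeros((1, 1))]])
print("symmetric:", np.allclose(Hbar, Hbar.T))

ev = np.linalg.eigvalsh(Hbar)   # ascending order
print("eigenvalues:", ev)       # 1 - sqrt(3), 2, 1 + sqrt(3)
```

Two positive eigenvalues and one negative one, i.e. in this example the single constraint does contribute exactly one negative mode.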

I'm interested in the question whether $L(\mb{x},\bs{\lambda})$ has a saddle point at every stationary point of the first-order derivatives, when $L$ is viewed as a function on the joint space of $\mb{x}$ and $\bs{\lambda}$.

G.R. Walsh (Methods of Optimization, Wiley, 1975) gives a theorem (1.1, p. 20) which states that it is a saddle point. I'm afraid I don't understand the proof; it seems to be based on the observation that the quadratic form of the bordered Hessian is zero for vectors whose components corresponding to $\mb{x}$ are zero.

D. Kalman (Leveling with Lagrange: An Alternate View of Constrained Optimization, Mathematics Magazine, 82:3, 186-196, 2009) provides a proof of the saddle-point property for the case of $n=2$ and $m=1$, but I'm not sure how this generalizes to arbitrary $n$ and $m$.

J. Baker (Geometry Optimization in Cartesian Coordinates: Constrained Optimization, J. Comp. Chem., 13:2, 240-253, 1992) writes "what typically happens ... is that the inclusion of the constraint introduces an additional mode ... whose largest component corresponds to the ... Lagrange multiplier ... and which has ... a negative Hessian eigenvalue ..." and "... each constraint results in an additional mode with negative curvature". There are also statements that the original modes are modified to some degree by the Lagrange multipliers (I assume that they retain their positive eigenvalues, but this is not clear). Unfortunately, no citation or analysis is provided.
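Baker's description can at least be probed numerically. The sketch below assumes (without justification) that the upper-left block is positive definite and that the constraint Jacobian has full row rank; both are generated randomly. Under these assumptions the bordered matrix comes out with exactly $m$ negative and $n$ positive eigenvalues, matching the "one negative mode per constraint" picture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2

# Assumptions of the experiment (not proven here): the x-x block is
# positive definite and the constraint Jacobian has full row rank.
M = rng.standard_normal((n, n))
Hxx = M @ M.T + n * np.eye(n)        # positive definite
Jg = rng.standard_normal((m, n))     # full row rank almost surely

Hbar = np.block([[Hxx, Jg.T],
                 [Jg, np.zeros((m, m))]])
ev = np.linalg.eigvalsh(Hbar)
print("negative modes:", (ev < 0).sum())   # m = 2
print("positive modes:", (ev > 0).sum())   # n = 5
```

This is of course only an experiment, not a proof, and it says nothing about the case where the upper-left block is indefinite.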

So there seems to be some knowledge on the eigenvalues of the bordered Hessian, but looking at the equation of the bordered Hessian above I have no idea how they can be derived. I would be really grateful for some ideas or pointers to the literature. Thanks a lot for your help!


Edit:

I wonder if this answer may lead to a solution: https://math.stackexchange.com/a/3944430

It shows that a symmetric block matrix is indefinite if its upper-left block is positive definite, its lower-right block is a zero matrix, and its lower-left block has fewer rows than columns and full row rank (with its transpose in the upper-right block).

However, it is not clear to me that the upper-left block would be positive definite in the bordered Hessian. The rank condition may (usually?) be satisfied in the bordered Hessian (see Izmailov AF, Solodov MV: Newton-Type Methods for Optimization and Variational Problems, Springer, 2014, Section 4.1.1, Newton-Lagrange Method).


Edit:

Even better may be this approach: https://math.stackexchange.com/a/4805925. We can assume that the bordered Hessian is usually non-singular. We could then pick direction vectors with zero entries in the components corresponding to the state vector, and arbitrary non-zero entries in the components corresponding to the Lagrange multipliers. The zero block in the bordered Hessian implies that the quadratic form vanishes on every such vector, which rules out both positive and negative definiteness; since a non-singular matrix also has no zero eigenvalues, the bordered Hessian must then have eigenvalues of both signs, i.e. it is indefinite.
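A small numerical sketch of this argument (the matrix entries are made-up, and the upper-left block is deliberately not positive definite, to show that the argument does not need that assumption):

```python
import numpy as np

# Made-up bordered matrix; the upper-left block is deliberately NOT
# positive definite -- only the zero lower-right block matters here.
Hxx = np.array([[2.0, 0.0],
                [0.0, -1.0]])
Jg = np.array([[1.0, 2.0]])
Hbar = np.block([[Hxx, Jg.T],
                 [Jg, np.zeros((1, 1))]])

# Direction with zero state components and non-zero multiplier
# component: the quadratic form hits only the zero block.
v = np.array([0.0, 0.0, 1.0])
print("v^T Hbar v =", v @ Hbar @ v)                   # exactly 0
print("non-singular:", abs(np.linalg.det(Hbar)) > 0)

ev = np.linalg.eigvalsh(Hbar)
print("eigenvalues of both signs:", ev.min() < 0 < ev.max())
```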