I'm having trouble understanding the second-order conditions for determining whether an extreme value of a function subject to constraints is a minimum or a maximum. I'll put forth what I do understand; please fill in the gaps.
Suppose I have a function $f:\mathbb{R}^n \to \mathbb{R}$ with constraints $g_\ell : \mathbb{R}^n \to \mathbb{R}$ such that $g_\ell = C_\ell$ for $\ell \in [m]$. Consider the Lagrange function: $$\Lambda(\textbf{x},\lambda) = f(\textbf{x}) - \sum_{\ell=1}^m\lambda_\ell[g_\ell(\textbf{x}) - C_\ell]$$ Let $(\textbf{x}^*,\lambda^*)$ be a solution to $\nabla \Lambda(\textbf{x},\lambda) = \textbf{0}$. Then $\textbf{x}^*$ is a candidate extremum (a constrained critical point) of $f$ subject to the constraints $g_\ell = C_\ell$, $\ell \in [m]$. Next, if we consider
$$\textbf{H}_\Lambda(\textbf{x},\lambda) = \begin{bmatrix} \dfrac{\partial^2 \Lambda}{\partial \lambda^2} & \dfrac{\partial^2 \Lambda}{\partial \lambda \partial \mathbf x} \\ \left(\dfrac{\partial^2 \Lambda}{\partial \lambda \partial \mathbf x}\right)^{\mathsf{T}} & \dfrac{\partial^2 \Lambda}{\partial \mathbf x^2} \end{bmatrix} = \begin{bmatrix} 0 &\ldots & 0 & -\dfrac{\partial g_1}{\partial x_1} & \cdots & -\dfrac{\partial g_1}{\partial x_n} \\[1.5ex] \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\[1.5ex] 0 &\ldots & 0 & -\dfrac{\partial g_m}{\partial x_1} & \cdots & -\dfrac{\partial g_m}{\partial x_n} \\[1.5ex] -\dfrac{\partial g_1}{\partial x_1} & \cdots & -\dfrac{\partial g_m}{\partial x_1} & \dfrac{\partial^2 \Lambda}{\partial x_1\, \partial x_1} & \cdots & \dfrac{\partial^2 \Lambda}{\partial x_1\,\partial x_n} \\[1.5ex] \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\[1.5ex] -\dfrac{\partial g_1}{\partial x_n} & \cdots & -\dfrac{\partial g_m}{\partial x_n} & \dfrac{\partial^2 \Lambda}{\partial x_n\, \partial x_1} & \cdots & \dfrac{\partial^2 \Lambda}{\partial x_n\,\partial x_n} \\[1.5ex] \end{bmatrix}$$ we can rewrite this in block form as $$\textbf{H}_\Lambda(\textbf{x},\lambda) = \begin{bmatrix} 0 & -\dfrac{\partial g}{\partial \mathbf x} \\ -\left(\dfrac{\partial g}{\partial \mathbf x}\right)^{\mathsf{T}} & \dfrac{\partial^2 \Lambda}{\partial \mathbf x^2} \end{bmatrix}$$ where $\dfrac{\partial g}{\partial \mathbf x}$ is the $m \times n$ Jacobian of the constraints.
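To make sure I have the block structure right, here is a quick NumPy sketch that assembles $\textbf{H}_\Lambda$ for a toy problem of my own (not from any reference): $f(x,y)=x^2+y^2$ with the single linear constraint $g(x,y)=x+y=1$, which has $\textbf{x}^*=(1/2,1/2)$, $\lambda^*=1$:

```python
import numpy as np

# Toy problem (my own example): f(x, y) = x^2 + y^2 subject to g(x, y) = x + y = 1.
# First-order conditions give x* = y* = 1/2 and lambda* = 1.

def bordered_hessian(H_xx, Dg):
    """Assemble H_Lambda = [[0, -Dg], [-Dg^T, H_xx]] for m constraints, n variables."""
    m, n = Dg.shape
    top = np.hstack([np.zeros((m, m)), -Dg])
    bot = np.hstack([-Dg.T, H_xx])
    return np.vstack([top, bot])

# d^2 Lambda / dx^2 at (x*, lambda*); the constraint is linear, so H_g = 0 here.
H_xx = np.array([[2.0, 0.0],
                 [0.0, 2.0]])
Dg = np.array([[1.0, 1.0]])     # dg/dx at x*, an m x n = 1 x 2 Jacobian

H_L = bordered_hessian(H_xx, Dg)
print(H_L)   # [[0, -1, -1], [-1, 2, 0], [-1, 0, 2]]
```

In particular the $(m+n)\times(m+n)$ result is symmetric, matching the block form above.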
Suppose $\textbf{r}(t) \in g^{-1}(\textbf{C})$ with $\textbf{r}(t^*)=\textbf{x}^*$ and $\textbf{r}'(t^*) = \textbf{v}$, then $$\left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \textbf{v}^T \left(\frac{\partial^2 \Lambda}{\partial \textbf{x}^2}\right) \textbf{v}$$
Proof: $$\frac{d}{dt} f(\textbf{r}(t)) = \sum_{i=1}^n \frac{\partial f(\textbf{r}(t))}{\partial x_i} r'_i(t) \longleftarrow \text{ By Chain Rule} \\ \implies \left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \left.\sum_{i=1}^n \frac{d}{dt} \frac{\partial f(\textbf{r}(t))}{\partial x_i}\right|_{t=t^*} r'_i(t^*) + \sum_{i=1}^n \frac{\partial f(\textbf{x}^*)}{\partial x_i} r''_i(t^*) \leftarrow \text{ By Product Rule}\\ \implies \left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \sum_{i,j \in [n]} \frac{\partial^2 f(\textbf{x}^*)}{\partial x_i \partial x_j} r'_i(t^*) r'_j(t^*)+ \sum_{i=1}^n \frac{\partial f(\textbf{x}^*)}{\partial x_i} r''_i(t^*) \\ \implies \left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \textbf{v}^T \textbf{H}_f \textbf{v} + \nabla f(\textbf{x}^*) \cdot \textbf{r}''(t^*) $$
$$\implies \left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \textbf{v}^T \textbf{H}_f \textbf{v} + \sum_{\ell = 1}^m\lambda_\ell \nabla g_\ell(\textbf{x}^*) \cdot \textbf{r}''(t^*) \tag{$\dagger$} \label{eq1}$$ where the last step uses the first-order condition $\nabla f(\textbf{x}^*) = \sum_{\ell=1}^m \lambda_\ell \nabla g_\ell(\textbf{x}^*)$.
Now performing a similar calculation for $g_\ell(\textbf{r}(t)) - C_\ell = 0$, whose derivatives in $t$ must all vanish, gives us:
$$0 = \left.\frac{d^2}{dt^2} g_\ell(\textbf{r}(t))\right|_{t=t^*} = \textbf{v}^T \textbf{H}_{g_\ell} \textbf{v} + \nabla g_\ell(\textbf{x}^*) \cdot \textbf{r}''(t^*) \\ \implies - \lambda_\ell \textbf{v}^T \textbf{H}_{g_\ell} \textbf{v} = \lambda_\ell\nabla g_\ell(\textbf{x}^*) \cdot \textbf{r}''(t^*)$$ Substituting this into \ref{eq1} gives us: $$\left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \textbf{v}^T \textbf{H}_f \textbf{v} - \sum_{\ell=1}^m \lambda_\ell \textbf{v}^T \textbf{H}_{g_\ell} \textbf{v} \\ \implies \left.\frac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} = \textbf{v}^T \left(\textbf{H}_f - \sum_{\ell=1}^m \lambda_\ell \textbf{H}_{g_\ell} \right)\textbf{v} = \textbf{v}^T \left(\frac{\partial^2 \Lambda}{\partial \textbf{x}^2}\right) \textbf{v}$$
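As a sanity check on this identity, here is a numerical sketch on a toy problem of my own: $f(x,y) = x^2 + 2y^2$ on the circle $g(x,y) = x^2 + y^2 = 1$, which has a constrained critical point $\textbf{x}^* = (1,0)$ with $\lambda^* = 1$, reached by the curve $\textbf{r}(t) = (\cos t, \sin t)$ at $t^* = 0$:

```python
import numpy as np

# Check d^2/dt^2 f(r(t)) |_{t=t*} = v^T (d^2 Lambda/dx^2) v on a toy problem:
# f(x, y) = x^2 + 2y^2 on the circle g(x, y) = x^2 + y^2 = 1,
# critical point x* = (1, 0), lambda* = 1, curve r(t) = (cos t, sin t), t* = 0.

f = lambda x, y: x**2 + 2 * y**2
r = lambda t: (np.cos(t), np.sin(t))

# Left side: second derivative of f(r(t)) at t = 0 by central difference.
h = 1e-4
lhs = (f(*r(h)) - 2 * f(*r(0.0)) + f(*r(-h))) / h**2

# Right side: v^T (d^2 Lambda/dx^2) v with v = r'(0) = (0, 1) and
# d^2 Lambda/dx^2 = H_f - lambda* H_g = diag(2, 4) - 1 * diag(2, 2) = diag(0, 2).
v = np.array([0.0, 1.0])
H_Lam_xx = np.diag([0.0, 2.0])
rhs = v @ H_Lam_xx @ v

print(lhs, rhs)   # both close to 2
```

Here $f(\textbf{r}(t)) = 1 + \sin^2 t$, so the exact second derivative at $t=0$ is $2$, agreeing with both sides.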
Now if $t^*$ is a maximum (or minimum), we must have $\left.\dfrac{d^2}{dt^2} f(\textbf{r}(t))\right|_{t=t^*} \leq 0 \; (\geq 0)$, which translates to $\textbf{v}^T \left(\dfrac{\partial^2 \Lambda}{\partial \textbf{x}^2}\right) \textbf{v} \leq 0 \;(\geq 0) \; \forall \, \textbf{v} \in \text{null } \dfrac{\partial g(\textbf{x}^*)}{\partial\textbf{x}}$; conversely, strict inequality for all nonzero such $\textbf{v}$ is sufficient for a strict maximum (or minimum).
Now of course this will always be true if $\left(\dfrac{\partial^2 \Lambda}{\partial \textbf{x}^2}\right)$ is negative (or positive) definite.
Q1. Is it also true in the cases where it is semidefinite (why/why not)?
Also, it seems like definiteness is a much stronger condition than required, since we only really need to show definiteness restricted to the set $\text{null } \dfrac{\partial g(\textbf{x}^*)}{\partial \textbf{x}}$.
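To make concrete what I mean by "definiteness restricted to the null space", here is the check I have in mind, as a NumPy sketch (the toy data, a Jacobian $[2, 0]$ and Hessian $\operatorname{diag}(0, 2)$, is made up for illustration): take an orthonormal basis $Z$ of $\text{null } \dfrac{\partial g}{\partial \textbf{x}}$ and inspect the eigenvalues of $Z^T \left(\dfrac{\partial^2 \Lambda}{\partial \textbf{x}^2}\right) Z$:

```python
import numpy as np

# Definiteness of H_xx restricted to null(Dg): build an orthonormal null-space
# basis Z from the SVD of Dg, then look at the eigenvalues of Z^T H_xx Z.

def restricted_eigs(H_xx, Dg, tol=1e-12):
    _, s, Vt = np.linalg.svd(Dg)
    rank = int(np.sum(s > tol))
    Z = Vt[rank:].T                        # columns: orthonormal basis of null(Dg)
    return np.linalg.eigvalsh(Z.T @ H_xx @ Z)

# Made-up toy data: one constraint in two variables.
Dg = np.array([[2.0, 0.0]])                # dg/dx at x*
H_xx = np.diag([0.0, 2.0])                 # d^2 Lambda / dx^2 at (x*, lambda*)

print(restricted_eigs(H_xx, Dg))           # all positive => positive definite on null(Dg)
```

Note that $H_{xx}$ itself is only positive semidefinite here, yet its restriction to the null space is positive definite, which is exactly the gap I'm asking about.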
Q2. Is there a quick and easy way to check definiteness on the restricted set?
Q3. I also don't understand how this translates to restrictions on the signs of the sequence of leading principal minors of $\textbf{H}_\Lambda$, which is what Wikipedia suggests. Could someone write me a proof?
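For reference, here is the criterion as I've tried to apply it numerically (this is my reading of the Wikipedia statement and may be part of what I'm misunderstanding): with $m$ constraints and $n$ variables, one examines the leading principal minors of $\textbf{H}_\Lambda$ of orders $2m+1, \dots, m+n$, and for a minimum they should all carry the sign $(-1)^m$. On a made-up toy problem, $f(x,y)=x^2+2y^2$ on the circle $x^2+y^2=1$ at its constrained minimum $\textbf{x}^*=(1,0)$, $\lambda^*=1$:

```python
import numpy as np

# Bordered Hessian at x* = (1, 0), lambda* = 1 for f = x^2 + 2y^2 on x^2 + y^2 = 1:
# border -dg/dx = [-2, 0], d^2 Lambda/dx^2 = diag(0, 2). (Toy data, my own example.)
H_L = np.array([[ 0.0, -2.0,  0.0],
                [-2.0,  0.0,  0.0],
                [ 0.0,  0.0,  2.0]])
m, n = 1, 2

# Leading principal minors of orders 2m+1, ..., m+n (here just the full determinant).
minors = [np.linalg.det(H_L[:k, :k]) for k in range(2 * m + 1, m + n + 1)]
print(minors)   # negative, i.e. sign (-1)^1, consistent with a minimum
```

What I can't see is why these minor signs are equivalent to definiteness on $\text{null } \dfrac{\partial g}{\partial \textbf{x}}$, hence the request for a proof.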