Why do we find the determinant when finding extrema of multivariable functions?


So I know that when we are looking for the relative extrema of a function f(x,y), we must eventually compute the second partial derivatives of f(x,y) (i.e. $f_{xx}(x,y)$, $f_{xy}(x,y)$, $f_{yx}(x,y)$, and $f_{yy}(x,y)$). We then must take the determinant of the matrix

$$ \begin{pmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{pmatrix} $$

And then we use the determinant to help figure out what we need to know.

However, why do we need to find the determinant? What is the reasoning behind this?

Best answer:

You don't want the determinant, really. It just turns out that the determinant (together with the signs of the diagonal entries) is enough, in 2D, to tell you what you really want to know: the character of the eigenvalues of the Hessian matrix.

Consider what it means to be at a critical point $(x_0, y_0)$: you can Taylor-expand the function to get

$$f(x_0+\delta x, y_0 + \delta y) = f(x_0,y_0) + \nabla f \cdot (\delta x, \delta y) + \frac{1}{2}(\delta x,\delta y)^T Hf (\delta x, \delta y) + \ldots$$

When you are at a critical point, you have that $\nabla f = 0$, so unlike the usual case where the first-order term in Taylor's theorem dominates the others for small perturbations, we instead must look to the second-order information encoded in the Hessian: $$f(x_0+\delta x, y_0 + \delta y) = f(x_0,y_0) + \frac{1}{2}(\delta x,\delta y)^T Hf (\delta x, \delta y) + \ldots$$
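To see this concretely, here is a small sympy check (the function $f = x^2 + 3y^2$ is my own illustrative choice, not from the question): at its critical point the gradient vanishes, so the quadratic Hessian term accounts for the change in $f$ under a small perturbation.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 3*y**2  # illustrative example, not from the original answer

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
H = sp.hessian(f, (x, y))

# At the critical point (0, 0) the gradient vanishes...
crit = {x: 0, y: 0}
assert grad.subs(crit) == sp.Matrix([0, 0])

# ...so near it, f is governed by the quadratic term (1/2) d^T (Hf) d.
d = sp.Matrix([sp.Rational(1, 100), sp.Rational(1, 100)])  # small perturbation
exact = f.subs({x: d[0], y: d[1]}) - f.subs(crit)
quadratic = sp.Rational(1, 2) * (d.T * H.subs(crit) * d)[0]
assert sp.simplify(exact - quadratic) == 0  # exact here, since f is itself quadratic
```

For this $f$ the higher-order terms vanish identically, which is why the match is exact rather than approximate.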

If $(\delta x,\delta y)$ is an eigenvector of $Hf$ with positive eigenvalue, then moving in the $(\delta x,\delta y)$ direction increases $f$. If it is an eigenvector of $Hf$ with negative eigenvalue, then moving in its direction decreases $f$. It follows that if all of the eigenvalues are positive (i.e., $Hf$ is positive-definite) then $(x_0,y_0)$ is a local minimum, since moving a small distance in any direction increases the function. Similarly, if $Hf$ is negative-definite, you are at a local maximum, and if $Hf$ is indefinite, you are at a saddle point. (There is a fourth case, where $Hf$ is singular: you cannot conclude anything from the second-order information and must look to the third-order term in the Taylor expansion. But this situation is uncommon for "generic" functions $f$.)
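A minimal numerical sketch of this classification, using numpy to compute the eigenvalues of a symmetric Hessian (the three example Hessians are my own, taken at the origin of $x^2+y^2$, $-(x^2+y^2)$, and $x^2-y^2$):

```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(H)     # eigvalsh: eigenvalues of a symmetric matrix
    if np.any(np.abs(eig) < tol):
        return "degenerate"         # singular Hessian: second-order test inconclusive
    if np.all(eig > 0):
        return "local minimum"      # positive-definite
    if np.all(eig < 0):
        return "local maximum"      # negative-definite
    return "saddle point"           # indefinite: eigenvalues of mixed sign

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # local minimum
print(classify(np.array([[-2.0, 0.0], [0.0, -2.0]])))  # local maximum
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point
```

Note that this eigenvalue-based version works in any dimension, whereas the determinant shortcut discussed next is special to 2D.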

In 2D, Sylvester's criterion lets you test positive-definiteness by checking the leading principal minors: the top-left entry $f_{xx}$ and the determinant. (When $\det Hf > 0$, the two diagonal entries necessarily share a sign, so checking either one suffices.) This leads to the usual rules:

  • if the diagonal entries are positive, and $\det Hf>0$, you are at a local minimum;
  • if the diagonal entries are negative, and $\det Hf>0$, you are at a local maximum;
  • if $\det Hf < 0$ you are at a saddle point;
  • if $\det Hf = 0$, you cannot determine the character of the critical point from second-order information alone; you must examine the third derivative of $f$ (a rank-three tensor of third partials).
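These rules translate directly into code; a sketch (my own helper function, not from the answer), applied to $f(x,y) = x^2 - y^2$ at the origin, where $f_{xx}=2$, $f_{xy}=0$, $f_{yy}=-2$:

```python
def second_derivative_test(fxx, fxy, fyy):
    """2D second-derivative test: needs only f_xx and det Hf."""
    det = fxx * fyy - fxy * fxy     # determinant of the symmetric Hessian
    if det > 0:
        # det > 0 forces fxx and fyy to share a sign, so fxx decides
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"       # indefinite Hessian
    return "inconclusive"           # det = 0: need higher-order terms

print(second_derivative_test(2, 0, -2))  # saddle point
print(second_derivative_test(2, 0, 2))   # local minimum
```

This is exactly the eigenvalue test in disguise: in 2D the determinant is the product of the two eigenvalues, so its sign tells you whether they agree, and $f_{xx}$ breaks the tie.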