See the book *Optimization: Algorithms and Applications* by Rajesh Kumar Arora, page 56.
... ...
The necessary conditions for $x^*$ to be a minimum are that $$\nabla f(x^*) = 0 \qquad (3.1)$$
and that $H$ is positive definite, i.e. $x^T H x > 0$ for all non-zero $x$. To ensure this, the eigenvalues of $H$ are to be positive. Consider a two-variable function $$f(x) = x_1^2 + x_2^2 - 2x_1 \qquad (3.2)$$
... ...
I have three questions:
- What is $Hx$?
- What is meant by the term positive definite?
- Why are (a) $\displaystyle \nabla f(x^*) = 0$ and (b) $\displaystyle x^T H x > 0$ necessary conditions for $x^*$ to be a minimum?
The matrix $H$ is referred to as the Hessian matrix, the matrix of second partial derivatives of $f$:
$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[2.2ex] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\[2.2ex] \vdots & \vdots & \ddots & \vdots \\[2.2ex] \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
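For concreteness, take the book's example (3.2): the second partials are $\dfrac{\partial^2 f}{\partial x_1^2} = \dfrac{\partial^2 f}{\partial x_2^2} = 2$ and $\dfrac{\partial^2 f}{\partial x_1\,\partial x_2} = 0$, so
$$H = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
which is constant in $x$, because $f$ is quadratic.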
Positive definite means that for any non-zero vector $z$, the quantity $z^T H z$ is strictly positive. Informally, it is the matrix analogue of being a positive number: a $1 \times 1$ matrix $H = [h]$ is positive definite exactly when the scalar $h$ is positive, since then $z^T H z = h z^2 > 0$ for every $z \neq 0$.
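Here is a minimal NumPy sketch of both tests, using the Hessian of the book's example (3.2); the variable names and the random-sampling check are mine, not the book's:

```python
import numpy as np

# Hessian of f(x) = x1^2 + x2^2 - 2*x1 from (3.2) -- constant, since f is quadratic.
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

# Eigenvalue criterion: a symmetric matrix is positive definite
# iff all of its eigenvalues are strictly positive.
eigenvalues = np.linalg.eigvalsh(H)  # eigenvalue routine for symmetric matrices
print("eigenvalues:", eigenvalues)                          # [2. 2.]
print("positive definite:", bool(np.all(eigenvalues > 0)))  # True

# The definition checked directly: z^T H z > 0 for random non-zero z.
rng = np.random.default_rng(seed=0)
for _ in range(5):
    z = rng.standard_normal(2)
    print(float(z @ H @ z) > 0)  # True on every draw
```

The eigenvalue test is the one the book appeals to; for symmetric matrices it is equivalent to the $z^T H z > 0$ definition, and Hessians of twice continuously differentiable functions are always symmetric.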
Start with a function of one variable. For $x^*$ to be a local minimum, wiggling away from $x^*$ must make the function larger, i.e. for all sufficiently small $\epsilon \neq 0$ we must have $$f(x^*+\epsilon )\gt f(x^*)\quad\Rightarrow\quad f(x^*+\epsilon )- f(x^*)\gt 0$$ The Taylor series tells us that $$f(x^*+\epsilon)= f(x^*)+\frac{df}{dx}\bigg|_{x^*}\epsilon+\frac 12\frac{d^2f}{dx^2}\bigg|_{x^*}\epsilon^2+O(\epsilon^3)$$ Substituting this expansion into the definition of a minimum, the $f(x^*)$ terms cancel and we are left with $$\frac{df}{dx}\bigg|_{x^*}\epsilon+\frac 12\frac{d^2f}{dx^2}\bigg|_{x^*}\epsilon^2+O(\epsilon^3)\gt 0$$ As $\epsilon\to 0$ the linear term dominates, and $\epsilon$ can take either sign, so the inequality can only hold for all small $\epsilon$ if $\frac{df}{dx}\big|_{x^*}=0$; what remains is $\frac 12\frac{d^2f}{dx^2}\big|_{x^*}\epsilon^2\gt 0$, i.e. $\frac{d^2f}{dx^2}\big|_{x^*}\gt 0$.
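As a quick sanity check, take the $x_2 = 0$ slice of the book's example (3.2), $g(x_1) = x_1^2 - 2x_1$:
$$g'(x_1) = 2x_1 - 2 = 0 \;\Rightarrow\; x_1^* = 1, \qquad g''(1) = 2 \gt 0,$$
and indeed $g(1+\epsilon) - g(1) = \epsilon^2 \gt 0$ for every $\epsilon \neq 0$; there is no $O(\epsilon^3)$ remainder here because $g$ is quadratic.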
Now in $n$ dimensions, which is your case ($\epsilon$ becomes a vector), the same expansion gives $$\nabla f(x^*)^T\epsilon+ \frac 12\,\epsilon^T \nabla^2 f(x^*)\,\epsilon\gt 0$$ Using the first-order condition $\nabla f(x^*)=0$, this reduces to $$\epsilon^T \nabla^2 f(x^*)\,\epsilon\gt 0$$ Demanding $\epsilon^T A \epsilon\gt 0$ for every non-zero $\epsilon$ is exactly the definition of $A$ being positive definite, so the second-order condition for a local minimum becomes $$\nabla^2 f(x^*)\succ 0,$$ i.e. the Hessian $H = \nabla^2 f(x^*)$ is positive definite. (Strictly speaking, only positive *semi*-definiteness, $\epsilon^T \nabla^2 f(x^*)\,\epsilon\ge 0$, is necessary; $\nabla f(x^*)=0$ together with positive definiteness is the standard second-order *sufficient* condition.)
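Putting the two conditions together for the book's example (3.2):
$$\nabla f(x) = \begin{bmatrix} 2x_1 - 2 \\ 2x_2 \end{bmatrix} = 0 \;\Rightarrow\; x^* = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$
and the Hessian computed above is positive definite (eigenvalues $2, 2 \gt 0$), so $x^*$ satisfies both conditions. Completing the square confirms it: $f(x) = (x_1 - 1)^2 + x_2^2 - 1 \ge -1 = f(x^*)$, so $x^* = (1,0)^T$ is in fact the global minimum.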