Question about "Derivative" vs. "Gradient" vs. "Hessian matrix"


I learned them from 'An Introduction to Optimization' by Edwin K. P. Chong and Stanislaw H. Zak.

  • Derivative of $f$ $$Df(x)=\left[\begin{matrix} \frac{{\partial}f}{{\partial}x_1} (x) & \cdots & \frac{{\partial}f}{{\partial}x_n} (x) \end{matrix}\right]$$

  • Gradient of $f$ $${\nabla}f(x)=Df(x)^\top=\left[\begin{matrix} \frac{{\partial}f}{{\partial}x_1} (x) \\ \vdots \\ \frac{{\partial}f}{{\partial}x_n} (x) \end{matrix}\right]$$

  • Hessian of $f$ $$F(x) = D^2f(x)=\left[\begin{matrix} \frac{{\partial^2}f}{{\partial}x_1^2} & \frac{{\partial^2}f}{{\partial}x_2{\partial}x_1} & \cdots & \frac{{\partial^2}f}{{\partial}x_n{\partial}x_1}\\ \frac{{\partial^2}f}{{\partial}x_1{\partial}x_2} & \frac{{\partial^2}f}{{\partial}x_2^2} & \cdots & \frac{{\partial^2}f}{{\partial}x_n{\partial}x_2}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{{\partial^2}f}{{\partial}x_1{\partial}x_n} & \frac{{\partial^2}f}{{\partial}x_2{\partial}x_n} & \cdots & \frac{{\partial^2}f}{{\partial}x_n^2}\\ \end{matrix}\right]$$
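The three definitions above can be checked numerically with central finite differences. The example function, point, and step sizes below are my own illustration, not from the book:

```python
import math

def f(x):
    # Example smooth function of two variables (my choice, not from the book)
    x1, x2 = x
    return x1**2 * x2 + math.sin(x2)

def partial(g, x, i, h=1e-6):
    """Central-difference approximation of dg/dx_i at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (g(xp) - g(xm)) / (2 * h)

def derivative(g, x):
    """Df(x): the row vector of first partials."""
    return [partial(g, x, i) for i in range(len(x))]

def gradient(g, x):
    """∇f(x) = Df(x)^T: same entries, conventionally viewed as a column."""
    return derivative(g, x)  # orientation (row vs. column) is a convention

def hessian(g, x, h=1e-4):
    """Matrix of second partials, entry (i, j) ≈ d/dx_i (df/dx_j)."""
    n = len(x)
    return [[partial(lambda y: partial(g, y, j, h), x, i, h)
             for j in range(n)] for i in range(n)]

x = [1.0, 2.0]
print(derivative(f, x))  # ≈ [2*x1*x2, x1^2 + cos(x2)] = [4.0, 1 + cos 2]
print(hessian(f, x))     # ≈ [[2*x2, 2*x1], [2*x1, -sin(x2)]]
```

For this smooth $f$ the numerical Hessian comes out (approximately) symmetric, so the ordering question below does not matter here.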

However, on Wikipedia and Wolfram MathWorld, the Hessian matrix of $f$ is defined as $$H(x) = \left[\begin{matrix} \frac{{\partial^2}f}{{\partial}x_1^2} & \frac{{\partial^2}f}{{\partial}x_1{\partial}x_2} & \cdots & \frac{{\partial^2}f}{{\partial}x_1{\partial}x_n}\\ \frac{{\partial^2}f}{{\partial}x_2{\partial}x_1} & \frac{{\partial^2}f}{{\partial}x_2^2} & \cdots & \frac{{\partial^2}f}{{\partial}x_2{\partial}x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{{\partial^2}f}{{\partial}x_n{\partial}x_1} & \frac{{\partial^2}f}{{\partial}x_n{\partial}x_2} & \cdots & \frac{{\partial^2}f}{{\partial}x_n^2}\\ \end{matrix}\right]$$

That is, $H(x)=F(x)^\top$. Which one is the correct Hessian?

I know that for almost all functions (in particular, whenever the second partials are continuous, by Clairaut's theorem) we have $\frac{{\partial^2}f}{{\partial}x_i{\partial}x_j}(x) = \frac{{\partial^2}f}{{\partial}x_j{\partial}x_i}(x)$, in which case the two definitions coincide.

However, there exist functions for which $\frac{{\partial^2}f}{{\partial}x_i{\partial}x_j}(x) \ne \frac{{\partial^2}f}{{\partial}x_j{\partial}x_i}(x)$ at some points, and there the two definitions disagree.
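The classic textbook counterexample is $f(x,y)=xy\,(x^2-y^2)/(x^2+y^2)$ with $f(0,0)=0$: away from the origin the mixed partials agree, but at the origin they differ. A numerical sketch (step sizes are my own choice, with the inner step much smaller than the outer one):

```python
def f(x, y):
    # Standard counterexample: mixed second partials disagree at the origin
    return x * y * (x*x - y*y) / (x*x + y*y) if (x, y) != (0.0, 0.0) else 0.0

def fx(y, h=1e-6):
    """df/dx evaluated at (0, y), by central difference."""
    return (f(h, y) - f(-h, y)) / (2 * h)

def fy(x, h=1e-6):
    """df/dy evaluated at (x, 0), by central difference."""
    return (f(x, h) - f(x, -h)) / (2 * h)

k = 1e-3  # outer step, deliberately much larger than the inner step h
d2f_dydx = (fx(k) - fx(-k)) / (2 * k)   # d/dy (df/dx) at (0, 0) -> -1
d2f_dxdy = (fy(k) - fy(-k)) / (2 * k)   # d/dx (df/dy) at (0, 0) -> +1
print(d2f_dydx, d2f_dxdy)
```

So at the origin $F(x)$ and $H(x)$ genuinely differ (one has $-1$ where the other has $+1$); elsewhere they are equal.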

On BEST ANSWER

Let $f$ be a scalar function. The derivative of $f$ at a point $a$, denoted by $Df$, is a covector, i.e. a linear functional that takes in vectors and spits out scalars. The gradient of $f$ at a point, denoted by $\nabla f$, is the dual vector of $Df$, i.e. $Df(\nabla f)=\langle Df, \nabla f\rangle = \| \nabla f\|^2$.
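The covector/vector relationship can be illustrated concretely, assuming the standard Euclidean inner product (the example function and hand-computed gradient below are mine, for illustration only):

```python
def f(x):
    # Hypothetical example function f(x1, x2) = x1^2 + 3*x1*x2
    x1, x2 = x
    return x1**2 + 3 * x1 * x2

def grad_f(x):
    # Gradient computed by hand for this f: [2*x1 + 3*x2, 3*x1]
    x1, x2 = x
    return [2 * x1 + 3 * x2, 3 * x1]

def Df(x):
    """Derivative at x as a linear functional: v -> Df(x)(v) = <∇f(x), v>."""
    g = grad_f(x)
    return lambda v: sum(gi * vi for gi, vi in zip(g, v))

x = [1.0, 2.0]
g = grad_f(x)                    # [8.0, 3.0]
print(Df(x)(g))                  # Df applied to ∇f ...
print(sum(gi * gi for gi in g))  # ... equals ‖∇f‖^2
```

The functional $Df(x)$ and the vector $\nabla f(x)$ carry the same numbers; they differ only in whether those numbers act as a row (on vectors) or sit as a column.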

The same concept applies to the Hessian, which is the dual tensor of your matrix.