How do we define the Hessian of a function $f$ of a matrix $X$?


Suppose $f$ is a real-valued function of the matrix $X \in \mathbb{R}^{m \times n}$. We typically define the gradient of $f$, written $\nabla f(X)$, as the $m \times n$ matrix whose $(i,j)$th entry is the corresponding partial derivative of $f$, so that $$(\nabla f(X))_{ij} = \frac{\partial f}{\partial x_{ij}}(X).$$

So, how do you define the Hessian of $f$? Will this be some higher-dimensional tensor?

If I "flatten" $X$ into a vector (or, more formally, fix a basis of $\mathbb{R}^{m \times n}$ and express $X$ in coordinates with respect to that basis), then the gradient of $f$ becomes a vector again (written in the coordinates for that basis). Then, I think we could just write the Hessian as a matrix, as usual for maps $f:\mathbb{R}^n \to \mathbb{R}$. Is this the correct approach?
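A minimal numerical sanity check of this flattening approach, using the (hypothetical, chosen only for illustration) example $f(X) = \operatorname{tr}(X^\top X)$, whose Hessian in any orthonormal basis is $2I$:

```python
import numpy as np

m, n = 2, 3

def f(x_flat):
    # f(X) = tr(X^T X) = sum of squared entries, evaluated on the flattened vector
    X = x_flat.reshape(m, n)
    return np.trace(X.T @ X)

def hessian_fd(f, x, eps=1e-5):
    # Central finite-difference Hessian of a scalar function on R^{mn}.
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = eps
            e_j = np.zeros(d); e_j[j] = eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

x0 = np.random.default_rng(0).standard_normal(m * n)
H = hessian_fd(f, x0)  # an (mn) x (mn) matrix; here it should be close to 2I
```

The Hessian of the flattened function is an $mn \times mn$ matrix, which is exactly why the answer below worries about its size.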


This doesn't address the question directly but is more like a long comment.

The main problem with the full Hessian is that it has $(mn)^2$ entries, which is a very large number of coefficients when $mn$ is itself large.

Often, especially for large optimization problems, you might prefer the directional Hessian, which is more tractable for large matrices and is supported by many optimization routines (for example, via the `hessp` argument of `scipy.optimize.minimize`). If that's what you're looking for, the directional Hessian in a direction $P\in\mathbb R^{m\times n}$ is defined as $$\nabla^2_P(f)=\nabla(P:\nabla f),$$ where $A:B = \sum_{ij} A_{ij}B_{ij}$ denotes the Frobenius inner product.
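For a sketch of what this object looks like numerically: since the directional Hessian equals the derivative of the gradient along $P$, it can be approximated by a finite difference of gradients. The example below (my own, using $f(X) = \operatorname{tr}(X^\top X)$, so $\nabla f(X) = 2X$ and the directional Hessian is $2P$) illustrates this:

```python
import numpy as np

m, n = 2, 3
rng = np.random.default_rng(1)
X = rng.standard_normal((m, n))
P = rng.standard_normal((m, n))  # direction in R^{m x n}

def grad_f(X):
    # Gradient of f(X) = tr(X^T X) is 2X.
    return 2 * X

def directional_hessian(grad_f, X, P, eps=1e-6):
    # Finite-difference approximation of nabla(P : nabla f),
    # i.e. the derivative of the gradient along direction P.
    return (grad_f(X + eps * P) - grad_f(X - eps * P)) / (2 * eps)

H_P = directional_hessian(grad_f, X, P)  # should be close to 2P here
```

Note the result is again an $m \times n$ matrix, not an $mn \times mn$ one, which is the source of the savings.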

By sampling a few directions $P$, a well-written algorithm can get a good picture of the full Hessian's action without ever forming it explicitly.