How to compute the Hessian (second derivative) using the chain rule

Suppose I have a function $L=f(\mathbf{W}\mathbf{x})$, where $\mathbf{W}$ is a matrix, $\mathbf{x}$ is a vector, and $f(\cdot)$ produces a scalar. How do I compute the Hessian of $L$ with respect to $\mathbf{x}$?

The hardest part is getting the dimensions to line up: I do not know where to put $\mathbf{W}$ or whether a transpose is needed.

Best answer

We want to compute the gradient and Hessian of the function $L(x) = f(g(x))$, where $g(x) = Wx$. The derivative of $g$ is $g'(x) = W$. By the chain rule, $$ L'(x) = f'(g(x)) g'(x) = f'(Wx) W. $$ Note that $L'(x)$ is a row vector. If we use the convention that the gradient is a column vector, then the gradient $\nabla L(x)$ is the transpose of this row vector: $$ \nabla L(x) = L'(x)^T = W^T f'(Wx)^T = W^T \nabla f(Wx). $$
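
As a sanity check on the dimensions, here is a small worked case. The sizes $m$ and $n$ and the particular choice of $f$ are assumptions made for illustration; they are not part of the question. Let $W$ be $m \times n$, $x \in \mathbb{R}^n$, and $f(u) = \tfrac{1}{2} u^T u$ for $u \in \mathbb{R}^m$, so that $f'(u) = u^T$ is a $1 \times m$ row vector. Then $$ L'(x) = \underbrace{(Wx)^T}_{1 \times m} \, \underbrace{W}_{m \times n} = x^T W^T W \quad (1 \times n), \qquad \nabla L(x) = W^T W x \quad (n \times 1). $$ When $m \neq n$, the product $f'(Wx)\,W$ is the only ordering in which the inner dimensions match, and taking the transpose is what moves $W^T$ to the left in the gradient.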

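The Hessian $HL(x)$ is by definition the derivative of the function $h(x) = \nabla L(x) = W^T \nabla f(Wx)$. The factor $W^T$ is constant, and by the chain rule the derivative of $x \mapsto \nabla f(Wx)$ is $Hf(Wx)\,W$, so $$ HL(x) = W^T Hf(Wx) W. $$ If $W$ is $m \times n$, then $Hf(Wx)$ is $m \times m$ and $HL(x)$ is $n \times n$, as expected for a function of $x \in \mathbb{R}^n$.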
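
If you want to convince yourself that the transposes are in the right place, a numerical check may help. The following is a minimal sketch in plain Python, not a definitive implementation. It assumes a specific $f$ (log-sum-exp, whose gradient is the softmax $p$ and whose Hessian is $\operatorname{diag}(p) - pp^T$) and a made-up $3 \times 2$ matrix `W`. All names and numbers in it are hypothetical. It compares $W^T \nabla f(Wx)$ and $W^T Hf(Wx) W$ against central finite differences.

```python
import math

# Hypothetical example data: W is m x n with m = 3, n = 2.
W = [[1.0, 2.0],
     [0.5, -1.0],
     [-2.0, 0.3]]
x = [0.2, -0.4]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix-matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def softmax(u):
    mx = max(u)  # subtract the max for numerical stability
    e = [math.exp(ui - mx) for ui in u]
    s = sum(e)
    return [ei / s for ei in e]

def f(u):
    """Assumed example f: log-sum-exp, a scalar function of a vector."""
    mx = max(u)
    return mx + math.log(sum(math.exp(ui - mx) for ui in u))

def grad_f(u):
    """Gradient of log-sum-exp is the softmax vector p."""
    return softmax(u)

def hess_f(u):
    """Hessian of log-sum-exp is diag(p) - p p^T (m x m)."""
    p = softmax(u)
    m = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(m)]
            for i in range(m)]

def L(x):
    return f(matvec(W, x))

def grad_L(x):
    # W^T grad f(Wx): (n x m) times (m,) gives (n,)
    return matvec(transpose(W), grad_f(matvec(W, x)))

def hess_L(x):
    # W^T Hf(Wx) W: (n x m)(m x m)(m x n) gives (n x n)
    return matmul(matmul(transpose(W), hess_f(matvec(W, x))), W)

def numerical_gradient(x, eps=1e-5):
    """Central differences of L, one coordinate at a time."""
    g = []
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        g.append((L(xp) - L(xm)) / (2 * eps))
    return g

def numerical_hessian(x, eps=1e-5):
    """Central differences of grad_L; column j is d(grad L)/dx_j."""
    cols = []
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        gp, gm = grad_L(xp), grad_L(xm)
        cols.append([(a - b) / (2 * eps) for a, b in zip(gp, gm)])
    return transpose(cols)

grad_err = max(abs(a - b) for a, b in zip(grad_L(x), numerical_gradient(x)))
hess_err = max(abs(a - b)
               for ra, rb in zip(hess_L(x), numerical_hessian(x))
               for a, b in zip(ra, rb))
print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

Both printed errors should be tiny if the formulas are right. Swapping the order of the factors in `hess_L` (for example to $W\,Hf\,W^T$) fails immediately with this non-square `W`, because `zip` silently truncates mismatched lengths and the result no longer matches the finite-difference values.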