Hessian matrix of $ \log(1+\exp(a^Tx) )$


I need to compute the Hessian matrix of $\log(1+\exp(a^Tx))$, where $a$ is a vector of constants. I was able to compute the Hessian matrix of $1+\exp(a^Tx)$, but I am not sure how to proceed. I know I should use the fact that this is a composition of functions, but I am not sure exactly how.


0
On BEST ANSWER

Define

$$ f : \mathbb{R}^N \rightarrow \mathbb{R} : x \mapsto \log\left(1 + \exp\left( a^{\top} x \right) \right), $$ where we adopt the convention that $a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \in \mathbb{R}^N$ is a column vector (all vectors below are assumed to be column vectors). For convenience, let $$T_1 : (0, \infty) \rightarrow \mathbb{R} : \xi \mapsto \log(\xi)$$ and $$ T_2 : \mathbb{R}^N \rightarrow (0, \infty) : x \mapsto 1 + \exp\left( a^{\top} x \right). $$ Then $f = T_1 \circ T_2,$ and

  • $\nabla T_1(\xi) = \frac{1}{\xi} \in \mathbb{R};$
  • $\nabla^2 T_1(\xi) = - \frac{1}{\xi^2} \in \mathbb{R};$
  • $\nabla T_2(x) = a \exp\left( a^{\top} x \right) \in \mathbb{R}^{N};$
  • $\nabla^{2} T_2(x) = a a^{\top} \exp\left( a^{\top} x \right) \in \mathbb{R}^{N \times N}.$

Hence, by the chain rule $\nabla f(x) = \nabla T_1\left( T_2(x) \right) \nabla T_2(x)$, $$\nabla f(x) = \left[ 1 + \exp\left( a^{\top} x \right) \right]^{-1} a \exp\left( a^{\top} x \right) = a \left[ \frac{\exp\left( a^{\top} x \right)}{1 + \exp\left( a^{\top} x \right)} \right] \in \mathbb{R}^{N}. $$

Differentiating once more gives $\nabla^{2} f(x) = \nabla^{2} T_1\left( T_2(x) \right) \nabla T_2(x) \nabla T_2(x)^{\top} + \nabla T_1\left( T_2(x) \right) \nabla^{2} T_2(x)$, so that \begin{align*} \nabla^{2}f(x) &= -\left[ 1 + \exp\left( a^{\top} x \right) \right]^{-2} a a^{\top} \exp\left( 2 a^{\top} x \right) + \left[ 1 + \exp\left( a^{\top} x \right) \right]^{-1} a a^{\top} \exp\left( a^{\top} x \right) \\ &= a a^{\top} \left( \frac{\exp\left( a^{\top} x \right)}{1 + \exp\left( a^{\top} x \right)} - \frac{\exp\left( 2 a^{\top} x \right)}{\left[ 1 + \exp\left( a^{\top} x \right) \right]^2} \right) \\ &= a a^{\top} \, \frac{\exp\left( a^{\top} x \right)}{\left[ 1 + \exp\left( a^{\top} x \right) \right]^2} \in \mathbb{R}^{N \times N}, \end{align*} where $$ a a^{\top} = \begin{bmatrix} a^{2}_1 & a_{1} a_2 & \cdots & a_{1} a_N \\ a_2 a_1 & a^{2}_2 & \cdots & a_2 a_N \\ \vdots & \vdots & \ddots & \vdots \\ a_N a_1 & a_N a_2 & \cdots & a^{2}_N \end{bmatrix}. $$
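A quick numerical sanity check of the closed-form Hessian can be sketched as follows. This is only an illustration, assuming NumPy is available; the function names (`f`, `grad_f`, `hess_f`, `numerical_hessian`), the step size `h`, and the random test vectors are choices made here, not part of the derivation. It compares the formula against central finite differences of the gradient.

```python
import numpy as np

def f(a, x):
    # log(1 + exp(a^T x)), evaluated via logaddexp(0, z) to avoid overflow
    return np.logaddexp(0.0, a @ x)

def grad_f(a, x):
    # a * exp(a^T x) / (1 + exp(a^T x))
    e = np.exp(a @ x)
    return a * e / (1.0 + e)

def hess_f(a, x):
    # a a^T * ( e / (1 + e) - e^2 / (1 + e)^2 ), with e = exp(a^T x)
    e = np.exp(a @ x)
    coeff = e / (1.0 + e) - e**2 / (1.0 + e) ** 2
    return coeff * np.outer(a, a)

def numerical_hessian(a, x, h=1e-5):
    # Column j is the central difference of the gradient along coordinate j
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        H[:, j] = (grad_f(a, x + step) - grad_f(a, x - step)) / (2.0 * h)
    return H

# Arbitrary test data (assumed here purely for illustration)
rng = np.random.default_rng(0)
a = rng.normal(size=4)
x = rng.normal(size=4)

max_err = np.max(np.abs(hess_f(a, x) - numerical_hessian(a, x)))
print("max abs difference:", max_err)
```

The printed difference should be small relative to the entries of the Hessian. Note that `grad_f` and `hess_f` use `exp` directly, as in the derivation, so they can overflow when $a^{\top} x$ is large.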

0
On

Call your given expression $f(x_1,\dots,x_n)$. Then $$\nabla f=\left[\frac{a_1\exp(\mathbf{a}^T\mathbf{x})}{1+\exp(\mathbf{a}^T\mathbf{x})},\dots,\frac{a_n\exp(\mathbf{a}^T\mathbf{x})}{1+\exp(\mathbf{a}^T\mathbf{x})}\right].$$ Each entry $f_{ij}$ of the Hessian matrix can then be found by applying the quotient rule to compute the corresponding second-order partial derivative.
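As a worked instance of the quotient-rule step, write $z=\mathbf{a}^T\mathbf{x}$ (a shorthand introduced here), so that $\partial z/\partial x_j = a_j$ and the $i$-th gradient entry is $f_i = a_i e^{z}/(1+e^{z})$. Differentiating with respect to $x_j$ should give $$ f_{ij} = \frac{\partial}{\partial x_j}\left[\frac{a_i e^{z}}{1+e^{z}}\right] = a_i\,\frac{a_j e^{z}\left(1+e^{z}\right) - e^{z}\cdot a_j e^{z}}{\left(1+e^{z}\right)^{2}} = \frac{a_i a_j\, e^{z}}{\left(1+e^{z}\right)^{2}}, $$ which is the $(i,j)$ entry of $\mathbf{a}\mathbf{a}^T$ scaled by $e^{z}/(1+e^{z})^{2}$.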

0
On

The gradient of your function is easily found to be $$\mathbf{g} = \sigma(z) \mathbf{a}, $$ where $z=\mathbf{a}^T\mathbf{x}$ and $\sigma(z) = 1/(1+e^{-z})$ is the sigmoid function.

Using $\sigma'(z) = \sigma(z)[1-\sigma(z)]$ and $dz = \mathbf{a}^T d\mathbf{x}$, its differential is \begin{eqnarray*} d\mathbf{g} &=& \sigma(z)[1-\sigma(z)] \mathbf{a} dz \\ &=& \sigma(z)[1-\sigma(z)] \mathbf{a} \mathbf{a}^T d\mathbf{x}. \end{eqnarray*} The Hessian is therefore (as you found) \begin{equation} \mathbf{H} = \sigma(z)[1-\sigma(z)] \mathbf{a} \mathbf{a}^T. \end{equation}
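The sigmoid form is also convenient numerically. The following is a minimal sketch, assuming NumPy; the helper names `sigmoid` and `hessian_sigmoid` and the sample vectors are illustrative choices made here. It builds the Hessian from $\sigma(z)[1-\sigma(z)]$, compares it with the exponential form, and checks that the smallest eigenvalue is nonnegative up to rounding.

```python
import numpy as np

def sigmoid(z):
    # Logistic function for a scalar z, branching on the sign of z
    # so that exp is only ever called on a non-positive argument
    if z >= 0:
        return 1.0 / (1.0 + np.exp(-z))
    e = np.exp(z)
    return e / (1.0 + e)

def hessian_sigmoid(a, x):
    # H = sigma(z) * (1 - sigma(z)) * a a^T, with z = a^T x
    s = sigmoid(a @ x)
    return s * (1.0 - s) * np.outer(a, a)

# Sample data (assumed here purely for illustration)
a = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, -0.7])

H = hessian_sigmoid(a, x)

# Same matrix written with exponentials: e/(1+e) - e^2/(1+e)^2
e = np.exp(a @ x)
H_exp = (e / (1.0 + e) - e**2 / (1.0 + e) ** 2) * np.outer(a, a)

forms_agree = np.allclose(H, H_exp)
min_eig = np.linalg.eigvalsh(H).min()
print("forms agree:", forms_agree)
print("smallest eigenvalue:", min_eig)
```

Since $\sigma(z)[1-\sigma(z)] > 0$ and $\mathbf{a}\mathbf{a}^T$ is positive semidefinite, $\mathbf{H}$ is positive semidefinite (and has rank one when $\mathbf{a} \neq \mathbf{0}$), which is why the function is convex.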