Given $a \in \mathbb{R}$ and vector $b \in \mathbb{R}^n$, let the scalar field $f: \mathbb{R}^n \to \mathbb{R}$ be defined by $$ f(x) := \ln \left( 1 + e^{-ab^Tx} \right)$$ Find the gradient of $f$.
I'm trying to apply the chain rule here, but I'm finding it difficult. The chain rule as I know it goes like this:
for $h(x) = g(f(x))$
$$ \nabla h(x) = D f(x)^T \nabla g(f(x)) $$ where $f :\mathbb{R}^n \rightarrow \mathbb{R}^m$, $g: \mathbb{R}^m \rightarrow \mathbb{R}$ and $h : \mathbb{R}^n \rightarrow \mathbb{R}$
So I set, $g(x) = 1 + e^{-ab^Tx}$ and $f(y) = \ln(y)$. I'm not sure where to go with this now. My guess is that, $$Df(x)^T = \frac{1}{1+e^{-ab^Tx}}$$ and $$ \nabla g(f(x)) = -abe^{-ab^Tx}$$ So the final answer should be $$ \nabla f(x) = \frac{-abe^{-ab^Tx}}{e^{-ab^Tx}}$$
Is this correct? If wrong, any help to find the right answer would be appreciated.
I like to think of the chain rule with the notation
$$ D (f \circ g)(x) = Df(g(x)) \circ Dg(x) $$ (See the Wikipedia page on the chain rule. Here $Df(x)$ means the Jacobian matrix of $f$ at the point $x$, and $\circ$ on the right-hand side is matrix multiplication.)
I would decompose your function as $F(x) = f(g(x))$ where $g(x) = b^T x$ and $f(y) = \ln(1+e^{-ay})$. Since $g(x) = b_1x_1 + \cdots + b_n x_n$, we have $Dg(x) = b^T$, i.e. $\nabla g(x) = b$. Also, $Df(y) = \frac{df}{dy} = \frac{-ae^{-ay}}{1+e^{-ay}}$. So I think you get
$$ \nabla F(x) = \frac{-ae^{-ab^Tx}}{1+e^{-ab^Tx}} \, b = \frac{-a}{1+e^{ab^Tx}} \, b $$
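If you want to sanity-check a gradient like this numerically, one option is to compare it against central finite differences. The sketch below does that for the function exactly as stated in the question, $\ln(1 + e^{-ab^Tx})$, whose chain-rule gradient is $\frac{-ae^{-ab^Tx}}{1+e^{-ab^Tx}} \, b$. The values of `a`, `b`, `x` and the step size `h` are arbitrary choices of mine for illustration, not anything from the question.

```python
import math

def f(x, a, b):
    """f(x) = ln(1 + exp(-a * b^T x)), as defined in the question."""
    s = sum(bi * xi for bi, xi in zip(b, x))
    return math.log1p(math.exp(-a * s))

def grad_f(x, a, b):
    """Chain-rule gradient: (-a * e / (1 + e)) * b, with e = exp(-a * b^T x)."""
    s = sum(bi * xi for bi, xi in zip(b, x))
    e = math.exp(-a * s)
    coeff = -a * e / (1.0 + e)
    return [coeff * bi for bi in b]

def numeric_grad(x, a, b, h=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp, a, b) - f(xm, a, b)) / (2 * h))
    return g

# Arbitrary test values (assumptions for illustration only).
a = 0.7
b = [1.0, -2.0, 0.5]
x = [0.3, 0.1, -0.4]

analytic = grad_f(x, a, b)
numeric = numeric_grad(x, a, b)
max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_err)
```

If the closed form is right, the two gradients should agree to many decimal places. A sign error or a missing $1 +$ in the denominator shows up as a difference far larger than the finite-difference error.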