Finding the gradient of $f(x) := \ln \left( 1 + e^{-ab^Tx} \right)$

Given $a \in \mathbb{R}$ and vector $b \in \mathbb{R}^n$, let the scalar field $f: \mathbb{R}^n \to \mathbb{R}$ be defined by $$ f(x) := \ln \left( 1 + e^{-ab^Tx} \right)$$ Find the gradient of $f$.


I'm attempting to use the chain rule here, but I'm finding it difficult to apply. The chain rule as I know it is:

for $h(x) = f(g(x))$

$$ \nabla h(x) = D f(x)^T \nabla g(f(x)) $$ where $f :\mathbb{R}^n \rightarrow \mathbb{R}^m$, $g: \mathbb{R}^m \rightarrow \mathbb{R}$ and $h : \mathbb{R}^n \rightarrow \mathbb{R}$

So I set, $g(x) = 1 + e^{-ab^Tx}$ and $f(y) = \ln(y)$. I'm not sure where to go with this now. My guess is that, $$Df(x)^T = \frac{1}{1+e^{-ab^Tx}}$$ and $$ \nabla g(f(x)) = -abe^{-ab^Tx}$$ So the final answer should be $$ \nabla f(x) = \frac{-abe^{-ab^Tx}}{e^{-ab^Tx}}$$

Is this correct? If not, any help finding the right answer would be appreciated.

There are 2 best solutions below

0
On BEST ANSWER

I like to think of the chain rule with the notation

$$ D (f \circ g)(x) = Df(g(x)) \circ Dg(x) $$ (See the Wikipedia page on the chain rule. $Df(x)$ means the Jacobian matrix of $f$ at the point $x$)
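
To connect this with gradients, here is a short worked step, assuming the usual convention that the Jacobian of a scalar function is a row vector and its gradient is the transposed column vector. For $g : \mathbb{R}^n \to \mathbb{R}^m$, $f : \mathbb{R}^m \to \mathbb{R}$ and $F = f \circ g$, transposing the chain rule gives

$$ \nabla F(x) = \big( Df(g(x)) \, Dg(x) \big)^T = Dg(x)^T \, \nabla f(g(x)) $$

So the Jacobian of the inner function is transposed and evaluated at $x$, while the gradient of the outer function is evaluated at $g(x)$. When $m = 1$ this reduces to $\nabla F(x) = f'(g(x)) \, \nabla g(x)$.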

I would decompose your function as $F(x) = f(g(x))$ where $g(x) = b^T x$ and $f(y) = \ln(1+e^{-ay})$. Since $g(x) = b_1x_1 + \cdots + b_n x_n$, we have $\nabla g(x) = b$ (equivalently, $Dg(x) = b^T$). Also, $Df(y) = \frac{df}{dy} = \frac{-ae^{-ay}}{1+e^{-ay}}$. So I think you get

$$ \nabla F(x) = \frac{-ae^{-ab^Tx}}{1+e^{-ab^Tx}} \, b = -\frac{a}{1+e^{ab^Tx}} \, b $$

0
On

Define a new scalar variable (writing $\alpha$ for the scalar $a$ in the question)
$$\begin{aligned}
\lambda &= 1 + e^{-\alpha b^Tx} \\
\lambda - 1 &= e^{-\alpha b^Tx} \quad\implies\quad d\lambda = e^{-\alpha b^Tx}\left(-\alpha b^T dx\right) = \left(1-\lambda\right)\left(\alpha b^T dx\right)
\end{aligned}$$
and use it to rewrite the function and calculate the gradient
$$\begin{aligned}
f &= \log(\lambda) \\
df &= \lambda^{-1}\,d\lambda \\
&= \alpha\lambda^{-1}\left(1-\lambda\right) b^T dx \\
\frac{\partial f}{\partial x} &= \alpha\lambda^{-1}\left(1-\lambda\right) b \\
&= \left(\frac{-\alpha e^{-\alpha b^Tx}}{1+e^{-\alpha b^Tx}}\right) b
\end{aligned}$$
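
As a sanity check, the closed form above (with $\alpha = a$) can be compared against a central finite-difference approximation. This is only a minimal sketch: the helper names `f`, `grad_f` and `numerical_grad`, the random test point, and the step size `h` are illustrative assumptions, not part of the question.

```python
import numpy as np

def f(x, a, b):
    # f(x) = ln(1 + exp(-a * b^T x)); logaddexp(0, z) evaluates ln(1 + e^z) stably
    return np.logaddexp(0.0, -a * (b @ x))

def grad_f(x, a, b):
    # Closed form: -a e^{-a b^T x} / (1 + e^{-a b^T x}) * b,
    # rewritten as -a / (1 + e^{a b^T x}) * b
    z = a * (b @ x)
    return -a / (1.0 + np.exp(z)) * b

def numerical_grad(func, x, h=1e-6):
    # Central differences, one coordinate at a time (h is an assumed step size)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2.0 * h)
    return g

# Arbitrary test data (assumed values, chosen only for illustration)
rng = np.random.default_rng(0)
a = 1.5
b = rng.normal(size=4)
x = rng.normal(size=4)

analytic = grad_f(x, a, b)
numeric = numerical_grad(lambda v: f(v, a, b), x)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

If the two gradients agree to within the finite-difference error, that supports the sign and the $1 + e^{-ab^Tx}$ denominator in the formula. Note also that for $a > 0$ the gradient is a negative multiple of $b$.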