Calculate the gradient of a function that is written with abstract vectors


:)

I am supposed to calculate the gradient of the following function:

$$f(\mathbf{w})=\sum^{n}_{i=1}\log(1+\exp(-y_i\mathbf{w}^T\mathbf{x}_i))+\frac{1}{b}\sum^{d}_{j=1}w_j^4$$

where $\mathbf{x}_i \in \mathbb{R}^d$, $\mathbf{w} \in \mathbb{R}^d$, $b>0$ is a real constant, and $w_j$ denotes the $j$-th coordinate of the parameter $\mathbf{w}$.

Now, let's start with the second part, because it's easy:

$$\nabla \frac{1}{b}\sum^{d}_{j=1}w_j^4 = \frac{4}{b}\big(w_1^3,\dots,w_d^3\big)^T$$

This uses only basic calculus: $\partial/\partial w_k$ picks up just the $k$-th term of the sum, so the result is a vector rather than a scalar sum. For the other part, I would just use the chain rule, differentiating $\log$ first, then $\exp$, and so on. However, it's the first time I have to do something like this in vector notation, and I don't know how to "pull out" $\mathbf{x}_i$, i.e. how to use it as the inner derivative.

Is it: $$\nabla \sum^{n}_{i=1}\log(1+\exp(-y_i\mathbf{w}^T\mathbf{x}_i)) = \sum^{n}_{i=1} (1+\exp(-y_i\mathbf{w}^T\mathbf{x}_i))^{-1} \cdot \exp(-y_i\mathbf{w}^T\mathbf{x}_i) \cdot (-y_i\mathbf{x}_i) $$

?
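Spelled out one coordinate at a time, the chain-rule step in question goes as follows; $s_i$ is a helper symbol introduced only for this sketch, and $x_{ik}$ denotes the $k$-th coordinate of $\mathbf{x}_i$:

$$\eqalign{ s_i &= -y_i\mathbf{w}^T\mathbf{x}_i = -y_i\sum_{k=1}^{d}w_k x_{ik}, \quad \frac{\partial s_i}{\partial w_k} = -y_i x_{ik},\ \text{so}\ \nabla s_i = -y_i\mathbf{x}_i \cr \frac{d}{ds}\log(1+e^{s}) &= \frac{e^{s}}{1+e^{s}} \cr \nabla\sum_i\log(1+e^{s_i}) &= \sum_i\frac{e^{s_i}}{1+e^{s_i}}\,\nabla s_i = \sum_i\frac{e^{s_i}}{1+e^{s_i}}\,(-y_i\mathbf{x}_i) \cr }$$

The outer derivative is a scalar and the inner derivative $\nabla s_i=-y_i\mathbf{x}_i$ is a vector, so $\mathbf{x}_i$ simply appears as a factor in each summand, which is the form of the expression above.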

Thanks for your help!


Note that the $x_i\in\mathbb{R}^{d}$ are just the columns of the full data matrix $X\in\mathbb{R}^{d\times n}$.

For convenience, define two new variables $$\eqalign{ z &= X^Tw \cr dz &= X^Tdw \cr\cr e &= \exp(-y\circ z) \cr de &= -e\circ y\circ dz \cr }$$ where $y=(y_1,\dots,y_n)^T$ collects the $y_i$ and $\circ$ denotes the elementwise (aka Hadamard) product.
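Componentwise, both differentials follow from one-variable calculus; here $X_{ki}$ denotes the $k$-th entry of the column $x_i$:

$$\eqalign{ z_i = \sum_{k} X_{ki}w_k \quad&\Rightarrow\quad dz_i = \sum_{k} X_{ki}\,dw_k,\ \text{i.e.}\ dz = X^Tdw \cr e_i = \exp(-y_iz_i) \quad&\Rightarrow\quad de_i = -y_i\exp(-y_iz_i)\,dz_i = -e_i\,y_i\,dz_i,\ \text{i.e.}\ de = -e\circ y\circ dz \cr }$$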

Now write the function in terms of these new variables and find its differential (the first $1$ is the all-ones vector of length $n$, the second the all-ones vector of length $d$): $$\eqalign{ f &= 1:\log(1+e)+\frac{1}{b}\,1:w^{\circ 4} \cr df &= 1:\frac{de}{1+e}+\frac{4}{b}\,1:(w^{\circ 3}\circ dw) \cr &= 1:\Big(-\frac{e\circ y\circ dz}{1+e}\Big)+\frac{4}{b}\,1:(w^{\circ 3}\circ dw) \cr &= -\frac{e\circ y}{1+e}:dz+\frac{4w^{\circ 3}}{b}:dw \cr &= -\frac{e\circ y}{1+e}:X^Tdw+\frac{4w^{\circ 3}}{b}:dw \cr &= \frac{4w^{\circ 3}}{b}:dw-X\Big(\frac{e\circ y}{1+e}\Big):dw \cr }$$ In the above,

  • a colon denotes the Frobenius inner product, $A:B=\operatorname{tr}(A^TB)$, which for vectors is the ordinary dot product,
  • $w^{\circ 3}$ denotes an elementwise (aka Hadamard) power,
  • and $\frac{a}{c}$ denotes the elementwise (Hadamard) division of the vector $a$ by the vector $c$.
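The derivation leans on two moves: $1:(a\circ c)=a:c$, and $a:X^Tdw=(Xa):dw$. Below is a small numerical sanity check of both. It is a sketch that assumes plain Python lists stand in for the vectors and that $X$ is stored as a list of rows; the helper names `frob`, `hadamard`, `matvec` and `transpose` are hypothetical, introduced only here.

```python
import random


def frob(a, b):
    """Frobenius inner product a:b; for vectors this is the ordinary dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))


def hadamard(a, b):
    """Elementwise (Hadamard) product a∘b."""
    return [ai * bi for ai, bi in zip(a, b)]


def matvec(M, v):
    """Ordinary matrix-vector product M v, with M stored as a list of rows."""
    return [frob(row, v) for row in M]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


random.seed(0)
d, n = 3, 4
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]  # d x n, columns play the role of x_i
a = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]
dw = [random.gauss(0, 1) for _ in range(d)]
ones = [1.0] * n

# Move 1: 1:(a∘c) = a:c
lhs1 = frob(ones, hadamard(a, c))
rhs1 = frob(a, c)

# Move 2: a:(X^T dw) = (X a):dw
lhs2 = frob(a, matvec(transpose(X), dw))
rhs2 = frob(matvec(X, a), dw)

print(abs(lhs1 - rhs1), abs(lhs2 - rhs2))
```

Both printed differences should be at rounding level; the second move is just the adjoint property of $X^T$ with respect to the dot product.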


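Anyway, since $df=\frac{\partial f}{\partial w}:dw$, the gradient can be read off from that last line as $$\eqalign{ \frac{\partial f}{\partial w} &= \frac{4w^{\circ 3}}{b}-X\Big(\frac{e\circ y}{1+e}\Big) }$$ Expanding the matrix-vector product, $X\big(\frac{e\circ y}{1+e}\big)=\sum_i\frac{e_iy_i}{1+e_i}\,x_i$, so the term $-X\big(\frac{e\circ y}{1+e}\big)$ is exactly the per-sample chain-rule sum $\sum_i (1+e_i)^{-1}e_i\,(-y_ix_i)$ proposed in the question.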
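The final formula can also be checked numerically. This is a minimal pure-Python sketch, not the answerer's code: `f`, `grad_answer`, `grad_question` and `grad_fd` are hypothetical names, the data are random, and the $y_i$ are drawn from $\{-1,+1\}$ only for illustration (the formula itself does not need that). It compares the matrix form above, the per-sample sum from the question, and central finite differences.

```python
import math
import random


def f(w, X, y, b):
    """f(w) = sum_i log(1 + exp(-y_i w^T x_i)) + (1/b) sum_j w_j^4, with x_i = column i of X."""
    d, n = len(w), len(y)
    z = [sum(X[k][i] * w[k] for k in range(d)) for i in range(n)]  # z = X^T w
    data_term = sum(math.log1p(math.exp(-y[i] * z[i])) for i in range(n))
    return data_term + sum(wj ** 4 for wj in w) / b


def grad_answer(w, X, y, b):
    """Matrix form from the answer: 4 w^{∘3} / b - X (e∘y / (1 + e)), with e = exp(-y∘z)."""
    d, n = len(w), len(y)
    z = [sum(X[k][i] * w[k] for k in range(d)) for i in range(n)]  # z = X^T w
    e = [math.exp(-y[i] * z[i]) for i in range(n)]                  # e = exp(-y∘z)
    r = [e[i] * y[i] / (1 + e[i]) for i in range(n)]                # e∘y / (1 + e), elementwise
    Xr = [sum(X[k][i] * r[i] for i in range(n)) for k in range(d)]  # X r
    return [4 * w[k] ** 3 / b - Xr[k] for k in range(d)]


def grad_question(w, X, y, b):
    """Per-sample chain-rule sum from the question, plus the regularizer gradient 4 w_k^3 / b."""
    d, n = len(w), len(y)
    g = [4 * w[k] ** 3 / b for k in range(d)]
    for i in range(n):
        s = -y[i] * sum(X[k][i] * w[k] for k in range(d))  # s_i = -y_i w^T x_i
        scale = math.exp(s) / (1 + math.exp(s))            # (1 + exp(s_i))^{-1} exp(s_i)
        for k in range(d):
            g[k] += scale * (-y[i] * X[k][i])              # times the inner derivative -y_i x_i
    return g


def grad_fd(w, X, y, b, h=1e-6):
    """Central finite differences, used only as an independent numerical check."""
    g = []
    for k in range(len(w)):
        w_plus = list(w)
        w_plus[k] += h
        w_minus = list(w)
        w_minus[k] -= h
        g.append((f(w_plus, X, y, b) - f(w_minus, X, y, b)) / (2 * h))
    return g


random.seed(1)
d, n, b = 3, 5, 2.0
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]  # d x n, columns are the x_i
y = [random.choice([-1.0, 1.0]) for _ in range(n)]             # illustrative +/-1 values for y_i
w = [random.gauss(0, 1) for _ in range(d)]

g_ans = grad_answer(w, X, y, b)
g_q = grad_question(w, X, y, b)
g_fd = grad_fd(w, X, y, b)
print(max(abs(p - q) for p, q in zip(g_ans, g_q)))
print(max(abs(p - q) for p, q in zip(g_ans, g_fd)))
```

The first printed discrepancy should be at rounding level and the second roughly the size of the finite-difference error. Plain loops are used only to avoid dependencies; with NumPy arrays (`X` of shape `(d, n)`) the same gradient would be `4*w**3/b - X @ (e*y/(1+e))`.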