Gradient of $f(W)=\sum_{j \neq {t}} \left[ \max\left(0, [ W x ]_j - \left[ W x \right]_{t} + \delta \right) \right] + \lambda \left\| W \right\|_F^2$


How do I find the gradient of the function \begin{align} f(W) := \sum_{j \neq {t}} \left[ \max\left(0, [ W x ]_j - \left[ W x \right]_{t} + \delta \right) \right] + \lambda \left\| W \right\|_F^2 \end{align} with respect to the matrix $W \in \mathbb{R}^{m \times n}$, where $x \in \mathbb{R}^n$ and $\delta, \lambda$ are known scalars? The $j$th element of a vector $y \in \mathbb{R}^n$ is denoted $[y]_j$.

Best answer:

The max function isn't differentiable everywhere, but you could replace it with a smooth-max approximation. Or you could try the following.

Define the variables $$\eqalign{ y &= Wx, \quad dy = dW\,x \cr k &= t, \quad z = (1e_k^T)y = Ky \cr b &= (y-z + 1\delta) = (I-K)y + 1\delta \cr s &= {\rm sign}(b), \quad p = \tfrac{1}{2}(1+s) \cr a &= p\odot b = \max(0,b)\cr }$$ where max() and sign() are applied elementwise and the latter is defined as $${\rm sign}(\beta)=\begin{cases} +1 & \beta \ge 0 \\ -1 & \beta \lt 0 \end{cases}$$
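These intermediate variables are easy to check numerically. Here is a minimal NumPy sketch (the dimensions, seed, and values of $t$ and $\delta$ are illustrative assumptions), using the convention ${\rm sign}(0)=+1$ from above:

```python
import numpy as np

m, n, t, delta = 4, 3, 1, 0.5            # illustrative sizes; t plays the role of k
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

y = W @ x
ones = np.ones(m)
e_k = np.zeros(m); e_k[t] = 1.0
K = np.outer(ones, e_k)                  # K = 1 e_k^T
z = K @ y                                # every entry of z equals y_t
b = (np.eye(m) - K) @ y + delta * ones   # b = (I - K) y + 1 delta
s = np.where(b >= 0, 1.0, -1.0)          # sign with sign(0) = +1
p = 0.5 * (1.0 + s)
a = p * b                                # equals max(0, b) elementwise
assert np.allclose(a, np.maximum(0.0, b))
```

Note that $b_t = \delta$ exactly, since row $t$ of $(I-K)$ is zero; this is the constant $\delta$ term that gets subtracted below.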

Write the function in terms of these variables and find its gradient, treating $p$ as piecewise constant (i.e. a subgradient wherever some $b_j=0$). Note that summing $a$ over all components includes the $j=t$ term, which always equals $\max(0,\delta)=\delta$, hence the constant correction: $$\eqalign{ f &= 1:a + \lambda W:W - \delta \cr &= p:b + \lambda W:W - \delta \cr \cr df &= p:db + 2\lambda W:dW \cr &= p:(I-K)\,dy + 2\lambda W:dW \cr &= p:(I-K)\,dW\,x + 2\lambda W:dW \cr &= \Big((I-K)^Tpx^T + 2\lambda W\Big):dW \cr \frac{\partial f}{\partial W} &= \Big(I-e_k1^T\Big)px^T + 2\lambda W \cr }$$ NB: The above derivation uses the symbol {$\odot$} for the elementwise/Hadamard product, {$e_k$} for the $k$th standard basis vector, {$1$} for the all-ones vector, and {:} for the trace/Frobenius product, i.e. $$\eqalign{ A:B = {\rm Tr}(A^TB) }$$ Instead of a smooth-max function, you could re-work the above approach with a smooth-sign function, e.g. $\tanh(\mu b)$ with a sufficiently large parameter $\mu$: $$\lim_{\mu\to\infty}\tanh(\mu b) = {\rm sign}(b)$$
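The closed-form gradient can be verified against a central finite difference. A sketch in NumPy (the helper names `loss` and `grad` and the test point are my own; the check is only valid at points where no component of $b$ sits exactly on a hinge kink, which holds generically):

```python
import numpy as np

def loss(W, x, t, delta, lam):
    """f(W) = sum_{j != t} max(0, [Wx]_j - [Wx]_t + delta) + lam * ||W||_F^2."""
    y = W @ x
    margins = np.maximum(0.0, y - y[t] + delta)
    return margins.sum() - delta + lam * np.sum(W * W)  # the j = t term is always delta

def grad(W, x, t, delta, lam):
    """(I - e_k 1^T) p x^T + 2 lam W, with p = (1 + sign(b)) / 2 and k = t."""
    m = W.shape[0]
    y = W @ x
    b = y - y[t] + delta
    p = 0.5 * (1.0 + np.where(b >= 0, 1.0, -1.0))
    e_k = np.zeros(m); e_k[t] = 1.0
    g = p - e_k * p.sum()                               # (I - e_k 1^T) p
    return np.outer(g, x) + 2.0 * lam * W

# central finite-difference check at a generic random point
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
t, delta, lam = 1, 0.5, 0.1
G = grad(W, x, t, delta, lam)
eps = 1e-6
G_num = np.zeros_like(W)
for i in range(4):
    for j in range(3):
        E = np.zeros_like(W); E[i, j] = eps
        G_num[i, j] = (loss(W + E, x, t, delta, lam)
                       - loss(W - E, x, t, delta, lam)) / (2 * eps)
assert np.allclose(G, G_num, atol=1e-6)
```

The identity $(I - e_k 1^T)p = p - e_k\,(1^Tp)$ avoids forming the $m \times m$ matrix explicitly.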