Gradient of summation involving max function


I'm having trouble figuring out how to start taking the gradient with respect to $w$, i.e. $\frac{\partial J(w)}{\partial w}$, of the following function:

$$J(w) = \sum_{i=1}^{M} \max\left[0, -w^Tx_iy_i\right]$$ where $w \in \mathbb{R}^n$, $x_i \in \mathbb{R}^n$, and $y_i \in \{-1,1\}$. Assume $w^Tx_i \neq 0$ for all $i$.

If the function were simply $$J(w) = \sum_{i=1}^{M}-w^Tx_iy_i$$ then the gradient would just be $\sum_{i=1}^{M}-x_iy_i$, right? But surely this isn't how you would solve the first problem, is it?
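One way to test a candidate answer is a numerical check. Since $w^Tx_i \neq 0$ for every $i$, each term is locally either $0$ or the linear function $-w^Tx_iy_i$, which suggests the gradient is $\sum_{i:\,-w^Tx_iy_i > 0} -x_iy_i$, i.e. the linear-case sum restricted to the terms where the max is active. The sketch below, assuming NumPy and a small made-up dataset (the names `J`, `grad_J`, `numeric_grad` and the values of `X`, `y`, `w` are illustrative, not from the question), compares that candidate against central finite differences.

```python
import numpy as np

def J(w, X, y):
    # Sum over samples of max(0, -y_i * w^T x_i); rows of X are the x_i.
    return np.sum(np.maximum(0.0, -y * (X @ w)))

def grad_J(w, X, y):
    # Candidate gradient: only terms with -y_i * w^T x_i > 0 contribute -y_i * x_i.
    active = (-y * (X @ w)) > 0
    return -(X[active] * y[active, None]).sum(axis=0)

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for k in range(w.size):
        e = np.zeros_like(w)
        e[k] = eps
        g[k] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Made-up data, chosen so that w^T x_i != 0 for every i.
X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5], [2.0, 1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w = np.array([0.5, -0.75])

analytic = grad_J(w, X, y)
numeric = numeric_grad(lambda v: J(v, X, y), w)
print(analytic, numeric, np.allclose(analytic, numeric, atol=1e-6))
```

The check is only meaningful away from the kinks: if some $w^Tx_i$ were within `eps` of zero, the finite difference would straddle a point where $J$ is not differentiable.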